Architecture
In this post, I'm going to talk about the FAST Search components and how they interact with each other. I recommend reading this article before you start deploying FAST Search, as it should give you good guidance and understanding for your deployment plan:
- FAST Content Search Service Application (SSA):
It is the default indexing connector and retrieves content from various content sources, such as SharePoint content repositories, web servers, Exchange folders, line-of-business data, and file servers.
It is worth mentioning that this is a SharePoint service application that runs within the SharePoint farm.
- FAST Query Search Service Application (SSA):
It handles queries from users and returns the results.
It is worth mentioning that this is a SharePoint service application that runs within the SharePoint farm.
Note: for large FAST deployments, I recommend dedicating a SharePoint application server to host the Content SSA and Query SSA service applications.
The remaining components below are FAST components:
- Content Distributor:
The content distributor communicates with the indexing connectors and organizes the feeding of documents from the indexing connectors to the indexing service. You can set up a primary and a backup content distributor for fault tolerance.
- Item Processing (document processor):
The item processing component receives items to be indexed from the indexing connectors and processes them according to the given configuration. It then sends the processed items to the indexing service. Item processing is responsible for:
- Extracting properties from crawled content (name, date) or custom properties.
- Mapping crawled properties to managed properties.
- Extracting searchable text from document formats (Word, Excel, PDF).
- Performing linguistic processing, which uses information about the structure and variation of languages.
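The responsibilities above can be pictured as a small pipeline. The following Python is only a toy sketch and not how FAST implements item processing: the function names, the `CRAWLED_TO_MANAGED` mapping, and the dictionary layout of an item are all assumptions made for illustration.

```python
import re

# Hypothetical mapping from crawled property names to managed property names.
CRAWLED_TO_MANAGED = {"ows_Title": "title", "ows_Modified": "modified"}


def extract_text(raw_body: str) -> str:
    """Stand-in for format conversion: strip markup-like tags, keep plain text."""
    return re.sub(r"<[^>]+>", " ", raw_body).strip()


def map_properties(crawled: dict) -> dict:
    """Map crawled properties to managed properties; unmapped ones are dropped."""
    return {
        CRAWLED_TO_MANAGED[name]: value
        for name, value in crawled.items()
        if name in CRAWLED_TO_MANAGED
    }


def tokenize(text: str) -> list:
    """Very rough linguistic step: lowercase and split into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def process_item(item: dict) -> dict:
    """Run one item through the sketch pipeline and return what an indexer would receive."""
    text = extract_text(item["body"])
    return {
        "id": item["id"],
        "managed": map_properties(item["crawled"]),
        "text": text,
        "tokens": tokenize(text),
    }
```

Calling `process_item` with an item that carries an `id`, a `body`, and a `crawled` dictionary returns the mapped managed properties together with the extracted text and its tokens.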
- Web Analyzer:
It analyzes search clickthrough logs and hyperlink structures. Both contribute to better-ranked search results, as follows:
- Items that show many clicks in the search clickthrough log are popular and therefore receive better rank scores than less-viewed items.
- Items that are linked to from many other items are also perceived to be more relevant for the user and therefore receive better rank scores.
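To make the two signals concrete, here is a minimal sketch of how click counts and inbound link counts could be turned into a rank boost. The logarithmic formula, the weights, and the function names are assumptions chosen for illustration; the actual Web Analyzer scoring is not described in this post.

```python
import math
from collections import Counter


def count_inbound_links(links) -> Counter:
    """links is an iterable of (source, target) pairs; count links per target.

    Self-links are ignored so that an item cannot boost itself.
    """
    return Counter(target for source, target in links if source != target)


def rank_boost(clicks: int, inbound: int,
               click_weight: float = 1.0, link_weight: float = 1.0) -> float:
    """Toy boost that grows with clicks and inbound links, dampened by a logarithm."""
    return click_weight * math.log1p(clicks) + link_weight * math.log1p(inbound)
```

The logarithm is only one possible design choice: it keeps a very popular item from drowning out everything else while still rewarding more clicks and more inbound links.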
- Search Cluster:
- The search cluster provides the main topology for indexing and query matching.
- The indexing and query matching components require their own scaling model, using a matrix of servers in a row/column configuration.
- The figure illustrates a deployment with four rows: two pure indexer rows and two pure search rows. In real deployments, it is common to combine indexer and search rows in order to reduce hardware footprint.
Below is a real deployment scenario that uses the same server role for indexing and query matching in a three-server topology, where:
- One admin server hosts the admin components (content distributor, web analyzer, indexing dispatcher).
- Primary and backup indexer: you can configure a backup indexer node for fault tolerance.
- A row in the configuration file determines which rows are search rows and which are indexer rows. The following example is a pure index row: <row id="0" index="primary" search="false" />
- A search row corresponds to search="true". If marked true, the query matching component is deployed on that row.
- An indexer row can have the role primary, secondary (backup), or none. An indexer row can also be a search row.
- A <row> tag can be defined as primary, secondary, or none using the index attribute. There can only be one primary index row in the topology:
- Marking the <row> tag as primary indicates that this is the primary row of servers with indexing components.
- Marking the <row> tag as secondary means the row is a redundant index row.
- Marking the <row> tag with the value "none" means there are no indexing components on this row.
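The row rules above can be checked with a few lines of Python. This is a sketch under stated assumptions: only the <row> element and its id, index, and search attributes come from the example in this post, while the `<cluster>` wrapper element and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <cluster> wrapper is assumed for illustration;
# the <row> attributes follow the example shown in the text.
FRAGMENT = """
<cluster>
  <row id="0" index="primary" search="true" />
  <row id="1" index="secondary" search="true" />
  <row id="2" index="none" search="true" />
</cluster>
"""


def describe_rows(xml_text: str) -> list:
    """Return (row id, index role, is search row) for every <row> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for row in root.iter("row"):
        index_role = row.get("index", "none")
        is_search = row.get("search", "false").lower() == "true"
        rows.append((int(row.get("id")), index_role, is_search))
    return rows


def validate_rows(rows: list) -> None:
    """Reject a topology that declares more than one primary index row."""
    primaries = [row for row in rows if row[1] == "primary"]
    if len(primaries) > 1:
        raise ValueError(f"only one primary index row is allowed, found {len(primaries)}")
```

Running `describe_rows(FRAGMENT)` lists row 0 as the primary indexer that also serves queries, row 1 as the redundant indexer, and row 2 as a pure search row.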
- Indexing:
- The indexing component creates inverted indexes based on the items that it receives. It sends these inverted indexes to the query matching component for later use during query evaluation. (The indexer is comparable to the crawler in SharePoint.)
- Each index column will contain one part of the index, and the combined set of index columns will form the complete index. In this case, each indexing node will handle only a part of the whole index.
- Additionally, backup indexing nodes can provide fault tolerance.
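As a rough illustration of an inverted index split across index columns, consider the sketch below. The CRC-based column assignment and all function names are assumptions; the post does not say how FAST decides which column an item goes to.

```python
import zlib
from collections import defaultdict


def column_for(item_id: str, num_columns: int) -> int:
    """Assign an item to a column with a stable hash (an assumed policy)."""
    return zlib.crc32(item_id.encode("utf-8")) % num_columns


def build_inverted_index(items: dict) -> dict:
    """Map each term to the sorted list of item ids that contain it."""
    index = defaultdict(set)
    for item_id, text in items.items():
        for term in text.lower().split():
            index[term].add(item_id)
    return {term: sorted(ids) for term, ids in index.items()}


def build_columns(items: dict, num_columns: int) -> list:
    """Partition the items into columns and build one inverted index per column."""
    partitions = [dict() for _ in range(num_columns)]
    for item_id, text in items.items():
        partitions[column_for(item_id, num_columns)][item_id] = text
    return [build_inverted_index(partition) for partition in partitions]
```

Each item lands in exactly one column, so every column holds one part of the index and the union of all columns is the complete index.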
- Query Matching: The query matching service uses the inverted indexes created by the indexing service to retrieve the items that match the query and then returns these items as a query hit list.
- The query matching service looks up each term in the index and retrieves a list of items in which that term appears.
- It resolves query operators such as AND and OR. For an AND query, the query hit list consists of the set of items that contain all the terms. The order of the returned items is based on the requested sorting mechanism.
- The query matching service is responsible for the deep refinement that is associated with query results.
- You can deploy the query matching service in a row/column setup to achieve fault-tolerance and scaling in content and query volume.
- Index columns provide ways to scale out for content volume, by partitioning the overall index into a set of disjoint columns.
- Search rows provide ways to scale out for query volume, by duplicating the same partition of the index across more than one query matching node.
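The term lookup and operator resolution described above can be sketched as follows. This is an illustrative toy, not the FAST query matching implementation: each column is assumed to be a plain dictionary from term to item ids, and the function names are hypothetical.

```python
def match(index: dict, terms: list, operator: str = "AND") -> list:
    """Look up each term and combine the posting lists with AND or OR."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    if operator == "AND":
        hits = set.intersection(*postings)
    elif operator == "OR":
        hits = set.union(*postings)
    else:
        raise ValueError(f"unsupported operator: {operator}")
    return sorted(hits)


def query_cluster(columns: list, terms: list, operator: str = "AND") -> list:
    """Query every index column once and combine the partial hit lists.

    Because the columns are disjoint, an item can only match in its own column,
    so concatenating the per-column hits gives the complete hit list.
    """
    hits = []
    for column_index in columns:
        hits.extend(match(column_index, terms, operator))
    return sorted(hits)
```

Adding search rows would mean keeping several copies of the same `columns` list and sending each incoming query to one of the copies, which raises query capacity without changing the result.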
Search Cluster Example:
The example below shows four servers: indexing node X in row 1 has a corresponding query matching node X in row 2, with fault tolerance applied. Together they form the search cluster, in which each index row should contain a complete set of index columns (full index = index-X + index-Y), each mapped to the corresponding query matching component in the same column (index-X in row 1 maps to query matching-X in row 2).
- Query processing
- It is triggered when a user clicks the search button. The query processing component performs pre-processing of queries and post-processing of results.
- It contains result processing, which includes merging the results from multiple index columns.
- It performs formatting for the query hit list, formatting the query refinement data, and removing duplicates.
- The query processing component interacts with the FAST Search Authorization (FSA) component to make sure that the user performing a query sees only the results that he or she is authorized to see.
- The query processing service can be scaled out across multiple nodes to provide fault tolerance and handle more queries per second.
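The result-processing step (merging per-column hit lists and removing duplicates) could look roughly like this. The hit layout with `score` and `checksum` fields and the rule that equal checksums mean duplicates are assumptions made for this sketch.

```python
import heapq


def merge_hit_lists(per_column_hits: list, limit: int = 10) -> list:
    """Merge per-column hit lists, each already sorted by descending score.

    Returns at most `limit` hits, best score first, with duplicates removed.
    """
    merged = heapq.merge(*per_column_hits, key=lambda hit: -hit["score"])
    results, seen = [], set()
    for hit in merged:
        if len(results) >= limit:
            break
        # Hypothetical duplicate rule: hits sharing a checksum are the same item.
        if hit["checksum"] in seen:
            continue
        seen.add(hit["checksum"])
        results.append(hit)
    return results
```

Using `heapq.merge` avoids re-sorting everything: since each column already returns its hits in rank order, the merged list can be produced lazily and cut off once enough results are collected.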
- Administration
The SharePoint Server 2010 Central Administration and site collection user interfaces provide the administrative interfaces for managing the FAST Search Server 2010 for SharePoint deployment and features.
The administration component contains functionality to control the search experience, such as determining how to perform property extraction, ascertaining which synonyms to use, and determining which items to use as best bets.
It consists of the following:
- Index schema administration: The index schema contains all the configuration entities that are needed to generate the configuration files that are related to the index schema for all the other services in the system.
- The index schema controls which managed properties of an item will be indexed.
- It also controls how the properties will be indexed.
- The rank profile is a part of the index schema that controls how the query hit list will be sorted by relevancy.
- FAST Search Authorization (FSA): The FSA manager is a part of the administration service that manages user authorization for indexed content.
It ensures that only items that a user is entitled to read appear in the query results.
The FSA manager communicates with Claims services, Active Directory services or other LDAP based directory services to manage the authorization process.
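The idea of trimming results by authorization can be sketched in a few lines. Everything here is hypothetical: the `GROUPS` table stands in for a directory lookup, the `acl` field on a hit is an assumed layout, and any real authorization model is richer than a simple overlap test.

```python
# Hypothetical directory data: which groups each user belongs to.
GROUPS = {"alice": {"finance", "everyone"}, "bob": {"everyone"}}


def principals_for(user: str) -> set:
    """Return the user plus the groups the (assumed) directory lists for them."""
    return {user} | GROUPS.get(user, set())


def security_trim(hits: list, user_principals: set) -> list:
    """Keep only hits whose ACL shares at least one principal with the user."""
    return [hit for hit in hits if user_principals & set(hit["acl"])]
```

For example, a hit with `acl` set to `["finance"]` survives trimming for `alice` but is removed for `bob`, so each user only sees the items they are entitled to read.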
FAST Search Web Crawler
The FAST Search Web crawler is an optional indexing connector that can be used for complex Web crawl scenarios involving a mix of Internet and Intranet sites.
The FAST Search Web crawler reads Web pages and follows links on the pages to process a complete Web of items. It then passes the retrieved items to the item processing service.
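The read-and-follow-links behavior can be illustrated with a small breadth-first crawler. This is a generic sketch, not the FAST Search Web crawler: the `fetch` callback, the `max_pages` limit, and the class and function names are assumptions, and a real crawler would also handle robots rules, politeness delays, and errors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href values of all <a> tags on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch, max_pages: int = 100) -> list:
    """Breadth-first crawl; fetch(url) returns HTML text or None if unavailable."""
    queue, seen, visited = deque([start_url]), {start_url}, []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)  # a real crawler would pass the item on for processing here
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links against the page URL
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return visited
```

Passing `fetch` in as a parameter keeps the sketch testable without network access: a dictionary's `get` method over a few in-memory pages is enough to try it out.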
References: