How search engines work

A search engine works in the following order: 1) crawling; 2) deep crawling, using depth-first search (DFS); 3) fresh crawling, using breadth-first search (BFS); 4) indexing; 5) searching.
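
The depth-first and breadth-first orders named above differ only in which discovered link is followed next. The following is a minimal sketch on a made-up link graph, not any particular engine's crawler; the page names and the functions bfs_order and dfs_order are assumptions made for illustration.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "home": ["about", "blog"],
    "about": ["team"],
    "blog": ["post1", "post2"],
    "team": [],
    "post1": [],
    "post2": ["home"],  # a link back to an already visited page
}


def bfs_order(start, links):
    """Breadth-first: visit all pages one click away, then two clicks away, etc."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def dfs_order(start, links, seen=None):
    """Depth-first: follow the first unseen link as far as it goes, then back up."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for target in links.get(start, []):
        if target not in seen:
            order.extend(dfs_order(target, links, seen))
    return order
```

On this graph, bfs_order("home", LINKS) visits home, about, blog, team, post1, post2, while dfs_order("home", LINKS) visits home, about, team, blog, post1, post2. Both keep a set of seen pages so that the link from post2 back to home does not cause an endless loop.
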
Internet search engines work by storing information about a large number of web pages, which they retrieve from the World Wide Web itself. These pages are downloaded by a crawler (also called a spider), an automated web browser that follows every link it encounters. Pages can be excluded from crawling via the robots.txt file. The search engine then analyzes the content of each page to determine how it should be indexed. Data about web pages is stored in an index database for use in later queries.
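
The exclusion and indexing steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the URLs, page texts, robots.txt content, and the agent name ExampleBot are invented, the pages are held in memory rather than downloaded, and a real index stores far more than a word-to-URL mapping. The robots.txt rules are evaluated with Python's standard urllib.robotparser module.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and already-downloaded pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

PAGES = {
    "https://example.com/": "Search engines crawl the web",
    "https://example.com/private/notes": "Secret crawl notes",
    "https://example.com/faq": "How engines index the web",
}


def allowed_pages(pages, robots_txt, agent="ExampleBot"):
    """Keep only the pages that robots.txt permits this agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return {url: text for url, text in pages.items()
            if rules.can_fetch(agent, url)}


def build_index(pages):
    """Map each lowercased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(url)
    return index


crawlable = allowed_pages(PAGES, ROBOTS_TXT)
index = build_index(crawlable)
```

Here the page under /private/ is skipped, so the word "secret" never reaches the index, while "web" maps to the two remaining URLs.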

Some search engines, such as Google, store all or part of the source page (called a cache) as well as information about the web pages, while others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, because it is the version that was actually indexed. It can therefore be very useful when the content of the current page has been updated and the search terms no longer appear in it.
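
A small sketch can make the role of the cache concrete. It assumes a toy in-memory "web" with one invented URL, and the names live_pages, cache, and index_page are hypothetical; it only illustrates why the cached copy still contains a search term after the live page has changed.

```python
import re

# Hypothetical live web: URL -> current page text.
live_pages = {"https://example.com/news": "Festival tickets on sale today"}

cache = {}  # URL -> snapshot of the text as it was when indexed
index = {}  # word -> set of URLs


def index_page(url):
    """Store a snapshot of the page and index the words of that snapshot."""
    snapshot = live_pages[url]
    cache[url] = snapshot
    for word in set(re.findall(r"\w+", snapshot.lower())):
        index.setdefault(word, set()).add(url)


url = "https://example.com/news"
index_page(url)

# The site owner later rewrites the page; the index has not been refreshed yet.
live_pages[url] = "Festival sold out"

hits = index.get("tickets", set())
term_in_cache = "tickets" in cache[url].lower()
term_in_live = "tickets" in live_pages[url].lower()
```

The query "tickets" still returns the URL, and the term is found in the cached snapshot but no longer in the live page.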

This problem can be viewed as a mild form of link rot, and Google’s handling of it increases usability by satisfying the principle of least surprise: the user generally expects the search terms to appear on the returned pages. The increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that is no longer available elsewhere.

When a user visits a search engine and enters a query, usually by specifying keywords, the search engine examines its index and displays a list of the web pages that best match the user's criteria, usually with a short summary containing the document's title and sometimes excerpts of the text. Most search engines support the Boolean operators AND, OR, and NOT to further specify the search query. An advanced feature is proximity search, which lets the user define the allowed distance between keywords.
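
Boolean operators and proximity search can both be expressed over an index that records word positions. The sketch below is a minimal illustration, not how any named search engine implements them: the three documents and the helpers build_positions, docs_with, and near are assumptions. AND, OR, and NOT map onto set intersection, union, and difference.

```python
import re

# Hypothetical document collection: id -> text.
DOCS = {
    1: "web search engines index the web",
    2: "a spider crawls every link on the web",
    3: "search the index with boolean operators",
}


def build_positions(docs):
    """Build word -> {doc_id: [positions of the word in that document]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index


INDEX = build_positions(DOCS)
ALL_DOCS = set(DOCS)


def docs_with(word):
    """Return the ids of all documents containing the word."""
    return set(INDEX.get(word.lower(), {}))


def near(word_a, word_b, max_distance):
    """Return documents where the two words occur within max_distance positions."""
    hits = set()
    for doc_id in docs_with(word_a) & docs_with(word_b):
        pos_a = INDEX[word_a.lower()][doc_id]
        pos_b = INDEX[word_b.lower()][doc_id]
        if any(abs(a - b) <= max_distance for a in pos_a for b in pos_b):
            hits.add(doc_id)
    return hits


search_and_index = docs_with("search") & docs_with("index")   # AND
spider_or_boolean = docs_with("spider") | docs_with("boolean")  # OR
web_not_search = docs_with("web") - docs_with("search")       # AND NOT
not_web = ALL_DOCS - docs_with("web")                          # NOT
```

With these documents, "search AND index" matches documents 1 and 3, "spider OR boolean" matches 2 and 3, and "web NOT search" matches only 2. The call near("web", "search", 1) matches document 1, where the two words are adjacent, whereas near("search", "index", 1) matches nothing because the words are two positions apart in both documents that contain them.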