How search engines work
Web search engines store information from a large number of web pages, which they retrieve from the WWW using a web crawler (commonly also known as a spider).
A 'spider' automatically crawls the web by following every link it finds; site owners can exclude pages from crawling with a robots.txt file. The contents of each web page are analyzed to determine how it is indexed (for example, words may be extracted from titles, headings, or special fields such as meta tags).
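The two steps above can be sketched in a few lines of Python. This is a hypothetical, minimal illustration, not any real crawler's code: it extracts links from an HTML snippet with the standard-library HTML parser, and uses the standard-library robots.txt parser to check whether a URL may be fetched. The crawler name "MyCrawler" and the example URLs are made up.

```python
from html.parser import HTMLParser
from urllib import robotparser

class LinkExtractor(HTMLParser):
    """Collect the href targets of anchor tags, i.e. the links a spider would follow."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Step 1: find the links on a page (here, a tiny hard-coded snippet).
parser = LinkExtractor()
parser.feed('<p><a href="/about">About</a> <a href="http://example.com/">Home</a></p>')
print(parser.links)  # ['/about', 'http://example.com/']

# Step 2: honor robots.txt exclusions before fetching each link.
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
print(rp.can_fetch("MyCrawler", "http://example.com/private/page"))  # False
print(rp.can_fetch("MyCrawler", "http://example.com/about"))         # True
```

A real spider would fetch pages over HTTP, resolve relative links against the page's URL, and maintain a queue of unvisited URLs; the sketch only shows the link-extraction and exclusion checks.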
Web page data is stored in an index database for use in later queries. Some search engines, like Google, store all or part of the source web page (referred to as a cache) as well as information about the web pages. Other search engines, such as AltaVista, store every word of every page they find.
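The usual data structure behind such an index is an inverted index: a mapping from each word to the set of pages that contain it. The toy sketch below assumes this general technique (it is not any particular engine's implementation) and uses whitespace tokenization for simplicity; the page names are illustrative.

```python
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> set of page URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

pages = {
    "a.html": "web search engines index pages",
    "b.html": "a spider crawls web pages",
}
index = build_index(pages)
print(sorted(index["pages"]))   # ['a.html', 'b.html']
print(sorted(index["spider"]))  # ['b.html']
```

Looking up a query word is then a single dictionary access, which is why queries over billions of pages can be answered quickly.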
The cached page holds the actual text that was indexed. This is very useful when the content of the live web page has since been updated and the search terms no longer appear in it; some consider this problem a mild form of linkrot.
Google's handling of linkrot increases 'freshness' by checking whether the search terms still appear on the returned web page. This satisfies the principle of least astonishment: users expect the search terms to be on the pages they are shown, so Google attempts to ensure that they are. Cached pages also help preserve search relevance, which mitigates linkrot.