Luca Vassalli's Website





2. Ethical SEO

2.1 How a search engine actually works

Search engines use automated software programs, known as spiders or crawlers, to explore the Web and build their databases; a spider analyses page after page, following the links that connect them to each other. Each page is then added to a giant database, sometimes called the catalog, and indexed by its major keywords. A keyword is a word that is relevant to the content of the page, or an expression that describes it. Usually the spider starts its crawl from the most relevant pages already in the database, so pages that are already highly ranked are examined more frequently. When a new site is created, the webmaster should submit it to the major search engines through their submission forms, to speed up its inclusion in the results for relevant searches. In any case, since the Web contains a huge number of pages, if you change something in your site to improve its ranking, it may take a month or even more before the effects show up.
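A minimal sketch of the crawl-and-index loop described above, assuming a tiny in-memory "web" (the `TOY_WEB` dictionary is hypothetical) and a prior rank score per page. The spider always visits the best-ranked known page first, follows its links, and files every page in the catalog under its keywords. Keyword extraction here is deliberately crude (lowercase words of three or more letters) and is an assumption, not how any real engine works.

```python
import heapq
import re
from collections import defaultdict

# Hypothetical toy web: URL -> (page text, outgoing links).
TOY_WEB = {
    "a.html": ("Ethical SEO guide", ["b.html", "c.html"]),
    "b.html": ("Spider view of a page", ["c.html"]),
    "c.html": ("SEO spam examples", ["a.html"]),
}

def keywords(text):
    """Crude keyword extraction (an assumption): lowercase words of 3+ letters."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3}

def crawl(web, seeds, rank):
    """Visit pages from the highest-ranked ones outward, following links.

    rank maps URL -> prior score; higher-ranked pages are popped first.
    Returns (catalog, visit_order), where catalog maps keyword -> set of URLs.
    """
    catalog = defaultdict(set)
    visited = set()
    order = []
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-rank.get(url, 0.0), url) for url in seeds]
    heapq.heapify(frontier)
    while frontier:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in web:
            continue
        visited.add(url)
        order.append(url)
        text, links = web[url]
        for word in keywords(text):
            catalog[word].add(url)  # index the page under each keyword
        for link in links:
            if link not in visited:
                heapq.heappush(frontier, (-rank.get(link, 0.0), link))
    return catalog, order

catalog, order = crawl(TOY_WEB, ["a.html"], {"a.html": 0.9, "b.html": 0.5, "c.html": 0.1})
```

With these made-up scores the spider visits `a.html`, then `b.html`, then `c.html`, and the catalog lists the keyword `seo` under `a.html` and `c.html`.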
Once a page is in the database, it can be returned in searches: every time a query is made, a program inside the search engine sifts through the millions of pages recorded in the index to find matches for the search.
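The matching step can be illustrated with an inverted index, a common textbook structure for this job; whether any particular engine matches queries exactly this way is an assumption. The sketch maps each keyword to the pages containing it and answers a query by intersecting those page sets (a boolean AND match). Tokenization into lowercase words is a simplifying assumption.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumption: keywords are simply lowercase alphabetic words.
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    """Map every keyword to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the pages that contain every query word (boolean AND match)."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the rarest word so the intersections stay small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for p in postings[1:]:
        result &= p
    return result

pages = {
    "a.html": "Ethical SEO guide",
    "b.html": "SEO spam examples",
    "c.html": "Spider view",
}
index = build_index(pages)
matches = search(index, "seo spam")
```

Here `matches` is `{"b.html"}`, the only page containing both words. Only the short lists of matching pages are touched, not every page in the database, which is why this structure scales to very large indexes.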
The final step is presenting the results: after a user makes a query, the matching indexed pages are shown to him in ranking order. Since the title of a page is what appears in the list of results, it clearly matters for the ranking of the page; for that reason a page is penalized if its title and content do not match. Different search engines use different ranking algorithms, so the same query will usually return different results on different engines. These algorithms are kept secret, so that studying them cannot reveal an easy way to climb the result list. Therefore, although the general rules considered further on can be taken as always valid, keep in mind that reaching the top ranks requires optimizing in different ways for different search engines. For instance, roughly speaking, Google gives more importance to links and Yahoo to keywords; links and keywords are relevant to some degree in both engines, but with different weights, so what may be good keyword optimization for Yahoo will probably be over-optimization for Google.
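To make the point about different weights concrete, here is a toy scoring function. The real algorithms are secret, so the formula, the weights and the mismatch penalty below are all invented for illustration: a score mixes a link component and a keyword component, and a page whose title shares no word with its body is penalized. Swapping the weights (a "link-heavy" engine versus a "keyword-heavy" one) reverses the ranking of the same two pages.

```python
from dataclasses import dataclass

@dataclass
class Page:
    url: str
    title: str
    body: str
    inbound_links: int

def keyword_score(page, query_words):
    """Fraction of the query words that appear in the page body."""
    body = set(page.body.lower().split())
    return sum(1 for w in query_words if w in body) / len(query_words)

def title_matches_content(page):
    """Hypothetical check: at least one title word also appears in the body."""
    body = set(page.body.lower().split())
    return any(w in body for w in page.title.lower().split())

def score(page, query, link_weight, keyword_weight, mismatch_penalty=0.5):
    """Invented scoring formula: weighted links + keywords, title penalty."""
    query_words = query.lower().split()
    link_part = page.inbound_links / (page.inbound_links + 10)  # squash into [0, 1)
    s = link_weight * link_part + keyword_weight * keyword_score(page, query_words)
    if not title_matches_content(page):
        s *= mismatch_penalty  # penalize a title/content mismatch
    return s

def rank(pages, query, link_weight, keyword_weight):
    """Sort pages by descending score under the given weights."""
    return sorted(pages, key=lambda p: score(p, query, link_weight, keyword_weight),
                  reverse=True)

linked = Page("linked.html", "seo tips", "seo tips for beginners", inbound_links=90)
stuffed = Page("stuffed.html", "seo ranking guide", "seo ranking guide seo ranking", inbound_links=0)
bait = Page("bait.html", "free prizes", "seo ranking guide", inbound_links=0)

query = "seo ranking guide"
link_heavy = [p.url for p in rank([linked, stuffed], query, 0.8, 0.2)]
keyword_heavy = [p.url for p in rank([linked, stuffed], query, 0.2, 0.8)]
```

Under the link-heavy weights `linked.html` comes first; under the keyword-heavy weights the keyword-rich `stuffed.html` overtakes it. The `bait.html` page has the same body as `stuffed.html` minus the repetitions, but its unrelated title halves its score.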

I have just described how so-called crawler-based search engines work; there are at least two other kinds of information source: human-powered directories and hybrid search engines. In the first case, pages are inserted into the database by humans, so SEO rules have no power to increase the ranking, which is driven by the quality of the websites; an example is the Yahoo directory. In the second case, as the name suggests, both mechanisms can be used. For instance, in 2002 MSN Search presented human-powered listings from LookSmart, but it also used crawler-based results for the more obscure queries. This hybrid category is disappearing, but some minor search engines still adopt this approach.
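The hybrid approach can be sketched as a simple fallback: use the human-edited listings when the directory covers the query, otherwise fall back to crawler-based results. This is only an illustration of the idea, not how MSN Search or LookSmart actually worked; the directory contents, the URLs and the `crawler_search` function are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

def hybrid_search(
    query: str,
    directory: Dict[str, List[str]],
    crawler_search: Callable[[str], List[str]],
) -> Tuple[str, List[str]]:
    """Prefer human-edited listings; fall back to crawler results.

    directory maps a normalized query to editor-chosen URLs (hypothetical).
    crawler_search is any function returning crawler-based results.
    """
    key = " ".join(query.lower().split())  # normalize case and spacing
    listings = directory.get(key)
    if listings:
        return "directory", listings
    return "crawler", crawler_search(query)

# Hypothetical data: editors cover popular queries only.
directory = {"travel guides": ["editors-pick.example/travel"]}

def fake_crawler(query: str) -> List[str]:
    return ["crawled.example/" + query.lower().replace(" ", "-")]

popular = hybrid_search("Travel  Guides", directory, fake_crawler)
obscure = hybrid_search("Medieval Sundials", directory, fake_crawler)
```

The popular query is answered from the directory, while the obscure one, which no editor has covered, is handed to the crawler-based search.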

Nowadays the three main search engines are Google, Yahoo and MSN, and they are all crawler-based. In July 2006, it was estimated that about 50% of global searches were run on Google, about 25% on Yahoo and 10% on MSN, so the room left for the dozens of other search engines is really thin. Google is considered by far the best engine, so it is no surprise that, since its birth, its ranking algorithm has permanently influenced the ranking algorithms of all the other search engines; therefore no dissertation about how a search engine works can avoid discussing it.