General Information
Search engines use a robot, spider or web crawler to gather and index information, then return usable results based on keywords.
Website owners can also add their URL to a search engine's database by submitting a form, but it is still the robots that retrieve and index the web page.
Once the data has been collected, the index is updated and keywords are placed in the directory used for search results. This can take anywhere from a few days to a few months, depending on the service collecting the information.
A type of indexing called "instant indexing", used by AltaVista, InfoSeek and MSN Search, means that information is updated more quickly: days rather than months.
Most search engines do not count words such as "the", "a" and "an" in their indexing. These are called "stop words": they appear so frequently that indexing them would compromise the speed and quality of a search, not to mention take up a lot of hard drive space on the servers.
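As a rough illustration only (not any engine's actual code), here is a minimal Python sketch of that idea: building a keyword index from a couple of made-up pages while skipping a tiny stop-word list. Real engines use far larger stop-word lists and far more elaborate index structures.

```python
STOP_WORDS = {"the", "a", "an"}  # real engines use much longer lists

def index_pages(pages):
    """pages maps URL -> page text; returns keyword -> set of URLs."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue  # stop words are too common to be worth storing
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "http://example.com/fish": "the quick guide to tropical fish",
    "http://example.com/cats": "a short guide to cats",
}
index = index_pages(pages)
print(index["guide"])   # both URLs - "guide" appears on each page
print("the" in index)   # False - the stop word was never indexed
```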
"Paid for Placement" is becoming popular for some Search Engines and Portals looking to increase revenue. People bid on search "keywords",
similar to an auction. The highest bidder gets the top spot. This is why you'll
see a lot of links to websites selling products when you do a search. Of course,
this isn't a problem if your just looking for something to buy. But that leads
to other problems.
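A toy sketch of how such keyword bidding might order results, using made-up sites and bid amounts (no portal publishes its actual auction mechanics):

```python
# Hypothetical advertisers and bid amounts for a single keyword.
bids = {
    "shoes": [
        ("shoestore.example.com", 0.75),
        ("bargain.example.com", 0.40),
        ("sneakers.example.com", 1.10),
    ],
}

def paid_results(keyword):
    """Return the sites bidding on a keyword, highest bidder first."""
    return sorted(bids.get(keyword, []), key=lambda entry: entry[1], reverse=True)

print(paid_results("shoes"))  # sneakers.example.com tops the list at 1.10
```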
More Information:
How Search Engines Work - from Search Engine Watch
Search Engine Sizes - from Search Engine Watch
Boolean Operations | MetaTags | Results | Link Popularity | Ranking | Robots
Boolean Operations:
Many search engines support Boolean operations in their searches. Boolean operators are words such as "and", "not", "or" and "near", and symbols such as "+", "-", "()" and "*". Placed between keywords, they allow quicker and more specific results to be listed. Each search engine is different; if it allows Boolean searching, it will have a "help and usage" page with information to guide you.
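As a rough illustration (each engine implements this differently), the sketch below treats "and" as set intersection, "or" as set union and "not" as exclusion over a tiny inverted index:

```python
# Tiny inverted index: keyword -> set of page identifiers.
index = {
    "tropical": {"page1", "page2"},
    "fish":     {"page1", "page3"},
    "tank":     {"page3"},
}

def boolean_search(index, required=(), optional=(), excluded=()):
    """required ~ "and"/"+", optional ~ "or", excluded ~ "not"/"-"."""
    results = set()
    if required:
        results = set.intersection(*(index.get(w, set()) for w in required))
    for word in optional:
        results |= index.get(word, set())
    for word in excluded:
        results -= index.get(word, set())
    return results

# Equivalent to a query like "tropical AND fish NOT tank" or "+tropical +fish -tank"
print(boolean_search(index, required=["tropical", "fish"], excluded=["tank"]))  # {'page1'}
```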
Link Popularity:
Some Web crawls are also based on link popularity - the number of websites that have a link to an individual page. Search engines that base results on link popularity include Excite, HotBot and Lycos. Even though this may seem like a good way of ranking results, it has its deficiencies: results tend to ignore web page content, newer websites and educational resources. And I don't know about you, but some of the more interesting sites are personal web pages, which usually don't have a "rating".
Link popularity is very important at Google.
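A simple sketch of the idea, counting inbound links over a made-up link graph (Google's actual ranking algorithm is considerably more involved than a raw count):

```python
# Made-up link graph: each page and the pages it links to.
links = {
    "siteA": ["siteB", "siteC"],
    "siteB": ["siteC"],
    "siteD": ["siteC", "siteB"],
}

def link_popularity(links):
    """Count how many pages link to each target page."""
    counts = {}
    for source, targets in links.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

popularity = link_popularity(links)
# siteC has three inbound links, siteB has two, so siteC would rank first
print(sorted(popularity, key=popularity.get, reverse=True))
```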
Metatags:
Metatags are not visible when viewing a web page; they are embedded in the HTML of the page itself.
Many forms of metatags exist. Title, Description, Keywords and Classification are the primary ones used by search engines. Content rating is gaining popularity and will soon become standard in search engine "politics".
Title, Description and Keywords are regularly used in determining relevance and ranking. If a description metatag is not present in the web page, the search engine will usually display the first 200 characters or so of the web page's content. You may have noticed this in your searching adventures.
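A minimal sketch of that fallback, assuming a hypothetical page record with an optional description metatag:

```python
def result_snippet(page, limit=200):
    """page is a dict with a 'body' string and an optional 'meta_description'."""
    description = page.get("meta_description")
    if description:
        return description            # use the author's own description
    return page["body"][:limit]       # otherwise fall back to the page text

with_meta = {"meta_description": "A beginner's guide to tropical fish.",
             "body": "Welcome to my page about tropical fish..."}
without_meta = {"body": "Welcome to my home page about tropical fish. " * 10}

print(result_snippet(with_meta))      # the metatag text
print(result_snippet(without_meta))   # first 200 characters of the page
```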
Of the major search engines, Excite does not support metatags. Metatags also do not boost rankings with AltaVista, Excite, FAST, Google, Magellan, Northern Lights and WebCrawler. However, metatags do boost rankings with Inktomi, which supplies AOL Search, HotBot and MSN Search.
Some web pages use the Meta Refresh tag, which instructs a web robot when to return. However, this only affects AltaVista.
Ranking:
In the "old" days of the Web results were primarily listed in alphabetical
order. Which meant a long time sorting through material to find what was relevant to your search.
Today's "search" software is more complex and varies with each search engine. For instance software used by
AltaVista, Excite
and the Go Network use a
web page's content, or relevance, to display results. HotBot's
priority lies in websites that have been frequently accessed in past Web searches. The
Go Network
and some Web directories, like About.com, use reviewed and contributed websites.
With the addition of "paid for placements" or "keyword bidding", results are placed
in order according to highest bidder.
More Information:
How Search Engines Rank Web Pages - Search Engine Watch
Results:
When you perform a "search", data is organized and presented in a relevant order
based upon "algorithmic functions" and search criteria to find "keyword" matches.
Sound complicated? It is.
Some searches will locate specific "keywords" in a web page's URL and/or "title", while others will display results based upon the number of times your "keyword" appears in the content of a web page (aka relevance). Of course, this depends on the search service being used.
Robots:
Robots are primarily programs that operate on a predetermined set of instructions and perform different functions based on the preferences of the company doing the crawling. They travel the Web gathering information such as URLs, page titles, page descriptions and metatags, and some even store a cache of each individual web page on their servers (Go Network, Google).
Some robots perform what is called a "deep crawl" as compared to a "surface crawl". This is when a robot collects information from a website it crawls, and then collects additional information from URLs that are listed or linked on that initial page. The following search engines have bots that perform a "deep crawl": AltaVista, FAST, Google, HotBot and Northern Lights. Results usually take longer with these types of crawl.