General Information
Search engines use a robot, spider or web crawler to gather and index information, then return usable results based on keywords.
Website owners can also add their URL to a search engine's database by submitting a form, but it is still the robots that retrieve and index the web page.
Once the data has been collected, the index is updated and keywords are placed in the directory used for search results. This can take anywhere from a few days to a few months, depending on the service collecting the information.
A type of indexing called "instant indexing", used by AltaVista, InfoSeek and MSN Search, means that information is updated more quickly: days rather than months.
Most search engines do not count words such as "the", "a" and "an" in their indexing. These are called "stop words": they appear so frequently that indexing them would compromise the speed and quality of a search, not to mention take up a lot of hard drive space on the servers.
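As a rough illustration only (not any engine's actual code), here is a minimal Python sketch of that idea: building a keyword index from a couple of made-up pages while skipping a tiny stop-word list. Real engines use far larger stop-word lists and far more elaborate index structures.

```python
STOP_WORDS = {"the", "a", "an"}  # real engines use much longer lists

def index_pages(pages):
    """pages maps URL -> page text; returns keyword -> set of URLs."""
    index = {}
    for url, text in pages.items():
        for word in text.lower().split():
            if word in STOP_WORDS:
                continue  # stop words are too common to be worth storing
            index.setdefault(word, set()).add(url)
    return index

pages = {
    "http://example.com/fish": "the quick guide to tropical fish",
    "http://example.com/cats": "a short guide to cats",
}
index = index_pages(pages)
print(index["guide"])   # both URLs - "guide" appears on each page
print("the" in index)   # False - the stop word was never indexed
```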
"Paid for Placement" is becoming popular for some Search Engines and Portals looking to increase revenue. People bid on search "keywords",
similar to an auction. The highest bidder gets the top spot. This is why you'll
see a lot of links to websites selling products when you do a search. Of course,
this isn't a problem if your just looking for something to buy. But that leads
to other problems.
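A toy sketch of how such keyword bidding might order results, using made-up sites and bid amounts (no portal publishes its actual auction mechanics):

```python
# Hypothetical advertisers and bid amounts for a single keyword.
bids = {
    "shoes": [
        ("shoestore.example.com", 0.75),
        ("bargain.example.com", 0.40),
        ("sneakers.example.com", 1.10),
    ],
}

def paid_results(keyword):
    """Return the sites bidding on a keyword, highest bidder first."""
    return sorted(bids.get(keyword, []), key=lambda entry: entry[1], reverse=True)

print(paid_results("shoes"))  # sneakers.example.com tops the list at 1.10
```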
More Information:
How Search Engines Work - from Search Engine Watch
Search Engine Sizes - from Search Engine Watch
Boolean Operations | MetaTags | Results | Link Popularity | Ranking | Robots
Boolean Operations:
Many search engines support Boolean operations in their searches. Boolean operators are words such as "and", "not", "or" and "near", and symbols such as "+", "-", "()" and "*". Placed between keywords, they allow quicker and more specific results to be listed. Each search engine is different; if it allows Boolean searching, it will have a "help and usage" page with information to guide you.
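As a rough illustration (each engine implements this differently), the sketch below treats "and" as set intersection, "or" as set union and "not" as exclusion over a tiny inverted index:

```python
# Tiny inverted index: keyword -> set of page identifiers.
index = {
    "tropical": {"page1", "page2"},
    "fish":     {"page1", "page3"},
    "tank":     {"page3"},
}

def boolean_search(index, required=(), optional=(), excluded=()):
    """required ~ "and"/"+", optional ~ "or", excluded ~ "not"/"-"."""
    results = set()
    if required:
        results = set.intersection(*(index.get(w, set()) for w in required))
    for word in optional:
        results |= index.get(word, set())
    for word in excluded:
        results -= index.get(word, set())
    return results

# Equivalent to a query like "tropical AND fish NOT tank" or "+tropical +fish -tank"
print(boolean_search(index, required=["tropical", "fish"], excluded=["tank"]))  # {'page1'}
```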
Link Popularity:
Some Web crawls are also based on link popularity - the number of websites that have a link to an individual page. Search engines that base results on link popularity include Excite, HotBot and Lycos. Even though this may seem like a good way of ranking results, it has its deficiencies: results tend to ignore web page content, newer websites and educational resources. And I don't know about you, but some of the more interesting sites are personal web pages, which usually don't have a "rating".
Link popularity is very important at Google.
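A simple sketch of the idea, counting inbound links over a made-up link graph (Google's actual ranking algorithm is considerably more involved than a raw count):

```python
# Made-up link graph: each page and the pages it links to.
links = {
    "siteA": ["siteB", "siteC"],
    "siteB": ["siteC"],
    "siteD": ["siteC", "siteB"],
}

def link_popularity(links):
    """Count how many pages link to each target page."""
    counts = {}
    for source, targets in links.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

popularity = link_popularity(links)
# siteC has three inbound links, siteB has two, so siteC would rank first
print(sorted(popularity, key=popularity.get, reverse=True))
```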
Metatags:
Metatags are not visible when viewing a web page; they are embedded in the HTML of the page itself.
Many forms of metatags exist. Title, Description, Keywords and Classification are the primary ones used by search engines. Content rating is gaining popularity and will soon become standard in search engine "politics".
Title, Description and Keywords are regularly used in determining relevance and ranking. If a description metatag is not present in the web page, the search engine will usually display the first 200 characters or so of the web page's content. You may have noticed this in your searching adventures.
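A minimal sketch of that fallback, assuming a hypothetical page record with an optional description metatag:

```python
def result_snippet(page, limit=200):
    """page is a dict with a 'body' string and an optional 'meta_description'."""
    description = page.get("meta_description")
    if description:
        return description            # use the author's own description
    return page["body"][:limit]       # otherwise fall back to the page text

with_meta = {"meta_description": "A beginner's guide to tropical fish.",
             "body": "Welcome to my page about tropical fish..."}
without_meta = {"body": "Welcome to my home page about tropical fish. " * 10}

print(result_snippet(with_meta))      # the metatag text
print(result_snippet(without_meta))   # first 200 characters of the page
```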
Of the major search engines, Excite does not support metatags. Metatags also do not boost rankings with AltaVista, Excite, FAST, Google, Magellan, Northern Lights and WebCrawler. However, metatags do boost rankings with Inktomi, which supplies AOL Search, HotBot and MSN Search.
Some web pages use the Meta Refresh tag, which instructs a web robot when to return. However, this only affects AltaVista.
Ranking:
In the "old" days of the Web results were primarily listed in alphabetical
order. Which meant a long time sorting through material to find what was relevant to your search.
Today's "search" software is more complex and varies with each search engine. For instance software used by
AltaVista, Excite
and the Go Network use a
web page's content, or relevance, to display results. HotBot's
priority lies in websites that have been frequently accessed in past Web searches. The
Go Network
and some Web directories, like About.com, use reviewed and contributed websites.
With the addition of "paid for placements" or "keyword bidding", results are placed
in order according to highest bidder.
More Information:
How Search Engines Rank Web Pages - Search Engine Watch
Results:
When you perform a "search", data is organized and presented in a relevant order
based upon "algorithmic functions" and search criteria to find "keyword" matches.
Sound complicated? It is.
Some searches will locate specific "keywords" in a web page's URL and/or "title", while others will display results based upon the number of times your "keyword" appears in the content of a web page (aka relevance). Of course, this depends on the search service being used.
Robots:
Robots are primarily programs that operate on a predetermined set of instructions and perform different functions based on the preferences of the company doing the crawling. They travel the Web gathering information such as URLs, page titles, page descriptions and metatags, and some even store a cache of each individual web page on their servers (Go Network, Google).
Some robots perform what is called a "deep crawl" as compared to a "surface crawl". This is when a robot collects information from a website it crawls, and then collects additional information from URLs that are listed or linked on that initial page. The following search engines have bots that perform a "deep crawl": AltaVista, FAST, Google, HotBot and Northern Lights. Results usually take longer with these types of crawl.