
 



   
Search Engine Library

The Wonderful World of Search Engines and Web Directories

A Search Engine Guide by Stan Daniloski

 

Stan's Search Engine Guide - Further Reading Links at the Bottom of the Page

  


  • This information was originally written in October 2001. Some of the numbers and search criteria have changed over the years, but everything else remains relevant.

  

 

  

General Information

Search engines use a robot (also called a spider or web crawler) to gather and index information, and then return results that match the keywords you search for.

Website owners can also add their URL to a search engine's database by sending in information through a submission form, but it is still the robot that retrieves and indexes the web page.

Once the data has been collected, the search engine updates its index so that the page's keywords can be matched in search results. This can take anywhere from a few days to a few months, depending on the service collecting the information.

A type of indexing called "instant indexing", used by AltaVista, InfoSeek and MSN Search, updates information more quickly: in days rather than months.

Most search engines do not index words such as "the", "a" and "an". These are called "stop words": they appear so frequently that including them would slow searches down and make results less precise, not to mention take up a lot of hard drive space on the engines' servers.
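Skipping stop words at indexing time can be sketched in a few lines. This is a minimal illustration, not any engine's real indexer: it assumes a tiny in-memory collection of pages and uses only the three stop words named above, and the names `STOP_WORDS`, `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list containing only the examples from the text.
STOP_WORDS = {"the", "a", "an"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of pages containing it, skipping stop words."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            if word not in STOP_WORDS:  # stop words never reach the index
                index[word].add(url)
    return dict(index)


pages = {
    "page1.html": "The history of the search engine",
    "page2.html": "A guide to an index",
}
index = build_index(pages)
print(sorted(index))  # ['engine', 'guide', 'history', 'index', 'of', 'search', 'to']
```

A search for "the search engine" against this index would simply ignore "the" and look up "search" and "engine".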

"Paid for Placement" is becoming popular with some search engines and portals looking to increase revenue. Advertisers bid on search "keywords", much like an auction, and the highest bidder gets the top spot. This is why you'll see a lot of links to websites selling products when you do a search. Of course, this isn't a problem if you're just looking for something to buy, but it can push other, non-commercial results further down the list.
 
More Information:
How Search Engines Work - from Search Engine Watch 
Search Engine Sizes - from Search Engine Watch 

 
 

Boolean Operations | Link Popularity | Metatags | Ranking | Results | Robots

 

Boolean Operations:
 
Many search engines support Boolean operations in their content searches. Boolean operators are words such as "and", "not", "or" and "near", along with symbols such as "+", "-", "()" and "*". They are placed between or in front of keywords to produce quicker and more specific results. Each search engine is different; those that support Boolean searching usually have a "help and usage" page explaining their syntax.
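To show what the operators do, here is a minimal sketch that evaluates "+" (must include, i.e. AND), "-" (must exclude, i.e. NOT) and plain words (any of them, i.e. OR) with set operations over a tiny keyword index. The exact syntax rules and the names `INDEX`, `ALL_PAGES` and `boolean_search` are assumptions for illustration; each real engine defined its own syntax, and "near", "()" and "*" are not handled here.

```python
# Hypothetical index mapping each keyword to the pages that contain it.
INDEX = {
    "jaguar": {"cars.html", "zoo.html"},
    "car": {"cars.html", "dealer.html"},
    "cat": {"zoo.html"},
}
ALL_PAGES = {"cars.html", "zoo.html", "dealer.html"}


def boolean_search(query: str) -> set[str]:
    """Evaluate "+word" as required (AND), "-word" as excluded (NOT)
    and plain words as alternatives (OR)."""
    required, excluded, optional = [], [], []
    for term in query.lower().split():
        if term.startswith("+"):
            required.append(term[1:])
        elif term.startswith("-"):
            excluded.append(term[1:])
        else:
            optional.append(term)

    if required:
        # AND: keep only pages containing every required word.
        # Plain words are not used as filters in this case.
        results = set(ALL_PAGES)
        for term in required:
            results &= INDEX.get(term, set())
    else:
        # OR: any page containing at least one plain word.
        results = set()
        for term in optional:
            results |= INDEX.get(term, set())

    # NOT: drop pages containing any excluded word.
    for term in excluded:
        results -= INDEX.get(term, set())
    return results


print(sorted(boolean_search("+jaguar -car")))  # ['zoo.html']
```

Searching for "jaguar car" without operators returns all three pages, while "+jaguar +car" narrows the results to `cars.html` alone.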
 
 

Link Popularity:
 
Some search engines also base their results on link popularity, that is, the number of websites that link to an individual page. Search engines that do this include Excite, HotBot and Lycos. Even though this may seem like a good way of ranking results, it has its deficiencies: results tend to overlook web page content, newer websites and educational resources. And I don't know about you, but some of the more interesting sites are personal web pages, which usually have few incoming links and so little or no "rating".
 
Link Popularity is very important at Google.
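The basic measure described above, the number of websites linking to a page, can be sketched as follows. This is an illustrative simplification: real engines, Google included, combined link data with other factors in their own unpublished ways, and the link graph and the function name `link_popularity` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def link_popularity(links: dict[str, list[str]]) -> dict[str, int]:
    """Count how many different websites link to each page.

    `links` maps a page URL to the URLs it links to. Links between pages
    of the same site are ignored, and several links from one site count once.
    """
    linking_sites: dict[str, set[str]] = defaultdict(set)
    for source, targets in links.items():
        source_site = urlparse(source).netloc
        for target in targets:
            if urlparse(target).netloc != source_site:
                linking_sites[target].add(source_site)
    return {page: len(sites) for page, sites in linking_sites.items()}


# Hypothetical link graph.
links = {
    "http://a.com/": ["http://c.com/page", "http://d.com/home"],
    "http://a.com/about": ["http://c.com/page"],
    "http://b.com/": ["http://c.com/page"],
    "http://c.com/page": ["http://c.com/other"],
}
scores = link_popularity(links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['http://c.com/page', 'http://d.com/home']
```

A brand-new page that nobody links to yet (like `http://c.com/other`, which only has an internal link) scores zero under this measure, which is exactly the weakness described above for newer websites and personal pages.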
 
 
Metatags:
 
Metatags are not visible when viewing a web page; they are embedded in the page's HTML.
 
Many forms of metatags exist. Title, Description, Keywords and Classification are the primary ones used by search engines. Content rating is gaining popularity and will soon become standard in search engine "politics".
 
Title, Description and Keywords are regularly used in determining relevance and ranking. 
 
If a description metatag is not present, the search engine will usually display the first 200 or so characters of the web page's content instead. You may have noticed this in your searching adventures.
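The description fallback can be sketched with Python's standard-library `html.parser` module. This is a minimal illustration under stated assumptions: the class `MetaTagParser`, the function `summarize` and the fixed 200-character cut-off are hypothetical choices that mirror the rule of thumb above, not any search engine's actual code.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects the <title>, named <meta> tags and visible text of a page."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self.text_parts: list[str] = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr_map.get("name"):
            self.meta[attr_map["name"].lower()] = attr_map.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text_parts.append(data)


def summarize(html: str, limit: int = 200) -> dict[str, str]:
    """Use the description metatag if present, else the first `limit`
    characters of the page's visible text."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    body_text = " ".join(" ".join(parser.text_parts).split())  # collapse whitespace
    return {
        "title": parser.title.strip(),
        "description": parser.meta.get("description") or body_text[:limit],
        "keywords": parser.meta.get("keywords", ""),
    }


with_meta = """<html><head><title>Search Guide</title>
<meta name="description" content="A guide to search engines.">
<meta name="keywords" content="search, engines, directories">
</head><body><p>Welcome to the guide.</p></body></html>"""
without_meta = "<html><head><title>No Meta</title></head><body><p>" + "word " * 100 + "</p></body></html>"

print(summarize(with_meta)["description"])         # A guide to search engines.
print(len(summarize(without_meta)["description"]))  # 200
```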
 
Of the major search engines, Excite does not support metatags.
 
Also, metatags do not boost rankings with AltaVista, Excite, FAST, Google, Magellan, Northern Light and WebCrawler.
 
However, metatags do boost rankings with Inktomi, which supplies AOL Search, HotBot and MSN Search.
 
Some web pages use Meta Refresh tags, which instruct a web robot when to return. However, these only affect AltaVista.
 
 
Ranking:
 
In the "old" days of the Web, results were primarily listed in alphabetical order, which meant spending a long time sorting through material to find what was relevant to your search.
 
Today's search software is more complex and varies with each search engine. For instance, the software used by AltaVista, Excite and the Go Network uses a web page's content, or relevance, to order results. HotBot gives priority to websites that have been frequently accessed in past web searches. The Go Network and some web directories, like About.com, use reviewed and contributed websites.
 
With "paid for placement" or "keyword bidding", results are ordered by bid, with the highest bidder first.
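Ordering by bid is simple to express. Here is a minimal sketch, assuming a simplified auction in which each advertiser makes one bid per keyword and paid listings are shown above unpaid ones. The `Listing` class, `order_results` and the sample data are hypothetical, and real programs had additional rules.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    url: str
    keyword: str
    bid: float = 0.0  # hypothetical price offered for the keyword; 0.0 = not paid


def order_results(listings: list[Listing], keyword: str) -> list[str]:
    """Paid listings for the keyword first, highest bid on top,
    followed by unpaid listings in their original order."""
    matches = [item for item in listings if item.keyword == keyword]
    # sorted() is stable, so equal bids keep their original order.
    paid = sorted((item for item in matches if item.bid > 0),
                  key=lambda item: item.bid, reverse=True)
    unpaid = [item for item in matches if item.bid <= 0]
    return [item.url for item in paid + unpaid]


listings = [
    Listing("shop-a.com", "mp3 player", 0.25),
    Listing("review-site.org", "mp3 player"),
    Listing("shop-b.com", "mp3 player", 0.40),
    Listing("travel.com", "cheap flights", 0.50),
]
print(order_results(listings, "mp3 player"))
# ['shop-b.com', 'shop-a.com', 'review-site.org']
```

The unpaid review site ends up last even if it is the most useful page for the search, which is the trade-off behind paid placement.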
 
More Information:
How Search Engines Rank Web Pages - Search Engine Watch 

  
Results:
 
When you perform a search, the engine finds pages that match your keywords and presents them in order of relevance, using its own "algorithmic functions" and search criteria. Sound complicated? It is.
 
Some searches look for your keyword in a web page's URL and/or title, while others rank results by the number of times your keyword appears in the page's content (aka relevance). Of course, this depends on the search service being used.
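The two approaches can be combined in a toy scoring function: count how often the keyword appears in the page's text, then add a bonus when it appears in the URL or title. The bonus weight of 5 is arbitrary, real engines used more elaborate and unpublished formulas, and `words`, `relevance` and the sample pages are hypothetical.

```python
import re


def words(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(keyword: str, url: str, title: str, body: str) -> int:
    """Hypothetical score: exact-word keyword count in the body, plus a fixed
    bonus for a match in the URL (substring) or in the title (whole word)."""
    word = keyword.lower()
    score = words(body).count(word)  # "guitars" does not count as "guitar"
    if word in url.lower():  # URLs often run words together, so match substrings
        score += 5
    if word in words(title):
        score += 5
    return score


pages = [
    ("http://example.com/guitars", "Guitar Shop", "We sell guitars and strings."),
    ("http://example.com/music", "Music News", "Guitar guitar guitar news every day."),
    ("http://example.com/cooking", "Recipes", "Soup and bread."),
]
ranked = sorted(pages, key=lambda page: relevance("guitar", *page), reverse=True)
print([url for url, _, _ in ranked])
# ['http://example.com/guitars', 'http://example.com/music', 'http://example.com/cooking']
```

The first page outranks the second even though its text never contains the exact word "guitar": under these made-up weights, the URL and title matches outweigh three matches in the body.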
  
  
Robots:
 
Robots are programs that operate on a predetermined set of instructions and perform different functions depending on the preferences of the company doing the crawling. They travel the Web gathering information such as URLs, page titles, page descriptions and metatags, and some even store a cached copy of each web page on their servers (Go Network, Google).
 
Some robots perform what is called a "deep crawl", as opposed to a "surface crawl". In a deep crawl, the robot collects information from the page it crawls and then also collects information from the URLs listed or linked on that initial page. The following search engines have bots that perform a deep crawl: AltaVista, FAST, Google, HotBot and Northern Light. Results from this type of crawl usually take longer to appear.
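The difference between a surface crawl and a deep crawl can be sketched as a breadth-first traversal with a depth limit. This sketch assumes a hypothetical in-memory `WEB` dictionary in place of real page fetches, so it runs offline; `crawl` and `max_depth` are illustrative names, and real robots do much more (fetching, parsing, scheduling) than is shown here.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
WEB = {
    "http://start.com/": ["http://start.com/about", "http://other.com/"],
    "http://start.com/about": ["http://start.com/team"],
    "http://other.com/": ["http://third.com/"],
    "http://start.com/team": [],
    "http://third.com/": [],
}


def crawl(start: str, max_depth: int) -> list[str]:
    """Breadth-first crawl. max_depth=0 visits only the start page (a surface
    crawl); each extra level also follows links found on pages already visited."""
    visited = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)  # a real robot would fetch and index the page here
        if depth == max_depth:
            continue  # do not follow links any deeper
        for link in WEB.get(url, []):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order


print(crawl("http://start.com/", 0))  # ['http://start.com/']
print(crawl("http://start.com/", 1))
# ['http://start.com/', 'http://start.com/about', 'http://other.com/']
```

Each extra level of depth can multiply the number of pages to visit, which helps explain why a deep crawl takes longer.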

 

 

Metasearch Engines 

"Metasearch engines are search engines that query several search engines for you, and present a mix of the hits on the metasearch engines' own result pages". -- Search Engine World
 
Seem like the best of all worlds? Not exactly. 
 
Different search engines use different "algorithms" and query syntax. Because metasearch engines do not translate your query for each individual search engine, you usually can't perform "advanced Boolean" operations through them. It may work in a few instances, but not all.
 
Also, Google and Northern Light do not allow metasearch engines to query their sites, so you'll have to use their own search pages to get their results. Google has one of the biggest search directories on the Internet. I say "directory" because Google "caches" pages.
 
Ideally a Metasearch engine would provide access to all the Major Search Engines.
 
Another factor to consider when using a metasearch engine is the number of "paid placement" listings. Dogpile and Mamma have notoriously high percentages of paid placement listings, which means the first few pages of results will be mostly paid links.
 
Vivisimo, IxQuick and Profusion have the lowest percentages of paid placement listings, with Vivisimo having none at all.
 
The rest of the major metasearch engines range between 30% and 40% paid placement links.
 
When using a metasearch engine, use the "advanced" search feature to customize your search options and save your search preferences.
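As the definition at the start of this section puts it, a metasearch engine presents "a mix of the hits" from several engines. One simple way to build that mix is to interleave each engine's ranked list and drop duplicates. This round-robin merge is an illustrative assumption, not the documented method of Dogpile, MetaCrawler or any other service; `merge_results` and the sample result lists are hypothetical.

```python
from itertools import zip_longest


def merge_results(*result_lists: list[str]) -> list[str]:
    """Round-robin merge: take every engine's first hit, then every
    engine's second hit, and so on, skipping URLs already added."""
    merged: list[str] = []
    seen: set[str] = set()
    for tier in zip_longest(*result_lists):  # shorter lists are padded with None
        for url in tier:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from three different engines.
engine_a = ["a.com", "b.com", "c.com"]
engine_b = ["b.com", "d.com"]
engine_c = ["e.com", "a.com", "f.com", "g.com"]
print(merge_results(engine_a, engine_b, engine_c))
# ['a.com', 'b.com', 'e.com', 'd.com', 'c.com', 'f.com', 'g.com']
```

The merge only works on the lists each engine returns; it does nothing to translate the query itself, which is the limitation described above for Boolean searches.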
  
 
Examples of metasearch engines and the number of search engines each one queries:

Dogpile (17)
IxQuick (12)
Mamma (8)
MetaCrawler (13)
Profusion (14)
Vivisimo (16)
qbSearch (20)
Search.com (22)

  

 

 

Hybrids

Hybrids are search services with an associated web directory, whose reviews and inclusions are determined by humans. The exact rules for being included on both "sides" of a hybrid search service are unknown, but I bet they rank up there with luck and money.
  
The following are examples of "hybrids":
  
Lycos      Go Network      About.com 

 

 

Specialty

Specialty search engines are pretty self-explanatory. Searches are primarily limited to specific categories or topics, such as MP3s, music, news, newsgroups, regional, software, price comparison, state-specific, travel, video, etc.
 
Major search engines have searchable "categories" as well as a "general" search capability. However, specialty services tend to give better-targeted results.

More Information: 
Specialty Search Engines - Listing and Critique - Search Engine Watch

 

 

Web Directories:

Directories depend on human intervention for input. Either the webmaster submits a description or an editor writes a review. Once a website is accepted and cataloged, there's virtually nothing that improves its ranking. The rules for search engines do not apply to directories.


The following are considered Web Directories:

 

 

Major Search Engines: The "Search Engine Shuffle"

All search engines are created differently and use different algorithms in their search software. This is why the same search on different search engines produces different results.

And to further complicate things, they get their information from different sources.

Inktomi supplies AOL Search, HotBot and iWon, and provides secondary results to NBCi, MSN Search and LookSmart.

Google supplies secondary results to Yahoo and Netscape Search.

Netscape is supplied primarily by Open Directory and Netscape's own database, with secondary results supplied by Google.

Yahoo primarily uses its own database, with secondary results from Google.

MSN Search is powered by LookSmart, with secondary results from Inktomi, RealNames and DirectHit.

Lycos' main listings come from the Open Directory project and results from DirectHit, with secondary results coming from FAST.

HotBot's first page of results comes from DirectHit, and secondary results from Inktomi. It gets its directory information from Open Directory.

Excite owns WebCrawler and runs them as separate companies, but they both produce the same results.

DirectHit provides main results for HotBot and is available for MSN Search.

DirectHit is owned by Ask Jeeves, for which it also provides results.

AltaVista is supplied by LookSmart.

AOL Search allows its members to use its database of "internal" links, but uses Open Directory and Inktomi for users outside the AOL circle.
  
 
The Following are Considered Major Search Engines:

 * Beware: pop-up banners from hell!!

More Information:
Major Search Engine Listing and Critique - SearchEngineWatch 

 

 

Pay for Listing and Paid Placement

A "Paid Placement" is when a website pays for search ranking based on keywords. A "Pay for Listing" is a charge made by the web service to be listed in their database.

The following is a list of "Pay for Listing" Search Engines. This is not all of them but these are the major players. However, almost all Search Services have some form of "Pay for Listing" or "Paid Placement" program.
 

 
* Paid Placement - UK and Europe 
** European "Pay for Listing" Directory 
  
  
More Information:
Pay for Placement? - Article from Search Engine Watch 
Buying Your Way into Search Engines - from Search Engine Watch

 

 

Newsletters, Lists and Discussion Forums

FantomNews - Cloaking & Search Engine Positioning  
I-Search Discussion List
Pandia Post 
ResearchBuzz - Excellent Resource  
ResearchBuzz  - News
Search Engine Forums 
Search Engine Report
Search Engine Showdown Online Newsletter 
Search Engine Watch - Home plate
SearchEngineNews - Resource 
Web Search Newsletter

 

 

Further Reading

AllSearchEngines.co.uk - An Index 
Big Search Engine Index - Only 811 of them - UK Website 
Conducting Research on the Internet  - Univ. at Albany Libraries
EntireWeb - Web Crawler 
Evaluating Quality on the Net by Hope N. Tillman
Glossary for Information Retrieval
Guide to Effective Searching on the Internet from BrightPlanet.com
Internet Search Strategies - Search Engine Showdown 
Learning More about Search Engines and Subject Directories: FAQs by CLN
Media Metrix Search Engine Ratings - Search Engine Watch 
Power Searching for Everyone  - Search Engine Watch 
Ratings, Reviews and Tests - Search Engine Watch 
Search Engine Alliances Chart - Search Engine Watch 
Search Engine and Subject Directory FAQs - Searching FAQs
Search Engine Fundamentals - Tutorial from WebNovice.com
Search Engine Information - Links 
Search Engine Reviews - Search Engine Showdown 
Search Engine Showdown - Search Engine Showdown 
Search Engine Statistics - Search Engine Showdown 
Search Engine Terms 
Search Engine Tutorials - Search Engine Watch 
Search Engine Watch - Tips about Internet Search Engines 
SearchEngineBase
SearchEngineBase Forum
SearchEngineForums -  Featuring JimTools 
SearchEngineMatrix - For the Adult Content Webmaster 
Searching the Internet - Recommended Sites and Search Techniques - Univ. at Albany Libraries
Searching The Net - Help Guide - ZDNet
Second Generation Searching on the Web  - Univ. at Albany Libraries
SpiderFood - Forum 
Surfing with a Purpose: Process and Strategy put to the Test on the Internet - Educom Review
Suggested Internet Research Strategies by Ronald W. Kriesel
The Spider's Apprentice - Helpful Guide To  Web Search Engines
Toward More Comprehensive Web Searching: Single Searching Versus Mega-searching by Greg R. Notess
Up and Coming Web Search Engine/Directory 
User's Guide to Web Surfing - Search Engine Showdown 
Web Search - About.com 
Web Search Strategies - by Debbie Flanagan
Web Searching, Sleuthing and Sifting
Web Searching Tips - Search Engine Watch 

 

 

 

 

   
 

 

Earth Station 9 Banners & Logo 2002 red.
Earthstation9.com website Copyright 1997-2003, Stan Daniloski. All Rights Reserved
Earthstation1.com website 1995-1999 James Charles Kaelin