Librarian's Ultimate Guide to Search Engines

Librarians were the ultimate search engines before the web took over. Librarians are trusted, credible sources who find and deliver information as they witness, search, organize, and catalog it. Online research and the power of the web have put information at everyone's fingertips, but the taxonomies and standards used for search will shape how people learn online for years to come. Below are some of the things librarians understand about search - and things that anyone doing online research can benefit from.

History of Search Engines

While there are many search engines, about 80 to 90 percent of the search market belongs to just a few including Google, Bing, and MSN. There are a few other engines that are relatively popular but some are white-labelled versions of the above. If you want to see a chart of approximate web traffic figures for these engines, use alexaholic.com. Alexaholic uses Alexa.com but will let you view multiple traffic charts simultaneously. These will give you a relative comparison of which is more popular.

Web 2.0 Search Engines

These are the new breed - they're the tip of the iceberg of advanced search applications for what is known as the semantic web. They add another dimension to searching. Some offer visual search using an initial image that you select or even draw. Others let you search by color or meta tags of audio files.

Most of these new engines are works in progress that need a few generations of revisions. A few are truly brilliant, and all of them are innovative. Some use meta-level concepts such as synonym matching, color or shape similarity, thematic concepts, and semantics. There's even a device that carves rivers, canyons, and valleys into foam based on search engine queries.

All of them appear to improve the search experience, but mostly for advanced users who are familiar with unusual search paradigms. If you're interested, visit some to get a sense of them. The rest of this article focuses on traditional text-based search engines.

Glossary: Search Engine + Related

Before discussing ways to refine search queries, let's have a look at a few terms either specifically related to search engines, or related to topics in this article.

  • Anchor Text: Whenever you see a hyperlink on a web page, the actual words used to specify the link are referred to as the anchor text.
  • Blog: A blog (formally known as a weblog) is a special website that has been structured with articles (blog posts) in reverse chronological order. Blog posts are also organized into page groups and monthly archives. Blogs have a structural advantage in search engines, though they often result in false search results.
  • Bot/Spider: A search engine bot or spider is a special automated web application that indexes web pages for a search engine.
  • Cache: Some engines store the full text of an indexed web page. The engine's cache is refreshed when the engine next indexes the page, so the cached copy can lag behind the live version.
  • Invisible Web: The Invisible Web consists of websites that are difficult or impossible to find, either because they are not indexed in a search engine or because they require a password.
  • Query Strings: This simply means the actual text that you enter in a search query, including letters, digits, punctuation, and any special operator characters.
  • SEM/SEO: Search engine marketing/search engine optimization
  • SERP: SERP means Search Engine Results Page - the pages that result when you do a search query.
  • Stop Words: Stop words are any words, such as "the," "and," "a," "or," that add little value in being part of a search query string. Most engines do not store these when indexing web pages.
  • Tags: Tags refer to a topic category classification, primarily for weblog sites. So if you write a blog post about food, it might have tags such as "recipe," "italian," "mushrooms," "pasta". Tags are applied by the author of a post.
  • TLD: TLD means Top Level Domain and refers to the final part of the name of a web domain. For example, http://www.msn.com/ is a URL. The TLD is the ".com" part. The "msn" part is known as the second-level domain.
  • URL: URL means Uniform Resource Locator and essentially means the web address of a specific web page.
  • Web Feeds: Web feeds are a special form of web content that organizes new content from a website or blog into the form of headlines and excerpts. Web feeds make it easy to syndicate content online, as well as to subscribe to such content for frequent browsing using a "web feed reader". (See "Bloglines" in the final section of this article.)
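
To make two of the glossary terms concrete, here is a minimal Python sketch. It splits a URL into its second-level domain and TLD, and strips stop words from a query string. The function names (domain_parts, strip_stop_words) are made up for illustration, the stop word list is just the four examples given above, and the domain split is deliberately naive: it assumes a single-label TLD such as ".com" and would mishandle suffixes like ".co.uk".

```python
from urllib.parse import urlsplit

# Illustrative stop word list: only the four examples from the glossary.
# Real engines use their own, much longer lists.
STOP_WORDS = {"the", "and", "a", "or"}


def domain_parts(url):
    """Return (second_level_domain, tld) for a URL.

    Naive assumption: the TLD is the last dot-separated label of the host.
    """
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    if len(labels) < 2:
        return ("", "")
    return (labels[-2], "." + labels[-1])


def strip_stop_words(query):
    """Drop stop words from a query string, keeping the other words in order."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    return " ".join(kept)


print(domain_parts("http://www.msn.com/"))           # ('msn', '.com')
print(strip_stop_words("the library and a catalog"))  # library catalog
```

This is only an approximation of what an engine does internally, but it shows why adding "the" or "and" to a query rarely changes the results.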

Refining Search Queries

All text-based search engines work on a query string supplied by the user. But most of the time, the results returned number in the hundreds or even millions of pages, making it difficult to find what you want. To reduce the number of results, we need to refine our search strings. To do that, we use special query operators that are derived mostly from Boolean logic, plus a few specialized operators.
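
As a rough illustration of how Boolean logic narrows a result list, the Python sketch below treats each word as the set of pages that contain it. The tiny index and its page numbers are invented for the example, and real engines differ in how they spell AND, OR, and NOT in a query.

```python
# Hypothetical mini-index: each word maps to the set of page IDs containing it.
index = {
    "library": {1, 2, 3, 4, 5},
    "catalog": {2, 3, 6},
    "fiction": {3, 5, 7},
}


def pages(word):
    """Return the set of page IDs that contain the word (empty if unknown)."""
    return index.get(word, set())


both = pages("library") & pages("catalog")     # AND: pages with both words
either = pages("library") | pages("catalog")   # OR: pages with either word
without = pages("library") - pages("fiction")  # NOT: library pages lacking "fiction"

print(sorted(both))     # [2, 3]
print(sorted(either))   # [1, 2, 3, 4, 5, 6]
print(sorted(without))  # [1, 2, 4]
```

AND and NOT shrink the result set, while OR widens it, which is why refining a query usually means adding required or excluded terms.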

All search engines use a fairly common set of advanced query operators (AQOs). However, not all engines process AQOs the same way. So if you do use advanced operators, you will want to play around with them in your favorite search engine to learn how they're handled. The operator descriptions below are generalized - not all engines will support them in exactly the way described.

Boolean Site Operators
These are powerful operators that most engines have, but which are not always well known. While there is a common set of operators, a few engines have their own variations. Here is an amalgamated list. A few references are included after this section, if you are interested in finding out more. All of them consist of a predefined keyword and a colon, ":", character, which are then followed by a word or URL or domain name, etc. There should be no spaces on either side of the colon.

  • allinanchor:, inanchor: Use allinanchor to specify one or more words that must all be in anchor text. (See definition of anchor text in Glossary above.) Use inanchor to specify one word in anchor text and one or more words in the rest of the document body. Example: allinanchor:librarian
  • allintitle:, intitle: Use allintitle to specify one or more words that must all be in the title of a web page. Use intitle: to check for a single word in the title, and one or more words in the document body. Example: allintitle:librarians
  • allinurl:, inurl: Use allinurl to specify one or more words to be checked in the URL of a web page. Use inurl: to check one word in the URL and one or more words in the document body. Example: allinurl:librarians
  • define: Returns definitions of a specific word, from various sources. Example: define:librarian
  • domain:, site: Use with a domain name to limit searches to pages on that site. Example: site:stanford.edu
  • filetype: Use with a media file type (e.g., PDF) to limit SERPs to that type of document. Example: library filetype:xls
  • info: Provides engine-specific info about a particular URL or its parent site. Example: info:becomealibrarian.org
  • related: Engines determine topic similarity of web pages on different sites. This operator, when used with a URL, will return pages from other sites that are similar. Example: related:lii.org
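
The keyword-colon-value pattern described above is easy to assemble programmatically. The following Python sketch is one possible way to do it: build_query is a hypothetical helper, not part of any search engine's API, and whether a given engine honors each operator still has to be checked by hand.

```python
from urllib.parse import quote_plus


def build_query(keywords="", **operators):
    """Join plain keywords with operator:value pairs.

    No spaces are placed on either side of the colon, as the operators require.
    """
    parts = [keywords.strip()] if keywords.strip() else []
    for name, value in operators.items():
        parts.append(f"{name}:{value.strip()}")
    return " ".join(parts)


query = build_query("library", filetype="xls", site="stanford.edu")
print(query)              # library filetype:xls site:stanford.edu
print(quote_plus(query))  # library+filetype%3Axls+site%3Astanford.edu
```

The second line of output is the same query encoded for use in a URL's query parameter; the parameter name itself varies by engine and is not shown here.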

Additional References
Here are a few links to pages about advanced queries.
