Search Engine Robots - How They Work, What They Do (part

By Author: Daria Goetsch

Automated search engine robots, sometimes called "spiders" or "crawlers", are
the seekers of web pages. How do they work? What is it they really do? Why are
they important?

With all the fuss about indexing web pages for search engine databases, you'd
think that robots would be great and powerful beings. Wrong. Search engine
robots have only basic functionality like that of early browsers in terms of
what they can understand in a web page. Like early browsers, robots just can't
do certain things. Robots don't understand frames, Flash movies, images or
JavaScript. They can't enter password protected areas and they can't click all
those buttons you have on your website. They can be stopped cold while indexing
a dynamically generated URL and slowed to a stop with JavaScript navigation.

How Do Search Engine Robots Work?
Think of search engine robots as automated data retrieval programs, traveling
the web to find information and links.

When you submit a web page to a search engine at the "Submit a URL" page, the
new URL is added to the robot's queue of websites to visit on its next foray out
onto the web. Even if you don't directly submit a page, many robots will find
your site because of links from other sites that point back to yours. This is
one of the reasons why it is important to build your link popularity and to get
links from other topical sites back to yours.

When arriving at your website, the automated robots first check to see if you
have a robots.txt file. This file is used to tell robots which areas of your
site are off-limits to them. Typically these may be directories containing only
binaries or other files the robot doesn't need to concern itself with.
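
To see how a well-behaved robot applies these rules, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content, the example.com URLs and the "ExampleBot" user agent name are made up for illustration, and the file is parsed from a string rather than downloaded so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot ("*") is asked to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/downloads/setup.exe"):
        print(url, "allowed" if may_fetch(rules, "ExampleBot", url) else "off-limits")
```

Against a live site, the same module can download the file itself with set_url() followed by read().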

Robots collect links from each page they visit, and later follow those links
through to other pages. In this way, they essentially follow the links from one
page to another. The entire World Wide Web is made up of links, the original
idea being that you could follow links from one place to another. This is how
robots get around.
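
The link-following loop can be sketched as a simple breadth-first crawl. This is a toy model, not any search engine's actual robot: the in-memory PAGES dictionary stands in for the web, fetch is any function that returns a page's HTML (or None if the page can't be reached), and the max_pages limit is an assumption added to keep the crawl bounded. The queue plays the same role as the robot's queue of submitted URLs described earlier.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    # Turn relative links such as "/about.html" into full URLs.
    return [urljoin(page_url, href) for href in collector.hrefs]


def crawl(start_urls, fetch, max_pages=100):
    """Breadth-first crawl: visit queued URLs in order, queueing each new link found.

    fetch(url) returns the page's HTML, or None if the page cannot be retrieved.
    """
    queue = deque(start_urls)
    seen = set(start_urls)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: skip it
        visited.append(url)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# A tiny hypothetical web, held in memory so the sketch runs offline.
PAGES = {
    "http://a.example/": '<a href="/about.html">About</a> <a href="http://b.example/">Site B</a>',
    "http://a.example/about.html": '<a href="/">Home</a>',
    "http://b.example/": '<a href="http://a.example/about.html">Site A</a>',
}

if __name__ == "__main__":
    print(crawl(["http://a.example/"], PAGES.get))
```

The seen set is what keeps the robot from looping forever on pages that link back to each other. A real robot would also check robots.txt before each fetch.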

The "smarts" about indexing pages online comes from the search engine engineers,
who devise the methods used to evaluate the information the search engine robots
retrieve. When introduced into the search engine database, the information is
available for searchers querying the search engine. When a search engine user
enters their query into the search engine, there are a number of quick
calculations done to make sure that the search engine presents just the right
set of results to give their visitor the most relevant response to their query.

You can see which pages on your site the search engine robots have visited by
looking at your server logs or the results from your log statistics program.
Identifying the robots will show you when they visited your website, which pages
they visited and how often they visit. Some robots are readily identifiable by
their user agent names, like Google's "Googlebot"; others are a bit more obscure,
like Inktomi's "Slurp". Your logs may also list robots you cannot readily
identify; some of them may even appear to be human-powered browsers.
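
Spotting robots by name usually comes down to looking for a known token in the user agent string. The sketch below assumes a small hand-maintained lookup table; only "Googlebot" and "Slurp" come from this article, and the sample user agent strings are illustrative. Keep in mind that the user agent is reported by the visitor itself, so anyone can claim to be Googlebot.

```python
from typing import Optional

# Hypothetical lookup table: lower-cased user agent token -> search engine.
# "Googlebot" and "Slurp" are the examples named in the article.
KNOWN_ROBOTS = {
    "googlebot": "Google",
    "slurp": "Inktomi",
}


def identify_robot(user_agent: str) -> Optional[str]:
    """Return the search engine behind a user agent string, or None if unknown."""
    agent = user_agent.lower()
    for token, engine in KNOWN_ROBOTS.items():
        if token in agent:
            return engine
    return None


if __name__ == "__main__":
    samples = [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "Mozilla/3.0 (Slurp/si; slurp@inktomi.com; http://www.inktomi.com/slurp.html)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
    ]
    for ua in samples:
        print(identify_robot(ua) or "not a known robot", "<-", ua)
```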

Along with identifying individual robots and counting the number of their
visits, the statistics can also show you aggressive bandwidth-grabbing robots or
robots you may not want visiting your website. In the Resources section at the
end of this article, you will find sites that list the names and IP addresses of
search engine robots to help you identify them.
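
As a rough sketch of how a statistics program arrives at those numbers, the code below totals hits and bytes sent per user agent. It assumes your server writes the Apache/NCSA "combined" log format; other formats need a different pattern. The sample log lines, including the "HypotheticalGrabber" robot, are invented.

```python
import re
from collections import Counter

# Apache/NCSA "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count hits and bytes sent per user agent; lines in other formats are skipped."""
    hits, bytes_sent = Counter(), Counter()
    for line in lines:
        match = COMBINED_LOG.match(line)
        if match is None:
            continue
        agent = match.group("agent")
        hits[agent] += 1
        size = match.group("size")
        # "-" means no body was sent (for example, some error responses).
        bytes_sent[agent] += 0 if size == "-" else int(size)
    return hits, bytes_sent


SAMPLE_LOG = [
    '66.249.66.1 - - [10/Oct/2003:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/Oct/2003:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1500 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '10.0.0.5 - - [10/Oct/2003:14:00:00 -0700] "GET /big.zip HTTP/1.0" 200 900000 "-" "HypotheticalGrabber/1.0"',
    '10.0.0.5 - - [10/Oct/2003:14:00:05 -0700] "GET /missing HTTP/1.0" 404 - "-" "HypotheticalGrabber/1.0"',
]

if __name__ == "__main__":
    hits, bytes_sent = summarize(SAMPLE_LOG)
    # Biggest bandwidth users first.
    for agent, total in bytes_sent.most_common():
        print(f"{total:>10} bytes  {hits[agent]:>3} hits  {agent}")
```

Sorting by bytes rather than hits is what makes a bandwidth-grabbing robot stand out: here it has the same number of hits as Googlebot but far more data.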

How Do They Read The Pages On Your Website?
When the search engine robot visits your page, it looks at the visible text on
the page, the content of the various tags in your page's source code (title tag,
meta tags, etc.), and the hyperlinks on your page. From the words and the links
that the robot finds, the search engine decides what your page is about. There
are many factors used to figure out what "matters", and each search engine has
its own algorithm for evaluating and processing the information. Depending on
how the robot is set up through the search engine, the information is indexed
and then delivered to the search engine's database.
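
To make this concrete, here is a minimal sketch of a page reader that pulls out the parts listed above: the title tag, named meta tags, visible text and hyperlinks. It uses Python's standard html.parser module and, like the basic robots described earlier, skips whatever is inside script or style tags instead of running it. Real search engine robots are far more elaborate, and the sample page is made up.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Collects the title, named meta tags, visible text and links of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # script and style contents are not visible text
        if self._in_title:
            self.title += data
        elif data.strip():
            self.text.append(data.strip())


def read_page(html):
    reader = PageReader()
    reader.feed(html)
    reader.close()
    return reader


SAMPLE_PAGE = """<html><head><title>Acme Widgets</title>
<meta name="description" content="Hand-made widgets">
<meta name="keywords" content="widgets, gadgets">
<script>document.write("Menu built by JavaScript");</script>
</head><body><h1>Widgets</h1><p>We sell widgets.</p>
<a href="/catalog.html">Catalog</a></body></html>"""

if __name__ == "__main__":
    page = read_page(SAMPLE_PAGE)
    print("title:", page.title)
    print("meta:", page.meta)
    print("links:", page.links)
    print("text:", " ".join(page.text))
```

Note that the menu written by JavaScript never shows up in the collected text, which is exactly the limitation described at the start of this article.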

The information delivered to the databases then becomes part of the search
engine and directory ranking process. When the search engine visitor submits
their query, the search engine digs through its database to give the final
listing that is displayed on the results page.
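
A toy inverted index shows the basic idea behind digging through the database: each word points to the pages that contain it, so a query only has to look up its own words. The scoring below (counting how often the query words appear on each page) is purely an illustrative assumption; as noted above, each search engine uses its own, much more sophisticated algorithm. The sample pages are invented.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Toy inverted index: word -> {url: number of occurrences on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, query):
    """Rank pages by how often the query words occur (illustrative scoring only)."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically so results are repeatable.
    return sorted(scores, key=lambda url: (-scores[url], url))


SAMPLE_PAGES = {
    "http://a.example/widgets.html": "Widgets and more widgets for sale",
    "http://b.example/gadgets.html": "Gadgets, plus one widget catalog",
    "http://c.example/about.html": "About our company",
}

if __name__ == "__main__":
    index = build_index(SAMPLE_PAGES)
    print(search(index, "company widgets"))
```

This toy treats "widget" and "widgets" as unrelated words; it only illustrates the lookup step, not real relevance ranking.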

The search engine databases update at varying times. Once you are in the search
engine databases, the robots keep visiting you periodically, to pick up any
changes to your pages, and to make sure they have the latest info. The number of
times you are visited depends on how the search engine sets up its visits, which
can vary per search engine.
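
One simple way a robot could organize its return visits is a priority queue ordered by each page's next visit time. In this sketch the per-URL revisit interval is a made-up parameter; how often real search engines come back is up to each engine.

```python
import heapq


class RevisitScheduler:
    """Toy revisit schedule: each URL is revisited every `interval` seconds."""

    def __init__(self):
        self._queue = []      # heap of (next_visit_time, url)
        self._interval = {}   # url -> seconds between visits (hypothetical values)

    def add(self, url, last_visit, interval):
        if interval <= 0:
            raise ValueError("interval must be positive")
        self._interval[url] = interval
        heapq.heappush(self._queue, (last_visit + interval, url))

    def due(self, now):
        """Return the URLs whose next visit time has arrived, and book their next visit."""
        ready = []
        while self._queue and self._queue[0][0] <= now:
            _, url = heapq.heappop(self._queue)
            ready.append(url)
            # Schedule from `now` so a long gap does not trigger a burst of catch-up visits.
            heapq.heappush(self._queue, (now + self._interval[url], url))
        return ready


if __name__ == "__main__":
    schedule = RevisitScheduler()
    schedule.add("http://a.example/", last_visit=0, interval=10)
    schedule.add("http://b.example/", last_visit=0, interval=30)
    for t in (5, 10, 25, 40):
        print(t, schedule.due(t))
```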

Sometimes robots are unable to access the website they are trying to visit. If
your site is down, or you are experiencing huge amounts of traffic, the robot
may not be able to access your site. When this happens, the website may not be
re-indexed, depending on the frequency of the robot visits to your website. In
most cases, robots that cannot access your pages will try again later, hoping
that your site will be accessible then.
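
"Try again later" can be sketched as a retry loop that waits longer after each failure. The number of attempts, the starting delay and the doubling of the delay (exponential backoff) are assumptions chosen for illustration, not any particular search engine's policy; the sleep parameter exists only so the waits can be replaced during testing.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=60.0, sleep=time.sleep):
    """Try fetch(url); on a network error, wait and retry, doubling the wait each time.

    Returns the page, or None if every attempt failed (the robot gives up until
    its next scheduled visit).
    """
    delay = base_delay
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:  # covers connection errors and timeouts
            if attempt == attempts - 1:
                return None
            sleep(delay)
            delay *= 2
    return None
```

With the defaults, a site that stays down is tried three times, with waits of 60 and then 120 seconds between attempts.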

Resources

SpiderSpotting - Search Engine Watch

http://searchenginewatch.com/webmasters/spiders.html

Robotstxt.org

List of robots and protocols for setting up a robots.txt file.
http://www.robotstxt.org/

Spider-Food

Tutorials, forums and articles about Search Engine spiders and Search Engine
Marketing.
http://spider-food.net/

Spiderhunter.com

Articles and resources about tracking Search Engine spiders.
http://www.spiderhunter.com/

Sim Spider Search Engine Robot Simulator

Search Engine World has a spider that simulates what the Search Engine robots
read from your website.
http://www.searchengineworld.com/cgi-bin/sim_spider.cgi

ABOUT THE AUTHOR
Daria Goetsch is the founder and Search Engine Marketing Consultant for Search Innovation Marketing (www.searchinnovation.com), a Search Engine Promotion company serving small businesses. She has specialized in search engine optimization since 1998, including three years as the Search Engine Specialist for O'Reilly & Associates, a technical book publishing company.
