The importance of a robots.txt file

If you have a website, you should keep a close eye on your statistics file to see which bots, both good and bad, are crawling your site. As a general rule you want the good bots to crawl your site; they’re reading the pages you’ve created in order to add them to their search indexes. Good bots include Google, InfoSeek, Excite, Fast/All The Web, Alta Vista, Lycos, Inktomi, WiseNut, Ask Jeeves/Teoma, Northern Light, Alexa and Gigablast. The list of bad bots is very long, so I will not attempt to list them in this post.

So how do you block the bad bots and allow the good bots access to the pages on your website? With a robots.txt file, a plain text file placed in the root directory of your site.

A robots.txt file will also keep even the good bots from crawling pages that are private or that you do not want indexed. For example, you may not want the files in your images folder to be indexed, so block them. I always add the following to my robots.txt file:

User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /*.gif$
Disallow: /*.jpg$
Disallow: /*.zip$
Disallow: /*.doc$
Disallow: /*.exe$
Disallow: /*.pdf$

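The User-agent: * line at the top of this list means the rules that follow apply to ALL robots, and each Disallow line beneath it tells them “this file or directory is off limits.” The entries that use * and $ match any path ending in the given extension. The major search engines honor these wildcards, but not every robot understands them.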
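To make the matching rules concrete, here is a minimal Python sketch of how a crawler that supports the * and $ wildcards might test a URL path against the Disallow lines above. The function names pattern_to_regex and is_blocked are my own, made up for illustration, and the sketch assumes * means "any run of characters" and a trailing $ means "the path must end here." It is not the code any particular search engine uses.

```python
import re

# The Disallow patterns from the robots.txt listing above.
DISALLOW = [
    "/images/", "/cgi-bin/", "/temp/",
    "/*.gif$", "/*.jpg$", "/*.zip$",
    "/*.doc$", "/*.exe$", "/*.pdf$",
]

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then rejoin them with '.*' for each '*'.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_blocked(path: str) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(pattern_to_regex(p).match(path) for p in DISALLOW)

if __name__ == "__main__":
    for path in ["/images/logo.png", "/downloads/report.pdf", "/about.html"]:
        print(path, "blocked" if is_blocked(path) else "allowed")
```

Plain directory entries such as /images/ behave as prefixes, so everything underneath the folder is covered, while an entry such as /*.pdf$ only matches paths that actually end in .pdf.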

If you want to block a specific bot from crawling your site, add these lines to your robots.txt file:

User-agent: [Name of Bot]
Disallow: /

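This tells the named robot “STOP! You are not allowed on this site.” Does it really stop them? Not always. A robots.txt file is a request, not an enforcement mechanism, so a bot that chooses to ignore it will crawl your site anyway. But it will stop many of them.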
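If you want to check what a well-behaved robot would conclude from rules like these, Python's standard library includes a robots.txt parser. The sketch below is only an illustration: "BadBot" and "Googlebot" are stand-in names, the rules string is a made-up sample, and the helper allowed is my own wrapper. Note that this parser handles plain path prefixes and does not interpret the * and $ wildcards inside paths.

```python
from urllib.robotparser import RobotFileParser

# Sample rules: shut out one named bot entirely (hypothetical "BadBot"),
# and keep every other robot out of /cgi-bin/ only.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def allowed(agent: str, path: str) -> bool:
    """Ask the parser whether a compliant robot named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)

if __name__ == "__main__":
    for agent, path in [
        ("BadBot", "/index.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot", "/cgi-bin/form.pl"),
    ]:
        print(agent, path, "allowed" if allowed(agent, path) else "disallowed")
```

This is the check a polite robot performs before requesting a page. A bot that never runs it is not slowed down by the file at all.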

Go here to download a good robots.txt file.