May 2006 - What is the Robots Exclusion Standard?
by Ekta Verma
The robots exclusion standard, or robots.txt protocol, is a convention
for asking cooperating web spiders and other web robots not to access
all or part of a website. The parts that should not be accessed are
listed in a file called robots.txt in the top-level directory of the
website. You can allow all robots to visit all files: the wildcard "*"
matches all robots, and an empty Disallow line excludes nothing.
For example:
User-agent: *
Disallow:
You can also keep all robots out:
User-agent: *
Disallow: /
You can also tell a specific crawler not to enter one specific directory:
User-agent: BadBot
Disallow: /private/
But you should not use the line below, because a "*" wildcard in a
Disallow line is a non-standard extension that not all robots support:
Disallow: *
Instead, use:
Disallow: /
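The effect of the rule sets above can be checked with a short script. This is a minimal sketch using Python's standard urllib.robotparser module; the helper name allowed, the robot name GoodBot, and the example.com URLs are assumptions made for illustration. It shows how this one parser reads the rules, and other robots may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The three rule sets from the article, one directive per line.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_BADBOT_PRIVATE = "User-agent: BadBot\nDisallow: /private/\n"


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    Hypothetical helper: it parses the text directly instead of
    downloading a robots.txt file from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/private/report.html"  # assumed example URL
    print(allowed(ALLOW_ALL, "GoodBot", page))
    print(allowed(BLOCK_ALL, "GoodBot", page))
    print(allowed(BLOCK_BADBOT_PRIVATE, "BadBot", page))
    print(allowed(BLOCK_BADBOT_PRIVATE, "GoodBot", page))
```

Because the BadBot rule set has no "User-agent: *" group, robots other than BadBot are not restricted by it.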
HTML meta tags for robots:
HTML meta tags can be used to exclude robots on a per-page basis,
from within the content of the web page itself.
<meta name="robots" content="noindex,nofollow" />
Placing the code above within the head section of an HTML document
tells search engines such as Google, Yahoo!, or MSN to exclude the
page from their indexes and not to follow any links on the page for
further indexing.
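To see how a robot might read this tag, the sketch below extracts the robots directives from a page using Python's standard html.parser module. The class name RobotsMetaParser and the function robots_directives are hypothetical names chosen for illustration; real search engines use their own parsers and may recognise additional directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
    found = robots_directives(page)
    print("index this page:", "noindex" not in found)
    print("follow its links:", "nofollow" not in found)
```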
You can also use a robots.txt generator to make your robots.txt
file. Open a text editor, such as Windows Notepad, copy and paste the
text from the generator's text box into it, and save the file as
robots.txt. Upload the file to your root directory (the same directory
that your index.htm/html file is in).
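If you would rather not depend on an online generator, the file is simple enough to produce with a few lines of script. The following is a minimal sketch, not a complete generator: the function name build_robots_txt and its input format (a list of robot names paired with the paths to disallow) are assumptions made for this example. Uploading the saved file to your server is still a separate step.

```python
from pathlib import Path


def build_robots_txt(rules):
    """Build robots.txt text from (user_agent, disallowed_paths) pairs.

    An empty list of paths produces an empty "Disallow:" line,
    which lets that robot visit everything.
    """
    blocks = []
    for agent, paths in rules:
        lines = ["User-agent: " + agent]
        for path in (paths or [""]):
            # rstrip() avoids a trailing space on an empty Disallow line.
            lines.append(("Disallow: " + path).rstrip())
        blocks.append("\n".join(lines))
    # Groups are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    text = build_robots_txt([
        ("BadBot", ["/private/"]),  # keep BadBot out of one directory
        ("*", []),                  # every other robot may visit all files
    ])
    # Save in the current directory; upload it to the site root afterwards.
    Path("robots.txt").write_text(text, encoding="ascii")
    print(text)
```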
References: Wikipedia, The Free Encyclopedia, and the Internet.
Note: These articles do not represent the advice or opinions of
Apollo Hosting. They represent the thoughts, advice and opinions of
the individual authors.