February 22, 2012

What is a robots.txt file


If you own a blog or website, there may be files or folders that you don't want to be available to users or search engines. Search engine robots (also called crawlers or spiders) crawl your site to index its web pages so that they can answer user queries. A robot may therefore crawl a directory that the site owner doesn't want crawled. So how do you tell robots to stay out of those directories and files? That is the question. To keep robots and crawlers away there is the Robots Exclusion Standard, implemented as a file named robots.txt, which tells robots which paths they should not access. Search engines generally obey robots.txt. It is the protocol that well-behaved search engine robots follow.
A robots.txt file generally looks like this (an empty Disallow line means nothing is blocked)...

User-agent: *
Disallow:

For example: If you want to block every robot from certain directories then the code for that would be...
User-agent: *        # For Every User Agent
Disallow: /cgi-bin/  # Block Access to cgi-bin directory
Disallow: /images/   # Block Access to images directory
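
To see how a robot would read these rules, here is a minimal sketch that uses Python's standard urllib.robotparser module as a stand-in for a crawler. The agent name ExampleBot and the URLs are made up for illustration, and real search engine robots may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the example above.
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
"""

def is_allowed(url, agent="ExampleBot"):
    """Return True if the rules let the (hypothetical) agent fetch url."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())  # parse rules from text, no network access
    return parser.can_fetch(agent, url)

print(is_allowed("http://domain-name.com/cgi-bin/script.cgi"))  # False
print(is_allowed("http://domain-name.com/index.html"))          # True
```

Any path that does not start with a disallowed prefix stays open to the robot.
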
For example: If you want to tell robots the location of your website's sitemap then...

User-agent: *        # For Every User Agent
Sitemap: http://domain-name.com/sitemap.xml  # Address of the sitemap file after the Sitemap parameter
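
A robot can pick the sitemap address out of the file. The sketch below assumes Python 3.8 or newer, where urllib.robotparser offers site_maps(). The helper name list_sitemaps is hypothetical.

```python
from urllib.robotparser import RobotFileParser

SITEMAP_RULES = """\
User-agent: *
Sitemap: http://domain-name.com/sitemap.xml
"""

def list_sitemaps(robots_text):
    """Return the sitemap URLs declared in a robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []

print(list_sitemaps(SITEMAP_RULES))
```
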
For example: If one file inside a blocked directory should remain accessible then...
User-agent: * # For Every User Agent
Allow: /images/img.jpg  # Allow access to this one file
Disallow: /images/   # Block access to the rest of the images directory
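
The effect of combining Allow and Disallow can be checked the same way. This is only a sketch using Python's standard urllib.robotparser; the agent name ExampleBot and the file other.jpg are assumptions for illustration. Listing the Allow line before the Disallow line, as above, keeps the result the same for parsers that apply the first matching rule.

```python
from urllib.robotparser import RobotFileParser

ALLOW_RULES = """\
User-agent: *
Allow: /images/img.jpg
Disallow: /images/
"""

def may_fetch(url, agent="ExampleBot"):
    """Return True if ALLOW_RULES let the (hypothetical) agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ALLOW_RULES.splitlines())
    return parser.can_fetch(agent, url)

print(may_fetch("http://domain-name.com/images/img.jpg"))    # True
print(may_fetch("http://domain-name.com/images/other.jpg"))  # False
```
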
These are some example uses of robots.txt files. The robots.txt file should be in the root directory of the site, say, mydomain.com/robots.txt. This is because a robot checks robots.txt first, before crawling the website, and follows what the file asks it to do.
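
Because the file always sits at the root, a robot can work out its address from any page on the site. A minimal sketch of that step, where the function name robots_txt_url and the sample page URL are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    """Build the root-level robots.txt address for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://mydomain.com/blog/2012/post.html?page=2"))
# http://mydomain.com/robots.txt
```
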
But having a robots.txt file on your site does not mean that user agents cannot access the disallowed content. It is only a request to user agents not to access it, and bad robots can ignore the file and access the content anyway. These types of robots are generally used by hackers or, to be more precise, by crackers.
