Which bots and spiders should I block in robots.txt?
In order to:
- Increase security of my website
- Reduce bandwidth requirements
- Prevent email address harvesting
Solution 1:
No bot that harvests emails or probes your site for vulnerabilities will respect your robots.txt. In fact, these malicious bots read robots.txt to map your site better. Any Disallow: entry you add tells them where to look and can be used to attack your site. An attacker who examines your site manually will likewise spend extra time on any files or directories you are trying to disallow.
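To illustrate the point, here is a minimal Python sketch of how little effort it takes to turn a robots.txt file into a list of paths worth inspecting. The function name `disallowed_paths` and the sample paths are hypothetical, chosen only for this example.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow path listed in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, for illustration only.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # old database dumps
Allow: /
"""

print(disallowed_paths(SAMPLE))
```

Anything that appears in that list is public knowledge, so it should never be the only thing protecting sensitive content.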
Solution 2:
robots.txt will not increase the security of your website or prevent e-mail address harvesting. It is a guide that tells search engines which sections of your website to skip. Well-behaved search engines will not crawl those sections, so use it for anything you do not want to show up in public search engines.
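As a rough sketch of what "a guide" means in practice, this is how a cooperative crawler might consult the rules using Python's standard `urllib.robotparser` module. The rules, the `ExampleBot` agent name, and the example.com URLs are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def polite_can_fetch(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the parsed rules permit this agent to fetch the URL."""
    return parser.can_fetch(agent, url)


print(polite_can_fetch("https://example.com/index.html"))
print(polite_can_fetch("https://example.com/private/report.html"))
```

The check happens entirely on the crawler's side. A bot that never calls anything like `polite_can_fetch` simply requests the URL, and the server answers as usual.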
However, robots.txt in no way prevents other bots from downloading your entire site, so it neither increases security nor prevents e-mail harvesting. To increase security, add authentication and allow only authenticated users into the secured sections. To prevent e-mail address harvesting, do not put e-mail addresses in plain text (or easily decipherable text) on the website.
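The authentication advice could be sketched roughly as follows, assuming HTTP Basic authentication over HTTPS. The protected prefix, user name, password, and the `is_allowed` helper are all hypothetical. A real site would normally rely on its web server's or framework's authentication support and store hashed credentials rather than constants.

```python
import base64
import hmac

# Hypothetical values for illustration only.
PROTECTED_PREFIX = "/private/"
EXPECTED_USER = "alice"
EXPECTED_PASSWORD = "correct horse"


def is_allowed(path: str, authorization_header=None) -> bool:
    """Allow public paths; require valid Basic credentials for protected ones."""
    if not path.startswith(PROTECTED_PREFIX):
        return True
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        token = authorization_header[len("Basic "):]
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers malformed base64 and undecodable bytes.
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    pass_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and pass_ok
```

Unlike a Disallow: line, this check runs on the server, so a bot that ignores robots.txt still cannot read the protected section without credentials.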