How do I detect bots programmatically?
There is no sure-fire way to catch all bots: a determined bot can be written to behave exactly like a real browser.
Most serious bots identify themselves clearly in the user-agent string, so with a list of known bots you can filter out most of them. You can also add to that list the default agent strings that some HTTP libraries use, to catch bots from people who don't even know how to change the agent string. If you simply log the agent strings of your visitors, you should be able to pick out the ones worth adding to the list.
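A minimal sketch of that check in C#; the tokens below are only examples, and you would build the real list from your own logs:

```csharp
using System.Linq;

static class BotDetector
{
    // Example tokens only: known crawler names plus the default agent
    // strings of some common HTTP libraries.
    static readonly string[] BotTokens =
    {
        "googlebot", "bingbot", "slurp",
        "curl", "wget", "python-requests", "java/"
    };

    public static bool LooksLikeBot(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent))
            return true; // normal browsers always send an agent string

        string ua = userAgent.ToLowerInvariant();
        return BotTokens.Any(token => ua.Contains(token));
    }
}
```

In ASP.NET you would pass it Request.UserAgent and then decide whether to log, throttle, or block the request.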
You can also set up a "bad bot trap" by putting a hidden link on your page that leads to a URL disallowed in your robots.txt file. Well-behaved bots won't follow the link, and humans can't click it, so only bots that ignore the rules will ever request that page.
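A rough sketch of the trap as an ASP.NET handler; the /trap path, the BadBotList store, and the handler wiring (normally done in web.config) are all illustrative assumptions:

```csharp
// robots.txt:
//   User-agent: *
//   Disallow: /trap
//
// Hidden link somewhere in a page, invisible to humans:
//   <a href="/trap" style="display:none" rel="nofollow"></a>

using System.Collections.Concurrent;
using System.Web;

// Hypothetical in-memory store of flagged addresses; in practice you
// would persist this and consult it on every incoming request.
public static class BadBotList
{
    static readonly ConcurrentDictionary<string, bool> Flagged =
        new ConcurrentDictionary<string, bool>();

    public static void Add(string ip) => Flagged[ip] = true;
    public static bool Contains(string ip) => Flagged.ContainsKey(ip);
}

public class BadBotTrap : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        // Only a bot that ignored robots.txt ever reaches this handler,
        // so flag its address and refuse the request.
        BadBotList.Add(context.Request.UserHostAddress);
        context.Response.StatusCode = 403;
    }

    public bool IsReusable => true;
}
```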
Depending on the type of bot you want to detect:
- Detecting honest web crawlers: you can use Request.Browser.Crawler to detect them programmatically (a sketch follows below); preferably keep your list of recognized crawlers up to date, as described at http://www.primaryobjects.com/cms/article102.aspx
- Detecting stealth web crawlers: these don't identify themselves, so you're left with the techniques above, such as agent-string logging and the bad-bot trap.
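A minimal sketch of the Request.Browser.Crawler check, assuming a Web Forms code-behind; the response header is just a placeholder so the detection shows up somewhere visible:

```csharp
using System;
using System.Web.UI;

public partial class Default : Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // Request.Browser.Crawler is driven by ASP.NET's browser
        // definition (.browser) files; keeping those files current is
        // what the linked article describes.
        if (Request.Browser.Crawler)
        {
            // Placeholder action: tag the response so crawler hits
            // stand out in your logs.
            Response.AppendHeader("X-Detected-Crawler", "true");
        }
    }
}
```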