Block Bots with IIS 7.5 and 8.0
Normally you use robots.txt. It will work on all well-behaved bots.
For bots that are not well behaved there is often little you can do. You can limit connection counts or bandwidth in your firewall or web server, but major bots will typically use multiple IP addresses. Limiting based on user-agent strings is usually not a good idea, as those are trivial for the bot to spoof, and bots that do not care about robots.txt tend to spoof user-agent strings as well. It only works in the specific case where the bot sends a correct user agent but does not obey robots.txt.
Edit: If you really want to block based on user agent instead of pushing it back to your firewall or similar, I think the easiest way is to use URLScan. You write a rule that looks something like this:
[Options]
RuleList=DenyYandex
[DenyYandex]
DenyDataSection=Agents
ScanHeaders=User-Agent
[Agents]
Yandex
I know this is an old question, but in IIS 7.5 you can deny by user agent if you use Request Filtering.
In IIS, go to the website you wish to apply the filter to, and then in the right pane, click the Request Filtering icon (you may have to enable this feature through Server Manager).
Click the Rules tab, and then in the list along the far right, select "Add Filtering Rule".
Give it a name, and then in the Scan Headers section, put "User-Agent".
You can add any specific file type(s) to block in Applies To, or you can leave it blank to make it apply to all file types.
In Deny Strings, enter all of the user agent strings you want to block. In the case of this question, you would put "Yandex" here.
I confirmed these changes in Chrome using the User Agent Switcher extension.
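The GUI steps above should correspond to a configuration fragment roughly like the one below. This is a minimal sketch, assuming the rule is stored in the site's web.config and that the requestFiltering section is not locked at the server level; the rule name "BlockYandex" is a made-up example, not something from the original answer.

```xml
<configuration>
  <system.webServer>
    <security>
      <requestFiltering>
        <filteringRules>
          <!-- "BlockYandex" is a hypothetical name; pick any unique name -->
          <filteringRule name="BlockYandex" scanUrl="false" scanQueryString="false">
            <!-- Scan Headers: inspect only the User-Agent request header -->
            <scanHeaders>
              <add requestHeader="User-Agent" />
            </scanHeaders>
            <!-- Deny Strings: reject requests whose User-Agent contains "Yandex" -->
            <denyStrings>
              <add string="Yandex" />
            </denyStrings>
            <!-- No appliesTo element, so the rule applies to all file types -->
          </filteringRule>
        </filteringRules>
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```

To block more agents, add further entries under denyStrings, one add element per string.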
For crawlers that do not respect robots.txt, you can use URL Rewrite on the server to block based on their user agent. See: http://chrisfulstow.com/using-the-iis-7url-rewrite-module-to-block-crawlers/
Here’s an easy way to block the main web crawlers (Google, Bing and Yahoo) from indexing any site across an entire server. This is really useful if you push all your beta builds to a public-facing server, but don’t want them indexed yet by the search engines.
Install the IIS URL Rewrite Module.
At the server level, add a request blocking rule. Block user-agent headers matching the regex: googlebot|msnbot|slurp.
Or, just paste this rule into “C:\Windows\System32\inetsrv\config\applicationHost.config”
<system.webServer>
  <rewrite>
    <globalRules>
      <rule name="RequestBlockingRule1" stopProcessing="true">
        <match url=".*" />
        <conditions>
          <add input="{HTTP_USER_AGENT}" pattern="googlebot|msnbot|slurp" />
        </conditions>
        <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
      </rule>
    </globalRules>
  </rewrite>
</system.webServer>
This’ll block Google, Bing and Yahoo from indexing any site published on the server. To test it out, try the Firefox User Agent Switcher.
For more info: http://www.iis.net/download/URLRewrite
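If you only want to block crawlers on a single site rather than the whole server, the same rule can probably be placed in that site's web.config instead. This is a sketch under the assumption that the URL Rewrite Module is installed and that rewrite rules are allowed at the site level; note that site-level rules go under rules rather than globalRules. The rule name "BlockCrawlers" is a made-up example.

```xml
<configuration>
  <system.webServer>
    <rewrite>
      <!-- Site-level rules use <rules>; <globalRules> belongs in applicationHost.config -->
      <rules>
        <!-- "BlockCrawlers" is a hypothetical name -->
        <rule name="BlockCrawlers" stopProcessing="true">
          <!-- Match every URL on the site -->
          <match url=".*" />
          <conditions>
            <!-- Same user-agent pattern as the server-level rule -->
            <add input="{HTTP_USER_AGENT}" pattern="googlebot|msnbot|slurp" />
          </conditions>
          <!-- Answer matching requests with 403 instead of serving the page -->
          <action type="CustomResponse" statusCode="403" statusReason="Forbidden: Access is denied." statusDescription="You do not have permission to view this page." />
        </rule>
      </rules>
    </rewrite>
  </system.webServer>
</configuration>
```

As with the server-level rule, this only stops crawlers that send an honest user-agent string.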