How do you create a single robots.txt file for all sites on an IIS instance?
I want to create a single robots.txt file and have it served for all sites on my IIS (7 in this case) instance.
I do not want to have to configure anything on any individual site.
How can I do this?
Solution 1:
It can be done using the Url Rewrite module for IIS.
Create these folders:
\Inetpub\wwwroot\allsites
\Inetpub\wwwroot\site1
\Inetpub\wwwroot\site2
Create two websites, one using the site1 path and one using the site2 path above. Inside each website, create a virtual directory called allsites pointing to \Inetpub\wwwroot\allsites.
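If you would rather script the site setup, something along these lines should work with appcmd; the host-name bindings here are hypothetical placeholders:

rem Create the two sites, each bound to its own (hypothetical) host name
%windir%\system32\inetsrv\appcmd add site /name:site1 /bindings:http/*:80:site1.example.com /physicalPath:C:\Inetpub\wwwroot\site1
%windir%\system32\inetsrv\appcmd add site /name:site2 /bindings:http/*:80:site2.example.com /physicalPath:C:\Inetpub\wwwroot\site2
rem Create the shared allsites virtual directory inside each site's root application
%windir%\system32\inetsrv\appcmd add vdir /app.name:"site1/" /path:/allsites /physicalPath:C:\Inetpub\wwwroot\allsites
%windir%\system32\inetsrv\appcmd add vdir /app.name:"site2/" /path:/allsites /physicalPath:C:\Inetpub\wwwroot\allsites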
Next, create these files. Give each unique content so you can verify during testing that the rewrite is working:
\Inetpub\wwwroot\allsites\robots.txt
\Inetpub\wwwroot\site2\robots.txt
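For example, the shared file could allow everything while site2's override blocks all crawling; these directives are just placeholders for testing:

\Inetpub\wwwroot\allsites\robots.txt:
User-agent: *
Disallow:

\Inetpub\wwwroot\site2\robots.txt:
User-agent: *
Disallow: /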
Install the Url Rewrite module for IIS if you have not done so already.
Place this in the web.config of each website:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <system.webServer>
    <rewrite>
      <rules>
        <clear />
        <rule name="Rewrite robots.txt">
          <match url="^robots\.txt$" />
          <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
            <!-- Only rewrite when no physical robots.txt exists in this site's root -->
            <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
          </conditions>
          <action type="Rewrite" url="/allsites/robots.txt" />
        </rule>
      </rules>
    </rewrite>
    <!-- Not required for the rewrite; remove if you don't want directory listings -->
    <directoryBrowse enabled="true" />
  </system.webServer>
</configuration>
This rule matches a URL such as http://mysite/robots.txt and rewrites it to http://mysite/allsites/robots.txt instead. However, it does this ONLY if no robots.txt file exists on the filesystem at the requested location.
So you can put a common robots.txt in allsites, but override it for any individual site by placing a custom robots.txt in that site's root.
This is not a redirect. The remote web crawler will have no idea that IIS is doing this behind the scenes.
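A quick way to check the behavior, assuming curl is available and the hypothetical host names from above resolve to the server:

rem site1 has no local robots.txt, so this should return the shared allsites content
curl http://site1.example.com/robots.txt
rem site2 has its own robots.txt, so the IsFile condition stops the rewrite
curl http://site2.example.com/robots.txt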
Update:
I haven't done this on my configuration, but the Url Rewrite module does support global rules, which can be defined at the server level. With a global rule you would not need to define this in each site's web.config.
http://learn.iis.net/page.aspx/460/using-the-url-rewrite-module/
"Global and distributed rewrite rules. URL Rewrite uses global rules to define server-wide URL rewriting logic. These rules are defined within the applicationHost.config file, and they supercede rules configured at lower levels in the configuration hierarchy. The module also uses distributed rules to define URL rewrite logic specific to a particular configuration scope. This type of rule can be defined on any configuration level by using Web.config files."
Solution 2:
An alternative to the robots.txt file is the X-Robots-Tag HTTP header, as detailed here:
http://googleblog.blogspot.com/2007/07/robots-exclusion-protocol-now-with-even.html
This can be applied server-wide on IIS by adding a custom HTTP header:
IIS 6: right-click the "Web Sites" folder > Properties > HTTP Headers
IIS 7: on the server home screen, open "HTTP Response Headers" and choose "Add"
Unlike robots.txt, this header appears to be proprietary to Google, and, like robots.txt, it is only effective against "compliant" search engine crawlers.
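For reference, a sketch of the same header set in configuration rather than through the GUI; whether "noindex" is the value you want depends on your goal:

<configuration>
  <system.webServer>
    <httpProtocol>
      <customHeaders>
        <!-- Sent with every response; noindex would keep all pages of all sites out of the index -->
        <add name="X-Robots-Tag" value="noindex" />
      </customHeaders>
    </httpProtocol>
  </system.webServer>
</configuration>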
Solution 3:
Can you use symbolic links? Would that work?
http://www.howtogeek.com/howto/windows-vista/using-symlinks-in-windows-vista/
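If IIS will serve them (I haven't verified that it does without extra configuration), a sketch using mklink from an elevated command prompt in each site's root:

cd /d C:\Inetpub\wwwroot\site1
mklink robots.txt C:\Inetpub\wwwroot\allsites\robots.txt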