How do I use robots.txt to disallow crawling for only my subdomains?

Solution 1:

The robots.txt file needs to go in the top-level directory of your web server. If your main domain and each subdomain are served from different vhosts, then you can put one in the top-level directory of each subdomain and include something like

User-agent: *
Disallow: /

Where the robots.txt is located depends upon how you access a particular site. Given a URL like

 http://example.com/somewhere/index.html

a crawler will discard everything to the right of the domain name and append /robots.txt, giving

http://example.com/robots.txt  
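
If it helps to see that rule as code, here is a minimal sketch in Python (the helper name robots_url is just for illustration, not part of any crawler's API):

from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # Keep only the scheme and host, then append /robots.txt --
    # that is the single location a crawler checks for this site's rules.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/somewhere/index.html"))
# http://example.com/robots.txt
print(robots_url("http://subdomain.example.com/somewhere/index.html"))
# http://subdomain.example.com/robots.txt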

So you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for example.com. To disallow access to /somewhere you need

User-agent: *
Disallow: /somewhere

If you have subdomains and you access them as

http://subdomain.example.com

and you want to disallow access to the whole subdomain, then you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for that subdomain, and so on for each subdomain.
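
To make that concrete, here is a hypothetical layout (the /var/www paths are just placeholders for whatever your DocumentRoot directives point to):

/var/www/example.com/               <- DocumentRoot for example.com
    robots.txt                      <- served as http://example.com/robots.txt
/var/www/subdomain.example.com/     <- DocumentRoot for subdomain.example.com
    robots.txt                      <- served as http://subdomain.example.com/robots.txt

Each request for robots.txt is answered by whichever vhost matches the host name, so the two files can contain different rules.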

Solution 2:

You have to put it in the root directory of the site (the directory the domain or subdomain is served from), otherwise it won't be found.

Solution 3:

  1. You need to put robots.txt in your root directory

  2. The Disallow rules are not domain/sub-domain specific and will apply to all URLs

For example: let's assume you are using sub.mydomain.com and mydomain.com (both linked to the same FTP folder). For this setup, if you set a Disallow: /admin/ rule, then both sub.mydomain.com/admin/ and mydomain.com/admin/ will be disallowed.
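
As a rough check, this sketch uses Python's urllib.robotparser with the shared rules inlined (rather than fetched over HTTP) to show that matching is done on the path alone:

from urllib import robotparser

# The single robots.txt that both host names serve from the shared folder
shared_rules = [
    "User-agent: *",
    "Disallow: /admin/",
]

rp = robotparser.RobotFileParser()
rp.parse(shared_rules)

# The rule matches by path, so /admin/ is blocked under either host name.
print(rp.can_fetch("*", "http://mydomain.com/admin/"))      # False
print(rp.can_fetch("*", "http://sub.mydomain.com/admin/"))  # False
print(rp.can_fetch("*", "http://mydomain.com/blog/"))       # True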

But if sub.mydomain.com actually points to another site (and also to another FTP folder), then you'll need to create another robots.txt and put it in the root of that folder.