How do I use robots.txt to disallow crawling for only my subdomains?
Solution 1:
The robots.txt file needs to go in the top-level directory of your web server. If your main domain and each subdomain are served from different vhosts, then you can put a robots.txt in the top-level directory of each subdomain and include something like
User-agent: *
Disallow: /
Where the robots.txt is located depends upon how you access a particular site. Given a URL like
http://example.com/somewhere/index.html
a crawler will discard everything to the right of the domain name and append robots.txt
http://example.com/robots.txt
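As a rough sketch of that derivation in Python (standard library only; robots_url is just a name made up for this example):

from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # Keep only the scheme and host, drop the path, and append /robots.txt
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/somewhere/index.html"))
# prints http://example.com/robots.txt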
So you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for example.com, and to disallow access to /somewhere you need
User-agent: *
Disallow: /somewhere
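You can sanity-check the effect of such a file with Python's standard urllib.robotparser, which matches Disallow rules against the URL path; a minimal sketch, assuming the two-line file above (the /elsewhere/ URL is just a made-up allowed counterpart):

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /somewhere",
])

print(rp.can_fetch("*", "http://example.com/somewhere/index.html"))   # False (blocked)
print(rp.can_fetch("*", "http://example.com/elsewhere/index.html"))   # True (allowed)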
If you have subdomains and you access them as
http://subdomain.example.com
and you want to disallow access to the whole subdomain, then you need to put your robots.txt in the directory pointed to by the DocumentRoot directive for that subdomain, and so on for each subdomain.
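The host part of the URL is kept, so every subdomain is asked for its own robots.txt; a quick sketch using the same derivation as above (subdomain.example.com and /page.html are just placeholders):

from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    # The host is preserved, so each subdomain gets its own robots.txt URL
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://example.com/page.html"))            # http://example.com/robots.txt
print(robots_url("http://subdomain.example.com/page.html"))  # http://subdomain.example.com/robots.txt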
Solution 2:
You have to put it in your root directory; otherwise it won't be found.
Solution 3:
You need to put robots.txt in your root directory.
The Disallow rules are not domain/sub-domain specific and will apply to all URLs.
For example: let's assume you are using sub.mydomain.com and mydomain.com (both are linked to the same FTP folder). For this setup, if you set a Disallow: /admin/ rule, then both sub.mydomain.com/admin/ and mydomain.com/admin/ will be disallowed.
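A short sketch of that shared-folder case with Python's urllib.robotparser (the rule and host names are the ones from the example above):

from urllib.robotparser import RobotFileParser

# Both hosts serve the same file, so a crawler sees the same rules for each
shared_rules = ["User-agent: *", "Disallow: /admin/"]

for host in ("mydomain.com", "sub.mydomain.com"):
    rp = RobotFileParser()
    rp.parse(shared_rules)
    print(host, rp.can_fetch("*", f"http://{host}/admin/"))  # False for both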
But if sub.mydomain.com actually links to another site (and also to another FTP folder), then you'll need to create another robots.txt and put it in the root of that folder.