Does a company have implied right to crawl my website?

There is legal precedent for this: Field v. Google Inc., 412 F. Supp. 2d 1106 (D. Nev. 2006). Google won summary judgment based on several factors, most notably that the author did not use a robots.txt file or robots meta tags (such as "noarchive") on his website, which would have let him keep Google from crawling and caching pages he did not want indexed.

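robots.txt is only a request: it works because compliant crawlers check it before fetching a page. Here is a minimal sketch of that check using Python's standard `urllib.robotparser` module. The robots.txt contents, the `example.com` URLs, and the crawler name `ExampleBot` are hypothetical, and the sketch covers robots.txt only, not robots meta tags.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block a (hypothetical) crawler called "ExampleBot"
# from the whole site, and block every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A compliant crawler asks before fetching each URL.
for agent, url in [
    ("Googlebot", "https://example.com/index.html"),
    ("Googlebot", "https://example.com/private/notes.html"),
    ("ExampleBot", "https://example.com/index.html"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

Googlebot matches no named group, so it falls through to the `*` group and only `/private/` is off limits to it, while `ExampleBot` is disallowed everywhere. Nothing in the protocol forces a crawler to run this check, which is why the legal status of ignoring robots.txt is an open question.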

There is NO U.S. law specifically dealing with robots.txt files; however, another court case has set some precedent that could eventually lead to robots.txt files being treated as technological measures that protect content, so that deliberately getting around them would count as circumvention. In Healthcare Advocates, Inc. v. Harding, Earley, Follmer & Frailey, et al., Healthcare Advocates argued that Harding et al. essentially exploited the Wayback Machine to gain access to archived copies of pages that were supposed to be hidden because the live site had since added a robots.txt file. While Healthcare Advocates lost this case, the District Court noted that the problem was not that Harding et al. "picked the lock," but that they gained access to the files because of a server-load problem with the Wayback Machine that served the archived files when it shouldn't have, so there was "no lock to pick."


It is only a matter of time, IMHO, until someone takes this ruling and turns it around: the court's reasoning suggests that robots.txt is a lock that prevents crawling, and that circumventing it is picking the lock.

Unfortunately, many of these lawsuits aren't as simple as "I told your crawler it wasn't allowed, and your crawler ignored those settings/commands." There are a host of other issues in all these cases that ultimately affect the outcome more than the core question of whether a robots.txt file should be considered an electronic protection measure under the US DMCA.

That said, this is US law, and someone in China can do what they want: not because it's legal, but because China won't enforce US trademark and copyright protections, so good luck going after them.

Not a short answer, but there really isn't a short, simple answer to your question!


Yes, they have the right to do so. You've created a public website; what makes you think they don't?

You too, of course, have the right to stop them. You can ask them not to crawl your website with robots.txt or actively prevent them from accessing it with something like fail2ban.
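fail2ban works by watching log files for lines that match a filter and banning any IP that produces at least `maxretry` matches within `findtime` seconds; the ban itself is applied through an action such as a firewall rule. The sketch below shows only the counting logic, assuming the web server's access log has already been parsed into `(timestamp, ip)` pairs for requests you consider unwanted. `ips_to_ban` is a hypothetical helper, not part of fail2ban, and it borrows the `maxretry`/`findtime` names purely for illustration.

```python
from __future__ import annotations

from collections import defaultdict, deque
from typing import Iterable


def ips_to_ban(events: Iterable[tuple[float, str]], maxretry: int, findtime: float) -> set[str]:
    """Return IPs with at least `maxretry` hits inside a `findtime`-second window.

    `events` are hypothetical (unix_timestamp, ip) pairs, assumed sorted by time.
    """
    recent: dict[str, deque[float]] = defaultdict(deque)
    banned: set[str] = set()
    for ts, ip in events:
        window = recent[ip]
        window.append(ts)
        # Forget hits from this IP that are more than findtime seconds old.
        while ts - window[0] > findtime:
            window.popleft()
        if len(window) >= maxretry:
            banned.add(ip)
    return banned


# Hypothetical log extract: one IP hits the site every 2 seconds,
# another hits it twice, 99 seconds apart.
events = sorted(
    [(float(t), "203.0.113.7") for t in range(0, 10, 2)]
    + [(1.0, "198.51.100.4"), (100.0, "198.51.100.4")]
)
print(ips_to_ban(events, maxretry=5, findtime=60))  # {'203.0.113.7'}
```

In a real setup you would express the matching part as a fail2ban filter against your web server's access log and let fail2ban handle banning and unbanning (via `bantime`) instead of writing this yourself.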

Alternatively, don't worry about it and continue on with your life. It's not hurting anything and is definitely on the benign side of Internet probing.