Determining if a bot is scraping utility bill content and how to block it

Solution 1:

There's no way to detect or block a well-written bot that's only scraping a small number of pages -- its behaviour can be indistinguishable from a genuine user.

Solution 2:

  1. You could block or rate-limit any single source IP that accesses more than one account. As mentioned above, this requires the server to know that more than one account is being accessed, which might not be trivial to implement. It could also block tenants of an apartment complex who share a NAT-ted internet connection provided as a "utility", of course.

  2. You could implement a CAPTCHA.
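
As a rough illustration of option 1, here is a minimal in-memory sketch in Python. It assumes the application can already tell which account each request touches. The class name, the `allow` method, the one-hour window, and the default limit of one account per IP are all hypothetical choices for illustration, not part of any existing library.

```python
import time
from collections import defaultdict


class MultiAccountLimiter:
    """Refuses requests from an IP that touches too many distinct accounts.

    Hypothetical sketch: the names and default thresholds are assumptions.
    """

    def __init__(self, max_accounts=1, window_seconds=3600, clock=time.monotonic):
        self.max_accounts = max_accounts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        # ip -> {account_id: time of the most recent access}
        self._seen = defaultdict(dict)

    def allow(self, ip, account_id):
        """Return True if this access is permitted, False if it should be blocked."""
        now = self.clock()
        accounts = self._seen[ip]
        # Forget accounts this IP has not touched within the window.
        stale = [a for a, t in accounts.items() if now - t > self.window_seconds]
        for a in stale:
            del accounts[a]
        # A new account is refused once the IP is already at its limit.
        if account_id not in accounts and len(accounts) >= self.max_accounts:
            return False
        accounts[account_id] = now
        return True
```

Calling `allow(ip, account_id)` on every bill-page request and rejecting (or slowing down) the request when it returns False gives the behaviour described. Raising `max_accounts` above 1 would be one way to reduce false positives for users behind shared NAT, at the cost of letting a scraper reach a few more accounts per address.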