Block spam by using geoip filter?

We are looking for a way to block spam based on the sender's geographic location, using GeoIP filtering.

Context: we rarely have any email correspondence outside of the USA, so we would like to block all incoming email from outside the US, except perhaps from one or two countries.

After a little Googling I have found a couple of solutions that may (or may not) work, but I would like to know what other sysadmins are currently doing or what they would recommend.

Here is what I have found so far:

Using PowerDNS and its GeoIP backend, it should be possible to filter by GeoIP. Normally this backend is used for a kind of geographic load balancing, but I don't see why it couldn't be used to kill spam as well.

Another possibility is to use MaxMind's free GeoLite Country database and some scripting to do a similar job.
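As a rough sketch of the "database plus scripting" route: the snippet below uses a tiny hand-made range table in place of a real GeoIP database (the ranges are RFC 5737 documentation addresses, not real country allocations), and a hypothetical ALLOWED_COUNTRIES set of the US plus one other country. With real data you would replace country_of() with a lookup such as the geoip2 package's geoip2.database.Reader(...).country(ip).country.iso_code.

```python
import bisect
import ipaddress
from typing import Optional

# Hypothetical stand-in for a GeoIP country database: (first, last, country)
# entries sorted by first address. These are RFC 5737 documentation ranges,
# NOT real country allocations.
RANGES = [
    (ipaddress.IPv4Address("192.0.2.0"), ipaddress.IPv4Address("192.0.2.255"), "US"),
    (ipaddress.IPv4Address("198.51.100.0"), ipaddress.IPv4Address("198.51.100.255"), "CA"),
    (ipaddress.IPv4Address("203.0.113.0"), ipaddress.IPv4Address("203.0.113.255"), "CN"),
]
_STARTS = [first for first, _, _ in RANGES]

# Hypothetical policy: the US plus "one or two" other countries.
ALLOWED_COUNTRIES = {"US", "CA"}


def country_of(ip: str) -> Optional[str]:
    """Return the ISO country code for an IPv4 address, or None if unknown."""
    addr = ipaddress.IPv4Address(ip)
    # Find the last range whose first address is <= addr.
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i >= 0 and addr <= RANGES[i][1]:
        return RANGES[i][2]
    return None


def allow_connection(ip: str) -> bool:
    """Accept only if the source IP maps to an allowed country."""
    return country_of(ip) in ALLOWED_COUNTRIES
```

Note that addresses with no known country are rejected here; in practice you may prefer to let them through to other tests.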

Ideally, I am looking for a solution that handles a decent load and scales well too (aren't we all?) ;)

Thanks in advance for your help! :-)


There is also the geoip patch for netfilter/iptables on Linux. If your email server runs Linux, you could use it to block port 25 from unwanted countries; otherwise, you could put a Linux firewall with this patch in front of your email server. Best part is that it is free :-)
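For the iptables route, here is a sketch that builds the rule rather than running it. It assumes the xtables-addons geoip match (which provides -m geoip and --src-cc) is installed and its country database has been built; the helper name and the choice of DROP are my own, so check the module's man page for your version.

```python
import shlex
from typing import List, Sequence


def geoip_smtp_rule(allowed: Sequence[str], chain: str = "INPUT") -> List[str]:
    """Build (but do not run) an iptables rule that drops inbound SMTP
    (TCP port 25) from every country NOT listed in `allowed`.

    Assumes the xtables-addons `geoip` match module is installed and its
    country database has been built.
    """
    codes = [c.strip().upper() for c in allowed if c.strip()]
    if not codes:
        raise ValueError("need at least one allowed country code")
    return [
        "iptables", "-A", chain,
        "-p", "tcp", "--dport", "25",
        # "!" negates the match: only sources outside the list are dropped.
        "-m", "geoip", "!", "--src-cc", ",".join(codes),
        "-j", "DROP",
    ]


if __name__ == "__main__":
    # Print a shell-quoted version for review before applying it as root.
    print(shlex.join(geoip_smtp_rule(["us", "ca"])))
```

Once reviewed, the list can be passed to subprocess.run(rule, check=True) with root privileges. With DROP, senders in blocked countries simply time out, so their MTAs will queue and retry before eventually bouncing.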


From this research paper on SNARE, I present this nugget:

For ham, 90% of the messages travel about 4,000 km or less. On the other hand, for spam, only 28% of messages stay within this range.
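To apply this distance idea you need the great-circle distance between the sender and your server. A minimal haversine sketch follows; the coordinates would have to come from a city-level GeoIP lookup (an assumption, not shown), and the 4,000 km default simply mirrors the figure quoted above rather than a tuned threshold.

```python
import math
from typing import Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot for antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_ham_radius(
    sender: Tuple[float, float],
    server: Tuple[float, float],
    radius_km: float = 4000.0,
) -> bool:
    """True if the sender (lat, lon) is within radius_km of our server."""
    return great_circle_km(*sender, *server) <= radius_km
```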

My personal observations mirror yours: even now, in 2014, geographic location continues to be an excellent predictor of spam. As others have pointed out, GeoIP location (country or distance) alone is not a sufficiently reliable basis for blocking connections. However, combining GeoIP distance with a few other pieces of data about the connection, such as FCrDNS, HELO hostname validity, sender OS (via p0f), and SPF, provides a 99.99% reliable basis (that is, a 0.01% chance of a false positive) for rejecting 80% of connections before the DATA phase.
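FCrDNS (forward-confirmed reverse DNS) means that a PTR name for the connecting IP resolves back to that same IP. Below is a sketch using the system resolver through Python's socket module; the resolvers are injectable so the logic can be tested without DNS. A production MTA would use an asynchronous resolver with timeouts instead of these blocking calls.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List


def system_reverse(ip: str) -> List[str]:
    """PTR lookup via the system resolver; returns [] on failure."""
    try:
        host, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return []
    return [host, *aliases]


def system_forward(name: str) -> List[str]:
    """A/AAAA lookup via the system resolver; returns [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except (OSError, UnicodeError):
        return []
    return [info[4][0] for info in infos]


def fcrdns_ok(
    ip: str,
    reverse: Callable[[str], Iterable[str]] = system_reverse,
    forward: Callable[[str], Iterable[str]] = system_forward,
) -> bool:
    """True if some PTR name for `ip` resolves forward to that same IP."""
    target = ipaddress.ip_address(ip)
    for name in reverse(ip):
        for addr in forward(name):
            try:
                if ipaddress.ip_address(addr) == target:
                    return True
            except ValueError:
                continue  # ignore anything that is not a plain IP literal
    return False
```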

Unlike some SMTP tests (such as a DNSBL listing in zen.spamhaus.org), which have very low FP rates, none of the aforementioned tests is individually a sufficient basis for rejecting connections. Here's another pattern that falls into that category: the envelope sender user matches the envelope recipient user. I've noticed that about 30% of spam follows this pattern, for example from: jdoe@example.net to: jdoe@example.com. It happens far more frequently in spam than in valid mail flows. Another spammer pattern is an envelope sender domain that does not match the header From domain.
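Both patterns are cheap string comparisons. Here is a sketch, assuming addresses arrive as plain or angle-bracketed strings; the function names are hypothetical. The header From is only available once DATA has been received, and mailing lists and forwarders legitimately produce envelope/header domain mismatches, which is exactly why these work as scoring inputs rather than reject rules.

```python
from email.utils import parseaddr
from typing import Tuple


def _local_and_domain(address: str) -> Tuple[str, str]:
    """Split '<jdoe@example.com>' or '"J" <jdoe@example.com>' into lowercase
    (local part, domain). Domain is '' if there is no '@'.
    Lowercasing the local part is a heuristic simplification."""
    _, addr = parseaddr(address)
    local, sep, domain = addr.rpartition("@")
    if not sep:
        return addr.lower(), ""
    return local.lower(), domain.lower()


def user_match(mail_from: str, rcpt_to: str) -> bool:
    """Envelope sender local part equals envelope recipient local part."""
    s_local, _ = _local_and_domain(mail_from)
    r_local, _ = _local_and_domain(rcpt_to)
    return bool(s_local) and s_local == r_local


def envelope_header_domain_mismatch(mail_from: str, header_from: str) -> bool:
    """Envelope (MAIL FROM) domain differs from the header From domain."""
    _, env_domain = _local_and_domain(mail_from)
    _, hdr_domain = _local_and_domain(header_from)
    return bool(env_domain and hdr_domain) and env_domain != hdr_domain
```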

By heuristically scoring these "spam-appearing" characteristics, you can assemble the basis for an extremely reliable filtering system. SpamAssassin already does (or can do) most of what I described. But you also asked for a solution that handles a decent load and scales well. While SpamAssassin is great, I didn't see "massively reduced resource consumption" anywhere in the 3.4 release notes.
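A minimal sketch of heuristic scoring: each observed characteristic contributes a weight, and the sum is the verdict input. The observation names and weights below are invented for illustration; they are not karma's actual configuration.

```python
from typing import Mapping

# Hypothetical weights for illustration only (negative = spammy).
# These are NOT the values from karma.ini.
WEIGHTS = {
    "geoip_distance_over_4000km": -2,
    "fcrdns_fail": -2,
    "helo_invalid": -1,
    "p0f_desktop_os": -1,
    "spf_fail": -2,
    "envelope_user_match": -1,
    "fcrdns_pass": 1,
    "spf_pass": 1,
}


def score_connection(observations: Mapping[str, bool]) -> int:
    """Sum the weights of every characteristic observed on this connection.
    Observation names not in WEIGHTS are ignored."""
    return sum(weight for name, weight in WEIGHTS.items() if observations.get(name))
```

Keeping the weights in a table (or a config file) means they can be tuned without touching the test code itself.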

All the tests I listed above occur before SMTP DATA. Combined, those early tests form a sufficient basis for rejecting spammy connections before SMTP DATA with a negligible false-positive rate. Rejecting the connection before SMTP DATA avoids the bandwidth cost of transferring the message, as well as the heavier CPU and network load of content-based filters (SpamAssassin, dspam, header validation, DKIM, URIBL, antivirus, DMARC, etc.), for the vast majority of connections. Doing far less work per connection scales much better.

For the smaller subset of messages that are indeterminate at SMTP DATA, the connection is allowed to proceed and I score the message with results from the content filters.
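The two-phase flow can be sketched as follows, assuming a pre-DATA score from the cheap tests and a callable that runs the expensive content filters. The threshold is a hypothetical value, and as a simplification every message not rejected early goes through the content filters. The point of the structure is that those filters never run for connections already rejected before DATA.

```python
from typing import Callable, Tuple

REJECT_AT = -5  # hypothetical threshold: scores at or below this are rejected


def decide(pre_data_score: int, run_content_filters: Callable[[], int]) -> Tuple[str, int]:
    """Return (verdict, final score).

    If the cheap pre-DATA score is already bad enough, reject before DATA:
    the message body is never transferred and the content filters never run.
    Otherwise the message proceeds to DATA and the content-filter score is added.
    """
    if pre_data_score <= REJECT_AT:
        return "reject-before-DATA", pre_data_score
    total = pre_data_score + run_content_filters()
    return ("reject", total) if total <= REJECT_AT else ("accept", total)
```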

To accomplish all I have described, I've done a bit of hacking on Haraka, a node.js-based SMTP server. It scales very, very well. I have written a custom plugin called karma that does the heuristic scoring, with all the weighting scores in a config file; to get an idea of how karma works, have a look at its karma.ini config file. I've been getting "better than Gmail" filtering results.

Have a look at the tests run by the FCrDNS, helo.checks, and data.headers plugins as well; they might give you additional filtering ideas. If you have further ideas for reliably detecting spam with cheap (pre-DATA) tests, I'd be interested to hear them.
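For a flavour of cheap pre-DATA HELO validation, here is a small sketch. The rules are my own illustration of common HELO sanity checks (RFC 5321 expects a fully qualified domain name or a bracketed address literal); they are not the actual rule set of the helo.checks plugin, and the function name is hypothetical.

```python
import ipaddress
import re
from typing import Iterable, List

# One DNS label: 1-63 letters, digits or hyphens, not starting/ending with '-'.
_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def helo_problems(helo: str, remote_ip: str, our_names: Iterable[str] = ()) -> List[str]:
    """Return a list of HELO problems; an empty list means it looks sane."""
    remote = ipaddress.ip_address(remote_ip)
    name = helo.strip()
    problems: List[str] = []

    # Address literal, e.g. [192.0.2.1] or [IPv6:2001:db8::1]
    if name.startswith("[") and name.endswith("]"):
        literal = name[1:-1]
        if literal.lower().startswith("ipv6:"):
            literal = literal[5:]
        try:
            if ipaddress.ip_address(literal) != remote:
                problems.append("address literal does not match connecting IP")
        except ValueError:
            problems.append("malformed address literal")
        return problems

    # A bare, unbracketed IP is not a valid HELO argument.
    try:
        ipaddress.ip_address(name)
        return ["bare IP address instead of hostname"]
    except ValueError:
        pass

    labels = name.rstrip(".").split(".")
    if len(labels) < 2:
        problems.append("not fully qualified")
    if not all(_LABEL.match(label) for label in labels):
        problems.append("invalid hostname syntax")
    if name.rstrip(".").lower() in {n.lower() for n in our_names}:
        problems.append("claims to be one of our own hostnames")
    return problems
```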