Load Balance Mail Gateways

Solution 1:

We do this with Linux Virtual Server (LVS), which has been part of the standard Linux kernel for some years now.

It allows for weight-based load balancing and is quite easy to set up. We are doing something like this:

ipvsadm -A -t 192.168.0.3:25 -s wrr
ipvsadm -a -t 192.168.0.3:25 -r 192.168.0.8:25 -g -w 100
ipvsadm -a -t 192.168.0.3:25 -r 192.168.0.9:25 -g -w 100

(where 192.168.0.3 is your "service IP" or "virtual IP" and 192.168.0.8 and 192.168.0.9 are your "real servers")

The most important thing to understand is the mode of operation. This setup uses "gateway mode" (the -g flag, also known as direct routing), in which the source and destination addresses of the packets are not changed. This has some implications: the virtual IP has to be configured on all "real servers", which can lead to ARP race conditions that you should avoid by design:

  • Either your "real servers" are behind the load balancer in a separate LAN
  • OR you configure your real servers not to reply to ARP for the virtual address
  • OR you are routing the virtual IP directly to your load balancer so it is not ARPed for
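For the second option, a common approach on Linux real servers is to put the virtual IP on the loopback interface and restrict ARP behaviour via sysctl. A minimal sketch, assuming the VIP 192.168.0.3 from the example above; the `run` helper and the `APPLY` variable are made up for illustration and only print the commands unless you set APPLY=1:

```shell
#!/bin/sh
# Sketch for a "real server" in gateway (direct routing) mode.
# Assumption: the VIP is 192.168.0.3, as in the ipvsadm example above.
VIP="192.168.0.3"

# Hypothetical helper: print commands by default, execute them only if APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

configure_real_server() {
    # Reply to ARP only for addresses configured on the receiving interface
    run sysctl -w net.ipv4.conf.all.arp_ignore=1
    # Never use the VIP as the source address in outgoing ARP requests
    run sysctl -w net.ipv4.conf.all.arp_announce=2
    # Put the VIP on loopback so the server accepts packets addressed to it
    run ip addr add "$VIP/32" dev lo
}

configure_real_server
```

Run it once without APPLY to review the commands, then as root with APPLY=1. The sysctl settings are not persistent across reboots unless you also add them to your sysctl configuration.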

Masquerading mode (-m) is perhaps a little easier to set up.
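As a rough sketch, the same service in masquerading (NAT) mode could look like the following. It assumes the same addresses as above, that IP forwarding is enabled on the load balancer, and that the real servers send their replies back through the load balancer (for example by using it as their default gateway). The `run` helper and `APPLY` variable are hypothetical and only print the commands by default:

```shell
#!/bin/sh
# Sketch: the mail VIP in masquerading (NAT) mode instead of gateway mode.
# Assumptions: same VIP and real servers as in the gateway mode example.
VIP="192.168.0.3"

# Hypothetical helper: print commands by default, execute them only if APPLY=1.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

setup_masquerading() {
    # The load balancer must forward packets between clients and real servers
    run sysctl -w net.ipv4.ip_forward=1
    # Virtual service on port 25 with weighted round robin scheduling
    run ipvsadm -A -t "$VIP:25" -s wrr
    # Real servers, attached with -m (masquerading) instead of -g
    for rs in 192.168.0.8 192.168.0.9; do
        run ipvsadm -a -t "$VIP:25" -r "$rs:25" -m -w 100
    done
}

setup_masquerading
```

In this mode the virtual IP does not need to be configured on the real servers, so the ARP issue above does not arise.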

Another hint: you might want to use keepalived, which sets up IPVS for you, monitors your mail servers for reachability and can provide redundancy for the load balancer itself using VRRP.

We are using ipvs to load balance DNS at 15k connections per second.
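To give an idea of what that looks like, here is a sketch that prints a sample keepalived.conf for the service above. The interface name eth0, the virtual_router_id, the instance name and the timing values are placeholders I picked, not values from our setup, and the plain TCP_CHECK only tests that port 25 accepts connections:

```shell
#!/bin/sh
# Print a sample keepalived.conf for the mail VIP (review before using it).
# Assumptions: eth0, virtual_router_id 51 and all timing values are placeholders.
print_keepalived_conf() {
    cat <<'EOF'
vrrp_instance VI_MAIL {
    state MASTER              # use BACKUP and a lower priority on the standby
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        192.168.0.3
    }
}

virtual_server 192.168.0.3 25 {
    delay_loop 10             # seconds between health checks
    lb_algo wrr
    lb_kind DR                # corresponds to ipvsadm -g; NAT corresponds to -m
    protocol TCP

    real_server 192.168.0.8 25 {
        weight 100
        TCP_CHECK {
            connect_timeout 5
        }
    }
    real_server 192.168.0.9 25 {
        weight 100
        TCP_CHECK {
            connect_timeout 5
        }
    }
}
EOF
}

print_keepalived_conf
```

Redirect the output to a file, adjust it to your network, and check it against the keepalived documentation for your version before putting it in place.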

(*) At least in Debian it is called this way; searching for ipvs should find it easily.

Solution 2:

SMTP has built-in load balancing using DNS, in a round robin fashion. This works quite well for most purposes. If that's not sufficient for you, you will have to create your own custom setup, which is not an easy task. So unless you really need it, I'd stick with what is available and widely used.

I am assuming your email servers (MTAs) are on the same domain (say example.org). In that case, create an MX record for each separate MTA, all with the same priority. Using the same priority means sending MTAs spread their deliveries across the servers in a roughly round robin fashion; otherwise the one with the highest priority (lowest number) is always tried first, at least by MTAs that aren't broken. (Spammers love to hit the server with the lowest priority, thinking it may be a lower-spec "fallback" server.)

example.org.    IN  MX  10  mx1.example.org.
example.org.    IN  MX  10  mx2.example.org.
example.org.    IN  MX  10  mx3.example.org.
example.org.    IN  MX  10  mx4.example.org.
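To see which servers a sending MTA would consider first, you can filter the MX answer for the lowest preference value. A small sketch; `preferred_mx` is a name I made up, and it expects input in the format printed by `dig +short MX` (preference, then host name):

```shell
#!/bin/sh
# Print the MX hosts that share the lowest preference value (tried first).
# Reads lines such as "10 mx1.example.org." from standard input.
preferred_mx() {
    sort -n | awk 'NR == 1 { best = $1 } $1 == best { print $2 }'
}

# Example with canned input; mx9 has a higher number and is therefore left out.
printf '20 mx9.example.org.\n10 mx1.example.org.\n10 mx2.example.org.\n' | preferred_mx

# Against live DNS (needs network access):
#   dig +short MX example.org | preferred_mx
```

With the four records above, all four hosts should be listed, since they share preference 10.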

Of course make sure each mx* can be resolved:

example.org.    IN      A       192.168.2.1
mx1     IN      A       192.168.2.2
mx2     IN      A       192.168.2.3
mx3     IN      A       192.168.2.4
mx4     IN      A       192.168.2.5

If you also want to use DNS to "load balance" the MTAs your users send mail through, you can configure DNS as follows. Let's call your outgoing server smtp.example.org and tell your users to submit email to it. I put "load balance" in quotes because, unlike MTAs delivering via MX records, a mail client will not necessarily skip a server that is down. In that case the user has to retry one or more times to hit a working server.

smtp    IN      A       192.168.2.2
smtp    IN      A       192.168.2.3
smtp    IN      A       192.168.2.4
smtp    IN      A       192.168.2.5

This is a crude solution because, depending on the user's system and setup, they may keep trying to hit just one IP. But at least it's not "down for everyone", and you can always direct them to a working server. In addition, if a server is permanently down you can remove it from DNS, and once cached records have expired that should prevent your users from hitting it. In this case haproxy may not be such a bad solution.
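If you need to find out quickly which of the smtp addresses is currently accepting connections, something along these lines may help. It is only a sketch: `first_working` and `probe_smtp` are hypothetical helpers, the probe assumes a netcat that supports -z and -w, and port 25 is an assumption (your users may submit on a different port):

```shell
#!/bin/sh
# Assumed port; change it if your users submit mail on another one.
PORT=25

# Hypothetical probe: succeeds if the address accepts TCP connections on $PORT.
probe_smtp() {
    nc -z -w 3 "$1" "$PORT"
}

# Print the first address for which the probe command ($1) succeeds.
# Returns non-zero if none of the addresses passes the probe.
first_working() {
    probe="$1"
    shift
    for addr in "$@"; do
        if "$probe" "$addr"; then
            echo "$addr"
            return 0
        fi
    done
    return 1
}

# Usage (needs network access):
#   first_working probe_smtp $(dig +short A smtp.example.org)
```

The probe is passed in as an argument so that you can swap in a stricter check, for example one that waits for the SMTP greeting.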