*nix CARP or VMWare Fault Tolerance?
We're experimenting with what VMWare calls a "Fully Collapsed DMZ" on a blade centre. Basically, our DMZ goes straight into a vSwitch and all the security appliances are virtualised.
I've spent days reading up about why this is a good idea and why it's a bad idea, what needs to be done to make it safe, etc, but the one thing I'm having trouble finding is information regarding the best fault tolerance method.
Our edge firewall of choice is pfSense, which supports CARP. We've got 10 blades in the cluster, so it's quite feasible to have two or even three pfSense firewalls with VMWare HA enabled, configured internally with CARP so that they take over from each other in the event of a blade failure. But this seems like a lot of administrative overhead, and I'm an untrusting kind of guy, so it means that I'll be logging into multiple firewalls every week to make sure that all our rules etc. have mirrored.
But why bother with CARP when VMWare's FT (even with its single-vCPU shortfall) will provide all the features of CARP with, as far as I can tell, less management, stress and concern for my job?
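For what it's worth, the weekly "have the rules mirrored?" check could probably be scripted rather than done by hand. Below is a minimal sketch, assuming you can export the XML configuration from each firewall and that the rules live under a top-level `filter` element. The function names `section_digest` and `rules_in_sync` are made up for illustration and are not part of pfSense.

```python
import hashlib
import xml.etree.ElementTree as ET


def section_digest(config_xml: str, section: str = "filter") -> str:
    """Return a SHA-256 digest of one top-level section of a config export."""
    root = ET.fromstring(config_xml)
    node = root.find(section)
    if node is None:
        raise ValueError(f"section {section!r} not found in config")
    # Drop whitespace-only text so indentation differences are ignored.
    for elem in node.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return hashlib.sha256(ET.tostring(node)).hexdigest()


def rules_in_sync(primary_xml: str, secondary_xml: str, section: str = "filter") -> bool:
    """True when the chosen section is identical on both firewalls."""
    return section_digest(primary_xml, section) == section_digest(secondary_xml, section)
```

Feeding it the two exports and alerting when `rules_in_sync` returns `False` would replace the manual login. Note that it compares rules in order, so the same rules in a different order count as a mismatch, which is probably what you want for a firewall.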
tl;dr:
Is there any compelling reason to use CARP over FT, or vice versa for a software-based firewall?
Solution 1:
Although I've played with FT for ages now, I've yet to find an actual use for it, to be honest. Not only is the single vCPU thing a pain, but the sheer amount of network traffic generated is astonishing. You really do end up using most of a GigE link just for FT, so you end up throwing your vMotion traffic over another vSwitch, which makes the whole thing quite a pain in the backside. My other concern is that FT only protects you against physical failures. If the FT'ed VM falls over for any reason, you still lose the service, as the outage will be perfectly mirrored in the secondary VM.
I'm a cautious guy when it comes to production systems and just don't think FT's worth it right now. I hope that changes, but I'd sooner have other systems such as clustering/VIPs etc. there instead.
Oh, and don't worry about collapsed DMZs. If you're using one of the vShield products, I personally think they're as secure as any Cisco box.
Solution 2:
CARP is really designed to let your hosts detect when the other host's NIC is offline (which is often the case when the physical host is down, but not necessarily).
The advantage of using CARP over VMWare FT would be if VMWare FT behaves differently when the NIC has failed.
If you're comfortable with running your firewall on a VM, then the only concern I would have is the FT behaviour on a NIC failure. If a NIC failure does not force FT to fail over, then I would retain CARP.
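To put rough numbers on how quickly CARP notices a dead peer: each node advertises every `advbase + advskew/256` seconds, as documented in carp(4). The sketch below works that out. The master-down rule used here (three base intervals plus the skew) is my assumption about typical BSD behaviour rather than something from this thread, and the example skew of 100 for a secondary is just a common convention, so check both against your own version.

```python
def advertisement_interval(advbase: int = 1, advskew: int = 0) -> float:
    """Seconds between CARP advertisements: advbase + advskew/256."""
    return advbase + advskew / 256


def master_down_timeout(advbase: int = 1, advskew: int = 0, missed: int = 3) -> float:
    """Approximate seconds a backup waits before taking over.

    Assumption: the backup waits `missed` base intervals plus its own skew.
    """
    return missed * advbase + advskew / 256


if __name__ == "__main__":
    # Hypothetical secondary node with advbase=1 and advskew=100.
    print(advertisement_interval(1, 100))
    print(master_down_timeout(1, 100))
```

Under those assumptions, takeover is a matter of a few seconds, and it is driven by missed advertisements on the interface rather than by the state of the underlying blade.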
Solution 3:
In addition to what Chris S mentioned, which I agree with, I would also go with CARP because of what happens when you're upgrading a firewall or doing any other kind of maintenance that requires a reboot or power-down. With FT, you're down while it reboots; with CARP, it's completely transparent. Likewise, if you need to change something on the VM that can't be done while it's live, with FT you have to cut off everything. Use CARP and you're in good shape in all those scenarios. FT mostly protects against hardware failure, whereas CARP accommodates any imaginable maintenance need with no downtime at all, as well as protecting against hardware failure.