High throughput meshed VPN to connect datacenter hosts

We're renting a number of hosts in a public datacenter. The datacenter does not offer private VLANs; all hosts receive one (or more) public IPv4/IPv6 addresses. The hosts come with very modern CPUs (Haswell quad-core, 3.4GHz) and have Gbit uplinks. The different areas (rooms? floors? buildings?) of the datacenter are interconnected with, from what I can tell, Gbit or 500Mbit links. Our hosts are running Debian wheezy. Currently we're running just over 10 hosts and expect that number to grow in the near future.

I am looking for a way to get all hosts to communicate with each other, securely and confidentially. Layer 3 is fine, layer 2 ok (but not necessary). Since I don't have access to VLANs, it'll have to be a VPN of some sort.

What is important to me:

  1. high throughput, ideally close to wirespeed
  2. decentralized, meshed architecture - this is to make sure that throughput is not slowed down by a central element (e.g. VPN concentrator)
  3. CPU footprint is not excessive (given AESNI and GCM-cipher suites, I'm hoping this is not a ridiculous requirement)
  4. operational ease of use; not too complicated to set up; network can grow without losing established connections

We're currently using tinc. It ticks [2] and [4], but I only reach about 600Mbit/s (simplex) of a 960Mbit/s wirespeed, and I lose one core completely. Also, tinc 1.1 (currently in development) is not yet multithreaded, so I'm stuck with single-core performance.

Traditional IPsec is out of the question, since it requires a central element, or a sh*tload of tunnels to be configured (to achieve [2]). IPsec with opportunistic encryption would be a solution, but I'm not sure it ever made it into stable production code.

I stumbled across tcpcrypt today. Except for the missing authentication, it looks like what I want. A userspace implementation smells slow, but so do all the other VPNs. They also mention a kernel implementation. I have not tried it yet, and am interested in how it behaves with respect to [1] and [3].

What other options are there? What are people who are not on AWS doing?

Additional Info

I'm interested in GCM, hoping that it will reduce the CPU footprint. See Intel's paper on the topic. When I talked to one of the tinc developers, he explained that even with AESNI handling the encryption, the HMAC (e.g. SHA-1) is still very expensive at Gbit speed.

Final Update

IPsec in transport mode works perfectly and does exactly what I want. After much evaluation I have chosen Openswan over ipsec-tools, simply because it supports AES-GCM. On the Haswell CPUs I measure about 910-920Mbit/s simplex throughput with about 8-9% CPU load of one kworkerd.


Solution 1:

What you don't want is a VPN. What you do want is indeed IPsec, but not in tunnel mode. Rather, you want IPsec in transport mode.

In this configuration, each host communicates directly with its peers, and only packet payloads are encrypted, leaving IP headers in place. This way, you don't need to do any routing gymnastics to get things working.

Yes, you'll need an IPsec connection stanza for each host (unless your hosts are grouped in a subnet, in which case you can do this via a CIDR block), but those can easily be generated programmatically by your configuration management system.
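As a rough illustration of generating those stanzas programmatically, here is a minimal Python sketch that emits one transport-mode connection per peer for a given host. The host names, the 192.0.2.x addresses, the `mesh-` naming prefix and the `conn_stanzas` helper are all made up for the example. The options shown (`type=transport`, `left`, `right`, `authby=secret`, `auto=start`) are standard Openswan ipsec.conf keywords, but cipher selection (e.g. AES-GCM) and key material are deliberately left out, so treat this as a starting point rather than a complete configuration.

```python
from ipaddress import ip_address


def conn_stanzas(local, peers):
    """Build ipsec.conf text with one transport-mode conn per remote peer.

    local -- this host's own address (string)
    peers -- dict mapping a short host name to its address (hypothetical inventory)
    """
    blocks = []
    for name, addr in sorted(peers.items()):
        if addr == local:
            continue  # no connection to ourselves
        ip_address(addr)  # raises ValueError on a malformed address
        blocks.append("\n".join([
            f"conn mesh-{name}",       # naming scheme is an assumption
            "    type=transport",      # transport mode, not tunnel mode
            f"    left={local}",
            f"    right={addr}",
            "    authby=secret",       # pre-shared key; key setup not shown
            "    auto=start",
        ]))
    return "\n\n".join(blocks) + "\n"


# Example inventory using documentation-range addresses
hosts = {
    "host1": "192.0.2.10",
    "host2": "192.0.2.11",
    "host3": "192.0.2.12",
}

if __name__ == "__main__":
    print(conn_stanzas("192.0.2.10", hosts))
```

In practice the inventory would come from your configuration management system's host list, and the output would be rendered into each host's ipsec.conf (or an included file) whenever a host is added.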

You didn't ask about configuration details, but if you need some pointers (there's not all that much solid information out there on transport mode), you can refer to this blog post I wrote up recently.