How do I get anycast for my servers?

I want to have anycast for my web service, but I cannot find any information on how to achieve this or any company that can help.

I've found loads of companies offering anycast DNS, but that's not what I need.

I have a stateless web service that I want to geographically distribute, using anycast to load balance and increase uptime. Are there any technical reasons a company cannot just advertise an IP address at multiple datacenters for me?

What technical aspects about anycasting do I need to know about to evaluate existing offerings and help me find companies that could help me? What are the pitfalls I need to watch out for?


There are two separate aspects of anycast that need to be understood in order to address your particular request. The first is how anycast addresses are advertised and routed. The second is the challenges TCP faces when its destination is an anycast address, and how they might be addressed.

Announcing and routing

In order to keep the BGP table at an acceptable size, most ASes will filter incoming announcements if the prefixes are too long. For IPv4 the threshold tends to be a /24 prefix, which means 256 addresses. This means that in order to do anycast on the public internet, you need a prefix covering at least 256 addresses.

If you already have a /24 prefix of your own, then there is not much stopping a hosting provider from announcing it on your behalf. In that case anycasting could be as simple as finding a number of different hosting providers willing to provide this service at the right price, and then having all of them announce the prefix on your behalf.

You can look at publicly available information about advertised routes to find providers already announcing prefixes on behalf of their customers, which will guide you towards providers likely to offer this kind of service. One tool for looking this up in routing tables is bgp.he.net.

If you do not have your own prefix and want one from a provider, it is important to understand what the limitations mentioned above mean for that provider.

The provider has enough IP addresses that they could configure an anycast prefix. However, once they do that, they are committed to using all 256 addresses as anycast, and all 256 addresses must be hosted in the exact same set of locations.

For this reason you sometimes see 256 addresses allocated in order to use just one of them for an anycasted service. This might be the first opportunity for you. A provider already anycasting a prefix might in fact have 250 unused anycast addresses. If your service is "interesting" enough to a provider, they may be willing to rent you hosting on one of those remaining addresses. One important caveat is that you would have to be hosted in the exact same locations as their primary anycast service, and an arrangement would likely be needed in which they can move your service as they see fit, because it is their primary anycasted service that decides where hosting is needed.

Most of the above assumes roughly a 1:1 correspondence between where the provider is hosting a service and where they are announcing the prefixes.

If the hosting provider has its own redundant backbone and its own data centers, then it could announce a prefix in a different set of locations from where it is hosting the service. Moreover, internally the provider can route longer prefixes as either unicast or anycast.

For example, if the provider announces a /22 in four different POPs, and they have a redundant network between those (for example a ring of four links), they could internally route a /24 or /25 to each POP and perhaps set aside a /28 to be anycasted to all POPs (which effectively means packets get serviced by the POP where they first enter the provider's network).
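
To make the prefix arithmetic concrete, here is a minimal sketch using Python's ipaddress module. The prefix 10.10.0.0/22 and this particular carve-up are illustrative assumptions, not anything a specific provider does:

    import ipaddress

    # The externally announced prefix; announced identically at all four POPs.
    prefix = ipaddress.ip_network("10.10.0.0/22")      # 1024 addresses

    # Internally, route one /24 to each POP as plain unicast.
    pop_blocks = list(prefix.subnets(new_prefix=24))    # four /24s

    # Set aside a /28 out of the last /24 to be anycasted to every POP.
    anycast_block = list(pop_blocks[-1].subnets(new_prefix=28))[-1]

    print("Announced externally:", prefix, "-", prefix.num_addresses, "addresses")
    for i, block in enumerate(pop_blocks, 1):
        print("POP", i, "unicast block:", block)
    print("Anycast block at every POP:", anycast_block, "-", anycast_block.num_addresses, "addresses")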

If you can find a provider which has both the redundant backbone and the data centers, then it simply is much easier for such a provider to anycast one of their own IP addresses for your service. However, keep in mind that in doing so your service consumes one CAM table entry in every one of their backbone routers, and you would have to pay for that.

TCP and anycast

As some of the comments have pointed out, TCP is a stateful protocol. So even if you consider your web service to be stateless, it still has state at the TCP layer. The consequence is that naively anycasting a TCP-based service will cause users to experience very frequent connection resets whenever routes change mid-connection.

That issue can be addressed by putting another layer in front of the actual web servers. What is needed is a layer of nodes that can forward received TCP packets to the proper web server, and do so consistently across a connection. So far this pretty much describes a standard DSR-based load balancer.

However, since there are multiple instances of this load balancer, they need to share state. A distributed hash table is one data structure that could be used for this layer.
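
As an illustration of the "consistently across a connection" requirement (a simplified sketch, not the exact mechanism described above): if every load-balancer instance hashes the TCP 4-tuple the same way, they all map a given connection to the same backend without per-packet coordination. The backend names and addresses below are hypothetical:

    import hashlib

    BACKENDS = ["backend-a", "backend-b", "backend-c"]   # unicast backend identities (illustrative)

    def pick_backend(src_ip, src_port, dst_ip, dst_port, backends=BACKENDS):
        """Deterministically map a TCP connection to a backend, identically on every load balancer."""
        key = "%s:%d->%s:%d" % (src_ip, src_port, dst_ip, dst_port)
        digest = hashlib.sha256(key.encode()).digest()
        return backends[int.from_bytes(digest[:8], "big") % len(backends)]

    # Whichever load balancer the packets happen to land on, the same backend is chosen.
    print(pick_backend("198.51.100.23", 51812, "192.0.2.80", 443))

Plain modulo hashing like this breaks existing connections whenever a backend is added or removed, which is exactly where shared state (or consistent hashing plus a connection table) earns its keep.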

Moreover, packets from the load balancing layer need to be forwarded unmodified to the backend. IP routing based on the destination IP of the original packet won't solve that problem, because that destination address is still the anycasted address, so the packet would never make it to the backend but would simply bounce back to the load balancer and loop until the TTL expires.

Typical load balancers address this by modifying the destination MAC address and forwarding the packet, thereby bypassing IP routing. This only works if your load balancers and backends are all located in a single location and the network between them is entirely switched, without any routers between load balancer and backends.

However, there is a different approach to solving that problem. Packets from the load balancer to the backend can be sent through an IP tunnel. The outer IP header carries a destination address which is a unicast address pointing to a backend. The inner IP header is unmodified and carries the client IP as source and the anycast IP as destination.
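
A rough sketch of what such an encapsulated packet looks like, built with scapy (the addresses come from documentation ranges, and this illustrates plain IP-in-IP encapsulation rather than any particular vendor's implementation):

    from scapy.all import IP, TCP

    client_ip  = "198.51.100.23"   # original client; stays in the inner header
    anycast_ip = "192.0.2.80"      # the anycasted service address; inner destination
    lb_ip      = "203.0.113.10"    # unicast address of the load balancer; outer source
    backend_ip = "203.0.113.200"   # unicast address of the chosen backend; outer destination

    inner = IP(src=client_ip, dst=anycast_ip) / TCP(sport=51812, dport=443, flags="S")
    outer = IP(src=lb_ip, dst=backend_ip, proto=4) / inner   # protocol 4 = IP-in-IP

    outer.show()   # the backend strips the outer header and sees the original addresses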

In this setup the source IP of the outer header is mostly unused. In principle it is supposed to be a unicast address of the load balancer receiving the packet. However, some services (for example Facebook) copy the client IP from the inner header as the source IP of the outer header. This mistake on Facebook's part can be detected from the outside, because the tunneled packets sometimes trigger an ICMP error which is sent directly back to the client.

There is no need for the inner and outer header to use the same IP version. So the unicast addresses needed for load balancers and backends can all be IPv6, such that the number of load balancers and backends is not limited by the availability of IPv4 addresses.

Using a design as sketched above has the added advantage that the load balancers typically need only a minor part of the hardware in the setup, and it is only the load balancers that need to be reachable through the anycast address. This means it is less of a problem if your anycast address needs to be relocated at short notice due to piggybacking on an anycast prefix allocated primarily for a different service.

Pitfalls

Obviously the setup sketched above is more complicated than simply deploying a bunch of standalone web servers. Complicated setups tend to be a source of unavailability, so some amount of work will have to be put into such a scheme to make it robust enough to be more reliable than the alternative. This means it is more likely something that should be deployed as part of a CDN service rather than something deployed for an individual web service.

If you try to do anycast TCP with anything simpler than the setup described above, you may very well run into the problem of routes changing mid-connection, and as a result users will experience connection resets.

Anycast may do some good for availability, latency, and load balancing. However, it is no silver bullet. Anycast does balance load, and you can scale with load by adding more nodes, but don't expect anywhere near perfectly balanced load across the nodes reached by anycast. In the setup described above, with a distributed load balancing layer, the load balancers themselves may not receive an even load, but they can distribute load evenly across backends.

Don't rely on a single anycast IP for availability. If one of your nodes goes down, routing may not pick that up automatically. It will not affect all clients, but a subset of clients may have their packets routed to the node which is down; for those clients, your anycast IP address is down. If you want redundancy, you need multiple anycast IP addresses.
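
On the client side, redundancy through multiple addresses only helps if clients actually try more than one. A simplified sketch of that behaviour (the hostname is a placeholder; modern browsers do something similar automatically):

    import socket

    def connect_with_fallback(host, port, timeout=3.0):
        """Try each published address (each potentially a separate anycast prefix) in turn."""
        last_error = None
        for family, socktype, proto, _, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)    # first reachable address wins
                return sock
            except OSError as exc:
                last_error = exc
                sock.close()
        raise last_error

    # sock = connect_with_fallback("www.example.com", 443)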

Latency can be good as long as routes don't change in the middle of a connection. But as soon as the TCP handshake has completed, you are committed to using a specific backend for the duration of the TCP connection. Packets have to go from client to load balancer to backend and back to the client. This triangular routing can increase latency. There is a latency reduction from anycast being able to pick a nearby entry point, but having three legs on the round trip rather than just two can increase latency. Only collecting lots of real-world measurements will tell you which of the two factors weighs more.
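
A toy calculation (all numbers invented purely for illustration) of the trade-off described in the previous paragraph:

    # One-way delays in milliseconds; every value here is made up.
    client_to_far_server = 40   # if the whole service sat in one distant location
    client_to_near_lb    = 5    # anycast delivers the client to a nearby POP
    lb_to_chosen_backend = 30   # backend stays fixed for the life of the connection
    backend_to_client    = 35   # reply returns directly to the client

    single_site_rtt = 2 * client_to_far_server
    triangular_rtt  = client_to_near_lb + lb_to_chosen_backend + backend_to_client

    print("single-site RTT:", single_site_rtt, "ms")        # 80 ms
    print("anycast triangular RTT:", triangular_rtt, "ms")  # 70 ms
    # Whether the shorter first leg outweighs the extra leg depends on real measurements.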


This article, based on real-world experience at LinkedIn, might also help: https://engineering.linkedin.com/network-performance/tcp-over-ip-anycast-pipe-dream-or-reality

LinkedIn used Real User Monitoring (RUM) to assess whether global anycast would perform better than regional anycast. In the end they settled on, and in fact implemented, regional anycast, where a different anycast address is used for each region. They use a mix of DNS-based load balancing and regional anycast.

The solution mentioned above is a good one, as it provides some separation between locations and the identities of the servers, but it is based on tunneling. I believe a much better approach would be to achieve the same separation without tunneling, although implementations of that are quite limited at this time. It is an area of active research, though; for example, traffic engineering through ILNP (Identifier-Locator Network Protocol) provides answers to these entangled issues. Cheers.


You'll need to colo physical webserver hardware with a network provider that can do the anycast for you.

If you go this route, you'll probably also want to set up a tunnel to the management (DRAC etc.) cards on the machines so you don't need to visit them on-site.

We do this for our website.