What's the piece of hardware listening on Facebook's or Wikipedia's IP address?

Solution 1:

It isn't necessarily a single piece of hardware doing this, but a complete system that has been designed to scale. That covers not only the hardware but, more importantly, the application design, database design (relational or otherwise), networking, storage and how they all fit together.

A good starting point for finding out how some of the large sites scale is High Scalability - Start Here, along with the High Scalability articles on the Wikimedia, Facebook and Twitter architectures as examples.

Regarding your question about DNS, single IP addresses and round-robin: these types of sites will often use load balancing as a method of presenting a single IP address. This can be done either by specialised hardware load balancers or by software running on general-purpose servers. The incoming requests to the IP address managed by the load balancer are then distributed across a series of servers, transparently to the end user.
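As a rough illustration of the distribution step, here is a minimal sketch of round-robin selection, which is only one of several strategies a load balancer might use. The class name `RoundRobinBalancer` and the backend addresses are hypothetical, and a real balancer would also handle health checks, connection tracking and failover.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backend addresses in rotation behind one public address."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        self._rotation = cycle(self._backends)

    def pick(self):
        """Return the backend that should receive the next request."""
        return next(self._rotation)


if __name__ == "__main__":
    # Hypothetical private addresses, used only for illustration.
    balancer = RoundRobinBalancer(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
    for request_id in range(5):
        print(request_id, "->", balancer.pick())
```

The client only ever sees the balancer's address; which backend actually answers is decided per request by `pick()`.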

For a good explanation of this topic, including a comparison of hardware and software load balancers/proxies and how they differ from DNS round robin, have a read of Load Balancing Web Applications.

Solution 2:

Anycast can also be used for TCP connections, assuming the connections are short-lived so the routes do not change during the connection lifetime. This is a good assumption with HTTP connections (especially if Connection: Keep-Alive is kept to a short timeout or disabled).

Many CDNs (CacheFly, MaxCDN, and probably many others) actually use anycast for TCP connections (HTTP), and not just DNS. When you resolve a hostname on CacheFly, you get the same IP address anywhere in the world; your traffic is simply routed to the "closest" CacheFly cluster. "Closest" here is in terms of BGP path length and metrics, which is usually a better proxy for network latency than simple geographic distance.
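To make "closest" a little more concrete, here is a deliberately simplified sketch that picks the site with the shortest AS path. It is an illustration only: real BGP best-path selection weighs other attributes (local preference, for example) before path length, and the function name, site names and AS numbers (taken from the private-use range) are all hypothetical.

```python
def closest_site(routes):
    """Pick the site whose advertised AS path is shortest.

    `routes` maps a site name to the AS path (a list of AS numbers) over
    which that site's announcement of the shared prefix was learned.
    Ties are broken by site name so the result is deterministic.
    """
    if not routes:
        raise ValueError("no routes to the anycast prefix")
    return min(routes, key=lambda site: (len(routes[site]), site))


if __name__ == "__main__":
    # Hypothetical view from one router: the same prefix, announced by two sites.
    routes = {
        "sydney": [64512],
        "frankfurt": [64513, 64514, 64515],
    }
    print(closest_site(routes))
```

A router elsewhere in the world would hold different paths for the same prefix, so the same IP address can lead to a different cluster.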

In the case of Wikipedia specifically: http://www.datacenterknowledge.com/archives/2008/06/24/a-look-inside-wikipedias-infrastructure/

Solution 3:

The easiest way to verify whether an IP address is using anycast is to run a traceroute from several different locations. You can try the following: go to traceroute.org, pick a location and run a traceroute to the IP address 8.8.8.8 (Google Public DNS, which uses anycast). You should see that a traceroute from a server in Australia to 8.8.8.8 stays in Australia.

Instead of ping, try a hostname lookup, e.g. http://network-tools.com/default.asp?prog=dnsrec&host=profile.ak.fbcdn.net

You will see the list of IP addresses behind that name. These IP addresses are handed out in a round-robin fashion, so pinging the name repeatedly may reach different servers.
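The same lookup can be done locally. The following is a minimal sketch using Python's standard `socket.getaddrinfo`; the helper names `unique_addresses` and `resolve_all` are made up for this example, and the result depends on your resolver and on whether the hostname still resolves.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses from getaddrinfo-style tuples, in order."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve_all(hostname, port=80):
    """Ask the system resolver for every address behind `hostname`."""
    return unique_addresses(
        socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    )


if __name__ == "__main__":
    try:
        for ip in resolve_all("profile.ak.fbcdn.net"):
            print(ip)
    except socket.gaierror as exc:
        # The name may no longer exist, or there may be no network access.
        print("lookup failed:", exc)
```

Running it several times, or from different networks, may show the addresses in a different order or a different set altogether.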