Technologies that allow YouTube-scale distribution?

YouTube, as we know, is massive. It has thousands of concurrent users, each streaming at least 2 megabytes per video. Obviously, that adds up to a lot of traffic... far too much for any one server.

What networking technologies allow pushing 4 billion videos a day?


Scaling on the backend

In a very simple setup, one DNS entry points to one IP address, which belongs to one server. Everybody the world over goes to that single machine. With enough traffic, that becomes too much to handle long before you get to YouTube's size. So, in a simple scenario, we add a load balancer. Its job is to distribute traffic across various back-end servers while appearing to clients as one server.
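A minimal sketch of the simplest balancing policy, round-robin, assuming a hypothetical pool of three private back-end addresses sitting behind one public IP. Real load balancers offer many other policies and proper health checks; this only shows the "many servers, one face" idea.

```python
import itertools


class RoundRobinBalancer:
    """Hand each request to the next back-end server in turn,
    skipping any server currently marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        self.healthy.add(backend)

    def pick(self):
        # Walk at most one full lap around the pool before giving up.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy back-end servers")


# Hypothetical back-end addresses hidden behind the balancer's public IP.
lb = RoundRobinBalancer(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
lb.mark_down("10.0.0.2")
print([lb.pick() for _ in range(4)])
# ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
```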

With as much data as YouTube has, it would be unreasonable to expect every server to be able to serve every video, so we add another layer of indirection: sharding. In a contrived example, one server is responsible for everything that starts with "A", another owns "B", and so on.
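Here is the contrived first-letter scheme as a sketch, with hypothetical server names, next to a hash-based variant. The hashing version is a common alternative, not something the text says YouTube uses.

```python
import hashlib
import string

# Hypothetical server names: one per leading letter, as in the contrived example.
LETTER_SHARDS = {c: f"videos-{c.lower()}.internal" for c in string.ascii_uppercase}


def shard_by_letter(video_id):
    """Contrived scheme: the first letter of the ID picks the server."""
    first = video_id[:1].upper()
    if first not in LETTER_SHARDS:
        raise ValueError(f"no shard owns IDs starting with {first!r}")
    return LETTER_SHARDS[first]


def shard_by_hash(video_id, servers):
    """Alternative: hash the ID so keys spread evenly over the servers."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


print(shard_by_letter("banana-video"))  # videos-b.internal
```

The letter scheme is easy to reason about but uneven, since far more IDs may start with some letters than others (and IDs starting with digits have no owner at all here). Hashing spreads keys roughly evenly at the cost of that readability.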

Moving the edge closer

Eventually, though, the bandwidth requirements become enormous and you're moving a LOT of data through one room. So, now that we're super popular, we move it out of that room. The two technologies that matter here are Content Distribution Networks and anycasting.

Where I've got these big static files being requested all over the world, I stop pointing links directly at my hosting servers. Instead, I link to my CDN server. When somebody asks to view a video, they ask the CDN server for it. The CDN is responsible for either already having the video, fetching a copy from the hosting server, or redirecting the client elsewhere. Which of these happens depends on the architecture of the network.
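The three possible behaviours of a CDN node can be sketched as a toy edge server. Everything here is an assumption for illustration: the origin is a plain dict, the cache never evicts anything, and the peer hostname is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EdgeServer:
    """Toy CDN edge node: serve a cached copy, fetch from origin, or redirect."""
    name: str
    origin: dict                 # hypothetical origin store: video_id -> bytes
    capacity: int                # how many videos this edge will keep locally
    peer: Optional[str] = None   # hypothetical edge to redirect clients to
    cache: dict = field(default_factory=dict)

    def handle(self, video_id):
        # 1. Already have it: serve straight from the edge.
        if video_id in self.cache:
            return ("hit", self.cache[video_id])
        # 2. Room to spare: pull a copy from the hosting (origin) server.
        if video_id in self.origin and len(self.cache) < self.capacity:
            data = self.origin[video_id]
            self.cache[video_id] = data
            return ("fetched", data)
        # 3. Otherwise: send the client somewhere else.
        if self.peer is not None:
            return ("redirect", self.peer)
        raise LookupError(f"{self.name} cannot serve {video_id!r}")


edge = EdgeServer("edge-1", {"abc123": b"<video bytes>"}, capacity=1,
                  peer="edge-2.example.net")
print(edge.handle("abc123"))  # ('fetched', b'<video bytes>')
print(edge.handle("abc123"))  # ('hit', b'<video bytes>')
print(edge.handle("zzz999"))  # ('redirect', 'edge-2.example.net')
```

A real edge would also evict old entries (for example least-recently-used) instead of refusing once full.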

How is that CDN helpful? Well, one IP may actually belong to many servers in many places all over the world. When your request leaves your computer and reaches your ISP, their router picks the best path (shortest, quickest, least cost... whatever metric) to that IP. For a CDN, that path will often end on or next to your closest Tier 1 network.
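As a rough illustration of that path choice, here the same prefix (192.0.2.0/24, a documentation range) is announced from three hypothetical sites, and the ISP's router keeps the "best" one. The metric (shortest AS path, then a made-up local cost) is an assumption; real BGP routers compare several attributes, such as local preference, before AS-path length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str      # the anycast prefix being announced
    next_hop: str    # hypothetical label for the site the packet is handed to
    as_path: tuple   # Autonomous Systems the announcement travelled through
    cost: int        # hypothetical local cost, used only to break ties


def best_route(routes):
    """Pick the shortest AS path, then the lowest local cost.
    A heavily simplified stand-in for real BGP best-path selection."""
    return min(routes, key=lambda r: (len(r.as_path), r.cost))


# One prefix, three origins (AS numbers from the documentation range).
announcements = [
    Route("192.0.2.0/24", "site-miami", (64497, 64496), 10),
    Route("192.0.2.0/24", "site-ashburn", (64496,), 20),
    Route("192.0.2.0/24", "site-san-jose", (64498, 64499, 64496), 5),
]
print(best_route(announcements).next_hop)  # site-ashburn
```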

So, I requested a video from YouTube. The machines it was stored on include at least iad09s12.v12.lscache8.c.youtube.com and tc.v19.cache5.c.youtube.com. Those hostnames show up in the source of the page I'm looking at and were provided by some form of indexing server. Now, from Maine I found that tc19 server to be in Miami, Florida. From Washington, I found the same tc19 server to be in San Jose, California.


Several techniques are used for large sites.

www.youtube.com -> any number of IP addresses

Let's look in DNS:

www.youtube.com is an alias for youtube-ui.l.google.com.
youtube-ui.l.google.com has address 74.125.226.14
youtube-ui.l.google.com has address 74.125.226.0
youtube-ui.l.google.com has address 74.125.226.1
youtube-ui.l.google.com has address 74.125.226.2
youtube-ui.l.google.com has address 74.125.226.3
youtube-ui.l.google.com has address 74.125.226.4
youtube-ui.l.google.com has address 74.125.226.5
youtube-ui.l.google.com has address 74.125.226.6
youtube-ui.l.google.com has address 74.125.226.7
youtube-ui.l.google.com has address 74.125.226.8
youtube-ui.l.google.com has address 74.125.226.9
youtube-ui.l.google.com has IPv6 address 2001:4860:800f::88

So www.youtube.com could actually go to several IP addresses.
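One common way a name with many A records spreads clients out is round-robin DNS: the server returns the whole set but rotates the order on each query. This is a sketch of that general technique using three of the addresses above, not a description of how Google's name servers actually choose their answers.

```python
from collections import deque


class RoundRobinDNS:
    """Return the whole record set on every query, rotated by one position,
    so clients that simply use the first address spread across all of them."""

    def __init__(self, addresses):
        self._records = deque(addresses)

    def answer(self):
        reply = list(self._records)
        self._records.rotate(-1)  # the next query will lead with the next address
        return reply


dns = RoundRobinDNS(["74.125.226.0", "74.125.226.1", "74.125.226.2"])
for _ in range(3):
    print(dns.answer()[0])  # 74.125.226.0, then .1, then .2
```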

anycasted IP addresses

A single IP address can be announced from any number of points on the internet simultaneously, whether by several Autonomous Systems (networks on the internet) or by one network at many locations. For instance, many of the root DNS servers as well as Google's 8.8.8.8 DNS server are anycasted at many points around the globe. The idea is that if you're in the US, you are routed to a US site, and if you're in the UK, you are routed to a UK site.
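To show how one address can lead to different places, here are two hypothetical forwarding tables, one per ISP, each holding only the announcement it heard from its nearest site. The 8.8.8.0/24 prefix matches the 8.8.8.8 example in the text; the tables and site labels are invented. Lookup uses longest-prefix match, the standard rule for IP forwarding.

```python
import ipaddress


def make_table(*routes):
    """Build a forwarding table: network -> next hop."""
    return {ipaddress.ip_network(prefix): hop for prefix, hop in routes}


def lookup(table, destination):
    """Longest-prefix match: the most specific route containing the address wins."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in table.items() if addr in net]
    if not matches:
        raise LookupError(destination)
    return max(matches, key=lambda m: m[0].prefixlen)[1]


# Hypothetical tables: each ISP learned the anycast prefix from a different site.
us_isp = make_table(("0.0.0.0/0", "us-transit"), ("8.8.8.0/24", "anycast-site-us"))
uk_isp = make_table(("0.0.0.0/0", "uk-transit"), ("8.8.8.0/24", "anycast-site-uk"))

print(lookup(us_isp, "8.8.8.8"))  # anycast-site-us
print(lookup(uk_isp, "8.8.8.8"))  # anycast-site-uk
```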

media coming from a different server

Just because you're on www.youtube.com doesn't mean that all the content has to come from the same server. Right on this site, static resources are served from sstatic.net instead of serverfault.com.
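A sketch of the idea: when the page is generated, links to static files are rewritten to point at a separate host while page links stay on the main site. The extension list and the path layout are hypothetical; only the two hostnames come from the text.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical set of file types treated as static assets.
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".svg")


def to_static_host(url, static_host):
    """Point static assets at a separate host; leave ordinary pages alone."""
    parts = urlsplit(url)
    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        return urlunsplit((parts.scheme, static_host, parts.path,
                           parts.query, parts.fragment))
    return url


print(to_static_host("https://serverfault.com/img/logo.png", "sstatic.net"))
# https://sstatic.net/img/logo.png
print(to_static_host("https://serverfault.com/questions/1", "sstatic.net"))
# https://serverfault.com/questions/1
```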

For instance, if we watch Kaley Cuoco's Slave Leia PSA, we find that the media is served up by v10.lscache5.c.youtube.com.

multiple internet connections

I assure you, YouTube has more than one internet connection. Setting aside all the other techniques, even if YouTube really were a single site with a single server, it could in theory connect directly to every other network it serves video to. In the real world that's not possible, of course, but consider the idea.
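The thought experiment can be sketched as an egress decision on a multihomed site: use a direct link to the viewer's network when one exists, otherwise fall back to a transit provider. The link names, AS numbers (documentation range) and load figures are all hypothetical, and "least-loaded transit" is just one plausible fallback rule.

```python
def pick_uplink(dest_network, direct_links, transit_links):
    """Prefer a direct connection to the viewer's network; otherwise use the
    transit provider with the lowest current load."""
    if dest_network in direct_links:
        return direct_links[dest_network]
    return min(transit_links, key=transit_links.get)


direct = {"AS64501": "peer-link-1", "AS64502": "peer-link-2"}  # hypothetical peers
transit = {"transit-a": 0.7, "transit-b": 0.4}                 # hypothetical load (0..1)

print(pick_uplink("AS64501", direct, transit))  # peer-link-1
print(pick_uplink("AS64510", direct, transit))  # transit-b
```

The more networks appear in the direct table, the less traffic ever touches transit, which is the point of the "connect to everyone" idea.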

Any or all of these ideas (and more!) can be used to support a Content Delivery Network. Read up on that article if you'd like to know more.