Where to go to learn about web architecture? YouTube example? [closed]

I'm trying to build a web application that is similar to YouTube (it's not a knockoff), but I don't really understand how video is served on the internet.

I know how to build regular database-driven web applications, but nothing at the scale of YouTube. Every application I have built so far has run on one server, with the files stored on the same box as the web server.

How does one decouple the application server from the file storage from the media server? I would more or less want four kinds of machines (or clusters of machines):

  1. Application servers -- Present the web page, handle user uploads, link the user's Flash player to the correct media server, etc.
  2. Database shards -- Store user information, check favorites, etc.
  3. File storage -- Store the media files
  4. Media servers -- Serve the media files

How do I hook all of this together? Which technologies should I leverage? Where do I go to learn more about architecting this?

How does YouTube's embeddable Flash player work? I want to embed my Flash player on other websites and have it tie into my architecture.

Note: I have looked into: http://highscalability.com/youtube-architecture

But I still don't get the overall picture of how this stuff ties together. Can someone explain in high-level terms how all of this works?

Are there dedicated client servers running internally to shuffle all of this around between the application servers, file storage, etc.? Is it all done via HTTP using JSON? What is going on here?

Thanks!


You can check out http://highscalability.com/ for some really cool stuff on this topic.


How does one decouple the application server from the file storage from the media server?

You can use a cluster file system like OCFS2: http://oss.oracle.com/projects/ocfs2/

With SAN storage, each application server attaches the same volume over iSCSI and mounts it as a shared OCFS2 file system, so every server sees the same files.
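To make the decoupling concrete, here is a minimal sketch of what the application side could look like once the shared file system is mounted. The mount point `/mnt/shared/media`, the host `media.example.com`, and the function names are all assumptions made up for illustration, not part of OCFS2 or any particular framework. The idea is that the application server writes the upload to the shared mount and stores only a relative key in the database, while the media servers expose the same mount over HTTP.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical values: the mount point of the shared file system and the
# public base URL of the media servers are assumptions for this sketch.
SHARED_ROOT = Path("/mnt/shared/media")
MEDIA_BASE_URL = "http://media.example.com"


def storage_key(video_id, extension="flv"):
    """Build a relative key, fanned out over subdirectories by hash prefix."""
    digest = hashlib.sha1(video_id.encode("utf-8")).hexdigest()
    return f"{digest[:2]}/{digest[2:4]}/{video_id}.{extension}"


def store_upload(uploaded, video_id, root=SHARED_ROOT):
    """Application server side: copy an uploaded file onto the shared mount.

    Returns the relative key, which is what would be saved in the database
    instead of an absolute path on one particular machine.
    """
    key = storage_key(video_id, uploaded.suffix.lstrip(".") or "flv")
    destination = root / key
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(uploaded, destination)
    return key


def media_url(key, base_url=MEDIA_BASE_URL):
    """URL the application server hands to the player for a stored key."""
    return f"{base_url}/{key}"
```

Because the database only holds the relative key, the media servers can be moved or multiplied without touching stored rows; only the base URL (or a lookup that picks one) changes.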

A more exotic solution is MogileFS, used by LiveJournal: http://www.danga.com/mogilefs/

Cheers


Well, how it all fits together is really up to you. There are many ways to architect a system like the one you're describing; which one fits depends on the particular application, what you're comfortable with, and how much it really needs to scale.

The best you can really do is look at the software used by highly scalable websites, read tutorials and documentation, and study real implementations if you are able to. Take a look at messaging software like ZeroMQ, or at software based on AMQP. Look at scalable data stores that don't require manual sharding and normalization of databases. Reading about these will give you some idea of how everything fits together and a proper bird's-eye view of a whole system.
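As a rough illustration of where messaging fits, the sketch below uses Python's standard-library `queue` as an in-process stand-in for a broker. It is not ZeroMQ or AMQP code; the job format, the "transcode" step, and the function names are hypothetical. The pattern is the part that carries over: the application server publishes a small JSON message and returns immediately, and a separate worker picks the job up later.

```python
import json
import queue
import threading

# In-process stand-in for a message queue. In a real deployment this would
# be a broker or messaging library connecting separate machines.
jobs = queue.Queue()
processed = []


def enqueue_transcode(video_id, source_key):
    """Application server: publish a job as JSON and return immediately."""
    jobs.put(json.dumps({"video_id": video_id, "source_key": source_key}))


def worker():
    """Worker machine: consume jobs until a None sentinel arrives."""
    while True:
        message = jobs.get()
        if message is None:
            jobs.task_done()
            break
        job = json.loads(message)
        # Placeholder for the real work (e.g. converting the uploaded file).
        job["status"] = "transcoded"
        processed.append(job)
        jobs.task_done()


def run_demo():
    """Start one worker, publish two jobs, then shut the worker down."""
    thread = threading.Thread(target=worker)
    thread.start()
    enqueue_transcode("abc123", "ab/cd/abc123.flv")
    enqueue_transcode("def456", "de/f4/def456.flv")
    jobs.put(None)  # sentinel: tells the worker to stop
    thread.join()
    return processed
```

The application servers and the workers only agree on the message format, so either side can be scaled or replaced independently.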