Deploying Files To Multiple Servers

Solution 1:

Both Twitter and Facebook have started using BitTorrent in their clusters to distribute new code revisions. By doing this, they can push code to tens of thousands of servers in a very short amount of time compared to old-school centralized deployment methods.

It doesn't sound like you're at that scale yet, but there's no harm in designing your deployment system such that it will not prove to be a bottleneck anytime soon.

Solution 2:

I don't recommend git at the scale you're talking about. It can work, but I personally see some drawbacks to using that model for fetching.

There are a couple of things that determine how best to go about this:

  1. How big of a repo needs to be shared out.
  2. How fast it needs to converge.

For perfect convergence and maximum speed, you'll have to go with a network filesystem such as NFSv4. The clustered filesystems I know about don't scale to 'multiple hundreds' of nodes, so it has to be a network filesystem. This presents its own challenges, but it means you'll reach convergence the moment the files are updated on the NFS head.
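To illustrate, each web server would simply mount the export and serve straight out of it. The hostname and paths below are assumptions for the sake of the example, not part of the original setup:

    # /etc/fstab on each web server (illustrative values only)
    nfs-head.example.internal:/exports/app  /srv/app  nfs4  ro,noatime  0  0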

For rapid convergence, you can use some rsync trickery. If the rsync daemon ends up being CPU-bound, you can put two or three rsync servers behind a load balancer like HAProxy. Couple that with cron jobs to pull data (or some other method of triggering code updates) and you can hit convergence pretty quickly.
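As a rough sketch of the pull side, here is a minimal script each server could run from cron. The load-balanced endpoint (deploy.example.internal), the rsync module name (app), and the target path (/srv/app) are all hypothetical; adjust them to your own layout:

    #!/usr/bin/env python3
    """Cron-driven pull: sync the latest code from the rsync endpoint.

    Hostname, module name, and target path are illustrative assumptions."""
    import subprocess
    import sys

    # VIP in front of the rsync daemons, and the local code directory (both hypothetical).
    RSYNC_SOURCE = "rsync://deploy.example.internal/app/"
    TARGET_DIR = "/srv/app/"

    def pull():
        # -a preserves ownership/permissions/times, --delete removes files dropped
        # from the repo, --timeout keeps a hung transfer from stacking up under cron.
        return subprocess.run(
            ["rsync", "-a", "--delete", "--timeout=300", RSYNC_SOURCE, TARGET_DIR]
        ).returncode

    if __name__ == "__main__":
        sys.exit(pull())

Dropped into cron every minute or two on each node, convergence is bounded by the cron interval plus the transfer time.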

For both of the above, it'll probably be a good idea to put the central repository on 10GbE links for maximum throughput.

An alternative is a push-based rsync, run from the central repo to push updates out to your servers. It won't converge as fast as either of the above, but it will be friendlier to your internal bandwidth. Use multiple pushing hosts, each responsible for a divided range of servers, for better speed.
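A minimal sketch of what that fan-out could look like, assuming a hypothetical host-list file per pushing host and illustrative paths (/srv/repo/app on the repo, /srv/app on the targets):

    #!/usr/bin/env python3
    """Push-rsync fan-out: run on the central repo (or on each of several
    pushing hosts), each instance given its own slice of the server list.

    Paths, host-list format, and parallelism level are illustrative assumptions."""
    import subprocess
    import sys
    from concurrent.futures import ThreadPoolExecutor

    SOURCE_DIR = "/srv/repo/app/"   # central copy of the code (hypothetical path)
    TARGET_DIR = "/srv/app/"        # destination on every web server (hypothetical path)

    def push(host):
        # One rsync-over-ssh per target host.
        cmd = ["rsync", "-a", "--delete", SOURCE_DIR, f"{host}:{TARGET_DIR}"]
        return host, subprocess.run(cmd).returncode

    def main(host_file, workers=8):
        with open(host_file) as f:
            hosts = [line.strip() for line in f if line.strip()]
        # Push to a handful of hosts at a time so this box doesn't saturate its uplink.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for host, rc in pool.map(push, hosts):
                if rc != 0:
                    print(f"push to {host} failed (rsync exit {rc})", file=sys.stderr)

    if __name__ == "__main__":
        main(sys.argv[1])   # e.g. ./push-range.py hosts-range-1.txt

Split the full server list into one file per pushing host; that's the 'divided ranges' part.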

Solution 3:

rdist may work for you.