Is rsync a good candidate for failover implementation (very large dataset)?

Is rsync smart/efficient at detecting which files to copy/delete?

Rsync is extremely efficient at detecting and updating files. Depending on how your files change, you might find that a smaller number of large files is far easier to sync than lots of small files. Depending on what options you choose, on each run it is going to stat() every file on both sides, and then transfer the changes if the files are different. If only a small number of your files are changing, then this step of looking for changed files can be quite expensive. A lot of factors come into play in how long rsync takes. If you are serious about trying this, you should do a lot of testing on real data to see how things work.
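
For that testing, rsync's --dry-run and --stats options are handy: you can see exactly what it would transfer, and how long the scan for changed files takes, without touching anything. A minimal sketch, assuming a hypothetical /srv/data directory and a slave host named slave1:

    # Report what would be transferred, without changing anything
    rsync -a --delete --dry-run --stats /srv/data/ slave1:/srv/data/

    # The real master -> slave sync (trailing slashes matter: copy contents, not the directory itself)
    rsync -a --delete /srv/data/ slave1:/srv/data/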

If the master crashes and a slave takes over for an hour (for example), is making the master up-to-date again as simple as running rsync the other way round (slave to master)?

Should be.
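
The reverse run is just the same command with source and destination swapped (same hypothetical paths and hostname as above). Doing a --dry-run first is a sensible sanity check before letting --delete loose on the master:

    # After failback: bring the master up to date from the slave
    rsync -a --delete --dry-run slave1:/srv/data/ /srv/data/   # review first
    rsync -a --delete slave1:/srv/data/ /srv/data/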

Is there any possibility of implementing multi-master systems with rsync?

Unison, which uses the rsync libraries, allows bi-directional sync. It should permit updates on either side. With the correct options it can identify conflicts and save backups of any files that were changed on both ends.
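
As a rough sketch of what that looks like (paths and hostname are hypothetical, and exact options depend on your Unison version): -batch runs unattended and leaves genuine conflicts untouched, and -backup keeps a copy of anything it overwrites.

    # Two-way sync between a local tree and the same tree on another host
    unison /srv/data ssh://slave1//srv/data -auto -batch -backup "Name *"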

Without knowing more about the specifics I can't tell you with any confidence this is the way to go. You may need to look at DRBD, or some other clustered device/filesystem approach which will sync things at a lower level.


Should I split my large files?
rsync is smart, but very large files can be dramatically less efficient to synchronize. Here's why:

If only a part of a file changes, then rsync is smart enough to only send that part. But to figure out which part to send, it has to divide the file into logical chunks of X bytes, build checksums for each chunk (on both sides), compare the chunks, send the differences, and then re-construct the file on the receiving end.
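
A few rsync options interact directly with that chunking step; whether they help depends entirely on your data, so treat this as a sketch to experiment with rather than a recommendation (paths are hypothetical):

    # Use larger delta blocks for very large files (the size here is just an example to tune)
    rsync -a --block-size=65536 /srv/bigfiles/ slave1:/srv/bigfiles/

    # Update big files in place instead of rebuilding a temporary copy alongside them
    rsync -a --inplace /srv/bigfiles/ slave1:/srv/bigfiles/

    # Skip the delta algorithm entirely and resend whole files
    # (often faster on a fast local network, where CPU is the bottleneck)
    rsync -a --whole-file /srv/bigfiles/ slave1:/srv/bigfiles/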

On the other hand, if you have a bunch of small files which don't change, then the dates and sizes will match and rsync will skip the checksum step and just assume that the file hasn't changed. If we're talking about many GB of data, you're skipping a LOT of IO, and saving a LOT of time. So even though there's extra overhead involved with comparing more files, it still comes out to less than the amount of time required to actually read the files and compare the checksums.
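
You can watch that quick check in action with --itemize-changes, or override it with --checksum, which forces a full read of every file on both sides (illustrative paths again):

    # Default quick check: compare size and modification time only, list what differs
    rsync -a -n -i /srv/data/ slave1:/srv/data/

    # Force a full checksum comparison of every file -- much slower on a large dataset
    rsync -a -n -c /srv/data/ slave1:/srv/data/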

So, while you want as few files as possible, you also want enough files that you won't waste a lot of IO working on unchanged data. I'd recommend splitting the data along the logical boundaries your application uses.

is making the master up-to-date again as simple as running rsync the other way round
From a filesystem perspective, yes. But your application might have other requirements that complicate things. And, of course, you'll be reverting to the most recent checkpoint at which you rsync'ed to your slave.

Is there any possibility of implementing multi-master systems with rsync?
Technically yes, but down that path lies madness. Assuming everything works great, everything will be fine. But when there are hiccups, you can start to run into problems with changes (and especially deletes) getting synced in the wrong direction: overwriting your good files with bad ones, deleting newly added files, or having the ghosts of deleted files reappear. Most people recommend against it, but you can try it if you like.
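
If you do try it, one partial safety net is to skip --delete and keep a copy of anything rsync overwrites, so a sync run in the wrong direction is at least recoverable. A hedged sketch, with a hypothetical backup directory:

    # Don't propagate deletes, and stash overwritten files on the receiver instead of losing them
    rsync -a --backup --backup-dir=/srv/rsync-backups/$(date +%F) /srv/data/ slave1:/srv/data/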

advice, tips, experience
If you're looking for a master/master setup with on-the-fly syncing, I'd recommend DRBD. It's significantly more complicated to set up and maintain, but a lot more capable. It does block-level synchronization of the disk itself, rather than the files on it. To do this "on-line", you need a filesystem that can tolerate that type of synchronization, like GFS.
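
Very roughly, the moving parts look like this (the resource name, device, and cluster name are all assumptions; a real setup needs a properly written /etc/drbd.conf and a cluster manager to provide GFS2 locking):

    # On both nodes: initialise metadata and bring up the DRBD resource
    drbdadm create-md r0
    drbdadm up r0

    # On one node only: force it primary for the initial full sync
    drbdadm primary --force r0

    # Create a cluster-aware filesystem on the replicated device
    mkfs.gfs2 -p lock_dlm -t mycluster:data -j 2 /dev/drbd0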

Rsync is more like a snapshot system than a continuous synchronization system.
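
If a snapshot-style scheme is actually what you want, rsync's --link-dest can make cheap point-in-time copies by hard-linking unchanged files against the previous run. A sketch with hypothetical paths:

    # Each run creates a new dated snapshot; unchanged files are hard links, so they cost almost nothing
    snap=/backups/data/$(date +%F-%H%M)
    rsync -a --delete --link-dest=/backups/data/latest /srv/data/ "$snap"
    ln -sfn "$snap" /backups/data/latest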