Options to efficiently synchronize 1 million files with remote servers?
Solution 1:
Since instant updates are also acceptable, you could use lsyncd.
It watches directories (inotify) and will rsync changes to slaves.
At startup it will do a full rsync, so that will take some time, but after that only changes are transmitted.
Recursive watching of directories is possible. If a slave server is down, the sync will be retried until it comes back.
If this is all in a single directory (or a static list of directories) you could also use incron.
The drawback there is that it does not allow recursive watching of folders and you need to implement the sync functionality yourself.
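To give an idea of what "implement the sync functionality yourself" could look like with incron, here is a minimal sketch of a handler that incron might call for each event. It assumes an incrontab entry passes the watched directory (`$@`) and the event's file name (`$#`) as arguments; the slave targets, the script path, and the function names are made up for illustration.

```python
import os
import subprocess

# Hypothetical incrontab entry that would invoke a script built on this sketch:
#   /srv/data IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/push_file.py $@ $#

# Hypothetical slave targets; replace with real host:directory pairs.
SLAVES = ["slave1:/srv/data/", "slave2:/srv/data/"]


def commands_for_event(watched_dir, filename, slaves):
    """Build one rsync command per slave for the file named in the event."""
    source = os.path.join(watched_dir, filename)
    return [["rsync", "-a", source, slave] for slave in slaves]


def handle_event(watched_dir, filename, slaves, runner=subprocess.run):
    """Push the changed file to every slave; return the slaves that failed."""
    failed = []
    for slave, cmd in zip(slaves, commands_for_event(watched_dir, filename, slaves)):
        if runner(cmd).returncode != 0:
            failed.append(slave)
    return failed
```

A real script would call `handle_event(sys.argv[1], sys.argv[2], SLAVES)` and would still need its own retry logic for slaves that are down, which lsyncd gives you for free.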
Solution 2:
Consider using a distributed filesystem, such as GlusterFS. Being designed with replication and parallelism in mind, GlusterFS may scale up to 10 servers much more smoothly than ad-hoc solutions involving inotify and rsync.
For this particular use-case, one could build a 10-server GlusterFS volume of 10 replicas (i.e. 1 replica/brick per server), so that each replica would be an exact mirror of every other replica in the volume. GlusterFS would automatically propagate filesystem updates to all replicas.
Clients in each location would contact their local server, so read access to files would be fast. The key question is whether write latency could be kept acceptably low. The only way to answer that is to try it.
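As a rough illustration of the "one replica/brick per server" layout, the sketch below only assembles the `gluster volume create` command for such a volume. The volume name, host names, and brick path are assumptions, and the exact options should be checked against the documentation for your GlusterFS version before running anything.

```python
import shlex


def gluster_create_command(volume, servers, brick_path):
    """Build a 'gluster volume create' command with one brick per server.

    The replica count equals the number of servers, so every brick
    mirrors every other brick in the volume.
    """
    if len(servers) < 2:
        raise ValueError("a replicated volume needs at least two servers")
    bricks = [f"{host}:{brick_path}" for host in servers]
    return ["gluster", "volume", "create", volume,
            "replica", str(len(bricks)), *bricks]


if __name__ == "__main__":
    # Hypothetical host names and brick path, for illustration only.
    hosts = [f"server{i}" for i in range(1, 11)]
    print(shlex.join(gluster_create_command("mirror", hosts, "/bricks/mirror")))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without GlusterFS installed.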
Solution 3:
I doubt rsync would work for this in the normal way, because scanning a million files and comparing them with the remote system 10 times would take too long. I would try to implement a system with something like inotify that keeps a list of modified files and pushes them to the remote servers (if these changes don't get logged in another way anyway). You can then use this list to quickly identify the files that need to be transferred, maybe even with rsync (or better, 10 parallel instances of it).
Edit: With a little bit of work, you could even use this inotify/log watch approach to copy the files over as soon as the modification happens.
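The "list of modified files plus parallel rsync" idea could be sketched roughly as follows. It assumes the change list has already been collected somehow (inotify, an application log, etc.) as paths relative to a common source root; the function names and the target strings are hypothetical. The sketch relies on rsync's `--files-from` option so that each rsync run only looks at the listed files instead of scanning the whole tree.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def write_change_list(changed_paths, list_file):
    """Write de-duplicated, sorted relative paths, one per line."""
    unique = sorted(set(changed_paths))
    Path(list_file).write_text("".join(p + "\n" for p in unique))
    return unique


def build_rsync_command(list_file, source_root, target):
    """Return an rsync command that transfers only the listed paths."""
    return [
        "rsync",
        "-a",                         # archive mode: keep permissions, times, links
        f"--files-from={list_file}",  # paths are read relative to source_root
        f"{source_root.rstrip('/')}/",
        target,
    ]


def push_changes(changed_paths, source_root, targets, list_file,
                 runner=subprocess.run):
    """Run one rsync per target concurrently; return {target: exit code}."""
    if not targets or not write_change_list(changed_paths, list_file):
        return {}
    commands = {t: build_rsync_command(list_file, source_root, t) for t in targets}
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = {t: pool.submit(runner, cmd) for t, cmd in commands.items()}
        return {t: f.result().returncode for t, f in futures.items()}
```

A call such as `push_changes(changed, "/srv/data", ["server1:/srv/data/", "server2:/srv/data/"], "/tmp/changed.txt")` would start one rsync per server. Deleted files are not handled by this sketch, and a target with a non-zero exit code would need to be retried with the same list.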