Software to synchronize two directories (local/remote)

I need software that detects changes in a local directory (A) every 5 seconds or so and uploads changed files to, or deletes removed files from, a remote directory (B).

I found non-commercial software, https://github.com/devstructure/doubledown. I am wondering if there are any commercial alternatives?


Solution 1:

rsync is definitely the right tool for this job. It exists to keep directories in sync and can do it with a fair bit of smarts. For example, it transfers only deltas whenever it can, and it can work over ssh tunnels.

Let's say you have a machine, source, that hosts the live version of the directory tree /my/tree, and a machine, sink, that you want to keep in close synchronization with it. If you have an ssh account on sink, you can run rsync from source as follows:

rsync -avz --delete -e ssh /my/tree/ remoteuser@sink:/my/tree

This assumes you want /my/tree in the exact same spot on sink as you have it on source. Of course, you don't need to keep it in the exact same spot.

Breaking down the command line:

  • -avz: archive mode, verbose out, use compression during transfer
  • --delete: delete files on sink that aren't present on source
  • -e ssh: Use ssh as the connection method

This call will, of course, ask you for your password when you make it. If you want to do this in some automated fashion you're going to need to share some keys between the accounts on the machines and use public-private key encryption to make the ssh connection.

To set up your key pair for this rsync, run the following command on your source machine:

> ssh-keygen -t rsa -b 2048 -f ~/.ssh/my-rsync-key 
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): [press enter here] 
Enter same passphrase again: [press enter here] 
Your identification has been saved in ~/.ssh/my-rsync-key. 
Your public key has been saved in ~/.ssh/my-rsync-key.pub. 
The key fingerprint is: 
2e:28:d9:ec:85:21:e7:ff:73:df:2e:07:78:f0:d0:a0 root@source

> chmod 600 ~/.ssh/my-rsync-key

For this keypair to work we need to add the contents of ~/.ssh/my-rsync-key.pub to the ~<remoteuser>/.ssh/authorized_keys file on the sink machine.

First copy the file over to the sink machine:

scp ~/.ssh/my-rsync-key.pub remoteuser@sink:~

Then ssh to the sink machine and import the key by running the following as remoteuser on the machine:

> if [ ! -d ~/.ssh ]; then mkdir ~/.ssh ; chmod 700 ~/.ssh ; fi
cd ~/.ssh/ 
if [ ! -f authorized_keys ]; then touch authorized_keys ; chmod 600 authorized_keys ; fi 
cat ~/my-rsync-key.pub >> authorized_keys
rm ~/my-rsync-key.pub

For additional tips on locking down the ssh connection between your source and sink machines I recommend taking a look at this page.

From your source machine you can test that this setup works by running:

rsync -avz --dry-run -e "ssh -i ~/.ssh/my-rsync-key" /my/tree/ remoteuser@sink:/my/tree

That will do a dry run of an rsync. If you see the rsync command connecting and comparing the files, you know things are set up properly.

Now we need an easy way to call this rsync command from a launchd config file, as shown in this helpful answer on this site. Since you want this call to happen in a tight loop, you'll need to make certain you don't have multiple copies of the rsync running at the same time. You can use flock to create a mutex that ensures a shell script is a singleton: only one instance of it ever runs at one time on a machine. So we're going to create the following script on disk:

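#!/bin/sh
# Sync /my/tree to the sink machine; only one instance may run at a time.
SINK_INSTANCE=remoteuser@sink
KEY=~/.ssh/my-rsync-key
LOG=~/my_rsync.log
LOCK=~/my_rsync.lock
SOURCE=/my/tree

# Hold an exclusive lock on file descriptor 9 for the life of the script.
exec 9>"${LOCK}"
if ! flock -n 9 ; then
    echo "Another instance of your rsync is already running"
    exit 1
fi

echo "----------" >> "${LOG}"
date >> "${LOG}"

# Send stdout to the log first, then point stderr at the same place.
rsync -avz --delete -e "ssh -i ${KEY}" \
    "${SOURCE}/" "${SINK_INSTANCE}:${SOURCE}" >> "${LOG}" 2>&1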

Save that as ~/my_rsync.sh.
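
One caveat about the locking: flock(1) comes from util-linux, and as far as I know it is not part of a stock OS X install. If it is missing on your machine, a common substitute is to use mkdir as the mutex, since creating a directory either succeeds or fails in one atomic step. Below is a minimal sketch of that idea, not a drop-in replacement. The LOCKDIR path and the function names acquire_lock, release_lock and run_locked are my own inventions for illustration.

```shell
#!/bin/sh
# Hypothetical lock location; any path on a local filesystem would do.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/my_rsync.lock.d}"

# Succeeds only for the first caller; later callers get a non-zero status
# because the directory already exists.
acquire_lock() {
    mkdir "$LOCKDIR" 2>/dev/null
}

# Remove the lock directory so the next run can proceed.
release_lock() {
    rmdir "$LOCKDIR" 2>/dev/null
}

# Run the given command under the lock and pass its exit status through.
run_locked() {
    if ! acquire_lock; then
        echo "Another instance is already running" >&2
        return 1
    fi
    "$@"
    status=$?
    release_lock
    return $status
}
```

You would then wrap the rsync call, for example run_locked rsync -avz --delete ... in place of the flock test. Unlike flock, a directory lock is not released automatically if the script is killed, so a stale lock directory has to be removed by hand.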

That script will take care of doing the rsync for you. All you need to do now is set it up via launchd and have it run in a tight loop. Following the directions from here and modifying them to meet our needs, we're going to create ~/Library/LaunchAgents/my-rsync.plist in a text editor and make the contents:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>My Rsync</string>
    <key>Program</key>
    <string>/bin/sh</string>
    <key>ProgramArguments</key>
    <array>
        <string>sh</string>
        <string>-c</string>
        <string>while sleep 5; do /Users/my/my_rsync.sh; done</string>
    </array>
    <key>ServiceDescription</key>
    <string>Keep /my/tree synchronized with the machine sink</string>
    <key>KeepAlive</key>
    <true/>
</dict>
</plist>

That should take care of things.
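
To actually start the agent you still have to hand the plist to launchd. Here is a minimal sketch, assuming the plist path used above. load_agent is a hypothetical helper name; plutil -lint and launchctl load are the standard OS X commands for checking and loading a plist.

```shell
#!/bin/sh
# Validate a launchd agent plist and load it for the current user (OS X only).
load_agent() {
    plist=$1
    if [ ! -f "$plist" ]; then
        echo "No such plist: $plist" >&2
        return 1
    fi
    # Catch XML or plist syntax errors before launchd sees the file.
    plutil -lint "$plist" || return 1
    # Register the agent; with KeepAlive set, launchd starts it right away.
    launchctl load "$plist"
}
```

Call it as load_agent ~/Library/LaunchAgents/my-rsync.plist. To stop the loop again, launchctl unload with the same path should do it.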

The usual caveats apply: I wrote this from memory and didn't test it. So don't follow along blindly. Test carefully along the way. Whenever you're in doubt use the --dry-run option on rsync. It'll print out what it would have done without actually doing anything.
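
In the same cautious spirit: because --delete removes everything on sink that is missing on source, an empty or unmounted /my/tree would empty the remote copy. A small guard can refuse to sync in that case. This is only a sketch. source_looks_sane and sync_tree are hypothetical names, and the check itself (the directory exists and has at least one entry) is my assumption about what counts as sane for your tree.

```shell
#!/bin/sh
# Return 0 only if $1 is a directory that contains at least one entry.
source_looks_sane() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Run the rsync from this answer, but only when the source passes the check.
sync_tree() {
    if ! source_looks_sane /my/tree; then
        echo "Refusing to sync: /my/tree is missing or empty" >&2
        return 1
    fi
    rsync -avz --delete -e ssh /my/tree/ remoteuser@sink:/my/tree
}
```

If your tree can legitimately be empty, this guard is too strict and you would want a different test, such as checking for a marker file.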

Solution 2:

Instead of running the rsync every 5 seconds, you can use the lsyncd daemon to watch the directory tree.

It works in OS X through /dev/fsevents, but I don't have a ready-to-install .deb file, so it's a little geeky to get it compiled and installed. Until I release version 2.0.6 (soon), I'd advise using Git HEAD, since lsyncd 2.0.5 has some known OS X bugs.

Solution 3:

rsync would be a great tool for this, and it is built into Mac OS X. It checks the differences between two locations and then copies only the changes across the network. With a couple of flags (-a to recurse and preserve attributes, --delete to remove files that have disappeared from the source), rsync does almost exactly what you are looking for.

The only addition is that you need to run it every 5 seconds to check for changes. You can do this with launchd; there is a great example already on this site. In that example, the script is kept running in a loop: it runs, sleeps for 5 seconds, then runs again.

The issue with running this so often is that rsync has to compare the two trees on every pass, and if many files change within 5 seconds, finding and transferring those changes may itself take more than 5 seconds.
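
One way to soften this is to pace the loop by start time instead of always sleeping a fixed 5 seconds after each run. A run that already took 5 seconds or more is then followed immediately by the next one, and because the loop is sequential, runs never overlap. Here is a rough sketch, assuming you have some sync command of your own to pass in. remaining_sleep and paced_loop are names I made up, and date +%s (seconds since the epoch) is not strictly POSIX, though both the OS X and GNU versions of date support it.

```shell
#!/bin/sh
INTERVAL=5   # target seconds between the start of one run and the next

# Print how long to sleep after a run that took $1 seconds (never negative).
remaining_sleep() {
    elapsed=$1
    if [ "$elapsed" -ge "$INTERVAL" ]; then
        echo 0
    else
        echo $((INTERVAL - elapsed))
    fi
}

# Run the given command forever, sleeping only for what is left of INTERVAL.
paced_loop() {
    while :; do
        start=$(date +%s)
        "$@"
        end=$(date +%s)
        sleep "$(remaining_sleep $((end - start)))"
    done
}
```

For example, paced_loop rsync -a --delete /my/tree/ remoteuser@sink:/my/tree would start a new pass roughly every 5 seconds when passes are fast, and back to back when they are slow.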