How can one efficiently use S3 to back up files incrementally?

I understand how rsync works at a high level, but it requires two sides. With S3 there is no daemon to speak of (well, there is, but it's basically just HTTP).

There seem to be a few approaches.

s3rsync (but this just bolts rsync onto S3). Straightforward, but I'm not sure I want to depend on a third-party service. I wish S3 just supported rsync natively.

There are also some rsync 'clones' like duplicity that claim to support S3 without that bolt-on. But how can they do this? Are they keeping an index file locally? I'm not sure how that can be as efficient.

I obviously want to use S3 because it's cheap and reliable, but rsync is exactly the tool for jobs like backing up a giant directory of images.

What are the options here? What do I lose by using duplicity + s3 instead of rsync + s3rsync + s3?


Solution 1:

Since this question was last answered, AWS has released a new command-line tool, aws.

It can sync, rsync-like, between local storage and s3. Example usage:

aws s3 sync s3://mybucket /some/local/dir/
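For a backup the same command works in the other direction (local to S3); a minimal sketch, reusing the mybucket name from the example above:

aws s3 sync /some/local/dir/ s3://mybucket/

Adding --delete makes it also remove objects that no longer exist locally, which you may or may not want for a backup.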

If your system's Python environment is set up properly, you can install the AWS client using pip:

pip install awscli
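After installing, the client still needs credentials; assuming you already have an access key pair, the interactive setup command stores them for you:

aws configure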

Solution 2:

The s3cmd tool has a great sync option. I use it to sync local backups, using something like:

s3cmd sync --skip-existing $BACKUPDIR/weekly/ s3://MYBACKUP/backup/mysql/

The --skip-existing option means it doesn't try to checksum-compare existing files; if a file with that name is already there, it is quickly skipped. There is also a --delete-removed option, which removes files that no longer exist locally, but I want to keep files on S3 even after I've cleaned them up locally, so I don't use it.
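For completeness, a sketch of that mirror-style variant (not something I run, for the reason given):

s3cmd sync --delete-removed $BACKUPDIR/weekly/ s3://MYBACKUP/backup/mysql/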