What is a practical way to mirror an Amazon S3 bucket?
I want to mirror my Amazon S3 buckets for two reasons: 1) I don't want all my data to exist with only one provider, and 2) I want the data backed up in case of a software error or security breach.
I can mirror to a local disk with s3cmd's sync command, but that does not scale to very large buckets and isn't useful for quick restores. I'd rather have my data mirrored to a competitor such as Rackspace Cloud Files.
Does anyone have suggestions for a simple, robust way to automate this kind of mirroring on a Linux box?
You can use the s3cmd utility's "sync" command, although I stumbled on your question because I'm trying to figure out whether this syncing mechanism is screwing up my duplicity backups.
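A minimal sketch of running that s3cmd sync from a script, assuming s3cmd is already configured (for example with s3cmd --configure). The function name s3cmd_mirror and the DRY_RUN and DELETE variables are hypothetical; DRY_RUN=1 prints the command instead of running it. --delete-removed is a real s3cmd option, but it is left off by default here because propagating deletions would also propagate an accidental or malicious delete, which works against the backup goal in the question.

```shell
#!/bin/sh
# Sketch: mirror one S3 bucket into a local directory with s3cmd.
# s3cmd_mirror, DRY_RUN and DELETE are hypothetical names for this example.
#   DRY_RUN=1  print the command instead of running it
#   DELETE=1   also remove local files whose objects were deleted from S3
s3cmd_mirror() {
  bucket=$1   # source bucket name, without the s3:// prefix
  dest=$2     # local destination directory
  # The unquoted ${VAR:+word} expansions disappear entirely when VAR is unset
  # or empty, so by default nothing is echoed and nothing is deleted.
  ${DRY_RUN:+echo} s3cmd sync ${DELETE:+--delete-removed} "s3://$bucket/" "$dest/"
}
```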
I was having the same problem so I whipped up a little program specifically designed to mirror one S3 bucket to another; I call it s3s3mirror.
I did try the "s3cmd sync" approach first, but I had a bucket with hundreds of thousands of objects in it, and "s3cmd sync" just sat there, not doing anything but consuming more and more memory until my system died. I designed s3s3mirror to get going immediately, to use 100 concurrent threads (configurable), and to make modest use of CPU and memory. If I do say so myself, it's pretty freakin' fast.
I've made it available on GitHub under the Apache License. If you decide to give it a whirl, please let me know what you think and whether anything could be improved.
Here's the link: https://github.com/cobbzilla/s3s3mirror
thanks!
- jonathan.
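The core idea described above, many copies running at once instead of one sequential loop, can be approximated with the AWS CLI and xargs -P. This is a minimal sketch, not s3s3mirror's code: parallel_copy is a hypothetical name, it reads object keys one per line on stdin, it copies every key it is given (s3s3mirror's handling of unchanged objects is not reproduced), and it assumes GNU or BSD xargs for the -0, -r and -P options.

```shell
#!/bin/sh
# Sketch: copy a list of object keys from one bucket to another, N at a time.
# parallel_copy is a hypothetical helper; keys arrive one per line on stdin.
# Export DRY_RUN=1 to print the aws commands instead of running them.
parallel_copy() {
  src=$1            # source bucket
  dst=$2            # destination bucket
  jobs=${3:-16}     # number of concurrent copies (default 16)
  # Newlines become NULs so keys containing spaces or quotes survive xargs;
  # -r runs nothing on empty input, -n 1 hands one key to each copy.
  # Inside sh -c: $1 = source bucket, $2 = destination bucket, $3 = key.
  tr '\n' '\0' |
    xargs -0 -r -n 1 -P "$jobs" sh -c \
      '${DRY_RUN:+echo} aws s3 cp "s3://$1/$3" "s3://$2/$3"' _ "$src" "$dst"
}
```

For example, printf 'a.txt\nb/c.txt\n' | parallel_copy src-bucket dst-bucket 8 would start up to eight aws s3 cp processes. Each worker is a separate process here rather than a thread, which is heavier than s3s3mirror's thread pool but keeps the sketch short.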
Amazon now has an officially supported tool for this: the AWS CLI.
It can mirror in either direction between a local directory and S3, or between two S3 locations.
Unfortunately, it has no direct support for non-S3 locations such as Rackspace, but I thought this answer would be useful to others who find this question, as I did before I discovered the CLI.
For example, to sync an S3 path down to a local directory:
aws s3 sync s3://some/s3/path /some/local/path
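To run that kind of sync unattended for several buckets, a small wrapper such as the hypothetical mirror_all below syncs each source bucket into its own prefix of one backup bucket and returns a failure status if any sync fails. aws s3 sync deletes nothing at the destination unless you pass --delete, and --dryrun previews a run. The bucket names and the DRY_RUN variable are assumptions for this example. Note that a second S3 bucket only addresses the question's second concern (software error or breach), and only if it lives in a separately secured account.

```shell
#!/bin/sh
# Sketch: mirror several S3 buckets into one backup bucket with the AWS CLI.
# mirror_all and DRY_RUN are hypothetical names; credentials are assumed to be
# configured already (for example via "aws configure").
# Usage: mirror_all BACKUP_BUCKET SOURCE_BUCKET...
mirror_all() {
  dest=$1
  shift
  status=0
  for bucket in "$@"; do
    # Each source lands under its own prefix, e.g. s3://backup/photos/...
    ${DRY_RUN:+echo} aws s3 sync "s3://$bucket" "s3://$dest/$bucket" || status=1
  done
  return "$status"
}
```

Saved as, say, /usr/local/bin/mirror-s3.sh with a final line such as mirror_all backup-bucket photos logs, it could be scheduled with a crontab entry like 0 3 * * * /usr/local/bin/mirror-s3.sh (daily at 03:00). cron runs with a minimal PATH, so the full path to aws may be needed.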
Check out Jungle Disk Server. It works with both Amazon S3 and Rackspace Cloud Files. You could mount S3 and Cloud Files at different locations on your filesystem and then use rsync between the two.
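Whatever provides the two mounts, it is worth refusing to sync when either one is missing: if the destination mount has dropped, rsync would write into the plain local directory underneath it instead of Cloud Files. A minimal sketch, assuming the mountpoint utility from util-linux; rsync_mounts, require_mounted, DRY_RUN and any mount paths you pass in are hypothetical.

```shell
#!/bin/sh
# Sketch: rsync one mounted cloud store onto another, but only if both are mounted.
# rsync_mounts, require_mounted and DRY_RUN are hypothetical names;
# mountpoint comes from util-linux.
require_mounted() {
  for m in "$@"; do
    # Fail, naming the path, if it is not currently a mount point.
    mountpoint -q "$m" || { echo "not a mount point: $m" >&2; return 1; }
  done
}

rsync_mounts() {
  src=$1   # e.g. where S3 is mounted
  dst=$2   # e.g. where Cloud Files is mounted
  require_mounted "$src" "$dst" || return 1
  # Trailing slashes copy the contents of src into dst, not src itself.
  # No --delete: objects removed from S3 stay in the copy.
  ${DRY_RUN:+echo} rsync -a "$src/" "$dst/"
}
```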
You could try mounting the buckets through FUSE using s3fs. Once that's done, you can rsync from the mount point to your local disk.
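A minimal sketch of that mount, rsync, unmount cycle with s3fs-fuse, assuming s3fs and rsync are installed and that ~/.passwd-s3fs contains ACCESS_KEY_ID:SECRET_ACCESS_KEY with permissions 600. The passwd_file option is s3fs's own; the function name s3fs_backup, the paths, and DRY_RUN are hypothetical.

```shell
#!/bin/sh
# Sketch: mount a bucket with s3fs, copy it locally with rsync, then unmount.
# s3fs_backup and DRY_RUN are hypothetical names. Assumes s3fs-fuse and rsync
# are installed and ~/.passwd-s3fs holds ACCESS_KEY_ID:SECRET_ACCESS_KEY.
s3fs_backup() {
  bucket=$1   # bucket to mount
  mnt=$2      # empty directory used as the mount point
  dest=$3     # local backup directory
  ${DRY_RUN:+echo} mkdir -p "$mnt" "$dest" &&
    ${DRY_RUN:+echo} s3fs "$bucket" "$mnt" -o passwd_file="$HOME/.passwd-s3fs" &&
    ${DRY_RUN:+echo} rsync -a "$mnt/" "$dest/"
  status=$?
  # Try to unmount whether or not the steps above succeeded.
  ${DRY_RUN:+echo} fusermount -u "$mnt"
  return "$status"
}
```

On systems that ship FUSE 3, the unmount helper may be called fusermount3 instead.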