Uploading 200GB of files to S3

What's the best way to upload 200GB of tar.gz files to S3 from Linux? While researching I found that the S3 object size limit has been raised to 5TB, and I came across the multipart upload mechanism for speeding up large uploads. I also found the Python library boto, which might help with this. Could someone shed more light on this topic?


Don't upload it. Post it. http://aws.amazon.com/importexport/

Amazon offers a service where you send them portable media, and they upload the data for you from their fast backbone.

If you're really hellbent on doing it yourself, grab a copy of S3cmd and do s3cmd sync.

"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." - Andrew S Tanenbaum

Edit: If you really want to be able to chunk the file upload, I suggest you do the following.

  1. Get hold of an AWS EC2 instance with enough ephemeral storage to hold the files you want to upload.
  2. Use GNU Split to divide the files into smaller chunks.
  3. Upload the chunks to your temporary EC2 instance.
  4. Reassemble the chunks on the instance (split has no reassemble option of its own; just cat the pieces back together in order).
  5. Upload the reassembled file to S3 from EC2 (blindingly fast!); a sketch of this step follows the list.
  6. Shutdown the EC2 instance, but keep it handy.
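One note on step 5: a single S3 PUT is capped at 5GB, so a reassembled archive in the 200GB range still has to go up as a multipart upload even from EC2. Below is a minimal sketch of that step with boto 2.x; the bucket name, key name, file path and chunk size are placeholders, and credentials are assumed to already be configured for boto (~/.boto or the usual AWS environment variables).

    # Minimal sketch of step 5 with boto 2.x (run on the EC2 instance).
    # 'my-bucket', 'backup.tar.gz' and the path below are placeholders.
    import os
    from cStringIO import StringIO

    import boto

    CHUNK_SIZE = 256 * 1024 * 1024   # placeholder part size; each part must be >= 5 MB
                                     # (except the last), and one chunk is held in memory

    def upload_multipart(path, bucket_name, key_name):
        bucket = boto.connect_s3().get_bucket(bucket_name)
        mp = bucket.initiate_multipart_upload(key_name)
        part_num = 0
        try:
            fp = open(path, 'rb')
            while True:
                data = fp.read(CHUNK_SIZE)
                if not data:
                    break
                part_num += 1                                   # part numbers start at 1
                mp.upload_part_from_file(StringIO(data), part_num)
            fp.close()
            mp.complete_upload()     # S3 assembles the parts into a single object
        except Exception:
            mp.cancel_upload()       # drop the partial upload so it doesn't accrue charges
            raise

    if __name__ == '__main__':
        upload_multipart('/mnt/backup.tar.gz', 'my-bucket', 'backup.tar.gz')

Each part except the last must be at least 5MB and an upload can contain at most 10,000 parts, so pick a chunk size that keeps a 200GB archive comfortably inside those limits (256MB gives roughly 800 parts).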

Thanks for your reply and the options, Tom. I was able to achieve a 20GB upload to S3 using multipart upload. I needed Python 2.5+, the boto library, and an S3 multipart upload Python script to do it (a simplified sketch of the parallel part upload follows the references below). My references were:

  1. http://code.google.com/p/boto/ - boto (used version 2.1.1)
  2. http://www.elastician.com/2010/12/s3-multipart-upload-in-boto.html - multipart upload using boto
  3. http://bcbio.wordpress.com/2011/04/10/parallel-upload-to-amazon-s3-with-python-boto-and-multiprocessing/ - parallel upload to Amazon S3 with Python, boto and multiprocessing
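On reference 3: the useful trick there is that a multipart upload is identified by the bucket, the key name and an upload id, so separate worker processes can each attach to the same upload and push different parts concurrently. The sketch below is a simplified version of that idea, not the bcbio script itself; the names and chunk size are the same placeholders as above, and multiprocessing requires Python 2.6 or later.

    # Simplified sketch of parallel multipart upload with boto 2.x and multiprocessing.
    # Each worker opens its own S3 connection, re-attaches to the in-progress upload
    # by id, seeks to its slice of the file and uploads that slice as one part.
    import os
    from cStringIO import StringIO
    from multiprocessing import Pool

    import boto
    from boto.s3.multipart import MultiPartUpload

    CHUNK_SIZE = 256 * 1024 * 1024   # placeholder part size (each worker holds one part in memory)

    def upload_one_part(args):
        bucket_name, key_name, upload_id, part_num, offset, length, path = args
        bucket = boto.connect_s3().get_bucket(bucket_name)
        mp = MultiPartUpload(bucket)          # re-attach to the existing upload by id
        mp.key_name = key_name
        mp.id = upload_id
        fp = open(path, 'rb')
        fp.seek(offset)
        mp.upload_part_from_file(StringIO(fp.read(length)), part_num)
        fp.close()

    def parallel_upload(path, bucket_name, key_name, processes=4):
        bucket = boto.connect_s3().get_bucket(bucket_name)
        mp = bucket.initiate_multipart_upload(key_name)
        size = os.path.getsize(path)
        jobs, offset, part_num = [], 0, 0
        while offset < size:
            part_num += 1
            length = min(CHUNK_SIZE, size - offset)
            jobs.append((bucket_name, key_name, mp.id, part_num, offset, length, path))
            offset += length
        pool = Pool(processes)
        pool.map(upload_one_part, jobs)       # upload the parts concurrently
        pool.close()
        pool.join()
        mp.complete_upload()                  # stitch the uploaded parts together on S3

    if __name__ == '__main__':
        parallel_upload('/mnt/backup.tar.gz', 'my-bucket', 'backup.tar.gz')

If something fails partway through, call mp.cancel_upload() or abort the upload later; incomplete multipart uploads otherwise keep sitting in S3 and incur storage charges until they are completed or aborted.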

Hope these are useful.

Prem