Uploading 200GB of files to S3
What's the best way to upload 200 GB tar.gz files to S3 from Linux? While researching I found that the S3 object size limit has been raised to 5 TB, and I came across the multipart upload mechanism for speeding up large uploads. I also found the Python library boto, which might help with this. Could someone shed more light on this topic?
Don't upload it. Post it. http://aws.amazon.com/importexport/
Amazon offers a service (AWS Import/Export) where you send them portable media and they upload the data for you from their fast backbone.
If you're really hell-bent on doing it yourself, grab a copy of s3cmd and do an s3cmd sync.
"Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." - Andrew S Tanenbaum
Edit: If you really want to be able to chunk the file upload, I suggest you do the following.
- Get hold of an AWS EC2 instance with enough ephemeral storage to hold the files you want to upload.
- Use GNU Split to divide the files into smaller chunks.
- Upload the chunks to your temporary EC2 instance.
- Reassemble the chunks on the EC2 instance (e.g. with cat; see the sketch after this list).
- Upload the reassembled file to S3 from EC2 (blindingly fast!)
- Shut down the EC2 instance, but keep it handy.
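To make the split/reassemble steps concrete, here is a rough Python sketch of what they do; the file name and the 1 GB chunk size are just placeholders, not values from the answer above:

    import shutil

    # Illustrative names only -- substitute your own archive and chunk size.
    SOURCE = 'backup.tar.gz'
    CHUNK_SIZE = 1024 * 1024 * 1024  # 1 GB per chunk

    def split_file(path, chunk_size=CHUNK_SIZE):
        """Write path.part0000, path.part0001, ... and return the part names."""
        parts = []
        index = 0
        with open(path, 'rb') as src:
            while True:
                data = src.read(chunk_size)
                if not data:
                    break
                part_name = '%s.part%04d' % (path, index)
                with open(part_name, 'wb') as dst:
                    dst.write(data)
                parts.append(part_name)
                index += 1
        return parts

    def join_files(parts, target):
        """Concatenate the parts back into a single file (what cat would do)."""
        with open(target, 'wb') as dst:
            for part_name in parts:
                with open(part_name, 'rb') as src:
                    shutil.copyfileobj(src, dst)

    chunks = split_file(SOURCE)
    join_files(chunks, SOURCE + '.rebuilt')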
Thanks for your reply and the options, Tom. I was able to achieve a 20 GB upload to S3 using multipart upload. I needed Python 2.5+, the boto library, and an s3_multipart Python script to do the upload. My references were:
- http://code.google.com/p/boto/ : boto (I used 2.1.1)
- http://www.elastician.com/2010/12/s3-multipart-upload-in-boto.html : multipart upload using boto
- http://bcbio.wordpress.com/2011/04/10/parallel-upload-to-amazon-s3-with-python-boto-and-multiprocessing/ : parallel upload to Amazon S3 with Python, boto and multiprocessing
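For anyone following the same route, here is a minimal sketch of the boto 2.x multipart upload those posts describe; the bucket name, key name, file path and part size below are placeholders for illustration, not my actual setup:

    import math
    import os
    from cStringIO import StringIO

    import boto  # boto 2.x (I used 2.1.1)

    # Placeholder values -- substitute your own bucket, key and file.
    BUCKET_NAME = 'my-backup-bucket'
    KEY_NAME = 'backups/archive.tar.gz'
    SOURCE_FILE = '/data/archive.tar.gz'
    PART_SIZE = 50 * 1024 * 1024  # 50 MB parts (S3's minimum part size is 5 MB)

    conn = boto.connect_s3()               # reads AWS credentials from the boto config/env
    bucket = conn.get_bucket(BUCKET_NAME)

    file_size = os.path.getsize(SOURCE_FILE)
    part_count = int(math.ceil(file_size / float(PART_SIZE)))

    mp = bucket.initiate_multipart_upload(KEY_NAME)
    try:
        f = open(SOURCE_FILE, 'rb')
        try:
            for part_num in range(1, part_count + 1):
                data = f.read(PART_SIZE)
                # Each part is sent as a file-like object
                mp.upload_part_from_file(StringIO(data), part_num=part_num)
        finally:
            f.close()
        mp.complete_upload()               # stitch the parts into one S3 object
    except Exception:
        mp.cancel_upload()                 # remove abandoned parts if anything fails
        raise

Every part except the last has to be at least 5 MB, and the parallel-upload script in the third link essentially runs these per-part uploads across multiple processes.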
Hope these are useful.
Prem