Quick way to list all files in Amazon S3 bucket?
I have an Amazon S3 bucket that contains tens of thousands of files. What's the easiest way to get a text file that lists all the filenames in the bucket?
I'd recommend using boto. Then it's a quick couple of lines of Python:
from boto.s3.connection import S3Connection
conn = S3Connection('access-key','secret-access-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
    print(key.name.encode('utf-8'))
Save this as list.py, open a terminal, and then run:
$ python list.py > results.txt
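The snippet above uses the original boto library. As a rough sketch only, the same idea with the newer boto3 package might look like the following. The function names iter_keys and write_listing are made up for illustration, and the code assumes boto3 is installed and that credentials are picked up from the environment or the AWS config files rather than passed in.

```python
def iter_keys(client, bucket_name):
    """Yield every object key in bucket_name, following pagination."""
    # A single list_objects_v2 call returns at most 1000 keys; the paginator
    # issues follow-up requests until the listing is exhausted.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # "Contents" is absent when a page (or the whole bucket) is empty.
        for obj in page.get("Contents", []):
            yield obj["Key"]


def write_listing(bucket_name, out_path):
    """Write one key per line to out_path and return the number of keys."""
    import boto3  # third-party package (assumption: pip install boto3)

    # Assumption: credentials come from the environment or AWS config files.
    client = boto3.client("s3")
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for key in iter_keys(client, bucket_name):
            out.write(key + "\n")
            count += 1
    return count
```

Calling write_listing('bucket', 'results.txt') would then play the role of list.py plus the shell redirect, writing the keys straight to the file.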
AWS CLI
Documentation for aws s3 ls
AWS have recently released their Command Line Tools. They work much like boto and can be installed using sudo easy_install awscli
or sudo pip install awscli
Once installed, you can simply run
aws s3 ls
which will show you all of your available buckets:
       CreationTime Bucket
       ------------ ------
2013-07-11 17:08:50 mybucket
2013-07-24 14:55:44 mybucket2
You can then query a specific bucket for files.
Command:
aws s3 ls s3://mybucket
Output:
Bucket: mybucket
Prefix:
      LastWriteTime     Length Name
      -------------     ------ ----
                           PRE somePrefix/
2013-07-25 17:06:27         88 test.txt
This shows the files and prefixes at the top level of the bucket. Add --recursive to list every file under every prefix.
s3cmd is invaluable for this kind of thing:
$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket
Be careful: an Amazon S3 list request returns at most 1000 files per call. If you want to iterate over all files, you have to paginate the results using markers.
In Ruby, using aws-s3:
require 'aws/s3'

bucket_name = 'yourBucket'
marker = ""

AWS::S3::Base.establish_connection!(
  :access_key_id => 'your_access_key_id',
  :secret_access_key => 'your_secret_access_key'
)

loop do
  # Fetch up to 1000 keys that sort after the current marker.
  objects = AWS::S3::Bucket.objects(bucket_name, :marker => marker, :max_keys => 1000)
  break if objects.size == 0
  # The last key of this page becomes the marker for the next request.
  marker = objects.last.key
  objects.each do |obj|
    puts obj.key
  end
end
Hope this helps, vincent
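For readers who prefer Python, the marker loop above can be sketched independently of any S3 library. In this sketch, fetch_page is a hypothetical callable (not part of any real SDK) that takes a marker and a page size and returns the next batch of keys that sort after the marker; make_fake_fetcher is a stand-in that serves pages from an in-memory list so the loop can be tried without AWS access.

```python
import bisect


def iter_all_keys(fetch_page, page_size=1000):
    """Yield every key by repeatedly asking for the page after the marker."""
    marker = ""  # an empty marker means "start from the first key"
    while True:
        keys = fetch_page(marker, page_size)
        if not keys:
            break  # an empty page means the listing is exhausted
        for key in keys:
            yield key
        # The last key seen is the marker for the next request.
        marker = keys[-1]


def make_fake_fetcher(all_keys):
    """Hypothetical stand-in for a listing API: serves sorted keys in pages."""
    ordered = sorted(all_keys)

    def fetch_page(marker, max_keys):
        # Return up to max_keys keys that sort strictly after the marker.
        start = bisect.bisect_right(ordered, marker)
        return ordered[start:start + max_keys]

    return fetch_page
```

With boto3, fetch_page could presumably wrap client.list_objects(Bucket=..., Marker=marker, MaxKeys=max_keys) and return the "Key" of each entry under "Contents"; treat that wiring as an assumption to check against the SDK documentation.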
Update 15-02-2019:
This command will give you a list of all buckets in AWS S3:
aws s3 ls
This command will give you a list of all top-level objects inside an AWS S3 bucket:
aws s3 ls bucket-name
This command will give you a list of ALL objects inside an AWS S3 bucket:
aws s3 ls bucket-name --recursive
This command will place a list of ALL objects inside an AWS S3 bucket... inside a text file in your current directory:
aws s3 ls bucket-name --recursive | cat >> file-name.txt
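Note that the listing written this way still contains the date, time, and size columns next to each name. If only the filenames are wanted, a small filter along these lines could strip the other columns. It is a sketch that assumes the "date time size name" row layout shown in the sample output earlier in this thread; keys_from_ls is a made-up name. Unlike picking the fourth whitespace-separated field, it keeps names that contain spaces, although a name that begins with spaces would lose them.

```python
def keys_from_ls(lines):
    """Yield just the object names from `aws s3 ls` style output lines."""
    for line in lines:
        # Split into at most 4 fields so spaces inside the name survive.
        parts = line.rstrip("\r\n").split(None, 3)
        # Object rows look like: date, time, size, name.
        # Prefix rows ("PRE somePrefix/"), headers, and blanks are skipped.
        if len(parts) == 4 and parts[2].isdigit():
            yield parts[3]
```

Saved in a script that feeds sys.stdin to keys_from_ls and prints each result, it could sit at the end of the pipeline in place of cat.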