How to list all AWS S3 objects in a bucket using Java
It might be a workaround but this solved my problem:
ObjectListing listing = s3.listObjects( bucketName, prefix );
List<S3ObjectSummary> summaries = listing.getObjectSummaries();
while (listing.isTruncated()) {
listing = s3.listNextBatchOfObjects (listing);
summaries.addAll (listing.getObjectSummaries());
}
For those, who are reading this in 2018+. There are two new pagination-hassle-free APIs available: one in AWS SDK for Java 1.x and another one in 2.x.
1.x
There is a new API in Java SDK that allows you to iterate through objects in S3 bucket without dealing with pagination:
AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
// TODO: Consume `objectSummary` the way you need
System.out.println(objectSummary.key);
});
This iteration is lazy:
The list of
S3ObjectSummary
s will be fetched lazily, a page at a time, as they are needed. The size of the page can be controlled with thewithBatchSize(int)
method.
2.x
The API changed, so here is an SDK 2.x version:
S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("the-bucket").prefix("the-prefix").build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);
for (ListObjectsV2Response page : response) {
page.contents().forEach((S3Object object) -> {
// TODO: Consume `object` the way you need
System.out.println(object.key());
});
}
ListObjectsV2Iterable
is lazy as well:
When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.
This is direct from AWS documentation:
AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(bucketName)
.withPrefix("m");
ObjectListing objectListing;
do {
objectListing = s3client.listObjects(listObjectsRequest);
for (S3ObjectSummary objectSummary :
objectListing.getObjectSummaries()) {
System.out.println( " - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() +
")");
}
listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
I am processing a large collection of objects generated by our system; we changed the format of the stored data and needed to check each file, determine which ones were in the old format, and convert them. There are other ways to do this, but this one relates to your question.
ObjectListing list = amazonS3Client.listObjects(contentBucketName, contentKeyPrefix);
do {
List<S3ObjectSummary> summaries = list.getObjectSummaries();
for (S3ObjectSummary summary : summaries) {
String summaryKey = summary.getKey();
/* Retrieve object */
/* Process it */
}
list = amazonS3Client.listNextBatchOfObjects(list);
}while (list.isTruncated());