Fastest way to get Google Storage bucket size?

I'm currently doing this, but it's VERY slow since I have several terabytes of data in the bucket:

gsutil du -sh gs://my-bucket-1/

And the same for a sub-folder:

gsutil du -sh gs://my-bucket-1/folder

Is it possible to somehow obtain the total size of a complete bucket (or a sub-folder) elsewhere or in some other fashion which is much faster?


Solution 1:

The visibility for Google Cloud Storage here is pretty poor.

The fastest way is actually to pull the Stackdriver metrics and look at the total size in bytes.

Unfortunately there is practically no filtering you can do in Stackdriver. You can't wildcard the bucket name, and the almost useless bucket resource labels are NOT aggregatable in Stackdriver metrics.

Also, this is bucket level only, not per prefix.

The Stackdriver metrics are updated daily, so unless you can wait a day you can't use this to get the current size right now.

UPDATE: Stackdriver metrics now support user metadata labels, so you can label your GCS buckets and aggregate those metrics by the custom labels you apply.
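If you'd rather script this than click through the console, here is a rough sketch of pulling the same metric from the Cloud Monitoring API with curl. It assumes the metric type is storage.googleapis.com/storage/total_bytes on the gcs_bucket resource, and that gcloud is installed and authenticated. The function name bucket_size_series and its arguments are made up for illustration.

```shell
# Sketch: fetch the daily total_bytes samples for one bucket from the
# Cloud Monitoring API (timeSeries.list).
# Arguments (placeholders): project ID, bucket name, start time, end time.
# Times are RFC 3339 UTC strings, e.g. 2024-01-01T00:00:00Z.
bucket_size_series() {
  project=$1 bucket=$2 start=$3 end=$4
  # Assumed metric type and resource label; check them in Metrics Explorer.
  filter="metric.type=\"storage.googleapis.com/storage/total_bytes\" AND resource.labels.bucket_name=\"$bucket\""
  # -G sends the --data-urlencode pairs as a URL-encoded query string.
  curl -sS -G \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode "filter=$filter" \
    --data-urlencode "interval.startTime=$start" \
    --data-urlencode "interval.endTime=$end" \
    "https://monitoring.googleapis.com/v3/projects/$project/timeSeries"
}
```

Since the metric is only written about once a day, a window of a couple of days is a reasonable way to make sure at least one point comes back. The response is JSON, and if the metric is split by storage class you would get one series per class and need to add them up.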

Solution 2:

Unfortunately, no. If you need to know what size the bucket is right now, there's no faster way than what you're doing.

If you need to check on this regularly, you can enable bucket logging. Google Cloud Storage will generate a daily storage log that you can use to check the size of the bucket. If that would be useful, you can read more about it here: https://cloud.google.com/storage/docs/accesslogs#delivery
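As a rough sketch of how you might read that daily storage log: as far as I know it is a small CSV with a "bucket" column and a "storage_byte_hours" column, where the second value is byte-hours over 24 hours, so dividing by 24 gives the average size in bytes. The helper name storage_log_to_bytes is made up, and the column layout is an assumption you should check against one of your own logs.

```shell
# Sketch: convert a GCS storage log (CSV on stdin) into "bucket bytes" lines.
# Assumes a header row followed by rows like "example-bucket","5532482018".
storage_log_to_bytes() {
  awk -F, 'NR > 1 {
    gsub(/"/, "")                      # drop the CSV quotes (fields are re-split)
    printf "%s %.0f\n", $1, $2 / 24    # byte-hours over 24 h -> average bytes
  }'
}
```

You would pipe a downloaded log into it, for example by running gsutil cat on the storage log object in your log bucket and sending the output to storage_log_to_bytes.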

Solution 3:

If the daily storage log you get from enabling bucket logging (per Brandon's suggestion) won't work for you, one thing you could do to speed things up is to shard the du request. For example, you could do something like:

gsutil du -s gs://my-bucket-1/a* > a.size &
gsutil du -s gs://my-bucket-1/b* > b.size &
...
gsutil du -s gs://my-bucket-1/z* > z.size &
wait
awk '{sum+=$1} END {print sum}' *.size

(assuming your subfolders are named starting with letters of the English alphabet; if not, you'd need to adjust how you run the above commands).
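The same sharding idea can be wrapped in a small function so the prefixes are passed as arguments instead of written out by hand. This is only a sketch: the name sharded_du is made up, and it assumes the prefixes contain no slashes (they are reused as temp file names) and that no other background jobs are running in the shell, since wait waits for all of them.

```shell
# Sketch: run one "gsutil du -s" per prefix in parallel and sum the results.
# Usage: sharded_du gs://my-bucket-1 a b c
sharded_du() {
  bucket=$1; shift
  tmp=$(mktemp -d) || return 1
  for p in "$@"; do
    # Quote the URL so the local shell never tries to expand the wildcard.
    gsutil du -s "$bucket/$p*" > "$tmp/$p.size" &
  done
  wait   # block until every background du has finished
  # The first column of each output line is a size in bytes; add them all up.
  cat "$tmp"/*.size | awk '{sum += $1} END {printf "%.0f\n", sum}'
  rm -rf "$tmp"
}
```

Objects whose names don't start with any of the given prefixes are silently left out of the total, so the prefix list has to cover everything in the bucket.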

Solution 4:

Use the built-in dashboard: Operations -> Monitoring -> Dashboards -> Cloud Storage

The graph at the bottom shows the bucket size for all buckets, or you can select an individual bucket to drill down.
