Fastest way to get Google Storage bucket size?
I'm currently doing this, but it's VERY slow since I have several terabytes of data in the bucket:
gsutil du -sh gs://my-bucket-1/
And the same for a sub-folder:
gsutil du -sh gs://my-bucket-1/folder
Is there some other way to get the total size of an entire bucket (or a sub-folder) that is much faster?
Solution 1:
Visibility into bucket size in Google Cloud Storage is pretty poor.
The fastest way is actually to pull the Stackdriver metrics and look at the total size in bytes.
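As a rough sketch of reading that metric from the command line instead of the console, the snippet below queries the Cloud Monitoring v3 `timeSeries` endpoint (the API behind the Stackdriver metrics) for `storage.googleapis.com/storage/total_bytes` and adds up the newest point of each series (the metric is split by storage class). It assumes an authenticated `gcloud` plus `curl`, `jq` and GNU `date`. The function names and the two-day lookback window are my own choices, not part of any official tool.

```shell
#!/usr/bin/env bash
# Sketch: latest storage/total_bytes value for one bucket via Cloud Monitoring v3.
# Assumptions: gcloud is authenticated, curl and jq are installed, GNU date.
# total_bytes_filter and gcs_bucket_total_bytes are hypothetical helper names.

# Build the Monitoring filter selecting one bucket's total_bytes metric.
total_bytes_filter() {
  printf 'metric.type="storage.googleapis.com/storage/total_bytes" AND resource.labels.bucket_name="%s"' "$1"
}

# Print the most recent total size in bytes, summed over storage classes.
gcs_bucket_total_bytes() {
  local project_id=$1 bucket=$2
  local end start
  end=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # The metric is only written about once a day, so look back two days.
  start=$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)

  curl -sS -G "https://monitoring.googleapis.com/v3/projects/${project_id}/timeSeries" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    --data-urlencode "filter=$(total_bytes_filter "$bucket")" \
    --data-urlencode "interval.startTime=${start}" \
    --data-urlencode "interval.endTime=${end}" |
    # One series per storage class; points are newest-first, so take points[0].
    # Prints 0 if no point was written in the window.
    jq '[.timeSeries[]?.points[0].value | (.doubleValue // (.int64Value | tonumber))] | add // 0'
}
```

Usage: `gcs_bucket_total_bytes my-project my-bucket-1` prints a single byte count. It is still subject to the once-a-day update and bucket-only granularity described here.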
Unfortunately, there is practically no filtering you can do in Stackdriver. You can't wildcard the bucket name, and the bucket resource labels (which are nearly useless anyway) are NOT aggregatable in Stackdriver metrics.
Also, this works at the bucket level only, not for prefixes.
The Stackdriver metrics are updated daily, so unless you can wait a day, you can't use this to get the current size right now.
UPDATE: Stackdriver metrics now support user metadata labels, so you can label your GCS buckets and aggregate those metrics by the custom labels you apply.
Solution 2:
Unfortunately, no. If you need to know what size the bucket is right now, there's no faster way than what you're doing.
If you need to check on this regularly, you can enable bucket logging. Google Cloud Storage will generate a daily storage log that you can use to check the size of the bucket. If that would be useful, you can read more about it here: https://cloud.google.com/storage/docs/accesslogs#delivery
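A minimal sketch of turning a delivered storage log into a byte count, assuming the storage log is a two-column CSV (`"bucket","storage_byte_hours"`) whose value is the bucket's size averaged over a day in byte-hours, so dividing by 24 gives bytes. The `gs://my-log-bucket` bucket and the `storage_log_bytes` function are placeholders, and the one-time setup commands in the comments reflect my reading of the access-logs docs linked above; check that page before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: average bucket size in bytes from GCS daily storage log CSV(s).
# Assumption: each log has the header "bucket","storage_byte_hours" followed by
# one data row; storage_byte_hours / 24 = average size in bytes for that day.

storage_log_bytes() {
  # Reads storage-log CSV files (or stdin) and prints "<bucket> <bytes>" per row.
  awk -F',' '{
      gsub(/"/, "")                          # strip CSV quoting (re-splits fields)
      if (NF < 2 || $2 == "storage_byte_hours") next   # skip blanks and headers
      printf "%s %.0f\n", $1, $2 / 24
    }' "$@"
}

# One-time setup (placeholder names; gs://my-log-bucket must already exist):
#   gsutil iam ch group:cloud-storage-analytics@google.com:legacyBucketWriter gs://my-log-bucket
#   gsutil logging set on -b gs://my-log-bucket gs://my-bucket-1
# Later, read the newest storage log (object names sort chronologically):
#   gsutil cat "$(gsutil ls 'gs://my-log-bucket/my-bucket-1_storage_*' | tail -n 1)" | storage_log_bytes
```

Headers are skipped by content rather than by line number, so piping several concatenated logs through the function still works.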
Solution 3:
If the daily storage log you get from enabling bucket logging (per Brandon's suggestion) won't work for you, one thing you could do to speed things up is to shard the du request. For example, you could do something like:
gsutil du -s 'gs://my-bucket-1/a*' > a.size &
gsutil du -s 'gs://my-bucket-1/b*' > b.size &
...
gsutil du -s 'gs://my-bucket-1/z*' > z.size &
wait
awk '{sum += $1} END {printf "%.0f\n", sum}' *.size
(This assumes your subfolders are named starting with letters of the English alphabet; if not, you'd need to adjust how you run the above commands.)
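To avoid hard-coding `a*` through `z*` (and missing names that start with digits, capitals or other characters), a sketch like this shards on whatever `gsutil ls` reports at the top level of the bucket or folder and runs the `du` calls in parallel with `xargs -P`. `sharded_du` and `PARALLEL` are hypothetical names; it assumes an authenticated `gsutil` on `PATH` and an `xargs` that supports `-0` and `-P` (GNU and BSD both do).

```shell
#!/usr/bin/env bash
# Sketch: parallel `gsutil du -s`, one shard per top-level entry of a bucket/folder.
# Assumptions: gsutil on PATH and authenticated; xargs supports -0 and -P.

sharded_du() {
  local url=$1                        # e.g. gs://my-bucket-1/ or gs://my-bucket-1/folder/
  local parallel=${PARALLEL:-8}       # hypothetical knob: concurrent du processes

  # `gsutil ls` prints top-level objects and "folders" (prefixes ending in /).
  # NUL-separate them so names containing spaces survive xargs.
  gsutil ls "$url" |
    tr '\n' '\0' |
    xargs -0 -n 1 -P "$parallel" gsutil du -s |
    # Each du line is "<bytes>  <url>"; add up the byte column.
    awk '{sum += $1} END {printf "%.0f\n", sum}'
}
```

Usage: `sharded_du gs://my-bucket-1/` or `PARALLEL=16 sharded_du gs://my-bucket-1/folder/`. Each top-level entry is one shard, so a bucket dominated by a single huge folder won't speed up much; in that case, point it at that folder instead.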
Solution 4:
Use the built-in dashboard: Operations -> Monitoring -> Dashboards -> Cloud Storage.
The graph at the bottom shows the bucket size for all buckets, or you can select an individual bucket to drill down.