Duplicated rows in Google Cloud billing data
1. Why do rows get duplicated when un-nesting the labels field?
When you un-nest a repeated field such as labels, row duplication is expected. To be precise, each row appears once for every element in that row's labels array.
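The effect can be sketched in plain Python, without touching the real export. The rows, field names, and costs below are made up for illustration, and unnest_labels is a hypothetical helper that mimics what joining each row with its own labels array does.

```python
# Hypothetical billing rows; field names and values are illustrative only.
rows = [
    {
        "sku": "vm-core",
        "cost": 1.50,
        "labels": [
            {"key": "env", "value": "prod"},
            {"key": "team", "value": "data"},
        ],
    },
    {
        "sku": "storage",
        "cost": 0.25,
        "labels": [{"key": "env", "value": "dev"}],
    },
]


def unnest_labels(rows):
    """Emit one output row per label, copying the parent row's other fields."""
    flat = []
    for row in rows:
        for label in row["labels"]:
            flat.append(
                {
                    "sku": row["sku"],
                    "cost": row["cost"],
                    "label_key": label["key"],
                    "label_value": label["value"],
                }
            )
    return flat


flat = unnest_labels(rows)

# The row with two labels now appears twice, so its cost is counted twice.
true_total = sum(r["cost"] for r in rows)
inflated_total = sum(r["cost"] for r in flat)

print(len(rows), len(flat))
print(true_total, inflated_total)
```

In this sketch, two source rows become three un-nested rows, and summing cost over the un-nested rows counts the two-label row twice. Filtering to a single label key before aggregating is one way to avoid that.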
2. Why are there duplicate rows even before un-nesting?
If you create two Compute Engine VMs with exactly the same configuration and location, their idle usage rows in the billing export are identical unless the VMs are labelled differently. The export table doesn't expose an explicit primary key.
The export table's granularity goes only as far as service and SKU, not individual resources. This produces rows that look like duplicates, but each one represents actual, valid usage.
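A small Python sketch shows why these look-alike rows should not be de-duplicated. The tuples below are invented stand-ins for export rows (service, SKU, location, usage start, cost); the SKU value and the costs are hypothetical.

```python
from collections import Counter

# Two identical VMs produce two identical rows; both are real usage.
# All values are hypothetical placeholders, not real export data.
rows = [
    ("Compute Engine", "hypothetical-sku-id", "us-central1", "2024-01-01T00:00", 0.5),
    ("Compute Engine", "hypothetical-sku-id", "us-central1", "2024-01-01T00:00", 0.5),
]

# Count how many times each exact row occurs.
counts = Counter(rows)
repeated = {row: n for row, n in counts.items() if n > 1}

# Summing every row gives the cost of both VMs.
total_all = sum(row[-1] for row in rows)

# De-duplicating first silently drops one VM's usage.
total_dedup = sum(row[-1] for row in set(rows))

print(repeated)
print(total_all, total_dedup)
```

Here the de-duplicated total is half of the full total, which is the undercount you would get by treating identical rows as errors.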