Duplicated rows in Google Cloud billing data

1. Why do rows get duplicated when un-nesting the labels field?

When you un-nest a repeated field such as labels, row duplication is expected. To be precise, each row is repeated once per element of its labels array, so a row with three labels appears three times in the un-nested result.

Unnest count vs total labels
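The effect can be reproduced outside BigQuery. The following is a minimal Python sketch, not the export's real schema: the rows, SKU names, and costs are made up for illustration, and the two helper functions are hypothetical stand-ins for a comma (cross) join with UNNEST(labels) and a LEFT JOIN with UNNEST(labels).

```python
# Hypothetical, simplified billing rows; real export rows have many more columns.
rows = [
    {"sku": "sku-a", "cost": 10.0,
     "labels": [{"key": "env", "value": "prod"}, {"key": "team", "value": "data"}]},
    {"sku": "sku-b", "cost": 5.0,
     "labels": [{"key": "env", "value": "dev"}]},
    {"sku": "sku-c", "cost": 2.0, "labels": []},  # unlabelled usage
]


def cross_join_unnest(rows):
    """Mimic `FROM t, UNNEST(labels)`: one output row per label.

    Rows with an empty labels array produce no output rows at all.
    """
    return [
        {"sku": r["sku"], "cost": r["cost"], "label": label}
        for r in rows
        for label in r["labels"]
    ]


def left_join_unnest(rows):
    """Mimic `FROM t LEFT JOIN UNNEST(labels)`: unlabelled rows are kept once."""
    return [
        {"sku": r["sku"], "cost": r["cost"], "label": label}
        for r in rows
        for label in (r["labels"] or [None])
    ]


total_labels = sum(len(r["labels"]) for r in rows)
unnested = cross_join_unnest(rows)
kept = left_join_unnest(rows)

true_cost = sum(r["cost"] for r in rows)
naive_cost = sum(r["cost"] for r in unnested)  # counts sku-a twice, drops sku-c

print(len(unnested), total_labels)  # un-nested row count equals total label count
print(true_cost, naive_cost)
```

In this sketch the un-nested row count matches the total number of labels, and summing cost over the un-nested rows no longer matches the true total. That is why cost is usually aggregated before un-nesting, or after filtering to a single label key.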

2. Why are there duplicate rows even before un-nesting?

If you create two Compute Engine VMs with exactly the same configuration and location, their idle usage rows in the billing export are identical unless the VMs are labelled. The export table also has no explicitly exposed primary key that would tell such rows apart.

The export table's granularity goes only as far as service and SKU, not down to individual resources. This produces rows that look like duplicates but represent actual, valid usage.
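A practical consequence is that de-duplicating the export under-counts cost. The sketch below illustrates this with made-up values: the tuple layout, service and SKU names, timestamp, and cost are all assumptions for the example, not real export columns.

```python
from collections import Counter

# Hypothetical usage row: (service, sku, location, usage_start, cost).
# Two identical, unlabelled VMs each emit this same row for the same hour.
row = ("example-service", "example-sku", "us-central1", "2024-01-01T00:00", 0.5)
export = [row, row]

# Correct total: every row is real usage, so all rows are summed.
true_cost = sum(r[-1] for r in export)

# Wrong total: treating identical rows as duplicates discards one VM's usage.
deduped_cost = sum(r[-1] for r in set(export))

# Counting identical rows shows how many usages share the same attributes.
counts = Counter(export)

print(true_cost, deduped_cost, counts[row])
```

Here the de-duplicated total is half the true total, because the second row was a separate VM's usage rather than a copy of the first.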