Can Google Cloud Storage write byte ranges into an existing object?

The Google Cloud Storage API doesn't appear to support writing byte ranges into an existing object. Are there any workarounds?

I'm considering a design that would use a single storage object to read and write thousands of fixed byte ranges, where each range is a data record (~1 KB) in a partitioned schema.

For example, each range might represent data for a day of the year from a given start time, and the client would obtain the data for a given range of days by reading at an offset.
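(For context, the read side of what I have in mind would look roughly like the sketch below, using the Python google-cloud-storage client; the bucket/object names and the fixed 1 KB record size are just placeholder assumptions.)

    from google.cloud import storage

    RECORD_SIZE = 1024  # assumed fixed size of each daily record (~1 KB)

    def read_days(bucket_name, object_name, first_day, num_days):
        """Read num_days consecutive fixed-size records with one ranged download."""
        client = storage.Client()
        blob = client.bucket(bucket_name).blob(object_name)
        start = first_day * RECORD_SIZE
        end = start + num_days * RECORD_SIZE - 1  # end offset is inclusive
        return blob.download_as_bytes(start=start, end=end)

    # e.g. the records for days 10..16 of the year
    week = read_days("my-bucket", "records.bin", 10, 7)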

I can see that I could instead use a separate storage object for each record, but I would need to consider the impact the batch limit (100 calls per batch request) would have on performance when reading many small records.

Of course, it may be that cloud storage is the wrong approach and a database would handle this requirement better. However, storage seems to scale very well (for parallel reads and writes) and would otherwise work well for long-term storage of bulk data that only needs to answer a very specific type of query (i.e. a general-purpose database seems like unnecessary overhead).

References

Objects: insert

A Case for Packing and Indexing in Cloud File Systems


Solution 1:

Objects are immutable and must be written sequentially from start to finish.

If you're looking for a way to do parallel uploads, you can use Object Composition or Multipart Uploads.
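For example, here is a rough sketch of the parallel-upload-then-compose pattern with the Python google-cloud-storage client (the bucket and object names are placeholders, not anything specific to your case):

    from google.cloud import storage

    client = storage.Client()
    bucket = client.bucket("my-bucket")

    # Upload the parts independently; these uploads could run in parallel workers.
    parts = []
    for i, chunk in enumerate([b"part-0 data", b"part-1 data", b"part-2 data"]):
        part = bucket.blob(f"staging/part-{i}")
        part.upload_from_string(chunk)
        parts.append(part)

    # Concatenate the parts into one object without re-uploading their data.
    final = bucket.blob("records.bin")
    final.compose(parts)  # at most 32 source objects per compose call

    # Optionally clean up the staged parts afterwards.
    for part in parts:
        part.delete()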

Solution 2:

Google Cloud Storage is the managed object storage service for Google Cloud Platform. Unlike block storage or file system storage, the objects it stores are immutable.

As mentioned in the official documentation:

Objects are immutable, which means that an uploaded object cannot change throughout its storage lifetime. An object's storage lifetime is the time between successful object creation, such as uploading, and successful object deletion. In practice, this means that you cannot make incremental changes to objects, such as append operations or truncate operations. However, it is possible to replace objects that are stored in Cloud Storage, and doing so happens atomically: until the new upload completes, the old version of the object is served to readers, and after the upload completes the new version of the object is served to readers. So a single replacement operation simply marks the end of one immutable object's lifetime and the beginning of a new immutable object's lifetime.

As a workaround, you can use Object Composition:

gsutil compose gs://bucket/source_obj1 [gs://bucket/source_obj2 ...] gs://bucket/composite_obj

This operation concatenates the contents of a number of objects in the same bucket under a new name (like cat file1 file2 > newfile) but without re-writing any data. So you could create a new object, upload the content you want to append into it, and then compose this new piece onto the end of your main object, as sketched below. However, there is a limit (currently 32) to the number of source components that can be composed in a single operation.
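A rough sketch of that append workflow with the Python google-cloud-storage client (the object names are placeholders, and error handling and generation preconditions are omitted):

    from google.cloud import storage

    def append_record(bucket_name, main_name, record_bytes):
        """Append record_bytes to an existing object by composing a temporary part onto it."""
        client = storage.Client()
        bucket = client.bucket(bucket_name)

        # 1. Upload the new piece as its own temporary object.
        part = bucket.blob(main_name + ".append")
        part.upload_from_string(record_bytes)

        # 2. Compose [main, part] back onto main; no object data is re-written.
        main = bucket.blob(main_name)
        main.compose([main, part])

        # 3. Delete the temporary part.
        part.delete()

Here the append is done one record at a time; several new parts could also be composed onto the main object in a single call, since one compose operation accepts up to 32 sources.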

You can explore Object Composition further in the Compose documentation.