Google Compute Engine: what is the difference between a disk snapshot and a disk image?

A snapshot reflects the contents of a persistent disk at a specific point in time. An image is the same thing, but it includes an operating system and boot loader and can be used to boot an instance.

Both images and snapshots can be public or private. For images, "public" covers both the official images provided by Google and images that other users have made public.

Snapshots are stored as diffs (each snapshot is stored relative to the previous one, though that is transparent to you), while images are not. Snapshots are also cheaper ($0.03 per GB/month vs. $0.085 per GB/month for images).
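
To make the price gap concrete, here is a rough back-of-the-envelope calculation in Python. The per-GB rates are the ones quoted above and may have changed since; the 200 GB figure and the function name are made up for illustration.

```python
# Rates quoted in the answer above (USD per GB per month); check current pricing.
SNAPSHOT_RATE = 0.03
IMAGE_RATE = 0.085


def monthly_cost(stored_gb: float, rate_per_gb: float) -> float:
    """Return the monthly storage cost in USD, rounded to cents."""
    return round(stored_gb * rate_per_gb, 2)


# Hypothetical example: 200 GB of stored data.
snapshot_cost = monthly_cost(200, SNAPSHOT_RATE)
image_cost = monthly_cost(200, IMAGE_RATE)
print(f"snapshot: ${snapshot_cost:.2f}/month, image: ${image_cost:.2f}/month")
```

For 200 GB this works out to roughly $6 versus $17 per month. Since snapshots are stored as diffs, the amount actually billed for a chain of snapshots can be lower than the full disk size suggests.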

These days the two concepts are quite similar. It's now possible to start an instance from a snapshot instead of an image, which is an easy way to resize your boot partition. Using snapshots may be simpler in most cases.
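
As a rough sketch of the resize trick, the Python snippet below only assembles the two gcloud commands involved (create a larger disk from a snapshot, then boot an instance from it) and prints them; it doesn't call any API. All resource names, the zone, and the size are placeholders, and the flags are written from memory of the gcloud CLI, so check `gcloud compute disks create --help` before relying on them.

```python
import shlex


def resize_boot_disk_commands(snapshot, new_disk, instance, zone, size_gb):
    """Build (but do not run) gcloud commands to boot from a resized snapshot copy."""
    create_disk = [
        "gcloud", "compute", "disks", "create", new_disk,
        f"--source-snapshot={snapshot}",
        f"--size={size_gb}GB",  # may be larger than the original disk
        f"--zone={zone}",
    ]
    create_instance = [
        "gcloud", "compute", "instances", "create", instance,
        f"--disk=name={new_disk},boot=yes",  # attach the new disk as the boot disk
        f"--zone={zone}",
    ]
    return [create_disk, create_instance]


# Placeholder names, chosen only for illustration.
commands = resize_boot_disk_commands(
    snapshot="my-snapshot", new_disk="bigger-boot-disk",
    instance="my-instance", zone="us-central1-a", size_gb=50,
)
for cmd in commands:
    print(shlex.join(cmd))
```

Depending on the OS, the filesystem may still need to be grown inside the guest after the first boot.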


Snapshots:

  • Good for backup and disaster recovery
  • Lower cost than images
  • Smaller size than images since they don't contain the OS, etc.
  • Differential backups - only the data changed since the last snapshot is stored
  • Faster to create than images
  • Originally only available in the project where they were created (it is now possible to share them between projects)
  • Can be created from disks even while they are attached to running instances
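
The "differential" point is easier to see with a toy model. The Python sketch below is not how Compute Engine implements snapshots (nothing here describes that); it just treats a disk as a dict of block number to contents and stores, per snapshot, only the blocks that changed since the previous one. Restoring replays the chain of diffs. Deleted blocks are ignored to keep it short.

```python
def take_snapshot(disk, previous_state):
    """Return only the blocks that differ from the previously snapshotted state."""
    return {
        block: data
        for block, data in disk.items()
        if previous_state.get(block) != data
    }


def restore(snapshot_chain):
    """Rebuild the disk contents by applying each diff in order."""
    disk = {}
    for diff in snapshot_chain:
        disk.update(diff)
    return disk


disk = {0: "boot", 1: "logs-v1", 2: "data-v1"}
snap1 = take_snapshot(disk, previous_state={})  # first snapshot: every block

disk_at_snap1 = dict(disk)
disk[1] = "logs-v2"  # only one block changes
snap2 = take_snapshot(disk, previous_state=disk_at_snap1)

print(len(snap1), len(snap2))
print(restore([snap1, snap2]) == disk)
```

The second snapshot holds a single block, yet the chain still restores the full disk, which is why later snapshots tend to be small and quick to create.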

Images:

  • Good for reusing the state of a Compute Engine instance when creating new instances
  • Available across different projects
  • Can't be created for running instances (unless you use the --force flag)
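
For reference, a sketch of what the --force case might look like. The flags are reproduced from memory of the gcloud CLI, so treat them as assumptions to verify with `gcloud compute images create --help`; the image, disk, and zone names are placeholders. The snippet only builds and prints the command.

```python
import shlex


def image_from_disk_command(image, disk, zone, force=False):
    """Build (but do not run) a gcloud command that creates an image from a disk."""
    cmd = [
        "gcloud", "compute", "images", "create", image,
        f"--source-disk={disk}",
        f"--source-disk-zone={zone}",
    ]
    if force:
        # Used when the source disk is still attached to a running instance.
        cmd.append("--force")
    return cmd


# Placeholder names, for illustration only.
forced = image_from_disk_command("my-image", "my-disk", "us-central1-a", force=True)
print(shlex.join(forced))
```

Stopping the instance first is the safer route, since an image taken from a disk that is still being written to may not be consistent.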

Snapshots primarily target backup and disaster recovery scenarios. They are cheaper and easier to create (they can often be taken without stopping the VM). They are meant for frequent, regular uploads and rare downloads.

Images are primarily meant for boot disk creation. They are optimized for downloading the same data over and over: if the same image is downloaded many times, every download after the first is very fast (even for large images).

Images do not have to be used exclusively for boot disks. They can also be used for data that needs to be made quickly available to a large set of VMs (in a scenario where a shared read-only disk doesn't satisfy the requirements for whatever reason).