Is running permanently in a VMware snapshot bad for performance?
I understand that the VMware KB frowns upon long-running snapshots, mainly for two reasons (as I read it):
Taking tons of snapshots can fill up the datastore. Snapshots are simply delta files. Let's say you have a 50 GB VMDK, nearly full, and you take a snapshot. After the snapshot you flip every single bit. Your delta file will also be about 50 GB. Snapshot again, flip the bits again, and you have another 50 GB delta file. These can get out of control fast.
Committing large snapshots carries risk. When consolidating snapshots you are writing the delta changes to the original VMDK. This takes time and carries the risk that if something happens you just nuked your VMDK.
Their warnings seem to make logical sense.
With that being said, is it inherently bad to run my machine permanently off of a snapshot VMDK? I want to make my tree the following:
- Base
- Snap1
- Snap 2
- You are here
- Snap1
Snap 1 and 2 will be taken immediately after installing and provisioning the base system. These are machines I plan to refresh frequently so I will simply make my tree look like the following:
- Base
- Snap1
- You are here
- Snap 2
- Snap1
Delete Snap2 and recreate Snap2.
I cannot see how this could have any negative implications, for the following reasons:
Since I simply installed a base image and took my deltas immediately after there is no way I could possibly fill up the data store. Assuming my base image is only 10 GB (on a 50 GB thin provisioned disk), even if my delta flipped every single bit the max my total usage could be is 60 GB (10 GB base VMDK which is locked + 50 GB of delta in the snapshot VMDK file). This assumes I do not create any further snapshots.
Since my use case does not call for consolidating the snapshots I do not risk errors upon consolidating my deltas. When I move back to Snap1 and delete Snap2, all of the delta that resided in Snap2 simply gets deleted.
The storage load is exactly the same, so I should be getting the same IOPS. I understand that some files (mainly system files) will exist on the original VMDK and others (everything after the base) will reside in the delta, but I don't see why ESXi would care. All the files are on the same physical datastore, so performance should be equivalent to reading everything from the original VMDK without snapshots.
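The worst-case storage figure from the first point can be sanity-checked with a quick sketch. This is only a rough model, not anything ESXi provides: the function name `worst_case_usage_gb` is made up for illustration, and it assumes every frozen layer (the base VMDK and any older deltas) stays at its current size, that the one writable delta can grow to at most the provisioned disk size, and that metadata overhead is negligible.

```python
from typing import Sequence


def worst_case_usage_gb(frozen_layers_gb: Sequence[float], provisioned_gb: float) -> float:
    """Rough upper bound on datastore usage for one virtual disk.

    frozen_layers_gb: current sizes of the read-only layers (base VMDK plus
                      any older delta files), which no longer grow.
    provisioned_gb:   provisioned size of the virtual disk, which is the most
                      the single writable delta can hold in this model.
    """
    if provisioned_gb < 0 or any(size < 0 for size in frozen_layers_gb):
        raise ValueError("sizes must be non-negative")
    return sum(frozen_layers_gb) + provisioned_gb


# My case: 10 GB base, snapshots taken right away, 50 GB provisioned disk.
my_case = worst_case_usage_gb([10], 50)

# The runaway case: full 50 GB base, one fully rewritten 50 GB delta,
# and a second delta that is being rewritten as well.
runaway_case = worst_case_usage_gb([50, 50], 50)

print(my_case, runaway_case)
```

Under those assumptions my layout tops out at 60 GB, while the repeated snapshot-and-rewrite pattern reaches 150 GB for the same 50 GB disk.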
Any thoughts? This is ESXi 5.5, with the datastore on RAID'd DAS.
I do not have a vCenter license, so templating and cloning are off the table.
RESULTS OF TEST
I got in early today to run some tests. Here are the results. There is a performance penalty, but I'm not sure why.
Before snapshotting: [benchmark screenshot not preserved]
After snapshotting: [benchmark screenshot not preserved]
Solution 1:
Yes, there are performance implications for long-running snapshots. There are even greater implications for consolidating delta VMDKs back to the original disk file. This can cause unresponsiveness in your VM's operating system or other undesirable behavior.
VMware has templating and cloning functionality built into vCenter. You need a $600 vSphere Essentials license to enable this.
You can create a VM to your taste, then clone it to a template. That template can then be used to generate new virtual machines from a "Golden Master" image.
This allows you to have a "clean state" but also create long-running or permanent VMs from that master image. No snapshots needed.
Solution 2:
ewwhite's answer is correct, but just to expand a bit more on the performance penalty, consider the following scenario:
You create a VM. A virtual read from the vmdk takes one physical disk read of the same size. Fairly straightforward.
Now imagine you take a snapshot of the VM. Now, for every virtual read, you're going to incur 2 physical reads, one from the base vmdk and one from the delta vmdk, because you need information from both to get the current state. You're now at twice the physical disk reads.
For two snapshots, you're doing three times the reads, and so on. If you have a lot of snapshots, you can see how this can be a fairly significant performance penalty. It doesn't necessarily translate into n-times worse performance (due to caching, sections that haven't been changed, etc.), but it's not a good practice.
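To make that reasoning concrete, here is a toy model of a read walking the snapshot chain. It is a sketch under simplifying assumptions, not how ESXi is actually implemented: each layer is a plain dictionary of block number to data, `layers_consulted` is a made-up helper, and every layer check is counted as one lookup even though real hosts can cache delta metadata and avoid some of that cost.

```python
from typing import Dict, List


def layers_consulted(chain: List[Dict[int, bytes]], block: int) -> int:
    """Count how many layers are checked to read one block.

    chain[0] is the base disk and chain[-1] is the newest delta. The read
    starts at the newest delta and falls back toward the base until it
    finds a layer that holds the block.
    """
    lookups = 0
    for layer in reversed(chain):
        lookups += 1
        if block in layer:
            return lookups
    raise KeyError(f"block {block} not present in any layer")


base = {0: b"a", 1: b"b", 2: b"c"}   # original VMDK holds every block
delta1 = {1: b"B"}                   # block 1 rewritten after the first snapshot
delta2 = {2: b"C"}                   # block 2 rewritten after the second snapshot
chain = [base, delta1, delta2]

for blk in (2, 1, 0):
    print(blk, layers_consulted(chain, blk))
```

In this model a block rewritten since the latest snapshot is found in one lookup, while an untouched block falls through every delta down to the base, which is the "three times the reads" worst case for two snapshots. Mixed workloads land somewhere in between, which fits the caveat that the penalty is not strictly n-times.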