Which image format is better to use as a base image for other VMs, raw or qcow2?
I am using a base image and creating many VMs from it. Now I want to know which format is better for the base image, qcow2 or raw. Also, is there any advantage to using a base image instead of cloning the whole disk? Speed can be one factor, but in terms of efficiency, is there any problem with using a base image and then creating VMs from it?
Edit 1:
I performed some experiments with three setups.
In the first, both the base image and the overlay are qcow2. In the second, the base image is raw but the overlay is qcow2. In the third, I give each VM its own individual raw disk image. Surprisingly, the last case is much more efficient than the other two.
Experimental setup:
- OS in base image: Ubuntu Server 14.04 64-bit
- Host OS: Ubuntu 12.04 64-bit
- RAM: 8 GB
- Processor: Intel® Core™ i5-4440 CPU @ 3.10GHz × 4
- Disk: 500 GB
On the x-axis: number of VMs booted simultaneously, from 1 up to 15.
On the y-axis: total time to boot x machines.
From the graphs, it seems that giving each VM its own full disk image is much more efficient than the other two methods.
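The measurement described above (x VMs started at once, total time until all are up) can be sketched as a small harness. The question does not say how a finished boot was detected, so `boot_vm` here is a hypothetical callback you would supply (for example, one that starts VM i and blocks until its SSH port answers); the demo uses a fake boot that just sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def time_simultaneous_boots(boot_vm: Callable[[int], None], n: int) -> float:
    """Start n boots at once and return wall-clock seconds until all have finished."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=n) as pool:
        # list() consumes the results, so an exception from boot_vm is re-raised here.
        list(pool.map(boot_vm, range(n)))
    return time.monotonic() - start


def sweep(boot_vm: Callable[[int], None], max_vms: int = 15) -> Dict[int, float]:
    """x = number of VMs booted simultaneously (1..max_vms), y = total boot time."""
    return {n: time_simultaneous_boots(boot_vm, n) for n in range(1, max_vms + 1)}


if __name__ == "__main__":
    # Hypothetical stand-in for a real boot; it just takes 0.1 s.
    def fake_boot(i: int) -> None:
        time.sleep(0.1)

    for n, secs in sweep(fake_boot, max_vms=3).items():
        print(f"{n} VMs: {secs:.2f} s")
```

Because all boots in one data point share the same disk, the curve mostly reflects how each storage layout copes with concurrent reads.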
Edit 2:
This is for the case where each VM gets its own individual raw image. After flushing the cache, the result is almost the same as for the raw base image + qcow2 overlay.
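Data left in the host page cache (for example, image files that were just copied) can make later boots read from RAM instead of disk, which presumably is why the results changed once the cache was flushed. A minimal sketch of the usual Linux flush between runs, assuming root privileges; `drop_page_cache` is a hypothetical helper name.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(dry_run: bool = False) -> str:
    """Flush dirty pages to disk, then ask the Linux kernel to drop clean caches.

    Writing "3" frees the page cache plus dentries and inodes. Needs root.
    """
    os.sync()  # write dirty pages first; drop_caches only discards clean ones
    if not dry_run:
        with open(DROP_CACHES, "w") as f:
            f.write("3\n")
    return DROP_CACHES
```

Calling it before each point of the sweep makes every run start from a cold cache.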
Thanks.
For your specific use case (base image + qcow2 overlay), the RAW format should be preferred for the base image:
- It's faster: since it has no associated metadata, it is as fast as possible. Qcow2, on the other hand, has two layers of indirection (the L1 and L2 tables) that must be traversed before reaching the actual data.
- Since the overlay must be a Qcow2 file, you don't lose the ever-useful snapshot capability (RAW images don't support snapshots by themselves).
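To make the two layers of indirection concrete: qcow2 splits a guest offset into an L1 index, an L2 index and an offset inside a cluster, and each index selects a table entry that must be looked up before the data can be read. This is a toy model, assuming the default 64 KiB clusters and ignoring refcounts, compression, the flag bits stored in real table entries and the L2 cache QEMU keeps in RAM. A raw image needs no lookup at all: host offset equals guest offset.

```python
from typing import Dict, List, Optional, Tuple

CLUSTER_BITS = 16  # default qcow2 cluster size: 64 KiB


def split_guest_offset(offset: int, cluster_bits: int = CLUSTER_BITS) -> Tuple[int, int, int]:
    """Return (L1 index, L2 index, offset within cluster) for a guest offset."""
    l2_bits = cluster_bits - 3  # 8-byte entries, so one cluster holds 2**(cluster_bits-3)
    l1_index = offset >> (l2_bits + cluster_bits)
    l2_index = (offset >> cluster_bits) & ((1 << l2_bits) - 1)
    in_cluster = offset & ((1 << cluster_bits) - 1)
    return l1_index, l2_index, in_cluster


def translate(offset: int, l1_table: List[Optional[Dict[int, int]]],
              cluster_bits: int = CLUSTER_BITS) -> Optional[int]:
    """Toy lookup: each L1 slot is an L2 table (l2_index -> host cluster offset) or None.

    None means "not allocated in this file": an overlay would then fall back to
    its backing file (the base image), otherwise the data reads as zeros.
    """
    l1_index, l2_index, in_cluster = split_guest_offset(offset, cluster_bits)
    if l1_index >= len(l1_table) or l1_table[l1_index] is None:
        return None  # first indirection missed
    host_cluster = l1_table[l1_index].get(l2_index)
    if host_cluster is None:
        return None  # second indirection missed
    return host_cluster + in_cluster


if __name__ == "__main__":
    guest = (1 << 29) + 3 * 65536 + 100  # 512 MiB + 3 clusters + 100 bytes
    print(split_guest_offset(guest))                 # (1, 3, 100)
    print(translate(guest, [None, {3: 0x50000}]))    # 0x50000 + 100
```

With 64 KiB clusters each L2 table covers 512 MiB of guest space, which is why the L1 index in the example is 1.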
The choice between base image + qcow2 overlay and multiple full copies depends on your priorities:
- For absolute performance, use fallocated RAW images. The downside is that they don't support snapshots, which in most environments is too high a price to pay.
- For flexibility and space efficiency, use RAW base images + Qcow2 overlays.
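A "fallocated" RAW image is a plain file whose blocks are all reserved up front, so the guest never triggers block allocation on the host while writing. From the shell, the util-linux fallocate(1) tool does this; the sketch below does the same from Python with os.posix_fallocate (Unix only). The function name and file name are hypothetical.

```python
import os


def create_preallocated_raw(path: str, size: int) -> int:
    """Create a raw image with all `size` bytes reserved; return bytes allocated on disk."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    try:
        os.posix_fallocate(fd, 0, size)  # reserve [0, size) without writing data
    finally:
        os.close(fd)
    return os.stat(path).st_blocks * 512  # st_blocks counts 512-byte units


if __name__ == "__main__":
    allocated = create_preallocated_raw("vm1-prealloc.img", 64 << 20)  # 64 MiB
    print(f"allocated: {allocated} bytes")
```

Compared with a sparse file of the same size, the allocated figure here matches the apparent size from the start.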
That said, I have found Qcow2 files somewhat fragile.
For my production KVM hypervisors I basically use two different setups:
- Where performance is the #1 priority, I use LVM volumes attached directly to the virtual machines, and I use the LVM snapshot capability to take consistent backups.
- Where I can sacrifice some performance for more flexibility, I use a single big LVM thin-provisioned volume + XFS + RAW images.
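Both setups come down to a few lvcreate calls. The sketch below only builds the command lines, so nothing is executed; the volume group "vg0", the names "vm1", "pool" and "images", and the sizes are hypothetical. Note the difference between the two snapshot types: a classic snapshot needs a fixed-size copy-on-write area (--size), while a thin snapshot omits it and shares blocks with its origin inside the pool.

```python
from typing import List

VG = "vg0"  # hypothetical volume group name


def classic_snapshot_cmd(lv: str, cow_size: str, name: str) -> List[str]:
    """Snapshot of a regular LV (first setup), e.g. for a backup window."""
    return ["lvcreate", "--snapshot", "--size", cow_size, "--name", name, f"{VG}/{lv}"]


def thin_pool_cmd(size: str, pool: str) -> List[str]:
    """Thin pool that thin volumes allocate from on demand."""
    return ["lvcreate", "--type", "thin-pool", "--size", size, "--name", pool, VG]


def thin_volume_cmd(pool: str, virtual_size: str, name: str) -> List[str]:
    """Thin volume: reports virtual_size but consumes pool space only when written."""
    return ["lvcreate", "--thin", "--virtualsize", virtual_size, "--name", name, f"{VG}/{pool}"]


def thin_snapshot_cmd(lv: str, name: str) -> List[str]:
    """Thin snapshot: no --size, blocks are shared with the origin in the pool."""
    return ["lvcreate", "--snapshot", "--name", name, f"{VG}/{lv}"]


def mkfs_xfs_cmd(lv: str) -> List[str]:
    """XFS filesystem that will hold the RAW image files (second setup)."""
    return ["mkfs.xfs", f"/dev/{VG}/{lv}"]


if __name__ == "__main__":
    for cmd in (
        classic_snapshot_cmd("vm1", "10G", "vm1-backup"),  # setup 1: LV per VM
        thin_pool_cmd("400G", "pool"),                     # setup 2: thin pool
        thin_volume_cmd("pool", "1T", "images"),           # one big thin volume
        mkfs_xfs_cmd("images"),                            # XFS on top, RAW files inside
        thin_snapshot_cmd("images", "images-snap"),        # cheap thin snapshot
    ):
        print(" ".join(cmd))
```

For a backup with the first setup, you would create the snapshot, copy it, and then remove it with lvremove, keeping its lifetime short.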
Another possibility is to use a normal LVM volume + XFS + RAW images. The only downside is that normal (non-thin) LVM snapshots are very slow, and snapshotting a busy normal LVM volume will kill performance for the lifetime of the snapshot. Still, if you plan to use snapshots only sporadically, this can be the simpler and safer bet.
Some references:
- KVM I/O slowness on RHEL 6
- KVM storage performance and Qcow2 preallocation on RHEL 6.1 and Fedora 16
- KVM storage performance and cache settings on Red Hat Enterprise Linux 6.2
- LVM thin volume explained
Please be advised: if you are using Linux, you can use raw and get the same benefit as qcow2 as far as size goes.
https://docs.fedoraproject.org/en-US/Fedora/18/html/Virtualization_Administration_Guide/sect-Virtualization-Tips_and_tricks-Using_qemu_img.html
raw: Raw disk image format (default). This can be the fastest file-based format. If your file system supports holes (for example in ext2 or ext3 on Linux or NTFS on Windows), then only the written sectors will reserve space. Use qemu-img info to obtain the real size used by the image or ls -ls on Unix/Linux.
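The "holes" behaviour quoted above is easy to observe: extending an empty file with truncate creates a sparse file whose apparent size is large while almost nothing is allocated, and each write allocates only the blocks it touches. This is what qemu-img info reports as "virtual size" versus "disk size". A minimal sketch, assuming a filesystem that supports holes (ext4, XFS and tmpfs do); the helper names and file name are hypothetical.

```python
import os
from typing import Tuple


def create_sparse_raw(path: str, size: int) -> None:
    """Create a raw image of `size` bytes with no data blocks allocated (one big hole)."""
    with open(path, "wb") as f:
        f.truncate(size)


def write_at(path: str, offset: int, data: bytes) -> None:
    """Simulate a guest write: only the touched blocks get allocated."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def sizes(path: str) -> Tuple[int, int]:
    """Return (apparent size, bytes actually allocated), like ls -l vs ls -s."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512  # st_blocks counts 512-byte units


if __name__ == "__main__":
    create_sparse_raw("sparse-demo.img", 1 << 30)          # 1 GiB apparent size
    write_at("sparse-demo.img", 1 << 29, b"x" * 4096)      # one 4 KiB write
    apparent, allocated = sizes("sparse-demo.img")
    print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```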