libvirt: virtual drives gone after snapshot - VM still running
Edit: see the solution to my problem below in UPDATE2.
After playing around with a backup script, I ended up in the following situation:
virsh domblklist polar-bear
Target   Source
------------------------------------------------
vda      /home/user/vms/polarbear.backup
vdb      /home/user/vms/polarbear_data.backup
However, these files do not actually exist at this place:
ls -al /home/user/vms/*.backup
ls: cannot access '/home/user/vms/*.backup': No such file or directory
The VM runs fine though, and it even rebooted without any issue. But I'm a bit afraid I'm heading for some horrific scenario soon...
I guess something went wrong when the snapshots were deleted?
Currently no snapshots are configured for this machine.
virsh snapshot-list polar-bear
Name   Creation Time   State
------------------------------------------------------------
What should I do?
UPDATE
After following the solution from DanielB below, I now have two VMs. The original one is paused, while the newly created one runs on the recovered images plus copies of the old images. The XML of polar-bear2 now looks as follows:
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/mnt/sda3/polarbear.backup_recover'/>
  <backingStore type='file' index='1'>
    <format type='raw'/>
    <source file='/home/user/vms/polarbear_2.img'/>
    <backingStore/>
  </backingStore>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/mnt/sda3/nextcloud_data.backup_recover'/>
  <backingStore type='file' index='1'>
    <format type='raw'/>
    <source file='/home/user/vms/nextcloud_data_2.img'/>
    <backingStore/>
  </backingStore>
  <target dev='vdb' bus='virtio'/>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
At least this seems okay now:
virsh domblklist polar-bear2
Target   Source
------------------------------------------------
vda      /mnt/sda3/polarbear.backup_recover
vdb      /mnt/sda3/nextcloud_data.backup_recover
The nextcloud_data.backup_recover image is only ~2.5 GB in size, whereas the original /home/user/vms/nextcloud_data_2.img is 16 GB.
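If I understand it correctly, the *.backup_recover file is a qcow2 overlay that only holds the blocks written since the snapshot was taken, while the rest still lives in the backing image - which would explain the size difference. A way to double-check the chain (just a sketch; -U/--force-share may be needed on newer QEMU because the running guest holds a lock on the image):
qemu-img info --backing-chain -U /mnt/sda3/nextcloud_data.backup_recover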
So... how do I get this back to one image?
UPDATE2 - solution?
So after examining the XML in the step described above, I concluded I should search the internet for "backingStore". I came across this article:
And after some hesitation I figured I had backups now and could give it a try. So I ran:
virsh blockcommit polar-bear2 vda --verbose --pivot --active
virsh blockcommit polar-bear2 vdb --verbose --pivot --active
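For anyone trying the same: while such a commit is running, the progress can also be watched from a second terminal - just a sketch:
virsh blockjob polar-bear2 vda --info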
Now it looks like this:
virsh domblklist polar-bear2
Target   Source
------------------------------------------------
vda      /home/user/vms/polarbear.img
vdb      /home/user/vms/nextcloud_data.img
And the XML now looks like this:
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/home/user/vms/polarbear.img'/>
  <backingStore/>
  <target dev='vda' bus='virtio'/>
  <alias name='virtio-disk0'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/home/user/vms/nextcloud_data.img'/>
  <backingStore/>
  <target dev='vdb' bus='virtio'/>
  <encryption format='luks'>
  </encryption>
  <alias name='virtio-disk1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</disk>
So I guess the "pivoting" was what I wanted to do...
However, one thing that I find odd: when I did this blockcommit, it merged the data back into
/home/user/vms/polarbear.img
However, in the XML that I had adapted to create the 2nd VM, I had put
/home/user/vms/polarbear_2.img
So I would have expected it to be merged back into that image.
I guess libvirt has some UUID to keep track of disk images and hence used the original one, maybe...
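Another possible explanation (just a guess): a qcow2 overlay also records the path of its backing file in its own header, and the recovered overlay may still have pointed at the original /home/user/vms/polarbear.img there. That can be checked with something like this (a sketch; -U because the guest holds a lock while running):
qemu-img info -U /mnt/sda3/polarbear.backup_recover
If the "backing file:" line there shows /home/user/vms/polarbear.img, that would explain where the data was merged to.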
Whatever - I now did the same for the original polar-bear VM, and I think it works.
Under no circumstances allow the VM to be shut down yet, as your disks will likely be irretrievably lost if that happens.
You can recover the deleted disks' data by copying it from the live file descriptors that QEMU holds open for as long as it is running.
You need to find the process ID of the QEMU process first of all.
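For example (a sketch - the exact process name varies between distros, but the guest name shows up on the QEMU command line):
# pgrep -af qemu | grep polar-bear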
Then look in /proc/$PID/fd/ and you should see some symlinks that correspond to your disk image(s) and have the word "(deleted)" in them.
You should be able to use these to read the data from the deleted file.
E.g. in my case the QEMU process is PID 253575, and FD '15' corresponds to my deleted disk:
# ls -al /proc/253575/fd | grep deleted
lrwx------. 1 qemu qemu 64 Oct 19 10:21 15 -> /var/lib/libvirt/images/demo.qcow2 (deleted)
# dd if=/proc/253575/fd/15 of=safe.img bs=1M
1+1 records in
1+1 records out
1376256 bytes (1.4 MB, 1.3 MiB) copied, 0.00419377 s, 328 MB/s
Before copying the image like this, it is worth logging into your VM and at least running "sync" to flush all pending data out to disk.
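If the qemu-guest-agent is installed and configured in the guest, you can instead quiesce the filesystems from the host for the duration of the copy (a sketch - note the guest's I/O will stall while frozen, so keep it short):
# virsh domfsfreeze polar-bear
... copy the image(s) ...
# virsh domfsthaw polar-bear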
You'll need to do this recovery for each disk in your VM.
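If there are several deleted FDs, a small loop can copy them all in one go (a sketch, run as root; /mnt/recovery is a placeholder for any location with enough free space - and check the ls output first so you only copy the disks you actually care about):
PID=253575                # the QEMU process ID found above
cd /mnt/recovery          # placeholder - anywhere with enough space
for fd in $(ls -l /proc/$PID/fd | awk '/deleted/ {print $9}'); do
    dd if=/proc/$PID/fd/$fd of=disk-fd$fd.img bs=1M
done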
Still do NOT shut down your running VM yet.
Now copy the XML config for your guest and create a new one
$ virsh dumpxml polar-bear > polar-bear.xml
$ vi polar-bear.xml
....change the disk image paths and change the name, UUID...
$ virsh define polar-bear.xml
Now try to boot this new guest and make sure all your data is present and correct.
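i.e. something like this (assuming you called the copy polar-bear2):
$ virsh start polar-bear2
$ virsh domblklist polar-bear2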
Once you have verified that everything is good, you can consider shutting down your original guest with the deleted disks.
I would suggest trying this whole process on a non-production demo VM first so you are comfortable with it, e.g. spin up a new VM on your laptop, delete its disk, and then try to recover it. Once you have tested that it works as expected, try the recovery on your real, important VM.
If you're extra paranoid though, you might want to log in to the original guest and rsync any data out to somewhere safe as a second backup.
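For example, from inside the guest (a sketch - the data path and the backup host are placeholders):
$ rsync -a /srv/important-data/ backup-host:/backups/polar-bear/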