How to repair a qcow2 disk image broken by a resize, for libvirt/KVM?

Solution 1:

Did you run "qemu-img resize diskimage.qcow2 +22GB" while the QEMU process was still running with the same disk open? If so, that would certainly explain the data corruption: two processes could have been writing to the qcow2 file at the same time, and if both writes required qcow2 metadata allocations, they could have corrupted the file's internal data structures.
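To avoid hitting this again, a guard along the following lines could refuse an offline resize while anything still has the image open. This is only a minimal sketch: image_in_use and safe_resize are hypothetical helper names, it assumes Linux with /proc mounted, and it can only see file descriptors of processes the current user is allowed to inspect, so it should be run as root to be meaningful.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if a visible process holds IMAGE open,
# 1 if none does, 2 if the path cannot be resolved.
# Only /proc entries readable by the current user are inspected.
image_in_use() {
    target=$(readlink -f "$1") || return 2
    for fd in /proc/[0-9]*/fd/*; do
        if [ "$(readlink "$fd" 2>/dev/null)" = "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Hypothetical wrapper: only call qemu-img resize when nothing has the
# image open.
safe_resize() {
    image_in_use "$1"
    case $? in
        0) echo "refusing to resize $1: image is open in another process" >&2
           return 1 ;;
        1) qemu-img resize "$1" "$2" ;;
        *) echo "cannot resolve $1" >&2
           return 2 ;;
    esac
}
```

Usage would be something like "safe_resize diskimage.qcow2 +22G" after shutting the guest down.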

The "qemu-img check" result looks bogus. In particular, tcmalloc is complaining that it can't allocate a 360 GB block of memory, and qemu-img appears to misinterpret this error as success, printing the misleading message "No errors found". This is a bug you should certainly report to QEMU.
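If you want something concrete to attach to a bug report, it may help to look at the raw header fields. One possible explanation for a huge allocation is an implausibly large value in the header (for example l1_size), but that is an assumption on my part, not something the output above confirms. The sketch below reads a few fields at the offsets given in the qcow2 specification; read_be and qcow2_header are hypothetical helper names.

```shell
#!/bin/sh
# read_be FILE OFFSET LENGTH: print LENGTH bytes at OFFSET as an unsigned
# big-endian decimal. Values above 2^63-1 would overflow shell arithmetic.
read_be() {
    hex=$(od -An -v -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n')
    echo $((0x$hex))
}

# Hypothetical helper: print the header fields most relevant to a resize.
qcow2_header() {
    # The qcow2 magic is the four bytes "QFI\xfb".
    magic=$(od -An -v -tx1 -N 4 "$1" | tr -d ' \n')
    if [ "$magic" != "514649fb" ]; then
        echo "not a qcow2 image: $1" >&2
        return 1
    fi
    echo "version=$(read_be "$1" 4 4)"
    echo "cluster_bits=$(read_be "$1" 20 4)"
    echo "virtual_size=$(read_be "$1" 24 8)"
    echo "l1_size=$(read_be "$1" 36 4)"
    echo "l1_table_offset=$(read_be "$1" 40 8)"
}
```

With 64 KiB clusters (cluster_bits=16), each L1 entry maps 512 MiB of guest data, so l1_size should be roughly virtual_size divided by 512 MiB. A value far beyond that would be worth mentioning to the maintainers.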

The 'convert' error looks like a follow-on from the same allocation failure that tcmalloc reported.

Unfortunately, I don't have any suggestions for fixing the problem; I was only going to recommend "qemu-img check -r" to try to repair it. Your best remaining chance is probably to mail qemu-devel and see if any of the qcow2 maintainers have suggestions.

Solution 2:

Treat qcow2 corruption like a hard drive with bad blocks.

Shut down the VM.

Then do:

modprobe nbd
qemu-nbd --connect=/dev/nbd0 diskimage.qcow2
ddrescue /dev/nbd0 new_diskimage.raw
qemu-nbd --disconnect /dev/nbd0
qemu-img convert -f raw -O qcow2 new_diskimage.raw new_diskimage.qcow2
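The steps above can also be wrapped in a small function so that the NBD device is always disconnected, even when ddrescue fails. This is a sketch rather than a tested recovery tool: rescue_image, the NBD_DEV variable and the ".map" file name are my own illustrative choices. It also differs from the plain commands in three ways: the image is attached with --read-only so nothing writes to the damaged file, ddrescue gets a mapfile so an interrupted run can resume, and the source format is stated explicitly with -f raw.

```shell
#!/bin/sh
# Sketch only: rescue_image SRC.qcow2 DEST_BASENAME
# Produces DEST_BASENAME.raw, DEST_BASENAME.map and DEST_BASENAME.qcow2.
# Must run as root; NBD_DEV is a hypothetical override for the device.
rescue_image() {
    src=$1
    out=$2
    dev=${NBD_DEV:-/dev/nbd0}

    modprobe nbd || return 1

    # --read-only keeps qemu-nbd from writing to the damaged image.
    qemu-nbd --read-only --connect="$dev" "$src" || return 1

    # The third argument is a mapfile, which lets ddrescue resume.
    ddrescue "$dev" "$out.raw" "$out.map"
    status=$?

    # Always detach the device, whatever ddrescue returned.
    qemu-nbd --disconnect "$dev"
    [ "$status" -eq 0 ] || return "$status"

    qemu-img convert -f raw -O qcow2 "$out.raw" "$out.qcow2"
}
```

For example: "rescue_image diskimage.qcow2 new_diskimage". Make sure the target filesystem has room for the full virtual size of the disk, since the raw copy is not compressed.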

Now try to boot and hope for the best. With luck it will get you into rescue mode, where you can run fsck on that disk.