Modify a large file, then roll back the changes, doing it (almost) in place

I'm recovering data from a damaged 500GB disk drive by copying an ext4 partition to a 500GB image file. The process is taking about 3 months of copying in total (yes, months), so I'm using dd to patiently fill the image file: first I dd a chunk to a temp file, then I put the chunk into the backup file, and so on.

The problem is that I want to access the partially filled image and recover some data before the backup process ends. I've mounted it read-only and used photorec and testdisk, and that works fine. But I also want to try fsck to (try to) repair the partition. After peeking at the data, I would like to roll back the fsck changes and resume the copying.

I know of tools like rsync, rdiff and git derivatives (bup, git-annex...) that could help, but I wonder if there is a way to do this in place, without taking up another 500GB for an indexed copy of the original data.

I don't want versioning capabilities. I don't want a backup of my file. The workflow would be something like:

  1. I have original_500GB_file.img -> 500GB of data
  2. I modify 2GB of the file. Say now I have modified_500GB_file.img and other auxiliary files -> less than 600GB of data (500 original + 2 modified + some metadata)
  3. When I'm done making changes, I roll back and get to point 1 again.

How can I achieve this? Would it be possible with BTRFS snapshot capabilities? (Unfortunately, I have the file on an NTFS partition.)

Thanks.


Solution 1:

The easiest way would actually be to use BTRFS or ZFS and their snapshot capabilities, yes. I haven't worked much with BTRFS (only ZFS so far), but the rollback should be no problem.

(I'm going to write ZFS-based, but it should work rather similarly for BTRFS)

Before you start the recovery process, you take a "snapshot" of your current file-system that holds the 500GB image.

Then you can copy all the data that you got during the recovery to some other location (not inside the same filesystem, otherwise it will be destroyed during the rollback!). Only the changes to the 500GB image will take up space. So if you change only 50GB, you would need a total of ~550GB inside the filesystem.

Once you are done with this partial recovery, you can do a "rollback" and reset the filesystem to the state it was in when you took the "snapshot".

Note that the snapshot/rollback mechanism always works on a complete filesystem, not on single files.
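The snapshot/rollback cycle described above could be sketched roughly like this. It assumes the image lives in a ZFS dataset, here hypothetically called tank/recovery, and uses before-fsck as an arbitrary snapshot name; neither name comes from the question. The run helper is also just an illustration: it only prints each command unless DRY_RUN=0 is set, because zfs rollback discards everything written to the dataset after the snapshot.

```shell
#!/bin/sh
# Hypothetical names: replace with your own pool/dataset and snapshot name.
DATASET="tank/recovery"
SNAP="before-fsck"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Step 1: freeze the current state of the dataset holding the image.
take_snapshot() { run zfs snapshot "$DATASET@$SNAP"; }

# Step 2 (after fsck and copying the recovered data elsewhere):
# return the dataset to the state recorded in the snapshot.
roll_back() { run zfs rollback "$DATASET@$SNAP"; }

# Optional: remove the snapshot once it is no longer needed.
drop_snapshot() { run zfs destroy "$DATASET@$SNAP"; }
```

Calling take_snapshot before the fsck run and roll_back afterwards (both with DRY_RUN=0 and sufficient privileges) would perform the cycle; with the default settings the functions only show what they would run.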

Edit:

I think some versions of NTFS also have something like a snapshot capability. Windows 7 should allow you to play with that, but from what I hear it was removed in Windows 8. If you right-click a file (the 500GB image), there should be some option saying "previous versions of this file".

Another, rather complicated, option: set up a virtual machine and put the 500GB file there. Virtual machine software (like VMware, VirtualBox, Virtual PC) also allows you to take snapshots (of the whole operating system image, including your 500GB) and roll them back. But this would require you to install another OS inside a virtual machine and all that...

Solution 2:

I found a good and easy solution to my problem. Slizzered's last paragraph about virtual machines gave me a hint. You can use the QEMU tools without actually having to run a virtual machine. I found the relevant information here and here.

First you have to create a copy-on-write (COW) file on top of your image. It uses your original_500GB_file.img as its base. The big file won't be edited because it's only opened read-only. The COW file starts out minimal in size and only grows when changes are made. Just what I needed:

$ qemu-img create -f qcow2 -b original_500GB_file.img disposable.qcow2

Formatting 'disposable.qcow2', fmt=qcow2 size=498000000000 backing_file='original_500GB_file.img' encryption=off cluster_size=65536 lazy_refcounts=off

$ ls -l disposable.qcow2

-rw-r--r-- 1 dertalai users 204288 abr 15 20:01 disposable.qcow2

Now you just have to virtualize the original_read-only + cow_writable pair into a single usable block device:

# modprobe nbd

# qemu-nbd -c /dev/nbd0 disposable.qcow2

/dev/nbd0 is now ready for use. You can fsck it or even mount it and do whatever you need. When you are done and want to roll back the changes, just make sure no process is still using the block device, disconnect it, and delete the COW file if you want:

# qemu-nbd -d /dev/nbd0

# rmmod nbd

$ rm disposable.qcow2
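For repeated fsck experiments, the steps above could be wrapped in two small functions. This is only a sketch under a few assumptions: the file and device names are the ones used in this answer, the run helper is made up for illustration and merely prints each command unless DRY_RUN=0 is set, and -F raw is added because newer QEMU releases expect the backing file format to be stated explicitly (raw is assumed here since the image was written with dd). The real commands need root.

```shell
#!/bin/sh
# Names taken from the answer above; adjust if yours differ.
BASE="original_500GB_file.img"
OVERLAY="disposable.qcow2"
NBD_DEV="/dev/nbd0"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Create the COW overlay and expose base + overlay as one block device.
attach_overlay() {
    # -F raw: backing format, assumed raw because the image is a dd copy.
    run qemu-img create -f qcow2 -b "$BASE" -F raw "$OVERLAY"
    run modprobe nbd
    run qemu-nbd -f qcow2 -c "$NBD_DEV" "$OVERLAY"
}

# Disconnect the device and throw away every change made through it.
discard_overlay() {
    run qemu-nbd -d "$NBD_DEV"
    run rm -f "$OVERLAY"
}
```

Between attach_overlay and discard_overlay you would run fsck on /dev/nbd0 or mount it. It is probably wise to call discard_overlay before resuming dd into the base image, since an overlay is only meaningful while its backing file stays unchanged.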