Drive recovery: zipped image is much smaller than original drive

I have a 1TB SSD that got corrupted. It was the main drive for a Windows machine, but wasn't used very heavily, so I doubt that it failed due to hardware problems. I made a compressed image of the drive for recovery, using the command sudo dd if=/dev/sda bs=1M status=progress | bzip2 -9 > /mnt/dsk2.bz2, with a 300GB drive mounted on /mnt. The resulting compressed image is only 21GB in size, which suggests to me that most of the drive somehow got zeroed. dd showed 1.0TB copied and I have only one 1TB drive, so I'm sure that I was copying from the correct source.

The drive had a main partition of 930GB and a few small partitions <1GB in size. In Windows, the main partition appeared as RAW, and the system wouldn't boot.

Given this information, how should I proceed with recovery? Would something like ddrescue be likely to recover anything from an image like this or should I run recuperabit on the main partition despite the corruption? I don't care about restoring a full bootable image, just some of the files that were on the main partition.


The fact you got 21 GB compressed data from 1 TB read by dd means the stream from dd was highly compressible. Possible reasons:

  1. The space the filesystems considered empty compressed extremely well.

    It's possible the majority of sectors considered empty by the filesystems were read as zeros. Maybe because:

    • They were zeroed when the filesystems were created. In the case of NTFS this happens when the creation of the filesystem ("formatting") is performed without the "quick" option.
    • Or maybe they were zeroed later for some reason.
    • Or maybe the filesystems were trimmed on a regular basis. Some SSDs return zeros when reading from trimmed blocks. In general an SSD may return anything; even if not zeros, the "data" from trimmed blocks may compress well.
  2. Files in the filesystems compressed well.

    Some files (e.g. text files) compress well. Other files compress to some moderate degree. Files already highly compressed (e.g. media files) cannot be compressed more.

    If you had about 21 GB of incompressible data and the empty space compressed extremely well, then a result of 21 GB of compressed data wouldn't be a surprise. 30 GB of mixed data might be compressed to 21 GB. Compressing a Windows installation plus 15 GB of already compressed media (movies, mp3s) down to 21 GB is impossible, I think (one cannot compress arbitrary data arbitrarily well).

    Maybe you remember how much data you had and (roughly) how prone to compression it might be. This way you can estimate what percentage of your data may be in the image. Whatever estimate you get, treat it as optimistic; the reality is probably worse. It's very likely you have lost data due to the corruption you mentioned. This brings us to the next possible reason.

  3. Instead of good data you got "garbage" (because of corruption) that compressed extremely well.

    Possibly the SSD returned highly-compressible garbage (e.g. zeros) from blocks that should contain useful data, as if it had been trimmed too much.
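To see how much of the stream is literally zeros (as opposed to other well-compressing data), you can scan the compressed image without writing the decompressed data anywhere. This is only a minimal sketch: the function name zero_block_stats and the 1 MiB block size are my own choices, and decompressing 1 TB in Python will be slow.

```python
import bz2


def zero_block_stats(path, block_size=1024 * 1024):
    """Return (total_blocks, zero_blocks) for a bzip2-compressed image.

    The image is decompressed on the fly; nothing is written to disk.
    """
    total = zeros = 0
    with bz2.open(path, "rb") as stream:
        while True:
            block = stream.read(block_size)
            if not block:
                break
            total += 1
            # The block is all zeros if every byte in it is 0.
            if block.count(0) == len(block):
                zeros += 1
    return total, zeros
```

Calling zero_block_stats("/mnt/dsk2.bz2") gives two numbers; if nearly all blocks are zero, the 21 GB figure is explained by zeros, but the numbers alone cannot tell you whether those zeros were empty space or lost data.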

The image you have contains everything you were able to get from /dev/sda. Without messing with the firmware or the hardware of the SSD you most likely cannot get more, unless the SSD is "unstable" and sometimes returns meaningful data for addresses where other times it returns zeros. Taking another (uncompressed) image and comparing it to the current one may give some clues (and then maybe you will be able to merge the images). I think it's not totally impossible that the SSD is "unstable" this way, but I wouldn't count on it.
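If you do take a second image, a block-by-block comparison could look roughly like the sketch below. The names compare_images and open_image are hypothetical helpers of mine; the sketch assumes a path ending in .bz2 is bzip2-compressed and anything else is a raw image.

```python
import bz2


def open_image(path):
    """Open a raw image, or a bzip2-compressed one if the name ends in .bz2."""
    return bz2.open(path, "rb") if path.endswith(".bz2") else open(path, "rb")


def compare_images(path_a, path_b, block_size=1024 * 1024):
    """Return offsets of differing blocks, grouped by which image holds data."""
    result = {"only_a_has_data": [], "only_b_has_data": [], "both_differ": []}
    with open_image(path_a) as a, open_image(path_b) as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break
            if block_a != block_b:
                a_zero = block_a.count(0) == len(block_a)
                b_zero = block_b.count(0) == len(block_b)
                if a_zero and not b_zero:
                    result["only_b_has_data"].append(offset)
                elif b_zero and not a_zero:
                    result["only_a_has_data"].append(offset)
                else:
                    result["both_differ"].append(offset)
            offset += block_size
    return result
```

Blocks listed under only_a_has_data or only_b_has_data are the candidates for merging: one read returned zeros where the other returned something else.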

Probably the image you have not only contains everything you were able to get from /dev/sda; it contains everything you are or will be able to get from it, ever. The SSD may get worse; the image is the best you have.

A commercial data recovery service may or may not be able to get more data from the SSD by messing with the firmware or the hardware.


how should I proceed with recovery?

I don't know whether any file recovery utility supports recovering from a compressed image. You need to decompress the image. If you decompress to a filesystem that supports compression (e.g. Btrfs) and the resulting file is compressed on the filesystem level, then it will take less disk space than 1 TB. It will probably be significantly more than 21 GB (the filesystem won't be as good as bzip2 -9); still, the size on disk should be comparable.

Alternatively, write the decompressed image as a sparse file. Many filesystems support sparse files (ext4 does). You can force sparseness by piping to dd conv=sparse (but mind this: Why didn't dd conv=sparse save space as I expected?). Another method is to pipe to cp --sparse=always /proc/self/fd/0 /target/file. (Note that neither dd conv=sparse nor cp --sparse nor /proc is portable.) But this will save disk space only if the uncompressed image contains blocks of zeros. The high compression ratio you observed may or may not be because of zeros; it may be because of blocks of non-zeros that also compress extremely well but cannot be written as sparse.
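If neither dd conv=sparse nor cp --sparse is available, the same idea can be sketched in Python: decompress block by block and seek over all-zero blocks instead of writing them. This is an illustration under assumptions, not a tested recovery tool; decompress_sparse is a name I made up, and whether holes actually save space depends on the target filesystem.

```python
import bz2
import os


def decompress_sparse(src, dst, block_size=1024 * 1024):
    """Decompress bzip2 file src into dst, leaving holes for all-zero blocks.

    Returns (bytes_total, bytes_skipped).
    """
    total = skipped = 0
    with bz2.open(src, "rb") as stream, open(dst, "wb") as out:
        while True:
            block = stream.read(block_size)
            if not block:
                break
            if block.count(0) == len(block):
                # Skip forward instead of writing zeros; this creates a hole.
                out.seek(len(block), os.SEEK_CUR)
                skipped += len(block)
            else:
                out.write(block)
            total += len(block)
        # Fix the final size in case the image ends with skipped blocks.
        out.truncate(total)
    return total, skipped
```

The second returned number is the amount that was eligible to become holes. If it is small compared to the first, the image compressed well for some reason other than zeros and the sparse file will be nearly full size.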


Would something like ddrescue be likely to recover anything from an image like this?

Probably ddrescue won't help you at all. Its purpose is to get an image of a block device (or a regular file, or whatever seekable file) that is as close as possible to the original. cp or dd can also do it, unless a read error occurs. ddrescue is designed to deal with read errors (while other tools usually stop). Your dd did not encounter a formal error, therefore it worked. This job is done. The time to use ddrescue was then; you were kinda lucky you managed to do this with dd.

(On the other hand, ddrescue needs a seekable output file, so you cannot pipe it into bzip2 and it wouldn't have been possible to get a compressed image this way. I understand you explicitly wanted the image to be compressed. You could have made ddrescue write a sparse image though.)

Using ddrescue now on the (compressed or uncompressed) image will only give you another perfect copy of the image. You don't expect read errors when reading the image; the filesystem holding the image is healthy, right? So ddrescue would be like cp.

There is another use case for ddrescue. If you could mount a filesystem from the image and you found files in directories in it, then you could use cp to get them out. This cp might fail because of read errors caused by the filesystem being unhealthy. If cp fails, then ddrescue may be able to get more fragments of the troublesome file (or files, one by one). But only if you can mount.

You probably won't be able to easily mount the filesystem that had appeared as RAW. Tools like testdisk may be able to help you mount it. Remember that mounting the filesystem is not your goal.


or should I run recuperabit on the main partition despite the corruption?

Tools like photorec or foremost may be able to recover some files (or their fragments) even if the filesystem doesn't mount (see answers here: How do I recover lost/inaccessible data from my storage device?). I have no experience with recuperabit but it seems it's the right tool for the job and it may be better than other tools (even other tools combined) because it's tailored to NTFS.

Usually such a tool (and recuperabit for sure) expects its input as an image (a regular file), as a (healthy) block device the image was written to, or as the original block device. The original device is possibly unhealthy, therefore the usual procedure is to make a copy (image) once and work with it (or copies of it) anyway. The point is that the copy cannot be compressed; you need the uncompressed file.

And then yes, recuperabit is a good idea. I wouldn't expect much though. You have probably lost data; the lost data is not in the image, and it most likely cannot be retrieved from the SSD in a home environment, or ever. Maybe you will recover something, which is better than nothing; it's worth a try.
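Before a long run on the uncompressed image, a very rough way to gauge how much NTFS metadata survived is to count sectors that begin with the "FILE" signature used by NTFS file records. The sketch below is my own illustration and not how recuperabit works internally; count_file_signatures is a hypothetical name, the 512-byte sector alignment is an assumption, and ordinary data that happens to start a sector with "FILE" will be counted too.

```python
def count_file_signatures(path, sector_size=512, chunk_sectors=2048):
    """Count sectors in a raw image whose first four bytes are b"FILE"."""
    count = 0
    with open(path, "rb") as image:
        while True:
            # Read many sectors at once to keep the number of reads low.
            chunk = image.read(sector_size * chunk_sectors)
            if not chunk:
                break
            for start in range(0, len(chunk), sector_size):
                if chunk[start:start + 4] == b"FILE":
                    count += 1
    return count
```

A count near zero on an image of a drive that held a Windows installation would suggest the file records themselves are gone, in which case carving tools like photorec may be the only option left.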