Ways to improve completeness of files for data recovery and scanning?

Solution 1:

Unless you had absolutely no writes to the device in question after files were deleted, there is always a chance that one or more files may be partially or fully unrecoverable.

If you are running Windows from this drive, it is constantly writing various things to disk (registry updates, logs), so the more time that passes, the greater the chance that a deleted file will be overwritten and become unrecoverable. The first rule of recovery is to remove the device in question and use an operating system running from a separate disk (or a recovery CD/flash drive) to examine it and perform recovery operations.

You could try other recovery tools, such as Piriform's Recuva, but it's likely the portion of the file that could not be recovered was overwritten by a later file written to disk.

Take this opportunity to develop a better backup plan to avoid needing to perform the recovery in the first place.

Solution 2:

I actually used all 3 tools mentioned in my answer to your previous query. I did end up with a whole bunch of partial files (sadly unavoidable), and since I was paranoid about the completeness of any given recovered file, I ended up with multiple copies of each, recovered by the different programs. Of course, some files were found by one program and not the others, but in general I spent more than two weeks sorting through the debris, comparing copies, keeping (mostly) the biggest of the multiple versions, and so on. For some files that were recovered only partially, I was able to track down the source and download the remaining portions, after a hash check or similar. All in all, I was able to recover roughly 80-85%, not of the total, but of the files I really wanted. I feel that was a very decent success rate, given that the drive in my case was physically failing (I even used the freezer trick in the end!). In another instance, when I did this for a friend who had accidentally formatted the wrong drive and then used the system for a few hours, I was able to recover around 90% of his files.
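A check like that can be as simple as comparing hashes of the recovered copy against a known-good copy. Here is a minimal sketch in Python; the file names are placeholders, not anything from my actual recovery:

```python
# Minimal sketch of the kind of hash check mentioned above: compare a
# recovered copy against a freshly downloaded original.
# Both file names are placeholders.
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("recovered_copy.pdf") == sha256_of("downloaded_original.pdf"):
    print("Recovered copy is complete and intact")
else:
    print("Recovered copy is damaged or partial -- keep the downloaded one")
```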

The process is going to take time and effort, and if you can, you absolutely should try out more than one program, since they don't all work the same way and their recovery algorithms differ. Talking specifically about R-Studio, while it has been a while, I do remember that I turned on the search for Known File Types, even though it added hours to the scan time. In this mode the program utilises knowledge of the internal format/structure of common file types to identify and recover as much of the file content as possible. I did manage to shave off a lot of the scan time, though, by going to the File Types dialog box (manual - pg. 32) and unchecking all those types that I knew I never had on my drive. This saved the program from wasting time searching for non-existent file types. I also made sure I saved the scan information, so that in subsequent runs I could simply load it and save hours on re-scanning. The other settings I tweaked were related to skipping bad blocks/unreadable areas of the drive, which I don't think is a problem in your case since your drive is otherwise fine.

Can't think of anything else I did for R-Studio specifically. I also went through all the settings for the other two programs and basically enabled any option that seemed likely to improve the scan quality, irrespective of the time it would take. Didn't bother with tech support; I know what they're like. In any case, once the programs were done, it was all manual work to sift through and salvage what I could from the results they returned.

Solution 3:

The problem you are encountering is due to fragmentation. When a file is written to disk, it is not always stored with all of its contents in a row. Whenever you delete a file, it frees up some blocks on disk, but there may be another file right after it, which leaves a gap of free space as big as the deleted file (rounded up to the nearest cluster size). If you then write a file that is bigger than the deleted one, it may get written to that gap, but only as much of it as fits, with the rest being written to the next available free space. This new file is now fragmented.
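To make that concrete, here is a toy sketch in Python of a disk as a list of clusters, with a simple first-fit allocator. All the sizes and cluster counts are made up for illustration; real file systems are more sophisticated, but the splitting effect is the same:

```python
# Toy model of a disk as a list of clusters; letters mark which file owns
# each cluster, None marks free space. All sizes here are made up.
disk = ["A"] * 4 + ["B"] * 6 + ["C"] * 5 + [None] * 10

# Deleting file B frees its 6 clusters, leaving a gap between A and C.
disk = [None if c == "B" else c for c in disk]

def write_file(disk, name, clusters_needed):
    """Simple first-fit allocation: use whichever free clusters come first."""
    placed = []
    for i, cluster in enumerate(disk):
        if clusters_needed == 0:
            break
        if cluster is None:
            disk[i] = name
            placed.append(i)
            clusters_needed -= 1
    return placed

# File D needs 9 clusters: 6 fit in B's old gap, the other 3 end up after C,
# so D is now stored in two separate pieces.
print(write_file(disk, "D", 9))   # [4, 5, 6, 7, 8, 9, 15, 16, 17]
```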

Most people think of disk fragmentation as being bad for performance, but the truth is, it is far more problematic and troublesome for data recovery. When you lose a file and need to recover it, if the file is not fragmented, then all you need to do is find the beginning of the file and know its size; then you can simply copy the appropriate number of clusters. This is made easier by file signatures, which are usually stored in the header (i.e., at the start of the file). Therefore, all you have to do is scan the disk, looking for patterns that indicate the beginning of a file, and then copy a certain number of blocks to recover it (usually resulting in some extraneous junk at the end of the file, but that's better than nothing).
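That signature-scanning idea is roughly what carving tools do. As a rough illustration only, a stripped-down version in Python might look like the following; the image name, cluster size, and fixed carve length are assumptions for the demo, not settings from R-Studio or any particular recovery tool:

```python
# Very simplified signature carving against a raw image file "disk.img",
# assuming a 4 KiB cluster size and a fixed 16 MiB carve length.
CLUSTER = 4096
PDF_MAGIC = b"%PDF-"                   # PDF files begin with this signature
CARVE_BYTES = 16 * 1024 * 1024         # copy this much after each signature

def carve_pdfs(image_path="disk.img"):
    with open(image_path, "rb") as img:
        data = img.read()              # fine for a demo; real tools stream
    recovered = []
    for offset in range(0, len(data), CLUSTER):
        # Files start on cluster boundaries, so only check those positions.
        if data.startswith(PDF_MAGIC, offset):
            out_name = f"carved_{offset:#x}.pdf"
            with open(out_name, "wb") as out:
                # The tail may be junk from whatever is stored after the file.
                out.write(data[offset:offset + CARVE_BYTES])
            recovered.append(out_name)
    return recovered

print(carve_pdfs())
```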

If the file is fragmented, however, it becomes much more difficult to recover, because without the file-system information telling you where each chunk of the file is stored, there is no way to know which clusters belong to which file. You may be able to find a file and get part of it, for example the first 15MB that happen to be stored in a row, but the last 1MB may be stored somewhere else, and there's no way to know where.

If you are trying to recover a text file, you may be able to manually locate the separate pieces scattered around the disk and stitch them back together, but even that is difficult if you happened to be editing the file and saved it several times, making changes each time. How do you know whether the next piece is from the latest version of the file or from an earlier one? It is possible, but quite time-consuming. Most binary files, on the other hand, are flat out impossible to recover if they are fragmented (although you could find pieces of certain types of binary files, like MP3s, in which even fragments can be "viewed").
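If you do go down the manual route for a text file, finding candidate pieces is the easy part; deciding which piece belongs to which version is the hard part. A minimal sketch, assuming you have a raw image of the disk and remember a phrase from the document (the image name and phrase are placeholders):

```python
# Sketch of hunting for pieces of a lost text file in a raw disk image by
# searching for a phrase you remember from it. Surrounding context is printed
# so you can judge which version or fragment each hit belongs to.
PHRASE = b"quarterly sales figures"    # placeholder phrase

with open("disk.img", "rb") as img:
    data = img.read()

offset = data.find(PHRASE)
while offset != -1:
    context = data[max(0, offset - 80):offset + 200]
    print(hex(offset), context.decode("utf-8", errors="replace"))
    offset = data.find(PHRASE, offset + 1)
```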

Keeping your disk defragmented makes data recovery much easier. Unfortunately, due to the limited write endurance of SSDs, many people are now defragmenting less, which puts data at more risk of being permanently lost.


Let’s say that your files are stored on disk as shown below. The purple clusters are where your file is stored: the first 15MB of the file are stored in a row, but the last 1MB is stored separately, earlier on the disk. The yellow line marks the beginning of the file, where the PDF signature was found. The program found the signature and identified a PDF file, but was only able to copy the first 15MB before running into another file. It has no way to know where the last 1MB is located, because the cluster chain is, or rather was, stored only in the file-system, and that information was lost when the file was deleted.

[Figure: cluster map of the disk, with the file’s clusters shown in purple and the PDF signature location marked in yellow]