Is there any way to recover deleted files which may have since been overwritten?

Solution 1:

If the data has actually been overwritten it cannot be recovered.

Here is the 1996 paper that originated the claims that it is possible: Peter Gutmann: "Secure Deletion of Data from Magnetic and Solid-State Memory".

To be fair, Dr. Gutmann notes in his first epilogue to the paper:

Another point that a number of readers seem to have missed is that this paper doesn't present a data-recovery solution but a data-deletion solution. In other words it points out in its problem statement that there is a potential risk, and then the body of the paper explores the means of mitigating that risk.

In other words, the paper was not claiming to prove that recovery of overwritten data was possible. He was only showing reasons to believe that it was plausible, and proposing overwrite patterns that should be sufficient to preclude such recovery. (Although, since he apparently never actually demonstrated any recovery of overwritten data, he could not have tested those patterns, so their efficacy remains as unproven as their necessity.)

However, the papers linked below show that the risk is far, far smaller than claimed.

(I feel compelled to mention that Dr. Gutmann is also famous for claiming that Windows Vista's anti-piracy and DRM enforcement features would use so much energy as to contribute noticeably to global warming. George Ou: "Claim that Vista DRM causes full CPU load and global warming debunked!" (2007).)

In the following sections I will cite several papers that disagree with Dr. Gutmann's claims about the possibility of recovery of overwritten data.


Daniel Feenberg, National Bureau of Economic Research: "Can Intelligence Agencies Read Overwritten Data?" (2003, rev. 2013) thoroughly analyzed Gutmann's claims and found them "much overwrought". The presentation is fairly nontechnical and provides a good "starting point" for the subsequent papers. (Note, this link was previously posted in the question comments by Moab. Note also: In the below quotation, and in the rest of this answer, "MFM" refers to "Magnetic Force Microscope", a microscope that reveals magnetization patterns at very high resolution, rather than "Modified Frequency Modulation", the now-obsolete technique for recording data on hard drives. "MFM" is also used in the latter context in Gutmann's paper and in some of the papers linked below.)

Gutmann mentions that after a simple setup of the MFM device, that bits start flowing within minutes. This may be true, but the bits he refers to are not from disk files, but pixels in the pictures of the disk surface. Charles Sobey [...] suggests that it would take more than a year to scan a single platter with recent MFM technology, and tens of terabytes of image data would have to be processed.

and:

A single write is sufficient if the overwrite is truly random, even given an STM microscope with far greater powers than those in the references. In fact, data written to the disk prior to the data whose recovery is sought will interfere with recovery just as much as data written after - the STM microscope can't tell the order in which magnetic moments are created. It isn't like ink, where later applications are physically on top of earlier markings.

and:

Recently I was sent a fascinating piece by Wright, Kleiman and Sundhar (2008) who show actual data on the accuracy of recovered image data. While the images include some information about underlying bits, the error rate is so high that it is difficult to imagine any use for the result. While the occasional word might be recovered out of thousands, the vast majority of apparently recovered words would be spurious.

(The paper by Wright et al. is the one I cover in the section after next.)

Another fact to ponder is the failure of anyone to read the "18 minute gap" Rosemary Woods created on the tape of Nixon discussing the Watergate break-in. In spite of the fact that the data density on an analog recorder of the 1960s was approximately one million times less than current drive technology, and that audio recovery would not require a high degree of accuracy, not one phoneme has been recovered.


Feenberg links to a paper by Charles Sobey: "Recovering unrecoverable data". (The link he uses is stale; this one is live at the moment.) However Sobey's paper is about recovery of data from failed drives, not overwritten ones:

If the disk is not physically damaged, the user's data is still there, unless it has been overwritten. (emph. added - jeh)


Craig Wright, Dave Kleiman, and Shyaam Sundhar R.S.: "Overwriting Hard Drive Data: The Great Wiping Controversy" (2008) is a more technical paper. The authors tested Gutmann's theory that a magnetic force microscope could be used to read overwritten data (Gutmann apparently never did so, and never claimed to have) and found that it wouldn't work.

The gist of their argument is this: old data does have an effect on the magnetic fields that result when new data overwrites it; however, they show (with actual data) that the effect is extremely weak. It is of the same order as the signal variations (noise) encountered "naturally" when reading even a formerly pristine drive. In other words, variations in magnetic field strength caused by old data cannot be reliably distinguished from those caused by the noise that is normal in the operation of a hard drive.

The fact is, with modern drives (even going as far back as 1990) that this entire process is mostly a guessing game that fails significantly when tested.

In addition, they state that the underlying theory behind recovery of overwritten data is unsupportable:

The argument arises from the statement that “each track contains an image of everything ever written to it, but that the contribution from each "layer" gets progressively smaller the further back it was made”. This is a misunderstanding of the physics of drive functions and magneto-resonance. There is in fact no time component and the image is not layered. It is rather a density plot. [emph. added - jeh]

and

the level of recovery when presented with a perfect image is too low to be of use even on a low density pristine drive (which does not exist in any actual environment).

and

Consequently, we can categorically state that there is a minimal (less than a 0.01% chance) of recovering any data on a NEW and unused drive that has a single raw wipe pass (not even a low-level format). In the cases where a drive has been used (even being formatted for use) it is not possible to recover the information – there is a small chance of bit recovery, but the odds of obtaining a whole word are small.
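To see why "a small chance of bit recovery" still leaves the odds of a whole word small, here is a rough arithmetic sketch. It assumes each bit is recovered independently with the same probability; that independence, the function name `p_recover`, and the probabilities used are my own illustrative assumptions, not figures from the paper.

```python
# Hypothetical per-bit recovery probabilities, chosen only for illustration.
# They are NOT measurements from Wright, Kleiman and Sundhar.
# Note that 0.5 is what a pure coin-flip guess would achieve.

def p_recover(p_bit: float, n_bits: int) -> float:
    """Chance that n_bits are ALL recovered correctly, assuming each bit
    is recovered independently with probability p_bit."""
    return p_bit ** n_bits

for p_bit in (0.5, 0.6, 0.9):
    print(
        f"p_bit={p_bit}: "
        f"byte={p_recover(p_bit, 8):.4f}, "
        f"4-byte word={p_recover(p_bit, 32):.2e}, "
        f"16 bytes={p_recover(p_bit, 128):.2e}"
    )
```

Even with a fairly optimistic per-bit figure, the chance of getting every bit right falls off exponentially with length, so anything longer than a few bytes is effectively out of reach under this simple model.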


Gordon Hughes and Tom Coughlin: "Secure Erase of Disk Drive Data" (2004) reaches similar conclusions for "exotic" analog analysis of the signal recovered from the heads.

They show that weak correlations can be found between recovered signals and old data, but only if you already know what the old data was. This is obviously forensically useless. If you have nothing to correlate against, the residual effects of old data on the signal are indistinguishable from noise. This is the same conclusion reached by Wright, Kleiman, and Sundhar in the paper cited previously, but arrived at by analysis of electrical signals instead of MFM images.

They conclude:

One erasure pass appears to be sufficient to make old data unrecoverable.

Also, regarding MFM "pictures" of the magnetic domains:

It is easy to obtain pictures that appear to show unerased track edge data. But no one has shown complete recovery of a data sector, including the data synchronization preamble, bit de-randomizer, partial response and modulation codes, and error correction code.
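The "you can only find the old data if you already know it" point can be illustrated with a toy simulation. This is only a sketch under assumptions of my own: the read-back signal is modelled as the new bit plus a faint residue of the old bit plus Gaussian noise, and the `residue` and `noise` values are hypothetical, not taken from Hughes and Coughlin or any other cited paper.

```python
import random

def simulate(n_bits=20000, residue=0.02, noise=0.2, seed=1):
    """Toy read-back model: signal = new bit + faint old-bit residue + noise.
    All parameter values are hypothetical, not measurements."""
    rng = random.Random(seed)
    old = [rng.choice((-1, 1)) for _ in range(n_bits)]    # overwritten data
    new = [rng.choice((-1, 1)) for _ in range(n_bits)]    # data now on disk
    decoy = [rng.choice((-1, 1)) for _ in range(n_bits)]  # unrelated bits
    signal = [n + residue * o + rng.gauss(0.0, noise)
              for n, o in zip(new, old)]

    # Subtract the decoded new data; what is left is residue plus noise.
    residual = [s - (1 if s >= 0 else -1) for s in signal]

    # With the old data in hand, averaging over many bits exposes the residue.
    corr_known = sum(r * o for r, o in zip(residual, old)) / n_bits
    # Control: correlating against unrelated bits gives roughly zero.
    corr_decoy = sum(r * d for r, d in zip(residual, decoy)) / n_bits
    # Without the old data, all we can do is guess each bit from the
    # sign of its residual.
    blind_acc = sum((r >= 0) == (o > 0)
                    for r, o in zip(residual, old)) / n_bits
    return corr_known, corr_decoy, blind_acc

corr_known, corr_decoy, blind_acc = simulate()
print(f"correlation with known old data: {corr_known:.4f}")
print(f"correlation with unrelated data: {corr_decoy:.4f}")
print(f"blind per-bit guess accuracy:    {blind_acc:.3f}")
```

In this model the correlation with the known old data stands out clearly from the control, because it averages over thousands of bits. The blind per-bit guess, which is what an actual recovery attempt would have to rely on, stays only slightly better than a coin flip.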


I will add a final thought of my own: If it were possible to write two different sets of data to the same physical area of the drive and recover both reliably, the hard drive makers would have been all over this years ago. They would have been using the drive's apparent ability to store two different items of data in the same place to increase the drive's usable capacity. This clearly has not happened.