Zero fill vs. random fill
Many tutorials suggest that I should fill a disk with /dev/urandom instead of /dev/zero if I want it to be unrecoverable. But I don't quite get it: how can a disk still be recoverable after it has been zero-filled? And is recovering a zero-filled disk something only very specialized people (read: government agencies) can do, or something your average geek can do?
PS: I'm not THAT worried about my data. I sell used computers from time to time, and I'd rather the average Joe buying them couldn't recover anything funny from them.
Solution 1:
Filling a disk with /dev/zero will zero it out, and most currently available recovery software cannot recover files after even a single pass. More passes make the erase more secure, but take more time.
/dev/urandom is considered more secure, because it fills the disk with random data (from the Linux kernel's entropy pools), making it harder for recovery software to find any meaningful data (it also takes longer).
In short, a moderate number of passes from /dev/urandom is safer if you are trying to securely erase data, but for most casual applications a few passes from /dev/zero will suffice.
I usually use the two in combination when erasing disks (always erase before reselling or recycling your computer!).
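A minimal sketch of that combination with dd, assuming the disk to wipe is /dev/sdX (a placeholder; double-check the device name) and that none of its partitions are mounted:
# WARNING: this irreversibly destroys everything on the target device.
dd if=/dev/urandom of=/dev/sdX bs=4M status=progress   # random pass (slow)
dd if=/dev/zero of=/dev/sdX bs=4M status=progress      # follow-up zero pass
sync
Each dd run stops with a "No space left on device" error when it reaches the end of the disk; that is expected and simply means the whole device was written.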
Solution 2:
Many tutorials suggest that I should fill a disk with /dev/urandom instead of /dev/zero if I want it to be unrecoverable.
Whatever you do, do not use /dev/urandom.
On my i7-3770, /dev/urandom produces an astonishing 1 GB of pseudo-randomly generated data per minute. For a 4 TB hard drive, a single wipe with /dev/urandom would take over 66 hours!
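If you want to check the throughput on your own machine (it varies a lot with kernel version and CPU), a rough benchmark is to dump a fixed amount of /dev/urandom into /dev/null and let dd report the rate:
# Rough throughput test: read 1 GiB from /dev/urandom and discard it.
dd if=/dev/urandom of=/dev/null bs=1M count=1024 status=progress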
If you absolutely must use pseudo-randomly generated data (more on that below), at least use a decently fast way of generating it. For example
openssl enc -aes-128-ctr -pass file:/dev/random < /dev/zero 2>/dev/null | tail -c+17
prints an infinite stream of bytes (the tail call strips the 16-byte header that openssl prepends to its output). It uses AES in CTR mode and a password read from /dev/random, so it's cryptographically secure for any hard drive smaller than 1,000,000 TB.
It's also fast. Very fast. On the same machine it managed to generate 1.5 GB per second, about 90 times faster than /dev/urandom. That's more than any consumer-level hard drive can handle.
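To put that stream onto a disk you would pipe it into dd writing to the target device; a sketch, again with /dev/sdX as a placeholder for the drive you actually want to destroy:
# WARNING: overwrites the entire target device with the AES-CTR keystream.
openssl enc -aes-128-ctr -pass file:/dev/random < /dev/zero 2>/dev/null \
  | tail -c+17 \
  | dd of=/dev/sdX bs=4M status=progress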
[I]s this just very specialized people (read government agencies) who can recover a zero-filled disk, or something your average geek can do?
In Overwriting Hard Drive Data: The Great Wiping Controversy, the authors conclude that overwriting a pristine drive (one used only for the test) once with non-random data lowers the probability of recovering a single bit correctly to 92%. This means that a single byte (one ASCII character) can be recovered with only 51% probability, and there is no way of telling whether the byte has been recovered correctly or not.
In real-world scenarios (a slightly used drive), the probability drops to 56% for a single bit and roughly 1% for a single byte.
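The per-byte figures follow from the per-bit ones because all eight bits of a byte must be recovered correctly; a quick sanity check of the arithmetic:
awk 'BEGIN { printf "pristine: %.2f  used: %.4f\n", 0.92^8, 0.56^8 }'
# prints roughly 0.51 and 0.0097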
They took a new drive, wiped it three times to simulate short-term usage, wrote a short text to it and wiped the drive once with non-random data. These were the results:
Original text:
Secure deletion of data - Peter Gutmann - 1996
Abstract
With the use of increasingly sophisticated encryption systems, an attacker wishing to gain access to sensitive data is forced to look elsewhere for information. One avenue of attack is the recovery of supposedly erased data from magnetic media or random-access memory.
Recovered text:
¡ÄuÜtÞdM@ª""îFnFã:à•ÅÒ̾‘¨L‘¿ôPÙ!#¯ -×LˆÙÆ!mC
2´³„‡·}NŽýñêZØ^›l©þì®·äÖŒv¿^œº0TÏ[ªHÝBš¸ð
7zô|»òëÖ/""º[ýÀ†,kR¿xt¸÷\Í2$Iå""•ÑU%TóÁ’ØoxÈ$i
Wï^™oËS²Œ,Ê%ñ ÖeS» eüB®Èk‹|YrÍȶ=ÏÌSáöp¥D
ôÈŽ"|ûÚA6¸œ÷U•$µM¢;Òæe•ÏÏMÀùœç]#•Q
Á¹Ù""—OX“h
ÍýïÉûË Ã""W$5Ä=rB+5•ö–GßÜä9ïõNë-ߨYa“–ì%×Ó¿Ô[Mãü
·†Î‚ƒ‚…[Ä‚KDnFJˆ·×ÅŒ¿êäd¬sPÖí8'v0æ#!)YÐúÆ©
k-‹HÈø$°•Ø°Ïm/Wîc@Û»Ì"„zbíþ00000000000000000
Solution 3:
At the microscopic level a hard drive bit is neither a "1" nor a "0", but a magnetic charge. There is a threshold above which the charge is considered a "1". Likewise, the bit's geometric location is not precise, but falls within a given space.
The theory is that a tiny bit of the previous charge is still present in a newly written bit, so if you just zero the disk it might be possible for someone to set a new much lower threshold for what is considered a 1, and still recover the data. Writing random data makes this much harder.
The theory behind multiple passes has to do with the geometric location of the bit on the disk. If the current pass lands a little further ahead or behind, a remnant of the old bit might be peeking out from beside the new bit. Two or three passes (especially of random data) make it much less likely that a previous bit would be identifiable.
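If you would rather not script multiple passes with dd yourself, GNU coreutils includes shred, which handles the pass count for you (again, /dev/sdX is a placeholder for the real device):
# Overwrite the device three times, then add a final pass of zeros; -v shows progress.
shred -v -n 3 -z /dev/sdX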
As others have already said, these fears are mostly overblown. The biggest risk is data that is only deleted by the OS, or not deleted at all.