Creating a large file of random bytes quickly
I want to create a large file ~10G filled with zeros and random values. I have tried using:
dd if=/dev/urandom of=10Gfile bs=5G count=10
It creates a file of about 2 GB and exits with exit status 0. I fail to understand why.
I also tried creating file using:
head -c 10G </dev/urandom >myfile
but it takes about 28-30 minutes to create it. I want it created faster. Does anyone have a solution?
Also, I wish to create multiple files with the same (pseudo) random pattern for comparison. Does anyone know a way to do that? Thanks.
Solution 1:
I've seen a pretty neat trick at commandlinefu: use /dev/urandom as a source of randomness (it is a good source), and then use that as the password for an AES stream cipher.
I can't tell you with 100% certainty, but I do believe that if you change the parameters (i.e. use way more than just 128 bytes from /dev/urandom), it is at least close enough to a cryptographically secure PRNG for all practical purposes:
This command generates a pseudo-random data stream using aes-256-ctr with a seed set by /dev/urandom. Redirect to a block device for secure data scrambling.
openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero > randomfile.bin
How does this work?
- openssl enc -aes-256-ctr will use openssl to encrypt zeroes with AES-256 in CTR mode.
- What will it encrypt? /dev/zero.
- What is the password it will use to encrypt it? dd if=/dev/urandom bs=128 count=1 | base64, that is, one 128-byte block of /dev/urandom encoded in base64 (the 2>/dev/null redirect merely discards dd's status output). Since the password is the only secret input, keeping it around lets you regenerate the same stream later; see the sketch after this list.
- I'm actually not sure why -nosalt is being used, since OpenSSL's man page states the following:
  -salt: use a salt in the key derivation routines. This is the default.
  -nosalt: don't use a salt in the key derivation routines. This option SHOULD NOT be used except for test purposes or compatibility with ancient versions of OpenSSL and SSLeay.
  Perhaps the point is to make this run as fast as possible, and the use of salts would be unjustified, but I'm not sure whether this would leave any kind of pattern in the ciphertext. The folks at the Cryptography Stack Exchange may be able to give us a more thorough explanation on that.
- The input is /dev/zero. This is because it really doesn't matter what is being encrypted: the output will be something resembling random data. Zeros are fast to get, and you can get (and encrypt) as much as you want without running out of them.
- The output is randomfile.bin. It could also be /dev/sdz, and you would randomize a full block device.
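A side note on the question's wish for multiple files with the same (pseudo) random pattern: with -nosalt, the key and IV are derived only from the password, so reusing one captured password should reproduce the exact same stream (on the same OpenSSL version, at least). A minimal sketch with placeholder file names; head -c with a size suffix assumes GNU coreutils:

# capture one random password so it can be reused later (variable name is just an example)
PASS="$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64 | tr -d '\n')"
# two runs with the same password should produce byte-identical files
openssl enc -aes-256-ctr -pass pass:"$PASS" -nosalt < /dev/zero 2>/dev/null | head -c 100M > fileA
openssl enc -aes-256-ctr -pass pass:"$PASS" -nosalt < /dev/zero 2>/dev/null | head -c 100M > fileB
cmp fileA fileB && echo identical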
But I want to create a file with a fixed size! How do I do that?
Simple!
dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=filename bs=1M count=100 iflag=fullblock
Just dd that command with a fixed blocksize (which is 1 MB here) and count. The file size will be blocksize * count = 1M * 100 = 100M.
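And for the ~10G file from the question, the same command presumably just needs a larger count (10240 blocks of 1M is 10 GiB); nothing here beyond scaling the numbers:

dd if=<(openssl enc -aes-256-ctr -pass pass:"$(dd if=/dev/urandom bs=128 count=1 2>/dev/null | base64)" -nosalt < /dev/zero) of=10Gfile bs=1M count=10240 iflag=fullblock
ls -lh 10Gfile   # should report roughly 10G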
Solution 2:
I'm getting good speeds using the shred utility.

- 2G with dd if=/dev/urandom - 250 sec
- 2G with openssl rand - 81 sec
- 2G with shred - 39 sec
So I expect about 3-4 minutes for 10G with shred.
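The exact commands behind those numbers aren't given; a reasonable way to reproduce the comparison yourself (my guess at the invocations, not necessarily what was actually run) would be:

time dd if=/dev/urandom of=test.bin bs=1M count=2048    # 2G from /dev/urandom
time openssl rand -out test.bin 2147483648              # 2G from openssl rand
time shred -n 1 -s 2G test.bin                          # 2G from shred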
Create an empty file and shred it by passing the desired file size.
touch file
shred -n 1 -s 10G file
I'm not sure how cryptographically secure the generated data is, but it looks random. Here's some info on that.
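One quick, informal check of the "looks random" claim (not a cryptographic test): random data is essentially incompressible, so gzip should not be able to shrink it. A small sketch with an assumed sample file name:

touch sample.bin
shred -n 1 -s 100M sample.bin
ls -l sample.bin              # 104857600 bytes
gzip -c sample.bin | wc -c    # for random-looking data this stays close to (or slightly above) the size above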
Solution 3:
There is a random number generator program, sharand, which writes random bytes to a file. (The program was originally called sharnd, with one letter 'a' less; see http://mattmahoney.net/dc/.)
It takes roughly one third of the time compared to reading /dev/urandom. It's a secure RNG; there are faster but insecure RNGs, but that's not what's needed normally.
To be really fast, look at the collection of RNG algorithms for Perl: libstring-random-perl.
Let's give it a try (apt-get install sharand):
$ time sharand a 1000000000
sharand a 1000000000 21.72s user 0.34s system 99% cpu 22.087 total
$ time head -c 1000000000 /dev/urandom > urand.out
head -c 1000000000 /dev/urandom > urand.out 0.13s user 61.22s system 99% cpu 1:01.41 total
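In throughput terms that is roughly (plain arithmetic on the 'total' times above):

echo $((1000000000 / 22))   # sharand: about 45 MB/s
echo $((1000000000 / 61))   # /dev/urandom: about 16 MB/s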
And the result files (they do look random from the inside):
$ ls -l
-rw-rw-r-- 1 siegel siegel 1000000000 Aug 5 03:02 sharand.out
-rw-rw-r-- 1 siegel siegel 1000000000 Aug 5 03:11 urand.out
Comparing the 'total' time values, sharand took only a third of the time needed by the urandom method to create a little less than a GB of random bytes:

sharand: 22s total
urandom: 61s total
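If sharand keeps the behaviour of the original sharnd, the byte stream is derived deterministically from the key argument ('a' above appears to play that role), so repeating a run with the same key should give identical files, which would also cover the question's "same pseudo-random pattern" requirement. A sketch under that assumption:

sharand a 1000000000
mv sharand.out first.out
sharand a 1000000000
cmp first.out sharand.out && echo "identical: same key, same stream"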