How to create a file from terminal repeating a set of words infinitely?
I need to create a huge file for parsing purposes, around 2-4 GB in size. Currently I am manually copy-pasting lines into the same file to increase its size.
There's an easy way to repeat a line lots of times using the yes command:
yes we have no bananas | head -n 10000 > out.txt
will result in out.txt containing 10,000 lines all saying "we have no bananas".
To limit the output to an exact number of bytes, use head's -c option instead of -n. For example, this generates exactly 10 kB of text:
yes we have no bananas | head -c 10000 > out.txt
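To scale the same idea up to the 2-4 GB range the question asks about, the byte count can be computed with shell arithmetic instead of typed by hand. The following is only a sketch: the function name repeat_to_size is made up for illustration, and it assumes a head that supports -c (GNU and BSD head both do). Because head -c cuts at an exact byte, the last line of the file may be truncated mid-line.

```shell
# Hypothetical helper (not a standard command): write exactly BYTES bytes
# of repeated TEXT lines to FILE.
# usage: repeat_to_size TEXT BYTES FILE
repeat_to_size() {
    # yes prints TEXT forever; head stops the pipeline after BYTES bytes
    yes "$1" | head -c "$2" > "$3"
}

# Example call, commented out because it writes 2 GiB to disk:
# repeat_to_size 'we have no bananas' $((2 * 1024 * 1024 * 1024)) out.txt
```

Afterwards, wc -c < out.txt should report the requested number of bytes.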
Perl has the nifty x operator:
$ perl -e 'print "foo\n" x 5'
foo
foo
foo
foo
foo
So, as a simple solution, you could just write your line a few million times. For example, this command created a 3G file:
perl -e 'print "This is my line\n" x 200000000' > file
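If you need to specify an exact size (2 GiB in this case), you can do:
perl -e 'use bytes; while(length($str)<2<<30){ $str.="This is my line\n"} print $str' > file
Here 2<<30 is 2 × 2^30 bytes. The result is exactly 2 GiB because the 16-byte line divides that size evenly; with a different line length the file ends up slightly larger. Note that the whole string is built in memory before it is printed.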
I can't recommend infinitely repeating text, but you could make a ~2GB file of repeated text with python like so...
python3 -c 'with open("bigfile", "w") as f: f.write(("hello world "*10+"\n")*2*10**7)'
That will write "hello world " 10 times followed by a newline (121 bytes per line), and repeat that 20,000,000 times, writing the result to the file bigfile, for about 2.4 GB in total. If all your chars are ASCII, then each one is one byte, so calculate appropriately depending on what you want to write...
Be warned that this builds the whole string in memory before writing it, so it may bring your machine to its knees. I run out of RAM if I try doing more than 10,000,000 lines...
I'm running a toaster though
1. Put the set of words to be repeated in a file, e.g. source.txt.
2. Get the size of source.txt in bytes, e.g. with: stat -c '%s' source.txt
3. Decide the size of the destination file, e.g. destination.txt: 2 GB, 4 GB, or whatever. Convert that size to bytes.
4. Divide the destination file size by the source file size. bash can't do floating-point arithmetic, but that's not needed in this case.
5. Use a for construct to repeat a cat source.txt operation as many times as the division result. This gives the closest approximation of the destination file size you can get by repetition. The output of the operation is saved in destination.txt.
For example, assuming source.txt is 30 bytes and we want to create a 2 GiB file (2147483648 bytes), we need:
for ((i=0; i<2147483648/30; i++)); do cat source.txt; done >destination.txt
Here the upper limit, 2147483648/30, is computed inline in the loop condition; you can also work out the result beforehand and put it here.
The operation would take some time; the larger source.txt is, the less time will be needed, since fewer cat invocations are required.
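The steps above can be wrapped into one small function so the size lookup and the division don't have to be done by hand. This is a rough sketch rather than a polished tool: the name repeat_file is invented for illustration, it assumes GNU stat (for the -c '%s' format used above), and it uses a while loop instead of bash's for ((...)) so that it also runs under a plain POSIX sh.

```shell
# Hypothetical helper: fill DEST with as many whole copies of SRC as fit
# in TARGET_BYTES.
# usage: repeat_file SRC TARGET_BYTES DEST
repeat_file() {
    src_size=$(stat -c '%s' "$1") || return 1   # GNU stat: size of SRC in bytes
    [ "$src_size" -gt 0 ] || return 1           # an empty source would divide by zero
    copies=$(( $2 / src_size ))                 # integer division: whole copies only
    i=0
    while [ "$i" -lt "$copies" ]; do
        cat "$1"
        i=$(( i + 1 ))
    done > "$3"                                 # one redirection for the whole loop
}

# Example call, commented out because it writes about 2 GiB to disk:
# repeat_file source.txt $((2 * 1024 * 1024 * 1024)) destination.txt
```

Since only whole copies are written, destination.txt ends up at most one source-file length short of the requested size.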