How to create a file from terminal repeating a set of words infinitely?

How can I create a file from the terminal by repeating a set of words over and over? I need a huge file, around 2-4 GB, for parsing tests. Currently I am manually copying and pasting lines into the same file to increase its size.


There's an easy way to repeat a line lots of times using the yes command:

yes we have no bananas | head -n 10000 > out.txt

will result in out.txt containing 10,000 lines all saying "we have no bananas".
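If you'd rather do this from Python, a rough equivalent of `yes ... | head -n` is to stream the same line into the file a fixed number of times. This is a minimal sketch: the function name `write_repeated_lines` and the demo path are made up for illustration. `itertools.repeat` feeds `writelines` one line at a time, so the full output is never held in memory.

```python
import itertools
import os
import tempfile


def write_repeated_lines(path: str, line: str, count: int) -> None:
    """Write `line` plus a newline `count` times (like `yes line | head -n count`).

    Hypothetical helper for illustration only.
    """
    # newline="" disables newline translation, so each line is exactly len(line) + 1 bytes
    with open(path, "w", encoding="utf-8", newline="") as f:
        # writelines iterates lazily over the repeat() iterator, so memory use stays small
        f.writelines(itertools.repeat(line + "\n", count))


# Small demo in a temporary directory (hypothetical file name)
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "out.txt")
write_repeated_lines(demo_path, "we have no bananas", 10000)

# Read it back to check the result
with open(demo_path, encoding="utf-8") as f:
    lines = f.read().splitlines()
```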


To limit the output to an exact number of bytes, use head's -c option instead of -n. For example, this generates exactly 10 kB of text:

yes we have no bananas | head -c 10000 > out.txt
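The same exact-size behaviour can be sketched in Python by writing a large block of repeated text and cutting the last write short. This is an illustration, not a drop-in tool: `write_exact_bytes` and the `chunk_reps` default are hypothetical choices. As with `head -c`, the file may end in the middle of a line.

```python
import os
import tempfile


def write_exact_bytes(path: str, text: bytes, size: int, chunk_reps: int = 4096) -> None:
    """Write `text` repeatedly, stopping at exactly `size` bytes (like `yes ... | head -c size`).

    Hypothetical helper for illustration only.
    """
    if not text:
        raise ValueError("text must not be empty")
    chunk = text * chunk_reps          # many copies per write call, for speed
    remaining = size
    with open(path, "wb") as f:
        while remaining > 0:
            piece = chunk[:remaining]  # the final piece may stop mid-line
            f.write(piece)
            remaining -= len(piece)


demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "out.txt")
write_exact_bytes(demo_path, b"we have no bananas\n", 10000)

with open(demo_path, "rb") as f:
    written = f.read()
```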

Perl has the nifty x operator:

$ perl -e 'print "foo\n" x 5'
foo
foo
foo
foo
foo
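For comparison, Python's `*` operator on strings does the same job as Perl's `x`:

```python
# str * int concatenates the string with itself that many times,
# much like Perl's "foo\n" x 5
repeated = "foo\n" * 5
print(repeated, end="")  # end="" avoids an extra blank line after the last "foo"
```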

So, as a simple solution, you could just print your line a few hundred million times. For example, this command creates a file of roughly 3 GB:

perl -e 'print "This is my line\n" x 200000000' > file
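It can be worth checking the size before running a command like this: the output is the line length in bytes times the repetition count. Perl builds the whole string in memory before printing it, so you also need roughly that much free RAM. A quick check in Python, using the same line and count:

```python
line = "This is my line\n"
count = 200_000_000

line_bytes = len(line.encode("ascii"))  # 16 bytes, including the newline
total_bytes = line_bytes * count        # size of the resulting file

print(f"{total_bytes:,} bytes = {total_bytes / 10**9:.1f} GB = {total_bytes / 2**30:.2f} GiB")
```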

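If you need an exact size (2 GiB, i.e. 2<<30 bytes, in this case), build the string until it is long enough and then trim it to that length:

perl -e 'use bytes; my $str = ""; $str .= "This is my line\n" while length($str) < 2<<30; print substr($str, 0, 2<<30)' > file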
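Shifts are easy to get wrong when you are aiming for a particular size, so a quick check of the arithmetic can help: `2 << 30` is 2 GiB, while `2 << 20` is only 2 MiB. `divmod` then shows how many whole lines fit and how many bytes are left over. For the 16-byte line above, 2 GiB divides evenly. A 12-byte line, picked here only for comparison, leaves a remainder that would have to be filled with a partial line if the size must be exact.

```python
# Bit shifts: 2 << n equals 2 * 2**n
two_gib = 2 << 30
two_mib = 2 << 20
print(f"2 << 30 = {two_gib:,} bytes (2 GiB)")
print(f"2 << 20 = {two_mib:,} bytes (2 MiB)")

# How many whole lines fit in the target, and how many bytes are left over
for line in ["This is my line\n", "hello world\n"]:
    size = len(line.encode("ascii"))
    reps, leftover = divmod(two_gib, size)
    print(f"{size}-byte line: {reps:,} whole lines + {leftover} leftover bytes")
```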

I can't recommend repeating text infinitely, but you could make a ~2.4 GB file of repeated text with Python like so:

python3 -c 'with open("bigfile", "w") as f: f.write(("hello world "*10+"\n")*2*10**7)'

That writes "hello world " 10 times followed by a newline, repeats that line 20,000,000 times, and writes the result to the file bigfile. If all your characters are ASCII, each one is one byte. Here that makes each line 121 bytes and the file about 2.42 GB, so adjust the numbers to suit whatever you want to write.

Your mileage may vary: the whole string is built in memory before it is written, and I run out of RAM if I try more than 10,000,000 lines.

I'm running a toaster, though.
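The one-liner above builds the entire string in memory before writing it, which is why it runs out of RAM on smaller machines. A minimal sketch that avoids this writes one moderately sized block over and over, so memory use stays at the block size no matter how big the file gets. `write_big_file` is a hypothetical name, and the 10,000-line block and demo count are arbitrary choices rather than tuned values.

```python
import os
import tempfile


def write_big_file(path: str, line: str, total_lines: int, lines_per_block: int = 10_000) -> None:
    """Write `line` `total_lines` times while keeping only one block in memory.

    Hypothetical helper for illustration only.
    """
    block = line * lines_per_block
    full_blocks, rest = divmod(total_lines, lines_per_block)
    # newline="" keeps "\n" as one byte on every platform
    with open(path, "w", encoding="ascii", newline="") as f:
        for _ in range(full_blocks):
            f.write(block)      # the same block object is reused on every iteration
        f.write(line * rest)    # leftover lines that don't fill a whole block


line = "hello world " * 10 + "\n"  # 121 bytes, as in the one-liner above
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "bigfile")
write_big_file(demo_path, line, 25_000)  # small demo; 2 * 10**7 lines gives ~2.4 GB
```

Writing a block of thousands of lines per call keeps the number of `write` calls low, which matters far more for speed than the exact block size.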


  • Put the set of words to be repeated in a file, e.g. source.txt. Get the size of source.txt in bytes, e.g. with:

     stat -c '%s' source.txt
    
  • Decide the size of the destination file, e.g. destination.txt: 2 GB, 4 GB or whatever. Convert the size to bytes.

  • Divide the destination file size by source file size. bash can't do floating point arithmetic, but it's not needed in this case.

  • Use a for loop to run cat source.txt as many times as the division result, redirecting the loop's output to destination.txt. Since only whole copies are written, this is the closest you can get to the destination file size by repetition alone.

For example, assuming source.txt is 30 bytes and we want to create a 2 GiB (2147483648-byte) file, we need:

for ((i=0; i<2147483648/30; i++)); do cat source.txt; done >destination.txt

Here the upper limit, 2147483648/30, is computed by bash's integer arithmetic when the loop starts. You can also compute the result yourself and put the number here instead.

Because cat is started once per iteration (tens of millions of times for a 2 GiB target), this can take a long time. The larger source.txt is, the fewer iterations are needed.
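The same steps can be done in a single process instead of starting cat once per copy. This is a minimal sketch: `fill_from_source` and the `copies_per_write` batching are hypothetical, and the demo uses a small 1 MiB target. Like the steps above, it writes only whole copies of the source, so the result can be up to one copy short of the target.

```python
import os
import tempfile


def fill_from_source(src: str, dst: str, target_size: int, copies_per_write: int = 1000) -> int:
    """Write whole copies of `src` into `dst` without exceeding `target_size` bytes.

    Returns the number of copies written. Hypothetical helper for illustration only.
    """
    with open(src, "rb") as f:
        data = f.read()               # step 1: the source contents (and thus its size)
    if not data:
        raise ValueError("source file is empty")

    copies = target_size // len(data)  # step 3: integer division, no floating point needed
    batch = data * copies_per_write    # many copies per write call, for speed
    full_batches, rest = divmod(copies, copies_per_write)
    with open(dst, "wb") as out:       # step 4: repeat the source into the destination
        for _ in range(full_batches):
            out.write(batch)
        out.write(data * rest)
    return copies


# Demo with a 30-byte source file and a 1 MiB target (hypothetical file names)
demo_dir = tempfile.mkdtemp()
src_path = os.path.join(demo_dir, "source.txt")
dst_path = os.path.join(demo_dir, "destination.txt")
with open(src_path, "wb") as f:
    f.write(b"a set of words to be repeated\n")  # exactly 30 bytes
copies = fill_from_source(src_path, dst_path, 1024 * 1024)
```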