Making a large file using the terminal

Creating an infinite number of words, guaranteed unique

The script below generates guaranteed unique words from the letters of the alphabet. The issue with any fixed word length is that it allows only a limited set of possibilities, which limits the size of your file.

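I therefore used Python's itertools.permutations, which produces a (finite) number of unique words. After all permutations have been used, we simply start over, printing each word 2 times, then 3, 4, ... t times; every value of t creates a new set of unique words. This works because a permutation never repeats a letter, so the position of the first repeated letter in a printed word reveals the underlying permutation, and two different combinations of permutation and repeat count can never print the same word. Thus we have a generator whose words are certain to be unique.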

The script:

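import itertools
import string

# the 26 lowercase letters used to build the words
ab = list(string.ascii_lowercase)

t = 1  # how many times each word is repeated in the current pass
while True:
    # n is the number of distinct letters per permutation
    for n in range(1, len(ab) + 1):
        words = itertools.permutations(ab, n)
        for word in words:
            print(t * "".join(word))
    t += 1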
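
To check the uniqueness claim without filling a disk, here is a minimal sketch that wraps the same loop in a generator function and runs it on a three-letter alphabet, so that several passes are reached quickly. The function name unique_words and the "abc" alphabet are my own choices for illustration, not part of the script above.

```python
import itertools
import string


def unique_words(alphabet=string.ascii_lowercase):
    """Yield the same sequence the script prints, without ever ending."""
    t = 1
    while True:
        for n in range(1, len(alphabet) + 1):
            for word in itertools.permutations(alphabet, n):
                yield t * "".join(word)
        t += 1


# A tiny alphabet gives 15 words per pass, so 60 words cover t = 1..4.
sample = list(itertools.islice(unique_words("abc"), 60))
print(sample[:5])
# True when no word occurs twice in the sample
print(len(sample) == len(set(sample)))
```

With the full 26-letter alphabet the first pass alone is far too long to finish in practice, so the repeated passes only matter as a guarantee that the output never runs out.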

How to use

  • Simply copy the script into an empty file, save it as unique_generator.py
  • Run it by the command:

    python3 /path/to/unique_generator.py > /path/to/bigfile.txt
    

Note

The script produces unique words of various lengths. If you want, the start or max length can be set by changing the line:

for n in range(1, len(ab)+1):

(replace the start or the end of the range), and changing:

while True:  

into (for example):

while t < 10:   

In the last case, t stops at 9, so a word is at most 9 times the length of the alphabet (9 × 26 = 234 characters).
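
As a rough sketch of how those limits could be turned into parameters, the version below takes a start length, a max length and a max repeat count, and also counts how many words a single pass yields. The names bounded_words, words_per_pass, min_len, max_len and max_repeats are hypothetical, and math.perm assumes Python 3.8 or newer.

```python
import itertools
import math
import string


def bounded_words(alphabet=string.ascii_lowercase, min_len=1, max_len=None, max_repeats=1):
    """Yield words of min_len..max_len letters, repeated 1..max_repeats times."""
    if max_len is None:
        max_len = len(alphabet)
    for t in range(1, max_repeats + 1):
        for n in range(min_len, max_len + 1):
            for word in itertools.permutations(alphabet, n):
                yield t * "".join(word)


def words_per_pass(alphabet_size, min_len, max_len):
    """Number of words in one pass: the sum of the n-permutation counts."""
    return sum(math.perm(alphabet_size, n) for n in range(min_len, max_len + 1))


# How many words of 3 to 5 letters one pass over 26 letters produces
print(words_per_pass(26, 3, 5))
# Cross-check the formula against the generator for 3-letter words only
print(sum(1 for _ in bounded_words(min_len=3, max_len=3)))
```

Multiplying the word count by the average line length gives an estimate of the file size before anything is written.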

Ending the process

  • When running it from terminal, simply press Ctrl+C
  • Otherwise:

    kill $(pgrep -f /path/to/unique_generator.py)
    

    should do the job.
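
If you would rather have the script stop by itself, one option is to count the bytes written and return once a target size is reached. This is only a sketch under my own assumptions: the function write_unique_words, the target_bytes parameter and the bigfile_demo.txt name are made up for the example, and the demo writes a small file into the system temp directory.

```python
import itertools
import os
import string
import tempfile


def write_unique_words(path, target_bytes, alphabet=string.ascii_lowercase):
    """Write one unique word per line until at least target_bytes are written."""
    written = 0
    t = 1
    # ASCII text with "\n" line endings, so one character is one byte on disk
    with open(path, "w", encoding="ascii", newline="\n") as out:
        while True:
            for n in range(1, len(alphabet) + 1):
                for word in itertools.permutations(alphabet, n):
                    line = t * "".join(word) + "\n"
                    out.write(line)
                    written += len(line)
                    if written >= target_bytes:
                        return written
            t += 1


# Demo with a small target; raise target_bytes for a genuinely large file.
demo_path = os.path.join(tempfile.gettempdir(), "bigfile_demo.txt")
size = write_unique_words(demo_path, 10_000)
print(size, os.path.getsize(demo_path))
```

The file ends up slightly larger than the target, because the check only happens after a complete line has been written.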


To get a large file full of random words, use the following command:

cat /dev/urandom | head -c 1000000 | tr -dc "A-Za-z0-9\n" | sort | uniq

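This prints one unique word per line, each a string of random letters and digits; redirect the output (for example with > /path/to/bigfile.txt) to save it as a file. You can change the size of the output by making 1000000 larger or smaller. That number is the count of random bytes read. Since tr keeps only 63 of the 256 possible byte values, the output is roughly a quarter of that size.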
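
To make the size relation concrete, here is a small Python sketch that mimics the pipeline (read random bytes, keep only letters, digits and newlines, then sort and deduplicate the lines) and reports how much of the input survives the filter. It is an approximation, not a replacement: the function name random_unique_words is made up, os.urandom stands in for /dev/urandom, and Python sorts by code point while the shell's sort follows your locale.

```python
import os
import string

# The 63 byte values the tr filter keeps: A-Z, a-z, 0-9 and newline
KEEP = set((string.ascii_letters + string.digits + "\n").encode("ascii"))


def random_unique_words(n_bytes):
    """Return the sorted unique lines and the number of bytes that were kept."""
    raw = os.urandom(n_bytes)
    kept = bytes(b for b in raw if b in KEEP)
    lines = kept.decode("ascii").split("\n")
    return sorted(set(lines)), len(kept)


words, kept_bytes = random_unique_words(1_000_000)
# Expected fraction of bytes that pass the filter (63 out of 256)
print(len(KEEP) / 256)
# Fraction actually kept in this run
print(kept_bytes / 1_000_000)
print(len(words))
```

So to end up with about 1 GB of words you would need to read roughly 4 GB of random bytes.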

To make the words space-separated, simply pass them through tr "\n" " ":

cat /dev/urandom | head -c 1000000 | tr -dc "A-Za-z0-9\n" | sort | uniq | tr "\n" " "

This also avoids the performance problems associated with loops on the shell.