Does Unix grep work faster with long or short search terms?

Solution 1:

Some reference material:

GNU grep uses the well-known Boyer-Moore algorithm, which looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

from Why GNU grep is fast.

The algorithm preprocesses the string being searched for (the pattern), but not the string being searched in (the text). [...] In general, the algorithm runs faster as the pattern length increases.

from Boyer–Moore string search algorithm.
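To make the skipping idea concrete, here is a minimal sketch of the simplified Horspool variant of Boyer-Moore, written as a shell function around awk. It is not GNU grep's actual implementation. The function name horspool_alignments is hypothetical, and it only counts how many window positions are inspected before the first match or the end of the text.

```sh
# horspool_alignments PATTERN TEXT
# Hypothetical helper (not part of grep): prints how many window positions
# a Horspool-style search inspects before the first match or end of text.
# Caveat: awk -v interprets backslash escapes, so keep the inputs plain.
horspool_alignments() {
  awk -v pat="$1" -v txt="$2" 'BEGIN {
    m = length(pat); n = length(txt)
    if (m == 0) { print 0; exit }

    # Lookup table: for every pattern character except the last one,
    # the distance from its rightmost occurrence to the end of the pattern.
    for (i = 1; i < m; i++) skip[substr(pat, i, 1)] = m - i
    last = substr(pat, m, 1)

    pos = 1; tried = 0
    while (pos + m - 1 <= n) {
      tried++
      # Look first at the text character under the final pattern letter.
      c = substr(txt, pos + m - 1, 1)
      if (c == last && substr(txt, pos, m) == pat) break
      # Characters absent from the table allow a jump of the full length.
      pos += (c in skip) ? skip[c] : m
    }
    print tried
  }'
}
```

For example, on a text of 40 'x' characters that contains neither pattern:

```sh
text=$(printf '%40s' '' | tr ' ' 'x')
horspool_alignments ab "$text"         # prints 20
horspool_alignments abcdefgh "$text"   # prints 5
```

The longer pattern lets the search jump 8 characters at a time instead of 2, which is why fewer positions are inspected.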

Conclusion: Use longer strings.

Now, a quick benchmark for fun:

# Initialisation
cd "$(mktemp -d)" && dd if=/dev/urandom of=random bs=1M count=1000
# Version
grep --version # grep (GNU grep) 2.9
# Benchmark
(for s in 'short' 'this is not so short and we could even consider this as pretty long'; do for t in {1..10}; do time grep "$s" random; done; done ) 2> result

Results: the average is 0.952s for the short string and 0.244s for the long string, so the long string is roughly four times faster here.
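One possible way to compute such averages from the result file is sketched below. It assumes the file contains bash's default time output (lines such as "real 0m0.952s") with a dot as decimal separator. The function name average_real is hypothetical.

```sh
# average_real [GROUP_SIZE]
# Hypothetical helper: reads `time` output on stdin and prints the mean
# "real" time in seconds for each consecutive group of GROUP_SIZE runs
# (default 10, matching the ten runs per search string above).
average_real() {
  awk -v group="${1:-10}" '
    $1 == "real" {
      split($2, t, /[ms]/)            # "0m0.952s" -> t[1]="0", t[2]="0.952"
      sum += t[1] * 60 + t[2]
      if (++n == group) {             # group complete: print and reset
        printf "%.3f\n", sum / n
        sum = 0; n = 0
      }
    }'
}
```

Running `average_real 10 < result` should then print two lines: the average for the short string, followed by the average for the long string.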

NB: Pattern length is not the only factor that affects search speed.
