Fastest possible grep

I'd like to know if there are any tips for making grep as fast as possible. I have a rather large collection of text files to search as quickly as possible. I've made them all lowercase so that I can drop the -i option. This makes the search much faster.

Also, I've found out that -F and -P modes are quicker than the default one. I use the former when the search string is not a regular expression (just plain text), the latter if regex is involved.
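To make the difference between the modes concrete, here is a minimal sketch on a throwaway file. The sample data and variable names are made up for illustration, and -P assumes a GNU grep built with PCRE support.

```shell
# Build a tiny sample file (hypothetical data) in a temp directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'price is 3.50' 'price is 3x50' 'no match here' > "$tmpdir/sample.txt"

# -F: the pattern is a fixed string, so "." matches only a literal dot.
fixed_hits=$(grep -c -F '3.50' "$tmpdir/sample.txt")

# Default (basic regex) mode: "." matches any character,
# so the "3x50" line is counted as well.
regex_hits=$(grep -c '3.50' "$tmpdir/sample.txt")

# -P: Perl-compatible regex; \d is a digit class that basic regex lacks.
pcre_hits=$(grep -c -P '\d\.\d{2}' "$tmpdir/sample.txt")

echo "fixed=$fixed_hits regex=$regex_hits pcre=$pcre_hits"
rm -rf "$tmpdir"
```

Besides skipping regex compilation, -F also avoids accidental matches when the search string contains characters such as "." or "*".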

Does anyone have any experience in speeding up grep? Maybe compile it from scratch with some particular flag (I'm on Linux CentOS), organize the files in a certain fashion or maybe make the search parallel in some way?

Try GNU parallel. Its documentation includes an example of how to use it with grep:

grep -r greps recursively through directories. On multicore CPUs GNU parallel can often speed this up.
find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
This will run 1.5 jobs per core and give up to 1000 arguments to grep.
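If GNU parallel is not installed, a similar fan-out over files can be sketched with xargs -P from findutils. This is a swapped-in technique, not the example from the parallel documentation. The directory, file names, job count (4), and batch size (2) are arbitrary values chosen for illustration.

```shell
# Create a few sample files (hypothetical data), including names with spaces.
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
  printf 'line one\nSTRING in file %s\n' "$i" > "$tmpdir/file $i.txt"
done

# -print0 / -0 keep file names with spaces or newlines intact.
# -P 4 runs up to four grep processes at once; -n 2 passes two files to each.
# Output from concurrent greps can arrive in any order, so sort it.
results=$(find "$tmpdir" -type f -print0 \
  | xargs -0 -P 4 -n 2 grep -H -n -F 'STRING' \
  | sort)

match_count=$(printf '%s\n' "$results" | wc -l | tr -d ' ')
echo "$results"
rm -rf "$tmpdir"
```

On a real tree, a much larger -n (such as the 1000 used above) keeps the per-process startup cost small relative to the search itself.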

For big files, it can split the input into several chunks with the --pipe and --block arguments:

 parallel --pipe --block 2M grep foo < bigfile

You could also run it on several different machines through SSH (ssh-agent needed to avoid passwords):

parallel --pipe --sshlogin server.example.com,server2.example.net grep foo < bigfile

If you're searching very large files, then setting your locale can really help.

GNU grep goes a lot faster in the C locale than with UTF-8.

export LC_ALL=C
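Rather than exporting the variable for the whole session, the locale can be set for a single command. A minimal sketch follows; the sample file is made up, and the UTF-8 locale name in the comments is an assumption that may differ on your system.

```shell
# Build a small sample file (hypothetical data).
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha foo' 'beta bar' 'gamma foo' > "$tmpdir/big.txt"

# Setting LC_ALL only for this one command leaves the shell's locale untouched.
c_hits=$(LC_ALL=C grep -c -F 'foo' "$tmpdir/big.txt")

# To compare on real data, time both variants on the same file, e.g.:
#   time LC_ALL=C grep -c -F 'foo' bigfile
#   time LC_ALL=en_US.UTF-8 grep -c -F 'foo' bigfile
echo "matches under C locale: $c_hits"
rm -rf "$tmpdir"
```

Keep in mind that in the C locale grep treats the input as raw bytes, so patterns that rely on multibyte characters (for example "." matching one accented letter) can behave differently.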

ripgrep now claims to be the fastest:

https://github.com/BurntSushi/ripgrep

It also searches in parallel by default:

 -j, --threads ARG
              The number of threads to use.  Defaults to the number of logical CPUs (capped at 6).  [default: 0]

From the README:

It is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.

Apparently using --mmap can help on some systems:

http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html
