Fastest possible grep
I'd like to know whether there are any tips to make grep as fast as possible. I have a rather large base of text files to search in the quickest possible way. I've made them all lowercase, so that I could get rid of the -i option. This makes the search much faster.
Also, I've found out that the -F and -P modes are quicker than the default one. I use the former when the search string is not a regular expression (just plain text), the latter if a regex is involved.
Does anyone have any experience in speeding up grep? Maybe compile it from scratch with some particular flag (I'm on Linux CentOS), organize the files in a certain fashion, or maybe make the search parallel in some way?
Try GNU parallel, whose documentation includes an example of how to use it with grep:
grep -r greps recursively through directories. On multicore CPUs GNU parallel can often speed this up:
find . -type f | parallel -k -j150% -n 1000 -m grep -H -n STRING {}
This will run 1.5 jobs per core and give 1000 arguments to grep.
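If GNU parallel isn't installed, a similar effect can be had with find and xargs. This is only a rough sketch, not the example from the parallel docs: it assumes a find/xargs that supports -print0, -0 and -P (GNU and BSD do, it's not POSIX), and the demo directory, file names and the pattern "needle" are made up for illustration.

```shell
#!/bin/sh
# Sketch: parallel recursive search with find + xargs -P.
# Assumption: GNU or BSD findutils (-print0 / -0 / -P are not POSIX).

# Hypothetical demo tree, including a directory name with a space.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
printf 'alpha\nneedle here\n' > "$demo_dir/a.txt"
printf 'nothing\n' > "$demo_dir/sub dir/b.txt"
printf 'another needle\n' > "$demo_dir/sub dir/c.txt"

# -print0 / -0 keep file names with spaces or newlines intact.
# -P 4 allows up to four grep processes at once.
# -n 1000 hands at most 1000 file names to each grep.
# -F = fixed string, -H = always print the file name, -n = line number.
# Output order across processes is not guaranteed, hence the sort.
matches=$(find "$demo_dir" -type f -print0 |
  LC_ALL=C xargs -0 -P 4 -n 1000 grep -F -H -n 'needle' | sort)

printf '%s\n' "$matches"
match_count=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')

rm -rf "$demo_dir"
```

Unlike parallel -k, xargs does not keep the output in input order, so sort (or post-process) if order matters.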
For big files, it can split the input into several chunks with the --pipe and --block arguments:
parallel --pipe --block 2M grep foo < bigfile
You could also run it on several different machines through SSH (ssh-agent needed to avoid passwords):
parallel --pipe --sshlogin server.example.com,server2.example.net grep foo < bigfile
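To try the --pipe chunking locally before pointing it at real data, something like the following might do. It's a sketch under assumptions: the generated file and the pattern "foo" are hypothetical test data, the 1M block size is arbitrary, and the script falls back to a plain grep if GNU parallel isn't found (the moreutils program of the same name doesn't support --pipe).

```shell
#!/bin/sh
# Sketch: chunked search of one big file with GNU parallel --pipe.
# Falls back to a single grep when GNU parallel is not available.

# Hypothetical test data: 200000 lines, every 1000th one starts with "foo".
bigfile=$(mktemp)
awk 'BEGIN {
  for (i = 1; i <= 200000; i++)
    printf "%s %d\n", (i % 1000 == 0 ? "foo" : "bar"), i
}' > "$bigfile"

if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  # --pipe splits stdin on line boundaries into blocks of about 1 MB,
  # one grep per block; -k keeps the output in input order.
  hits=$(LC_ALL=C parallel -k --pipe --block 1M grep -F foo < "$bigfile")
else
  hits=$(LC_ALL=C grep -F foo < "$bigfile")
fi

hit_count=$(printf '%s\n' "$hits" | wc -l | tr -d ' ')
echo "$hit_count matching lines"

rm -f "$bigfile"
```

Whether chunking actually wins depends on the pattern and the disk; for a cheap fixed-string search the pipe overhead can eat the gain, so it's worth timing both branches on your own files.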
If you're searching very large files, then setting your locale can really help.
GNU grep goes a lot faster in the C locale than with UTF-8.
export LC_ALL=C
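If you'd rather not change the locale for the whole shell session, the variable can be set for a single command. A minimal sketch, with a made-up sample file and pattern:

```shell
#!/bin/sh
# Sketch: scope the C locale to one grep instead of exporting it.

# Hypothetical sample file.
sample=$(mktemp)
printf 'first line\nsecond needle line\nthird line\n' > "$sample"

# The assignment prefix affects only this one grep process.
c_locale_hits=$(LC_ALL=C grep -c -F 'needle' "$sample")

# Same search in whatever locale the shell already has.
default_hits=$(grep -c -F 'needle' "$sample")

echo "C locale: $c_locale_hits, current locale: $default_hits"

rm -f "$sample"
```

One caveat: in the C locale grep works on bytes rather than characters, so for non-ASCII text things like "." and bracket expressions can match differently than they do under UTF-8. For plain ASCII patterns the results should be the same.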
Ripgrep claims to now be the fastest.
https://github.com/BurntSushi/ripgrep
It also includes parallelism by default:
-j, --threads ARG
The number of threads to use. Defaults to the number of logical CPUs (capped at 6). [default: 0]
From the README:
It is built on top of Rust's regex engine. Rust's regex engine uses finite automata, SIMD and aggressive literal optimizations to make searching very fast.
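For comparison with the grep commands above, a basic ripgrep call could look like this. It's a sketch only: the demo directory and the pattern "needle" are invented, -j 4 is an arbitrary thread count, and the script falls back to grep -r if rg isn't installed.

```shell
#!/bin/sh
# Sketch: list files containing a fixed string with ripgrep,
# falling back to recursive grep when rg is not installed.

# Hypothetical demo directory.
rg_dir=$(mktemp -d)
printf 'needle one\n' > "$rg_dir/a.txt"
printf 'no match\n' > "$rg_dir/b.txt"
printf 'needle two\n' > "$rg_dir/c.txt"

if command -v rg >/dev/null 2>&1; then
  # -F = fixed string, -l = print only names of matching files,
  # -j 4 = use four threads (by default rg picks a count itself).
  matching_files=$(rg -F -l -j 4 'needle' "$rg_dir" | sort)
else
  matching_files=$(LC_ALL=C grep -r -F -l 'needle' "$rg_dir" | sort)
fi

printf '%s\n' "$matching_files"
file_count=$(printf '%s\n' "$matching_files" | wc -l | tr -d ' ')

rm -rf "$rg_dir"
```

Keep in mind that rg skips hidden files, binary files and anything matched by .gitignore rules by default, so its results on a real tree may not match grep -r exactly.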
Apparently using --mmap can help on some systems:
http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html