Using an index to make grep faster?

I find myself grepping the same codebase over and over. While it works great, each command takes about 10 seconds, so I am thinking about ways to make it faster.

So can grep use some sort of index? I understand an index probably won't help for complicated regexps, but I mostly use very simple patterns. Does an indexer exist for this case?

EDIT: I know about ctags and the like, but I would like to do full-text search.


What about cscope? Does it fit your needs?

It allows searching code for:

  • all references to a symbol
  • global definitions
  • functions called by a function
  • functions calling a function
  • text string
  • regular expression pattern
  • a file
  • files including a file

Full-text indexing

There are tools such as Recoll, Swish-e and Sphinx, but you'd have to check whether they support the sort of search criteria you need.

Recoll

Recoll is a personal full text search tool for Unix/Linux.

Swish-e

Swish-e is a fast, flexible, and free open source system for indexing collections of Web pages or other files.

Sphinx

Sphinx lets you either batch index and search data stored in an SQL database, NoSQL storage, or just files quickly and easily.
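
To give a feel for what such an indexer does, here is a minimal sketch of a toy inverted index in plain shell. It is not how Recoll, Swish-e or Sphinx work internally. The function names build_index and lookup are made up for illustration, and it assumes C sources, whole-word lookups only, and file names without tabs or newlines.

```shell
# Toy inverted index: one "word<TAB>file" line per distinct word per file.
# build_index and lookup are hypothetical helper names, not part of any tool.

build_index() {
    # $1 = source directory, $2 = index file to write
    find "$1" -type f -name '*.c' -exec awk '
        {
            # Split each line into identifier-like words.
            n = split($0, words, "[^A-Za-z0-9_]+")
            for (i = 1; i <= n; i++)
                if (words[i] != "" && !seen[FILENAME, words[i]]++)
                    print words[i] "\t" FILENAME
        }' {} + | sort -u > "$2"
}

lookup() {
    # $1 = index file, $2 = exact word; prints the files that contain it
    awk -F '\t' -v w="$2" '$1 == w { print $2 }' "$1"
}
```

You would run something like build_index /foo/myproject /tmp/myproject.idx once, then lookup /tmp/myproject.idx searchtext for each query. The lookup still scans the index file linearly, so it only wins because the index is much smaller than the sources; real indexers use structures that avoid the full scan, and the index has to be rebuilt when files change.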

grep

I'm surprised grep is as slow as you describe. Can you reduce the number of files being searched? For example, when I only need to search the source files for one executable (out of many in a project), I feed grep the names from a command that lists the source files for that program:

grep expression `sources myprogram`

sources is a program specific to my development environment but you may have (or be able to construct) something equivalent.
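
If you have nothing like that, a rough stand-in could look like the sketch below. The names list_sources and grep_sources are hypothetical, and it assumes that "the sources of a program" simply means the .c and .h files under one directory, with no newlines in file names.

```shell
# list_sources is a hypothetical stand-in for the "sources" program above:
# it prints the C source and header files under a directory, one per line.
list_sources() {
    find "$1" -type f \( -name '*.c' -o -name '*.h' \)
}

# grep_sources PATTERN DIR: search only the listed files for a fixed string
# and print the names of the files that match.
# Assumes file names contain no newlines.
grep_sources() {
    list_sources "$2" | tr '\n' '\0' | xargs -0 grep -lF -- "$1"
}
```

For example, grep_sources searchtext /foo/myproject/myprogram only touches that one program's files. If walking the tree is itself slow, the output of list_sources could be saved to a file and reused between searches.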

I'm assuming you've tried obvious techniques such as

find /foo/myproject -name "*.c" -exec fgrep -l searchtext {} +

I've read a suggestion that the -P option of current GNU grep (which selects Perl-compatible regular expressions) can speed up some searches significantly.


You could copy your codebase onto a RAM disk.
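
A minimal sketch of that, assuming Linux, where /dev/shm is usually a memory-backed tmpfs mount. The function name copy_to_ram and the default location are my own assumptions; pass a different parent directory if your RAM disk is mounted elsewhere.

```shell
# copy_to_ram SRC [DEST_PARENT]: copy a source tree into a memory-backed
# directory and print the path of the copy.
# Assumption: /dev/shm is a tmpfs mount (common on Linux, not guaranteed).
copy_to_ram() {
    src=$1
    parent=${2:-/dev/shm}
    dest="$parent/$(basename "$src")"
    # "src/." copies the directory contents, including dotfiles.
    mkdir -p "$dest" && cp -R "$src/." "$dest/" && printf '%s\n' "$dest"
}
```

You could then search the copy with something like grep -rF searchtext "$(copy_to_ram /foo/myproject)". The copy goes stale as soon as you edit the original, so it needs refreshing before each round of searches.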


grep, no. But there are several programs that use indexes and are aimed at code bases. ctags (a version is provided with vim), etags (aimed at use with emacs) and global (more independent of the editor) are the ones I'm thinking of now, but there are probably others.