Efficiently search sorted file

I have a large file containing one string on each line. I would like to be able to quickly determine if a string is in the file. Ideally, this would be done using a binary chop type algorithm.

Some Googling revealed the look command with the -b flag, which promises to locate and output all strings beginning with a given prefix using a binary search algorithm. Unfortunately, it doesn't seem to work correctly and returns no results for strings that I know are in the file (the equivalent grep search finds them just fine).

Does anyone know of another utility or strategy to search this file efficiently?


There's an essential difference between grep and look:

Unless told otherwise, grep will find a pattern anywhere within a line. For look, the manpage states:

look — display lines beginning with a given string

I don't use look very often, but it worked fine on a trivial example I just tried.
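The difference is easy to see with a toy word list (hypothetical data, just to illustrate the matching semantics):

```python
lines = ["applesauce", "bananas", "grapefruit"]

pattern = "ana"
grep_like = [l for l in lines if pattern in l]           # grep: match anywhere in the line
look_like = [l for l in lines if l.startswith(pattern)]  # look: match at line start only

print(grep_like)  # ['bananas']
print(look_like)  # []
```

So a string that grep happily finds mid-line will produce no output from look unless it is a line prefix.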


A late answer, maybe:

Sgrep will help you.

Sgrep (sorted grep) searches sorted input files for lines that match a search key and outputs the matching lines. When searching large files sgrep is much faster than traditional Unix grep, but with significant restrictions.

  • All input files must be sorted regular files.
  • The sort key must start at the beginning of the line.
  • The search key matches only at the beginning of the line.
  • No regular expression support.

You can download the source here: https://sourceforge.net/projects/sgrep/?source=typ_redirect

and the documents here: http://sgrep.sourceforge.net/
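If sgrep isn't an option, the binary chop the question asks for is short enough to sketch by hand. This is a sketch, not a polished tool, and it assumes the file is sorted in plain byte order (e.g. with `LC_ALL=C sort`), since byte-wise comparisons only work against that ordering:

```python
import os

def _line_after(f, pos):
    # First complete line beginning at an offset strictly greater than pos
    # (pos == -1 yields the very first line). Returns (line, start_offset);
    # line is None once we have run past the last line of the file.
    if pos < 0:
        f.seek(0)
    else:
        f.seek(pos)
        f.readline()            # skip the (possibly partial) line we landed in
    start = f.tell()
    line = f.readline()
    return (line.rstrip(b"\n") if line else None), start

def contains_line(path, key):
    """True if `key` occurs as a complete line in a byte-sorted file."""
    target = key.encode()
    with open(path, "rb") as f:
        lo, hi = -1, os.fstat(f.fileno()).st_size - 1
        while hi - lo > 1:
            mid = (lo + hi) // 2
            line, start = _line_after(f, mid)
            if line == target:
                return True
            if line is not None and line < target:
                lo = start      # target line must start after this one
            else:
                hi = mid        # target line must start at or before mid
        line, _ = _line_after(f, lo)
        return line == target
```

Because it seeks rather than scans, each lookup touches only O(log n) lines, which is the behaviour look -b promises. Change the final comparison to `line.startswith(target)` if you want look-style prefix matching instead of whole-line matching.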

Another way:

I don't know how large the file is, but you could also try GNU parallel:

https://stackoverflow.com/questions/9066609/fastest-possible-grep

I regularly grep files larger than 100 GB this way, and it works well.