Open source command line tools for indexing a large number of text files [closed]

I'm looking for an open source command line tool (or set of tools) that will let me index and search a large number of plain text files. Approximate search would be a plus. The tool only needs to print the names of the files that match, although some match context would be useful. A GUI tool isn't useful for my application, nor is anything that searches files one by one (grep, for example). I'm targeting Unix platforms (OS X, Linux, BSD).

EDIT: I'm not interested in any sort of tool that is system-wide, or needs to run in the background. Basically, I want to build an index for a directory tree full of text files and then later be able to search against it. Preferably the index is one or a few files that I can specify the location of.

Any ideas?


The best thing you could do is feed the text files into a MySQL database and use its full-text (FULLTEXT) search. This gives fast searches and ranks the results by how well they match the query.

Interfacing a MySQL database with other systems, such as a website for document searching, would be simple enough.

Useful resources:

  • MySQL basics: http://news.softpedia.com/news/MySQL-Basic-Usage-Guide-37081.shtml
  • How to use full text searching: http://devzone.zend.com/article/1304
  • MySQL Full Text Searching manual: http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
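
To make the database idea concrete, here is a minimal sketch. Note that it swaps MySQL for SQLite's FTS5 extension (through Python's standard sqlite3 module) so that the whole index lives in one file you choose, which is closer to what the question asks for. The function names build_index and search and the table name docs are made up for illustration, and it assumes your Python's SQLite was built with FTS5, which is common but not guaranteed.

```python
import os
import sqlite3


def build_index(root, db_path):
    """Index every readable file under `root` into one SQLite database file."""
    conn = sqlite3.connect(db_path)
    # FTS5 virtual table: `path` is stored but not tokenized, `body` is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, body)"
    )
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                # Undecodable bytes are replaced rather than aborting the run.
                with open(full, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # skip files we cannot read
            conn.execute("INSERT INTO docs (path, body) VALUES (?, ?)", (full, text))
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return (path, snippet) pairs, best matches first."""
    rows = conn.execute(
        "SELECT path, snippet(docs, 1, '[', ']', '...', 8) "
        "FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

Something like search(build_index("notes", "notes.idx"), "kernel") would then list the matching paths with a little context around each hit. This is only a sketch: running build_index twice adds duplicate rows, the index file should live outside the tree being indexed, and the query string is passed straight to FTS5, so punctuation in it may be read as query syntax.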

If you want to search for files by file name:

The standard Unix tool for this is locate. A companion program, updatedb, builds a database of file names (usually from a cron job), and locate then searches that database instead of walking the filesystem.

It's part of most Linux distributions (usually package "locate" or "mlocate").
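
To illustrate the idea (this is not how locate is actually implemented), here is a rough Python sketch of the same two-step approach: one function records every path under a directory in a plain text file, and another scans that file for a substring. The names update_db and locate_in_db and the one-path-per-line file format are assumptions made for this example.

```python
import os


def update_db(root, db_file):
    """Write the path of every file under `root` to `db_file`, one per line."""
    with open(db_file, "w", encoding="utf-8") as out:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                out.write(os.path.join(dirpath, name) + "\n")


def locate_in_db(db_file, pattern):
    """Return the recorded paths that contain `pattern` as a substring."""
    matches = []
    with open(db_file, encoding="utf-8") as fh:
        for line in fh:
            path = line.rstrip("\n")
            if pattern in path:
                matches.append(path)
    return matches
```

As with the real tool, results are only as fresh as the last update_db run, and this only matches file names, not file contents.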

If you want to search for files by content:

There are a variety of search engines available that will index documents for you (some even support formats besides plain text, e.g. word processor documents). Examples would be Beagle and Google Desktop. There's a fairly exhaustive list on Wikipedia:

http://en.wikipedia.org/wiki/List_of_search_engines#Desktop_search_engines

Edit:

If you don't want a search engine that runs in the background or automatically indexes all your files, you can probably still use a desktop search engine. Most of them let you control the indexing process, so you can start the indexing manually and specify which directories to index and where to put the index file.


I found what I was looking for. Swish++ can index a directory of files (not just text), and is basically a set of command line tools. It appears to be a rewrite of Swish-e.


I used to use swish-e, but that was about a decade ago. Development seems to have stalled since then (sometimes stalled means “stable”, not “dead”), but it might work for you.