Search Engine - Lucene or Solr

We need to integrate a search engine into our Product Catalog management software. The catalog is expected to hold more than 4-5 million records, with relational data spread over several tables. Our development platform is ASP.NET 3.5, and we have done some preliminary work with Lucene and found it to be good. However, we have just learned about Solr and are looking for some practical tips comparing Lucene and Solr in terms of implementation, timeline, regular maintenance, performance, and features. Any guidance or pointers would be really helpful. Thanks.


Solution 1:

Lucene:

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.

Solr:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and ...
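Faceted search, one of the features listed above, boils down to counting how many of the matching documents carry each value of a field (for example, how many hits fall into each product category). The plain-Java sketch below only illustrates that idea on an in-memory list of hits; it is not how Solr implements faceting, and the `Product` class and sample data are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: what a facet count computes, not Solr's implementation.
class FacetSketch {

    // Hypothetical catalog entry with a single category value.
    static final class Product {
        final String name;
        final String category;

        Product(String name, String category) {
            this.name = name;
            this.category = category;
        }
    }

    // Counts how many of the matching products fall into each category.
    // TreeMap keeps the facet values sorted alphabetically for stable output.
    static Map<String, Integer> facetByCategory(List<Product> hits) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Product p : hits) {
            counts.merge(p.category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Product> hits = List.of(
                new Product("Red Chair", "Furniture"),
                new Product("Red Lamp", "Lighting"),
                new Product("Red Sofa", "Furniture"));
        System.out.println(facetByCategory(hits)); // {Furniture=2, Lighting=1}
    }
}
```

With Solr you get these counts back alongside the search results instead of writing and maintaining this code yourself.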

Essentially, Lucene is embedded in Solr. Lucene on its own is purely a full-text search library, meant to be embedded into projects to give them full-text search capabilities. Solr has many more features and administration capabilities: it lets you search structured data without writing any custom code, load data from CSV files, parse user input tolerantly, do faceted searching, highlight matched text in results, and retrieve search results in a variety of formats (XML, JSON, ...). Check the Solr features page and see whether any of the features are relevant to your project.
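Many of the Solr features described above are switched on per request through HTTP query parameters. The Java sketch below only builds (and does not send) a select URL asking for JSON output, facet counts and hit highlighting. The parameters `q`, `wt`, `rows`, `facet`, `facet.field`, `hl` and `hl.fl` are standard Solr request parameters; the base URL, the core name `catalog` and the field names `category` and `description` are hypothetical and depend on your own setup and schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a Solr select URL; it does not perform the HTTP request.
// Base URL, core name and field names are hypothetical.
class SolrQueryUrl {

    static String build(String baseUrl, String userQuery) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);              // the user's search text
        params.put("wt", "json");                // response format
        params.put("rows", "10");                // page size
        params.put("facet", "true");             // enable faceting...
        params.put("facet.field", "category");   // ...on a hypothetical field
        params.put("hl", "true");                // enable hit highlighting...
        params.put("hl.fl", "description");      // ...on a hypothetical field

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return baseUrl + "?" + query;
    }

    // Form-encodes a key or value (spaces become '+').
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/catalog/select", "red chair"));
    }
}
```

Because the interface is plain HTTP, the same request could be issued from an ASP.NET application with any HTTP client, which matters for a .NET shop weighing Solr against an in-process library.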

Solution 2:

I have to agree with Andrew Clegg. I think when a lot of Java developer types look at Lucene vs. Solr, Lucene looks friendlier because it's just a library (POJJ: Plain Old Java Jar!) like any other, and it looks straightforward to embed, versus the complexity of standing Solr up as a separate process that communicates over complex HTTP.

However, I think that for almost all search use cases, Solr is the right approach. Most of the complexity in search is not in the initial integration but in the fuzzy areas of tuning searches, scaling to meet demand, and maintaining your indexes, which cross over from the developer-centric world into the systems world. Solr handles all of those needs nicely.

Solution 3:

Like dcruz says, Solr uses Lucene anyway, so it's not a valid comparison.

Lucene is a toolkit for building search apps, Solr is a search app built with Lucene.
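To make the "toolkit" part concrete: at its core, a full-text search library maintains an inverted index, a map from each term to the documents that contain it. The toy Java class below shows only that idea (lower-casing, splitting on non-alphanumeric characters, AND queries). It is not Lucene's API, and it leaves out what makes Lucene valuable in practice, such as relevance scoring, configurable analyzers and on-disk index segments.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy inverted index for illustration; not Lucene's API.
class TinyInvertedIndex {

    // term -> sorted set of ids of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    void add(int docId, String text) {
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
    }

    // Returns ids of documents containing every query term (AND semantics).
    SortedSet<Integer> search(String query) {
        SortedSet<Integer> result = null;
        for (String term : tokenize(query)) {
            SortedSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);  // copy so the index is not modified
            } else {
                result.retainAll(docs);        // intersect with earlier terms
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    // Naive analysis: lower-case, then split on anything not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Red leather chair");
        index.add(2, "Blue chair");
        index.add(3, "Red lamp");
        System.out.println(index.search("red chair")); // [1]
        System.out.println(index.search("chair"));     // [1, 2]
    }
}
```

Lucene gives you a production-grade version of this building block; Solr wraps it in a server with configuration, caching, replication and an admin UI.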

IMO you'd be crazy not to use Solr, as it provides a lot of 'plumbing' that you'd otherwise have to write yourself, such as a configurable Data Import Handler to suck data out of your RDBMS or XML repositories.
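A large part of that plumbing, given a catalog whose data is spread over several tables, is flattening related rows into one search document per product, since the index stores flat documents rather than joined tables. The plain-Java sketch below shows the kind of join code you would maintain by hand with raw Lucene; in Solr, the Data Import Handler is configured to run the SQL and map columns to fields instead. The `ProductRow` shape, the field names and the `"Unknown"` fallback are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hand-written denormalization: one flat document (field -> value) per product.
// Row shape, field names and the "Unknown" fallback are hypothetical.
class CatalogFlattener {

    // Stand-in for a row read from a Product table with a category foreign key.
    static final class ProductRow {
        final int id;
        final String name;
        final int categoryId;

        ProductRow(int id, String name, int categoryId) {
            this.id = id;
            this.name = name;
            this.categoryId = categoryId;
        }
    }

    static List<Map<String, String>> flatten(List<ProductRow> products,
                                             Map<Integer, String> categoryNames) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (ProductRow p : products) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", String.valueOf(p.id));
            doc.put("name", p.name);
            // Resolve the foreign key so the category name itself is searchable.
            doc.put("category", categoryNames.getOrDefault(p.categoryId, "Unknown"));
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<Integer, String> categories = Map.of(10, "Furniture", 20, "Lighting");
        List<ProductRow> products = List.of(
                new ProductRow(1, "Red Chair", 10),
                new ProductRow(2, "Desk Lamp", 20));
        // [{id=1, name=Red Chair, category=Furniture}, {id=2, name=Desk Lamp, category=Lighting}]
        System.out.println(flatten(products, categories));
    }
}
```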

Plus it gives you a web admin interface and other bells and whistles.

Solution 4:

One thing to consider is how difficult it will be to set up your application when you mix these two environments (Java/.NET). If you use the Lucene.NET libraries, you can limit the external dependencies you need to install, which streamlines deployment.

Another thing to consider is whether you need the extras that Solr offers. A(nother) web admin interface is probably great, but it extends your risk envelope: laying down Java and another service means more patch management. If you stick with .NET only, your patch strategy can follow the standard Windows Update model.

Of course, rolling your own implementation using Lucene.NET will have development and maintenance costs of its own, but in my experience it has been straightforward and easy to work with.