ElasticSearch -- boosting relevance based on field value

Need to find a way in ElasticSearch to boost the relevance of a document based on a particular value of a field. Specifically, there is a special field in all my documents where the higher the field value is, the more relevant the doc that contains it should be, regardless of the search.

Consider the following document structure:

{
    "_all" : {"enabled" : "true"},
    "properties" : {
        "_id":            {"type" : "string",  "store" : "yes", "index" : "not_analyzed"},
        "first_name":     {"type" : "string",  "store" : "yes", "index" : "yes"},
        "last_name":      {"type" : "string",  "store" : "yes", "index" : "yes"},
        "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes"}
        }
}

I'd like documents with a higher boosting_field value to be inherently more relevant than those with a lower boosting_field value. This is just a starting point -- the matching between the query and the other fields will also be taken into account in determining the final relevance score of each doc in the search. But, all else being equal, the higher the boosting field, the more relevant the document.

Anyone have an idea on how to do this?

Thanks a lot!


Solution 1:

You can either boost at index time or query time. I usually prefer query time boosting even though it makes queries a little bit slower, otherwise I'd need to reindex every time I want to change my boosting factors, which usally need fine-tuning and need to be pretty flexible.

There are different ways to apply query time boosting using the elasticsearch query DSL:

  • Boosting Query
  • Custom Filters Score Query
  • Custom Boost Factor Query
  • Custom Score Query

The first three queries are useful if you want to give a specific boost to the documents which match specific queries or filters. For example, if you want to boost only the documents published during the last month. You could use this approach with your boosting_field but you'd need to manually define some boosting_field intervals and give them a different boost, which isn't that great.

The best solution would be to use a Custom Score Query, which allows you to make a query and customize its score using a script. It's quite powerful, with the script you can directly modify the score itself. First of all I'd scale the boosting_field values to a value from 0 to 1 for example, so that your final score doesn't become a big number. In order to do that you need to predict what are more or less the minimum and the maximum values that the field can contain. Let's say minimum 0 and maximum 100000 for instance. If you scale the boosting_field value to a number between 0 and 1, then you can add the result to the actual score like this:

{
    "query" : {
        "custom_score" : {
            "query" : {
                "match_all" : {}
            },
            "script" : "_score + (1 * doc.boosting_field.doubleValue / 100000)"
        }
    }
}

You can also consider to use the boosting_field as a boost factor (_score * rather than _score +), but then you'd need to scale it to an interval with minimum value 1 (just add a +1).

You can even tune the result in order the change its importance adding a weight to the value that you use to influence the score. You are going to need this even more if you need to combine multiple boosting factors together in order to give them a different weight.

Solution 2:

With a recent version of Elasticsearch (version 1.3+) you'll want to use "function score queries":

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html

A scored query_string search looks like this:

{
 'query': {
        'function_score': {
            'query': { 'query_string': { 'query': 'my search terms' } },
            'functions': [{ 'field_value_factor': { 'field': 'my_boost' } }]
        }
    }
}

"my_boost" is a numeric field in your search index that contains the boost factor for individual documents. May look like this:

{ "my_boost": { "type": "float", "index": "not_analyzed" } }

Solution 3:

if you want to avoid to do the boosting each time inside the query, you might consider to add it to your mapping directly adding "boost: factor.

So your mapping then may look like this:

{
    "_all" : {"enabled" : "true"},
    "properties" : {
        "_id":            {"type" : "string",  "store" : "yes", "index" : "not_analyzed"},
        "first_name":     {"type" : "string",  "store" : "yes", "index" : "yes"},
        "last_name":      {"type" : "string",  "store" : "yes", "index" : "yes"},
        "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes", "boost" : 10.0,}
        }
}