Elasticsearch: search_analyzer vs index_analyzer
I was looking at http://euphonious-intuition.com/2012/08/more-complicated-mapping-in-elasticsearch/, which explains Elasticsearch analyzers.
I did not understand the part about having different search and index analyzers.
The second example of custom mapping goes like this:
- the index analyzer is an edgeNGram analyzer
- the search analyzer is:
"full_name": {
    "filter": [
        "standard",
        "lowercase",
        "asciifolding"
    ],
    "type": "custom",
    "tokenizer": "standard"
}
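As a rough illustration of what the `full_name` search analyzer does to query text, here is a minimal Python sketch. It is an approximation, not Elasticsearch's implementation: the standard tokenizer is approximated with a `\w+` regex (the real one follows Unicode text segmentation rules), the `standard` token filter is assumed to be a pass-through, and asciifolding is approximated with Unicode NFKD decomposition. The function names are hypothetical.

```python
import re
import unicodedata


def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer: split into runs of word
    # characters. The real tokenizer follows Unicode UAX #29 rules.
    return re.findall(r"\w+", text)


def ascii_fold(token):
    # Approximate asciifolding: decompose accented characters and drop the
    # combining marks, e.g. "é" -> "e".
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def full_name_analyzer(text):
    # tokenizer: standard; filters: standard (assumed no-op here),
    # lowercase, asciifolding, applied in that order.
    tokens = standard_tokenize(text)
    tokens = [t.lower() for t in tokens]
    return [ascii_fold(t) for t in tokens]


print(full_name_analyzer("Renée Race"))  # ['renee', 'race']
```

Because this chain emits whole words, the query "Race" becomes the single term "race", which only matches documents whose indexed edge ngrams include "race".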
If we don't want the query "Race" to return results like *ra*pport and *rac*ial because of the edgeNGram, why index with edgeNGram in the first place?
Please explain, with an example, when different analyzers are useful.
Solution 1:
You usually have a similar analysis chain at both index time and query time. Similar doesn't mean exactly the same, but the way you index documents usually reflects the way you query them.
The ngrams example is a really good fit, though, since it's one of the main reasons why you would use different analyzers at index and query time.
For partial matches you index with edge ngrams, so that "elasticsearch" becomes (with min_gram 3 and max_gram 20):
"ela", "elas", "elast", "elasti", "elastic", "elastics", "elasticse", "elasticsea", "elasticsear", "elasticsearc" and "elasticsearch"
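A minimal Python sketch of edge ngram generation for a single token, assuming the same min_gram 3 and max_gram 20 settings; `edge_ngrams` is a hypothetical helper, not an Elasticsearch API:

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    # Emit every prefix of the token whose length lies in [min_gram, max_gram].
    # Tokens shorter than min_gram produce no grams at all.
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


print(edge_ngrams("elasticsearch"))
# ['ela', 'elas', 'elast', 'elasti', 'elastic', 'elastics', 'elasticse',
#  'elasticsea', 'elasticsear', 'elasticsearc', 'elasticsearch']
```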
Let's now query the field we created. If we query for the term "elastic", there's a match and we get back the expected result. Given what we indexed, what we called a partial match above has effectively become an exact match. There's no need to apply ngrams to the query too. If we did, we would query for all of the following terms:
"ela", "elas","elast","elasti" and "elastic"
That would make the query much more complex and would lead to weird results as well. Let's say you index the term "elapsed" in the same field of another document. You would get the following ngrams:
"ela", "elap", "elaps", "elapse", "elapsed"
If you search for "elastic" and apply ngrams to the query, the term "ela" would match this second document too, so you would get it back together with the first document, even though none of its terms contains the whole term "elastic" you were looking for.
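The two-document scenario above can be simulated in a few lines of Python. This is a toy sketch, not how Lucene scores or retrieves: the index is a hypothetical dict from document id to indexed terms, and a document matches if it shares any term with the query (OR semantics, as with a match query's default operator).

```python
def edge_ngrams(token, min_gram=3, max_gram=20):
    # Prefixes of the token with length in [min_gram, max_gram].
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


# Hypothetical two-document index: doc id -> terms produced by the index analyzer.
index = {
    1: set(edge_ngrams("elasticsearch")),
    2: set(edge_ngrams("elapsed")),
}


def search(query_terms):
    # OR semantics: a document matches if it shares at least one term with the query.
    return sorted(doc for doc, terms in index.items() if terms & set(query_terms))


# Query analyzed as a single whole term (no ngrams at search time).
print(search(["elastic"]))             # [1]
# Query also split into edge ngrams: "ela" now matches "elapsed" as well.
print(search(edge_ngrams("elastic")))  # [1, 2]
```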
I suggest having a look at the analyze API to play around with different analyzers and see how their output differs.
Solution 2:
Quoting the official documentation on index vs. search analyzers:
Occasionally, it makes sense to use a different analyzer at index and search time. For instance, at index time we may want to index synonyms, eg for every occurrence of quick we also index fast, rapid and speedy. But at search time, we don’t need to search for all of these synonyms. Instead we can just look up the single word that the user has entered, be it quick, fast, rapid or speedy.
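The synonym case in this passage can be sketched in Python as follows. The synonym group, whitespace tokenization and function names are assumptions for illustration; in Elasticsearch this expansion would be done by a synonym token filter in the index-time analyzer.

```python
# Hypothetical synonym group; each member expands to the whole group at index time.
GROUPS = [{"quick", "fast", "rapid", "speedy"}]


def index_analyzer(text):
    # Index time: lowercase, split on whitespace, and add every synonym of each token.
    terms = set()
    for token in text.lower().split():
        terms.add(token)
        for group in GROUPS:
            if token in group:
                terms |= group
    return terms


def search_analyzer(text):
    # Search time: no expansion, just the words the user typed.
    return set(text.lower().split())


doc_terms = index_analyzer("A quick brown fox")
for q in ["quick", "fast", "rapid", "speedy"]:
    print(q, bool(search_analyzer(q) & doc_terms))  # each line ends with True
```

Expanding only once, at index time, keeps each query down to the single term the user typed while still matching every synonym.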
To enable this distinction, Elasticsearch also supports the index_analyzer and search_analyzer parameters, and analyzers named default_index and default_search.
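To make the `index_analyzer` and `search_analyzer` parameters concrete, here is a sketch that builds an index-creation body as a Python dict and prints it as JSON. It assumes the 1.x-era syntax this answer describes (`string` fields, an `edgeNGram` token filter); `person`, `name`, `name_ngrams` and `name_edge_ngram` are hypothetical names, while `full_name` mirrors the search analyzer from the question. In Elasticsearch 2.0 and later `index_analyzer` no longer exists: `analyzer` sets the index-time analyzer and `search_analyzer` overrides it at search time.

```python
import json

# Sketch of a 1.x-era index body; all names except "full_name" are hypothetical.
body = {
    "settings": {
        "analysis": {
            "filter": {
                "name_ngrams": {"type": "edgeNGram", "min_gram": 3, "max_gram": 20}
            },
            "analyzer": {
                # Used at index time: whole words are expanded into edge ngrams.
                "name_edge_ngram": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding", "name_ngrams"],
                },
                # Used at search time: whole words only, no ngrams.
                "full_name": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["standard", "lowercase", "asciifolding"],
                },
            },
        }
    },
    "mappings": {
        "person": {
            "properties": {
                "name": {
                    "type": "string",
                    "index_analyzer": "name_edge_ngram",
                    "search_analyzer": "full_name",
                }
            }
        }
    },
}

print(json.dumps(body, indent=2))
```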
Taking these extra parameters into account, the full sequence at index time really looks like this:
- the index_analyzer defined in the field mapping, else
- the analyzer defined in the field mapping, else
- the analyzer defined in the _analyzer field of the document, else
- the default index_analyzer for the type, which defaults to
- the default analyzer for the type, which defaults to
- the analyzer named default_index in the index settings, which defaults to
- the analyzer named default in the index settings, which defaults to
- the analyzer named default_index at node level, which defaults to
- the analyzer named default at node level, which defaults to
- the standard analyzer
And at search time:
- the analyzer defined in the query itself, else
- the search_analyzer defined in the field mapping, else
- the analyzer defined in the field mapping, else
- the default search_analyzer for the type, which defaults to
- the default analyzer for the type, which defaults to
- the analyzer named default_search in the index settings, which defaults to
- the analyzer named default in the index settings, which defaults to
- the analyzer named default_search at node level, which defaults to
- the analyzer named default at node level, which defaults to
- the standard analyzer
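Both fallback lists above are "first configured entry wins" lookups ending in the standard analyzer. A minimal Python sketch of that resolution follows; the slot labels such as `field.index_analyzer` are hypothetical names for the places listed above, not real Elasticsearch settings keys.

```python
# Each chain lists where an analyzer may be configured, highest priority first.
# The labels are illustrative only, not actual Elasticsearch setting names.
INDEX_CHAIN = [
    "field.index_analyzer", "field.analyzer", "document._analyzer",
    "type.index_analyzer", "type.analyzer",
    "index.default_index", "index.default",
    "node.default_index", "node.default",
]
SEARCH_CHAIN = [
    "query.analyzer", "field.search_analyzer", "field.analyzer",
    "type.search_analyzer", "type.analyzer",
    "index.default_search", "index.default",
    "node.default_search", "node.default",
]


def resolve(chain, configured):
    # Return the analyzer from the first slot that is configured, else "standard".
    for slot in chain:
        if slot in configured:
            return configured[slot]
    return "standard"


config = {"field.index_analyzer": "name_edge_ngram", "field.search_analyzer": "full_name"}
print(resolve(INDEX_CHAIN, config))   # name_edge_ngram
print(resolve(SEARCH_CHAIN, config))  # full_name
print(resolve(SEARCH_CHAIN, {}))      # standard
```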