How to not analyze a field in ElasticSearch?
I've got a field in an ElasticSearch index which I do not want to have analyzed, i.e. it should be stored and compared verbatim. The values will contain letters, numbers, whitespace, dashes, slashes and maybe other characters.
If I do not give an analyzer in my mapping for this field, the default still uses a tokenizer which hacks my verbatim string into chunks of words. I don't want that.
Is there a super simple analyzer which, basically, does not analyze? Or is there a different way of denoting that this field shall not be analyzed?
I only create the index; I don't do anything else. I can use analyzers like "english" for other fields, which seem to be built-in names for pre-configured analyzers. Is there a list of other names? Maybe there's one fitting my needs (namely, doing nothing with the input).
This is my mapping currently:
```json
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string" }
    }
  }
}
```
`my_field1` is language-dependent; this seems to work. `my_field2` shall be verbatim. I'd like to give it an analyzer that simply does nothing.

A sample value for `my_field2` would be `"B45c 14/04"`.
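To see why the default mapping breaks the sample value apart, here is a rough Python approximation contrasting an analyzed field with a verbatim one. It is not Elasticsearch's real tokenizer, and the function names are hypothetical; on a running cluster, the `_analyze` API shows the actual tokens.

```python
import re
from typing import List


def approx_standard_tokens(value: str) -> List[str]:
    # Rough stand-in for the standard analyzer: split on anything that is not
    # a letter or digit, then lowercase. The real tokenizer follows Unicode
    # word-segmentation rules, so results can differ for other inputs.
    return [t.lower() for t in re.findall(r"[^\W_]+", value)]


def keyword_tokens(value: str) -> List[str]:
    # A not-analyzed (keyword) field keeps the whole value as a single term.
    return [value]


sample = "B45c 14/04"
print(approx_standard_tokens(sample))  # ['b45c', '14', '04']
print(keyword_tokens(sample))          # ['B45c 14/04']
```

With the analyzed form, a search for the full string `B45c 14/04` has no single term to match exactly; with the verbatim form it does.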
Solution 1:

```json
"my_field2": {
  "type": "string",
  "index": "not_analyzed"
}
```
See https://www.elastic.co/guide/en/elasticsearch/reference/1.4/mapping-core-types.html for further info.
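A minimal sketch, assuming Elasticsearch 1.x/2.x (where `string` and `not_analyzed` still exist), that assembles the question's full mapping with `my_field2` marked `not_analyzed` and prints it as JSON, ready to send when creating the index. The helper name is hypothetical; only the standard library is used and no client calls are made.

```python
import json


def legacy_mapping() -> dict:
    """Pre-5.x mapping: my_field1 is analyzed, my_field2 is stored verbatim."""
    return {
        "my_type": {
            "properties": {
                # Full-text field, analyzed with the built-in English analyzer.
                "my_field1": {"type": "string", "analyzer": "english"},
                # Exact-value field: indexed as one single, unmodified term.
                "my_field2": {"type": "string", "index": "not_analyzed"},
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(legacy_mapping(), indent=2))
```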
Solution 2:
This is no longer true due to the removal of the `string` type (replaced by `keyword` and `text`), as described here. Instead, you should use the `keyword` type with `"index": true` (or `false` if the field should not be searchable at all).
For example, the old mapping:

```json
{
  "foo": {
    "type": "string",
    "index": "not_analyzed"
  }
}
```

becomes:

```json
{
  "foo": {
    "type": "keyword",
    "index": true
  }
}
```
This means the field is indexed, but because its type is `keyword` it is implicitly not analyzed. If you want the field to be analyzed, use the `text` type instead.
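Applied to the question's mapping, a minimal sketch for 5.x and later could look like this. It assumes the returned dict is sent as the `mappings` part of the create-index request; the helper names are hypothetical, and `"index": true` is left out because it is already the default. The `include_type` switch reflects that 5.x/6.x still nest properties under a type name, while 7.x removed mapping types. A `term` query (rather than `match`) is the usual way to look up the exact value.

```python
import json


def modern_mapping(include_type: bool = False) -> dict:
    properties = {
        # "text" replaces analyzed "string"; the English analyzer still applies.
        "my_field1": {"type": "text", "analyzer": "english"},
        # "keyword" replaces not_analyzed "string": the whole value is one term.
        "my_field2": {"type": "keyword"},
    }
    if include_type:
        # 5.x/6.x style: properties nested under a mapping type name.
        return {"my_type": {"properties": properties}}
    # 7.x+ style: no mapping type, properties sit at the top level.
    return {"properties": properties}


def exact_match_query(value: str) -> dict:
    # term queries do not analyze their input, so this only matches a
    # keyword field whose stored value is exactly `value`.
    return {"query": {"term": {"my_field2": value}}}


if __name__ == "__main__":
    print(json.dumps({"mappings": modern_mapping()}, indent=2))
    print(json.dumps(exact_match_query("B45c 14/04"), indent=2))
```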
Solution 3:
The `keyword` analyzer can also be used (but don't actually use it for this; use `"index": "not_analyzed"` instead):

```json
{
  "my_type": {
    "properties": {
      "my_field1": { "type": "string", "analyzer": "english" },
      "my_field2": { "type": "string", "analyzer": "keyword" }
    }
  }
}
```
As noted at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-keyword-analyzer.html, it makes more sense to mark such fields as `not_analyzed`.
However, the `keyword` analyzer can be useful when it is set as the default analyzer for the whole index.
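A minimal sketch, assuming a pre-5.x cluster (it still uses `string`), of a create-index body that makes `keyword` the index-wide default: an analyzer named `default` in the index settings is picked up by every field that does not name its own. The helper name is hypothetical and no client library is called; the dict is only printed as JSON.

```python
import json


def index_body_with_keyword_default() -> dict:
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # An analyzer named "default" applies to every field without
                    # its own analyzer; type "keyword" emits the whole value as
                    # a single token.
                    "default": {"type": "keyword"}
                }
            }
        },
        "mappings": {
            "my_type": {
                "properties": {
                    # An explicit analyzer still overrides the index default.
                    "my_field1": {"type": "string", "analyzer": "english"},
                    # No analyzer given, so the keyword default is used.
                    "my_field2": {"type": "string"},
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(index_body_with_keyword_default(), indent=2))
```

The trade-off is that every new field silently becomes verbatim, so fields that need full-text search must opt in with an explicit analyzer.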
UPDATE: As noted in the comments, `string` is no longer supported in 5.x.