Remove a field from an Elasticsearch document
I need to remove a field from all the documents indexed in Elasticsearch. How can I do it?
What @backtrack said is true, but there is a very convenient way of doing this in Elasticsearch, which abstracts away the internal complexity of the deletion. Use the update API to remove the field from a single document:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
"script" : "ctx._source.remove(\"name_of_field\")"
}'
You can find more documentation here.
Note: As of Elasticsearch 6, you are required to include a Content-Type header:
-H 'Content-Type: application/json'
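For anyone scripting this rather than typing curl, here is a minimal sketch of the same single-document request with the header included, using only the Python standard library. The helper name build_remove_field_request is made up for illustration, and the host, index, type and id are just the ones from the curl example above. It only builds the request; sending it is left commented out because that needs a running cluster.

```python
import json
import urllib.request


def build_remove_field_request(base_url, index, doc_type, doc_id, field):
    """Build (but do not send) an _update request that removes `field`.

    Hypothetical helper: the URL layout mirrors the curl example above.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}/_update"
    body = {"script": f"ctx._source.remove('{field}')"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        # The header that Elasticsearch 6 and later insist on.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_remove_field_request(
        "http://localhost:9200", "test", "type1", "1", "name_of_field"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To send it against a live cluster (assumption: one is listening here):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```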
Elasticsearch added update_by_query in 2.3. This experimental interface allows you to run the update against all the documents that match a query.
Internally, Elasticsearch does a scan/scroll to collect batches of documents and then updates them much like the bulk update interface does. This is faster than doing it manually with your own scan/scroll loop because it avoids the network and serialization overhead. Each record must still be loaded into RAM, modified, and then written.
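As a rough mental model only (this is not Elasticsearch's actual code), the load, modify, write loop can be pictured with plain Python dicts standing in for documents. The function names and the batch size of 2 are made up for illustration.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents (like scroll pages)."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def remove_field_in_batches(docs, field, batch_size=2):
    """Remove `field` from every dict that has it; return how many changed."""
    updated = 0
    for chunk in batches(docs, batch_size):
        for source in chunk:
            # Each document is in memory at this point and modified in place.
            if field in source:
                del source[field]
                updated += 1
        # A real cluster would write the modified chunk back here,
        # much like a bulk request.
    return updated


if __name__ == "__main__":
    docs = [{"id": 1, "big": "x"}, {"id": 2}, {"id": 3, "big": "y"}]
    print(remove_field_in_batches(docs, "big"), docs)
```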
Yesterday I removed a large field from my ES cluster. I saw sustained throughput of 10,000 records per second during the update_by_query, constrained by CPU rather than IO.
Look into setting conflicts=proceed if the cluster has other update traffic. Without it, the whole job stops when it hits a ConflictError, which happens when one of the records is updated underneath one of the batches.
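With conflicts=proceed, conflicting documents are counted and skipped instead of aborting the job, so it is worth checking the counters in the response afterwards. Below is a small sketch of that check. The function name is hypothetical and the sample numbers are invented; the keys updated, version_conflicts and failures are the ones the update_by_query response reports. Because the query matches on the field still existing, re-running the same request picks up whatever was skipped.

```python
def summarize_update_by_query(response):
    """Pull the counters that matter after a conflicts=proceed run."""
    conflicts = response.get("version_conflicts", 0)
    failures = response.get("failures", [])
    return {
        "updated": response.get("updated", 0),
        "version_conflicts": conflicts,
        # Skipped or failed documents still carry the field, so run it again.
        "needs_rerun": conflicts > 0 or bool(failures),
    }


if __name__ == "__main__":
    # Invented sample response, for illustration only.
    sample = {"total": 1000, "updated": 990, "version_conflicts": 10, "failures": []}
    print(summarize_update_by_query(sample))
```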
Similarly, setting wait_for_completion=false will cause the update_by_query to run via the tasks interface. Otherwise the job will terminate if the connection is closed.
url:
http://localhost:9200/INDEX/TYPE/_update_by_query?wait_for_completion=false&conflicts=proceed
POST body:
{
"script": "ctx._source.remove('name_of_field')",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "name_of_field"
}
}
]
}
}
}
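When the request is started with wait_for_completion=false, the reply contains a task id rather than the final counts, and the job can then be followed through the tasks interface at _tasks/<task id>. The sketch below shows how the URLs fit together. The helper names and the example task id are made up, and nothing here talks to a cluster.

```python
from urllib.parse import urlencode


def update_by_query_url(base_url, index, doc_type=None):
    """URL for a background update_by_query that tolerates conflicts."""
    path = f"{index}/{doc_type}" if doc_type else index
    params = urlencode({"wait_for_completion": "false", "conflicts": "proceed"})
    return f"{base_url}/{path}/_update_by_query?{params}"


def task_status_url(base_url, start_response):
    """URL to poll, given the reply to the initial POST, e.g. {"task": "node:123"}."""
    return f"{base_url}/_tasks/{start_response['task']}"


def is_finished(task_response):
    """The tasks API reports "completed": true once the job is done."""
    return bool(task_response.get("completed"))


if __name__ == "__main__":
    base = "http://localhost:9200"
    print(update_by_query_url(base, "INDEX", "TYPE"))
    # Invented task id, for illustration only.
    print(task_status_url(base, {"task": "node1:42"}))
```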
As of Elasticsearch 1.4.3, inline Groovy scripting is disabled by default. For an inline script like this to work, you'll need to enable it by adding script.inline: true to your config file, or upload the Groovy as a script and use the "script": { "file": "scriptname", "lang": "groovy" } format.
You can use _update_by_query for this.
Example 1
index: my_index
field: user.email
POST my_index/_update_by_query?conflicts=proceed
{
"script" : "ctx._source.user.remove('email')",
"query" : {
"exists": { "field": "user.email" }
}
}
Example 2
index: my_index
field: total_items
POST my_index/_update_by_query?conflicts=proceed
{
"script" : "ctx._source.remove('total_items')",
"query" : {
"exists": { "field": "total_items" }
}
}
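The two examples follow the same pattern, so the request body can be generated from the field path. This is a minimal sketch under a few assumptions: the helper name remove_field_body is made up, field names are plain identifiers with no quotes in them, and every parent in the path (such as user) is an object rather than an array of objects.

```python
import json


def remove_field_body(field_path):
    """Build an _update_by_query body that removes a (possibly nested) field."""
    *parents, leaf = field_path.split(".")
    # "user.email" -> remove 'email' from ctx._source.user
    target = ".".join(["ctx._source", *parents])
    return {
        "script": f"{target}.remove('{leaf}')",
        # Only touch documents that still have the field.
        "query": {"exists": {"field": field_path}},
    }


if __name__ == "__main__":
    for path in ("user.email", "total_items"):
        print(json.dumps(remove_field_body(path), indent=2))
```

The exists query matters for nested paths in particular, since it keeps the script away from documents where the parent object is missing.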