Elasticsearch Bulk API - Index vs Create/Update
I'm using the Elasticsearch Bulk API to create or update documents.
I do actually know if they are creates or updates, but I can simplify my code by just making them all index, or "upserts" in the SQL sense.
Is there any disadvantage in using index (and letting ES figure it out) over using the more explicit create and update?
If you're sending create, you must ensure that the document doesn't exist yet in your index, otherwise that action will fail, whereas sending the same document with index will always succeed.
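To illustrate, here is a minimal sketch of how a rejected create shows up in a bulk response. The sample response is hand-written for illustration (the index name, IDs and reason text are made up); it follows the general shape of a bulk response, where each item is keyed by its action and carries its own status. The helper name `failed_items` is hypothetical.

```python
import json

# Hand-written sample of a bulk response body (illustrative values only).
# A `create` for an ID that already exists is reported as a 409 conflict
# on that item, while an `index` for the same ID succeeds.
SAMPLE_RESPONSE = json.loads("""
{
  "took": 5,
  "errors": true,
  "items": [
    {"create": {"_index": "products", "_id": "1", "status": 409,
                "error": {"type": "version_conflict_engine_exception",
                          "reason": "document already exists"}}},
    {"index": {"_index": "products", "_id": "1", "status": 200,
               "result": "updated"}}
  ]
}
""")


def failed_items(response):
    """Return (action, id, status) for every bulk item that carries an error."""
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key dict: {"<action>": {...details...}}
        for action, details in item.items():
            if "error" in details:
                failures.append((action, details["_id"], details["status"]))
    return failures


if __name__ == "__main__":
    print(failed_items(SAMPLE_RESPONSE))
```

Note that the failure is per item: the other actions in the same bulk request are still processed, so it is worth checking the `errors` flag and the individual items.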
Then, if you know you'll create a document (with either create or index) and afterwards only update a few properties, using update might make sense for performance reasons, since you only need to send the changed fields.
Otherwise, if you're always sending full documents, I'd use index all the time, for both creating and updating. Whenever it sees an index action, ES will either create the document if it doesn't exist or replace it if it exists, but the call will always succeed.
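As a rough sketch of the "index everything" approach, the helper below builds the newline-delimited JSON body for a bulk request in which every document uses the index action. The function name `build_index_bulk_body`, the index name `products` and the sample documents are assumptions for illustration; sending the body to the `_bulk` endpoint is left out.

```python
import json


def build_index_bulk_body(index_name, docs_by_id):
    """Build an NDJSON bulk body that uses the `index` action for every document."""
    lines = []
    for doc_id, source in docs_by_id.items():
        # Action line: which action to run, on which index and document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the full document, which is created or replaces the old one.
        lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    body = build_index_bulk_body(
        "products",
        {"1": {"name": "lamp", "price": 20}, "2": {"name": "desk", "price": 150}},
    )
    print(body, end="")
```

The same code path then handles both new and existing documents, which is the simplification the question is after.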
The short answer: No, there is no disadvantage.
The create and update actions are special cases. Use create when you want to do nothing if the document is already there. With update you can provide less data: if you do not have all the data of the document, you can send just a few fields. You can also use update to make sure the document is only written if it already exists.
You won't be able to use index for everything. According to the docs:
index will add or replace a document as necessary
Also, if you are updating a document, it might be worthwhile to add the 'doc_as_upsert' flag. More info here and here
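For what it's worth, here is a small sketch of what an update action with that flag could look like inside a bulk body. The helper name `build_update_action` and the sample index, ID and fields are made up for illustration. The assumption is the usual update behaviour: without `doc_as_upsert` an update against a missing document is rejected, while with the flag the partial document is used as the new document.

```python
import json


def build_update_action(index_name, doc_id, partial_doc, doc_as_upsert=False):
    """Return the two NDJSON lines for one `update` action in a bulk body."""
    action = {"update": {"_index": index_name, "_id": doc_id}}
    # Only the fields to change are sent, wrapped in "doc".
    payload = {"doc": partial_doc}
    if doc_as_upsert:
        # If the document does not exist yet, use the partial doc as the new document.
        payload["doc_as_upsert"] = True
    return json.dumps(action) + "\n" + json.dumps(payload) + "\n"


if __name__ == "__main__":
    print(build_update_action("products", "1", {"price": 25}, doc_as_upsert=True), end="")
```

Compared with index, this merges the given fields into the existing document instead of replacing it, so fields you do not send are kept.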