Elasticsearch and CAP Theorem
Elasticsearch is a distributed system. As per the CAP theorem, it can satisfy any 2 out of 3 properties. Which one is compromised in Elasticsearch?
The answer is not so straightforward. It depends upon how the system is configured and how you want to use it. I will try to go into the details.
Paritioning in ElasticSearch
- Each index is partitioned in shards, meaning data in each shard is mutually exclusive to other shards. Each shard further has multiple Lucence indices, which are not in the scope of this answer.
- Each shard can have a replica running (most setups have) and in an event of a failure, the replica can be promoted to a master. Let's call a shard that has a primary working and is reachable from the ES node that our application server is hitting as an Active shard. Thus, a shard with no copies in primary and is not reachable is considered as failed shard. (Eg: An error saying "all shards failed" means no primaries in that index are available)
- ES has a feature to have multiple primaries (divergent shards). It is not a good situation as we lose both read/write consistencies.
In an event of a network partition, what will happen to:
-
Reads:
- By default reads will continue to happen on shards that are active. Thus, the data from failed shards will be excluded from our search queries. In this context, we consider the system to be AP. However, the situation is temporary and does not require manual effort to synchronize shard when the cluster is connected again.
- By setting a search option
allow_partial_search_results
[1] tofalse
, we can force the system to error when some of the shards have failed, guaranteeing consistent results. In this context, we consider the system to be CP. - In case no primaries are reachable from the node(s) that our application server is connecting to, the system will completely fail. Even if we say that our partition tolerance has failed, we also see that availability has taken a hit. This situation can be called be just C or CP
- There can be cases where the team has to anyhow bring up the shards and their out of sync replica(s) were reachable. So they decide to make it a primary (manually). Note that there can be some un-synced data resulting in divergent shards. This results in the AP situation. Consistency will be hard to restore when the situation normalizes (sync shards manually)
-
Writes
- Only if all shards fail, writes will stop working. But even if one shard is active writes will work and are consistent (by default). This will be CP
- However, we can set option
index-wait-for-active-shards
[2] asall
to ensure writes only happen when all shards in the index are active. I only see a little advantage of the flag, which would be to keep all shards balanced at any cost. This will be still CP (but lesser availability than the previous case) - Like in the last read network partition case, if we make un-synced replicas as primary (manually) there can be some data loss and divergent shards. The situation will be AP here and consistency will be hard to restore when the situation normalizes (sync shards manually)
Based on the above, you can make a more informed decision and tweak ElasticSearch according to your requirements.
References:
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-wait-for-active-shards
I strongly disagree with Harshit, Elasticsearch compromises on availability as he also mentioned few requests are returned error due to unavailability of shards.
ES guarantees consistency - as data read/write are always consistent. guarantees ES gaurantees Partition tolerance - if any node which was partitioned, joined back to the cluster after some time, it is able to recover the missed data to the current state.
Moreover, there is no distributed system that gives up on Partition Tolerance, cause without a guaranty of PT distributed system can't exist.