ElasticSearch: Unassigned Shards, how to fix?
I have an ES cluster with 4 nodes:
number_of_replicas: 1
search01 - master: false, data: false
search02 - master: true, data: true
search03 - master: false, data: true
search04 - master: false, data: true
I had to restart search03, and when it came back up it rejoined the cluster without a problem, but left 7 unassigned shards lying about.
{
"cluster_name" : "tweedle",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 4,
"number_of_data_nodes" : 3,
"active_primary_shards" : 15,
"active_shards" : 23,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 7
}
Now my cluster is in yellow state. What is the best way to resolve this issue?
- Delete (cancel) the shards?
- Move the shards to another node?
- Allocate the shards to the node?
- Update 'number_of_replicas' to 2?
- Something else entirely?
Interestingly, when a new index was added, that node started working on it and played nicely with the rest of the cluster; it just left the old unassigned shards lying about.
Follow on question: am I doing something wrong to cause this to happen in the first place? I don't have much confidence in a cluster that behaves this way when a node is restarted.
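To see exactly which shards are stuck, the _cat API lists them individually:
# List all shards and filter for the unassigned ones
curl -s 'localhost:9200/_cat/shards' | grep UNASSIGNED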
NOTE: If you're running a single-node cluster for some reason, replicas can never be assigned (Elasticsearch never allocates a replica on the same node as its primary), so you might simply need to set number_of_replicas to 0:
curl -XPUT 'localhost:9200/_settings' -d '
{
"index" : {
"number_of_replicas" : 0
}
}'
Solution 1:
By default, Elasticsearch reassigns shards to nodes dynamically. However, if you've disabled shard allocation (perhaps you did a rolling restart and forgot to re-enable it), you can turn allocation back on:
# v0.90.x and earlier
curl -XPUT 'localhost:9200/_settings' -d '{
"index.routing.allocation.disable_allocation": false
}'
# v1.0+
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
"transient" : {
"cluster.routing.allocation.enable" : "all"
}
}'
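To confirm allocation is enabled again, you can read the setting back:
# Read back the transient/persistent cluster settings
curl 'localhost:9200/_cluster/settings?pretty'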
Elasticsearch will then reassign shards as normal. This can be slow; consider raising indices.recovery.max_bytes_per_sec and cluster.routing.allocation.node_concurrent_recoveries to speed it up.
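For example, via the cluster settings API (the values here are illustrative, not recommendations; tune them to your hardware):
# Raise the per-node recovery throttle and the number of concurrent
# recoveries per node (illustrative values)
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
    "transient" : {
        "indices.recovery.max_bytes_per_sec" : "100mb",
        "cluster.routing.allocation.node_concurrent_recoveries" : 4
    }
}'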
If you're still seeing issues, something else is probably wrong, so look in your Elasticsearch logs for errors. If you see EsRejectedExecutionException, your thread pools may be too small.
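You can see per-pool rejection counts with the _cat API:
# Show active/queue/rejected counts for the main thread pools on each node
curl 'localhost:9200/_cat/thread_pool?v'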
Finally, you can explicitly reassign a shard to a node with the reroute API.
# Suppose shard 4 of index "my-index" is unassigned and you want to
# assign it to node search03. Beware: with allow_primary, if no intact
# copy of the shard's data exists, the shard is brought up empty (data loss).
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
    "commands": [{
        "allocate": {
            "index": "my-index",
            "shard": 4,
            "node": "search03",
            "allow_primary": true
        }
    }]
}'
Solution 2:
OK, I've solved this with some help from ES support. Issue the following command to the API on all nodes (or the nodes you believe to be the cause of the problem):
curl -XPUT 'localhost:9200/<index>/_settings' \
-d '{"index.routing.allocation.disable_allocation": false}'
where <index> is the index you believe to be the culprit. If you have no idea, just run this against all indices:
curl -XPUT 'localhost:9200/_settings' \
-d '{"index.routing.allocation.disable_allocation": false}'
I also added this setting to my yaml config, and since then any restarts of the server/service have been problem-free: the shards re-allocated immediately.
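A sketch of that yaml line, assuming it mirrors the API setting above (verify against your elasticsearch.yml):
# elasticsearch.yml -- assumed equivalent of the API setting above
index.routing.allocation.disable_allocation: false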
FWIW, to answer an oft-asked question: set MAX_HEAP_SIZE to 30G unless your machine has less than 60G of RAM, in which case set it to half the available memory. (Staying below ~32G keeps the JVM using compressed object pointers.)
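How you set that depends on your init scripts; with the stock pre-5.x scripts the variable is ES_HEAP_SIZE (your packaging may call it MAX_HEAP_SIZE, as above):
# Assumes the ES_HEAP_SIZE convention from the stock startup scripts
export ES_HEAP_SIZE=30g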
References
- Shard Allocation Awareness
Solution 3:
This little bash script will brute-force reassign the unassigned shards; you may lose data.
NODE="YOUR NODE NAME"
IFS=$'\n'
for line in $(curl -s 'localhost:9200/_cat/shards' | fgrep UNASSIGNED); do
INDEX=$(echo $line | (awk '{print $1}'))
SHARD=$(echo $line | (awk '{print $2}'))
curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
"commands": [
{
"allocate": {
"index": "'$INDEX'",
"shard": '$SHARD',
"node": "'$NODE'",
"allow_primary": true
}
}
]
}'
done
Solution 4:
I also encountered a similar error. In my case it happened because one of my data nodes was full, which caused shard allocation to fail. If you have unassigned shards and your cluster is RED (and a few indices are also RED), follow the steps below; they worked like a champ for me.
In the Kibana Dev Tools console:
GET _cluster/allocation/explain
If there are unassigned shards, this returns the details of why they are unassigned; otherwise it throws an error.
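The curl equivalent, if you're not using Kibana:
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'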
Once the underlying cause is fixed (in my case, freeing up disk space), simply running the command below will solve everything:
POST _cluster/reroute?retry_failed
This tells the master to retry allocations that were blocked after too many failed attempts (index.allocation.max_retries, 5 by default).
Thanks to: https://github.com/elastic/elasticsearch/issues/23199#issuecomment-280272888
Solution 5:
The only thing that worked for me was changing number_of_replicas: lowering it deletes the stuck replicas, and raising it back forces Elasticsearch to allocate fresh copies. (I had 2 replicas, so I changed it to 1 and then back to 2.)
First:
PUT /myindex/_settings
{
"index" : {
"number_of_replicas" : 1
}
}
Then:
PUT /myindex/_settings
{
"index" : {
"number_of_replicas" : 2
}
}
(I already answered it in this question.)