EsRejectedExecutionException in elasticsearch for parallel search

Solution 1:

Elasticsearch has a thread pool and a queue for search per node. A thread pool will have N number of workers ready to handle the requests. When a request comes and if a worker is free , this is handled by the worker. Now by default the number of workers is equal to the number of cores on that CPU. When the workers are full and there are more search requests, the request will go to queue. The size of queue is also limited. If by default size is, say, 100 and if there happens more parallel requests than this, then those requests would be rejected as you can see in the error log.

Solutions:

  1. The immediate solution for this would be to increase the size of the search queue. We can also increase the size of threadpool, but then that might badly affect the performance of individual queries. So, increasing the queue might be a good idea. But then remember that this queue is memory residential and increasing the queue size too much can result in Out Of Memory issues. (more info)

  2. Increase number of nodes and replicas - Remember each node has its own search threadpool/queue. Also, search can happen on primary shard OR replica.

Solution 2:

Maybe it sounds strange, but you need to lower the parallel searches count. With that exception, Elasticsearch tells you that you are overloading it. There are some limits (at thread count level) that are set in Elasticsearch and, most of the times, the defaults for these limits are the best option. So, if you are testing your cluster to see how much load it can hold, this would be an indicator that some limits have been reached.

Alternatively, if you really want to change the default you can try increasing the queue size for searches to accommodate the concurrency demands, but keep in mind that the larger the queue size, the more pressure you put on your cluster that, in the end, will cause instability.

Solution 3:

I saw this same error because I was sending lots of indexing requests in to ES in parallel. Since I'm writing a data migration, it was easy enough to make them serial, and that resolved the issue.

Solution 4:

I don't know what was your node configuration but your queue size (1000) is already on a higher side. As others have explained already, your search requests are queued in the Elasticsearch thread pool queue. Even after such a high queue size, if you are getting rejections, that gives some hint that you need to revisit your query pattern.

Like many other designs, even in this case, there is no one-size-fits-all solution. I found this is a very good post about how this queue works and different ways to do a performance test to find out what suits best for your use case.

HTH!