Recommendations for Elasticsearch hardware [closed]
Are there any good guides to the hardware needed to support Elasticsearch? Are recommendations for Lucene or Solr a good place to start? We're looking at rolling out a deployment starting with:
- 27 million documents, 8TB of data
- add 300k documents per day
We would then scale that up about 10x, to:
- 270 million documents, 80TB of data
- add 3 million documents/day
This is an unusual use case: queries would be in the thousands per day, but response times need to stay low enough for a good experience in an Ajax-driven web app.
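To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the average document size and daily raw ingest they imply. It assumes decimal terabytes and that new documents match the current average size; neither is stated above, and the function names are just illustrative. The result is raw data volume, not index size on disk.

```python
TB = 10**12  # assumption: decimal terabytes (the figures above do not say TB vs TiB)
GB = 10**9

def avg_doc_bytes(total_bytes, doc_count):
    """Average raw size of one document."""
    return total_bytes / doc_count

def daily_ingest_bytes(total_bytes, doc_count, docs_per_day):
    """Raw bytes added per day, assuming new docs match the current average size."""
    return avg_doc_bytes(total_bytes, doc_count) * docs_per_day

initial = daily_ingest_bytes(8 * TB, 27_000_000, 300_000)
scaled = daily_ingest_bytes(80 * TB, 270_000_000, 3_000_000)

print(f"avg doc size:   {avg_doc_bytes(8 * TB, 27_000_000) / 1000:.0f} KB")
print(f"initial ingest: {initial / GB:.1f} GB/day")
print(f"scaled ingest:  {scaled / GB:.1f} GB/day")
```

So each document averages roughly 300 KB, which is large for a search document and is worth keeping in mind when reading sizing advice based on smaller documents.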
Solution 1:
There are a lot of factors that can come into play, so I don't think there are many general guidelines.
You should conduct a smaller-scale evaluation, perhaps with 1/5th of the initial data set, to see how things behave when you throw your expected indexing and search load at the setup. This will also show you how much space your data actually consumes in the search engine. For Elasticsearch, that depends on whether you store the source JSON, how fields are analyzed, and whether they are stored.
EC2 can be a reasonable way to evaluate Elasticsearch without a large hardware expenditure.
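Once you have measured the index size of the trial data set, you can extrapolate to the full set. A minimal sketch, assuming index size grows linearly with document count (it may not, depending on your mappings and how segments merge); the function name and the replica parameter are illustrative, not part of any Elasticsearch API:

```python
def extrapolate_index_bytes(sample_docs, sample_index_bytes, target_docs, replicas=1):
    """Scale a measured sample index size up to target_docs.

    Assumes linear growth per document. replicas is the number of
    replica copies kept in addition to the primary.
    """
    if sample_docs <= 0:
        raise ValueError("sample_docs must be positive")
    bytes_per_doc = sample_index_bytes / sample_docs
    primary_bytes = bytes_per_doc * target_docs
    return primary_bytes * (1 + replicas)

# Usage: plug in the size you actually measure on the trial cluster.
# measured = ...  # bytes on disk for the 1/5th sample (5_400_000 docs)
# full = extrapolate_index_bytes(5_400_000, measured, 27_000_000, replicas=1)
```

Rerun the estimate whenever you change what is stored or analyzed, since those settings drive the per-document cost.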
For cluster-based software like Elasticsearch, there are tradeoffs between keeping the cluster smaller vs. larger. A larger cluster is nice because when you lose a server, less data needs to be reallocated. A smaller cluster consumes less energy and is easier to maintain.
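To make the reallocation point concrete, here is a rough sketch of how much data has to move when one node drops out. It assumes data is spread evenly across nodes, which real shard allocation only approximates, and it uses the asker's 8TB figure as a stand-in for index size purely for illustration:

```python
TB = 10**12  # decimal terabytes, an assumption

def bytes_to_reallocate(total_bytes, node_count):
    """Data held by one node, i.e. what must be re-created elsewhere if it fails.

    Assumes an even spread of data across nodes.
    """
    if node_count < 2:
        raise ValueError("need at least 2 nodes to reallocate anything")
    return total_bytes / node_count

for nodes in (4, 8, 16):
    moved = bytes_to_reallocate(8 * TB, nodes)
    print(f"{nodes:2d} nodes: {moved / TB:.2f} TB to reallocate after losing one")
```

Doubling the node count halves the data that has to be copied after a failure, at the cost of more machines to power and maintain.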
We run a cluster with 35 million documents and a total index size of around 300GB x 2, since all the indexes are replicated. To support this and a very large number of searches, we have 4 nodes, each with 24 cores, 48GB of RAM, and 1TB of storage on 10K RPM disks in RAID 10. We recently increased disk size to ensure we had more headroom.
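For comparison with your own plans, here is what those figures work out to per node. This is only arithmetic on the numbers above, assuming an even spread across the 4 nodes and decimal gigabytes; the helper name is illustrative:

```python
GB = 10**9  # decimal gigabytes, an assumption

def per_node_summary(total_index_bytes, node_count, ram_bytes, disk_bytes):
    """Per-node data, disk utilization, and RAM-to-data ratio (even spread assumed)."""
    data_per_node = total_index_bytes / node_count
    return {
        "data_per_node_gb": data_per_node / GB,
        "disk_used_fraction": data_per_node / disk_bytes,
        "ram_to_data_ratio": ram_bytes / data_per_node,
    }

summary = per_node_summary(
    total_index_bytes=300 * GB * 2,  # 300GB of primaries plus one replica copy
    node_count=4,
    ram_bytes=48 * GB,
    disk_bytes=1000 * GB,
)
print(summary)
```

That is about 150GB of index per node, using roughly 15% of each node's disk, with RAM equal to about a third of the local data. Not all of that RAM is available for caching, since the JVM heap takes a share.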
For your case, I'd recommend more RAM and more disk. You can probably save money on CPUs with that search volume.
Low search volume can actually hurt performance, since caches (both those internal to the software and the OS disk cache) won't be well warmed.
Hope this helps, Paul