How well does elasticsearch compress data?
I am looking to scope the servers required for an elasticsearch proof-of-concept.
Ultimately, my question is this:
Given 1GB of JSON text indexed by elasticsearch, how much disk space can I expect it to occupy?
Obviously there are many variables, but I'm going for orders of magnitude. 100MB? 100GB?
I understand that elasticsearch performs compression (http://www.elasticsearch.org/guide/reference/index-modules/store/), but I don't know what kind of footprint the indexes and other structures occupy.
Anecdotal answers are acceptable, but please also let me know what version you're using.
The answer is: it depends.
Adrien Grand, who works on Elasticsearch, did some benchmarking of this from the Lucene standpoint in a blog post. It looks like he was getting about a 2x improvement.
He also mentions LUCENE-4226, where some of this underlying compression work was done. There are a few benchmarks listed within, comparing the various compression algorithms tested.
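To get an order-of-magnitude figure for your own documents rather than relying on general benchmarks, you could index a representative sample and compare the on-disk store size against the size of the raw JSON. A minimal sketch in Python (the index name "poc", the sample file name, and the localhost:9200 endpoint are placeholders, and the exact layout of the index stats response varies by Elasticsearch version, so treat this as a starting point rather than a recipe):

```python
# Rough ratio check: compare the raw JSON size you indexed to the on-disk
# store size reported by the index stats API.
import json
import os
import urllib.request

INDEX = "poc"                # hypothetical index holding the sample documents
SAMPLE_FILE = "sample.json"  # hypothetical file containing the raw JSON you indexed

raw_bytes = os.path.getsize(SAMPLE_FILE)

with urllib.request.urlopen(f"http://localhost:9200/{INDEX}/_stats") as resp:
    stats = json.load(resp)

# Use the primaries-only figure so replica copies don't double-count the footprint.
store_bytes = stats["indices"][INDEX]["primaries"]["store"]["size_in_bytes"]

print(f"raw JSON: {raw_bytes / 1024**2:.1f} MB")
print(f"on disk:  {store_bytes / 1024**2:.1f} MB")
print(f"ratio:    {store_bytes / raw_bytes:.2f}x")
```

Keep in mind the ratio you measure will swing quite a bit depending on things like whether you keep _source, whether _all is enabled, and how many fields you actually index.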
As well, based on the Elasticsearch 0.19.5 release announcement, it appears that store-level compression uses LZF by default, with Snappy coming at some point in the future. A bit of further looking around shows that experimental Snappy support appeared in 0.19.9.
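If you want to experiment with the store-level compression the question links to, on the 0.19.x line it was switched on per index. Here is a sketch of creating an index with it enabled, assuming the index.store.compress.* settings from that era's store module docs (the index name "poc" is a placeholder, and later releases compress stored fields by default and dropped these switches, so verify the setting names against the docs for your version):

```python
# Create an index with store-level compression of stored fields and term
# vectors enabled, per the 0.19.x-era store module settings.
import json
import urllib.request

settings = {
    "settings": {
        "index.store.compress.stored": True,  # compress stored fields (including _source)
        "index.store.compress.tv": True,      # compress term vectors, if you store them
    }
}

req = urllib.request.Request(
    "http://localhost:9200/poc",              # placeholder index name
    data=json.dumps(settings).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="PUT",
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())
```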