Compressing floating point data
Are there any lossless compression methods that can be applied to floating point time-series data, and will significantly outperform, say, writing the data as binary into a file and running it through gzip?
Reduction of precision might be acceptable, but it must happen in a controlled way (i.e. I must be able to set a bound on how many digits must be kept).
I am working with some large data files which are series of `double`s describing a function of time (i.e. the values are correlated). I don't generally need the full `double` precision, but I might need more than `float`.
Since there are specialized lossless compression methods for images and audio, I was wondering whether anything specialized exists for this situation.
Clarification: I am looking for existing practical tools rather than a paper describing how to implement something like this. Something comparable to gzip in speed would be excellent.
Here are some ideas if you want to create your own simple algorithm:
- XOR the current value with the previous value to obtain a set of bits describing the difference (see the sketch after this list).
- Divide this difference into two parts: the "mantissa bits" and the "exponent bits".
- Use variable-length encoding (a different number of bits/bytes per value), or any compression method you choose, to store these differences. You might keep separate streams for mantissas and exponents, since the mantissa has more bits to compress.
- This may not work well if you are alternating between two different time-value stream sources, so you may have to compress each source into a separate stream or block.
- To lose precision, you can drop the least significant bits or bytes from the mantissa, while leaving the exponent intact.
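To make these ideas concrete, here is a minimal C sketch of the XOR-delta and mantissa-truncation steps. The function names `xor_delta_encode` and `truncate_mantissa` are my own, and the one-byte length prefix is just one possible variable-length scheme:

```c
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers (memcpy avoids strict-aliasing problems). */
static uint64_t bits_of(double d)     { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double   double_of(uint64_t u) { double d;   memcpy(&d, &u, sizeof d); return d; }

/* Optional lossy step: clear the k least significant mantissa bits,
 * leaving sign and exponent intact. For an IEEE-754 double
 * (52 mantissa bits) this bounds the relative error at about
 * 2^(k-52). */
static double truncate_mantissa(double d, int k)
{
    return double_of(bits_of(d) & ~((UINT64_C(1) << k) - 1));
}

/* XOR-delta encode: each value is XORed with its predecessor and
 * stored as a 1-byte length prefix (number of significant bytes)
 * followed by that many payload bytes, low byte first. Correlated
 * neighbours share sign, exponent, and high mantissa bits, so the
 * XOR has many leading zero bytes that are simply not stored.
 * `out` must have room for up to 9 * n bytes in the worst case. */
static size_t xor_delta_encode(const double *in, size_t n, uint8_t *out)
{
    uint64_t prev = 0;
    size_t pos = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t cur  = bits_of(in[i]);
        uint64_t diff = cur ^ prev;
        prev = cur;
        int nbytes = 8;
        while (nbytes > 0 && (diff >> (8 * (nbytes - 1))) == 0)
            nbytes--;                         /* drop leading zero bytes */
        out[pos++] = (uint8_t)nbytes;         /* length prefix */
        for (int b = 0; b < nbytes; b++)      /* payload, low byte first */
            out[pos++] = (uint8_t)(diff >> (8 * b));
    }
    return pos;                               /* total bytes written */
}
```

Decoding reverses the process: read the length byte, reassemble the difference, and XOR it with the previously decoded value. The encoded stream may still shrink noticeably under gzip afterwards, since the length prefixes themselves tend to be highly repetitive.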
You might want to have a look at these resources:
- Lossless Compression of Predicted Floating-Point Values
- Papers by Martin Burtscher: The FPC Double-Precision Floating-Point Compression Algorithm and its Implementation, Fast Lossless Compression of Scientific Floating-Point Data, and High Throughput Compression of Double-Precision Floating-Point Data
You might also want to try LogLuv-compressed TIFF for this, though I haven't used it myself.