On a modern system, will using disk compression give me better overall performance?
Solution 1:
Yes, disk compression can provide better performance under particular circumstances:
- Your application is disk-throughput bound: modern CPUs and (de)compression algorithms can run at much higher bandwidth than modern disks on long transfers. Any reduction at all in the amount of data moving to or from the disk platters is a win in this circumstance.
- It takes less time to (de)compress the data going to the disk platters than the transfer time it saves, and you have CPU cycles to spare.
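To make that tradeoff concrete, here's a minimal back-of-the-envelope model. Every bandwidth figure below is an assumption chosen for illustration, not a measurement of real hardware: compression wins whenever the compressor keeps up with the disk and the ratio exceeds 1.

```python
# Back-of-the-envelope model: time to write `size_bytes` with and without
# filesystem compression. All bandwidth numbers are illustrative assumptions.

def write_time(size_bytes, disk_bw, compress_bw=None, ratio=1.0):
    """Seconds to get `size_bytes` of logical data onto the platters.

    disk_bw     -- sustained disk write bandwidth, bytes/s
    compress_bw -- CPU compression throughput, bytes/s (None = no compression)
    ratio       -- compression ratio (original size / compressed size)
    """
    if compress_bw is None:
        return size_bytes / disk_bw
    # Assume compression and disk I/O are pipelined, so the slower stage dominates.
    return max(size_bytes / compress_bw, (size_bytes / ratio) / disk_bw)

size = 10 * 2**30          # 10 GiB transfer
disk = 150 * 2**20         # ~150 MiB/s spinning disk (assumed)
cpu  = 500 * 2**20         # ~500 MiB/s fast compressor (assumed)

plain      = write_time(size, disk)
compressed = write_time(size, disk, compress_bw=cpu, ratio=2.0)
print(f"uncompressed: {plain:.1f}s, compressed: {compressed:.1f}s")
```

With these assumed numbers the compressed write takes roughly half the time; if the disk were faster than the compressor, the model flips the other way, which is exactly the "CPU cycles to spare" condition above.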
There's a reason both ZFS and Btrfs, two relatively recent green-field designs, include provisions for compression.
In the HPC space, when an application is checkpointing from memory to disk, the CPUs are frequently not doing anything useful at all. This time is essentially pure overhead. Any use of the CPUs to reduce this time is a win.
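A sketch of that checkpointing idea, using Python's stdlib zlib to squeeze an in-memory state buffer before it hits disk (the checkpoint contents here are made up; level 1 trades ratio for speed, which suits otherwise-idle-but-cheap CPU time):

```python
import io
import zlib

def write_checkpoint(state: bytes, fileobj, level: int = 1) -> int:
    """Compress `state` (level 1 = fastest) and write it; return bytes written."""
    compressed = zlib.compress(state, level)
    fileobj.write(compressed)
    return len(compressed)

# Simulated checkpoint: large, repetitive in-memory state compresses well.
state = b"\x00" * (4 * 2**20) + bytes(range(256)) * 4096
buf = io.BytesIO()                  # stands in for a real checkpoint file
written = write_checkpoint(state, buf)
print(f"{len(state)} bytes of state -> {written} bytes on disk")
```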
Solution 2:
Disk compression will never give you better performance.
It may give you almost no penalty due to fast modern CPUs, but that's an entirely different thing.
You assume that having less data to transfer to or from the disk can improve performance; but bulk data transfers are almost never the I/O bottleneck: the real bottlenecks are seek time and rotational latency. Modern hard disks are really fast at sustained transfers of big files; what slows them down is lots of little transfers scattered all over the disk.
Some scenarios:
- Media files. These are usually already compressed on their own (JPEG, MPEG, MP3), so compressing them at the filesystem level is not going to help at all; it will instead make things worse, because CPU resources are already needed to encode and decode them.
- Databases. These are usually read and written in small random bursts, so compressing them not only has no benefit at all, it actively degrades performance, because the DBMS can no longer map the records it needs directly to fixed physical locations on disk.
- Pagefile. This is usually quite large, but the OS needs to address very small chunks of data in it, and it needs to do so very precisely ("read 4 KB at physical address X"); compressing it is usually not possible, but even if it were, it would be a complete waste of time and resources: it would provide almost zero compression, due to the essentially random nature of the data in this file.
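The compressibility point is easy to check: zlib shrinks repetitive text dramatically, but on random bytes (a stand-in for already-compressed or high-entropy data) it produces output at least as large as the input. The sample data below is purely illustrative.

```python
import random
import zlib

random.seed(0)  # reproducible "incompressible" sample

text  = b"the quick brown fox jumps over the lazy dog\n" * 1000
noise = bytes(random.getrandbits(8) for _ in range(len(text)))

for name, data in (("text", text), ("random", noise)):
    out = zlib.compress(data, 6)
    print(f"{name}: {len(data)} -> {len(out)} bytes")
```

The repetitive text collapses to a tiny fraction of its size, while the random buffer stays essentially the same size (plus a few bytes of container overhead), which is why compressing media files or a pagefile buys nothing.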