How does NTFS compression affect performance?
I've heard that NTFS compression can reduce performance due to extra CPU usage, but I've read reports that it may actually increase performance because of reduced disk reads. How exactly does NTFS compression affect system performance?
Notes:
- I'm running a laptop with a 5400 RPM hard drive, and many of the things I do on it are I/O bound.
- The processor is an AMD Phenom II with four cores running at 2.0 GHz.
- The system is defragmented regularly using UltraDefrag.
- The workload is mixed read-write, with reads occurring somewhat more often than writes.
- The files to be compressed include a selected subset of personal documents (not the full home folder) and programs, including several (less demanding) games and Visual Studio (which tends to be I/O bound more often than not).
Solution 1:
I've heard that NTFS compression can reduce performance due to extra CPU usage, but I've read reports that it may actually increase performance because of reduced disk reads.
Correct. Suppose your CPU, using some compression algorithm, can compress at C MB/s and decompress at D MB/s, and your hard drive has write speed W and read speed R. So long as C > W, you get a performance gain when writing, and so long as D > R, you get a performance gain when reading. This is a strong assumption in the write case, since the Lempel-Ziv algorithm (as implemented in software) has a variable, data-dependent compression rate (although it can be constrained with a limited dictionary size).
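To make the inequalities concrete, here is a minimal sketch of the model in Python. It assumes the CPU and disk work as a pipeline and adds a compression ratio, which is not part of the original reasoning; the function names and the numbers are made up for illustration, not measurements.

```python
def effective_throughput(cpu_rate, disk_rate, ratio):
    """Estimated logical MB/s with compression enabled.

    cpu_rate:  C (writes) or D (reads), in MB/s of uncompressed data
    disk_rate: W (writes) or R (reads), in MB/s of raw disk transfer
    ratio:     uncompressed size / compressed size (1.0 = incompressible)

    Assumption: CPU and disk overlap perfectly, so the slower stage wins.
    """
    return min(cpu_rate, disk_rate * ratio)


def compression_helps(cpu_rate, disk_rate, ratio):
    """True when the estimate beats the uncompressed disk rate."""
    return effective_throughput(cpu_rate, disk_rate, ratio) > disk_rate


# Hypothetical laptop: D = 300 MB/s, R = 60 MB/s, data shrinks by 1.5x.
print(effective_throughput(300.0, 60.0, 1.5))  # 90.0
print(compression_helps(300.0, 60.0, 1.5))     # True
print(compression_helps(40.0, 60.0, 2.0))      # False: the CPU is the bottleneck
```

Under this model the gain disappears as soon as the CPU rate drops below the disk rate, or when the data does not compress at all.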
How exactly does NTFS compression affect system performance?
Well, it's exactly by relying on the above inequalities. So long as your CPU can sustain a compression/decompression rate above your HDD's write/read speed, you should experience a speed gain. However, this does have an effect on large files, which may experience heavy fragmentation (due to the algorithm), or not be compressed at all.
This may be due to the fact that the Lempel-Ziv algorithm slows down as the compression moves on (since the dictionary continues to grow, requiring more comparisons as bits come in). Decompression is almost always the same rate, regardless of the file size, in the Lempel-Ziv algorithm (since the dictionary can just be addressed using a base + offset scheme).
Compression also impacts how files are laid out on the disk. By default, a single "compression unit" is 16 times the size of a cluster (so most 4 kB cluster NTFS filesystems will require 64 kB chunks to store files), but does not increase past 64 kB. This can affect fragmentation and on-disk space requirements.
As a final note, latency is another value worth discussing. While the time it takes to compress the data does introduce latency, when the CPU clock speed is in gigahertz (i.e. each clock cycle takes less than 1 ns), the latency introduced is negligible compared to hard drive seek times (which are on the order of milliseconds, or millions of clock cycles).
To actually see if you'll experience a speed gain, there are a few things you can try. The first is to benchmark your system with a Lempel-Ziv based compression/decompression algorithm. If you get good results (i.e. C > W and D > R), then you should try enabling compression on your disk.
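A rough way to get ballpark values for C and D is to time an LZ-family compressor on a sample of your own data. The sketch below uses Python's `zlib` (DEFLATE, i.e. LZ77 plus Huffman coding) as a stand-in; it is not the algorithm NTFS uses, so treat the numbers as an approximation only. The `benchmark` helper and the synthetic sample are assumptions for illustration; substitute bytes read from your real files.

```python
import time
import zlib


def benchmark(data, level=6, repeats=5):
    """Return (compress MB/s, decompress MB/s, ratio) for an in-memory buffer."""
    size_mb = len(data) / 1_000_000

    start = time.perf_counter()
    for _ in range(repeats):
        packed = zlib.compress(data, level)
    compress_time = (time.perf_counter() - start) / repeats

    start = time.perf_counter()
    for _ in range(repeats):
        unpacked = zlib.decompress(packed)
    decompress_time = (time.perf_counter() - start) / repeats

    # Sanity check: the round trip must reproduce the input exactly.
    assert unpacked == data
    return size_mb / compress_time, size_mb / decompress_time, len(data) / len(packed)


# Synthetic, highly repetitive sample (an assumption; real files compress less).
sample = b"INFO request served from cache in 12 ms\n" * 25_000

c_rate, d_rate, ratio = benchmark(sample)
print(f"C = {c_rate:.0f} MB/s, D = {d_rate:.0f} MB/s, ratio = {ratio:.1f}")
```

Compare the printed C and D against W and R as reported by a disk benchmark on the same machine.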
From there, you might want to do more benchmarks on actual hard drive performance. A truly important benchmark (in your case) would be to see how fast your games load, and see how fast your Visual Studio projects compile.
TL;DR: Compression might be viable for a filesystem utilizing many small files requiring high throughput and low latency. Large files are (and should be) unaffected due to performance and latency concerns.
Solution 2:
I explained it here in the Wikipedia entry for NTFS:
NTFS can compress files using the LZNT1 algorithm (a variant of LZ77 [23]). Files are compressed in 16-cluster chunks. With 4 kB clusters, files are compressed in 64 kB chunks. If the compression reduces 64 kB of data to 60 kB or less, NTFS treats the unneeded 4 kB pages like empty sparse file clusters: they are not written. This allows not unreasonable random-access times. However, large compressible files become highly fragmented, as every 64 kB chunk then becomes a smaller fragment. [24][25] Compression is not recommended by Microsoft for files exceeding 30 MB because of the performance hit.[citation needed]
The best use of compression is for files that are repetitive, written seldom, usually accessed sequentially, and not themselves compressed. Log files are an ideal example. Compressing files that are less than 4 kB or already compressed (like .zip or .jpg or .avi) may make them bigger as well as slower.[citation needed] Users should avoid compressing executables like .exe and .dll (they may be paged in and out in 4 kB pages). Compressing system files used at bootup like drivers, NTLDR, winload.exe, or BOOTMGR may prevent the system from booting correctly.[26]
Although read–write access to compressed files is often, but not always [27] transparent, Microsoft recommends avoiding compression on server systems and/or network shares holding roaming profiles because it puts a considerable load on the processor.[28]
Single-user systems with limited hard disk space can benefit from NTFS compression for small files, from 4 kB to 64 kB or more, depending on compressibility. Files less than 900 bytes or so are stored with the directory entry in the MFT.[29]
The slowest link in a computer is not the CPU but the speed of the hard drive, so NTFS compression allows the limited, slow storage space to be better used, in terms of both space and (often) speed.[30] (This assumes that compressed file fragments are stored consecutively.)
I recommend compression only for files which compress to 64 kB or less (i.e. one piece). Otherwise, your file will consist of many fragments of 64 kB or less.
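The chunk arithmetic quoted above can be sketched in a few lines of Python. This assumes 4 kB clusters and the 16-cluster chunk size; the function names are made up for illustration and do not correspond to any Windows API.

```python
import math

CLUSTER_SIZE = 4 * 1024                        # bytes, the common 4 kB cluster
CLUSTERS_PER_UNIT = 16                         # one chunk = 16 clusters
UNIT_SIZE = CLUSTER_SIZE * CLUSTERS_PER_UNIT   # 64 kB


def clusters_stored(compressed_bytes):
    """Clusters written for one 64 kB chunk that compresses to compressed_bytes.

    If the data does not shrink by at least one whole cluster,
    all 16 clusters are written and nothing is saved.
    """
    needed = math.ceil(compressed_bytes / CLUSTER_SIZE)
    return min(needed, CLUSTERS_PER_UNIT)


def units_in_file(file_bytes):
    """Number of 64 kB chunks (potential fragments) in a file."""
    return math.ceil(file_bytes / UNIT_SIZE)


print(clusters_stored(60 * 1024))        # 15: one cluster saved
print(clusters_stored(61 * 1024))        # 16: no saving, stored as-is
print(units_in_file(30 * 1024 * 1024))   # 480 chunks in a 30 MB file
```

A 30 MB compressible file can therefore end up as several hundred separate pieces, which is where the fragmentation concern comes from.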
MyDefrag does a better job of defragging.