NTFS compression on SSD - ups and downs

This topic discusses NTFS compression on HDDs as a method of improving disk access performance, and concludes that it is poor at that more often than not. But I have always viewed compression as a way to conserve space, and I have learned that it is effective at that. And now I have an SSD, where space is expensive and the performance penalty of, e.g., reading/writing 2 clusters instead of 1 is much lower.

On the other hand, since SSDs are much faster than HDDs, I would expect that higher throughput will result in higher CPU usage. Can this become an issue? Any other thoughts on the matter?

I like the space-saving effect; it's not huge, but it's there. If performance is a concern, though, I would rather turn it off.



Solution 1:

Microsoft wrote this a while ago in a blog:

NTFS compresses files by dividing the data stream into CU’s (this is similar to how sparse files work). When the stream contents are created or changed, each CU in the data stream is compressed individually. If the compression results in a reduction by one or more clusters, the compressed unit will be written to disk in its compressed format. Then a sparse VCN range is tacked to the end of the compressed VCN range for alignment purposes (as shown in the example below). If the data does not compress enough to reduce the size by one cluster, then the entire CU is written to disk in its uncompressed form.

This design makes random access very fast since only one CU needs to be decompressed in order to access any single VCN in the file. Unfortunately, large sequential access will be relatively slower since decompression of many CU’s is required to do sequential operations (such as backups).
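To make the rule quoted above concrete, here is a minimal sketch of the per-compression-unit decision, assuming a 4 KiB cluster and a 64 KiB compression unit; zlib merely stands in for NTFS's actual LZNT1 codec, so the ratios are only illustrative:

    import os
    import zlib

    CLUSTER = 4 * 1024      # cluster size required for NTFS compression (4 KiB)
    CU = 16 * CLUSTER       # one compression unit = 16 clusters = 64 KiB

    def clusters(n_bytes):
        """Whole clusters needed to store n_bytes (ceiling division)."""
        return -(-n_bytes // CLUSTER)

    def cu_write_plan(data):
        """Per 64 KiB compression unit, decide whether it would be stored
        compressed (saves at least one cluster) or raw, per the rule above."""
        plan = []
        for off in range(0, len(data), CU):
            chunk = data[off:off + CU]
            packed = zlib.compress(chunk)    # zlib stands in for LZNT1 here
            saved = clusters(len(chunk)) - clusters(len(packed))
            plan.append(("compressed", clusters(len(packed)) * CLUSTER)
                        if saved >= 1 else ("raw", len(chunk)))
        return plan

    # Highly compressible data shrinks per CU; random data is kept raw.
    print(cu_write_plan(b"A" * (2 * CU)))
    print(cu_write_plan(os.urandom(2 * CU)))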

And in a KB article, Microsoft writes this:

While NTFS file system compression can save disk space, compressing data can adversely affect performance. NTFS compression has the following performance characteristics. When you copy or move a compressed NTFS file to a different folder, NTFS decompresses the file, copies or moves the file to the new location, and then recompresses the file. This behavior occurs even when the file is copied or moved between folders on the same computer. Compressed files are also expanded before copying over the network, so NTFS compression does not save network bandwidth.

Because NTFS compression is processor-intensive, the performance cost is more noticeable on servers, which are frequently processor-bound. Heavily loaded servers with a lot of write traffic are poor candidates for data compression. However, you may not experience significant performance degradation with read-only, read-mostly, or lightly loaded servers.

If you run a program that uses transaction logging and that constantly writes to a database or log, configure the program to store its files on a volume that is not compressed. If a program modifies data through mapped sections in a compressed file, the program can produce "dirty" pages faster than the mapped writer can write them. Programs such as Microsoft Message Queuing (also known as MSMQ) do not work with NTFS compression because of this issue.

Because user home folders and roaming profiles use lots of read and write operations, Microsoft recommends that you put user home folders and roaming profiles on a volume that does not have NTFS compression on the parent folder or on the volume root.


Summary:

Only compress small files that never change (reads only, no writes), because reads are fast, but every write requires decompression and recompression, which costs CPU power; the storage type matters less than the access pattern.
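One practical way to apply that summary is to compress just a read-mostly folder with the compact.exe tool that ships with Windows; a rough sketch (the folder path is a hypothetical example):

    import subprocess

    # Hypothetical folder containing small files that are read often
    # but (almost) never rewritten.
    target = r"C:\Tools\docs"

    # compact.exe ships with Windows: /C = compress, /S:<dir> = include
    # the directory and its subfolders.
    result = subprocess.run(
        ["compact.exe", "/C", "/S:" + target],
        capture_output=True, text=True, check=False)
    print(result.stdout)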

Solution 2:

Since Claudio covers a lot of things in detail, I am going to summarize his opinion, which is also mine; I have seen the same effects after trying what he describes.

On an SSD, NTFS compression must not be used.

Here are some reasons for that claim:

Reason 1: It will wear out the SSD much faster, since it causes two writes; NTFS compression always writes the uncompressed data first, then compresses it in RAM and rewrites the compressed data only if that saves at least 4 KiB.

Reason 2: Using a 4 KiB NTFS cluster on an SSD loses about 50% of the SSD's speed; check any benchmark and you will see that 128 KiB blocks make an SSD roughly twice as fast as 4 KiB blocks, and NTFS compression can only be used on NTFS partitions with a 4 KiB cluster (see the cluster-size check sketched after this list).

Reason 3: There are containers (like PISMO File Mount) that provide on-the-fly compression and/or encryption; such containers do the compression in RAM and do not send uncompressed data to disk only to rewrite it in compressed form. On top of that, PISMO gets a better compression ratio than NTFS.

There are more reasons, but those are the most important ones.
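As mentioned in Reason 2, whether compression is even available depends on the cluster size. Here is a small sketch that reads it with the built-in fsutil tool; the output parsing assumes an English-language Windows and may need an elevated prompt, and it treats 4 KiB or smaller as the size at which compression is available:

    import subprocess

    # Read the cluster size of a volume with the built-in fsutil tool.
    out = subprocess.run(
        ["fsutil", "fsinfo", "ntfsinfo", "C:"],
        capture_output=True, text=True, check=True).stdout

    for line in out.splitlines():
        # The exact label/format can differ between Windows versions and
        # locales; on English systems it is "Bytes Per Cluster".
        if "Bytes Per Cluster" in line:
            cluster = int(line.split(":")[1].strip().replace(",", ""))
            print("cluster size:", cluster, "bytes")
            print("NTFS compression available:", cluster <= 4096)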

The other point is speed. All compression is done on the CPU, so if you do not have a very fast CPU (NTFS uses a single thread for this, while some containers use multiple threads), you will see very slow reads/writes on compressed data. Worse, you can have a very fast CPU, but if it is busy with other things (like rendering, transcoding, etc.), there is no CPU left for compression, so again you will get poor performance.
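A quick way to see whether your "free CPU" can keep up is to time single-threaded compression of a test buffer. This is only a crude sketch: zlib is not the codec NTFS uses (that is LZNT1), so treat the result as an order-of-magnitude hint, not a measurement of NTFS itself:

    import os
    import time
    import zlib

    # Half random, half repetitive data, roughly 16 MiB in total.
    sample = os.urandom(8 * 2**20) + (b"some text " * (8 * 2**20 // 10))

    t0 = time.perf_counter()
    zlib.compress(sample, level=1)
    elapsed = time.perf_counter() - t0

    print(f"one-core compression throughput: {len(sample) / elapsed / 2**20:.0f} MiB/s")
    # Compare this with your SSD's sequential write speed: if the SSD is
    # faster, a busy or slow CPU becomes the bottleneck for compressed writes.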

NTFS compression is only good for traditional slow disks when the CPU is mostly idle, but it requires defragmentation after each write (at the file level), because each 64 KiB block (compressed or not) is written at a position that is a multiple of 64 KiB; the only way to pack those fragments is to defragment the file after compressing it (or after writing it into a compressed folder).

P.S.: Beware that we are talking about Windows on real hardware, not inside virtual machines; what matters is what writes to the physical medium, and other setups may have cache layers that mitigate the effects and even improve things a lot.

Solution 3:

No one talks about the major problem on non-SSDs: fragmentation.

Each 64 KiB block is written where it would be without compression, but if it can be compressed (so it is at most 60 KiB), it occupies less than 64 KiB; the next block still goes where it would have gone if the previous one had not been compressed, so a lot of gaps appear.

Test it with a multi-gigabyte virtual machine image of any Windows system (they tend to shrink to about 50%, but end up with a huge number of fragments, more than 10,000).
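A back-of-envelope sketch of why that happens, using assumed, illustrative numbers (a 20 GiB image compressing to roughly half its size) and the alignment behaviour described above:

    # Every 64 KiB compression unit starts at a 64 KiB-aligned position,
    # so whatever a unit saves becomes a gap before the next unit.
    CLUSTER = 4 * 1024
    CU = 64 * 1024

    file_size = 20 * 2**30          # e.g. a 20 GiB VM disk image (illustrative)
    avg_ratio = 0.5                  # assume it compresses to ~50%, as above

    cus = file_size // CU
    clusters_per_cu = round(CU * avg_ratio / CLUSTER)   # ~8 of 16 clusters used

    print(f"each CU uses ~{clusters_per_cu} of 16 clusters; "
          f"{cus} CUs -> on the order of {cus} gaps in the file layout")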

And for SSDs there is something that is not documented: how on earth does it write? If it writes the data uncompressed and then overwrites it with the compressed version (for each 64 KiB block), SSD life is cut a lot; but if it writes directly in compressed form, SSD life could be longer or shorter... longer if you write that 64 KiB in one go, much shorter if you write that 64 KiB in 4 KiB pieces, because then that 64 KiB (in compressed form) may be rewritten as many as 64/4 = 16 times.

The performance penalty appears when the CPU time needed to compress/decompress is bigger than the time saved by not having to write some 4 KiB blocks... so with a very fast CPU and a very slow disk, compression reduces read and write times, but if the SSD is very fast and the CPU is relatively slow, writes will be much slower.
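That trade-off can be written down as a tiny break-even model; all throughput numbers below are illustrative assumptions, not measurements:

    # Writing a 64 KiB unit compressed only wins when the CPU time to
    # compress it is smaller than the disk time saved by writing fewer bytes.
    CU = 64 * 1024

    def compressed_write_wins(cpu_mib_s, disk_mib_s, ratio):
        """ratio = compressed size / original size (e.g. 0.5)."""
        cpu_time = CU / (cpu_mib_s * 2**20)                  # compress one CU
        saved_disk_time = CU * (1 - ratio) / (disk_mib_s * 2**20)
        return cpu_time < saved_disk_time

    # Slow HDD (120 MiB/s) vs. fast NVMe SSD (3000 MiB/s), one mostly idle
    # core able to compress ~300 MiB/s, data compressing to 50%:
    print(compressed_write_wins(300, 120, 0.5))    # True  -> helps on the HDD
    print(compressed_write_wins(300, 3000, 0.5))   # False -> hurts on the SSD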

When I talk about a fast or slow CPU, I mean at that moment: the CPU can be in use by other processes, so always think about free CPU, not the CPU specs on paper. The same goes for the disk/SSD; it can be in use by multiple processes.

Say you have 7-Zip writing a huge file from another disk with LZMA2; it will use a lot of CPU, so if at the same time you are copying an NTFS-compressed file, there is no free CPU and the copy will go slower than without NTFS compression. But as soon as 7-Zip stops using the CPU, the CPU can compress faster, and then NTFS compression can do things faster.

Personally, I never use NTFS compression; I prefer PISMO File Mount PFO containers (with compression; they also allow encryption, both on the fly and transparent to apps). They give a much better compression ratio and less CPU impact, and reads and writes are handled on the fly: there is no need to decompress before use, just mount the container and use it in read/write mode.

Since PISMO does the compression in RAM before writing to disk, it can make the SSD last longer; my tests of NTFS compression make me think it sends data to disk twice, first uncompressed, and then, if it can compress it, overwrites it in compressed form.

Why is NTFS-compressed write speed on my SSD about half of the uncompressed speed, with files that compress to about half their size or less? My AMD Threadripper 2950 (32 cores and 64 threads) with 128 GiB of RAM (a fast CPU, a very fast CPU) sits at less than 1% utilization, so there is plenty of CPU available to compress faster than the SSD's maximum sequential speed. Maybe it is because NTFS compression starts only after the 64 KiB blocks have been sent to disk uncompressed, and then overwrites them with the compressed version. If I do the same in a virtual machine running on a Linux host with a Windows guest, the Linux cache shows me that those clusters are indeed written twice, and the speed is much, much faster: Linux caches the uncompressed NTFS writes sent by the Windows guest, and since they get overwritten with compressed data right afterwards, Linux never sends the uncompressed data to the disk. That is the Linux write cache at work!

My recommendation: do not use NTFS compression, except inside virtual machine guests running Windows when the host is Linux, and never if you use the CPU a lot or if your CPU is not fast enough.

Modern SSDs have a large internal RAM cache, so the write-plus-overwrite caused by NTFS compression can be mitigated by the SSD's internal cache system.

My tests were done on SSDs without an internal RAM cache; when I repeat them on ones with a RAM cache, the write speed is faster, but not as much as one would think.

Do your own tests, and use huge file sizes (bigger than the total RAM installed, to avoid results hidden by caching).

By the way, something some people do not know about NTFS compression: any file of 4 KiB or less will never get NTFS-compressed, because there is no way to reduce its size by at least 4 KiB.

NTFS compression takes blocks of 64 KiB, compresses them, and if the result is at least one cluster (4 KiB) smaller, the block is written compressed; 64 KiB is 16 consecutive 4 KiB clusters.

If an 8 KiB file compresses to more than 4 KiB, no cluster can be saved, so it is written uncompressed... and so on: compression must save at least 4 KiB.

Ah, and for NTFS compression, the NTFS volume must use a 4 KiB cluster size.

Try a test: use a 128 KiB cluster on NTFS on an SSD and you will see a huge improvement in write and read speeds.

Filesystems on an SSD with a 4 KiB cluster lose a lot of their speed, in most cases more than 50%... see any benchmark out there that tests different block sizes, from 512 bytes up to 2 MiB: most SSDs write at double the speed with a 64 KiB (or 128 KiB) cluster size compared to 4 KiB.

Want a real improvement on your SSD? Do not use a 4 KiB cluster on the filesystem; use 128 KiB.

Only use a 4 KiB cluster if more than 99% of your files are smaller than 128 KiB.

Etc, etc, etc... test, test and test your own case.

Note: Create the system NTFS partition with a 128 KiB cluster using diskpart in console mode while installing Windows (or from another Windows installation), but do not let Windows format it in the graphical part of the installer (it will always format it as 4 KiB-cluster NTFS).
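If you go that route from an existing Windows installation, here is a hedged sketch using a diskpart script file (diskpart /s runs a script); the drive letter D: is a placeholder, this wipes the volume, it needs an elevated prompt, and NTFS cluster sizes above 64 KiB require a recent Windows 10/11 build:

    import os
    import subprocess
    import tempfile

    # Format volume D: as NTFS with a 128 KiB cluster. WARNING: destructive.
    script = """\
    select volume D
    format fs=ntfs unit=128k quick
    """

    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
        f.write(script)
        path = f.name

    subprocess.run(["diskpart", "/s", path], check=True)
    os.remove(path)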

All my Windows installations are now on 128 KiB-cluster NTFS partitions on >400 GiB (SLC) SSDs.

I hope this makes things clearer. M$ does not say how it writes NTFS-compressed data; my tests tell me it writes twice (64 KiB uncompressed, then <=60 KiB compressed), not just once (beware of that on an SSD).

Beware: Windows tries to NTFS-compress some internal directories, no matter whether you say no to NTFS compression; the only way to really avoid that is to use an NTFS cluster size other than 4 KiB, since NTFS compression only works on NTFS partitions with a 4 KiB cluster size.

Solution 4:

I see the comments by others, and I think people often forget the most useful scenario where NTFS file/folder compression has a great advantage on an SSD: modern development tools. My university-licensed Matlab installation folder (read-only for an ordinary user) contains the following amounts of data:

Data: 28.5 GB
Size on disk: 30.6 GB
Contains: 729,246 files and 15,000 folders (!!!)

This is on my laptop with a 500 GB SSD, where the Windows partition is 200 GB.

I know Matlab is a bit extreme in this regard, but many devtools have similar properties: a ton of small, highly compressible text files (headers, code, XML files). I am compressing Matlab right now, before I install the Intel Quartus FPGA devtools, and Octave is already compressed as follows:

Data: 1.55 GB
Size on disk: 839 MB
Contains: 34,362 files and 1,955 folders

This stuff is written once, and read zillions of times during project builds. It makes perfect sense to expend some CPU power to decompress it and save perhaps half of your precious SSD space.
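If you want to check how much a compressed toolchain folder actually saves, the Win32 GetCompressedFileSizeW call reports a file's on-disk (compressed) size; here is a sketch using ctypes. The Octave path is just a placeholder, error handling is omitted, and the result only approximates Explorer's "Size on disk" figure:

    import ctypes
    import os
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCompressedFileSizeW.argtypes = [wintypes.LPCWSTR,
                                                ctypes.POINTER(wintypes.DWORD)]
    kernel32.GetCompressedFileSizeW.restype = wintypes.DWORD

    def size_on_disk(path):
        """Bytes of disk storage actually used by a (possibly compressed) file."""
        high = wintypes.DWORD(0)
        low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
        return (high.value << 32) + low

    def tree_sizes(root):
        logical = allocated = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                logical += os.path.getsize(full)
                allocated += size_on_disk(full)
        return logical, allocated

    # Placeholder toolchain folder; prints figures comparable to the
    # "Data" vs. "Size on disk" numbers shown above.
    data, on_disk = tree_sizes(r"C:\Program Files\Octave")
    print(f"Data: {data / 2**30:.2f} GiB, Size on disk: {on_disk / 2**30:.2f} GiB")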