Avoiding extreme fragmentation of compressed system images on NTFS

Solution 1:

Avoiding fragmentation

The secret is to not write uncompressed files on the disk to begin with.

Indeed, after you compress an already existing large file, it will become horrendously fragmented because of the way NTFS in-place compression works.

Instead, you can avoid this drawback altogether by making the OS compress the file's contents on the fly, before writing them to disk. This way compressed files are written to disk like any normal files, without unintentional gaps. For this purpose you need to create a compressed folder. (The same way you mark files to be compressed, you can mark folders to be compressed.) Afterwards, all files written to that folder will be compressed on the fly, i.e. written as streams of compressed blocks. Files compressed this way can still end up somewhat fragmented, but it will be a far cry from the mess that in-place NTFS compression creates.
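As a minimal sketch of this approach (Windows only, using the built-in compact.exe tool; the folder and image paths are hypothetical examples, not anything from the original setup):

    import os
    import shutil
    import subprocess

    folder = r"D:\Backups"           # destination folder (hypothetical)
    image = r"C:\Temp\system.vhd"    # existing image to store (hypothetical)

    os.makedirs(folder, exist_ok=True)

    # Set the NTFS compression attribute on the folder; files added afterwards
    # are compressed as they are written.
    subprocess.run(["compact", "/c", folder], check=True)

    # Copying the image in now writes a compressed stream, instead of
    # compressing the file in place later, which is what causes the
    # heavy fragmentation.
    shutil.copy2(image, folder)

The key point is the order of operations: the folder gets the compression attribute first, so the data never hits the disk uncompressed.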

Example

A 232 MB system image, NTFS-compressed down to 125 MB:

  • In-place compression created a whopping 2680 fragments!
  • On-the-fly compression created 19 fragments.

Defragmentation

It's true that NTFS-compressed files can pose a problem for some defragmentation tools. For example, a tool I normally use can't handle them efficiently - it slows down to a crawl. Fret not: the trusty old Contig from Sysinternals defragments NTFS-compressed files quickly and effortlessly!
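For instance, a small wrapper along these lines can report the fragment count and then defragment the image (assuming contig.exe is on the PATH and its EULA has been accepted; the file path is a hypothetical example):

    import subprocess

    image = r"D:\Backups\system.vhd"   # hypothetical compressed image

    # -a only analyzes the file and reports its number of fragments.
    subprocess.run(["contig", "-a", image], check=True)

    # Running contig on the file without -a defragments it in place.
    subprocess.run(["contig", image], check=True)
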

Solution 2:

Reading the article on Wikipedia about NTFS compression:

Files are compressed in 16-cluster chunks. With 4 kB clusters, files are compressed in 64 kB chunks. If the compression reduces 64 kB of data to 60 kB or less, NTFS treats the unneeded 4 kB pages like empty sparse file clusters—they are not written.

This allows for reasonable random-access times - the OS just has to follow the chain of fragments.

However, large compressible files become highly fragmented, since every compressed chunk smaller than 64 kB becomes a separate fragment.
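For a rough sense of scale, here is a back-of-the-envelope calculation (assuming 4 kB clusters and the 232 MB image from Solution 1): every 64 kB chunk that shrinks can turn into its own fragment, which puts the worst case in the thousands, in line with the 2680 fragments observed above.

    # Back-of-the-envelope estimate of how many fragments in-place NTFS
    # compression can produce. Figures are illustrative, not measurements.
    CLUSTER = 4 * 1024            # 4 kB cluster
    CHUNK = 16 * CLUSTER          # NTFS compression unit: 16 clusters = 64 kB

    file_size = 232 * 1024 * 1024     # the 232 MB image from Solution 1

    chunks = file_size // CHUNK
    # Worst case: every chunk that shrinks leaves a hole after it, so each
    # compressed chunk can end up as its own fragment.
    print(f"{chunks} compression chunks -> up to {chunks} fragments")

The observed count (2680) is below this ceiling (about 3712) because not every chunk compresses enough to leave a hole.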

First things first: WBAdmin is essentially a backup utility that can restore a full system, so its output file is expected to be large (> 4 GB). As the quote shows, large files fragment rapidly. This is due to the way NTFS compresses: not whole files, but 64 kB chunks of them.

A good analogy: picture a cake split across several boxes. That is the initial file. Compression squeezes each piece of cake, leaving empty space in its box. Because the pieces are no longer packed together, separated by the gaps the compression left behind, the file they make up becomes fragmented.

I am still skeptical about NTFS giving that kind of compression ratio. According to a test made by MaximumCompression on multiple files, NTFS gets the lowest score in compression ratio: a measly 40%. From personal experience I can tell you it's much lower than that - in fact so low that I never bothered to use it, nor have I seen its effects.

The best way to avoid fragmentation is to stop relying on NTFS compression. Most defraggers simply fail to expand or move the compressed files. Even if they could, NTFS might not be able to expand the files afterwards; and if it could, the defragmentation process would have filled the leftover space from compression (the spare 4 kB pages), so the expansion would fragment the file again, because the data would no longer fit in the previously contiguous clusters.

That being said, if you don't need to read the file constantly, use one of the formats from the MaximumCompression comparison mentioned above. 7z and RAR are quite efficient (i.e. they compress with high ratios in a decent amount of time). If you care about space and not about time, choose a PAQ-type algorithm (although you will spend a very long time compressing and decompressing the files). There are also speedy algorithms available if speed is your priority.
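As a sketch, assuming the 7-Zip command-line tool (7z) is installed and on the PATH, and with hypothetical paths, archiving the image looks like this:

    import subprocess

    image = r"C:\Temp\system.vhd"       # hypothetical source image
    archive = r"D:\Backups\system.7z"   # hypothetical destination archive

    # "a" adds files to an archive; -mx=7 picks a high (not maximum) LZMA level.
    subprocess.run(["7z", "a", "-mx=7", archive, image], check=True)

    # Later, "x" extracts with full paths into the given output directory:
    # subprocess.run(["7z", "x", archive, r"-oC:\Restore"], check=True)
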

If you do need to read the file constantly, don't compress it at all. NTFS is just too damn messy.