Configuring NTFS file system for performance
We have an application that will store around 1.1 TB of XML files averaging 8.5 KB in size.
These represent a rolling 18 months of data, with around 200,000 new files being created every day.
Each file will be written only once, and then has a 3% chance of being read a small number (<10) of times over the following 18 months.
What NTFS options are open to us that will help with performance?
Current ones on our list are:
- Disabling 8.3 name creation (see the sketch after this list)
- Limiting the number of files in a directory (number still under debate...)
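For the first item, a minimal sketch of what we have in mind, assuming an elevated command prompt (note that it only affects files created after the change):

    rem Disable 8.3 short-name generation for newly created files.
    fsutil behavior set disable8dot3 1

    rem Confirm the setting took effect.
    fsutil behavior query disable8dot3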
Edit
Regarding fragmentation: We are planning to use a 2 KB cluster size for disk space efficiency. Each file will be written only once (i.e. no file edits). Files will be deleted after 18 months on a day-by-day basis.
Therefore we don't believe that fragmentation will be a significant issue.
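For reference, this is roughly how we plan to verify the cluster size on the data volume (the drive letter D: is just a placeholder); the "Bytes Per Cluster" line in the output is the allocation unit size:

    rem Dump NTFS volume details, including Bytes Per Cluster.
    fsutil fsinfo ntfsinfo D: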
I would also add:
- Turn off disk defragmentation.
- Change the cluster size to 16 KB so each file is written into a single cluster.
Rationale for this:
You want to write about 1.7 GB of data a day (200,000 files x 8.5 KB) in 200,000 files. Assuming these files are written evenly over a 24-hour day, that works out to roughly 2-3 files a second (200,000 / 86,400 s ≈ 2.3). This does not seem to be a significant problem for a single SATA disk, so my guess is that you have other problems as well as disk performance
(e.g. do you have enough memory, or are you paging to disk as well?).
However
By default, Windows attempts to defragment NTFS file systems in the background. Disk defragmentation will kill performance whilst you are defragmenting the disk. Since performance already seems to be an issue, this will only make matters worse for you.
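On Windows Vista / Server 2008 and later the background defragmentation runs as a scheduled task, so a sketch of switching it off could look like this (the task path below is the usual default; verify it on your build):

    rem Disable the built-in background defragmentation task.
    schtasks /Change /TN "\Microsoft\Windows\Defrag\ScheduledDefrag" /Disable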
There is a balance between using small cluster sizes for space efficiency and IO performance when writing files. Files and the file allocation metadata (the MFT on NTFS) will not be in the same place on the disk, so having to allocate clusters as you write files will cause the disk head to constantly move around. Using a cluster size large enough to store 95% of your files in one cluster each will improve your IO write performance.
As other people have pointed out, using a tiny cluster size of 2 KB will cause fragmentation over time. Think of it like this: during the first 18 months you will be writing files onto a clean, empty disk, but the OS doesn't know that once a file is closed no more data will be added to it, so it leaves some clusters available at the end of each file in case that file is extended later. Long before you fill the disk, you will find that the only free space is in gaps between other files. Not only that, but when it selects a gap for your file, the OS does not know whether you are writing a 5-cluster file or a 2-cluster file, so it can't make a good choice about where to save your file.
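The cluster size can only be chosen when the volume is formatted, so a sketch of setting it to 16 KB, assuming a dedicated and empty data volume E: (formatting destroys any existing data), would be:

    rem Format the data volume with a 16 KB allocation unit (cluster) size.
    format E: /FS:NTFS /A:16K /Q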
At the end of the day, engineering is about handling conflicting needs and choosing the lowest-cost solution that balances them. My guess is that buying a larger hard drive is probably cheaper than buying faster hard drives.
Disable last access time stamp and reserve space for the MFT.
- NTFS Performance Hacks
- Disable the NTFS Last Access Time Stamp
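Both settings can be changed with fsutil; a minimal sketch (the mftzone value of 2 is only an example, higher values reserve more space for the MFT, and a reboot may be required before the changes take effect):

    rem Stop NTFS updating the last access timestamp on every read.
    fsutil behavior set disablelastaccess 1

    rem Reserve a larger zone for the MFT (valid values are 1-4).
    fsutil behavior set mftzone 2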