How can I visualize hard disk space with millions of files?

We have a hard disk that is 600 GB and nearly full. It has been filled with 18,501,765 files (mostly small 19 KB images) and 7,142,132 folders. It's very difficult to find out where exactly all the space has gone. Our regular cleanup procedures are not clearing up enough space, which means we need to look at this drive as a whole and determine what is out there and what can be moved or removed. We've tried several applications, and so far they have either blown up or simply run for an extraordinary amount of time without completing.

Server Information

  • Operating System: Windows Server 2003
  • File System: NTFS

Solution

Space ObServer was able to read through 18,501,765 files and 7,142,132 folders while using hardly any memory. I'm sure this is mostly because it uses a SQL backend to store all of the data. Unfortunately, it is the most expensive of all the products at $259.95 per server.
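For anyone who wants a scriptable fallback, here is a minimal sketch of the same idea (assuming Python 3 is available on a machine that can see the drive): walk the tree and stream per-directory totals into SQLite instead of holding everything in RAM. The database path and table name are made up for illustration.

    import os
    import sqlite3
    import sys

    def index_tree(root, db_path="usage.db"):
        """Walk `root` and record per-directory byte totals in SQLite,
        so memory use stays flat no matter how many files exist."""
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS dir_usage "
                    "(path TEXT PRIMARY KEY, bytes INTEGER, files INTEGER)")
        batch = []
        for dirpath, dirnames, filenames in os.walk(root, onerror=lambda e: None):
            total = count = 0
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                    count += 1
                except OSError:
                    continue  # file vanished or access denied; skip it
            batch.append((dirpath, total, count))
            if len(batch) >= 10_000:  # flush to disk instead of accumulating in RAM
                con.executemany("INSERT OR REPLACE INTO dir_usage VALUES (?, ?, ?)", batch)
                con.commit()
                batch.clear()
        con.executemany("INSERT OR REPLACE INTO dir_usage VALUES (?, ?, ?)", batch)
        con.commit()
        # Show the 20 directories whose direct files use the most space.
        for path, size, files in con.execute(
                "SELECT path, bytes, files FROM dir_usage ORDER BY bytes DESC LIMIT 20"):
            print(f"{size / 2**30:8.2f} GB  {files:>9} files  {path}")
        con.close()

    if __name__ == "__main__":
        index_tree(sys.argv[1] if len(sys.argv) > 1 else ".")

Because the totals are flushed to disk in batches, peak memory stays roughly flat no matter how many files there are, which is essentially what a SQL backend buys you.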

Attempted Solutions

During my research I tried several different solutions, both paid and free. Below is the list of products I tried, for everyone's information.

Free Software

  • JDiskReport - Stops at 10 million
  • WinDirStat - Stops at 10 million
  • SpaceMonger - Skipped because it reportedly stores scan data in RAM
  • SpaceSniffer - Crashed at an unknown file count

Paid Software

  • FolderSizes - Stops at 12 million (3 million folders)
  • TreeSize Professional - Skipped because it reportedly stores scan data in RAM

Updates

Update #1: The server I am attempting to analyze has 2 GB of RAM, and most products that I try seem to keep the file/folder information in memory. That memory runs out far too quickly with 18,501,765 files and 7,142,132 folders.
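A rough back-of-envelope calculation shows why 2 GB isn't enough for an in-memory scan; the bytes-per-entry figures below are assumptions, not measurements of any particular tool:

    files, folders = 18_501_765, 7_142_132
    for bytes_per_entry in (100, 200, 300):
        gb = (files + folders) * bytes_per_entry / 2**30
        print(f"{bytes_per_entry} B/entry -> ~{gb:.1f} GB")
    # Even at 100 B/entry this is ~2.4 GB, already past the 2 GB
    # user address space of a default 32-bit Windows process.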

Update #2: Looks like the developers of WinDirStat got involved enough to tell us that it can compile under 64-bit. That gives it more memory to work with, but I'm not sure it will be enough unless they can persist the data to disk.


Assuming your OS is Windows...

Either way you slice it, tabulating millions of files is always going to take a long time and will be restricted by the I/O of the disk itself. I recommend TreeSize Professional. Or maybe SpaceObServer. You could give the freeware version of TreeSize a try as well.


Definitely try WinDirStat: it gives a fantastic visualization of disk use by depicting each file as a rectangle drawn to scale, color coded by file type. Click on any item in the visualization and you'll see it in the directory tree.

The standard 32-bit build is limited to 10 million files and 2 GB RAM usage, but the source code will build successfully as a 64-bit application. The fact that the server in question has only 2GB of RAM may be problematic in this specific case, but most servers with such large numbers of files will have much more RAM.

Edit #1: Unfortunately, when tested on a 4 TB volume containing millions of files, WinDirStat Portable crashed after indexing about 6.5 million files. That means it may not work for the original poster, whose drive holds far more than 6 million files.

Edit #2: The full version of WinDirStat also crashed at 10 million files with 1.9 GB of RAM in use.

Edit #3: I got in touch with the WinDirStat developers: (1) they agree that the crash was caused by the memory limitations of the x86 architecture, and (2) they mentioned that it compiles as a 64-bit application without errors. More soon.

Edit #4: The test of a 64-bit build of WinDirStat was successful. In 44 minutes, it indexed 11.4 million files and consumed 2.7 GB of RAM.


I regularly use FolderSizes on several 1 TB drives holding several million files, with no problems.


+1 for the TreeSize products, but...

Your sentence about "not clearing up enough space" makes me wonder: could you have run out of NTFS MFT reserved space? When the filesystem grabs more MFT space than is initially allocated, that space is not returned to the regular file space, and it is not shown in defrag operations. (A quick way to check is sketched after the KB excerpt below.)

http://support.microsoft.com/kb/174619

"Volumes with a small number of relatively large files exhaust the unreserved space first, while volumes with a large number of relatively small files exhaust the MFT zone space first. In either case, fragmentation of the MFT starts to take place when one region or the other becomes full. If the unreserved space becomes full, space for user files and directories starts to be allocated from the MFT zone competing with the MFT for allocation. If the MFT zone becomes full, space for new MFT entries is allocated from the remainder of the disk, again competing with other files. "