How to delete a huge number of files on Windows

Solution 1:

Technical Explanation

The reason that most methods are causing problems is that Windows tries to enumerate the files and folders. This isn’t much of a problem with a few hundred—or even thousand—files/folders a few levels deep, but when you have trillions of files in millions of folders going dozens of levels deep, then that will definitely bog the system down.

Let’s say you have “only” 100,000,000 files, and Windows uses a simple structure like this to store each file along with its path (that way you avoid storing each directory separately, thus saving some overhead):

struct FILELIST {                 // Total size is 264 to 528 bytes:
  TCHAR       name[MAX_PATH];     // MAX_PATH = 260; TCHAR = 1 or 2 bytes
  FILELIST*   nextfile;           // Pointers are 4 bytes on 32-bit, 8 on 64-bit
};

Depending on whether it uses 8-bit characters or Unicode characters (it uses Unicode) and whether your system is 32-bit or 64-bit, it will need between roughly 25GB and 49GB of memory to hold the list: 100,000,000 × 264 bytes ≈ 24.6 GiB at the low end, and 100,000,000 × 528 bytes ≈ 49.2 GiB at the high end (and this is a very simplified structure).

Why Windows enumerates the files and folders before deleting them varies with the method you use, but both Explorer and the command interpreter do it (you can see a delay when you initiate the command). You can also see the disk activity (HDD LED) flash as the directory tree is read from the drive.

Solution

Your best bet to deal with this sort of situation is to use a delete tool that deletes the files and folders individually, one at a time. I don’t know if there are any ready-made tools to do it, but it should be possible to accomplish with a simple batch-file.

@echo off
if not [%1]==[] pushd %1
del /q *
for /d %%i in (*) do (
  call "%~f0" "%%i"
  rd "%%i"
)
if not [%1]==[] popd

What this does is check whether an argument was passed. If so, it changes to the specified directory with pushd (you can run it without an argument to start in the current directory, or pass a directory, even one on a different drive, to have it start there).

Next, it deletes all files in the current directory. In this mode it does not keep a list of anything; it simply deletes each file as it is found, without using much, if any, memory.

Then it enumerates the folders in the current directory and calls itself, passing each folder to it(self) to recurse downward. When each recursive call returns, popd restores the working directory (so the parent’s loop keeps reading the right folder) and rd removes the sub-directory, which is empty by that point.
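
For example, assuming the script were saved as delfiles.bat (the name is hypothetical), you could launch it at the root of the tree, even on another drive:

rem Hypothetical path; run from any directory
delfiles.bat "D:\HugeTree"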

Analysis

This should work because it does not enumerate every single file and folder in the entire tree. It does not enumerate any files at all, and only enumerates the folders in the current directory (plus the remaining ones in the parent directories). Assuming there are only a few hundred sub-directories in any given folder, this should not be too bad, and it certainly requires much less memory than the methods that enumerate the entire tree.

You may wonder about using for’s /r switch instead of (manual) recursion. That would not work, because while the /r switch does recurse, it pre-enumerates the entire directory tree, which is exactly what we want to avoid; we want to delete as we go, without keeping track.
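
For reference, this is the sort of /r one-liner (with a hypothetical path, written for use inside a batch file) that this method deliberately avoids:

for /r "D:\HugeTree" %%i in (*) do del "%%i"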

Comparison

Let’s compare this method to the full-enumeration method(s).

You had said that you had “millions of directories”; let’s say 100 million. If the tree is approximately balanced, with an average of about 100 sub-directories per folder, then the deepest nesting would be only about four levels down; in fact, such a tree would hold 100 + 100^2 + 100^3 + 100^4 = 101,010,100 sub-folders in total. (Amusing how 100M can break down to just 100 and 4.)

Since we are not enumerating files, we only need to keep track of at most 100 directory names per level, for a maximum of 4 × 100 = 400 directories at any given time.

Therefore, taking the 528-byte worst case from above, the memory requirement is about 400 × 528 = 211,200 bytes ≈ 206.25KB, well within the limits of any modern (or otherwise) system.

Test

Unfortunately(?) I don’t have a system with trillions of files in millions of folders, so I am not able to test it (I believe at last count I had about 800K files); someone else will have to try it.

Caveat

Of course, memory isn’t the only limitation. The drive will be a big bottleneck too, because for every file and folder you delete, the system has to mark it as free. Thankfully, many of these disk operations will be bundled together (cached) and written out in chunks instead of individually (at least for hard drives; not for removable media), but it will still cause quite a bit of thrashing as the system reads and writes the data.

Solution 2:

I can't speak to trillions of files, but I recently nuked an old file share that contained ~1.8M files using:

robocopy EmptyTMPFolder FolderToDelete /MIR /MT:16 /ETA /R:30 /W:5

"EmptyTMPFolder " is an empty local directory. the /MIR option will make the target look like the source (empty).

The real benefit of this approach was the retry option (/R:30), which gave the operation a chance to absorb any connectivity issues that occurred along the way. Local deletes might not benefit much from this approach.

I don't have specific benchmarks to compare the times, but I would prefer this over some of the other options suggested because of the retry/wait options. The deletes began almost instantly.
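
If you want to try it, here is a minimal end-to-end sketch (all paths are hypothetical; note that /MIR empties the target but leaves the top-level folder itself in place, hence the final rd):

mkdir C:\EmptyTMPFolder
robocopy C:\EmptyTMPFolder \\server\share\FolderToDelete /MIR /R:30 /W:5
rd \\server\share\FolderToDelete
rd C:\EmptyTMPFolder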

Solution 3:

Deleting all the folders will take a long time, and there is not a whole lot you can do about it. What you can do is save your data and format the drive. It is not optimal, but it will work (and quickly).

Another option is to use a Linux distro on a live CD that can read from an NTFS partition. I know from personal experience that rm -rf folderName can run for at least 2 days without crashing a system with 2GB of RAM. It will take a while, but at least it will finish.

Solution 4:

Erm... I don't want to know how you created so many.

What's happening is that Explorer is trying to enumerate every single file and store the information in memory before it starts deleting. And there are obviously way too many.

Have you tried rmdir /s? As long as it actually deletes the files as they are found, rather than waiting for every single one to be enumerated, it may work.
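
For example (the path is hypothetical; /q suppresses the confirmation prompt):

rmdir /s /q "D:\HugeTree"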

How many levels of subdirectories are there? If there's only one, or some other low number, then a quick batch file that manually recurses through might work.

Any method will take a while, though.

Solution 5:

Shift+Delete skips the Recycle Bin, and it might speed things up significantly.

If that doesn't work (extreme cases), try Fast Folder Eraser and/or Mass Directory Eraser.