What's the best way to calculate the size of a directory in .NET?

I've written the following routine to manually traverse through a directory and calculate its size in C#/.NET:


protected static float CalculateFolderSize(string folder)
{
    float folderSize = 0.0f;
    try
    {
        //Checks if the path is valid or not
        if (!Directory.Exists(folder))
            return folderSize;
        else
        {
            try
            {
                foreach (string file in Directory.GetFiles(folder))
                {
                    if (File.Exists(file))
                    {
                        FileInfo finfo = new FileInfo(file);
                        folderSize += finfo.Length;
                    }
                }

                foreach (string dir in Directory.GetDirectories(folder))
                    folderSize += CalculateFolderSize(dir);
            }
            catch (NotSupportedException e)
            {
                Console.WriteLine("Unable to calculate folder size: {0}", e.Message);
            }
        }
    }
    catch (UnauthorizedAccessException e)
    {
        Console.WriteLine("Unable to calculate folder size: {0}", e.Message);
    }
    return folderSize;
}

I have an application which is running this routine repeatedly for a large number of folders. I'm wondering if there's a more efficient way to calculate the size of a folder with .NET? I didn't see anything specific in the framework. Should I be using P/Invoke and a Win32 API? What's the most efficient way of calculating the size of a folder in .NET?


Solution 1:

No, this looks like the recommended way to calculate directory size, the relevent method included below:

public static long DirSize(DirectoryInfo d) 
{    
    long size = 0;    
    // Add file sizes.
    FileInfo[] fis = d.GetFiles();
    foreach (FileInfo fi in fis) 
    {      
        size += fi.Length;    
    }
    // Add subdirectory sizes.
    DirectoryInfo[] dis = d.GetDirectories();
    foreach (DirectoryInfo di in dis) 
    {
        size += DirSize(di);   
    }
    return size;  
}

You would call with the root as:

Console.WriteLine("The size is {0} bytes.", DirSize(new DirectoryInfo(targetFolder));

...where targetFolder is the folder-size to calculate.

Solution 2:

DirectoryInfo dirInfo = new DirectoryInfo(@strDirPath);
long dirSize = await Task.Run(() => dirInfo.EnumerateFiles( "*", SearchOption.AllDirectories).Sum(file => file.Length));

Solution 3:

I do not believe there is a Win32 API to calculate the space consumed by a directory, although I stand to be corrected on this. If there were then I would assume Explorer would use it. If you get the Properties of a large directory in Explorer, the time it takes to give you the folder size is proportional to the number of files/sub-directories it contains.

Your routine seems fairly neat & simple. Bear in mind that you are calculating the sum of the file lengths, not the actual space consumed on the disk. Space consumed by wasted space at the end of clusters, file streams etc, are being ignored.

Solution 4:

public static long DirSize(DirectoryInfo dir)
{
    return dir.GetFiles().Sum(fi => fi.Length) +
           dir.GetDirectories().Sum(di => DirSize(di));
}

Solution 5:

The real question is, what do you intend to use the size for?

Your first problem is that there are at least four definitions for "file size":

  • The "end of file" offset, which is the number of bytes you have to skip to go from the beginning to the end of the file.
    In other words, it is the number of bytes logically in the file (from a usage perspective).

  • The "valid data length", which is equal to the offset of the first byte which is not actually stored.
    This is always less than or equal to the "end of file", and is a multiple of the cluster size.
    For example, a 1 GB file can have a valid data length of 1 MB. If you ask Windows to read the first 8 MB, it will read the first 1 MB and pretend the rest of the data was there, returning it as zeros.

  • The "allocated size" of a file. This is always greater than or equal to the "end of file".
    This is the number of clusters that the OS has allocated for the file, multiplied by the cluster size.
    Unlike the case where the "end of file" is greater than the "valid data length", The excess bytes are not considered to be part of the file's data, so the OS will not fill a buffer with zeros if you try to read in the allocated region beyond the end of the file.

  • The "compressed size" of a file, which is only valid for compressed (and sparse?) files.
    It is equal to the size of a cluster, multiplied by the number of clusters on the volume that are actually allocated to this file.
    For non-compressed and non-sparse files, there is no notion of "compressed size"; you would use the "allocated size" instead.

Your second problem is that a "file" like C:\Foo can actually have multiple streams of data.
This name just refers to the default stream. A file might have alternate streams, like C:\Foo:Bar, whose size is not even shown in Explorer!

Your third problem is that a "file" can have multiple names ("hard links").
For example, C:\Windows\notepad.exe and C:\Windows\System32\notepad.exe are two names for the same file. Any name can be used to open any stream of the file.

Your fourth problem is that a "file" (or directory) might in fact not even be a file (or directory):
It might be a soft link (a "symbolic link" or a "reparse point") to some other file (or directory).
That other file might not even be on the same drive. It might even point to something on the network, or it might even be recursive! Should the size be infinity if it's recursive?

Your fifth is that there are "filter" drivers that make certain files or directories look like actual files or directories, even though they aren't. For example, Microsoft's WIM image files (which are compressed) can be "mounted" on a folder using a tool called ImageX, and those do not look like reparse points or links. They look just like directories -- except that the're not actually directories, and the notion of "size" doesn't really make sense for them.

Your sixth problem is that every file requires metadata.
For example, having 10 names for the same file requires more metadata, which requires space. If the file names are short, having 10 names might be as cheap as having 1 name -- and if they're long, then having multiple names can use more disk space for the metadata. (Same story with multiple streams, etc.)
Do you count these, too?