How do I use a dump file to diagnose a memory leak?

I have a .NET service with a normal private working set of about 80 MB. During a recent load test, the process reached 3.5 GB of memory usage, leaving the whole machine low on physical memory (3.9 of 4 GB used), and the memory was not released long after the load test had stopped. Using Task Manager, I took a dump file of the process and opened it in Visual Studio 2010 SP1, and I am able to start debugging on it.

How do I diagnose the memory issue? I have dotTrace Memory 3.x at my disposal; does it support memory profiling on dump files? If not, will the memory profiling features of Visual Studio 2010 Premium help (I currently have Professional)? Can WinDbg help?

UPDATE: The new Visual Studio 2013 Ultimate can now natively diagnose memory issues using dump files. See this blog post for more details.


Solution 1:

Install WinDbg. Make sure you get the version (x86 or x64) that matches the bitness of your dump. Here is a direct link to the download for x86.

On that note, you need to ensure you took the correct dump. You can use Task Manager to create the dump file (right-click on the process -> Create Dump File). If you're on 64-bit Windows and your process is x86, use the 32-bit version of Task Manager (C:\Windows\SysWOW64\taskmgr.exe) to take the dump file. See my article for more info on taking dump files, e.g. if you're on XP and need to use WinDbg to create the dump file.

Warning: there's a fairly steep learning curve, and things might not work exactly as described here, so come back with any issues.

I'm assuming you're using .NET 4, given that you can open the dump in Visual Studio. Here's a very quick guide to help you work with your .dmp file:

1) Run WinDbg and set the symbol path (File -> Symbol Search Path) to

SRV*c:\symbols*http://msdl.microsoft.com/download/symbols

2) Choose File -> Open Crash Dump, or drag your .DMP file onto WinDbg.

3) Type this into the command window:

.loadby sos clr

(FYI, for .NET 2, the command should be .loadby sos mscorwks)

4) Then type this:

!dumpheap -stat

which lists each type of object on the managed heap along with its instance count and total size.

You will have to analyze this in the context of your application and see if anything appears unusual.
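On a large heap the statistics table can run to thousands of rows, so it can help to copy the output into a text file and rank it offline. The following is a minimal sketch, assuming the usual four-column layout (MT, Count, TotalSize, Class Name); the sample text and the `MyApp.Order` type are made up for illustration and do not come from a real dump.

```python
import re

# One statistics row: method table address, instance count,
# total size in bytes, type name.
ROW = re.compile(r"^\s*([0-9a-fA-F]{8,16})\s+(\d+)\s+(\d+)\s+(.+?)\s*$")


def parse_dumpheap_stat(text):
    """Return (type_name, count, total_size) tuples from !dumpheap -stat output."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:  # header and "Total ... objects" lines do not match
            _mt, count, size, name = match.groups()
            rows.append((name, int(count), int(size)))
    return rows


def top_by_size(rows, n=10):
    """Return the n types that occupy the most bytes, largest first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


# Hypothetical sample in the assumed column layout; not real debugger output.
SAMPLE = """\
Statistics:
      MT    Count    TotalSize Class Name
6f1a2b3c      120         2880 System.Object
6f1a4d5e    50000      1200000 MyApp.Order
6f1a6f70   300000     24000000 System.String
Total 350120 objects
"""

if __name__ == "__main__":
    for name, count, size in top_by_size(parse_dumpheap_stat(SAMPLE), n=3):
        print(f"{size:>12,} bytes  {count:>8,} instances  {name}")
```

Strings and byte arrays often dominate such a list; the application-specific types with unexpectedly high counts are usually the more useful lead.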

There is much more to WinDbg; Google is your friend.

Solution 2:

Generally if you have a leak in a managed application, it means that something is not getting collected. Common sources include

  • Event handlers: If the subscriber is not removed, the publisher will hold on to it.

  • Statics: Anything reachable from a static field stays alive for as long as the field references it.

  • Finalizers: A blocked finalizer will prevent the finalizer thread from running any other finalizers and thus prevent these instances from being collected.

  • Deadlocked threads: Similarly, a deadlocked thread will hold on to whatever roots it holds. Of course, if you have deadlocked threads, that will probably affect the application on several levels.
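The event handler case is the easiest one to overlook, so here is a small illustration of the mechanism. It is written in Python rather than C#, purely as a language-neutral sketch: `Publisher` and `Subscriber` are hypothetical classes, and a weak reference is used as a probe to see whether the subscriber can be reclaimed. The same reference chain (publisher -> handler -> subscriber) is what !gcroot tends to reveal in a .NET dump.

```python
import gc
import weakref


class Publisher:
    """Hypothetical event source that keeps a strong reference to each handler."""

    def __init__(self):
        self._handlers = []

    def subscribe(self, handler):
        self._handlers.append(handler)

    def unsubscribe(self, handler):
        self._handlers.remove(handler)


class Subscriber:
    def on_event(self):
        pass


def is_alive_after_drop(unsubscribe_first):
    """Report whether the subscriber survives once our own reference is gone."""
    publisher = Publisher()
    subscriber = Subscriber()
    # The bound method carries a reference back to the subscriber.
    publisher.subscribe(subscriber.on_event)
    probe = weakref.ref(subscriber)
    if unsubscribe_first:
        publisher.unsubscribe(subscriber.on_event)
    del subscriber
    gc.collect()
    return probe() is not None


if __name__ == "__main__":
    print("still subscribed:", is_alive_after_drop(unsubscribe_first=False))
    print("unsubscribed:    ", is_alive_after_drop(unsubscribe_first=True))
```

As long as the publisher lives and the handler stays registered, the subscriber cannot be collected, no matter how many other references to it are dropped.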

To troubleshoot this, you need to inspect the managed heap. WinDbg + SOS (or PSSCOR) will let you do this. The !dumpheap -stat command summarizes the entire managed heap by type.

You need to have an idea of the number of instances of each type to expect on the heap. Once you find something that looks odd, you can use the !dumpheap -mt <METHOD TABLE> command to list all instances of a given type.

The next step is to analyze the roots of these instances. Pick one at random and do a !gcroot on it. That will show how that particular instance is rooted. Look for event handlers and pinned objects (which usually represent static references). If you see the finalizer queue in there, you need to examine what the finalizer thread is doing. Use the !threads and !clrstack commands for that.

If everything looks fine for that instance, move on to another instance. If that doesn't yield anything, you may need to go back to the heap statistics and repeat from there.

Other sources of leaks include assemblies that are not unloaded and fragmentation of the Large Object Heap. SOS/PSSCOR can help you locate these as well, but I'll skip the details for now.

If you want to know more I recommend Tess' blog. I've also done a couple of videos covering how to use WinDbg + SOS (here and here).

If you have the option of debugging the process while it runs, I recommend using PSSCOR instead of SOS. PSSCOR is essentially a private branch of the SOS sources that adds new commands and improves many of the existing SOS commands. For example, the PSSCOR version of the !dumpheap command has a very useful delta column, which makes troubleshooting memory leaks much easier.

To use it, start your process, attach WinDbg, load PSSCOR, and do a !dumpheap -stat. Then let the process run again so that allocations are made. Break execution and repeat the command. PSSCOR will now show you the number of instances that were added or removed since the previous inspection.
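If only plain SOS is available, a similar comparison can be approximated by hand from two !dumpheap -stat runs. The sketch below is not PSSCOR's implementation; it simply subtracts per-type instance counts that you have already extracted from the two outputs. The type names and numbers are hypothetical.

```python
def heap_delta(before, after):
    """Compare two {type_name: instance_count} snapshots.

    Returns (type_name, delta) pairs for every type whose count changed,
    ordered with the largest growth first.
    """
    names = set(before) | set(after)
    deltas = [(name, after.get(name, 0) - before.get(name, 0)) for name in names]
    changed = [pair for pair in deltas if pair[1] != 0]
    return sorted(changed, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical counts taken from two snapshots of the same process.
first = {"System.String": 300000, "MyApp.Order": 50000, "System.Object": 120}
second = {
    "System.String": 310000,
    "MyApp.Order": 95000,
    "System.Object": 120,
    "MyApp.Session": 40,
}

if __name__ == "__main__":
    for name, delta in heap_delta(first, second):
        print(f"{delta:+10,d}  {name}")
```

Types that keep growing across several snapshots taken under steady load are the ones worth following up with !dumpheap -mt and !gcroot.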

Solution 3:

Since version 2017.2, JetBrains dotMemory supports analysis of Windows memory dumps, with all its power and fancy GUI.