Dead code detection in legacy C/C++ project [closed]

How would you go about dead code detection in C/C++ code? I have a pretty large code base to work with, and at least 10-15% of it is dead code. Is there any Unix-based tool to identify these areas? Some pieces of code still use the preprocessor heavily; can an automated process handle that?


Solution 1:

You could use a code coverage analysis tool for this and look for unused spots in your code.

A popular tool for the gcc toolchain is gcov, together with the graphical frontend lcov (http://ltp.sourceforge.net/coverage/lcov.php).

If you use gcc, you can compile with gcov support, which is enabled by the '--coverage' flag. Next, run your application or your test suite with this gcov-enabled build.
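
A minimal sketch of that workflow (the source files and the 'myapp' binary are placeholders):

    # Compile and link with coverage instrumentation; --coverage
    # implies -fprofile-arcs -ftest-coverage and links in libgcov.
    gcc --coverage -O0 -g -c foo.c bar.c
    gcc --coverage -o myapp foo.o bar.o

    # Running the instrumented binary writes the runtime coverage
    # data (.gcda files) next to the object files.
    ./myapp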

Basically, gcc will emit some extra files during compilation, and the application will also emit some coverage data while running. You have to collect all of these (.gcno and .gcda files). I'm not going into full detail here, but you probably need to set two environment variables to collect the coverage data in a sane way: GCOV_PREFIX and GCOV_PREFIX_STRIP...
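
For instance (a sketch; the prefix path and strip depth are assumptions that depend on where your build tree lives):

    # Have the instrumented binary drop its .gcda files under
    # /tmp/coverage instead of the (possibly read-only) build path
    # compiled into it, stripping the leading directory components
    # of that path.
    export GCOV_PREFIX=/tmp/coverage
    export GCOV_PREFIX_STRIP=3
    ./myapp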

After the run, you can put all the coverage data together and run it through the lcov tool suite. Merging the coverage files from different test runs is also possible, albeit a bit involved.
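
With lcov, that boils down to something like this (a sketch; the tracefile and directory names are placeholders):

    # Capture the coverage data of one run into a tracefile.
    lcov --capture --directory . --output-file run1.info

    # Merge tracefiles from different test runs.
    lcov --add-tracefile run1.info --add-tracefile run2.info \
         --output-file merged.info

    # Turn the merged data into HTML pages.
    genhtml merged.info --output-directory coverage-html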

Anyhow, you end up with a nice set of web pages showing the coverage information, pointing out the pieces of code that have no coverage and hence were not used.

Of course, you need to double-check whether those portions of code are really unused in any situation, and a lot depends on how well your tests exercise the codebase. But at least this will give you an idea about possible dead-code candidates...

Solution 2:

Compile it under gcc with -Wunreachable-code.
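
For example (a minimal sketch; the file name is a placeholder):

    # Ask the compiler to warn about statements it can prove are
    # never reached.
    gcc -Wall -Wunreachable-code -c legacy.c

    # The same flag under Clang (see the version caveat below).
    clang -Wall -Wunreachable-code -c legacy.c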

How much this catches depends on your compiler: GCC has accepted -Wunreachable-code as a no-op since version 4.5 (the underlying analysis was dropped), so with a modern toolchain Clang, which does implement the warning, will give you better results.

Note that this relies on flow analysis in the compiler, so it won't tell you about "code" which is already dead by the time it leaves the preprocessor, because that is never parsed at all. It also won't detect, e.g., exported functions which are never called, or special-case handling code which just so happens to be impossible because nothing ever calls the function with that parameter; you need code coverage for that (and run the functional tests, not the unit tests: unit tests are supposed to have 100% code coverage, and hence execute code paths which are 'dead' as far as the application is concerned). Still, with these limitations in mind, it's an easy way to get started finding the most completely bollixed routines in the code base.

This CERT advisory lists some other tools for static dead-code detection.

Solution 3:

Your approach depends on the availability of (automated) tests. If you have a test suite that you trust to cover a sufficient amount of functionality, you can use coverage analysis, as previous answers have already suggested.

If you are not so fortunate, you might want to look into source code analysis tools such as SciTools' Understand, which can help you analyse your code with a lot of built-in analysis reports. My experience with that tool dates from 2 years ago, so I can't give you much detail, but what I do remember is that they had impressive support, with very fast turnaround times for bug fixes and answers to questions.

I found a page on static source code analysis that lists many other tools as well.

If that doesn't help you sufficiently either, and you're specifically interested in the preprocessor-related dead code, I would recommend you post some more details about the code. For example, if it is mostly related to various combinations of #ifdef settings, you could write scripts to determine the (combinations of) settings and find out which combinations are never actually built, etc.
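
A very rough sketch of that idea (directory and file names are hypothetical, and the pattern will miss idioms like '#if defined(FOO)'):

    # Collect every macro name tested by #if/#ifdef/#ifndef in the tree.
    grep -rhoE '#[[:space:]]*if(n?def)?[[:space:]]+[A-Za-z_][A-Za-z0-9_]*' src/ \
        | awk '{print $NF}' | sort -u > macros_tested.txt

    # Collect the macros your build actually defines (-DFOO, -DFOO=1).
    tr ' ' '\n' < build_flags.txt | sed -n 's/^-D//p' | sed 's/=.*//' \
        | sort -u > macros_defined.txt

    # Macros tested in the code but never set by any build configuration
    # mark #ifdef blocks that are never compiled in.
    comm -23 macros_tested.txt macros_defined.txt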