How can I know which parts in the code are never used?
There are two varieties of unused code:
- the local one, that is, in some functions some paths or variables are unused (or used but in no meaningful way, like written but never read)
- the global one: functions that are never called, global objects that are never accessed
For the first kind, a good compiler can help:
-
-Wunused
(GCC, Clang) should warn about unused variables, Clang unused analyzer has even been incremented to warn about variables that are never read (even though used). -
-Wunreachable-code
(older GCC, removed in 2010) should warn about local blocks that are never accessed (it happens with early returns or conditions that always evaluate to true) - there is no option I know of to warn about unused
catch
blocks, because the compiler generally cannot prove that no exception will be thrown.
For the second kind, it's much more difficult. Statically it requires whole program analysis, and even though link time optimization may actually remove dead code, in practice the program has been so much transformed at the time it is performed that it is near impossible to convey meaningful information to the user.
There are therefore two approaches:
- The theoretic one is to use a static analyzer. A piece of software that will examine the whole code at once in great detail and find all the flow paths. In practice I don't know any that would work here.
- The pragmatic one is to use an heuristic: use a code coverage tool (in the GNU chain it's
gcov
. Note that specific flags should be passed during compilation for it to work properly). You run the code coverage tool with a good set of varied inputs (your unit-tests or non-regression tests), the dead code is necessarily within the unreached code... and so you can start from here.
If you are extremely interested in the subject, and have the time and inclination to actually work out a tool by yourself, I would suggest using the Clang libraries to build such a tool.
- Use the Clang library to get an AST (abstract syntax tree)
- Perform a mark-and-sweep analysis from the entry points onward
Because Clang will parse the code for you, and perform overload resolution, you won't have to deal with the C++ languages rules, and you'll be able to concentrate on the problem at hand.
However this kind of technique cannot identify the virtual overrides that are unused, since they could be called by third-party code you cannot reason about.
For the case of unused whole functions (and unused global variables), GCC can actually do most of the work for you provided that you're using GCC and GNU ld.
When compiling the source, use -ffunction-sections
and -fdata-sections
, then when linking use -Wl,--gc-sections,--print-gc-sections
. The linker will now list all the functions that could be removed because they were never called and all the globals that were never referenced.
(Of course, you can also skip the --print-gc-sections
part and let the linker remove the functions silently, but keep them in the source.)
Note: this will only find unused complete functions, it won't do anything about dead code within functions. Functions called from dead code in live functions will also be kept around.
Some C++-specific features will also cause problems, in particular:
- Virtual functions. Without knowing which subclasses exist and which are actually instantiated at run time, you can't know which virtual functions you need to exist in the final program. The linker doesn't have enough information about that so it will have to keep all of them around.
- Globals with constructors, and their constructors. In general, the linker can't know that the constructor for a global doesn't have side effects, so it must run it. Obviously this means the global itself also needs to be kept.
In both cases, anything used by a virtual function or a global-variable constructor also has to be kept around.
An additional caveat is that if you're building a shared library, the default settings in GCC will export every function in the shared library, causing it to be "used" as far as the linker is concerned. To fix that you need to set the default to hiding symbols instead of exporting (using e.g. -fvisibility=hidden
), and then explicitly select the exported functions that you need to export.
Well if you using g++ you can use this flag -Wunused
According documentation:
Warn whenever a variable is unused aside from its declaration, whenever a function is declared static but never defined, whenever a label is declared but not used, and whenever a statement computes a result that is explicitly not used.
http://docs.freebsd.org/info/gcc/gcc.info.Warning_Options.html
Edit: Here is other useful flag -Wunreachable-code
According documentation:
This option is intended to warn when the compiler detects that at least a whole line of source code will never be executed, because some condition is never satisfied or because it is after a procedure that never returns.
Update: I found similar topic Dead code detection in legacy C/C++ project
I think you are looking for a code coverage tool. A code coverage tool will analyze your code as it is running, and it will let you know which lines of code were executed and how many times, as well as which ones were not.
You could try giving this open source code coverage tool a chance: TestCocoon - code coverage tool for C/C++ and C#.
The real answer here is: You can never really know for sure.
At least, for nontrivial cases, you can't be sure you've gotten all of it. Consider the following from Wikipedia's article on unreachable code:
double x = sqrt(2);
if (x > 5)
{
doStuff();
}
As Wikipedia correctly notes, a clever compiler may be able to catch something like this. But consider a modification:
int y;
cin >> y;
double x = sqrt((double)y);
if (x != 0 && x < 1)
{
doStuff();
}
Will the compiler catch this? Maybe. But to do that, it will need to do more than run sqrt
against a constant scalar value. It will have to figure out that (double)y
will always be an integer (easy), and then understand the mathematical range of sqrt
for the set of integers (hard). A very sophisticated compiler might be able to do this for the sqrt
function, or for every function in math.h, or for any fixed-input function whose domain it can figure out. This gets very, very complex, and the complexity is basically limitless. You can keep adding layers of sophistication to your compiler, but there will always be a way to sneak in some code that will be unreachable for any given set of inputs.
And then there are the input sets that simply never get entered. Input that would make no sense in real life, or get blocked by validation logic elsewhere. There's no way for the compiler to know about those.
The end result of this is that while the software tools others have mentioned are extremely useful, you're never going to know for sure that you caught everything unless you go through the code manually afterward. Even then, you'll never be certain that you didn't miss anything.
The only real solution, IMHO, is to be as vigilant as possible, use the automation at your disposal, refactor where you can, and constantly look for ways to improve your code. Of course, it's a good idea to do that anyway.