Linux C++: how to profile time wasted due to cache misses?
I know that I can use gprof to benchmark my code.
However, I have this problem -- I have a smart pointer that has an extra level of indirection (think of it as a proxy object).
As a result, I have this extra layer that affects pretty much all functions, and screws with caching.
Is there a way to measure the time my CPU wastes due to cache misses?
Solution 1:
You could try cachegrind and its front-end kcachegrind.
Solution 2:
Linux supports this with perf from kernel 2.6.31 on. This allows you to do the following:
- compile your code with -g to have debug information included
- run your code, e.g. using the last-level cache miss counters:
  perf record -e LLC-loads,LLC-load-misses yourExecutable
- run
  perf report
  - after acknowledging the initial message, select the LLC-load-misses line,
  - then e.g. the first function, and
  - then annotate. You should see the lines (in assembly code, interleaved with the original source code) and, for each line where cache misses occurred, a number indicating what fraction of the last-level cache misses it accounts for.
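If you want something small to point perf at before profiling the real program, here is a minimal sketch of the situation in the question. It assumes the proxy is a pointer to a handle that in turn points to the object; Payload, Handle, ProxyPtr, Workload and make_workload are hypothetical names, not anything from the asker's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical payload type; stands in for whatever the real pointer targets.
struct Payload {
    std::uint64_t value;
};

// Hypothetical handle: the extra level of indirection behind the proxy.
struct Handle {
    Payload* target;
};

// Hypothetical proxy pointer: every access is proxy -> handle -> payload,
// i.e. two dependent loads instead of one.
class ProxyPtr {
public:
    explicit ProxyPtr(Handle* handle) : handle_(handle) {
        assert(handle_ != nullptr);
    }
    Payload& operator*() const { return *handle_->target; }
    Payload* operator->() const { return handle_->target; }

private:
    Handle* handle_;
};

// One load per element: pointer -> payload.
std::uint64_t sum_direct(const std::vector<Payload*>& items) {
    std::uint64_t total = 0;
    for (const Payload* p : items) total += p->value;
    return total;
}

// Two dependent loads per element: proxy -> handle -> payload.
std::uint64_t sum_via_proxy(const std::vector<ProxyPtr>& items) {
    std::uint64_t total = 0;
    for (const ProxyPtr& p : items) total += p->value;
    return total;
}

// Owns the heap storage so the raw pointers in `direct` and `proxied` stay valid.
struct Workload {
    std::vector<std::unique_ptr<Payload>> payloads;
    std::vector<std::unique_ptr<Handle>> handles;
    std::vector<Payload*> direct;
    std::vector<ProxyPtr> proxied;
};

// Builds `count` payloads with values 0, 1, ..., count - 1 and both views of them.
Workload make_workload(std::size_t count) {
    Workload w;
    for (std::size_t i = 0; i < count; ++i) {
        w.payloads.push_back(
            std::make_unique<Payload>(Payload{static_cast<std::uint64_t>(i)}));
        Payload* raw = w.payloads.back().get();
        w.handles.push_back(std::make_unique<Handle>(Handle{raw}));
        w.direct.push_back(raw);
        w.proxied.push_back(ProxyPtr(w.handles.back().get()));
    }
    return w;
}
```

Call sum_direct and sum_via_proxy in a loop from main on a workload larger than your last-level cache, build with -O2 -g, and the annotate view should let you compare how the misses are attributed to each loop. How large the difference is depends heavily on allocation order and hardware, so treat it as a probe rather than a prediction for the real program.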
Solution 3:
You could find a tool that accesses the CPU performance counters. There is probably a register in each core that counts L1, L2, etc. misses. Alternatively, Cachegrind simulates the cache hierarchy in software while it runs your program.
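For reference, on Linux those counters can also be read from inside the program through the perf_event_open system call. The following is only a rough sketch under a few assumptions: the generic PERF_COUNT_HW_CACHE_MISSES event is available on your CPU, the kernel allows unprivileged access (see /proc/sys/kernel/perf_event_paranoid), and the helper names open_cache_miss_counter and count_cache_misses are made up for this example.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// Hypothetical helper: opens a hardware cache-miss counter for the calling
// thread. Returns a file descriptor, or -1 if the kernel refuses (for example
// because of permissions or a virtual machine without a PMU).
int open_cache_miss_counter() {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;
    attr.disabled = 1;        // start stopped; enabled explicitly around the work
    attr.exclude_kernel = 1;  // count user-space misses only
    attr.exclude_hv = 1;
    // pid = 0, cpu = -1: measure this thread on whichever CPU it runs.
    // glibc has no wrapper for this call, hence the raw syscall.
    return static_cast<int>(
        syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL));
}

// Hypothetical helper: runs work() and, when a counter is available, stores
// the cache misses observed during the call in `misses`. Returns false if no
// counter could be opened or read; work() still runs in that case.
template <typename Work>
bool count_cache_misses(Work&& work, std::uint64_t& misses) {
    const int fd = open_cache_miss_counter();
    if (fd == -1) {
        work();
        return false;
    }
    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t value = 0;
    const bool ok =
        read(fd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value));
    close(fd);
    if (ok) misses = value;
    return ok;
}
```

You would wrap the code of interest in a lambda and pass it to count_cache_misses. Which cache level the generic event maps to is hardware-dependent, so the number is mainly useful for comparing two variants on the same machine.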
However, I don't think that would be insightful. Your proxy objects are presumably modified by their own methods. A conventional profiler will tell you how much time those methods are taking. No profiling tool would tell you how performance would improve without that source of cache pollution. That's a matter of reducing the size and changing the structure of the program's working set, which isn't easy to extrapolate.
A quick Google search turned up boost::intrusive_ptr, which might interest you. It doesn't appear to support something like weak_ptr, but converting your program might be trivial, and then you would know for sure the cost of the non-intrusive ref counts.
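To illustrate what "intrusive" buys you, here is a hand-rolled sketch in which the count lives inside the object, so reaching it never touches a separate control block. This is not boost::intrusive_ptr itself; RefCounted, IntrusivePtr and Widget are hypothetical names, and the count is deliberately non-atomic, so the sketch is single-threaded only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical base class: the reference count is stored in the object itself.
class RefCounted {
public:
    RefCounted() = default;
    RefCounted(const RefCounted&) : count_(0) {}  // a copy starts unowned
    RefCounted& operator=(const RefCounted&) { return *this; }
    virtual ~RefCounted() = default;

    std::size_t use_count() const { return count_; }
    void add_ref() { ++count_; }
    // Returns true when the last reference has just been dropped.
    bool release() {
        assert(count_ > 0);
        return --count_ == 0;
    }

private:
    std::size_t count_ = 0;  // not atomic: single-threaded sketch
};

// Hypothetical smart pointer: one pointer wide, one hop to the object.
// T is expected to derive from RefCounted.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;
    explicit IntrusivePtr(T* p) : ptr_(p) {
        if (ptr_) ptr_->add_ref();
    }
    IntrusivePtr(const IntrusivePtr& other) : ptr_(other.ptr_) {
        if (ptr_) ptr_->add_ref();
    }
    IntrusivePtr(IntrusivePtr&& other) noexcept : ptr_(other.ptr_) {
        other.ptr_ = nullptr;
    }
    // Copy-and-swap: handles both copy and move assignment.
    IntrusivePtr& operator=(IntrusivePtr other) noexcept {
        std::swap(ptr_, other.ptr_);
        return *this;
    }
    ~IntrusivePtr() {
        if (ptr_ && ptr_->release()) delete ptr_;
    }

    T& operator*() const { return *ptr_; }
    T* operator->() const { return ptr_; }
    T* get() const { return ptr_; }

private:
    T* ptr_ = nullptr;
};

// Hypothetical payload type used only to exercise the pointer.
struct Widget : RefCounted {
    explicit Widget(int v) : value(v) {}
    int value;
};
```

Usage looks like IntrusivePtr<Widget> p(new Widget(7)); copies bump the count stored in the Widget, and the last owner deletes it. Swapping something like this (or the Boost class) into the program and timing both builds is the direct way to measure what the extra indirection costs.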