Linux C++: how to profile time wasted due to cache misses?

I know that I can use gprof to benchmark my code.

However, I have this problem -- I have a smart pointer that has an extra level of indirection (think of it as a proxy object).

As a result, I have this extra layer that affects pretty much all functions, and interferes with caching.

Is there a way to measure the time my CPU wastes due to cache misses?


Solution 1:

You could try Cachegrind and its front-end, KCachegrind.

Solution 2:

Linux has supported perf since kernel 2.6.31. This allows you to do the following:

  • compile your code with -g to have debug information included
  • run your code under perf record, e.g. with the last-level cache counters: perf record -e LLC-loads,LLC-load-misses yourExecutable
  • run perf report
    • after acknowledging the initial message, select the LLC-load-misses line,
    • then e.g. the first function and
    • then annotate. You should see the lines (in assembly code, interleaved with the original source code) and, for each line where cache misses occurred, a number indicating its share of the last-level cache misses.
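
If you want something small to try these steps on before pointing perf at the real program, a toy workload along these lines might help. It is only a sketch: Node, sum_direct, sum_indirect, make_nodes and make_shuffled_handles are hypothetical names, and the shuffled pointer table is just a stand-in for a proxy object that adds one hop per access, not the asker's actual smart pointer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical payload type; name and layout are illustrative only.
struct Node {
    std::uint64_t value;
};

// Build `count` nodes stored contiguously, with values 0, 1, 2, ...
std::vector<Node> make_nodes(std::size_t count) {
    std::vector<Node> nodes(count);
    for (std::size_t i = 0; i < count; ++i) {
        nodes[i].value = i;
    }
    return nodes;
}

// Build one pointer per node, then shuffle the pointers so that following
// them visits memory in a scattered order (a stand-in for proxy indirection).
std::vector<const Node*> make_shuffled_handles(const std::vector<Node>& nodes,
                                               unsigned seed) {
    std::vector<const Node*> handles;
    handles.reserve(nodes.size());
    for (const Node& n : nodes) {
        handles.push_back(&n);
    }
    std::mt19937 rng(seed);
    std::shuffle(handles.begin(), handles.end(), rng);
    return handles;
}

// Sequential walk over contiguous storage.
std::uint64_t sum_direct(const std::vector<Node>& nodes) {
    std::uint64_t total = 0;
    for (const Node& n : nodes) {
        total += n.value;
    }
    return total;
}

// Same sum, but every element is reached through an extra pointer hop.
std::uint64_t sum_indirect(const std::vector<const Node*>& handles) {
    std::uint64_t total = 0;
    for (const Node* p : handles) {
        assert(p != nullptr);
        total += p->value;
    }
    return total;
}
```

Call both sum functions from main on a node count large enough that the data does not fit in your last-level cache, build with -g, and run it under the perf record command above. Both functions compute the same sum, so any difference in the LLC-load-misses attributed to each comes from the access pattern alone.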

Solution 3:

You could find a tool that accesses the CPU performance counters. There is probably a register in each core that counts L1, L2, etc. misses. Alternatively, Cachegrind simulates the cache hierarchy in software instead of reading the hardware counters.
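
For what it's worth, on Linux you can also read such a counter from inside your own program via the perf_event_open system call. Below is a minimal sketch following the pattern in the perf_event_open(2) man page. The helper names sys_perf_event_open and count_cache_misses are my own, and the sketch assumes the kernel and hardware expose PERF_COUNT_HW_CACHE_MISSES to unprivileged processes, which is often not the case inside virtual machines or with a restrictive perf_event_paranoid setting.

```cpp
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#include <cstdint>
#include <cstring>

// glibc provides no wrapper for this system call, so go through syscall().
static long sys_perf_event_open(perf_event_attr* attr, pid_t pid, int cpu,
                                int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

// Runs work() once while counting hardware cache misses for the calling
// thread. Returns false (without running work) if the counter cannot be
// opened; returns false after running work if the counter cannot be read.
// On success, stores the count in `misses` and returns true.
template <typename Fn>
bool count_cache_misses(Fn&& work, std::uint64_t& misses) {
    perf_event_attr attr;
    std::memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_CACHE_MISSES;  // usually last-level misses
    attr.disabled = 1;        // start stopped; enabled explicitly below
    attr.exclude_kernel = 1;  // count user-space activity only
    attr.exclude_hv = 1;

    // pid = 0, cpu = -1: measure the calling thread on whichever CPU it runs.
    const int fd = static_cast<int>(sys_perf_event_open(&attr, 0, -1, -1, 0));
    if (fd == -1) {
        return false;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    std::uint64_t count = 0;
    const bool ok =
        read(fd, &count, sizeof(count)) == static_cast<ssize_t>(sizeof(count));
    close(fd);
    if (ok) {
        misses = count;
    }
    return ok;
}
```

You would wrap the code path of interest in a lambda and pass it as work. The number you get back is a raw miss count, not time lost, so it is mostly useful for comparing two variants of the same code.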

However, I don't think that would be insightful. Your proxy objects are presumably modified by their own methods. A conventional profiler will tell you how much time those methods are taking. No profiling tool will tell you how performance would improve without that source of cache pollution. That's a matter of changing the size and structure of the program's working set, and the effect isn't easy to extrapolate.

A quick Google search turned up boost::intrusive_ptr, which might interest you. It doesn't appear to support something like weak_ptr, but converting your program might be trivial, and then you would know for sure the cost of the non-intrusive ref counts.
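
To illustrate the idea of an intrusive count, here is a hand-rolled sketch. This is not Boost's implementation or API: RefCounted, IntrusivePtr and Widget are hypothetical names, and the counter is deliberately not thread-safe. The point is only that the count lives inside the object, so dereferencing the pointer and touching the count hit the same allocation instead of a separate control block.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical base class: the reference count is stored in the object itself.
class RefCounted {
public:
    RefCounted() = default;
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
    virtual ~RefCounted() = default;

    void add_ref() { ++refs_; }

    // Drops one reference; returns true if that was the last one.
    bool release() {
        assert(refs_ > 0);
        return --refs_ == 0;
    }

    std::size_t use_count() const { return refs_; }

private:
    std::size_t refs_ = 0;  // plain integer: NOT safe across threads
};

// Hypothetical smart pointer for types derived from RefCounted.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() = default;

    explicit IntrusivePtr(T* p) : p_(p) {
        if (p_) p_->add_ref();
    }

    IntrusivePtr(const IntrusivePtr& other) : p_(other.p_) {
        if (p_) p_->add_ref();
    }

    IntrusivePtr(IntrusivePtr&& other) noexcept : p_(other.p_) {
        other.p_ = nullptr;
    }

    // Copy-and-swap: the by-value parameter releases the old pointee.
    IntrusivePtr& operator=(IntrusivePtr other) noexcept {
        std::swap(p_, other.p_);
        return *this;
    }

    ~IntrusivePtr() {
        if (p_ && p_->release()) delete p_;
    }

    T* get() const { return p_; }
    T& operator*() const { return *p_; }
    T* operator->() const { return p_; }
    explicit operator bool() const { return p_ != nullptr; }

private:
    T* p_ = nullptr;
};

// Example payload type, purely illustrative.
struct Widget : RefCounted {
    explicit Widget(int v) : value(v) {}
    int value;
};
```

Usage would look like IntrusivePtr<Widget> w(new Widget(7)); copies bump the count stored in the Widget itself. If you go the Boost route instead, check its documentation for the hooks it actually expects, since they differ from this sketch.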