Profiling python C extensions

After the comment by pygabriel I decided to upload a package to pypi that implements a profiler for python extensions using the cpu-profiler from google-perftools: http://pypi.python.org/pypi/yep


I've found my way using google-perftools. The trick was to wrap the functions StartProfiler and StopProfiler in python (throught cython in my case).

To profile the C extension is sufficient to wrap the python code inside the StartProfiler and StopProfiler calls.

from google_perftools_wrapped import StartProfiler, StopProfiler
import c_extension # extension to profile c_extension.so

StartProfiler("output.prof")
... calling the interesting functions from the C extension module ...
StopProfiler()

Then to analyze for example you can export in callgrind format and see the result in kcachegrind:

pprof --callgrind c_extension.so output.prof > output.callgrind 
kcachegrind output.callgrind

One of my colleague told me ltrace(1). It helped me on the same situation quite a lot.

Assume the shared object name of your C extention is myext.so and you want to execute benchmark.py, then

ltrace -x @myext.so -c python benchmark.py

Its output is like

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 24.88   30.202126     7550531         4 ldap_result
 12.46   15.117625     7558812         2 l_ldap_result4
 12.41   15.059652     5019884         3 ldap_chase_v3referrals
 12.41   15.057678     3764419         4 ldap_new_connection
 12.40   15.050310     3762577         4 ldap_int_open_connection
 12.39   15.042360     3008472         5 ldap_send_server_request
 12.38   15.029055     3757263         4 ldap_connect_to_host
  0.05    0.057890       28945         2 ldap_get_option
  0.04    0.052182       26091         2 ldap_sasl_bind
  0.03    0.030760       30760         1 l_ldap_get_option
  0.03    0.030635       30635         1 LDAP_get_option
  0.02    0.029960       14980         2 ldap_initialize
  0.02    0.027988       27988         1 ldap_int_initialize
  0.02    0.026722       26722         1 l_ldap_simple_bind
  0.02    0.026386       13193         2 ldap_send_initial_request
  0.02    0.025810       12905         2 ldap_int_select
....

Special care is needed if your shared object has - or + in its file name. These characters aren't treated as is (see man 1 ltrace for details).

The wildcard * can be a workaround such as -x @myext* in place of -x @myext-2.so.


With gprof, you can profile any program that was properly compiled and linked (gcc -pg etc, in gprof's case). If you're using a Python version not built with gcc (e.g., the Windows precompiled version the PSF distributes), you'll need to research what equivalent tools exist for that platform and toolchain (in the Windows PSF case, maybe mingw can help). There may be "irrelevant" data there (internal C functions in the Python runtime), and, if so, the percentages shown by gprof may not be applicable -- but the absolute numbers (of calls, and durations thereof) are still valid, and you can post-process gprof's output (e.g., with a little Python script;-) to exclude the irrelevant data and compute the percentages you want.


I found py-spy very easy to use. See this blog post for an explanation of its native extension support.

Highlights:

  • pip installable
  • cpu sampling based
  • no compiler flags required
  • executes your program or attaches to a running process
  • multiple output formats (I recommend --format speedscope)
  • configurable sampling rate