Profiling python C extensions
After the comment by pygabriel I decided to upload a package to pypi that implements a profiler for python extensions using the cpu-profiler from google-perftools: http://pypi.python.org/pypi/yep
I've found my way using google-perftools. The trick was to wrap the functions StartProfiler and StopProfiler in python (throught cython in my case).
To profile the C extension is sufficient to wrap the python code inside the StartProfiler and StopProfiler calls.
from google_perftools_wrapped import StartProfiler, StopProfiler
import c_extension # extension to profile c_extension.so
StartProfiler("output.prof")
... calling the interesting functions from the C extension module ...
StopProfiler()
Then to analyze for example you can export in callgrind format and see the result in kcachegrind:
pprof --callgrind c_extension.so output.prof > output.callgrind
kcachegrind output.callgrind
One of my colleague told me ltrace(1)
. It helped me on the same situation quite a lot.
Assume the shared object name of your C extention is myext.so
and you want to execute benchmark.py
, then
ltrace -x @myext.so -c python benchmark.py
Its output is like
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
24.88 30.202126 7550531 4 ldap_result
12.46 15.117625 7558812 2 l_ldap_result4
12.41 15.059652 5019884 3 ldap_chase_v3referrals
12.41 15.057678 3764419 4 ldap_new_connection
12.40 15.050310 3762577 4 ldap_int_open_connection
12.39 15.042360 3008472 5 ldap_send_server_request
12.38 15.029055 3757263 4 ldap_connect_to_host
0.05 0.057890 28945 2 ldap_get_option
0.04 0.052182 26091 2 ldap_sasl_bind
0.03 0.030760 30760 1 l_ldap_get_option
0.03 0.030635 30635 1 LDAP_get_option
0.02 0.029960 14980 2 ldap_initialize
0.02 0.027988 27988 1 ldap_int_initialize
0.02 0.026722 26722 1 l_ldap_simple_bind
0.02 0.026386 13193 2 ldap_send_initial_request
0.02 0.025810 12905 2 ldap_int_select
....
Special care is needed if your shared object has -
or +
in its file name. These characters aren't treated as is (see man 1 ltrace
for details).
The wildcard *
can be a workaround such as -x @myext*
in place of -x @myext-2.so
.
With gprof, you can profile any program that was properly compiled and linked (gcc -pg
etc, in gprof
's case). If you're using a Python version not built with gcc
(e.g., the Windows precompiled version the PSF distributes), you'll need to research what equivalent tools exist for that platform and toolchain (in the Windows PSF case, maybe mingw
can help). There may be "irrelevant" data there (internal C functions in the Python runtime), and, if so, the percentages shown by gprof
may not be applicable -- but the absolute numbers (of calls, and durations thereof) are still valid, and you can post-process gprof
's output (e.g., with a little Python script;-) to exclude the irrelevant data and compute the percentages you want.
I found py-spy very easy to use. See this blog post for an explanation of its native extension support.
Highlights:
- pip installable
- cpu sampling based
- no compiler flags required
- executes your program or attaches to a running process
- multiple output formats (I recommend
--format speedscope
) - configurable sampling rate