Specifying Exact CPU Instruction Set with Cythonized Python Wheels
The pip infrastructure doesn't support such granularity.
I think a better approach would be to have two versions of the Cython-extension compiled: with -march=native
and without, to install both and to decide at the run time which one should be loaded.
Here is a proof of concept.
The first hoop to jump: how to check at run time which instructions are supported by CPU/OS combination. For the simplicity we will check for AVX (this SO-post has more details) and I offer only a gcc-specific (see also this) solution - called impl_picker.pyx
:
cdef extern from *:
"""
int cpu_supports_avx(void){
return __builtin_cpu_supports("avx");
}
"""
int cpu_supports_avx()
def cpu_has_avx_support():
return cpu_supports_avx() != 0
The second problem: the pyx-file and the module must have the same name. To avoid code duplication, the actual code is in a pxi-file:
# worker.pxi
cdef extern from *:
"""
int compiled_with_avx(void){
#ifdef __AVX__
return 1;
#else
return 0;
#endif
}
"""
int compiled_with_avx()
def compiled_with_avx_support():
return compiled_with_avx() != 0
As one can see, the function compiled_with_avx_support
will yield different results, depending on whether it was compiled with -march=native
or not.
And now we can define two versions of the module just by including the actual code from the *.pxi-file. One module called worker_native.pyx
:
# distutils: extra_compile_args=["-march=native"]
include "worker.pxi"
and worker_fallback.pyx
:
include "worker.pxi"
Building everything, e.g. via cythonize -i -3 *.pyx
, it can be used as follows:
from impl_picker import cpu_has_avx_support
# overhead once when imported:
if cpu_has_avx_support():
import worker_native as worker
else:
print("using fallback worker")
import worker_fallback as worker
print("compiled_with_avx_support:", worker.compiled_with_avx_support())
On my machine the above would lead to compiled_with_avx_support: True
, on older machines the "slower" worker_fallback
will be used and the result will be compiled_with_avx_support: False
.
The goal of this post is not to give a working setup.py
, but just to outline the idea how one could achieve the goal of picking correct version at the run time. Obviously, the setup.py could be quite more complicated: e.g. one would need to compile multiple c-files with different compiler settings (see this SO-post, how this could be achieved).