Specifying Exact CPU Instruction Set with Cythonized Python Wheels

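The pip infrastructure doesn't support such granularity: wheel tags identify the platform and the architecture (e.g. x86_64), but not individual instruction-set extensions such as AVX.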

I think a better approach would be to compile two versions of the Cython extension, one with -march=native and one without, to install both, and to decide at run time which one should be loaded.

Here is a proof of concept.

The first hoop to jump through: how to check at run time which instructions are supported by the CPU/OS combination. For simplicity, we will check only for AVX (this SO-post has more details), and I offer only a gcc-specific solution (see also this), in a file called impl_picker.pyx:

cdef extern from *:
    """
    int cpu_supports_avx(void){
        return __builtin_cpu_supports("avx");
    }
    """
    int cpu_supports_avx()

def cpu_has_avx_support():
    return cpu_supports_avx() != 0
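
To cross-check what the compiled picker reports, the CPU flags can also be read without a compiler on Linux. This is only a minimal sketch, assuming an x86 Linux /proc/cpuinfo with a "flags" line; the helper names cpu_flags_from_cpuinfo and cpu_has_flag are made up for illustration and are not part of the code above:

```python
def cpu_flags_from_cpuinfo(text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list the CPU features on a line whose key is "flags"
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_has_flag(flag, path="/proc/cpuinfo"):
    """Check one flag on the running machine; False if the file is unreadable."""
    try:
        with open(path) as f:
            return flag in cpu_flags_from_cpuinfo(f.read())
    except OSError:
        # e.g. not on Linux: no information, so report "not supported"
        return False
```

On a Linux machine, cpu_has_flag("avx") should then agree with cpu_has_avx_support().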

The second problem: the pyx-file and the module must have the same name, so a single pyx-file cannot be built into two differently named modules. To avoid code duplication, the actual code lives in a pxi-file:

# worker.pxi
cdef extern from *:
    """   
    int compiled_with_avx(void){
        #ifdef __AVX__
            return 1;
        #else
            return 0;
        #endif
    }
    """
    int compiled_with_avx()

def compiled_with_avx_support():
    return compiled_with_avx() != 0

As one can see, the function compiled_with_avx_support will yield different results depending on whether the module was compiled with -march=native or not (assuming the build machine itself supports AVX).

Now we can define two versions of the module just by including the actual code from the pxi-file. The first module is called worker_native.pyx:

# distutils: extra_compile_args=["-march=native"]

include "worker.pxi"

and worker_fallback.pyx:

include "worker.pxi"
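
If more than two variants are needed, the thin wrapper files could be generated instead of written by hand. This is a sketch under assumptions: the helpers wrapper_source and write_wrappers and the VARIANTS table are hypothetical, and the generated text simply mirrors the two files shown above:

```python
from pathlib import Path

# module name -> extra compiler flags (the same two variants as above)
VARIANTS = {
    "worker_native": ["-march=native"],
    "worker_fallback": [],
}


def wrapper_source(compile_args, pxi="worker.pxi"):
    """Return the text of a thin pyx wrapper that includes the shared code."""
    lines = []
    if compile_args:
        quoted = ", ".join('"%s"' % arg for arg in compile_args)
        lines.append("# distutils: extra_compile_args=[%s]" % quoted)
        lines.append("")
    lines.append('include "%s"' % pxi)
    return "\n".join(lines) + "\n"


def write_wrappers(directory, variants=VARIANTS):
    """Write one <module name>.pyx file per variant into the directory."""
    for name, args in variants.items():
        Path(directory, name + ".pyx").write_text(wrapper_source(args))
```

Calling write_wrappers(".") before the build would recreate worker_native.pyx and worker_fallback.pyx next to worker.pxi.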

After building everything, e.g. via cythonize -i -3 *.pyx, the modules can be used as follows:

from impl_picker import cpu_has_avx_support

# overhead once when imported:
if cpu_has_avx_support():
    import worker_native as worker
else:
    print("using fallback worker")
    import worker_fallback as worker

print("compiled_with_avx_support:", worker.compiled_with_avx_support())
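
The same selection can be wrapped into a small function, which also covers the case that the native build is missing from the installation. Again just a sketch: pick_worker and its importer parameter are my own hypothetical names (the parameter only exists to make the function easy to test), while importlib.import_module is the standard library call:

```python
import importlib


def pick_worker(has_avx, native="worker_native", fallback="worker_fallback",
                importer=importlib.import_module):
    """Import the native build if the CPU allows it, otherwise the fallback."""
    if has_avx:
        try:
            return importer(native)
        except ImportError:
            # the native build may be missing from this installation
            pass
    return importer(fallback)
```

It would be used as worker = pick_worker(cpu_has_avx_support()), with the overhead again paid only once at import time.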

On my machine the above leads to compiled_with_avx_support: True. On older machines without AVX, the "slower" worker_fallback will be used and the result will be compiled_with_avx_support: False.

The goal of this post is not to give a working setup.py, but just to outline how one could pick the correct version at run time. Obviously, a real setup.py could be considerably more complicated: e.g. one would need to compile multiple C files with different compiler settings (see this SO-post for how this could be achieved).