What replaces x86 intrinsics for C when Apple ditches Intel CPUs for their own chips?

There are whole industries founded on the use of Intel Intrinsics for CPU parallelisation (with SIMD). For example, the community of Lattice QCD physicists depends on them to boost the efficiency of lattice simulations.

Intel-based Macs can be and are routinely used by such professionals to do their job. However, there are rumours about Apple replacing Intel CPUs with ARM CPUs in future Macs. Will these professionals have to replace Macs with other Intel-based computers, or are there alternatives to Intel Intrinsics for C that are supported on ARM-based CPUs?


Intel Intrinsics are really just a set of C functions, exposed through compiler headers, that provide easier access to a number of Intel instruction sets - such as SSE (Streaming SIMD Extensions), AVX, etc. - for C programmers. The goal is to be able to utilise these instruction sets for parallelisation, etc. without having to do low-level assembly programming by hand.

The ARM platform has similar instruction sets that serve many of the same purposes. For example NEON is the ARM alternative to SSE on Intel. NEON gives you SIMD instructions that you can leverage to increase parallelisation.

And similar to the Intel Intrinsics, you have the ARM Compiler Intrinsics, which serve the same purpose. You can include "arm_neon.h" in your C program to be able to use NEON instructions with a C interface without having to resort to low-level assembly programming.
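To give a feel for what that looks like, here is a minimal sketch that adds four floats with NEON intrinsics. The function name add4_f32 is made up for illustration, and the scalar fallback is only there so the snippet also compiles on non-ARM machines.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for exactly four floats. */
void add4_f32(const float *a, const float *b, float *out)
{
    assert(a != NULL && b != NULL && out != NULL);
#if defined(__ARM_NEON)
    float32x4_t va = vld1q_f32(a);      /* load 4 floats into a 128-bit register */
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));  /* add all 4 lanes, then store them */
#else
    /* Scalar fallback so the sketch still compiles on non-ARM targets. */
    for (size_t i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}
```

Calling add4_f32 with {1, 2, 3, 4} and {10, 20, 30, 40} fills the output with {11, 22, 33, 44} on either code path.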

It is worth noting, however, that the instructions available on Intel and ARM are not identical. So similar to "ordinary programs", you cannot use SIMD instructions for Intel on ARM (or vice versa) directly. In practice, software programmers often use software libraries with ready-made higher level operations that are able to take advantage of both Intel and ARM instructions. A good example is the "Simd" image processing library (https://github.com/ermig1979/Simd) which offers high level operations that have separate, optimized implementations for SSE, AVX, VMX, VSX and NEON (i.e. Intel, PowerPC and ARM).
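The idea behind such libraries can be sketched roughly as follows: one function with a single C interface, and a separate implementation per instruction set chosen at compile time. This is not how the Simd library itself is written; axpy_f32 and the AXPY_USE_* macros are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define AXPY_USE_SSE 1
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#define AXPY_USE_NEON 1
#endif

/* Hypothetical helper: y[i] = alpha * x[i] + y[i] for i in [0, n). */
void axpy_f32(float alpha, const float *x, float *y, size_t n)
{
    assert(n == 0 || (x != NULL && y != NULL));
    size_t i = 0;
#if defined(AXPY_USE_SSE)
    /* Intel path: 4 floats per iteration with SSE. */
    __m128 va = _mm_set1_ps(alpha);
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        _mm_storeu_ps(y + i, _mm_add_ps(_mm_mul_ps(va, vx), vy));
    }
#elif defined(AXPY_USE_NEON)
    /* ARM path: 4 floats per iteration with NEON. */
    float32x4_t va = vdupq_n_f32(alpha);
    for (; i + 4 <= n; i += 4) {
        float32x4_t vx = vld1q_f32(x + i);
        float32x4_t vy = vld1q_f32(y + i);
        vst1q_f32(y + i, vaddq_f32(vmulq_f32(va, vx), vy));
    }
#endif
    /* Scalar loop handles the leftover elements (or everything if no SIMD). */
    for (; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```

The caller only ever sees axpy_f32, so the same application code builds on an Intel Mac and on an ARM machine without changes.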

As far as I can see, the growth in new parallelisation features is very high on both Intel and ARM platforms - it is essential to providing next generation performance for some users. On newer ARM chips you have access to, for example, the SVE instruction set (Scalable Vector Extension, which is essentially an even better SIMD instruction set for 64-bit ARM processors). There's no inherent advantage to either the Intel or ARM platforms in terms of providing new and enhanced SIMD instruction sets for programmers in the future.

Apple's own processors (in for example iPhones and iPads) have had the NEON instruction set for many years. The A5 CPUs and later also have the Advanced NEON set. As far as I know, Apple's A-series CPUs (including the newer A11) do not implement SVE, but the very latest A12 CPUs add SIMD support for complex numbers.


The Apple M1 supports Neon SIMD instructions but not SVE. You can use sse2neon, a header that reimplements the x86-64 SIMD intrinsics (MMX, SSE, AES) using their Neon counterparts.

Here are some benchmarks using this simple program. The only change made to the C code to allow compilation on the M1 was this conditional:

#ifdef __x86_64__
 #include <immintrin.h>
#else
 #include "sse2neon.h"
#endif

This allows you to use the same intrinsics for both architectures. Intel provides a great guide for using the x86-64 intrinsics.
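As a rough sketch of what "the same intrinsics for both architectures" means in practice, here is a small dot product written only with SSE-style `_mm_*` calls. The function name dot_f32 and the HAVE_SSE_API macro are made up for this example, and the `__has_include` guard is my own addition (not part of the conditional above) so that the snippet still compiles, with a scalar loop, on a machine where sse2neon.h is not installed.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <immintrin.h>
#define HAVE_SSE_API 1
#elif defined(__has_include)
#if __has_include("sse2neon.h")
#include "sse2neon.h"      /* maps the _mm_* calls below onto Neon */
#define HAVE_SSE_API 1
#endif
#endif

/* Hypothetical helper: returns the sum of a[i] * b[i] for i in [0, n). */
float dot_f32(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
    float sum = 0.0f;
#if defined(HAVE_SSE_API)
    __m128 acc = _mm_setzero_ps();               /* four partial sums */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                   /* spill and reduce the lanes */
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar loop for the remaining elements. */
    for (; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Note that the SIMD path adds the products in a different order than a plain loop would, so results for general floating-point data can differ slightly in the last bits.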