Using Assembly Language in C/C++
The only time it's useful to revert to assembly language is when
- the CPU instructions don't have functional equivalents in C++ (e.g. single-instruction-multiple-data instructions, BCD or decimal arithmetic operations)
- AND the compiler doesn't provide extra functions to wrap these operations (e.g. the C++11 Standard has atomic operations including compare-and-swap, and `<cstdlib>` has `div`/`ldiv` et al. for getting quotient and remainder efficiently)
- AND there isn't a good third-party library (e.g. http://mitpress.mit.edu/catalog/item/default.asp?tid=3952&ttype=2)
OR
- for some inexplicable reason, the optimiser is failing to use the best CPU instructions
...AND...
- the use of those CPU instructions would give some significant and useful performance boost to bottleneck code.
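To make the "compiler already wraps these operations" point concrete, here is a minimal sketch using only standard C++11 facilities: `std::div` for quotient and remainder, and `std::atomic` for compare-and-swap. The names `QuotRem`, `divide` and `store_max` are hypothetical helpers invented for this illustration, and whether the library call maps to a single CPU instruction depends on your compiler and target.

```cpp
#include <atomic>
#include <cassert>
#include <cstdlib>

// Hypothetical result type for this sketch.
struct QuotRem {
    int quot;
    int rem;
};

// Quotient and remainder from one standard library call.
// The implementation may compute both together where the CPU allows it.
QuotRem divide(int numerator, int denominator) {
    assert(denominator != 0);
    const std::div_t r = std::div(numerator, denominator);
    return {r.quot, r.rem};
}

// Raise `target` to at least `candidate` using a compare-and-swap loop.
// Returns true if this call stored `candidate`, false if the stored
// value was already greater than or equal to it.
bool store_max(std::atomic<int>& target, int candidate) {
    int seen = target.load();
    while (seen < candidate) {
        // On failure, compare_exchange_weak refreshes `seen` with the current value.
        if (target.compare_exchange_weak(seen, candidate)) {
            return true;
        }
    }
    return false;
}
```

Calling `divide(17, 5)` yields a quotient of 3 and a remainder of 2, with no assembly in sight, and the same source compiles for any target with a conforming C++11 compiler.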
Simply using inline assembly to do an operation that can easily be expressed in C++ - like adding two values or searching in a string - is actively counterproductive, because:
- the compiler knows how to do this equally well
  - to verify this, look at its assembly output (e.g. `gcc -S`) or disassemble the machine code
- you're artificially restricting its choices regarding register allocation, CPU instructions etc., so it may take longer to prepare the CPU registers with the values needed to execute your hardcoded instruction, then longer to get back to an optimal allocation for future instructions
  - compiler optimisers can choose between equivalent-performance instructions specifying different registers to minimise copying between them, and may choose registers in such a way that a single core can process multiple instructions during one cycle, whereas forcing everything through specific registers would serialise it
    - in fairness, GCC has ways to express needs for specific types of registers without constraining the compiler to an exact register, still allowing such optimisations, but it's the only inline assembly I've ever seen that addresses this
- if a new CPU model comes out next year with another instruction that's 1000% faster for that same logical operation, then the compiler vendor is more likely to update their compiler to use that instruction, and hence your program to benefit once recompiled, than you are (or whoever's maintaining the software then is)
- the compiler will select an optimal approach for the target architecture it's told about: if you hardcode one solution then it will need to be a lowest-common-denominator or `#ifdef`-ed for your platforms
- assembly language isn't as portable as C++, both across CPUs and across compilers, and even if you seemingly port an instruction, it's possible to make a mistake regarding registers that are safe to clobber, argument passing conventions etc.
- other programmers may not know or be comfortable with assembly
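As a rough illustration of several points in the list above, the sketch below writes the same addition twice: once in plain C++ and once with GCC-style extended inline assembly. The function names `add_plain` and `add_asm` are hypothetical, and the assembly path assumes an x86 target and a compiler that accepts GCC's extended `asm` syntax with the default AT&T operand order. Everything else falls back to plain C++, which is exactly the `#ifdef` burden described above.

```cpp
// Plain C++: the optimiser is free to choose instructions and registers,
// and can inline or fold this away entirely.
int add_plain(int a, int b) {
    return a + b;
}

// The same operation with GCC-style extended inline assembly.
// For illustration only: this gains nothing over add_plain.
int add_asm(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    // "+r"(a): a is read and written, in any general-purpose register.
    // "r"(b):  b is read, in any general-purpose register.
    // "cc":    the instruction modifies the condition flags.
    // The register-class constraints leave the actual register choice
    // to the compiler instead of hardcoding one.
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    // Other compilers or CPUs: no assembly available in this sketch.
    return a + b;
#endif
}
```

Comparing the output of `gcc -S` for the two functions at a normal optimisation level is a quick way to check the claim that the compiler handles the plain version at least as well.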
One perspective that I think's worth keeping in mind is that when C was introduced it had to win over a lot of hardcore assembly language programmers who fussed over the machine code generated. Machines had less CPU power and RAM back then and you can bet people fussed over the tiniest thing. Optimisers became very sophisticated and have continued to improve, whereas the assembly languages of processors like the x86 have become increasingly complicated, as have their execution pipelines, caches and other factors involved in their performance. You can't just add values from a table of cycles-per-instruction any more. Compiler writers spend time considering all those subtle factors (especially those working for CPU manufacturers, but that ups the pressure on other compilers too). It's now impractical for assembly programmers to average - over any non-trivial application - significantly better efficiency of code than that generated by a good optimising compiler, and they're overwhelmingly likely to do worse. So, use of assembly should be limited to times it really makes a measurable and useful difference, worth the coupling and maintenance costs.
First of all, you need to profile your program. Then you optimize the most used paths in C or C++ code. Unless advantages are clear you don't rewrite in assembler. Using assembler makes your code harder to maintain and much less portable - it is not worth it except in very rare situations.
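Before any rewrite, it helps to have numbers. The following is a minimal timing sketch with `std::chrono`, assuming a hypothetical hot path called `sum_values`; `time_microseconds` is likewise a name made up for this example. It is not a substitute for a real profiler, which shows where the whole program spends its time, but it is enough to compare two versions of one function.

```cpp
#include <chrono>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a suspected bottleneck.
long long sum_values(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// Runs `work` once and returns the elapsed wall-clock time in microseconds.
// steady_clock is used because it never jumps backwards.
template <typename Work>
long long time_microseconds(Work work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

A call such as `time_microseconds([&] { total = sum_values(data); })` gives one sample. In practice you would repeat the measurement many times and make sure the result is actually used, otherwise the optimiser may remove the work being timed.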