Is ADD 1 really faster than INC? x86 [duplicate]
I have read various optimization guides that claim ADD 1 is faster than using INC in x86. Is this really true?
On some micro-architectures, with some instruction streams, INC will incur a "partial flags update stall", because it updates some of the flags while preserving others (it leaves the carry flag untouched). ADD sets the value of all of the flags, and so does not risk such a stall.
ADD is not always faster than INC, but it is almost always at least as fast (there are a few corner cases on certain older micro-architectures, but they are exceedingly rare), and sometimes significantly faster.
For more details, consult Intel's Optimization Reference Manual or Agner Fog's micro-architecture notes.
This is not a definitive answer, but you can check what current compilers do. Write this C file:
=== inc.c ===
#include <stdio.h>

int main(int argc, char *argv[])
{
    for (int n = 0; n < 1000; n++) {
        printf("%d\n", n);
    }
    return 0;
}
Then run:
clang -march=native -masm=intel -O3 -S -o inc.clang.s inc.c
gcc -march=native -masm=intel -O3 -S -o inc.gcc.s inc.c
Note the generated assembly code. Relevant clang output:
mov esi, ebx
call printf
inc ebx
cmp ebx, 1000
jne .LBB0_1
Relevant gcc output:
mov edi, 1
inc ebx
call __printf_chk
cmp ebx, 1000
jne .L2
This suggests that the authors of both clang and gcc think INC is a better choice than ADD reg, 1 on modern architectures.
What would that mean for your question? Well, I would trust their judgement over the guides you have read and conclude that INC is just as fast as ADD and that the one byte saved due to the shorter register encoding makes it preferable. Compiler authors are just people so they can be wrong, but it is unlikely. :)
Some more experimentation shows me that if you don't use the -march=native option, then gcc will use add ebx, 1 instead. Clang, on the other hand, always likes inc best. My conclusion is that when you asked the question in 2012, ADD was sometimes preferable, but now in 2016 you should always go with INC.