Is ADD 1 really faster than INC? x86 [duplicate]

I have read various optimization guides that claim ADD 1 is faster than using INC in x86. Is this really true?


On some micro-architectures, with some instruction streams, INC will incur a "partial flags update stall" (because it updates some of the flags while preserving the others). ADD sets the value of all of the flags, and so does not risk such a stall.

ADD is not always faster than INC, but it is almost always at least as fast (there are a few corner cases on certain older micro-architectures, but they are exceedingly rare), and sometimes significantly faster.

For more details, consult Intel's Optimization Reference Manual or Agner Fog's micro-architecture notes.


While this is not a definitive answer, try writing this C file:

=== inc.c ===
#include <stdio.h>
int main(int argc, char *argv[])
{
    for (int n = 0; n < 1000; n++) {
        printf("%d\n", n);
    }
    return 0;
}

Then run:

clang -march=native -masm=intel -O3 -S -o inc.clang.s inc.c
gcc -march=native -masm=intel -O3 -S -o inc.gcc.s inc.c

Note the generated assembly code. Relevant clang output:

mov     esi, ebx
call    printf
inc     ebx
cmp     ebx, 1000
jne     .LBB0_1

Relevant gcc output:

mov     edi, 1
inc     ebx
call    __printf_chk
cmp     ebx, 1000
jne     .L2

This suggests that the authors of both clang and gcc think INC is a better choice than ADD reg, 1 on modern architectures.

What does that mean for your question? Well, I would trust their judgement over the guides you have read and conclude that INC is just as fast as ADD, and that the one byte saved by its shorter encoding makes it preferable. Compiler authors are just people, so they can be wrong, but it is unlikely. :)

Some more experimentation shows that if you don't use the -march=native option, gcc will emit add ebx, 1 instead. Clang, on the other hand, always prefers inc. My conclusion is that when you asked the question in 2012, ADD was sometimes preferable, but now, in 2016, you should always go with INC.