Which is faster : if (bool) or if(int)?

Solution 1:

Makes sense to me. Your compiler apparently defines a bool as an 8-bit value, and your system ABI requires it to "promote" small (< 32-bit) integer arguments to 32-bit when pushing them onto the call stack. So to compare a bool, the compiler generates code to isolate the least significant byte of the 32-bit argument that g receives, and compares it with cmpb. In the first example, the int argument uses the full 32 bits that were pushed onto the stack, so it simply compares against the whole thing with cmpl.

Solution 2:

Compiling with -03 gives the following for me:

f:

    pushl   %ebp
    movl    %esp, %ebp
    cmpl    $1, 8(%ebp)
    popl    %ebp
    sbbl    %eax, %eax
    andb    $58, %al
    addl    $99, %eax
    ret

g:

    pushl   %ebp
    movl    %esp, %ebp
    cmpb    $1, 8(%ebp)
    popl    %ebp
    sbbl    %eax, %eax
    andb    $58, %al
    addl    $99, %eax
    ret

.. so it compiles to essentially the same code, except for cmpl vs cmpb. This means that the difference, if there is any, doesn't matter. Judging by unoptimized code is not fair.


Edit to clarify my point. Unoptimized code is for simple debugging, not for speed. Comparing the speed of unoptimized code is senseless.

Solution 3:

When I compile this with a sane set of options (specifically -O3), here's what I get:

For f():

        .type   _Z1fi, @function
_Z1fi:
.LFB0:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpl    $1, %edi
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

For g():

        .type   _Z1gb, @function
_Z1gb:
.LFB1:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        cmpb    $1, %dil
        sbbl    %eax, %eax
        andb    $58, %al
        addl    $99, %eax
        ret
        .cfi_endproc

They still use different instructions for the comparison (cmpb for boolean vs. cmpl for int), but otherwise the bodies are identical. A quick look at the Intel manuals tells me: ... not much of anything. There's no such thing as cmpb or cmpl in the Intel manuals. They're all cmp and I can't find the timing tables at the moment. I'm guessing, however, that there's no clock difference between comparing a byte immediate vs. comparing a long immediate, so for all practical purposes the code is identical.


edited to add the following based on your addition

The reason the code is different in the unoptimized case is that it is unoptimized. (Yes, it's circular, I know.) When the compiler walks the AST and generates code directly, it doesn't "know" anything except what's at the immediate point of the AST it's in. At that point it lacks all contextual information needed to know that at this specific point it can treat the declared type bool as an int. A boolean is obviously by default treated as a byte and when manipulating bytes in the Intel world you have to do things like sign-extend to bring it to certain widths to put it on the stack, etc. (You can't push a byte.)

When the optimizer views the AST and does its magic, however, it looks at surrounding context and "knows" when it can replace code with something more efficient without changing semantics. So it "knows" it can use an integer in the parameter and thereby lose the unnecessary conversions and widening.