Why is (a % 256) different than (a & 0xFF)?

I always assumed that when doing (a % 256) the optimizer would naturally use an efficient bitwise operation, as if I wrote (a & 0xFF).

When testing on compiler explorer gcc-6.2 (-O3):

// Type your code here, or load an example.
int mod(int num) {
    return num % 256;
}

mod(int):
    mov     edx, edi
    sar     edx, 31
    shr     edx, 24
    lea     eax, [rdi+rdx]
    movzx   eax, al
    sub     eax, edx
    ret

And when trying the other code:

// Type your code here, or load an example.
int mod(int num) {
    return num & 0xFF;
}

mod(int):
    movzx   eax, dil
    ret

Seems like I'm completely missing something out. Any ideas?

Solution 1:

It's not the same. Try num = -79, and you will get different results from both operations. (-79) % 256 = -79, while (-79) & 0xff is some positive number.

Using unsigned int, the operations are the same, and the code will likely be the same.

PS- Someone commented

They shouldn't be the same, a % b is defined as a - b * floor (a / b).

That's not how it is defined in C, C++, Objective-C (ie all the languages where the code in the question would compile).

Solution 2:

Short answer

-1 % 256 yields -1 and not 255 which is -1 & 0xFF. Therefore, the optimization would be incorrect.

Long answer

C++ has the convention that (a/b)*b + a%b == a, which seems quite natural. a/b always returns the arithmetic result without the fractional part (truncating towards 0). As a consequence, a%b has the same sign as a or is 0.

The division -1/256 yields 0 and hence -1%256 must be -1 in order to satisfy the above condition ((-1%256)*256 + -1%256 == -1). This is obviously different from -1&0xFF which is 0xFF. Therefore, the compiler cannot optimize the way you want.

The relevant section in the C++ standard [expr.mul §4] as of N4606 states:

For integral operands the / operator yields the algebraic quotient with any fractional part discarded; if the quotient a/b is representable in the type of the result, (a/b)*b + a%b is equal to a [...].

Enabling the optimization

However, using unsigned types, the optimization would be completely correct, satisfying the above convention:

unsigned(-1)%256 == 0xFF

Other languages

This is handled very different across different programming languages as you can look up on Wikipedia.

Solution 3:

Since C++11, num % 256 has to be non-positive if num is negative.

So the bit pattern would depend on the implementation of signed types on your system: for a negative first argument, the result is not the extraction of the least significant 8 bits.

It would be a different matter if num in your case was unsigned: these days I'd almost expect a compiler to make the optimisation that you cite.