Why can't GCC optimize the logical bitwise AND pair in "x && (x & 4242)" to "x & 4242"?
Exactly why should it be able to optimize the code? You're assuming that any transformation that works will be done. That's not at all how optimizers work. They're not Artificial Intelligences. They simply work by parametrically replacing known patterns. E.g. the "Common Subexpression Elimination" scans an expression for common subexpressions, and moves them forwards, if that does not change side effects.
(BTW, CSE shows that optimizers are already quite aware of what code movement is allowed in the possible presence of side effects. They know that you have to be careful with &&
. Whether expr && expr
can be CSE-optimized or not depends on the side effects of expr
.)
So, in summary: which pattern do you think applies here?
You are correct that this appears to be a deficiency, and possibly an outright bug, in the optimizer.
Consider:
bool slow(int x)
{
return x && (x & 4242);
}
bool slow2(int x)
{
return (x & 4242) && x;
}
Assembly emitted by GCC 4.8.1 (-O3):
slow:
xorl %eax, %eax
testl %edi, %edi
je .L2
andl $4242, %edi
setne %al
.L2:
rep ret
slow2:
andl $4242, %edi
setne %al
ret
In other words, slow2
is misnamed.
I have only contributed the occasional patch to GCC, so whether my point of view carries any weight is debatable :-). But it is certainly strange, in my view, for GCC to optimize one of these and not the other. I suggest filing a bug report.
[Update]
Surprisingly small changes appear to make a big difference. For example:
bool slow3(int x)
{
int y = x & 4242;
return y && x;
}
...generates "slow" code again. I have no hypothesis for this behavior.
You can experiment with all of these on multiple compilers here.
This is how your code looks in ARM which should make slow
run faster when input it 0.
fast(int):
movw r3, #4242
and r3, r0, r3
adds r0, r3, #0
movne r0, #1
bx lr
slow(int):
cmp r0, #0
bxeq lr
movw r3, #4242
and r3, r0, r3
adds r0, r3, #0
movne r0, #1
bx lr
However GCC would optimize very nicely when you start using such trivial functions anyway.
bool foo() {
return fast(4242) && slow(42);
}
becomes
foo():
mov r0, #1
bx lr
My point is sometimes such code requires more context to be optimized further so why would implementers of optimizers (improvers!) should bother?
Another example:
bool bar(int c) {
if (fast(c))
return slow(c);
}
becomes
bar(int):
movw r3, #4242
and r3, r0, r3
cmp r3, #0
movne r0, #1
bxne lr
bx lr