Why don't C++ compilers optimize this conditional boolean assignment as an unconditional assignment?
Consider the following function:
void func(bool& flag)
{
if(!flag) flag=true;
}
It seems to me that if flag has a valid boolean value, this would be equivalent to unconditionally setting it to true, like this:
void func(bool& flag)
{
flag=true;
}
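The question's equivalence claim can be checked for both valid bool values with a small sketch. The names func_conditional, func_unconditional and same_final_value are hypothetical, chosen so both versions can live in one file. Both versions end with the same value; what differs is whether a store is executed.

```cpp
#include <cassert>
#include <initializer_list>

// The question's first version: stores only when flag is false.
void func_conditional(bool& flag) {
    if (!flag) flag = true;
}

// The question's second version: always stores.
void func_unconditional(bool& flag) {
    flag = true;
}

// Hypothetical helper: runs both versions from each valid starting value
// and reports whether the final values always agree.
bool same_final_value() {
    for (bool start : {false, true}) {
        bool a = start;
        bool b = start;
        func_conditional(a);
        func_unconditional(b);
        if (a != b) return false;
    }
    return true;
}
```

The answers below explain why equal final values are not enough to justify the transformation: the extra store itself has observable consequences.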
Yet neither gcc nor clang optimizes it this way; both generate the following at the -O3 optimization level:
_Z4funcRb:
.LFB0:
.cfi_startproc
cmp BYTE PTR [rdi], 0
jne .L1
mov BYTE PTR [rdi], 1
.L1:
rep ret
My question is: is the code simply too special-case to be worth optimizing, or are there good reasons why such an optimization would be undesirable, given that flag is not a reference to volatile? The only reason I can think of is that flag could somehow hold a value other than true or false without undefined behavior at the point of reading it, but I'm not sure whether this is possible.
This may hurt the program's performance because of cache coherence. Writing to flag each time func() is called would dirty the containing cache line, and on a multicore system it would force other cores that have cached that line to invalidate their copies. This happens even though the value being written exactly matches the bits already stored at the destination address.
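The original func() already follows a general "check before store" idiom that avoids needless writes. A minimal sketch of that idiom, using a hypothetical helper named store_if_different (not a standard or library function), shows the shape: read first, and write only when the value actually changes.

```cpp
#include <cassert>

// Hypothetical helper illustrating check-before-store.
// Returns true if a store was performed, false if dst already held value.
template <typename T>
bool store_if_different(T& dst, const T& value) {
    if (dst == value) return false;  // the source performs no store here
    dst = value;                     // store only on a real change
    return true;
}
```

For example, calling store_if_different(flag, true) when flag is already true only reads flag, which is exactly the case in which func() avoids dirtying the cache line.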
EDIT
hvd has provided another good reason that prevents such an optimization. It is a more compelling argument against the proposed optimization, since it may result in undefined behavior, whereas my (original) answer only addressed performance aspects.
After a little more reflection, I can propose one more example of why compilers must not introduce the unconditional write unless they can prove that the transformation is safe in a particular context. Consider this code:
const bool foo = true;
int main()
{
func(const_cast<bool&>(foo));
}
With an unconditional write in func(), this program has undefined behavior, because it modifies a const object. In practice foo is typically placed in read-only memory, so the write would likely crash the program, even though it would not change the stored value.
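A self-contained sketch of the same program, with the call wrapped in a hypothetical call_on_const_foo() helper so that it can be checked, keeps the original conditional func(). Because foo is already true, func() only reads through the const_cast reference, which is well defined; only a write through that reference would be undefined behavior.

```cpp
#include <cassert>

// The question's original version: writes only when flag is false.
void func(bool& flag) {
    if (!flag) flag = true;
}

// const object; compilers commonly place it in a read-only section.
const bool foo = true;

// Hypothetical helper: performs the call from the example above.
bool call_on_const_foo() {
    // Well defined: foo is true, so func() only reads it and never writes.
    func(const_cast<bool&>(foo));
    return foo;
}
```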
Aside from Leon's answer on performance:
Suppose flag is true, and two threads are constantly calling func(flag). As written, the function never stores anything to flag in that case, so this is thread-safe: both threads access the same memory, but only to read it. Unconditionally setting flag to true would mean two different threads writing to the same memory. That is not safe, even if the data being written is identical to the data already there: in C++ it is a data race, which is undefined behavior.
I am not sure about the behaviour of C++ here, but in C the result might differ: if the memory contains a nonzero value other than 1, it would remain unchanged with the check, but would be changed to 1 without the check.
But as I am not very fluent in C++, I don't know whether this situation is even possible.
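The difference can be illustrated with a type that can legitimately hold other nonzero values. The sketch below assumes an int used as a C-style flag, where any nonzero value means "true"; whether a real C++ bool can ever hold such a value without undefined behavior is exactly what the question asks, so this sidesteps that issue. The function names are hypothetical.

```cpp
#include <cassert>

// Assumption: flag is an int used as a C-style boolean (nonzero == true).

// Mirrors `if (!flag) flag = true;`: a nonzero value such as 2 is kept.
int conditional_set(int flag) {
    if (!flag) flag = 1;
    return flag;
}

// Mirrors `flag = true;`: any value is overwritten with 1.
int unconditional_set(int flag) {
    flag = 1;
    return flag;
}
```

For an input of 2, conditional_set returns 2 while unconditional_set returns 1, so the two versions are not interchangeable for such a type.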