gcc -O0 still optimizes out "unused" code. Is there a compile flag to change that?

Solution 1:

Why does gcc not emit the specified instruction?

A compiler produces code that must have the observable behavior specified by the Standard. Anything that is not observable can be changed (and optimized) at will, as it does not change the behavior of the program (as specified).

How can you beat it into submission?

The trick is to make the compiler believe that the behavior of the particular piece of code is actually observable.

Since this is a problem frequently encountered in micro-benchmarks, I advise you to look at how (for example) Google-Benchmark addresses it. From benchmark_api.h we get:

template <class Tp>
inline void DoNotOptimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}

The details of this syntax are boring; for our purpose we only need to know:

  • "g"(value) tells the compiler that value is used as an input to the statement
  • "memory" is a compile-time read/write barrier

So, we can change the code to:

asm volatile("" : : : "memory");

__m128 result = _mm_div_ss(s1, s2);

asm volatile("" : : "g"(result) : );

Which:

  • forces the compiler to consider that s1 and s2 may have been modified between their initialization and use
  • forces the compiler to consider that the result of the operation is used

There is no need for any flag, and it should work at any level of optimization (I tested it on https://gcc.godbolt.org/ at -O3).
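
For reference, here is a minimal self-contained sketch of the same pattern. It assumes GCC or Clang targeting x86 with SSE. The helper names do_not_optimize, clobber_memory and divide_and_discard are made up for this illustration; they are not part of Google-Benchmark or of GCC.

```cpp
#include <xmmintrin.h>  // __m128, _mm_set_ss, _mm_div_ss

// Same idea as DoNotOptimize above: an empty asm statement that claims to
// read `value`, so the compiler has to actually produce it.
template <class Tp>
inline void do_not_optimize(Tp const& value) {
    asm volatile("" : : "g"(value) : "memory");
}

// Empty asm statement with a "memory" clobber: a compile-time barrier only,
// it emits no instruction.
inline void clobber_memory() {
    asm volatile("" : : : "memory");
}

// Hypothetical test function: divides the lowest lanes and discards the result.
inline void divide_and_discard(float a, float b) {
    __m128 s1 = _mm_set_ss(a);
    __m128 s2 = _mm_set_ss(b);

    clobber_memory();                    // barrier before the operation

    __m128 result = _mm_div_ss(s1, s2);  // the instruction we want to keep

    do_not_optimize(result);             // pretend the result is used
}
```

Calling divide_and_discard(1.0f, 2.0f) and inspecting the generated assembly (for example on https://gcc.godbolt.org/) is the way to check whether the division survives at your optimization level.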

Solution 2:

GCC doesn't "optimize out" anything here. It just doesn't generate useless code. It seems to be a very common illusion that there's some pure form of code that the compiler should generate and that any change to it is an "optimization". There is no such thing.

The compiler creates some data structure that represents what the code means, then it applies some transformations to that data structure, and from that it generates assembly that then gets assembled into instructions. If you compile without "optimizations" it just means that the compiler will make the least effort possible to generate code.

In this case, the whole statement is useless because it doesn't do anything and its result is thrown away immediately. After expanding the inlines and working out what the builtins mean, it is equivalent to writing a/b;. The difference is that writing a/b; will emit a warning about a statement with no effect, while the builtins probably aren't handled by the same warnings. This is not an optimization: the compiler would actually have to expend extra effort to invent meaning for a meaningless statement, then fake a temporary variable to store the result of this statement, only to throw it away.

What you're looking for is not flags to disable optimizations, but pessimization flags. I don't think any compiler developers waste time implementing such flags, other than maybe as an April Fools' joke.

Solution 3:

I'm not an expert on gcc internals, but it seems that your problem is not that some optimization pass removes dead code. Most likely the compiler never even considers generating this code in the first place.

Let's reduce your example from compiler-specific intrinsics to a plain old addition:

int foo(int num) {
    num + 77;
    return num + 15;
}

No code is generated for + 77:

foo(int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     eax, DWORD PTR [rbp-4]
        add     eax, 15
        pop     rbp
        ret

When one of the operands has side effects, only that operand gets evaluated. Still, no addition in the assembly.
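
A hypothetical illustration of that case (the function next_id and the counter calls are made up for this sketch): in a discarded expression such as next_id() + 77; the call has to happen because it modifies a global, but nothing requires the addition to show up in the output.

```cpp
// Global counter, so that calling next_id() has a visible side effect.
int calls = 0;

int next_id() {
    return ++calls;
}

int foo(int num) {
    // The call must be made because it changes `calls`.
    // The "+ 77" result is discarded (GCC may warn about it with -Wall).
    next_id() + 77;
    return num + 15;
}
```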

But saving this result into an (even unused) variable forces the compiler to generate code for addition:

int foo(int num) {
  int baz = num + 77;
  return num + 15;
}

Assembly:

foo(int):
    push    rbp
    mov     rbp, rsp
    mov     DWORD PTR [rbp-20], edi
    mov     eax, DWORD PTR [rbp-20]
    add     eax, 77
    mov     DWORD PTR [rbp-4], eax
    mov     eax, DWORD PTR [rbp-20]
    add     eax, 15
    pop     rbp
    ret

The following is just speculation, but from my experience with compiler construction, it is more natural not to generate code for unused expressions in the first place than to eliminate that code later.

My recommendation is to be explicit about your intentions, and put the result of the expression into a volatile (and, hence, non-removable by the optimizer) variable.
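
A minimal sketch of that recommendation, reusing the addition example from above (the name foo_volatile is only for illustration):

```cpp
int foo_volatile(int num) {
    // A write to a volatile object counts as observable behavior, so the
    // store to baz has to be emitted even though baz is never read.
    // Caveat: if num is known at compile time, the compiler may still
    // precompute num + 77 and store the constant.
    volatile int baz = num + 77;
    return num + 15;
}
```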

@Matthieu M pointed out that this is not sufficient to prevent the value from being precomputed. So for something more than playing with signals, you should use documented ways to perform the exact instruction you want (probably, volatile inline assembly).
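
As a rough sketch of what volatile inline assembly could look like for a scalar single-precision division: this assumes GCC or Clang targeting x86-64 with the default AT&T assembler syntax, and the function name divss_explicit is hypothetical.

```cpp
// Divides a by b with an explicit divss instruction.
// The asm is volatile, so the compiler keeps it even if the result is unused,
// and it cannot precompute the result because it does not look inside the asm.
inline float divss_explicit(float a, float b) {
    asm volatile("divss %1, %0"   // AT&T order: %0 = %0 / %1
                 : "+x"(a)        // in/out: a in an SSE register, receives a / b
                 : "x"(b));       // input: b in an SSE register
    return a;
}
```

The "x" constraint asks for an SSE register; on other targets, or with -masm=intel, the template string would have to change.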