What do compilers do with compile-time branching?

Solution 1:

TL;DR

There are several ways to get different run-time behavior depending on a template parameter. Performance should not be your primary concern here; flexibility and maintainability should. In all cases, the various thin wrappers and constant conditional expressions will be optimized away on any decent compiler for release builds. Below is a small summary of the various tradeoffs (inspired by this answer by @AndyProwl).

Run-time if

Your first solution is the simple run-time if:

template<class T>
T numeric_procedure(const T& x)
{
    if (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // must ALSO compile for integral types
    }
}

It is simple and effective: any decent compiler will optimize away the dead branch.

There are several disadvantages:

  • on some platforms (MSVC), a constant conditional expression yields a spurious compiler warning, which you then need to ignore or silence.
  • worse, on all conforming platforms, both branches of the if/else statement need to actually compile for all types T, even when one of the branches is known not to be taken. If T contains different member types depending on its nature, you will get a compiler error as soon as you try to access them (see the sketch after this list).
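
To make the second point concrete, here is a minimal sketch of how the dead branch breaks compilation (the Meters type and magnitude function are hypothetical, purely for illustration):

#include <type_traits>

struct Meters { double value; };   // hypothetical non-integral type

template<class T>
double magnitude(const T& x)
{
    if (std::is_integral<T>::value) {
        return 0.0;       // trivially valid for every T
    } else {
        return x.value;   // valid for Meters, ill-formed for int
    }
}

// magnitude(Meters{2.5});  // OK
// magnitude(42);           // error: `int` has no member named `value`,
//                          // even though the else branch is never taken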

Tag dispatching

Your second approach is known as tag-dispatching:

template<class T>
T numeric_procedure_impl(const T& x, std::false_type)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<class T>
T numeric_procedure_impl(const T& x, std::true_type)
{
    // valid code for integral types
}

template<class T>
T numeric_procedure(const T& x)
{
    return numeric_procedure_impl(x, std::is_integral<T>());
}

It works fine, without run-time overhead: the temporary std::is_integral<T>() and the call to the one-line helper function will both be optimized away on any decent platform. For example, numeric_procedure(42) constructs std::is_integral<int>(), i.e. std::true_type{}, so overload resolution picks the integral helper at compile time.

The main (minor, IMO) disadvantage is the boilerplate: three functions instead of one.

SFINAE

Closely related to tag dispatching is SFINAE (Substitution Failure Is Not An Error):

// Note: overloads that differ only in a *default* template argument
// (template<class T, class = typename std::enable_if<...>::type>) do NOT
// work: default template arguments are not part of a function template's
// signature, so the second overload would be an ill-formed redeclaration.
// Using enable_if to form the type of a non-type template parameter avoids that.
template<class T, typename std::enable_if<!std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}

template<class T, typename std::enable_if<std::is_integral<T>::value, int>::type = 0>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

This has the same effect as tag dispatching, but works slightly differently: instead of using argument deduction to select the proper helper overload, it directly manipulates the overload set of your main function.

The disadvantage is that it can be fragile and tricky to use if you don't know exactly what the entire overload set is (e.g. with template-heavy code, ADL could pull in more overloads from associated namespaces you didn't think of). And compared to tag dispatching, selection based on anything other than a binary decision is a lot more involved (see the sketch below).
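
For contrast, here is a sketch of how tag dispatching scales to a three-way choice via std::integral_constant tags (the names impl and numeric_procedure3 are ours, not from the question); expressing the same selection with enable_if would require three mutually exclusive conditions:

#include <type_traits>

template<class T> T impl(const T& x, std::integral_constant<int, 0>) { return x; }  // integral
template<class T> T impl(const T& x, std::integral_constant<int, 1>) { return x; }  // floating point
template<class T> T impl(const T& x, std::integral_constant<int, 2>) { return x; }  // everything else

template<class T>
T numeric_procedure3(const T& x)
{
    using tag = std::integral_constant<int,
        std::is_integral<T>::value       ? 0 :
        std::is_floating_point<T>::value ? 1 : 2>;
    return impl(x, tag{});
}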

Partial specialization

Another approach is to use a class template helper with a function call operator and partially specialize it:

template<class T, bool> 
struct numeric_functor;

template<class T>
struct numeric_functor<T, false>
{
    T operator()(T const& x) const
    {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
};

template<class T>
struct numeric_functor<T, true>
{
    T operator()(T const& x) const
    {
        // valid code for integral types
    }
};

template<class T>
T numeric_procedure(T const& x)
{
    return numeric_functor<T, std::is_integral<T>::value>()(x);
}

This is probably the most flexible approach if you want to have fine-grained control and minimal code duplication (e.g. if you also want to specialize on size and/or alignment, but say only for floating point types). The pattern matching given by partial template specialization is ideally suited for such advanced problems. As with tag-dispatching, the helper functors are optimized away by any decent compiler.
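
As a sketch of that finer-grained control (the extra bool/size parameters and all names below are hypothetical, not part of the original pattern):

#include <cstddef>
#include <type_traits>

// primary template: generic fallback
template<class T, bool IsFloatingPoint, std::size_t Size>
struct numeric_functor_ex
{
    T operator()(T const& x) const { return x; }
};

// partial specialization: only 8-byte floating-point types (e.g. double)
template<class T>
struct numeric_functor_ex<T, true, 8>
{
    T operator()(T const& x) const { return x; }  // double-precision-specific code here
};

template<class T>
T numeric_procedure_ex(T const& x)
{
    return numeric_functor_ex<T, std::is_floating_point<T>::value, sizeof(T)>()(x);
}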

The main disadvantage is the slightly larger amount of boilerplate if you only want to specialize on a single binary condition.

If constexpr (C++17)

This C++17 feature is a reboot of failed earlier proposals for static if (which is a feature of the D programming language):

template<class T>
T numeric_procedure(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        // valid code for integral types
    } else {
        // valid code for non-integral types,
        // CAN contain code that is invalid for integral types
    }
}

As with your run-time if, everything is in one place, but the main advantage here is that the branch known not to be taken is discarded entirely by the compiler when the template is instantiated: because the condition depends on a template parameter, the discarded branch is not instantiated, so it may contain code that is invalid for the other case. A great advantage is that you keep all code local and do not have to use little helper functions as in tag dispatching or partial template specialization.

Concepts-Lite (standardized in C++20)

Concepts-Lite was published as a Technical Specification originally scheduled for the next major release (C++1z, i.e. C++17); it missed that release, but a revised version of Concepts was standardized in C++20.

template<Non_integral T>
T numeric_procedure(const T& x)
{
    // valid code for non-integral types,
    // CAN contain code that is invalid for integral types
}    

template<Integral T>
T numeric_procedure(const T& x)
{
    // valid code for integral types
}

This approach replaces the class or typename keyword inside the template< > brackets with a concept name describing the family of types the code is supposed to work for. It can be seen as a generalization of the tag-dispatching and SFINAE techniques. Some compilers (gcc, Clang) had experimental support well before standardization. The Lite adjective refers to the failed Concepts proposal originally targeted at C++11.
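
With concepts as standardized in C++20, the two constraints used above can be defined directly (a minimal sketch; the names Integral and Non_integral match the snippet above but are not standard library names, although <concepts> does provide a std::integral):

#include <type_traits>

template<class T>
concept Integral = std::is_integral<T>::value;

template<class T>
concept Non_integral = !Integral<T>;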

Solution 2:

Note that although the optimizer may well be able to prune statically-known tests and unreachable branches from the generated code, the compiler still needs to be able to compile each branch.

That is:

int foo() {
  #if 0
    return std::cout << "this isn't going to work\n";
  #else
    return 1;
  #endif
}

will work fine, because the preprocessor strips out the dead branch before the compiler sees it, but:

int foo() {
  if (std::is_integral<double>::value) {
    return std::cout << "this isn't going to work\n";
  } else {
    return 1;
  }
}

won't. Even though the optimizer can discard the first branch, that branch still fails to compile: std::cout << ... yields a std::ostream&, which has no conversion to int. This is where using enable_if and SFINAE help, because you can select the valid (compilable) code, and the invalid (un-compilable) code's Failure to compile Is Not An Error.
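
C++17's if constexpr (covered in Solution 1) also solves this, with one caveat: a discarded branch escapes instantiation only when the condition depends on a template parameter; in a non-template function like foo above, both branches are still fully checked. A minimal sketch with a hypothetical helper:

#include <type_traits>

template<class T>
auto size_or_self(const T& x)
{
    if constexpr (std::is_integral<T>::value) {
        return x;          // taken for int; the other branch is not instantiated
    } else {
        return x.size();   // would be ill-formed for int, but is discarded
    }
}

// size_or_self(42);  // OK: x.size() is never instantiated for T = int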

Solution 3:

To answer the title question about how compilers handle if(false):

They optimize away constant branch conditions (and the dead code)

The language standard does not of course require compilers to not be terrible, but the C++ implementations that people actually use are non-terrible in this way. (So are most C implementations, except for maybe very simplistic non-optimizing ones like tinycc.)

One of the major reasons C++ is designed around if(something) instead of the C preprocessor's #ifdef SOMETHING is that they're equally efficient. Many C++ features (like constexpr) only got added after compilers already implemented the necessary optimizations (inlining + constant propagation). (The reason we put up with all the undefined-behaviour pitfalls and gotchas of C and C++ is performance, especially with modern compilers that aggressively optimize on the assumption of no UB. The language design typically doesn't impose unnecessary performance costs.)


But if you care about debug-mode performance (e.g. for a game or other program with real-time requirements, where even a debug build must be testable), the choice can be relevant, depending on your compiler.

e.g. clang++ -O0 ("debug mode") still evaluates an if(constexpr_function()) at compile time and treats it like if(false) or if(true). Some other compilers only evaluate at compile time when they're forced to (by template matching).


There is no performance cost for if(false) with optimization enabled. (Barring missed-optimization bugs, which might depend on how early in the compile process the condition can be resolved to false and dead-code elimination can remove it before the compiler "thinks about" reserving stack space for its variables, or that the function may be non-leaf, or whatever.)

Any non-terrible compiler can optimize away dead code behind a compile-time-constant condition (Wikipedia: Dead Code Elimination). This is part of the baseline expectations people have for a C++ implementation to be usable in the real world; it's one of the most basic optimizations and all compilers in real use do it for simple cases like a constexpr.

Often constant propagation (especially after inlining) will make conditions compile-time constants even if they weren't obviously so in the source. One of the more obvious cases is optimizing away the compare on the first iteration of a for (int i=0 ; i<n ; i++) loop so it can turn into a normal asm loop with a conditional branch at the bottom (like a do{}while loop in C++) when n is constant or provably > 0. (Yes, real compilers do value-range optimizations, not just constant propagation.)
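
As a sketch of that loop rotation (the function name is ours; any optimizing compiler will do this):

void scale(float* a)
{
    // 16 > 0 is provable, so the compiler can skip the initial i < 16 test
    // and emit a bottom-tested (do{}while-style) loop in asm.
    for (int i = 0; i < 16; ++i)
        a[i] *= 2.0f;
}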


Some compilers, like gcc and clang, remove dead code inside an if(false) even in "debug" mode, at the minimum level of optimization that's required for them to transform the program logic through their internal arch-neutral representations and eventually emit asm. (But debug mode disables any kind of constant-propagation for variables that aren't declared const or constexpr in the source.)

Some compilers only do it when optimization is enabled; for example MSVC really likes to be literal in its translation of C++ to asm in debug mode and will actually create a zero in a register and branch on it being zero or not for if(false).

In gcc debug mode (-O0), constexpr functions aren't inlined unless they have to be. (In some places the language requires a constant expression, such as an array size inside a struct. GNU C++ supports C99-style VLAs for local arrays, but does choose to inline a constexpr function instead of actually making a VLA in debug mode.)

But non-function constexpr variables do get evaluated at compile time, not stored in memory and tested.
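
A sketch of that distinction (the names kNever and bar are ours):

constexpr bool kNever = sizeof(char) == 2 * sizeof(int);  // plain constexpr variable, no function call

void bar() {
    if (kNever) f1();   // even gcc -O0 folds this: kNever is a compile-time constant
    else f2();
}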

But just to reiterate: at any enabled optimization level, constexpr functions are fully inlined and optimized away, and then the if() itself is resolved at compile time and the dead branch removed.


Examples (from the Godbolt compiler explorer)

#include <type_traits>

void f1();   // defined elsewhere
void f2();

void baz() {
    if (std::is_integral<float>::value) f1();  // condition is false: float is not integral
    else f2();
}

All compilers with -O2 optimization enabled (for x86-64):

baz():
        jmp     f2()    # optimized tailcall

Debug-mode code quality, normally not relevant

GCC with optimization disabled still evaluates the expression and does dead-code elimination:

baz():
        push    rbp
        mov     rbp, rsp          # -fno-omit-frame-pointer is the default at -O0
        call    f2()              # still an unconditional call, no runtime branching
        nop
        pop     rbp
        ret

To see gcc not inline something with optimization disabled, wrap the condition in a constexpr function:

static constexpr bool always_false() { return sizeof(char)==2*sizeof(int); }
void baz() {
    if (always_false()) f1();
    else f2();
}

;; gcc9.1 with no optimization chooses not to inline the constexpr function
baz():
        push    rbp
        mov     rbp, rsp
        call    always_false()
        test    al, al              # the bool return value
        je      .L9
        call    f1()
        jmp     .L11
.L9:
        call    f2()
.L11:
        nop
        pop     rbp
        ret

MSVC's braindead literal code-gen with optimization disabled:

void foo() {
    if (false) f1();
    else f2();
}
;; MSVC 19.20 x86-64  no optimization
void foo(void) PROC                                        ; foo
        sub     rsp, 40                             ; 00000028H
        xor     eax, eax                     ; EAX=0
        test    eax, eax                     ; set flags from EAX (which were already set by xor)
        je      SHORT $LN2@foo               ; jump if ZF is set, i.e. if EAX==0
        call    void f1(void)                          ; f1
        jmp     SHORT $LN3@foo
$LN2@foo:
        call    void f2(void)                          ; f2
$LN3@foo:
        add     rsp, 40                             ; 00000028H
        ret     0

Benchmarking with optimization disabled is not useful

You should always enable optimization for real code; the only time debug-mode performance matters is when it's a pre-condition for debuggability. Debug mode is not a useful proxy for an optimized build, not even as a way to keep a benchmark from optimizing away: different code gains more or less from debug mode depending on how it's written.

Unless debug-mode performance really is a big deal for your project, and you just can't get enough info about local variables with minimal optimization like g++ -Og, the headline of this answer is the full answer: ignore debug mode, and only bother thinking about the quality of the asm in optimized builds. (Preferably with LTO enabled, if your project can allow that for cross-file inlining.)