What is the effect of ordering if...else if statements by probability?
Specifically, if I have a series of if...else if statements, and I somehow know beforehand the relative probability that each statement will evaluate to true, how much difference in execution time does it make to sort them in order of probability? For example, should I prefer this:
if (highly_likely)
    // do something
else if (somewhat_likely)
    // do something
else if (unlikely)
    // do something

to this?

if (unlikely)
    // do something
else if (somewhat_likely)
    // do something
else if (highly_likely)
    // do something
It seems obvious that the sorted version would be faster; however, for readability or because of side effects, we might want to order them non-optimally. It's also hard to tell how well the CPU will do with branch prediction until you actually run the code.
So, in the course of experimenting with this, I ended up answering my own question for a specific case. However, I'd like to hear other opinions/insights as well.
Important: this question assumes that the if statements can be arbitrarily reordered without having any other effects on the behavior of the program. In my answer, the three conditional tests are mutually exclusive and produce no side effects. Certainly, if the statements must be evaluated in a certain order to achieve some desired behavior, then the issue of efficiency is moot.
As a general rule, most if not all Intel CPUs assume forward branches are not taken the first time they see them. See Godbolt's work.
After that, the branch goes into a branch prediction cache, and past behavior is used to inform future branch prediction.
So in a tight loop, the effect of misordering is going to be relatively small. The branch predictor will learn which set of branches is most likely, and if you have a non-trivial amount of work in the loop, the small differences won't add up to much.
In general code, most compilers by default (lacking another reason) will order the produced machine code roughly the way you ordered it in your source. Thus an if statement compiles to a forward branch that is taken when the test fails.
So you should order your branches in the order of decreasing likelihood to get the best branch prediction from a "first encounter".
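If the source order can't follow the probabilities (say, for readability), some compilers let you state them explicitly instead. A minimal sketch, assuming a C++20 compiler (pre-C++20 GCC and Clang offer __builtin_expect for the same purpose); the condition names come from the question above:

void handle(bool highly_likely, bool somewhat_likely, bool unlikely)
{
    if (highly_likely) [[likely]] {
        // do something -- the compiler keeps this on the fall-through path
    } else if (somewhat_likely) {
        // do something
    } else if (unlikely) [[unlikely]] {
        // do something -- may be laid out away from the hot path
    }
}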
A microbenchmark that loops tightly many times over a set of conditions and does trivial work is going to be dominated by tiny effects of instruction count and the like, with little of the variance coming from branch prediction. So in this case you must profile, as rules of thumb won't be reliable.
On top of that, vectorization and many other optimizations apply to tiny tight loops.
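As an illustration (a sketch of mine, not part of the benchmark below, reusing its 1-100 thresholds): if each branch merely increments a counter, the chain can be rewritten without branches at all, at which point ordering is irrelevant and the compiler is free to vectorize:

#include <vector>

// Each comparison yields 0 or 1, so there is nothing left for the branch
// predictor to get wrong and the loop becomes a vectorization candidate.
void count(const std::vector<int>& v, int& nLow, int& nMid, int& nHigh)
{
    for (int i : v) {
        nLow  += (i < 20);
        nMid  += (i >= 20 && i < 95);
        nHigh += (i >= 95);
    }
}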
So in general code, put the most likely code within the if block, and that will result in the fewest un-cached branch prediction misses. In tight loops, follow the general rule to start, and if you need to know more, you have little choice but to profile.
Naturally this all goes out the window if some tests are far cheaper than others.
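For instance (a hypothetical example: is_valid_utf8 is just a stand-in for any expensive test), a cheap test that usually decides the outcome is worth putting first even when the expensive test is the more probable one, because && short-circuits:

#include <string>

bool is_valid_utf8(const std::string& s);  // hypothetical expensive check

bool accept(const std::string& s)
{
    // The cheap length test runs first and usually short-circuits the
    // expensive validation; ordering purely by probability could lose here.
    return s.size() <= 1024 && is_valid_utf8(s);
}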
I made up the following test to time the execution of two different if...else if blocks, one sorted in order of probability, the other sorted in reverse order:
#include <chrono>
#include <iostream>
#include <random>
#include <algorithm>
#include <iterator>
#include <functional>
#include <vector>

using namespace std;

int main()
{
    long long sortedTime = 0;
    long long reverseTime = 0;

    for (int n = 0; n != 500; ++n)
    {
        // Generate a vector of 5000 random integers from 1 to 100
        random_device rnd_device;
        mt19937 rnd_engine(rnd_device());
        uniform_int_distribution<int> rnd_dist(1, 100);
        auto gen = std::bind(rnd_dist, rnd_engine);
        vector<int> rand_vec(5000);
        generate(begin(rand_vec), end(rand_vec), gen);

        volatile int nLow, nMid, nHigh;
        chrono::time_point<chrono::high_resolution_clock> start, end;

        // Sort the conditional statements in order of increasing likelihood
        nLow = nMid = nHigh = 0;
        start = chrono::high_resolution_clock::now();
        for (int& i : rand_vec) {
            if (i >= 95) ++nHigh;               // Least likely branch
            else if (i < 20) ++nLow;
            else if (i >= 20 && i < 95) ++nMid; // Most likely branch
        }
        end = chrono::high_resolution_clock::now();
        reverseTime += chrono::duration_cast<chrono::nanoseconds>(end - start).count();

        // Sort the conditional statements in order of decreasing likelihood
        nLow = nMid = nHigh = 0;
        start = chrono::high_resolution_clock::now();
        for (int& i : rand_vec) {
            if (i >= 20 && i < 95) ++nMid;      // Most likely branch
            else if (i < 20) ++nLow;
            else if (i >= 95) ++nHigh;          // Least likely branch
        }
        end = chrono::high_resolution_clock::now();
        sortedTime += chrono::duration_cast<chrono::nanoseconds>(end - start).count();
    }

    cout << "Percentage difference: "
         << 100 * (double(reverseTime) - double(sortedTime)) / double(sortedTime)
         << endl << endl;
}
Using MSVC 2017 with /O2, the results show that the sorted version is consistently about 28% faster than the reverse-sorted version. Per luk32's comment, I also switched the order in which the two tests run, which makes a noticeable difference (22% vs. 28%). The code was run under Windows 7 on an Intel Xeon E5-2697 v2. This is, of course, very problem-specific and should not be interpreted as a conclusive answer.