Why is it impossible to build a compiler that can determine if a C++ function will change the value of a particular variable?

Why is it impossible to build such a compiler?

For the same reason that you can't write a program that will determine whether any given program will terminate. This is known as the halting problem, and it's one of those things that's not computable.

To be clear, you can write a compiler that can determine that a function does change the variable in some cases, but you can't write one that reliably tells you that the function will or won't change the variable (or halt) for every possible function.

Here's an easy example:

void foo() {
    if (bar() == 0) this->a = 1;
}

How can a compiler determine, just from looking at that code, whether foo will ever change a? Whether it does or doesn't depends on conditions external to the function, namely the implementation of bar. There's more than that to the proof that the halting problem isn't computable, but it's already nicely explained at the linked Wikipedia article (and in every computation theory textbook), so I'll not attempt to explain it correctly here.


Imagine such compiler exists. Let's also assume that for convenience it provides a library function that returns 1 if the passed function modifies a given variable and 0 when the function doesn't. Then what should this program print?

int variable = 0;

void f() {
    if (modifies_variable(f, variable)) {
        /* do nothing */
    } else {
        /* modify variable */
        variable = 1;
    }
}

int main(int argc, char **argv) {
    if (modifies_variable(f, variable)) {
        printf("Modifies variable\n");
    } else {
        printf("Does not modify variable\n");
    }

    return 0;
}

Don't confuse "will or will not modify a variable given these inputs" for "has an execution path which modifies a variable."

The former is called opaque predicate determination, and is trivially impossible to decide - aside from reduction from the halting problem, you could just point out the inputs might come from an unknown source (eg. the user). This is true of all languages, not just C++.

The latter statement, however, can be determined by looking at the parse tree, which is something that all optimizing compilers do. The reason they do is that pure functions (and referentially transparent functions, for some definition of referentially transparent) have all sorts of nice optimizations that can be applied, like being easily inlinable or having their values determined at compile-time; but to know if a function is pure, we need to know if it can ever modify a variable.

So, what appears to be a surprising statement about C++ is actually a trivial statement about all languages.


I think the key word in "whether or not a C++ function will change the value of a particular variable" is "will". It is certainly possible to build a compiler that checks whether or not a C++ function is allowed to change the value of a particular variable, you cannot say with certainty that the change is going to happen:

void maybe(int& val) {
    cout << "Should I change value? [Y/N] >";
    string reply;
    cin >> reply;
    if (reply == "Y") {
        val = 42;
    }
}

I don't think it's necessary to invoke the halting problem to explain that you can't algorithmically know at compile time whether a given function will modify a certain variable or not.

Instead, it's sufficient to point out that a function's behavior often depends on run-time conditions, which the compiler can't know about in advance. E.g.

int y;

int main(int argc, char *argv[]) {
   if (argc > 2) y++;
}

How could the compiler predict with certainty whether y will be modified?