Why can't dead code detection be fully solved by a compiler?

compiler-theory

The compilers I've been using in C or Java have dead code prevention (warning when a line won't ever be executed). My professor says that this problem can never be fully solved by compilers though. I was wondering why that is. I am not too familiar with the actual coding of compilers as this is a theory-based class. But I was wondering what they check (such as possible input strings vs acceptable inputs, etc.), and why that is insufficient.

The dead code problem is related to the Halting problem.

Alan Turing proved that it is impossible to write a general algorithm that will be given a program and be able to decide whether that program halts for all inputs. You may be able to write such an algorithm for specific types of programs, but not for all programs.

How does this relate to dead code?

The Halting problem is reducible to the problem of finding dead code. That is, if you find an algorithm that can detect dead code in any program, then you can use that algorithm to test whether any program will halt. Since that has been proven to be impossible, it follows that writing an algorithm for dead code is impossible as well.

How do you transfer an algorithm for dead code into an algorithm for the Halting problem?

Simple: you add a line of code after the end of the program you want to check for halt. If your dead-code detector detects that this line is dead, then you know that the program does not halt. If it doesn't, then you know that your program halts (gets to the last line, and then to your added line of code).

Compilers usually check for things that can be proven at compile-time to be dead. For example, blocks that are dependent on conditions that can be determined to be false at compile time. Or any statement after a return (within the same scope).

These are specific cases, and therefore it's possible to write an algorithm for them. It may be possible to write algorithms for more complicated cases (like an algorithm that checks whether a condition is syntactically a contradiction and therefore will always return false), but still, that wouldn't cover all possible cases.

Well, let's take the classical proof of the undecidability of the halting problem and change the halting-detector to a dead-code detector!

C# program

using System;
using YourVendor.Compiler;

class Program
{
    static void Main(string[] args)
    {
        string quine_text = @"using System;
using YourVendor.Compiler;

class Program
{{
    static void Main(string[] args)
    {{
        string quine_text = @{0}{1}{0};
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {{
            System.Console.WriteLine({0}Dead code!{0});
        }}
    }}
}}";
        quine_text = string.Format(quine_text, (char)34, quine_text);

        if (YourVendor.Compiler.HasDeadCode(quine_text))
        {
            System.Console.WriteLine("Dead code!");
        }
    }
}

If YourVendor.Compiler.HasDeadCode(quine_text) returns false, then the line System.Console.WriteLn("Dead code!"); won't be ever executed, so this program actually does have dead code, and the detector was wrong.

But if it returns true, then the line System.Console.WriteLn("Dead code!"); will be executed, and since there is no more code in the program, there is no dead code at all, so again, the detector was wrong.

So there you have it, a dead-code detector that returns only "There is dead code" or "There is no dead code" must sometimes yield wrong answers.

If the halting problem is too obscure, think of it this way.

Take a mathematical problem that is believed to be true for all positive integer's n, but hasn't been proven to be true for every n. A good example would be Goldbach's conjecture, that any positive even integer greater than two can be represented by the sum of two primes. Then (with an appropriate bigint library) run this program (pseudocode follows):

 for (BigInt n = 4; ; n+=2) {
     if (!isGoldbachsConjectureTrueFor(n)) {
         print("Conjecture is false for at least one value of n\n");
         exit(0);
     }
 }

Implementation of isGoldbachsConjectureTrueFor() is left as an exercise for the reader but for this purpose could be a simple iteration over all primes less than n

Now, logically the above must either be the equivalent of:

 for (; ;) {
 }

(i.e. an infinite loop) or

print("Conjecture is false for at least one value of n\n");

as Goldbach's conjecture must either be true or not true. If a compiler could always eliminate dead code, there would definitely be dead code to eliminate here in either case. However, in doing so at the very least your compiler would need to solve arbitrarily hard problems. We could provide problems provably hard that it would have to solve (e.g. NP-complete problems) to determine which bit of code to eliminate. For instance if we take this program:

 String target = "f3c5ac5a63d50099f3b5147cabbbd81e89211513a92e3dcd2565d8c7d302ba9c";
 for (BigInt n = 0; n < 2**2048; n++) {
     String s = n.toString();
     if (sha256(s).equals(target)) {
         print("Found SHA value\n");
         exit(0);
     }
 }
 print("Not found SHA value\n");

we know that the program will either print out "Found SHA value" or "Not found SHA value" (bonus points if you can tell me which one is true). However, for a compiler to be able to reasonably optimise that would take of the order of 2^2048 iterations. It would in fact be a great optimisation as I predict the above program would (or might) run until the heat death of the universe rather than printing anything without optimisation.

I don't know if C++ or Java have an Eval type function, but many languages do allow you do call methods by name. Consider the following (contrived) VBA example.

Dim methodName As String

If foo Then
    methodName = "Bar"
Else
    methodName = "Qux"
End If

Application.Run(methodName)

The name of the method to be called is impossible to know until runtime. Therefore, by definition, the compiler cannot know with absolute certainty that a particular method is never called.

Actually, given the example of calling a method by name, the branching logic isn't even necessary. Simply saying

Application.Run("Bar")

Is more than the compiler can determine. When the code is compiled, all the compiler knows is that a certain string value is being passed to that method. It doesn't check to see if that method exists until runtime. If the method isn't called elsewhere, through more normal methods, an attempt to find dead methods can return false positives. The same issue exists in any language that allows code to be called via reflection.

Why can't dead code detection be fully solved by a compiler?

Related

Recent Posts