Why do local variables require initialization, but fields do not?

If I create a bool within my class, just something like bool check, it defaults to false.

When I create the same bool within my method, bool check (instead of within the class), I get the error "use of unassigned local variable check". Why?

Solution 1:

Yuval and David's answers are basically correct; summing up:

Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
Use of an unassigned field or array element is less likely a bug, and it is harder to detect the condition in the compiler. Therefore the compiler makes no attempt to detect the use of an uninitialized variable for fields, and instead relies upon the initialization to the default value in order to make the program behavior deterministic.

A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.

First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:

bool x;
if (M()) x = true;
Console.WriteLine(x);

The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.

So what the compiler does instead for locals is implement an algorithm that is fast and that overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:

bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);

Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)
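To make the conservative behavior concrete, here is a minimal sketch of what the definite assignment checker accepts and rejects. The class name and the body of N() are hypothetical; only the shape of the control flow matters.

```csharp
using System;

static class DefiniteAssignmentSketch
{
    // Hypothetical stand-in for N() from the text; any int-returning method works.
    static int N() => 42;

    // Accepted: x is assigned on both branches, so every path to the read assigns it.
    public static bool BothBranches()
    {
        bool x;
        if (N() > 0) x = true; else x = false;
        return x;
    }

    // Accepted: the condition is a compile-time constant, which the
    // definite assignment rules do take into account.
    public static bool ConstantCondition()
    {
        bool x;
        if (true) x = true;
        return x;
    }

    // Rejected if uncommented: N() * 0 == 0 is not a constant expression,
    // so the compiler assumes the branch might be skipped.
    // public static bool NonConstantCondition()
    // {
    //     bool x;
    //     if (N() * 0 == 0) x = true;
    //     return x; // error CS0165: Use of unassigned local variable 'x'
    // }
}
```

Both accepted methods return true. The commented-out method is the false positive described above: a human can see the branch is always taken, but the checker does not reason about the value of N().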

All right, so what do we know so far? It is impractical for locals to get an exact answer, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?

Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to another method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method in which the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short -- far less than a million lines of code in a method -- and so analyzing the entire method is quite quick.
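The assignment points listed above can be sketched as follows. Init and the class name are hypothetical helpers invented for illustration.

```csharp
using System;

static class LocalAssignmentPoints
{
    // Hypothetical helper: an out parameter must be assigned before a normal return.
    static void Init(out int value) { value = 5; }

    public static int ViaOut()
    {
        int n;
        Init(out n);   // n is definitely assigned once the call returns normally
        return n;
    }

    public static int ViaLambda()
    {
        int n;
        Action assign = () => n = 10;   // an assignment inside a lambda body
        assign();
        // return n;   // error CS0165: the lambda's assignment is not counted,
        //             // even though the lambda was just invoked
        n = 0;         // a direct assignment in the method text satisfies the checker
        assign();      // the lambda shares the captured variable and overwrites it
        return n;
    }
}
```

ViaOut returns 5 and ViaLambda returns 10. Every fact the checker needs is visible inside the method body, which is why the analysis stays cheap.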

Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initializes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.

Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:

// Library written by BarCorp
public abstract class Bar
{
    // Derived class is responsible for initializing x.
    protected int x;
    protected abstract void InitializeX(); 
    public void M() 
    { 
       InitializeX();
       Console.WriteLine(x); 
    }
}

Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.

Suppose this library is legal. If FooCorp writes

public class Foo : Bar
{
    protected override void InitializeX() { } 
}

is that an error? How is the compiler supposed to figure that out? The only way is to do a whole program analysis that tracks the initialization state of every field on every possible path through the program, including paths that involve choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flows takes microseconds and depends on the size of the method. Analyzing global control flows can take hours because it depends on the complexity of every method in the program and all the libraries.

So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.

Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.
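A minimal sketch of the struct rule, using a hypothetical SketchPoint type. One caveat: the rule as described applies to compilers before C# 11. From C# 11 on, fields a struct constructor leaves unassigned are defaulted automatically, so the error no longer appears.

```csharp
struct SketchPoint
{
    public int X;
    public int Y;

    public SketchPoint(int x)
    {
        X = x;
        // Remove the next line and compilers before C# 11 report
        // error CS0171: field 'Y' must be fully assigned before
        // control is returned to the caller.
        Y = 0;
    }
}
```

Here new SketchPoint(3) yields X == 3 and Y == 0, while default(SketchPoint) has both fields set to 0.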

What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?
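To illustrate why "the constructor must initialize every field" is hard to check for classes, here is a sketch in which the base constructor delegates initialization to a virtual method. Shape, Triangle and Blob are hypothetical names.

```csharp
using System;

abstract class Shape
{
    protected int sides;

    // The constructor does "initialize" the field, but only through a
    // virtual call whose body the compiler cannot see when compiling Shape.
    protected Shape() { InitializeSides(); }

    protected abstract void InitializeSides();

    public int Sides => sides;
}

sealed class Triangle : Shape
{
    protected override void InitializeSides() { sides = 3; }
}

sealed class Blob : Shape
{
    // Legal, and leaves the field at its default value of 0.
    protected override void InitializeSides() { }
}
```

new Triangle().Sides is 3 and new Blob().Sides is 0. Whether Blob is a bug or a deliberate choice cannot be decided from Shape alone, and Blob might live in an assembly compiled years later.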

John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:

Make common, safe, frequently used programming idioms illegal.
Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
Rely upon automatic initialization to default values.

The design team chose the third option.

Solution 2:

When I create the same bool within my method, bool check (instead of within the class), I get the error "use of unassigned local variable check". Why?

Because the compiler is trying to prevent you from making a mistake.

Does initializing your variable to false change anything in this particular path of execution? Probably not, considering default(bool) is false anyway, but it forces you to be aware that this is happening. The .NET environment prevents you from accessing "garbage memory", since it initializes every variable to its default value. But still, imagine this were a reference type, and you passed an uninitialized (null) value to a method expecting a non-null argument and got a NullReferenceException at runtime. The compiler is simply trying to prevent that, accepting the fact that this may sometimes result in bool b = false statements.
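A small sketch of the difference the question asks about. FieldDefaults is a hypothetical class; depending on compiler settings, the unassigned fields may produce warnings (for example CS0649), but not errors.

```csharp
using System;

class FieldDefaults
{
    private bool check;    // never assigned: reads as default(bool), i.e. false
    private string name;   // never assigned: reads as null

    public bool Check => check;
    public bool NameIsNull => name == null;

    public static bool LocalNeedsAssignment()
    {
        // Removing "= false" makes the return statement fail with
        // error CS0165: Use of unassigned local variable 'local'.
        bool local = false;
        return local;
    }
}
```

For a fresh instance, Check is false and NameIsNull is true. The explicit local initializer does not change the runtime value; it only records that you meant it.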

Eric Lippert talks about this in a blog post:

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (Though the C and C++ programming languages do not, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you in the pit of quality; you should have to work hard to write that bug.

Why doesn't this apply to a class field? Well, I assume the line had to be drawn somewhere, and local variable initialization is a lot easier to diagnose and get right than class field initialization. The compiler could do this, but think of all the possible checks it would need to make (some of them independent of the class code itself) in order to determine whether each field in a class is initialized. I am no compiler designer, but I am sure it would be considerably harder, as there are plenty of cases to take into account, and the analysis has to be done in a timely fashion as well. Every feature has to be designed, written, tested and deployed, and the value of this one would probably not justify the effort and complexity of implementing it.
