What's the (hidden) cost of Scala's lazy val?

Solution 1:

This is taken from the scala mailing list and gives implementation details of lazy in terms of Java code (rather than bytecode):

class LazyTest {
  lazy val msg = "Lazy"
}

is compiled to something equivalent to the following Java code:

class LazyTest {
  public int bitmap$0;
  private String msg;

  public String msg() {
    if ((bitmap$0 & 1) == 0) {
        synchronized (this) {
            if ((bitmap$0 & 1) == 0) {
                synchronized (this) {
                    msg = "Lazy";
                }
            }
            bitmap$0 = bitmap$0 | 1;
        }
    }
    return msg;
  }

}

Solution 2:

It looks like the compiler arranges for a class-level bitmap int field to flag multiple lazy fields as initialized (or not) and initializes the target field in a synchronized block if the relevant xor of the bitmap indicates it is necessary.

Using:

class Something {
  lazy val foo = getFoo
  def getFoo = "foo!"
}

produces sample bytecode:

 0  aload_0 [this]
 1  getfield blevins.example.Something.bitmap$0 : int [15]
 4  iconst_1
 5  iand
 6  iconst_0
 7  if_icmpne 48
10  aload_0 [this]
11  dup
12  astore_1
13  monitorenter
14  aload_0 [this]
15  getfield blevins.example.Something.bitmap$0 : int [15]
18  iconst_1
19  iand
20  iconst_0
21  if_icmpne 42
24  aload_0 [this]
25  aload_0 [this]
26  invokevirtual blevins.example.Something.getFoo() : java.lang.String [18]
29  putfield blevins.example.Something.foo : java.lang.String [20]
32  aload_0 [this]
33  aload_0 [this]
34  getfield blevins.example.Something.bitmap$0 : int [15]
37  iconst_1
38  ior
39  putfield blevins.example.Something.bitmap$0 : int [15]
42  getstatic scala.runtime.BoxedUnit.UNIT : scala.runtime.BoxedUnit [26]
45  pop
46  aload_1
47  monitorexit
48  aload_0 [this]
49  getfield blevins.example.Something.foo : java.lang.String [20]
52  areturn
53  aload_1
54  monitorexit
55  athrow

Values initialed in tuples like lazy val (x,y) = { ... } have nested caching via the same mechanism. The tuple result is lazily evaluated and cached, and an access of either x or y will trigger the tuple evaluation. Extraction of the individual value from the tuple is done independently and lazily (and cached). So the above double-instantiation code generates an x, y, and an x$1 field of type Tuple2.

Solution 3:

With Scala 2.10, a lazy value like:

class Example {
  lazy val x = "Value";
}

is compiled to byte code that resembles the following Java code:

public class Example {

  private String x;
  private volatile boolean bitmap$0;

  public String x() {
    if(this.bitmap$0 == true) {
      return this.x;
    } else {
      return x$lzycompute();
    }
  }

  private String x$lzycompute() {
    synchronized(this) {
      if(this.bitmap$0 != true) {
        this.x = "Value";
        this.bitmap$0 = true;
      }
      return this.x;
    }
  }
}

Note that the bitmap is represented by a boolean. If you add another field, the compiler will increase the size of the field to being able to represent at least 2 values, i.e. as a byte. This just goes on for huge classes.

But you might wonder why this works? The thread-local caches must be cleared when entering a synchronized block such that the non-volatile x value is flushed into memory. This blog article gives an explanation.

Solution 4:

Scala SIP-20 proposes a new implementation of lazy val, which is more correct but ~25% slower than the "current" version.

The proposed implementation looks like:

class LazyCellBase { // in a Java file - we need a public bitmap_0
  public static AtomicIntegerFieldUpdater<LazyCellBase> arfu_0 =
    AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
  public volatile int bitmap_0 = 0;
}
final class LazyCell extends LazyCellBase {
  import LazyCellBase._
  var value_0: Int = _
  @tailrec final def value(): Int = (arfu_0.get(this): @switch) match {
    case 0 =>
      if (arfu_0.compareAndSet(this, 0, 1)) {
        val result = 0
        value_0 = result
        @tailrec def complete(): Unit = (arfu_0.get(this): @switch) match {
          case 1 =>
            if (!arfu_0.compareAndSet(this, 1, 3)) complete()
          case 2 =>
            if (arfu_0.compareAndSet(this, 2, 3)) {
              synchronized { notifyAll() }
            } else complete()
        }
        complete()
        result
      } else value()
    case 1 =>
      arfu_0.compareAndSet(this, 1, 2)
      synchronized {
        while (arfu_0.get(this) != 3) wait()
      }
      value_0
    case 2 =>
      synchronized {
        while (arfu_0.get(this) != 3) wait()
      }
      value_0
    case 3 => value_0
  }
}

As of June 2013 this SIP hasn't been approved. I expect that it's likely to be approved and included in a future version of Scala based on the mailing list discussion. Consequently, I think you'd be wise to heed Daniel Spiewak's observation:

Lazy val is *not* free (or even cheap). Use it only if you absolutely need laziness for correctness, not for optimization.