What's the (hidden) cost of Scala's lazy val?
Solution 1:
This is taken from the scala mailing list and gives implementation details of lazy
in terms of Java code (rather than bytecode):
class LazyTest {
lazy val msg = "Lazy"
}
is compiled to something equivalent to the following Java code:
class LazyTest {
public int bitmap$0;
private String msg;
public String msg() {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
msg = "Lazy";
}
}
bitmap$0 = bitmap$0 | 1;
}
}
return msg;
}
}
Solution 2:
It looks like the compiler arranges for a class-level bitmap int field to flag multiple lazy fields as initialized (or not) and initializes the target field in a synchronized block if the relevant xor of the bitmap indicates it is necessary.
Using:
class Something {
lazy val foo = getFoo
def getFoo = "foo!"
}
produces sample bytecode:
0 aload_0 [this]
1 getfield blevins.example.Something.bitmap$0 : int [15]
4 iconst_1
5 iand
6 iconst_0
7 if_icmpne 48
10 aload_0 [this]
11 dup
12 astore_1
13 monitorenter
14 aload_0 [this]
15 getfield blevins.example.Something.bitmap$0 : int [15]
18 iconst_1
19 iand
20 iconst_0
21 if_icmpne 42
24 aload_0 [this]
25 aload_0 [this]
26 invokevirtual blevins.example.Something.getFoo() : java.lang.String [18]
29 putfield blevins.example.Something.foo : java.lang.String [20]
32 aload_0 [this]
33 aload_0 [this]
34 getfield blevins.example.Something.bitmap$0 : int [15]
37 iconst_1
38 ior
39 putfield blevins.example.Something.bitmap$0 : int [15]
42 getstatic scala.runtime.BoxedUnit.UNIT : scala.runtime.BoxedUnit [26]
45 pop
46 aload_1
47 monitorexit
48 aload_0 [this]
49 getfield blevins.example.Something.foo : java.lang.String [20]
52 areturn
53 aload_1
54 monitorexit
55 athrow
Values initialed in tuples like lazy val (x,y) = { ... }
have nested caching via the same mechanism. The tuple result is lazily evaluated and cached, and an access of either x or y will trigger the tuple evaluation. Extraction of the individual value from the tuple is done independently and lazily (and cached). So the above double-instantiation code generates an x
, y
, and an x$1
field of type Tuple2
.
Solution 3:
With Scala 2.10, a lazy value like:
class Example {
lazy val x = "Value";
}
is compiled to byte code that resembles the following Java code:
public class Example {
private String x;
private volatile boolean bitmap$0;
public String x() {
if(this.bitmap$0 == true) {
return this.x;
} else {
return x$lzycompute();
}
}
private String x$lzycompute() {
synchronized(this) {
if(this.bitmap$0 != true) {
this.x = "Value";
this.bitmap$0 = true;
}
return this.x;
}
}
}
Note that the bitmap is represented by a boolean
. If you add another field, the compiler will increase the size of the field to being able to represent at least 2 values, i.e. as a byte
. This just goes on for huge classes.
But you might wonder why this works? The thread-local caches must be cleared when entering a synchronized block such that the non-volatile x
value is flushed into memory. This blog article gives an explanation.
Solution 4:
Scala SIP-20 proposes a new implementation of lazy val, which is more correct but ~25% slower than the "current" version.
The proposed implementation looks like:
class LazyCellBase { // in a Java file - we need a public bitmap_0
public static AtomicIntegerFieldUpdater<LazyCellBase> arfu_0 =
AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
public volatile int bitmap_0 = 0;
}
final class LazyCell extends LazyCellBase {
import LazyCellBase._
var value_0: Int = _
@tailrec final def value(): Int = (arfu_0.get(this): @switch) match {
case 0 =>
if (arfu_0.compareAndSet(this, 0, 1)) {
val result = 0
value_0 = result
@tailrec def complete(): Unit = (arfu_0.get(this): @switch) match {
case 1 =>
if (!arfu_0.compareAndSet(this, 1, 3)) complete()
case 2 =>
if (arfu_0.compareAndSet(this, 2, 3)) {
synchronized { notifyAll() }
} else complete()
}
complete()
result
} else value()
case 1 =>
arfu_0.compareAndSet(this, 1, 2)
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 2 =>
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 3 => value_0
}
}
As of June 2013 this SIP hasn't been approved. I expect that it's likely to be approved and included in a future version of Scala based on the mailing list discussion. Consequently, I think you'd be wise to heed Daniel Spiewak's observation:
Lazy val is *not* free (or even cheap). Use it only if you absolutely need laziness for correctness, not for optimization.