Why does a small change to this Scala code make such a huge difference to performance?

I'm running 32-bit Debian 6.0 (Squeeze) on a 2.5 GHz Core 2 CPU, with sun-java6 6.24-1 and the Scala 2.8.1 packages from Wheezy.

This code, compiled with scalac -optimise, takes over 30 seconds to run:

object Performance {

  import scala.annotation.tailrec

  @tailrec def gcd(x:Int,y:Int):Int = {
    if (x == 0)
      y 
    else 
      gcd(y%x,x)
  }

  val p = 1009
  val q = 3643
  val t = (p-1)*(q-1)

  val es = (2 until t).filter(gcd(_,t) == 1)
  def main(args:Array[String]) {
    println(es.length)
  }
}

But if I make the trivial change of moving the val es = definition one line down, inside the scope of main, it runs in just 1 second, which is much closer to what I expected and comparable to the performance of equivalent C++. Interestingly, leaving val es = where it is but qualifying it with lazy has the same accelerating effect.
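For reference, the two faster variants described above can be sketched side by side. This is only an illustration: the names PerformanceFast, esLazy and computeEs are made up for the sketch and do not appear in my original program.

```scala
import scala.annotation.tailrec

// Hypothetical object name; only gcd, p, q and t come from the original code.
object PerformanceFast {

  @tailrec def gcd(x: Int, y: Int): Int =
    if (x == 0) y else gcd(y % x, x)

  val p = 1009
  val q = 3643
  val t = (p - 1) * (q - 1)

  // Variant 1: a lazy val is evaluated on first access (here, from main),
  // not while the object itself is being initialized.
  lazy val esLazy = (2 until t).filter(gcd(_, t) == 1)

  // Variant 2: the same computation wrapped in an ordinary method
  // (computeEs is an illustrative name).
  def computeEs(): Seq[Int] = (2 until t).filter(gcd(_, t) == 1)

  def main(args: Array[String]): Unit = {
    val es = computeEs()
    println(es.length)
    println(esLazy.length)
  }
}
```

In both variants the filter runs after the object has finished initializing, which is the only difference from the slow version.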

What's going on here? Why is performing the calculation outside function scope so much slower?


The JVM doesn't optimize static initializers (which is what this is) to the same level that it optimizes method calls. Unfortunately, when you do a lot of work there, that hurts performance, and this is a perfect example of it. This is also one reason why the old Application trait was considered problematic, and why Scala 2.9 introduced a DelayedInit trait that gets a bit of compiler help to move code from the initializer into a method that's called later on.
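As a rough sketch of what that looks like in practice: from Scala 2.9 on, the App trait is built on DelayedInit, so the object body below should run when main is called rather than in the static initializer. The object name PerformanceApp is made up for this example, and the behaviour described assumes Scala 2.9 through 2.13 (Scala 3 no longer gives DelayedInit this special treatment).

```scala
import scala.annotation.tailrec

// Hypothetical name; assumes a Scala version where App uses DelayedInit (2.9 to 2.13).
object PerformanceApp extends App {

  @tailrec def gcd(x: Int, y: Int): Int =
    if (x == 0) y else gcd(y % x, x)

  val p = 1009
  val q = 3643
  val t = (p - 1) * (q - 1)

  // With DelayedInit, the compiler moves this initialization code out of
  // the initializer; App.main runs it later as an ordinary method call.
  val es = (2 until t).filter(gcd(_, t) == 1)

  println(es.length)
}
```

One caveat of this design: fields such as es are not initialized until main has run, so other code shouldn't read them from the object before then.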


(Edit: fixed "constructor" to "initializer". Rather lengthy typo!)


Code inside a top-level object block is translated to a static initializer on the object's class. The equivalent in Java would be:

class Performance {
    static {
        // expensive calculation
    }

    public static void main(String[] args) {
        // use result of expensive calculation
    }
}

The HotSpot JVM doesn't perform any optimizations on code that runs in static initializers, under the reasonable heuristic that such code will only be run once.