How do JavaScript closures work at a low level?

Solution 1:

This is a section of slebetman's answer to the question "javascript can't access private properties", which answers your question very well.

The Stack:

A scope is related to the stack frame (in Computer Science it's called the "activation record" but most developers familiar with C or assembly know it better as stack frame). A scope is to a stack frame what a class is to an object. By that I mean that where an object is an instance of a class, a stack frame is an instance of scope.

Let's use a made-up language as an example. In this language, like in JavaScript, functions define scope. Let's take a look at some example code:

var global_var

function b {
    var bb
}

function a {
    var aa
    b();
}

When we read the code above, we say that the variable aa is in scope in function a and the variable bb is in scope in function b. Note that we don't call these private variables, because the opposite of private variables is public variables, and both terms refer to properties bound to objects. Instead we call aa and bb local variables. The opposite of local variables is global variables (not public variables).

Now, let's see what happens when we call a:

a() gets called, which creates a new stack frame and allocates space for its local variables on the stack:

The stack:
 ┌────────┐
 │ var aa │ <── a's stack frame
 ╞════════╡
 ┆        ┆ <── caller's stack frame

a() calls b(), which creates another stack frame and allocates space for b's local variables on the stack:

The stack:
 ┌────────┐
 │ var bb │ <── b's stack frame
 ╞════════╡
 │ var aa │
 ╞════════╡
 ┆        ┆

In most programming languages, and this includes JavaScript, a function only has access to its own stack frame. Thus a() cannot access local variables in b(), and neither can any other function or code in global scope access variables in a(). The only exception is variables in global scope. From an implementation point of view this is achieved by allocating global variables in an area of memory that does not belong to the stack. This is generally called the heap. So to complete the picture, the memory at this point looks like this:

The stack:     The heap:
 ┌────────┐   ┌────────────┐
 │ var bb │   │ global_var │
 ╞════════╡   │            │
 │ var aa │   └────────────┘
 ╞════════╡
 ┆        ┆

(as a side note, you can also allocate variables on the heap inside functions using malloc() or new)
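
As a rough JavaScript illustration of these visibility rules (the names globalVar, aFn and bFn are made up for this sketch and are not part of the example above), neither function can see the other's locals, while both can see the global:

```javascript
var globalVar = "global";

function bFn() {
  var bb = "local to b";
  // `aa` belongs to aFn's frame, so it is not visible here.
  // typeof on an undeclared name yields "undefined" instead of throwing.
  return { seesAa: typeof aa !== "undefined", seesGlobal: globalVar };
}

function aFn() {
  var aa = "local to a";
  return bFn();
}

const seen = aFn();
console.log(seen.seesAa);     // false
console.log(seen.seesGlobal); // "global"
console.log(typeof bb);       // "undefined": bb is not visible outside bFn
```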

Now b() completes and returns, and its stack frame is removed from the stack:

The stack:     The heap:
 ┌────────┐   ┌────────────┐
 │ var aa │   │ global_var │
 ╞════════╡   │            │
 ┆        ┆   └────────────┘

and when a() completes the same happens to its stack frame. This is how local variables get allocated and freed automatically: by pushing frames onto the stack and popping them off again.
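
The "stack frame is an instance of a scope" idea can be seen in a small JavaScript sketch. The function countdown is hypothetical and only serves to show that every call gets its own fresh copy of the locals:

```javascript
// One scope (the function body), but one frame per call.
function countdown(n) {
  var label = "frame for n=" + n;  // local to this particular call
  if (n === 0) return [label];
  var deeper = countdown(n - 1);   // pushes new frames; our own `label` is untouched
  return [label].concat(deeper);
}

const frames = countdown(2);
console.log(frames);
// [ 'frame for n=2', 'frame for n=1', 'frame for n=0' ]
```

Each nested call wrote to its own label, which is why the outer values survive until the inner calls have returned.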

Closures:

A closure is a more advanced stack frame. But whereas normal stack frames get deleted once a function returns, a language with closures will merely unlink the stack frame (or just the objects it contains) from the stack while keeping a reference to the stack frame for as long as it's required.

Now let's look at some example code in a language with closures:

function b {
    var bb
    return function {
        var cc
    }
}

function a {
    var aa
    return b()
}

Now let's see what happens if we do this:

var c = a()

First function a() is called which in turn calls b(). Stack frames are created and pushed onto the stack:

The stack:
 ┌────────┐
 │ var bb │
 ╞════════╡
 │ var aa │
 ╞════════╡
 │ var c  │
 ┆        ┆

Function b() returns, so its stack frame is popped off the stack. But function b() returns an anonymous function which captures bb in a closure. So we pop the stack frame off but don't delete it from memory (it stays there until nothing references it any more and it can be garbage collected):

The stack:             somewhere in RAM:
 ┌────────┐           ┌╶╶╶╶╶╶╶╶╶┐
 │ var aa │           ┆ var bb  ┆
 ╞════════╡           └╶╶╶╶╶╶╶╶╶┘
 │ var c  │
 ┆        ┆

a() now returns the function to c. So the stack frame of the call to b() gets linked to the variable c. Note that it's the stack frame that gets linked, not the scope. It's kind of like how, when you create objects from a class, it's the objects that get assigned to variables, not the class:

The stack:             somewhere in RAM:
 ┌────────┐           ┌╶╶╶╶╶╶╶╶╶┐
 │ var c╶╶├╶╶╶╶╶╶╶╶╶╶╶┆ var bb  ┆
 ╞════════╡           └╶╶╶╶╶╶╶╶╶┘
 ┆        ┆

Also note that since we haven't actually called the function c(), the variable cc is not yet allocated anywhere in memory. It's currently only a scope, not yet a stack frame until we call c().

Now what happens when we call c()? A stack frame for c() is created as normal. But this time there is a difference:

The stack:
 ┌────────┬──────────┐
 │ var cc    var bb  │  <──── attached closure
 ╞════════╤──────────┘
 │ var c  │
 ┆        ┆

The stack frame of b() is attached to the stack frame of c(). So from the point of view of function c(), its stack also contains all the variables that were created when function b() was called. (Note again: not the variables in function b() but the variables created when function b() was called. In other words, not the scope of b() but the stack frame created when calling b(). The implication is that there is only one possible function b() but many calls to b(), each creating its own stack frame.)
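
Here is roughly what "one function b() but many stack frames" looks like in real JavaScript. This is a sketch based on the made-up example above; the initial value of bb and the value of cc are assumptions added so there is something to observe:

```javascript
function b() {
  var bb = 0;            // lives in the frame of this particular call to b()
  return function () {
    var cc = 1;          // allocated only when the returned function runs
    bb = bb + cc;        // reaches into the captured frame
    return bb;
  };
}

function a() {
  var aa;
  return b();
}

const c1 = a();          // first call to b(): first captured frame
const c2 = a();          // second call to b(): a separate frame

const results = [c1(), c1(), c2()];
console.log(results);    // [ 1, 2, 1 ]
```

c1 and c2 share the same code but not the same bb, because each closure is linked to the frame of the call that created it.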

But the rules of local and global variables still apply. All variables in b() become local variables to c() and nothing else. The function that called c() has no access to them.

What this means is that when you redefine c in the caller's scope like this:

var c = function {/* new function */}

this happens:

                     somewhere in RAM:
                           ┌╶╶╶╶╶╶╶╶╶┐
                           ┆ var bb  ┆
                           └╶╶╶╶╶╶╶╶╶┘
The stack:
 ┌────────┐           ┌╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶┐
 │ var c╶╶├╶╶╶╶╶╶╶╶╶╶╶┆ /* new function */ ┆
 ╞════════╡           └╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶╶┘
 ┆        ┆

As you can see, it's impossible to regain access to the stack frame from the call to b() since the scope that c belongs to doesn't have access to it.

Solution 2:

I've written an article on this topic: How do JavaScript closures work under the hood: the illustrated explanation.

To understand the subject, we need to know how scope objects (or LexicalEnvironments) are allocated, used and deleted. This understanding is the key to seeing the big picture and to knowing how closures work under the hood.

I'm not going to re-type the whole article here, but as a short example, consider this script:

"use strict";

var foo = 1;
var bar = 2;

function myFunc() {
  //-- define local-to-function variables
  var a = 1;
  var b = 2;
  var foo = 3;
}

//-- and then, call it:
myFunc();

When executing the top-level code, we have the following arrangement of scope objects:

[Image not available: diagram of the scope objects while the top-level code executes]

Notice that myFunc references both:

  • A function object (which contains the code and any other publicly available properties)
  • The scope object that was active at the time the function was defined.

And when myFunc() is called, we have the following scope chain:

[Image not available: diagram of the scope chain during the call to myFunc()]

When a function is called, a new scope object is created and used to augment the scope chain referenced by myFunc. This lets us achieve a very powerful effect when we define an inner function and then call it outside of the outer function.
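
A minimal sketch of that effect, reusing the foo variable from the script above. The names makeReader, readFoo and reader are made up for this illustration:

```javascript
var foo = 1;

function makeReader() {
  var foo = 3; // lives in the scope object created for this call
  // The inner function keeps a reference to that scope object.
  return function readFoo() {
    return foo; // resolved through the scope chain: finds 3 before the global 1
  };
}

const reader = makeReader(); // makeReader has returned, but its scope object survives
console.log(reader()); // 3
console.log(foo);      // 1: the global foo was never touched
```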

See the aforementioned article, it explains things in detail.

Solution 3:

Here is an example of how you can transform code that needs closures into code that doesn't. The essential points to pay attention to are: how function declarations are transformed, how function calls are transformed, and how accesses to local variables that have been moved to the heap are transformed.

Input:

var f = function (x) {
  x = x + 10
  var g = function () {
    return ++x
  }
  return g
}

var h = f(3)
console.log(h()) // 14
console.log(h()) // 15

Output:

// Header that goes at the top of the program:

// A list of environments, starting with the one
// corresponding to the innermost scope.
function Envs(car, cdr) {
  this.car = car
  this.cdr = cdr
}

Envs.prototype.get = function (k) {
    var e = this
    while (e) {
        if (e.car.has(k)) return e.car.get(k) // has() so falsy values like 0 are found too
        e = e.cdr
    }
    // returns undefined if lookup fails
}

Envs.prototype.set = function (k, v) {
    var e = this
    while (e) {
        if (e.car.has(k)) { // has() so variables holding falsy values can be reassigned
            e.car.set(k, v)
            return this
        }
        e = e.cdr
    }
    throw new ReferenceError()
}

// Initialize the global scope.
var envs = new Envs(new Map(), null)

// We have to use this special function to call our closures.
function call(f, ...args) {
    return f.func(f.envs, ...args)
}

// End of header.

var f = {
    func: function (envs, x) {
        envs = new Envs(new Map().set('x',x), envs)

        envs.set('x', envs.get('x') + 10)
        var g = {
            func: function (envs) {
                envs = new Envs(new Map(), envs)
                return envs.set('x', envs.get('x') + 1).get('x')
            },
            envs: envs
        }
        return g
    },
    envs: envs
}

var h = call(f, 3)
console.log(call(h)) // 14
console.log(call(h)) // 15

Let's break down how the three key transformations go. For the function declaration case, assume for concreteness that we have a function of two arguments x and y and one local variable z, and x and z can escape the stack frame and so need to be moved to the heap. Because of hoisting we may assume that z is declared at the beginning of the function.

Input:

var f = function f(x, y) {
    var z = 7
    ...
}

Output:

var f = {
    func: function f(envs, x, y) {
        envs = new Envs(new Map().set('x',x).set('z',7), envs)
        ...
    },
    envs: envs
}

That's the tricky part. The rest of the transformation just consists in using call to call the function and replacing accesses to the variables moved to the heap with lookups in envs.
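
To see all three transformations side by side on something small, here is a sketch that is deliberately simpler than the code above: it uses one plain object per call instead of the Envs chain, and the names makeCounter, makeCounterConverted, frame and env are assumptions of this sketch, not part of the original transformation.

```javascript
// Original form, relying on the language's built-in closures.
function makeCounter(start) {
  var n = start;
  return function () { n = n + 1; return n; };
}

// Converted form. Assumption: a plain object stands in for the Envs chain.
function call(f, ...args) {
  return f.func(f.env, ...args);
}

const makeCounterConverted = {
  func: function (env, start) {
    const frame = { n: start, parent: env }; // n escapes, so it lives on the heap
    return {
      func: function (env) {   // env is the frame captured below
        env.n = env.n + 1;     // variable access becomes a field access
        return env.n;
      },
      env: frame               // the closure remembers the defining call's frame
    };
  },
  env: null                    // no enclosing function scope
};

const plain = makeCounter(10);
const converted = call(makeCounterConverted, 10);
console.log(plain(), plain());                 // 11 12
console.log(call(converted), call(converted)); // 11 12
```

Both versions produce the same sequence, which is the whole point of the conversion: the second one needs no closure support from the language.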

A couple of caveats.

  1. How did we know that x and z needed to be moved to the heap but not y? Answer: the simplest (but possibly not optimal) thing is to just move anything to the heap that is referenced in an enclosed function body.

  2. The implementation I have given leaks a ton of memory and requires function calls to access local variables moved to the heap instead of inlining those accesses. A real implementation wouldn't do these things.

Finally, user3856986 posted an answer that makes some different assumptions than mine, so let's compare it.

The main difference is that I assumed that local variables would be kept on a traditional stack, while user3856986's answer only makes sense if the stack will be implemented as some kind of structure on the heap (but he or she is not very explicit about this requirement). A heap implementation like this can work, though it will put more load on the allocator and GC since you have to allocate and collect stack frames on the heap. With modern GC technology, this can be more efficient than you might think, but I believe that the commonly used VMs do use traditional stacks.

Also, something left vague in user3856986's answer is how the closure gets a reference to the relevant stack frame. In my code, this happens when the envs property is set on the closure while that stack frame is executing.

Finally, user3856986 writes, "All variables in b() become local variables to c() and nothing else. The function that called c() has no access to them." This is a little misleading. Given a reference to the closure c, the only thing that stops one from getting access to the closed variables from the call to b is the type system. One could certainly access these variables from assembly (otherwise, how could c access them?). On the other hand, as for the true local variables of c, it doesn't even make sense to ask if you can get access to them until some particular invocation of c has been specified (and if we consider some particular call, by the time control gets back to the caller, the information stored in them might already have been destroyed).
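
To make that concrete with the explicit representation used in this answer (a closure is a record holding code plus an environment), here is a minimal sketch. It uses a plain object in place of the Envs chain, and the names closure and env are hypothetical. Nothing but convention stops the caller from reading or writing the captured variable:

```javascript
// A closure in explicit form: code plus the environment it captured.
// Assumption: a plain object stands in for the Envs chain from above.
const closure = {
  func: function (env) {
    env.x = env.x + 1;
    return env.x;
  },
  env: { x: 13 }
};

const first = closure.func(closure.env);  // the normal way in: call the closure
const peeked = closure.env.x;             // the caller reads the captured x directly
closure.env.x = 100;                      // ...and can overwrite it too
const second = closure.func(closure.env);

console.log(first, peeked, second); // 14 14 101
```

In real JavaScript the environment is hidden by the language, which plays the role of the type system mentioned above.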