Is F# really faster than Erlang at spawning and killing processes?

Updated: This question contains an error which makes the benchmark meaningless. I will attempt a better benchmark comparing F# and Erlang's basic concurrency functionality and inquire about the results in another question.

I am trying do understand the performance characteristics of Erlang and F#. I find Erlang's concurrency model very appealing but am inclined to use F# for interoperability reasons. While out of the box F# doesn't offer anything like Erlang's concurrency primitives -- from what I can tell async and MailboxProcessor only cover a small portion of what Erlang does well -- I've been trying to understand what is possible in F# performance wise.

In Joe Armstrong's Programming Erlang book, he makes the point that processes are very cheap in Erlang. He uses the (roughly) the following code to demonstrate this fact:

-module(processes).
-export([max/1]).

%% max(N) 
%%   Create N processes then destroy them
%%   See how much time this takes

max(N) ->
    statistics(runtime),
    statistics(wall_clock),
    L = for(1, N, fun() -> spawn(fun() -> wait() end) end),
    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    lists:foreach(fun(Pid) -> Pid ! die end, L),
    U1 = Time1 * 1000 / N,
    U2 = Time2 * 1000 / N,
    io:format("Process spawn time=~p (~p) microseconds~n",
          [U1, U2]).

wait() ->
    receive
        die -> void
    end.

for(N, N, F) -> [F()];
for(I, N, F) -> [F()|for(I+1, N, F)].

On my Macbook Pro, spawning and killing 100 thousand processes (processes:max(100000)) takes about 8 microseconds per processes. I can raise the number of processes a bit further, but a million seems to break things pretty consistently.

Knowing very little F#, I tried to implement this example using async and MailBoxProcessor. My attempt, which may be wrong, is as follows:

#r "System.dll"
open System.Diagnostics

type waitMsg =
    | Die

let wait =
    MailboxProcessor.Start(fun inbox ->
        let rec loop =
            async { let! msg = inbox.Receive()
                    match msg with 
                    | Die -> return() }
        loop)

let max N =
    printfn "Started!"
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let actors = [for i in 1 .. N do yield wait]
    for actor in actors do
        actor.Post(Die)
    stopwatch.Stop()
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
    printfn "Done."

Using F# on Mono, starting and killing 100,000 actors/processors takes under 2 microseconds per process, roughly 4 times faster than Erlang. More importantly, perhaps, is that I can scale up to millions of processes without any apparent problems. Starting 1 or 2 million processes still takes about 2 microseconds per process. Starting 20 million processors is still feasible, but slows to about 6 microseconds per process.

I have not yet taken the time to fully understand how F# implements async and MailBoxProcessor, but these results are encouraging. Is there something I'm doing horribly wrong?

If not, is there some place Erlang will likely outperform F#? Is there any reason Erlang's concurrency primitives can't be brought to F# through a library?

EDIT: The above numbers are wrong, due to the error Brian pointed out. I will update the entire question when I fix it.


Solution 1:

In your original code, you only started one MailboxProcessor. Make wait() a function, and call it with each yield. Also you are not waiting for them to spin up or receive the messages, which I think invalidates the timing info; see my code below.

That said, I have some success; on my box I can do 100,000 at about 25us each. After too much more, I think possibly you start fighting the allocator/GC as much as anything, but I was able to do a million too (at about 27us each, but at this point was using like 1.5G of memory).

Basically each 'suspended async' (which is the state when a mailbox is waiting on a line like

let! msg = inbox.Receive()

) only takes some number of bytes while it's blocked. That's why you can have way, way, way more asyncs than threads; a thread typically takes like a megabyte of memory or more.

Ok, here's the code I'm using. You can use a small number like 10, and --define DEBUG to ensure the program semantics are what is desired (printf outputs may be interleaved, but you'll get the idea).

open System.Diagnostics 

let MAX = 100000

type waitMsg = 
    | Die 

let mutable countDown = MAX
let mre = new System.Threading.ManualResetEvent(false)

let wait(i) = 
    MailboxProcessor.Start(fun inbox -> 
        let rec loop = 
            async { 
#if DEBUG
                printfn "I am mbox #%d" i
#endif                
                if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                    mre.Set() |> ignore
                let! msg = inbox.Receive() 
                match msg with  
                | Die -> 
#if DEBUG
                    printfn "mbox #%d died" i
#endif                
                    if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                        mre.Set() |> ignore
                    return() } 
        loop) 

let max N = 
    printfn "Started!" 
    let stopwatch = new Stopwatch() 
    stopwatch.Start() 
    let actors = [for i in 1 .. N do yield wait(i)] 
    mre.WaitOne() |> ignore // ensure they have all spun up
    mre.Reset() |> ignore
    countDown <- MAX
    for actor in actors do 
        actor.Post(Die) 
    mre.WaitOne() |> ignore // ensure they have all got the message
    stopwatch.Stop() 
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N)) 
    printfn "Done." 

max MAX

All this said, I don't know Erlang, and I have not thought deeply about whether there's a way to trim down the F# any more (though it's pretty idiomatic as-is).

Solution 2:

Erlang's VM doesn't uses OS threads or process to switch to new Erlang process. It's VM simply counts function calls into your code/process and jumps to other VM's process after some (into same OS process and same OS thread).

CLR uses mechanics based on OS process and threads, so F# has much higher overhead cost for each context switch.

So answer to your question is "No, Erlang is much faster than spawning and killing processes".

P.S. You can find results of that practical contest interesting.