The actor model: Why is Erlang/OTP special? Could you use another language?

The C++ code does not deal with fairness, isolation, fault detection or distribution which are all things which Erlang brings as part of its actor model.

No actor is allowed to starve any other actor (fairness)
If one actor crashes, it should only affect that actor (isolation)
If one actor crashes, other actors should be able to detect and react to that crash (fault detection)
Actors should be able to communicate over a network as if they were on the same machine (distribution)

Also the beam SMP emulator brings JIT scheduling of the actors, moving them to the core which is at the moment the one with least utilization and also hibernates the threads on certain cores if they are no longer needed.

In addition all the libraries and tools written in Erlang can assume that this is the way the world works and be designed accordingly.

These things are not impossible to do in C++, but they get increasingly hard if you add the fact that Erlang works on almost all of the major hw and os configurations.

edit: Just found a description by Ulf Wiger about what he sees erlang style concurrency as.

I don't like to quote myself, but from Virding's First Rule of Programming

Any sufficiently complicated concurrent program in another language contains an ad hoc informally-specified bug-ridden slow implementation of half of Erlang.

With respect to Greenspun. Joe (Armstrong) has a similar rule.

The problem is not to implement actors, that's not that difficult. The problem is to get everything working together: processes, communication, garbage collection, language primitives, error handling, etc ... For example using OS threads scales badly so you need to do it yourself. It would be like trying to "sell" an OO language where you can only have 1k objects and they are heavy to create and use. From our point of view concurrency is the basic abstraction for structuring applications.

Getting carried away so I will stop here.

This is actually an excellent question, and has received excellent answers that perhaps are yet unconvincing.

To add shade and emphasis to the other great answers already here, consider what Erlang takes away (compared to traditional general purpose languages such as C/C++) in order to achieve fault-tolerance and uptime.

First, it takes away locks. Joe Armstrong's book lays out this thought experiment: suppose your process acquires a lock and then immediately crashes (a memory glitch causes the process to crash, or the power fails to part of the system). The next time a process waits for that same lock, the system has just deadlocked. This could be an obvious lock, as in the AquireScopedLock() call in the sample code; or it could be an implicit lock acquired on your behalf by a memory manager, say when calling malloc() or free().

In any case, your process crash has now halted the entire system from making progress. Fini. End of story. Your system is dead. Unless you can guarantee that every library you use in C/C++ never calls malloc and never acquires a lock, your system is not fault tolerant. Erlang systems can and do kill processes at will when under heavy load in order make progress, so at scale your Erlang processes must be killable (at any single point of execution) in order to maintain throughput.

There is a partial workaround: using leases everywhere instead of locks, but you have no guarantee that all the libraries you utilize also do this. And the logic and reasoning about correctness gets really hairy quickly. Moreover leases recover slowly (after the timeout expires), so your entire system just got really slow in the face of failure.

Second, Erlang takes away static typing, which in turn enables hot code swapping and running two versions of the same code simultaneously. This means you can upgrade your code at runtime without stopping the system. This is how systems stay up for nine 9's or 32 msec of downtime/year. They are simply upgraded in place. Your C++ functions will have to be manually re-linked in order to be upgraded, and running two versions at the same time is not supported. Code upgrades require system downtime, and if you have a large cluster that cannot run more than one version of code at once, you'll need to take the entire cluster down at once. Ouch. And in the telecom world, not tolerable.

In addition Erlang takes away shared memory and shared shared garbage collection; each light weight process is garbage collected independently. This is a simple extension of the first point, but emphasizes that for true fault tolerance you need processes that are not interlocked in terms of dependencies. It means your GC pauses compared to java are tolerable (small instead of pausing a half-hour for a 8GB GC to complete) for big systems.

The actor model: Why is Erlang/OTP special? Could you use another language?

Related

Recent Posts