What parallel programming model do you recommend today to take advantage of the manycore processors of tomorrow?

If you were writing a new application from scratch today, and wanted it to scale to all the cores you could throw at it tomorrow, what parallel programming model/system/language/library would you choose? Why?

I am particularly interested in answers along these axes:

  1. Programmer productivity / ease of use (can mortals successfully use it?)
  2. Target application domain (what problems is it (not) good at?)
  3. Concurrency style (does it support tasks, pipelines, data parallelism, messages...?)
  4. Maintainability / future-proofing (will anybody still be using it in 20 years?)
  5. Performance (how does it scale on what kinds of hardware?)

I am being deliberately vague about the nature of the application, in the hope of getting general answers that are useful for a variety of applications.


Solution 1:

Multi-core programming may actually require more than one paradigm. Some current contenders are:

  1. MapReduce. This works well where a problem can be easily decomposed into parallel chunks.
  2. Nested Data Parallelism. This is similar to MapReduce, but it actually supports recursive decomposition of a problem, even when the recursive chunks are of irregular size (see the first sketch after this list). Look for NDP to be a big win in purely functional languages running on massively parallel but limited hardware (like GPUs).
  3. Software Transactional Memory. If you need traditional threads, STM makes them bearable (see the second sketch after this list). You pay a 50% performance hit in critical sections, but you can scale complex locking schemes to hundreds of processors without pain. This will not, however, work for distributed systems.
  4. Parallel object threads with messaging. This really clever model is used by Erlang. Each "object" becomes a lightweight thread, and objects communicate by asynchronous messages and pattern matching. It's basically true parallel OO. This has succeeded nicely in several real-world applications, and it works great for unreliable distributed systems.

Some of these paradigms give you maximum performance, but only work if the problem decomposes cleanly. Others sacrifice some performance, but allow a wider variety of algorithms. I suspect that some combination of the above will ultimately become a standard toolkit.
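
To make items 2 and 3 above concrete, here are two small sketches in Haskell (chosen simply because a stock GHC ships working versions of both ideas; the package names below are assumptions about such an install). The first is not nested data parallelism proper, which needs compiler support along the lines of Data Parallel Haskell or NESL, but a toy divide-and-conquer sum that shows the recursive-decomposition idea using par and pseq from the parallel package:

    -- A toy divide-and-conquer sum using par/pseq from the parallel package.
    import Control.Parallel (par, pseq)

    parSum :: [Int] -> Int
    parSum xs
      | length xs < 1000 = sum xs            -- small chunk: just compute it sequentially
      | otherwise        = left `par` (right `pseq` (left + right))
      where
        (as, bs) = splitAt (length xs `div` 2) xs
        left     = parSum as                 -- sparked for evaluation on another core
        right    = parSum bs

    main :: IO ()
    main = print (parSum [1 .. 100000])

The second is a minimal STM sketch using Control.Concurrent.STM from the stm package: a transfer between two shared accounts with no explicit locks, where the whole block either commits or retries as a unit:

    -- A minimal STM sketch: transfer between two shared accounts, no explicit locks.
    import Control.Concurrent.STM

    type Account = TVar Int

    transfer :: Account -> Account -> Int -> STM ()
    transfer from to amount = do
      balance <- readTVar from
      check (balance >= amount)          -- retries the transaction if funds are short
      writeTVar from (balance - amount)
      modifyTVar' to (+ amount)

    main :: IO ()
    main = do
      a <- newTVarIO 100
      b <- newTVarIO 0
      atomically (transfer a b 40)       -- the whole transfer commits (or retries) atomically
      readTVarIO a >>= print             -- 60
      readTVarIO b >>= print             -- 40

(Build with -threaded and run with +RTS -N for either of these to actually use more than one core.)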

Solution 2:

Two solutions I really like are join calculus (JoCaml, Polyphonic C#, Cω) and the actor model (Erlang, Scala, E, Io).
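
For anyone who hasn't seen the actor style, here is a minimal mailbox-style sketch, not in any of those languages but in Haskell with plain Control.Concurrent (it illustrates the shape of the model, not Erlang's or JoCaml's actual APIs): a worker thread owns its state and is driven only by asynchronous messages that it pattern-matches on.

    -- A mailbox-style sketch: a worker thread owns its state and is driven
    -- entirely by asynchronous messages it pattern-matches on.
    import Control.Concurrent

    data Msg = Add Int | Get (MVar Int)    -- Get carries a one-shot reply box

    counter :: Chan Msg -> Int -> IO ()
    counter inbox total = do
      msg <- readChan inbox                -- block until a message arrives
      case msg of
        Add n     -> counter inbox (total + n)
        Get reply -> putMVar reply total >> counter inbox total

    main :: IO ()
    main = do
      inbox <- newChan
      _ <- forkIO (counter inbox 0)        -- the "actor" is just a lightweight thread
      writeChan inbox (Add 3)
      writeChan inbox (Add 4)
      reply <- newEmptyMVar
      writeChan inbox (Get reply)
      takeMVar reply >>= print             -- 7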

I'm not particularly impressed with Software Transactional Memory. It just feels like it's only there to allow threads to cling on to life a little while longer, even though they should have died decades ago. However, it does have three major advantages:

  1. People understand transactions in databases
  2. There is already talk of transactional RAM hardware
  3. As much as we all wish them gone, threads are probably going to be the dominant concurrency model for the next couple of decades, sad as that may be. STM could significantly reduce the pain.

Solution 3:

The MapReduce/Hadoop paradigm is useful and relevant. Especially for people used to languages like Perl, the idea of mapping over an array and doing some action on each element should come fluidly and naturally, and MapReduce/Hadoop just takes it to the next stage: there is no reason each element of the array has to be processed on the same machine.
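
As a toy illustration of that shape (a sketch in Haskell, not Hadoop's actual API), here is a word count where the map phase handles each document independently, which is exactly the step a framework like Hadoop can farm out to other machines or cores, and the reduce phase merges the partial results:

    -- A toy word count in the map/reduce shape: map each document independently,
    -- then merge ("reduce") the partial counts.
    import qualified Data.Map.Strict as M

    mapper :: String -> M.Map String Int           -- map phase: one document at a time
    mapper doc = M.fromListWith (+) [(w, 1) | w <- words doc]

    reducer :: [M.Map String Int] -> M.Map String Int
    reducer = M.unionsWith (+)                     -- reduce phase: merge partial counts

    wordCount :: [String] -> M.Map String Int
    wordCount = reducer . map mapper               -- `map` here is the distributable step

    main :: IO ()
    main = print (wordCount ["to be or not to be", "be quick"])
    -- fromList [("be",3),("not",1),("or",1),("quick",1),("to",2)]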

In a sense it's also the more battle-tested option: Google uses MapReduce, plenty of people have been using Hadoop, and both have shown that the model works well for scaling applications across multiple machines on a network. And if you can scale across multiple machines on a network, you can scale across multiple cores in a single machine.