How atomic are GHC's thunks?
Solution 1:
From the blog post @Lambdageek linked, the GHC Commentary and the GHC User's Guide I piece together the following:
GHC tries to prevent reevaluating thunks, but because true locking between threads is expensive, and thunks are usually pure and so harmless to reevaluate, it normally does so in a sloppy manner, with a small chance of duplicating work anyhow.
The method it uses for avoiding work is to replace thunks by a blackhole, a special marker that tells other threads (or sometimes, the thread itself; that's how <<loop>>
detection happens) that the thunk is being evaluated.
Given this, there are at least three options:
By default, it uses "lazy blackholing", where this is only done before the thread is about to pause. It then "walks" its stack and creates "true" blackholes for new thunks, using locking to ensure each thunk only gets one thread blackholing it, and aborting its own evaluation if it discovers another thread has already blackholed a thunk. This is cheaper as it doesn't need to consider thunks whose evaluation time is so short that it fits completely between two pauses.
With the
-feager-blackholing-flag
, blackholes are instead created as soon as a thunk starts evaluating, and the User's Guide recommends this if you are doing a lot of parallelism. However, because locking on every thunk would be too expensive, these blackholes are cheaper "eager" ones, which are not synchronized with other threads (although other threads can still see them if there's not a race condition). Only when the thread pauses do these get turned into "true" blackholes.A third case, which the blog post was particularly about, is used for special functions like
unsafePerformIO
, for which it's harmful to evaluate a thunk more than once. In this case, the thread uses a "true" blackhole with real locking, but creates it immediately, by inserting an artificial thread pause before the real evaluation.
Solution 2:
To summarize the article linked to in the comments: the thunks in GHC are not strictly atomic between multiple threads: it is possible in a race condition for multiple threads to evaluate the same thunk, duplicating work. But, this is not a big deal in practice because:
- Guaranteed purity means it never matters whether a thunk is evaluated twice, in terms of program semantics.
unsafePerformIO
is a case where that might be an issue, but it turns out thatunsafePerformIO
is careful to avoid re-running the same IO action. -
Because all values are thunks, most thunks turn out to be quite small, so occasionally duplicating the work to force one is not a big deal in terms of program speed. You might imagine that it's expensive to duplicate, for example,
last [1,2..10000000]
because that whole computation is expensive. But of course really the outermost thunk just resolves to another thunk, something like:case [1,2..10000000] of [x] -> x (_:xs) -> last xs [] -> error "empty list"
and if two threads duplicate the work to turn the call to
last
into a use ofcase
, that's pretty cheap in the grand scheme of things.
Solution 3:
Yes, sometimes the same thunk can be evaluated by multiple threads. GHC runtime tries to minimize the probability of duplicated work, so in practice it is rare. Please see "Haskell on a Shared-Memory Multiprocessor" paper for low level details, mostly the "Lock-free thunk evaluation" section. (I'd recommend the paper for every professional haskell developer btw.)