In what sense is the IO Monad pure?

I've had the IO monad described to me as a State monad where the state is "the real world". The proponents of this approach to IO argue that this makes IO operations pure, as in referentially transparent. Why is that? From my perspective it appears that code inside the IO monad have plenty of observable side effects. Also, isn't it possible to describe pretty much any non-pure function like a function of the real world? For example, can't we think of, say, C's malloc as being a function that takes a RealWorld and an Int and returns a pointer and a RealWorld, only just like in the IO monad the RealWorld is implicit?

Note: I know what a monad is and how it's used. Please don't respond with a link to a random monad tutorial unless it specifically adresses my question.


I think the best explanation I've heard was actually fairly recently on SO. IO Foo is a recipe for creating a Foo. Another common, more literal, way of saying this is that it is a "program that produces a Foo". It can be executed (many times) to create a Foo or die trying. The execution of the recipe/program is what we ultimately want (otherwise, why write one?), but the thing that is represented by an IO action in our code is the recipe itself.

That recipe is a pure value, in the same exact sense that a String is a pure value. Recipes can be combined and manipulated in interesting, sometimes astonishing, ways, but the many ways these recipes can be combined (except for the blatantly non-pure unsafePerformIO, unsafeCoerce, etc.) are all completely referentially transparent, deterministic, and all that nice stuff. The resulting recipe depends in absolutely no way whatsoever on the state of anything other than the recipes that it was built up from.


Also, isn't it possible to describe pretty much any non-pure function like a function of the real world? For example, can't we think of, say, C's malloc as being a function that takes a RealWorld and an Int and returns a pointer and a RealWorld, only just like in the IO monad the RealWorld is implicit?

For sure ...

The whole idea of functional programming is to describe programs as a combination of small, independent calculations building up bigger computations.

Having these independent calculations, you'll have lots of benefits, reaching from concise programs to efficient and efficiently parallelizable codes, laziness up to the the rigorous guarantee that control flows as intended - with no chance of interference or corruption of arbitrary data.

Now - in some cases (like IO), we need impure code. Calculations involving such operations cannot be independent - they could mutate arbitrary data of another computation.

The point is - Haskell is always pure, IO doesn't change this.

So, our impure, non-independent codes have to get a common dependency - we have to pass a RealWorld. So whatever stateful computation we want to run, we have to pass this RealWorld thing to apply our changes to - and whatever other stateful computation wants to see or make changes has to know the RealWorld too.

Whether this is done explicitly or implicitly through the IO monad is irrelevant. You build up a Haskell program as a giant computation that transforms data, and one part of this data is the RealWorld.

Once the initial main :: IO () gets called when your program is run with the current real world as a parameter, this real world gets carried through all impure calculations involved, just as data would in a State. That's what monadic >>= (bind) takes care of.

And where the RealWorld doesn't get (as in pure computations or without any >>=-ing to main), there is no chance of doing anything with it. And where it does get, that happened by purely functional passing of an (implicit) parameter. That's why

let foo = putStrLn "AAARGH" in 42

does absolutely nothing - and why the IO monad - like anything else - is pure. What happens inside this code can of course be impure, but it's all caught inside, with no chance of interfering with non-connected computations.


Suppose we have something like:

animatePowBoomWhenHearNoiseInMicrophone :: TimeDiff -> Sample -> IO ()
animatePowBoomWhenHearNoiseInMicrophone
    levelWeightedAverageHalfLife levelThreshord = ...

programA :: IO ()
programA = animatePowBoomWhenHearNoiseInMicrophone 3 10000

programB :: IO ()
programB = animatePowBoomWhenHearNoiseInMicrophone 3 10000

Here's a point of view:

animatePowBoomWhenHearNoiseInMicrophone is a pure function in the sense that its results for same input, programA and programB, are exactly the same. You can do main = programA or main = programB and it would be exactly the same.

animatePowBoomWhenHearNoiseInMicrophone is a function receiving two arguments and resulting in a description of a program. The Haskell runtime can execute this description if you set main to it or otherwise include it in main via binding.

What is IO? IO is a DSL for describing imperative programs, encoded in "pure-haskell" data structures and functions.

"complete-haskell" aka GHC is an implementation of both "pure-haskell", and an imperative implementation of an IO decoder/executer.


It quite simply comes down to extensional equality:

If you were to call getLine twice, then both calls would return an IO String which would look exactly the same on the outside each time. If you were to write a function to take 2 IO Strings and return a Bool to signal a detected difference between them both, it would not be possible to detect any difference from any observable properties. It could not ask any other function whether they are equal and any attempt at using >>= must also return something in IO which all are equall externally.


I'll let Martin Odersky answer this

The IO monad does not make a function pure. It just makes it obvious that it's impure.

Sounds clear enough.