What is () in Haskell, exactly?
I'm reading Learn You a Haskell, and in the monad chapters, it seems to me that () is being treated as a sort of "null" for every type. When I check the type of () in GHCi, I get
>> :t ()
() :: ()
which is an extremely confusing statement. It seems that () is a type all to itself. I'm confused as to how it fits into the language, and how it seems to be able to stand for any type.
tl;dr: () does not add a "null" value to every type, hell no; () is a "dull" value in a type of its own: ().
Let me step back from the question a moment and address a common source of confusion. A key thing to absorb when learning Haskell is the distinction between its expression language and its type language. You're probably aware that the two are kept separate. But that allows the same symbol to be used in both, and that is what is going on here. There are simple textual cues to tell you which language you're looking at. You don't need to parse the whole language to detect these cues.
The top level of a Haskell module lives, by default, in the expression language. You define functions by writing equations between expressions. But when you see foo :: bar in the expression language, it means that foo is an expression and bar is its type. So when you read () :: (), you're seeing a statement which relates the () in the expression language with the () in the type language. The two () symbols mean different things, because they are not in the same language. This replication often causes confusion for beginners, until the expression/type language separation installs itself in their subconscious, at which point it becomes helpfully mnemonic.
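As a small sketch of that textual cue (the names answer, unit and ignore are made up for illustration), whatever follows :: is read in the type language, and everything else is read in the expression language:

```haskell
-- Left of `::` is an expression name; right of `::` is a type.
answer :: Int
answer = 42

-- Here the () after `::` is the type, and the () after `=` is the value.
unit :: ()
unit = ()

-- In the signature, () is a type; in the equation, () is a pattern
-- matching the single value of that type.
ignore :: () -> Int
ignore () = answer
```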
The keyword data introduces a new datatype declaration, involving a careful mixture of the expression and type languages, as it says first what the new type is, and secondly what its values are.
data TyCon tyvar ... tyvar = ValCon1 type ... type | ... | ValConn type ... type
In such a declaration, type constructor TyCon is being added to the type language and the ValCon value constructors are being added to the expression language (and its pattern sublanguage). In a data declaration, the things which stand in argument places for the ValCons tell you the types given to the arguments when that ValCon is used in expressions. For example,
data Tree a = Leaf | Node (Tree a) a (Tree a)
declares a type constructor Tree for binary tree types storing elements of type a at nodes, whose values are given by value constructors Leaf and Node. I like to colour type constructors (Tree) blue and value constructors (Leaf, Node) red. There should be no blue in expressions and (unless you're using advanced features) no red in types. The built-in type Bool could be declared,
data Bool = True | False
adding blue Bool to the type language, and red True and False to the expression language. Sadly, my markdown-fu is inadequate to the task of adding the colours to this post, so you'll just have to learn to add the colours in your head.
The "unit" type uses () as a special symbol, but it works as if declared
data () = () -- the left () is blue; the right () is red
meaning that a notionally blue () is a type constructor in the type language, but that a notionally red () is a value constructor in the expression language, and indeed () :: (). [It is not the only example of such a pun. The types of larger tuples follow the same pattern: pair syntax is as if given by
data (a, b) = (a, b)
adding (,) to both type and expression languages. But I digress.]
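Since the special syntax cannot be declared by hand, one way to see the pun in action is to write the same thing with ordinary names. The following is only a sketch: Unit, Pair and the conversion functions are hypothetical names, not anything from the standard library.

```haskell
-- An ordinary-syntax analogue of the unit type: the left Unit is a
-- type constructor, the right Unit is a value constructor.
data Unit = Unit
  deriving (Eq, Show)

-- The same pun for pairs, again with an ordinary name.
data Pair a b = Pair a b
  deriving (Eq, Show)

-- Converting to and from the built-in versions loses nothing,
-- because each side has exactly the same values to offer.
toUnit :: () -> Unit
toUnit () = Unit

fromUnit :: Unit -> ()
fromUnit Unit = ()

toPair :: (a, b) -> Pair a b
toPair (x, y) = Pair x y
```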
So the type (), often pronounced "Unit", is a type containing one value worth speaking of: that value is also written () but in the expression language, and is sometimes pronounced "void". A type with only one value is not very interesting. A value of type () contributes zero bits of information: you already know what it must be. So, while there is nothing special about type () to indicate side effects, it often shows up as the value component in a monadic type. Monadic operations tend to have types which look like
val-in-type-1 -> ... -> val-in-type-n -> effect-monad val-out-type
where the return type is a type application: the (type) function tells you which effects are possible and the (type) argument tells you what sort of value is produced by the operation. For example
put :: s -> State s ()
which is read (because application associates to the left ["as we all did in the sixties", Roger Hindley]) as
put :: s -> (State s) ()
has one value input type s, the effect-monad State s, and the value output type (). When you see () as a value output type, that just means "this operation is used only for its effect; the value delivered is uninteresting". Similarly
putStr :: String -> IO ()
delivers a string to stdout but does not return anything exciting.
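Here is a small sketch of the same contrast between effect-only operations and value-producing ones. State lives outside the base library, so this example swaps in IORef from base, whose writeIORef has the similarly shaped type IORef a -> a -> IO (). The names reset, current and demo are assumptions for illustration.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Used only for its effect, so the value output type is ().
reset :: IORef Int -> IO ()
reset ref = writeIORef ref 0

-- Produces an interesting value, so the value output type is Int.
current :: IORef Int -> IO Int
current ref = readIORef ref

-- Sequence the two: the () from reset is simply discarded.
demo :: IO Int
demo = do
  ref <- newIORef 5
  reset ref
  current ref
```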
The () type is also useful as an element type for container-like structures, where it indicates that the data consists just of a shape, with no interesting payload. For example, if Tree is declared as above, then Tree () is the type of binary tree shapes, storing nothing of interest at nodes. Similarly [()] is the type of lists of dull elements, and if there is nothing of interest in a list's elements, then the only information it contributes is its length.
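A minimal sketch of "shape without payload", repeating the Tree declaration so that the block stands alone; shape, toLength and fromLength are names invented for this example.

```haskell
data Tree a = Leaf | Node (Tree a) a (Tree a)
  deriving (Eq, Show)

-- Forget the payload at every node, keeping only the shape.
shape :: Tree a -> Tree ()
shape Leaf = Leaf
shape (Node l _ r) = Node (shape l) () (shape r)

-- A list of dull elements carries the same information as its length.
toLength :: [()] -> Int
toLength = length

-- For non-negative n, this is the inverse of toLength.
fromLength :: Int -> [()]
fromLength n = replicate n ()
```

Two trees with different payloads but the same layout have equal shapes, which is one way to check that Tree () really does record only the structure.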
To sum up, () is a type. Its one value, (), happens to have the same name, but that's ok because the type and expression languages are separate. It's useful to have a type representing "no information" because, in context (e.g., of a monad or a container), it tells you that only the context is interesting.
The () type can be thought of as a zero-element tuple. It's a type that can only have one value, and thus it's used where you need to have a type, but you don't actually need to convey any information. Here are a couple of uses for this.
Monadic things like IO and State have a return value, as well as performing side-effects. Sometimes the only point of the operation is to perform a side-effect, like writing to the screen or storing some state. For writing to the screen, putStrLn must have type String -> IO ?, since IO always has to have some return type, but here there's nothing useful to return. So what type should we return? We could say Int, and always return 0, but that's misleading. So we return (), the type that has only one value (and thus no useful information), to indicate that there's nothing useful coming back.
It's sometimes useful to have a type which can have no useful values. Consider if you'd implemented a type Map k v which maps keys of type k to values of type v. Then you want to implement a Set, which is really similar to a map except that you don't need the value part, just the keys. In a language like Java you might use booleans as the dummy value type, but really you just want a type that has no useful values. So you could say type Set k = Map k ().
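To sketch the idea without depending on any particular map library, the following uses a deliberately naive association list as a stand-in for Map k v. All the names here (Map, insertMap, lookupMap, insertSet, memberSet) are hypothetical, and a real implementation would use a proper balanced tree.

```haskell
-- A deliberately naive association-list map, standing in for a real Map k v.
type Map k v = [(k, v)]

-- Insert a binding, replacing any existing binding for the same key.
insertMap :: Eq k => k -> v -> Map k v -> Map k v
insertMap k v m = (k, v) : filter ((/= k) . fst) m

lookupMap :: Eq k => k -> Map k v -> Maybe v
lookupMap = lookup

-- A set is a map whose values carry no information.
type Set k = Map k ()

insertSet :: Eq k => k -> Set k -> Set k
insertSet k = insertMap k ()

-- A key is a member exactly when looking it up finds the dull value.
memberSet :: Eq k => k -> Set k -> Bool
memberSet k s = lookupMap k s == Just ()
```

Every set operation is just the corresponding map operation with () supplied as the value, so no separate data structure is needed.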
It should be noted that () is not particularly magic. If you want you can store it in a variable and do a pattern match on it (although there's not much point):
main = do
  x <- putStrLn "Hello"
  case x of
    () -> putStrLn "The only value..."