Difference between `data` and `newtype` in Haskell

What is the difference when I write this?

data Book = Book Int Int

versus

newtype Book = Book (Int, Int) -- "Book Int Int" is syntactically invalid

Solution 1:

Great question!

There are several key differences.

Representation

  • A newtype guarantees that your data will have exactly the same representation at runtime as the type that you wrap.
  • A data declaration, by contrast, introduces a brand new data structure at runtime.

So the key point here is that the constructor for the newtype is guaranteed to be erased at compile time.
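One way to see the shared representation from inside the language is Data.Coerce from base. The following is a minimal sketch, not part of the original answer; the helper names unBook, toPairs and fromPairs are made up for illustration.

```haskell
import Data.Coerce (coerce)

-- Same newtype as in the question.
newtype Book = Book (Int, Int)

-- Unwrapping by pattern match; there is no Book constructor left at
-- runtime, so this does no real work.
unBook :: Book -> (Int, Int)
unBook (Book pair) = pair

-- coerce converts between a newtype and the type it wraps without
-- touching the data. It also works underneath other type constructors,
-- so a whole list is converted without being traversed.
toPairs :: [Book] -> [(Int, Int)]
toPairs = coerce

fromPairs :: [(Int, Int)] -> [Book]
fromPairs = coerce
```

If Book were declared with data instead, the two coerce definitions would be rejected by the type checker, because a data type does not share a representation with its field.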

Examples:

  • data Book = Book Int Int

[Diagram: runtime representation of the data version]

  • newtype Book = Book (Int, Int)

[Diagram: runtime representation of the newtype version]

Note how it has exactly the same representation as an (Int, Int), since the Book constructor is erased.

  • data Book = Book (Int, Int)

[Diagram: runtime representation of the data version wrapping a tuple]

This version has an additional Book constructor that is not present in the newtype.

  • data Book = Book {-# UNPACK #-} !Int {-# UNPACK #-} !Int

[Diagram: runtime representation of the data version with unpacked fields]

No pointers! The two Int fields are unboxed word-sized fields in the Book constructor.

Algebraic data types

Because of this need to erase the constructor, a newtype must have exactly one constructor, and that constructor must have exactly one field. There's no notion of "algebraic" newtypes. That is, you can't write a newtype equivalent of, say,

data Maybe a = Nothing
             | Just a

since it has more than one constructor. Nor can you write the following, since the constructor has more than one field:

newtype Book = Book Int Int
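The restriction applies to the newtype's own constructor, not to the type being wrapped. As a rough sketch (OptionalPages, firstOf and pagesOrZero are hypothetical names, not from the question), the single field can be a tuple or even a type that itself has several constructors:

```haskell
-- Two values packed into the single field via a tuple, as in the question.
newtype Book = Book (Int, Int)

-- The wrapped type may have several constructors of its own
-- (Maybe has two); the newtype still has one constructor and one field.
newtype OptionalPages = OptionalPages (Maybe Int)

-- Pattern matching looks through the newtype into the wrapped value.
firstOf :: Book -> Int
firstOf (Book (a, _)) = a

pagesOrZero :: OptionalPages -> Int
pagesOrZero (OptionalPages Nothing)  = 0
pagesOrZero (OptionalPages (Just n)) = n
```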

Strictness

The fact that the constructor is erased leads to some very subtle differences in strictness between data and newtype. In particular, data introduces a type that is "lifted", meaning, essentially, that it has an additional way to evaluate to a bottom value. Since there's no additional constructor at runtime with newtype, this property doesn't hold.

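With data Book = Book (Int, Int), the extra pointer from the Book constructor to the (,) constructor gives us a place to put a bottom value: Book undefined is a different value from undefined. With the newtype, the two are the same.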

As a result, newtype and data have slightly different strictness properties, as explained in the "Newtype" article on the Haskell wiki.
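The difference shows up when pattern matching on an undefined value. Here is a small sketch using illustrative types D and N; the isBottom helper is an assumption added for the demonstration and simply reports whether forcing a value throws.

```haskell
import Control.Exception (SomeException, catch, evaluate)

data    D = D Int   -- lifted: undefined and D undefined are distinct
newtype N = N Int   -- no runtime constructor: N undefined is undefined

-- Both functions match on the constructor and ignore the field.
matchD :: D -> Bool
matchD (D _) = True   -- must evaluate the argument to find the D constructor

matchN :: N -> Bool
matchN (N _) = True   -- nothing to evaluate: the N constructor is erased

-- Illustrative helper: True if forcing the value to weak head normal
-- form throws an exception, which is how undefined shows up at runtime.
isBottom :: a -> IO Bool
isBottom x = (evaluate x >> pure False) `catch` handler
  where
    handler :: SomeException -> IO Bool
    handler _ = pure True
```

In GHCi, isBottom (matchD undefined) returns True, because the match has to force the argument. isBottom (matchN undefined) returns False, because matchN undefined is simply True. Both matchD (D undefined) and matchN (N undefined) are True, since neither function looks at the Int.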

Unboxing

It doesn't make sense to unbox the components of a newtype, since there's no constructor. By contrast, it is perfectly reasonable to write:

data T = T {-# UNPACK #-} !Int

This yields a runtime object with a T constructor and an Int# component. With newtype, you just get a bare Int.


References:

  • "Newtype" on the Haskell wiki
  • Norman Ramsey's answer about the strictness properties