'type family' vs 'data family', in brief?
Solution 1:
(Boosting useful information from comments into an answer.)
Standalone vs In-Class declaration
Two syntactically different ways to declare a type family and/or data family, that are semantically equivalent:
standalone:
type family Foo data family Bar
or as part of a typeclass:
class C where
type Foo
data Bar
both declare a type family, but inside a typeclass the family
part is implied by the class
context, so GHC/Haskell abbreviates the declaration.
"New type" vs "Type Synonym" / "Type Alias"
data family F
creates a new type, similar to how data F = ...
creates a new type.
type family F
does not create a new type, similar to how type F = Bar Baz
doesn't create a new type (it just creates an alias/synonym to an existing type).
Example of non-injectivity of type family
An example (slightly modified) from Data.MonoTraversable.Element
:
import Data.ByteString as S
import Data.ByteString.Lazy as L
-- Declare a family of type synonyms, called `Element`
-- `Element` has kind `* -> *`; it takes one parameter, which we call `container`
type family Element container
-- ByteString is a container for Word8, so...
-- The Element of a `S.ByteString` is a `Word8`
type instance Element S.ByteString = Word8
-- and the Element of a `L.ByteString` is also `Word8`
type instance Element L.ByteString = Word8
In a type family, the right-side of equations Word8
names an existing type; the things are the left-side creates new synonyms: Element S.ByteString
and Element L.ByteString
Having a synonym means we can interchange Element Data.ByteString
with Word8
:
-- `w` is a Word8....
>let w = 0::Word8
-- ... and also an `Element L.ByteString`
>:t w :: Element L.ByteString
w :: Element L.ByteString :: Word8
-- ... and also an `Element S.ByteString`
>:t w :: Element S.ByteString
w :: Element S.ByteString :: Word8
-- but not an `Int`
>:t w :: Int
Couldn't match expected type `Int' with actual type `Word8'
These type synonyms are "non-injective" ("one-way"), and therefore non-invertible.
-- As before, `Word8` matches `Element L.ByteString` ...
>(0::Word8)::(Element L.ByteString)
-- .. but GHC can't infer which `a` is the `Element` of (`L.ByteString` or `S.ByteString` ?):
>(w)::(Element a)
Couldn't match expected type `Element a'
with actual type `Element a0'
NB: `Element' is a type function, and may not be injective
The type variable `a0' is ambiguous
Worse, GHC can't even solve non-ambiguous cases!:
type instance Element Int = Bool
> True::(Element a)
> NB: `Element' is a type function, and may not be injective
Note the use of "may not be"! I think GHC is being conservative, and refusing to check whether the Element
truly is injective. (Perhaps because a programmer could add another type instance
later, after importing a pre-compiled module, adding ambiguity.
Example of injectivity of data family
In contrast: In a data family, each right-hand side contains a unique constructor , so the definitions are injective ("reversible") equations.
-- Declare a list-like data family
data family XList a
-- Declare a list-like instance for Char
data instance XList Char = XCons Char (XList Char) | XNil
-- Declare a number-like instance for ()
data instance XList () = XListUnit Int
-- ERROR: "Multiple declarations of `XListUnit'"
data instance XList () = XListUnit Bool
-- (Note: GHCI accepts this; the new declaration just replaces the previous one.)
With data family
, seeing the constructor name on the right (XCons
, or XListUnit
)
is enough to let the type-inferencer know we must be working with XList ()
not an XList Char
. Since constructor names are unique, these defintions are injective/reversible.
If type
"just" declares a synonym, why is it semantically useful?
Usually, type
synonyms are just abbreviations, but type
family synonyms have added power: They can make a simple type (kind *
) become a synonym of a "type with kind * -> *
applied to an argument":
type instance F A = B
makes B
match F a
. This is used, for example, in Data.MonoTraversable
to make a simple type Word8
match functions of the type Element a -> a
(Element
is defined above).
For example, (a bit silly), suppose we have a version of const
that only works with "related" types:
> class Const a where constE :: (Element a) -> a -> (Element a)
> instance Const S.ByteString where constE = const
> constE (0::Word8) undefined
ERROR: Couldn't match expected type `Word8' with actual type `Element a0'
-- By providing a match `a = S.ByteString`, `Word8` matches `(Element S.ByteString)`
> constE (0::Word8) (undefined::S.ByteString)
0
-- impossible, since `Char` cannot match `Element a` under our current definitions.
> constE 'x' undefined
Solution 2:
I don't think any decision tree or Venn diagram will exist because the applications for type and data families are pretty wide.
Generally you've already highlighted the key design differences and I would agree with your takeaway to first see if you can get away with data family
.
For me the key point is that each instance of a data family
creates a new type, which does substantially limit the power because you can't do what is often the most natural thing and make an existing type be the instance.
For example the GMapKey
example on the Haskell wiki page on "indexed types" is a reasonably natural fit for data families:
class GMapKey k where
data GMap k :: * -> *
empty :: GMap k v
lookup :: k -> GMap k v -> Maybe v
insert :: k -> v -> GMap k v -> GMap k v
The key type of the map k
is the argument to the data family and the actual map type is the result of the data family (GMap k
) . As a user of a GMapKey
instance you're probably quite happy for the GMap k
type to be abstract to you and just manipulate it via the generic map operations in the type class.
In contrast the Collects
example on the same wiki page is sort of the opposite:
class Collects ce where
type Elem ce
empty :: ce
insert :: Elem ce -> ce -> ce
member :: Elem ce -> ce -> Bool
toList :: ce -> [Elem ce]
The argument type is the collection and the result type is the element of the collection. In general a user is going to want to operate on those elements directly using the normal operations on that type. For example the collection might be IntSet
and the element might be Int
. Having the Int
be wrapped up in some other type would be quite inconvenient.
Note - these two examples are with type classes and therefore don't need the family
keyword as declaring a type inside a type class implies it must be a family. Exactly the same considerations apply as for standalone families though, it's just a question of how the abstraction is organised.