What are the problems with an ADT encoding that associates types with data constructors? (Such as Scala.)

In Scala, algebraic data types are encoded as sealed one-level type hierarchies. Example:

-- Haskell
data Positioning a = Append
                   | AppendIf (a -> Bool)
                   | Explicit ([a] -> [a]) 
// Scala
sealed trait Positioning[A]
case object Append extends Positioning[Nothing]
case class AppendIf[A](condition: A => Boolean) extends Positioning[A]
case class Explicit[A](f: Seq[A] => Seq[A]) extends Positioning[A]

With case classes and case objects, Scala generates a bunch of things like equals, hashCode, unapply (used by pattern matching) etc that brings us many of the key properties and features of traditional ADTs.

There is one key difference though – In Scala, "data constructors" have their own types. Compare the following two for example (Copied from the respective REPLs).

// Scala

scala> :t Append
Append.type

scala> :t AppendIf[Int](Function const true)
AppendIf[Int]

-- Haskell

haskell> :t Append
Append :: Positioning a

haskell> :t AppendIf (const True)
AppendIf (const True) :: Positioning a

I have always considered the Scala variation to be on the advantageous side.

After all, there is no loss of type information. AppendIf[Int] for instance is a subtype of Positioning[Int].

scala> val subtypeProof = implicitly[AppendIf[Int] <:< Positioning[Int]]
subtypeProof: <:<[AppendIf[Int],Positioning[Int]] = <function1>

In fact, you get an additional compile time invariant about the value. (Could we call this a limited version of dependent typing?)

This can be put to good use – Once you know what data constructor was used to create a value, the corresponding type can be propagated through rest of the flow to add more type safety. For example, Play JSON, which uses this Scala encoding, will only allow you to extract fields from JsObject, not from any arbitrary JsValue.

scala> import play.api.libs.json._
import play.api.libs.json._

scala> val obj = Json.obj("key" -> 3)
obj: play.api.libs.json.JsObject = {"key":3}

scala> obj.fields
res0: Seq[(String, play.api.libs.json.JsValue)] = ArrayBuffer((key,3))

scala> val arr = Json.arr(3, 4)
arr: play.api.libs.json.JsArray = [3,4]

scala> arr.fields
<console>:15: error: value fields is not a member of play.api.libs.json.JsArray
              arr.fields
                  ^

scala> val jsons = Set(obj, arr)
jsons: scala.collection.immutable.Set[Product with Serializable with play.api.libs.json.JsValue] = Set({"key":3}, [3,4])

In Haskell, fields would probably have type JsValue -> Set (String, JsValue). Which means it will fail at runtime for a JsArray etc. This problem also manifests in the form of well known partial record accessors.

The view that Scala's treatment of data constructors is wrong has been expressed numerous times – on Twitter, mailing lists, IRC, SO etc. Unfortunately I don't have links to any of those, except for a couple - this answer by Travis Brown, and Argonaut, a purely functional JSON library for Scala.

Argonaut consciously takes the Haskell approach (by privateing case classes, and providing data constructors manually). You can see that the problem I mentioned with Haskell encoding exists with Argonaut as well. (Except it uses Option to indicate partiality.)

scala> import argonaut._, Argonaut._
import argonaut._
import Argonaut._

scala> val obj = Json.obj("k" := 3)
obj: argonaut.Json = {"k":3}

scala> obj.obj.map(_.toList)
res6: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = Some(List((k,3)))

scala> val arr = Json.array(jNumber(3), jNumber(4))
arr: argonaut.Json = [3,4]

scala> arr.obj.map(_.toList)
res7: Option[List[(argonaut.Json.JsonField, argonaut.Json)]] = None

I have been pondering this for quite some time, but still do not understand what makes Scala's encoding wrong. Sure it hampers type inference at times, but that does not seem like a strong enough reason to decree it wrong. What am I missing?


To the best of my knowledge, there are two reasons why Scala's idiomatic encoding of case classes can be bad: type inference, and type specificity. The former is a matter of syntactic convenience, while the latter is a matter of increased scope of reasoning.

The subtyping issue is relatively easy to illustrate:

val x = Some(42)

The type of x turns out to be Some[Int], which is probably not what you wanted. You can generate similar issues in other, more problematic areas:

sealed trait ADT
case class Case1(x: Int) extends ADT
case class Case2(x: String) extends ADT

val xs = List(Case1(42), Case1(12))

The type of xs is List[Case1]. This is basically guaranteed to be not what you want. In order to get around this issue, containers like List need to be covariant in their type parameter. Unfortunately, covariance introduces a whole bucket of issues, and in fact degrades the soundness of certain constructs (e.g. Scalaz compromises on its Monad type and several monad transformers by allowing covariant containers, despite the fact that it is unsound to do so).

So, encoding ADTs in this fashion has a somewhat viral effect on your code. Not only do you need to deal with subtyping in the ADT itself, but every container you ever write needs to take into account the fact that you're landing on subtypes of your ADT at inopportune moments.

The second reason not to encode your ADTs using public case classes is to avoid cluttering up your type space with "non-types". From a certain perspective, ADT cases are not really types: they are data. If you reason about ADTs in this fashion (which is not wrong!), then having first-class types for each of your ADT cases increases the set of things you need to carry in your mind to reason about your code.

For example, consider the ADT algebra from above. If you want to reason about code which uses this ADT, you need to be constantly thinking about "well, what if this type is Case1?" That just not a question anyone really needs to ask, since Case1 is data. It's a tag for a particular coproduct case. That's all.

Personally, I don't care much about any of the above. I mean, the unsoundness issues with covariance are real, but I generally just prefer to make my containers invariant and instruct my users to "suck it up and annotate your types". It's inconvenient and it's dumb, but I find it preferable to the alternative, which is a lot of boilerplate folds and "lower-case" data constructors.

As a wildcard, a third potential disadvantage to this sort of type specificity is it encourages (or rather, allows) a more "object-oriented" style where you put case-specific functions on the individual ADT types. I think there is very little question that mixing your metaphors (case classes vs subtype polymorphism) in this way is a recipe for bad. However, whether or not this outcome is the fault of typed cases is sort of an open question.