Sets, Functors and Eq confusion
A discussion came up at work recently about Sets, which in Scala support the zip
method and how this can lead to bugs, e.g.
scala> val words = Set("one", "two", "three")
scala> words zip (words map (_.length))
res1: Set[(java.lang.String, Int)] = Set((one,3), (two,5))
I think it's pretty clear that Set
s shouldn't support a zip
operation, since the elements are not ordered. However, it was suggested that the problem is that Set
isn't really a functor, and shouldn't have a map
method. Certainly, you can get yourself into trouble by mapping over a set. Switching to Haskell now,
data AlwaysEqual a = Wrap { unWrap :: a }
instance Eq (AlwaysEqual a) where
_ == _ = True
instance Ord (AlwaysEqual a) where
compare _ _ = EQ
and now in ghci
ghci> import Data.Set as Set
ghci> let nums = Set.fromList [1, 2, 3]
ghci> Set.map unWrap $ Set.map Wrap $ nums
fromList [3]
ghci> Set.map (unWrap . Wrap) nums
fromList [1, 2, 3]
So Set
fails to satisfy the functor law
fmap f . fmap g = fmap (f . g)
It can be argued that this is not a failing of the map
operation on Set
s, but a failing of the Eq
instance that we defined, because it doesn't respect the substitution law, namely that for two instances of Eq
on A and B and a mapping f : A -> B
then
if x == y (on A) then f x == f y (on B)
which doesn't hold for AlwaysEqual
(e.g. consider f = unWrap
).
Is the substition law a sensible law for the Eq
type that we should try to respect? Certainly, other equality laws are respected by our AlwaysEqual
type (symmetry, transitivity and reflexivity are trivially satisfied) so substitution is the only place that we can get into trouble.
To me, substition seems like a very desirable property for the Eq
class. On the other hand, some comments on a recent Reddit discussion include
"Substitution seems stronger than necessary, and is basically quotienting the type, putting requirements on every function using the type."
-- godofpumpkins
"I also really don't want substitution/congruence since there are many legitimate uses for values which we want to equate but are somehow distinguishable."
-- sclv
"Substitution only holds for structural equality, but nothing insists
Eq
is structural."-- edwardkmett
These three are all pretty well known in the Haskell community, so I'd be hesitant to go against them and insist on substitability for my Eq
types!
Another argument against Set
being a Functor
- it is widely accepted that being a Functor
allows you to transform the "elements" of a "collection" while preserving the shape. For example, this quote on the Haskell wiki (note that Traversable
is a generalization of Functor
)
"Where
Foldable
gives you the ability to go through the structure processing the elements but throwing away the shape,Traversable
allows you to do that whilst preserving the shape and, e.g., putting new values in.""
Traversable
is about preserving the structure exactly as-is."
and in Real World Haskell
"...[A] functor must preserve shape. The structure of a collection should not be affected by a functor; only the values that it contains should change."
Clearly, any functor instance for Set
has the possibility to change the shape, by reducing the number of elements in the set.
But it seems as though Set
s really should be functors (ignoring the Ord
requirement for the moment - I see that as an artificial restriction imposed by our desire to work efficiently with sets, not an absolute requirement for any set. For example, sets of functions are a perfectly sensible thing to consider. In any case, Oleg has shown how to write efficient Functor and Monad instances for Set
that don't require an Ord
constraint). There are just too many nice uses for them (the same is true for the non-existant Monad
instance).
Can anyone clear up this mess? Should Set
be a Functor
? If so, what does one do about the potential for breaking the Functor laws? What should the laws for Eq
be, and how do they interact with the laws for Functor
and the Set
instance in particular?
Another argument against
Set
being aFunctor
- it is widely accepted that being aFunctor
allows you to transform the "elements" of a "collection" while preserving the shape. [...] Clearly, any functor instance for Set has the possibility to change the shape, by reducing the number of elements in the set.
I'm afraid that this is a case of taking the "shape" analogy as a defining condition when it is not. Mathematically speaking, there is such a thing as the power set functor. From Wikipedia:
Power sets: The power set functor P : Set → Set maps each set to its power set and each function f : X → Y to the map which sends U ⊆ X to its image f(U) ⊆ Y.
The function P(f) (fmap f
in the power set functor) does not preserve the size of its argument set, yet this is nonetheless a functor.
If you want an ill-considered intuitive analogy, we could say this: in a structure like a list, each element "cares" about its relationship to the other elements, and would be "offended" if a false functor were to break that relationship. But a set is the limiting case: a structure whose elements are indifferent to each other, so there is very little you can do to "offend" them; the only thing is if a false functor were to map a set that contains that element to a result that doesn't include its "voice."
(Ok, I'll shut up now...)
EDIT: I truncated the following bits when I quoted you at the top of my answer:
For example, this quote on the Haskell wiki (note that
Traversable
is a generalization ofFunctor
)"Where
Foldable
gives you the ability to go through the structure processing the elements but throwing away the shape,Traversable
allows you to do that whilst preserving the shape and, e.g., putting new values in.""
Traversable
is about preserving the structure exactly as-is."
Here's I'd remark that Traversable
is a kind of specialized Functor
, not a "generalization" of it. One of the key facts about any Traversable
(or, actually, about Foldable
, which Traversable
extends) is that it requires that the elements of any structure have a linear order—you can turn any Traversable
into a list of its elements (with Foldable.toList
).
Another, less obvious fact about Traversable
is that the following functions exist (adapted from Gibbons & Oliveira, "The Essence of the Iterator Pattern"):
-- | A "shape" is a Traversable structure with "no content,"
-- i.e., () at all locations.
type Shape t = t ()
-- | "Contents" without a shape are lists of elements.
type Contents a = [a]
shape :: Traversable t => t a -> Shape t
shape = fmap (const ())
contents :: Traversable t => t a -> Contents a
contents = Foldable.toList
-- | This function reconstructs any Traversable from its Shape and
-- Contents. Law:
--
-- > reassemble (shape xs) (contents xs) == Just xs
--
-- See Gibbons & Oliveira for implementation. Or do it as an exercise.
-- Hint: use the State monad...
--
reassemble :: Traversable t => Shape t -> Contents a -> Maybe (t a)
A Traversable
instance for sets would violate the proposed law, because all non-empty sets would have the same Shape
—the set whose Contents
is [()]
. From this it should be easy to prove that whenever you try to reassemble
a set you would only ever get the empty set or a singleton back.
Lesson? Traversable
"preserves shape" in a very specific, stronger sense than Functor
does.
Set is "just" a functor (not a Functor
) from the subcategory of Hask where Eq
is "nice" (i.e. the subcategory where congruence, substitution, holds). If constraint kinds were around from way back then perhaps set would be a Functor
of some kind.
Well, Set can be treated as a covariant functor, and as a contravariant functor; usually it's a covariant functor. And for it to behave regarding equality one has to make sure that whatever the implementation, it does.
Regarding Set.zip - it is nonsense. As well as Set.head (you have it in Scala). It should not exist.