Scala: guidelines on return type - when to prefer Seq, Iterable, Traversable
When do you choose to type a given function's return type as Seq vs Iterable vs Traversable (or alternatively even deeper within Seq's hierarchy)?
How do you make that decision? We have a lot of code that returns Seqs by default (usually starting from results of a DB query and successive transformations). I tend to want to make the return types Traversable by default and Seq when specifically expecting a given order. But I don't have a strong justification for doing so.
I am perfectly familiar with the definition of each trait, so please don't answer with defining the terms.
This is a good question. You have to balance two concerns:
- (1) try to keep your API general, so you can change the implementation later
- (2) give the caller some useful operations to perform on the collection
Here (1) asks you to be as unspecific as possible about the type (e.g. Iterable over Seq), and (2) asks for the opposite.
Even if the return type is just Iterable, you can still return, let's say, a Vector, so if the caller wishes to gain extra power, it can just call .toSeq or .toIndexedSeq on it, and that operation is cheap for a Vector.
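As a minimal sketch of that point: the method below declares only Iterable but is backed by a Vector, and the caller upgrades to IndexedSeq when it needs indexed access. The names IterableReturn and activeUserIds are hypothetical. The "cheap" part relies on Vector handing back itself from toIndexedSeq rather than copying, which is my understanding of the standard library but should be treated as an assumption.

```scala
object IterableReturn {
  // Hypothetical API: the declared type is Iterable, the implementation is a Vector.
  def activeUserIds: Iterable[Int] = Vector(3, 1, 2)

  // Caller side: only Iterable operations are available on `ids`...
  val ids: Iterable[Int] = activeUserIds

  // ...until the caller asks for more. For a Vector this conversion is
  // assumed to be cheap (no element-by-element copy).
  val indexed: IndexedSeq[Int] = ids.toIndexedSeq

  // Indexed access is now part of the static type.
  val second: Int = indexed(1)
}
```

The implementation can later switch to, say, a Set or a lazily built collection without breaking callers, at the price of a possibly more expensive conversion on their side.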
As a measure of the balance, I would add a third point:
- (3) use a type that kind of reflects how the data is organised. E.g. when you can assume that the data does have a sequence, give Seq. If you can assume that no two equal objects can occur, give a Set. Etc.
Here are my rules of thumb:
- try to use only a small set of collections: Set, Map, Seq, IndexedSeq
- I often violate this previous rule, though, using List in favour of Seq. It allows the caller to do pattern matching with the cons extractors
- use immutable types only (e.g. collection.immutable.Set, collection.immutable.IndexedSeq)
- do not use concrete implementations (Vector), but the general type (IndexedSeq) which gives the same API
- if you are encapsulating a mutable structure, only return Iterator instances, the caller can then easily generate a strict structure, e.g. by calling toList on it
- if your API is small and clearly tuned towards "big data throughput", use IndexedSeq
Of course, this is my personal choice, but I hope it sounds sane.
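To illustrate the rule about encapsulating a mutable structure, here is a small sketch. EventLog and its members are hypothetical names invented for this example. It assumes the caller consumes the iterator before the log is mutated again, since iterating over a buffer that changes underneath is not something to rely on.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical class, for illustration only.
final class EventLog {
  // The mutable structure stays private.
  private val buffer = ArrayBuffer.empty[String]

  def record(event: String): Unit = {
    buffer += event
    ()
  }

  // Expose the contents only as an Iterator, so the ArrayBuffer never leaks.
  def events: Iterator[String] = buffer.iterator
}

object EventLogDemo {
  val log = new EventLog
  log.record("start")
  log.record("stop")

  // The caller turns the iterator into a strict, immutable structure.
  val snapshot: List[String] = log.events.toList
}
```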
- Use Seq by default everywhere.
- Use IndexedSeq when you need to access by index.
- Use anything else only in special circumstances.
These are the "common-sense" guidelines. They are simple, practical, and work well in practice while balancing principles and performance. The principles are:
- Use a type that reflects how the data is organized (thanks OP and ziggystar).
- Use interface types in both method arguments and return types. Both inputs and return types of an API benefit from the flexibility of generality.
Seq satisfies both principles. As described in http://docs.scala-lang.org/overviews/collections/seqs.html:
A sequence is a kind of iterable that has a [finite] length and whose elements have fixed index positions, starting from 0.
90% of the time, your data is a Seq.
Other notes:
- List is an implementation type, so you shouldn't use it in an API. A Vector for instance can't be used as a List without going through a conversion.
- Iterable doesn't define length. Iterable abstracts across finite sequences and potentially infinite streams. Most of the time one is dealing with finite sequences so you "have a length," and Seq reflects that. Frequently you won't actually make use of length. But it's needed often enough, and is easy to provide, so use Seq.
Drawbacks:
There are some slight downsides to these "common-sense" conventions.
- You can't use List cons pattern matching, i.e. case head :: tail => .... You can use :+ and +: as described here. Importantly, however, matching on Nil still works as described in Scala: Pattern matching Seq[Nothing].
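A short sketch of what that looks like in practice, using the standard +: and :+ extractors on a plain Seq. The object and method names (SeqMatching, describe, lastOf) are made up for the example.

```scala
object SeqMatching {
  // Works for any Seq (List, Vector, ...), not only for List.
  def describe(xs: Seq[Int]): String = xs match {
    case Nil          => "empty"                          // matches any empty Seq
    case head +: tail => s"head=$head, rest=${tail.size}" // prepend extractor
  }

  def lastOf(xs: Seq[Int]): Option[Int] = xs match {
    case _ :+ last => Some(last) // append extractor
    case _         => None
  }
}
```

Matching on Nil works here because the pattern is checked by equality, and two empty sequences compare equal regardless of their concrete type.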
Footnotes:
- I'm not discussing Map here because the question, sensibly, doesn't ask about it.
- I'm only addressing immutable collections here.
- The guidelines I suggest are consistent with Should I use List[A] or Seq[A] or something else?
Make your method's return type as specific as possible. Then if the caller wants to keep it as a SuperSpecializedHashMap or type it as a GenTraversableOnce, they can. This is why the compiler infers the most specific type by default.
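A rough sketch of the difference, with hypothetical names (InferredTypes, inferred, widened): without an annotation the method's type is inferred from its body, and the caller decides whether to keep or widen it. With a general annotation, the specific type is no longer visible to the caller.

```scala
object InferredTypes {
  // No annotation: the return type is inferred as Vector[Int].
  def inferred = Vector(1, 2, 3)

  // Annotated with a general type: callers only see Seq[Int].
  def widened: Seq[Int] = Vector(1, 2, 3)

  // The caller may keep the specific type...
  val keepSpecific: Vector[Int] = inferred

  // ...or widen it at the call site.
  val widenedByCaller: Seq[Int] = inferred

  // Going the other way does not compile without a conversion:
  // val narrowed: Vector[Int] = widened
}
```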
A rule of thumb I follow is, depending on implementation, to make the return types as specific as possible and the types of arguments as general as possible. It's an easy-to-follow rule, and it gives you consistent guarantees about the properties of each type while leaving maximum freedom.
Say you have a function implementation which just traverses a data structure with methods like map, filter or fold, i.e. those that are implemented in the trait Traversable. You can expect it to perform equally well on any type of input collection, be it a List, Vector, HashSet or even a HashMap, so your input argument should be specified as Traversable[T]. The choice of output type of the function should only depend on its implementation: in this case it should be Traversable too. If, however, your function forces this data structure to some more specific type with methods like toList, toSeq or toSet, you should specify the appropriate type. Notice the consistency between the implementation and the return type?
If your function accesses the elements of the input by index, the input should be specified as IndexedSeq, as it is the most general type that gives you guarantees of an efficient implementation of the apply method.
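The rule from the last two paragraphs could be sketched as follows. All names are hypothetical. Iterable stands in for Traversable here, on the assumption that the code should also compile on newer Scala versions where Traversable is deprecated.

```scala
object ArgsAndReturns {
  // Only traverses the input, so the argument is as general as possible.
  def totalLength(words: Iterable[String]): Int =
    words.foldLeft(0)(_ + _.length)

  // The implementation forces the result with toSet, so the return type says Set.
  def distinctInitials(words: Iterable[String]): Set[Char] =
    words.filter(_.nonEmpty).map(_.head).toSet

  // Accesses by index, so the argument asks for IndexedSeq.
  def middle[A](xs: IndexedSeq[A]): Option[A] =
    if (xs.isEmpty) None else Some(xs(xs.length / 2))
}
```

A List can be passed to the first two methods but not to middle, which nudges the caller to convert explicitly instead of silently paying for linear-time indexing.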
For abstract members the same rule applies, with the only difference that you should specify the return types based on how you plan to use them rather than on the implementation, so they will most often be more general than in an implementation. The categorical choices Seq, Set or Map are the most expected.
Following this rule, you protect yourself from very common bottlenecks, for instance when items get appended to a List or contains gets called on a Seq instead of a Set, yet your program retains a nice degree of freedom and stays consistent in its choice of types.