Iterator versus Stream of Java 8
To take advantage of the wide range of query methods included in `java.util.stream` of JDK 8, I am attempting to design domain models where getters of relationships with * multiplicity (zero or more instances) return a `Stream<T>`, instead of an `Iterable<T>` or `Iterator<T>`.
My doubt is whether there is any additional overhead incurred by a `Stream<T>` in comparison to an `Iterator<T>`. So, is there any disadvantage in committing my domain model to a `Stream<T>`? Or instead, should I always return an `Iterator<T>` or `Iterable<T>`, and leave it to the end user to decide whether to use a stream or not, by converting that iterator with `StreamUtils`?
Note that returning a `Collection` is not a valid option, because in this case most of the relationships are lazy and of unknown size.
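To make the trade-off concrete, here is a minimal sketch of the two API shapes under discussion. The `Order` class and its `items` relationship are hypothetical; the `toStream` helper shows the standard JDK way (via `StreamSupport` and `Spliterators`, no third-party `StreamUtils` required) for a caller to adapt an `Iterator` into a `Stream` themselves:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical domain class with a *-multiplicity "items" relationship.
class Order {
    private final List<String> items;

    Order(List<String> items) { this.items = items; }

    // Option A: expose the relationship directly as a (single-use) Stream.
    Stream<String> items() { return items.stream(); }

    // Option B: expose an Iterator and let the caller adapt it if desired.
    Iterator<String> itemIterator() { return items.iterator(); }

    // Standard JDK adaptation from Iterator to Stream; characteristics 0,
    // non-parallel, size unknown -- which matches lazy relationships.
    static <T> Stream<T> toStream(Iterator<T> it) {
        return StreamSupport.stream(
                Spliterators.spliteratorUnknownSize(it, 0), false);
    }
}
```

With option B the caller writes `Order.toStream(order.itemIterator()).filter(…)`, which works but pushes boilerplate onto every call site.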
Solution 1:
There's lots of performance advice here, but sadly much of it is guesswork, and little of it points to the real performance considerations.
@Holger gets it right by pointing out that we should resist the seemingly overwhelming tendency to let the performance tail wag the API design dog.
While there are a zillion considerations that can make a stream slower than, the same as, or faster than some other form of traversal in any given case, there are some factors that give streams a performance advantage where it counts -- on big data sets.
There is some additional fixed startup overhead of creating a `Stream` compared to creating an `Iterator` -- a few more objects before you start calculating. If your data set is large, it doesn't matter; it's a small startup cost amortized over a lot of computation. (And if your data set is small, it probably also doesn't matter -- because if your program is operating on small data sets, performance is generally not your #1 concern either.) Where this does matter is when going parallel; any time spent setting up the pipeline goes into the serial fraction of Amdahl's law; if you look at the implementation, we work hard to keep the object count down during stream setup, but I'd be happy to find ways to reduce it as that has a direct effect on the breakeven data set size where parallel starts to win over sequential.
But, more important than the fixed startup cost is the per-element access cost. Here, streams actually win -- and often win big -- which some may find surprising. (In our performance tests, we routinely see stream pipelines which can outperform their for-loop over `Collection` counterparts.) And, there's a simple explanation for this: `Spliterator` has fundamentally lower per-element access costs than `Iterator`, even sequentially. There are several reasons for this.
- The Iterator protocol is fundamentally less efficient. It requires calling two methods to get each element. Further, because Iterators must be robust to things like calling `next()` without `hasNext()`, or `hasNext()` multiple times without `next()`, both of these methods generally have to do some defensive coding (and generally more statefulness and branching), which adds to inefficiency. On the other hand, even the slow way to traverse a spliterator (`tryAdvance`) doesn't have this burden. (It's even worse for concurrent data structures, because the `next`/`hasNext` duality is fundamentally racy, and `Iterator` implementations have to do more work to defend against concurrent modifications than do `Spliterator` implementations.)
- `Spliterator` further offers a "fast-path" iteration -- `forEachRemaining` -- which can be used most of the time (reduction, `forEach`), further reducing the overhead of the iteration code that mediates access to the data structure internals. This also tends to inline very well, which in turn increases the effectiveness of other optimizations such as code motion, bounds check elimination, etc.
- Further, traversal via `Spliterator` tends to have many fewer heap writes than with `Iterator`. With `Iterator`, every element causes one or more heap writes (unless the `Iterator` can be scalarized via escape analysis and its fields hoisted into registers). Among other issues, this causes GC card mark activity, leading to cache line contention for the card marks. On the other hand, `Spliterators` tend to have less state, and industrial-strength `forEachRemaining` implementations tend to defer writing anything to the heap until the end of the traversal, instead storing iteration state in locals which naturally map to registers, resulting in reduced memory bus activity.
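The protocol difference described above can be sketched side by side. This is an illustrative comparison (not a benchmark): the iterator path makes two method calls per element, while the spliterator path makes one bulk `forEachRemaining` call; the `long[]` box here is a simplification, since industrial-strength implementations keep their running state in locals:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

public class TraversalDemo {
    // Iterator protocol: two method calls (hasNext + next) per element.
    static long sumWithIterator(List<Integer> list) {
        long sum = 0;
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {  // call 1 per element
            sum += it.next();   // call 2 per element
        }
        return sum;
    }

    // Spliterator fast path: a single bulk call; the internal loop runs
    // directly over the backing data structure.
    static long sumWithSpliterator(List<Integer> list) {
        long[] sum = {0};  // one-element box so the lambda can accumulate
        Spliterator<Integer> sp = list.spliterator();
        sp.forEachRemaining(n -> sum[0] += n);
        return sum[0];
    }
}
```

Both methods produce the same result; the interesting difference is in how many calls and how much defensive state each one needs per element.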
Summary: don't worry, be happy. `Spliterator` is a better `Iterator`, even without parallelism. (They're also generally just easier to write and harder to get wrong.)
Solution 2:
Let’s compare the common operation of iterating over all elements, assuming that the source is an `ArrayList`. Then, there are three standard ways to achieve this:
- `Collection.forEach`

  ```java
  final E[] elementData = (E[]) this.elementData;
  final int size = this.size;
  for (int i = 0; modCount == expectedModCount && i < size; i++) {
      action.accept(elementData[i]);
  }
  ```
- `Iterator.forEachRemaining`

  ```java
  final Object[] elementData = ArrayList.this.elementData;
  if (i >= elementData.length) {
      throw new ConcurrentModificationException();
  }
  while (i != size && modCount == expectedModCount) {
      consumer.accept((E) elementData[i++]);
  }
  ```
- `Stream.forEach`, which will end up calling `Spliterator.forEachRemaining`

  ```java
  if ((i = index) >= 0 && (index = hi) <= a.length) {
      for (; i < hi; ++i) {
          @SuppressWarnings("unchecked") E e = (E) a[i];
          action.accept(e);
      }
      if (lst.modCount == mc)
          return;
  }
  ```
As you can see, the inner loop of the implementation code, where these operations end up, is basically the same: iterating over indices, directly reading the array, and passing the element to the `Consumer`.
Similar things apply to all standard collections of the JRE; all of them have adapted implementations for all of these ways, even if you are using a read-only wrapper. In the latter case, the `Stream` API would even win slightly: `Collection.forEach` has to be called on the read-only view in order to delegate to the original collection’s `forEach`. Similarly, the iterator has to be wrapped to protect against attempts to invoke the `remove()` method. In contrast, `spliterator()` can directly return the original collection’s `Spliterator`, as it has no modification support. Thus, the stream of a read-only view is exactly the same as the stream of the original collection.
All these differences, however, are hardly noticeable when measuring real-life performance since, as said, the inner loop, which is the most performance-relevant part, is the same in all cases.
The question is which conclusion to draw from that. You can still return a read-only wrapper view of the original collection, as the caller may still invoke `stream().forEach(…)` to directly iterate in the context of the original collection.
Since the performance isn’t really different, you should rather focus on the higher-level design, as discussed in “Should I return a Collection or a Stream?”