Stream.skip behavior with unordered terminal operation
Solution 1:
Recall that the goal of stream flags (ORDERED, SORTED, SIZED, DISTINCT) is to enable operations to avoid doing unnecessary work. Examples of optimizations that involve stream flags are:
- If we know the stream is already sorted, then
sorted()
is a no-op; - If we know the size of the stream, we can pre-allocate a correct-sized array in
toArray()
, avoiding a copy; - If we know that the input has no meaningful encounter order, we need not take extra steps to preserve encounter order.
Each stage of a pipeline has a set of stream flags. Intermediate operations can inject, preserve, or clear stream flags. For example, filtering preserves sorted-ness / distinct-ness but not sized-ness; mapping preserves sized-ness but not sorted-ness or distinct-ness. Sorting injects sorted-ness. The treatment of flags for intermediate operations is fairly straightforward, because all decisions are local.
The treatment of flags for terminal operations is more subtle. ORDERED is the most relevant flag for terminal ops. And if a terminal op is UNORDERED, then we do back-propagate the unordered-ness.
Why do we do this? Well, consider this pipeline:
set.stream()
.sorted()
.forEach(System.out::println);
Since forEach
is not constrained to operate in order, the work of sorting the list is completely wasted effort. So we back-propagate this information (until we hit a short-circuiting operation, such as limit
), so as not to lose this optimization opportunity. Similarly, we can use an optimized implementation of distinct
on unordered streams.
Is this behavior intended or it's a bug?
Yes :) The back-propagation is intended, as it is a useful optimization that should not produce incorrect results. However, the bug part is that we are propagating past a previous skip
, which we shouldn't. So the back-propagation of the UNORDERED flag is overly aggressive, and that's a bug. We'll post a bug.
If yes is it documented somewhere?
It should be just an implementation detail; if it were correctly implemented, you wouldn't notice (except that your streams are faster.)