When does a Java intermediate operation hit every element?

I'm confused by the way Java streams work, particularly as regard to short-circuiting. For an example of what confuses me, I cooked up the following example:

List<Integer> list = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
Optional<Integer> res = list.stream()
        .map(x -> {
            System.out.println("first map: " + x);
            return 2*x;
        })
        .sorted((a,b)-> {
            System.out.println("sorting " + a + " : " + b);
            return a - b;
        })
        .map(x -> {
            System.out.println("second map: " +  x);
            return 2*x;
        })
        .findAny();
System.out.println("found " + res.get());

with output

first map: 1
first map: 2
first map: 3
first map: 4
first map: 5
first map: 6
first map: 7
first map: 8
first map: 9
first map: 10
sorting 4 : 2
sorting 6 : 4
sorting 8 : 6
sorting 10 : 8
sorting 12 : 10
sorting 14 : 12
sorting 16 : 14
sorting 18 : 16
sorting 20 : 18
second map: 2
found 4

When executed, this code demonstrates that calling sorted in the middle forces the first map to apply to every element in the stream. However, the second map does not get applied to every element in the stream since findAny short-circuits it.

So basically my question is: what are the rules here? Why is Java smart enough to know it doesn't need to call the second map on every element of the stream but not smart enough to know that findAny at the end doesn't require it to actually sort anything.

I tried "reading the manual" on this, and it just isn't clear to me.


That happens because streams are processing elements from the source lazily one at a time.

Each operation occurs only when it's needed.

In the case of sorting, stream dumps all the data from the source to memory because there's no way to sort elements peeking then one by one.

map --> sorted --> map --> findAny

The First-time map operation gets applied to all elements because it precedes the sorting operation.

After sorting is done, elements will be processed one at a time.

Terminal operation findAny is a short-circuit operation. And hence only the very first of all the elements will be peeked by the terminal operation, the second map operation needs to be applied only once.


Why is Java smart enough to know it doesn't need to call the second map on every element of the stream but not smart enough to know that findAny at the end doesn't require it to actually sort anything

Well, I guess because until now findAny() in the case of single-threaded execution still behaves exactly like findFirst(), which will require to do sorting in order to find the correct result.

You may try to play with sequential streams and find out that findAny() and findFirst() yield the same result.

I can't give you the answer to why it behaves in such away.

Java-doc says with caution about findAny()

The behavior of this operation is explicitly nondeterministic

Only if stream is being processed in parallel behavior of findAny() differs from findFirst().

findAny() could return an element that is peeked by any thread, while findFirst() guarantees to respect the initial order while providing the result.


https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/stream/Stream.html


This is indeed not obvious because the actual reason is rather historical. The first release version of Java 8 had a Stream implementation which did consider the unordered nature of findAny, at least in some cases.

Not in the case of a skippable sorted but in more cases than correct. This has been discussed in

  • Is this a bug in Files.lines(), or am I misunderstanding something about parallel streams? and
  • Stream.skip behavior with unordered terminal operation

In the latter Q&A we find a comment from Brian Goetz, one of the architects:

After some analysis we chose to back out back-propagation entirely. The only place where this pays off is optimizing away a sort; if you have a pipeline that sorts into an unordered terminal op, that's probably a user error anyway.

So the current state of the reference implementation is that the unordered nature of the terminal operation is not used for optimizing preceding intermediate operations.

This doesn’t imply that such optimization wouldn’t be valid; the example of skipping a sort when the subsequent operations do not care about the order is explicitly mentioned as valid, though Brian Goetz suggests that such a scenario might be a mistake on the developer’s side anyway.

I have a slightly different point of view, as I think it would be a useful scenario if you have Stream returning method, which may return a sorted Stream, while operations chained by the caller determine whether this sorting is relevant for the final result or not.

But whether such an optimization happens or not does not affect the correctness of the result. Note that this does not only apply to unordered operations.

When you have

Optional<Integer> res = IntStream.rangeClosed(1, 10)
    .mapToObj(x -> {
        System.out.println("first map: " + x);
        return 2*x;
    })
    .sorted((a,b)-> {
        System.out.println("sorting " + a + " : " + b);
        return Integer.compare(a, b);
    })
    .map(x -> {
        System.out.println("second map: " +  x);
        return 2*x;
    })
    .findFirst();

it would be perfectly valid to transform the operation to (the equivalent of)

Optional<Integer> res = IntStream.rangeClosed(1, 10)
    .mapToObj(x -> {
        System.out.println("first map: " + x);
        return 2*x;
    })
    .min((a,b)-> {
        System.out.println("sorting " + a + " : " + b);
        return Integer.compare(a, b);
    })
    .map(x -> {
        System.out.println("second map: " +  x);
        return 2*x;
    });

which would be more efficient and still return the correct result for the terminal operation while the evaluation order of the first mapping function and the comparator is different.

But the correct result is all that matters. You should never assume a specific evaluation order at all. You don’t find a specification of the evaluation order in the manual, because it has been left out intentionally. The presence or absence of optimizations can change at any time. If you want a dramatic example, replace your findAny() with count() and compare your example’s behavior in Java 8 and Java 9 (or newer).

// run with Java 8 and Java 9+ and compare...
long count = IntStream.rangeClosed(1, 10)
    .mapToObj(x -> {
        System.out.println("first map: " + x);
        return 2*x;
    })
    .sorted((a,b)-> {
        System.out.println("sorting " + a + " : " + b);
        return Integer.compare(a, b);
    })
    .map(x -> {
        System.out.println("second map: " +  x);
        return 2*x;
    })
    .count();
System.out.println("found " + count + " elements");