Java8 streams sequential and parallel execution produce different results?

Running the following stream example in Java8:

    System.out.println(Stream
        .of("a", "b", "c", "d", "e", "f")
        .reduce("", (s1, s2) -> s1 + "/" + s2)
    );

yields:

/a/b/c/d/e/f

Which is - of course - no surprise. Due to http://docs.oracle.com/javase/8/docs/api/index.html?overview-summary.html it shouldn't matter whether the stream is executed sequentially or parallel:

Except for operations identified as explicitly nondeterministic, such as findAny(), whether a stream executes sequentially or in parallel should not change the result of the computation.

AFAIK reduce() is deterministic and (s1, s2) -> s1 + "/" + s2 is associative, so that adding parallel() should yield the same result:

    System.out.println(Stream
            .of("a", "b", "c", "d", "e", "f")
            .parallel()
            .reduce("", (s1, s2) -> s1 + "/" + s2)
    );

However the result on my machine is:

/a//b//c//d//e//f

What's wrong here?

BTW: using (the preferred) .collect(Collectors.joining("/")) instead of reduce(...) yields the same result a/b/c/d/e/f for sequential and parallel execution.

JVM details:

java.specification.version: 1.8
java.version: 1.8.0_31
java.vm.version: 25.31-b07
java.runtime.version: 1.8.0_31-b13

Solution 1:

From reduce's documentation:

The identity value must be an identity for the accumulator function. This means that for all t, accumulator.apply(identity, t) is equal to t.

Which is not true in your case - "" and "a" creates "/a".

I have extracted the accumulator function and added a printout to show what happens:

BinaryOperator<String> accumulator = (s1, s2) -> {
    System.out.println("joining \"" + s1 + "\" and \"" + s2 + "\"");
    return s1 + "/" + s2;
};
System.out.println(Stream
                .of("a", "b", "c", "d", "e", "f")
                .parallel()
                .reduce("", accumulator)
);

This is example output (it differs between runs):

joining "" and "d"
joining "" and "f"
joining "" and "b"
joining "" and "a"
joining "" and "c"
joining "" and "e"
joining "/b" and "/c"
joining "/e" and "/f"
joining "/a" and "/b//c"
joining "/d" and "/e//f"
joining "/a//b//c" and "/d//e//f"
/a//b//c//d//e//f

You can add an if statement to your function to handle empty string separately:

System.out.println(Stream
        .of("a", "b", "c", "d", "e", "f")
        .parallel()
        .reduce((s1, s2) -> s1.isEmpty()? s2 : s1 + "/" + s2)
);

As Marko Topolnik noticed, checking s2 is not required as accumulator doesn't have to be commutative function.

Solution 2:

To add to other answer,

You might want to use Mutable reduction, the doc specify that doing something like

String concatenated = strings.reduce("", String::concat)

Will give bad performance result.

We would get the desired result, and it would even work in parallel. However, we might not be happy about the performance! Such an implementation would do a great deal of string copying, and the run time would be O(n^2) in the number of characters. A more performant approach would be to accumulate the results into a StringBuilder, which is a mutable container for accumulating strings. We can use the same technique to parallelize mutable reduction as we do with ordinary reduction.

So you should use a StringBuilder instead.

Java8 streams sequential and parallel execution produce different results?

Solution 1:

Solution 2:

Related

Recent Posts