Is there ever a reason not to use Java 8's parallelSort?
I was reading this question about the differences between Java's Arrays.sort and Arrays.parallelSort, which is a few years old by now. What surprised me is that only one answer mentioned any downside to using parallelSort; namely, that the speedup decreases if you're already using a lot of your CPU.

Assuming you're not in some sort of specialized, single-threaded environment, should one always choose parallelSort? Is there ever a reason not to? Note that one of the answers to the question above mentions that if there are fewer than 4096 elements, parallelSort simply calls sort anyway.
Solution 1:
There are some downsides to using Arrays.parallelSort:

- it uses ForkJoinPool.commonPool() and will fight with other functions that use it by default (e.g. parallel() on a stream) - see the sketch after this list
- the thread pool Arrays.parallelSort uses is not configurable, except globally, by increasing the common pool's thread count
- it performs worse on small data sets - and more often than not, arrays contain few elements; the JDK even acknowledges this, e.g. most ArrayList instances stay empty for their whole lifetime, which saves quite a bit of memory and CPU time by never instantiating backing arrays that would never be filled
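To make the first two points concrete, here is a minimal sketch (the class name and the parallelism value of 8 are my own choices for illustration): Arrays.parallelSort and parallel streams draw workers from the same ForkJoinPool.commonPool(), and the only knob for resizing that pool is a global system property.

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ThreadLocalRandom;

public class CommonPoolDemo {

    public static void main(String[] args) {
        // The only (global) knob: it must be set before the common pool
        // is first touched; "8" is an arbitrary value for illustration.
        System.setProperty(
                "java.util.concurrent.ForkJoinPool.common.parallelism", "8");
        System.out.println("common pool parallelism: "
                + ForkJoinPool.getCommonPoolParallelism());

        int[] data = ThreadLocalRandom.current().ints(1_000_000).toArray();

        // Both of these calls submit their tasks to the common pool,
        // so they compete for the same worker threads.
        Arrays.parallelSort(data);
        long evens = Arrays.stream(data).parallel()
                .filter(x -> x % 2 == 0)
                .count();
        System.out.println("evens: " + evens);
    }
}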
And another anecdotal scenario: say you implement some card game that needs sorting. It's embarrassingly easy to parallelize multiple game executions next to each other instead of parallelizing the sorting mechanism of one run, which may only take a fraction of the whole game loop. If the sort already hogs the common pool, you've lost an easy way to parallelize (e.g. when running the game in the context of genetic algorithms), as sketched below.
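A hypothetical sketch of that idea (the game logic here is a stand-in I made up): keep the per-run sort sequential and parallelize across the runs instead.

import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

public class ParallelGameRuns {

    // One simulated game run: deal a random "deck", sort it sequentially.
    static int playOneGame() {
        int[] deck = ThreadLocalRandom.current().ints(52).toArray();
        Arrays.sort(deck);      // sequential sort inside a single run
        return deck[0];         // stand-in for some game result
    }

    public static void main(String[] args) {
        // The outer loop is embarrassingly parallel; had the inner sort
        // been Arrays.parallelSort, both levels would contend for the
        // same common pool.
        int best = IntStream.range(0, 10_000)
                .parallel()
                .map(i -> playOneGame())
                .max()
                .getAsInt();
        System.out.println("best result: " + best);
    }
}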
But yes, if you happen to have large arrays and sorting is a substantial part of your application's runtime, use Arrays.parallelSort.
EDIT:
And even though Arrays.parallelSort switches to a normal sort if the given array has fewer than 4096 elements: it's all about showing intention - you want a parallel sort if possible, which carries a different meaning than just calling sort. And to be nitpicky: it does indeed perform worse on small arrays, as it has to make the additional check whether the array contains fewer than 4096 elements, plus some other checks about the common pool's thread count (whose overhead is of course negligible) :).
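A simplified rendering of that dispatch (the class and method names are hypothetical, and the threshold constant stands in for the package-private Arrays.MIN_ARRAY_SORT_GRAN; 4096 is the value cited in the question):

import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

public class DispatchSketch {

    // Stand-in for Arrays.MIN_ARRAY_SORT_GRAN.
    private static final int GRANULARITY = 4096;

    static void sortShowingIntention(int[] a) {
        if (a.length <= GRANULARITY
                || ForkJoinPool.getCommonPoolParallelism() == 1) {
            Arrays.sort(a);           // sequential fallback for small input
        } else {
            Arrays.parallelSort(a);   // fork/join path for large input
        }
    }

    public static void main(String[] args) {
        int[] small = {3, 1, 2};
        sortShowingIntention(small);
        System.out.println(Arrays.toString(small));
    }
}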
Solution 2:
This is not much different from the question of when to use stream() vs parallelStream() - it depends on how much data you have. When sorting 10 elements in parallel, of course, the majority of the time will be consumed by the threading framework under the hood (which is unspecified by the documentation), not by the sorting itself.

But you also have to wonder why such methods were introduced, IMO. Hardware is moving (has already moved?) towards many CPUs, not more GHz, so doing things in parallel is simply the normal course for any language that wants to still be alive in the next 20 years.
As to how much data you need for parallelSort to actually beat sort - knowing that we need at least MIN_ARRAY_SORT_GRAN + 1 elements to gain any potential benefit - writing a proper test to prove that, for this particular set-up and run, you need at least X numbers is not that complicated. You also have to take into account that some arrays might already be sorted (explained further below), while some might be totally unsorted (5, 4, 3, 2, 1 for example), which brings some penalties for the latter.
Taking some random data and making a test:
import java.util.Arrays;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@Warmup(iterations = 10)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
public class ParallelSort {

    public static void main(String[] args) throws Exception {
        Options opt = new OptionsBuilder()
                .include(ParallelSort.class.getName())
                .build();
        new Runner(opt).run();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public int[] parallel(ParallelSortExecutionPlan plan) {
        Arrays.parallelSort(plan.ints());
        return plan.ints();
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @Fork(1)
    public int[] nonParallel(ParallelSortExecutionPlan plan) {
        Arrays.sort(plan.ints());
        return plan.ints();
    }
}
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Benchmark)
public class ParallelSortExecutionPlan {

    @Param(value = {"10", "100", "1000", "10000", "100000", "1000000"})
    private int howMany;

    private int[] ints;

    // Re-create an unsorted array before every single benchmark call.
    @Setup(Level.Invocation)
    public void setUp() {
        ints = new int[howMany];
        for (int i = 0; i < howMany; ++i) {
            ints[i] = ThreadLocalRandom.current().nextInt();
        }
    }

    int[] ints() {
        return ints;
    }
}
Just notice that the second class uses @Setup(Level.Invocation) - if you know a bit of JMH, you'll know this is a very sharp tool; but I use it here because I want an unsorted array for each invocation of the method. Otherwise, if Level.Trial had been used, for example, only the first call would see an unsorted array - every subsequent call of the @Benchmark method would receive an already-sorted one. For the fun of it, you could change that single line to @Setup(Level.Trial) and look at the results; they will make very little sense.
Running this reveals:
Benchmark                 (howMany)  Mode  Cnt         Score   Error  Units
ParallelSort.nonParallel         10  avgt    2       128.847          ns/op
ParallelSort.parallel            10  avgt    2       116.656          ns/op
ParallelSort.nonParallel        100  avgt    2      1956.746          ns/op
ParallelSort.parallel           100  avgt    2      1963.335          ns/op
ParallelSort.nonParallel       1000  avgt    2     32162.611          ns/op
ParallelSort.parallel          1000  avgt    2     31716.915          ns/op
ParallelSort.nonParallel      10000  avgt    2    423531.663          ns/op
ParallelSort.parallel         10000  avgt    2    201802.609          ns/op
ParallelSort.nonParallel     100000  avgt    2   6503511.987          ns/op
ParallelSort.parallel        100000  avgt    2   1363169.661          ns/op
ParallelSort.nonParallel    1000000  avgt    2  69058738.586          ns/op
ParallelSort.parallel       1000000  avgt    2  13469112.930          ns/op
Pretty much the output I expected.
Solution 3:
No - at least not for small enough arrays, where the overhead of setting up the threads won't result in an observable speed-up.

The key is "small enough"; it won't be the same answer for all problems.

Dogma should never be applied, except in the case of this dogma rule - just like the only thing we should never tolerate is intolerance. There's a Popper paradox in there somewhere.