Why is processing a sorted array *slower* than an unsorted array? (Java's ArrayList.indexOf)
Solution 1:
It looks like caching / prefetching effect.
The clue is that you compare Doubles (objects), not doubles (primitives). When you allocate objects in one thread, they are typically allocated sequentially in memory. So when indexOf
scans a list, it goes through sequential memory addresses. This is good for CPU cache prefetching heuristics.
But after you sort the list, you still have to do the same number of memory lookups in average, but this time memory access will be in random order.
UPDATE
Here is the benchmark to prove that the order of allocated objects matters.
Benchmark (generator) (length) (postprocess) Mode Cnt Score Error Units
ListIndexOf.indexOf random 1000000 none avgt 10 1,243 ± 0,031 ms/op
ListIndexOf.indexOf random 1000000 sort avgt 10 6,496 ± 0,456 ms/op
ListIndexOf.indexOf random 1000000 shuffle avgt 10 6,485 ± 0,412 ms/op
ListIndexOf.indexOf sequential 1000000 none avgt 10 1,249 ± 0,053 ms/op
ListIndexOf.indexOf sequential 1000000 sort avgt 10 1,247 ± 0,037 ms/op
ListIndexOf.indexOf sequential 1000000 shuffle avgt 10 6,579 ± 0,448 ms/op
Solution 2:
I think we are seeing the effect of memory cache misses:
When you create the unsorted list
for (int i = 0; i < LIST_LENGTH; i++) {
list.add(r.nextDouble());
}
all the double are most likely allocated in a contiguous memory area. Iterating through this will produce few cache misses.
On the other hand in the sorted list the references point to memory in a chaotic manner.
Now if you create a sorted list with contiguous memory:
Collection.sort(list);
List<Double> list2 = new ArrayList<>();
for (int i = 0; i < LIST_LENGTH; i++) {
list2.add(new Double(list.get(i).doubleValue()));
}
this sorted list has the same performance than the original one (my timing).