.toArray(new MyClass[0]) or .toArray(new MyClass[myList.size()])?
Assuming I have an ArrayList
ArrayList<MyClass> myList;
And I want to call toArray, is there a performance reason to use
MyClass[] arr = myList.toArray(new MyClass[myList.size()]);
over
MyClass[] arr = myList.toArray(new MyClass[0]);
?
I prefer the second style, since it's less verbose, and I assumed that the compiler will make sure the empty array doesn't really get created, but I've been wondering if that's true.
Of course, in 99% of the cases it doesn't make a difference one way or the other, but I'd like to keep a consistent style between my normal code and my optimized inner loops...
Solution 1:
Counterintuitively, the fastest version, on Hotspot 8, is:
MyClass[] arr = myList.toArray(new MyClass[0]);
I have run a micro benchmark using jmh the results and code are below, showing that the version with an empty array consistently outperforms the version with a presized array. Note that if you can reuse an existing array of the correct size, the result may be different.
Benchmark results (score in microseconds, smaller = better):
Benchmark (n) Mode Samples Score Error Units
c.a.p.SO29378922.preSize 1 avgt 30 0.025 ▒ 0.001 us/op
c.a.p.SO29378922.preSize 100 avgt 30 0.155 ▒ 0.004 us/op
c.a.p.SO29378922.preSize 1000 avgt 30 1.512 ▒ 0.031 us/op
c.a.p.SO29378922.preSize 5000 avgt 30 6.884 ▒ 0.130 us/op
c.a.p.SO29378922.preSize 10000 avgt 30 13.147 ▒ 0.199 us/op
c.a.p.SO29378922.preSize 100000 avgt 30 159.977 ▒ 5.292 us/op
c.a.p.SO29378922.resize 1 avgt 30 0.019 ▒ 0.000 us/op
c.a.p.SO29378922.resize 100 avgt 30 0.133 ▒ 0.003 us/op
c.a.p.SO29378922.resize 1000 avgt 30 1.075 ▒ 0.022 us/op
c.a.p.SO29378922.resize 5000 avgt 30 5.318 ▒ 0.121 us/op
c.a.p.SO29378922.resize 10000 avgt 30 10.652 ▒ 0.227 us/op
c.a.p.SO29378922.resize 100000 avgt 30 139.692 ▒ 8.957 us/op
For reference, the code:
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
public class SO29378922 {
@Param({"1", "100", "1000", "5000", "10000", "100000"}) int n;
private final List<Integer> list = new ArrayList<>();
@Setup public void populateList() {
for (int i = 0; i < n; i++) list.add(0);
}
@Benchmark public Integer[] preSize() {
return list.toArray(new Integer[n]);
}
@Benchmark public Integer[] resize() {
return list.toArray(new Integer[0]);
}
}
You can find similar results, full analysis, and discussion in the blog post Arrays of Wisdom of the Ancients. To summarize: the JVM and JIT compiler contains several optimizations that enable it to cheaply create and initialize a new correctly sized array, and those optimizations can not be used if you create the array yourself.
Solution 2:
As of ArrayList in Java 5, the array will be filled already if it has the right size (or is bigger). Consequently
MyClass[] arr = myList.toArray(new MyClass[myList.size()]);
will create one array object, fill it and return it to "arr". On the other hand
MyClass[] arr = myList.toArray(new MyClass[0]);
will create two arrays. The second one is an array of MyClass with length 0. So there is an object creation for an object that will be thrown away immediately. As far as the source code suggests the compiler / JIT cannot optimize this one so that it is not created. Additionally, using the zero-length object results in casting(s) within the toArray() - method.
See the source of ArrayList.toArray():
public <T> T[] toArray(T[] a) {
if (a.length < size)
// Make a new array of a's runtime type, but my contents:
return (T[]) Arrays.copyOf(elementData, size, a.getClass());
System.arraycopy(elementData, 0, a, 0, size);
if (a.length > size)
a[size] = null;
return a;
}
Use the first method so that only one object is created and avoid (implicit but nevertheless expensive) castings.
Solution 3:
From JetBrains Intellij Idea inspection:
There are two styles to convert a collection to an array: either using a pre-sized array (like c.toArray(new String[c.size()])) or using an empty array (like c.toArray(new String[0]).
In older Java versions using pre-sized array was recommended, as the reflection call which is necessary to create an array of proper size was quite slow. However since late updates of OpenJDK 6 this call was intrinsified, making the performance of the empty array version the same and sometimes even better, compared to the pre-sized version. Also passing pre-sized array is dangerous for a concurrent or synchronized collection as a data race is possible between the size and toArray call which may result in extra nulls at the end of the array, if the collection was concurrently shrunk during the operation.
This inspection allows to follow the uniform style: either using an empty array (which is recommended in modern Java) or using a pre-sized array (which might be faster in older Java versions or non-HotSpot based JVMs).
Solution 4:
Modern JVMs optimise reflective array construction in this case, so the performance difference is tiny. Naming the collection twice in such boilerplate code is not a great idea, so I'd avoid the first method. Another advantage of the second is that it works with synchronised and concurrent collections. If you want to make optimisation, reuse the empty array (empty arrays are immutable and can be shared), or use a profiler(!).
Solution 5:
toArray checks that the array passed is of the right size (that is, large enough to fit the elements from your list) and if so, uses that. Consequently if the size of the array provided it smaller than required, a new array will be reflexively created.
In your case, an array of size zero, is immutable, so could safely be elevated to a static final variable, which might make your code a little cleaner, which avoids creating the array on each invocation. A new array will be created inside the method anyway, so it's a readability optimisation.
Arguably the faster version is to pass the array of a correct size, but unless you can prove this code is a performance bottleneck, prefer readability to runtime performance until proven otherwise.