ArrayList Vs LinkedList
I was following a previous post on this that says:
For LinkedList
- get is O(n)
- add is O(1)
- remove is O(n)
- Iterator.remove is O(1)
For ArrayList
- get is O(1)
- add is O(1) amortized, but O(n) worst-case since the array must be resized and copied
- remove is O(n)
So by looking at this, I concluded that if I have to do just sequential inserts into my collection, say 5,000,000 elements, LinkedList will outclass ArrayList.
And if I just have to fetch the elements from the collection by iterating, i.e. not grabbing an element in the middle, LinkedList will still outclass ArrayList.
Now to verify these two statements, I wrote the sample program below… But I was surprised to find that both statements were proven wrong.
ArrayList outclassed LinkedList in both cases. It took less time than LinkedList for adding the elements as well as fetching them from the collection. Is there anything I'm doing wrong, or do the initial statements about LinkedList and ArrayList not hold true for collections of size 5,000,000?
I mention size because if I reduce the number of elements to 50,000, LinkedList performs better and the initial statements hold true.
long nano1 = System.nanoTime();
List<Integer> arr = new ArrayList<>();
for (int i = 0; i < 5000000; ++i) {
    arr.add(i);
}
System.out.println(System.nanoTime() - nano1);
for (int j : arr) {
    ;
}
System.out.println(System.nanoTime() - nano1);

long nano2 = System.nanoTime();
List<Integer> arrL = new LinkedList<>();
for (int i = 0; i < 5000000; ++i) {
    arrL.add(i);
}
System.out.println(System.nanoTime() - nano2);
for (int j : arrL) {
    ;
}
System.out.println(System.nanoTime() - nano2);
Remember that big-O complexity describes asymptotic behaviour and may not reflect actual implementation speed. It describes how the cost of each operation grows with the size of the list, not the speed of each operation. For example, the following implementation of add is O(1) but is not fast:
public class MyList extends LinkedList<Object> {
    @Override
    public boolean add(Object o) {
        try {
            Thread.sleep(10000);   // a constant (but huge) delay: still O(1)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return super.add(o);
    }
}
I suspect that in your case ArrayList is performing well because it grows its internal buffer fairly aggressively, so there will not be a large number of reallocations. When the buffer does not need to be resized, ArrayList will have faster adds.
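If you want to see how much of the add cost is resizing, a rough sketch like the following (class and variable names are mine) compares a default-capacity ArrayList against one presized with new ArrayList<>(n):
import java.util.*;

public class PreallocDemo {
    public static void main(String[] args) {
        int n = 5_000_000;

        long t0 = System.nanoTime();
        List<Integer> grown = new ArrayList<>();   // default capacity, regrows as it fills
        for (int i = 0; i < n; i++) grown.add(i);
        System.out.println("default capacity: " + (System.nanoTime() - t0) + " ns");

        long t1 = System.nanoTime();
        List<Integer> sized = new ArrayList<>(n);  // one allocation up front, no copying
        for (int i = 0; i < n; i++) sized.add(i);
        System.out.println("presized:         " + (System.nanoTime() - t1) + " ns");
    }
}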
You also need to be very careful when you do this kind of profiling. I'd suggest you change your profiling code to do a warm-up phase (so the JIT has the opportunity to do some optimization without affecting your results) and average the results over a number of runs.
private final static int WARMUP = 1000;
private final static int TEST = 1000;
private final static int SIZE = 500000;

public void perfTest() {
    // Warmup
    for (int i = 0; i < WARMUP; ++i) {
        buildArrayList();
    }
    // Test
    long sum = 0;
    for (int i = 0; i < TEST; ++i) {
        sum += buildArrayList();
    }
    System.out.println("Average time to build array list: " + (sum / TEST));
}

public long buildArrayList() {
    long start = System.nanoTime();
    List<Integer> a = new ArrayList<>();
    for (int i = 0; i < SIZE; ++i) {
        a.add(i);
    }
    long end = System.nanoTime();
    return end - start;
}
... same for buildLinkedList
(Note that sum may overflow, and you might be better off using System.currentTimeMillis().)
It's also possible that the compiler is optimizing away your empty get loops. Make sure the loop actually does something to ensure that the right code is getting called.
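For example, accumulating into a variable and printing it afterwards is usually enough; a minimal sketch against the arr list from the question (the sink name is mine):
long sink = 0;
for (int j : arr) {
    sink += j;              // cheap, but forces the iteration to actually happen
}
System.out.println(sink);   // publishing the result keeps it from being eliminated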
This is a bad benchmark IMO.
- You need to repeat the test in a loop multiple times to warm up the JVM.
- You need to DO something in your iteration loop or it can be optimized away.
- ArrayList resizes, which is costly. If you had constructed the ArrayList as new ArrayList(500000), it would be constructed in one blow, and then all allocations would be quite cheap (one preallocated backing array).
- You don't specify your JVM memory - it should be run with -Xms == -Xmx (everything preallocated) and sufficiently high that no GC is likely to be triggered.
- This benchmark doesn't cover the most unpleasant aspect of LinkedList - random access. (An iterator isn't necessarily the same thing.) If you feed, say, 10% of the size of a large collection as random indexes to list.get, you will find that linked lists are awful for grabbing anything other than the first or last element. See the sketch after this list.
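A rough sketch of that missing random-access case (all names are mine, and the list is kept deliberately small so the LinkedList side finishes in reasonable time):
import java.util.*;

public class RandomAccessDemo {
    public static void main(String[] args) {
        int size = 100_000;
        List<Integer> array = new ArrayList<>(size);
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < size; i++) {
            array.add(i);
            linked.add(i);
        }

        // ~10% of the size as random probe indexes
        int[] probes = new int[size / 10];
        Random rnd = new Random(42);
        for (int i = 0; i < probes.length; i++) {
            probes[i] = rnd.nextInt(size);
        }

        long sink = 0;
        long t0 = System.nanoTime();
        for (int p : probes) sink += array.get(p);  // O(1) per call
        System.out.println("ArrayList:  " + (System.nanoTime() - t0) + " ns");

        long t1 = System.nanoTime();
        for (int p : probes) sink += linked.get(p); // walks the chain on every call
        System.out.println("LinkedList: " + (System.nanoTime() - t1) + " ns");

        System.out.println(sink);                   // keep the loops from being eliminated
    }
}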
For an ArrayList, the JDK get is what you'd expect:
public E get(int index) {
    RangeCheck(index);
    return elementData[index];
}
(Basically it just returns the indexed array element.)
For a LinkedList:
public E get(int index) {
    return entry(index).element;
}
Looks similar? Not quite. entry is a method call, not a primitive array access, and look at what it has to do:
private Entry<E> entry(int index) {
    if (index < 0 || index >= size)
        throw new IndexOutOfBoundsException("Index: " + index +
                                            ", Size: " + size);
    Entry<E> e = header;
    if (index < (size >> 1)) {
        for (int i = 0; i <= index; i++)
            e = e.next;
    } else {
        for (int i = size; i > index; i--)
            e = e.previous;
    }
    return e;
}
That's right: if you ask for, say, list.get(250000), it has to start at the head and repeatedly step through next elements - around 250,000 accesses (there's an optimization in the code where it starts at the head or the tail depending on which means fewer accesses).
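That walk happens on every call, which is why indexed iteration over a LinkedList is quadratic while for-each walks the chain once. A sketch, assuming linked is a LinkedList<Integer> built elsewhere:
// O(n^2) in total: every get(i) re-walks the chain from the head or tail.
long sum1 = 0;
for (int i = 0; i < linked.size(); i++) {
    sum1 += linked.get(i);
}

// O(n) in total: for-each uses the list's iterator, which follows next once per element.
long sum2 = 0;
for (int value : linked) {
    sum2 += value;
}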
An ArrayList is a simpler data structure than a LinkedList. An ArrayList has a single array of pointers in contiguous memory locations. It only has to be recreated if the array is expanded beyond its allocated size.
A LinkedList consists of a chain of nodes; each node is separately allocated and has front and back pointers to other nodes.
So what does this mean? Unless you need to insert in the middle, splice, delete in the middle, etc., an ArrayList will usually be faster. It needs fewer memory allocations, has much better locality of reference (which is important for processor caching), etc.
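For the insert-in-the-middle cases where LinkedList can win, the trick is to hold your position with a ListIterator so each splice is O(1). A minimal sketch (class name and sizes are mine):
import java.util.*;

public class SpliceDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();   // swap in new ArrayList<>() to compare
        for (int i = 0; i < 100_000; i++) {
            list.add(i);
        }
        // Insert a marker after every element. Through the ListIterator each
        // insertion is O(1) for LinkedList; ArrayList would shift the tail of
        // its backing array on every insertion, O(n) apiece.
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            it.next();
            it.add(-1);
        }
        System.out.println(list.size());           // 200000
    }
}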
To understand why the results you got do not contradict the big-O characterization, we need to go back to first principles; i.e. the definition.
Let f(x) and g(x) be two functions defined on some subset of the real numbers. One writes
f(x) = O(g(x)) as x -> infinity
if and only if, for sufficiently large values of x, f(x) is at most a constant multiplied by g(x) in absolute value. That is, f(x) = O(g(x)) if and only if there exists a positive real number M and a real number x0 such that
|f(x)| <= M |g(x)| for all x > x0.
In many contexts, the assumption that we are interested in the growth rate as the variable x goes to infinity is left unstated, and one writes more simply that f(x) = O(g(x)).
So the statement "add1 is O(1)" means that the time cost of an add1 operation on a list of size N tends towards a constant C_add1 as N tends to infinity.
And the statement "add2 is O(1) amortized over N operations" means that the average time cost of one of a sequence of N add2 operations tends towards a constant C_add2 as N tends to infinity.
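To make the amortized claim concrete, here is the standard doubling argument. (I'm assuming a grow-by-doubling policy for simplicity; the real ArrayList grows by roughly 1.5x per resize, which changes the constant but not the conclusion.)
cost of N adds = N element writes
                 + (1 + 2 + 4 + ... + 2^k) copy operations, where 2^k < N
              <= N + 2N
              = 3N
So the average cost per add is at most about 3 basic operations - a constant that does not depend on N, which is exactly what "O(1) amortized" claims.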
What it does not say is what those constants C_add1 and C_add2 are. In fact, the reason that LinkedList is slower than ArrayList in your benchmark is that C_add1 is larger than C_add2.
The lesson is that big O notation does not predict absolute or even relative performance. All it predicts is the shape of the performance function as the controlling variable gets very large. This is useful to know, but it doesn't tell you everything you need to know.