Distinction between the capacity of an array list and the size of an array

I read the below snippet in Core Java I book.

Allocating an array list as new ArrayList <'Employee>(100) // capacity is 100

is not the same as allocating a new array as new Employee[100] // size is 100

There is an important distinction between the capacity of an array list and the size of an array. If you allocate an array with 100 entries, then the array has 100 slots, ready for use. An array list with a capacity of 100 elements has the potential of holding 100 elements (and, in fact, more than 100, at the cost of additional reallocations); but at the beginning, even after its initial construction, an array list holds no elements at all.

When I saw the source code array list, the constructor creates of an Object array of given capacity which is ready to hold elements of given capacity (below is the code snippet).

public ArrayList(int initialCapacity) {
     super();
     if (initialCapacity < 0)
         throw new IllegalArgumentException("Illegal Capacity: "+
                                            initialCapacity);
     this.elementData = new Object[initialCapacity];
 }

I am not able to figure out the actual difference what the author has mentioned in above text.


Solution 1:

If you allocate a new array with arr = new Employee[100], the size of that array (arr.length) is going to be 100. It has 100 elements. All the elements are initially null (as this is an array of object references), but still, there are 100 elements.

If you do something like list = new ArrayList <Employee>(100), and try to check list.size(), you'll get 0. There are no elements in the list.

Internally, it's true that the ArrayList allocates enough place to put 100 items before it needs to extend its capacity, but that's an internal implementation detail, and the list presents its content to you as "no items stored". Only if you actually do list.add(something), you'll have items in the list.

So although the list allocates storage in advance, the API with which it communicates with the program tells you there are no items in it. The null items in its internal array are not available to you - you cannot retrieve them or change them.

Solution 2:

An ArrayList is just one way to represent an abstract list, and the capacity of an ArrayList is an implementation detail of how the system implements the logical list.

An ArrayList stores the elements of a list by using an actual array "under the covers." The actual realization of the array in computer memory has a certain size when it is allocated; this size is the ArrayList's capacity. The ArrayList emulates a variable-sized list by storing the logical length of the list in addition to the fixed-length array. Thus if you have an ArrayList with a capacity 10 which contains 4 logical elements, the ArrayList can be represented as a length and an array

(4) | e1 | e2 | e3 | e4 | __ | __ | __| __ | __ | __ |

where the (4) is the logical length of the list and '__' represent data that is ignored because it is not part of the logical list. If you attempt to access the 5th element of this ArrayList, it will throw an exception because it knows that the fifth element has not been initialized. If we then append an extra element e5 to the list, the ArrayList becomes

(5) | e1 | e2 | e3 | e4 | e5 | __ | __ | __ | __ | __ |

Note that the capacity has not changed, while the logical length has, because the underlying array is still able to handle all the data in the logical list.

If you manage to add more than ten elements to this list, the ArrayList will not break. The ArrayList is an abstraction meant to be compatible with all array operations. Rather, the ArrayList changes its capacity when its logical length exceeds its original capacity. If we were to add the elements (a1, a2, ..., a7) to the above list, the resulting ArrayList might look like

(12) | e1 | e2 | e3 | e4 | e5 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | __ | __ | __ | __ | __ | __ | __ | __ |

with a capacity of 20.

Once you have created an ArrayList, you can ignore the capacity in all programming that follows; the logic is unaffected. However, the performance of the system under certain kinds of operations can be affected. Increasing the capacity, for instance, might well involved allocating a larger array, copying the first array into the second and then performing the operations. This can be quite slow compared to, e.g. the same operation on a linked list. Thus it is sensible to choose the capacity of an ArrayList to be bigger than, or at least comparable to, the actual number of elements expected in the real runtime environment.