list.clear() vs list = new ArrayList<Integer>(); [duplicate]

It's hard to know without a benchmark, but if you have lots of items in your ArrayList and the average size is lower, it might be faster to make a new ArrayList.

http://www.docjar.com/html/api/java/util/ArrayList.java.html

public void clear() {
    modCount++;

    // Let gc do its work
    for (int i = 0; i < size; i++)
        elementData[i] = null;

    size = 0;
}

List.clear would remove the elements without reducing the capacity of the list.

groovy:000> mylist = [1,2,3,4,5,6,7,8,9,10,11,12]
===> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
groovy:000> mylist.elementData.length
===> 12
groovy:000> mylist.elementData
===> [Ljava.lang.Object;@19d6af
groovy:000> mylist.clear()
===> null
groovy:000> mylist.elementData.length
===> 12
groovy:000> mylist.elementData
===> [Ljava.lang.Object;@19d6af
groovy:000> mylist = new ArrayList();
===> []
groovy:000> mylist.elementData
===> [Ljava.lang.Object;@2bfdff
groovy:000> mylist.elementData.length
===> 10

Here mylist got cleared, the references to the elements held by it got nulled out, but it keeps the same backing array. Then mylist was reinitialized and got a new backing array, the old one got GCed. So one way holds onto memory, the other one throws out its memory and gets reallocated from scratch (with the default capacity). Which is better depends on whether you want to reduce garbage-collection churn or minimize the current amount of unused memory. Whether the list sticks around long enough to be moved out of Eden might be a factor in deciding which is faster (because that might make garbage-collecting it more expensive).


I think that the answer is that it depends on a whole range of factors such as:

  • whether the list size can be predicted beforehand (i.e. can you set the capacity accurately),
  • whether the list size is variable (i.e. each time it is filled),
  • how long the lifetime of the list will be in both versions, and
  • your heap / GC parameters and CPU.

These make it hard to predict which will be better. But my intuition is that the difference will not be that great.

Two bits of advice on optimization:

  • Don't waste time trying to optimize this ... unless the application is objectively too slow AND measurement using a profiler tells you that this is a performance hotspot. (The chances are that one of those preconditions won't be true.)

  • If you do decide to optimize this, do it scientifically. Try both (all) of the alternatives and decide which is best by measuring the performance in your actual application on a realistic problem / workload / input set. (An artificial benchmark is liable to give you answers that do not predict real-world behavior, because of factors like those I listed previously.)