Memory overhead of Java HashMap compared to ArrayList

Solution 1:

If you're comparing HashMap with ArrayList, I presume you're doing some sort of searching/indexing of the ArrayList, such as binary search or custom hash table...? Because a .get(key) thru 6 million entries would be infeasible using a linear search.

Using that assumption, I've done some empirical tests and come up with the conclusion that "You can store 2.5 times as many small objects in the same amount of RAM if you use ArrayList with binary search or custom hash map implementation, versus HashMap". My test was based on small objects containing only 3 fields, of which one is the key, and the key is an integer. I used a 32bit jdk 1.6. See below for caveats on this figure of "2.5".

The key things to note are:

(a) it's not the space required for references or "load factor" that kills you, but rather the overhead required for object creation. If the key is a primitive type, or a combination of 2 or more primitive or reference values, then each key will require its own object, which carries an overhead of 8 bytes.

(b) In my experience you usually need the key as part of the value, (e.g. to store customer records, indexed by customer id, you still want the customer id as part of the Customer object). This means it is IMO somewhat wasteful that a HashMap separately stores references to keys and values.

Caveats:

  1. The most common type used for HashMap keys is String. The object creation overhead doesn't apply here so the difference would be less.

  2. I got a figure of 2.8, being 8880502 entries inserted into the ArrayList compared with 3148004 into the HashMap on -Xmx256M JVM, but my ArrayList load factor was 80% and my objects were quite small - 12 bytes plus 8 byte object overhead.

  3. My figure, and my implementation, requires that the key is contained within the value, otherwise I'd have the same problem with object creation overhead and it would be just another implementation of HashMap.

My code:

public class Payload {
    int key,b,c;
    Payload(int _key) { key = _key; }
}


import org.junit.Test;

import java.util.HashMap;
import java.util.Map;


public class Overhead {
    @Test
    public void useHashMap()
    {
        int i=0;
        try {
            Map<Integer, Payload> map = new HashMap<Integer, Payload>();
            for (i=0; i < 4000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }

    @Test
    public void useArrayList()
    {
        int i=0;
        try {
            ArrayListMap map = new ArrayListMap();
            for (i=0; i < 9000000; i++) {
                int key = (int)(Math.random() * Integer.MAX_VALUE);
                map.put(key, new Payload(key));
            }
        }
        catch (OutOfMemoryError e) {
            System.out.println("Got up to: " + i);
        }
    }
}


import java.util.ArrayList;


public class ArrayListMap {
    private ArrayList<Payload> map = new ArrayList<Payload>();
    private int[] primes = new int[128];

    static boolean isPrime(int n)
    {
        for (int i=(int)Math.sqrt(n); i >= 2; i--) {
            if (n % i == 0)
                return false;
        }
        return true;
    }

    ArrayListMap()
    {
        for (int i=0; i < 11000000; i++)    // this is clumsy, I admit
            map.add(null);
        int n=31;
        for (int i=0; i < 128; i++) {
            while (! isPrime(n))
                n+=2;
            primes[i] = n;
            n += 2;
        }
        System.out.println("Capacity = " + map.size());
    }

    public void put(int key, Payload value)
    {
        int hash = key % map.size();
        int hash2 = primes[key % primes.length];
        if (hash < 0)
            hash += map.size();
        do {
            if (map.get(hash) == null) {
                map.set(hash, value);
                return;
            }
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }

    public Payload get(int key)
    {
        int hash = key % map.size();
        int hash2 = primes[key % primes.length];
        if (hash < 0)
            hash += map.size();
        do {
            Payload payload = map.get(hash);
            if (payload == null)
                return null;
            if (payload.key == key)
                return payload;
            hash += hash2;
            if (hash >= map.size())
                hash -= map.size();
        } while (true);
    }
}

Solution 2:

The simplest thing would be to look at the source and work it out that way. However, you're really comparing apples and oranges - lists and maps are conceptually quite distinct. It's rare that you would choose between them on the basis of memory usage.

What's the background behind this question?

Solution 3:

All that is stored in either is pointers. Depending on your architecture a pointer should be 32 or 64 bits (or more or less)

An array list of 10 tends to allocate 10 "Pointers" at a minimum (and also some one-time overhead stuff).

A map has to allocate twice that (20 pointers) because it stores two values at a time. Then on top of that, it has to store the "Hash". which should be bigger than the map, at a loading of 75% it SHOULD be around 13 32-bit values (hashes).

so if you want an offhand answer, the ratio should be about 1:3.25 or so, but you are only talking pointer storage--very small unless you are storing a massive number of objects--and if so, the utility of being able to reference instantly (HashMap) vs iterate (array) should be MUCH more significant than the memory size.

Oh, also: Arrays can be fit to the exact size of your collection. HashMaps can as well if you specify the size, but if it "Grows" beyond that size, it will re-allocate a larger array and not use some of it, so there can be a little waste there as well.

Solution 4:

I don't have an answer for you either, but a quick google search turned up a function in Java that might help.

Runtime.getRuntime().freeMemory();

So I propose that you populate a HashMap and an ArrayList with the same data. Record the free memory, delete the first object, record memory, delete the second object, record the memory, compute the differences,..., profit!!!

You should probably do this with magnitudes of data. ie Start with 1000, then 10000, 100000, 1000000.

EDIT: Corrected, thanks to amischiefr.

EDIT: Sorry for editing your post, but this is pretty important if you are going to use this (and It's a little much for a comment) . freeMemory does not work like you think it would. First, it's value is changed by garbage collection. Secondly, it's value is changed when java allocates more memory. Just using the freeMemory call alone doesn't provide useful data.

Try this:

public static void displayMemory() {
    Runtime r=Runtime.getRuntime();
    r.gc();
    r.gc(); // YES, you NEED 2!
    System.out.println("Memory Used="+(r.totalMemory()-r.freeMemory()));
}

Or you can return the memory used and store it, then compare it to a later value. Either way, remember the 2 gcs and subtracting from totalMemory().

Again, sorry to edit your post!