what's a good persistent collections framework for use in java?

By persistent collections I mean collections like those in clojure.

For example, I have a list with the elements (a,b,c). With a normal list, if I add d, my original list will have (a,b,c,d) as its elements. With a persistent list, when I call list.add(d), I get back a new list, holding (a,b,c,d). However, the implementation attempts to share elements between the list wherever possible, so it's much more memory efficient than simply returning a copy of the original list. It also has the advantage of being immutable (if I hold a reference to the original list, then it will always return the original 3 elements).

This is all explained much better elsewhere (e.g. http://en.wikipedia.org/wiki/Persistent_data_structure).

Anyway, my question is... what's the best library for providing this functionality for use in java? Can I use the clojure collections somehow (other that by directly using clojure)?


Solution 1:

Just use the ones in Clojure directly. While obviously you might not want to use the language it's self, you can still use the persistent collections directly as they are all just Java classes.

import clojure.lang.PersistentHashMap;
import clojure.lang.IPersistentMap;

IPersistentMap map = PersistentHashMap.create("key1", "value1");

assert map.get("key1").equals("value1");
IPersistentMap map2 = map.assoc("key1", "value1");

assert map2 != map;
assert map2.get("key1").equals("value1");

(disclaimer: I haven't actually compiled that code :)

the down side is that the collections aren't typed, i.e. there are no generics with them.

Solution 2:

What about pcollections?

You can also check out Clojure's implementation of persistent collections (PersistentHashMap, for instance).

Solution 3:

I was looking for a slim, Java "friendly" persistent collection framework and took TotallyLazy and PCollections mentioned in this thread for a testdrive, because they sounded most promising to me.

Both provide reasonable simple interfaces to manipulate persistent lists:

// TotallyLazy
PersistentList<String> original = PersistentList.constructors.empty(String.class);
PersistentList<String> modified = original.append("Mars").append("Raider").delete("Raider");

// PCollections
PVector<String> original = TreePVector.<String>empty();
PVector<String> modified = original.plus("Mars").plus("Raider").minus("Raider");

Both PersistentList and PVector extend java.util.List, so both libraries should integrate well into an existing environment.

It turns out, however, that TotallyLazy runs into performance problems when dealing with larger lists (as already mentioned in a comment above by @levantpied). On my MacBook Pro (Late 2013) inserting 100.000 elements and returning the immutable list took TotallyLazy ~2000ms, whereas PCollections finished within ~120ms.

My (simple) test cases are available on Bitbucket, if someone wants to take a more thorough look.

[UPDATE]: I recently had a look at Cyclops X, which is a high performing and more complete lib targeted for functional programming. Cyclops also contains a module for persistent collections.