Where do I find a standard Trie based map implementation in Java? [closed]
I have a Java program that stores a lot of mappings from Strings to various objects.
Right now, my options are either to rely on hashing (via HashMap) or on binary searches (via TreeMap). I am wondering if there is an efficient and standard trie-based map implementation in a popular and quality collections library?
I've written my own in the past, but I'd rather go with something standard, if available.
Quick clarification: While my question is general, in the current project I am dealing with a lot of data that is indexed by fully-qualified class name or method signature. Thus, there are many shared prefixes.
You might want to look at the Trie implementation that Limewire is contributing to the Google Guava.
There is no trie data structure in the core Java libraries.
This may be because tries are usually designed to store character strings, while Java data structures are more general, usually holding any Object
(defining equality and a hash operation), though they are sometimes limited to Comparable
objects (defining an order). There's no common abstraction for "a sequence of symbols," although CharSequence
is suitable for character strings, and I suppose you could do something with Iterable
for other types of symbols.
Here's another point to consider: when trying to implement a conventional trie in Java, you are quickly confronted with the fact that Java supports Unicode. To have any sort of space efficiency, you have to restrict the strings in your trie to some subset of symbols, or abandon the conventional approach of storing child nodes in an array indexed by symbol. This might be another reason why tries are not considered general-purpose enough for inclusion in the core library, and something to watch out for if you implement your own or use a third-party library.
Apache Commons Collections v4.0 now supports trie structures.
See the org.apache.commons.collections4.trie
package info for more information. In particular, check the PatriciaTrie
class:
Implementation of a PATRICIA Trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric).
A PATRICIA Trie is a compressed Trie. Instead of storing all data at the edges of the Trie (and having empty internal nodes), PATRICIA stores data in every node. This allows for very efficient traversal, insert, delete, predecessor, successor, prefix, range, and select(Object) operations. All operations are performed at worst in O(K) time, where K is the number of bits in the largest item in the tree. In practice, operations actually take O(A(K)) time, where A(K) is the average number of bits of all items in the tree.
Also check out concurrent-trees. They support both Radix and Suffix trees and are designed for high concurrency environments.
I wrote and published a simple and fast implementation here.