How do you query object collections in Java (Criteria/SQL-like)?

Suppose you have a collection of a few hundred in-memory objects and you need to query this List to return objects matching some SQL or Criteria like query. For example, you might have a List of Car objects and you want to return all cars made during the 1960s, with a license plate that starts with AZ, ordered by the name of the car model.

I know about JoSQL, has anyone used this, or have any experience with other/homegrown solutions?

Filtering is one way to do this, as discussed in other answers.

Filtering is not scalable though. On the surface time complexity would appear to be O(n) (i.e. already not scalable if the number of objects in the collection will grow), but actually because one or more tests need to be applied to each object depending on the query, time complexity more accurately is O(n t) where t is the number of tests to apply to each object.

So performance will degrade as additional objects are added to the collection, and/or as the number of tests in the query increases.

There is another way to do this, using indexing and set theory.

One approach is to build indexes on the fields within the objects stored in your collection and which you will subsequently test in your query.

Say you have a collection of Car objects and every Car object has a field color. Say your query is the equivalent of "SELECT * FROM cars WHERE Car.color = 'blue'". You could build an index on Car.color, which would basically look like this:

'blue' -> {Car{name=blue_car_1, color='blue'}, Car{name=blue_car_2, color='blue'}}
'red'  -> {Car{name=red_car_1, color='red'}, Car{name=red_car_2, color='red'}}

Then given a query WHERE Car.color = 'blue', the set of blue cars could be retrieved in O(1) time complexity. If there were additional tests in your query, you could then test each car in that candidate set to check if it matched the remaining tests in your query. Since the candidate set is likely to be significantly smaller than the entire collection, time complexity is less than O(n) (in the engineering sense, see comments below). Performance does not degrade as much, when additional objects are added to the collection. But this is still not perfect, read on.

Another approach, is what I would refer to as a standing query index. To explain: with conventional iteration and filtering, the collection is iterated and every object is tested to see if it matches the query. So filtering is like running a query over a collection. A standing query index would be the other way around, where the collection is instead run over the query, but only once for each object in the collection, even though the collection could be queried any number of times.

A standing query index would be like registering a query with some sort of intelligent collection, such that as objects are added to and removed from the collection, the collection would automatically test each object against all of the standing queries which have been registered with it. If an object matches a standing query then the collection could add/remove it to/from a set dedicated to storing objects matching that query. Subsequently, objects matching any of the registered queries could be retrieved in O(1) time complexity.

The information above is taken from CQEngine (Collection Query Engine). This basically is a NoSQL query engine for retrieving objects from Java collections using SQL-like queries, without the overhead of iterating through the collection. It is built around the ideas above, plus some more. Disclaimer: I am the author. It's open source and in maven central. If you find it helpful please upvote this answer!

I have used Apache Commons JXPath in a production application. It allows you to apply XPath expressions to graphs of objects in Java.

yes, I know it's an old post, but technologies appear everyday and the answer will change in the time.

I think this is a good problem to solve it with LambdaJ. You can find it here: http://code.google.com/p/lambdaj/

Here you have an example:

LOOK FOR ACTIVE CUSTOMERS // (Iterable version)

List<Customer> activeCustomers = new ArrayList<Customer>();  
for (Customer customer : customers) {  
  if (customer.isActive()) {  
    activeCusomers.add(customer);  
  }  
}

LambdaJ version

List<Customer> activeCustomers = select(customers, 
                                        having(on(Customer.class).isActive()));

Of course, having this kind of beauty impacts in the performance (a little... an average of 2 times), but can you find a more readable code?

It has many many features, another example could be sorting:

Sort Iterative

List<Person> sortedByAgePersons = new ArrayList<Person>(persons);
Collections.sort(sortedByAgePersons, new Comparator<Person>() {
        public int compare(Person p1, Person p2) {
           return Integer.valueOf(p1.getAge()).compareTo(p2.getAge());
        }
});

Sort with lambda

List<Person> sortedByAgePersons = sort(persons, on(Person.class).getAge());

Update: after java 8 you can use out of the box lambda expressions, like:

List<Customer> activeCustomers = customers.stream()
                                          .filter(Customer::isActive)
                                          .collect(Collectors.toList());

How do you query object collections in Java (Criteria/SQL-like)?

Related

Recent Posts