JPA: what is the proper pattern for iterating over large result sets?

Page 537 of Java Persistence with Hibernate gives a solution using ScrollableResults, but alas it's only for Hibernate.

So it seems that using setFirstResult/setMaxResults and manual iteration really is necessary. Here's my solution using JPA:

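private List<Model> getAllModelsIterable(int offset, int max)
{
    // A stable ORDER BY keeps pages from overlapping or skipping rows between queries.
    return entityManager.createQuery("select m from Model m order by m.id", Model.class)
            .setFirstResult(offset)
            .setMaxResults(max)
            .getResultList();
}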

Then use it like this:

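private void iterateAll()
{
    int offset = 0;

    List<Model> models;
    // Fetch one page of 100 rows at a time until a page comes back empty.
    while (!(models = getAllModelsIterable(offset, 100)).isEmpty())
    {
        entityManager.getTransaction().begin();
        for (Model model : models)
        {
            log.info("do something with model: " + model.getId());
        }

        // Flush pending changes, then detach the processed entities so the
        // persistence context does not grow with every page.
        entityManager.flush();
        entityManager.clear();
        entityManager.getTransaction().commit();
        offset += models.size();
    }
}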
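
The offset/limit loop above can be pulled out into a small reusable helper. The following is only a sketch: `PagedProcessor`, `PageFetcher` and `forEachPage` are hypothetical names, not part of JPA, and the fetcher is assumed to return rows in a stable order. It also stops as soon as a short page comes back, which saves the final empty query.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper, not part of JPA: walks a result set page by page. */
final class PagedProcessor {

    /** Fetches one page; with JPA this would wrap setFirstResult(offset).setMaxResults(max). */
    interface PageFetcher<T> {
        List<T> fetch(int offset, int max);
    }

    /**
     * Calls pageHandler once per non-empty page and returns the number of rows seen.
     * Assumes the fetcher returns rows in a stable order (e.g. ORDER BY id).
     */
    static <T> int forEachPage(PageFetcher<T> fetcher, int pageSize, Consumer<List<T>> pageHandler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int offset = 0;
        while (true) {
            List<T> page = fetcher.fetch(offset, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            pageHandler.accept(page);
            offset += page.size();
            if (page.size() < pageSize) {
                break; // short page: this was the last one, skip the extra query
            }
        }
        return offset;
    }
}
```

With the method from the question, the fetcher would be something like `(offset, max) -> getAllModelsIterable(offset, max)`, and the handler would hold the per-page transaction, flush and clear.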

I tried the answers presented here, but they didn't work with JBoss 5.1 + MySQL Connector/J 5.1.15 + Hibernate 3.3.2. We've just migrated from JBoss 4.x to JBoss 5.1, so we're sticking with it for now, and thus the latest Hibernate we can use is 3.3.2.

Adding a couple of extra parameters did the job, and code like this runs without OOMEs:

        StatelessSession session = ((Session) entityManager.getDelegate()).getSessionFactory().openStatelessSession();
        try {
            Query query = session
                    .createQuery("SELECT a FROM Address a WHERE .... ORDER BY a.id");
            query.setFetchSize(1000);
            query.setReadOnly(true);
            query.setLockMode("a", LockMode.NONE);
            ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
            try {
                while (results.next()) {
                    Address addr = (Address) results.get(0);
                    // Do stuff
                }
            } finally {
                // Release the cursor even if processing throws.
                results.close();
            }
        } finally {
            session.close();
        }

The crucial lines are the query parameters set between createQuery and scroll. Without them, the "scroll" call tries to load everything into memory and either never finishes or fails with an OutOfMemoryError.


You can't really do this in straight JPA; however, Hibernate has support for stateless sessions and scrollable result sets.

We routinely process billions of rows with its help.

Here is a link to documentation: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html#batch-statelesssession