How to do bulk (multi row) inserts with JpaRepository?

To get a bulk insert with Sring Boot and Spring Data JPA you need only two things:

  1. set the option spring.jpa.properties.hibernate.jdbc.batch_size to appropriate value you need (for example: 20).

  2. use saveAll() method of your repo with the list of entities prepared for inserting.

Working example is here.

Regarding the transformation of the insert statement into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

the such is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in jdbc connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

then jdbc driver will do this transformation.

Additional info about batching you can find here.

UPDATED

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATED

Hibernate disables insert batching at the JDBC level transparently if you use an IDENTITY identifier generator.


The underlying issues is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

In addition to the batch size property settings, you have to make sure that the class SimpleJpaRepository calls persist and not merge. There are a few approaches to resolve this: use an @Id generator that does not query sequence, like

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: Long

Or forcing the persistence to treat the records as new by having your entity implement Persistable and overriding the isNew() call

@Entity
class Thing implements Pesistable<Long> {
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
    @Transient
    private boolean isNew = true;
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }
    @Override
    boolean isNew() {
        return isNew;
    }
}

Or override the save(List) and use the entity manager to call persist()

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

  • http://www.hameister.org/SpringBootUsingIdsForBulkImports.html
  • http://www.hameister.org/SpringBootBulkImportWithCrudRepository.html
  • https://vladmihalcea.com/the-best-way-to-do-batch-processing-with-jpa-and-hibernate/

You can configure Hibernate to do bulk DML. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think section 2 of the answer could solve your problem:

Enable the batching of DML statements Enabling the batching support would result in less number of round trips to the database to insert/update the same number of records.

Quoting from batch INSERT and UPDATE statements:

hibernate.jdbc.batch_size = 50

hibernate.order_inserts = true

hibernate.order_updates = true

hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the hibernate properties differently in your application.properties file. They are under the namespace: spring.jpa.properties.*. An example could look like the following:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

All mentioned methods work but will be slow especially if the source for inserted data lies in some other table. Firstly, even with batch_size>1 the insert operation will be executed in multiple SQL queries. Secondly, if the source data lies in the other table you need to fetch the data with other queries (and in the worst case scenario load all data into memory), and convert it to static bulk inserts. Thirdly, with separate persist() call for each entity (even if batch is enabled) you will bloat entity manager first level cache with all these entity instances.

But there's another option for Hibernate. If you use Hibernate as a JPA provider you can fallback to HQL which supports bulk inserts natively with subselect from another table. The example:

Session session = entityManager.unwrap(Session::class.java)
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

Whether this will work depends on your ID generation strategy. If the Entity.id is generated by the database (for example MySQL auto increment), it will be executed successfully. If the Entity.id is generated by your code (especially true for UUID generators), it will fail with "unsupported id generation method" exception.

However, in the latter scenario this problem can be overcome by custom SQL function. For example in PostgreSQL I use uuid-ossp extension which provides uuid_generate_v4() function, which I finally register in my custom dialog:

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

And then I register this class as a hibernate dialog:

hibernate.dialect=MyPostgresDialect

Finally I can use this function in bulk insert query:

SessionImpl session = entityManager.unwrap(Session::class.java);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

The most important is the underlying SQL generated by Hibernate to accomplish this operation and it's just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]