Batch processing in Hibernate

When multiple operations must be processed at once, it's better to do it automatically. This type of operations is called batch (do not confuse with bash which is a command language).

In this article we'll focus on batch processing with Hibernate. In the first part we'll discover the generic concept of batch processing. The next part will be destined to its implementation in Hibernate. At the last part we'll write some code to illustrate this feature.

What is batch processing ?

Let's start by try to understand the concept of batch processing. It's an automatic treatment of the non-interactive jobs. "Non-interactive" means that they are no human intervention as, for example, form filling or manual configuration for every treated task.

A good example of batch processing is the billing system of your mobile. Last day of every month you receive a billing with an amount to pay. The amount is calculated by adding the price of every call you made. The calculation is made automatically, at the end of every month. You don't receive the invoice after every call.

Batch processing in Hibernate

Batch processing in Hibernate starts with some of configuration. The property hibernate.jdbc.batch_size facilitates execution of batch queries. A batch query is a SQL query able to make more than one data manipulation's by a call. For example, batch query is an insertion of 50 rows with one INSERT INTO... call. If batch_size property is bigger than 1, the queries will be grouped and send with less calls than without batching.

Nothing explains better than a pragmatic sample. Imagine that batch_size of your application is 50. Now, you want to insert 120 rows to the products table. Hibernate won't call INSERT INTO products 120 times, but only 3 times. The two first calls will contain the first 100 insertions. The last call will insert remaining 20 rows.

Some of pre-requisites must be known before you write batch processing code:

  1. your database must support the execution of batch queries
  2. the batched entities can't have primary key generated with IDENTITY strategy. If your entity has following identity strategy, batch queries won't work:
    @Entity
    @Table(name="products")
    public class Product {
      @Id
      @GeneratedValue(strategy = IDENTITY)
      @Column(name="id")
      public int getId() {
        return this.id;
      }
    }
  3. if you're using MySQL, add rewriteBatchedStatemets to your connection's URL. MySQL doesn't support batch form of prepared statements. Thanks to rewriteBatchedStatements=true, we indicates that the driver should rewrite prepared statements into multi-value INSERT INTO query when batch job is submitted by the client. Now, if we log MySQL queries, we'll see one global multi-value insert instead of line per line inserts. There are an example of multi-value insert query:
    530 Query	/* insert com.waitingforcode.data.Product */ insert into products (name, id) values ('product: 0', 1),('product: 1', 2),('product: 2', 3),('product: 3', 4),('product: 4', 5),('product: 5', 6),('product: 6', 7),('product: 7', 8),('product: 8', 9),('product: 9', 10)
    
  4. prefer to check the MySQL logs instead of Hibernate ones. To be sure that your batch queries are executed correctly, it's better to check them into MySQL queries logs. Hibernate will show you individual insert every time (at least, every time when I was testing batch queries) and it can confuse you. To enable MySQL queries log, edit the configuration file by putting log lines (example from Ubuntu). First, you have to found your configuration file:
    bartosz@bartosz-K70ID:/etc$ cd mysql/
    bartosz@bartosz-K70ID:/etc/mysql$ ls
    conf.d  debian.cnf  debian-start  my.cnf
    bartosz@bartosz-K70ID:/etc/mysql$ sudo gedit my.cnf 
    [sudo] password for bartosz:
    
    After that, uncomment following entries, placed under [mysqld] part (if they are absent, add them under this part):
    general_log_file        = /var/log/mysql/mysql.log
    general_log             = 1
    

    Please note that this configuration was only tested on 5.5.34 version of MySQL.
  5. beware of batch queries for entities with relations. If one entity depends on another by one-to-one or one-to-many relationship, for example when one customer can have a shopping cart, two INSERT queries must be done to satisfy data integrity. Without any supplementary configuration, Hibernate will execute these inserts separately. So, if you have 100 carts to add, 200 request will be made. Even if batch_size is equal to 50. This behavior can be changed thanks to following configuration entries:
    true
    true        
    

    They tell to Hibernate to group insert and update statements. With this configuration we should have only 4 inserts (2 for shopping carts and 2 for customers) in the case of previously described example.

Batch processing example in Hibernate

After this short theoretical introduction, we can pass to batch processing sample. They are our test class:

@Entity
@Table(name="products")
public class Product {

  private int id;
  private String name;

  public void setId(int id) {
    this.id = id;
  }

  @Id
  @Column(name="id")
  public int getId() {
    return this.id;
  }
  
  public void setName(String name) {
    this.name = name;
  }
  
  @Column(name="name")
  @NotEmpty
  public String getName() {
    return this.name;
  }

  @Override
  public String toString() {
    return "Product {"+this.name+"}";
  }
        
}

One comment. We won't implement identification strategy to do not complicate presented code. This "trick" should be avoided in the realworld applications.

Because we will run the batch inside Spring, we need to configure a session factory bean as below:


  
  
  
    
      org.hibernate.dialect.MySQLDialect
      5
      true
      true
      true
      true
    
     

The session's factory configuration is focused on batch processing. You can see that we'll try to submit 5 inserts at the same query (batch_size entry). order_inserts and order_updates, you know them from the previous part of the article. Two another configurations, these ones used only to debug the output, are show_sql and use_sql_comments. The first one will print Hibernate's SQL queries into console. Thanks to it, you'll see that even if MySQL executes batched queries, Hibernate prints insert queries one per one. use_sql_comments will add some comments before queries definition. Thanks to it, you'll able to see, for example, which Java's entity is concerned by the SQL query.

Datasource configuration looks like:


  
  
  
  
  
   
  
  
  
  
  
  
  
  

Note the presence of rewriteBatchedStatements in jdbcUrl property.

And now we can pass to JUnit test case:

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations={"configuration.xml"})
@WebAppConfiguration
public class BatchTest {

  private static final Logger LOGGER = LoggerFactory.getLogger(BatchTest.class);
  @Autowired
  private SessionFactory sessionFactory;

  @Test
  public void insert() {
    Session session = sessionFactory.openSession();
    try {
      // FlushMode.COMMIT means that all items will be saved into database after transaction.commit() successfull call
      session.setFlushMode(FlushMode.COMMIT);

      // CacheMode.IGNORE tells Hibernate to ignore caching interactions. Thanks to it we can avoid to overload Java's memory with batched and cached objects.
      session.setCacheMode(CacheMode.IGNORE);
      Transaction transaction = session.beginTransaction();

      // NOTE (repeat): manual generation is not a good way to handle ids, but for the simplicity reason, we use it this time
      int id = 1;
      for (int i=1; i<11; i++) {
        Product product = new Product();
        product.setId(id++);
        product.setName("product: "+i);
        session.save(product);
        // we need to flush and clear the session every 5 new objects. It's done to avoid memory problems caused by session overloading.
        if (i % 5 == 0) {
          session.flush();
          session.clear();
        }
      }
      // commit transaction: at this moment flushed Product instances will be saved in database 
      transaction.commit();
    } catch (Exception e) {
      LOGGER.error("An error occured on batch execution: "+e.getMessage(), e);
      fail("An exception interrupted the test: "+e.getMessage());
    } finally {
      session.close();
    }
  }	
}

All important comments are placed inside the code for better understanding. Before executing this test case, create table at database with following statement:

CREATE TABLE IF NOT EXISTS products (
  id INT(5) NOT NULL,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY(id)
);

Now execute previous test and make some checks:
- in console you should see line per line inserts, as below:

Hibernate: 
    /* insert com.waitingforcode.db.entity.Product
        */ insert 
        into
            products
            (name, id) 
        values
            (?, ?)
Hibernate: 
    /* insert com.waitingforcode.db.entity.Product
        */ insert 
        into
            products
            (name, id) 
        values
            (?, ?)
Hibernate: 
    /* insert com.waitingforcode.db.entity.Product
        */ insert 
        into
            products
            (name, id) 
        values
            (?, ?)
// ... etc.

- when you check MySQL queries logs, you'll see that the products are added two times, by 5 products in each INSERT query:

186 Query     SHOW FULL TABLES FROM `springSandbox` LIKE 'PROBABLYNOT' 
140425 20:04:48 186 Query     SHOW FULL TABLES FROM `springSandbox` LIKE 'PROBABLYNOT'
186 Query     SET autocommit=0
186 Query     /* insert com.waitingforcode.db.entity.Product */ insert into products (name, id) values ('product: 1', 1),('product: 2', 2),('product: 3', 3),('product: 4', 4),('product: 5', 5)
186 Query     /* insert com.waitingforcode.db.entity.Product */ insert into products (name, id) values ('product: 6', 6),('product: 7', 7),('product: 8', 8),('product: 9', 9),('product: 10', 10)
186 Query     commit
140425 20:04:49   186 Query     SET autocommit=1

- and finally, to be sure that all operations were correctly executed, check inserted rows inside products table:

mysql> select * from products;

+----+-------------+
| id | name        |
+----+-------------+
|  1 | product: 1  |
|  2 | product: 2  |
|  3 | product: 3  |
|  4 | product: 4  |
|  5 | product: 5  |
|  6 | product: 6  |
|  7 | product: 7  |
|  8 | product: 8  |
|  9 | product: 9  |
| 10 | product: 10 |
+----+-------------+
10 rows in set (0.00 sec)

After reading all these conditions, we can finally execute batch jobs in Hibernate. Remember that you can't do that with auto-generated identity and that you must be careful when dealing with entities containing (one|many)-to-(one|many) relationships. In the case of tests, you should always turn on MySQL queries logs and check if executed queries are transmitted by Hibernate as expected.


If you liked it, you should read:

📚 Newsletter Get new posts, recommended reading and other exclusive information every week. SPAM free - no 3rd party ads, only the information about waitingforcode!