As other programming components, data access layer has also its own patterns and anti-patterns. One of the most famous examples of anti-patterns is called select n+1.
Data Engineering Design Patterns
Looking for a book that defines and solves most common data engineering problems? I'm currently writing
one on that topic and the first chapters are already available in π
Early Release on the O'Reilly platform
I also help solve your data engineering problems π contact@waitingforcode.com π©
The first part of this post explains what is this select n+1 anti-pattern. The second part shows how to avoid it with the example of Criteria-based queries.
What is select n+1 problem ?
Select n+1 problem is related to fetching data in parent-child relationship. Normally, it should be retrieved in a single query thanks to JOIN clauses. However in some cases, especially in ORM's entity-based queries, child rows can be queried separately with direct SELECT queries on joined table.
Why these nested selects are bad ? Firstly, sending data across network naturally slows down final results generation. With all data generated in a single query, the client incurs network connection costs only once. Secondly, with the increasing number of queries to execute, the number of database lookups increases also and it also has an influence on results generation latency.
To illustrate the problem we'll use an example of customer who has 1,n addresses. The problem in pseudo-code looks like:
[] consumers = query("SELECT * FROM consumer"); for (Consumer consumer in consumers) [] addresses = consumer.getAddresses()
Solving n+1 problem
Below you can find some commented test cases showing how to produce and solve n+1 problem:
// ... Init part omitted for brevity @Test public void should_show_the_problem_of_n_1_selects() { CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder(); CriteriaQuery<Customer> query = criteriaBuilder.createQuery(Customer.class); Root<Customer> queryRoot = query.from(Customer.class); query.select(queryRoot); TypedQuery<Customer> typedQuery = entityManager.createQuery(query); List<Customer> customers = typedQuery.getResultList(); consumeCustomers(customers); // 2 customers, each of them has 3 addresses, so expected // number of queries in n+1 problem is: 1 for get all customers and // 2 for get addresses of each customer assertThat(customers).hasSize(2); assertThat(testAppender.customerQueries).hasSize(1); assertThat(testAppender.addressesQueries).hasSize(2); } @Test public void should_load_customers_with_addresses_as_a_single_query() { CriteriaBuilder criteriaBuilder = entityManager.getCriteriaBuilder(); CriteriaQuery<Customer> query = criteriaBuilder.createQuery(Customer.class); Root<Customer> queryRoot = query.from(Customer.class); query.select(queryRoot); // Without distinct the Cartesian product is returned. // Note also that it DISTINCT clause can need some additional deduplication // work from database. The solution could be the use of // Hibernate's Criteria.setResultTransformer(DISTINCT_ROOT_ENTITY) // method query.distinct(true); queryRoot.fetch("addresses", JoinType.LEFT); TypedQuery<Customer> typedQuery = entityManager.createQuery(query); List<Customer> customers = typedQuery.getResultList(); consumeCustomers(customers); // This time generated query contains a LEFT JOIN address // and looks like: // select distinct COLUMNS from customer customer0_ left outer join address addresses1_ // on customer0_.id=addresses1_.customer_id assertThat(customers).hasSize(2); assertThat(testAppender.customerQueries).hasSize(1); assertThat(testAppender.addressesQueries).isEmpty(); } private void consumeCustomers(List<Customer> customers) { for (Customer customer : customers) { for (Address address : customer.getAddresses()) { System.out.println("Found address " + address); } } } private static final class TestAppender extends ConsoleAppender { private List<String> addressesQueries = new ArrayList<>(); private List<String> customerQueries = new ArrayList<>(); @Override public void append(LoggingEvent event) { String logMessage = event.getRenderedMessage(); if (logMessage.contains("select") && logMessage.contains("from address")) { addressesQueries.add(logMessage); } else if (logMessage.contains("select") && logMessage.contains("from customer")) { customerQueries.add(logMessage); } } }
n+1 problem is quite easy to detect. Every time when a nested loop is iterated on child values, it can execute distinct select queries. One of available fixes can be the use of explicit joins, as shown in the test case from the second part of this post.