JVM JIT optimizations on waitingforcode.com

Last time I was writing a post about the "whys" of code generation. And I found a lot of points related to JVM JIT optimizations. This time I will show what kind of optimization the JVM is capable of at runtime.

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I'm currently writing one on that topic and the first chapters are already available in 👉 Early Release on the O'Reilly platform

I also help solve your data engineering problems 👉 contact@waitingforcode.com 📩

In this post I will list different possible optimizations that JVM can make at runtime. I won't focus here on a specific JVM compiler but rather on things that can be changed at runtime by the Just-In-Time compilation. I tried to group the different optimization techniques in 3 groups: code elimination, code modification, and memory optimization.

Code elimination

The simplest way to accelerate code execution is to...execute less code. When I heard about the first code elimination technique, I was quite surprised. I couldn't figure out how the JIT can remove rare checks and still execute the code properly. After all, if it would eliminate an if-else on a nullable object, we could end up with a NullPointerException. The technique letting that happen is called uncommon trap. JIT will track down the execution code for specific method and if it'll detect that the method's if-else check (eg. NPE check) is never called, it will substitute it with a trap. Once the trap is reached, the compiler will catch this event and deoptimize the optimized method, i.e. replace the "trapped" version with less optimistic one having the if-else check.

The previous point illustrated the optimistic optimization approach of the JVM JIT where the optimized method can be deoptimized if the execution environment changes. The next optimization, called lock elision is not prone to the invalidation. The idea of this technique is to remove synchronized block every time the lock doesn't take any effect, like for instance in the following snippet where it can be removed because there will always be only one thread accessing the instance of A:

public void someMethod() {
  A a = new A();
  synchronized (a) { 
   // ...
}

Another technique is quite similar to the first one but it doesn't guess since it eliminates dead code. It doesn't mean that you should let a dead code in your codebase though! You should always write your code for human beings and if you let the not used parts, you will decrease the readability of it. It also doesn't mean that you should worry about the elimination of everything which is not used. The code can be dropped only when it's not used and that it doesn't have the side effects.

Code changes

I could put the techniques from this part to the previous one but I wanted to clearly separate code elimination and code modification. So in this next part you can learn about code rewriting and the first use case is the one I covered shortly in the post Vectorized operations in Apache Spark SQL, which is loop unrolling. The idea is to reduce the number of iterations by making each iteration execute more operations, like here:

for (int i = 0; i < 100; i++) { 
  doSth(i);
}

// can be rewritten as:
for (int i = 0; i < 100; i += 5) {
  doSth(i);
  doSth(i+1);
  doSth(i+2);
  doSth(i+3);
  doSth(i+4);
}

The second technique from this category is method inlining. The idea is straightforward. If a method is called very often, it's considered by the compiler as being "hot". At this moment it can decide to take the body of this method and put it directly to the caller method. This optimization avoids costly instruction jumps and the execution flow transfers. The inlining depends on the frequency of calls but also on the size. That's another argument to keep your method short and simple (aside from the improved understandability).

Another way to reduce the number of assembly jumps is to apply the branch prediction optimization. The idea is to rearrange if-else condition regarding the frequency of their execution. For example, if the else case is executed much more often than the if instruction, the compiler can decide to transform else to an if and execute it before the initial if (which by the way becomes an else).

Next strategy rearranging the code is called lock coaersing. It applies when you define multiple synchronized blocks that can be merged into one common instruction, like in the following snippet:

synchronized(objectA) {
  doSth1();
}
synchronized(objectA) {
  doSth2();
}
// It can become
synchronized(objectA) {
  doSth1();
  doSth2();
}

The compiler can also apply expression sinking.The goal of this optimization is to move the code to the parts that are really using it. For instance, in the following snippet the variable id is used only in the first if statement. In the optimized version, this field could be moved inside the block:

int id = getIdFromDatabase();
if (...) { } else {}


// optimized
if (...) { 
  int id = getIdFromDatabase();
  // ...
} else {}

Memory optimization

That was all for code formatting. Let's see now some optimizations for memory use. One of them is called escape analysis and it avoids unnecessary objects allocation on the heap. If after inlining a method, some objects used in it are local, ie. they don't have side effects and are visible only inside this method, the compiler will allocate them on the stack rather than on the heap to reduce the memory allocation rate and the memory pressure (GC). By saying "allocate", I don't mean the classical object allocation. In case of escape analysis the compiler will use a technique called scalar replacement and deconstruct the object into its constituent fields and allocate these fields on the stack like normal local variables.

The second optimization from this family helps to reduce storage and computational costs with redundant store elimination. The idea here is to avoid to allocate and compute the fragment that has no effect. For instance, in the following example, the first line of the code can be optimized that way:

int result = x*y 
result = 10

The list is not exhaustive. You will find a more complete list in "JVM JIT-compiler overview" presentation linked below. However, I hope that this reduced list already gives you an idea about how JIT can improve code execution by applying different forms of optimizations (elimination, changes, allocation, ...).