Java memory model

In multi-threaded languages such as Java, a memory model is required to reason correctly about how code behaves when tasks execute concurrently.

This article is an introduction to Java's memory model specification. In the first part we'll see the key concepts needed to understand how memory works. After that, we'll try to define the concept of a memory model for multi-threaded languages. At the end we'll carry some of these general behaviors and ideas over into Java's world, without devoting much space to them. That will be done in one of the next articles.

Reminder about memory

Before analyzing memory as a complicated piece of concurrent programming, let's start by recalling some basics about how memory works. To define memory briefly, we can say that it's a container that holds all the information of the currently running programs. Programs also keep their data in this memory, known by the acronym RAM (Random Access Memory; other terms in use are primary storage, primary memory, main storage, internal storage and main memory). This container stores the data at specific addresses; for example, you can retrieve the data belonging to program 'A' at address 0 and the data of program 'B' at address 39. Addresses are like mailboxes. They make it simple to find the needed information at any time. But the stored information can change at any moment, exactly like the names written on mailboxes.

Another important participant in program execution is the central processing unit (CPU). It's a control center that transforms instructions received from hardware and software (input) into the expected output (for example, an action such as opening a new program). To make this transformation, the CPU uses two units: the arithmetic logic unit (executes arithmetic and logic operations such as 1 > 2 ?) and the control unit (sends the electrical signals that make the computer system execute the stored program instructions). We can now deduce that the CPU is closely associated with the memory that holds the data and instructions to process. First, the control unit fetches the instruction from memory and decodes it. Next, the arithmetic logic unit receives the required data and performs the necessary operations before the result is written back to memory. These stages are also known as: fetch (the instruction), decode (the instruction, to understand what should be done and on which data), execute (the instruction) and write (back into memory). When the processed data is ready for output (to an output device or to a secondary storage device such as a hard disk), it is released from memory.

An interesting concept resulting from the interaction between the CPU units and memory is the CPU cache. The CPU uses it to limit accesses to main memory: it stores a copy of the data frequently read from memory. The CPU cache is also one of the elements that can cause problems when executing multi-threaded programs. Imagine the following simplified class:

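class TestClass {

  // Both fields are plain (neither volatile nor final), so nothing
  // forces a write made by one thread to become visible to another thread.
  int telNumber = 0;
  String name = "";

  // Called by the writing thread.
  void write() {
    telNumber = 39839309;
    name = "O'Lery";
  }

  // Called by the reading thread; it may print old or new values.
  void read() {
    System.out.println("Tel number is: " + this.telNumber);
    System.out.println("Name is: " + this.name);
  }

}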

Now, there are two threads operating on the same instance of this class: one reads both variables and the other writes them. Suppose the telNumber field is held in a CPU cache while the name field is stored only in main memory. Access to the first field will be faster than access to the second. But we can't be sure that the reading thread will see the changes made by the writing thread, because it may keep reading from its cache while the changes were made in main memory. The situation becomes more complicated when the CPU reorders the execution of instructions to improve program performance. This kind of reordering is called out-of-order execution.

To force some operations to execute before others, we must use a memory barrier (also called membar, memory fence or fence instruction). Let's come back to our TestClass and put a memory barrier between the two prints in the read method. Thanks to this barrier, the CPU will execute the print of the telephone number before the print of the name String. Four types of memory barriers exist (load and store are the memory operations: the first is a memory read and the second a memory write):
- StoreStore: guarantees that the writes placed before the memory barrier become visible before the writes placed after it. For example:

set x = 10;
StoreStore barrier
set y = 20;

In this situation, thanks to StoreStore, we're sure that the write x = 10 becomes visible to other processors before the write y = 20. An example of a StoreStore implementation could be flushing all dirty entries out of the cache.
- LoadLoad: guarantees the order of loads from memory: the loads placed before the barrier complete before the loads placed after it. Thanks to it, we can read one piece of information before another:

while (y != 20);
LoadLoad barrier
get x;

Thanks to the memory barrier in this sample, a thread that has observed y equal to 20 (written by the thread from the StoreStore case) is guaranteed to read afterwards the value of x written before it, that is 10. An example of a LoadLoad implementation could be invalidating all cached entries.
- LoadStore: ensures that the loads placed before the barrier complete before the writes placed after it:

get b;
LoadStore barrier
set a = 30;

And the other thread:

get a;
LoadStore barrier
set b = 20;

- StoreLoad: combines write and read operations. It ensures that all writes made before the memory barrier are visible to other processors and that all loads after the barrier receive the latest value, for example:

set a = 30;
StoreLoad barrier
get b;

And the other thread:

set b = 20;
StoreLoad barrier
get a;

Notice here that reordering can also be done by programming language compilers, again to improve performance.
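To make the StoreStore and LoadLoad cases above more concrete, here is a minimal Java sketch. It assumes Java 9 or later, where the static fence methods of java.lang.invoke.VarHandle are the closest public equivalent of these barriers. The FenceDemo class and its run method are hypothetical names invented for this illustration, and the sketch relies on how fences behave in practice on common JVMs rather than on a guarantee spelled out for plain fields.

```java
import java.lang.invoke.VarHandle;

// Hypothetical demo class; x and y mirror the variables of the barrier samples.
class FenceDemo {
    // Plain fields on purpose: the ordering comes only from the fences.
    static int x = 0;
    static int y = 0;

    // Starts one reader and one writer thread, returns the x seen by the reader.
    static int run() {
        x = 0;
        y = 0;
        int[] seen = new int[1];

        Thread reader = new Thread(() -> {
            while (true) {
                int flag = y;
                // LoadLoad: the load of y above is ordered before later loads.
                VarHandle.loadLoadFence();
                if (flag == 20) {
                    break;
                }
            }
            seen[0] = x;
        });

        Thread writer = new Thread(() -> {
            x = 10;
            // StoreStore: the store to x is ordered before the store to y.
            VarHandle.storeStoreFence();
            y = 20;
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw x = " + run());
    }
}
```

In everyday code we rarely place fences by hand. Higher-level constructs such as volatile fields or synchronized blocks are usually a simpler way to get the same ordering.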

What is a memory model?

As we mentioned earlier, the compiler is also able to reorder instructions in the name of optimization. But it shouldn't do so without respecting the memory rules. We can distinguish two kinds of memory models:
- strong: in this model, all writes of one CPU core are visible to the other CPU cores, in execution order. Of the 4 types of memory barriers described above, only one is relevant in this model: StoreLoad.
- weak: here, all 4 memory barriers may be needed because instructions can be reordered by the compiler as well as by the processor.

To simplify, a memory model describes which operations must be visible at a given moment. It also defines the behavior in specific situations, such as the synchronization needed when two threads try to access the same variable or method at the same time.

Java Memory Model (JMM)

The first Java Memory Model (JMM), defined in 1995, was criticized. It prevented many runtime optimizations and didn't protect code against concurrency issues (for example, some final fields were observed to change their values, and writes to volatile fields could be reordered with non-volatile ones, producing non-intuitive behavior). Only in 2004 did the new JMM take effect. It had to answer the following questions:
- How to simplify the synchronized syntax?
- How to make the execution of multi-threaded programs easier to understand?
- How to guarantee initialization safety, so that final fields aren't seen corrupted in a multi-threaded environment even without synchronization?
- How to map proven JMM guarantees onto popular hardware architectures?

Among the concepts strongly associated with JMM, we can distinguish:

final fields: they're immutable, i.e. they are initialized once and can't be modified afterwards. Thanks to that, the compiler can freely move reads of them in order to optimize code execution.
synchronized blocks: only one thread at a time can execute a synchronized block or method guarded by a given monitor. Synchronization also ensures that all writes made by a thread are flushed to memory and seen by the other threads that later synchronize on the same monitor. So we can say that it's a remedy for CPU cache issues. Conceptually, every entry into a synchronized block purges the local cache and forces the data to be loaded directly from memory.
volatile fields: these fields are designed to always return the most recent value. A multi-threaded program won't see stale values, for example ones read from a cache or resulting from reordering by the compiler. Conceptually, the data is always read from main memory, so there are no problems with stale information. If one thread modifies a volatile field, this write will be visible to a second thread that subsequently reads the changed volatile.
atomicity: atomic classes are placed in the java.util.concurrent.atomic package. As we can read in Sun's Java tutorial about atomic variables, they work like volatile fields. They aren't read from the CPU cache but from main memory, and the changes are visible to all threads.

A great example to illustrate this case is a class with an incremented counter. If two threads concurrently increment a simple int, they can read stale values from the cache. In this way, the 1st thread can increment the number 8 by setting it to 9. But the 2nd thread may not see this change in its cache and also increment 8, giving 9 again. Atomic classes, in this case AtomicInteger, avoid this kind of situation by combining volatile reads and writes with atomic update methods such as incrementAndGet and compareAndSet.
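The counter scenario can be sketched as follows. The CounterDemo class and its countWithAtomic method are hypothetical names chosen for this illustration. Only AtomicInteger and its incrementAndGet method come from the JDK.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class for the shared counter scenario.
class CounterDemo {

    // Starts the given number of threads; each one increments the shared
    // counter incrementsPerThread times. Returns the final counter value.
    static int countWithAtomic(int threads, int incrementsPerThread) {
        AtomicInteger counter = new AtomicInteger();
        Thread[] workers = new Thread[threads];

        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    // Atomic read-modify-write: no increment can be lost.
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }

        for (Thread worker : workers) {
            try {
                // join() makes the worker's updates visible to this thread.
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithAtomic(2, 100_000));
    }
}
```

With a plain int and counter++ in place of incrementAndGet, the same program could return less than the expected total, because two threads may read the same old value and write back the same result.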
happens-before relation: it's the name of the set of rules whose main goal is to ensure that the execution order of a single-threaded program is kept at runtime. The second goal is to ensure that the ordering of synchronized blocks and of reads and writes of volatile variables is preserved when multiple threads work on them. Why are these rules called happens-before? To understand, take a look at the chapter about the Java Memory Model in the book "Java Concurrency in Practice". You will find there that happens-before covers several rules:
- program order: in a single thread, an action happens before every action that comes later in the program order.
- monitor lock: the release of a monitor lock, for example on leaving a synchronized block, happens before every subsequent lock on the same monitor.
- volatile variables: a write to a volatile field happens before every subsequent read of that field.
- thread start: a call to Thread.start() happens before every action made by the started thread.
- thread termination: all actions in a thread happen before other threads detect that thread's termination (for example by returning from Thread.join()).
- thread interruption: an interruption invoked by one thread on another happens before the interrupted thread detects the interruption (i.e. by having InterruptedException thrown, or by invoking the isInterrupted or interrupted methods; the opposite can never happen).
- finalizers: the end of an object's construction happens before the start of its finalizer.
- transitivity: if A happens before B and B happens before C, then A happens before C too.
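Several of these rules can be seen working together in a small sketch. A plain field is written before a volatile flag (program order), the flag is read by another thread (volatile variables), and transitivity then guarantees that the reader sees the plain field's new value. The PublishDemo class and the names payload, ready and run are hypothetical, invented for this illustration.

```java
// Hypothetical demo class: publishing plain data through a volatile flag.
class PublishDemo {
    static int payload = 0;                 // plain field
    static volatile boolean ready = false;  // volatile flag

    // Returns the payload value observed by the reader thread.
    static int run() {
        payload = 0;
        ready = false;
        int[] seen = new int[1];

        // Thread start rule: the resets above happen before both threads run.
        Thread reader = new Thread(() -> {
            // Volatile read: loops until the writer's volatile write is seen.
            while (!ready) {
                Thread.onSpinWait();
            }
            // By transitivity the write to payload is visible here.
            seen[0] = payload;
        });

        Thread writer = new Thread(() -> {
            payload = 42;  // program order: happens before the next line
            ready = true;  // volatile write: happens before the read of true
        });

        reader.start();
        writer.start();
        try {
            // Thread termination rule: after join() we can safely read seen[0].
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw payload = " + run());
    }
}
```

If ready were not volatile, neither the termination of the reader's loop nor the value it reads from payload would be guaranteed.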

This article shows that some strange things can happen to our Java code once it's compiled and run. The goal of these strange transformations is to optimize the execution while, thanks to the defined Java Memory Model, still respecting all the happens-before rules. A lot of Java tools are there to make this simpler: volatile fields, which prevent reads and writes from being served from a stale cache; synchronized blocks, whose monitor can be held by only one thread at a given moment; and even final fields.
