Lazy operator in Scala on waitingforcode.com

Versions: Scala 2.12.1

Scala's lazy instance creation is helpful in many places. It simplifies the code because we can declare an instance in one natural, shared place and delay its actual creation until its first use. Java offers this possibility too, but in a much more verbose way than Scala.

In this short post about Scala features we'll discover lazy instance creation. The first section explains the "what" and "why" of the lazy modifier. The second one demonstrates what happens under the hood.

Lazy instance creation

To understand what the lazy modifier brings to Scala applications, nothing is better than an example:

behavior of "val declaration"

it should "evaluate the object eagerly" in {
  var initialized = false
  class InitializedObject() {
    initialized = true
  }

  val tested = new InitializedObject()

  initialized shouldBe true
}

it should "evaluate the object lazy with lazy operator" in {
  var initialized = false
  class InitializedObject() {
    initialized = true

    def sayHello() = print("Hello world")
  }
  lazy val tested = new InitializedObject()

  initialized shouldBe false

  tested.sayHello()
  initialized shouldBe true
}

What is the difference between the two test cases? As the assertions show, the former creates the instance of InitializedObject eagerly, right at its val declaration. The latter delays this creation until the place where the instance is used for the first time.

Thus, a lazily evaluated expression can be considered a mix between a function, because it's executed only on demand, and an immutable variable, since it's computed only once. It's a good way to defer expensive computations until they are needed while keeping the code readable.
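
The "function plus immutable variable" idea can be made visible with a small counter. The sketch below is only an illustration: the Counter class and its members are hypothetical names, not part of any library. It compares a lazy val with a def holding the same body.

```scala
object LazyOnceSketch {

  // Hypothetical class used only for this illustration.
  class Counter {
    var lazyRuns = 0 // how many times the lazy val body was executed
    var defRuns = 0  // how many times the def body was executed

    // Executed on first access only, then the result is reused.
    lazy val asLazyVal: Int = { lazyRuns += 1; 21 * 2 }

    // Executed again on every call.
    def asDef: Int = { defRuns += 1; 21 * 2 }
  }

  def main(args: Array[String]): Unit = {
    val counter = new Counter
    // Nothing has been evaluated at this point: both counters are still 0.
    val fromLazy = counter.asLazyVal + counter.asLazyVal
    val fromDef = counter.asDef + counter.asDef
    println(s"values: $fromLazy and $fromDef")
    println(s"lazy val body ran ${counter.lazyRuns} time(s), def body ran ${counter.defRuns} time(s)")
  }
}
```

Both accessors return the same value, but only the def pays for the computation on every call.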

Lazy under-the-hood

To see what happens with lazy, let's compile the following class with the -print argument:

class LazyGenerator {

  lazy val heavyObjectInstance = new HeavyObject

}

class HeavyObject {}

The -print option outputs the code generated by the Scala compiler; in our case it looks like this:

// LazyGenerator.scala
package <empty> {
  class LazyGenerator extends Object {
    @volatile private[this] var bitmap$0: Boolean = false;
    private def heavyObjectInstance$lzycompute(): HeavyObject = {
      {
        LazyGenerator.this.synchronized({
          if (LazyGenerator.this.bitmap$0.unary_!())
            {
              LazyGenerator.this.heavyObjectInstance = new HeavyObject();
              LazyGenerator.this.bitmap$0 = true;
              ()
            };
          scala.runtime.BoxedUnit.UNIT
        });
        ()
      };
      LazyGenerator.this.heavyObjectInstance
    };
    lazy private[this] var heavyObjectInstance: HeavyObject = _;
    <stable> <accessor> lazy def heavyObjectInstance(): HeavyObject = if (LazyGenerator.this.bitmap$0.unary_!())
      LazyGenerator.this.heavyObjectInstance$lzycompute()
    else
      LazyGenerator.this.heavyObjectInstance;
    def <init>(): LazyGenerator = {
      LazyGenerator.super.<init>();
      ()
    }
  };
  class HeavyObject extends Object {
    def <init>(): HeavyObject = {
      HeavyObject.super.<init>();
      ()
    }
  }
}

As you can see, the code generated by the compiler is very similar to the code we would write in Java. It stores a volatile flag (bitmap$0) telling whether the lazy object has been created. The accessor method checks this flag: on the first call it delegates to heavyObjectInstance$lzycompute(), which re-checks the flag and creates the instance inside a synchronized block to prevent race conditions; on later calls it simply returns the already computed value without taking the lock.
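
For comparison, the same scheme can be written by hand. The snippet below is a minimal sketch that mirrors the -print output, not the exact code the compiler emits; ManualLazyGenerator and its member names are assumptions made for this illustration.

```scala
object HandWrittenLazySketch {

  class HeavyObject

  // Hypothetical hand-written counterpart of LazyGenerator.
  class ManualLazyGenerator {
    // Plays the role of bitmap$0: tells whether the instance already exists.
    @volatile private[this] var initialized = false
    private[this] var instance: HeavyObject = null

    // Plays the role of heavyObjectInstance$lzycompute().
    private def compute(): HeavyObject = {
      this.synchronized {
        // Second check: another thread may have won the race for the lock.
        if (!initialized) {
          instance = new HeavyObject
          initialized = true
        }
      }
      instance
    }

    // Accessor: takes the lock only while the flag is still false.
    def heavyObjectInstance: HeavyObject =
      if (!initialized) compute() else instance
  }

  def main(args: Array[String]): Unit = {
    val generator = new ManualLazyGenerator
    // Both calls return the very same instance.
    println(generator.heavyObjectInstance eq generator.heavyObjectInstance)
  }
}
```

All of this boilerplate is what the single lazy val declaration replaces.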

However, the above approach carries some risk of deadlock. The cases listed in SIP-20, mentioned in the "Read also" section, include circular dependencies and no circular dependencies, both with or without additional synchronization mechanisms. In short, to avoid this problem an improvement was proposed with 2 synchronized blocks: one to create the lazy instance and another to notify waiting threads about the successful creation. The SIP contains more details and explores more versions and potential problems.
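
To give an idea of the "no circular dependencies" case, here is a sketch modelled on the kind of example discussed in SIP-20. The object names and the CountDownLatch are assumptions added for this illustration; the latch only forces both threads to be inside their initializers at the same time. With the scheme shown in the -print output, each initializer holds the monitor of its enclosing object, so each thread is expected to wait for the lock held by the other one. Other compiler versions may use a different locking scheme and finish normally.

```scala
import java.util.concurrent.CountDownLatch

object LazyDeadlockSketch {

  // Opens once both threads are inside a lazy initializer (illustration only).
  private val bothInside = new CountDownLatch(2)

  object A {
    // Needs B.b while the initializer is still running.
    lazy val a0: Int = { bothInside.countDown(); bothInside.await(); B.b }
    // Does not depend on anything, so there is no cycle between the values.
    lazy val a1: Int = 17
  }

  object B {
    // Needs A.a1 while the initializer is still running.
    lazy val b: Int = { bothInside.countDown(); bothInside.await(); A.a1 }
  }

  // Starts a daemon thread so that a blocked thread cannot keep the JVM alive.
  private def spawn(body: => Int): Thread = {
    val thread = new Thread(new Runnable {
      def run(): Unit = { body; () }
    })
    thread.setDaemon(true)
    thread.start()
    thread
  }

  // Returns how many of the two threads are still blocked after the wait.
  // Meant to be called once, because the latch cannot be reset.
  def stuckThreads(waitMillis: Long): Int = {
    val threads = Seq(spawn(A.a0), spawn(B.b))
    threads.foreach(_.join(waitMillis))
    threads.count(_.isAlive)
  }

  def main(args: Array[String]): Unit =
    println(s"threads still blocked: ${stuckThreads(1000)}")
}
```

Note that a0 and a1 are independent values; the threads block each other only because both lazy vals of A share the same monitor.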

Lazy evaluation makes writing Scala programs more convenient. It helps defer a costly initialization until the moment of first use. As shown in the second section, the code generated by the compiler is very similar to the code we could write in Java: a flag tracks whether the lazy object was initialized and, if not, the accessor delegates to a synchronized method that creates the object and sets the flag to true.

TAGS: #One Scala feature per week