Let it crash model

Versions: Akka 2.5.6

Some time ago, during my research on Akka, I came across a term describing a coding philosophy used by, among others, this library. The term itself is quite intriguing, which is why I decided to explore it in more depth.

This post describes that philosophy. The first part places it in the context of defensive programming and looks at the main points of the let it crash model. The second part presents its use cases, and the last one shows a program respecting the main principles of the model.

Let it crash model and defensive programming

To better understand the Let it crash model, let's begin with the main concepts of defensive programming. Its main goal is to make the system behave in a consistent and predictable manner even under unexpected conditions (e.g. unexpected user input). Some signs of defensive programming are:

controlling defaults - for instance in switch-case statements, we'll always define the default behavior:
```
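  // getNumberFromSomewhere() is a placeholder for any source of the value
  int number = getNumberFromSomewhere();
  switch (number) {
    case 1:
      // handle 1
      break;
    case 2:
      // handle 2
      break;
    default:
      // handle every value that was not expected
  }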
```
The default branch is there even if we expect the number to always be either 1 or 2.
ensuring type correctness - especially visible in the case of casts, where ((Car)myObject).drive() is replaced by the safer:
```
  if (myObject instanceof Car) {
    ((Car)myObject).drive();
  }
```
assertions - sometimes they're used in order to ensure that the function parameters are valid (e.g. are not null)
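
The last point can be sketched as follows. This is only a minimal illustration: the `DefensiveChecks` class and its `wordLength` method are hypothetical names invented for the example, and `Objects.requireNonNull` is just one of several ways to validate a parameter in Java.

```java
import java.util.Objects;

class DefensiveChecks {

    // Hypothetical method, used only to illustrate parameter validation.
    static int wordLength(String word) {
        // Fail fast with an explicit message instead of a later, less clear error.
        Objects.requireNonNull(word, "word must not be null");
        if (word.isEmpty()) {
            throw new IllegalArgumentException("word must not be empty");
        }
        return word.length();
    }

    public static void main(String[] args) {
        System.out.println(wordLength("cat"));
        try {
            wordLength(null);
        } catch (NullPointerException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

The caller still has to decide what to do with the rejected value, which is exactly the kind of decision the let it crash model moves elsewhere.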

The Let it crash model (Licm) deals with the "unexpected" differently. It doesn't try to prevent failures. Instead, it considers failures a natural part of the application's lifecycle and focuses on handling them. A failure can be caused either by something unexpected (e.g. unhandled input) or by intentionally implemented behavior. In both cases the model doesn't treat failing as critical, because it promotes the idea of self-healing systems that are able to recover from errors by themselves.

Thus, instead of trying to anticipate and fix every kind of error, the Let it crash model directs the programming effort towards systems that can not only detect that something went wrong, but also deal with the problem according to a defined strategy. Two such strategies are retrying the failed process, and continuing by skipping the failing input (which can, for example, be put in a kind of dead-letter place).
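
Both strategies can be sketched in a few lines of plain Java. This is not how Akka or Erlang implement them. The `RecoveryStrategies` class, its `retry` and `processAll` methods and the in-memory dead-letter list are assumptions made only for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class RecoveryStrategies {

    // Runs the task up to maxAttempts times; returns true if one attempt succeeded.
    static <T> boolean retry(T input, Consumer<T> task, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.accept(input);
                return true;
            } catch (RuntimeException e) {
                // The attempt crashed; the loop decides whether to try again.
            }
        }
        return false;
    }

    // Processes every input. Inputs that still fail after the retries are skipped
    // and collected in a dead-letter list instead of stopping the whole run.
    static <T> List<T> processAll(List<T> inputs, Consumer<T> task, int maxAttempts) {
        List<T> deadLetters = new ArrayList<>();
        for (T input : inputs) {
            if (!retry(input, task, maxAttempts)) {
                deadLetters.add(input);
            }
        }
        return deadLetters;
    }

    public static void main(String[] args) {
        List<Integer> parsed = new ArrayList<>();
        // Integer.parseInt throws NumberFormatException for "x".
        Consumer<String> task = text -> parsed.add(Integer.parseInt(text));
        List<String> deadLetters = processAll(Arrays.asList("1", "x", "3"), task, 2);
        System.out.println("parsed=" + parsed + ", deadLetters=" + deadLetters);
    }
}
```

Running `main` prints `parsed=[1, 3], deadLetters=[x]`: the failing input is isolated while the rest of the processing continues.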

To achieve self-healing, each running process must be supervised by another process, called a supervisor. The supervisor's role is to handle the failure by executing the most appropriate strategy.

Let it crash model use cases

Probably the most obvious example of Licm is Erlang. Erlang's programming model is based on a supervision tree composed of workers and supervisors. As you can deduce, the worker nodes represent the running processes, while the supervisor nodes correspond to the processes monitoring the behavior of workers or of other supervisors. The tree structure simplifies the implementation of 2 supervision strategies:

One-for-one supervision - if a supervised process dies, only this process is restarted.
One-for-all supervision - if a supervised process dies, all the other supervised processes are terminated and then all of them are restarted.
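
The difference between the two strategies can be shown with a small framework-free sketch. It is written in Java rather than Erlang, and the `Worker`, `Supervisor` and `Strategy` types are hypothetical: a "restart" is reduced here to incrementing a counter.

```java
import java.util.ArrayList;
import java.util.List;

class SupervisionStrategies {

    // Hypothetical worker: it only counts how many times it has been (re)started.
    static final class Worker {
        final String name;
        int starts = 1;

        Worker(String name) { this.name = name; }

        void restart() { starts++; }
    }

    enum Strategy { ONE_FOR_ONE, ONE_FOR_ALL }

    static final class Supervisor {
        private final Strategy strategy;
        private final List<Worker> children = new ArrayList<>();

        Supervisor(Strategy strategy) { this.strategy = strategy; }

        Worker supervise(String name) {
            Worker worker = new Worker(name);
            children.add(worker);
            return worker;
        }

        // Called when a child has died; applies the configured strategy.
        void onChildFailure(Worker failed) {
            if (strategy == Strategy.ONE_FOR_ONE) {
                failed.restart(); // only the failed child is restarted
            } else {
                for (Worker child : children) {
                    child.restart(); // every supervised child is restarted
                }
            }
        }
    }

    public static void main(String[] args) {
        for (Strategy strategy : Strategy.values()) {
            Supervisor supervisor = new Supervisor(strategy);
            Worker a = supervisor.supervise("a");
            Worker b = supervisor.supervise("b");
            supervisor.onChildFailure(a);
            System.out.println(strategy + ": a.starts=" + a.starts + ", b.starts=" + b.starts);
        }
    }
}
```

With `ONE_FOR_ONE` only worker `a` is started a second time, while with `ONE_FOR_ALL` both `a` and `b` are.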

Even if Erlang's implementation of Licm is quite easy to understand, the language is still less popular than the JVM-based ones. Thus Scala, and more particularly its Akka library, will be used to illustrate the model in code. But before going into the implementation, let's outline how this library fits into Licm.

The execution of programs written with the Akka library is based on 2 flows: normal and recovery. In the first one the application executes normally, while in the second one some actors monitor the activity of the actors from the normal flow. As in Erlang, the monitoring actors are called supervisors and they're also responsible for handling failures. But they have more choices in terms of failure handling:

resume the supervised actor keeping its internal state
restart the supervised actor and clear its internal state
stop the supervised actor permanently
escalate the failure to the supervisor's own parent, which makes the supervisor itself fail
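
The effect of these four choices on a supervised component can be sketched without any framework. The code below is not Akka's API. The `Directive` enum, the `Child` class and the mapping between exception types and directives are assumptions chosen for the illustration.

```java
class SupervisorDirectives {

    enum Directive { RESUME, RESTART, STOP, ESCALATE }

    // Hypothetical supervised component whose internal state is a simple counter.
    static final class Child {
        int handledMessages = 0;
        boolean stopped = false;
    }

    // Maps a failure to a directive; the most specific types are checked first.
    // The mapping itself is arbitrary and only serves the example.
    static Directive decide(Throwable failure) {
        if (failure instanceof IllegalArgumentException) return Directive.RESUME;
        if (failure instanceof IllegalStateException) return Directive.RESTART;
        if (failure instanceof RuntimeException) return Directive.STOP;
        return Directive.ESCALATE;
    }

    // Applies the directive chosen for the failure to the child.
    static void handle(Child child, Throwable failure) {
        switch (decide(failure)) {
            case RESUME:
                break; // the internal state is kept as it is
            case RESTART:
                child.handledMessages = 0; // the internal state is cleared
                break;
            case STOP:
                child.stopped = true; // the child will not process anything else
                break;
            case ESCALATE:
                // The supervisor gives up and fails too, so its own parent must decide.
                throw new RuntimeException("Escalated to the parent supervisor", failure);
        }
    }

    public static void main(String[] args) {
        Child child = new Child();
        child.handledMessages = 5;
        handle(child, new IllegalArgumentException("unsupported type"));
        System.out.println("after resume: " + child.handledMessages);
        handle(child, new IllegalStateException("unknown word"));
        System.out.println("after restart: " + child.handledMessages);
        handle(child, new RuntimeException("empty message"));
        System.out.println("stopped: " + child.stopped);
    }
}
```

The counter keeps its value 5 after the resume, drops to 0 after the restart, and the last failure marks the child as stopped.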

Moreover, Akka also supports the same supervision strategies as Erlang. They're represented by the OneForOneStrategy and AllForOneStrategy classes.

Let it crash model in Akka actors

To see the let it crash model in action we'll use Akka actors. The example is simple. A supervisor actor forwards words to another actor (called "interpreter") to find out whether it understands them. Depending on the situation, the interpreter can resume, be restarted or be stopped:

```
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.{Restart, Resume, Stop}
import org.scalatest.{BeforeAndAfter, FunSuite, Matchers}

import scala.collection.mutable.ListBuffer

class LetItCrashModelTest extends FunSuite with BeforeAndAfter with Matchers {

  implicit val system = ActorSystem("responses-learner-service")
  private val InterpretersSupervisor = {
    val understandableWords = Seq("house", "cat", "dog")
    system.actorOf(Props(new Supervisor(understandableWords)), "stranger-supervisor")
  }

  before {
    DataHolder.restartMessages.clear()
    DataHolder.stopsCounter = 0
  }

  test("should ignore the message of unsupported type") {
    InterpretersSupervisor ! 300

    // Give the actor some time to process the message
    Thread.sleep(500)
    DataHolder.restartMessages shouldBe empty
  }

  test("should restart the actor when a word is not understandable") {
    InterpretersSupervisor ! "pies"

    // Give the actor some time to process the message and be restarted
    Thread.sleep(500)
    DataHolder.restartMessages should have size 1
    DataHolder.restartMessages(0) shouldEqual "The word pies is not understandable by the actor"
  }

  test("should stop the actor when the message is empty") {
    InterpretersSupervisor ! ""

    // Give the actor some time to process the message and be stopped
    Thread.sleep(500)
    DataHolder.stopsCounter shouldEqual 1
  }

}

// Shared mutable state used only to observe the actor's lifecycle from the tests
object DataHolder {
  val restartMessages = new ListBuffer[String]()

  var stopsCounter = 0
}

class Interpreter(knownWords: Seq[String]) extends Actor {

  override def preRestart(reason: Throwable, message: Option[Any]) = {
    DataHolder.restartMessages.append(reason.getMessage)
    // The default implementation also calls postStop()
    super.preRestart(reason, message)
  }

  override def postStop() = {
    DataHolder.stopsCounter += 1
  }

  override def receive: Receive = {
    case message: String => {
      if (message.isEmpty) {
        throw new RuntimeException("Somebody is joking - the message can't be empty")
      } else if (!knownWords.contains(message)) {
        throw new IllegalStateException(s"The word ${message} is not understandable by the actor")
      }
    }
    case _ => throw new IllegalArgumentException("Only text messages are expected to be sent")
  }

}

class Supervisor(interpreterUnderstandableWords: Seq[String]) extends Actor {

  // The supervised actor must be created inside the supervising actor
  private val Interpreter = context.actorOf(Props(new Interpreter(interpreterUnderstandableWords)), "interpreter")

  override def receive: Receive = {
    case msg => Interpreter forward msg
  }

  // The cases are checked in order, so the most specific exceptions come first
  override def supervisorStrategy = OneForOneStrategy() {
    case _: IllegalArgumentException => Resume
    case _: IllegalStateException => Restart
    case _: Throwable => Stop
  }
}
```

Unlike the defensive programming approach, the Let it crash model considers failures a natural state of the system and doesn't try to prevent them in every possible way. Instead, it favors thinking about how to deal with failures, which can lead to self-recovering systems. One example is the Akka library, in which supervisors respond to each failure of a supervised actor according to a defined strategy (restart, resume, stop...). This was shown in the code snippet of the last section, where the supervisor adopted a different behavior for each of the 3 supported errors (unknown word, empty message and message of an unsupported data type).

