Let it crash model

Versions: Akka 2.5.6

Some time ago, while researching Akka, I came across a term describing a coding philosophy used by, among others, this library. The term is intriguing, which is why I decided to dig deeper into it.

This post describes this coding philosophy. The first part places it in the context of defensive programming and takes a closer look at some aspects of the let it crash model. The next sections present where the model is used and show a program that respects its main principles.

Let it crash model and defensive programming

To better understand the Let it crash model, let's begin by explaining the main concepts of defensive programming. The main goal of this principle is to make the system behave in a consistent and predictable manner even under unexpected conditions (e.g. unexpected user input). Some signs of defensive programming are:

The Let it crash model (Licm) deals with the "unexpected" differently. It doesn't try to prevent failures. Instead it considers failures a natural part of the application's lifecycle and tries to deal with them. A failure can be caused either by an unexpected reason (e.g. unhandled input) or by intentionally implemented behavior. The model therefore says that a failure should not be treated as critical, and it promotes the idea of self-healing systems that are able to recover from errors on their own.

Thus, instead of trying to anticipate and fix every kind of error, the Let it crash model directs the programming effort towards systems that can not only detect that something went wrong but can also deal with the problem thanks to a defined strategy. Among these strategies we can distinguish: retrying the failed process, or continuing the process while skipping the failing input (which can, for example, be put into a kind of dead-letter place).

To achieve self-healing, each running process must be supervised by another process called a supervisor. The role of the supervisor is to handle the failure by executing the most appropriate strategy.
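The idea of a supervisor applying a retry-then-skip strategy can be sketched in plain Scala, without any actor library. This is only an illustration under stated assumptions: the names `SelfHealingSketch`, `worker`, `supervise` and `deadLetters` are hypothetical and do not come from Akka or Erlang, and the worker deliberately contains no defensive checks.

```scala
import scala.collection.mutable.ListBuffer
import scala.util.{Failure, Success, Try}

// Hypothetical, library-free sketch of "let it crash" with a supervising function.
object SelfHealingSketch {
  sealed trait Outcome
  case class Processed(result: Int) extends Outcome
  case class Skipped(input: String, cause: String) extends Outcome

  // Stands in for a "dead-letter place" where skipped inputs are kept.
  val deadLetters = new ListBuffer[String]()

  // The worker does not validate its input: on bad input it simply crashes.
  def worker(input: String): Int = input.toInt

  // The supervisor detects the crash and applies a strategy:
  // retry up to maxRetries times, then skip the input and remember it.
  def supervise(input: String, maxRetries: Int): Outcome = {
    @annotation.tailrec
    def attempt(retriesLeft: Int): Outcome = Try(worker(input)) match {
      case Success(result) => Processed(result)
      case Failure(_) if retriesLeft > 0 => attempt(retriesLeft - 1)
      case Failure(error) =>
        deadLetters.append(input)
        Skipped(input, error.getClass.getSimpleName)
    }
    attempt(maxRetries)
  }
}
```

In this sketch the failure is deterministic, so retrying never helps; retries pay off mostly for transient problems such as a temporarily unavailable resource.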

Let it crash model use cases

Maybe the most obvious example of the Licm is Erlang. Erlang's programming model is based on a supervision tree composed of workers and supervisors. As you can deduce, the worker nodes represent running processes while the supervisor nodes correspond to the processes monitoring the behavior of workers or other supervisors. The tree data structure simplifies the implementation of 2 supervision strategies: one-for-one, where only the failed child is restarted, and one-for-all, where the failure of one child restarts all the children of the same supervisor.
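The difference between the one-for-one and one-for-all strategies can be modeled with a small tree structure. The sketch below is written in Scala to stay consistent with the rest of the post; it is not Erlang code, and every name in it (`SupervisionTreeSketch`, `SupervisorNode`, `restartedChildren`...) is hypothetical. It only computes which direct children a supervisor would restart after one of them fails.

```scala
// Hypothetical model of a supervision tree; not an Erlang or Akka API.
object SupervisionTreeSketch {
  sealed trait Strategy
  case object OneForOne extends Strategy
  case object OneForAll extends Strategy

  sealed trait Node { def name: String }
  case class Worker(name: String) extends Node
  case class SupervisorNode(name: String, strategy: Strategy, children: Seq[Node]) extends Node

  // Returns the names of the direct children restarted when failedChild crashes.
  def restartedChildren(supervisor: SupervisorNode, failedChild: String): Seq[String] = {
    val names = supervisor.children.map(_.name)
    if (!names.contains(failedChild)) Seq.empty
    else supervisor.strategy match {
      case OneForOne => Seq(failedChild) // only the crashed child
      case OneForAll => names            // the crashed child and all its siblings
    }
  }
}
```

One-for-all tends to make sense when the children depend on each other's state, so that restarting a single one would leave the group inconsistent.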

Even if Erlang's implementation of Licm is quite easy to understand, this language is still less popular than the JVM-based languages. Thus the case of Scala, and more particularly of its Akka library, will be used to illustrate the model through code. But before going into the implementation, let's outline how this library fits into Licm.

The execution of programs written with Akka is based on 2 flows: normal and recovery. In the first one the application executes its regular logic, while in the second one some actors monitor the activity of the actors running in the normal flow. As in the case of Erlang, the monitoring actors are called supervisors and they're also responsible for handling failures. But they have more choices in terms of failure handling: they can resume the failing actor and keep its internal state, restart it with a fresh state, stop it permanently, or escalate the failure to their own supervisor.
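In Akka's classic actor API these choices are expressed as directives (`Resume`, `Restart`, `Stop` and `Escalate`) returned by a decider, which is a partial function from the thrown exception to a directive. The mapping below is a minimal sketch: the object name `DirectivesSketch` and the choice of exception types are assumptions made for illustration only.

```scala
import akka.actor.SupervisorStrategy.{Decider, Escalate, Restart, Resume, Stop}

// Hypothetical mapping of failures to directives; the exception types are illustrative.
object DirectivesSketch {
  val decider: Decider = {
    case _: ArithmeticException      => Resume   // keep the actor and its state, go on with the next message
    case _: IllegalStateException    => Restart  // replace the actor instance with a fresh one
    case _: IllegalArgumentException => Stop     // terminate the actor permanently
    case _: Exception                => Escalate // hand the failure over to the supervisor's own supervisor
  }
}
```

The order of the cases matters, since the first matching case wins: the catch-all `Exception` case has to come last.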

Moreover, Akka also supports the same supervision strategies as Erlang. They're represented by the AllForOneStrategy and OneForOneStrategy classes.
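Both classes are constructed the same way: optional restart limits followed by a decider. The following sketch assumes the classic actor API of Akka 2.5; the object name `StrategiesSketch` and the limits (3 restarts within 1 minute) are arbitrary values picked for the example.

```scala
import akka.actor.{AllForOneStrategy, OneForOneStrategy, SupervisorStrategy}
import akka.actor.SupervisorStrategy.Restart
import scala.concurrent.duration._

// Illustrative strategies; names and limits are assumptions, not taken from the post.
object StrategiesSketch {
  // The directive is applied only to the child that failed.
  // A child restarted more than 3 times within 1 minute is stopped instead.
  val oneForOne: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
    }

  // The directive is applied to the failed child and to all of its siblings.
  val allForOne: SupervisorStrategy =
    AllForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
      case _: IllegalStateException => Restart
    }
}
```

A supervisor would pick one of them by overriding its `supervisorStrategy` member.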

Let it crash model in Akka actors

To see the let it crash model in action we'll use Akka actors. The example is simple. One actor tries to figure out if another actor (conventionally called "interpreter") understands some words. Depending on the situation, the interaction can continue, restart or be stopped:

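// Imports assume Akka 2.5.x (classic actors) and ScalaTest 3.0.x
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props}
import akka.actor.SupervisorStrategy.{Restart, Resume, Stop}
import org.scalatest.{BeforeAndAfter, FunSuite, Matchers}

import scala.collection.mutable.ListBuffer

class LetItCrashModelTest extends FunSuite with BeforeAndAfter with Matchers {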

  implicit val system = ActorSystem("responses-learner-service")
  private val InterpretersSupervisor = {
    val understandableWords = Seq("house", "cat", "dog")
    system.actorOf(Props(new Supervisor(understandableWords)), "stranger-supervisor")
  }

  before {
    DataHolder.restartMessages.clear()
    DataHolder.stopsCounter = 0
  }

  test("should ignore the message of unsupported type") {
    InterpretersSupervisor ! 300

    // Give the actor some time to process the message
    Thread.sleep(500)
    DataHolder.restartMessages shouldBe empty
  }

  test("should restart the actor when a word is not understandable") {
    InterpretersSupervisor ! "pies"

    // Give the actor some time to process the message and check if it was restarted
    Thread.sleep(500)
    DataHolder.restartMessages should have size 1
    DataHolder.restartMessages(0) shouldEqual "The word pies is not understandable by the actor"
  }

  test("should stop the actor when the message is empty") {
    InterpretersSupervisor ! ""

    // Give the actor some time to process the message and check if it was stopped
    Thread.sleep(500)
    DataHolder.stopsCounter shouldEqual 1
  }

}

// Shared mutable state used only to observe the actor's lifecycle from the tests
object DataHolder {
  val restartMessages = new ListBuffer[String]()

  var stopsCounter = 0
}

class Interpreter(knownWords: Seq[String]) extends Actor {

  // Called on the failing instance before it's replaced by a fresh one
  override def preRestart(reason: Throwable, message: Option[Any]) = {
    DataHolder.restartMessages.append(reason.getMessage)
    super.preRestart(reason, message)
  }

  // Called when the actor is stopped; the default preRestart calls it too
  override def postStop() = {
    DataHolder.stopsCounter += 1
  }

  override def receive: Receive = {
    case message: String => {
      if (message.isEmpty) {
        throw new RuntimeException("Somebody is joking - the message can't be empty")
      } else if (!knownWords.contains(message)) {
        throw new IllegalStateException(s"The word ${message} is not understandable by the actor")
      }
    }
    case _ => throw new IllegalArgumentException("Only text messages are expected to be sent")
  }

}

class Supervisor(interpreterUnderstandableWords: Seq[String]) extends Actor {

  // The supervised actor must be created as a child of the supervisor, i.e. through its context
  private val Interpreter = context.actorOf(Props(new Interpreter(interpreterUnderstandableWords)), "interpreter")

  override def receive: Receive = {
    case msg => Interpreter forward msg
  }

  override def supervisorStrategy = OneForOneStrategy() {
    case _: IllegalArgumentException => Resume
    case _: IllegalStateException => Restart
    case _: Throwable => Stop
  }
}

Unlike the defensive programming approach, the Let it crash model considers failures one of the natural states of the system and doesn't try to prevent them in all possible ways. Instead it favors thinking about how to deal with failures, which can lead to self-recovering systems. One example is the Akka library, in which supervisors respond to each failure of a supervised actor according to a defined strategy (restart, resume, stop...). This was shown in the code snippet of the last section, where the supervisor adopted a different behavior for each of the 3 supported errors (unknown word, empty message and message of an unsupported data type).