Scala extractors

Versions: Scala 2.12.1

Scala's apply method is a convenient way to create objects without the new operator, which reduces verbosity a little. Often, as with case classes, apply is accompanied by its opposite, unapply, which in turn is used to build extractors.

In this post we'll focus on the role of extractors in Scala. The first section gives an example of these objects. The second briefly explains the case class behavior mentioned above. The last part shows extractors used in different contexts. Each section contains code examples illustrating the presented ideas.

Extractors defined

Since extractors are mostly encountered in pattern matching constructs, let's begin by writing a simple matching operation on a text:

it("should apply on simple string object") {
  val textFromLower = "lowerCase"
  val textFromUpper = "UpperCase"
  val emptyText = ""

  def getNormalizedText(text: String): String = text match {
    // NormalizeName.unapply is invoked here; a None result falls through to the default case
    case NormalizeName(normalizedText) => normalizedText
    case _ => "_EMPTY_"
  }

  object NormalizeName {
    def unapply(notNormalizedText: String): Option[String] = {
      if (notNormalizedText.isEmpty) {
        None
      } else if (notNormalizedText(0) == notNormalizedText(0).toUpper) {
        Some(notNormalizedText)
      } else {
        Some(s"${notNormalizedText(0).toUpper}${notNormalizedText.substring(1)}")
      }
    }
  }

  val normalizedLowerCase = getNormalizedText(textFromLower)
  normalizedLowerCase shouldEqual "LowerCase"
  val normalizedUpperCase = getNormalizedText(textFromUpper)
  normalizedUpperCase shouldEqual "UpperCase"
  val normalizedEmptyText = getNormalizedText(emptyText)
  normalizedEmptyText shouldEqual "_EMPTY_"
}

The code shows one use case of an extractor in pattern matching. As you can see, the string text is matched through the unapply method of the NormalizeName extractor. Inside this method we simply check whether the text starts with an upper-case letter. If it does, the text is returned as provided; otherwise it's returned with its first letter converted to upper case. An empty text yields None, so the match falls through to the default case.

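Hence we can define an extractor as an object implementing an unapply or unapplySeq method, able to extract some values from an object. The unapply method returns one of 3 different types:

- Boolean, when the extractor only tests a condition and extracts nothing
- Option[T], when it extracts a single value of type T
- Option[(T1, ..., Tn)], when it extracts several values, grouped in a tuple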
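To make the 3 return types more concrete, here is a minimal sketch. The extractors IsBlank, Trimmed and KeyValue, as well as the ReturnTypesExample object, are hypothetical names invented for this illustration and don't come from any library.

```scala
// Hypothetical extractors illustrating the three return types of unapply.

// 1. Boolean: a pure test, nothing is extracted.
object IsBlank {
  def unapply(text: String): Boolean = text.trim.isEmpty
}

// 2. Option[T]: extracts a single value.
object Trimmed {
  def unapply(text: String): Option[String] = Some(text.trim)
}

// 3. Option[(T1, ..., Tn)]: extracts several values grouped in a tuple.
object KeyValue {
  def unapply(text: String): Option[(String, String)] =
    text.split("=", 2) match {
      case Array(key, value) => Some((key, value))
      case _ => None
    }
}

object ReturnTypesExample {
  def describe(text: String): String = text match {
    case IsBlank() => "blank" // Boolean extractor: empty parentheses, no binding
    case KeyValue(key, value) => s"key=$key, value=$value"
    case Trimmed(value) => s"text=$value"
  }
}
```

The order of the cases matters here: Trimmed always succeeds, so it's placed last and acts as a fallback.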

The unapplySeq method works in a similar way. The only difference is that it returns an Option[Seq[T]], which makes it better suited to values with a variable number of elements, such as sequences. A short example shows that:

it("should apply for sequence") {
  object StringWordsExtractor {
    def unapplySeq(commaSeparatedText: String): Option[Seq[String]] = {
      Some(commaSeparatedText.split(","))
    }
  }

  val result = "a,b,c,d" match {
    case StringWordsExtractor(item1, item2, item3, _*) => s"Got: ${item1}, ${item2}, ${item3}"
    case _ => "d"
  }

  result shouldEqual "Got: a, b, c"
}
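An unapplySeq extractor can also be matched against a fixed number of elements, in which case the pattern succeeds only when the sequence has exactly that length. The sketch below illustrates it with a hypothetical CsvFields extractor (a name made up for this example):

```scala
// Hypothetical extractor splitting a comma-separated line into its fields.
object CsvFields {
  def unapplySeq(line: String): Option[Seq[String]] =
    if (line.isEmpty) None else Some(line.split(",").toList)
}

object UnapplySeqExample {
  def describe(line: String): String = line match {
    case CsvFields(single) => s"one field: $single" // exactly 1 element
    case CsvFields(first, second) => s"two fields: $first and $second" // exactly 2 elements
    case CsvFields(first, rest @ _*) => s"$first followed by ${rest.size} more" // 1 or more
    case _ => "no fields" // unapplySeq returned None
  }
}
```

The rest @ _* syntax binds the remaining elements to a Seq, while a bare _* simply ignores them.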

Extractors and case classes

Scala's case classes have some special behavior. The Scala compiler automatically generates their equals and hashCode methods to provide value equality without any code to write. But these are not the only generated members: unapply is also provided without any implementation effort. Let's first look at a regular class used in pattern matching:

class Person(firstName: String, lastName: String)
val abPerson = new Person("a", "b")
val result = abPerson match {
  case Person(firstName, lastName) => ""
  case _ => ""
}

The code doesn't compile and fails with: Error:(45, 12) not found: value Person case Person(firstName, lastName) => "". A regular class has no companion object with an unapply method, so Person can't be used as a pattern. But the same code compiles and works if we turn class Person into case class Person:

it("should use automatically generated unapply") {
  case class Person(firstName: String, lastName: String)
  val abPerson = Person("a", "b")
  val result = abPerson match {
    case Person(firstName, lastName) => s"Got ${firstName} ${lastName}"
    case _ => ""
  }

  result shouldEqual "Got a b"
}

it("should apply on case class through explicit unapply call") {
  case class Person(firstName: String, lastName: String)

  val (firstName, lastName) = Person.unapply(Person("a", "b")).get

  firstName shouldEqual "a"
  lastName shouldEqual "b"
}

Unlike the previous snippet, these 2 test cases compile and work correctly thanks to the automatically generated unapply method. We can see it by compiling case class Person(firstName: String, lastName: String) and inspecting the generated class with javap Person.class. The output will be similar to:

Compiled from "Person.scala"
public class Person implements scala.Product,scala.Serializable {
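  public static scala.Option<scala.Tuple2<java.lang.String, java.lang.String>> unapply(Person);
  public static Person apply(java.lang.String, java.lang.String);
  public static scala.Function1<scala.Tuple2<java.lang.String, java.lang.String>, Person> tupled();
  public static scala.Function1<java.lang.String, scala.Function1<java.lang.String, Person>> curried();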
  public java.lang.String firstName();
  public java.lang.String lastName();
  // ....

As you can clearly see, the compiler generated an unapply method returning an optional tuple of the 2 class parameters. That's why case classes can be used in pattern matching natively.
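For comparison, a regular class can be made usable in pattern matching by writing the companion object by hand. The sketch below only approximates what the compiler generates for a case class (it leaves out equals, hashCode, copy and the other generated members), and ManualUnapplyExample is a hypothetical wrapper added for the illustration:

```scala
// A regular (non-case) class: the compiler generates neither apply nor unapply.
class Person(val firstName: String, val lastName: String)

// Hand-written companion, roughly mirroring what a case class gets for free.
object Person {
  def apply(firstName: String, lastName: String): Person =
    new Person(firstName, lastName)

  // Returns the 2 constructor parameters as an optional tuple.
  def unapply(person: Person): Option[(String, String)] =
    Some((person.firstName, person.lastName))
}

object ManualUnapplyExample {
  def greet(person: Person): String = person match {
    case Person(firstName, lastName) => s"Got $firstName $lastName"
  }
}
```

Note that the constructor parameters are declared as val here; otherwise unapply would have no accessors to read them from.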

Extractors use cases

We've seen in the first section that extractors can return one of 3 types and that, depending on the type, they are used either to test a condition or to extract specific values. Their use is well summarized in the paper Pattern Matching in Scala by Michael Rüegg. The author, going through the specificities of pattern matching, presents extractors in several contexts, such as conversions and regular expressions.
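Two of these contexts, conversion and regular expressions, can be sketched as follows. IntValue and ConversionAndRegexExample are hypothetical names chosen for the example; the regular expression part relies on the unapplySeq method exposed by scala.util.matching.Regex.

```scala
import scala.util.Try

// Conversion: a hypothetical extractor matching only strings that hold an Int.
object IntValue {
  def unapply(text: String): Option[Int] = Try(text.trim.toInt).toOption
}

object ConversionAndRegexExample {
  // Regex defines unapplySeq, so a compiled pattern can be used as an extractor.
  // Each capturing group is bound to one variable of the pattern.
  private val IsoDate = """(\d{4})-(\d{2})-(\d{2})""".r

  def describe(text: String): String = text match {
    case IsoDate(year, month, day) => s"date: year=$year, month=$month, day=$day"
    case IntValue(number) => s"integer: ${number + 1} comes next"
    case _ => "unrecognized"
  }
}
```

In the conversion case the matched variable already has the Int type, so it can be used in arithmetic without any further parsing.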

In this post about Scala features we discovered extractors. As shown, they're objects implementing unapply or unapplySeq methods. Thanks to these methods, such objects can be used in pattern matching blocks to extract values from the matched object. The Scala compiler adds unapply automatically to case classes and, as shown in the last part, extractors can be combined with pattern matching for conversions or regular expressions.