Scala extractors

on waitingforcode.com

Scala's apply method is a convenient way to create objects without the new operator, which makes the code slightly less verbose. Often, for instance in case classes, apply is accompanied by its opposite, unapply, which is used to build extractors.

This post focuses on the role of extractors in Scala. The first section defines extractors through examples. The second explains the case class behavior mentioned above. The last one shows extractors used in different contexts. Each section contains code examples illustrating the presented ideas.

Extractors defined

Since extractors are mostly encountered in pattern matching constructs, let's begin by writing a simple matching operation on a text:

it("should apply on simple string object") {
  val textFromLower = "lowerCase"
  val textFromUpper = "UpperCase"
  val emptyText = ""

  def getNormalizedText(text: String): String = {
    val extractedText = text match {
      case NormalizeName(text) => text
      case _ => "_EMPTY_"
    }
    extractedText
  }

  object NormalizeName {
    def unapply(notNormalizedText: String): Option[String] = {
      if (notNormalizedText.isEmpty) {
        None
      } else if (notNormalizedText(0) == notNormalizedText(0).toUpper) {
        Some(notNormalizedText)
      } else {
        Some(s"${notNormalizedText(0).toUpper}${notNormalizedText.substring(1)}")
      }
    }
  }

  val normalizedLowerCase = getNormalizedText(textFromLower)
  normalizedLowerCase shouldEqual "LowerCase"
  val normalizedUpperCase = getNormalizedText(textFromUpper)
  normalizedUpperCase shouldEqual "UpperCase"
  val normalizedEmptyText = getNormalizedText(emptyText)
  normalizedEmptyText shouldEqual "_EMPTY_"
}

The code shows one use case of an extractor in pattern matching. The string text is matched against the NormalizeName extractor, which invokes its unapply method. Inside this method we first check whether the text is empty, in which case None is returned and the match fails. Otherwise we check whether the text starts with an upper-case letter. If it does, the text is returned as provided; if not, it's returned with the first letter converted to upper case.
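
To make the link between the pattern and unapply more concrete, here is a minimal sketch comparing the pattern-matching form with an explicit unapply call. The wrapper object NormalizeNameSketch and the methods viaPattern and viaExplicitCall are hypothetical names introduced for this illustration, the extractor body is a condensed version of the one above, and plain assert-style Scala is used instead of the test DSL to keep the snippet self-contained:

```scala
object NormalizeNameSketch {
  // Condensed version of the extractor from the test above
  // (upper-casing an already upper-case first letter is a no-op).
  object NormalizeName {
    def unapply(text: String): Option[String] =
      if (text.isEmpty) None
      else Some(s"${text(0).toUpper}${text.substring(1)}")
  }

  // Pattern-matching form: the compiler calls NormalizeName.unapply(text).
  def viaPattern(text: String): String = text match {
    case NormalizeName(normalized) => normalized
    case _ => "_EMPTY_"
  }

  // Roughly what the pattern above amounts to: call unapply by hand
  // and inspect the returned Option.
  def viaExplicitCall(text: String): String =
    NormalizeName.unapply(text) match {
      case Some(normalized) => normalized
      case None => "_EMPTY_"
    }
}
```

Both methods should return the same result for any input, which illustrates that Some means "the match succeeded with this value" and None means "try the next case".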

Hence we can define an extractor as an object implementing an unapply or unapplySeq method that extracts some values from another object. The method returns one of 3 kinds of values:

  • a single sub-value of any type T, wrapped in an Option[T] - as shown in the previous snippet
  • a boolean value - in this case the extractor works as a test method:
    it("should be used as a test case") {
      object EvenOddTester {
        def unapply(number: Int): Boolean = {
          number%2 == 0
        }
      }
    
      def checkIfIsEven(number: Int): Boolean = {
        val result = number match {
          // The pattern takes no arguments: the matched value is passed to unapply implicitly.
          // Here unapply is just a test telling whether the match succeeds or not.
          case EvenOddTester() => true
          case _ => false
        }
        result
      }
    
      val is2Even = checkIfIsEven(2)
      is2Even shouldBe true
      val is3Even = checkIfIsEven(3)
      is3Even shouldBe false
    }
    
  • several sub-values of any types - in this case the returned type is an Option[(T1, T2, ...)]:
    it("should extract many values of the sub-type") {
      object LettersExtractor {
        def unapply(text: String): Option[(String, Int, String)] = {
          if (text.isEmpty) {
            None
          } else {
            Some((text, text.length, text.toUpperCase()))
          }
        }
      }
    
      def getTextStats(text: String): (Option[(String, Int, String)]) = {
        val result = text match {
          case LettersExtractor(extractionResult) => Some(extractionResult)
          case _ => None
        }
        result
      }
    
      val textStats = getTextStats("aBc")
      textStats shouldBe defined
      textStats.get._1 shouldEqual "aBc"
      textStats.get._2 shouldEqual 3
      textStats.get._3 shouldEqual "ABC"
    }
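
The test above binds the whole tuple to a single name. As a complement, the following sketch binds every tuple element to its own name, which is the more common way of using such an extractor. LettersSketch and describe are hypothetical names added for this illustration; the extractor is the same as above:

```scala
object LettersSketch {
  object LettersExtractor {
    def unapply(text: String): Option[(String, Int, String)] =
      if (text.isEmpty) None
      else Some((text, text.length, text.toUpperCase))
  }

  // Each element of the returned tuple gets its own name in the pattern.
  def describe(text: String): String = text match {
    case LettersExtractor(original, length, upperCased) =>
      s"$original has $length letters, upper-cased: $upperCased"
    case _ => "empty"
  }

  // Extractors also work in value definitions. This throws a MatchError
  // when unapply returns None, so it's only safe for inputs known to match.
  val LettersExtractor(_, abcLength, abcUpper) = "aBc"
}
```

For instance, LettersSketch.describe("aBc") returns "aBc has 3 letters, upper-cased: ABC". Note that newer compiler versions may warn about the refutable pattern in the value definition.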
    

The same rules apply to the unapplySeq method. The only difference is that unapplySeq is better suited to a variable number of extracted values, such as the elements of a sequence. A short example shows that:

it("should apply for sequence") {
  object StringWordsExtractor {
    def unapplySeq(commaSeparatedText: String): Option[Seq[String]] = {
      Some(commaSeparatedText.split(",").toSeq)
    }
  }

  val result = "a,b,c,d" match {
    case StringWordsExtractor(item1, item2, item3, _*) => s"Got: ${item1}, ${item2}, ${item3}"
    case _ => "d"
  }

  result shouldEqual "Got: a, b, c"
}
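
The variable-length part of the match can also be captured instead of being ignored with _*. The sketch below, with the hypothetical names WordsSketch and headAndRest, distinguishes a single-element input from a longer one and binds the remaining elements to a Seq[String]:

```scala
object WordsSketch {
  object StringWordsExtractor {
    def unapplySeq(commaSeparatedText: String): Option[Seq[String]] =
      Some(commaSeparatedText.split(",").toSeq)
  }

  def headAndRest(text: String): String = text match {
    // Matches only when exactly one element was extracted.
    case StringWordsExtractor(only) => s"single: $only"
    // Matches one or more elements; the tail is bound as a Seq[String].
    case StringWordsExtractor(first, rest @ _*) =>
      s"first: $first, rest: ${rest.mkString("|")}"
    case _ => "no match"
  }
}
```

The order of cases matters here: the second pattern would also accept a single element (with an empty rest), so the more specific pattern comes first.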

Extractors and case classes

Scala's case classes have some special behavior. The Scala compiler automatically generates their equals and hashCode methods to provide value equality without any extra code. But these are not the only generated members: unapply is also provided without any implementation effort. Let's first look at a regular class used in pattern matching:

class Person(firstName: String, lastName: String)
val abPerson = new Person("a", "b")
val result = abPerson match {
  case Person(firstName, lastName) => ""
  case _ => ""
}

The code doesn't compile and fails with: Error:(45, 12) not found: value Person case Person(firstName, lastName) => "". But the same code compiles and works if we turn class Person into case class Person:

it("should use automatically generated unapply") {
  case class Person(firstName: String, lastName: String)
  val abPerson = Person("a", "b")
  val result = abPerson match {
    case Person(firstName, lastName) => s"Got ${firstName} ${lastName}"
    case _ => ""
  }

  result shouldEqual "Got a b"
}

it("should apply on case class through explicit unapply call") {
  case class Person(firstName: String, lastName: String)

  val (firstName, lastName) = Person.unapply(Person("a", "b")).get

  firstName shouldEqual "a"
  lastName shouldEqual "b"
}

Unlike the previous snippet, these 2 test cases compile and pass. That's thanks to the automatically generated unapply method. We can see it by compiling case class Person(firstName: String, lastName: String) and inspecting the generated class with javap Person.class. The output will be similar to:

Compiled from "Person.scala"
public class Person implements scala.Product,scala.Serializable {
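  public static scala.Option<scala.Tuple2<java.lang.String, java.lang.String>> unapply(Person);
  public static Person apply(java.lang.String, java.lang.String);
  public static scala.Function1<scala.Tuple2<java.lang.String, java.lang.String>, Person> tupled();
  public static scala.Function1<java.lang.String, scala.Function1<java.lang.String, Person>> curried();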
  public java.lang.String firstName();
  public java.lang.String lastName();
  // ....

As you can see, the compiler generated an unapply method returning an optional tuple with the 2 class parameters. That's why case classes can be used in pattern matching natively.
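
For comparison, a regular class can be made usable in patterns by writing the companion object by hand. The sketch below is only an approximation of what the compiler generates for a case class (it leaves out equals, hashCode, copy and the other members); PersonSketch and greet are hypothetical names, and the constructor parameters are declared as vals so that the companion can read them:

```scala
object PersonSketch {
  // A regular (non-case) class with publicly readable fields.
  class Person(val firstName: String, val lastName: String)

  // Hand-written companion exposing apply and unapply,
  // similar in shape to the generated signatures shown above.
  object Person {
    def apply(firstName: String, lastName: String): Person =
      new Person(firstName, lastName)

    def unapply(person: Person): Option[(String, String)] =
      Some((person.firstName, person.lastName))
  }

  def greet(person: Person): String = person match {
    case Person(firstName, lastName) => s"Got $firstName $lastName"
  }
}
```

With this companion in place, the pattern case Person(firstName, lastName) resolves to Person.unapply, so the "not found: value Person" error no longer occurs.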

Extractors use cases

We've seen in the first section that extractors can return one of 3 kinds of values and that, depending on them, they are used either to test a condition or to extract specific values. Their uses are well summarized in the Pattern Matching in Scala paper by Michael Rüegg. The author, going through pattern matching specificities, presents extractors in the context of:

  • (obviously) pattern matching - it's the main use case of extractors and the one presented in the previous sections.
  • conversions - unapply in conjunction with pattern matching can also be used to convert types, as shown in the following example:
    it("should be used to convert types") {
      trait Currency
      case class Dollar(amount: Double) extends Currency
      case class Euro(amount: Double) extends Currency
    
      object Pound {
        def unapply(currency: Currency): Option[Double] = currency match {
          case Dollar(amount) => Some(amount*0.75d)
          case Euro(amount) => Some(amount*0.88d)
          case _ => None
        }
      }
    
      def getPoundsAmount(currency: Currency): Double = {
        currency match {
          case Pound(amountInPounds) => amountInPounds
        }
      }
    
      val euroInPounds = getPoundsAmount(Euro(1D))
      euroInPounds shouldEqual 0.88d
      val dollarInPounds = getPoundsAmount(Dollar(1D))
      dollarInPounds shouldEqual 0.75d
    }
    
  • regular expressions - extractors are also widely used in Scala's regular expressions. In the Regex.scala file we can find them defined for the whole match or for its groups, as shown in the following 2 test cases:
    it("should be used to replace characters in matched string") {
      // Match is defined in the companion object of scala.util.matching.Regex
      import scala.util.matching.Regex.Match

      val replaceRegex = """\d+""".r
      val replacedMatchedParts = replaceRegex.replaceAllIn("A0B1C2", _ match {
        case Match(matchedNumber) => (matchedNumber.toInt + 1).toString
      })
    
      replacedMatchedParts shouldEqual "A1B2C3"
    }
    
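    it("should be used to replace characters by matching groups") {
      // Groups is defined in the companion object of scala.util.matching.Regex
      import scala.util.matching.Regex.Groups
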
      val groupsRegex = """[A-Z](\d)[A-Z](\d)[A-Z](\d)""".r
    
      val replacedMatchedParts = groupsRegex.replaceAllIn("A0B1C2", _ match {
        case Groups(nr1, nr2, nr3) => s"${nr3}${nr2}${nr1}"
      })
    
      replacedMatchedParts shouldEqual "210"
    }
    
    And if you take a look at the Match and Groups objects, you'll see the unapply* methods implemented like this:
      object Groups {
        def unapplySeq(m: Match): Option[Seq[String]] = if (m.groupCount > 0) Some(1 to m.groupCount map m.group) else None
      }
      object Match {
        def unapply(m: Match): Some[String] = Some(m.matched)
      }
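
Besides Match and Groups, a Regex instance can itself be used as an extractor in a pattern, since the Regex class defines unapplySeq for character sequences and exposes the capture groups when the whole input matches. A minimal sketch, where RegexSketch, IsoDate and describeDate are hypothetical names chosen for this illustration:

```scala
object RegexSketch {
  // Three capture groups: year, month and day.
  private val IsoDate = """(\d{4})-(\d{2})-(\d{2})""".r

  def describeDate(text: String): String = text match {
    // The pattern succeeds only if the entire text matches the regex;
    // each capture group is bound to one name.
    case IsoDate(year, month, day) => s"year=$year, month=$month, day=$day"
    case _ => "not a date"
  }
}
```

For instance, RegexSketch.describeDate("2018-05-01") returns "year=2018, month=05, day=01", while a text with extra characters around the date falls through to the default case.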
    

In this post about Scala features we discovered extractors. As shown, they're objects implementing the unapply or unapplySeq method. Thanks to these methods, the objects can be used in pattern matching blocks to extract values from the matched object. The Scala compiler adds unapply automatically to case classes and, as shown in the last part, extractors can be used in conjunction with pattern matching for conversions or regular expressions.
