Scala rich data types

Versions: Scala 2.12.1

Have you ever wondered why we can reverse a String directly in Scala, while Java requires a StringBuilder for the same task? If so, this post gives some explanation by focusing on rich wrappers, the classes that extend Scala's equivalents of Java's primitives (plus String) with extra methods.

The post talks about Scala data types. Its first section explains the concept globally. The second one focuses on the transparent use of Scala's rich data types. The third part compares the code written with Scala rich types and Java data types.

Data types

In the context of this post, data types mean the most basic types in Scala: String, Int, Double, Float, Byte, Short, Char, Boolean and Long. All of them except String have one thing in common: they extend AnyVal (String is simply an alias for java.lang.String, so it is a reference type). Thanks to that, the compiler can optimize them and represent them as plain Java primitives at runtime. Hence, each of them shares the bit width and value range of its corresponding Java primitive. The following example shows 2 classes and their respective bytecode:

// Scala
class RichWrappers {

  private val number = 1

}

// Java
public class Primitives {

  private int number = 1;

  private Integer integerNumber = 1;

}

If we analyze their bytecode, we can clearly see that Scala's Int is compiled to Java's int: both number fields have the descriptor I, whereas the Integer integerNumber field, added to highlight the difference, is boxed through Integer.valueOf:

public test.Primitives();
descriptor: ()V
flags: ACC_PUBLIC
Code:
  stack=2, locals=1, args_size=1
     0: aload_0
     1: invokespecial #1                  // Method java/lang/Object."<init>":()V
     4: aload_0
     5: iconst_1
     6: putfield      #2                  // Field number:I
     9: aload_0
    10: iconst_1
    11: invokestatic  #3                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
    14: putfield      #4                  // Field integerNumber:Ljava/lang/Integer;
    17: return
  LineNumberTable:
    line 3: 0
    line 5: 4
    line 7: 9
  LocalVariableTable:
    Start  Length  Slot  Name   Signature
        0      18     0  this   Ltest/Primitives;


public com.waitingforcode.RichWrappers();
descriptor: ()V
flags: ACC_PUBLIC
Code:
  stack=2, locals=1, args_size=1
     0: aload_0
     1: invokespecial #19                 // Method java/lang/Object."<init>":()V
     4: aload_0
     5: iconst_1
     6: putfield      #13                 // Field number:I
     9: return
  LocalVariableTable:
    Start  Length  Slot  Name   Signature
        0      10     0  this   Lcom/waitingforcode/RichWrappers;
  LineNumberTable:
    line 7: 0
    line 5: 4
    line 3: 9
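
The shared ranges can also be checked directly from code. The snippet below is a minimal sketch that compares the limits exposed by Scala's companion objects with the constants of the Java boxed classes; the object name RangeCheck is only illustrative.

```scala
object RangeCheck {
  // Scala's value types expose the same limits as the matching Java primitives.
  val sameIntRange: Boolean =
    Int.MinValue == java.lang.Integer.MIN_VALUE && Int.MaxValue == java.lang.Integer.MAX_VALUE
  val sameLongRange: Boolean =
    Long.MinValue == java.lang.Long.MIN_VALUE && Long.MaxValue == java.lang.Long.MAX_VALUE
  val sameByteRange: Boolean =
    Byte.MinValue == java.lang.Byte.MIN_VALUE && Byte.MaxValue == java.lang.Byte.MAX_VALUE

  // Overflow wraps around in the same way as for a Java int.
  private val max: Int = Int.MaxValue
  val wrapped: Int = max + 1

  def main(args: Array[String]): Unit = {
    println(s"Int range matches: $sameIntRange")
    println(s"Long range matches: $sameLongRange")
    println(s"Byte range matches: $sameByteRange")
    println(s"Int.MaxValue + 1 wraps to Int.MinValue: ${wrapped == Int.MinValue}")
  }
}
```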

The value types are defined in the scala package, which the compiler imports automatically. That's why we don't need to write an import statement at every use. Internally, they are declared as final abstract classes extending AnyVal:

final abstract class Int private extends AnyVal 
final abstract class Byte private extends AnyVal
// ...

It doesn't mean we manipulate abstract classes though. Instead, whenever we call a method that the value type itself doesn't define, the value is converted to its "rich" wrapper.

Rich wrappers

The construction of rich wrappers is transparent for the programmers. An object called Predef provides the implicit conversion methods transforming abstract data types to their rich equivalents:

@inline implicit def byteWrapper(x: Byte)       = new runtime.RichByte(x)
@inline implicit def shortWrapper(x: Short)     = new runtime.RichShort(x)
@inline implicit def intWrapper(x: Int)         = new runtime.RichInt(x)
@inline implicit def charWrapper(c: Char)       = new runtime.RichChar(c)
@inline implicit def longWrapper(x: Long)       = new runtime.RichLong(x)
@inline implicit def floatWrapper(x: Float)     = new runtime.RichFloat(x)
@inline implicit def doubleWrapper(x: Double)   = new runtime.RichDouble(x)
@inline implicit def booleanWrapper(x: Boolean) = new runtime.RichBoolean(x)
/** @group conversions-string */
@inline implicit def augmentString(x: String): StringOps = new StringOps(x)
/** @group conversions-string */
@inline implicit def unaugmentString(x: StringOps): String = x.repr

It's a great example of the Pimp My Lib pattern, explained in the post about Scala implicits some months ago. Thanks to it, the behavior of native data types can be extended transparently.
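
The conversions listed above can also be called by hand, which makes visible roughly what the compiler inserts for us. The sketch below is only an illustration; the Shouting class and its shout method are hypothetical names invented for the example and are not part of the Scala library.

```scala
object ImplicitWrappersDemo {
  // What we write: `to` is not defined on Int, so the compiler looks for an implicit conversion.
  val implicitRange = 10.to(12)
  // Roughly what the compiler generates: an explicit call to Predef.intWrapper.
  val explicitRange = Predef.intWrapper(10).to(12)

  // Same mechanism for String, through Predef.augmentString.
  val implicitReverse: String = "abc".reverse
  val explicitReverse: String = Predef.augmentString("abc").reverse

  // The same pattern applied to our own code (hypothetical enrichment of String).
  implicit class Shouting(val text: String) extends AnyVal {
    def shout: String = text.toUpperCase + "!"
  }
  val shouted: String = "abc".shout

  def main(args: Array[String]): Unit = {
    println(implicitRange == explicitRange)
    println(implicitReverse == explicitReverse)
    println(shouted)
  }
}
```

Declaring the enrichment as an implicit class that extends AnyVal mirrors the idea behind the rich wrappers: the extra methods become available on the original type without changing how the value itself is represented.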

Rich features

After covering the theory behind Scala data types, it's a good moment to jump into practice. The tests below show a few arbitrarily chosen operations that are easy to perform in Scala. Each test compares the Scala operation with its Java counterpart:

describe("string") {
  val scalaText = "abc"
  val javaText: java.lang.String = "abc"
  it("should get last character") {
    scalaText.last shouldEqual 'c'
    javaText.charAt(javaText.length-1) shouldEqual 'c'
  }
  it("should reverse string") {
    scalaText.reverse shouldEqual "cba"
    new java.lang.StringBuilder(javaText).reverse().toString() shouldEqual "cba"
  }
}
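describe("integer") {
  // Java stream classes used by the range test
  import java.util.stream.{Collectors, IntStream}
  val scalaInteger = 10
  val javaInteger: java.lang.Integer = 10
  it("should create a range") {
    // to(15) is inclusive (10..15) while IntStream.range excludes 15;
    // both still contain the asserted values 10..14
    scalaInteger.to(15) should contain allOf(10, 11, 12, 13, 14)
    IntStream.range(javaInteger, 15).boxed().collect(Collectors.toList()) should contain allOf(10, 11, 12, 13, 14)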
  }
  it("should return absolute value") {
    scalaInteger.abs shouldEqual 10d
    java.lang.Math.abs(javaInteger) shouldEqual 10d
  }
  it("should return binary string") {
    scalaInteger.toBinaryString shouldEqual "1010"
    java.lang.Integer.toBinaryString(javaInteger) shouldEqual "1010"
  }
}

As the tests above show, Scala offers a more concise way to access these richer operations on data types. The last method states the intent more clearly than an index computation passed to charAt. The same goes for reverse, which is much more intuitive than going through Java's StringBuilder.

Scala's rich data types extend the behavior of their Java equivalents. Most of the time they provide shortcut methods that internally rely on more verbose Java code. This is possible thanks to the rich wrappers described in the second section and enabled by the Pimp My Lib pattern.
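
As a rough illustration of the shortcut idea, the sketch below checks that a few rich methods return the same results as their more verbose Java counterparts. It only compares results and makes no claim about how each wrapper is implemented; the object name ShortcutEquivalence is illustrative.

```scala
object ShortcutEquivalence {
  val number = -10
  val text = "abc"

  // Rich method on the left, plain Java call on the right.
  val sameAbs: Boolean = number.abs == java.lang.Math.abs(number)
  val sameBinary: Boolean = number.toBinaryString == java.lang.Integer.toBinaryString(number)
  val sameMax: Boolean = number.max(3) == java.lang.Math.max(number, 3)
  val sameReverse: Boolean = text.reverse == new java.lang.StringBuilder(text).reverse().toString
  val sameLast: Boolean = text.last == text.charAt(text.length - 1)

  // `to` includes the upper bound whereas `until` excludes it, like Java's IntStream.range.
  val inclusive: List[Int] = 10.to(12).toList
  val exclusive: List[Int] = 10.until(12).toList

  def main(args: Array[String]): Unit = {
    println(Seq(sameAbs, sameBinary, sameMax, sameReverse, sameLast).forall(identity))
    println(inclusive)
    println(exclusive)
  }
}
```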

TAGS: #One Scala feature per week