Have you ever wondered why we can reverse a String directly in Scala, while Java requires a StringBuilder for it? If so, this post provides some explanation by focusing on rich wrappers, the Scala equivalents of Java's primitives (plus String).
The post talks about Scala data types. Its first section gives an overview of the concept. The second one focuses on the transparent use of Scala's rich data types. The third part compares code written with Scala rich types to code written with Java data types.
Data types
In the context of this post, data types mean the most basic types in Scala, such as String, Int, Double, Float, Byte, Short, Char, Boolean and Long. Except for String, which is simply java.lang.String, they all have one thing in common: they extend AnyVal. Thanks to that, the compiler can optimize them and represent them as plain Java primitives at runtime. Hence, each of them has the same value range as its corresponding Java primitive. The following example shows 2 classes and their respective bytecodes:
// Scala
class RichWrappers {
  private val number = 1
}
// Java
public class Primitives {
  private int number = 1;
  private Integer integerNumber = 1;
}
If we analyze their bytecodes, we can clearly see that Scala's Int is compiled to Java's int (Integer integerNumber was added to highlight the difference):
public test.Primitives();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #1 // Method java/lang/Object."<init>":()V
4: aload_0
5: iconst_1
6: putfield #2 // Field number:I
9: aload_0
10: iconst_1
11: invokestatic #3 // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
14: putfield #4 // Field integerNumber:Ljava/lang/Integer;
17: return
LineNumberTable:
line 3: 0
line 5: 4
line 7: 9
LocalVariableTable:
Start Length Slot Name Signature
0 18 0 this Ltest/Primitives;
public com.waitingforcode.RichWrappers();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=2, locals=1, args_size=1
0: aload_0
1: invokespecial #19 // Method java/lang/Object."<init>":()V
4: aload_0
5: iconst_1
6: putfield #13 // Field number:I
9: return
LocalVariableTable:
Start Length Slot Name Signature
0 10 0 this Lcom/waitingforcode/RichWrappers;
LineNumberTable:
line 7: 0
line 5: 4
line 3: 9
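As a quick sanity check of the claim that Scala's value types share the ranges of Java's primitives, the sketch below compares a few bounds and runtime classes. The object name `PrimitiveEquivalence` and its field names are made up for illustration; the calls themselves come from the Scala and Java standard libraries.

```scala
object PrimitiveEquivalence {
  // Scala's value types expose the same bounds as Java's primitives.
  val sameIntMax: Boolean = Int.MaxValue == java.lang.Integer.MAX_VALUE
  val sameByteMin: Boolean = Byte.MinValue == java.lang.Byte.MIN_VALUE
  val sameLongMax: Boolean = Long.MaxValue == java.lang.Long.MAX_VALUE

  // classOf[Int] is the JVM's primitive int class, not java.lang.Integer.
  val intIsPrimitive: Boolean = classOf[Int].isPrimitive

  // Boxing only happens when an Int is used as a reference, e.g. typed as Any.
  val boxedClassName: String = (1: Any).getClass.getName
}
```

The last field illustrates the other side of the story: as soon as the value has to be treated as an object, it is boxed into `java.lang.Integer`, just like the `integerNumber` field in the Java class.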
All these data types, except String, which is java.lang.String exposed through an alias in Predef, are defined inside the scala package, whose members are imported automatically by the compiler. That's why we don't need to write an import statement at every use. Internally, the data type classes are declared as abstract classes extending AnyVal:
final abstract class Int private extends AnyVal
final abstract class Byte private extends AnyVal
// ...
It doesn't mean we manipulate abstract classes though. Instead, whenever we call a method that the type itself doesn't define, the value is converted to its "rich" wrapper.
Rich wrappers
The construction of rich wrappers is transparent for the programmers. An object called Predef provides the implicit conversion methods transforming abstract data types to their rich equivalents:
@inline implicit def byteWrapper(x: Byte) = new runtime.RichByte(x)
@inline implicit def shortWrapper(x: Short) = new runtime.RichShort(x)
@inline implicit def intWrapper(x: Int) = new runtime.RichInt(x)
@inline implicit def charWrapper(c: Char) = new runtime.RichChar(c)
@inline implicit def longWrapper(x: Long) = new runtime.RichLong(x)
@inline implicit def floatWrapper(x: Float) = new runtime.RichFloat(x)
@inline implicit def doubleWrapper(x: Double) = new runtime.RichDouble(x)
@inline implicit def booleanWrapper(x: Boolean) = new runtime.RichBoolean(x)
/** @group conversions-string */
@inline implicit def augmentString(x: String): StringOps = new StringOps(x)
/** @group conversions-string */
@inline implicit def unaugmentString(x: StringOps): String = x.repr
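To make the mechanism more concrete, here is a small sketch of what the compiler roughly does when a method is missing on the original type. `ImplicitConversionDemo` and its field names are hypothetical; `intWrapper` and `augmentString` are the conversions listed above, called here explicitly through `Predef`.

```scala
object ImplicitConversionDemo {
  // `to` is not defined on Int, so the compiler inserts intWrapper for us...
  val implicitRange: List[Int] = 10.to(12).toList
  // ...which is roughly equivalent to calling the conversion by hand.
  val explicitRange: List[Int] = Predef.intWrapper(10).to(12).toList

  // Same story for String: `reverse` comes from StringOps.
  val implicitReverse: String = "abc".reverse
  val explicitReverse: String = Predef.augmentString("abc").reverse
}
```

Both variants of each pair should produce the same value; the implicit form simply hides the wrapping step.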
These conversions are a great example of the Pimp My Library pattern explained in the post about Scala implicits some months ago. Thanks to it, it's possible to transparently extend the behavior of native data types.
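The same pattern is available to our own code. The following is a minimal sketch, assuming we want a `percentOf` method on Int; `PercentSyntax`, `PercentOps` and `percentOf` are invented names for illustration and are not part of the Scala library.

```scala
object PercentSyntax {
  // Hypothetical enrichment: adds `percentOf` to Int, the same way
  // RichInt adds `to` or `abs`. Extending AnyVal avoids allocating
  // the wrapper in most cases.
  implicit class PercentOps(val value: Int) extends AnyVal {
    def percentOf(total: Int): Double = value * total / 100.0
  }

  // The implicit class is in scope here, so the call reads like a native method.
  val twentyPercentOfFifty: Double = 20.percentOf(50)
}
```

Outside of the object, an `import PercentSyntax._` would be needed to bring the conversion into scope, whereas the Predef conversions are always available.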
Rich features
After covering the theoretical points of Scala data types, it's a good moment to jump into practice. The tests below show some arbitrarily chosen operations that can be easily performed with Scala. Each test compares the Scala operation with its Java counterpart:
import java.util.stream.{Collectors, IntStream}

describe("string") {
  val scalaText = "abc"
  val javaText: java.lang.String = "abc"
  it("should get last character") {
    scalaText.last shouldEqual 'c'
    javaText.charAt(javaText.length - 1) shouldEqual 'c'
  }
  it("should reverse string") {
    scalaText.reverse shouldEqual "cba"
    new java.lang.StringBuilder(javaText).reverse().toString() shouldEqual "cba"
  }
}
describe("integer") {
  val scalaInteger = 10
  val javaInteger: java.lang.Integer = 10
  it("should create a range") {
    // `to` includes the upper bound while IntStream.range excludes it; both contain 10..14
    scalaInteger.to(15) should contain allOf(10, 11, 12, 13, 14)
    IntStream.range(javaInteger, 15).boxed().collect(Collectors.toList()) should contain allOf(10, 11, 12, 13, 14)
  }
  it("should return absolute value") {
    scalaInteger.abs shouldEqual 10
    java.lang.Math.abs(javaInteger) shouldEqual 10
  }
  it("should return binary string") {
    scalaInteger.toBinaryString shouldEqual "1010"
    java.lang.Integer.toBinaryString(javaInteger) shouldEqual "1010"
  }
}
As the tests above show, Scala offers a more idiomatic way to access rich properties of data types. Calling last is more meaningful than an operation based on charAt. Similarly, calling reverse directly on a String is much more intuitive than going through Java's StringBuilder and its reverse method.
Scala rich data types extend the behavior of their Java equivalents. Most of the time, they provide shortcut methods that internally use more verbose Java code. It's possible thanks to the rich wrappers described in the second section and enabled by the Pimp My Library pattern.
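As an illustration of that shortcut idea for a few other wrappers, the sketch below pairs some rich methods with the Java calls that give the same result. `RichDelegationDemo` and its field names are hypothetical, and the pairs only show equal results, not the exact internal implementation.

```scala
object RichDelegationDemo {
  private val half: Double = 2.5

  // Each field compares a rich method with a more verbose Java call.
  val binary: Boolean = 10.toBinaryString == java.lang.Integer.toBinaryString(10)
  val absolute: Boolean = (-10).abs == java.lang.Math.abs(-10)
  val letter: Boolean = 'a'.isLetter == java.lang.Character.isLetter('a')
  val upper: Boolean = 'a'.toUpper == java.lang.Character.toUpperCase('a')
  val rounded: Boolean = half.round == java.lang.Math.round(half)
}
```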
Today in #OneScalaFeaturePerWeek some notes about the extension for Java's primitive types in #Scala: https://t.co/SPqkjL0LaN
— Bartosz Konieczny (@waitingforcode) December 2, 2018
