While analyzing one of the Apache Spark GraphX functions for the first time, I came across a class annotated with @specialized. I decided to find out more about this annotation and share what I learned in this post.
In the first section of the post I will explain the basics of @specialized. In the next one, I will show how to use it. In the final part, I'll run a micro-benchmark to analyze the real impact of specialization on the code.
Definition
Scala uses the @specialized annotation to apply type specialization to the compiled classes. Specialization occurs at compile time and consists of generating versions of generic classes for specific types.
It was quite abstract for me too at the beginning, but an example helped me follow. Let's say we have a generic class like CustomSequence[T]. If we apply Long type specialization to it, the compiler will generate an additional Long-only variant, something like $Generated$CustomSequence[Long] (the real generated names appear in the next section), and use it everywhere we write val longSeq = new CustomSequence[Long]. I chose Long on purpose because type specialization applies only to primitive types.
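To make the CustomSequence[T] idea more concrete, here is a minimal sketch. Only the class name comes from the paragraph above; the constructor argument, the first method and the CustomSequenceDemo object are hypothetical, and I assume a Scala 2 compiler, where @specialized is implemented.

```scala
// Hypothetical body: only the CustomSequence name comes from the text.
// @specialized(Long) asks the Scala 2 compiler to emit an extra Long-only variant
// next to the generic class.
class CustomSequence[@specialized(Long) T](val head: T) {
  def first: T = head
}

object CustomSequenceDemo {
  def main(args: Array[String]): Unit = {
    // The call site is written exactly as for any other generic class;
    // choosing the specialized variant is the compiler's job.
    val longSeq = new CustomSequence[Long](42L)
    val stringSeq = new CustomSequence[String]("abc")
    println(longSeq.first + 1L)
    println(stringSeq.first)
  }
}
```

The point of the sketch is that nothing changes in the calling code: the annotation sits on the type parameter and the rest stays ordinary generic Scala.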
Of course, type specialization doesn't come without costs. It slows down compilation since the compiler has some extra work to do, and this negative impact can be really big. If you take a generic class with 3 specialized type parameters, like BigGeneric[T1, T2, T3], the compiler has to generate a variant for every combination of primitive types, so the number of generated classes grows exponentially with the number of type parameters.
On the other hand, specialization may have a positive impact at runtime because it helps to avoid boxing and unboxing during code execution. If you don't take my word for it, I will try to convince you in the last section.
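The size of that combination can be estimated with a few lines of arithmetic. This is a rough sketch, assuming that a bare @specialized targets 9 types (the 8 JVM primitives plus Unit) and that every type parameter is specialized independently; the SpecializationCount object and its variants helper are hypothetical names used only for illustration.

```scala
object SpecializationCount {
  // Assumption: the 9 types targeted by a bare @specialized (8 JVM primitives plus Unit).
  val specializableTypes: Seq[String] =
    Seq("Byte", "Short", "Char", "Int", "Long", "Float", "Double", "Boolean", "Unit")

  // Upper bound on specialized variants: one per combination of types,
  // i.e. typesPerParam raised to the number of specialized type parameters.
  def variants(typesPerParam: Int, typeParams: Int): BigInt =
    BigInt(typesPerParam).pow(typeParams)

  def main(args: Array[String]): Unit = {
    // BigGeneric[T1, T2, T3] with a bare @specialized on each parameter
    println(variants(specializableTypes.size, 3))
    // The same class restricted to 2 types per parameter, e.g. @specialized(Int, Long)
    println(variants(2, 3))
  }
}
```

Under these assumptions the first call gives 729 variants and the second one only 8, which is why restricting the list of specialized types matters for classes with several type parameters.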
Use in Scala
At first glance type specialization looks easy. Let's now see how to use it in Scala with the already mentioned @specialized annotation. Since it's difficult to illustrate type specialization with the usual learning tests, we'll do it by analyzing the compiled classes.
@specialized can be used in 2 different ways: without and with a list of specialized types. You can see both in the following examples:
class GlobalSpecialization[@specialized T] {
  def get(item: T) = item
}

class ReducedSpecialization[@specialized(Long) T] {
  def get(item: T) = item
}
If we take a look at the compiled classes, we should see:
'GlobalSpecialization$mcB$sp.class'
'GlobalSpecialization$mcJ$sp.class'
'GlobalSpecialization$mcC$sp.class'
'GlobalSpecialization$mcS$sp.class'
'ReducedSpecialization$mcJ$sp.class'
'GlobalSpecialization$mcD$sp.class'
'GlobalSpecialization$mcV$sp.class'
ReducedSpecialization.class
'GlobalSpecialization$mcF$sp.class'
'GlobalSpecialization$mcZ$sp.class'
'GlobalSpecialization$mcI$sp.class'
GlobalSpecialization.class
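The generated variants can also be observed at runtime, without listing the build directory. Below is a minimal sketch reusing ReducedSpecialization from above; the SpecializationCheck object is hypothetical. I assume a Scala 2 compiler here, since to my knowledge Scala 3 accepts the annotation but does not generate the specialized variants.

```scala
class ReducedSpecialization[@specialized(Long) T] {
  def get(item: T) = item
}

object SpecializationCheck {
  def main(args: Array[String]): Unit = {
    val forLong = new ReducedSpecialization[Long]()
    val forInt = new ReducedSpecialization[Int]()
    // With Scala 2, the Long instance should be backed by the generated $mcJ$sp class,
    // while Int should fall back to the generic class because only Long was requested.
    println(forLong.getClass.getName)
    println(forInt.getClass.getName)
    // Both instances behave the same from the caller's point of view.
    println(forLong.get(4L) + 5L)
    println(forInt.get(1) + 2)
  }
}
```

If the first printed name ends with $mcJ$sp and the second one doesn't, the compiler picked the specialized variant only where the annotation allowed it.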
A non-specialized class would generate only 1 compiled file. Let's add one to our test package to see what the compiler does with specialized and non-specialized code:
class NotSpecialized[T] {
  def get(item: T) = item
}

class Tests {
  val longNotSpecialized = new NotSpecialized[Long]()
  longNotSpecialized.get(3L) + 4L

  val longReducedSpecialized = new ReducedSpecialization[Long]()
  longReducedSpecialized.get(4L) + 5L
}
If we inspect Tests.class with the javap -v Tests.class command, we should see that the compiler adds boxing for the non-specialized type and doesn't do that for the specialized one:
# Not specialized class
 9: invokespecial #29 // Method com/waitingforcode/specialization/NotSpecialized."<init>":()V
12: putfield #17 // Field longNotSpecialized:Lcom/waitingforcode/specialization/NotSpecialized;
15: aload_0
16: invokevirtual #31 // Method longNotSpecialized:()Lcom/waitingforcode/specialization/NotSpecialized;
19: ldc2_w #32 // long 3l
22: invokestatic #39 // Method scala/runtime/BoxesRunTime.boxToLong:(J)Ljava/lang/Long;
25: invokevirtual #43 // Method com/waitingforcode/specialization/NotSpecialized.get:(Ljava/lang/Object;)Ljava/lang/Object;
28: invokestatic #47 // Method scala/runtime/BoxesRunTime.unboxToLong:(Ljava/lang/Object;)J
31: ldc2_w #48 // long 4l
34: ladd

# Specialized class
37: new #51 // class com/waitingforcode/specialization/ReducedSpecialization$mcJ$sp
40: dup
41: invokespecial #52 // Method com/waitingforcode/specialization/ReducedSpecialization$mcJ$sp."<init>":()V
44: putfield #22 // Field longReducedSpecialized:Lcom/waitingforcode/specialization/ReducedSpecialization;
47: aload_0
48: invokevirtual #54 // Method longReducedSpecialized:()Lcom/waitingforcode/specialization/ReducedSpecialization;
51: ldc2_w #48 // long 4l
54: invokevirtual #60 // Method com/waitingforcode/specialization/ReducedSpecialization.get$mcJ$sp:(J)J
57: ldc2_w #61 // long 5l
Just to show you that I didn't hide the boxing in the get method of the ReducedSpecialization class, you can find its bytecode in the next snippet:
public long get$mcJ$sp(long);
  descriptor: (J)J
  flags: ACC_PUBLIC
  Code:
    stack=2, locals=3, args_size=2
       0: lload_1
       1: lreturn
    LocalVariableTable:
      Start  Length  Slot  Name  Signature
          0       2     0  this  Lcom/waitingforcode/specialization/ReducedSpecialization$mcJ$sp;
          0       2     1  item  J
    LineNumberTable:
      line 5: 0
  MethodParameters:
    Name  Flags
    item  final
Specialized type impact on runtime
To check the impact of type specialization at runtime I'll use JMH, exactly like in the post about structural types. Since the build.sbt is the same, I'll omit it here for brevity. Let's focus instead on the tested classes:
import java.util.concurrent.TimeUnit

import org.openjdk.jmh.annotations.{Benchmark, BenchmarkMode, Mode, OutputTimeUnit}

@OutputTimeUnit(TimeUnit.MILLISECONDS)
@BenchmarkMode(Array(Mode.All))
class SpecializedTypeMicroBenchmark {

  // Calls the specialized class once for every number of the range
  @Benchmark
  def verify_specialized: Unit = {
    val specialized = new SpecializedType[Int]
    (0 to 1000000).map(nr => specialized.item(nr))
  }

  // Same workload, but going through the generic (boxing) class
  @Benchmark
  def verify_not_specialized: Unit = {
    val notSpecialized = new NotSpecializedType[Int]
    (0 to 1000000).map(nr => notSpecialized.item(nr))
  }

}

class NotSpecializedType[T] {
  def item(item: T) = item
}

class SpecializedType[@specialized T] {
  def item(item: T) = item
}
After executing the code with sbt jmh:run -i 20 -wi 10 -f1 -t1 -rf text, I got the following results:
Benchmark                                                                             Mode    Cnt   Score    Error  Units
SpecializedTypeMicroBenchmark.verify_not_specialized                                  thrpt    20   0.079 ±  0.010  ops/ms
SpecializedTypeMicroBenchmark.verify_specialized                                      thrpt    20   0.095 ±  0.015  ops/ms
SpecializedTypeMicroBenchmark.verify_not_specialized                                  avgt     20  21.024 ±  8.833  ms/op
SpecializedTypeMicroBenchmark.verify_specialized                                      avgt     20  12.423 ±  1.765  ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized                                  sample 1394  14.427 ±  0.458  ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.00     sample        8.569           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.50     sample       13.058           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.90     sample       21.332           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.95     sample       25.059           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.99     sample       32.775           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.999    sample       43.424           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p0.9999   sample       43.450           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized:verify_not_specialized·p1.00     sample       43.450           ms/op
SpecializedTypeMicroBenchmark.verify_specialized                                      sample 1876  10.736 ±  0.431  ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.00             sample        6.111           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.50             sample        9.052           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.90             sample       15.188           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.95             sample       20.064           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.99             sample       38.676           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.999            sample       55.613           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p0.9999           sample       74.580           ms/op
SpecializedTypeMicroBenchmark.verify_specialized:verify_specialized·p1.00             sample       74.580           ms/op
SpecializedTypeMicroBenchmark.verify_not_specialized                                  ss       20  26.855 ±  9.885  ms/op
SpecializedTypeMicroBenchmark.verify_specialized                                      ss       20  17.305 ±  5.425  ms/op
The specialized version performs better than the non-specialized one. We can see it already in the throughput metric, where the former reaches almost 0.1 operations per ms while the latter stays close to 0.08. In the average time measure, the non-specialized code takes about 1.7 times longer to execute than the specialized one (21.0 ms versus 12.4 ms per operation). The reported error margins are wide, though, so these ratios should be read as indicative. Quite surprising is the result of the sample time (sample) measure, where the worst cases of the specialized version (p0.99 and above) are worse than the same percentiles of the non-optimized code. It doesn't mean that specialization is bad, though. It also performs well for the cold start (ss).
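The comparison can be turned into plain ratios computed from the scores reported above. This is only a small arithmetic sketch; the BenchmarkRatios object and its ratio helper are hypothetical names, and the numbers are copied from the avgt and thrpt rows of the JMH output.

```scala
object BenchmarkRatios {
  // Scores copied from the JMH output (avgt in ms/op, thrpt in ops/ms)
  val avgNotSpecializedMs = 21.024
  val avgSpecializedMs = 12.423
  val thrptNotSpecialized = 0.079
  val thrptSpecialized = 0.095

  def ratio(numerator: Double, denominator: Double): Double = numerator / denominator

  def main(args: Array[String]): Unit = {
    // How many times longer the non-specialized version takes on average
    println(ratio(avgNotSpecializedMs, avgSpecializedMs))
    // How many times more operations per ms the specialized version completes
    println(ratio(thrptSpecialized, thrptNotSpecialized))
  }
}
```

The two ratios come out at roughly 1.69 and 1.20, so the gain is visible in both measures but stays below a factor of two.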
Maybe you won't use type specialization frequently. In this article, I didn't try to convince you to change your code and put the @specialized annotation everywhere. Doing so would probably slow down compilation without bringing a lot of advantages at runtime. However, if your application starts to slow down and the reason is primitive type boxing, specialization is one of the possible solutions. As shown in the second section, this mechanism is quite easy to use because it comes down to adding the @specialized annotation with, optionally, the list of specialized types.