In distributed computing, data moves either locally (within a single worker) or remotely (between different workers), so it must have a format the machines can understand. This format is guaranteed by the operation of serialization, which is also present in Apache Beam.
One of the previous posts (Serialization issues - part 1) presented some solutions to serialization problems. This post is its continuation.
Issues with non-serializable objects are perhaps the most painful ones when we start to work with Spark. Fortunately, there are several solutions to them, one of which is sketched below.
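For instance, a frequent cure for NotSerializableException is to instantiate the problematic object inside the task instead of capturing it in the driver-side closure. The following snippet is only a minimal sketch of that idea; the NonSerializableClient class is a hypothetical placeholder (for example a database or HTTP client) and not code taken from the original post:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper that cannot be serialized (stands in for a DB or HTTP client).
class NonSerializableClient {
  def enrich(value: String): String = s"enriched($value)"
}

object ClosureSerializationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("closure-serialization")
      .master("local[*]")
      .getOrCreate()
    val data = spark.sparkContext.parallelize(Seq("a", "b", "c"))

    // Creating the client on the driver and using it in map(...) would pull it into
    // the task closure and fail with NotSerializableException. Creating it inside
    // mapPartitions lets every executor build its own local instance instead.
    val enriched = data.mapPartitions { values =>
      val client = new NonSerializableClient()
      values.map(client.enrich)
    }

    enriched.collect().foreach(println)
    spark.stop()
  }
}
```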
Serialization frameworks are an intrinsic part of Big Data systems. Spark is no exception to this rule and offers several ways to manage serialization.
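One of those ways is switching from the default Java serialization to Kryo through the configuration. The snippet below is a minimal sketch assuming a local Scala application; the Order case class is a made-up example used only to show class registration:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative data class registered with Kryo; the name is an assumption, not from the post.
case class Order(id: Long, amount: Double)

object KryoConfigurationExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("kryo-serialization")
      .setMaster("local[*]")
      // Replace the default JavaSerializer with KryoSerializer for shuffled and cached data.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Registering classes up front avoids writing the full class name with every record.
      .registerKryoClasses(Array(classOf[Order]))

    val spark = SparkSession.builder().config(conf).getOrCreate()
    val orders = spark.sparkContext.parallelize(Seq(Order(1L, 10.5), Order(2L, 20.0)))
    println(orders.map(_.amount).sum())
    spark.stop()
  }
}
```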