distributed data serialization articles

Coders in Apache Beam

In distributed computing, data moves either locally (within a single worker) or remotely (between different workers), so it must be in a format the receiving machine can understand. Serialization guarantees this format, and Apache Beam provides it as well.

Continue Reading →

Serialization issues - part 2

An earlier post (Serialization issues - part 1) presented some solutions to serialization problems. This post continues it.

Continue Reading →

Serialization issues - part 1

Issues with non-serializable objects are perhaps the most painful ones when we start working with Spark. Fortunately, there are several solutions to them.

Continue Reading →

Serialization in Spark

Serialization frameworks are an intrinsic part of Big Data systems. Spark is no exception to this rule, and it offers several ways to manage serialization.

Continue Reading →