Often, making errors helps you progress. That was my case with spark-submit and the local/remote JAR pair. The errors helped me understand the role of the driver, closures, serialization, and some configuration properties.
Some time ago I wondered why an object created once on the driver is recreated on the executors with every new stage, even when that object is sent through a broadcast variable. After some digging in the code, the answer turned out to be related to Java serialization.
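As a minimal sketch of the setup in question (the Dictionary class and its contents are hypothetical, introduced only for illustration), the snippet below creates an object once on the driver and shares it through a broadcast variable. The executors never receive the driver's JVM object itself; they deserialize their own copy from the serialized bytes:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper whose single instance we want to share with executors.
class Dictionary(val entries: Map[String, String]) extends Serializable

object BroadcastRecreation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("broadcast-demo").setMaster("local[2]"))

    // Created once, on the driver...
    val dictionary = new Dictionary(Map("a" -> "A", "b" -> "B"))
    val broadcast = sc.broadcast(dictionary)

    // ...but on the executors, `broadcast.value` is a copy deserialized
    // from the broadcast bytes, not the driver's original instance.
    sc.parallelize(Seq("a", "b", "c"))
      .map(key => broadcast.value.entries.getOrElse(key, "?"))
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```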
A previous post (Serialization issues - part 1) presented some solutions for serialization problems. This post is its continuation.
Issues with non-serializable objects are perhaps the most painful when we start working with Spark. Fortunately, there are several solutions to them.
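One of those solutions is sketched below (the Formatter class is hypothetical): instead of referencing a non-serializable object captured from the driver, which would fail with "Task not serializable", the object is created inside the closure, so each executor builds its own instance locally:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical, deliberately NOT serializable class - capturing an instance
// of it from the driver inside a closure would throw
// org.apache.spark.SparkException: Task not serializable.
class Formatter {
  def format(value: Int): String = s"value=$value"
}

object SerializationFix {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("serialization-fix").setMaster("local[2]"))

    val numbers = sc.parallelize(1 to 5)

    // The fix: instantiate the object inside the closure, per partition,
    // so nothing non-serializable has to be shipped from the driver.
    val formatted = numbers.mapPartitions { iterator =>
      val formatter = new Formatter() // built locally on the executor
      iterator.map(formatter.format)
    }

    formatted.collect().foreach(println)
    sc.stop()
  }
}
```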
Keeping in mind which parts of Spark code are executed on the driver and which ones on the workers is important, and can help to avoid some annoying errors, such as the ones related to serialization.
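The sketch below illustrates that split (names are illustrative): statements in the main method run on the driver, while the body of a closure passed to an RDD action runs on the executors, so on a real cluster its output lands in the executor logs rather than the driver's console:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DriverVsWorkers {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("driver-vs-workers").setMaster("local[2]"))

    // Executed on the driver: prints to the driver's stdout.
    println("Building the RDD on the driver")

    // Accumulators are the supported way to aggregate values from executors.
    val counter = sc.longAccumulator("processed elements")

    sc.parallelize(1 to 4).foreach { number =>
      // Executed inside a closure on the executors: on a cluster this
      // output goes to the executor logs, not to the driver's console.
      println(s"Processing $number on an executor")
      counter.add(1)
    }

    // Back on the driver: the accumulator merges the executors' updates.
    println(s"Processed ${counter.value} elements")
    sc.stop()
  }
}
```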
Every distributed computation is divided into small parts called jobs, stages, and tasks. Knowing them is useful, especially during monitoring, because it helps to detect bottlenecks.
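As a minimal sketch of how those parts map to code: each action triggers one job, a shuffle splits the job into stages, and each stage runs one task per partition. The breakdown is visible in the Spark UI (http://localhost:4040 by default when running locally):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object JobsStagesTasks {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("jobs-stages-tasks").setMaster("local[2]"))

    val letters = sc.parallelize(Seq("a", "b", "a", "c"), numSlices = 2)

    val counts = letters
      .map(letter => (letter, 1)) // stage 1: runs 2 tasks (one per partition)
      .reduceByKey(_ + _)         // shuffle: marks the stage boundary
      .collect()                  // action: triggers the job (here, 2 stages)

    counts.foreach(println)
    sc.stop()
  }
}
```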