Articles about Spark code execution on waitingforcode.com

July 2, 2017 • Apache Spark

JARs split personality problem

Making mistakes often helps you progress. That was my case with spark-submit and a pair of local and remote JARs: they helped me understand the role of the driver, closures, serialization and some configuration properties.

Continue Reading →

May 13, 2017 • Apache Spark

isEmpty() trap in Spark

In general, Spark's actions mirror the logic of equivalent methods found in many programming languages. For example, isEmpty() in Spark checks for the existence of only one element, much like isEmpty() on a Java List. But it can often lead to trouble, especially when more than one action is invoked.

Continue Reading →

April 15, 2017 • Apache Spark

Jobs, stages and tasks

Every distributed computation is divided into smaller parts called jobs, stages and tasks. Knowing them is especially useful during monitoring because it helps to detect bottlenecks.

Continue Reading →
