Articles about Spark resilient on waitingforcode.com

March 31, 2018 • Apache Spark Structured Streaming

Fault tolerance in Apache Spark Structured Streaming

Structured Streaming guarantees end-to-end exactly-once delivery (in micro-batch mode) through the semantics applied to state management, the data source and the data sink. State management was covered in more detail in the post about the state store, but the other 2 parts remain to be explored.

July 9, 2017 • Apache Spark

Failed tasks resubmit

Spark automates a lot of things: metadata and data checkpointing and task distribution, to name only a few. Another one, not mentioned very often, is the automatic retry of failed tasks.

May 29, 2017 • Apache Spark Streaming

Metadata checkpoint

One of the previous posts covered the checkpoint types in Spark Streaming. This one focuses on one of them: the metadata checkpoint.
