Articles about Spark fault tolerance on waitingforcode.com

March 31, 2018 • Apache Spark Structured Streaming

Fault tolerance in Apache Spark Structured Streaming

Structured Streaming guarantees end-to-end exactly-once delivery (in micro-batch mode) through the semantics applied to state management, the data source, and the data sink. State was covered in more detail in the post about the state store, but the two other parts remain to be explored.


July 9, 2017 • Apache Spark

Failed tasks resubmit

A lot of things are automated in Spark: metadata and data checkpointing and task distribution, to name only a few. Another one, not mentioned very often, is the automatic retry of failed tasks.


July 2, 2017 • Apache Spark Streaming

Graceful shutdown explained

Spark has different methods to reduce data loss, including during stream processing. It offers the well-known checkpointing but also a less obvious operation invoked when processing stops: graceful shutdown.


May 29, 2017 • Apache Spark Streaming

Metadata checkpoint

One of the previous posts talked about checkpoint types in Spark Streaming. This one focuses on one of them: the metadata checkpoint.

