Spark fault tolerance articles

Fault tolerance in Apache Spark Structured Streaming

The Structured Streaming guarantees end-to-end exactly-once delivery (in micro-batch mode) through the semantics applied to state management, data source and data sink. The state was more covered in the post about the state store but 2 other parts still remain to discover.

Continue Reading →

Failed tasks resubmit

A lot of things are automatized in Spark: metadata and data checkpointing, task distribution, to quote only some of them. Another one, not mentioned very often, is the automatic retry in the case of task failures.

Continue Reading →

Graceful shutdown explained

Spark has different methods to reduce data loss, also during streaming processing. It proposes well known checkpointing but also less obvious operation invoked on stopping processing - graceful shutdown.

Continue Reading →

Metadata checkpoint

One of previous posts talked about checkpoint types in Spark Streaming. This one focuses more on one type of them - metadata checkpoint.

Continue Reading →