Spark fault tolerance articles

Fault tolerance in Apache Spark Structured Streaming

The Structured Streaming guarantees end-to-end exactly-once delivery (in micro-batch mode) through the semantics applied to state management, data source and data sink. The state was more covered in the post about the state store but 2 other parts still remain to discover.

Continue Reading β†’

Failed tasks resubmit

A lot of things are automatized in Spark: metadata and data checkpointing, task distribution, to quote only some of them. Another one, not mentioned very often, is the automatic retry in the case of task failures.

Continue Reading β†’

Graceful shutdown explained

Spark has different methods to reduce data loss, also during streaming processing. It proposes well known checkpointing but also less obvious operation invoked on stopping processing - graceful shutdown.

Continue Reading β†’

Metadata checkpoint

One of previous posts talked about checkpoint types in Spark Streaming. This one focuses more on one type of them - metadata checkpoint.

Continue Reading β†’