Articles about Spark checkpoint on waitingforcode.com

December 12, 2018 • Apache Spark GraphX

GraphX and fault-tolerance

Bad things happen in distributed data processing, and it's better to be prepared for them. To protect against such issues, Apache Spark can recompute failed partitions, and it can also store a snapshot of the computation as a checkpoint. Both mechanisms apply to the fault tolerance of the GraphX module.


May 29, 2017 • Apache Spark Streaming

Metadata checkpoint

One of the previous posts talked about checkpoint types in Spark Streaming. This one focuses on one of them: the metadata checkpoint.


November 18, 2016 • Apache Spark Streaming

Spark Streaming checkpointing and Write Ahead Logs

Checkpoints allow Spark to truncate dependencies on previously computed RDDs. In stream processing, their role is extended. In addition, they're not the only way to protect against failures.
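The idea of "truncating dependencies" can be illustrated with a toy model. This is not Spark code: the `ToyRDD` class and its methods are hypothetical, and the sketch only mimics the concept. A dataset is either materialized data or a parent plus a transformation; computing it walks the lineage back to the source, while checkpointing saves the values and drops the parent reference. In real Spark the equivalent calls are `SparkContext.setCheckpointDir` and `RDD.checkpoint()`, the data is written to reliable storage, and the checkpoint is materialized on the next action rather than eagerly as here.

```python
# Toy model of RDD lineage and checkpointing. NOT the Spark API:
# ToyRDD and its methods are hypothetical names used only for illustration.
from typing import Callable, List, Optional


class ToyRDD:
    """A dataset defined by materialized data or by a parent plus a transformation."""

    def __init__(self, data: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None) -> None:
        self.data = data      # saved (checkpointed or source) values, if any
        self.parent = parent  # upstream dataset this one depends on
        self.fn = fn          # transformation applied to the parent's values

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # A new dataset that depends on this one (lineage grows by one step).
        return ToyRDD(parent=self, fn=lambda xs: [f(x) for x in xs])

    def compute(self) -> List[int]:
        # Return saved values if present, otherwise recompute from the lineage.
        if self.data is not None:
            return list(self.data)
        assert self.parent is not None and self.fn is not None
        return self.fn(self.parent.compute())

    def checkpoint(self) -> None:
        # Save the current values and drop the dependency on the parent:
        # this is the "truncated dependencies" effect (done eagerly here).
        self.data = self.compute()
        self.parent = None
        self.fn = None

    def lineage_depth(self) -> int:
        # Number of dependency hops back to a dataset with saved values.
        return 0 if self.parent is None else 1 + self.parent.lineage_depth()


source = ToyRDD(data=[1, 2, 3])
derived = source.map(lambda x: x + 1).map(lambda x: x * 10)
print(derived.lineage_depth(), derived.compute())  # prints: 2 [20, 30, 40]
derived.checkpoint()
print(derived.lineage_depth(), derived.compute())  # prints: 0 [20, 30, 40]
```

After the checkpoint, the result is the same but recovering it no longer requires replaying the two `map` steps, which is why checkpoints matter for long lineages such as those built up by streaming jobs.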

