When I started to think about implementing my own state store, my idea was to load the state on demand for a given key from a distributed, single-digit-millisecond-latency store like AWS DynamoDB. However, after analyzing the StateStore API and how it's used in different places, I saw it wouldn't be easy.
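To illustrate the appeal of the idea, here is a minimal sketch, not code from the post: `dynamoDbGet` is a hypothetical helper wrapping a DynamoDB GetItem call. The closing comment hints at why the pure on-demand approach clashes with the StateStore API, which is not limited to point lookups:

```scala
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

// Minimal sketch of the on-demand idea. `dynamoDbGet` is a hypothetical
// helper wrapping a DynamoDB GetItem call; it returns the serialized
// value row, or null when the key is absent.
class OnDemandStateReader(dynamoDbGet: Array[Byte] => Array[Byte],
                          valueFieldCount: Int) {

  // Load the state of a single key only when Spark asks for it.
  def get(key: UnsafeRow): UnsafeRow = {
    val bytes = dynamoDbGet(key.getBytes)
    if (bytes == null) {
      null
    } else {
      val value = new UnsafeRow(valueFieldCount)
      value.pointTo(bytes, bytes.length)
      value
    }
  }

  // The catch: the real StateStore contract also requires iterator(),
  // which must return *every* key/value pair of the version, so a
  // purely per-key lazy store would still need a full scan of the
  // remote table.
}
```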
In my latest Spark+AI Summit 2019 follow-up posts, I'm implementing a custom state store. The extension is inspired by the default state store. While analyzing the code, one of the places that intrigued me was the put(key: UnsafeRow, value: UnsafeRow) method. Keep reading if you're curious why.
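As a teaser, one well-known subtlety of this method: Spark reuses the incoming UnsafeRow instances between calls, so a store has to copy them before keeping a reference. A simplified sketch of the pattern, loosely modeled on the default HDFS-backed store rather than copied from it:

```scala
import java.util.concurrent.ConcurrentHashMap
import org.apache.spark.sql.catalyst.expressions.UnsafeRow

object CopyOnPutSketch {
  // Simplified sketch: Spark reuses the same UnsafeRow instances
  // between put() calls, so the store must copy both rows before
  // holding onto them (the default implementation also writes every
  // update to a delta file, which is omitted here).
  private val mapToUpdate = new ConcurrentHashMap[UnsafeRow, UnsafeRow]()

  def put(key: UnsafeRow, value: UnsafeRow): Unit = {
    // Without these copies, the next processed record would silently
    // mutate the entry we just stored.
    mapToUpdate.put(key.copy(), value.copy())
  }
}
```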
In my previous post I introduced the classes involved in the interactions with the state store and also showed the big picture of the implementation. Today it's time to write some code :)
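For orientation, here is a skeleton of what such a custom provider has to implement; the signatures follow the StateStoreProvider contract as of Spark 2.4, the version current when this series was written, and the bodies are intentionally left out:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.execution.streaming.state.{StateStore, StateStoreConf, StateStoreId, StateStoreProvider}
import org.apache.spark.sql.types.StructType

// Skeleton of a custom provider; only the contract, no real logic.
class CustomStateStoreProvider extends StateStoreProvider {
  private var id: StateStoreId = _

  override def init(
      stateStoreId: StateStoreId,
      keySchema: StructType,
      valueSchema: StructType,
      keyIndexOrdinal: Option[Int],
      storeConfs: StateStoreConf,
      hadoopConf: Configuration): Unit = {
    id = stateStoreId // remember which operator/partition this state belongs to
  }

  override def stateStoreId: StateStoreId = id

  // Called at every micro-batch to get a store for the given version.
  override def getStore(version: Long): StateStore = ???

  override def close(): Unit = {} // release connections, caches, ...
}
```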
In my previous post I showed you the writing and reading parts of my custom state store implementation. Today it's time to cover the data reprocessing and also the limits of the solution.
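As a taste of the reprocessing part, here is a sketch of the versioning contract a custom backend has to honor; loadSnapshot and buildStore are hypothetical helpers, not code from the posts:

```scala
import org.apache.spark.sql.execution.streaming.state.StateStore

object ReprocessingSketch {
  // Sketch of the versioning contract that reprocessing relies on.
  def getStore(version: Long): StateStore = {
    require(version >= 0, s"Unexpected state store version: $version")
    // After a failure or a restart from a checkpoint, Spark can ask for
    // a version that was already committed, so the backend must be able
    // to rebuild any retained past version, not only serve the latest
    // value per key; this is one of the limits a simple key/value
    // backend hits.
    val restoredState = loadSnapshot(version)
    buildStore(version, restoredState)
  }

  // Hypothetical helpers, e.g. backed by snapshot and delta files:
  private def loadSnapshot(version: Long): Map[Array[Byte], Array[Byte]] = ???
  private def buildStore(version: Long,
                         state: Map[Array[Byte], Array[Byte]]): StateStore = ???
}
```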