Articles about distributed stateful processing on waitingforcode.com

March 18, 2018 • Apache Spark Structured Streaming

Stateful transformations with mapGroupsWithState

Stateful stream processing in Apache Spark has evolved a lot since the first versions of the framework. In the beginning there was updateStateByKey, but after some time it was judged inefficient and replaced by mapWithState. With the arrival of Structured Streaming, mapWithState was in its turn replaced by mapGroupsWithState.


January 21, 2018 • Apache Beam

Dealing with state lifecycle in Apache Beam

As we saw in the previous post, Apache Beam makes it possible to work with state. However, as we learned there, state only lets us keep values for the duration of the window; after that, it is removed. Thanks to another Beam feature called timers, we can handle expiring state just before it is removed from the state store.


January 21, 2018 • Apache Beam

Stateful processing in Apache Beam

Real-time processing is, most of the time, related to stateful processing, whether we need to solve a sessionization problem, count the number of visitors per minute, and so on. Not surprisingly, Apache Beam comes with an API adapted to implementing solutions to these problems.


June 11, 2017 • Apache Spark Streaming

Stateful transformations with mapWithState

The updateStateByKey function, explained in the post about stateful transformations in Spark Streaming, is not the only solution Spark Streaming provides to deal with state. Another, much more optimized one is mapWithState.


November 18, 2016 • Apache Spark Streaming

Stateful transformations in Spark Streaming

Spark Streaming is able to handle state-based operations, i.e. operations holding a state that can be modified by subsequent batches of data.
