Streaming stateful processing in Apache Spark evolved a lot from the first versions of the framework. At the beginning was updateStateByKey but some time after, judged inefficient, it was replaced by mapWithState. With the arrival of Structured Streaming the last method was replaced in its turn by mapGroupsWithState.
As we saw in the previous post, Apache Beam brings the possibility to deal with state. However, as we learned there, the state itself allows only to keep something in memory during the window duration. After that, the state is removed. But thanks to another Beam's feature called timers we can deal with the expiring state just before its removal from the state store.
Real-time processing is most of the time somehow related to stateful processing. Either we need to solve some sessionization problem, count the number of visitors per minute etc. Not surprisingly Apache Beam comes with the API adapted to put in place the solutions to them.
updateStateByKey function, explained in the post about Stateful transformations in Spark Streaming, is not the single solution provided by Spark Streaming to deal with state. Another one, much more optimized, is mapWithState.
Spark Streaming is able to handle state-based operations, ie. operations containing a state susceptible to be modified in subsequent batches of data.