A 3-day bug hunt on a 3-person team costs up to β¬7,200 in lost engineering time. This workshop teaches you to prevent that β unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.
Real-time processing is most of the time somehow related to stateful processing. Either we need to solve some sessionization problem, count the number of visitors per minute etc. Not surprisingly Apache Beam comes with the API adapted to put in place the solutions to them.
As we saw in the previous post, Apache Beam brings the possibility to deal with state. However, as we learned there, the state itself allows only to keep something in memory during the window duration. After that, the state is removed. But thanks to another Beam's feature called timers we can deal with the expiring state just before its removal from the state store.