It's difficult to contest the importance of testing in programming. Tests help to avoid regressions (many of them) and also to better understand the developed code. Spark, like other data processing frameworks, is no exception to this rule. But, obviously, testing applications that run in distributed mode is trickier than testing standalone programs.
After writing a post about testing Spark applications, I decided to take a look at the Spark project's own tests and see which patterns they use to verify framework features.
At first glance, the wide choice of testing styles in Scala can be scary. After all, in JUnit and other xUnit frameworks, the ways of declaring tests are limited. Fortunately, after some digging, ScalaTest's testing styles become easier to understand and use.
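To make the difference between styles concrete, here is a minimal sketch of the same check written in two ScalaTest styles: the xUnit-like `AnyFunSuite` and the BDD-like `AnyFlatSpec`. It assumes ScalaTest 3.1 or later, where each style lives in its own package (`org.scalatest.funsuite`, `org.scalatest.flatspec`). The class names `StringSuite` and `StringSpec` are hypothetical and only serve the illustration.

```scala
import org.scalatest.funsuite.AnyFunSuite
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// FunSuite style: close to xUnit, every test is a named block registered with test(...)
class StringSuite extends AnyFunSuite {
  test("toUpperCase converts all letters") {
    assert("spark".toUpperCase == "SPARK")
  }
}

// FlatSpec style: behavior-driven, the test name reads as a sentence "<subject> should <behavior>"
class StringSpec extends AnyFlatSpec with Matchers {
  "A String" should "be converted to upper case" in {
    "spark".toUpperCase shouldBe "SPARK"
  }
}
```

Both classes verify the same behavior; the style only changes how tests are declared and named, which is why picking one mostly comes down to team conventions.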
Some time ago, I was involved in a discussion about testing Apache Spark SQL code. In this post, I would like to share my observations on this topic.