A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that — unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.
Some time ago on my Github bithw1 pointed out an interesting behavior of Hive integration on Apache Spark SQL. To not delve too much into details now, I can tell that the behavior was about not respected DataFrame schema. Our quick exchange ended up with an explanation but it also encouraged me to go much more into details to understand the hows and whys.