Data quality is one of the key factors of a successful data project. Without good data quality, even the most advanced engineering or analytics work will not be trusted and, therefore, will not be used. Unfortunately, data quality controls are very often treated as a work item to implement at the end of the project, which sometimes translates to never.
Previously, we learned how to control data quality with Delta Live Tables. Now it's time to see an open-source library in action: Spark Expectations.
In the last blog post of the data quality on Databricks series, we're going to discover a Databricks Labs product: the DQX library.