A 3-day bug hunt on a 3-person team costs up to β¬7,200 in lost engineering time. This workshop teaches you to prevent that β unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.
When I've started to play with Apache Parquet I was surprised about 2 versions of writers. Before approaching the rest of planed topics, it's a good moment to explain these different versions better.
An efficient data storage is one of success keys of a good storage format. One of methods helping to improve that is an appropriate encoding and Parquet comes with several different methods.