A 3-day bug hunt on a 3-person team costs up to β¬7,200 in lost engineering time. This workshop teaches you to prevent that β unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.
Previously we discovered two popular architectures in Big Data systems - lambda and kappa. Because it was new and pretty long concepts to explain, we expressly ignored tools.
When some years ago I done a small POC Hadoop/MapReduce project based on a million song dataset (my old blog in French), I expressly omitted the part about architecture. It was a mistake because correctly designed architecture is as important as code written behind.
Originally, Big Data can seem to be strictly related to one tool - Hadoop. However, it's a misunderstanding of the concept because it hides more interesting stuff.