A 3-day bug hunt on a 3-person team costs up to β¬7,200 in lost engineering time. This workshop teaches you to prevent that β unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.
We all have our habits and as programmers, libraries and frameworks are definitely a part of the group. In this blog post I'll share with you a list of Java and Scala classes I use almost every time in data engineering projects. The part for Python will follow next week!
Last week I presented the API of the 3 analyzed ACID file formats. Under-the-hood, they obviously generate data files but not only. And that's something we'll focus on in this blog post.
It's time to start a new series on the blog! I hope to catch on to the ACID file formats that are gaining more and more importance. It's also a good occasion to test a new learning method. Instead of writing one blog post per feature and format, I'll try to compare Delta Lake, Apache Iceberg, and Apache Hudi concepts in the same article. Besides this personal challenge, I hope you'll enjoy the series and also learn something interesting!
The first "reverse" term I've ever encountered in programming was reverse proxy. Since then, I've seen passing "reverse engineering", "reverse iterator", but none of them was a pure data term. Until recently, when I heard about reverse ETL.