I'm the author of Data Engineering Design Patterns (O'Reilly),
a Databricks MVP, and
a freelance data engineer specializing in Apache Spark and Databricks.
I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for May 2026. Whether you need a 2-day architectural audit, a hands-on lead for a
complex data engineering problem, or a workshop,
let's discuss your project here.
You just changed the value of one column in your processing logic and want to check that the change had no other impact. You can do that by running a query like this: SELECT x.id, COUNT(*) FROM (...
The goal is to write a query that, for each table, compares the content and returns any rows that are missing from one of the compared tables. The query combines FULL OUTER JOIN, time travel, and ro...
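To make the comparison idea concrete, here is a minimal sketch rather than the query from the post. It assumes two hypothetical tables, `orders_before` and `orders_after`, standing in for two versions of the same table; on Delta Lake you would typically read those versions with time travel (for example `VERSION AS OF`) instead of keeping two physical copies. It runs on SQLite so it can be tried locally, and it emulates the FULL OUTER JOIN with two LEFT JOIN anti-joins combined by UNION ALL, because older SQLite builds lack FULL OUTER JOIN.

```python
import sqlite3

# Hypothetical tables standing in for two versions of the same table.
SETUP = """
CREATE TABLE orders_before (id INTEGER, amount REAL, status TEXT);
CREATE TABLE orders_after  (id INTEGER, amount REAL, status TEXT);
INSERT INTO orders_before VALUES (1, 10.0, 'new'), (2, 20.0, 'paid'), (3, 30.0, 'paid');
INSERT INTO orders_after  VALUES (1, 10.0, 'new'), (2, 20.0, 'shipped'), (4, 40.0, 'new');
"""

# Rows match only when every compared column is equal.
# Plain "=" never matches NULLs, so nullable columns would need a
# null-safe comparison instead (an assumption left out of this sketch).
DIFF_QUERY = """
SELECT b.id AS id, 'missing_in_after' AS issue
FROM orders_before b
LEFT JOIN orders_after a
  ON a.id = b.id AND a.amount = b.amount AND a.status = b.status
WHERE a.id IS NULL
UNION ALL
SELECT a.id AS id, 'missing_in_before' AS issue
FROM orders_after a
LEFT JOIN orders_before b
  ON a.id = b.id AND a.amount = b.amount AND a.status = b.status
WHERE b.id IS NULL
ORDER BY 1, 2
"""


def find_mismatches(conn):
    """Return (id, issue) pairs for rows present in only one of the two tables."""
    return conn.execute(DIFF_QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = find_mismatches(conn)
for row_id, issue in rows:
    print(row_id, issue)
```

A row whose values changed between the two versions (id 2 here) shows up twice, once per side, which makes unintended changes easy to spot next to rows that were purely added or removed.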