I'm the author of Data Engineering Design Patterns (O'Reilly), a Databricks MVP, and a freelance data engineer specializing in Apache Spark and Databricks.
I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for June 2026. Whether you need a 2-day architectural audit, a hands-on lead for a complex data engineering problem, or a workshop, let's discuss your project.
I once encountered a mysterious error in an operator that used some XCom variables: ERROR - No module named 'commons' Traceback (most recent call last): File "/usr/local/lib/python3.6/site-package...
I encountered this problem while trying to add a watcher step to an EMR job. Unfortunately, my first attempts ended with a Cluster id '' is not valid message. First, I checked whether the clust...
Apache Airflow is a very flexible orchestration framework. You can execute operations depending on conditional branches or, as you will see below, on the results of previous tasks. The execut...
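A minimal sketch of the branching idea, assuming Airflow's BranchPythonOperator: its python_callable returns the task_id of the downstream task that should run, and the other direct downstream tasks are skipped. The task ids and the FakeTaskInstance stub below are hypothetical, added only so the callable runs without an Airflow installation; this is not necessarily how the full article implements it.

```python
# Hypothetical task ids, for illustration only.
UPSTREAM_TASK_ID = "count_new_rows"
PROCESS_TASK_ID = "process_new_rows"
SKIP_TASK_ID = "skip_processing"


def choose_branch(**context):
    """Return the id of the downstream task to run.

    Intended as the python_callable of a BranchPythonOperator. Airflow passes
    the task instance as context["ti"]; xcom_pull reads the value returned
    by the upstream task.
    """
    row_count = context["ti"].xcom_pull(task_ids=UPSTREAM_TASK_ID)
    # No result or no new rows: follow the skip branch.
    if not row_count:
        return SKIP_TASK_ID
    return PROCESS_TASK_ID


class FakeTaskInstance:
    """Hypothetical stand-in for Airflow's TaskInstance, used for local runs."""

    def __init__(self, xcoms):
        self._xcoms = xcoms

    def xcom_pull(self, task_ids):
        return self._xcoms.get(task_ids)


if __name__ == "__main__":
    print(choose_branch(ti=FakeTaskInstance({UPSTREAM_TASK_ID: 42})))
    print(choose_branch(ti=FakeTaskInstance({UPSTREAM_TASK_ID: 0})))
```

In a DAG, choose_branch would be passed as python_callable to a BranchPythonOperator whose direct downstream tasks are process_new_rows and skip_processing; on Airflow 1.x, provide_context=True is also needed so that ti reaches the callable.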
A common way to enforce sequential task execution is to use data sensors. The idea is to wait for the data generated by the previous DAG execution. Actually, there is also a second way to implement s...
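The sensor idea boils down to a poke loop: check a condition every poke interval until it holds or a timeout expires. Airflow's own sensors (for example FileSensor) expose this through the poke_interval and timeout parameters; the plain-Python sketch below only illustrates the loop and is not Airflow's implementation. The one-directory-per-day layout with a _SUCCESS marker and both helper names are hypothetical assumptions.

```python
import tempfile
import time
from datetime import date, timedelta
from pathlib import Path


def previous_run_marker(base_dir: Path, run_date: date,
                        interval: timedelta = timedelta(days=1)) -> Path:
    """Hypothetical layout: <base_dir>/<YYYY-MM-DD>/_SUCCESS per execution."""
    return base_dir / (run_date - interval).isoformat() / "_SUCCESS"


def wait_for_data(marker: Path, timeout_s: float = 60.0,
                  poke_interval_s: float = 5.0) -> bool:
    """Poke until `marker` exists (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if marker.exists():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(poke_interval_s, remaining))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        marker = previous_run_marker(Path(tmp), date(2024, 1, 2))
        print(wait_for_data(marker, timeout_s=0.1, poke_interval_s=0.02))
        marker.parent.mkdir(parents=True)
        marker.touch()  # simulate the previous run publishing its data
        print(wait_for_data(marker, timeout_s=0.1, poke_interval_s=0.02))
```

Returning False on timeout mirrors a sensor failing its task; a real DAG would let the scheduler handle retries rather than blocking in a loop like this.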