I'm the author of Data Engineering Design Patterns (O'Reilly),
a Databricks MVP, and
a freelance data engineer specializing in Apache Spark and Databricks.
I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for May 2026. Whether you need a 2-day architectural audit, a hands-on lead for a
complex data engineering problem, or a workshop
let's discuss your project here.
Even if a lot of Docker containers exist for Apache Spark, it's always a good exercise to make one in your own. It can help to understand some new concepts as well as improve skills of building Docker images.
Some months ago I written the notes about my experience from building Docker image for Spark on YARN cluster. Recently I decided to improve the project and transform it to Docker-compose format.