I'm the author of Data Engineering Design Patterns (O'Reilly), a Databricks MVP, and a freelance data engineer specializing in Apache Spark and Databricks. I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for May 2026. Whether you need a 2-day architectural audit, a hands-on lead for a complex data engineering problem, or a workshop, let's discuss your project here.
To see Apache Spark logs, you need to set the log level before running the transformations. The following snippet will enable trace logs to the console: spark_session.sparkContext.setLogLevel('TRAC...
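As a minimal sketch of the idea, the helper below wraps the `setLogLevel` call with a check against the level names Spark accepts. The helper name `set_spark_log_level` and the `VALID_LOG_LEVELS` constant are hypothetical, introduced only for illustration; the only Spark API assumed is `sparkContext.setLogLevel`.

```python
# Level names accepted by SparkContext.setLogLevel.
VALID_LOG_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def set_spark_log_level(spark_session, level):
    """Hypothetical helper: validate the level name, then apply it to the session."""
    normalized = level.upper()
    if normalized not in VALID_LOG_LEVELS:
        raise ValueError(f"Unsupported log level: {level!r}")
    # Must run before the transformations whose logs you want to see.
    spark_session.sparkContext.setLogLevel(normalized)
    return normalized
```

Calling `set_spark_log_level(spark_session, "trace")` right after the session is created, and before any transformation runs, should make a mistyped level fail early with a clear message.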
When running PySpark on Windows in local mode, you may encounter "No module named" errors.
spark = SparkSession.builder.master("local[1]") \
    .getOrCreate()
To start, ensure that the not foun...
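One possible cause, assumed here rather than taken from the text above, is that the Python workers start with a different interpreter than the driver, one where the module is not installed. The sketch below pins both to the current interpreter through the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables. `pin_pyspark_python` is a hypothetical helper name.

```python
import os
import sys


def pin_pyspark_python(executable=sys.executable):
    """Hypothetical helper: point the driver and the workers at one interpreter."""
    os.environ["PYSPARK_PYTHON"] = executable
    os.environ["PYSPARK_DRIVER_PYTHON"] = executable
    return executable


# Assumption: this runs before the SparkSession is created,
# so the workers are launched with the pinned interpreter.
pin_pyspark_python()
```

If the error persists after this, the missing module is probably not installed in that interpreter at all, which is a separate problem from the interpreter mismatch sketched here.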