PySpark tips

I'm the author of Data Engineering Design Patterns (O'Reilly), a Databricks MVP, and a freelance data engineer specializing in Apache Spark and Databricks. I help teams move from working pipelines to resilient architectures.
I'm currently accepting new projects for May 2026. Whether you need a 2-day architectural audit, a hands-on lead for a complex data engineering problem, or a workshop, let's discuss your project.

How to enable Apache Spark logging for code running on localhost?

To see Apache Spark logs, you need to set the log level before running the transformations. The following snippet enables trace-level logs in the console: spark_session.sparkContext.setLogLevel('TRAC...


No module named '...' error on PySpark for Windows

When running your PySpark job on Windows in local mode, you may encounter "No module named" errors. spark = SparkSession.builder.master("local[1]").getOrCreate() To start, ensure that the not foun...
