PySpark tips

No module named '...' error on PySpark for Windows

When running PySpark on Windows in local mode, you may encounter "No module named" errors, for example with a session created like this: spark = SparkSession.builder.master("local[1]").getOrCreate() To start, ensure that the not foun...
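As a rough illustration of a first diagnostic step for this kind of error, the sketch below checks whether a module can be located by the interpreter that runs the driver, and reports which interpreter that is. The helper names `module_is_importable` and `describe_interpreter` are hypothetical and not taken from the post. This only inspects the driver-side Python and assumes the worker processes use the same interpreter, which may not hold on every Windows setup.

```python
import importlib.util
import sys


def module_is_importable(module_name: str) -> bool:
    """Return True if the current interpreter can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised for a dotted name whose parent package is missing,
        # or for an empty/invalid module name.
        return False


def describe_interpreter() -> str:
    """Return the path and version of the Python running this code."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"{sys.executable} (Python {version})"
```

Calling `module_is_importable("pyspark")` and printing `describe_interpreter()` before building the session can show whether the package is visible to the interpreter that is actually in use.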


How to enable Apache Spark logging for code running on localhost?

To see Apache Spark logs, you need to set the log level before running the transformations. The following snippet will enable trace logs to the console: spark_session.sparkContext.setLogLevel('TRAC...
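A minimal sketch of the idea, assuming an existing session object named `spark_session` as in the excerpt. The wrapper `set_spark_log_level` and the `VALID_LEVELS` set are hypothetical helpers added for illustration; the only Spark call used is `sparkContext.setLogLevel`, and the level names are the ones that method documents.

```python
# Log levels accepted by SparkContext.setLogLevel.
VALID_LEVELS = {"ALL", "DEBUG", "ERROR", "FATAL", "INFO", "OFF", "TRACE", "WARN"}


def set_spark_log_level(spark_session, level="TRACE"):
    """Set the Spark log level on the given session and return the level used.

    spark_session is expected to expose sparkContext.setLogLevel,
    as a pyspark.sql.SparkSession does.
    """
    normalized = level.upper()
    if normalized not in VALID_LEVELS:
        raise ValueError(f"Unsupported log level: {level!r}")
    # Call this before running transformations so their logs are captured.
    spark_session.sparkContext.setLogLevel(normalized)
    return normalized
```

With a real session this would be called as `set_spark_log_level(spark_session, "TRACE")` right after the session is created. TRACE is very verbose, so a level such as INFO or WARN may be easier to read once the problem is narrowed down.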
