"No module named '...'" error in PySpark on Windows

When running PySpark on Windows in local mode, you may encounter "No module named" errors.

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]") \
    .getOrCreate()

To start, ensure that the missing module is actually installed. You can check this by running pip install <module_name>. If the module is installed but still can't be found, it may not be visible to the executor processes: even in local mode, PySpark spawns separate Python worker processes, and those workers must use an interpreter that can see the same site-packages as your driver.
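One way to test this hypothesis is to attempt the import inside a task, which forces it to run in an executor's Python worker rather than in the driver. A minimal sketch, where numpy is just a placeholder for whatever module is reported as missing:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").getOrCreate()

def check_import(_):
    # Runs inside the executor's Python worker, not the driver.
    try:
        import numpy  # replace with the missing module
        return [numpy.__file__]
    except ImportError as e:
        return [str(e)]

# Prints either the module's path on the worker or the import error.
print(spark.sparkContext.parallelize([0], 1).mapPartitions(check_import).collect())

If the driver can import the module but this job raises ImportError, the workers are running a different interpreter.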

To fix it, point PySpark's Python interpreters at the one from your setup. In the snippet below, the paths reference the virtual environment created for the local tests:

import os

# Both variables must point at the same interpreter and be set
# before the SparkSession is created (note the Windows-style
# Scripts directory instead of bin).
os.environ['PYSPARK_PYTHON'] = './.venv/Scripts/python.exe'
os.environ['PYSPARK_DRIVER_PYTHON'] = './.venv/Scripts/python.exe'

In most cases, this fixes the issue.
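If you prefer not to mutate environment variables, the same interpreters can also be set through Spark properties (spark.pyspark.python and spark.pyspark.driver.python, available since Spark 2.1, which take precedence over the environment variables). A sketch under that assumption, reusing the same hypothetical .venv path:

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master("local[1]") \
    .config("spark.pyspark.python", "./.venv/Scripts/python.exe") \
    .config("spark.pyspark.driver.python", "./.venv/Scripts/python.exe") \
    .getOrCreate()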