Pure Python / Python-specific code in a PySpark application

Solution 1:

The code you write (your "main" function) is executed on the driver. When you operate on distributed data (e.g. RDDs and others), the driver coordinates with the workers and hands the operation off to them, so in a sense those operations are executed on the workers. Plain Python code that does not touch distributed data therefore runs only on the driver.
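As a rough illustration of that split, here is a minimal sketch. The names `tag_with_location` and `run_job` are made up for this example, and it assumes you pass in a `SparkSession` you have already created. Everything in `run_job` runs on the driver except the function handed to `map()`, which Spark ships to the workers.

```python
import os
import socket


def tag_with_location(x):
    """Hypothetical helper: runs on a worker when passed to rdd.map()."""
    # Return the square plus the host and process that computed it.
    return (x * x, socket.gethostname(), os.getpid())


def run_job(spark):
    """Hypothetical driver-side entry point; `spark` is an existing SparkSession."""
    # Plain Python: evaluated in the driver process only.
    numbers = list(range(1, 6))
    driver_pid = os.getpid()

    # Distributed part: the driver only describes the work here;
    # tag_with_location is serialized and executed on the workers.
    rdd = spark.sparkContext.parallelize(numbers, 2)
    results = rdd.map(tag_with_location).collect()  # collect() brings results back to the driver

    # Plain Python again, back on the driver.
    return driver_pid, results


# Example invocation (requires a working Spark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.master("local[2]").getOrCreate()
#   print(run_job(spark))
```

When run against a real cluster, the process IDs (and usually the hostnames) in `results` will typically differ from `driver_pid`, which is one way to see where each piece of code actually ran.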