Partitioning in Spark while reading from RDBMS via JDBC
Solution 1:
If you don't specify either `{partitionColumn, lowerBound, upperBound, numPartitions}` or `{predicates}`, Spark will use a single executor and create a single non-empty partition. All data will be processed using a single transaction, and reads will be neither distributed nor parallelized.
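A minimal Scala sketch of both partitioned-read variants, assuming a hypothetical PostgreSQL database with an `orders` table; the URL, credentials, table, column, and bounds are placeholders to replace with your own:

```scala
import java.util.Properties

import org.apache.spark.sql.SparkSession

object JdbcPartitionedRead {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("jdbc-partitioned-read")
      .getOrCreate()

    // Hypothetical connection details -- replace with your own.
    val url = "jdbc:postgresql://db-host:5432/mydb"
    val props = new Properties()
    props.setProperty("user", "spark")
    props.setProperty("password", "secret")
    props.setProperty("driver", "org.postgresql.Driver")

    // Variant 1: range partitioning on a numeric column.
    // Spark issues numPartitions queries of the form
    //   SELECT ... WHERE id >= x AND id < y
    // so reads run in parallel, one JDBC connection per partition.
    val byRange = spark.read.jdbc(
      url,
      "orders",   // hypothetical table
      "id",       // partitionColumn: numeric, date, or timestamp
      1L,         // lowerBound
      1000000L,   // upperBound
      10,         // numPartitions
      props)

    // Variant 2: explicit predicates, one partition per WHERE clause.
    // Useful when no single column splits the data evenly.
    val byPredicates = spark.read.jdbc(
      url,
      "orders",
      Array(
        "created_at <  '2020-01-01'",
        "created_at >= '2020-01-01' AND created_at < '2021-01-01'",
        "created_at >= '2021-01-01'"),
      props)

    println(s"range partitions: ${byRange.rdd.getNumPartitions}")
    println(s"predicate partitions: ${byPredicates.rdd.getNumPartitions}")

    spark.stop()
  }
}
```

Note that `lowerBound` and `upperBound` only control the partition stride; they do not filter rows, so values outside the range still end up in the first and last partitions.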
See also:
- How to optimize partitioning when migrating data from JDBC source?
- How to improve performance for slow Spark jobs using DataFrame and JDBC connection?