Articles about Spark SQL partitioning on waitingforcode.com

May 25, 2019 • Apache Spark SQL

Range partitioning in Apache Spark SQL

The most popular partitioning strategy divides the dataset by a hash computed from one or more values of the record. However, other partitioning strategies exist as well. One of them is range partitioning, implemented in Apache Spark SQL with the repartitionByRange method and described in this post.
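
To make the difference between the two strategies concrete, here is a minimal pure-Python sketch. It is not Spark's implementation: the function names `hash_partition` and `range_partition` are hypothetical, and the boundaries are computed from the fully sorted input, whereas Spark estimates them from a sample of the data.

```python
from bisect import bisect_left


def hash_partition(values, num_partitions):
    """Place each value in the partition given by its hash modulo the partition count."""
    partitions = [[] for _ in range(num_partitions)]
    for value in values:
        partitions[hash(value) % num_partitions].append(value)
    return partitions


def range_partition(values, num_partitions):
    """Place each value in the partition whose value range contains it."""
    if len(values) < num_partitions:
        raise ValueError("this sketch assumes at least one value per partition")
    ordered = sorted(values)
    # Upper bounds of the first num_partitions - 1 ranges, taken at evenly
    # spaced positions of the sorted data (an assumption of this sketch).
    bounds = [
        ordered[(i * len(ordered)) // num_partitions - 1]
        for i in range(1, num_partitions)
    ]
    partitions = [[] for _ in range(num_partitions)]
    for value in values:
        # bisect_left returns the index of the first bound >= value.
        partitions[bisect_left(bounds, value)].append(value)
    return partitions


ids = list(range(1, 13))
print(hash_partition(ids, 3))   # neighbouring ids end up in different partitions
print(range_partition(ids, 3))  # each partition holds a contiguous range of ids
```

With range partitioning, neighbouring values stay together in the same partition, which can help when later processing works on ordered or range-based data.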


July 22, 2017 • Apache Spark SQL

Partitioning RDBMS data in Spark SQL

Without an explicit partitioning definition, Spark SQL won't partition the data read from an RDBMS, i.e. all rows will be processed by one executor. That's not optimal, since Spark was designed for parallel and distributed processing.
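
Spark's JDBC data source accepts the partitionColumn, lowerBound, upperBound and numPartitions options to split such a read into several queries. The pure-Python sketch below only illustrates the idea of turning those four values into one WHERE clause per partition. The helper name `jdbc_partition_predicates` is hypothetical, and the stride arithmetic is a simplification rather than Spark's exact code.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one WHERE clause per partition from a numeric column and its bounds."""
    if num_partitions < 2:
        # A single partition reads the whole table.
        return ["1 = 1"]
    # Width of each range; at least 1 so that ranges never collapse.
    stride = max((upper_bound - lower_bound) // num_partitions, 1)
    predicates = []
    for i in range(num_partitions):
        start = lower_bound + i * stride
        end = start + stride
        if i == 0:
            # The first partition also takes rows below the lower bound and NULLs.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # The last partition also takes rows above the upper bound.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


for predicate in jdbc_partition_predicates("id", 0, 100, 4):
    print(predicate)
```

In this sketch the bounds only decide how the ranges are cut. They do not filter rows, because the first and last predicates are left open-ended, so every row still lands in exactly one partition.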

