Articles about Spark partitioning on waitingforcode.com

May 25, 2019 • Apache Spark SQL

Range partitioning in Apache Spark SQL

The most popular partitioning strategy divides the dataset by a hash computed from one or more values of the record. However, other partitioning strategies exist as well. One of them is range partitioning, implemented in Apache Spark SQL with the repartitionByRange method and described in this post.
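To make the contrast between the two strategies concrete, here is a minimal plain-Python sketch. It is not Spark code: the helper names `hash_partition` and `range_partition` are hypothetical, Python's built-in `hash` stands in for Spark's hash function, and the range boundaries are hard-coded assumptions, whereas repartitionByRange estimates them by sampling the data.

```python
from bisect import bisect_left


def hash_partition(key, num_partitions):
    # Hash partitioning: the target partition depends only on the key's hash,
    # so neighbouring keys usually end up in different partitions.
    return hash(key) % num_partitions


def range_partition(key, upper_bounds):
    # Range partitioning: upper_bounds is a sorted list of inclusive upper
    # limits. N bounds define N + 1 partitions; keys above the last bound
    # fall into the final partition.
    return bisect_left(upper_bounds, key)


keys = [1, 10, 11, 20, 21]
# Assumed boundaries: partition 0 holds keys <= 10, partition 1 holds
# keys in (10, 20], partition 2 holds everything above 20.
by_range = [range_partition(k, [10, 20]) for k in keys]  # [0, 0, 1, 1, 2]
by_hash = [hash_partition(k, 3) for k in keys]           # [1, 1, 2, 2, 0]
```

With range partitioning, every partition holds a contiguous slice of the key space, which is the property that makes it useful for sorted output or range queries.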

Continue Reading →

July 22, 2017 • Apache Spark SQL

Partitioning RDBMS data in Spark SQL

Without any explicit partitioning definition, Spark SQL won't partition the data read from an RDBMS, i.e. all rows will be processed by one executor. That's not optimal since Spark was designed for parallel and distributed processing.
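One way to picture an explicit partitioning definition is to split a numeric column into contiguous slices, each read by a separate query. The plain-Python sketch below illustrates that idea only; `partition_predicates` is a hypothetical helper, it assumes `upper - lower` is at least `num_partitions`, and it is a simplification rather than Spark's actual implementation.

```python
def partition_predicates(column, lower, upper, num_partitions):
    """Build one WHERE clause per partition over a numeric column."""
    if num_partitions <= 1:
        # A single partition reads everything, which is the default behaviour.
        return ["1 = 1"]
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower + i * stride
        end = start + stride
        if i == 0:
            # Open on the left so values below `lower` and NULLs are kept.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Open on the right so values above `upper` are kept.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


clauses = partition_predicates("id", 0, 100, 4)
# ['id < 25 OR id IS NULL', 'id >= 25 AND id < 50',
#  'id >= 50 AND id < 75', 'id >= 75']
```

In Spark itself, the JDBC data source exposes the partitionColumn, lowerBound, upperBound and numPartitions options for this purpose. The bounds only decide how the slices are cut; they do not filter rows out.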

Continue Reading →

October 23, 2016 • Apache Spark

Partitioning in Spark

Partitioning is quite a common concept in distributed data processing. Spark is no exception and it also has some operations related to partitions.

Continue Reading →
