To facilitate parallel processing, Apache Spark and Apache Kafka have their concept of partitions, Apache Beam works with bundles, and Gnocchi deals with sacks. Despite the different naming, sacks play the same role in Gnocchi as partitions do in Spark or Kafka: they are the unit of work parallelization.
The power of Big Data processing platforms resides mainly in the ability to parallelize processing across different nodes. Each framework has its own unit of parallelism: in Spark it's called a partition, while Apache Beam calls it a bundle.
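To make the idea concrete, here is a minimal Spark sketch showing partitions as the unit of parallelism; the local master and the choice of 4 partitions are purely illustrative:

```scala
import org.apache.spark.sql.SparkSession

object PartitionsDemo extends App {
  // Local session only for illustration; 4 threads simulate 4 parallel workers
  val spark = SparkSession.builder()
    .appName("partitions-as-unit-of-parallelism")
    .master("local[4]")
    .getOrCreate()

  // Explicitly split the dataset into 4 partitions; each partition
  // becomes a task that can run in parallel on a different core/executor
  val numbers = spark.sparkContext.parallelize(1 to 100, numSlices = 4)
  println(numbers.getNumPartitions) // 4

  spark.stop()
}
```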
In October I published a post about Partitioning in Spark. It was an introduction to the topic, mainly focused on the basics, such as partitioners and partitioning transformations (coalesce and repartition). This time it's a good moment to take up other partitioning points.
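As a quick reminder of those two transformations, a short sketch (it reuses the illustrative `spark` session from the previous snippet):

```scala
val rdd = spark.sparkContext.parallelize(1 to 100, 8)

// repartition can both increase and decrease the number of partitions,
// but it always triggers a full shuffle
val morePartitions = rdd.repartition(16)
println(morePartitions.getNumPartitions) // 16

// coalesce only decreases the number of partitions; by default it merges
// co-located partitions and thus avoids the shuffle
val fewerPartitions = rdd.coalesce(2)
println(fewerPartitions.getNumPartitions) // 2
```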
Without any explicit definition, Spark SQL won't partition the data, i.e. all rows will be processed by a single executor. That's not optimal, since Spark was designed for parallel and distributed processing.
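A minimal sketch of making the partitioning explicit on the Spark SQL side; the dataset and column names are hypothetical:

```scala
import spark.implicits._

// A tiny in-memory dataset, just for illustration
val users = Seq((1, "user_a"), (2, "user_b"), (3, "user_c")).toDF("id", "login")
println(users.rdd.getNumPartitions) // partitioning inherited from the defaults

// Explicit definition: rows sharing the same id end up in the same
// partition, and up to 4 partitions can be processed in parallel
val partitionedUsers = users.repartition(4, $"id")
println(partitionedUsers.rdd.getNumPartitions) // 4
```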
Partitioning is quite a common concept in distributed data processing. Spark is no exception, and it also offers some operations related to partitions.
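One example of such partition-aware operations is mapPartitionsWithIndex, sketched here with illustrative data:

```scala
val letters = spark.sparkContext.parallelize(Seq("a", "b", "c", "d"), 2)

// The function is invoked once per partition (not once per element)
// and also receives the index of the partition being processed
val taggedLetters = letters.mapPartitionsWithIndex { (partitionIndex, items) =>
  items.map(item => s"$item@partition$partitionIndex")
}
taggedLetters.collect().foreach(println)
// e.g. a@partition0, b@partition0, c@partition1, d@partition1
```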