Articles about Spark SQL RDBMS on waitingforcode.com

August 18, 2018 • Apache Spark SQL

RDBMS options in Apache Spark SQL

Some recent posts covered two important Spark SQL options for RDBMS: partitioning and write modes. However, they are not the only options available for this kind of data store.

Continue Reading →

August 11, 2018 • Apache Spark SQL

SaveMode.Overwrite trap with RDBMS in Apache Spark SQL

Some months ago I presented save modes in Spark SQL. However, that post was limited to their use with files. I was quite surprised to observe some specific behavior with RDBMS sinks, especially for SaveMode.Overwrite.

Continue Reading →

July 22, 2017 • Apache Spark SQL

Partitioning RDBMS data in Spark SQL

Without any explicit definition, Spark SQL won't partition any data, i.e. all rows will be processed by one executor. That's not optimal, since Spark was designed for parallel and distributed processing.
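To give an idea of what an explicit partitioning definition means, here is a minimal sketch in plain Python. It assumes the partitioning is described by a numeric column, a lower bound, an upper bound and a number of partitions, similar in spirit to the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` options of Spark's JDBC source. The function name `partition_predicates` and the `id` column are hypothetical, and the stride logic is a simplification for illustration, not Spark's actual implementation.

```python
def partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Return one SQL WHERE predicate per partition (simplified sketch)."""
    if num_partitions <= 1 or lower_bound >= upper_bound:
        # Nothing to split: a single partition reads the whole table.
        return ["1 = 1"]

    # Width of each range; at least 1 so that the ranges keep advancing.
    stride = max((upper_bound - lower_bound) // num_partitions, 1)

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # The first partition has no lower limit, the last one no upper limit,
        # so rows outside [lower_bound, upper_bound) are still read.
        lower = f"{column} >= {current}" if i > 0 else None
        current += stride
        upper = f"{column} < {current}" if i < num_partitions - 1 else None

        if lower and upper:
            predicates.append(f"{lower} AND {upper}")
        elif upper:
            # Assumption of this sketch: NULL values go to the first partition.
            predicates.append(f"{upper} OR {column} IS NULL")
        else:
            predicates.append(lower)
    return predicates


if __name__ == "__main__":
    for predicate in partition_predicates("id", 0, 100, 4):
        print(predicate)
```

Each predicate can then back a separate query, so several tasks read disjoint slices of the table in parallel instead of one task reading everything. Note that in this sketch the bounds only decide how the ranges are cut; they do not filter rows out.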

Continue Reading →

July 22, 2017 • Apache Spark SQL

Loading data from RDBMS

Structured data processing plays a growing role in the Apache Spark project, and Structured Streaming is one proof of that. But how does Spark SQL work and, in particular, how does it load data from structured sources such as an RDBMS?

Continue Reading →

May 21, 2017 • Apache Spark SQL

Schema projection

Even though it's always better to be explicit, in programming we often have the option of letting the computer guess. Spark SQL also has this level of intelligence, for example during schema resolution.

Continue Reading →
