How to create an empty DataFrame in Apache Spark SQL?

There are different ways to create a DataFrame in Apache Spark SQL. The same techniques also produce an empty Dataset, which can be useful when you prefer to deal with emptiness rather than with missing values (the null object pattern).

The easiest way is to use the emptyDataFrame field of SparkSession. Its only drawback is that it ignores the schema entirely:

import org.apache.spark.sql.SparkSession

private val TestSparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("Empty session test")
  .getOrCreate()

TestSparkSession.emptyDataFrame.printSchema()

// prints only "root" - the DataFrame has no columns
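The null object pattern mentioned earlier can be sketched as follows. Note that findOrders and the table name are hypothetical names introduced for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical repository method: instead of returning null when the
// source table is missing, return an empty DataFrame so that callers
// can keep chaining transformations without null checks.
def findOrders(spark: SparkSession, table: String): DataFrame = {
  if (spark.catalog.tableExists(table)) spark.table(table)
  else spark.emptyDataFrame
}
```

A caller can then safely write `findOrders(spark, "orders").filter(...)` regardless of whether the table exists.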

A more interesting approach, which supports a custom schema, uses SparkSession's createDataFrame(rowRDD: RDD[Row], schema: StructType) method:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("address", StringType, nullable = false)
))

val emptyDatasetWithSchema = TestSparkSession.createDataFrame(
  TestSparkSession.sparkContext.emptyRDD[Row], schema)

This time the DataFrame will contain the schema we want:

root
 |-- name: string (nullable = true)
 |-- address: string (nullable = false)
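For typed Datasets, SparkSession also offers the emptyDataset[T] method, which derives the schema from an implicit Encoder. A minimal sketch, using a case class introduced here for illustration:

```scala
import org.apache.spark.sql.SparkSession

case class Person(name: String, address: String)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Empty typed dataset")
  .getOrCreate()

import spark.implicits._

// The schema (name and address as strings) is derived
// from the fields of the Person case class.
val emptyPeople = spark.emptyDataset[Person]
emptyPeople.printSchema()
```

This gives you an empty Dataset[Person] rather than an untyped DataFrame, so downstream code can work with Person objects directly.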