Apache Spark SQL lets us manipulate JSON fields in many different ways. One of its features is field extraction from a stringified JSON column with the json_tuple(json: Column, fields: String*) function:
import org.apache.spark.sql.{functions, SparkSession}

val contentString =
  """| { "value1": "1", "value2": "2" } """.stripMargin
val sparkSession: SparkSession = SparkSession.builder()
  .appName("Spark SQL json_tuple")
  .master("local[*]").getOrCreate()
import sparkSession.implicits._
val inputData = Seq(contentString).toDF("json_field")
Let's first ensure that the json_field column is really a string by printing the schema (inputData.printSchema):
root |-- json_field: string (nullable = true)
To extract one of the available keys from the stringified JSON, you can use this snippet:
val extractedValues = inputData
  .withColumn("value1", functions.json_tuple($"json_field", "value1"))
  .withColumn("value2", functions.json_tuple($"json_field", "value2"))
  .collect()
  .map(row => (row.getAs[String]("value1"), row.getAs[String]("value2")))
extractedValues should contain only (("1", "2"))
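Since json_tuple accepts a varargs list of fields, both keys can also be pulled out in a single call. The function then behaves as a generator producing one column per requested field (named c0, c1, and so on by default), which you can rename with a multi-column alias. A minimal sketch, reusing the inputData DataFrame defined above:

```scala
// Extract both fields with a single json_tuple call; the generated
// columns come back in the order the fields were requested, so we
// alias them to meaningful names in one go
val multiExtracted = inputData
  .select(functions.json_tuple($"json_field", "value1", "value2")
    .as(Seq("value1", "value2")))
  .collect()
  .map(row => (row.getAs[String]("value1"), row.getAs[String]("value2")))

multiExtracted should contain only (("1", "2"))
```

Besides being more concise, the single-call version parses the JSON string only once per row, whereas chained withColumn calls parse it once per extracted field.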