Apache Spark SQL lets us manipulate JSON fields in many different ways. One of these features is field extraction from a stringified JSON with the json_tuple(json: Column, fields: String*) function:
val contentString = """| { "value1": "1", "value2": "2" } """.stripMargin val sparkSession: SparkSession = SparkSession.builder() .appName("Spark SQL json_tuple") .master("local[*]").getOrCreate() import sparkSession.implicits._ val inputData = Seq((contentString)).toDF("json_field")
Let's first ensure that contentString is really stored as a string by printing the schema (inputData.printSchema()):
```
root
 |-- json_field: string (nullable = true)
```
To extract one of the available keys of the stringified JSON, you can use this snippet:
```scala
import org.apache.spark.sql.functions

val extractedValues = inputData
  .withColumn("value1", functions.json_tuple($"json_field", "value1"))
  .withColumn("value2", functions.json_tuple($"json_field", "value2"))
  .collect()
  .map(row => (row.getAs[String]("value1"), row.getAs[String]("value2")))

extractedValues should contain only (("1", "2"))
```
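Since json_tuple accepts a variable number of field names and returns one output column per requested field, the two withColumn calls can also be collapsed into a single select. A minimal sketch of that variant (the multiExtract name is illustrative, not from the snippet above):

```scala
import org.apache.spark.sql.functions.json_tuple

// One json_tuple call extracts both keys at once; without aliases
// the generated columns would be named c0, c1, ...
val multiExtract = inputData.select(
  json_tuple($"json_field", "value1", "value2").as(Seq("value1", "value2"))
)
// A key missing from the JSON string yields null instead of an error,
// which makes json_tuple convenient for loosely structured payloads.
```

This form also avoids parsing the JSON string twice, once per withColumn call.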