How to show the generated code?

To debug your Apache Spark SQL programs, or simply to better understand how they work, you can use the debugging features exposed through the org.apache.spark.sql.execution.debug package. One of them, debugCodegen(), prints the code generated for a query:

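  "debugCodegen" should "show generated code" in {
    import org.apache.spark.sql.SparkSession
    val sparkSession = SparkSession.builder()
      .appName("Codegen print").master("local[*]").getOrCreate()
    import sparkSession.implicits._
    val dataset = Seq(
      (1, "a"), (1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c"), (3, "c")
    ).toDF("nr", "letter")

    // The debug package adds debugCodegen() to Dataset through an implicit class
    import org.apache.spark.sql.execution.debug._
    // Prints each WholeStageCodegen subtree with its generated Java code to stdout
    dataset.groupBy($"nr").count().debugCodegen()
  }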

The above snippet should print something like the following (expression IDs such as nr#5 can differ between runs):

Found 2 WholeStageCodegen subtrees.
== Subtree 1 / 2 ==
*(1) HashAggregate(keys=[nr#5], functions=[partial_count(1)], output=[nr#5, count#16L])
+- LocalTableScan [nr#5]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
// ...
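
Since debugCodegen() only prints to the standard output, it can be awkward to use when you want to log the generated code or assert on it in a test. The same debug package also exposes codegenString, which takes a physical plan and returns the text instead of printing it. The sketch below is only an illustration of that idea: the object name CodegenAsString and the method generatedCode are hypothetical, and the dataset is a shortened version of the one used above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._

// Hypothetical helper object, named only for this illustration
object CodegenAsString {

  // Returns the text that debugCodegen() would print, as a String
  def generatedCode(): String = {
    val sparkSession = SparkSession.builder()
      .appName("Codegen as string").master("local[*]").getOrCreate()
    import sparkSession.implicits._
    val dataset = Seq((1, "a"), (2, "b"), (3, "c")).toDF("nr", "letter")
    val aggregated = dataset.groupBy($"nr").count()

    // codegenString works on the physical plan rather than on the Dataset
    codegenString(aggregated.queryExecution.executedPlan)
  }

  def main(args: Array[String]): Unit = {
    println(generatedCode())
  }
}
```

The returned text can then be written to a log file or searched for a given operator. If you run Spark 3.0 or later, calling explain("codegen") on the Dataset should print comparable output without importing the debug package.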