Spark testing articles

4-day workshop Β· In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that β€” unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants Β· production-ready templates
See the full curriculum β†’ €7,000 flat fee Β· cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Testing Spark applications

It's difficult to contest the importance of testing in programming. Tests help to avoid regressions (a lot of regressions) and also to better understand developed code. Spark (and other data processing frameworks by the way) is not an exception of this rule. But, obviously, testing applications working in distributed mode is more tricky than in the case of standalone programs.

Continue Reading β†’

Testing strategies in Spark

After writing a post about testing Spark applications, I decided to take a look at Spark project tests and see which patterns they use to verify framework features.

Continue Reading β†’

ScalaTest and testing styles

At first glance the wide choice of testing families in Scala can scary. After all in JUnit and other xUnit frameworks, the choice of tests declaration is limited. Hopefully after some digging ScalaTest's testing styles become more obvious to understand and to use.

Continue Reading β†’

Apache Spark SQL and unit tests

Some time ago I was involved in a discussion about testing Apache Spark SQL code. In this post, I would like to share my observations about this topic.

Continue Reading β†’