Apache Spark 4.1.0 features articles

Bartosz Konieczny

Spark Declarative Pipelines 101

One of the biggest changes to the Apache Spark Structured Streaming API over the past few years is undoubtedly the introduction of the declarative API, AKA Spark Declarative Pipelines. This post kicks off a three-part series dedicated to this new functionality. By the end of these articles, you will be able to effectively leverage declarative programming in your workflows and gain a deeper understanding of what happens under the hood when you do.

Continue Reading →

Spark Declarative Pipelines, going further

Last week, we discovered Spark Declarative Pipelines as a new way of writing streaming pipelines. However, writing the pipelines is only half the battle; the other, perhaps more critical, half is understanding what happens once they are in motion. That is what we are going to dive into today.

Continue Reading →

Spark Declarative Pipelines internals

Welcome back to our series on Spark Declarative Pipelines (SDP)! So far, we've tackled the fundamentals of building jobs and the logistics of operationalizing them in production. Now that your pipelines are running smoothly, it's time to pop the hood and see what's actually happening under the surface.

Continue Reading →