Databricks articles

4-day workshop Β· In-person or online

What would it take for you to trust your Databricks pipelines in production?

A 3-day bug hunt on a 3-person team costs up to €7,200 in lost engineering time. This workshop teaches you to prevent that β€” unit tests, data tests, and integration tests for PySpark and Databricks Lakeflow, including Spark Declarative Pipelines.

Unit, data & integration tests
Medallion architecture & Lakeflow SDP
Max 10 participants Β· production-ready templates
See the full curriculum β†’ €7,000 flat fee Β· cohort of up to 10
Bartosz Konieczny
Bartosz
Konieczny

Running scripts as hooks with Databricks Asset Bundles

Databricks Asset Bundles (DAB) simplify managing Databricks jobs and resources a lot. And they are also flexible because besides the YAML-based declarative way you can add some dynamic behavior with scripts.

Continue Reading β†’

Semantic versioning with a Databricks volume-based package

One of the recommended ways of sharing a library on Databricks is to use the Unity Catalog to store the packages in the volumes. That's the theory but the question is, how to connect the dots between the release preparation and the release process? I'll try to answer this in the blog post.

Continue Reading β†’

Data quality on Databricks - DQX

In the last blog post of the data quality on Databricks series we're going to discover a Databricks Labs product, the DQX library.

Continue Reading β†’

Data quality on Databricks - Spark Expectations

Previously we learned how to control data quality with Delta Live Tables. Now, it's time to see an open source library in action, Spark Expectations.

Continue Reading β†’

Data quality on Databricks - Delta Live Tables

Data quality is one of the key factors of a successful data project. Without a good quality, even the most advanced engineering or analytics work will not be trusted, therefore, not used. Unfortunately, data quality controls are very often considered as a work item to implement in the end, which sometimes translates to never.

Continue Reading β†’

Apache Airflow XCom in Databricks with task values

If you have been working with Apache Airflow already, you certainly met XComs at some point. You know, these variables that you can "exchange" between tasks within the same DAG. If after switching to Databricks Workflows for data orchestration you're wondering how to do the same, there is good news. Databricks supports this exchange capability natively with Task values.

Continue Reading β†’

File trigger in Databricks

For over two years now you can leverage file triggers in Databricks Jobs to start processing as soon as a new file gets written to your storage. The feature looks amazing but hides some implementation challenges that we're going to see in this blog post.

Continue Reading β†’