In-person or online · 4-day intensive

The Real Cost of
Untested Data Pipelines.

A single bug in production costs your team between €400 and €800 per day (a typical data engineering daily rate) to investigate and fix, multiplied by however many days it takes to find it. A 3-day bug hunt on a 3-person team can cost up to €7,200 in lost engineering time. One workshop pays for itself the first time it prevents one incident.

This practical workshop teaches data engineers how to write tests that actually catch bugs, before your stakeholders do. It covers unit tests, integration tests, and data tests for PySpark and Databricks Lakeflow.

4 days intensive live training
5 hours hands-on exercises
1 GitHub repo with production-ready blueprints
4 hours of free team support after the workshop

Data engineering without tests
is a ticking time bomb.

🕵️

Bad data, discovered late

The NULL values that slipped in three weeks ago are now polluting every dashboard in the company. The data is technically "there" — it's just wrong and you need to run a costly reprocessing to fix the issue.

😰

Afraid to refactor

You know the 2-year-old Spark job needs a rewrite. But there are no tests, no safety net. Touch it and everything might break.

🔁

Manual QA forever

Every deployment means someone manually running SQL queries to "check if it looks right." It's tedious, unreliable, and doesn't scale.

💥

AI-Generated code shipped without control

AI-generated code can accelerate your workflow dramatically, but volume is not the same as reliability. Without proper validation, you have no way of knowing how that code will actually behave in production.

🤷

"We don't test data, we test code"

Software engineering testing practices don't map directly to data pipelines. So most teams just... don't test. Until something explodes.

🧱

No testing culture on the team

You know tests would help. But you don't know where to start, what to test, or how to convince your teammates and manager it's worth the investment.

💸

Tests are a cost

You have a test suite, but it's painful to maintain and you don't know what you're doing wrong.

🤓

"It works on my machine"

Unfortunately, your colleague left with his machine.

These are not edge cases. They are the default state of data teams that ship without a testing strategy. The good news: every one of them is preventable.

A single production data incident typically costs your team 1–5 engineering days of firefighting — before you start counting the stakeholder trust you'll need weeks to rebuild. A 4-day workshop is a bargain by comparison.


A complete testing toolkit
built for the Databricks Lakehouse.

Unlike generic software testing courses, this workshop is firmly grounded in the Databricks platform (Apache Spark, Delta Lake, Lakeflow Spark Declarative Pipelines, and Declarative Automation Bundles) while also covering the universal testing principles that apply to any data stack.

Four full days of live, hands-on training with the instructor — in person at your location or online via video call, whichever works best for you. Every session includes exercises you write and run yourself.

Day 1
Foundations & Unit testing for Databricks workloads
09:00
Testing approaches for data systems
~1 h

We open by answering the foundational question: why can't we just apply standard software testing practices to data pipelines? Non-determinism, external state, and data evolution make data systems a genuinely different beast. Once that's clear, we establish the software engineering principles that bring discipline to the chaos.

  • Doubtless engineering
  • Eyeball testing
  • Software engineering Test Pyramid
  • Why these approaches fall short for data systems
  • The Test Square: the new pyramid for data systems
10:00
Unit tests
~5h

With the foundations in place, we turn to the first line of defense: unit tests. They remain essential for catching logic errors early and fast. We cover what makes a unit test genuinely useful versus one that gives false confidence, the code patterns and libraries that make writing them less painful, and how to handle the unique challenges of testing PySpark and Databricks workloads. We close by embedding unit tests into the development lifecycle and clearing up the misconceptions that lead teams to either over-rely on them or abandon them too soon.

  • Why unit tests still matter as the first line of defense
  • Golden rules of useful and reliable unit tests
  • Useful code constructs for efficient unit testing
  • Life-saving libraries
  • Unit tests for PySpark and Databricks
  • Automation in development lifecycle
  • Misconceptions and gotchas
  • Hands-on lab adding tests to an existing PySpark code base
  • Q&A live discussion on what we learned
Day 2
Data tests
09:00
Data tests
~6h

Moving beyond unit tests, we explore how data tests operate at a different level — validating that the pieces work together against real data and real systems. We build a data quality layer that turns passive observations into active test controls, and rethink assertions in this context. The section closes by wiring everything into the CI/CD process so data tests become a natural checkpoint in every deployment.

  • How data tests differ from unit tests and integration tests
  • Building a data quality layer for tests
  • Transforming data quality observations into actionable test controls
  • Rethinking assertions for data tests
  • Integrating with the CI/CD process
Day 3
Integration tests & Medallion architecture
09:00
Integration tests
~2h

Here we tackle integration tests in practice, focusing on the specific challenges that come with Databricks environments. We look at how to keep maintenance overhead from becoming a burden as the test suite grows, and how to automate execution so integration tests run reliably without manual intervention.

  • Testing Databricks-specific features
  • Mitigating Integration Test maintenance overhead
  • Automating the execution
11:00
Medallion architecture
~2h

We apply the testing strategies to the Medallion architecture, following data as it moves through the Bronze, Silver, and Gold layers. Each layer introduces its own failure modes, so we map out where problems are most likely to originate and how to trace them back to their source. We then address the most common issues that surface at each layer, showing how unit tests, data tests, and integration tests each play a role in keeping the pipeline healthy end to end.

  • The Bronze → Silver → Gold principle
  • Identifying where problems can come from
  • Addressing the most common issues with Unit Tests, Data Tests, and Integration Tests
13:00
Tests and Lakeflow Spark Declarative Pipelines
~2h

Spark Declarative Pipelines come with a common misconception: no visible SparkSession must mean they're difficult to test. We unpack how the main SDP script works like a Python __main__ block — it declares the workflow while the real logic lives elsewhere, and that separation is actually your testing advantage. Any business logic extracted from the pipeline can be covered with the unit tests from the previous section. We walk through concrete test examples and finish by integrating SDP tests into the CI/CD pipeline.

  • Integrating testing best practices into the declarative world
  • Learning how to approach SDP pipelines to make them testable
  • Testing incremental logic across Bronze, Silver, and Gold layers
  • Keeping SDP capabilities for testable workloads
Day 4
Lakeflow Spark Declarative Pipelines & capstone project
09:00
Testing and LLMs
~2h

Before diving into the capstone project, we take a step back to look at one of the most exciting shifts happening in the data engineering space right now: using LLMs as an active partner in building and maintaining your testing layer. We explore how LLMs can generate unit test cases from your pipeline code, produce realistic synthetic datasets, translate business requirements into data quality rules, and help you spot logic paths your current test suite is missing.

11:00
Capstone project
~3h

We close the workshop with a capstone project that brings everything together. Starting from a realistic Databricks data pipeline, participants apply the full testing stack hands-on: writing unit tests to guard business logic, data tests to enforce quality at each layer of the Medallion architecture, and integration tests to validate the pipeline end to end. The project is designed to reflect the real challenges data engineers face (non-determinism, external state, PySpark specifics) and challenges participants to make deliberate choices about what to test, at which level, and how to wire it all into a CI/CD pipeline. By the end, the testing strategies covered throughout the workshop stop being abstract concepts and become a working, cohesive test suite.

15:00
Retrospective and final Q&A
~1h

We close the day with an open retrospective and Q&A. This is a space to surface lingering doubts, challenge the approaches presented, and share lessons from the capstone project. No slides, no structure — just an honest conversation about what works, what doesn't.


Bartosz Konieczny

Freelance Data Engineer & Author

Data Engineering Design Patterns waitingforcode.com GitHub

Coding since 2010. Shipping data systems ever since.

I'm a freelance data engineer who has held senior hands-on positions across the industry, working on data engineering problems in both batch and stream processing. My work spans Apache Spark, Databricks, Apache Kafka, and Delta Lake across major public cloud platforms.

I write about everything I learn on waitingforcode.com — one of the most comprehensive data engineering blogs on the internet, with deep dives on Apache Spark and Databricks internals, stream processing, and distributed systems. I've spoken at the Spark+AI Summit, the Data+AI Summit, among others.

This workshop distills everything I've learned building, breaking, and fixing real data systems. Not slides — code, patterns, and hard-won lessons.

I'm also the author of:

Data Engineering Design Patterns book cover
Data Engineering Design Patterns Recipes for Solving the Most Common Data Engineering Problems
O'Reilly Media · April 2025
Freelance data engineer with senior hands-on positions since 2010
Author of Data Engineering Design Patterns (O'Reilly, 2025)
Expertise in Apache Spark, Databricks, Apache Kafka, Delta Lake, Python
Speaker at Spark+AI Summit, Data+AI Summit
Blogger at waitingforcode.com — thousands of engineers worldwide


Everything you need to succeed.

👥 Maximum 10 participants per session. This is not a webinar. It's a small-group intensive where you get real attention, live code reviews, and answers to your specific questions — not pre-recorded answers to someone else's.
📍

In-person or Online

Your choice. On-site at your office, at an external venue, or via video call — same content, same instructor.

🧑‍🏫

Live with the Instructor

Four full days of direct access. Ask questions as you go, get unstuck in real time, no asynchronous delays.

💻

Hands-on Labs

A real GitHub repo with exercises for every topic. You write and run tests yourself, not just watch.

🎯

Real-world Templates

Production-ready files and architectures; adapt them and ship to your production environment.

💬

Post-workshop Support

4-hour time credit to ask follow-up questions as you apply what you've learned.


One workshop. A testing strategy for life.

Four full days with an O'Reilly author and Databricks MVP since 2020, maximum 10 participants, fully focused on Databricks and PySpark. Here's how it compares.

Market context

Corporate trainings and workshops €1,000–€1,500 / person
↓ same duration, smaller group, practitioner instructor
Senior data consultant — day rate × 4 €2,000–€3,000 / person
↓ same expertise, structured curriculum, group learning
This workshop — 4 days · max 10 participants €7,000 / flat fee per cohort

In-person or online · Your choice of format · Travel costs separate for in-person

4 full days of live training with the instructor
In-person at your location or online — you decide
Hands-on exercise repository
Production-ready blueprints
4 hours of free post-workshop support
Get in touch to book
Not sure yet? Email me with your team size and preferred format. I'll answer any questions and we'll find a date that works for everyone.

Frequently asked questions.

What experience level is this workshop for? +
The workshop targets working data engineers with at least 1 year of professional experience. You should be comfortable writing Python and be familiar with the Databricks Lakeflow offering. You don't need prior testing experience — but you should understand what data pipelines and transformations are. If you're unsure, email me and I'll tell you honestly whether it's a fit.
What tools will we use? +
Python (pytest), PySpark, Databricks, Delta Lake, Lakeflow Spark Declarative Pipelines, GitHub Actions, DQX, and Declarative Automation Bundles. Exercises run locally with a lightweight SparkSession and a free Databricks workspace. A laptop with an IDE, Python 3.12+, and internet access is all you need.
What's the difference between the in-person and online format? +
The content and exercises are identical. In-person works best for teams that want full immersion and benefit from whiteboard discussions; I travel to your office or we arrange a venue together. Online runs over video call with shared screens for the live coding — it works great for distributed teams or when travel isn't practical. Let me know your situation and we'll pick the right fit.
We use Airflow to orchestrate our Databricks jobs — is that covered? +
The workshop focuses on testing the Databricks layer itself (Lakeflow PySpark jobs, Lakeflow Spark Declarative Pipelines, Declarative Automation Bundles) rather than the orchestration layer. The testing patterns you'll learn apply regardless of whether Databricks is triggered by Airflow, Lakeflow Jobs, or any other scheduler.
Can I expense this through my company? +
Absolutely. I provide a proper invoice for business purchases. Email me at contact@waitingforcode.com and I'll send you everything you need for your finance team.
How many people can attend? +
The workshop works best with groups of 4–10 participants.
Will my data stack be bug-free after the workshop? +
Testing won't make your data stack bug-free — no honest answer will promise that. But it will give you the tools to catch issues early, respond faster, and keep most problems from ever reaching production.

Your pipelines deserve tests.
Your team deserves confidence.

Book the workshop — €7,000 / group

€7,000 · Max 10 participants · In-person or online


Testing for Data Engineers  ·  contact@waitingforcode.com  ·  © 2026 All rights reserved.

4-day live workshop · In-person or online · Dates on request  ·  waitingforcode.com