Alerts, guards, and data engineering

While I was writing about agnostic data quality alerts with ydata-profiling a few weeks ago, I had an idea for another blog post that can be summarized as "what do alerts do in data engineering projects?". Since the answer is "it depends", let me share my thoughts on that.

Alerts vs. data guards

Before I give you more details on the answer, I owe you a few words of introduction. In the context of this blog post, the first thing to know is the difference between alerts and guards. The easiest way to grasp it is to locate them on a data processing timeline. Guards run before any work is performed on the dataset, i.e. they can prevent the work from being done if some conditions are not met. Alerts, on the other hand, are a post-processing component that triggers after the fact, for example after a job failure. Consequently, they won't prevent any damage, but they will keep you up to date with the work in progress by notifying you about any unexpected behavior.

The world would be too nice if we could stop at this definition. We can't, because an alert is also a natural consequence of a guard evaluation. Why? Let's imagine a pipeline like the one in the next diagram:

As you can see, whenever a guard evaluates the dataset as invalid, it emits an alert to notify the pipeline owners about the issue. Without this coupling, you, as a pipeline owner, wouldn't be aware of broken things until the consumers of your output notified you. You will certainly agree with me that it's better to react first and build trust in your data than to stay passive and wait for bad news to come to you.
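The coupling described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `AlertChannel`, `not_empty_guard`, and `run_pipeline` are hypothetical names invented for this example, and the alert channel simply prints and records messages instead of calling a real notification service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Dataset = List[Dict]


@dataclass
class AlertChannel:
    """Hypothetical stand-in for an e-mail, chat, or paging integration."""
    sent: List[str] = field(default_factory=list)

    def notify(self, message: str) -> None:
        self.sent.append(message)
        print(f"[ALERT] {message}")


def not_empty_guard(dataset: Dataset) -> Optional[str]:
    """Return an error description if the dataset is invalid, None otherwise."""
    return "input dataset is empty" if not dataset else None


def run_pipeline(
    dataset: Dataset,
    guards: List[Callable[[Dataset], Optional[str]]],
    transform: Callable[[Dataset], Dataset],
    alerts: AlertChannel,
) -> Optional[Dataset]:
    # Guards run before any work is done on the dataset.
    for guard in guards:
        error = guard(dataset)
        if error is not None:
            # A failed guard blocks the work AND feeds a data quality alert.
            alerts.notify(f"guard {guard.__name__} failed: {error}")
            return None
    try:
        return transform(dataset)
    except Exception as exc:
        # Alert after the fact: the failure already happened,
        # but the pipeline owners learn about it immediately.
        alerts.notify(f"job failed: {exc}")
        raise


# Usage: an empty input is stopped by the guard and an alert is emitted.
channel = AlertChannel()
result = run_pipeline([], [not_empty_guard], lambda rows: rows, channel)
```

The guard returns a description of the problem rather than raising, so the pipeline decides in one place both to skip the transformation and to notify the owners. The `except` branch shows the other side of the timeline: an alert that cannot prevent the failure but reports it.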

Long story short, alerts and guards are different, but guards feed alerts. More precisely, guards feed the data quality type of alerts.

Use cases

Alerts are a consequence of guards, but their scope is much wider than data quality issues. You can use alerts to be notified about:

I was not technical this time, but hopefully the examples from the second section convinced you that alerts are an intrinsic part of a mature data engineering project.


