Apache Airflow articles

Apache Airflow 2 overview - part 2

Welcome to the 2nd blog post dedicated to Apache Airflow 2 features. This time it'll be more about custom code you can add to the most recent version.

Continue Reading β†’

Apache Airflow 2 overview - part 1

Apache Airflow 2 introduced a lot of new features. The most visible one is probably a reworked UI but there is more! In this and the next blog post I'll show some of the interesting new Apache Airflow features.

Continue Reading β†’

Dealing with time delta in Apache Airflow

Often in batch processing we give the pipeline some time to catch up late data, ie. the pipeline for 9 will be executed only at 11. One of methods to do so in Airflow is to compute delta on the tasks but there is a more "native" way with TimeDeltaSensor.

Continue Reading β†’

DAG evolution - using start_date and end_date?

One of the greatest properties in data engineering is idempotency. No matters how many times you will run your pipeline, you will always end up with the same outcome (= 1 file, 1 new table, ...). However, this property may be easily broken when you need to evolve your pipeline. In this blog post, I will verify one possible way to manage it in Apache Airflow.

Continue Reading β†’

Apache Airflow gotchas

From time to time I try to help other people on StackOverflow and one of my tagged topics is Apache Airflow. In this blog post I'll try to show you some problems I saw there last few months.

Continue Reading β†’

Apache Airflow and sequential execution

One of patterns that you may implement in batch ETL is sequential execution. It means that the output of one job execution is a part of the input for the next job execution. Even though Apache Airflow comes with 3 properties to deal with the concurrence, you may need another one to avoid bad surprises.

Continue Reading β†’

Testing sensors in Apache Airflow

Unit tests are the backbone of any software, data-oriented included. However testing some parts that way may be difficult, especially when they interact with the external world. Apache Airflow sensor is an example coming from that category. Fortunately, thanks to Python's dynamic language properties, testing sensors can be simplified a lot.

Continue Reading β†’

Externally triggered DAGs in Apache Airflow

In one of my previous posts, I described orchestration and coordination in the data context. At the end I promised to provide some code proofs to the theory and architecture described there. And that moment of truth is just coming.

Continue Reading β†’