Trunk-based development and data engineering

You're likely familiar with the classic development workflow using main and develop branches to promote code from development to production. But did you know there's an alternative that uses only a single main branch? If not, this post is a great opportunity to learn how it works.


This single-branch workflow is called trunk-based development. In short, trunk-based development is a version control strategy where you merge small updates into a single shared branch, called the trunk, to minimize merge complexity.

As a result, your development strategy is more straightforward than the Gitflow approach:

That's only the high-level view. To better grasp the differences with Gitflow, we need to recall some basics about this development strategy.

Gitflow 101

To characterize Gitflow we could use the following terms:

Overall, delivering features to production may involve many actions, summarized in the next diagram:

Is it bad? If you are comfortable with this workflow, not at all! In the end, there is a human, you and your team, who delivers things to production. If Gitflow helps you guarantee good release quality, there is nothing wrong with it.

By the way, Gitflow is a valid choice for a development workflow:

The isolation brings a lot of flexibility and safety, but it's also one of the biggest weaknesses of the Gitflow model:

Trunk-based alternative

As you saw previously, many branches add flexibility, but they also make things complex. From that observation, a natural alternative appears: trunk-based development, where engineers collaborate on a single long-lived common branch called the trunk (main). Having this single branch involves:

Overall, the development workflow presented previously for Gitflow can now be summarized as follows:

As you can see, fixing a bug is treated like developing a feature, i.e. creating a branch from the trunk and merging it back. It's worth adding that the deployment strategy doesn't consist of blindly pushing the code from the main branch to your production environment. You can apply a more standardized workflow targeting various environments despite having a single deployable branch:

As long as you don't bypass the deployment guards, your code cannot reach the production environment unchecked. Whenever your feature tests detect an issue, you need to return to the main branch to fix it.
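To make the idea of deployment guards more concrete, here is a minimal sketch of promoting one release through several environments. The environment names, the `Release` class and the `promote` function are hypothetical, invented for illustration; a real setup would express the same logic in a CI/CD tool rather than in plain Python.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical environment order; the actual list depends on your platform.
ENVIRONMENTS = ["development", "staging", "production"]


@dataclass
class Release:
    """An artifact built once from a trunk commit."""
    commit: str
    deployed_to: List[str] = field(default_factory=list)


def promote(release: Release, guards: Dict[str, Callable[[Release], bool]]) -> Release:
    """Deploy environment by environment, stopping at the first failed guard."""
    for env in ENVIRONMENTS:
        # A guard is a check that must pass before deploying to `env`.
        guard = guards.get(env, lambda _release: True)
        if not guard(release):
            # The fix happens on the trunk; a new release is built from there.
            break
        release.deployed_to.append(env)
    return release


# Example: the guard in front of production fails (e.g. tests failed on staging).
blocked = promote(Release(commit="abc123"), guards={"production": lambda r: False})
print(blocked.deployed_to)  # ['development', 'staging']
```

The release never reaches production here, and nothing is patched in place: the correction goes back through the trunk and produces a new release.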

Unfortunately, this single-branch approach is a double-edged sword, and some real-world scenarios may require extra effort:

Trunk-based development and a data project?

Now comes the question that might be the most interesting for you: should I use trunk-based workflows in my data engineering projects? The answer is, as always you'll tell me, it depends.

When it comes to people, trunk-based development should be a relatively good option for engineering teams where all team members have a good overall understanding of their technical scope. Besides, and here I'm not considering people in terms of years of experience, which is often quoted in the trunk-based development context, team members should own the project from development to the final release stage. They should be conscious enough to raise an alert when a deployment is in progress but the trunk contains commits not planned for that delivery. They should also be reactive enough: it won't be possible to keep branches short-lived if the code review takes ages.

In addition, engineers should be aware that any change merged to the trunk should be ready to be deployed to production. Put differently, you should test as much as possible on your local environment or on the sandbox environment. You shouldn't consider merging to the main branch and deploying to the development environment as part of your incremental development strategy. Otherwise, unfinished code can make it to production.

Long story short, if managing branches is an issue and your team is well structured and self-aware, trunk-based development can be a good option to try. If not, it's still good to know about it. Who knows, maybe in a few months your new project will be trunk-based?
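The alert about unplanned commits can be partly automated. The sketch below assumes you already have two lists of commit identifiers: the ones merged to the trunk since the last production deployment (for example taken from `git log <last-deployed-commit>..main`) and the ones planned for the delivery. The function name and the commit identifiers are hypothetical.

```python
from typing import Iterable, List


def unplanned_commits(trunk_commits: Iterable[str], planned: Iterable[str]) -> List[str]:
    """Return trunk commits that are not part of the planned delivery, in trunk order."""
    planned_set = set(planned)
    return [commit for commit in trunk_commits if commit not in planned_set]


# Hypothetical commit identifiers merged since the last production deployment.
since_last_deploy = ["a1f3", "b7c9", "d2e4"]
planned_for_release = ["a1f3", "d2e4"]

extra = unplanned_commits(since_last_deploy, planned_for_release)
if extra:
    print(f"Hold the deployment, unplanned commits on trunk: {extra}")
```

Such a check doesn't replace the team's ownership of the release; it only makes the mismatch visible before somebody has to notice it by hand.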
