Landing zone or direct writes?

I don't know whether it's a good sign or not, but I start having some convictions about building data systems. Of course, building an architecture will always be the story of trade-offs but there are some practices that I tend to prefer than the others. And in this article I will share my thoughts on one of them.

TL; TR: that's only my thoughts about the topic and I don't have the monopolies of knowledge. Take it carefully and do not consider it as a single source of truth.

First and foremost, what do I mean by landing zone and direct writes? To understand the difference, let's take the example of a data pipeline that processes some JSON files and has to write it to a data warehouse storage (structured data). The dataflow is represented in the following diagram:

How can this pipeline be implemented? One approach would write the data directly to the data warehouse solution from the data processing logic and that's something I'm calling here the direct writes. The second one would pass through intermediary storage and the data loading step would be performed by another process managed by an orchestrator. For the purpose of this blog post I called this intermediary storage landing zone. The following picture summarizes these 2 approaches:

I won't hide it, I like the second one. Why? Due to the following reasons:

But don't get me wrong, I'm still convinced that the solution should be adapted to the problem and maybe for different use cases, a landing zone will not exist. I just hope that thanks to the list you will maybe discover new pros of using a landing zone that may have a real impact on your architecture design! And I'm really curious about your vision for this problem and will be glad if you share it on the comments below :)


If you liked it, you should read:

📚 Newsletter Get new posts, recommended reading and other exclusive information every week. SPAM free - no 3rd party ads, only the information about waitingforcode!