Data engineering articles

Useful classes for data engineers - Scala and Java

We all have our habits, and for programmers, favorite libraries and frameworks are definitely among them. In this blog post I'll share a list of Java and Scala classes I use in almost every data engineering project. The Python part will follow next week!

Continue Reading →

ACID file formats - file system layout

Last week I presented the APIs of the 3 analyzed ACID file formats. Under the hood, they obviously generate data files, but not only those. The other files they write are what we'll focus on in this blog post.

Continue Reading →

ACID file formats - API

It's time to start a new series on the blog! I hope to catch up on the ACID file formats that are gaining more and more importance. It's also a good occasion to test a new learning method. Instead of writing one blog post per feature and format, I'll try to compare Delta Lake, Apache Iceberg, and Apache Hudi concepts in the same article. Besides this personal challenge, I hope you'll enjoy the series and also learn something interesting!

Continue Reading →

Reverse ETL

The first "reverse" term I ever encountered in programming was reverse proxy. Since then, I've come across "reverse engineering" and "reverse iterator", but neither of them was a pure data term. That changed recently, when I heard about reverse ETL.

Continue Reading →