Storage articles

October 15, 2025 • Delta Lake

Idempotent writer

If you are ~~old~~ experienced enough, you should remember the Apache Spark Structured Streaming file sink, where the commit log stores the already written files in a dedicated file. Delta Lake uses a similar concept to guarantee idempotent writes, but with less storage overhead.

October 9, 2025 • Delta Lake

Tables cloning in Delta Lake

When I was writing the Data Engineering Design Patterns book I had to leave some great suggestions aside. One of them was a code snippet for the Passthrough replicator pattern with Delta Lake's clone feature. But all is not lost as my new Delta Lake blog post will focus on table cloning which is the backbone for implementing the Passthrough replicator pattern!

October 1, 2025 • Delta Lake

Constraints in Delta Lake

We all agree that data quality is essential for building trustworthy dashboards or ML algorithms. For a long time, validating data before writing it to file formats was only possible in the data processing jobs. Thankfully, Delta Lake constraints made this validation possible at the data storage layer (technically, it's still a compute layer but at a very high level of abstraction).

August 25, 2025 • Delta Lake

Transactional patterns for Delta Lake before Catalog-managed tables

Dual writes - backend engineers have been facing this challenge for many years. If you are a data engineer with some projects running in production, you have certainly faced it too. If not, I hope the blog post will shed some light on that issue and provide you with a few solutions!

January 10, 2025 • Delta Lake

Delta Lake and restore - traveling in time differently

Time travel is quite a popular Delta Lake feature. But did you know it's not the only one you can use to interact with past versions? An alternative is the RESTORE command, and it'll be the topic of this blog post.

June 18, 2024 • Delta Lake

Delta Lake table as a changelog

One of the big challenges in streaming Delta Lake tables is the inability to handle in-place changes, like updates, deletes, or merges. There is good news, though. With a little bit of effort on your data provider's side, you can process a Delta Lake table as you would process Apache Kafka topics, hence without in-place changes.

March 27, 2024 • Delta Lake

Schema tracking in Delta Lake

Streaming Delta tables is slightly different from streaming native streaming sources, such as Apache Kafka topics. One of the significant differences is schema enforcement. It causes the job to fail whenever the schema of the streamed table changes.

February 7, 2024 • Delta Lake

Table file formats - streaming writer: Delta Lake

In the previous blog post from the series we discovered the streaming reader. However, an end-to-end streaming Delta Lake pipeline also requires a writer, which will be our focus today.

January 17, 2024 • Delta Lake

Table file formats - streaming reader: Delta Lake

Even though I'm into streaming these days, I haven't really covered streaming in Delta Lake yet. I only briefly blogged about Change Data Feed but completely missed the fundamentals. Hopefully, this and the next blog posts will change this!

November 8, 2023 • Delta Lake

Table file formats - checkpoints: Delta Lake

Checkpoints are a well-known fault-tolerance mechanism in stream processing. But what do they have to do with Delta Lake?

October 11, 2023 • Delta Lake

Table file formats - vacuum: Delta Lake

If you have some experience with RDBMS (and who doesn't, by the way?), you have probably run a VACUUM command to reclaim the storage space occupied by deleted or obsolete rows. If you're now working with Delta Lake, you can do the same!

August 30, 2023 • Delta Lake

Table file formats - isolation levels: Delta Lake

If Delta Lake implemented only the commits, I could stop exploring this transactional part after the previous article. But like an RDBMS, Delta Lake implements other ACID-related concepts. One of these is isolation levels.

August 23, 2023 • Delta Lake

Table file formats - commits: Delta Lake

One of the great features of modern table file formats is the ability to handle write conflicts. It wouldn't be possible without commits, which are the topic of this new blog post.

April 28, 2023 • Delta Lake

Table file formats - Schema evolution: Delta Lake

Data lakes have made the schema-on-read approach popular. Things seem to be changing with the new open table file formats, like Delta Lake or Apache Iceberg. Why? Let's try to understand that by analyzing their schema evolution parts.

April 8, 2023 • Apache Iceberg

Table file formats - Z-Order compaction: Apache Iceberg

Last time you discovered the Z-Order compaction in Delta Lake. But guess what? Apache Iceberg also has this feature!

March 30, 2023 • Delta Lake

Table file formats - Z-Order compaction: Delta Lake

In my recent exploration of compaction, aka the OPTIMIZE command, in Delta Lake, I found the famous Z-Ordering mode. It was one of the most outstanding features when I first heard about Delta Lake. You can't even imagine how impatient I was to see what it is doing under the hood. Fortunately, this time has come!

February 17, 2023 • Delta Lake

Simplified Delta Lake operations with Mack

I like writing code, and each time there is a data processing job to write with some business logic, I'm very happy. However, with time I've learned to appreciate the Open Source contributions enhancing my daily work. The Mack library, the topic of this blog post, is one of those projects I discovered recently.

February 10, 2023 • Apache Iceberg

Table file formats - compaction: Apache Iceberg

Compaction is also a feature present in Apache Iceberg. However, it works a little bit differently than in Delta Lake, presented last time. Why? Let's see in this new blog post!

January 18, 2023 • Delta Lake

Table file formats - Compaction: Delta Lake

The small files problem is well known in data systems. Fortunately, modern table file formats have built-in features to address it. In the next series we'll see how.

November 13, 2022 • Delta Lake

Table file formats - Change Data Capture: Delta Lake

It's time to start the 4th part of the Table file formats series. This time the topic will be Change Data Capture, that is, how to stream all changes made to the table. As in the 3rd part, I'm going to start with Delta Lake.
