Articles about Data engineering patterns on waitingforcode.com - articles for the pleasure of learning and discovery

May 5, 2019 • Data engineering patterns

Big Data patterns implemented - fan-out ingress in Apache Spark Structured Streaming

In the previous post of the Big Data patterns implemented series, I wrote about a pattern called fan-in ingress. The idea was to consolidate data coming from different sources. This time I will cover its companion, fan-out ingress, which does exactly the opposite.
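
To make the contrast between the two patterns concrete, here is a minimal sketch in plain Python rather than in Apache Spark Structured Streaming. The function names `fan_in` and `fan_out` and the sample records are hypothetical, invented for this illustration. Fan-out is modeled as delivering each record of a single source to every sink, which is an assumption about how the pattern distributes data.

```python
from typing import Callable, Iterable, Iterator, List


def fan_in(*sources: Iterable[dict]) -> Iterator[dict]:
    """Consolidate records coming from several sources into one stream."""
    for source in sources:
        for record in source:
            yield record


def fan_out(source: Iterable[dict], sinks: List[Callable[[dict], None]]) -> None:
    """Deliver every record of a single source to each of the sinks."""
    for record in source:
        for sink in sinks:
            sink(record)


# Hypothetical sample data standing in for two input sources.
clicks = [{"type": "click", "id": 1}]
views = [{"type": "view", "id": 2}]

# Fan-in: two sources become one consolidated dataset.
consolidated = list(fan_in(clicks, views))

# Fan-out: one dataset is delivered to two in-memory sinks.
archive: List[dict] = []
alerts: List[dict] = []
fan_out(consolidated, [archive.append, alerts.append])
```

In a real streaming job the sinks would be external systems rather than in-memory lists, but the direction of the data flow stays the same.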


April 24, 2019 • Data engineering patterns

Big Data patterns implemented - fan-in ingress

The series about the implementation of Big Data patterns continues. This time I will focus on a streaming pattern called fan-in ingress.


April 17, 2019 • Data engineering patterns

Big Data patterns implemented - automated processing metadata insertion

Metadata is sometimes disregarded, but very often it helps to retrieve information more easily and quickly. One such use case is the metadata of Apache Parquet files, where statistics about each column's content are stored. Without parsing all the rows, the reader can know whether what it is looking for may be in the file or not. Metadata is also part of one of the Big Data patterns, called automated processing metadata insertion.
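
The idea of skipping data thanks to column statistics can be sketched in a few lines of plain Python. This is not the Apache Parquet API: the `ColumnStats` class, the `files_to_read` function and the file names are hypothetical, and the sketch assumes that the statistics are simple minimum and maximum values kept per file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnStats:
    """Hypothetical per-file statistics for one column."""
    file_name: str
    min_value: int
    max_value: int


def files_to_read(stats: List[ColumnStats], wanted: int) -> List[str]:
    """Keep only the files whose [min, max] range may contain the wanted value."""
    return [s.file_name for s in stats if s.min_value <= wanted <= s.max_value]


# Hypothetical statistics for two files of the same dataset.
stats = [
    ColumnStats("part-0", min_value=1, max_value=100),
    ColumnStats("part-1", min_value=101, max_value=200),
]

# Only files whose range covers the value remain candidates for a full read.
candidates = files_to_read(stats, wanted=150)
```

A file whose range covers the value is only a candidate, since minimum and maximum values cannot prove that the value is really present. A file whose range excludes it can be skipped without reading any row.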


April 11, 2019 • Data engineering patterns

Big Data patterns implemented - automated dataset execution

Some time ago I found a site listing Big Data patterns (link in the "Read also" section). However, that site describes them from a very general point of view, and it's not always easy to figure out the what, why and how. That's why I decided to start a new series of posts where I will try to describe these patterns and give some more technical context.

