Articles about streaming on waitingforcode.com

June 5, 2024 • General data engineering

Infoshare 2024: Stream processing fallacies, part 2

This blog post shares the remaining fallacies from my 7-year stream processing journey.

May 30, 2024 • General data engineering

Infoshare 2024: Stream processing fallacies, part 1

Last week I spoke in Gdansk on the DataMass track at Infoshare. As often happens, the talk's time slot limited what I could share, but maybe that's for the best. Otherwise, you wouldn't be reading about stream processing fallacies!

April 25, 2024 • General data engineering

Event time skew in stream processing

As a data engineer you're certainly familiar with data skew. Yes, this bad phenomenon where one task takes considerably more input than the others and often causes unexpected latency or failures. It turns out that stream processing also has its own skew, but one related to time.

January 10, 2024 • General data engineering

Files streaming is quite a challenge

It's technically possible to process files continuously from a streaming job. However, if you expect a latency-sensitive job, this will always be slower than processing data directly from a streaming broker. Why?

January 3, 2024 • Data engineering patterns

Stream processing models

If you're interested in stream processing, I bet your thinking is technology-based. That's not wrong; after all, the ability to use a tool gives you and me a job. However, in the long term it's better to reason in terms of patterns or models. Being aware of a more general vision helps you assimilate new tools.

December 27, 2023 • General data engineering

Streamhouse, the next house to move into?

I must admit, if you want to catch my attention, you can use some keywords. One of them is "stream". Knowing that, the topic of my new blog post shouldn't surprise you.

August 17, 2023 • Data engineering on AWS

Don't sleep when you code...about sleep issue in KPL

Lessons learned on why it's always worth checking the code implementation to avoid surprises later, even for vendor-supported solutions.

May 12, 2023 • Data engineering on AWS

Kinesis sequence number is not an Apache Kafka offset

I used to say "Kinesis Data Streams is like Apache Kafka, an append-only streaming broker with partitions and offsets". Although that's often true, unfortunately it's not that simple.

May 5, 2023 • Data engineering on AWS

Amazon Kinesis is not Apache Kafka

Open Source tools helped me a lot in switching to the cloud world. Managed cloud services often share the same fundamentals as their Open Source alternatives. However, there is always something different. Today I'll focus on these differences between the Amazon Kinesis service and the Apache Kafka ecosystem.

February 23, 2023 • General data engineering

Backpressure in the data systems

Having a scalable architecture is a must nowadays, but sometimes it may not be enough to provide consistent performance. Business requirements, such as consistent delivery time or ordered delivery, can add overhead that scalability alone cannot absorb. Fortunately, there are other mechanisms, like backpressure, that can help.
