Articles about Structured Streaming file sink on waitingforcode.com

July 12, 2020 • Apache Spark Structured Streaming

File sink and Out-Of-Memory risk

A few weeks ago I wrote three posts about the file sink in Structured Streaming. At that time I wasn't aware of one potential issue, namely an Out-Of-Memory problem that will happen at some point.


June 14, 2020 • Apache Spark Structured Streaming

Structured Streaming file sink and reprocessing

In my previous posts I presented how to use a file sink in Structured Streaming. I focused there on its internal execution and its use in the context of data reprocessing. In this post I will address a few of the previously described points.


June 6, 2020 • Apache Spark Structured Streaming

File sink and manifest compaction

In my previous post I introduced the file sink in Apache Spark Structured Streaming. Today it's time to focus on an important concept of this output format: the manifest file lifecycle.


May 30, 2020 • Apache Spark Structured Streaming

File sink in Apache Spark Structured Streaming

One of the homework tasks of my Become a Data Engineer course is about synchronizing streaming data with file system storage. When I was trying to implement this part, I found a manifest-based file stream that I will explore in this and the next blog posts.

