Apache Spark Structured Streaming joins articles

Stream-to-stream joins internals

In 3 recent posts about Apache Spark Structured Streaming we explored streaming joins: inner joins, outer joins and state management strategies. Looking at what happens under the hood of all these operations is a good way to sum up the series.

Continue Reading →

Stream-to-stream state management

Over the last few weeks we've covered 2 stream-to-stream join types in Apache Spark Structured Streaming. As explained in those posts, state management logic may sometimes be omitted (for inner joins), but it is generally advised in order to reduce memory pressure. Apache Spark offers 3 different state management strategies, which are detailed in the following sections.

Continue Reading →

Outer joins in Apache Spark Structured Streaming

Previously we explored inner stream-to-stream joins in Apache Spark, but they aren't the only supported type. Outer joins are another; they let us combine streams even when some rows have no match on the other side.

Continue Reading →

Inner joins between streams in Apache Spark Structured Streaming

Apache Kafka Streams supports joins between streams, and the community expected the same for Apache Spark. The feature was implemented and released in the recent 2.3.0 version and, a few months later, it's a good moment to talk a little about it.

Continue Reading →