Apache Spark Structured Streaming articles

Analyzing Structured Streaming Kafka integration - Kafka source

Spark 2.2.0 brought the change of structured streaming state. Between 2.0 and 2.2.0 it was marked as "alpha". But the last version changed this status to General Availability. It's so a good moment to start to play with this new feature - even if some basics have already been covered in the post about structured streaming. This time we'll go deeper and analyze the integration with Apache Kafka that will be helpful to

Continue Reading →

Structured streaming

Project Tungsten, explained in one of previous posts, brought a lot of optimizations - especially in terms of memory use. Until now it was essentially used by Spark SQL and Spark MLib projects. However, since 2.0.0, some work was done to integrate DataFrame/Dataset in streaming processing (Spark Streaming).

Continue Reading →