Spark offers different methods to reduce data loss, including during stream processing. It provides the well-known checkpointing, but also a less obvious operation invoked when processing stops - graceful shutdown.
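As a minimal sketch of a graceful shutdown, the snippet below stops a StreamingContext only after the data already received has been processed. The socket source on localhost:9999 and the timeout value are arbitrary assumptions for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object GracefulShutdownExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("graceful-shutdown-example")
      .setMaster("local[2]")
      // also stop gracefully when the JVM receives a shutdown signal
      .set("spark.streaming.stopGracefullyOnShutdown", "true")

    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.socketTextStream("localhost", 9999).count().print()

    ssc.start()
    // wait at most 60 seconds, then stop; stopGracefully = true asks Spark
    // to finish processing the already received data before shutting down
    ssc.awaitTerminationOrTimeout(60000)
    ssc.stop(stopSparkContext = true, stopGracefully = true)
  }
}
```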
The updateStateByKey function, explained in the post about stateful transformations in Spark Streaming, is not the only solution provided by Spark Streaming to deal with state. Another one, much more optimized, is mapWithState.
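A minimal mapWithState sketch, assuming a socket source on localhost:9999 and a local checkpoint directory, both placeholders: the StateSpec function keeps a running count per word and only touches the keys present in the current batch.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("map-with-state-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // mapWithState needs checkpointing to persist the state between batches
    ssc.checkpoint("/tmp/map-with-state-checkpoint")

    val pairs = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map(word => (word, 1))

    // the mapping function receives the key, the new value (if any) and the
    // state accumulated so far; it returns the updated (word, count) pair
    val countState = StateSpec.function(
      (word: String, one: Option[Int], state: State[Int]) => {
        val newCount = state.getOption().getOrElse(0) + one.getOrElse(0)
        state.update(newCount)
        (word, newCount)
      })

    pairs.mapWithState(countState).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```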
One of the previous posts talked about checkpoint types in Spark Streaming. This one focuses on one of them - the metadata checkpoint.
The metadata checkpoint is useful for quickly restoring failing jobs. However, it won't work if the context creation and processing parts aren't declared correctly.
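The sketch below shows the declaration pattern that makes a restore from the metadata checkpoint possible: all DStream definitions live inside a creation function passed to StreamingContext.getOrCreate. The checkpoint directory and socket source are hypothetical values used only for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MetadataCheckpointExample {
  // hypothetical checkpoint location
  val checkpointDirectory = "/tmp/metadata-checkpoint"

  // the whole processing pipeline must be declared inside the creation
  // function so it can be rebuilt from the metadata checkpoint after a failure
  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("metadata-checkpoint-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint(checkpointDirectory)

    ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // restores the context from the checkpoint if it exists, otherwise creates it
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```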
Compared to batch-oriented processing in Spark, Spark Streaming introduces new transformation types based on time periods.
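A minimal sketch of such a time-based (window) transformation, assuming a socket source on localhost:9999: words are counted over the last 30 seconds of data, recomputed every 10 seconds.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WindowTransformationExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("window-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    // window of 30 seconds, sliding every 10 seconds (both multiples of the batch interval)
    words.map((_, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```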
Spark Streaming is able to handle state-based operations, i.e. operations whose state can be modified by subsequent batches of data.
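A minimal sketch of such a stateful operation with updateStateByKey, assuming a placeholder checkpoint directory and socket source: the update function merges each batch's values with the count accumulated from all previous batches.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object UpdateStateByKeyExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("update-state-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // stateful operations require a checkpoint directory
    ssc.checkpoint("/tmp/update-state-checkpoint")

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    // merges the values of the current batch with the state from previous batches
    val updateCount: (Seq[Int], Option[Int]) => Option[Int] =
      (newValues, runningCount) => Some(newValues.sum + runningCount.getOrElse(0))

    words.map((_, 1)).updateStateByKey(updateCount).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```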
Checkpoints allow Spark to truncate dependencies on previously computed RDDs. In the case of stream processing, their role is extended. In addition, they're not the only method of protecting against failures.
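The sketch below illustrates both points: the DStream is checkpointed at a fixed interval to cut the RDD lineage, and the receiver write-ahead log is enabled as a complementary protection. The directory, source and interval values are assumptions for the example.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointIntervalExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("checkpoint-example")
      .setMaster("local[2]")
      // complementary protection: persist received data in a write-ahead log
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/stream-checkpoint")

    val counts = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    // checkpointing the DStream every 25 seconds truncates the RDD lineage,
    // so recovery doesn't have to recompute the whole dependency chain
    counts.checkpoint(Seconds(25))
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```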
Even if Spark Streaming globally uses the same configuration as batch processing, some entries are specific to streaming.
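As a small sketch, a few of those streaming-specific entries set on a SparkConf; the chosen values are arbitrary examples, not recommendations.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingConfigurationExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("streaming-configuration-example")
      .setMaster("local[2]")
      // dynamically adapts the ingestion rate to the processing speed
      .set("spark.streaming.backpressure.enabled", "true")
      // how often received data is grouped into blocks before becoming RDD partitions
      .set("spark.streaming.blockInterval", "200ms")
      // caps the number of records read per second by each receiver
      .set("spark.streaming.receiver.maxRate", "1000")

    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.socketTextStream("localhost", 9999).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```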
Standard data sources, such as files, queues or sockets, are natively implemented in the Spark Streaming context. But the framework also allows the creation of more flexible data consumers, called receivers.
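A minimal custom receiver sketch: the ConstantMessageReceiver class is purely hypothetical and simply emits the same message every second, but it shows the onStart/onStop/store contract of the Receiver API.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.receiver.Receiver

// hypothetical receiver emitting a constant message every second
class ConstantMessageReceiver(message: String)
  extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  override def onStart(): Unit = {
    // onStart must not block, so the data is produced in a separate thread
    new Thread("constant-message-receiver") {
      override def run(): Unit = {
        while (!isStopped()) {
          store(message)
          Thread.sleep(1000)
        }
      }
    }.start()
  }

  override def onStop(): Unit = {
    // nothing to clean up: the thread exits once isStopped() returns true
  }
}

object CustomReceiverExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("custom-receiver-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    ssc.receiverStream(new ConstantMessageReceiver("hello")).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```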
Spark Streaming is not static and allows DStreams to be converted to new types. This can be done, exactly as in batch-oriented processing, through transformations.
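A short sketch of such transformations, again assuming a socket source on localhost:9999: a DStream[String] is turned into per-batch word counts, and transform() exposes the underlying RDD of each batch directly.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamTransformationsExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("transformations-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("localhost", 9999)

    // RDD-like transformations convert the DStream[String]
    // into a DStream[(String, Int)] of word counts per batch
    val counts = lines
      .flatMap(_.split(" "))
      .filter(_.nonEmpty)
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // transform() gives access to the RDD backing each micro-batch
    counts.transform(rdd => rdd.sortBy(_._2, ascending = false)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```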
In batch-oriented Spark, the RDD was the data abstraction. In Spark Streaming, RDDs are still present, but another data type is exposed to the programmer - the DStream.
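The sketch below illustrates that relationship with a queue-backed DStream, convenient for local experiments: each queued RDD becomes the micro-batch of one interval, and foreachRDD gives access to those RDDs one by one. The queued values are arbitrary sample data.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DStreamExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("dstream-example").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    val sc = ssc.sparkContext

    // each RDD in the queue is consumed as the micro-batch of one interval
    val rddQueue = mutable.Queue[RDD[Int]](sc.parallelize(1 to 5), sc.parallelize(6 to 10))
    val numbers = ssc.queueStream(rddQueue)

    // under the hood a DStream is a sequence of RDDs, one per batch interval
    numbers.foreachRDD { rdd =>
      println(s"batch sum = ${rdd.sum()}")
    }

    ssc.start()
    ssc.awaitTerminationOrTimeout(5000)
    ssc.stop()
  }
}
```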
Spark Streaming is a powerful extension of Spark which helps to work with streams efficiently. In this article we'll present the basic concepts of this extension.