Apache Spark Streaming articles

Graceful shutdown explained

Spark has different methods to reduce data loss, including during stream processing. It offers the well-known checkpointing, but also a less obvious operation invoked when processing stops - graceful shutdown.

Continue Reading β†’
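As a minimal sketch of the idea, the snippet below registers a JVM shutdown hook that stops a `StreamingContext` gracefully, so in-flight batches finish before the context is torn down (the socket source, host, and port are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object GracefulShutdownSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("graceful-shutdown").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.socketTextStream("localhost", 9999).print()
    ssc.start()
    // On JVM shutdown (e.g. Ctrl+C), stop gracefully: stop receiving new data
    // but let all already received batches be processed first.
    sys.addShutdownHook {
      ssc.stop(stopSparkContext = true, stopGracefully = true)
    }
    ssc.awaitTermination()
  }
}
```

Alternatively, the `spark.streaming.stopGracefullyOnShutdown` configuration entry can be set to `true` to get the same behavior without an explicit hook.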

Stateful transformations with mapWithState

The updateStateByKey function, explained in the post about stateful transformations in Spark Streaming, is not the only solution Spark Streaming provides for dealing with state. Another, much more optimized one, is mapWithState.

Continue Reading β†’
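A minimal sketch of mapWithState, assuming `pairs` is an existing `DStream[(String, Int)]` and a checkpoint directory has already been set on the context (mapWithState requires checkpointing):

```scala
import org.apache.spark.streaming.{State, StateSpec}

// Mapping function: receives the key, the new value for this batch (if any),
// and the mutable per-key state; here it maintains a running sum per key.
val spec = StateSpec.function(
  (key: String, value: Option[Int], state: State[Int]) => {
    val sum = value.getOrElse(0) + state.getOption.getOrElse(0)
    state.update(sum)
    (key, sum)
  })

// Unlike updateStateByKey, only keys with new data in the batch are touched.
val runningSums = pairs.mapWithState(spec)
runningSums.print()
```

The performance gain comes from this incremental behavior: updateStateByKey re-evaluates every tracked key on every batch, while mapWithState visits only the keys present in the current batch.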

Metadata checkpoint

One of the previous posts discussed checkpoint types in Spark Streaming. This one focuses on one of them - the metadata checkpoint.

Continue Reading β†’
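Enabling metadata checkpointing only takes a directory, as in this sketch (the path is an assumption; in production it should point to a fault-tolerant filesystem such as HDFS):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("metadata-checkpoint").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))
// The directory stores metadata: the configuration, the DStream operation
// graph, and batches that were queued but not yet processed. It is what
// allows the driver to be restarted after a failure.
ssc.checkpoint("/tmp/metadata-checkpoint")
```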

SparkException: org.apache.spark.streaming.dstream.MappedDStream@7a388990 has not been initialized

The metadata checkpoint is useful for quickly restoring failed jobs. However, it won't work if the context creation and processing parts aren't declared correctly.

Continue Reading β†’
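The usual fix for this exception is to declare the whole DStream pipeline inside the factory function passed to `StreamingContext.getOrCreate`, as in this sketch (checkpoint path, source, and transformation are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "/tmp/metadata-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("recoverable-job").setMaster("local[2]")
  val ssc = new StreamingContext(conf, Seconds(5))
  ssc.checkpoint(checkpointDir)
  // All DStream creation and transformations must live HERE, inside the
  // factory. Declaring them after getOrCreate returns a recovered context
  // produces the "has not been initialized" SparkException.
  ssc.socketTextStream("localhost", 9999).map(_.length).print()
  ssc
}

// Recovers from the checkpoint if it exists, otherwise calls the factory.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```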

Window-based transformations in Spark Streaming

Compared to batch-oriented processing in Spark, Spark Streaming introduces new transformation types based on time periods.

Continue Reading β†’
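A short sketch of a windowed transformation, assuming `pairs` is an existing `DStream[(String, Int)]` on a context with a 10-second batch interval:

```scala
import org.apache.spark.streaming.Seconds

// Sums values per key over the last 30 seconds of data, recomputed every
// 10 seconds. Both durations must be multiples of the batch interval.
val windowedCounts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, // reduce function applied within the window
  Seconds(30),               // window duration
  Seconds(10)                // slide duration
)
windowedCounts.print()
```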

Stateful transformations in Spark Streaming

Spark Streaming is able to handle state-based operations, i.e. operations whose state can be modified by subsequent batches of data.

Continue Reading β†’
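A minimal updateStateByKey sketch, assuming `pairs` is an existing `DStream[(String, Int)]` and checkpointing is enabled on the context (stateful transformations require it):

```scala
// Called once per key per batch: newValues holds the batch's values for the
// key, runningCount the previous state. Returning None would drop the key.
val updateFunc = (newValues: Seq[Int], runningCount: Option[Int]) =>
  Some(newValues.sum + runningCount.getOrElse(0))

val runningCounts = pairs.updateStateByKey[Int](updateFunc)
runningCounts.print()
```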

Spark Streaming checkpointing and Write Ahead Logs

Checkpoints allow Spark to truncate dependencies on previously computed RDDs. In the case of stream processing, their role is extended. In addition, they're not the only method of protecting against failures.

Continue Reading β†’
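The other method alluded to, Write Ahead Logs, is turned on through configuration, as in this sketch (the checkpoint path is an assumption; WAL files are written under the checkpoint directory, so it should live on a fault-tolerant filesystem):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-enabled")
  .setMaster("local[2]")
  // Received data is synchronously written to the write-ahead log before
  // being acknowledged, so it can be replayed after a driver failure.
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(5))
ssc.checkpoint("/tmp/wal-checkpoint")
```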

Spark Streaming configuration

Even though Spark Streaming globally uses the same configuration as batch processing, some entries are specific to streaming.

Continue Reading β†’
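A few of those streaming-specific entries, shown as a SparkConf sketch (the chosen values are illustrative, not recommendations):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("streaming-config")
  // Dynamically adapts ingestion rate to the current processing speed.
  .set("spark.streaming.backpressure.enabled", "true")
  // Interval at which received data is chunked into blocks (hence tasks).
  .set("spark.streaming.blockInterval", "200ms")
  // Upper bound on records per second per receiver.
  .set("spark.streaming.receiver.maxRate", "10000")
```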

Receivers in Spark Streaming

Standard data sources, such as files, queues or sockets, are natively implemented in the Spark Streaming context. But the framework also allows the creation of more flexible data consumers, called receivers.

Continue Reading β†’
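A custom receiver extends the `Receiver` class and pushes data with `store`; this sketch emits random words from a background thread (the word list and sleep interval are arbitrary):

```scala
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class RandomWordReceiver extends Receiver[String](StorageLevel.MEMORY_ONLY) {

  override def onStart(): Unit = {
    // Receiving must not block onStart, so do it in a separate thread.
    new Thread("random-word-receiver") {
      override def run(): Unit = {
        val words = Seq("spark", "streaming", "receiver")
        while (!isStopped()) {
          store(words(scala.util.Random.nextInt(words.size)))
          Thread.sleep(100)
        }
      }
    }.start()
  }

  // Nothing to clean up: the thread checks isStopped() and exits on its own.
  override def onStop(): Unit = {}
}

// Usage: val stream = ssc.receiverStream(new RandomWordReceiver)
```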

DStream transformations

Spark Streaming is not static and allows DStreams to be converted into new ones. Exactly as in batch-oriented processing, this is done through transformations.

Continue Reading β†’
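A classic word-count sketch chaining several DStream transformations, assuming `ssc` is an existing `StreamingContext` (host and port are illustrative):

```scala
val lines = ssc.socketTextStream("localhost", 9999)

val wordCounts = lines
  .flatMap(_.split(" "))     // DStream[String]: one element per word
  .map(word => (word, 1))    // DStream[(String, Int)]: pair per word
  .reduceByKey(_ + _)        // DStream[(String, Int)]: counts per batch

wordCounts.print()
```

Each transformation returns a new DStream, mirroring the RDD API used in batch jobs.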

DStream in Spark Streaming

In batch-oriented Spark, the RDD is the data abstraction. In Spark Streaming RDDs are still present, but the programmer is exposed to another data type - the DStream.

Continue Reading β†’
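The relationship between the two abstractions can be seen with `foreachRDD`, which exposes the underlying RDD of each batch; this sketch assumes `ssc` is an existing `StreamingContext`:

```scala
val lines = ssc.socketTextStream("localhost", 9999)

// A DStream is a sequence of RDDs, one per batch interval. foreachRDD gives
// access to each of them together with the batch time.
lines.foreachRDD { (rdd, time) =>
  println(s"Batch at $time contains ${rdd.count()} records")
}
```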

Introduction to Spark Streaming

Spark Streaming is a powerful extension of Spark which helps work with data streams efficiently. In this article we'll present the basic concepts of this extension.

Continue Reading β†’