Apache Beam articles

on waitingforcode.com

Late data in Apache Beam

Data, especially in streaming applications, can very often arrive on late to the processing pipeline. Despite of that, Apache Beam is able to handle this case pretty easily thanks to watermark mechanism. Continue Reading →

Windows in Apache Beam

As mentioned in one of the first posts about Apache Beam, the concept of window is a key element in its data processing logic. Even for bounded data a default window called global is defined. For the unbounded one the variety of windows is much bigger. Continue Reading →

Coders in Apache Beam

Since in distributed computing the data moves either locally (within single worker) or remotely (between several different workers), it must have a format understandable by the machine. And this format is guaranteed by the operation of serialization, also present in Apache Beam. Continue Reading →

Data partitioning in Apache Beam

The power of Big Data processing platforms resides mainly in the ability to parallelize processing on different nodes. Each framework has its own unit of parallelism. In Spark it's called partition. Apache Beam calls it bundle. Continue Reading →