Complex Event Processing on the cloud

When I first came across the term Complex Event Processing (CEP), I was scared. Event stream processing was already complex enough, so why add this extra "complex" layer on top? It turns out that the complexity is real, but in this post I will focus on a different question: which cloud services support CEP?

Before covering the cloud aspect, let me focus on CEP itself. The first thing to clarify is the meaning of the word "complex". I only understood it after reading an excellent, albeit quite old, post about CEP on the Octo blog.

In a nutshell, CEP is called "complex" because:

The goal of a CEP pipeline is to generate actionable items for the problem it solves. The output will often go to a streaming broker because of its real-time semantics. However, nothing prevents you from also writing these results to data-at-rest storage for ad-hoc analysis. Anyway, I hope the "complex" part is clearer now and we can move on to the cloud architectures.
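To make "actionable items" more concrete, here is a minimal sketch in plain Python. It is not the API of Flink or of any cloud service, only an illustration of the idea: a pattern over several low-level events (three login failures for the same user within 60 seconds) produces one higher-level alert. The `Event` and `Alert` types, the `detect_brute_force` function, and the thresholds are hypothetical, and the sketch assumes events arrive ordered by timestamp.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    user: str
    kind: str  # hypothetical event types: "login_failed" or "login_ok"
    ts: int    # event time in seconds; events are assumed to arrive in ts order


@dataclass(frozen=True)
class Alert:
    user: str
    first_ts: int
    last_ts: int


def detect_brute_force(events: Iterable[Event], failures: int = 3,
                       within: int = 60) -> Iterator[Alert]:
    """Yield an Alert when a user accumulates `failures` consecutive login
    failures whose first and last events are at most `within` seconds apart."""
    streaks: dict = {}  # user -> deque of timestamps of the current failure streak
    for event in events:
        if event.kind != "login_failed":
            # Any other event from this user breaks their streak.
            streaks.pop(event.user, None)
            continue
        streak = streaks.setdefault(event.user, deque())
        streak.append(event.ts)
        # Drop failures that fell out of the time window.
        while event.ts - streak[0] > within:
            streak.popleft()
        if len(streak) == failures:
            yield Alert(event.user, streak[0], streak[-1])
            streak.clear()  # start a fresh match after emitting the alert


events = [
    Event("alice", "login_failed", 0),
    Event("alice", "login_failed", 10),
    Event("bob", "login_failed", 12),
    Event("alice", "login_failed", 20),
    Event("bob", "login_ok", 30),
    Event("bob", "login_failed", 40),
]
print(list(detect_brute_force(events)))
# [Alert(user='alice', first_ts=0, last_ts=20)]
```

Six input events collapse into a single alert for `alice`, while `bob`'s streak is broken by his successful login. In a real pipeline, the yielded alerts are what would be sent to the streaming broker or stored for later analysis.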

Cloud services

To design our own architecture, we'll use the following components:

The connectivity of the CEP component is the key element of this design. Here, we want to use a managed cloud service. As you certainly know, managed solutions have their limits and don't magically support every popular data store. That's why this data processing layer will define the input and output data stores. Let's analyze a few candidates:
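The idea that the processing layer owns its connectivity can be sketched as a job that explicitly declares its input and its outputs. The following is a hypothetical, framework-free Python illustration: `Source`, `Sink`, `CepJob`, and the in-memory implementations are made-up names standing in for whatever connectors the chosen service actually provides, and the per-record `detect` callable is a simplification of a real, stateful CEP step.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Protocol


class Source(Protocol):
    def read(self) -> Iterable[dict]: ...


class Sink(Protocol):
    def write(self, record: dict) -> None: ...


@dataclass
class InMemorySource:
    """Hypothetical stand-in for a real input connector."""
    records: List[dict]

    def read(self) -> Iterable[dict]:
        return iter(self.records)


@dataclass
class InMemorySink:
    """Hypothetical stand-in for a streaming broker or a data-at-rest store."""
    written: List[dict] = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.written.append(record)


@dataclass
class CepJob:
    """The processing layer declares where it reads from and where it writes to."""
    source: Source
    sinks: List[Sink]
    detect: Callable[[dict], bool]  # simplified, stateless stand-in for pattern matching

    def run(self) -> int:
        matched = 0
        for record in self.source.read():
            if self.detect(record):
                matched += 1
                # Fan out each match to every declared output.
                for sink in self.sinks:
                    sink.write(record)
        return matched


broker, lake = InMemorySink(), InMemorySink()
job = CepJob(InMemorySource([{"temp": 20}, {"temp": 95}, {"temp": 101}]),
             [broker, lake], lambda r: r["temp"] > 90)
print(job.run(), broker.written)
# 2 [{'temp': 95}, {'temp': 101}]
```

Keeping the source and sinks as explicit parameters makes the connectivity constraint visible: whichever managed service you pick, it must offer a connector for each store you plug in here.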

So, which one to choose? Thanks to its native Complex Event Processing module, its extensibility, and its support across cloud services, Apache Flink seems to be the best option. In reality, though, it depends on the cloud provider and the use case. If you're an Azure user and want to detect some simple patterns, Azure Stream Analytics can be enough. On the other hand, writing some code and deploying a Structured Streaming job can be the only option in other scenarios.