Data enrichment is a crucial step in making data more usable for business users. Doing it in a batch job is relatively easy thanks to the static nature of the dataset. When it comes to streaming, the task is more challenging.
I'm writing this unexpected blog post because I got stuck with watermarks and checkpoints and felt I was missing some basics. Even though this introduction sounds a bit negative, exploring the data reading part led me to other discoveries.
Previously you could read about the transformation of a user job definition into an executable stream graph. Since that explanation was relatively high-level, I decided to take a deep dive into the final step: executing the code.
Have you written your first successful Apache Flink job and are you still wondering how the high-level API translates into the executable details? I did, and decided to answer the question in this new blog post.
I won't hide it: I'm still new to the Apache Flink world, and despite my past streaming experience with Apache Spark Structured Streaming and GCP Dataflow, I have a lot to learn. And to learn a new tool or concept, there is nothing better than watching some conference talks!
Surprised? You shouldn't be. I've always been eager to learn, including 5 years ago when, for the first time, I left my Apache Spark comfort zone to explore Apache Beam. Since then I've had the chance to write some Dataflow streaming pipelines, fully appreciate this technology, and work on AWS, GCP, and Azure. But there is an excitement in learning from scratch that I miss.