In one of the previous articles we discovered how to implement the publish/subscribe pattern in Kafka, i.e. one consumer per consumer group. This time we'll describe another aspect of Kafka consumption: the message queue.
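The distinction between the two consumption modes boils down to the `group.id` consumer property: a unique group per consumer yields publish/subscribe, while a shared group yields a queue. A minimal sketch, assuming a local broker address and illustrative group names (both are assumptions, not values from the articles):

```java
import java.util.Properties;

public class ConsumerGroupConfigs {

    // Builds a minimal Kafka consumer configuration; the broker address
    // and the groupId values passed by callers are illustrative assumptions.
    static Properties consumerConfig(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("group.id", groupId);
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        // Publish/subscribe: each consumer uses its own group,
        // so every consumer receives all messages of the topic.
        Properties subscriberA = consumerConfig("subscriber-a");
        Properties subscriberB = consumerConfig("subscriber-b");

        // Message queue: consumers share one group, so the topic's
        // partitions are split among them and each message is
        // processed by only one member of the group.
        Properties worker1 = consumerConfig("workers");
        Properties worker2 = consumerConfig("workers");

        System.out.println(subscriberA.getProperty("group.id")
            .equals(subscriberB.getProperty("group.id"))); // false: pub/sub
        System.out.println(worker1.getProperty("group.id")
            .equals(worker2.getProperty("group.id")));     // true: queue
    }
}
```

Each `Properties` object would then be passed to a `KafkaConsumer` constructor; only the group assignment differs between the two patterns.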
When you need to write about messaging, you'll inevitably face the dilemma of which part to describe first: consuming or producing. I decided to start with simple consumers. By 'simple' I mean only one consumer per group, which lets us avoid the topic of partition sharing for now.
As mentioned in the quick introduction, Apache ZooKeeper is an inseparable part of Apache Kafka. Knowing what happens between these two actors is important to start working with Kafka correctly.
Recently we discovered some theoretical concepts behind Apache Kafka, so it's a good moment to explore its Java API.
This article starts a short series of posts about Apache Kafka, often considered one of the solutions for data ingestion.