Global and local Apache ZooKeeper in Apache Pulsar - part 2 on waitingforcode.com

Versions: Apache Pulsar 2.5.0

In my last post about Apache pulsar, I introduced global and local ZooKeepers. In this one, which is the follow-up, I'll check what both of them contain.

Data Engineering Design Patterns

Looking for a book that defines and solves most common data engineering problems? I wrote one on that topic! You can read it online on the O'Reilly platform, or get a print copy on Amazon.

I also help solve your data engineering problems 👉 contact@waitingforcode.com 📩

In the first section I will analyze the content of my ZooKeeper after executing different administration requests. In the second part, I will analyze how Apache Pulsar interacts with its ZooKeeper clusters.

ZooKeeper content

Apache ZooKeeper stores the information about different things, like ledgers, bookies, namespaces or even schemas. Let's analyze every zNode more in detail. But before, I will start my Pulsar docker-compose and execute the following code (with Java SDK, even though can also be done in CLI but I want to learn the SDK before 🤓 ):

  val admin = PulsarAdmin.builder()
    .serviceHttpUrl("http://localhost:8080")
    .build()

  val nonPartitionedTopicName = "wfc-non-partitioned-topic-1"

  admin.topics().createNonPartitionedTopic(nonPartitionedTopicName)
  admin.topics().createSubscription(nonPartitionedTopicName, s"${nonPartitionedTopicName}-subscription",
    MessageId.earliest)

  val partitionedTopicName = "wfc-partitioned-topic-1"
  admin.topics().createPartitionedTopic(partitionedTopicName, 5)
  admin.topics().createSubscription(partitionedTopicName, s"${partitionedTopicName}-subscription",
    MessageId.earliest)


  val adminRoles = Sets.newHashSet("role1", "role2")

  val tenantInfo = new TenantInfo(adminRoles, Sets.newHashSet(admin.clusters().getClusters))
  admin.tenants().createTenant("wfc-tenant", tenantInfo)

  admin.namespaces().createNamespace("wfc-tenant/wfc-namespace")

Let's see now what zNodes exist after executing this code:

/counters/producer-name - that's the first zNode that caught my attention. Its path is quite self-explanatory but why should we even need to store a counter for the number of created producers? According to code, this zNode is used in DistributedIdGenerator to create a unique producer name if this name is not defined when the producer is created.
However, the use of DistributedIdGenerator isn't only limited to the producers. According to the documentation, it lives on every node and could be used, alongside the application prefix and local counter, to create globally unique ids. But I didn't find any use of that in the code.
Before I move to the next point, it's also worth explaining where the producer name it's used. It's used to control the backpressure (quotas) and also to know what producers are currently connected to a particular topic.
/admin - in this part you will find all admin-related configuration. The first of them are policies, stored under /admin/policies. In my snippet I created a new tenant called wfc-tenant, which resulted in the creation of /admin/policies/wfc-tenant policy where you will find the assigned roles and clusters:
```
{"adminRoles":["role1","role2"],"allowedClusters":["local_pulsar"]}
```
To my surprise, I also found my wfc-namespace in the tenant's path /admin/policies/wfc-tenant/wfc-namespace. But it's normal because Pulsar is a multi-tenant system where the tenant is the basic organization unit. Later on you will find namespaces and just after, the topics. You can discover that by analyzing another zNode, /admin/partitioned-topics where was created by wfc-partitioned-topic-1. More exactly, the full path is /admin/partitioned-topics/public/default/persistent/wfc-partitioned-topic-1 and the zNode stores the number of partitions as a JSON document ( {"partitions":5}).

Still about the topics, this admin part also seems to contain something related to the transaction coordination. I found it in the /admin/partitioned-topics/pulsar/system/persistent/transaction_coordinator_assign entry but didn't understand (at this moment!) how it works.

The last intriguing point I found in this part is about local policies. The /admin/local-policies/public/default stores a list of bundles. A bundle is a unit composing every namespace. Every topic of the namespace is later assigned to one bundle. ZooKeeper stores only the boundaries of the bundles:
```
{"bundles":{"boundaries":["0x00000000","0x10000000","0x20000000","0x30000000","0x40000000","0x50000000","0x60000000","0x70000000","0x80000000","0x90000000","0xa0000000","0xb0000000","0xc0000000","0xd0000000","0xe0000000","0xf0000000","0xffffffff"],"numBundles":16}}
```
The bundles for tenant's namespaces are located within the global policies (/admin/policies/wfc-tenant/wfc-namespace). From that, you can see the difference between both. Local policies are initialized only when the local ZooKeeper starts, so at the cluster level (LocalZooKeeperCacheService).
/schemas - the default schema registry for Pulsar users ZooKeeper and BookKeeper. The former one indicates the location of the schemas for every topic whereas the latter stores the definition of the schemas.
/namespace - I'm not sure about that supposition but it looks like all namespaces with their bundles are created here. In my case I can see here the default namespace entries like /namespace/public/default or /namespace/public/default/0x00000000_0x10000000
/bookies - under this zNode you will find all information managed by BookKeeper.
/ledgers - this zNode is required to manage BookKeeper's ledgers. A ledger is BookKeeper is one component of the replicated log and under /ledgers zNode ZooKeeper stores all metadata about them.
In this part you can retrieve the available ledgers, like for instance /ledgers/available/172.21.0.5:3181, or read-only ledgers (/ledgers/available/readonly) because a ledger can be closed and pass after that to read-only mode.

Aside from this high-level information, you will also retrieve here low-level parameters, like the partitions or cursors metadata.

/managed-ledgers - in simple terms, a managed ledger is an abstraction on top of BooKeeper ledgers to provide a single layer for topic storage. That's why if you look at the zNodes, you will find the information about the partitioned and not partitioned topics:

/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-0
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-0/wfc-partitioned-topic-1-subscription
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-1
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-1/wfc-partitioned-topic-1-subscription
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-2
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-2/wfc-partitioned-topic-1-subscription
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-3
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-3/wfc-partitioned-topic-1-subscription
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-4
/managed-ledgers/public/default/persistent/wfc-partitioned-topic-1-partition-4/wfc-partitioned-topic-1-subscription

/managed-ledgers/public/default/persistent/wfc-non-partitioned-topic-1
/managed-ledgers/public/default/persistent/wfc-non-partitioned-topic-1/wfc-non-partitioned-topic-1-subscription

Inside the subscription zNodes you will find the information about current cursors, so Apache Kafka's offset consumed by a given consumer (in Pulsar it's abstracted with a subscription).

/loadbalance - under this zNode you will find different parameters used for load balancing, like resource quota and brokers parameters. In the latter category you will find the information about the current load of every broker, like here for a localhost:8080 broker:

{"webServiceUrl":"http://localhost:8080","pulsarServiceUrl":"pulsar://localhost:6650","persistentTopicsEnabled":true,"nonPersistentTopicsEnabled":true,"cpu":{"usage":0.0,"limit":400.0},"memory":{"usage":370.4627685546875,"limit":2048.0},"directMemory":{"usage":0.0,"limit":4096.0},"bandwidthIn":{"usage":0.0,"limit":1.024E7},"bandwidthOut":{"usage":0.0,"limit":1.024E7},"msgThroughputIn":0.0,"msgThroughputOut":0.0,"msgRateIn":0.0,"msgRateOut":0.0,"lastUpdate":1588931505237,"lastStats":{},"numTopics":0,"numBundles":0,"numConsumers":0,"numProducers":0,"bundles":[],"lastBundleGains":[],"lastBundleLosses":[],"brokerVersionString":"2.5.0","protocols":{},"bundleStats":{},"maxResourceUsage":0.1808900237083435,"loadReportType":"LocalBrokerData"}

Aside from that, under /loadbalance/leader you will retrieve the parameters of the current leader broker.

To retrieve the zNodes stores in the global ZooKeeper (configuration store), we can analyze the PulsarClusterMetadataSetup class. You can see there that the zNodes like /admin, /admin/policies or /admin/clusters are created on the global ZooKeeper. Other ones, so /namespace, /managed-ledgers, /ledgers or /bookies, so globally everything scoped to one Apache Pulsar cluster, are on the local ZooKeeper instance.

ZooKeeper management

As you saw, I executed some operations from the SDK and my ZooKeeper was updated by Apache Pulsar. The SDK, and more generally the web service, is the first way to interact with ZooKeeper server from Pulsar. Under-the-hood, the SDK's I used in the post sends the request to the API endpoint. This point is also valid for the read operations.

Another place creating zNodes are bootstrap classes that you call to initialize cluster metadata (initialize-cluster-metadata). By the cluster metadata I mean all information that is required by Apache Pulsar to be available for serving client requests, like the ones from my code snippet. The list of created zNodes is defined in org.apache.pulsar.PulsarClusterMetadataSetup class.

Regarding the reading part, how Apache Pulsar is aware of the changes made on ZooKeeper? Thanks to a ZooKeeper feature called watchers, Pulsar is notified whenever a zNode is modified. When it happens, Pulsar updates the entry in its in-memory ZooKeeper cache service. From that, you can deduce also that from time to time, client's requests won't reach the ZooKeeper server and very often in the code, you can see the calls involving only the zNode's cache, like here:

// NamespaceService
pulsar.getLocalZkCache().getDataAsync(path, ...)
Set activeNativeBrokers = pulsar.getLocalZkCache().getChildren(LoadManager.LOADBALANCE_BROKERS_ROOT)

// PulsarService
if (!this.globalZkCache.exists(
                    AdminResource.path(POLICIES) + "/" + nsName)) {
                LOG.info("SLA Namespace = {} doesn't exist.", nsName);
                return;
            }

// SimpleLoadManagerImpl
           if (pulsar.getLocalZkCache().exists(zkPath)) {
                pulsar.getZkClient().setData(zkPath, settingBytes, -1);

Even though Apache ZooKeeper is another extra layer in Apache Pulsar architecture, you shouldn't be scared. As you can see in this and previous article, it's mostly about storing metadata that is used by other components of the system, like subscription, load balancing, or namespaces. Maybe in the future, I will go back to this ZooKeeper topic but before, I need to learn more about these components.

Consulting

With nearly 16 years of experience, including 8 as data engineer, I offer expert consulting to design and optimize scalable data solutions. As an O’Reilly author, Data+AI Summit speaker, and blogger, I bring cutting-edge insights to modernize infrastructure, build robust pipelines, and drive data-driven decision-making. Let's transform your data challenges into opportunities—reach out to elevate your data engineering game today!

👉 contact@waitingforcode.com
🔗 past projects

TAGS: #ZooKeeper and Pulsar