Schema management in cloud streaming services

When I say "schema management" and "streaming" together, you'll probably think of the Apache Kafka schema registry. That's a fair association, but cloud streaming services also manage schemas, and in this blog post we'll see how.

In this blog post, I'll check how the streaming services of the 3 major cloud providers (AWS, Azure, GCP) deal with schemas. Initially, I gave this article the catchy title "Schema registry is everywhere", but after a deeper analysis I realized the term is inappropriate for GCP Pub/Sub. The service itself talks about schema management, not about a registry.

Schema management vs schema registry

Let's start by spotting some differences between GCP Pub/Sub, AWS Glue Schema Registry, and Azure Event Hubs Schema Registry. Pub/Sub implements schema management more as a metadata annotation on topics than as a separate metadata management layer because:

Similarities

Despite these differences, which mostly come from Pub/Sub's different perception of the schema, the services share some similarities, such as:

As you can see, the cloud offers 2 ways to work with schemas. The first one, used by Pub/Sub, relies on a static schema associated with the topic. It's the easier option because clients don't need to manage the schema part. That management is required by the second method, which is based on the schema registry concept and implemented by Glue and Event Hubs. It requires a bit more effort on the client side, but in exchange it provides a more complete feature set, including schema versions and compatibility modes that control how a schema can evolve.
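To make the difference between the two models more concrete, here is a minimal Python sketch. It assumes a deliberately simplified schema format in which each field only records whether it has a default value. StaticSchemaTopic, SchemaRegistry, register, and the "BACKWARD" mode string are hypothetical names chosen for illustration; they are not the Pub/Sub, Glue, or Event Hubs APIs, and the real services work with full Avro, Protocol Buffers, or JSON Schema definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical, simplified schema: field name -> True if the field has a default value.
# Real services use Avro, Protocol Buffers or JSON Schema definitions instead.
Schema = Dict[str, bool]


@dataclass
class StaticSchemaTopic:
    """Model 1 (Pub/Sub-like): a single schema attached to the topic as metadata."""
    name: str
    schema: Schema

    def publish(self, message: dict) -> None:
        # The service validates each message against the topic's schema;
        # the producer never deals with schema ids or versions.
        missing = [f for f, has_default in self.schema.items()
                   if not has_default and f not in message]
        if missing:
            raise ValueError(f"message rejected, missing fields: {missing}")


@dataclass
class SchemaRegistry:
    """Model 2 (Glue/Event Hubs-like): versioned schemas guarded by a compatibility mode.

    Only two hypothetical modes are modelled here: "BACKWARD" and "NONE".
    """
    compatibility: str = "BACKWARD"
    versions: Dict[str, List[Schema]] = field(default_factory=dict)

    def register(self, subject: str, schema: Schema) -> int:
        history = self.versions.setdefault(subject, [])
        if history and self.compatibility == "BACKWARD":
            previous = history[-1]
            # BACKWARD: a consumer using the new schema must be able to read data
            # written with the previous one, so every added field needs a default.
            added_without_default = [f for f, has_default in schema.items()
                                     if f not in previous and not has_default]
            if added_without_default:
                raise ValueError(
                    f"incompatible change, new fields need defaults: {added_without_default}")
        history.append(schema)
        # 1-based version number that a producer would reference with each record.
        return len(history)
```

For example, registering {"id": False, "amount": False} and then {"id": False, "amount": False, "currency": True} under the same subject returns versions 1 and 2, whereas a later version adding a required country field is rejected. The static topic has no such history: changing its schema means changing the topic metadata itself. The sketch checks a single compatibility rule; the real registries support more modes and evaluate them against complete schema definitions.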