How to manage secrets is probably one of the first problems you will encounter when deploying resources from a CI/CD pipeline. The simple answer is: don't manage them at all! Let the cloud services do it for you.
I'll start the presentation with a service mostly used to manage encryption/decryption keys. All 3 major cloud providers I'm currently focusing on - AWS, Azure and GCP - offer such a service. On AWS and GCP it's called KMS (Key Management Service), whereas on Azure you will find it under the name Key Vault. Despite the different naming, all of them offer similar features:
- integration with other cloud services - secret key stores (the common name I'll use for KMS and Key Vault) seamlessly integrate with other cloud services. A pretty common example of this integration is the provision of encryption keys used to encrypt data stored in object stores, streaming brokers, or databases.
- lifecycle management - you can manage key expiration and automatic rotation.
- versioning - in case of a key rotation, the service preserves the older versions. It's very important if a service used the key to encrypt some data. Without versioning, the key's value would be lost, and with it the possibility of decrypting the data.
- authorization - key stores seamlessly integrate with Identity and Access Management (IAM) services. You then get a kind of 2-factor data authorization mechanism, because to read the data, a user needs access to both the encryption keys and the data itself.
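To make the call shape of such a key store concrete, here is a minimal sketch modeled on the AWS KMS Encrypt/Decrypt API. The in-memory client is only a stand-in so the example runs anywhere; in a real deployment you would pass boto3's `client('kms')` instead, and the key alias is a made-up example:

```python
def encrypt(kms_client, key_id, plaintext):
    # kms_client mirrors the AWS KMS API shape (e.g. boto3.client('kms')).
    return kms_client.encrypt(KeyId=key_id, Plaintext=plaintext)['CiphertextBlob']


def decrypt(kms_client, ciphertext):
    # For symmetric keys, KMS stores the key reference with the ciphertext,
    # so Decrypt does not need the KeyId again.
    return kms_client.decrypt(CiphertextBlob=ciphertext)['Plaintext']


class _FakeKmsClient:
    """In-memory stand-in used only to make the example runnable;
    it does NOT perform real encryption."""
    def __init__(self):
        self._store = {}

    def encrypt(self, KeyId, Plaintext):
        token = f'{KeyId}:{len(self._store)}'.encode()
        self._store[token] = Plaintext
        return {'CiphertextBlob': token}

    def decrypt(self, CiphertextBlob):
        return {'Plaintext': self._store[CiphertextBlob]}


kms = _FakeKmsClient()
ciphertext = encrypt(kms, 'alias/app-data-key', b'customer record')
plaintext = decrypt(kms, ciphertext)
```

Notice that the decrypting side never sees the key material itself - it only needs IAM permission to call Decrypt on that key, which is exactly the 2-factor mechanism described above.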
Besides the secret key stores, useful for encryption purposes, there is a second category of services, this time to manage the secrets themselves. AWS and GCP have dedicated services for that, whereas Azure uses the aforementioned Key Vault. AWS actually has 2 services that can be used to manage secrets. The first of them is Systems Manager, whose Parameter Store capability integrates with AWS KMS to store secrets in an encrypted format. The second service is Secrets Manager. GCP offers a similar service under the name Secret Manager. Unlike the secret key stores, the secret managers handle the real secret data like passwords or access tokens. Although they rely on the secret key stores to encrypt the persisted secrets, they are separate services.
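Reading a secret from such a service is a single API call. The sketch below follows the SSM Parameter Store GetParameter shape; the injected fake client only illustrates the call, and the parameter name is an assumption - with real AWS you would pass boto3's `client('ssm')`:

```python
def fetch_parameter(ssm_client, name):
    """Read a decrypted value from a parameter store.

    ssm_client is expected to expose the AWS SSM GetParameter API shape,
    e.g. boto3.client('ssm') in a real deployment.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response['Parameter']['Value']


class _FakeSsmClient:
    """In-memory stand-in used only to demonstrate the call shape."""
    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption):
        return {'Parameter': {'Value': self._values[Name]}}


# With real AWS: fetch_parameter(boto3.client('ssm'), '/app/db-password')
password = fetch_parameter(_FakeSsmClient({'/app/db-password': 's3cret'}),
                           '/app/db-password')
```

The application only ever stores the parameter name; the decrypted value stays inside the service until read time.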
All of them share a set of best practices:
- least privilege - allow the application to read only the key(s) it needs. For example, if it needs a key to store encrypted data on an object store, there is no need to also grant it permission to the keys used to encrypt the streaming broker's data.
- logging - beware of logging. Reading a secret and writing it to the logs is not a good idea because it's not a secret anymore.
- automatically updated users - if the application stores a reference to the secret rather than the secret value itself, maintenance becomes easier in case of a secret rotation. For example, with the following snippet you can change the database password at any moment, because the consumer reloads the value when it fails with an invalid password error:

```python
try:
    client.connect_to_db(params)
except InvalidPasswordError:
    # The password was rotated; reload it from the secret store using the
    # stored reference (here, a hypothetical 'password_key' entry holding
    # the secret's name), then retry the connection.
    params['password'] = key_store.fetch(params['password_key'])
    client.connect_to_db(params)
```
- automatically revoked users - imagine that the only way to access a data store is a login/password mechanism. For whatever reason, you have to share a single database user's credentials across 3 different applications (not a dream, but it happens, often as a "temporary solution" that ends up in production code :P). You then create 3 separate IAM identities allowed to decrypt the credentials from the store. Now, if you need to revoke access for one of these applications while keeping the others online, you can simply change the secret value and revoke that application's permission to read the secret.
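On AWS, the least-privilege point above could translate into an IAM policy that grants decrypt access to one specific key only. This is a sketch; the account id and key id in the ARN are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:eu-west-1:123456789012:key/11111111-2222-3333-4444-555555555555"
    }
  ]
}
```

An identity holding this policy can decrypt data protected by that key but cannot touch any other key in the account - revoking the policy is enough to cut the application off.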
Data services integration
The 2 families of services presented so far are the most popular ones, but some cloud components provide their own secrets management layer. Databricks has a component called secret scope, where you can write any secret values and access them with the dbutils.secrets.get function. It works with a separate secrets database but can also be connected to an already existing secret key store like Azure Key Vault.
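Inside a notebook, the access boils down to dbutils.secrets.get(scope=..., key=...). The wrapper below makes that call shape runnable outside Databricks by injecting a stand-in for dbutils.secrets; the scope and key names are made-up examples:

```python
def read_databricks_secret(secrets, scope, key):
    # In a Databricks notebook, pass the built-in dbutils.secrets as `secrets`.
    return secrets.get(scope=scope, key=key)


class _FakeSecrets:
    """Stand-in for dbutils.secrets outside a Databricks runtime."""
    def get(self, scope, key):
        return {('jdbc', 'password'): 'p4ss'}[(scope, key)]


password = read_databricks_secret(_FakeSecrets(), 'jdbc', 'password')
```

A nice property of the real dbutils.secrets is that values read this way are redacted in notebook output, which addresses the logging concern from the best practices list.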
Another service supporting native integration with a secrets service is Cloud Composer, which connects to the Secret Manager service. The integration consists of using airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend as the secrets backend. The configuration also has variables_prefix and connections_prefix entries, defining which variables and connections Composer can access. Thanks to these 2 values, you can separate the secrets used by other applications from the secrets read by the Cloud Composer service.
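In Airflow terms, this could look like the following airflow.cfg fragment - a sketch in which the prefix values and project id are illustrative, not required names:

```ini
[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "project_id": "my-gcp-project"}
```

With this configuration, only Secret Manager entries whose names start with the given prefixes are visible to Airflow, which implements the separation described above.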
Finally, on AWS, Secrets Manager natively integrates with the RDS, Redshift, and DocumentDB services to store and automatically rotate the connection credentials.
To complete the list, you can, of course, use KMS/Key Vault as the encryption keys provider for various data stores like object stores, streaming brokers, data warehouses or NoSQL databases.
Secrets are your secret, so do not reveal them to anyone unauthorized. To facilitate this task on the cloud, you can use keys and secrets management services alongside the IAM layer.