Hush! It's a secret on the cloud

How to manage secrets is probably one of the first problems you will encounter when deploying resources from a CI/CD pipeline. The simple answer is: don't manage them yourself! Let the cloud services do it.

Services

I'll start with the services mostly used to manage encryption and decryption keys. All 3 major cloud providers I'm currently focusing on (AWS, Azure and GCP) have such a service. On AWS and GCP it's called KMS (Key Management Service), whereas on Azure you will find it under the name Key Vault. Despite the different naming, all of them offer similar features:

Besides the secret key stores that are useful for encryption purposes, there is a second category of services, this time to manage the secrets themselves. AWS and GCP have dedicated services for that, whereas Azure uses the aforementioned Key Vault. AWS actually has 2 services that can be used to manage secrets. The first of them is Systems Manager, whose Parameter Store capability integrates with AWS KMS to store secrets in an encrypted format. The second is Secrets Manager. GCP offers a service with almost the same name, Secret Manager. Unlike the secret key stores, the secret managers hold real secret data like passwords or access tokens. They do rely on the secret key stores to encrypt the persisted secrets, but they are separate services.
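To make the two AWS options more concrete, here is a minimal sketch that reads one value from Secrets Manager and one from Parameter Store. It assumes boto3 clients are passed in by the caller; the secret and parameter names in the comments are hypothetical.

```python
def read_secret(secrets_client, secret_id):
    """Return the string value of a secret stored in AWS Secrets Manager."""
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return response["SecretString"]


def read_secure_parameter(ssm_client, name):
    """Return a SecureString parameter from Parameter Store.

    WithDecryption=True asks Parameter Store to decrypt the value with KMS.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


# Possible usage, assuming boto3 is installed and credentials are configured
# (the names below are made up for illustration):
#
#   import boto3
#   password = read_secret(boto3.client("secretsmanager"), "prod/db/password")
#   token = read_secure_parameter(boto3.client("ssm"), "/prod/api/token")
```

Passing the client as an argument keeps the functions easy to test with a stub and leaves the authentication entirely to the IAM layer.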

All of them share a set of best practices:

Data services integration

The 2 families of services presented so far are the most popular ones, but some cloud components provide their own secrets management layer. Databricks has a component called secret scope where you can write any secret values and access them with the dbutils.secrets.get function. It works with a separate secrets database but can also be connected to an existing secret key store like Azure Key Vault.
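Reading from a secret scope could look like the sketch below. The dbutils object only exists inside a Databricks runtime, so it is passed in explicitly here; the scope and key names in the comment are hypothetical.

```python
def read_databricks_secret(dbutils, scope, key):
    """Fetch a secret value from a Databricks secret scope.

    `dbutils` is the utility object Databricks injects into notebooks.
    """
    return dbutils.secrets.get(scope=scope, key=key)


# In a notebook, with made-up scope and key names:
#
#   jdbc_password = read_databricks_secret(dbutils, "data-platform", "jdbc-password")
```

The call is the same whether the scope is backed by the Databricks secrets database or by Azure Key Vault, so the notebook code does not need to know where the value is stored.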

Another service with native secrets integration is Cloud Composer, which connects to Secret Manager. To enable the connection, you set airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend as the secrets backend. The configuration also has variables_prefix and connections_prefix entries, which define the variables and connections Composer can access. Thanks to these 2 values you can separate the secrets used by other applications from the secrets read by Cloud Composer.
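A rough sketch of that configuration follows. It builds the values for the backend and backend_kwargs keys of Airflow's secrets section and shows how, to my understanding, the backend derives the Secret Manager name from the prefix, a separator and the Airflow id. The helper names and the prefix values are assumptions made for illustration.

```python
import json

BACKEND = (
    "airflow.providers.google.cloud.secrets.secret_manager."
    "CloudSecretManagerBackend"
)


def secrets_section(connections_prefix, variables_prefix, sep="-"):
    """Build the values for the [secrets] section of the Airflow configuration."""
    backend_kwargs = {
        "connections_prefix": connections_prefix,
        "variables_prefix": variables_prefix,
        "sep": sep,
    }
    # Airflow expects backend_kwargs as a JSON string.
    return {"backend": BACKEND, "backend_kwargs": json.dumps(backend_kwargs)}


def secret_name(prefix, airflow_id, sep="-"):
    """Name of the Secret Manager secret looked up for a given Airflow id."""
    return f"{prefix}{sep}{airflow_id}"


# Hypothetical prefixes that keep Composer's secrets apart from other applications.
config = secrets_section("composer-connections", "composer-variables")
```

With these values, an Airflow variable called api_key would be resolved from a secret named composer-variables-api_key, while secrets without one of the 2 prefixes stay invisible to the backend.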

Finally, on AWS, Secrets Manager natively integrates with the RDS, Redshift, and DocumentDB services to store and automatically rotate the connection credentials.
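On the consumer side, the stored credentials arrive as a JSON document. The sketch below turns it into connection parameters, assuming the secret uses the host, port, username, password and dbname fields of the AWS database secret structure; the function name and the sample values are made up.

```python
import json


def parse_database_secret(secret_string):
    """Convert a database secret (JSON text) into connection parameters."""
    fields = json.loads(secret_string)
    return {
        "host": fields["host"],
        "port": int(fields["port"]),
        "user": fields["username"],
        "password": fields["password"],
        "database": fields.get("dbname"),  # may be absent in some secrets
    }


# Sample payload with invented values, shaped like a database secret.
sample = json.dumps(
    {"engine": "postgres", "host": "db.example.internal", "port": 5432,
     "username": "app", "password": "not-a-real-password", "dbname": "sales"}
)
params = parse_database_secret(sample)
```

Because the password changes at every rotation, it is safer to read the secret when opening a connection rather than caching the parsed values for the lifetime of the application.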

To complete the list, you can, of course, use KMS or Key Vault as the encryption key provider for various data stores like object stores, streaming brokers, data warehouses or NoSQL databases.
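For an object store, this can be as simple as naming the key when writing the object. The following sketch uses the S3 put_object call with server-side KMS encryption; the client is passed in by the caller, and the bucket, object key and KMS key identifier are hypothetical.

```python
def put_encrypted_object(s3_client, bucket, key, body, kms_key_id):
    """Write an object to S3 and ask S3 to encrypt it with the given KMS key."""
    return s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ServerSideEncryption="aws:kms",  # encrypt on the S3 side with KMS
        SSEKMSKeyId=kms_key_id,          # which KMS key to use
    )


# Possible usage, assuming boto3 and valid credentials (names are made up):
#
#   import boto3
#   put_encrypted_object(boto3.client("s3"), "my-bucket", "raw/events.json",
#                        b"{}", "alias/data-lake-key")
```

The application never sees the key material. It only needs IAM permissions on the bucket and on the KMS key.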

Secrets are your secret, so do not reveal them to anyone unauthorized. To facilitate this task on the cloud, you can use keys and secrets management services alongside the IAM layer.