How to manage secrets is probably one of the first problems you will encounter when deploying resources from a CI/CD pipeline. The simple answer is: don't manage them at all! Let the cloud services do it for you.
I'll start the presentation with a service mostly used to manage encryption/decryption keys. All 3 major cloud providers I'm currently focusing on - AWS, Azure and GCP - offer such a service. On AWS and GCP it's called KMS (Key Management Service), whereas on Azure you will find it under the name Key Vault. Despite the different naming, all of them offer similar features:
- integration with other cloud services - secret key stores (the common name I'll use for KMS and Key Vault) seamlessly integrate with other cloud services. A pretty common example of this integration is the provision of encryption keys used to encrypt data stored in object stores, streaming brokers, or databases.
- lifecycle management - you can manage key expiration and automatic rotation.
- versioning - in case of a key rotation, the service preserves the older versions. It's very important if a service used the key to encrypt some data. Without versioning, the key's value would be lost, and with it the possibility of decrypting the data.
- authorization - key stores seamlessly integrate with Identity and Access Management (IAM) services. You then get a kind of 2-factor data authorization mechanism, because to read the data, a user needs access to both the encryption keys and the data itself.
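To make the call shape of such a key store concrete, here is a minimal sketch modeled on the AWS KMS Encrypt/Decrypt API. The in-memory client is only a stand-in so the example runs anywhere; in a real deployment you would pass boto3's `client('kms')` instead, and the key alias is a made-up example:

```python
def encrypt(kms_client, key_id, plaintext):
    # kms_client mirrors the AWS KMS API shape (e.g. boto3.client('kms')).
    return kms_client.encrypt(KeyId=key_id, Plaintext=plaintext)['CiphertextBlob']


def decrypt(kms_client, ciphertext):
    # For symmetric keys, KMS stores the key reference with the ciphertext,
    # so Decrypt does not need the KeyId again.
    return kms_client.decrypt(CiphertextBlob=ciphertext)['Plaintext']


class _FakeKmsClient:
    """In-memory stand-in used only to make the example runnable;
    it does NOT perform real encryption."""
    def __init__(self):
        self._store = {}

    def encrypt(self, KeyId, Plaintext):
        token = f'{KeyId}:{len(self._store)}'.encode()
        self._store[token] = Plaintext
        return {'CiphertextBlob': token}

    def decrypt(self, CiphertextBlob):
        return {'Plaintext': self._store[CiphertextBlob]}


kms = _FakeKmsClient()
ciphertext = encrypt(kms, 'alias/app-data-key', b'customer record')
plaintext = decrypt(kms, ciphertext)
```

Notice that the decrypting side never sees the key material itself - it only needs IAM permission to call Decrypt on that key, which is exactly the 2-factor mechanism described above.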
Besides the secret key stores, useful for encryption purposes, there is a second category of services, this time to manage the secrets themselves. AWS and GCP have dedicated services for that, whereas Azure uses the aforementioned Key Vault. AWS actually has 2 services that can be used to manage secrets. The first of them is Systems Manager, whose Parameter Store capability integrates with AWS KMS to store secrets in an encrypted format. The second service is Secrets Manager. GCP offers a similar service under the name Secret Manager. Unlike the secret key stores, the secret managers handle the real secret data like passwords or access tokens. Although they rely on the secret key stores to encrypt the persisted secrets, they are separate services.
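Reading a secret from such a service is a single API call. The sketch below follows the SSM Parameter Store GetParameter shape; the injected fake client only illustrates the call, and the parameter name is an assumption - with real AWS you would pass boto3's `client('ssm')`:

```python
def fetch_parameter(ssm_client, name):
    """Read a decrypted value from a parameter store.

    ssm_client is expected to expose the AWS SSM GetParameter API shape,
    e.g. boto3.client('ssm') in a real deployment.
    """
    response = ssm_client.get_parameter(Name=name, WithDecryption=True)
    return response['Parameter']['Value']


class _FakeSsmClient:
    """In-memory stand-in used only to demonstrate the call shape."""
    def __init__(self, values):
        self._values = values

    def get_parameter(self, Name, WithDecryption):
        return {'Parameter': {'Value': self._values[Name]}}


# With real AWS: fetch_parameter(boto3.client('ssm'), '/app/db-password')
password = fetch_parameter(_FakeSsmClient({'/app/db-password': 's3cret'}),
                           '/app/db-password')
```

The application only ever stores the parameter name; the decrypted value stays inside the service until read time.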
All of them share a set of best practices:
- least privilege - allow the application to read only the key(s) it needs. For example, if it needs a key to store encrypted data on an object store, there is no need to also grant it permission to the keys used to encrypt the streaming broker's data.
- logging - beware of logging. Reading a secret and writing it to the logs is not a good idea because it's not a secret anymore.
- automatically updated users - if the application stores a reference to the secret rather than the secret value itself, maintenance becomes easier in case of a secret rotation. For example, with the following snippet you can change the database password at any moment, because the consumer reloads the value when it fails with an invalid password error:

```python
try:
    client.connect_to_db(params)
except InvalidPasswordError:
    # The password was rotated; reload it from the secret store using the
    # stored reference (here, a hypothetical 'password_key' entry holding
    # the secret's name), then retry the connection.
    params['password'] = key_store.fetch(params['password_key'])
    client.connect_to_db(params)
```
- automatically revoked users - imagine that the only way to access a data store is a login/password mechanism. For whatever reason, you have to share a single database user's credentials across 3 different applications (not a dream, but it happens, often as a "temporary solution" that ends up in production code :P). You then create 3 separate IAM identities allowed to decrypt the credentials from the store. Now, if you need to revoke access for one of these applications while keeping the others online, you can simply change the secret value and revoke that application's permission to read the secret.
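On AWS, the least-privilege point above could translate into an IAM policy that grants decrypt access to one specific key only. This is a sketch; the account id and key id in the ARN are placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["kms:Decrypt"],
      "Resource": "arn:aws:kms:eu-west-1:123456789012:key/11111111-2222-3333-4444-555555555555"
    }
  ]
}
```

An identity holding this policy can decrypt data protected by that key but cannot touch any other key in the account - revoking the policy is enough to cut the application off.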
Data services integration
The 2 families of services presented so far are the most popular ones, but some cloud components provide their own secrets management layer. Databricks has a component called secret scope, where you can write any secret values and access them with the dbutils.secrets.get function. It works with a separate secrets database but can also be connected to an already existing secret key store like Azure Key Vault.
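Inside a notebook, the access boils down to dbutils.secrets.get(scope=..., key=...). The wrapper below makes that call shape runnable outside Databricks by injecting a stand-in for dbutils.secrets; the scope and key names are made-up examples:

```python
def read_databricks_secret(secrets, scope, key):
    # In a Databricks notebook, pass the built-in dbutils.secrets as `secrets`.
    return secrets.get(scope=scope, key=key)


class _FakeSecrets:
    """Stand-in for dbutils.secrets outside a Databricks runtime."""
    def get(self, scope, key):
        return {('jdbc', 'password'): 'p4ss'}[(scope, key)]


password = read_databricks_secret(_FakeSecrets(), 'jdbc', 'password')
```

A nice property of the real dbutils.secrets is that values read this way are redacted in notebook output, which addresses the logging concern from the best practices list.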
Another service supporting native integration with a secrets service is Cloud Composer, which connects to the Secret Manager service. The integration consists of using airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend as the secrets backend. The configuration also has variables_prefix and connections_prefix entries, defining which variables and connections Composer can access. Thanks to these 2 values, you can separate the secrets used by other applications from the secrets read by the Cloud Composer service.
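In Airflow terms, this could look like the following airflow.cfg fragment - a sketch in which the prefix values and project id are illustrative, not required names:

```ini
[secrets]
backend = airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend
backend_kwargs = {"connections_prefix": "airflow-connections", "variables_prefix": "airflow-variables", "project_id": "my-gcp-project"}
```

With this configuration, only Secret Manager entries whose names start with the given prefixes are visible to Airflow, which implements the separation described above.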
Finally, on AWS, Secrets Manager natively integrates with the RDS, Redshift, and DocumentDB services to store and automatically rotate the connection credentials.
To complete the list, you can, of course, use KMS/Key Vault as the encryption keys provider for various data stores like object stores, streaming brokers, data warehouses or NoSQL databases.
Secrets are your secret, so do not reveal them to anyone unauthorized. To facilitate this task on the cloud, you can use keys and secrets management services alongside the IAM layer.