Identities and permissions management

AWS was the first cloud provider I worked with. That's why, when I did my first Azure and GCP projects, I kept asking myself, "Hey, how would you implement that on AWS?" Answering that question was easy most of the time, but sometimes I got stuck. One of my most significant sticking points was identity and permissions management. I will try to share some related answers in this blog post.

How to define a user?

Physical users

Let's start with the most fundamental question: how do you define a new user? AWS creates users in the IAM service. A user gets programmatic access when access keys are created for them, and interactive (console) access when a password is defined for them.
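As a rough sketch of the AWS side, the two access types map to two separate IAM calls made after the user is created. The `create_iam_user` helper below is hypothetical; it assumes `iam` is a boto3 IAM client (for example `boto3.client("iam")`), and the user name and password are placeholders.

```python
def create_iam_user(iam, user_name, password=None, with_access_key=False):
    """Create an IAM user and optionally enable each access type.

    `iam` is assumed to be a boto3 IAM client. This helper is only an
    illustration, not an official API.
    """
    iam.create_user(UserName=user_name)

    access_key_id = None
    if with_access_key:
        # Programmatic access: the user gets an access key pair.
        response = iam.create_access_key(UserName=user_name)
        access_key_id = response["AccessKey"]["AccessKeyId"]

    if password is not None:
        # Interactive (console) access: the user gets a login profile.
        iam.create_login_profile(
            UserName=user_name,
            Password=password,
            PasswordResetRequired=True,
        )

    return access_key_id
```

Calling `create_iam_user(iam, "alice", with_access_key=True)` would give a user with programmatic access only, while passing only `password` would give a console-only user.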

Azure manages users in the Azure Active Directory service. A user is defined there by a name and a user name. The service also auto-generates the password.

GCP also manages users in its IAM service, but the semantics are a bit different from AWS. The users come from Google Workspace and are invited by their email addresses. The IAM service doesn't have a command to create a new dedicated user.

Service users

What happens if we must link the user to a service and hence create a service account? Nothing changes on AWS, where even users with interactive access can be used by services. But the same is not true for Azure and GCP.

On Azure, the idea of a service account is more complex than on AWS. The definition includes 3 different types of service accounts that can live in Azure Active Directory:

managed identity - provides identities for Azure resources, either as a system-assigned or a user-assigned managed identity. A system-assigned identity is tied to a single service instance and shares its lifecycle. In other words, it lives only as long as the instance exists. User-assigned identities, by contrast, are separate Azure resources that you can assign to different service instances. Consequently, each instance can have at most 1 system-assigned identity and multiple user-assigned ones.
service principal - under the hood, a managed identity creates a service principal, also known as the application's security principal. More exactly, the service principal represents an application object locally in a single Azure AD tenant. Service principals support different authentication mechanisms, such as certificates or secrets, whereas Azure transparently manages the credentials of managed identities. Managed identities are therefore more similar to the service accounts you may find on AWS or GCP.
Since a managed identity creates a service principal, why does Azure have both? In fact, not all services support managed identities yet. In that case, there is no other choice than to use a service principal.
user-based service account - a technically possible method, but one officially not recommended by Azure: "We recommend that you not use an Azure Active Directory user account as a service account" (Types of Azure Active Directory service accounts). This approach treats user accounts as service accounts, which is less secure than relying on managed identities or service principals.
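To make the lifecycle difference between the two managed identity types concrete, here is a toy model in Python. It doesn't call any Azure API; the `Resource` class and the `identities_after_delete` function are hypothetical names invented for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Toy model of an Azure service instance and its managed identities."""
    name: str
    system_assigned: bool = False                    # at most one, hence a flag
    user_assigned: set = field(default_factory=set)  # zero or more identity names


def identities_after_delete(resources, user_assigned_pool, deleted_name):
    """Return the identities that still exist once one resource is deleted."""
    # User-assigned identities are standalone resources, so they all survive.
    alive = set(user_assigned_pool)
    for resource in resources:
        # A system-assigned identity only survives if its resource does.
        if resource.name != deleted_name and resource.system_assigned:
            alive.add(f"system:{resource.name}")
    return alive


resources = [
    Resource("vm-a", system_assigned=True, user_assigned={"shared-id"}),
    Resource("vm-b", system_assigned=True, user_assigned={"shared-id"}),
]
print(identities_after_delete(resources, {"shared-id"}, "vm-a"))
```

Deleting `vm-a` removes its system-assigned identity, while `shared-id` stays available for `vm-b` and for any future instance.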

GCP creates service accounts with a dedicated IAM operation. Like a regular user, each service account has its own ID and set of permissions. It can also be impersonated: impersonation allows other principals and resources to act as the service account.
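A small sketch of how a GCP service account is identified and how impersonation is typically granted. The account ID, project ID and caller are placeholders, and the two helper functions are hypothetical. The role used is the predefined Service Account Token Creator role, which is commonly granted on a service account to let a principal impersonate it. Nothing here calls a Google API.

```python
def service_account_email(account_id, project_id):
    """Build the email address that identifies a user-managed service account."""
    return f"{account_id}@{project_id}.iam.gserviceaccount.com"


def impersonation_binding(caller_member):
    """Build the binding that lets `caller_member` impersonate a service account.

    The binding is meant to be added to the IAM policy of the service
    account itself, which acts as the resource here.
    """
    return {
        "role": "roles/iam.serviceAccountTokenCreator",
        "members": [caller_member],
    }


email = service_account_email("etl-runner", "my-project")
print(f"serviceAccount:{email}")
print(impersonation_binding("user:alice@example.com"))
```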

To sum up, service accounts have a special meaning on Azure and GCP, whereas AWS treats them as normal users associated with the services. This summary is a good transition to the next question: resource authorization.

How to authorize a user to access a resource?

AWS provides 2 ways to authorize access to resources, both relying on policies. A policy defines a set of allowed and denied actions. There are 2 types of policies: identity-based and resource-based. Identity-based policies are attached to IAM identities, such as users, groups or roles, whereas resource-based policies live with the resources, such as S3 buckets.
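The difference between the two policy types shows up in the policy document itself. Below is a minimal sketch with both documents written as Python dictionaries. The bucket name, account ID and user name are placeholders, and `has_principal` is a hypothetical helper added for illustration.

```python
import json

# Identity-based policy: attached to a user, group or role, so it has no
# Principal element. The principal is whoever the policy is attached to.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# Resource-based policy (here an S3 bucket policy): attached to the
# resource, so each statement names the principal it applies to.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:user/alice"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}


def has_principal(policy):
    """Tell whether any statement of the policy names a principal."""
    return any("Principal" in statement for statement in policy["Statement"])


print(json.dumps(bucket_policy, indent=2))
```

Both documents grant the same read access to the same objects. Only the attachment point, and hence the presence of `Principal`, differs.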

Azure manages access with the Role-Based Access Control (RBAC) system. The system defines roles, that is, what each security principal (user or service account) can do. For example, a Reader role can apply to a particular Storage Account, or to all Storage Accounts in a subscription or resource group. This boundary is called the scope. A role assignment combines the role, the scope and a security principal, and Azure later uses this role assignment to control access to the resources.
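The scope idea can be sketched as a prefix check on resource IDs. This is a simplified model, not the way Azure evaluates access internally: it ignores management groups, deny assignments and conditions. The `RoleAssignment` class, the `applies_to` function and all IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleAssignment:
    """A role given to a security principal at a given scope."""
    principal: str
    role: str
    scope: str


def applies_to(assignment, resource_id):
    """Tell whether the resource sits at or below the assignment scope."""
    # Azure resource IDs are compared case-insensitively.
    scope = assignment.scope.rstrip("/").lower()
    target = resource_id.rstrip("/").lower()
    # The trailing "/" avoids matching a sibling such as "data-rg2".
    return target == scope or target.startswith(scope + "/")


reader_on_rg = RoleAssignment(
    principal="alice@example.com",
    role="Reader",
    scope="/subscriptions/sub-1/resourceGroups/data-rg",
)
storage_account = (
    "/subscriptions/sub-1/resourceGroups/data-rg"
    "/providers/Microsoft.Storage/storageAccounts/rawdata"
)
print(applies_to(reader_on_rg, storage_account))
```

An assignment scoped to the resource group covers every Storage Account inside it, while the same role scoped to a single Storage Account ID would cover only that account.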

When it comes to GCP, IAM also implements policies, but they can only be attached to a resource. Users only get roles, which are collections of operations allowed on a cloud service. A role alone doesn't authorize these operations, though. The grant consists of associating the role, together with the user, to the resource through that resource's policy.
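A resource's policy on GCP can be pictured as a list of bindings, each pairing one role with its members. The sketch below shows a hypothetical `grant_role` helper that works on a plain dictionary shaped like such a policy. It makes no API call and ignores conditional bindings, and the member names are placeholders.

```python
def grant_role(policy, role, member):
    """Return a copy of `policy` in which `member` is bound to `role`."""
    # Copy the bindings so the input policy is left untouched.
    bindings = [
        dict(binding, members=list(binding["members"]))
        for binding in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding exists for this role yet, so create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


bucket_policy = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["user:alice@example.com"]},
    ]
}
updated = grant_role(bucket_policy, "roles/storage.objectViewer", "group:data-team@example.com")
print(updated)
```

The policy belongs to the bucket, not to Alice or to the group, which is the main difference from AWS identity-based policies.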

How to create a group?

In general, cloud providers recommend using groups to manage multiple users sharing the same permissions. Group implementation on AWS and GCP is pretty straightforward: a group is a collection of users sharing the same set of permissions. Although this definition is also valid for Azure, Azure groups have some differences:

membership types - AWS and GCP have a static membership rule that explicitly assigns users to the group. Azure, in addition to this mechanism, comes with dynamic groups where the provider evaluates a set of rules to add or remove users or devices from the group dynamically.
group types - Azure supports security and Microsoft 365 groups. The former is the one you'll probably use in your cloud journey, whereas the latter is used in the corporate world to integrate Microsoft 365 tools. AWS doesn't have that distinction and only supports the "security" type. GCP has a similar corporate group concept with Google Workspace domains.
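The two membership types from the list above can be contrasted with a short Python sketch. Real Azure dynamic groups use their own rule syntax evaluated by the service. The Python predicate here is only a stand-in, and the user attributes and function names are made up for the example.

```python
def static_members(assigned_names):
    """Static membership: the group contains exactly who was added to it."""
    return set(assigned_names)


def dynamic_members(users, rule):
    """Dynamic membership: the group contains every user matching the rule."""
    return {user["name"] for user in users if rule(user)}


users = [
    {"name": "alice", "department": "Data"},
    {"name": "bob", "department": "Sales"},
]


def in_data_department(user):
    return user["department"] == "Data"


print(static_members(["alice"]))
print(dynamic_members(users, in_data_department))

# A new hire joins the Data department: the dynamic group picks her up
# on the next evaluation, the static one needs an explicit update.
users.append({"name": "carol", "department": "Data"})
print(dynamic_members(users, in_data_department))
```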

What surprised me the most while switching from AWS to GCP was the resource policies association. I couldn't understand why a resource permission cannot be assigned directly at the user/group level. But I got more confused on Azure, with the difference between managed identities and service principals, the existence of Microsoft 365 groups, and the idea of user-based service accounts. And what about you? What were your confusion points?
