Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use existing service account for Helm deployment #4155

Open
landerss1 opened this issue Jul 8, 2024 · 10 comments
Open

Use existing service account for Helm deployment #4155

landerss1 opened this issue Jul 8, 2024 · 10 comments
Assignees
Milestone

Comments

@landerss1
Copy link

Describe the current behavior
Currently, there is no option to specify an existing service account when you deploy ASO with the Helm chart. In a multi tenant setup, this is particularly bad from a security perspective, because it requires that ClusterRoleBinding can be created when deploying the tenant ASO operator. With this requirement, Helm deployments in the tenant namespace can use this and elevate privileges to gain cluster wide admin permissions.

Describe the improvement
Add an option to use existing service account for Helm deployment to avoid trying to create ClusterRoleBinding .

Additional context
The tenant namepaces we have are locked down and are not allowed to do stuff on a cluster level. For CI/CD we use Flux, and for security, we use OPA Gatekeeper (Azure Policies). Now, we have a constraint that requires that if you deploy with Flux and use a HelmRelease, there must be a service account specified in the HelmRelease, so for that reason we have default service accounts in each tenant namespace that are able to do everything in that namespace, plus manage CRDs on a cluster level.

@matthchr matthchr self-assigned this Jul 8, 2024
@matthchr matthchr added this to the v2.9.0 milestone Jul 8, 2024
@matthchr
Copy link
Member

matthchr commented Jul 9, 2024

This ask seems reasonable - we'll look at doing it for the next release.

@matthchr
Copy link
Member

matthchr commented Jul 9, 2024

Are you aware that you can run a single instance of the operator in a multitenant mode now? See Credential scope.

Doing this might be significantly easier for you to manage/orchestrate than running the operator in multitenant mode, while still achieving your user-multitenancy goals? See also: https://azure.github.io/azure-service-operator/guide/authentication/#using-multiple-operators-with-a-single-credential-per-operator

This mode is not recommended unless you really need it

ASO also supports installing multiple instances of the operator alongside one another, each configured with different credentials.
This option exists for the most security conscious customers.
Advantages:

  • Each operator pod only has access to a single credential, reducing risk if one pod is somehow compromised.

Disadvantages:

  • Significantly harder to orchestrate ASO upgrades.
  • More kube-apiserver load, as there will be multiple operators running and watching/reconciling resources.

@landerss1
Copy link
Author

I'm not quite sure if that is something that would solve our problem, or maybe I just don't fully understand it. We have just recently migrated to workload identities cluster-wide, so in our multi-tenant deployment we use that for the operator in the tenant namespace. With the user assigned identity, we can then be certain that this tenant namespace can only create resources in a specific resource group that was created for this particular tenant.

@matthchr
Copy link
Member

matthchr commented Jul 9, 2024

Is my understanding correct that right now you're using the multitenant.enable + azureOperatorMode options of the Helm chart? If so, that basically means you're deploying the operator N+1 times, once in a namespace to host the webhooks, and N times in watchers mode in each tenant namespace.

If the objective you have is that every namespace in your cluster needs their own identity to access anything in Azure, and each namespace needs a different identity, you can instead accomplish that by installing the operator once in a namespace such as azureserviceoperator-system and not creating a global credential. This can be done with the following Helm command on a fresh cluster:

helm upgrade --install aso2 aso2/azure-service-operator \
    --create-namespace \
    --namespace=azureserviceoperator-system \
    --set crdPattern='resources.azure.com/*;containerservice.azure.com/*;keyvault.azure.com/*;managedidentity.azure.com/*;eventhub.azure.com/*'

Note that there's no secret details specified in that command. This means that ASO is watching every namespace, but has no permissions with Azure to do anything (because no identities are configured). Then in each tenant namespace, the users can create a secret named aso-credential containing the details of their namespaced workload-identity. Once they do this, ASO will put any Azure resource it finds in that namespace into the subscription associated with that identity using that identity.

The key points here being:

  1. You only need to install the ASO operator once, not N times (1x per namespace).
  2. No namespace has permission to do anything until they create the aso-credential. At which point, only that namespace can use that credential.

You might still want the ability to configure the ServiceAccount of the single installation of ASO, but you won't need to do it in every namespace just in the azureoperator-system namespace (which can be locked down and not accessible to individual teams).

Does that make sense?

@landerss1
Copy link
Author

You are correct, I followed the instructions for multiple operator multitenancy deployment here: https://azure.github.io/azure-service-operator/guide/authentication/multitenant-deployment/

The objective is that every namespace in the cluster needs their own identity to access anything in Azure, and each namespace needs a different identity, so I tried your suggested deployment. But, then I ran into issues with the default ASO service account. From the log, I can see:

I0710 06:13:40.058350       1 azure_generic_arm_reconciler_instance.go:72] "msg"="Determined CreateOrUpdate action" "action"="BeginCreateOrUpdate" "azureName"="rg-dev-we-mimforum" "logger"="controllers.ResourceGroupController" "name"="rg-dev-we-mimforum" "namespace"="mimforum"
I0710 06:13:40.058455       1 azure_generic_arm_reconciler_instance.go:313] "msg"="About to send resource to Azure" "azureName"="rg-dev-we-mimforum" "logger"="controllers.ResourceGroupController" "name"="rg-dev-we-mimforum" "namespace"="mimforum"
I0710 06:13:40.058579       1 recorder.go:104] "msg"="Using credential from \"mimforum/aso-credential\"" "logger"="events" "object"={"kind":"ResourceGroup","namespace":"mimforum","name":"rg-dev-we-mimforum","uid":"4aaa2e4b-b671-4bb3-8b82-bbea02d9f92a","apiVersion":"resources.azure.com/v1api20200601storage","resourceVersion":"4901082"} "reason"="CredentialFrom" "type"="Normal"
E0710 06:13:40.115697       1 generic_reconciler.go:388] "msg"="Encountered error impacting Ready condition" "error"="Reason: UnknownError, Severity: Warning, RetryClassification: RetrySlow, Cause: WorkloadIdentityCredential authentication failed\nPOST https://login.microsoftonline.com/<redacted>/oauth2/v2.0/token\n--------------------------------------------------------------------------------\nRESPONSE 401 Unauthorized\n--------------------------------------------------------------------------------\n{\n  \"error\": \"invalid_client\",\n  \"error_description\": \"AADSTS700213: No matching federated identity record found for presented assertion subject 'system:serviceaccount:azureserviceoperator-system:azureserviceoperator-default'. Please check your federated identity credential Subject, Audience and Issuer against the presented assertion. https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation Trace ID: 0680bcc9-7046-4d1b-9d47-7a5d2e329800 Correlation ID: bc7a9f56-0cf9-4687-9bdc-4b3037dc601f Timestamp: 2024-07-10 06:13:40Z\",\n  \"error_codes\": [\n    700213\n  ],\n  \"timestamp\": \"2024-07-10 06:13:40Z\",\n  \"trace_id\": \"0680bcc9-7046-4d1b-9d47-7a5d2e329800\",\n  \"correlation_id\": \"bc7a9f56-0cf9-4687-9bdc-4b3037dc601f\"\n 

And this is the secret used by the operator itself:

apiVersion: v1
kind: Secret
metadata:
  name: aso-controller-settings
  namespace: azureserviceoperator-system
data:
  AZURE_CLIENT_ID: ''
  AZURE_SUBSCRIPTION_ID: ''
  AZURE_SYNC_PERIOD: MW0=
  AZURE_TENANT_ID: ''
type: Opaque

Seems to me that the operator finds the secret in the tenant namespace, but is using the default service account and workload identity to authenticate, and this fails. What am I missing here to get it to work?

@matthchr
Copy link
Member

The objective is that every namespace in the cluster needs their own identity to access anything in Azure, and each namespace needs a different identity, so I tried your suggested deployment. But, then I ran into issues with the default ASO service account.

You should be able to do this with a single operator instance and per-namespace secrets as I described above.

Just making sure the pattern is in alignment w/ what I'm expecting, you created an aso-credential secret in namespace foo (the tenant namespace), and then attempted to create a ResourceGroup in that same namespace?

You had said:

recently migrated to workload identities cluster-wide,

I assume that means your aso-credential in the foo namespace looks something like this (please share a redacted version and correct me if I am wrong)

apiVersion: v1
kind: Secret
metadata:
  name: aso-credential
  namespace: my-namespace
stringData:
  AZURE_SUBSCRIPTION_ID: <subid>
  AZURE_TENANT_ID:    <tenantid>
  AZURE_CLIENT_ID:    <clientid>
  # No AZURE_CLIENT_SECRET because workload identity

If this is the case, did you miss this step for allowing the ASO pod to use that identity to authenticate with Azure?

az identity federated-credential create --name aso-federated-credential --identity-name ${MI_NAME} --resource-group ${MI_RESOURCE_GROUP} --issuer ${SERVICE_ACCOUNT_ISSUER} --subject "system:serviceaccount:azureserviceoperator-system:azureserviceoperator-default" --audiences "api://AzureADTokenExchange"

The error you got looks to me like ASO used the right identity, but was denied from exchanging its serviceaccount identity for an Azure token because no FederatedIdentityCredential allowed it.

@landerss1
Copy link
Author

Yes, your assumptions are right. The aso-credential looks like you described, and we haven't created the federated credential as you point out. We will try that and update this issue. However, if that's the missing part it's an argument in favor of having the option to use an existing service account in the Helm chart rather than having to do this manually.

@matthchr
Copy link
Member

Yes I think even if you can use this (IMO simpler) multitenant strategy there's still value in letting you control the ServiceAccount. It's just that you only need it once not N times. Either way we'll leave this issue open and investigate making SA configurable in the next release.

@landerss1
Copy link
Author

Now it works as per your suggestions. My mistake was that I used the wrong subject when I created the federated credential. Once I got that right and referenced the service operator service account everything worked fine. Thanks!

@matthchr
Copy link
Member

Glad to hear it worked!

We will pursue making SA configurable in the Helm chart too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

2 participants