Webhook pods should not be marked ready until webhook is available #709

Closed
chkohner opened this issue Jan 20, 2023 · 0 comments · Fixed by #721
Labels
bug Something isn't working webhook

Describe the bug

Webhook pods are marked ready before the webhook is actually serving requests, so dependent pods miss their mutations.

Steps To Reproduce

  1. Install via Helm, following the docs (or, in our case, with a Terraform helm_release).
  2. Watch the logs to see when "Serving webhook server" is actually logged. Sometimes this takes several minutes (~3.5); in the example below it takes only a couple of seconds.
  3. Application pods that are deployed afterwards and properly labeled are scheduled and start, but they miss the mutation because no webhook is serving requests yet (so they pass through without ENV vars, etc.).
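
One way to confirm step 3 on a running cluster is to check whether a pod's containers carry the environment variables the webhook is expected to inject. The sketch below is illustrative only: the variable names in `EXPECTED_ENV` are an assumption (they are not listed in this report), and `containers_missing_mutation` is a hypothetical helper, not part of the project.

```python
# Assumption: these are the env vars the webhook injects; adjust to your version.
EXPECTED_ENV = ("AZURE_CLIENT_ID", "AZURE_TENANT_ID", "AZURE_FEDERATED_TOKEN_FILE")


def containers_missing_mutation(pod, expected=EXPECTED_ENV):
    """Return the names of containers that lack any of the expected env vars.

    `pod` is a pod manifest as a dict, e.g. parsed from `kubectl get pod -o json`.
    """
    missing = []
    for container in pod.get("spec", {}).get("containers", []):
        # `env` may be absent or null on an unmutated container.
        present = {entry.get("name") for entry in container.get("env") or []}
        if not set(expected) <= present:
            missing.append(container.get("name", "<unnamed>"))
    return missing
```

A non-empty result for a properly labeled pod suggests it was admitted before the webhook was serving.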

Expected behavior

The pod should not be marked ready until the webhook is actually usable; otherwise you cannot meaningfully wait for the azwi deployment via helm --wait.
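
As a rough illustration of the readiness condition described here, the sketch below treats the webhook as usable only once its port accepts TCP connections, and polls until that happens. It is a client-side approximation, not the project's readiness probe: the function names and defaults are hypothetical, and a successful TCP connect is only a lower bound (it does not verify the TLS certificate or the admission handler).

```python
import socket
import time


def webhook_accepting_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_webhook(host, port, deadline_s=300.0, interval_s=2.0):
    """Poll until the port accepts connections or `deadline_s` elapses.

    Always probes at least once; returns True on success, False on timeout.
    """
    deadline = time.monotonic() + deadline_s
    while True:
        if webhook_accepting_connections(host, port):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In the logs below the webhook server listens on port 9443, so a caller would pass that port together with whatever address reaches the webhook pod in their setup.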

Logs

                I0120 19:42:19.117325       1 main.go:88] entrypoint "msg"="initializing metrics backend" "backend"="prometheus"
                I0120 19:42:19.117395       1 main.go:95] entrypoint "msg"="setting up manager" "userAgent"="azure-workload-identity/webhook/v0.15.0 (linux/amd64) 9e27154/2022-12-14-01:21"
                I0120 19:42:19.795689       1 listener.go:44] controller-runtime/metrics "msg"="Metrics server is starting to listen" "addr"=":8095"
                I0120 19:42:19.795895       1 main.go:115] entrypoint "msg"="setting up cert rotation" 
                I0120 19:42:19.796043       1 main.go:147] entrypoint "msg"="starting manager" 
NOT READY -->   I0120 19:42:19.796301       1 internal.go:366]  "msg"="Starting server" "addr"={"IP":"::","Port":9440,"Zone":""} "kind"="health probe"
                I0120 19:42:19.796312       1 rotator.go:204] cert-rotation "msg"="starting cert rotator controller" 
                I0120 19:42:19.796375       1 internal.go:366]  "msg"="Starting server" "addr"={"IP":"::","Port":8095,"Zone":""} "kind"="metrics" "path"="/metrics"
                I0120 19:42:19.796423       1 controller.go:185]  "msg"="Starting EventSource" "controller"="cert-rotator" "source"="&{{%!s(*v1.Secret=&{{ } {      0 {{0 0 <nil>}} <nil> <nil> map[] map[] [] [] []} <nil> map[] map[] }) %!s(*cache.informerCache=&{0xc00044b4e0}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"
                I0120 19:42:19.796451       1 controller.go:185]  "msg"="Starting EventSource" "controller"="cert-rotator" "source"="&{{%!s(*unstructured.Unstructured=&{map[apiVersion:admissionregistration.k8s.io/v1 kind:MutatingWebhookConfiguration]}) %!s(*cache.informerCache=&{0xc00044b4e0}) %!s(chan error=<nil>) %!s(func()=<nil>)}}"
                I0120 19:42:19.796463       1 controller.go:193]  "msg"="Starting Controller" "controller"="cert-rotator"
                I0120 19:42:19.896669       1 rotator.go:245] cert-rotation "msg"="refreshing CA and server certs" 
                I0120 19:42:19.896755       1 controller.go:227]  "msg"="Starting workers" "controller"="cert-rotator" "worker count"=1
                E0120 19:42:19.896848       1 rotator.go:674] cert-rotation "msg"="secret is not well-formed, cannot update webhook configurations" "error"="Cert secret is not well-formed, missing ca.crt" 
                I0120 19:42:21.046113       1 rotator.go:722] cert-rotation "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
                I0120 19:42:21.065990       1 rotator.go:722] cert-rotation "msg"="Ensuring CA cert" "gvk"={"Group":"admissionregistration.k8s.io","Version":"v1","Kind":"MutatingWebhookConfiguration"} "name"="azure-wi-webhook-mutating-webhook-configuration"
                E0120 19:42:21.427249       1 rotator.go:247] cert-rotation "msg"="could not refresh CA and server certs" "error"="Operation cannot be fulfilled on secrets \"azure-wi-webhook-server-cert\": the object has been modified; please apply your changes to the latest version and try again" 
                I0120 19:42:21.445770       1 rotator.go:271] cert-rotation "msg"="no cert refresh needed" 
(Sometimes this takes minutes...)
                I0120 19:42:21.445841       1 rotator.go:757] cert-rotation "msg"="certs are ready in /certs" 
                I0120 19:42:21.445866       1 rotator.go:777] cert-rotation "msg"="CA certs are injected to webhooks" 
                I0120 19:42:21.445882       1 main.go:159] entrypoint "msg"="setting up webhook server" 
                I0120 19:42:21.445911       1 main.go:163] entrypoint "msg"="registering webhook to the webhook server" 
                I0120 19:42:21.445992       1 server.go:216] controller-runtime/webhook/webhooks "msg"="Starting webhook server" 
                I0120 19:42:21.446015       1 server.go:148] controller-runtime/webhook "msg"="Registering webhook" "path"="/mutate-v1-pod"
                I0120 19:42:21.446177       1 certwatcher.go:131] controller-runtime/certwatcher "msg"="Updated current TLS certificate" 
READY HERE! --> I0120 19:42:21.446276       1 server.go:270] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443
                I0120 19:42:21.446302       1 certwatcher.go:85] controller-runtime/certwatcher "msg"="Starting certificate watcher" 
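
To quantify the window in which the pod can be reported ready while the webhook is not yet serving, the two marked log lines can be compared directly. The following is a minimal sketch: the helper names are hypothetical, the regular expression assumes the klog-style prefix seen above, and it assumes both lines fall on the same day. For the two lines in this report the gap is roughly 1.65 seconds.

```python
import re
from datetime import datetime

# Matches the klog prefix, e.g. "I0120 19:42:19.796301", and captures the time.
KLOG_TIME = re.compile(r"[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})")


def klog_time(line):
    """Extract the time-of-day from a klog-formatted line."""
    match = KLOG_TIME.search(line)
    if match is None:
        raise ValueError(f"no klog timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%H:%M:%S.%f")


def startup_gap_seconds(lines,
                        start_marker='"kind"="health probe"',
                        ready_marker='"msg"="Serving webhook server"'):
    """Seconds between the health probe server starting and the webhook serving."""
    start = next(klog_time(line) for line in lines if start_marker in line)
    ready = next(klog_time(line) for line in lines if ready_marker in line)
    return (ready - start).total_seconds()


# The two annotated lines from the log above.
SAMPLE = [
    'NOT READY -->   I0120 19:42:19.796301       1 internal.go:366]  "msg"="Starting server" "addr"={"IP":"::","Port":9440,"Zone":""} "kind"="health probe"',
    'READY HERE! --> I0120 19:42:21.446276       1 server.go:270] controller-runtime/webhook "msg"="Serving webhook server" "host"="" "port"=9443',
]

if __name__ == "__main__":
    print(f"gap: {startup_gap_seconds(SAMPLE):.2f} s")
```

Running the same calculation over logs from a slow start would show how long dependent pods can slip through unmutated.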

Environment

  • Kubernetes version (use kubectl version):
    Client Version: v1.25.2
    Kustomize Version: v4.5.7
    Server Version: v1.24.6
  • Cloud provider or hardware configuration: AKS

Additional context

Cert resolution is sometimes fairly quick (as in this instance) but can occasionally take several minutes.

We are using Terraform for our Helm deployment.

resource "helm_release" "azwi-webhook" {
    name        = "workload-identity-webhook"
    namespace   = local.azwi_namespace
    chart       = "workload-identity-webhook"
    repository  = "https://azure.github.io/azure-workload-identity/charts"
    wait        = true
    
    depends_on  = [ 
        kubernetes_namespace.azwi,
        kubernetes_service_account.app,
        azuread_application_federated_identity_credential.azwi
    ]

    set {
        name    = "azureTenantID"
        value   = data.azuread_client_config.current.tenant_id
    }
}
@chkohner chkohner added the bug Something isn't working label Jan 20, 2023