
Webhook is unable to recreate the MutatingWebhookConfiguration object if it gets deleted #286

Open
tomleb opened this issue Sep 1, 2023 · 3 comments

@tomleb
Contributor

tomleb commented Sep 1, 2023

Summary

While working on rancher/rancher#41789, I wanted to check that the webhook still worked correctly after some dependency changes. I deleted both the mutating and the validating webhook configurations, expecting the webhook to recreate both, but it only recreated the validating one. Here's what I see in the logs:

time="2023-09-01T20:08:33Z" level=error msg="error syncing 'cattle-system/cattle-webhook-ca': handler secrets: failed to create mutating configuration: MutatingWebhookConfiguration.admissionregistration.k8s.io \"\" is invalid: metadata.name: Required value: name or generateName is required, requeuing"

This patch fixes the issue for me, so I'll open a PR:

diff --git a/pkg/server/server.go b/pkg/server/server.go
index 1c3f7418332f..1645ddcfe54f 100644
--- a/pkg/server/server.go
+++ b/pkg/server/server.go
@@ -275,9 +275,9 @@ func (s *secretHandler) ensureWebhookConfiguration(validatingConfig *v1.Validati
 		}
 	}
 
-	currMutation, err := s.mutatingController.Get(validatingConfig.Name, metav1.GetOptions{})
+	currMutation, err := s.mutatingController.Get(mutatingConfig.Name, metav1.GetOptions{})
 	if apierrors.IsNotFound(err) {
-		_, err = s.mutatingController.Create(currMutation)
+		_, err = s.mutatingController.Create(mutatingConfig)
 		if err != nil {
 			return fmt.Errorf("failed to create mutating configuration: %w", err)
 		}
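The log line above shows Create being called with an object whose name is empty. A plausible reading, assuming the controller's Get behaves like a typed client-go client and returns an empty, non-nil object together with the not-found error, is that the old code passed that empty object straight to Create. In the reproduction steps both configurations are named rancher.cattle.io, so the lookup by validatingConfig.Name happened to use the right name; the Create argument is what actually broke. The Go sketch below models this with a hypothetical in-memory client (WebhookConfig, fakeClient, ensureBuggy and ensureFixed are illustrative names, not the webhook's real types):

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for the Kubernetes "not found" API
// error; the real code checks it with apierrors.IsNotFound.
var errNotFound = errors.New("not found")

// WebhookConfig is a hypothetical, stripped-down stand-in for
// MutatingWebhookConfiguration: only the name matters for this bug.
type WebhookConfig struct {
	Name string
}

// fakeClient models the assumed client behavior: a failed Get returns an
// empty, non-nil object together with the error.
type fakeClient struct {
	objects map[string]*WebhookConfig
}

func (c *fakeClient) Get(name string) (*WebhookConfig, error) {
	if obj, ok := c.objects[name]; ok {
		found := *obj
		return &found, nil
	}
	return &WebhookConfig{}, errNotFound
}

// Create rejects nameless objects, mirroring the API server validation error
// seen in the log.
func (c *fakeClient) Create(obj *WebhookConfig) (*WebhookConfig, error) {
	if obj.Name == "" {
		return nil, errors.New("metadata.name: Required value: name or generateName is required")
	}
	c.objects[obj.Name] = obj
	return obj, nil
}

// ensureBuggy reproduces the pre-patch logic: on not-found it creates the
// (empty) object returned by the failed Get.
func ensureBuggy(c *fakeClient, desired *WebhookConfig) error {
	curr, err := c.Get(desired.Name)
	if errors.Is(err, errNotFound) {
		if _, err = c.Create(curr); err != nil {
			return fmt.Errorf("failed to create mutating configuration: %w", err)
		}
		return nil
	}
	return err
}

// ensureFixed mirrors the patch: on not-found it creates the desired object.
func ensureFixed(c *fakeClient, desired *WebhookConfig) error {
	_, err := c.Get(desired.Name)
	if errors.Is(err, errNotFound) {
		if _, err = c.Create(desired); err != nil {
			return fmt.Errorf("failed to create mutating configuration: %w", err)
		}
		return nil
	}
	return err
}

func main() {
	desired := &WebhookConfig{Name: "rancher.cattle.io"}

	buggy := &fakeClient{objects: map[string]*WebhookConfig{}}
	fmt.Println("buggy:", ensureBuggy(buggy, desired))

	fixed := &fakeClient{objects: map[string]*WebhookConfig{}}
	fmt.Println("fixed:", ensureFixed(fixed, desired))
}
```

Running it, the buggy path fails with the same "name or generateName is required" error as the log, while the fixed path returns nil and stores the configuration; calling ensureFixed again is a no-op because Get now finds the object.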

Reproducing

Delete both webhook configurations

kubectl delete mutatingwebhookconfiguration rancher.cattle.io
kubectl delete validatingwebhookconfiguration rancher.cattle.io 

Wait a little while so that the webhook tries to recreate them.

You'll see that the ValidatingWebhookConfiguration exists but the MutatingWebhookConfiguration does not.

EDIT: Note that this can also be reproduced by following the official instructions for rotating expired webhook certificates: https://ranchermanager.docs.rancher.com/v2.7/troubleshooting/other-troubleshooting-tips/expired-webhook-certificate-rotation

@jrwhetse

jrwhetse commented Dec 9, 2023

I'm running into the same issue. I recreated the rancher.cattle.io MutatingWebhookConfiguration and deleted the rancher-webhook pod, and the pod now restarts correctly. However, my downstream cluster is still reporting:

Internal error occurred: failed calling webhook "rancher.cattle.io.namespaces.create-non-kubesystem": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation/namespaces?timeout=10s": context deadline exceeded

Any idea how to kick-start the downstream cluster so that it checks back in?

@tomleb
Contributor Author

tomleb commented Dec 11, 2023

Internal error occurred: failed calling webhook "rancher.cattle.io.namespaces.create-non-kubesystem": failed to call webhook: Post "https://rancher-webhook.cattle-system.svc:443/v1/webhook/validation/namespaces?timeout=10s": context deadline exceeded

That seems to be a separate issue. This error message points to /webhook/validation, but in your case it was the MutatingWebhookConfiguration that was recreated. The error also shows that the request is being made to the validating webhook and is timing out.

@jrwhetse

According to rancher/rancher#42611, this issue is a duplicate. In my case, I followed the solution in rancher/rancher#42611 (comment) and was able to get the rancher-webhook started again. Sorry for any confusion.
