
conformance: Test that webhook validations are performed #1514

Closed
youngnick opened this issue Nov 8, 2022 · 11 comments · Fixed by #1552
Labels
area/conformance, help wanted, kind/feature, priority/important-soon

Comments

@youngnick
Contributor

What would you like to be added:
This issue covers adding conformance tests that verify the implementation enforces the same validation checks as the webhook. The simplest way to do this is to run the webhook we provide, but the conformance tests should be somewhat agnostic about how that happens.

However, I believe that the conformance test should check that invalid objects are never persisted into the cluster, by trying to apply them and checking that the apply fails. That flow will require some sort of webhook, or possibly the extensions to CRD validation that api-machinery is considering (whose name I can't remember right now).

A secondary part of this effort should also clarify some guidelines about what controllers should do with invalid objects - hopefully we can end up with this being as easy as "run a Validate function on the object, and drop the object if it fails" or something.
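The "run a Validate function on the object, and drop the object if it fails" flow could be sketched roughly as follows. All type names, field names, and the validation rule here are hypothetical stand-ins for illustration, not the actual Gateway API types or checks:

```go
package main

import (
	"errors"
	"fmt"
)

// HTTPHeaderFilter is a hypothetical stand-in for a Gateway API object;
// the real types live in sigs.k8s.io/gateway-api/apis.
type HTTPHeaderFilter struct {
	Set []string
	Add []string
}

// validate sketches the kind of check a webhook might perform; this is
// not the project's actual validation code.
func validate(f HTTPHeaderFilter) error {
	if len(f.Set) == 0 && len(f.Add) == 0 {
		return errors.New("header filter must set or add at least one header")
	}
	return nil
}

// reconcile sketches the "validate, then drop on failure" flow described
// above: invalid objects are excluded from configuration.
func reconcile(f HTTPHeaderFilter) bool {
	if err := validate(f); err != nil {
		fmt.Println("dropping invalid object:", err)
		return false
	}
	return true
}

func main() {
	fmt.Println(reconcile(HTTPHeaderFilter{}))                         // false: dropped
	fmt.Println(reconcile(HTTPHeaderFilter{Add: []string{"x-test"}})) // true: accepted
}
```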

Why this is needed:
This will make it very clear that the webhook is required for having a conformant implementation.

This issue came up during discussion on #1497, but I think I've mentioned it before without writing it down. Sorry all.

@youngnick youngnick added kind/feature Categorizes issue or PR as related to a new feature. area/conformance labels Nov 8, 2022
@shaneutt shaneutt added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 8, 2022
@rainest
Contributor

rainest commented Nov 8, 2022

If there is a case where we expect that objects are accepted and reach controllers, how to test that behavior isn't immediately obvious. For example, in the case of #1497, if guidance was:

are invalid and will be rejected by the webhook if it is installed. Implementations MUST exclude these resources from configuration and add an `Accepted=false` condition with a `HeaderFilterNotValid` reason if the webhook is not running.

We can only test one or the other in a given run. If the webhook is running, the resource simply won't exist, and can't have conditions as such. If it is not running, tests can check conditions, but any test to confirm the webhook behavior (that the resource is rejected at creation time) would then fail. This would basically require two suites of conformance tests for behavior with or without the webhook, correct?

@youngnick
Contributor Author

Sigh, yes, you're right.

I think that if we're going to do this, we should lean in and say "we're not going to mandate anything about what happens if the webhook is not running, sorry".

@shaneutt
Member

shaneutt commented Nov 10, 2022

Sigh, yes, you're right.

I think that if we're going to do this, we should lean in and say "we're not going to mandate anything about what happens if the webhook is not running, sorry".

This is inherently a problem with being out of tree, yeah?

If I'm not mistaken, at Monday's meeting Travis was basically asking what we recommend for getting the webhook deployed on operators' clusters, e.g. whether we recommend the webhook be loaded in with the existing (Go) operator. Today the onus is partially on cluster operators to ensure the webhook is installed, but it might be significantly better if implementations packaged it. I know there's some hesitation to force everyone to run the webhook, and there are downsides (particularly with multiple implementations), but perhaps we should explore putting the burden of ensuring the webhook's presence more heavily on the implementation. That way cluster operators would have to opt out to get into grey territory, rather than being in grey territory by default (which may inevitably get us into trouble).

@rainest
Contributor

rainest commented Nov 10, 2022

Putting responsibility on implementations to run it, either as a separate container or integrated into a Golang controller, seems reasonable; that's basically what this test would enforce.

This is inherently a problem with being out of tree, yeah?

Would the situation change at all post-GA? I know that Ingress has more complex validation than schema enforcement alone, and it seems like we'd be able to add most if not all validation rules in the webhook there.

I don't know if maybe there's an effort to avoid adding additional validations there going forward, but if we would eventually be able to add them there, we'd be able to plan for eventually not worrying about the webhook being installed separately.

@robscott
Member

I've been thinking about this for some time now, and I tend to lean in a different direction. I think that we need to differentiate between a "conformant environment" and a "conformant implementation". In some instances, these are going to be one and the same, but in others, the controller author only has so much control. I'm not sure how useful it is for our controller-focused conformance tests to require the webhook to be present.

I'd rather test that controllers don't blow up when presented with invalid resources. Most controllers can't control how they're installed, they can simply provide a recommended set of steps. Our docs currently state the following:

CRD and webhook validation is not the final validation i.e. webhook is "nice UX" but not schema enforcement. This validation is intended to provide immediate feedback to users when they provide an invalid configuration. Write code defensively...

I still think the webhook is valuable and improves the UX of Gateway API dramatically, but I'm not convinced that it can/should be relied on or required to be present when running conformance tests.

markmc added a commit to markmc/gateway-api that referenced this issue Nov 14, 2022
Relates to kubernetes-sigs#1514

This is a proof-of-concept conformance test that checks for the
validation behavior provided by the admission webhook. In this case
we're just checking that an update to a GatewayClass controller name
is rejected.

We use a temporary GatewayClass rather than the one specified by the
--gateway-class flag, to avoid interfering with installations that do
allow this update through.

This does raise an interesting question though: isn't it strange that
we're saying an implementation is required to validate objects that it
should otherwise ignore? In this case, we would otherwise say an
implementation should ignore this object because it doesn't recognize
the controller name.

Failure looks like this:

```
=== RUN   TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable
    gatewayclass-admission-validation.go:62:
        	Error Trace:	gatewayclass-admission-validation.go:62
        	Error:      	Should be in error chain:
        	            	expected: %!q(**errors.StatusError=0xc000014008)
        	            	in chain:
        	Test:       	TestConformance/GatewayClassAdmissionValidation/GatewayClass_controllerName_is_immutable
        	Messages:   	updating gatewayclass-immutable GatewayClass.Spec.ControllerName should not be permitted
```
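For illustration, the immutability rule this proof-of-concept exercises could be sketched with plain Go types. These are hypothetical stand-ins for the real gatewayv1.GatewayClass type, not the project's actual validation code:

```go
package main

import (
	"errors"
	"fmt"
)

// GatewayClass is a minimal stand-in carrying only the field this
// sketch validates.
type GatewayClass struct {
	ControllerName string
}

// validateUpdate sketches the immutability rule the webhook enforces:
// spec.controllerName may not change after creation.
func validateUpdate(old, updated GatewayClass) error {
	if old.ControllerName != updated.ControllerName {
		return errors.New("spec.controllerName is immutable")
	}
	return nil
}

func main() {
	old := GatewayClass{ControllerName: "example.com/ctrl"}
	fmt.Println(validateUpdate(old, GatewayClass{ControllerName: "example.com/ctrl"})) // <nil>
	fmt.Println(validateUpdate(old, GatewayClass{ControllerName: "other.com/ctrl"}))   // error
}
```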
@markmc
Contributor

markmc commented Nov 14, 2022

Just for discussion purposes, I put up #1534

Some thoughts:

  • It is definitely highly desirable to have validation like this - even the simple example of immutable GatewayClass.spec.controllerName shows how valuable it is in terms of UX
  • If we believe this should usually be achieved by installing our webhook, then this does feel like it's more about a "conformant environment" than a "conformant implementation"
  • If we want to check for a "conformant environment", it does seem a bit pointless to test all of the behaviors of the webhook - we can test the webhook before we publish its image!
  • In gateway-api: Add opt-in validating webhook istio/istio#41868 the proposal (AIUI) is to use library code from gateway-api to provide some admission validation checks - if we were providing a comprehensive validation library instead of an installable webhook, it might feel a bit more like we're trying to enforce that "conformant implementations" are providing these validations?
  • The GatewayClass.spec.controllerName example shows another weirdness with the "conformant implementation" approach - why would we require an implementation to validate objects that it would normally ignore (i.e. because it doesn't recognize the controller name) in order to be conformant? You'd expect we'd only require an implementation to validate objects it is accepting?

HTH

@howardjohn
Contributor

I feel like we are testing a conformant environment. The environment really is the implementation. If we look at Kubernetes, for example, we aren't just testing some api-server binary; we are testing "Kubernetes", which is an environment made up of many components. Gateway API is a much smaller environment, but fairly similar.

An environment may get the validation from the built in installable webhook, the Istio library import, or even something else - all that matters is it validates.

controllerName

I think it's just a quirk of mixing global resources and shared resources with multiple implementations; the lines get a bit blurred. For example, you may have one validator for the entire cluster (in which case you want cross-class validation), or you may have N implementations doing validations (in which case you might want same-class-only validation). Trying to make sure all environments have coverage of all resources is tricky, especially because Kubernetes isn't smart enough to know about "class" in the webhook, so it will still need to call all the validators; they could just choose to no-op. But if they are already called, it feels like they might as well validate. That does make validation potentially stricter (if I have multiple versions, we are applying both versions of validation at once), but that feels OK to me.

So overall, I think it seems weird if we just look at GatewayClass, but it still feels like the best option to me, and it's less weird for other fields.

@youngnick
Contributor Author

I think we've definitely had some good points raised here, so here's a new proposal from me.

What I'm really aiming for is this: the webhook currently implements some important checks for various types of object safety. I'd like to verify that a conformant implementation is doing some form of the same checks. It's a much better user experience if the invalid objects never make it into the apiserver (this is what the validation webhook is doing), but I think it's probably okay for implementations to not accept the invalid objects as well (this will handle @robscott's concern about invalid objects somehow making it through).

What I think we should work towards is a set of conformance tests (maybe a separate suite) that test that invalid objects (as defined in the webhook code) are either not accepted by the apiserver (i.e., the webhook, or some other webhook performing the same checks, is running), or that the objects get an Accepted condition with status: false and a reason of FailedValidation (this is the case where the webhook is not running, but the implementation performs the same validations and updates the status accordingly).

For this to work, we'll need to ensure it's easy to import the validation code and either:

  • use it in a validation webhook
  • validate objects as part of normal implementation functions.

We already include the validation logic in release bundles, but we could make this easier by providing a simplified public API that is easier to call (maybe a Validate function that takes a Kubernetes object.Object, typecasts it, and runs the type-specific checks?), and moving our webhook to use that API.
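A simplified public Validate entry point along these lines might look roughly like this sketch. It uses local stand-in types instead of the real API types, an empty interface instead of Kubernetes' object interface, and invented example rules:

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical stand-ins for the real gatewayv1 types.
type GatewayClass struct{ ControllerName string }
type HTTPRoute struct{ Hostnames []string }

// Validate sketches the proposed single entry point: one function that
// type-switches on the object and dispatches to type-specific checks.
// The rules shown are invented for illustration.
func Validate(obj interface{}) error {
	switch o := obj.(type) {
	case *GatewayClass:
		if o.ControllerName == "" {
			return errors.New("spec.controllerName must not be empty")
		}
	case *HTTPRoute:
		if len(o.Hostnames) > 16 {
			return errors.New("spec.hostnames has too many entries")
		}
	default:
		return fmt.Errorf("unsupported type %T", obj)
	}
	return nil
}

func main() {
	fmt.Println(Validate(&GatewayClass{ControllerName: "example.com/ctrl"})) // <nil>
	fmt.Println(Validate(&GatewayClass{}))                                   // error
}
```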

Alternatively, we have a pretty clear pattern for validation functions in our current validation package. We could stick with what we have, then just mandate that:

  • implementations need to validate resources as part of reconciliation, marking invalid resources as not Accepted with a ValidationFailed reason.
  • or, implementations can ensure the webhook is running
  • or both.

This approach would let the conformance tests validate the important thing - that the restrictions described in the webhook, which aren't easily expressible in field-level validations, are in effect. Whether those restrictions are done in a webhook or in the implementation doesn't seem as important to me as that they happen somewhere.

@robscott
Member

robscott commented Nov 21, 2022

Not fixed yet, looks like this was closed by an unrelated PR.

@shaneutt
Member

This is still something we'd like to have:

/help

And ideally we would have it prior to our GA release:

/priority important-soon

As such we'll drop it in the v1.0.0 milestone and see if we can get progress on it soon.

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 10, 2023
@shaneutt shaneutt added this to the v1.0.0 milestone Apr 10, 2023
@shaneutt shaneutt removed this from the v1.0.0 milestone Sep 27, 2023
@shaneutt
Member

We have deprecated the webhook, so it would seem we don't need to prioritize this any longer.

@shaneutt shaneutt closed this as not planned Mar 12, 2024