Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEP-1897: TLS from Gateway to Backend for ingress #1906

Merged
merged 4 commits into from
May 1, 2023

Conversation

candita
Copy link
Contributor

@candita candita commented Apr 3, 2023

What type of PR is this?
/kind gep

What this PR does / why we need it:
There is a well-known gap in the API specification regarding TLS from Gateway to Backend for ingress.

This GEP is very tightly scoped because we have tried and failed to address this well-known gap in the API specification. The lack of support for this fundamental concept is holding back Gateway API adoption by users that require a solution to the use case.

Which issue(s) this PR fixes:

Fixes #1897

Does this PR introduce a user-facing change?:

NONE

@k8s-ci-robot k8s-ci-robot added kind/gep PRs related to Gateway Enhancement Proposal(GEP) cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 3, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @candita. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 3, 2023
@bowei
Copy link
Contributor

bowei commented Apr 3, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 3, 2023
geps/gep-1897.md Outdated Show resolved Hide resolved
geps/gep-1897.md Outdated Show resolved Hide resolved
Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great example of a "what and why without the how" initial GEP to start back up the conversation on this topic without having too many details to comb through right away. As a maintainer I particularly appreciate the iterative approach in the face of our focus on GA as it helps to spread the cognitive load over smaller bits of time, and makes it more feasible for me to review in between other priorities.

The suggestion that this is an important feature for multiple implementations rings true with me, the goals and non-goals make sense, and I very much appreciate you meticulously laying out the long and troubled history of this effort's past.

As I have no blockers and this is Provisional with more iterations to come, I approve and I'm looking forward to seeing the follow-ups.

As has become customary with GEPs recently I'll place a temporary hold so that we can allow some time for several people to review without it merging too fast.

/hold

Thanks @candita!

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Apr 4, 2023
geps/gep-1897.md Outdated Show resolved Hide resolved
geps/gep-1897.md Outdated Show resolved Hide resolved
Copy link

@costinm costinm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a valid use case and it is ok to start with a narrow set of use cases ( as long as we believe the API choice wouldn't prevent or make difficult the other use cases).

However I would requiest that "Prior art" section is fixed and we include the actual
APIs in current use by different LB solutions. Past internal discussions are not prior art.
Once we have a clear picture of what various implementations currently do and support - we will have a much easier time defining our API.

geps/gep-1897.md Outdated Show resolved Hide resolved
@candita
Copy link
Contributor Author

candita commented Apr 6, 2023

@howardjohn @costinm @bowei I have highlighted some goals as immediate and longer term, as well as added some details on prior art in several implementations. PTAL when you get a chance.

geps/gep-1897.md Show resolved Hide resolved
geps/gep-1897.md Outdated
Comment on lines 42 to 44
5. Termination of TLS for HTTP routing (#1 in [Gateway API TLS Use Cases](#references))
6. HTTPS passthrough use cases (#2 in [Gateway API TLS Use Cases](#references))
7. Termination of TLS for non-HTTP TCP streams (#3 in [Gateway API TLS Use Cases](#references))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning these as "already solved" may be more clear than "Non-goals".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to cover all the use cases in the referenced TLS use cases document. Sound fair?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine to list them, its just a bit to call something a non-goal when its already done; usually non-goals are for things that are not yet done but not in scope.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @howardjohn here that it would be better to distinguish between "already solved" and "not yet done but not in scope".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add a section "Already Solved TLS Use Cases" to isolate these 3 cases.

geps/gep-1897.md Outdated
6. HTTPS passthrough use cases (#2 in [Gateway API TLS Use Cases](#references))
7. Termination of TLS for non-HTTP TCP streams (#3 in [Gateway API TLS Use Cases](#references))
8. Controlling certificates used by more than one workload (#6 in [Gateway API TLS Use Cases](#references))
9. Client certificate settings used in the TLS handshake **from external clients to the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this. Its not possible for us to control and external client?

I get its a Non-Goal; but checking my understanding here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is use case 7 , which @youngnick mentioned is "particularly useful for JWT use cases, and can make doing some types of authentication and authorization easier".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the word "handshake".

geps/gep-1897.md Outdated Show resolved Hide resolved
backend pod). As mentioned, this solution satisfies the use case in which the backend pod
has its own certificate and the gateway client needs to know how to connect to the backend pod.

![image depicting TLS termination types](images/1897-TLStermtypes.png "TLS termination types")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nit, would be nice to have one showing terminating but not originating.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, this is the exact opposite of the first feedback I got on this diagram.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally had the three termination types separated in a diagram, and got feedback from @robscott that it made more sense to combine these two in one as a contrast to the passthrough route. I can't please both of you, but hope it's okay as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah for a bit more context it was very difficult to show all possible forms here. We'd essentially need to cover all of the following options:

  1. Gateway termination, no origination
  2. Gateway termination, origination
  3. No Gateway termination, origination
  4. No Gateway termination, no origination
  5. Passthrough

The diagram that we ended up with feels simpler than anything that tried to represent all possible cases, and representing a slightly larger subset could just lead to confusion.

geps/gep-1897.md Outdated Show resolved Hide resolved
geps/gep-1897.md Outdated Show resolved Hide resolved

Details deferred until we reach consensus on what we want to do, and why we want to do this.

## Prior Art
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So:

  • Client decides TLS: Istio, Ambassador, OpenShift
  • Server (service) decides TLS: Contour, GKE

Is that a fair representation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Ambassador relies on the service configuration, so I would put it with the second bullet.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK Istio is both - with the new auto-mtls clearly in 'server decide' category.

And the service ultimately decides everywhere - if the service has a TLS port, there is nothing a client can do to change the fact that only TLS connections will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn @costin If there's something you want me to add to summarize this better, or you want me to point out the two categorizations, please let me know.


TLS from gateway to backend for ingress exists in several implementations, and was developed independently.

### Istio Gateway supports this with a DestinationRule:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DestinationRule is mainly using this for egress destinations.

For Gateway -> backend, Istio (now) uses auto-mtls, with the root CAs distributed as a config map and not explicitly configured by users.

Using tls.mode=SIMPLE and the rest for internal traffic is a rarely used config - and typically breaks Istio telemetry and policy features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What action would you like to see taken here?

@youngnick
Copy link
Contributor

/lgtm for provisional.

/hold for more reviewers though.

@robscott
Copy link
Member

Would like to see a couple small follow ups on comment threads (#1906 (comment) and #1906 (comment)), but otherwise this LGTM. Thanks @candita!

@candita
Copy link
Contributor Author

candita commented Apr 27, 2023

@howardjohn @costinm @robscott @bowei @shaneutt I have made the following changes:

  • Removed already-solved use cases from non-goals
  • Added these uses cases to a new section called Already Solved TLS Use Cases
  • Added a new section called Open Questions (TODO) listing comments from @bowei and @costinm

@candita
Copy link
Contributor Author

candita commented Apr 27, 2023

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Apr 27, 2023
@robscott
Copy link
Member

Will leave this for someone else to add the final LGTM, thanks for all the work on this @candita!

/approve

Copy link
Member

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 1, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita, robscott, shaneutt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@youngnick
Copy link
Contributor

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 1, 2023
@k8s-ci-robot k8s-ci-robot merged commit 10899cc into kubernetes-sigs:main May 1, 2023
@shaneutt shaneutt added this to the v1.0.0 milestone May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/gep PRs related to Gateway Enhancement Proposal(GEP) lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

GEP: TLS from Gateway to Backend for ingress (backend TLS termination)
9 participants