VAULT-28192 fix Agent and Proxy consuming large amounts of CPU for auto-auth self-healing #27518

VioletHynes · 2024-06-17T14:57:32Z

Description

Fixes an issue introduced in 1.17 where CPU usage in Agent and Proxy is extremely high because the code repeatedly takes the same path through a select statement, effectively an infinite loop.

Will be backported to 1.17.

Fixes #27505

TODO only if you're a HashiCorp employee

Labels: If this PR is the CE portion of an ENT change, and that ENT change is
getting backported to N-2, use the new style backport/ent/x.x.x+ent labels
instead of the old style backport/x.x.x labels.
Labels: If this PR is a CE only change, it can only be backported to N, so use
the normal backport/x.x.x label (there should be only 1).
ENT Breakage: If this PR either 1) removes a public function OR 2) changes the signature
of a public function, even if that change is in a CE file, double check that
applying the patch for this PR to the ENT repo and running tests doesn't
break any tests. Sometimes ENT only tests rely on public functions in CE
files.
Jira: If this change has an associated Jira, it's referenced either
in the PR description, commit message, or branch name.
RFC: If this change has an associated RFC, please link it in the description.
ENT PR: If this change has an associated ENT PR, please link it in the
description. Also, make sure the changelog is in this PR, not in your ENT PR.

github-actions · 2024-06-17T15:10:38Z

CI Results:
All Go tests succeeded! ✅

github-actions · 2024-06-17T16:07:16Z

Build Results:
All builds succeeded! ✅

command/agent/template/template.go

jasonodonnell

LGTM

divyaac · 2024-06-18T00:38:54Z

command/agent/template/template.go

-					invalidTokenCh <- err
-				}
-			default:
+		case err := <-ts.runner.ServerErrCh:


I wonder if this is a possible scenario:

1. IncomingToken receives a new token at the same time the template sends an error back to ServerErrCh.
2. We reauthenticate first, because the select honors ServerErrCh first. Now IncomingCh has two values in the channel.
3. We try using the first token in IncomingCh, but there is an error. Another error is sent to ServerErrCh.
4. Again, ServerErrCh is honored first and we reauthenticate.
5. We are now stuck in a loop where we always honor the token one behind the valid token.

VioletHynes · Jun 18, 2024

Hmm, that's a good point! I do think it's likely in this scenario that both tokens will be valid, but it's still not a great state to be in. I'll rework this to drain the incoming channel in the same place we drain the invalid token channel. I think that should prevent any looping.

I definitely understand why we had it the way we did before, but I do think this might be the best fix. The only situation where it would struggle is if the two channels are filled at exactly the same time.

Added here: https://github.com/hashicorp/vault/pull/27518/files#diff-90d2b6ef725d713c0515d4ae01b17766d425facd05e03283c3a6034ade76e7d7R262

divyaac · Jun 18, 2024

I like this!! Thanks for adding, Violet!

VAULT-28192 fix Agent and Proxy consuming large amounts of CPU for au…

229f74e

…to-auth self-healing

github-actions bot added the hashicorp-contributed-pr label Jun 17, 2024

VioletHynes added this to the 1.17.1 milestone Jun 17, 2024

Changelog

73655a3

VioletHynes added the backport/1.17.x label Jun 17, 2024

VioletHynes requested review from divyaac and jasonodonnell June 17, 2024 16:05

VioletHynes marked this pull request as ready for review June 17, 2024 16:05

Update changelog

a5eaea2

jasonodonnell reviewed Jun 17, 2024

command/agent/template/template.go

Merge branch 'main' into violethynes/VAULT-28192

9418def

jasonodonnell self-requested a review June 17, 2024 17:59

jasonodonnell approved these changes Jun 17, 2024

VioletHynes mentioned this pull request Jun 17, 2024

VAULT-28192 Add known issue for Agent/Proxy CPU issue #27520

Merged

6 tasks

divyaac reviewed Jun 18, 2024

drain incoming if we get invalid token

27f13f3

hc-github-team-secure-vault-core mentioned this pull request Jun 18, 2024

Backport of VAULT-28192 Add known issue for Agent/Proxy CPU issue into release/1.17.x #27525

Merged

6 tasks

VioletHynes requested a review from divyaac June 18, 2024 13:47

divyaac approved these changes Jun 18, 2024

VioletHynes merged commit 3959722 into main Jun 19, 2024
82 of 83 checks passed

VioletHynes deleted the violethynes/VAULT-28192 branch June 19, 2024 14:23

hc-github-team-secure-vault-core mentioned this pull request Jun 19, 2024

Backport of VAULT-28192 fix Agent and Proxy consuming large amounts of CPU for auto-auth self-healing into release/1.17.x #27544

Merged

6 tasks
