system-agent unnecessarily restarts when "gentle" conflict occurs #113

Open
Oats87 opened this issue May 4, 2023 · 0 comments
Oats87 commented May 4, 2023

There is a specific race condition in which the system-agent will unnecessarily restart and/or reapply a plan.

The following sequence can cause it:

  1. The CAPR planner delivers a new plan to the system-agent.
  2. The system-agent takes the plan, applies it, and updates the secret with the applied-checksum.
  3. The CAPR plansecret controller sees the updated plan secret and proceeds to update the appliedPlan on the secret.
  4. In the meantime, the system-agent has re-enqueued and is running the probes for the second iteration. When it finishes and tries to update the secret, the CAPR plansecret controller has already beaten it to the write, so the system-agent gets a conflict error due to a mismatched resourceVersion (RV).

The proposed fix is to retrieve the latest secret from the API server on conflict, ensure the applied checksum still matches the plan that was just applied, and if so, apply the update to the latest secret. This is safe because the system-agent and CAPR have a contract under which each is responsible for its own specific keys.
