Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data dir rfc #410

Merged
merged 9 commits into from
Jul 1, 2024
Merged

Data dir rfc #410

merged 9 commits into from
Jul 1, 2024

Conversation

jakefhyde
Copy link
Contributor

@jakefhyde jakefhyde commented Jun 19, 2024

Issue: rancher/rancher#45038

Problem

Solution

CheckList

  • Test
  • Docs

@jakefhyde jakefhyde requested a review from a team as a code owner June 19, 2024 02:04
go.mod Outdated Show resolved Hide resolved
Copy link
Contributor

@ericpromislow ericpromislow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have context, but found typos and don't expect a go.mod replace pointing to a personal path to ever work

@jakefhyde
Copy link
Contributor Author

@ericpromislow Updated. CI won't pass until rancher/rancher#45326 is merged anyways.

go.mod Outdated
Comment on lines 21 to 22
k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.29.3
k8s.io/endpointslice => k8s.io/endpointslice v0.29.3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intellisense fails without doing this.

@jakefhyde
Copy link
Contributor Author

Added tests and docs. Changed a bit structurally due to some edge cases found during unit testing, but logic is all still the same.

Oats87
Oats87 previously approved these changes Jun 28, 2024
Copy link

@Oats87 Oats87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level LGTM

Copy link
Contributor

@MbolotSuse MbolotSuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of suggestions on organization/function re-use. One other use case that I'm not sure that this addresses is changing the env var from one value to another - I think this permits that operation, but I'm not sure that's intentional.


#### Data Directories

For all `cluster.provisioning.cattle.io/v1` objects, prevent the creation of new objects with an env var (under
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Since the type is is already defined as part of the path/file name, we don't need to re-define it here.

Suggested change
For all `cluster.provisioning.cattle.io/v1` objects, prevent the creation of new objects with an env var (under
Prevent the creation of new objects with an env var (under

Comment on lines 161 to 166
return &metav1.Status{
Status: "Failure",
Message: `"CATTLE_AGENT_VAR_DIR" cannot be set within "cluster.Spec.RKEConfig.AgentEnvVars": use "cluster.Spec.RKEConfig.DataDirectories.SystemAgent"`,
Reason: metav1.StatusReasonBadRequest,
Code: http.StatusBadRequest,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and in several other places you are manually crafting the status field. This is a bit wordy (4 lines as opposed to 1) and doesn't explicitly set the Result.Allowed field. I would recommend using the webhook's admission package, which contains an ResponseBadRequest function here- this fills in a lot of these fields and sets the allowed flag on the response. You will need to return an AdmissionResponse object from your funciton, but I think that this will make it a lot more readable:

Suggested change
return &metav1.Status{
Status: "Failure",
Message: `"CATTLE_AGENT_VAR_DIR" cannot be set within "cluster.Spec.RKEConfig.AgentEnvVars": use "cluster.Spec.RKEConfig.DataDirectories.SystemAgent"`,
Reason: metav1.StatusReasonBadRequest,
Code: http.StatusBadRequest,
}
return admission.ResponseBadRequest( `"CATTLE_AGENT_VAR_DIR" cannot be set within "cluster.Spec.RKEConfig.AgentEnvVars": use "cluster.Spec.RKEConfig.DataDirectories.SystemAgent"`)

newCount++
}
}
if newCount > oldCount {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this works as intended. From what I can see here, I could change this value as long as I added/removed in a single step. For example, if I change CATTLE_AGENT_VAR_DIR from /var/lib/rke2 to /var/lib/new, that would work.

That being said, I'm not 100% sure that this is intentional - open to hearing otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly it's not super clear but it is functionally correct: this function is making sure that new CATTLE_AGENT_VAR_DIR vars are only allowed to be added, not removed. The other function is supposed to do validation, which is covered by this case:

	oldEnvVars := slices.DeleteFunc(slices.Clone(oldCluster.Spec.AgentEnvVars), func(envVar rkev1.EnvVar) bool {
		return envVar.Name != systemAgentVarDirEnvVar
	})
	// will be true up until rancher performs one time migration to set SystemAgent data directory
	if len(oldEnvVars) > 0 {
		newEnvVars := slices.DeleteFunc(slices.Clone(newCluster.Spec.AgentEnvVars), func(envVar rkev1.EnvVar) bool {
			return envVar.Name != systemAgentVarDirEnvVar
		})
		if reflect.DeepEqual(newEnvVars, oldEnvVars) && oldCluster.Spec.RKEConfig.DataDirectories.SystemAgent == newCluster.Spec.RKEConfig.DataDirectories.SystemAgent {
			// rancher migration has not been performed yet
			return admission.ResponseAllowed()
		}
		if newCluster.Spec.RKEConfig.DataDirectories.SystemAgent != oldEnvVars[len(oldEnvVars)-1].Value {
			return admission.ResponseBadRequest(
				fmt.Sprintf(`"%s" cannot be set within "cluster.Spec.RKEConfig.AgentEnvVars": use "cluster.Spec.RKEConfig.DataDirectories.SystemAgent"`, systemAgentVarDirEnvVar))
		}

so if we were defining the env var previously, and we haven't changed either the env var or data directory field in this update, assume it's fine (actually as I look at it we shouldn't allow this without validating the other data directories but you get the idea).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but the current method will only stop me from adding/removing the env var right? So if I just change it, it's allowed. From your reply we don't want that, it needs to be the same value as before correct?

Copy link
Contributor

@MbolotSuse MbolotSuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of small comments and 1 unresolved thread.

@@ -103,6 +113,14 @@ func (p *provisioningAdmitter) Admit(request *admission.Request) (*admissionv1.A
if err := p.validateCloudCredentialAccess(request, response, oldCluster, cluster); err != nil || response.Result != nil {
return response, err
}

if response = p.validateAgentEnvVars(request, oldCluster, cluster); response.Result != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would handle the "ok" cases differently than this. Right now this depends on response.Result being nil for ResponseAllowed() which I think is very dangerous long term. I would recommend one of the following options:

  • Only return a response if it fails validation and return nil otherwise. In this function, check response != nil to see if you should break with a response
  • In this function, check if !response.Allowed - this way, if it fails we short-circuit and return early.
  • Return a bool and an *admissionv1.AdmissionResponse. Use the bool to check if the response was allowed or not.

Big thing here is not to closely couple this code to what the allowed response does like it's currently doing.

if newCluster.Spec.RKEConfig == nil {
return admission.ResponseAllowed()
}
oldSystemAgentVarDirEnvVars := slices.DeleteFunc(slices.Clone(oldCluster.Spec.AgentEnvVars), func(envVar rkev1.EnvVar) bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It occurs to me that you could do this better with a simple extraction func to find the variable. Something like:

def getEnvVar(name string, envVars []rkev1.EnvVar) *rkev1.EnvVar{
    for _, envVar := range envVars{
        if envVar.Name == name{
            return &envVar
        }
    }
    return nil
}

Main benefit that we get here is that we are no longer cloning/modifying a slice, so I'd imagine the performance here is going to be a bit better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with this suggestion is that it returns the first occurrence of the env var, and we want the last, as it's possible to specify an env var multiple times.

Comment on lines 191 to 192
if !reflect.DeepEqual(newSystemAgentVarDirEnvVars, oldSystemAgentVarDirEnvVars) || oldCluster.Spec.RKEConfig.DataDirectories.SystemAgent != newCluster.Spec.RKEConfig.DataDirectories.SystemAgent {
if newCluster.Spec.RKEConfig.DataDirectories.SystemAgent != oldSystemAgentVarDirEnvVars[len(oldSystemAgentVarDirEnvVars)-1].Value {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think that this is a bit tough to follow. I think this would also be easier to write once the actual env var is extracted and you can move out of slices.

In addition, since the second if statement doesn't have an else (or any other additional logic). it can be combined into one if statement.

Copy link
Contributor

@MbolotSuse MbolotSuse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small test cases to add but outside of that LGTM

@MbolotSuse MbolotSuse dismissed ericpromislow’s stale review July 1, 2024 22:47

Dismissing as stale to permit merging after two other approvals.

@MbolotSuse
Copy link
Contributor

@jakefhyde please squash the commits when merging.

@jakefhyde jakefhyde merged commit 70f1273 into rancher:release/v0.5 Jul 1, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants