core: subvolumegroup clean up #14026
Conversation
@sp98 I thought we agreed on not duplicating code between Rook and the rook-ceph krew plugin. It looks like this PR is not importing any of the existing things from kubectl-rook-ceph. What's the plan going forward?
Agreed that duplicating the code would be bad. I'll discuss this in the huddle to check what the approach will be.
Force-pushed 383f127 to e7939c2
@@ -0,0 +1,25 @@
---
title: Ceph Cleanup
Instead of a new doc, how about a new section in ceph-teardown.md?
Moved it to the ceph-teardown.md doc.
Once the cleanup job is completed successfully, Rook will remove the finalizers from the deleted custom resource.

This cleanup is supported only for the following custom resources:
Since we're going to have multiple types of resources supported, it would be nice to have a table, one row for each resource, and its corresponding resources that will be cleaned up.
Using a table now.
cmd/rook/ceph/cleanup.go
Outdated
logger.Infof("starting clean up ceph SubVolumeGroup resource %q", subVolumeGroupName)

subVolumeList, err := cephclient.ListSubvolumesInGroup(context, clusterInfo, fsName, subVolumeGroupName)
Only the CLI processing should be in the `cmd` package. The rest of this code should likely move under `pkg/daemon/ceph`.
Moved to `pkg/daemon/ceph`.
pkg/operator/ceph/csi/spec.go
Outdated
@@ -143,8 +143,8 @@ var (
DefaultRegistrarImage = "registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.10.0"
DefaultProvisionerImage = "registry.k8s.io/sig-storage/csi-provisioner:v4.0.0"
DefaultAttacherImage = "registry.k8s.io/sig-storage/csi-attacher:v4.5.0"
DefaultSnapshotterImage = "registry.k8s.io/sig-storage/csi-snapshotter:v7.0.1"
DefaultResizerImage = "registry.k8s.io/sig-storage/csi-resizer:v1.10.0"
DefaultSnapshotterImage = "registry.k8s.io/sig-storage/csi-snapshotter:v6.3.2"
Was it intentional to downgrade these image versions?
Removed this change.
cmd/rook/ceph/cleanup.go
Outdated
}

var cleanUpDisksCmd = &cobra.Command{
Use: "disks",
Aren't we cleaning subvolumes, not disks?
We already had a `ceph clean` command to clean up the data on the disks. I've extended the `clean` command with more subcommands: `ceph clean disks` will use the existing data cleanup job (this is existing behavior), and `ceph clean CephFSSubVolumeGroup` will clean up the subvolume groups.
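A rough sketch of that subcommand layout with cobra (the command names and wiring here are illustrative, not necessarily the exact code in this PR):

```go
// Illustrative sketch only: a parent "clean" command with two subcommands.
package ceph

import (
	"fmt"

	"github.com/spf13/cobra"
)

var cleanUpCmd = &cobra.Command{
	Use:   "clean",
	Short: "Starts the cleanup process",
}

var cleanUpDisksCmd = &cobra.Command{
	Use:   "disks",
	Short: "Cleans up the data on the disks after the ceph cluster is deleted (existing behavior)",
	RunE: func(cmd *cobra.Command, args []string) error {
		fmt.Println("would run the existing disk sanitization job here")
		return nil
	},
}

var cleanUpSubVolumeGroupCmd = &cobra.Command{
	Use:   "CephFSSubVolumeGroup",
	Short: "Cleans up the subvolumes, snapshots, clones and OMAP details of a subvolume group",
	RunE: func(cmd *cobra.Command, args []string) error {
		fmt.Println("would run the subvolume group cleanup here")
		return nil
	},
}

func init() {
	cleanUpCmd.AddCommand(cleanUpDisksCmd, cleanUpSubVolumeGroupCmd)
}
```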
`ceph clean disks` is called by which container?
Force-pushed f09904b to cb0c42f
Force-pushed 6472c34 to 9c23c8a
@@ -2,6 +2,23 @@
title: Cleanup
---

## Cleaning up a Custom Resource
Suggested change:
- ## Cleaning up a Custom Resource
+ ## Force Delete Resources
@@ -2,6 +2,23 @@
title: Cleanup
---

## Cleaning up a Custom Resource

To cleanup a specific custom resource, add `ceph.rook.io/force-deletion="true"` annotation before deleting it. Rook will start a cleanup job that will delete the all the ceph resources created by that custom resource. For example, run the following commands to clean the `CephFSSubVolumeGroup` resource named `myfs-csi`:
Suggested change:
- To cleanup a specific custom resource, add `ceph.rook.io/force-deletion="true"` annotation before deleting it. Rook will start a cleanup job that will delete the all the ceph resources created by that custom resource. For example, run the following commands to clean the `CephFSSubVolumeGroup` resource named `myfs-csi`:
+ To keep your data safe in the cluster, Rook disallows deleting critical cluster resources by default. To override this behavior and force delete a specific custom resource, add the annotation `ceph.rook.io/force-deletion="true"` to the resource and then delete it. Rook will start a cleanup job that will delete the all the related ceph resources created by that custom resource.
+ For example, run the following commands to clean the `CephFSSubVolumeGroup` resource named `my-subvolumegroup`:
`CephFSSubVolumeGroup` to `CephFilesystemSubVolumeGroup`
To cleanup a specific custom resource, add `ceph.rook.io/force-deletion="true"` annotation before deleting it. Rook will start a cleanup job that will delete the all the ceph resources created by that custom resource. For example, run the following commands to clean the `CephFSSubVolumeGroup` resource named `myfs-csi`:

```console
kubectl annotate cephfilesystemsubvolumegroups.ceph.rook.io myfs-csi -n rook-ceph ceph.rook.io/force-deletion="true"
```
Suggested change:
- kubectl annotate cephfilesystemsubvolumegroups.ceph.rook.io myfs-csi -n rook-ceph ceph.rook.io/force-deletion="true"
+ kubectl annotate cephfilesystemsubvolumegroups.ceph.rook.io my-subvolumegroup -n rook-ceph ceph.rook.io/force-deletion="true"
```console
kubectl annotate cephfilesystemsubvolumegroups.ceph.rook.io myfs-csi -n rook-ceph ceph.rook.io/force-deletion="true"
kubectl delete cephfilesystemsubvolumegroups.ceph.rook.io myfs-csi -n rook-ceph
```
For consistency in the docs, let's put the namespace first. We aren't very consistent about it, but IMO it is a nice convention.
Suggested change:
- kubectl delete cephfilesystemsubvolumegroups.ceph.rook.io myfs-csi -n rook-ceph
+ kubectl -n rook-ceph delete cephfilesystemsubvolumegroups.ceph.rook.io my-subvolumegroup
if !success {
return opcontroller.ImmediateRetryResult, errors.Wrapf(err, "Waiting for ceph cleanup job to complete successfully for cephFilesystemSubVolumeGroup %q", cephFilesystemSubVolumeGroup.Name)
}
logger.Infof("successfully cleaned up all the ceph resources created by the cephFilesystemSubVolumeGroup %q", cephFilesystemSubVolumeGroup.Name)
If we get here it just means the job was successfully started. We don't know if it successfully completed, so we should always requeue the event, right?
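If the intent is to keep reconciling until the job is observed to finish, a sketch of that pattern could look like the following (controller-runtime style; `cleanupJobCompleted` is a hypothetical helper, not code from this PR):

```go
// Hypothetical sketch: do not assume success once the job is started;
// requeue until the cleanup job is observed to have completed.
completed, err := cleanupJobCompleted(ctx, cephFilesystemSubVolumeGroup) // hypothetical helper
if err != nil {
	return reconcile.Result{}, errors.Wrap(err, "failed to check the cleanup job status")
}
if !completed {
	// the job was started but has not finished yet; retry shortly
	return reconcile.Result{RequeueAfter: 10 * time.Second}, nil
}
logger.Infof("cleanup job for cephFilesystemSubVolumeGroup %q completed", cephFilesystemSubVolumeGroup.Name)
```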
cleanupConfig := map[string]string{
opcontroller.CephFSSubVolumeGroupNameEnv: cephFilesystemSubVolumeGroup.Spec.Name,
opcontroller.CephFSVolumeNameEnv: cephFilesystemSubVolumeGroup.Spec.FilesystemName,
opcontroller.CephFSNamesaceEnv: "csi", // TODO: is it always "csi" for cephFS
We create a default "csi" subvolumegroup for each cephfilesystem, but if the user creates their own subvolumegroup CR, the name will be different. And isn't this the namespace, not the svg name?
Suggested change:
- opcontroller.CephFSNamesaceEnv: "csi", // TODO: is it always "csi" for cephFS
+ opcontroller.CephFSNamespaceEnv: namespace
This is the rados namespace name where the omap details are stored; it's always `csi` for CSI-created PVCs. I suggested above renaming this variable to avoid confusion.
@@ -187,6 +187,25 @@ func (r *ReconcileCephFilesystemSubVolumeGroup) reconcile(request reconcile.Requ
if cephCluster.Spec.External.Enable {
logger.Warningf("external subvolume group %q deletion is not supported, delete it manually", namespacedName)
} else {
if opcontroller.IsCleanupRequired(cephFilesystemSubVolumeGroup.GetAnnotations()) {
How about moving this whole block to a helper method? This method is so long already, it helps for code readability.
},
}

existingJob := &batch.Job{}
Before we get or create the job, let's check if there are any subvolumes in the subvolumegroup. if not, then we don't need to create the job at all in the first place and it can be immediately deleted. Or is that check somewhere else in the finalizer already?
Currently we are always creating a job to clean up the subvolumegroup, whether it has subvolumes or not.
That looks good to have. But since we only create this job when the annotation is present on the resource, it should be OK to run the job, as the user has deliberately asked to clean the resource (subvolumes, snapshots, etc.).
If there are no subvolumes in the svg, seems like the finalizer should be immediately removed. This would cover both cases of 1) no subvolumes existing (and no need for creating the job), or 2) if the subvolumes were already cleaned up by the job and the operator is retrying to delete the svg CR.
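A rough sketch of that check (hypothetical placement; `ListSubvolumesInGroup` is the helper already referenced elsewhere in this PR):

```go
// Hypothetical sketch: skip the cleanup job entirely when the subvolume group
// is already empty, so the finalizer can be removed right away.
subVolumes, err := cephclient.ListSubvolumesInGroup(context, clusterInfo, fsName, svgName)
if err != nil {
	return errors.Wrapf(err, "failed to list subvolumes in group %q", svgName)
}
if len(subVolumes) == 0 {
	logger.Infof("subvolume group %q has no subvolumes, skipping the cleanup job", svgName)
	return nil // the caller can proceed to remove the finalizer
}
// otherwise create (or check on) the cleanup job as below
```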
}

existingJob := &batch.Job{}
err := client.Get(ctx, types.NamespacedName{Name: jobName, Namespace: c.resource.GetNamespace()}, existingJob)
Instead of calling `Get()`, how about just calling `Create()` and ignoring the error if it already exists? Then there's no need for two API calls.
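A minimal sketch of that pattern (assuming the controller-runtime client and the job built above; names are illustrative):

```go
// Hypothetical sketch: a single Create() call that tolerates "already exists",
// instead of Get() followed by Create().
package cleanup

import (
	"context"

	"github.com/pkg/errors"
	batch "k8s.io/api/batch/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func ensureCleanupJob(ctx context.Context, cl client.Client, job *batch.Job) error {
	if err := cl.Create(ctx, job); err != nil {
		if apierrors.IsAlreadyExists(err) {
			// a job from a previous reconcile already exists; nothing more to do
			return nil
		}
		return errors.Wrapf(err, "failed to create cleanup job %q", job.Name)
	}
	return nil
}
```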
`Get` will also help to get the currently running job and validate its status.
The only status we need to know is if there are any subvolumes remaining. The status of the job seems like an extra detail we don't need? Let's discuss.
| Custom Resource | Ceph Resources to be cleaned up |
| -------- | ------- |
| CephFSSubVolumeGroups | OMAP details, snapshots, clones, subvolumes |
- `CephFilesystemSubVolumeGroup`
- `OMAP details` -> CSI stored RADOS OMAP details for pvc/volumesnapshots
- `snapshots` -> subvolume snapshots
- `clones` -> subvolume clones
cmd/rook/ceph/cleanup.go
Outdated
fsName := os.Getenv(opcontroller.CephFSVolumeNameEnv)
if fsName == "" {
return fmt.Errorf("cephfs volume name is not available in the pod environment variables")
`cephfs volume name` to `ceph filesystem name`, as the user knows about the filesystem and subvolumegroup relationship and is not much aware of what an fs `volume` is, even though it also refers to the filesystem.
cmd/rook/ceph/cleanup.go
Outdated
namespace := os.Getenv(k8sutil.PodNamespaceEnvVar)
clusterInfo := client.AdminClusterInfo(ctx, namespace, "")

fsName := os.Getenv(opcontroller.CephFSVolumeNameEnv)
`CephFSVolumeNameEnv` to `CephFSNameEnv`
cmd/rook/ceph/cleanup.go
Outdated
csiNamespace := os.Getenv(opcontroller.CephFSNamesaceEnv)
if csiNamespace == "" {
return fmt.Errorf("cephfs SubVolumeGroup namespace name is not available in the pod environment variables")
}
Suggested change:
- csiNamespace := os.Getenv(opcontroller.CephFSNamesaceEnv)
- if csiNamespace == "" {
-     return fmt.Errorf("cephfs SubVolumeGroup namespace name is not available in the pod environment variables")
- }
+ csiNamespace := os.Getenv(opcontroller.CSICephFSRadosNamesaceEnv)
+ if csiNamespace == "" {
+     return fmt.Errorf("CSI rados namespace name is not available in the pod environment variables")
+ }
func DeleteSubVolumeSnapshots(context *clusterd.Context, clusterInfo *client.ClusterInfo, snapshots cephclient.SubVolumeSnapshotList, fsName, subvol, svg string) error {
for _, snapshot := range snapshots {
logger.Info("deleting snapshot %q for subvolume %q in group %q", snapshot.Name, subvol, svg)
err := cephclient.DeleteSubvolumeSnapshot(context, clusterInfo, fsName, subvol, svg, snapshot.Name)
We cannot delete snapshots that have pending clones; we also need to cancel the clone and delete the subvolume created for the clone before deleting the snapshot.
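A rough sketch of that ordering (`listSnapshotClones` and `cancelClone` are hypothetical helpers, named only to illustrate the flow):

```go
// Hypothetical sketch of the deletion order for a snapshot that has clones:
// cancel pending clones, delete the clone subvolumes, then delete the snapshot.
func deleteSnapshotWithClones(context *clusterd.Context, clusterInfo *client.ClusterInfo, fsName, subvol, svg, snapshot string) error {
	clones, err := listSnapshotClones(context, clusterInfo, fsName, subvol, svg, snapshot) // hypothetical
	if err != nil {
		return errors.Wrapf(err, "failed to list clones of snapshot %q", snapshot)
	}
	for _, clone := range clones {
		if clone.Pending {
			if err := cancelClone(context, clusterInfo, fsName, clone.Name, svg); err != nil { // hypothetical
				return errors.Wrapf(err, "failed to cancel pending clone %q", clone.Name)
			}
		}
		// a clone is itself a subvolume, so it must be removed as well
		if err := cephclient.DeleteSubVolume(context, clusterInfo, fsName, clone.Name, svg); err != nil {
			return errors.Wrapf(err, "failed to delete clone subvolume %q", clone.Name)
		}
	}
	return cephclient.DeleteSubvolumeSnapshot(context, clusterInfo, fsName, subvol, svg, snapshot)
}
```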
pkg/daemon/ceph/client/filesystem.go
Outdated
// SubVolumeSnapshotList is the list of snapshots in a CephFS subvolume
type SubVolumeSnapshotList []SubVolumeSnapshot

// ListSubVolumeSnaphotsInGroup lists all the subvolume snapshots present in the given filesystem's subvolume group by
Suggested change:
- // ListSubVolumeSnaphotsInGroup lists all the subvolume snapshots present in the given filesystem's subvolume group by
+ // ListSubVolumeSnaphots lists all the subvolume snapshots present in the subvolume in the given filesystem's subvolume group. Times out after 5 seconds.
pkg/daemon/ceph/client/filesystem.go
Outdated
}

// SubVolumeSnapshotList is the list of snapshots in a CephFS subvolume
type SubVolumeSnapshotList []SubVolumeSnapshot
`SubVolumeSnapshotList` to `SubVolumeSnapshots`; we can remove `List` as it already ends with a plural.
Volumes: volumes,
RestartPolicy: v1.RestartPolicyOnFailure,
PriorityClassName: cephv1.GetCleanupPriorityClassName(c.cluster.Spec.PriorityClassNames),
ServiceAccountName: k8sutil.DefaultServiceAccount,
Can we add a securityContext to this as well?
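A minimal sketch of a restricted security context for the cleanup container (illustrative values, using the core/v1 types already used in this file; not necessarily what Rook's existing helpers produce):

```go
// Hypothetical sketch: a restricted security context for the cleanup job container.
privileged := false
runAsNonRoot := true
securityContext := &v1.SecurityContext{
	Privileged:               &privileged,
	AllowPrivilegeEscalation: &privileged,
	RunAsNonRoot:             &runAsNonRoot,
	Capabilities: &v1.Capabilities{
		Drop: []v1.Capability{"ALL"},
	},
}
// securityContext would then be set on the cleanup container spec.
```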
Force-pushed fd41ddb to f48bc5a
@@ -2,6 +2,11 @@
title: Cleanup
---

Rook provides following clean up options:
Suggested change:
- Rook provides following clean up options:
+ Rook provides the following clean up options:
## Force Delete Resources

To keep your data safe in the cluster, Rook disallows deleting critical cluster resources by default. To override this behavior and force delete a specific custom resource, add the annotation `ceph.rook.io/force-deletion="true"` to the resource and then delete it. Rook will start a cleanup job that will delete the all the related ceph resources created by that custom resource.
Suggested change:
- To keep your data safe in the cluster, Rook disallows deleting critical cluster resources by default. To override this behavior and force delete a specific custom resource, add the annotation `ceph.rook.io/force-deletion="true"` to the resource and then delete it. Rook will start a cleanup job that will delete the all the related ceph resources created by that custom resource.
+ To keep your data safe in the cluster, Rook disallows deleting critical cluster resources by default. To override this behavior and force delete a specific custom resource, add the annotation `ceph.rook.io/force-deletion="true"` to the resource and then delete it. Rook will start a cleanup job that will delete all the related ceph resources created by that custom resource.
cmd/rook/ceph/cleanup.go
Outdated
}

var cleanUpHostsCmd = &cobra.Command{
Use: "hosts",
Only one host is cleaned up by this command for a particular cleanup job, right?
Use: "hosts", | |
Use: "host", |
cmd/rook/ceph/cleanup.go
Outdated
var cleanUpHostsCmd = &cobra.Command{
Use: "hosts",
Short: "Starts the cleanup process on the hosts after ceph cluster is deleted",
Short: "Starts the cleanup process on the hosts after ceph cluster is deleted", | |
Short: "Starts the cleanup process on a host after the ceph cluster is deleted", |
pkg/daemon/ceph/cleanup/disk.go
Outdated
@@ -74,6 +74,7 @@ func (s *DiskSanitizer) StartSanitizeDisks() {
// Start the sanitizing sequence
s.SanitizeRawDisk(osdRawList)
}

revert the blank line?
err = cephclient.DeleteSubVolume(context, clusterInfo, fsName, subVolume.Name, svg)
if err != nil {
return errors.Wrapf(err, "failed to delete subvolume Group %q.", subVolume.Name)
Instead of returning immediately on failure, store the error in a var and attempt deletion of all subvolumes. Then at the end return the error if any. This way, we can always attempt to cleanup as much as possible.
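A minimal sketch of that pattern, mirroring the `retErr` variable that shows up in a later revision of this PR:

```go
// Keep attempting to delete the remaining subvolumes even if one fails,
// and return the last error (if any) only after the loop.
var retErr error
for _, subVolume := range subVolumeList {
	if err := cephclient.DeleteSubVolume(context, clusterInfo, fsName, subVolume.Name, svg); err != nil {
		retErr = errors.Wrapf(err, "failed to delete subvolume %q", subVolume.Name)
		logger.Errorf("%v", retErr)
		continue
	}
}
return retErr
```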
pkg/operator/ceph/cluster/cleanup.go
Outdated
@@ -138,7 +138,7 @@ func (c *ClusterController) cleanUpJobContainer(cluster *cephv1.CephCluster, mon
SecurityContext: securityContext,
VolumeMounts: volumeMounts,
Env: envVars,
Args: []string{"ceph", "clean"},
Args: []string{"ceph", "clean", "hosts"},
Suggested change:
- Args: []string{"ceph", "clean", "hosts"},
+ Args: []string{"ceph", "clean", "host"},
Force-pushed 7ec4e67 to 509f07c
How is testing looking? In a separate PR we could add integration tests, as long as manual testing with various scenarios is covered.
volumeName = "cleanup-volume"
dataDirHostPath = "ROOK_DATA_DIR_HOST_PATH"
CleanupAppName = "resource-cleanup"
RESOURCE_CLEANUP_ANNOTATION = "ceph.rook.io/force-deletion"
No need for the ceph prefix, so we can use this annotation for any type of resource in the future.
Suggested change:
- RESOURCE_CLEANUP_ANNOTATION = "ceph.rook.io/force-deletion"
+ RESOURCE_CLEANUP_ANNOTATION = "rook.io/force-deletion"
// Start a new job to perform clean up of the ceph resources. It returns true if the cleanup job has succeeded
func (c *ResourceCleanup) StartJob(ctx context.Context, clientset kubernetes.Interface) error {
jobName := k8sutil.TruncateNodeNameForJob("cleanup-job-%s", c.resource.GetName())
We need some uniqueness in the resource naming so we don't conflict if other resource types in the future have the same resource name, something like this:
Suggested change:
- jobName := k8sutil.TruncateNodeNameForJob("cleanup-job-%s", c.resource.GetName())
+ jobName := k8sutil.TruncateNodeNameForJob("cleanup-svg-%s-%s", svg.Spec.FilesystemName, c.resource.GetName())
New job name
❯ oc get jobs -n rook-ceph
NAME COMPLETIONS DURATION AGE
cleanup-svg-myfs-myfs-csi 1/1 10s 2m43s
Manual testing for the following scenarios looks good:
@Madhu-1 mentioned that subvolumes created for the clones have to be deleted as well. That is not covered in the implementation as of now.
LGTM, just a few nits
}
err = cephclient.DeleteSubVolume(context, clusterInfo, fsName, subVolume.Name, svg)
if err != nil {
retErr = errors.Wrapf(err, "failed to delete subvolume Group %q.", subVolume.Name)
Suggested change:
- retErr = errors.Wrapf(err, "failed to delete subvolume Group %q.", subVolume.Name)
+ retErr = errors.Wrapf(err, "failed to delete subvolume group %q.", subVolume.Name)
return "", errors.Wrapf(err, "failed to list omapKeys for omapObj %q", omapObj) | ||
} | ||
|
||
// TODO: Find better solution |
What better solution do we need? If this parsing is working, perhaps it's fine?
@@ -339,7 +340,17 @@ func (r *ReconcileCephFilesystemSubVolumeGroup) deleteSubVolumeGroup(cephFilesys
// If the subvolume group has subvolumes the command will fail with:
// Error ENOTEMPTY: error in rmdir /volumes/csi
if ok && (code == int(syscall.ENOTEMPTY)) {
return errors.Wrapf(err, "failed to delete ceph filesystem subvolume group %q, remove the subvolumes first", cephFilesystemSubVolumeGroup.Name)
msg := fmt.Sprintf("failed to delete ceph filesystem subvolume group %q, remove the subvolumes first", cephFilesystemSubVolumeGroup.Name)
if opcontroller.IsCleanupRequired(cephFilesystemSubVolumeGroup.GetAnnotations()) {
This helper is just checking the annotations, right? How about this method name?
Suggested change:
- if opcontroller.IsCleanupRequired(cephFilesystemSubVolumeGroup.GetAnnotations()) {
+ if opcontroller.ForceDeleteRequested(cephFilesystemSubVolumeGroup.GetAnnotations()) {
opcontroller.CephFSSubVolumeGroupNameEnv: svg.Spec.Name,
opcontroller.CephFSNameEnv: svg.Spec.FilesystemName,
opcontroller.CSICephFSRadosNamesaceEnv: "csi",
opcontroller.CephFSMetaDataPoolNameEnv: fmt.Sprintf("%s-%s", svg.Spec.FilesystemName, "metadata"),
How about calling `generateMetaDataPoolName()` here instead of re-implementing it?
Using the `generateMetaDataPoolName()` function now.
Force-pushed cd237f6 to 83dbba8
LGTM, small nit
cmd/rook/ceph/cleanup.go
Outdated
@@ -87,3 +106,37 @@ func startCleanUp(cmd *cobra.Command, args []string) error {

return nil
}

func startSubVolumeGroupsCleanUp(cmd *cobra.Command, args []string) error {
`startSubVolumeGroupsCleanUp` to `startSubVolumeGroupCleanUp`, as we are working with a single group here.
cmd/rook/ceph/cleanup.go
Outdated
err := cleanup.SubVolumeGroupCleanup(context, clusterInfo, fsName, subVolumeGroupName, poolName, csiNamespace)
if err != nil {
rook.TerminateFatal(fmt.Errorf("failed to cleanup cephFS SubVolumeGroup %q in the namespace %q. %v", subVolumeGroupName, namespace, err))
Should we log the filesystem name as well?
Logging it now.
for _, subVolume := range subVolumeList {
logger.Infof("starting clean up of subvolume %q", subVolume.Name)
err := CleanUpOMAPDetails(context, clusterInfo, subVolume.Name, poolName, csiNamespace)
if err != nil {
We might need to handle not-found errors as a future enhancement as well.
Added it to the pending list in the PR description. Will take it up as a future enhancement.
@sp98 can we also add a functional test for this one (as a new PR, not as part of this one)?
Cleanup the resources created by a subvolumegroup when it's deleted. The following resources will be cleaned up:

- OMAP value
- OMAP keys
- Clones
- Snapshots
- Subvolumes

Signed-off-by: sp98 <[email protected]>
core: subvolumegroup clean up (backport #14026)
Cleanup the resources created by a subvolumegroup when it's deleted. The following resources will be cleaned up:
Cleanup:
pending (maybe for a follow-up PR)
Checklist: