return abnormal if the mount is corrupted #3462

Merged
merged 3 commits into from
Oct 26, 2022

Madhu-1
Collaborator

@Madhu-1 Madhu-1 commented Oct 21, 2022

When we stat the target path and get an error, we can check whether the error is due to mount corruption. If it is, cephcsi can return an abnormal volume condition in NodeGetVolumeStats so that the consumer (CO/admin) can detect it and take further action.

Signed-off-by: Madhu Rajanna [email protected]

@Madhu-1 Madhu-1 added the DNM DO NOT MERGE label Oct 21, 2022
@Madhu-1
Collaborator Author

Madhu-1 commented Oct 21, 2022

Added DNM as I still need to test it out.

@Madhu-1 Madhu-1 added ci/skip/e2e skip running e2e CI jobs ci/skip/multi-arch-build skip building on multiple architectures labels Oct 21, 2022
@Madhu-1 Madhu-1 requested review from nixpanic, pkalever and a team October 21, 2022 12:28
@Madhu-1
Collaborator Author

Madhu-1 commented Oct 24, 2022

I1024 11:00:47.851488       1 utils.go:195] ID: 12 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1024 11:00:47.851916       1 utils.go:206] ID: 12 GRPC request: {"volume_id":"0001-0009-rook-ceph-0000000000000001-7fed1ce7-97cf-43ef-9b84-2a49ab992515","volume_path":"/var/lib/kubelet/pods/8087df68-9756-4f38-86ef-6c81e1075607/volumes/kubernetes.io~csi/pvc-15e63d0a-77de-4886-8d0f-516f9fecbeb4/mount"}
I1024 11:00:47.854448       1 utils.go:212] ID: 12 GRPC response: {"usage":[{"available":1073741824,"total":1073741824,"unit":1}]}
I1024 11:02:11.800530       1 utils.go:195] ID: 13 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1024 11:02:11.800683       1 utils.go:206] ID: 13 GRPC request: {}
I1024 11:02:11.801084       1 utils.go:212] ID: 13 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":5}}}]}
I1024 11:02:11.803017       1 utils.go:195] ID: 14 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1024 11:02:11.803429       1 utils.go:206] ID: 14 GRPC request: {"volume_id":"0001-0009-rook-ceph-0000000000000001-7fed1ce7-97cf-43ef-9b84-2a49ab992515","volume_path":"/var/lib/kubelet/pods/8087df68-9756-4f38-86ef-6c81e1075607/volumes/kubernetes.io~csi/pvc-15e63d0a-77de-4886-8d0f-516f9fecbeb4/mount"}
W1024 11:02:12.879745       1 nodeserver.go:637] ID: 14 corrupted mount detected in "/var/lib/kubelet/pods/8087df68-9756-4f38-86ef-6c81e1075607/volumes/kubernetes.io~csi/pvc-15e63d0a-77de-4886-8d0f-516f9fecbeb4/mount": stat /var/lib/kubelet/pods/8087df68-9756-4f38-86ef-6c81e1075607/volumes/kubernetes.io~csi/pvc-15e63d0a-77de-4886-8d0f-516f9fecbeb4/mount: permission denied
I1024 11:02:12.880189       1 utils.go:212] ID: 14 GRPC response: {"volume_condition":{"abnormal":true,"message":"stat /var/lib/kubelet/pods/8087df68-9756-4f38-86ef-6c81e1075607/volumes/kubernetes.io~csi/pvc-15e63d0a-77de-4886-8d0f-516f9fecbeb4/mount: permission denied"}}

@Madhu-1 Madhu-1 added component/cephfs Issues related to CephFS component/rbd Issues related to RBD and removed DNM DO NOT MERGE ci/skip/e2e skip running e2e CI jobs ci/skip/multi-arch-build skip building on multiple architectures labels Oct 24, 2022
Member

@nixpanic nixpanic left a comment

Thanks!

@nixpanic nixpanic requested a review from a team October 25, 2022 13:08
@Madhu-1
Collaborator Author

Madhu-1 commented Oct 26, 2022

@Mergifyio rebase

When we do a stat on the targetpath and there is
an error, we can check whether it is due to corruption.
If yes, cephcsi can return abnormal in
NodeGetVolumeStats so that the consumer (CO/admin)
can detect it and take further action.

Signed-off-by: Madhu Rajanna <[email protected]>
When we do a stat on the targetpath and there is
an error, we can check whether it is due to corruption.
If yes, cephcsi can return abnormal in
NodeGetVolumeStats so that the consumer (CO/admin)
can detect it and take further action.

Signed-off-by: Madhu Rajanna <[email protected]>
Added a new section for the ceph kernel client
mount corruption detection and recovery.

Signed-off-by: Madhu Rajanna <[email protected]>
@mergify
Contributor

mergify bot commented Oct 26, 2022

rebase

✅ Branch has been successfully rebased

@Madhu-1 Madhu-1 added the ok-to-test Label to trigger E2E tests label Oct 26, 2022
@github-actions

/test ci/centos/k8s-e2e-external-storage/1.22

@github-actions

/test ci/centos/k8s-e2e-external-storage/1.23

@github-actions

/test ci/centos/k8s-e2e-external-storage/1.24

@github-actions

/test ci/centos/mini-e2e-helm/k8s-1.22

@github-actions

/test ci/centos/mini-e2e-helm/k8s-1.23

@github-actions

/test ci/centos/mini-e2e-helm/k8s-1.24

@github-actions

/test ci/centos/mini-e2e/k8s-1.22

@github-actions

/test ci/centos/mini-e2e/k8s-1.23

@github-actions

/test ci/centos/mini-e2e/k8s-1.24

@github-actions

/test ci/centos/upgrade-tests-cephfs

@github-actions

/test ci/centos/upgrade-tests-rbd

@mergify mergify bot merged commit 0865296 into ceph:devel Oct 26, 2022
Labels
component/cephfs Issues related to CephFS component/rbd Issues related to RBD ok-to-test Label to trigger E2E tests
3 participants