rbd: rbd-nbd is not used as mounter in k8s 1.24 #3176

Closed
Rakshith-R opened this issue Jun 13, 2022 · 6 comments · Fixed by #3207
Labels
bug Something isn't working component/rbd Issues related to RBD regression This issues is a regression

Comments

@Rakshith-R
Contributor

https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e-helm_k8s-1.24/detail/mini-e2e-helm_k8s-1.24/18/pipeline/
https://jenkins-ceph-csi.apps.ocp.ci.centos.org/blue/organizations/jenkins/mini-e2e_k8s-1.24/detail/mini-e2e_k8s-1.24/19/pipeline

@pkalever, the rbd-nbd tests are failing:


Jun  9 13:22:53.324: INFO: ExecWithOptions: execute(POST https://192.168.39.43:8443/api/v1/namespaces/cephcsi-e2e-4834c72e07c2/pods/csi-rbdplugin-787bh/exec?command=%2Fbin%2Fsh&command=-c&command=pstree+--arguments+%7C+grep+%5Br%5Dbd-nbd&container=csi-rbdplugin&container=csi-rbdplugin&stderr=true&stdout=true)
Jun  9 13:22:53.640: INFO: rbd-nbd process is not running yet: command terminated with exit code 1
Jun  9 13:22:53.640: FAIL: timed out waiting for the rbd-nbd process: rbd-nbd process is not running yet: command terminated with exit code 1

Originally posted by @Rakshith-R in #3174 (comment)

@Rakshith-R Rakshith-R added bug Something isn't working component/rbd Issues related to RBD labels Jun 13, 2022
@pkalever pkalever self-assigned this Jun 16, 2022
@pkalever pkalever added the regression This issues is a regression label Jun 22, 2022
@pkalever

It looks like NodeStageVolume is failing:

I0623 03:37:31.169454   30404 rbd_attach.go:231] nbd module loaded
W0623 03:37:31.171920   30404 util.go:253] kernel 5.10.57 does not support required features
W0623 03:37:31.172322   30404 rbd_attach.go:241] kernel version "5.10.57" doesn't support cookie feature
I0623 03:37:31.175049   30404 server.go:126] Listening for connections on address: &net.UnixAddr{Name:"//csi/csi.sock", Net:"unix"}
I0623 03:37:31.246950   30404 rbd_healer.go:79] sending nodeStageVolume for volID: 0001-0009-rook-ceph-0000000000000002-4bc270f2-f2a5-11ec-bd02-eadea7159bd5, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-05aa109e-4ee3-4f02-a078-edc9e6efb743/globalmount
E0623 03:37:31.257770   30404 rbd_healer.go:121] nodeStageVolume request failed, volID: 0001-0009-rook-ceph-0000000000000002-4bc270f2-f2a5-11ec-bd02-eadea7159bd5, stagingPath: /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-05aa109e-4ee3-4f02-a078-edc9e6efb743/globalmount, err: rpc error: code = InvalidArgument desc = staging path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-05aa109e-4ee3-4f02-a078-edc9e6efb743/globalmount does not exist on node
E0623 03:37:31.257979   30404 rbd_healer.go:202] callNodeStageVolume failed, err: rpc error: code = InvalidArgument desc = staging path /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-05aa109e-4ee3-4f02-a078-edc9e6efb743/globalmount does not exist on node

@Madhu-1
Collaborator

Madhu-1 commented Jun 23, 2022

The staging path changed in Kubernetes 1.24; can you check that?

@pkalever

@Madhu-1 that is correct; the new staging path looks something like this:

/var/lib/kubelet/plugins/kubernetes.io/csi/rook-ceph.rbd.csi.ceph.com/c74962bd63d7f7d5c1ff0a986773ba9d2e42334c0835b1d211cd6924d751c7f6/globalmount/image-meta.json

@pkalever

@Madhu-1 do you know what c74962bd63d7f7d5c1ff0a986773ba9d2e42334c0835b1d211cd6924d751c7f6 is in the staging path above?

@humblec
Collaborator

humblec commented Jun 23, 2022

@pkalever it's the SHA-256 of the volumeHandle.
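Putting the path quoted above together with the explanation that the directory name is the SHA-256 of the volumeHandle, the old and new layouts can be sketched in Go. This assumes the default kubelet root `/var/lib/kubelet` (a custom kubelet root directory changes the prefix) and a lowercase hex encoding of the hash, as seen in the example path; `legacyStagingPath` and `stagingPath124` are hypothetical helper names, not kubelet or ceph-csi functions. The trailing `image-meta.json` in the quoted path is a file inside the `globalmount` staging directory, so it is not part of the computed path.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// csiPluginDir assumes the default kubelet root directory.
const csiPluginDir = "/var/lib/kubelet/plugins/kubernetes.io/csi"

// legacyStagingPath builds the pre-1.24 layout, keyed by PV name:
// <csiPluginDir>/pv/<pvName>/globalmount
func legacyStagingPath(pvName string) string {
	return filepath.Join(csiPluginDir, "pv", pvName, "globalmount")
}

// stagingPath124 builds the 1.24+ layout, keyed by driver name and the
// hex-encoded SHA-256 of the volumeHandle:
// <csiPluginDir>/<driver>/<sha256(volumeHandle)>/globalmount
func stagingPath124(driver, volumeHandle string) string {
	sum := sha256.Sum256([]byte(volumeHandle))
	return filepath.Join(csiPluginDir, driver, hex.EncodeToString(sum[:]), "globalmount")
}

func main() {
	// Values taken from the rbd_healer log and the path quoted in this thread.
	fmt.Println(legacyStagingPath("pvc-05aa109e-4ee3-4f02-a078-edc9e6efb743"))
	fmt.Println(stagingPath124("rook-ceph.rbd.csi.ceph.com",
		"0001-0009-rook-ceph-0000000000000002-4bc270f2-f2a5-11ec-bd02-eadea7159bd5"))
}
```

The legacy helper reproduces the `stagingPath` from the failing `nodeStageVolume` log line, which is why the healer could not find the volume on a 1.24 node.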

@pkalever

@humblec yep, got it from https://github.com/csi-addons/kubernetes-csi-addons/pull/165/files. Thanks for jumping in.

pkalever pushed a commit to pkalever/ceph-csi that referenced this issue Jun 23, 2022
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at another
location compared to previous versions of Kubernetes. CSI-drivers
implementing the CSI-Addons volumeHealer must receive the correct path;
otherwise, after a nodeplugin restart, the NBD mounts will bail out when
attempting the NodeStageVolume() call and return an error.

See-also: kubernetes/kubernetes#107065

Fixes: ceph#3176
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
@mergify mergify bot closed this as completed in #3207 Jun 24, 2022
mergify bot pushed a commit that referenced this issue Jun 24, 2022
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at another
location compared to previous versions of Kubernetes. CSI-drivers
implementing the volumeHealer must receive the correct path; otherwise,
after a nodeplugin restart, the NBD mounts will bail out when attempting
the NodeStageVolume() call and return an error.

See-also: kubernetes/kubernetes#107065

Fixes: #3176
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
mergify bot pushed a commit that referenced this issue Jun 24, 2022
Kubernetes 1.24 and newer use a different path for staging the volume.
That means the CSI-driver is requested to mount the volume at another
location compared to previous versions of Kubernetes. CSI-drivers
implementing the volumeHealer must receive the correct path; otherwise,
after a nodeplugin restart, the NBD mounts will bail out when attempting
the NodeStageVolume() call and return an error.

See-also: kubernetes/kubernetes#107065

Fixes: #3176
Signed-off-by: Prasanna Kumar Kalever <[email protected]>
(cherry picked from commit 1da446d)

4 participants