
Race condition when mounting RBD static PVs with the same image name in different pools #4526

Open
AsPulse opened this issue Mar 31, 2024 · 1 comment



AsPulse commented Mar 31, 2024

Describe the bug

When two RBD images with the same name in different pools are mounted as static volumes (staticVolume) following this procedure, only the one mounted first succeeds.

For the later one, the PV/PVC are created and the pod binds to the PVC, but the pod remains in the Pending state with the following error in the pod's events:

Multi-Attach error for volume "foo-pv" Volume is already used by 1 pod(s) in different namespaces
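My assumption about the cause (not confirmed in the ceph-csi code): Kubernetes' attach/detach machinery identifies a CSI volume only by the driver name and volumeHandle, ignoring volumeAttributes such as pool. Under that assumption, both PVs collapse to the same unique volume name, which is a minimal sketch of the collision:

```shell
# Sketch: kubelet reports CSI volumes in volumesInUse/volumesAttached as
# "kubernetes.io/csi/<driver>^<volumeHandle>". The "pool" attribute is
# not part of that key, so the two PVs below are indistinguishable.
driver="rook-ceph.rbd.csi.ceph.com"

foo_name="kubernetes.io/csi/${driver}^test"   # foo-pv: pool=foo, volumeHandle=test
bar_name="kubernetes.io/csi/${driver}^test"   # bar-pv: pool=bar, volumeHandle=test

if [ "$foo_name" = "$bar_name" ]; then
  echo "both PVs map to the same unique volume name: $foo_name"
fi
```

If this assumption holds, it would explain why Kubernetes treats the two distinct RBD images as one volume and raises the Multi-Attach error.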

Environment details

  • Image/version of Ceph CSI driver : v3.10.2
  • Kernel version : 6.5.0-26-generic
  • Mounter used for mounting PVC : rbd
  • Kubernetes cluster version : v1.29.3
  • Ceph cluster version : 18.2.1 reef (stable)

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details:
    • Set up a Rook Ceph cluster
    • Create pools named foo and bar with the rbd application enabled
    • Create an RBD image named test in each of the two pools
  2. Deploy the two PVs below:
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: foo-pv
    spec:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 1Gi
      csi:
        driver: rook-ceph.rbd.csi.ceph.com
        fsType: ext4
        nodeStageSecretRef:
          name: rook-csi-rbd-node
          namespace: rook-ceph
        volumeAttributes:
          clusterID: "rook-ceph"
          pool: "foo"
          staticVolume: "true"
          imageFeatures: "layering,fast-diff,object-map,deep-flatten,exclusive-lock"
        volumeHandle: test
      persistentVolumeReclaimPolicy: Retain
      volumeMode: Filesystem
    ---
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: bar-pv
    spec:
      accessModes:
      - ReadWriteOnce
      capacity:
        storage: 1Gi
      csi:
        driver: rook-ceph.rbd.csi.ceph.com
        fsType: ext4
        nodeStageSecretRef:
          name: rook-csi-rbd-node
          namespace: rook-ceph
        volumeAttributes:
          clusterID: "rook-ceph"
          pool: "bar"
          staticVolume: "true"
          imageFeatures: "layering,fast-diff,object-map,deep-flatten,exclusive-lock"
        volumeHandle: test
      persistentVolumeReclaimPolicy: Retain
      volumeMode: Filesystem
  3. Claim the two PVs above from pods placed in different namespaces.
  4. See error
    Multi-Attach error for volume "foo-pv" Volume is already used by 1 pod(s) in different namespaces
    
    or
    Multi-Attach error for volume "bar-pv" Volume is already used by 1 pod(s) in different namespaces
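For completeness, step 1 can be performed with the standard ceph/rbd CLI; the pool and image names below match the steps above (the toolbox deployment name is an assumption based on a default Rook install):

```shell
# Run inside the Rook toolbox pod, e.g.:
#   kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# (the deployment name assumes a default Rook install)
for pool in foo bar; do
  ceph osd pool create "$pool"
  ceph osd pool application enable "$pool" rbd
  rbd create --size 1024 "$pool/test"   # same image name "test" in both pools
done
```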
    

Actual results

The pod that mounts later stays stuck in Pending due to the Multi-Attach error.

Expected behavior

Both pods successfully mount their RBD images.

Logs

If the issue is in PVC mounting please attach complete logs of below containers.

  • csi-rbdplugin
    No logs appeared
  • driver-registrar
    I0331 07:32:36.733019  899351 main.go:135] Version: v2.10.0
    I0331 07:32:36.733206  899351 main.go:136] Running node-driver-registrar in mode=
    I0331 07:32:43.746178  899351 node_register.go:55] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
    I0331 07:32:43.746370  899351 node_register.go:64] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
    I0331 07:32:43.747289  899351 node_register.go:88] Skipping HTTP server because endpoint is set to: ""
    I0331 07:32:44.681946  899351 main.go:90] Received GetInfo call: &InfoRequest{}
    I0331 07:32:48.988262  899351 main.go:101] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
    

Additional context

I used Rook Ceph, but I don't think the problem is related to Rook.

Collaborator

Madhu-1 commented Apr 2, 2024

@AsPulse I believe the error is coming from Kubernetes; please check the kubelet logs. Is it possible to use different names in volumeHandle in the PVs?
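To help confirm where the error originates, the attachment state Kubernetes tracks can be inspected with standard kubectl and journalctl (a sketch; shell access to the affected node and a systemd-based kubelet are assumed):

```shell
# VolumeAttachment objects show which PV/node pairs the attach/detach
# controller currently considers attached.
kubectl get volumeattachments

# Search the kubelet logs on the affected node for attach-related messages.
journalctl -u kubelet | grep -i "attach"
```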
