Mount Pod stuck in Pending state: PVC is not bound #892

Open
Exubient opened this issue Mar 8, 2024 · 14 comments
Labels
kind/bug Something isn't working

Comments

@Exubient

Exubient commented Mar 8, 2024

What happened:

  • I'm trying to get JuiceFS working on an EKS + S3 setup.

  • I have managed to make the PoC application work successfully using dynamic provisioning with a StorageClass.

  • After confirming the whole setup works, I tried changing the config so that the mount pod's cache would live on a separate PVC (EBS in my case), using the option juicefs/mount-cache-pvc: XXX.

  • The problem: when doing so, the mount pod is stuck in a Pending state with the error:
    Unable to attach or mount volumes: unmounted volumes=[cachedir-pvc-0], unattached volumes=[], failed to process volumes=[cachedir-pvc-0]: error processing PVC kube-system/{NAME_OF_CACHE}: PVC is not bound

  • and the PVC is stuck in the state:
    waiting for first consumer to be created before binding

What you expected to happen:

  • I've manually created the PVC, and I know it works fine when attached to normal pods: the PV is created and an EBS volume appears in my AWS console.
  • I would expect the same to happen when attaching the PVC to the JuiceFS mount pod.

How to reproduce it (as minimally and precisely as possible):

  • Dynamic Provisioning.
  • StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: my-fjs
provisioner: csi.juicefs.com
reclaimPolicy: Delete
allowVolumeExpansion: true
parameters:
  csi.storage.k8s.io/provisioner-secret-name: fjs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-publish-secret-name: fjs-secret
  csi.storage.k8s.io/node-publish-secret-namespace: default
  juicefs/mount-cpu-limit: 400m
  juicefs/mount-memory-limit: 1000Mi
  juicefs/mount-cpu-request: 400m
  juicefs/mount-memory-request: 1000Mi
  juicefs/mount-delete-delay: 10m
  juicefs/mount-cache-pvc: "my-cache-pvc"
mountOptions:
  - cache-dir=/var/jfsCache
  - cache-size=204800
  • Secret:
apiVersion: v1
kind: Secret
metadata:
  name: fjs-secret
type: Opaque
stringData:
  name: jfs
  metaurl: {REDIS}  # this works fine
  storage: s3
  bucket: {SOME_BUCKET}
  access-key: {KEY} 
  secret-key: {SECRET}
  • PVC:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-cache-pvc
  namespace: kube-system
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 40Gi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: gp3  # this gp3 setup has no problem
  volumeMode: Filesystem
  • application:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-apache-0-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: my-fjs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apache-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-apache-1
  template:
    metadata:
      labels:
        app: my-apache-1
    spec:
      containers:
      - name: apache
        image: httpd:latest
        volumeMounts:
        - name: data
          mountPath: /usr/local/apache2/htdocs
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-apache-0-pvc
Mount Pod details
apiVersion: v1
kind: Pod
metadata:
  name: >-
    juicefs-ip-{REDACTED}-pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk
  namespace: kube-system
  uid: 9cedafb1-90c1-468b-9f52-3eca6a07f1a0
  resourceVersion: '334509614'
  creationTimestamp: '2024-03-08T08:08:30Z'
  labels:
    app.kubernetes.io/name: juicefs-mount
    juicefs-hash: 0cf88f75cfece633d2d16ba0c4492c9531e5cdb1004d620bce143c22770a12a
    volume-id: pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb
  annotations:
    juicefs-8fed6b072a2c61837bdb2f2f80858f883987db176d04483e1a3945f: >-
      /var/lib/kubelet/pods/a3d020e8-95b3-4efb-be5c-4244e330bf21/volumes/kubernetes.io~csi/pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb/mount
    juicefs-delete-delay: 10m
    juicefs-uniqueid: pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb
    juicefs-uuid: ''
  finalizers:
    - juicefs.com/finalizer
  selfLink: >-
    /api/v1/namespaces/kube-system/pods/juicefs-{REDACTED}-pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk
status:
  phase: Pending
  conditions:
    - type: Initialized
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-03-08T08:08:30Z'
    - type: Ready
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2024-03-08T08:08:30Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [jfs-mount]'
    - type: ContainersReady
      status: 'False'
      lastProbeTime: null
      lastTransitionTime: '2024-03-08T08:08:30Z'
      reason: ContainersNotReady
      message: 'containers with unready status: [jfs-mount]'
    - type: PodScheduled
      status: 'True'
      lastProbeTime: null
      lastTransitionTime: '2024-03-08T08:08:30Z'
  hostIP: 10.0.27.65
  startTime: '2024-03-08T08:08:30Z'
  containerStatuses:
    - name: jfs-mount
      state:
        waiting:
          reason: ContainerCreating
      lastState: {}
      ready: false
      restartCount: 0
      image: juicedata/mount:ce-v1.1.1
      imageID: ''
      started: false
  qosClass: Guaranteed
spec:
  volumes:
    - name: jfs-dir
      hostPath:
        path: /var/lib/juicefs/volume
        type: DirectoryOrCreate
    - name: updatedb
      hostPath:
        path: /etc/updatedb.conf
        type: FileOrCreate
    - name: cachedir-0
      hostPath:
        path: /var/jfsCache
        type: DirectoryOrCreate
    - name: cachedir-pvc-0
      persistentVolumeClaim:
        claimName: my-cache-pvc
    - name: kube-api-access-ph6zj
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: jfs-mount
      image: juicedata/mount:ce-v1.1.1
      command:
        - sh
        - '-c'
        - >-
          /usr/local/bin/juicefs format --storage=s3
          --bucket={REDACTED}
          --access-key={REDACTED} --secret-key=${secretkey} ${metaurl}
          {projectname}

          /bin/mount.juicefs ${metaurl}
          /jfs/pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk -o
          cache-size=204800,cache-dir=/var/jfsCache-0:/var/jfsCache,metrics=0.0.0.0:9567
      ports:
        - name: metrics
          containerPort: 9567
          protocol: TCP
      envFrom:
        - secretRef:
            name: >-
              juicefs-ip-{REDACTED}-pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk-secret
      env:
        - name: JFS_FOREGROUND
          value: '1'
      resources:
        limits:
          cpu: 400m
          memory: 1000Mi
        requests:
          cpu: 400m
          memory: 1000Mi
      volumeMounts:
        - name: jfs-dir
          mountPath: /jfs
          mountPropagation: Bidirectional
        - name: updatedb
          mountPath: /etc/updatedb.conf
        - name: cachedir-0
          mountPath: /var/jfsCache
        - name: cachedir-pvc-0
          mountPath: /var/jfsCache-0
        - name: kube-api-access-ph6zj
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      lifecycle:
        preStop:
          exec:
            command:
              - sh
              - '-c'
              - >-
                umount /jfs/pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk -l
                && rmdir /jfs/pvc-ab7baaf4-c582-42a5-a4a8-0de042fdbedb-amqjzk
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
      securityContext:
        privileged: true
        runAsUser: 0
  restartPolicy: Always
  terminationGracePeriodSeconds: 10
  dnsPolicy: ClusterFirstWithHostNet
  serviceAccountName: juicefs-csi-node-sa
  serviceAccount: juicefs-csi-node-sa
  nodeName: {REDACTED}
  securityContext: {}
  schedulerName: default-scheduler
  tolerations:
    - key: CriticalAddonsOnly
      operator: Exists
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/memory-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/pid-pressure
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/unschedulable
      operator: Exists
      effect: NoSchedule
  priorityClassName: system-node-critical
  priority: 2000001000
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority

Anything else we need to know?

  • I'd like to point out that without juicefs/mount-cache-pvc: "my-cache-pvc", the above setup works fine.
  • I'm really curious why a PV is not created for the PVC used by the JuiceFS mount pod (in namespace kube-system).

Environment:

  • JuiceFS version (use juicefs --version) or Hadoop Java SDK version: Using image juicedata/mount:ce-v1.1.1
  • Cloud provider or hardware configuration running JuiceFS: AWS
  • OS (e.g cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Object storage (cloud provider and region, or self maintained): S3
  • Metadata engine info (version, cloud provider managed or self maintained):
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
  • Others:
    • I can't provide the details above, since I'm unable to enter the mount pod to debug (the cache volume is not being mounted).

Thank you!

@Exubient Exubient added the kind/bug Something isn't working label Mar 8, 2024
@zhijian-pro zhijian-pro transferred this issue from juicedata/juicefs Mar 8, 2024
@showjason
Contributor

Hi @Exubient, it seems the cache PVC my-cache-pvc is not bound, which means the mount pod can't use my-cache-pvc. In my opinion, you should address the cache PVC issue first.
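
For example, a quick way to check the PVC's status and the events that explain why binding is stuck (standard kubectl commands, using the names from your manifests):

kubectl get pvc my-cache-pvc -n kube-system        # STATUS column should read "Bound"
kubectl describe pvc my-cache-pvc -n kube-system   # the Events section usually says why binding is stuck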

@Exubient
Author

Hi @showjason,
could you please elaborate on what you mean by "address the issue of the cache PVC"?
From the documentation, I don't see where the above setup could have gone wrong.

@zxh326
Member

zxh326 commented Mar 11, 2024

Please check that my-cache-pvc was created first and is bound.

You may need to find out why the gp3 StorageClass is not binding the volume...

@showjason
Contributor

Hi @showjason, could you please elaborate on what you mean by "address the issue of the cache PVC"? From the documentation, I don't see where the above setup could have gone wrong.

I mean you should first make sure the cache PVC my-cache-pvc is created and works properly.
The error message PVC is not bound indicates that my-cache-pvc is not in the Bound state; you need to find the reason.
Perhaps the StorageClass gp3 can't create the PV as expected, or it's something else, but it is not caused by juicefs-csi-driver; you need to figure it out.

@Exubient
Author

Exubient commented Mar 12, 2024

Thanks for the input, but I'd like to add that gp3 is working fine in the current setup.
For example, I've just confirmed that with the following two manifests the PVC is correctly mounted and the EBS volume is created.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: some-working-pvc
status:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 40Gi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: gp3
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-apache-1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-apache-1
  template:
    metadata:
      labels:
        app: my-apache-1
    spec:
      containers:
      - name: apache
        image: httpd:latest
        volumeMounts:
        - name: data
          mountPath: /usr/local/apache2/htdocs
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: some-working-pvc

@showjason
Contributor

showjason commented Mar 12, 2024


The deployment order is:

  1. Create the cache PVC my-cache-pvc in the kube-system namespace, and confirm that it is created successfully and its status is Bound. You said the PVC is stuck in the state waiting for first consumer to be created before binding; you must find the root cause and fix it.
  2. Deploy the application.

My suggestion is to deploy the cache PVC again and check its status: if it is Bound, deploy the application; if not, fix it first and then deploy the application.
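
For example, a minimal pre-flight check for step 1 (plain kubectl; the jsonpath form of kubectl wait needs kubectl v1.23 or newer):

# print the PVC's phase; expect "Bound"
kubectl get pvc my-cache-pvc -n kube-system -o jsonpath='{.status.phase}'
# or block until the PVC binds (or time out)
kubectl wait pvc/my-cache-pvc -n kube-system --for=jsonpath='{.status.phase}'=Bound --timeout=120s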

@Exubient
Author

@showjason, thank you once more for the input.

It's quite frustrating for me to debug this issue, for the following reasons:

  1. I'm unable to enter my pod juicefs-ip-10-0-44-104-pvc-c6b52a01-70aa-44a0-9594-5f8900ff88a6-stgvzy, as the mount failed.
  2. From my perspective, all my other pods/PVCs that use the gp3 StorageClass work fine and bind to their pods correctly. It's only this one, used by the mount pod juicefs-ip-10-0-44-104-pvc-c6b52a01-70aa-44a0-9594-5f8900ff88a6-stgvzy, that fails.
  3. When I run kubectl describe pvc my-cache-pvc -n kube-system, WaitForFirstConsumer does not make sense, since the PVC is already Used By the mount pod, as shown below:
Name:          my-cache-pvc
Namespace:     kube-system
StorageClass:  gp3
Status:        Pending
Volume:
Labels:        <none>
Annotations:   <none>
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Used By:       juicefs-ip-10-0-44-104-pvc-c6b52a01-70aa-44a0-9594-5f8900ff88a6-stgvzy
Events:
  Type    Reason                Age                     From                         Message
  ----    ------                ----                    ----                         -------
  Normal  WaitForFirstConsumer  2m13s (x26 over 8m17s)  persistentvolume-controller  waiting for first consumer to be created before binding

@showjason
Contributor

@Exubient
Try listing PVs with kubectl get pv to check whether the persistent volume corresponding to the cache PVC my-cache-pvc has been created.

@Exubient
Author

Exubient commented Mar 14, 2024

The PV is not created; only the PVC exists, stuck in the Pending state.

@showjason
Contributor

Try to figure out why the PV is not created. Maybe it's due to a storage ResourceQuota in the kube-system namespace, or something else; you need to check the Kubernetes cluster.
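
For instance, to rule out quota and StorageClass problems (standard kubectl commands):

kubectl get resourcequota -n kube-system   # is a storage quota blocking new PVCs?
kubectl get storageclass gp3 -o yaml       # check the provisioner and volumeBindingMode
kubectl get events -n kube-system --field-selector involvedObject.name=my-cache-pvc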

@chenmiao1991

I hit the same problem using a static PV as well. It is guaranteed to happen with multiple replicas, because your cache PVC is ReadWriteOnce: the first mount pod can use the cache PVC normally, but the second mount pod stays Pending because the cache PVC cannot be mounted by more than one pod.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-cache-pvc
  namespace: kube-system
status:
  accessModes:
    - ReadWriteOnce  # <-- here
  capacity:
    storage: 40Gi
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 40Gi
  storageClassName: gp3  # this gp3 setup has no problem
  volumeMode: Filesystem

@chenmiao1991

To fix this bug, it would be better to support volumeClaimTemplates (e.g. a juicefs/mount-cache-pvcTemplates option) instead of juicefs/mount-cache-pvc. I hope the maintainers will implement it.

......
  volumeClaimTemplates:
  - metadata:
      name: bs-cache-pvc
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: block-storage

@zxh326
Member

zxh326 commented Mar 20, 2024

For this case, changing the volumeBindingMode of the gp3 StorageClass to Immediate should solve the problem.
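
The likely reason this helps: with WaitForFirstConsumer, volume binding is only triggered when the scheduler places a pod, but the mount pod is created with spec.nodeName already set (visible in the pod dump above), so it bypasses the scheduler and binding never fires. A minimal sketch of an Immediate-mode class, assuming gp3 is backed by the AWS EBS CSI driver (volumeBindingMode is immutable, so the class has to be recreated; the name and parameters here are illustrative):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-immediate            # illustrative name; you could also recreate gp3 itself
provisioner: ebs.csi.aws.com     # assumed EBS CSI provisioner, check your existing gp3 class
parameters:
  type: gp3
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate     # provision and bind as soon as the PVC is created

Point the cache PVC's storageClassName at this class and it should bind before the mount pod starts.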

@chenmiao1991

I opened a new issue; the phenomenon is different: #906
