ceph-csi for rbd cannot mount image after upgrade k8s to v1.31 #5066
Hi, these lines in the logs show what's happening best:
Does this problem happen with any RBD image, pre-existing as well as newly created ones? Kubernetes 1.31 (minikube) is part of the CI runs that are done for every change, so there must be something else that causes the failure. Were there any other changes made? Upgrade of the OS on the nodes, updates of the Ceph cluster, ...?
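For example, a quick way to check the newly-created case is a throwaway PVC plus pod and see whether that mount fails as well. A minimal sketch (the resource names are placeholders; dynamic-ceph-storage is the StorageClass from your description):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-fresh-test                      # placeholder name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: dynamic-ceph-storage    # the RBD StorageClass reported in this issue
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: rbd-fresh-test
spec:
  containers:
  - name: test
    image: alpine:3.17
    command: ["sh", "-c", "touch /mnt/hello && ls -l /mnt && sleep 3600"]
    volumeMounts:
    - name: vol
      mountPath: /mnt
  volumes:
  - name: vol
    persistentVolumeClaim:
      claimName: rbd-fresh-test
EOF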
The working node and the non-working node differ only in the versions of Kubernetes and CRI-O. There is no issue with creating a new volume (both PVC and PV):
Name: ceph-rbd-vol
Namespace: kube-system
StorageClass: dynamic-ceph-storage
Status: Bound
Volume: pvc-d57cd08f-222d-492f-9a9e-8d08a93c86aa
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
volume.kubernetes.io/storage-provisioner: rbd.csi.ceph.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: ceph-rbd-test
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 32s (x2 over 32s) persistentvolume-controller Waiting for a volume to be created either by the external provisioner 'rbd.csi.ceph.com' or manually by the system administrator. If volume creation is delayed, please verify that the provisioner is running and correctly registered.
Normal Provisioning 32s rbd.csi.ceph.com_csi-rbdplugin-provisioner-7864679-n4flm_bb314cb9-8c88-4764-8299-8049e1a8e696 External provisioner is provisioning volume for claim "kube-system/ceph-rbd-vol"
Normal ProvisioningSucceeded 32s rbd.csi.ceph.com_csi-rbdplugin-provisioner-7864679-n4flm_bb314cb9-8c88-4764-8299-8049e1a8e696 Successfully provisioned volume pvc-d57cd08f-222d-492f-9a9e-8d08a93c86aa
Name: pvc-d57cd08f-222d-492f-9a9e-8d08a93c86aa
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: rbd.csi.ceph.com
volume.kubernetes.io/provisioner-deletion-secret-name: ceph-user-secret
volume.kubernetes.io/provisioner-deletion-secret-namespace: access-control
Finalizers: [external-provisioner.volume.kubernetes.io/finalizer kubernetes.io/pv-protection]
StorageClass: dynamic-ceph-storage
Status: Bound
Claim: kube-system/ceph-rbd-vol
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 1Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: rbd.csi.ceph.com
FSType: ext4
VolumeHandle: 0001-0024---masked---0000000000000007-1b828329-2005-4798-8803-65f3ae1c330c
ReadOnly: false
VolumeAttributes: clusterID=--masked--
imageFeatures=layering
imageName=csi-vol-1b828329-2005-4798-8803-65f3ae1c330c
journalPool=k8s-sharedpool
pool=k8s-sharedpool
storage.kubernetes.io/csiProvisionerIdentity=1736421088713-9702-rbd.csi.ceph.com
Events: <none>
The only issue is with mounting the volume into a pod. kubectl describe pod ceph-rbd-test:
Name: ceph-rbd-test
Namespace: kube-system
Priority: 0
Service Account: default
Node: --masked--
Start Time: Thu, 09 Jan 2025 17:12:49 +0100
Labels: <none>
Annotations: <none>
Status: Pending
SeccompProfile: RuntimeDefault
IP:
IPs: <none>
Containers:
ceph-rbd-test:
Container ID:
Image: --masked--/library/alpine:3.17
Image ID:
Port: <none>
Host Port: <none>
Command:
/bin/bash
Args:
-c
touch /usr/share/cephdir/hello && ls -lsah /usr/share/cephdir && rm -f /usr/share/cephdir/hello && sleep 1d
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/usr/share/cephdir from ceph-vol (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qdq8c (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
ceph-vol:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: ceph-rbd-vol
ReadOnly: false
kube-api-access-qdq8c:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 15s kubelet Unable to attach or mount volumes: unmounted volumes=[ceph-vol], unattached volumes=[], failed to process volumes=[ceph-vol]: error processing PVC kube-system/ceph-rbd-vol: PVC is not bound
Normal SuccessfulAttachVolume 15s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-d57cd08f-222d-492f-9a9e-8d08a93c86aa"
Warning FailedMount 0s (x2 over 1s) kubelet MountVolume.MountDevice failed for volume "pvc-d57cd08f-222d-492f-9a9e-8d08a93c86aa" : rpc error: code = Internal desc = exit status 1
If I remember correctly there was a bug in CRI-O; @iPraveenParihar or @Nikhil-Ladha might have a link to it.
This is the fix for the issue: containers/crun#1614
I can now confirm that the issue is fixed after I replaced crun v1.18 in the static release bundle of CRI-O with crun 1.19.1, released 3 weeks ago.
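For anyone hitting the same problem, a rough sketch of the check and workaround on an affected node (the binary location is an assumption; check your cri-o static bundle or crio.conf for where crun actually lives):

# Check the crun version CRI-O is using
crun --version                      # 1.18 is affected; 1.19.x carries the fix from containers/crun#1614

# Swap in crun >= 1.19.1 downloaded from the containers/crun releases page
# (destination path is an assumption; adjust to your installation)
sudo install -m 0755 ./crun-1.19.1-linux-amd64 /usr/local/bin/crun

# Restart CRI-O so newly created containers use the updated runtime
sudo systemctl restart crio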
Describe the bug
After we upgraded the Kubernetes cluster from 1.30.4 to 1.31.4, the ceph-csi rbdplugin can no longer mount images. It still works fine on nodes that run kubelet 1.30.4.
Initially we ran ceph-csi v3.12.1. When the error occurred we tried upgrading to v3.13.0 to see if it would fix the issue, but the behavior is the same.
Environment details
Mounter used for mounting PVC (for cephFS it's fuse or kernel, for rbd it's krbd or rbd-nbd): krbd
Steps to reproduce
Steps to reproduce the behavior:
Storage class: (a sketch consistent with the PV attributes is shown after this list)
User permission:
We also tried with the new capabilities docs, but it did not help.
Pod stuck in the Init stage and reported the MountVolume.MountDevice error shown in the pod events above.
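The exact StorageClass definition was not captured here; a minimal sketch consistent with the PV attributes shown earlier (provisioner rbd.csi.ceph.com, pool k8s-sharedpool, imageFeatures layering, fstype ext4, the ceph-user-secret in the access-control namespace) would look roughly like this. The node-stage secret entries are an assumption, since the PV only shows the deletion secret:

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: dynamic-ceph-storage
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: "--masked--"                   # same (masked) cluster ID as in the PV
  pool: k8s-sharedpool
  journalPool: k8s-sharedpool
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/provisioner-secret-name: ceph-user-secret
  csi.storage.k8s.io/provisioner-secret-namespace: access-control
  csi.storage.k8s.io/node-stage-secret-name: ceph-user-secret          # assumption: same secret used for node staging
  csi.storage.k8s.io/node-stage-secret-namespace: access-control
reclaimPolicy: Delete
EOF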
Actual results
The node can map the block device but cannot mount it. From the logs, I think the driver tries to read information about the block device with the blkid command, without success. Everything works fine when the node runs kubelet v1.30.
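For reference, the probe can be repeated by hand. Since the node plugin runs blkid inside its own container, exec'ing into that container is closer to what the driver does; the namespace, pod, and device names below are assumptions:

# On the affected node: find the mapped RBD device for the image from the PV
rbd showmapped                      # e.g. /dev/rbd0

# Run blkid from inside the node-plugin container, the way the driver would
kubectl -n <csi-namespace> exec <csi-rbdplugin-pod> -c csi-rbdplugin -- blkid /dev/rbd0
echo $?                             # a non-zero exit here lines up with the "exit status 1" in the MountDevice error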
Expected behavior
The node can map and mount the block device and make it available to the pod.
Logs
If the issue is in PVC mounting, please attach the complete logs of the containers below:
the plugin pod from the node where the mount is failing.
Note: if it's an rbd issue please provide only rbd-related logs; if it's a cephFS issue please provide the cephFS logs.
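A sketch of how to collect them (the namespace and the app=csi-rbdplugin label are assumptions based on a default ceph-csi deployment; adjust to your setup):

# Find the nodeplugin pod running on the node where the mount fails
kubectl get pods -A -o wide -l app=csi-rbdplugin | grep <node-name>

# Dump the driver container logs from that pod
kubectl -n <csi-namespace> logs <csi-rbdplugin-pod> -c csi-rbdplugin > csi-rbdplugin.log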