Deploying a Postgres database with CSI volumes
Hello, I am trying to create a Postgres database in its own namespace and attach a PersistentVolume to it.
I created the cluster with LKE, so the CSI driver is already installed.
The secret postgres-credentials has also been created.
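For reference, I created the secret roughly like this, with the same keys the Deployment references (the values shown here are placeholders):
# placeholder values — replace with real credentials
kubectl create secret generic postgres-credentials \
  --namespace postgres \
  --from-literal=POSTGRES_USER=postgres \
  --from-literal=POSTGRES_DB=mydb \
  --from-literal=POSTGRES_PASSWORD=changeme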
This is my YAML file for the database:
# Persistent Volume Claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: postgres
  name: postgres-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: linode-block-storage-retain
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: postgres
  name: postgres-deployment
spec:
  selector:
    matchLabels:
      app: postgres-container
  template:
    metadata:
      labels:
        app: postgres-container
    spec:
      containers:
        - name: postgres-container
          image: postgres:9.6.6
          env:
            - name: POSTGRES_USER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_USER
            - name: POSTGRES_DB
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_DB
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: POSTGRES_PASSWORD
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgres-volume-mount
      volumes:
        - name: postgres-volume-mount
          persistentVolumeClaim:
            claimName: postgres-pvc
---
# Service
apiVersion: v1
kind: Service
metadata:
  namespace: postgres
  name: postgres-service
spec:
  selector:
    app: postgres-container
  ports:
    - port: 5432
      protocol: TCP
      targetPort: 5432
  type: NodePort
When I go to my Linode Cloud dashboard I see the volume is created and everything seems fine.
These are some events on the Postgres pod that show the error:
MountVolume.MountDevice failed for volume "pvc-aa9e0765c2c74cb7" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcaa9e0765c2c74cb7 /dev/disk/by-id/scsi-0Linode_Volume_pvcaa9e0765c2c74cb7]
Unable to attach or mount volumes: unmounted volumes=[postgres-volume-mount], unattached volumes=[postgres-volume-mount default-token-mgvtv]: timed out waiting for the condition
14 Replies
Hey there -
This is a tough one, but I've come across this situation before, so I want to offer a couple of things to look into.
One thing to check is that your volume is only being mounted by a single container. If multiple containers attempt to mount it, the mount will fail.
I've also seen this happen as a result of a syntax error. My recommendation is to go through your manifest to make sure everything is formatted correctly (no extra spaces, tabs, or anything like that).
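One quick way to catch that kind of formatting problem is a client-side dry run against your manifest (the file name here is just an example):
# checks the manifest without creating anything on the cluster
kubectl apply --dry-run=client -f postgres.yaml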
I hope this helps!
Did you find a resolution to this? I'm having a similar problem at the moment:
AttachVolume.Attach succeeded for volume "pvc-a1b6aa5..."
MountVolume.MountDevice failed for volume "pvc-a1b6aa5..." : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvca1b6aa53... /dev/disk/by-id/scsi-0Linode_Volume_pvca1b6aa53...]
Did you find a resolution to this? I'm having a similar problem at the moment.
MountVolume.MountDevice failed for volume "pvc-d23fbce33cee4fa7" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcd23fbce33cee4fa7 /dev/disk/by-id/scsi-0Linode_Volume_pvcd23fbce33cee4fa7]
One thing to check is that your volume is only being mounted by a single container. If multiple containers attempt to mount it, the mount will fail.
RWO volumes can be mounted by multiple pods, as long as they're on the same node, right?
I started seeing this error when migrating applications to a new node pool. In several cases I was able to fix it by manually detaching and then reattaching the volume to the node via the https://cloud.linode.com/volumes UI. (Whether or not that was safe to do, I'm not sure.)
I have the same problem, using the exact same statements in the YAML files. Are there any solutions for this?
This will work, but you might need to wait for the first mount to fail, which can take 10 minutes.
Simply delete the VolumeAttachment object in Kubernetes, or detach the volume from the Linode Cloud Manager UI. Then recreate the pod and be patient for around 10 minutes.
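Roughly, the kubectl route looks like this (names are examples — use the ones from your own cluster):
# list attachments and find the one referencing the stuck PV
kubectl get volumeattachments
# remove it so the attach/detach controller can retry cleanly
kubectl delete volumeattachment <attachment-name>
# recreate the pod, e.g. by deleting it so its controller brings it back
kubectl delete pod <pod-name> -n <namespace>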
This is obviously not great if you're running a high volume production application, but in that case it's best not to run your database on Kubernetes.
Experiencing the same issue here too, under the same conditions, but not with Postgres.
Same issue here. Reinstalling/upgrading/redeploying the WordPress app results in the same error:
MountVolume.MountDevice failed for volume "pvc-db413a06bd404b84" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcdb413a06bd404b84 /dev/disk/by-id/scsi-0Linode_Volume_pvcdb413a06bd404b84]
Getting tired of these Volume issues to be honest.
My Postgres app was redeployed and assigned to a new node. The volume got automatically detached and attached to that new node, but the container/pod failed to start with mounting errors. I redeployed the pod back onto its original node; the volume was successfully attached back to the old node, but the pod/container still won't mount it:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 30m (x9 over 50m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[dshm data kube-api-access-n6t5h]: timed out waiting for the condition
Warning FailedMount 16m (x2 over 25m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data kube-api-access-n6t5h dshm]: timed out waiting for the condition
Warning FailedMount 12m (x3 over 39m) kubelet Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[kube-api-access-n6t5h dshm data]: timed out waiting for the condition
Warning FailedMount 92s (x33 over 52m) kubelet MountVolume.MountDevice failed for volume "pvc-19d050b1a14040c6" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvc19d050b1a14040c6 /dev/disk/by-id/scsi-0Linode_Volume_pvc19d050b1a14040c6]
PVC description
Name: data-fanzy-postgresql-dev-0
Namespace: fanzy-dev
StorageClass: linode-block-storage-retain
Status: Bound
Volume: pvc-19d050b1a14040c6
Labels: app.kubernetes.io/component=primary
app.kubernetes.io/instance=fanzy-postgresql-dev
app.kubernetes.io/name=postgresql
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: linodebs.csi.linode.com
volume.kubernetes.io/storage-provisioner: linodebs.csi.linode.com
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 10Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: fanzy-postgresql-dev-0
Events: <none>
PV description
Name: pvc-19d050b1a14040c6
Labels: <none>
Annotations: pv.kubernetes.io/provisioned-by: linodebs.csi.linode.com
Finalizers: [kubernetes.io/pv-protection external-attacher/linodebs-csi-linode-com]
StorageClass: linode-block-storage-retain
Status: Bound
Claim: fanzy-dev/data-fanzy-postgresql-dev-0
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 10Gi
Node Affinity: <none>
Message:
Source:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: linodebs.csi.linode.com
FSType: ext4
VolumeHandle: 516140-pvc19d050b1a14040c6
ReadOnly: false
VolumeAttributes: storage.kubernetes.io/csiProvisionerIdentity=1662712251649-8081-linodebs.csi.linode.com
Events: <none>
Now I've noticed that even newly created PVCs are failing to attach to new pods/containers, with the same error.
I ran through this example (https://github.com/linode/linode-blockstorage-csi-driver#create-a-kubernetes-secret) and reinstalled the driver. The PVC gets created successfully, but the volume fails to mount.
kubectl get pvc/csi-example-pvc pods/csi-example-pod
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/csi-example-pvc Bound pvc-c0ea8df9e5684244 10Gi RWO linode-block-storage-retain 21m
NAME READY STATUS RESTARTS AGE
pod/csi-example-pod 0/1 ContainerCreating 0 21m
Here's the error description from the pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned default/csi-example-pod to lke71838-112699-635487f6efa8
Warning FailedMount 3m38s kubelet Unable to attach or mount volumes: unmounted volumes=[csi-example-volume], unattached volumes=[kube-api-access-zvksd csi-example-volume]: timed out waiting for the condition
Warning FailedMount 83s (x5 over 12m) kubelet Unable to attach or mount volumes: unmounted volumes=[csi-example-volume], unattached volumes=[csi-example-volume kube-api-access-zvksd]: timed out waiting for the condition
Warning FailedAttachVolume 14s (x7 over 12m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-c0ea8df9e5684244" : Attach timeout for volume 802990-pvcc0ea8df9e5684244
I think you are seeing this strange behavior because of the deployment strategy used for the Deployment with a persistent volume and the linode-block-storage-retain storage class. You need to change the strategy for the Deployment to Recreate. By default, it uses RollingUpdate.
apiVersion: apps/v1
kind: Deployment
...
spec:
  strategy:
    type: Recreate
  ...
The difference between Recreate and RollingUpdate is that the Recreate strategy terminates the old pod before creating the new one, while RollingUpdate creates the new pod before terminating the old one. If you are not using persistent volumes, either strategy is fine. But with persistent volumes, which are supposed to attach to only one pod, if the old pod is not terminated first, the new one will fail to come up and will remain in the ContainerCreating state waiting for the storage to show up. This behavior can produce all sorts of inconsistent results.
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
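If you'd rather patch an existing Deployment in place instead of editing the manifest, something along these lines should work (deployment name and namespace taken from the example above):
# switch to Recreate and drop the now-invalid rollingUpdate block
kubectl -n postgres patch deployment postgres-deployment \
  --type merge \
  -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'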
For StatefulSets, when using rolling updates with the default Pod Management Policy (OrderedReady), it's possible to get into a broken state that requires manual intervention to repair. Please check the limitations section of the Kubernetes docs for more information: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations
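The manual repair in that situation is typically to correct the StatefulSet spec first and then delete the stuck pod so the controller recreates it from the current revision (names here are placeholders):
# after fixing the spec, delete the broken pod;
# the StatefulSet controller brings it back from the current revision
kubectl delete pod <statefulset-pod-0> -n <namespace>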
Also, if you change something in the Cloud Manager UI that was auto-generated by Kubernetes, you might end up with weird issues. Kubernetes may still be looking for the name/label it assigned when provisioning the resource and won't find it after the label has been changed from the UI. I once updated the label for my volume in the Cloud Manager UI, and Kubernetes failed to identify it because the PV inside the cluster was still referring to the old auto-generated label. I had to clean things up to get it fixed.
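If you suspect a mismatch like that, it can help to compare what Kubernetes has recorded for the PV against what you see in Cloud Manager (the PV name here is the one from the description above):
# shows the CSI volume handle the cluster expects for this PV
kubectl get pv pvc-19d050b1a14040c6 -o jsonpath='{.spec.csi.volumeHandle}'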
I ran into a similar problem when StatefulSets were rescheduled to a different node after the cluster size was reduced…
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 37m default-scheduler Successfully assigned default/mongo-0 to lkeABC-DEF-XYZ
Warning FailedAttachVolume 37m attachdetach-controller Multi-Attach error for volume "pvc-XYZ" Volume is already exclusively attached to one node and can't be attached to another
Normal SuccessfulAttachVolume 36m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-XYZ"
Warning FailedMount 16m (x2 over 21m) kubelet Unable to attach or mount volumes: unmounted volumes=[mongo-data3], unattached volumes=[kube-api-access-5r75k mongo-data3]: timed out waiting for the condition
Warning FailedMount 6m16s (x23 over 36m) kubelet MountVolume.MountDevice failed for volume "pvc-XYZ" : rpc error: code = Internal desc = Unable to find device path out of attempted paths: [/dev/disk/by-id/linode-pvcXYZ /dev/disk/by-id/scsi-0Linode_Volume_pvcXYZ]
Warning FailedMount 66s (x13 over 35m) kubelet Unable to attach or mount volumes: unmounted volumes=[mongo-data3], unattached volumes=[mongo-data3 kube-api-access-5r75k]: timed out waiting for the condition
Is there a recommended configuration for StatefulSets to avoid this multi-attach error, followed by repeated FailedMount events and the pod never starting?