# longhorn-storage
I read that it may mean we have a hardware issue. So maybe the disks are not doing well.
oh, seeing that kind of stuff in dmesg is not good
```
[202325.420295] blk_update_request: critical medium error, dev sdd, sector 290816 op 0x1:(WRITE) flags 0x800 phys_seg 5 prio class 0
[202325.423010] EXT4-fs warning (device sdd): ext4_end_bio:344: I/O error 7 writing to inode 13 starting block 36357)
[202325.423099] Buffer I/O error on device sdd, logical block 36352
[202325.424493] Buffer I/O error on device sdd, logical block 36353
[202325.425952] Buffer I/O error on device sdd, logical block 36354
[202325.427393] Buffer I/O error on device sdd, logical block 36355
[202325.429003] Buffer I/O error on device sdd, logical block 36356
```
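Before blaming the disk, it might be worth a SMART check. A minimal sketch, assuming smartmontools is installed on the node; if sdd turns out to be a Longhorn/iSCSI device rather than a physical disk, smartctl will just error out, which is itself a clue:
```sh
# Overall SMART health verdict for the disk (requires smartmontools).
smartctl -H /dev/sdd

# Pull the attributes that usually signal failing media.
smartctl -a /dev/sdd | grep -Ei 'reallocated|pending|uncorrect'
```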
All the error messages I see are related to SCSI devices created by CSI, not the actual hardware device (sda on this server).
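To double-check which sdX devices are Longhorn attachments versus real disks, something like this should work (a sketch; the iscsi transport and the /dev/longhorn symlinks are how Longhorn's iSCSI frontend normally surfaces volumes, so treat the exact output as an assumption):
```sh
# Longhorn volumes appear as SCSI disks with transport "iscsi";
# physical disks show sata/nvme/sas instead.
lsblk -S -o NAME,VENDOR,MODEL,TRAN

# Longhorn also symlinks each attached volume to its sdX device,
# which maps volume/PVC names to kernel device names.
ls -l /dev/longhorn/
```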
I see
```
[Thu Aug 10 09:40:45 2023] EXT4-fs (sdb): I/O error while writing superblock
[Thu Aug 10 09:40:45 2023] EXT4-fs (sdb): Remounting filesystem read-only
```
and also
```
sd 4:0:0:1: Power-on or device reset occurred
```
many times on different CSI devices.
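A rough way to count how often that happens and which devices are affected (just greps over the kernel messages quoted above):
```sh
# Tally reset and I/O error events per device from the kernel log.
dmesg -T \
  | grep -E 'Power-on or device reset occurred|I/O error|critical medium error' \
  | grep -oE 'sd[a-z]+|[0-9]+:[0-9]+:[0-9]+:[0-9]+' \
  | sort | uniq -c | sort -rn
```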
In the longhorn-csi-plugin container on that node's kubelet I see this, related to the same volume:
```
time="2023-08-10T09:44:12Z" level=error msg="NodeGetVolumeStats: req: {\"volume_id\":\"pvc-2e1be23b-4021-45d2-878b-d37cd71679dc\",\"volume_path\":\"/var/snap/microk8s/common/var/lib/kubelet/pods/3a29992b-070e-4072-9e54-1411274dfbff/volumes/kubernetes.io~csi/pvc-2e1be23b-4021-45d2-878b-d37cd71679dc/mount\"} err: rpc error: code = Internal desc = Get \"http://longhorn-backend:9500/v1/volumes/pvc-2e1be23b-4021-45d2-878b-d37cd71679dc\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"
```
I also see a ton of debug/info-level logs. Is that because this cluster uses Longhorn version 1.4.0-dev?
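If you want to quantify that, something like this gives a per-level breakdown from one manager pod (longhorn-manager runs as a DaemonSet, so ds/ picks a single pod):
```sh
# Rough count of log lines per level from a longhorn-manager pod.
kubectl -n longhorn-system logs ds/longhorn-manager --tail=5000 \
  | grep -oE 'level=[a-z]+' | sort | uniq -c | sort -rn
```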
The workaround for now is to redeploy the deployment / terminate the faulty pod.
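For anyone else hitting this, the workaround as commands; the names are placeholders for your own workload:
```sh
# Restart the whole deployment (recreates every pod)...
kubectl -n <app-namespace> rollout restart deployment/<deployment-name>

# ...or just delete the faulty pod and let its controller recreate it.
kubectl -n <app-namespace> delete pod <faulty-pod-name>
```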