
lively-balloon-7264

01/24/2023, 12:41 PM
I had a Longhorn volume go into a degraded state last night, the second time in the last few days. It recovered quickly, but I'm curious whether there's something I should dig into further. It doesn't look like I have any hardware issues on the host; when digging into the logs, I found that a csi-provisioner pod on the same host had crashed.
Here are the logs from the time of the crash/restart:
$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.4
Kustomize Version: v4.5.7
Server Version: v1.24.8+k3s1
And I'm on Longhorn v1.4.0, on amd64 hardware running Ubuntu 20.04.5 LTS.
Just interested in whether this is potentially a Longhorn bug, or whether there's some underlying issue I need to dig into.
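For reference, a rough sketch of how the volume's state and recent events can be checked in a situation like this (assuming a standard Longhorn install, where everything lives in the longhorn-system namespace and volumes are exposed via the volumes.longhorn.io CRD):
# List Longhorn volumes with their robustness (healthy/degraded) state
$ kubectl -n longhorn-system get volumes.longhorn.io
# Recent events in the namespace, sorted by time, to correlate with the degradation
$ kubectl -n longhorn-system get events --sort-by=.lastTimestamp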

famous-journalist-11332

01/26/2023, 6:29 AM
Can you check whether other pods on that node crashed as well (e.g., instance-manager-xxx, longhorn-manager-xxx, ...)?

lively-balloon-7264

01/26/2023, 12:13 PM
No Longhorn pods have recently crashed/restarted on that node, other than the csi-provisioner restart I already mentioned:
$ kubectl -n longhorn-system get pods --field-selector spec.nodeName=k3s-6
NAME                                                  READY   STATUS      RESTARTS       AGE
backup-2-8b0ef6be-77074ba4-27904830-f2fgv             0/1     Completed   0              5d3h
backup-2-ba178593-77074ba4-27904560-d95l8             0/1     Completed   0              5d8h
backup-2-f341a731-77074ba4-27904650-jz7x6             0/1     Completed   0              5d6h
backup-27904590-g6k5v                                 0/1     Completed   0              5d7h
csi-provisioner-5d8dd96b57-gwwwn                      1/1     Running     2 (2d4h ago)   19d
engine-image-ei-fc06c6fb-njhvm                        1/1     Running     0              19d
instance-manager-e-74c2d2ed183ea550d4814476c082e7e6   1/1     Running     0              19d
instance-manager-r-74c2d2ed183ea550d4814476c082e7e6   1/1     Running     0              19d
longhorn-csi-plugin-snt7w                             3/3     Running     0              19d
longhorn-manager-thnjs                                1/1     Running     0              19d
snapshot-6-39b4be4e-77074ba4-27911790-lnqcv           0/1     Completed   0              7h41m
Looking at the logs of all the other Longhorn pods on that node, there's really nothing noteworthy other than this one line from instance-manager-e-74c2d2ed183ea550d4814476c082e7e6, right around when the csi-provisioner pod crashed:
2023-01-24T08:04:23.633724704Z stderr F [pvc-4cb1db41-0d86-47b7-b3c6-163fe6621335-e-85950dc5] time="2023-01-24T08:04:23Z" level=error msg="R/W Timeout. No response received in 8s"
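For reference, a rough sketch of how logs like these can be pulled, using the pod names from the listing above (--previous returns the logs of the crashed container instance, before its restart):
# Logs from the csi-provisioner container that crashed
$ kubectl -n longhorn-system logs csi-provisioner-5d8dd96b57-gwwwn --previous
# Scan the engine instance-manager's logs for R/W timeout errors
$ kubectl -n longhorn-system logs instance-manager-e-74c2d2ed183ea550d4814476c082e7e6 | grep -i timeout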

famous-journalist-11332

01/26/2023, 11:00 PM
The above log indicates that the engine of the volume could not reach its replicas. It could mean that the replicas crashed, or that the network connection between them was cut.
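A rough way to inspect the replica and engine objects for the affected volume is sketched below. The volume name is taken from the log line above, and the longhornvolume label selector is an assumption based on how Longhorn tags these CRs; adjust if your version labels them differently:
# Replica objects for the volume, including the node each one runs on
$ kubectl -n longhorn-system get replicas.longhorn.io -l longhornvolume=pvc-4cb1db41-0d86-47b7-b3c6-163fe6621335 -o wide
# The engine object for the same volume
$ kubectl -n longhorn-system get engines.longhorn.io -l longhornvolume=pvc-4cb1db41-0d86-47b7-b3c6-163fe6621335 -o wide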