lively-balloon-7264

01/24/2023, 12:41 PM
i had a longhorn volume go into a degraded state last night - second time in the last few days. it recovered quickly, but i'm curious if there is something i should dig into further. it doesn't look like i have any hardware issues on the host, but when digging into logs, i found that a csi-provisioner pod on the same host crashed
here are the logs from the time of crash/restart
$ kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.4
Kustomize Version: v4.5.7
Server Version: v1.24.8+k3s1
and i'm on longhorn v1.4.0 on amd64 hardware running Ubuntu 20.04.5 LTS
just interested if this is potentially a longhorn bug, or if there's some underlying issue i need to dig into
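fwiw, in case it helps anyone reading along, this is roughly the kind of check i mean - just a sketch using the standard longhorn namespace/CRD names, adjust as needed:
$ kubectl -n longhorn-system get volumes.longhorn.io -o wide
$ kubectl -n longhorn-system get events --sort-by=.lastTimestamp | grep -iE 'degraded|faulted'
$ kubectl get events -A --field-selector type=Warning --sort-by=.lastTimestamp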

famous-journalist-11332

01/26/2023, 6:29 AM
Can you check if other pods on that node crashed as well (e.g., instance-manager-xxx, longhorn-manager-xxx, ...)?

lively-balloon-7264

01/26/2023, 12:13 PM
no longhorn pods recently crashed/restarted on that node:
$ kubectl -n longhorn-system get pods --field-selector spec.nodeName=k3s-6
NAME                                                  READY   STATUS      RESTARTS       AGE
backup-2-8b0ef6be-77074ba4-27904830-f2fgv             0/1     Completed   0              5d3h
backup-2-ba178593-77074ba4-27904560-d95l8             0/1     Completed   0              5d8h
backup-2-f341a731-77074ba4-27904650-jz7x6             0/1     Completed   0              5d6h
backup-27904590-g6k5v                                 0/1     Completed   0              5d7h
csi-provisioner-5d8dd96b57-gwwwn                      1/1     Running     2 (2d4h ago)   19d
engine-image-ei-fc06c6fb-njhvm                        1/1     Running     0              19d
instance-manager-e-74c2d2ed183ea550d4814476c082e7e6   1/1     Running     0              19d
instance-manager-r-74c2d2ed183ea550d4814476c082e7e6   1/1     Running     0              19d
longhorn-csi-plugin-snt7w                             3/3     Running     0              19d
longhorn-manager-thnjs                                1/1     Running     0              19d
snapshot-6-39b4be4e-77074ba4-27911790-lnqcv           0/1     Completed   0              7h41m
and looking at the logs of all the other longhorn pods on that node, there is really nothing noteworthy other than this one line from instance-manager-e-74c2d2ed183ea550d4814476c082e7e6 right around when the csi-provisioner pod crashed:
2023-01-24T08:04:23.633724704Z stderr F [pvc-4cb1db41-0d86-47b7-b3c6-163fe6621335-e-85950dc5] time="2023-01-24T08:04:23Z" level=error msg="R/W Timeout. No response received in 8s"
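(for reference, a command roughly like this pulls that window of instance-manager logs - the --since-time value is just an approximation of when the csi-provisioner pod restarted:)
$ kubectl -n longhorn-system logs instance-manager-e-74c2d2ed183ea550d4814476c082e7e6 --since-time='2023-01-24T08:00:00Z' | grep -iE 'timeout|error'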

famous-journalist-11332

01/26/2023, 11:00 PM
The above log indicates that the engine of the volume cannot reach its replicas. It could mean that the replicas crashed or the network connection between them was cut.
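Something like the commands below can show whether any replica went into an error state and what the engine timeout is set to (I think the 8s in the log matches the default engine-replica-timeout setting, but please verify on your version):
$ kubectl -n longhorn-system get replicas.longhorn.io -o wide
$ kubectl -n longhorn-system get engines.longhorn.io -o wide
$ kubectl -n longhorn-system get settings.longhorn.io engine-replica-timeout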