# longhorn-storage
aloof-branch-69545
Hi @millions-engine-33820 Did you upgrade Kubernetes by following this guide? https://longhorn.io/docs/1.5.1/volumes-and-nodes/maintenance/#upgrading-kubernetes When upgrading K8s, we suggest upgrading one node at a time and cordoning/draining the node first.
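For example, something along these lines per node (the node name is a placeholder, and the drain flags are the usual ones for nodes running DaemonSets and emptyDir volumes):

# stop new pods from being scheduled onto the node
kubectl cordon <node-name>

# evict the workloads from the node (DaemonSet pods cannot be evicted, so ignore them)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --timeout=5m

# upgrade Kubernetes on that node, then make it schedulable again
kubectl uncordon <node-name>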
Are the volumes currently attached to nodes? Can you provide a support bundle so we can understand more about the current situation? Thanks!
millions-engine-33820
Hi @aloof-branch-69545, since RKE1 does the upgrade fully automatically, there is nothing we can do about the process. As far as I know, RKE1 does not drain nodes; it just cordons them.
We saw that the volume was mounted on that node. We worked around the issue by rescheduling the pod with the attached PVC to another node; it then came back up again. Is there any experience with running Longhorn on RKE1-provisioned clusters when upgrading them? It would be highly inconvenient to always have to scale down all PVC-attached workloads before an upgrade.
aloof-branch-69545
Hi @millions-engine-33820 I think RKE1 can be configured to drain the node during an upgrade as well:
RKE will cordon each node before upgrading it, and uncordon the node afterward. RKE can also be configured to drain nodes before upgrading them.
according to this doc: https://rke.docs.rancher.com/upgrades/how-upgrades-work
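In the RKE cluster.yml that would be something like this (field names as described in that doc; the values here are only illustrative, so double-check them against your RKE version):

upgrade_strategy:
  max_unavailable_worker: 1     # upgrade one worker node at a time
  drain: true                   # drain instead of only cordoning before the upgrade
  node_drain_input:
    ignore_daemonsets: true     # DaemonSet pods (e.g. longhorn-manager) cannot be evicted
    delete_local_data: false
    grace_period: -1            # use each pod's own termination grace period
    timeout: 120                # seconds to wait for the drain before failing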
millions-engine-33820
My experience when draining nodes is that the Longhorn instance-manager-e and instance-manager-r pods take a long time to evict, and sometimes the drain even runs into timeouts. This could make the upgrade process fail. Is there a workaround for this?
aloof-branch-69545
It may be because it takes some time to detach the volume and reattach it to another node. When draining a node, the user workload on top of it is deleted and rescheduled to another node, so the Longhorn volume is also reattached to that node. You can check whether it is the user pod that takes so long to be rescheduled.
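One way to check is to watch where the workload pod and the Longhorn volume end up while the drain is running (the namespace is a placeholder for your own workload):

# watch the workload pod being rescheduled to another node
kubectl get pods -n <app-namespace> -o wide -w

# watch the Longhorn volume state and which node it is attached to
kubectl get volumes.longhorn.io -n longhorn-system -w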
p
I am pretty sure it's because of PodDisruptionBudgets; I have the same issue with Longhorn storage nodes, being unable to drain them.
kubectl get pdb -n longhorn-system
NAME                                                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
instance-manager-bc6044472b3fac7954484150f26da8ef   1               N/A               0                     83d
instance-manager-5a9bc9230d731c8d85994f161cd1be02   1               N/A               0                     83d
instance-manager-1b1df4ea7096c59a0363c2b75a0a4089   1               N/A               0                     55d
csi-attacher                                        1               N/A               2                     17d
csi-provisioner                                     1               N/A               2                     17d
If I remove the PDB it drains just fine. I am running vanilla Kubernetes. Questions: does RKE by default only cordon the node without draining it? And what is the better option, cordon/drain or just cordon?
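For reference, what I do today is roughly this (the PDB name is just one of the instance-manager PDBs from the output above; Longhorn manages these PDBs itself, so deleting one is only a stop-gap):

# the instance-manager PDBs allow zero disruptions, which blocks the drain
kubectl get pdb -n longhorn-system

# remove the PDB of the instance manager running on the node I want to drain,
# then the drain goes through
kubectl delete pdb instance-manager-bc6044472b3fac7954484150f26da8ef -n longhorn-system
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data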
aloof-branch-69545
To upgrade the node it is better to drain it, so the workload is evicted to another node and the volume is reattached to another node as well; after that, Longhorn will automatically delete the PDB. So a workaround is to cordon the node, delete the workload manually to force it to be rescheduled, and then drain the node.
RKE can be set to cordon and drain the node when upgrading it.
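Roughly like this (names are placeholders; deleting the pod forces its controller to reschedule it onto an uncordoned node, which detaches the Longhorn volume from the node being upgraded):

# 1. stop new pods from landing on the node
kubectl cordon <node-name>

# 2. force the PVC-attached workload off the node so its volume detaches
kubectl delete pod <workload-pod> -n <app-namespace>
# (or scale down / rollout-restart the owning Deployment or StatefulSet instead)

# 3. once the volume is detached and Longhorn has removed its PDB, drain the node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data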