# longhorn-storage
aloof-branch-69545
Hi @millions-engine-33820 Did you upgrade Kubernetes by following this guide? https://longhorn.io/docs/1.5.1/volumes-and-nodes/maintenance/#upgrading-kubernetes When upgrading K8s, we suggest upgrading one node at a time and cordoning/draining the node first.
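For example, something along these lines per node (the node name is a placeholder, and the drain flags are the usual ones for nodes running DaemonSets and emptyDir volumes):

# stop new pods from being scheduled onto the node
kubectl cordon <node-name>

# evict the workloads from the node (DaemonSet pods cannot be evicted, so ignore them)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data --timeout=5m

# upgrade Kubernetes on that node, then make it schedulable again
kubectl uncordon <node-name>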
Are the volumes currently attached to nodes? Can you provide a support bundle so we can understand more about the current situation? Thanks!
millions-engine-33820
Hi @aloof-branch-69545, since RKE1 does the upgrade fully automatically, there is nothing we can do about the process. As far as I know, RKE1 does not drain nodes; it just cordons them.
We saw that the volume was mounted on that node. We worked around the issue by rescheduling the pod with the attached PVC to another node; it then came back up again. Is there any experience with running Longhorn on RKE1-provisioned clusters when upgrading them? It would be highly inconvenient to always have to scale down all PVC-attached workloads before an upgrade.
aloof-branch-69545
Hi @millions-engine-33820 I think RKE1 can be configured to drain the node during an upgrade as well:
RKE will cordon each node before upgrading it, and uncordon the node afterward. RKE can also be configured to drain nodes before upgrading them.
according to this doc: https://rke.docs.rancher.com/upgrades/how-upgrades-work
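In the RKE cluster.yml that would be something like this (field names as described in that doc; the values here are only illustrative, so double-check them against your RKE version):

upgrade_strategy:
  max_unavailable_worker: 1     # upgrade one worker node at a time
  drain: true                   # drain instead of only cordoning before the upgrade
  node_drain_input:
    ignore_daemonsets: true     # DaemonSet pods (e.g. longhorn-manager) cannot be evicted
    delete_local_data: false
    grace_period: -1            # use each pod's own termination grace period
    timeout: 120                # seconds to wait for the drain before failing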
millions-engine-33820
My experience when draining nodes is that the Longhorn instance-manager-e and instance-manager-r pods take a long time to evict, and sometimes the drain even runs into timeouts. This could make the upgrade process fail. Is there a workaround for this?
aloof-branch-69545
It may be because it takes some time to detach the volume and reattach it to another node. When draining a node, the user workload on top of it is deleted and rescheduled to another node, so the Longhorn volume is also reattached to that node. You can check whether it is the user pod that takes so long to be rescheduled.
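One way to check is to watch where the workload pod and the Longhorn volume end up while the drain is running (the namespace is a placeholder for your own workload):

# watch the workload pod being rescheduled to another node
kubectl get pods -n <app-namespace> -o wide -w

# watch the Longhorn volume state and which node it is attached to
kubectl get volumes.longhorn.io -n longhorn-system -w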
p
I am pretty sure it's because of PodDisruptionBudgets; I have the same issue with Longhorn storage nodes, being unable to drain them.
kubectl get pdb -n longhorn-system
NAME                                                MIN AVAILABLE   MAX UNAVAILABLE   ALLOWED DISRUPTIONS   AGE
instance-manager-bc6044472b3fac7954484150f26da8ef   1               N/A               0                     83d
instance-manager-5a9bc9230d731c8d85994f161cd1be02   1               N/A               0                     83d
instance-manager-1b1df4ea7096c59a0363c2b75a0a4089   1               N/A               0                     55d
csi-attacher                                        1               N/A               2                     17d
csi-provisioner                                     1               N/A               2                     17d
If I remove the PDB it drains just fine. I am running vanilla Kubernetes. Questions: does RKE by default only cordon the node without draining it? And what is the better option, cordon/drain or just cordon?
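For reference, what I do today is roughly this (the PDB name is just one of the instance-manager PDBs from the output above; Longhorn manages these PDBs itself, so deleting one is only a stop-gap):

# the instance-manager PDBs allow zero disruptions, which blocks the drain
kubectl get pdb -n longhorn-system

# remove the PDB of the instance manager running on the node I want to drain,
# then the drain goes through
kubectl delete pdb instance-manager-bc6044472b3fac7954484150f26da8ef -n longhorn-system
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data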
aloof-branch-69545
To upgrade the node it is better to drain it, so the workload is evicted to another node and the volume is reattached to another node as well; after that, Longhorn will automatically delete the PDB. So a workaround is to cordon the node, delete the workload manually to force it to be rescheduled, and then drain the node.
RKE can be set to cordon and drain the node when upgrading it.
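Roughly like this (names are placeholders; deleting the pod forces its controller to reschedule it onto an uncordoned node, which detaches the Longhorn volume from the node being upgraded):

# 1. stop new pods from landing on the node
kubectl cordon <node-name>

# 2. force the PVC-attached workload off the node so its volume detaches
kubectl delete pod <workload-pod> -n <app-namespace>
# (or scale down / rollout-restart the owning Deployment or StatefulSet instead)

# 3. once the volume is detached and Longhorn has removed its PDB, drain the node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data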