# longhorn-storage
e
I am trying to roll through my RKE2 cluster and do OS updates. After rebooting each of my 3 server nodes independently, I have found that the Longhorn pods are not coming back up on those hosts, but they are working on the other 7 hosts. I'm not even seeing it attempt to bring the pods up; there are no image errors or crash loops. Rebooting one of my worker nodes has no problem restarting the Longhorn engine/manager/CSI pods, etc.
h
By design, Longhorn pods will only run on agent (worker) nodes: https://longhorn.io/docs/1.9.0/concepts/#1-design
Do you have anything in an Error or non-Running state?
```
kubectl get po -n longhorn-system -o wide
```
e
Thank you for the response. OK, well then that behavior seems counter to what I have seen on 2 other clusters, but I imagine a reboot of those servers would yield the same result. There are no issues reported by kubectl for the longhorn namespace. I have one worker node that had a taint on it that was preventing Longhorn startup, but removing the taint and reapplying it after Longhorn started seems to have made that node happy. I asked Google AI and it gave me this nonsense: "Yes, Longhorn can and should run on RKE2 server nodes. It's a common practice to deploy Longhorn as a storage backend for Kubernetes clusters, including those running RKE2." So I was confused. It doesn't seem necessary to me to run it on a server node that has no workloads, but maybe it makes sense for some of the control plane pods to run on the server nodes?
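For reference, that remove-and-reapply taint dance looks roughly like this with kubectl; the node name and taint key/value below are placeholders, not the actual ones from this cluster:

```
# Show the current taints on the node (node name is a placeholder)
kubectl describe node worker-3 | grep -A 3 Taints

# Remove the taint (the trailing "-" deletes it) so the Longhorn pods can schedule
kubectl taint nodes worker-3 example-key=example-value:NoSchedule-

# Reapply the taint once the Longhorn pods are back up
kubectl taint nodes worker-3 example-key=example-value:NoSchedule
```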
So I don't know what I thought I saw... but there is no Longhorn on the other server nodes either, so disregard that point. Not sure why Google AI thinks it should be there.
h
I am not surprised; there are a number of things Google AI has been wrong about.
Although it may still be possible to run Longhorn on server nodes if the label is set? I have not tried this: https://longhorn.io/docs/1.9.0/advanced-resources/os-distro-specific/okd-support/#label-and-annotate-the-node
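If anyone wants to try it, the labeling from that doc would look roughly like this with kubectl; this is a sketch assuming the `node.longhorn.io/create-default-disk` label described on that page and that Longhorn's "Create Default Disk on Labeled Nodes" setting is enabled (node name is a placeholder):

```
# Assumption: the Longhorn setting "Create Default Disk on Labeled Nodes" is enabled.
# "server-1" is a placeholder node name.
kubectl label node server-1 node.longhorn.io/create-default-disk=true

# Verify the label landed on the node
kubectl get node server-1 --show-labels
```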
e
Yeah, I saw an article on how to run it across all nodes, including servers, but I am not looking to do anything non-standard.
[image.png — Longhorn UI screenshot attachment]
Is it normal for it to report server nodes as Down? I'm pretty sure all nodes were reporting as Up at one point, but probably before the servers were rebooted.
If I delete the node from the Longhorn UI, does that remove it only from Longhorn, or does it pull the node out of the cluster?
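One way to see why Longhorn is marking those nodes Down before deleting anything in the UI is to check the Longhorn node objects directly; this is a sketch assuming the default longhorn-system namespace and the nodes.longhorn.io custom resources (node name is a placeholder):

```
# List Longhorn's view of each node (ready/schedulable state)
kubectl get nodes.longhorn.io -n longhorn-system

# Inspect the conditions Longhorn records for one node
kubectl describe nodes.longhorn.io server-1 -n longhorn-system
```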