01/29/2023, 1:53 PM
On a small cluster with 3 worker nodes, each holding a Longhorn replica, I use the system-upgrade controller to patch Ubuntu and reboot. Sometimes it moves too fast: one or several volumes are still degraded when it goes ahead and schedules the next worker node, leaving me with one intact replica and two failed. Obviously not ideal 🙂 So, would a good solution be to use a "prepare" init container in the system-upgrade plan, and have it check and wait for all volumes to be OK before proceeding? Or is there a better way?
Hmm, maybe - unmounted volumes have a state of "unknown", and I've seen them in the "unknown" state numerous times when a node was down or shutting down. Err, hmm, not sure really. Perhaps I need to look at other things to determine the actual state?
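For illustration, here is a rough sketch of what such a wait check could look like in Python. It is not a tested recipe. It assumes the Longhorn volume objects can be listed with `kubectl get volumes.longhorn.io -n longhorn-system -o json`, and that each one reports `status.state` and `status.robustness`. It looks at both fields so that a detached volume showing "unknown" does not block the upgrade, while anything attached (or in transition) must be "healthy". The function names, the namespace, and the rule of skipping detached volumes are all assumptions of this sketch.

```python
import json
import subprocess
import time


def blocking_volumes(volumes):
    """Return the names of volumes that should hold up the next node upgrade."""
    blocking = []
    for vol in volumes:
        name = vol.get("metadata", {}).get("name", "<unnamed>")
        status = vol.get("status", {})
        state = status.get("state", "")
        robustness = status.get("robustness", "unknown")
        if state == "detached":
            # Assumption: a detached volume normally reports "unknown"
            # robustness, so it is skipped rather than treated as unhealthy.
            continue
        if robustness != "healthy":
            # Attached, attaching or detaching, and not healthy: wait for it.
            blocking.append(name)
    return blocking


def fetch_volumes():
    """List Longhorn volumes via kubectl (namespace is an assumption)."""
    result = subprocess.run(
        ["kubectl", "get", "volumes.longhorn.io",
         "-n", "longhorn-system", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def wait_until_healthy(fetch=fetch_volumes, interval=10, timeout=1800):
    """Poll until no volume is blocking; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        pending = blocking_volumes(fetch())
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A prepare container could call `wait_until_healthy()` and exit non-zero when it returns `False`, so the plan does not move on to the next node. Whether skipping detached volumes is safe depends on the cluster: a detached volume with a failed replica would pass this check unnoticed, so inspecting the replica objects as well may be the more careful option.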