# rke2
c
While you can certainly downgrade the binaries and restart, that is likely to break things even worse than they currently are
You should not skip minor versions when upgrading. The Kubernetes version skew policy and upgrade order is covered at https://kubernetes.io/releases/version-skew-policy/#supported-component-upgrade-order
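a quick way to see how far things have drifted (plain kubectl; VERSION is the kubelet's reported Kubernetes version):
```
# VERSION shows each node's kubelet version; servers (control-plane)
# must be upgraded before agents, and kubelets shouldn't trail the
# API server by more than the supported skew.
kubectl get nodes -o wide
kubectl version --short   # client and API server versions (flag removed in newer kubectl)
```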
w
We're aware of the version skew; it's just a matter now of what approaches are available to fix it. We're having trouble restoring the backup of that particular cluster
gathering a couple of other details...
c
it might be easier to push forward and just try to finish the upgrade
get all the nodes on the same version, and then see what’s stuck/broken
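once everything's on one version, a couple of quick queries usually surface what's left broken:
```
# Pods stuck outside Running/Succeeded, plus the most recent
# warning events, tend to point straight at upgrade fallout.
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
kubectl get events -A --field-selector=type=Warning --sort-by=.lastTimestamp | tail -n 20
```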
w
that's where we're at right now. We found that the cluster may be missing updates to pod security policies and some other changes introduced in 1.23, according to the changelogs
so they upgraded each node using the manual upgrade process (https://docs.rke2.io/upgrade/manual_upgrade) with the Stable channel, which jumped the cluster straight to 1.24
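in hindsight, pinning the exact next minor via the install script instead of tracking a channel would have avoided the double jump (the version string below is just an example):
```
# Pin an explicit release rather than following the Stable channel;
# substitute whatever the next minor in your upgrade sequence is.
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.23.17+rke2r1" sh -
systemctl restart rke2-server   # rke2-agent on worker nodes
```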
everything seemed to be running fine for the most part, but there were rejections for things like cert-manager workloads related to pod security policies. For example, I compared the global-unrestricted-psp policy and it was missing the expected updates from 1.23 that should have resolved those rejections
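the rejections showed up as FailedCreate events on the owning ReplicaSets, e.g.:
```
# PSP admission failures look like "pods ... is forbidden:
# PodSecurityPolicy: unable to admit pod" in FailedCreate events.
kubectl get events -n cert-manager --field-selector=reason=FailedCreate
```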
c
the workaround is to use `kubectl edit PodSecurityPolicy` to add the missing annotation
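worth dumping the policy first and comparing it against the same policy on a healthy 1.23/1.24 cluster before hand-editing:
```
# Inspect the shipped policy and its annotations, then edit it to
# match a known-good cluster of the target version.
kubectl get podsecuritypolicy global-unrestricted-psp -o yaml
kubectl edit podsecuritypolicy global-unrestricted-psp
```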
w
yea I found that while going down the rabbit hole, but couldn't tell whether a related fix was actually incoming or we'd missed something
is there a chance the cluster isn't fubarred from making the jump? (and I'm treating this as a symptom of the wrong problem?)
c
it’ll probably be OK. There may be a few things like this that you need to correct by hand but I wouldn’t give up on it immediately.
👍 1
w
That resolved the cert-manager issue it looks like
awesome
are there any safeguards in the node join process to help prevent this situation? I'm a little surprised there isn't a remark in the node management docs recommending that you check and match the rke2 version
unless there are safeguards and they passed appropriately here
c
nope. We don't have any preflight checks that poke around the cluster to make sure everything's been upgraded in the right order. RKE2 and K3s don't have a concept of an orchestration tool with a holistic view of the cluster (as you might get from kubeadm or rke) to handle that sort of gating.
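fwiw a rough stand-in is easy to script yourself before joining or upgrading a node; a sketch, assuming kubectl access and that `rke2 --version` keeps its usual `rke2 version vX.Y.Z+rke2rN (...)` first-line shape:
```
#!/bin/sh
# Rough pre-join check: bail out if this node's rke2 build doesn't
# match what the first node in the existing cluster reports.
local_ver="$(rke2 --version | awk 'NR==1 {print $3}')"
cluster_ver="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
if [ "$local_ver" != "$cluster_ver" ]; then
  echo "version mismatch: local $local_ver vs cluster $cluster_ver" >&2
  exit 1
fi
```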
w
Understood there. Do you think it would have any value to add a remark or two to the docs reminding folks to match versions, for those who might not understand the version skew? We'd be happy to propose/PR it
👍 1
It's not really rke2's job to teach people that, I know, but it apparently happens! Lol
c
yeah, if you have a moment to create an issue in https://github.com/rancher/rke2-docs/issues
👍 1
w
@swift-fireman-59958 🙂