# rke2
c
While you can certainly downgrade the binaries and restart, that is likely to break things even worse than they currently are
You should not skip minor versions when upgrading. The Kubernetes version skew policy and upgrade order is covered at https://kubernetes.io/releases/version-skew-policy/#supported-component-upgrade-order
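a quick way to see how far things have drifted (plain kubectl; VERSION is the kubelet's reported Kubernetes version):
```
# VERSION shows each node's kubelet version; servers (control-plane)
# must be upgraded before agents, and kubelets shouldn't trail the
# API server by more than the supported skew.
kubectl get nodes -o wide
kubectl version --short   # client and API server versions (flag removed in newer kubectl)
```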
w
We're aware of the version skew; it's just a matter now of what approaches are available to fix it. We're having trouble restoring the backup of that particular cluster
gathering a couple of other details...
c
it might be easier to push forward and just try to finish the upgrade
get all the nodes on the same version, and then see what’s stuck/broken
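once everything's on one version, a couple of quick queries usually surface what's left broken:
```
# Pods stuck outside Running/Succeeded, plus the most recent
# warning events, tend to point straight at upgrade fallout.
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
kubectl get events -A --field-selector=type=Warning --sort-by=.lastTimestamp | tail -n 20
```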
w
that's where we're at right now. We found that the cluster may be missing updates to pod security policies and some other changes introduced in 1.23, according to the changelogs
so they upgraded each node using the manual upgrade process (https://docs.rke2.io/upgrade/manual_upgrade) with the Stable channel, which jumped the cluster straight to 1.24
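in hindsight, pinning the exact next minor via the install script instead of tracking a channel would have avoided the double jump (the version string below is just an example):
```
# Pin an explicit release rather than following the Stable channel;
# substitute whatever the next minor in your upgrade sequence is.
curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.23.17+rke2r1" sh -
systemctl restart rke2-server   # rke2-agent on worker nodes
```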
everything seemed to be running fine for the most part, but there were rejections for things like cert-manager workloads related to pod security policies. For example, I compared the global-unrestricted-psp policy and it was missing the expected updates from 1.23 that should have resolved those rejections
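the rejections showed up as FailedCreate events on the owning ReplicaSets, e.g.:
```
# PSP admission failures look like "pods ... is forbidden:
# PodSecurityPolicy: unable to admit pod" in FailedCreate events.
kubectl get events -n cert-manager --field-selector=reason=FailedCreate
```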
c
the workaround is to use `kubectl edit PodSecurityPolicy` to add the missing annotation
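worth dumping the policy first and comparing it against the same policy on a healthy 1.23/1.24 cluster before hand-editing:
```
# Inspect the shipped policy and its annotations, then edit it to
# match a known-good cluster of the target version.
kubectl get podsecuritypolicy global-unrestricted-psp -o yaml
kubectl edit podsecuritypolicy global-unrestricted-psp
```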
w
yea I found that while going down the rabbit hole, but couldn't tell whether a related fix was actually incoming or we'd missed something
is there a chance the cluster isn't fubarred from making the jump? (and I'm treating this as a symptom of the wrong problem?)
c
it’ll probably be OK. There may be a few things like this that you need to correct by hand but I wouldn’t give up on it immediately.
👍 1
w
That resolved the cert-manager issue it looks like
awesome
are there any safeguards in the node join process to help prevent this situation? I'm a little surprised there isn't a remark in the node management docs recommending that you check and match the rke2 version
unless there are safeguards and they passed appropriately here
c
nope. We don't have any preflight checks that poke around the cluster to make sure everything's been upgraded in the right order. RKE2 and K3s don't have a concept of an orchestration tool with a holistic view of the cluster (as you might get from kubeadm or rke) to handle that sort of gating.
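fwiw a rough stand-in is easy to script yourself before joining or upgrading a node; a sketch, assuming kubectl access and that `rke2 --version` keeps its usual `rke2 version vX.Y.Z+rke2rN (...)` first-line shape:
```
#!/bin/sh
# Rough pre-join check: bail out if this node's rke2 build doesn't
# match what the first node in the existing cluster reports.
local_ver="$(rke2 --version | awk 'NR==1 {print $3}')"
cluster_ver="$(kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.kubeletVersion}')"
if [ "$local_ver" != "$cluster_ver" ]; then
  echo "version mismatch: local $local_ver vs cluster $cluster_ver" >&2
  exit 1
fi
```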
w
Understood there. Do you think it would have any value to add a remark or two to the docs reminding folks to match versions, for those who might not understand the version skew? We'd be happy to propose/PR it
👍 1
It's not really rke2's job to teach people that, I know, but it apparently happens! Lol
c
yeah, if you have a moment to create an issue in https://github.com/rancher/rke2-docs/issues
👍 1
w
@swift-fireman-59958 🙂