# rke2
c
we haven’t made any changes to the channels since last month’s releases. In the case of an outage it will just fail to resolve the latest version; it shouldn’t ever downgrade. Do you have SUC logs available? From the controller, not the upgrade jobs.
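(To make the "fail to resolve, never downgrade" point concrete, here's a hypothetical sketch — not the actual SUC source — of that behaviour: when the channel is unreachable, a plan keeps whatever version it already has.)

```python
# Hypothetical sketch of the failure mode described above: a channel outage
# means resolution fails, so the plan's current version is kept, and the SUC
# never falls back to an older release on its own.

def resolve_plan_version(channel_latest, current):
    """Return the version a plan should run.

    channel_latest is None when the channel server is unreachable.
    """
    if channel_latest is None:
        # Outage: resolution fails, so the existing version is kept.
        return current
    return channel_latest

print(resolve_plan_version(None, "v1.25.10+rke2r1"))               # outage: unchanged
print(resolve_plan_version("v1.25.10+rke2r1", "v1.25.9+rke2r1"))   # normal upgrade
```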
s
I do have the logs, but it basically only logs that it can’t contact the K8s API during the upgrade. It says nothing about what it’s actually doing. If you have a suggestion for a config change to make it do some more logging, I’d love to have it.
c
I believe even at the normal log level it should log when it polls the channel to resolve the version? I honestly can’t remember. I’ve never seen the SUC randomly downgrade nodes though; I really suspect something else is going on.
s
The only logs I have that aren’t from it failing to talk to K8s are:
```
time="2023-06-12T08:00:56Z" level=error msg="error syncing 'system-upgrade/apply-server-plan-on-mynode-with-60076d09e16f1f7be0af09e7-ab182': handler system-upgrade-controller: jobs.batch \"apply-server-plan-on-mynode-with-60076d09e16f1f7be0af09e7-ab182\" not found, requeuing"
```
Which suggests it was doing something, but not exactly what.
c
did the controller pod get restarted around that time? that’s normally what I see when it’s starting and the caches haven’t been synced yet.
also, what version of the SUC are you running?
s
Nope, it’s been running since November…
rancher/system-upgrade-controller:v0.9.1
c
that’s a bit old, you might try upgrading to 0.11.0 but even on 0.9 I still don’t have any idea what would cause it to do what you’re describing.
do you have logs from the rke2-server journald log to show the downgrade occurring?
s
It’s reassuring you’re as stumped as I am 😂
I have it saying the version when it starts up
```
journalctl -u rke2-server.service --since='2023-05-31 00:00:00' -g 'Starting rke2'
-- Journal begins at Sun 2023-01-22 00:00:03 GMT, ends at Wed 2023-06-14 21:48:02 BST. --
May 31 02:43:45 mynode.example.com rke2[2095329]: time="2023-05-31T02:43:45+01:00" level=info msg="Starting rke2 v1.25.10+rke2r1 (e0c376c606754f1ae6a1c2401f4f6e9146bda0f3)"
Jun 12 08:29:08 mynode.example.com rke2[443313]: time="2023-06-12T08:29:08+01:00" level=info msg="Starting rke2 v1.25.9+rke2r1 (842d05e64bcbf78552f1db0b32700b8faea403a0)"
Jun 12 08:44:02 mynode.example.com rke2[477981]: time="2023-06-12T08:44:02+01:00" level=info msg="Starting rke2 v1.25.10+rke2r1 (e0c376c606754f1ae6a1c2401f4f6e9146bda0f3)"
```
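(If you want to check other nodes for the same thing, the “Starting rke2” lines can be parsed and consecutive versions compared — a rough sketch, with the version format assumed from output like the above:)

```python
import re

# Scan journald "Starting rke2 vX.Y.Z+rke2rN" lines and flag any restart
# where the version went backwards compared to the previous start.

LINE_RE = re.compile(r'Starting rke2 v(\d+)\.(\d+)\.(\d+)\+rke2r(\d+)')

def versions(lines):
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            yield tuple(int(g) for g in m.groups())

def find_downgrades(lines):
    seen = list(versions(lines))
    # Tuple comparison orders (major, minor, patch, rke2 revision) correctly.
    return [(a, b) for a, b in zip(seen, seen[1:]) if b < a]

log = [
    'msg="Starting rke2 v1.25.10+rke2r1 (e0c376c)"',
    'msg="Starting rke2 v1.25.9+rke2r1 (842d05e)"',
    'msg="Starting rke2 v1.25.10+rke2r1 (e0c376c)"',
]
print(find_downgrades(log))  # one downgrade: 1.25.10 -> 1.25.9
```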
c
which channel are you pointed at? If it’s stable, we have it pinned here and it hasn’t changed in a bit: https://github.com/rancher/rke2/blob/master/channels.yaml#LL3C26-L3C26
s
```
channel: https://update.rke2.io/v1-release/channels/v1.25
```
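(That channel URL can also be resolved by hand: assuming the channel server answers with a redirect whose target is the pinned release, the version is the last path segment of the redirect URL — a small sketch of extracting it:)

```python
from urllib.parse import urlparse

# Assumption: the channel server redirects a channel URL to a release URL
# ending in the version tag, so the version is the last path segment.

def version_from_redirect(redirect_url):
    return urlparse(redirect_url).path.rstrip('/').rsplit('/', 1)[-1]

print(version_from_redirect(
    "https://github.com/rancher/rke2/releases/tag/v1.25.10+rke2r1"))
```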
c
only thing I can think of is a GH outage that caused the channel server not to see that release for a bit? It caches them though so the timing would be hard to pin down.
I don’t have access to the channel server logs, unfortunately
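(A toy illustration of why the caching makes the timing fuzzy — assuming the channel server caches the release list with a TTL, roughly like this, a GitHub blip only affects requests whose cache entry expired during the blip:)

```python
import time

# Minimal TTL cache sketch: a cached value keeps being served even while the
# upstream (GitHub) is down; only an expired entry forces a fresh fetch that
# can then fail or miss a release.

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key, fetch):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # cached value is served even if upstream is down
        value = fetch()    # may raise (or return stale data) during an outage
        self._store[key] = (now, value)
        return value

cache = TTLCache(ttl_seconds=300)
cache.get("v1.25", lambda: "v1.25.10+rke2r1")  # first request fills the cache
```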
s
Hmm, yes, I see how that could happen; nothing in GitHub’s incident history for that exact time window, but Pages and Actions did fall over later that day
In fact I’ve found another cluster that did the same thing, also v1.25