#k3s

late-needle-80860

07/01/2022, 6:26 PM
I’m trying to upgrade a cluster to v1.23.7+k3s1 from v1.23.6+k3s1. However, when the first control-plane node comes up it can’t start, and
journalctl -u k3s.service -f
shows:
Jul 01 20:25:25 test-test-master-0 systemd[1]: Failed to start Lightweight Kubernetes.
Jul 01 20:25:30 test-test-master-0 systemd[1]: k3s.service: Scheduled restart job, restart counter is at 151.
Jul 01 20:25:30 test-test-master-0 systemd[1]: Stopped Lightweight Kubernetes.
Jul 01 20:25:30 test-test-master-0 systemd[1]: Starting Lightweight Kubernetes...
Jul 01 20:25:30 test-test-master-0 sh[53137]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Jul 01 20:25:30 test-test-master-0 sh[53138]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Jul 01 20:25:30 test-test-master-0 k3s[53141]: time="2022-07-01T20:25:30+02:00" level=info msg="Starting k3s v1.23.7+k3s1 (ec61c667)"
Jul 01 20:25:30 test-test-master-0 k3s[53141]: time="2022-07-01T20:25:30+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Jul 01 20:25:30 test-test-master-0 k3s[53141]: time="2022-07-01T20:25:30+02:00" level=info msg="Managed etcd cluster not yet initialized"
Jul 01 20:25:30 test-test-master-0 k3s[53141]: time="2022-07-01T20:25:30+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Jul 01 20:25:30 test-test-master-0 k3s[53141]: time="2022-07-01T20:25:30+02:00" level=fatal msg="starting kubernetes: preparing server: failed to validate server configuration: critical configuration value mismatch"
Jul 01 20:25:30 test-test-master-0 systemd[1]: k3s.service: Main process exited, code=exited, status=1/FAILURE
Jul 01 20:25:30 test-test-master-0 systemd[1]: k3s.service: Failed with result 'exit-code'.
Jul 01 20:25:30 test-test-master-0 systemd[1]: Failed to start Lightweight Kubernetes.
This is Ubuntu 20.04.4, and when comparing the k3s.service files in
/etc/systemd/system/k3s.service
they look exactly the same … can’t find the needle in the haystack.
Any advice? Thank you.
Comparing them side by side in e.g. VS Code also shows no difference at all.
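A quick way to compare the two unit files directly, assuming SSH access to a healthy server node (the hostname here is hypothetical):

# diff the failing node's unit file against a healthy server's copy
diff /etc/systemd/system/k3s.service \
    <(ssh test-test-master-1 cat /etc/systemd/system/k3s.service)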

creamy-pencil-82913

07/01/2022, 6:33 PM
we’re going to release v1.23.7+k3s2 to fix that. The workaround is to upgrade the first server (the one pointed at by the --server flag on the failing node) first
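A sketch of that workaround, assuming the cluster was installed with the official install script; INSTALL_K3S_VERSION is the script’s documented version variable, and any server flags the node was originally installed with should be repeated on the command line:

# on the first server (the one the others point at), upgrade in place first
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.23.7+k3s1" sh -s - server
# once it reports Ready again, upgrade the remaining servers the same way
kubectl get nodes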

late-needle-80860

07/01/2022, 6:33 PM
aaah okay so it’s known … fair enough.

creamy-pencil-82913

07/01/2022, 6:34 PM
or remove the --server flag from the arguments
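For reference, a sketch of what that edit looks like in the unit file mentioned above; the URL shown is hypothetical, and systemd needs a reload afterwards:

# In /etc/systemd/system/k3s.service the ExecStart block looks roughly like:
#   ExecStart=/usr/local/bin/k3s \
#       server \
#           '--server' \
#           'https://192.0.2.10:6443' \   <- delete these two lines
sudo systemctl daemon-reload
sudo systemctl restart k3s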

late-needle-80860

07/01/2022, 6:34 PM
hmmm … removing the --server flag: no consequences of that one should know of?

creamy-pencil-82913

07/01/2022, 6:34 PM
not if it’s already joined to the cluster

late-needle-80860

07/01/2022, 6:35 PM
And the issue is the same in v1.23.8+k3s1? I also tried going to that version directly from v1.23.6+k3s1 and the same thing happened.

creamy-pencil-82913

07/01/2022, 6:36 PM
yep

late-needle-80860

07/01/2022, 6:36 PM
okay fair … so +k3s2 … on v1.23.8 coming up I guess 🙂
The server being pointed at in this setup is the API IP, handled via a keepalived floating IP “cluster” across the control-plane nodes.
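A minimal sketch of such a keepalived setup, for context; the interface name, router id, and VIP are assumptions, and each control-plane node would run this with its own priority:

# write a minimal keepalived config carrying the floating API IP
cat <<'EOF' | sudo tee /etc/keepalived/keepalived.conf
vrrp_instance k3s_api {
    state BACKUP            # all nodes start as BACKUP; highest priority claims the VIP
    interface eth0          # assumption: the NIC on the node network
    virtual_router_id 51    # must match on every control-plane node
    priority 100            # vary per node to set the failover order
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24       # assumption: the floating API IP that --server points at
    }
}
EOF
sudo systemctl restart keepalived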
Oooooh, interesting question arises from this debugging session I’m having here. So: I’ve coded some logic that introduces change, where change might be:
• worker node disk conf.
• k3s version
• OS version
  ◦ patches
• “hardware” resources
The change is introduced by interchanging the nodes … for the control-plane this is done one at a time, querying control-plane health and so on.
So, the initial first control-plane node, the one with the --cluster-init flag: when that is interchanged, it is replaced with one that has the same node name and the same IP. However, it no longer has the --cluster-init flag. Is this an issue?
Hmm, reading the docs, I think the --cluster-init flag/arg actually needs to be removed.

creamy-pencil-82913

07/01/2022, 6:55 PM
it doesn’t. It’s ignored after the datastore is initialized.
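In other words, a rebuilt first node can simply join like any other server. A hedged sketch using the documented /etc/rancher/k3s/config.yaml (the VIP and token values are placeholders):

# on any replacement server, including a rebuilt "first" node, join via server:
# cluster-init only matters on the very first boot of the datastore
cat <<'EOF' | sudo tee /etc/rancher/k3s/config.yaml
server: https://192.0.2.10:6443   # assumption: the keepalived API VIP
token: <shared-token>             # placeholder: the cluster's node-token value
EOF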

late-needle-80860

07/01/2022, 7:05 PM
Super - thank you for the info.