# k3s
c
I mean, maybe I could update it to 1.24.14 (the latest version), but I want to learn 😉 and troubleshoot what the issue is
b
if you ssh into the VM and check the journalctl logs, is there anything interesting?
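For reference, on a k3s server node the relevant systemd unit is usually k3s (k3s-agent on agent-only nodes), so a check could look something like this:
```
# follow the k3s service logs on a server node
sudo journalctl -u k3s -f

# or show only warnings and errors since the last boot
sudo journalctl -u k3s -b -p warning
```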
c
not really, I didn't find anything
i saw an info message that for one component (i think the scheduler) the node took over leadership when another control node was rebooted
so it seemed to work perfectly fine
no error messages, a few warnings every hour or so
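As an aside, one way to confirm which node currently holds the scheduler/controller-manager leadership is to look at the Lease objects in kube-system, roughly:
```
# leader election state lives in coordination.k8s.io Lease objects
kubectl -n kube-system get lease kube-scheduler kube-controller-manager
```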
b
if you check the pods or nodes directly with kubectl, does it look healthy?
c
on the workload page in Rancher itself everything looks fine for that cluster, let me check on the hosts via the kubectl that is bundled with k3s
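For reference, the listing below came from something along these lines (k3s bundles kubectl, so it can be run directly on a server node):
```
sudo k3s kubectl get pods -A
```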
```
NAMESPACE                  NAME                                                     READY   STATUS      RESTARTS          AGE
cattle-fleet-system        fleet-agent-7d6d98b485-bjwbk                             1/1     Running     17 (4h32m ago)    259d
cattle-monitoring-system   alertmanager-rancher-monitoring-alertmanager-0           2/2     Running     8 (4h32m ago)     371d
cattle-monitoring-system   prometheus-rancher-monitoring-prometheus-0               3/3     Running     9 (7d3h ago)      371d
cattle-monitoring-system   pushprox-k3s-server-client-5zr79                         1/1     Running     4 (4h32m ago)     371d
cattle-monitoring-system   pushprox-k3s-server-client-frj4c                         1/1     Running     5 (4h36m ago)     371d
cattle-monitoring-system   pushprox-k3s-server-client-gwgvd                         1/1     Running     3 (7d3h ago)      371d
cattle-monitoring-system   pushprox-k3s-server-client-js4v4                         1/1     Running     5 (4h42m ago)     371d
cattle-monitoring-system   pushprox-k3s-server-proxy-f4f5d4874-pwtmn                1/1     Running     3 (7d3h ago)      371d
cattle-monitoring-system   rancher-monitoring-grafana-57777cc795-z7xb2              3/3     Running     12 (4h36m ago)    371d
cattle-monitoring-system   rancher-monitoring-kube-state-metrics-5bc8bb48bd-lcxxd   1/1     Running     4 (4h32m ago)     371d
cattle-monitoring-system   rancher-monitoring-operator-f79dc4944-ln2bl              1/1     Running     5 (4h36m ago)     371d
cattle-monitoring-system   rancher-monitoring-prometheus-adapter-8846d4757-mkbgw    1/1     Running     3 (7d3h ago)      371d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-7df65        1/1     Running     4 (4h36m ago)     371d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-fcmhz        1/1     Running     3 (7d3h ago)      371d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-ksl5g        1/1     Running     4 (4h32m ago)     371d
cattle-monitoring-system   rancher-monitoring-prometheus-node-exporter-vf847        1/1     Running     5 (4h42m ago)     371d
cattle-system              cattle-cluster-agent-854bbc47f5-2lqw5                    1/1     Running     15 (4h36m ago)    237d
cattle-system              cattle-cluster-agent-854bbc47f5-fb9ml                    1/1     Running     23 (4h42m ago)    237d
cattle-system              system-upgrade-controller-7b8d94c7f5-kcj4k               1/1     Running     4 (4h32m ago)     259d
kube-system                coredns-b96499967-w8gjk                                  1/1     Running     4 (4h42m ago)     194d
kube-system                helm-install-cinder-csi-2v5tq                            0/1     Completed   0                 194d
kube-system                helm-install-openstack-controller-manager-p25s6          0/1     Completed   0                 194d
kube-system                helm-install-traefik-crd-msww2                           0/1     Completed   0                 194d
kube-system                helm-install-traefik-d5z75                               0/1     Completed   0                 7d3h
kube-system                local-path-provisioner-84bb864455-57hh4                  1/1     Running     6 (4h42m ago)     371d
kube-system                metrics-server-ff9dbcb6c-jkwjn                           1/1     Running     4 (4h42m ago)     194d
kube-system                openstack-cinder-csi-controllerplugin-547bc58794-x8n7m   6/6     Running     289 (4h42m ago)   371d
kube-system                openstack-cinder-csi-nodeplugin-2jqmw                    3/3     Running     61 (4h32m ago)    371d
kube-system                openstack-cinder-csi-nodeplugin-69nf5                    3/3     Running     34 (4h42m ago)    371d
kube-system                openstack-cinder-csi-nodeplugin-gdglv                    3/3     Running     29 (5d1h ago)     371d
kube-system                openstack-cinder-csi-nodeplugin-sthn8                    3/3     Running     72 (4h36m ago)    371d
kube-system                openstack-cloud-controller-manager-mzkql                 1/1     Running     24 (4h36m ago)    371d
kube-system                openstack-cloud-controller-manager-n4q77                 1/1     Running     22 (4h32m ago)    371d
kube-system                openstack-cloud-controller-manager-tn6ft                 1/1     Running     24 (4h42m ago)    371d
kube-system                traefik-85fcf6b649-fq6ck                                 1/1     Running     2 (4h36m ago)     25d
```
looks good to me i'd say
```
# kubectl get nodes
NAME                             STATUS   ROLES                              AGE    VERSION
infra-kube-ctrl-e5a402eb-8tc2n   Ready    control-plane,etcd,master,worker   371d   v1.24.4+k3s1
infra-kube-ctrl-e5a402eb-bw9q4   Ready    control-plane,etcd,master,worker   371d   v1.24.4+k3s1
infra-kube-ctrl-e5a402eb-smwfs   Ready    control-plane,etcd,master,worker   371d   v1.24.4+k3s1
infra-kube-wrk-e67b3b9e-tqpvz    Ready    worker                             371d   v1.24.4+k3s1
```
g
Check the network stack, there was a module (vxlan) dropped from the default Ubuntu kernel packages in 22.04. https://www.seanfoley.blog/microk8s-ubuntu-22-04-lts-jammy-jellyfish-broken-needs-vxlan-support/ This can cause what you're seeing, especially on new nodes (not usually upgraded ones)
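If it is the issue described in that post, checking and fixing it would look roughly like this (on the affected Ubuntu images the module is packaged in linux-modules-extra):
```
# check whether the vxlan module is loaded / available
lsmod | grep vxlan
modinfo vxlan

# install the extra modules for the running kernel and load vxlan
sudo apt install linux-modules-extra-$(uname -r)
sudo modprobe vxlan
```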
c
That's the thing, I did not upgrade anything: neither Ubuntu nor Rancher nor k3s. Only after it was stuck in reconciling did I decide to upgrade the outdated Ubuntu packages and reboot the nodes
b
pods look good, although with lots of restarts. Anyway, what does the log of cattle-cluster-agent-854bbc47f5 say?
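Pulling those logs could be done with something like this (assuming the pods carry the usual app=cattle-cluster-agent label):
```
# tail the cluster agent logs across both replicas
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=100

# or a single pod directly
kubectl -n cattle-system logs cattle-cluster-agent-854bbc47f5-2lqw5
```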
c
logs look good so far, a bunch of deprecation warnings for policy/v1beta1 (which will be removed in 1.25) and an error during subscribe: websocket: close sent
btw, the one cattle-cluster-agent running on the "reconciling" node has no errors, only the other one on the "active" node does
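To see which node each agent replica is scheduled on, -o wide helps (same label assumption as above):
```
# show the node each cattle-cluster-agent pod is running on
kubectl -n cattle-system get pods -o wide -l app=cattle-cluster-agent
```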
@gray-pillow-95920 - checked via lsmod on the node - vxlan is available
the many restarts on the pods might be from our openstack upgrade 1.5 weeks ago. there we shut down all VMs due to a network migration. afterwards the cluster was started and was in a kind of broken state for a few hours or a day or so. then i rebooted the nodes one by one and afterwards all was good (cluster manager in the rancher dashboard showed active) until yesterday (or so, didn't monitor too closely) when this one node showed up as reconciling without anything else being done or executed on that cluster
g
Ok, I haven't seen that. In my case I missed the vxlan step and rebuilt, then found vxlan was still missing. Fine if it didn't help, it just cost me a lot of time so I thought I'd share
c
problem "solved" itself 😉 played around and tried to update to 1.24.14, after that rerun our terraform script (which also for that cluster tried to disable the local-storage provider which in turn leads to new VMs being booted. That killed the cluster in the end 😉 So I ended up deleting it and build it fresh. Now everything is fine.
b
great! Sorry I could not help you more, we have a busy week with July releases