# rke2
t
Good afternoon folks, I have been seeing an issue where, on install, etcd appears to fail to start and the bootstrap just loops, attempting to connect to etcd with no success. According to crictl the image is on the server, but no pods are running, and the rke2 logs don't show anything that seems useful, even with the debug flag. Is there anything I can do to get more detail on the errors from the initial attempts to start etcd?
c
check the kubelet and containerd logs
what kind of resources (cpu/memory) do you have on the node?
t
2 CPUs, 32G of RAM
c
you might also check the etcd logs under /var/log/pods
2 cores or 2 sockets?
that’s a lot of memory for just 2 cores. You need at least 4 cores for a server.
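To answer the cores-versus-sockets question on the node itself, a minimal Python sketch like this can help. It assumes a Linux host that exposes `/proc/meminfo`; the 4-core threshold is just the figure suggested in this thread, and `mem_total_gib` and `check_server_sizing` are hypothetical helper names.

```python
import os

# Threshold suggested in this thread for a server node; adjust to your own sizing guidance.
MIN_SERVER_CORES = 4


def mem_total_gib(meminfo_text: str) -> float:
    """Parse the MemTotal line of /proc/meminfo (reported in kB, i.e. KiB) into GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemTotal not found")


def check_server_sizing(cores: int, min_cores: int = MIN_SERVER_CORES) -> list:
    """Return warnings; an empty list means the core count meets the threshold."""
    if cores < min_cores:
        return [f"{cores} logical CPUs; at least {min_cores} suggested for a server"]
    return []


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # os.cpu_count() counts logical CPUs across all sockets, cores, and threads.
    cores = os.cpu_count() or 0
    with open("/proc/meminfo") as f:
        print(f"{cores} logical CPUs, {mem_total_gib(f.read()):.1f} GiB RAM")
    for warning in check_server_sizing(cores):
        print("WARNING:", warning)
```

Because `os.cpu_count()` reports logical CPUs, "2" here means two schedulable CPUs regardless of how they are split across sockets.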
t
OK, I can bump it up.
c
that’s probably not it, but that’s definitely low for a server.
t
It doesn’t appear to be getting far enough to create /var/log/pods
c
definitely check containerd and kubelet logs then
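A rough sketch for pulling only the error-level lines out of those logs. The two file paths are assumptions based on RKE2's default data-dir (`/var/lib/rancher/rke2`); a custom `data-dir` moves them. Missing files are reported rather than treated as fatal, since `/var/log/pods` may not exist yet at this stage.

```python
import glob
import os
import re

# Assumed defaults for RKE2's data-dir (/var/lib/rancher/rke2); verify on your node.
LOG_PATHS = [
    "/var/lib/rancher/rke2/agent/logs/kubelet.log",
    "/var/lib/rancher/rke2/agent/containerd/containerd.log",
]
# Static-pod logs (etcd, kube-apiserver, ...) appear here only once the pods start.
POD_LOG_GLOB = "/var/log/pods/*/*/*.log"

# klog lines carry a severity letter plus MMDD, e.g. "E0515 23:02:48.123456 ...".
KLOG_ERROR = re.compile(r"(?:^|\s)[EF]\d{4} \d{2}:\d{2}:\d{2}")
# logfmt lines (level=error) and JSON lines ("level":"error").
LEVEL_ERROR = re.compile(r'\blevel"?[=:]"?(?:error|fatal|panic)\b')


def is_error(line: str) -> bool:
    return bool(KLOG_ERROR.search(line) or LEVEL_ERROR.search(line))


def scan(paths):
    """Yield (path, line) for error-level lines; a missing file is reported, not fatal."""
    for path in paths:
        if not os.path.exists(path):
            yield path, "<missing: component may not have started yet>"
            continue
        with open(path, errors="replace") as f:
            for line in f:
                if is_error(line):
                    yield path, line.rstrip("\n")


if __name__ == "__main__":
    for path, line in scan(LOG_PATHS + sorted(glob.glob(POD_LOG_GLOB))):
        print(f"{path}: {line}")
```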
t
The only non-info message is:
time="2025-05-15T22:45:15.024459505Z" level=error msg="failed to load cni during init, please check CRI plugin status before setting up network for pods" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
c
cni doesn’t get installed until later
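The containerd message above is logrus/logfmt output. As a sketch using hypothetical helpers and only the standard library, it can be split into fields with `shlex` and checked against a small allowlist of errors that are expected before the CNI is deployed; the allowlist holds only the message seen in this thread.

```python
import shlex

# Errors expected before the CNI is deployed. The single entry comes from the
# containerd log above; this allowlist is illustrative, not exhaustive.
EXPECTED_DURING_BOOTSTRAP = [
    "no network config found in /etc/cni/net.d",
]


def parse_logfmt(line: str) -> dict:
    """Split a logrus/logfmt line into key/value pairs.

    shlex.split handles double-quoted values and backslash-escaped quotes inside
    them; tokens without '=' (such as a journald/syslog prefix) are skipped.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def is_expected_early_error(line: str) -> bool:
    """True if the line is error-level and matches a known early-bootstrap message."""
    fields = parse_logfmt(line)
    text = fields.get("msg", "") + " " + fields.get("error", "")
    return fields.get("level") == "error" and any(s in text for s in EXPECTED_DURING_BOOTSTRAP)
```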
kubelet and containerd are both running?
t
kubelet exits when it can’t contact etcd
c
that’s not how that works
t
and gets restarted and loops
c
kubelet does not talk to etcd, the apiserver does
t
ok, let me try again
c
what is the exact error the kubelet is exiting with
kubelet crashlooping would definitely prevent pods from getting started
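The dependency chain described here can be modeled roughly as a graph. This is a simplified sketch assuming a server with embedded etcd, where etcd and kube-apiserver run as static pods launched by the kubelet; it uses Python's `graphlib` (3.9+) to show a valid start order and everything a crashlooping kubelet would block. The component labels are illustrative.

```python
from graphlib import TopologicalSorter

# Simplified model of an RKE2 server bootstrap (assumption: embedded etcd).
# Each component maps to the set of components it needs first. The kubelet
# never dials etcd; it launches etcd and kube-apiserver as static pods, and
# kube-apiserver (plus rke2's own etcd health check) is what talks to etcd.
DEPENDS_ON = {
    "containerd": set(),
    "kubelet": {"containerd"},
    "etcd (static pod)": {"kubelet"},
    "kube-apiserver (static pod)": {"kubelet", "etcd (static pod)"},
    "cni (deployed via apiserver)": {"kube-apiserver (static pod)"},
}


def start_order(deps: dict) -> list:
    """Return one valid start order: every component after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def blocked_by(failed: str, deps: dict) -> set:
    """Return every component that transitively depends on `failed`."""
    blocked = set()
    # static_order() yields dependencies first, so `blocked` is complete for each node's needs.
    for node in TopologicalSorter(deps).static_order():
        needs = deps.get(node, set())
        if failed in needs or needs & blocked:
            blocked.add(node)
    return blocked
```

In this model, `blocked_by("kubelet", DEPENDS_ON)` contains etcd, kube-apiserver, and the CNI, which matches the symptom of connections to 127.0.0.1:2379 being refused while the kubelet keeps exiting.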
t
May 15 23:02:48 na-nonprod-dynenvcp-01c.atl01.xx.com rke2[387567]: time="2025-05-15T23:02:48Z" level=info msg="Failed to test etcd connection: failed to get etcd status: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:2379: connect: connection refused\""
May 15 23:02:48 na-nonprod-dynenvcp-01c.atl01.xx.com rke2[387567]: time="2025-05-15T23:02:48Z" level=error msg="Kubelet exited: exit status 1"
c
that’s not the kubelet log. look at the kubelet log.
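Since the kubelet's own log is where the real exit reason lands, here is a small sketch for extracting it. It assumes the default RKE2 kubelet log path and the klog structured format that shows up later in this thread (`"command failed" err="..."`); `last_exit_reason` is a hypothetical helper.

```python
import os
import re
from typing import Iterable, Optional

# Assumed default kubelet log location on an RKE2 node (default data-dir).
KUBELET_LOG = "/var/lib/rancher/rke2/agent/logs/kubelet.log"

# The kubelet reports a fatal startup error as a structured klog line like
#   E0515 23:02:48.123456 ... "command failed" err="..."
# The capture group allows backslash-escaped characters inside the quoted value.
COMMAND_FAILED = re.compile(r'"command failed" err="((?:[^"\\]|\\.)*)"')


def last_exit_reason(lines: Iterable[str]) -> Optional[str]:
    """Return the err= text of the most recent "command failed" line, if any."""
    reason = None
    for line in lines:
        match = COMMAND_FAILED.search(line)
        if match:
            reason = match.group(1)
    return reason


if __name__ == "__main__" and os.path.exists(KUBELET_LOG):
    with open(KUBELET_LOG, errors="replace") as f:
        print(last_exit_reason(f) or 'no "command failed" line found')
```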
t
If I make a change to config.yaml, will that get picked up between restarts, or do I need to reinstall?
c
just need to restart
t
Yep, it was my error: I was passing an invalid flag in config.yaml.
c
that’d do it
t
thanks for the assistance
"command failed" err="failed to set feature gates from initial flags-based config: unrecognized feature gate: EphemeralContainers"
would be the culprit
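For this particular failure, a sketch that scans the `kubelet-arg` entries of an already-parsed `config.yaml` for feature gates the kubelet rejected, assuming the gate was set through `kubelet-arg`. The only entry in `REJECTED_GATES` is the one from the error above; the helper names are hypothetical, and YAML loading is left to the caller (for example, PyYAML's `yaml.safe_load`) to keep the block standard-library only.

```python
# Gates the kubelet rejected at startup; the single entry comes from the error above.
# Extend it from the feature-gate documentation for your Kubernetes version.
REJECTED_GATES = {"EphemeralContainers"}


def feature_gates(config: dict) -> dict:
    """Collect gates passed via kubelet-arg in an already-parsed config.yaml."""
    args = config.get("kubelet-arg", [])
    if isinstance(args, str):  # this helper also tolerates a single string
        args = [args]
    gates = {}
    for arg in args:
        key, _, value = arg.lstrip("-").partition("=")
        if key != "feature-gates":
            continue
        # value looks like "GateA=true,GateB=false"
        for pair in value.split(","):
            name, _, enabled = pair.partition("=")
            if name.strip():
                gates[name.strip()] = enabled.strip().lower() == "true"
    return gates


def rejected(config: dict) -> list:
    """Return the configured gates that the kubelet is known to reject."""
    return sorted(set(feature_gates(config)) & REJECTED_GATES)
```

For example, `rejected({"kubelet-arg": ["feature-gates=EphemeralContainers=true"]})` returns `["EphemeralContainers"]`. After removing the offending gate, restarting the service (`systemctl restart rke2-server`) is enough for the change to be picked up.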