# k3s
m
Restarting the k3s service didn't help
b
Sounds like the token expired, so it doesn't trust the node anymore.
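(For reference, one way to sanity-check the token theory is to compare the cluster join token on a healthy server with whatever the problem node was configured with. A minimal sketch, assuming default k3s install paths; the config file may not exist if no config file was used:)

```bash
# On a healthy server: the cluster join token lives here by default
sudo cat /var/lib/rancher/k3s/server/node-token

# On the problem node: the token/server it was started with is usually in the
# systemd env file created by the install script, or in the optional config file
sudo cat /etc/systemd/system/k3s.service.env
sudo cat /etc/rancher/k3s/config.yaml   # may not exist
```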
c
That doesn't sound like the actual terminal error that is preventing it from starting. Please post the full log.
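(A quick way to pull the full service log on a systemd host, using standard systemctl/journalctl commands, nothing k3s-specific:)

```bash
# Check whether the service is actually running or crash-looping
sudo systemctl status k3s

# Dump the full k3s log since the last boot, without a pager
sudo journalctl -u k3s -b --no-pager

# Or follow it live while reproducing the problem
sudo journalctl -u k3s -f
```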
m
@creamy-pencil-82913 Having a look at the logs, I guess the issue is somewhere here:
```
Apr 25 15:19:00 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:00Z" level=info msg="Applying CRD addons.k3s.cattle.io"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:00Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.843673   13037 kubelet_node_status.go:112] "Node was previously registered" node="k3s-node-1"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.843889   13037 kubelet_node_status.go:76] "Successfully registered node" node="k3s-node-1"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.875058   13037 kuberuntime_manager.go:1529] "Updating runtime config through cri with podcidr" CIDR="10.42.0.0/24"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.893932   13037 kubelet_network.go:61] "Updating Pod CIDR" originalPodCIDR="" newPodCIDR="10.42.0.0/24"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.894375   13037 memory_manager.go:170] "Starting memorymanager" policy="None"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.894404   13037 state_mem.go:35] "Initializing new in-memory state store"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.894619   13037 state_mem.go:75] "Updated machine memory state"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.923971   13037 setters.go:568] "Node became not ready" node="k3s-node-1" condition={"type":"Ready","status":"False","lastHeartbeatTime":"2024-04-25T15:19:00Z","lastTransitionTime":"2024-04-25T15:19:00Z","reason":"KubeletNotReady","message":"container runtime status check may not have completed yet"}
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.955484   13037 manager.go:479] "Failed to read data from checkpoint" checkpoint="kubelet_internal_checkpoint" err="checkpoint is not found"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.955685   13037 plugin_manager.go:118] "Starting Kubelet Plugin Manager"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.964115   13037 pod_container_deletor.go:80] "Container not found in pod's containers" containerID="daed2709eead47ee33558a1ffc90dc2949f2af9c9e376cd415b5c9e80ded35f0"
Apr 25 15:19:00 k3s-node-1 k3s[13037]: I0425 15:19:00.989103   13037 serving.go:387] Generated self-signed cert in-memory
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.023900   13037 serving.go:387] Generated self-signed cert in-memory
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.182932   13037 serving.go:387] Generated self-signed cert in-memory
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.205819   13037 kubelet_node_status.go:497] "Fast updating node status as it just became ready"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.245950   13037 apiserver.go:52] "Watching apiserver"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:01Z" level=info msg="Applying CRD etcdsnapshotfiles.k3s.cattle.io"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.306314   13037 kube.go:144] Node controller sync successful
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.306477   13037 vxlan.go:141] VXLAN config: VNI=1 Port=0 GBP=false Learning=false DirectRouting=false
Apr 25 15:19:01 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:01Z" level=info msg="Stopped tunnel to 127.0.0.1:6443"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:01Z" level=info msg="Connecting to proxy" url="wss://ip:6443/v1-k3s/connect"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:01Z" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.337874   13037 topology_manager.go:215] "Topology Admit Handler" podUID="542d7059-5858-4a44-bc36-8e645c44f276" podNamespace="kube-system" podName="local-path-provisioner-84db5d44d9-lpx2h"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.337958   13037 topology_manager.go:215] "Topology Admit Handler" podUID="2eb23ab8-30c4-4770-9008-ee9e5ed7fb44" podNamespace="kube-system" podName="metrics-server-67c658944b-s7blc"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: I0425 15:19:01.337997   13037 topology_manager.go:215] "Topology Admit Handler" podUID="46770914-4230-4eb6-a033-b4158a407597" podNamespace="kube-system" podName="coredns-6799fbcd5-7dtdb"
Apr 25 15:19:01 k3s-node-1 k3s[13037]: time="2024-04-25T15:19:01Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
```
c
I don’t see that it’s exited. Unless you cut off the end, those look like fairly normal logs from a running k3s server. What makes you think that it’s not joining the cluster?
m
Because I can’t see it listed as one of the servers; running kubectl get nodes doesn’t show it
c
are you checking that locally, or from another node? Are you using etcd, or an external db?
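(A hedged sketch of how one might answer those two questions on the node itself, assuming the default k3s data and config paths; adjust if --data-dir was changed:)

```bash
# Embedded etcd keeps its data here; if this directory exists and is non-empty,
# the node was started as an etcd member rather than against an external DB
sudo ls /var/lib/rancher/k3s/server/db/etcd

# An external datastore would show up as datastore-endpoint in the config or env
sudo grep -r "datastore-endpoint" /etc/rancher/k3s/ /etc/systemd/system/k3s.service.env 2>/dev/null

# From a working server, k3s labels the etcd members
kubectl get nodes -l node-role.kubernetes.io/etcd=true
```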
m
I get the same output checking locally and from one of the working servers: only two servers are visible when there should be three. I'm using HA with embedded etcd
I ended up destroying the VM and creating a new one. Running the join command on a clean VM worked; the server has joined the rest of the servers
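(To confirm the rebuilt server is really back, a quick check with plain kubectl from any working server:)

```bash
# All three servers should now be listed and Ready, with their roles shown
kubectl get nodes -o wide

# Watch the status while the new node finishes joining
kubectl get nodes -w
```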
c
if the third server didn’t even show up in the cluster, it sounds like you deleted it at some point? If that’s the case you would have had to delete the DB files and point it at one of the other cluster members to rejoin.
If you’d just left it alone, it should have been listed in the cluster as NotReady.
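(A sketch of that recovery path, assuming default k3s paths and embedded etcd; <existing-server-ip> and the token are placeholders, and writing config.yaml this way overwrites any existing config file:)

```bash
# On the broken server: stop k3s and clear the stale embedded-etcd state
sudo systemctl stop k3s
sudo rm -rf /var/lib/rancher/k3s/server/db

# Point it at a surviving cluster member and restart. The token comes from a
# healthy server's /var/lib/rancher/k3s/server/node-token
sudo tee /etc/rancher/k3s/config.yaml <<'EOF'
server: https://<existing-server-ip>:6443
token: <cluster-token>
EOF
sudo systemctl start k3s
```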
m
Ah okay, will keep it in mind in case the issue shows up again, thanks 👍