# k3s
a
This is a 3 node k3s cluster with an external RDS db
When I run `kubectl get deployments -n kube-system` I see:
```
local-path-provisioner   0/1
metrics-server           0/1
coredns                  0/1
```
f
Looks like your `kube-dns` service may already be listening on `10.43.0.10`
There's also something about a bad/dead Bearer token; it seems like it may go away later, possibly linked to the already-provisioned service / IP.
That's probably the best shot I have. I'd suggest doing a reboot / fresh check of your pods to make sure things are starting correctly.
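As a minimal sketch of that kind of fresh check, assuming the stock k3s deployment names from the output above (the `k8s-app=kube-dns` label is the usual CoreDNS selector and is an assumption about this install):
```sh
# Quick health pass over kube-system before reaching for a node reboot.
kubectl get pods -n kube-system -o wide
kubectl describe pod -n kube-system -l k8s-app=kube-dns

# A rollout restart is a lighter touch than rebooting the node.
kubectl rollout restart deployment/coredns -n kube-system
kubectl rollout restart deployment/metrics-server -n kube-system
```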
a
Yea, actually I just saw that by doing `kubectl get service --all-namespaces` and the kube-dns service is running
f
Might be an old pod you saw before (in the initial message)
a
running and listening on that IP
f
What about a quick `kubectl get po -A`? That should show everything up by now, I'd assume.
a
That shows the fleet-controller and metrics-server in a CrashLoopBackOff. Then it shows these as Running:
```
svclb-traefik
gitjob
rancher-webhook
fleet-agent
```
It shows Rancher as Running, but with 0/1 ready
f
Check the `fleet-controller` pod logs if you can.
Might need to start poking each pod on that node with a reboot and make sure it comes back up. Ideally it should self-heal, though, if you still have a proper cluster state (assuming 3+ nodes).
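For reference, a rough sketch of pulling those logs; the `cattle-fleet-system` namespace is an assumption (it's where Rancher's Fleet components usually land), so locate the pod first:
```sh
# Find the fleet pods and their actual namespace rather than assuming it.
kubectl get pods -A | grep fleet

# Pull the logs; --previous shows the container that just crashed,
# which is what you want for a CrashLoopBackOff.
kubectl logs -n cattle-fleet-system <fleet-controller-pod> --tail=100
kubectl logs -n cattle-fleet-system <fleet-controller-pod> --previous
```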
a
Weird, the `fleet-controller` log shows:
```
Error: Unauthorized
Usage:
  fleet-manager [flags]
...
time="<timestamp>" level=fatal msg=Unauthorized
```
I thought the usage statement was an error with my command at first lol
f
Weird. If that is stdout from the service, maybe it's being called wrong by the other containers, and then possibly that one node is broken? At this point I'd take it out of the cluster and re-provision it into the cluster again. I don't poke around too much at the system level; I just make sure I can re-provision quickly.
a
We keep our manager nodes in an autoscale group in AWS, and it looks like all 3 were replaced within a few hours of each other overnight a few days ago. I have no idea what could have happened; this is in our air-gapped network. So there are 3 nodes right now that all show as healthy. I did remove one and add another about 45 minutes ago and it joined fine, but it's showing the same error as the one before. All 3 show the same messages from journalctl.
Would it be easier to troubleshoot if I brought the cluster down to just 1 node?
I was hoping adding the new node would clear things up but it didn't. We run the k3s install.sh script from our user-data script
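For context, a server user-data for that kind of setup looks roughly like the sketch below; the endpoint, credentials, token, and URL are all placeholders (an air-gapped install would pull install.sh from an internal mirror rather than get.k3s.io). One thing worth checking: with an external datastore, every server has to join with the same `--token`, since k3s uses it to protect the bootstrap data stored in the DB.
```sh
#!/bin/bash
# Hypothetical user-data sketch -- endpoint, credentials, and token are placeholders.
# In an air-gapped network, a local copy of install.sh / mirror settings would
# replace the public URL below.
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="postgres://<user>:<pass>@<rds-endpoint>:5432/<dbname>" \
  --token="<shared-cluster-token>" \
  --tls-san="<api-load-balancer-dns>"
```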
I just deleted the node that those pods were running on, it tried starting them on a new node but it's showing the same errors
I'm tempted to delete all of my nodes, restore my DB from backups, and bring up a new node. But I'd prefer to get this sorted the right way
The log from the fleet-agent pod says "Failed to register agent: Unauthorized"
I brought the cluster down to just 1 node and then rebooted that node so all pods should start fresh, same errors though
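Given that "Failed to register agent: Unauthorized", one place worth looking is the credentials the agent registers with, since restarting pods won't refresh a bad token. A rough sketch, with the namespaces as assumptions (on the cluster that runs Rancher itself the agent usually sits in `cattle-fleet-local-system`):
```sh
# Locate the agent and its namespace rather than assuming it.
kubectl get pods -A | grep fleet-agent

# Recent agent logs around the registration attempt.
kubectl logs -n cattle-fleet-local-system <fleet-agent-pod> --tail=50

# The agent authenticates with secrets in its namespace; if those predate the
# node replacement they may simply no longer be accepted.
kubectl get secrets -n cattle-fleet-local-system
kubectl get secrets -n cattle-fleet-system
```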