
crooked-scooter-58172

02/07/2023, 10:45 PM
We are using a 3-node cluster for our development. The setup is working fine; however, one specific node keeps getting into a "*Cordoned*" state with the message "*Kubelet stopped posting node status*". I've tried a lot but haven't been able to nail down the actual issue with that specific node. Can someone please give me some pointers for troubleshooting? Is there any tool I can use for the troubleshooting?

bright-fireman-42144

02/07/2023, 11:05 PM
no idea what I'm talking about but: kubectl describe node <node-name>? might give you more info
check the dashboard for sure for any resource constraints... mine were mainly around disk pressure but I assume CPU and mem could be the issue as well.
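(For reference, a minimal sketch of checking the conditions and resource pressure mentioned above; <node-name> is a placeholder, and the last command assumes metrics-server is available:)
# Node conditions (Ready, MemoryPressure, DiskPressure, PIDPressure)
kubectl describe node <node-name> | grep -A 12 "Conditions:"
# Events recorded against the node, which often include the cordon/NotReady reason
kubectl get events -A --field-selector involvedObject.kind=Node,involvedObject.name=<node-name>
# Current CPU/memory usage per node (requires metrics-server)
kubectl top node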

crooked-scooter-58172

02/07/2023, 11:35 PM
Thanks @bright-fireman-42144: I tried both options but didn't find any specific reason
image.png
All resources are looking good
It seems that the pod "apply-system-agent-upgrader" is failing with a "no route to host" message. Any idea what this pod does and how to stop this auto upgrade?

bright-fireman-42144

02/08/2023, 12:36 AM
again, no clue. What ns? I'll check mine.
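(A quick way to answer the namespace question and inspect the failing pod; <namespace> and <pod-name> are placeholders to fill in from the first command's output — on Rancher-managed clusters this upgrader usually runs in cattle-system, but confirm it:)
# Find the upgrader pod and the namespace it runs in
kubectl get pods -A | grep apply-system-agent-upgrader
# Then pull its logs and events for the "no route to host" error
kubectl -n <namespace> logs <pod-name>
kubectl -n <namespace> describe pod <pod-name>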

great-bear-19718

02/08/2023, 12:41 AM
what is the version of harvester?

crooked-scooter-58172

02/08/2023, 12:41 AM
1.1.0

great-bear-19718

02/08/2023, 12:41 AM
what is the spec of the nodes?

crooked-scooter-58172

02/08/2023, 12:45 AM
image.png
image.png

great-bear-19718

02/08/2023, 1:00 AM
on the node where you see the failed status for kubelet
are you able to check the logs for rke2-agent
journalctl -fu rke2-agent
and also
journalctl -fu rke2-server
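(On an RKE2 node the kubelet and containerd run under the rke2 service, so their logs are files rather than separate systemd units — a minimal sketch assuming the default RKE2 paths used by Harvester:)
# Kubelet log on an RKE2 node
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log
# Containerd log, useful if the kubelet itself looks healthy
tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log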

crooked-scooter-58172

02/08/2023, 1:03 AM
I am not able to ssh into the failed node. However, when I run the journalctl commands on another master node in the cluster, I am getting these logs:
journalctl -fu rke2-server
-- Logs begin at Wed 2022-12-21 08:06:17 UTC. --
Feb 08 00:24:30 iaas-node-001 rke2[3271]: time="2023-02-08T00:24:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"rke2-coredns\", UID:\"31bcd068-9c8f-4440-904d-adb5b6bf5d88\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"317\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-rke2-coredns"
Feb 08 00:24:30 iaas-node-001 rke2[3271]: time="2023-02-08T00:24:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"rke2-multus\", UID:\"e6c9ddf0-8ee9-42b2-8145-fe28bc72e166\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"383\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-rke2-multus"

great-bear-19718

02/08/2023, 1:04 AM
i would need to check what is going on in the failed node
any specific reason you can't ssh into it?

crooked-scooter-58172

02/08/2023, 1:04 AM
Looks like it lost connectivity as I am not even able to ping it anymore

great-bear-19718

02/08/2023, 1:05 AM
that would explain the kubelet error in the cluster
that is likely the reason for the error

crooked-scooter-58172

02/08/2023, 1:06 AM
Actually the issue is that we have had this 3-node cluster for almost 3-4 months, and we are facing this issue only with this node. It works for a few weeks and then suddenly loses connectivity.
Our network team analyzed everything and didn't find any issue. They said it could be a Harvester-specific issue.

great-bear-19718

02/08/2023, 1:06 AM
that is hard to say without looking at the logs from the failed node
this should collect a lot of OS-specific info:
supportconfig -k -c
and once the node is up.. please also generate a Harvester support bundle
it is hard to pinpoint anything without the logs
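(A minimal sketch for collecting those artifacts once the node is reachable again; it assumes supportconfig's default output location under /var/log and a reachable workstation to copy to — the Harvester support bundle itself is typically generated from the Support page in the Harvester UI:)
# Locate the most recent supportconfig archive (name/location may vary by version)
ls -lt /var/log/scc_*.txz | head -n 1
# Copy it off the node for sharing (replace the user/host with your own)
scp /var/log/scc_*.txz user@workstation:/tmp/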

crooked-scooter-58172

02/08/2023, 1:08 AM
Yes....