# harvester

adamant-kite-43734

02/07/2023, 10:45 PM
This message was deleted.

bright-fireman-42144

02/07/2023, 11:05 PM
No idea what I'm talking about, but try kubectl describe node <node-name>; it might give you more info.
Definitely check the dashboard for any resource constraints. Mine were mainly disk pressure, but I assume CPU and memory could be the issue as well.
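A minimal sketch of the same check from the command line, assuming kubectl is configured against the Harvester cluster. It reads node conditions from `kubectl get nodes -o json` and flags any node whose Ready condition is not True, or whose MemoryPressure, DiskPressure, PIDPressure, or NetworkUnavailable condition is True. The function names are hypothetical. Note that CPU is not a kubelet pressure condition, so CPU saturation will not show up here.

```python
"""Summarize unhealthy node conditions from `kubectl get nodes -o json` (sketch)."""
import json
import subprocess

# Conditions where status "True" means something is wrong on the node.
PRESSURE_TYPES = {"MemoryPressure", "DiskPressure", "PIDPressure", "NetworkUnavailable"}


def problem_conditions(nodes: dict) -> dict:
    """Map node name -> list of human-readable problems (empty nodes are omitted)."""
    problems = {}
    for item in nodes.get("items", []):
        name = item["metadata"]["name"]
        found = []
        for cond in item.get("status", {}).get("conditions", []):
            ctype, status = cond.get("type"), cond.get("status")
            if ctype == "Ready" and status != "True":
                # "False": kubelet reports unhealthy; "Unknown": kubelet stopped reporting.
                found.append(f"Ready={status} ({cond.get('reason', '')})")
            elif ctype in PRESSURE_TYPES and status == "True":
                found.append(f"{ctype} ({cond.get('reason', '')})")
        if found:
            problems[name] = found
    return problems


def fetch_nodes() -> dict:
    """Run kubectl and return the parsed node list (hypothetical helper)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, `problem_conditions(fetch_nodes())` returns an empty dict when every node looks healthy; a node that has dropped off the network typically shows up with Ready=Unknown.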

crooked-scooter-58172

02/07/2023, 11:35 PM
Thanks @bright-fireman-42144. I tried both options but didn't get any specific reason.
All resources look fine.
It seems the pod "apply-system-agent-upgrader" is failing with a "no route to host" message. Any idea what this pod does and how to stop this auto-upgrade?
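The pod name prefix comes from the message above, but the namespace isn't stated, so this hedged sketch searches every namespace via `kubectl get pods --all-namespaces -o json` and reports each matching pod's namespace, phase, and any waiting or terminated container reasons. Function names are hypothetical, and the sketch makes no assumption about what the upgrader itself does.

```python
"""Locate pods by name prefix across all namespaces and show why they fail (sketch)."""
import json
import subprocess


def matching_pods(pods: dict, prefix: str) -> list:
    """Return (namespace, name, phase, reasons) for pods whose name starts with prefix."""
    rows = []
    for pod in pods.get("items", []):
        meta, status = pod["metadata"], pod.get("status", {})
        if not meta["name"].startswith(prefix):
            continue
        reasons = []
        containers = status.get("containerStatuses", []) + status.get("initContainerStatuses", [])
        for cs in containers:
            # Keep reasons from non-running states (e.g. CrashLoopBackOff, Error).
            for key in ("waiting", "terminated"):
                detail = cs.get("state", {}).get(key)
                if detail and detail.get("reason"):
                    reasons.append(f"{cs['name']}: {key} ({detail['reason']})")
        rows.append((meta["namespace"], meta["name"], status.get("phase", "Unknown"), reasons))
    return rows


def fetch_all_pods() -> dict:
    """Equivalent to `kubectl get pods --all-namespaces -o json` (hypothetical helper)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

For example, `matching_pods(fetch_all_pods(), "apply-system-agent-upgrader")` lists the candidates; `kubectl logs -n <namespace> <pod-name>` then shows the full "no route to host" error for a given pod.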

bright-fireman-42144

02/08/2023, 12:36 AM
Again, no clue. What namespace? I'll check mine.

great-bear-19718

02/08/2023, 12:41 AM
What version of Harvester are you running?

crooked-scooter-58172

02/08/2023, 12:41 AM
1.1.0

great-bear-19718

02/08/2023, 12:41 AM
What are the specs of the nodes?
On the node where you see the failed kubelet status, are you able to check the rke2-agent logs?
journalctl -fu rke2-agent
and also
journalctl -fu rke2-server
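If the journal is noisy, a small filter can surface only warnings and errors. This sketch assumes rke2 writes logrus-style key=value lines (time="..." level=... msg="..."), as in the log excerpt later in this thread, and reads a bounded window with `journalctl -u <unit> --since ... --no-pager -o cat` instead of following with -f. The helper names and the one-hour default window are illustrative choices.

```python
"""Pull warning/error entries out of an rke2 systemd journal (sketch)."""
import re
import subprocess

# rke2 log lines look like: time="..." level=info msg="..."
LEVEL_RE = re.compile(r"\blevel=(\w+)")
IMPORTANT = {"warn", "warning", "error", "fatal", "panic"}


def important_lines(lines):
    """Yield (level, line) for entries at warning level or above."""
    for line in lines:
        m = LEVEL_RE.search(line)
        if m and m.group(1) in IMPORTANT:
            yield m.group(1), line.rstrip("\n")


def read_journal(unit: str, since: str = "1 hour ago") -> list:
    """Non-following variant of `journalctl -fu <unit>`, limited to a time window."""
    out = subprocess.run(
        ["journalctl", "-u", unit, "--since", since, "--no-pager", "-o", "cat"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()
```

For example, `list(important_lines(read_journal("rke2-agent")))` on the affected node, or the same with "rke2-server" on a master node.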

crooked-scooter-58172

02/08/2023, 1:03 AM
I am not able to SSH into the failed node. However, when I run the journalctl commands on another master node in the cluster, I get these logs:
journalctl -fu rke2-server
-- Logs begin at Wed 2022-12-21 08:06:17 UTC. --
Feb 08 00:24:30 iaas-node-001 rke2[3271]: time="2023-02-08T00:24:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"rke2-coredns\", UID:\"31bcd068-9c8f-4440-904d-adb5b6bf5d88\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"317\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-rke2-coredns"
Feb 08 00:24:30 iaas-node-001 rke2[3271]: time="2023-02-08T00:24:30Z" level=info msg="Event(v1.ObjectReference{Kind:\"HelmChart\", Namespace:\"kube-system\", Name:\"rke2-multus\", UID:\"e6c9ddf0-8ee9-42b2-8145-fe28bc72e166\", APIVersion:\"helm.cattle.io/v1\", ResourceVersion:\"383\", FieldPath:\"\"}): type: 'Normal' reason: 'ApplyJob' Applying HelmChart using Job kube-system/helm-install-rke2-multus"

great-bear-19718

02/08/2023, 1:04 AM
I would need to check what is going on in the failed node.
Any specific reason you can't SSH into it?

crooked-scooter-58172

02/08/2023, 1:04 AM
Looks like it lost connectivity; I can't even ping it anymore.
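Since ping fails, a TCP probe from another node can show whether the host is down entirely or only some services are. A sketch, assuming the usual listeners on an RKE2 server (master) node: 22 for SSH, 6443 for the Kubernetes API, and 9345 for the RKE2 supervisor. It distinguishes a refused connection (host up, port closed) from a timeout or an explicit "no route to host" (EHOSTUNREACH), the same error the upgrader pod reported. Names and the port list are illustrative assumptions.

```python
"""Probe a node's TCP ports and classify failures (sketch)."""
import errno
import socket

# Assumed ports for an RKE2-based server node; adjust for your setup.
DEFAULT_PORTS = (22, 6443, 9345)


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', 'timeout', 'no route to host', or the OS error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host answered with a reset: it is up, but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # Packets silently dropped: host down, firewall, or broken path.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Host unreachable, e.g. no ARP reply on the local subnet.
            return "no route to host"
        return exc.strerror or str(exc)


def probe_node(host: str, ports=DEFAULT_PORTS, timeout: float = 3.0) -> dict:
    """Probe each port and return {port: result}."""
    return {port: probe(host, port, timeout) for port in ports}
```

For example, `probe_node("<node-ip>")` returning "no route to host" for every port points at layer 2/3 connectivity rather than a single crashed service.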

great-bear-19718

02/08/2023, 1:05 AM
That would explain the kubelet error in the cluster; it is likely the reason for the error.

crooked-scooter-58172

02/08/2023, 1:06 AM
Actually, the issue is that we've had this 3-node cluster for about 3-4 months, and we keep hitting this same issue. It works for a few weeks and then suddenly loses connectivity.
Our network team analyzed everything and didn't find any issue. They said it could be a Harvester-specific issue.
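Because the outage recurs every few weeks, it may help to record exactly when the node drops off so those times can be matched against the support bundle and supportconfig output. This sketch assumes you already collect timestamped reachability samples on a schedule (for example, a ping or TCP check run from another node every minute; the scheduling is not shown) and turns them into outage windows. Function names are hypothetical.

```python
"""Turn periodic reachability samples into outage windows (sketch)."""
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

Window = Tuple[datetime, Optional[datetime]]


def outage_windows(samples: Iterable[Tuple[datetime, bool]]) -> List[Window]:
    """samples: (timestamp, reachable) pairs in time order.

    Each window is (first failed sample, first successful sample after it),
    with end=None if the node was still unreachable at the last sample."""
    windows: List[Window] = []
    start: Optional[datetime] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts  # first failed probe opens a window
        elif ok and start is not None:
            windows.append((start, ts))  # first successful probe closes it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def describe(windows: List[Window]) -> List[str]:
    """Render windows as text with durations, for correlating with node logs."""
    lines = []
    for start, end in windows:
        if end is None:
            lines.append(f"{start.isoformat()} -> ongoing")
        else:
            lines.append(f"{start.isoformat()} -> {end.isoformat()} ({end - start})")
    return lines
```

The duration is measured up to the first successful probe, so it is only as precise as the sampling interval.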

great-bear-19718

02/08/2023, 1:06 AM
That's hard to say without looking at the logs from the failed node.
This should collect a lot of OS-specific info:
supportconfig -k -c
And once the node is up, please also generate a Harvester support bundle.
It's hard to pinpoint anything without the logs.

crooked-scooter-58172

02/08/2023, 1:08 AM
Yes....