damp-vegetable-48645

07/07/2022, 4:43 PM
I have been unable to find documentation regarding removing/replacing physical hosts within a Harvester cluster. I built out a test cluster with undersized drives and didn't realize I could just add a drive to expand onto, so I took a node out and rebuilt it on the new drive. Even after numerous rebuilds it won't join the cluster (the node has been completely removed from the cluster), and putting the original drive back in isn't yielding any better results. Since I can see node failures/replacements being a normal part of the lifecycle, the ability to roll nodes in/out of a running cluster is imperative, and it's something I'd like to test to get a feel for the level of effort (LOE) involved.
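
For context, a minimal sketch of the usual drain-and-delete flow before rebuilding a host, assuming kubectl access from a surviving node and a hypothetical node name tempharv1:
# Move workloads off the host first (node name is an assumption)
kubectl drain tempharv1 --ignore-daemonsets --delete-emptydir-data
# Remove the node object; the host can then be reinstalled and re-joined
kubectl delete node tempharv1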

great-bear-19718

07/08/2022, 1:46 AM
just to confirm: when you removed the node, you deleted it from Harvester and then re-added it?

damp-vegetable-48645

07/08/2022, 1:47 AM
I removed it from Harvester and then replaced the drive in the machine and reinstalled, joining the existing cluster. After installing (and rebooting), it never joined the cluster, always showing 'NotReady' and never appearing in the host list. It was, originally, the first node that I installed, and it feels as though it became the 'controller' (as I found its IP in the rke2 config on the other nodes).
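
To see which server IP the surviving nodes are pointed at, a rough sketch (the directory is the standard rke2 config location; the exact file names under it may vary):
# On a remaining node, look for the server: entry used when joining the cluster
sudo grep -r "server:" /etc/rancher/rke2/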

great-bear-19718

07/08/2022, 1:48 AM
how many nodes were there in the cluster

damp-vegetable-48645

07/08/2022, 1:48 AM
3

great-bear-19718

07/08/2022, 1:48 AM
any chance to get a support-bundle from the cluster
there might be some info in there that might help us identify what could be causing this

damp-vegetable-48645

07/08/2022, 1:49 AM
Unfortunately, not at this time. I've ended up tearing them all down and rebuilding from scratch to try other configuration scenarios. I can take down the first node and go through the same steps again to see if the issue occurs in the new build as well.

great-bear-19718

07/08/2022, 1:50 AM
👍

damp-vegetable-48645

07/08/2022, 1:51 AM
What are the steps to obtain the support-bundle? I wasn't sure if this was a known issue, which is why I asked, even after tearing down the initial test cluster, for 'future knowledge'. Since it appears not, I would be more than happy to see if I can replicate the issue, or write it off as a one-off thing.

damp-vegetable-48645

07/08/2022, 1:53 AM
I'll repeat the steps either this evening or in the morning (Eastern Time here), and post an update in this thread.
It seems that the problem persists after removing/rebuilding 'Node 1' from a 3-node cluster. Nodes 2 and 3 are still online and 'Healthy', however Node 1 shows as 'NotReady' and, 5-10 minutes post-reboot, is still not registered with the cluster. I created a support-bundle before performing the reload. I'm not sure if I'll be able to access the UI on the rebuilt node, but I can grab another support-bundle from the now 2-node cluster, if that could be helpful in tracing the issue.
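
While the rebuilt host is stuck, a small sketch of what can be checked from a healthy node, assuming kubectl access (fleet-local is the namespace where the machine records show up later in this thread):
# Node readiness as seen by Kubernetes
kubectl get nodes -o wide
# Provisioning phase of each machine record
kubectl get machines -n fleet-local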

great-bear-19718

07/08/2022, 2:52 AM
you should be able to access the UI from the 2-node cluster too.. it should still be running

damp-vegetable-48645

07/08/2022, 2:56 AM
Yes, I did grab a second bundle after the node was removed to compare the difference. One item I did find was that there were 2 custom-*-machine-plan secrets which contain an applied-plan field referencing the old node's IP for the server field. I updated the secrets with an IP from the remaining nodes and rebuilt Node 1 again; unfortunately, it seems a new plan secret was created when the new node tried coming online, though (currently) it's empty.
I also noticed that the internal Longhorn images were still set to replica 3 after removing the node. I changed that to 2 to take the volumes out of the degraded state. Unfortunately, so far, nothing has been able to bring the rebuilt node #1 back into the cluster. Grasping at straws, I'm wondering if there's a configuration item, either on the remaining servers or within the internal K8s construct, that is referencing the original Node #1's IP address and passing that into the node bootstrap config, instead of using an IP from one of the remaining nodes.
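
A hedged sketch of how the plan secrets and Longhorn replica counts mentioned above could be inspected; the secret name below is invented to match the custom-*-machine-plan pattern, and field names may differ by version:
# List the machine plan secrets in the provisioning namespace
kubectl -n fleet-local get secrets | grep machine-plan
# Dump one and look through the (base64-encoded) plan data for the old node's IP
kubectl -n fleet-local get secret custom-4287b915efeb-machine-plan -o yaml
# Longhorn replica counts live on the volume objects in longhorn-system
kubectl -n longhorn-system get volumes.longhorn.io \
  -o custom-columns=NAME:.metadata.name,REPLICAS:.spec.numberOfReplicas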

great-bear-19718

07/08/2022, 3:00 AM
so you are not able to generate a support bundle?

damp-vegetable-48645

07/08/2022, 3:01 AM
I do have 2 of them. Should I upload them here, or is there a different preferred location? I created one prior to removing the node from the cluster, as a baseline. And created another one after removing the node from the cluster.

great-bear-19718

07/08/2022, 3:04 AM
any chance i could have the 2nd bundle?

damp-vegetable-48645

07/08/2022, 3:05 AM
Slack is telling me that my support-bundle files will exceed my workspace limit.

great-bear-19718

07/08/2022, 3:05 AM
ok.. how about creating a GH issue and attaching it there?

damp-vegetable-48645

07/08/2022, 3:05 AM
Will do.

great-bear-19718

07/08/2022, 3:11 AM
👍
what is the status of the kubectl get clusters.cluster -A resource? because tempharv1 is not in the cluster, there is little info about it in the 2nd bundle
(⎈ |default:default)➜  nodes k get machine -n fleet-local
NAME                  CLUSTER   NODENAME    PROVIDERID         PHASE          AGE    VERSION
custom-4287b915efeb   local     tempharv3   rke2://tempharv3   Running        123m
custom-b49899265d4e   local     tempharv2   rke2://tempharv2   Running        103m
custom-e6d3236f2c36   local                                    Provisioning   79m
it is trying to provision the node.. but there is not much info, so we may need to check the rke2 logs on the missing node
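
If the machine stays in Provisioning, a rough sketch of where to look next, assuming SSH access to the stuck node (the machine name is taken from the output above; the service names are the standard rke2/Rancher agent units):
# Cluster side: events and conditions on the stuck machine object
kubectl -n fleet-local describe machine custom-e6d3236f2c36
# Node side: logs of the services that register the host with the cluster
journalctl -u rancher-system-agent --no-pager | tail -n 100
journalctl -u rke2-server -u rke2-agent --no-pager | tail -n 200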

damp-vegetable-48645

07/08/2022, 1:16 PM
kubectl get clusters.cluster -A
NAMESPACE     NAME    PHASE          AGE   VERSION
fleet-local   local   Provisioning   12h
kubectl get machines -n fleet-local
NAME                  CLUSTER   NODENAME    PROVIDERID         PHASE          AGE   VERSION
custom-4287b915efeb   local     tempharv3   rke2://tempharv3   Running        11h
custom-8154486041ba   local                                    Provisioning   10h
custom-b49899265d4e   local     tempharv2   rke2://tempharv2   Running        11h
Obtained an SSH session to the 'new' node, looking for the rke2 logs and will share those once located.
/var/lib/rancher/rke2/agent/logs/kubelet.log
/var/lib/rancher/rke2/agent/containerd/containerd.log
ping registry-1.docker.io
PING registry-1.docker.io (44.207.51.64) 56(84) bytes of data
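
A short sketch of pulling the agent logs listed above and sanity-checking registry reachability from the node; the paths come from the thread, while the curl check is an assumed way to verify HTTPS connectivity:
# Tail the kubelet and containerd logs for join/registration errors
tail -n 100 /var/lib/rancher/rke2/agent/logs/kubelet.log
tail -n 100 /var/lib/rancher/rke2/agent/containerd/containerd.log
# HTTPS check against the registry; a 401 response still proves connectivity
curl -sI https://registry-1.docker.io/v2/ | head -n 1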