# harvester
p
can you provide a support bundle?
r
p
seems Rancher is waiting for the kubelet on the first node:
```yaml
- lastUpdateTime: "2024-06-16T21:43:45Z"
    message: 'configuring bootstrap node(s) custom-9cb22ccf7984: waiting for kubelet
      to update'
    reason: Waiting
    status: Unknown
    type: Provisioned
```
The node is harvester1, but the kubelet seems to be up and working well there.
r
yes that I noticed
and I guess the issue is that the Helm deployments "are waiting" for the other nodes to update (as those complain about a too-low RKE2 version)
p
if you don't have any workload on harvester1, maybe try restarting rke2-server on it and see if it helps?
We'll need to check the probes. @bland-farmer-13503 Do you know what Rancher is probing?
r
Restarting did not help
👌 1
p
Thanks, we'll need time to check. Please don't delete anything 🙏
I'm posting the issue to our internal channel; we might hear some feedback during US working hours.
r
any updates?
p
Hi, the Rancher team will check. They requested some logs, which I can only share in my time zone. From my observation, Rancher is waiting for the kubelet version to become v1.27.13, but the machine object still shows 1.27.10.
```yaml
- lastUpdateTime: "2024-06-16T21:43:45Z"
    message: 'configuring bootstrap node(s) custom-9cb22ccf7984: waiting for kubelet
      to update'
    reason: Waiting
    status: Unknown
    type: Provisioned
```

```yaml
# machine custom-9cb22ccf7984, which is node harvester1
  nodeInfo:
    architecture: amd64
    bootID: 45075668-4501-4b4d-bd39-6f133b567b02
    containerRuntimeVersion: containerd://1.7.11-k3s2
    kernelVersion: 5.14.21-150400.24.108-default
    kubeProxyVersion: v1.27.10+rke2r1
    kubeletVersion: v1.27.10+rke2r1  <-----

# rkeConfig in cluster object
spec:
  kubernetesVersion: v1.27.13+rke2r1 <----
  localClusterAuthEndpoint: {}
```
The weird part is that node harvester1 is already upgraded:
```
$ kubectl get nodes
NAME         STATUS   ROLES                       AGE   VERSION
harvester1   Ready    control-plane,etcd,master   49d   v1.27.13+rke2r1     <- correct version
harvester2   Ready    control-plane,etcd,master   49d   v1.27.10+rke2r1
harvester3   Ready    <none>                      48d   v1.27.10+rke2r1
harvester4   Ready    <none>                      48d   v1.27.10+rke2r1
harvester5   Ready    control-plane,etcd,master   49d   v1.27.10+rke2r1
harvester6   Ready    <none>                      48d   v1.27.10+rke2r1
harvester7   Ready    <none>                      48d   v1.27.10+rke2r1
```
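[Editor's note] The mismatch above can be summarized in a small sketch: the provisioning check compares `spec.kubernetesVersion` against what each Machine reports in `status.nodeInfo`, not against the live Node object. The comparison below is a hypothetical simplification (field values taken from the pasted YAML; the function is not Rancher's actual code):

```python
# Hypothetical sketch: why the cluster can stay in "waiting for kubelet to
# update" even though `kubectl get nodes` already shows the new version.
desired = "v1.27.13+rke2r1"  # spec.kubernetesVersion from the cluster object

# Kubelet version each Machine *reports* in status.nodeInfo:
machine_node_info = {
    "harvester1": "v1.27.10+rke2r1",  # stale: never synced from the Node
    "harvester2": "v1.27.10+rke2r1",
}

# Kubelet version the live Node objects show:
node_status = {
    "harvester1": "v1.27.13+rke2r1",  # already upgraded
    "harvester2": "v1.27.10+rke2r1",
}

def waiting_for_kubelet(reported: dict, desired: str) -> list:
    """Names whose reported kubelet version lags the desired version."""
    return [name for name, ver in reported.items() if ver != desired]

# A check based on Machine nodeInfo still lists harvester1 as waiting,
# while one based on the live Node status does not:
print(waiting_for_kubelet(machine_node_info, desired))
print(waiting_for_kubelet(node_status, desired))
```

This is consistent with the symptom: the node is done, but the Machine object never learned about it.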
r
any updates on this? This is causing quite a lot of headache for us now...
b
Hi @rhythmic-article-81903, could you add cattle-provisioning-capi-system to the support-bundle-namespaces setting and generate a support bundle again? The machine's node info is different from the node status, so I suspect there may be an error in the capi-controller-manager deployment. Thank you. https://github.com/kubernetes-sigs/cluster-api/blob/00dbf7b9f6322d7ebd06ae2efa703b[…]d37d/internal/controllers/machine/machine_controller_noderef.go
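[Editor's note] The linked controller is the part of Cluster API that copies a Node's `status.nodeInfo` onto its Machine object. A hypothetical, heavily simplified sketch of that sync (names and shapes are illustrative, not the actual controller code):

```python
# Hypothetical sketch of the CAPI node-ref sync that appears stuck here:
# the machine controller periodically copies the live Node's nodeInfo onto
# the Machine. If capi-controller-manager is erroring, the Machine keeps a
# stale kubeletVersion even after the node itself has upgraded.
def reconcile_node_info(machine: dict, node: dict) -> dict:
    """Copy the Node's status.nodeInfo onto the Machine (simplified)."""
    updated = dict(machine)
    updated["nodeInfo"] = dict(node["status"]["nodeInfo"])
    return updated

machine = {
    "name": "custom-9cb22ccf7984",
    "nodeInfo": {"kubeletVersion": "v1.27.10+rke2r1"},  # stale copy
}
node = {
    "metadata": {"name": "harvester1"},
    "status": {"nodeInfo": {"kubeletVersion": "v1.27.13+rke2r1"}},
}

# After one healthy reconcile, the Machine reflects the upgraded node:
machine = reconcile_node_info(machine, node)
print(machine["nodeInfo"]["kubeletVersion"])  # v1.27.13+rke2r1
```

Restarting the controller (by deleting its pod) forces a fresh reconcile, which matches what unblocked the upgrade below.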
👍 1
r
It is now probably progressing. I deleted the capi-controller-manager-* pod; I will report back once there is an indication in the GUI.
b
Yeah, there were some error messages in it. Hope it works.
r
now it is progressing
🙌 1
I was able to finish upgrade. Thank you for the help!
👍 1
🎉 1
p
@rhythmic-article-81903 Sorry to bother you again; we are still checking the root cause of this issue. Was the cluster a new v1.3.0 installation, i.e. not upgraded from any previous v1.2.x version? Thanks.
r
it was a new 1.3
🙏 1