# rke2
m
I would clean off the install and try to join with your own custom name. You can add something like --node-name "useful-name"
This error can pop up if it's trying to re-use an existing node name.
And make sure the node is deleted from Rancher before you try to re-join. If it gets stuck on deleting, you can remove the finalizer via "Edit as YAML" (but I would wait a while before going nuclear like that).
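Roughly like this, assuming a stock RKE2 install (paths can differ on RPM installs, and the machine name is whatever shows up under fleet-default):
# on the node: wipe the old install; the uninstall script ships with RKE2
/usr/local/bin/rke2-uninstall.sh
# pick a custom node name before re-joining, e.g. in /etc/rancher/rke2/config.yaml:
#   node-name: useful-name
# if the old Machine object gets stuck deleting, clearing finalizers via kubectl
# is equivalent to doing it in "Edit as YAML":
kubectl -n fleet-default patch machines.cluster.x-k8s.io <machine-name> \
  --type merge -p '{"metadata":{"finalizers":null}}'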
g
Awesome, thanks a lot! Well, going nuclear after 4 days is fine I guess 😄 After removing the redeployed node, when I go to Cluster Manager I still see:
configuring bootstrap node(s) custom-db47e55727a2: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
(db47e55727a2 being the old node). I deleted everything, rebooted, and ran the installer again with a node name. I can see the node in Rancher's Cluster Manager. I set the log level to trace:
time="2023-10-11T09:47:42-04:00" level=debug msg="[K8s] Processing secret custom-6086b7cf5eaf-machine-plan in namespace fleet-default at generation 0 with resource version 3540188"
time="2023-10-11T09:47:47-04:00" level=debug msg="[K8s] Processing secret custom-6086b7cf5eaf-machine-plan in namespace fleet-default at generation 0 with resource version 3540188"
time="2023-10-11T09:47:52-04:00" level=debug msg="[K8s] Processing secret custom-6086b7cf5eaf-machine-plan in namespace fleet-default at generation 0 with resource version 3540188"
time="2023-10-11T09:47:57-04:00" level=debug msg="[K8s] Processing secret custom-6086b7cf5eaf-machine-plan in namespace fleet-default at generation 0 with resource version 3540188"
and when I go to the node with the state "Waiting for Node Ref", the YAML says this:
generation: 2
  labels:
    cattle.io/os: linux
    cluster.x-k8s.io/cluster-name: rke-dev
    cluster.x-k8s.io/control-plane: 'true'
    objectset.rio.cattle.io/hash: d17c6951f999164f526bfd5e49a992c9adc9f837
    rke.cattle.io/cluster-name: rke-dev
    rke.cattle.io/control-plane-role: 'true'
    rke.cattle.io/etcd-role: 'true'
    rke.cattle.io/node-name: rke-dev01_v2.maas
    rke.cattle.io/worker-role: 'true'
  name: custom-6086b7cf5eaf
  namespace: fleet-default
  ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      kind: Cluster
      name: rke-dev
      uid: ac949847-1047-4f4d-b4c3-cdef4f59eb4b
  resourceVersion: '3540233'
  uid: 747f6bc2-d9e2-4a04-aee5-278d9f1279fc
spec:
  bootstrap:
    configRef:
      apiVersion: rke.cattle.io/v1
      kind: RKEBootstrap
      name: custom-6086b7cf5eaf
      namespace: fleet-default
    dataSecretName: custom-6086b7cf5eaf-machine-bootstrap
  clusterName: rke-dev
  infrastructureRef:
    apiVersion: rke.cattle.io/v1
    kind: CustomMachine
    name: custom-6086b7cf5eaf
    namespace: fleet-default
  nodeDeletionTimeout: 10s
so the node name is set as expected, but the state is still stuck at "Waiting for Node Ref". What puzzles me a little is that the generation and resource version differ from the ones in the logs. I'm open to all stupid ideas; I already tried manually setting resourceVersion to match the logs, but Rancher won't let me.
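(Side note: those two fields are also readable from the CLI, if that's easier to watch than the UI, with something like
kubectl -n fleet-default get machines.cluster.x-k8s.io custom-6086b7cf5eaf \
  -o jsonpath='{.metadata.generation} {.metadata.resourceVersion}{"\n"}')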
m
Is there anything in the rancher-system-agent logs or the rke2-server logs on the node itself?
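Both are systemd units on the node, so assuming a stock install something like:
journalctl -u rancher-system-agent -f
journalctl -u rke2-server -f
should tail them. Also, I wouldn't worry about the resourceVersion mismatch; the API server bumps it on every write to the object, so it will never line up with numbers from older log lines.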
r
Can you check the status/health of etcd on the remaining two nodes? I've noticed etcd refuses to do anything without quorum, which is 2 nodes since it originally had 3, so if one of your etcd nodes is in a bad state, you won't be able to do anything until/unless it's fixed. I've also seen cases where etcd wants to elect a leader, but the leader it wants is the one that's still down, so the other two just keep waiting on it and everything is stuck.
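On RKE2 you can run etcdctl from inside the etcd static pod on a surviving server node; the cert paths below are the defaults, so adjust if yours differ:
kubectl -n kube-system exec -it etcd-<node-name> -- etcdctl \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint status --cluster -w table
The table has IS LEADER and ERRORS columns per member, which should show pretty quickly whether quorum or leader election is the problem.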