# rke2
g
Hey folks, I'm running a custom node driver with Vultr to build an RKE2 cluster on v2.10-head.
My cluster is waiting indefinitely on:
[INFO ] configuring bootstrap node(s) vultr-pool1-v56vp-w52vz: waiting for cluster agent to connect
[INFO ] non-ready bootstrap machine(s) vultr-pool1-v56vp-w52vz and join url to be available on bootstrap node
I have a two-node cluster: one control-plane+etcd node and one worker. I can't get the worker to register. I looked around GitHub for similar issues, such as #48783, but no luck. Can anyone point me in any direction?
NAME                      STATUS   ROLES                       AGE    VERSION          INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
vultr-pool1-v56vp-w52vz   Ready    control-plane,etcd,master   126m   v1.31.9+rke2r1   140.82.12.145   <none>        Ubuntu 22.04.5 LTS   5.15.0-139-generic   containerd://2.0.5-k3s1
k get all -A
NAMESPACE             NAME                                                        READY   STATUS      RESTARTS      AGE
cattle-fleet-system   pod/fleet-agent-0                                           2/2     Running     0             55m
cattle-system         pod/cattle-cluster-agent-9fb5964f8-px5zq                    1/1     Running     0             54m
cattle-system         pod/rancher-webhook-5c85d7877c-gbdwc                        1/1     Running     0             93m
cattle-system         pod/system-upgrade-controller-5fb67f585d-jnqrr              1/1     Running     0             52m
kube-system           pod/cilium-4znj7                                            1/1     Running     0             125m
kube-system           pod/cilium-operator-6c44d5bd85-t54b9                        0/1     Pending     0             125m
kube-system           pod/cilium-operator-6c44d5bd85-xlnht                        1/1     Running     1 (91m ago)   125m
kube-system           pod/etcd-vultr-pool1-v56vp-w52vz                            1/1     Running     0             84m
kube-system           pod/helm-install-rke2-cilium-hvng8                          0/1     Completed   0             125m
kube-system           pod/helm-install-rke2-coredns-plvt2                         0/1     Completed   0             125m
kube-system           pod/helm-install-rke2-ingress-nginx-xnm72                   0/1     Completed   0             125m
kube-system           pod/helm-install-rke2-metrics-server-bw9km                  0/1     Completed   0             125m
kube-system           pod/helm-install-rke2-runtimeclasses-8ppxf                  0/1     Completed   0             125m
kube-system           pod/helm-install-rke2-snapshot-controller-9j7qv             0/1     Completed   1             125m
kube-system           pod/helm-install-rke2-snapshot-controller-crd-lt92z         0/1     Completed   0             125m
kube-system           pod/kube-apiserver-vultr-pool1-v56vp-w52vz                  1/1     Running     1             125m
kube-system           pod/kube-controller-manager-vultr-pool1-v56vp-w52vz         1/1     Running     1             125m
kube-system           pod/kube-proxy-vultr-pool1-v56vp-w52vz                      1/1     Running     0             84m
kube-system           pod/kube-scheduler-vultr-pool1-v56vp-w52vz                  1/1     Running     1             125m
kube-system           pod/rke2-coredns-rke2-coredns-autoscaler-7b6fd8764b-2sm87   1/1     Running     5 (97m ago)   125m
kube-system           pod/rke2-coredns-rke2-coredns-c7b96fd45-7d9w8               1/1     Running     0             125m
kube-system           pod/rke2-ingress-nginx-controller-7qshn                     1/1     Running     0             52m
kube-system           pod/rke2-metrics-server-5754f9f5c7-2c5lm                    1/1     Running     0             54m
kube-system           pod/rke2-snapshot-controller-58dbcfd956-9m97s               1/1     Running     0             54m
kube-system           pod/vultr-ccm-cgvvf                                         1/1     Running     0             54m

NAMESPACE             NAME                                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
cattle-fleet-system   service/fleet-agent                               ClusterIP   None            <none>        <none>           93m
cattle-system         service/cattle-cluster-agent                      ClusterIP   10.43.232.50    <none>        80/TCP,443/TCP   125m
cattle-system         service/rancher-webhook                           ClusterIP   10.43.44.235    <none>        443/TCP          93m
default               service/kubernetes                                ClusterIP   10.43.0.1       <none>        443/TCP          125m
kube-system           service/rke2-coredns-rke2-coredns                 ClusterIP   10.43.0.10      <none>        53/UDP,53/TCP    125m
kube-system           service/rke2-ingress-nginx-controller-admission   ClusterIP   10.43.141.140   <none>        443/TCP          52m
kube-system           service/rke2-metrics-server                       ClusterIP   10.43.240.229   <none>        443/TCP          54m

NAMESPACE     NAME                                           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
kube-system   daemonset.apps/cilium                          1         1         1       1            1           kubernetes.io/os=linux   125m
kube-system   daemonset.apps/rke2-ingress-nginx-controller   1         1         1       1            1           kubernetes.io/os=linux   52m
kube-system   daemonset.apps/vultr-ccm                       1         1         1       1            1           <none>                   108m

NAMESPACE       NAME                                                   READY   UP-TO-DATE   AVAILABLE   AGE
cattle-system   deployment.apps/cattle-cluster-agent                   1/1     1            1           125m
cattle-system   deployment.apps/rancher-webhook                        1/1     1            1           93m
cattle-system   deployment.apps/system-upgrade-controller              1/1     1            1           52m
kube-system     deployment.apps/cilium-operator                        1/2     2            1           125m
kube-system     deployment.apps/rke2-coredns-rke2-coredns              1/1     1            1           125m
kube-system     deployment.apps/rke2-coredns-rke2-coredns-autoscaler   1/1     1            1           125m
kube-system     deployment.apps/rke2-metrics-server                    1/1     1            1           54m
kube-system     deployment.apps/rke2-snapshot-controller               1/1     1            1           54m

NAMESPACE       NAME                                                              DESIRED   CURRENT   READY   AGE
cattle-system   replicaset.apps/cattle-cluster-agent-769997fdcf                   0         0         0       57m
cattle-system   replicaset.apps/cattle-cluster-agent-85c5f74f4f                   0         0         0       93m
cattle-system   replicaset.apps/cattle-cluster-agent-8bf8f95c5                    0         0         0       125m
cattle-system   replicaset.apps/cattle-cluster-agent-9fb5964f8                    1         1         1       54m
cattle-system   replicaset.apps/rancher-webhook-5c85d7877c                        1         1         1       93m
cattle-system   replicaset.apps/system-upgrade-controller-5fb67f585d              1         1         1       52m
kube-system     replicaset.apps/cilium-operator-6c44d5bd85                        2         2         1       125m
kube-system     replicaset.apps/rke2-coredns-rke2-coredns-autoscaler-7b6fd8764b   1         1         1       125m
kube-system     replicaset.apps/rke2-coredns-rke2-coredns-c7b96fd45               1         1         1       125m
kube-system     replicaset.apps/rke2-metrics-server-5754f9f5c7                    1         1         1       54m
kube-system     replicaset.apps/rke2-snapshot-controller-58dbcfd956               1         1         1       54m

NAMESPACE             NAME                           READY   AGE
cattle-fleet-system   statefulset.apps/fleet-agent   1/1     92m

NAMESPACE     NAME                                                  STATUS     COMPLETIONS   DURATION   AGE
kube-system   job.batch/helm-install-rke2-cilium                    Complete   1/1           38s        125m
kube-system   job.batch/helm-install-rke2-coredns                   Complete   1/1           37s        125m
kube-system   job.batch/helm-install-rke2-ingress-nginx             Complete   1/1           73m        125m
kube-system   job.batch/helm-install-rke2-metrics-server            Complete   1/1           71m        125m
kube-system   job.batch/helm-install-rke2-runtimeclasses            Complete   1/1           71m        125m
kube-system   job.batch/helm-install-rke2-snapshot-controller       Complete   1/1           71m        125m
kube-system   job.batch/helm-install-rke2-snapshot-controller-crd   Complete   1/1           71m        125m
c
That says the bootstrap node isn't ready yet.
Check the rancher-system-agent logs on that node and make sure it's all reporting OK. Check that a Machine exists for this node on the Rancher cluster. Check that cattle-cluster-agent is checking in with Rancher.
Look at the capi-controller-manager logs on the Rancher cluster to see what it says.
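A minimal set of commands for those checks (a sketch; the namespaces and pod labels below are the usual Rancher defaults, so adjust to your setup):

# On the node that won't register:
journalctl -u rancher-system-agent.service -f

# Against the Rancher (local) cluster:
kubectl get machines -n fleet-default
kubectl logs -n cattle-provisioning-capi-system deploy/capi-controller-manager --tail=100

# Against the downstream cluster:
kubectl get pods -n cattle-system -l app=cattle-cluster-agent
kubectl logs -n cattle-system -l app=cattle-cluster-agent --tail=100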
g
Hey Brandon, thanks for your response. I had time to circle back to this.
k get machines -A
NAMESPACE       NAME                      CLUSTER   NODENAME                  PROVIDERID                       PHASE          AGE     VERSION
fleet-default   test-pool1-skvmw-2n4kq    test                                                                 Provisioning   2m32s   
fleet-default   test-pool2-42ms5-r4wnm    test                                                                 Provisioning   2m32s
I did see that the machines are not actually provisioned. The machine-plan secret for the worker node actually isn't populated:
k get secrets -A | grep pool2

fleet-default                              test-pool2-42ms5-r4wnm-machine-bootstrap                   rke.cattle.io/bootstrap               1      4m5s
fleet-default                              test-pool2-42ms5-r4wnm-machine-bootstrap-token-2dlj9       kubernetes.io/service-account-token   3      4m6s
fleet-default                              test-pool2-42ms5-r4wnm-machine-driver-secret               Opaque                                1      3m56s
fleet-default                              test-pool2-42ms5-r4wnm-machine-plan                        rke.cattle.io/machine-plan            0      4m6s
fleet-default                              test-pool2-42ms5-r4wnm-machine-plan-token-p8fxz            kubernetes.io/service-account-token   3      2m48s
fleet-default                              test-pool2-42ms5-r4wnm-machine-state                       rke.cattle.io/machine-state           1      3m56s
fleet-default                              vultr-pool2-4ncj6-t26m5-machine-bootstrap                  rke.cattle.io/bootstrap               1      4d3h
fleet-default                              vultr-pool2-4ncj6-t26m5-machine-bootstrap-token-wqm65      kubernetes.io/service-account-token   3      4d3h
fleet-default                              vultr-pool2-4ncj6-t26m5-machine-plan                       rke.cattle.io/machine-plan            13     4d3h
fleet-default                              vultr-pool2-4ncj6-t26m5-machine-plan-token-tg6b7           kubernetes.io/service-account-token   3      4d3h
fleet-default                              vultr-pool2-4ncj6-t26m5-machine-state                      rke.cattle.io/machine-state           1      4d3h
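One way to confirm the plan secret really is empty (secret name copied from the listing above) would be:

kubectl get secret test-pool2-42ms5-r4wnm-machine-plan -n fleet-default -o jsonpath='{.data}'
# No output here means the planner hasn't written a plan for this machine yet.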
root@test-pool2-42ms5-r4wnm:~# journalctl -u rancher-system-agent.service 

Jun 23 18:18:49 test-pool2-42ms5-r4wnm systemd[1]: Started Rancher System Agent.
Jun 23 18:18:49 test-pool2-42ms5-r4wnm rancher-system-agent[1435]: time="2025-06-23T18:18:49Z" level=info msg="Rancher System Agent version v0.3.11 (b8c28d0) is starting"
Jun 23 18:18:49 test-pool2-42ms5-r4wnm rancher-system-agent[1435]: time="2025-06-23T18:18:49Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jun 23 18:18:49 test-pool2-42ms5-r4wnm rancher-system-agent[1435]: time="2025-06-23T18:18:49Z" level=info msg="Starting remote watch of plans"
Jun 23 18:18:50 test-pool2-42ms5-r4wnm rancher-system-agent[1435]: time="2025-06-23T18:18:50Z" level=info msg="Starting /v1, Kind=Secret controller"
Hmm, the capi-controller-manager is waiting for the infrastructure...
k logs pod/capi-controller-manager-bd4c76b4c-nbjpk -n cattle-provisioning-capi-system

I0623 18:28:11.846181       1 machine_controller_phases.go:306] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-skvmw-2n4kq" namespace="fleet-default" name="test-pool1-skvmw-2n4kq" reconcileID="6fef1bf4-e2f9-425f-bac2-26f628acc678" MachineSet="fleet-default/test-pool1-skvmw" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" VultrMachine="fleet-default/test-pool1-skvmw-2n4kq"
I0623 18:28:11.846207       1 machine_controller_noderef.go:60] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-skvmw-2n4kq" namespace="fleet-default" name="test-pool1-skvmw-2n4kq" reconcileID="6fef1bf4-e2f9-425f-bac2-26f628acc678" MachineSet="fleet-default/test-pool1-skvmw" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" VultrMachine="fleet-default/test-pool1-skvmw-2n4kq"
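To cross-check what the infrastructure side is reporting, something like this should work (a sketch; resource names match the listings above):

kubectl get machines -n fleet-default
kubectl get vultrmachine -n fleet-default -o custom-columns=NAME:.metadata.name,READY:.status.ready,PROVIDERID:.spec.providerID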
If I look at the vultrmachine config for pool2, it looks to be populated accordingly:
kubectl get vultrmachine  vultr-pool2-4ncj6-t26m5 -o yaml  -n fleet-default
apiVersion: rke-machine.cattle.io/v1
kind: VultrMachine
metadata:
  annotations:
    cluster.x-k8s.io/cloned-from-groupkind: VultrMachineTemplate.rke-machine.cattle.io
    cluster.x-k8s.io/cloned-from-name: vultr-pool2-a206879f
  creationTimestamp: "2025-06-19T14:41:52Z"
  finalizers:
  - wrangler.cattle.io/machine-provision-remove
  generation: 2
  labels:
    cattle.io/os: linux
    cluster.x-k8s.io/cluster-name: vultr
    cluster.x-k8s.io/deployment-name: vultr-pool2
    cluster.x-k8s.io/set-name: vultr-pool2-4ncj6
    machine-template-hash: 722474703-4ncj6
    rke.cattle.io/capi-machine-name: vultr-pool2-4ncj6-t26m5
    rke.cattle.io/cluster-name: vultr
    rke.cattle.io/rke-machine-pool-name: pool2
    rke.cattle.io/worker-role: "true"
  name: vultr-pool2-4ncj6-t26m5
  namespace: fleet-default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: vultr-pool2-4ncj6-t26m5
    uid: 641f44f7-28d2-4faf-af5b-7837a59e0ceb
  resourceVersion: "490184690"
  uid: 500ccf3c-ab2a-4669-ac44-b996049e53b4
spec:
  apiKey: ""
  appId: "0"
  cloudInitUserData: ""
  common:
    cloudCredentialSecretName: cattle-global-data:cc-nkhlb
  ddosProtection: false
  enableVpc: false
  enabledIpv6: false
  firewallGroupId: ""
  floatingIpv4Id: ""
  imageId: ""
  ipxeChainUrl: ""
  isoId: ""
  osId: "1743"
  providerID: rke2://vultr-pool2-4ncj6-t26m5
  region: ewr
  sendActivationEmail: false
  snapshotId: ""
  sshKeyIds: []
  startupScriptId: ""
  tags: []
  vpcIds: []
  vpsBackups: false
  vpsPlan: voc-c-2c-4gb-75s-amd
status:
  addresses:
  - address: 144.202.3.168
    type: InternalIP
  - address: vultr-pool2-4ncj6-t26m5
    type: Hostname
  cloudCredentialSecretName: cattle-global-data:cc-nkhlb
  conditions:
  - message: ""
    reason: ""
    status: "True"
    type: CreateJob
  - message: ""
    reason: ""
    status: "True"
    type: Ready
  driverHash: 5b37756e1f55ddfff19b9d4fe3531003fe91f6d75c56c3bb6a9854bd1ea36b5f
  driverUrl: https://rancher.vultr.dev/assets/docker-machine-driver-vultr
  jobName: vultr-pool2-4ncj6-t26m5-machine-provision
  ready: true
c
Is providerID getting set on the node in the downstream cluster?
That is the responsibility of the cloud controller manager on the downstream cluster.
It looks like you're just using the built-in rke2 cloud provider though, since I see rke2://vultr-pool2-4ncj6-t26m5.
Does that match the actual cloud provider ID used by Vultr? I suspect not.
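A quick way to check, against the downstream cluster (a sketch):

kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER-ID:.spec.providerID
# With the Vultr CCM initializing nodes you'd expect something like vultr://<instance-id>;
# rke2://<node-name> means the built-in rke2 stub cloud provider claimed the node.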
g
I was deploying the CCM after provisioning, which was my mistake. I wasn't aware that the default CCM would take over if one isn't provided at creation. All good now! Thanks for your help, Brandon!
k get no -A
NAME                          STATUS   ROLES                  AGE     VERSION
test-controller-7cgmf-v494j   Ready    control-plane,master   3m51s   v1.31.9+rke2r1
test-etcd-72tnk-tklhn         Ready    etcd                   3m46s   v1.31.9+rke2r1
test-worker-dnrkw-fm894       Ready    worker                 75s     v1.31.9+rke2r1
test-worker-dnrkw-gmlg2       Ready    worker                 75s     v1.31.9+rke2r1
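For anyone hitting the same thing later, a rough sketch of the provisioning-side setting commonly paired with an external CCM that handles node initialization. Field names follow Rancher's provisioning.cattle.io/v1 cluster spec and RKE2's cloud-provider-name option; the cluster name and namespace are taken from the output above, the Vultr CCM itself still has to be deployed at creation time as described above, and whether this alone is sufficient depends on your version, so treat it as a starting point rather than a drop-in fix:

apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: test
  namespace: fleet-default
spec:
  rkeConfig:
    machineSelectorConfig:
      # No machineLabelSelector, so this config applies to all machines.
      - config:
          # Expect an external cloud controller manager (here: Vultr's) to
          # initialize nodes, instead of the built-in rke2 cloud provider.
          cloud-provider-name: external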