# general
Hi all, I am trying to deploy an RKE2 downstream cluster from the Rancher UI. I would like to use the vSphere CSI, so when creating the cluster I selected vSphere as the cloud provider. I am using the details below.

• Rancher Helm installation 2.7.6-stable

As a first step, I tried to deploy a basic RKE2 cluster with 1 master and 1 worker node, using the vSphere cloud provider, Canal as my CNI, and version 1.26.8+rke2r1. For the VM template I used a basic Ubuntu 22.04 image and installed all the needed dependencies based on the following guide: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/launch-kubernetes-with-rancher/use-new-nodes-in-an-infra-provider/vsphere/create-a-vm-template

Ufw was disabled for testing purposes.

Whatever I try, the cluster is not created; instead it is stuck in the state "[Waiting] configuring bootstrap node(s) custom-3c632038a0e5: waiting for cluster agent to connect". If I leave the cloud provider at its default, the downstream cluster is created. As far as I know, changing the cloud provider afterward is not supported, because the node provider IDs are set when the node joins the cluster and are immutable thereafter.

This is my downstream cluster config YAML: https://github.com/speedkup/rk2/blob/main/downstream_cluster_config.yaml

Can anyone help me figure out what I'm doing wrong? The logs from the nodes and the Rancher pods can be found below.

master node: journalctl -u rancher-system-agent.service -f
Feb 25 12:29:01 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:29:01+01:00" level=info msg="[K8s] updated plan secret fleet-default/custom-3c632038a0e5-machine-plan with feedback"
Feb 25 12:38:37 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:38:37+01:00" level=info msg="[Applyinator] No image provided, creating empty working directory /var/lib/rancher/agent/work/20240225-123837/4d0d847abd604cb5a97d41fceaab4eb9cc956e0c30836d0e402f16dc3de33df8_0"
Feb 25 12:38:37 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:38:37+01:00" level=info msg="[Applyinator] Running command: sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null]"
Feb 25 12:38:37 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:38:37+01:00" level=info msg="[4d0d847abd604cb5a97d41fceaab4eb9cc956e0c30836d0e402f16dc3de33df8_0:stdout]: Name Location Size Created"
Feb 25 12:38:37 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:38:37+01:00" level=info msg="[Applyinator] Command sh [-c rke2 etcd-snapshot list --etcd-s3=false 2>/dev/null] finished with err: <nil> and exit code: 0"
Feb 25 12:38:37 rke2-shared-mn1 rancher-system-agent[1546]: time="2024-02-25T12:38:37+01:00" level=info msg="[K8s] updated plan secret fleet-default/custom-3c632038a0e5-machine-plan with feedback"
master node kubelet.log: tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log
W0225 12:29:39.428359    1836 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: executable file not found in $PATH, output: ""
E0225 12:29:39.428374    1836 plugins.go:736] "Error dynamically probing plugins" err="error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input"
E0225 12:29:39.428933    1836 driver-call.go:262] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0225 12:29:39.428945    1836 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: executable file not found in $PATH, output: ""
E0225 12:29:39.428972    1836 plugins.go:736] "Error dynamically probing plugins" err="error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input"
E0225 12:29:39.429316    1836 driver-call.go:262] Failed to unmarshal output for command: init, output: "", error: unexpected end of JSON input
W0225 12:29:39.429327    1836 driver-call.go:149] FlexVolume: driver call failed: executable: /var/lib/kubelet/volumeplugins/nodeagent~uds/uds, args: [init], error: executable file not found in $PATH, output: ""
E0225 12:29:39.429342    1836 plugins.go:736] "Error dynamically probing plugins" err="error creating Flexvolume plugin from directory nodeagent~uds, skipping. Error: unexpected end of JSON input"
I0225 12:29:47.361637    1836 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/rke2-canal-857gm" podStartSLOduration=-9.223372011493193e+09 pod.CreationTimestamp="2024-02-25 12:29:22 +0100 CET" firstStartedPulling="2024-02-25 12:29:23.422853807 +0100 CET m=+40.683324509" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2024-02-25 12:29:47.34101883 +0100 CET m=+64.601489584" watchObservedRunningTime="2024-02-25 12:29:47.361584014 +0100 CET m=+64.622054743"
I0225 12:29:49.337811    1836 pod_startup_latency_tracker.go:102] "Observed pod startup duration" pod="kube-system/rke2-multus-ds-c7jsw" podStartSLOduration=-9.223372011517014e+09 pod.CreationTimestamp="2024-02-25 12:29:24 +0100 CET" firstStartedPulling="2024-02-25 12:29:24.665866378 +0100 CET m=+41.926337089" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2024-02-25 12:29:49.337188573 +0100 CET m=+66.597659328" watchObservedRunningTime="2024-02-25 12:29:49.337761907 +0100 CET m=+66.598232642"
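Most of the kubelet output above is the same FlexVolume probe message repeated with different timestamps. A minimal sketch for collapsing such repeats so that the distinct messages stand out, assuming the usual klog line layout (severity letter, date, time, PID, source file, message); the function name `collapse_klog` is made up for illustration and is not part of any RKE2 or Rancher tooling:

```python
import re
import sys
from collections import Counter

# klog header: severity letter, MMDD, time, PID, source file:line, then the message.
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ "
    r"(?P<src>[^\s\]]+)\] (?P<msg>.*)$"
)


def collapse_klog(lines):
    """Count identical (severity, source, message) entries, ignoring timestamps."""
    counts = Counter()
    for line in lines:
        match = KLOG_LINE.match(line.rstrip("\n"))
        if match is None:
            continue  # not a klog-formatted line; skipped in this sketch
        counts[(match["sev"], match["src"], match["msg"])] += 1
    # Most frequent entries first.
    return counts.most_common()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 collapse_klog.py <path to kubelet.log>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for (sev, src, msg), count in collapse_klog(handle):
            print(f"{count:6d}  {sev}  {src}  {msg[:120]}")
```

Run against the kubelet.log path shown above, this would print one row per distinct message with its repeat count, which makes it easier to see whether anything other than the FlexVolume probe is being logged.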
worker node: journalctl -u rancher-system-agent.service -f
Feb 25 12:28:06 rke2-mgmt-wn1 systemd[1]: Started Rancher System Agent.
Feb 25 12:28:06 rke2-mgmt-wn1 rancher-system-agent[2489990]: time="2024-02-25T12:28:06+01:00" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Feb 25 12:28:06 rke2-mgmt-wn1 rancher-system-agent[2489990]: time="2024-02-25T12:28:06+01:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Feb 25 12:28:06 rke2-mgmt-wn1 rancher-system-agent[2489990]: time="2024-02-25T12:28:06+01:00" level=info msg="Starting remote watch of plans"
Feb 25 12:28:06 rke2-mgmt-wn1 rancher-system-agent[2489990]: E0225 12:28:06.754590 2489990 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Feb 25 12:28:06 rke2-mgmt-wn1 rancher-system-agent[2489990]: time="2024-02-25T12:28:06+01:00" level=info msg="Starting /v1, Kind=Secret controller"
cattle-provisioning-capi-system pod log:
"Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/
.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-2523a42b9a24" namespace=
go:286]
cattle-system pod log:
2024/02/25 12:15:51 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-jmkm5jnv
2024/02/25 12:17:51 [ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-jmkm5jnv
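Since the provisioning log mentions waiting for spec.providerID, one thing that may be worth checking on the bootstrap node is whether the Node object has a provider ID yet and whether it still carries the node.cloudprovider.kubernetes.io/uninitialized taint, which Kubernetes applies to nodes that are waiting for an external cloud provider to initialize them. Below is a minimal sketch that summarizes this from the JSON produced by `kubectl get nodes -o json`. The function name `summarize_nodes` and the `SAMPLE` data are hypothetical and are not output from the cluster in this thread; whether this is the actual cause here is an assumption, not a confirmed diagnosis.

```python
UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def summarize_nodes(node_list):
    """Return one dict per node with its providerID and cloud-init taint status."""
    rows = []
    for item in node_list.get("items", []):
        spec = item.get("spec", {})
        taints = spec.get("taints") or []
        rows.append({
            "name": item["metadata"]["name"],
            # None means no cloud provider has set the provider ID yet.
            "provider_id": spec.get("providerID"),
            "uninitialized": any(
                taint.get("key") == UNINITIALIZED_TAINT for taint in taints
            ),
        })
    return rows


# Hypothetical sample shaped like `kubectl get nodes -o json` output.
# It is illustrative only and was NOT captured from the cluster above.
SAMPLE = {
    "items": [
        {
            "metadata": {"name": "rke2-shared-mn1"},
            "spec": {
                "taints": [
                    {"key": UNINITIALIZED_TAINT, "value": "true", "effect": "NoSchedule"}
                ]
            },
        }
    ]
}

for row in summarize_nodes(SAMPLE):
    print(row)
```

To use it with real data, the `SAMPLE` dictionary would be replaced by the parsed output of `kubectl get nodes -o json` (for example via `json.load`). A node that reports no provider ID and still has the taint has not been initialized by a cloud provider, and pods without a matching toleration will not be scheduled on it.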