gorgeous-minister-309

09/13/2022, 8:04 AM
Hello. I'm trying to deploy a cluster in a vSphere environment, without success. I've tried RKE1, RKE2, and even k3s, but nothing works. I'll describe one of my attempts, using RKE2. I'm using an Ubuntu 20.04 template that includes vmtoolsd. The virtual machine is created in vSphere, boots, and gets an IP address correctly. Then the installation script (`/usr/local/custom_script/install.sh`) is executed and loops in the `retrieve_connection_info` function. The call to `/v3/connect/agent` fails with a 401 Unauthorized error. The Rancher server is accessible from the virtual machine. The script is called without any parameters but includes some references to my Rancher instance: `CATTLE_AGENT_BINARY_BASE_URL`, `CATTLE_SERVER`, `CATTLE_TOKEN`. I don't understand why this is failing. Any idea?
Versions: Rancher 2.6.8, vSphere 7.0.3

shy-actor-78724

09/13/2022, 9:22 AM
Is the VM multi-homed, so is it perhaps using the wrong interface to connect to Rancher? Do you see any relevant logs from Rancher pods?

gorgeous-minister-309

09/13/2022, 9:37 AM
I have only one network interface on the virtual machine. The Rancher instance is reachable (HTTPS) from the VM. I can't find any useful logs on the Rancher side. Just:
2022/09/13 08:49:48 [INFO] [planner] rkecluster fleet-default/rke2: waiting: waiting for viable init node
2022/09/13 08:50:51 [INFO] [planner] rkecluster fleet-default/rke2: waiting: configuring bootstrap node(s) rke2-pool1-5fb5f65fbf-mtwbm: waiting for bootstrap etcd to be available
2022/09/13 08:50:51 [ERROR] [planner] rkecluster fleet-default/rke2: error encountered during plan processing was Operation cannot be fulfilled on machines.cluster.x-k8s.io "rke2-pool1-5fb5f65fbf-mtwbm": the object has been modified; please apply your changes to the latest version and try again
2022/09/13 08:50:51 [INFO] [planner] rkecluster fleet-default/rke2: waiting: configuring bootstrap node(s) rke2-pool1-5fb5f65fbf-mtwbm: waiting for agent to check in and apply initial plan
And from the node side:
error 401 received while downloading Rancher connection information. Sleeping for 5 seconds and trying again
The command used at this step is:
curl --connect-timeout 60 --max-time 60 --write-out "%{http_code}\n" -sS -H "Authorization: Bearer 47DEQp{REDACTED}uFU=" -H "X-Cattle-Id: 2c47306bda0e6f8{REDACTED}d4eaf217eb" -H "X-Cattle-Role-Etcd: false" -H "X-Cattle-Role-Control-Plane: false" -H "X-Cattle-Role-Worker: false" -H "X-Cattle-Node-Name: " -H "X-Cattle-Address: " -H "X-Cattle-Internal-Address: " -H "X-Cattle-Labels: " -H "X-Cattle-Taints: " https://rancher.REDACTED/v3/connect/agent -o /var/lib/rancher/agent/rancher2_connection_info.json
If I use a k3s cluster deployment (tech preview), this step succeeds and the node is initialized. After that, I have trouble configuring a CPI/CSI. I'd like to stick to an RKE1 or RKE2 cluster, though.
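For readers following along, the retry behaviour described above (request the connection info, sleep 5 seconds on a non-200 status, try again) can be sketched roughly as below. This is not the code of the actual install script: `poll_connection_info`, the `fetch` callable, and the `attempts` limit are hypothetical names introduced only for illustration, and `fetch` stands in for whatever performs the HTTP call and returns the status code and body.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical type: one request attempt returning (http_status, body).
# In practice this would wrap the HTTP call to /v3/connect/agent.
Fetch = Callable[[], Tuple[int, str]]


def poll_connection_info(
    fetch: Fetch,
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch until it returns HTTP 200 or the attempts run out.

    Returns the response body on success, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        status, body = fetch()
        if status == 200:
            return body
        print(f"attempt {attempt}: error {status} received, retrying in {delay}s")
        if attempt < attempts:
            sleep(delay)
    return None
```

Bounding the number of attempts (an assumption here, the real script appears to loop indefinitely) makes it easier to capture the failing status and stop instead of looping on the same 401.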

agreeable-oil-87482

09/13/2022, 11:41 AM
Can you grab the logs from the `rancher-system-agent` service, please?

gorgeous-minister-309

09/13/2022, 11:54 AM
Do you mean on the deployed node? If so, I don't have a rancher-system-agent service configured yet.
The binary was downloaded, but I didn't reach the service installation step.

agreeable-oil-87482

09/13/2022, 11:59 AM
And you're using the vSphere node driver?
👍 1

gorgeous-minister-309

09/13/2022, 12:04 PM
I don't understand, because the step it fails on succeeds when I select the k3s Kubernetes flavor instead of RKE2. No 401 Unauthorized…

agreeable-oil-87482

09/13/2022, 12:06 PM
Can you enable debug logging in Rancher and try again, please?

gorgeous-minister-309

09/13/2022, 12:30 PM
OK. I don't know what to obfuscate before publishing the whole result, though. Right now, the node loops on the 401 error and the Rancher log keeps repeating:
2022/09/13 12:29:36 [DEBUG] [CAPI] Reconcile MachineSet
2022/09/13 12:29:36 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2022/09/13 12:29:36 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2022/09/13 12:29:36 [DEBUG] [CAPI] Unable to retrieve Node status, missing NodeRef
2022/09/13 12:29:36 [DEBUG] [CAPI] Some nodes are not ready yet, requeuing until they are ready
2022/09/13 12:29:38 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:29:38 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:29:38 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:29:40 [DEBUG] ObjectsAreEqualResults for machine-xwqg8: statusEqual: true conditionsEqual: false specEqual: true nodeNameEqual: true labelsEqual: true annotationsEqual: true requestsEqual: true limitsEqual: true rolesEqual: true
2022/09/13 12:29:40 [DEBUG] ObjectsAreEqualResults for machine-xwqg8: statusEqual: true conditionsEqual: false specEqual: true nodeNameEqual: true labelsEqual: true annotationsEqual: true requestsEqual: true limitsEqual: true rolesEqual: true
2022/09/13 12:29:40 [DEBUG] Updating machine for node [local-node]
2022/09/13 12:29:40 [DEBUG] Updated machine for node [local-node]
2022/09/13 12:29:40 [DEBUG] DesiredSet - No change(2) provisioning.cattle.io/v1, Kind=Cluster fleet-local/local for provisioning-cluster-create local
2022/09/13 12:29:40 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-local/local-kubeconfig for cluster-create fleet-local/local
2022/09/13 12:29:40 [DEBUG] DesiredSet - No change(2) management.cattle.io/v3, Kind=ClusterRoleTemplateBinding local/local-fleet-local-owner for cluster-create fleet-local/local
2022/09/13 12:29:40 [DEBUG] DesiredSet - No change(2) fleet.cattle.io/v1alpha1, Kind=Cluster fleet-local/local for fleet-cluster fleet-local/local
2022/09/13 12:29:43 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:29:43 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:29:43 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/cl1-bootstrap-template-x52q7-machine-bootstrap for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/cl1-bootstrap-template-x52q7-machine-bootstrap for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=Role fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:45 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=RoleBinding fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:29:48 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2022/09/13 12:29:48 [DEBUG] [CAPI] Infrastructure provider is not ready, requeuing
2022/09/13 12:29:48 [DEBUG] [CAPI] Cannot reconcile Machine's Node, no valid ProviderID yet
2022/09/13 12:29:48 [DEBUG] Searching for providerID for selector rke.cattle.io/machine=c3492044-4855-44ed-8d9b-c8d4c5185a8f in cluster fleet-default/cl1, machine cl1-pool1-769dbbd958-w5tb2: Get "https://10.43.10.155/k8s/clusters/c-m-stx6kqzc/api/v1/nodes?labelSelector=rke.cattle.io%!F(MISSING)machine%!D(MISSING)c3492044-4855-44ed-8d9b-c8d4c5185a8f": dial tcp 10.43.10.155:443: connect: no route to host
2022/09/13 12:29:48 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:29:48 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:29:48 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:29:51 [DEBUG] Extras returned map[principalid:[local://user-lrxjb] username:[admin]]
2022/09/13 12:29:51 [DEBUG] Triggering auth refresh on user-lrxjb
2022/09/13 12:29:51 [DEBUG] Skipping refresh for user-lrxjb due to max-age
2022/09/13 12:29:51 [DEBUG] [CAPI] Reconcile MachineSet
2022/09/13 12:29:51 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2022/09/13 12:29:51 [DEBUG] [CAPI] Cannot retrieve CRD with metadata only client, falling back to slower listing
2022/09/13 12:29:51 [DEBUG] [CAPI] Unable to retrieve Node status, missing NodeRef
2022/09/13 12:29:51 [DEBUG] [CAPI] Some nodes are not ready yet, requeuing until they are ready
2022/09/13 12:29:53 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:29:53 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:29:53 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:29:59 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:29:59 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:29:59 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/cl1-bootstrap-template-x52q7-machine-bootstrap for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) /v1, Kind=ServiceAccount fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/cl1-bootstrap-template-x52q7-machine-bootstrap for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) /v1, Kind=Secret fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=Role fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:03 [DEBUG] DesiredSet - No change(2) rbac.authorization.k8s.io/v1, Kind=RoleBinding fleet-default/cl1-bootstrap-template-x52q7-machine-plan for rke-machine fleet-default/cl1-bootstrap-template-x52q7
2022/09/13 12:30:04 [DEBUG] [rke2configserver] parsed 8698dcff893f96d39ab97dde871968ae56b13a687e1e2680cfd16c5184fd5c6 as machineID
2022/09/13 12:30:04 [DEBUG] [rke2configserver] Got / machine from provisioning SA
2022/09/13 12:30:04 [DEBUG] [rke2configserver] Got / machine from cluster token
2022/09/13 12:30:06 [DEBUG] [CAPI] Reconcile MachineSet
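Most of the debug output above is routine reconcile noise. A small filter along these lines could help isolate the lines that report a failed connection. It is only a sketch: the two regular expressions are assumptions based on the line layout visible in this log (date, time, bracketed level, message) and on the Go-style `dial tcp host:port: reason` error text, and `dial_failures` is a hypothetical helper name.

```python
import re

# Assumed layout, taken from the lines above: "YYYY/MM/DD HH:MM:SS [LEVEL] message".
LINE_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")
# Go-style dial error, e.g. "dial tcp 10.43.10.155:443: connect: no route to host".
DIAL_RE = re.compile(r"dial tcp (\S+?):(\d+): (.+)$")


def dial_failures(lines):
    """Yield (timestamp, host, port, reason) for each log line with a dial error."""
    for line in lines:
        parsed = LINE_RE.match(line.strip())
        if not parsed:
            continue  # skip lines that do not follow the assumed layout
        dial = DIAL_RE.search(parsed.group(3))
        if dial:
            yield parsed.group(1), dial.group(1), int(dial.group(2)), dial.group(3)
```

Feeding it the lines of a saved Rancher pod log (for example by iterating over an open file) would list each unreachable host and port with its timestamp.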
I'm back on the topic. On a new Rancher instance, the RKE2 deployment starts correctly this time. The cluster (one node) is up but still marked as "Updating" on the cluster management page. I had the same issue with a "custom" cluster. The node stays in a "Waiting for Node Ref" state.
I have this error in the Rancher logs:
2022/09/16 13:24:11 [ERROR] [CAPI] Reconciler error: error creating client and cache for remote cluster: error creating dynamic rest mapper for remote cluster "fleet-default/cdr4": Get "https://10.43.150.240/k8s/clusters/c-m-shq6vqt7/api?timeout=10s": dial tcp 10.43.150.240:443: connect: no route to host
I don't understand it. Is Rancher trying to reach the cluster on an internal private IP address?
Hello @agreeable-oil-87482. I'd much appreciate your insight on this if you have some time.
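One way to reason about the address in that error: k3s, RKE2 and RKE1 all default to `10.43.0.0/16` as the Kubernetes service CIDR, so an address like `10.43.150.240` may be a ClusterIP inside the cluster that hosts Rancher, not a node address. The sketch below only checks that membership. It assumes the default CIDR was not changed at install time, and `classify` and `ASSUMED_SERVICE_CIDR` are names made up for this example.

```python
import ipaddress

# Assumption: the cluster hosting Rancher uses the default service CIDR of
# k3s/RKE2/RKE1. Adjust this value if the CIDR was customized at install time.
ASSUMED_SERVICE_CIDR = ipaddress.ip_network("10.43.0.0/16")


def classify(address: str) -> str:
    """Say whether an IP falls in the assumed service CIDR, is private, or is public."""
    ip = ipaddress.ip_address(address)
    if ip in ASSUMED_SERVICE_CIDR:
        return "inside assumed service CIDR"
    if ip.is_private:
        return "private, outside assumed service CIDR"
    return "public"


# The two addresses that appear in the "no route to host" errors in this thread.
for addr in ("10.43.150.240", "10.43.10.155"):
    print(addr, "->", classify(addr))
```

If both addresses fall inside the service CIDR, that would be consistent with Rancher calling one of its own in-cluster Services, which would point the investigation at networking inside the Rancher cluster instead of at the vSphere node. This is a hint, not a diagnosis.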