lively-translator-30710

11/29/2022, 3:40 PM
We are trying to spin up clusters from Rancher on a three-node Harvester cluster, but they never seem to finish starting up - they end up in the endless loop below. Is there anything specific we should be looking for to troubleshoot this? Are there any timeouts we can increase, in case it is just taking too long? Anything else?
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect

salmon-city-57654

11/29/2022, 4:06 PM
Hi @lively-translator-30710, it depends on your VM resources. What resources did you allocate to the VM that Rancher spins up? Also, try to log in to this VM and check whether any errors occur. If your resources are not enough, you will see some OOM-related logs.
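A minimal sketch of that check, assuming an Ubuntu guest VM with journald (standard tools; exact log locations may differ per image):
# Look for OOM-killer activity in the kernel ring buffer and the journal
sudo dmesg -T | grep -iE "out of memory|oom-killer"
sudo journalctl -k | grep -i oom
# Confirm the memory and CPU the node actually sees
free -h && nproc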

lively-translator-30710

11/29/2022, 4:10 PM
Greetings. Thanks for your quick response. For resources, on this last pass I gave it 8 cores and 16 GB, so I believe that should be enough (?). Going back over the logs… haven’t found anything obvious yet.

witty-jelly-95845

11/29/2022, 4:11 PM
Versions?

lively-translator-30710

11/29/2022, 4:17 PM
Rancher 2.6.9 and 2.7.0, Harvester 1.1.0. Tried Ubuntu cloud images for 18.04 and 22.04.

salmon-city-57654

11/29/2022, 4:23 PM
8 cores/16 GB is enough. I will also check in my environment. Did you try the Ubuntu 20.04 image? And how long did you wait?

lively-translator-30710

11/29/2022, 4:44 PM
Haven’t tried 20.04. I can do that next. As to waiting - overnight… I only sent a snippet of the log… 🙂 In the latest attempt, I see that it periodically seems to restart a VM - including an error message:
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] pax-pool1-cb9c56f68-l9fss
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager
[INFO ] waiting for viable init node
[ERROR] Operation cannot be fulfilled on machines.cluster.x-k8s.io "pax-pool1-cb9c56f68-l9fss": the object has been modified; please apply your changes to the latest version and try again
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
Did you install iptables?
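A hedged sketch of how that could be checked or fixed on an Ubuntu cloud image (package name assumed for Ubuntu/Debian; verify for your image):
# Is the iptables binary present on the guest node?
command -v iptables || echo "iptables not found"
# Install it if missing
sudo apt-get update && sudo apt-get install -y iptables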

lively-translator-30710

11/29/2022, 4:53 PM
I am using Rancher to deploy the cluster in Harvester with RKE2. Both Rancher and Harvester were clean installs.

full-plastic-79795

11/29/2022, 4:54 PM

lively-translator-30710

11/29/2022, 5:04 PM
Interesting… Will try that next. On a previous installation of 2.6.3/1.0.3 we didn’t need to make these settings.
I tried adding iptables. I also tried using Cilium. No change - same results with both.

salmon-city-57654

11/30/2022, 10:34 AM
Hi @lively-translator-30710, could you provide the Rancher version and the RKE2 version? Also, providing the rke2-server, rancher-system-agent, and kernel logs from the VM that hosts the guest cluster would be very helpful. Thanks!
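A rough sketch of commands that could collect these from inside the guest cluster VM, assuming systemd/journald and the unit names used by RKE2 provisioning (adjust to what is actually installed):
# Service logs for the RKE2 server and the Rancher system agent
sudo journalctl -u rke2-server --no-pager > rke2-server.log
sudo journalctl -u rancher-system-agent --no-pager > rancher-system-agent.log
# Kernel log, for OOM / disk / network errors
sudo journalctl -k --no-pager > kernel.log
# On images that also write syslog, the same messages land there
sudo grep -E "rke2|rancher-system-agent" /var/log/syslog | tail -n 200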

lively-translator-30710

11/30/2022, 1:44 PM
The current test was on Rancher 2.7.0 (same problem on 2.6.9) and Harvester 1.1.0 - so the RKE2 version that comes native with the release. Harvester is running on bare metal - Dell blades. I’ll work on getting the logs.

salmon-city-57654

12/01/2022, 2:11 AM
Hi @lively-translator-30710, yes, please get the related logs from this guest cluster. We have tested with Rancher 2.7.0 + RKE2 v1.24.7, and it works. Please also check your RKE2 version.

lively-translator-30710

12/02/2022, 3:30 PM
Greetings, to help find the correct logs, which pods do I need to get the logs from?

salmon-city-57654

12/04/2022, 2:29 PM
Hi @lively-translator-30710, for the logs on the guest cluster, you can find the VM on the related Harvester cluster. Then log in to this VM and try to get the rke2-server, rancher-system-agent, and kernel logs. Thanks!
Hi @lively-translator-30710, one more thing you could check: the guest cluster VM and the imported Harvester cluster should be able to reach each other over the network. Could you check that? Thanks!
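A hedged sketch of what that connectivity check might look like from inside the guest VM; the Rancher URL and Harvester address below are placeholders, not values from this thread:
# Can the node reach the Rancher server it registers with? (/ping should return "pong")
curl -vk https://<rancher-server-url>/ping
# Can it reach the Harvester management address / VIP?
ping -c 3 <harvester-vip>
# Does DNS resolve the Rancher hostname from inside the VM?
nslookup <rancher-server-hostname>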

lively-translator-30710

12/05/2022, 7:09 PM
The rke2-server and rancher-system-agent logs seem to be in syslog. This is the most ‘interesting’ error in those logs:
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898741 vs 70898775)"
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="error syncing 'fleet-default/pax-test-1-bootstrap-template-bpxsr-machine-plan': handler secret-watch: secret received was too old, requeuing"
Dec 5 18:53:31 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:31Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898775 vs 70898810)"
This is the other error that shows up:
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
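Those probe errors say the cert files do not exist yet, which usually means rke2-server has not finished generating its TLS material; a rough sketch of what could be checked on the node, assuming default RKE2 paths:
# Is rke2-server actually running, or stuck restarting?
systemctl status rke2-server --no-pager
# Have the certs the probes look for been written yet?
sudo ls -l /var/lib/rancher/rke2/server/tls/kube-scheduler /var/lib/rancher/rke2/server/tls/kube-controller-manager
# Are the control-plane containers up? (crictl and socket paths are RKE2 defaults)
sudo /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps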

salmon-city-57654

12/08/2022, 3:24 PM
Hi @lively-translator-30710, sorry for the late update. I think the above errors are caused by the cluster agent's inability to connect. Could you share the VLAN config of the guest cluster VM and your Harvester network config?
BTW, I tested again with Rancher v2.7/v2.6.9 and RKE2 v1.24.8. It seems to work well.

lively-translator-30710

12/12/2022, 3:25 PM
Greetings… I think it was some form of timing or timeout issue. The blades we were running Harvester on were using HDDs. We replaced them with SSDs on Friday, reloaded Harvester, and can now create clusters from Rancher. Now we just need to do the same on 30-ish more blades…

salmon-city-57654

12/12/2022, 5:05 PM
OK, from your description it looks like a timeout issue. Deploying on SSDs is recommended because of etcd. Thanks for your effort in trying another device for this.
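For anyone wanting to verify a disk is fast enough for etcd before swapping hardware, a rough sketch of the commonly cited fio fsync-latency check (the directory and sizes are assumptions; see the etcd docs for the exact thresholds):
# Run against a directory on the disk you want to test (path is a placeholder)
mkdir -p /data/etcd-disk-test
fio --directory=/data/etcd-disk-test --name=etcd-fsync-test --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Guidance: the 99th percentile fdatasync latency should be well under 10 ms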