# harvester
s
Hi @lively-translator-30710, it depends on your VM resources. What resources did you allocate to the VM that Rancher spins up? Also, try logging in to this VM to check whether any errors occur. If your resources are not enough, you will see some OOM-related logs.
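(For reference, a minimal sketch of checking for OOM-killer activity on the guest VM; this assumes an Ubuntu cloud image with journald and a classic syslog file, which may differ on other images.)
```bash
# Look for OOM-killer activity in the kernel ring buffer and journal.
dmesg -T | grep -i "out of memory"
journalctl -k | grep -i "oom"
# On images that also keep a classic syslog:
grep -i "oom" /var/log/syslog
```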
l
Greetings. Thanks for your quick response. For resources, on this last pass I gave it 8 cores and 16GB, so I believe that should be enough (?). Going back over the logs… haven’t found anything obvious yet.
w
Versions?
l
Rancher 2.6.9 and 2.7.0, Harvester 1.1.0. Tried Ubuntu cloud images for 18.04 and 22.04.
s
8 cores/16GB is enough. I will also check in my environment. Did you try the Ubuntu 20.04 image? And how long did you wait?
l
Haven’t tried 20.04. I can do that next. As to waiting - overnight… I only sent a snippet of the log… 🙂 In the latest attempt, I see that it periodically seems to restart a VM - including an error message:
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] pax-pool1-cb9c56f68-l9fss
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager
[INFO ] waiting for viable init node
[ERROR] Operation cannot be fulfilled on machines.cluster.x-k8s.io "pax-pool1-cb9c56f68-l9fss": the object has been modified; please apply your changes to the latest version and try again
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
Did you install iptables?
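(For reference, a minimal sketch of installing iptables on an Ubuntu guest image; assumes apt is available. It could equally be baked into the machine pool's cloud-init user data.)
```bash
# Assumes an Ubuntu cloud image where iptables is not preinstalled.
sudo apt-get update
sudo apt-get install -y iptables
```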
l
I -am- using Rancher to deploy the cluster in Harvester with RKE2. Both Rancher and Harvester were clean installs.
Interesting… On a previous installation of 2.6.3/1.0.3 we didn’t need to make these settings… Will try that next.
I tried adding iptables. I also tried using Cilium. No change - same results with both.
s
Hi @lively-translator-30710, could you provide the Rancher version and the RKE2 version? Also, providing the `rke2-server`, `rancher-system-agent`, and kernel logs on the VM which handles the guest cluster would be more helpful. Thanks!
l
Current test was on Rancher 2.7.0 (same problem on 2.6.9) and Harvester 1.1.0 - so the RKE2 version that comes native with the release. Harvester is running on bare metal - Dell blades. I’ll work on getting the logs.
s
Hi @lively-translator-30710, yes, please get the related logs from this guest cluster. We have tested with Rancher 2.7.0 + RKE2 1.24.7, and it works. Please also check your RKE2 version.
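(For reference, a quick way to confirm the RKE2 version actually running on a guest-cluster node; a minimal sketch, run either on the node itself or from anywhere with kubeconfig access.)
```bash
rke2 --version              # on the node itself
kubectl get nodes -o wide   # the VERSION column shows the kubelet/RKE2 release
```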
l
Greetings, to help find the correct logs, which pods do I need to get the logs from?
s
Hi @lively-translator-30710, for the logs on the guest cluster, you can find a VM on the related Harvester cluster. Then, log in to this VM and try to get the `rke2-server`, `rancher-system-agent`, and kernel logs. Thanks!
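(For reference, a minimal sketch of collecting those logs on the guest-cluster VM; this assumes the node runs the `rke2-server` and `rancher-system-agent` systemd units, the usual layout for Rancher-provisioned RKE2 nodes.)
```bash
# Collect service and kernel logs from the guest-cluster node.
journalctl -u rke2-server --no-pager > rke2-server.log
journalctl -u rancher-system-agent --no-pager > rancher-system-agent.log
dmesg -T > kernel.log
# RKE2 also keeps additional logs under /var/lib/rancher/rke2/agent/logs/
```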
Hi @lively-translator-30710, one more thing you could check: the VM that runs the guest cluster and the imported Harvester cluster should be able to reach each other. Could you check that? Thanks!
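(For reference, a rough connectivity sanity check that could be run from the guest-cluster VM; RANCHER_URL and HARVESTER_VIP are placeholders, not values from this thread, and `nc` may need to be installed on the image.)
```bash
# Verify the guest-cluster node can reach Rancher and the Harvester cluster.
curl -kv https://RANCHER_URL/ping   # Rancher answers "pong" when reachable
ping -c 3 HARVESTER_VIP             # basic reachability of the Harvester VIP
nc -zv HARVESTER_VIP 443            # TCP check against the Harvester UI/API port
```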
l
The rke2 and rancher-system-agent logs seem to be in syslog. This is the most ‘interesting’ error in those logs:
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898741 vs 70898775)"
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="error syncing 'fleet-default/pax-test-1-bootstrap-template-bpxsr-machine-plan': handler secret-watch: secret received was too old, requeuing"
Dec 5 18:53:31 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:31Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898775 vs 70898810)"
This is the other error that shows up:
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
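(For reference, a quick way to check on the node whether those control-plane TLS files have been generated yet; the paths are taken from the error messages above.)
```bash
# If rke2-server hasn't finished bringing up the control plane,
# these probe certificates won't exist yet.
ls -l /var/lib/rancher/rke2/server/tls/kube-scheduler/
ls -l /var/lib/rancher/rke2/server/tls/kube-controller-manager/
systemctl status rke2-server --no-pager
```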
s
Hi @lively-translator-30710, sorry for the late update. I suspect the above error is caused by the cluster agent being unable to connect. Could you share the VLAN config of the guest cluster VM and your Harvester network config?
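(For reference, Harvester exposes its VM networks as Multus NetworkAttachmentDefinitions, so a sketch like the following, run against the Harvester cluster's kubeconfig, can show the network config being asked about; treat it as a starting point rather than an exact recipe.)
```bash
# List the VM networks (VLAN definitions) on the Harvester cluster.
kubectl get network-attachment-definitions -A -o yaml
```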
BTW, I tested again with Rancher v2.7 and v2.6.9 with RKE2 v1.24.8. It seems to work well.
l
Greetings… I think it is some form of timing or timeout issue. The blades we were running Harvester on were using HDDs. We replaced them with SSDs on Friday, reloaded Harvester, and can now create clusters from Rancher. Now we just need to do the same on 30-ish more blades…
s
OK, from your description it looks like a timeout issue. Deploying on SSDs is recommended because of etcd. Thanks for your effort in trying another device.
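(For reference, a minimal sketch of the kind of disk-latency check often used to judge whether storage is fast enough for etcd; it assumes `fio` is installed on the node and the target directory is only an example.)
```bash
# etcd is sensitive to fdatasync latency; HDDs typically miss the
# commonly cited target of a 99th-percentile fsync under ~10 ms.
fio --name=etcd-disk-check --rw=write --bs=2300 --size=22m \
    --ioengine=sync --fdatasync=1 --directory=/var/lib/rancher
# Inspect the fsync/fdatasync latency percentiles in the output.
```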