lively-translator-30710

11/29/2022, 3:40 PM
We are trying to spin up clusters from Rancher on a three-node Harvester cluster, but they never seem to finish starting up - they end up in the endless loop below. Is there anything specific we should be looking for to troubleshoot this? Are there any timeouts we can increase, in case it is just taking too long? Anything else?
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for cluster agent to connect

salmon-city-57654

11/29/2022, 4:06 PM
Hi @lively-translator-30710, it depends on your VM resources. What resources did you allocate to the VM that Rancher spins up? Also, try to log in to this VM and check whether any errors occur. If your resources are not enough, you will see some OOM-related logs.
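A minimal sketch of that check, assuming an Ubuntu guest VM with journald (standard tools; exact log locations may differ per image):
# Look for OOM-killer activity in the kernel ring buffer and the journal
sudo dmesg -T | grep -iE "out of memory|oom-killer"
sudo journalctl -k | grep -i oom
# Confirm the memory and CPU the node actually sees
free -h && nproc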

lively-translator-30710

11/29/2022, 4:10 PM
Greetings. Thanks for your quick response. For resources, on this last pass I gave it 8 cores and 16 GB, so I believe that should be enough (?). Going back over the logs… haven’t found anything obvious yet.

witty-jelly-95845

11/29/2022, 4:11 PM
Versions?

lively-translator-30710

11/29/2022, 4:17 PM
Rancher 2.6.9 and 2.7.0, Harvester 1.1.0. Tried Ubuntu cloud images for 18.04 and 22.04.

salmon-city-57654

11/29/2022, 4:23 PM
8 cores/16 GB is enough. I will also check in my environment. Did you try the Ubuntu 20.04 image? And how long did you wait?

lively-translator-30710

11/29/2022, 4:44 PM
Haven’t tried 20.04. I can do that next. As to waiting - overnight… I only sent a snippet of the log… 🙂 In the latest attempt, I see that it periodically seems to restart a VM - including an error message:
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-controller-manager, kube-scheduler
[INFO ] pax-pool1-cb9c56f68-l9fss
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: kube-apiserver, kube-controller-manager
[INFO ] waiting for viable init node
[ERROR] Operation cannot be fulfilled on machines.cluster.x-k8s.io "pax-pool1-cb9c56f68-l9fss": the object has been modified; please apply your changes to the latest version and try again
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) pax-pool1-cb9c56f68-l9fss: waiting for probes: calico, kube-controller-manager, kube-scheduler
Did you install iptables?
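A hedged sketch of how that could be checked or fixed on an Ubuntu cloud image (package name assumed for Ubuntu/Debian; verify for your image):
# Is the iptables binary present on the guest node?
command -v iptables || echo "iptables not found"
# Install it if missing
sudo apt-get update && sudo apt-get install -y iptables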

lively-translator-30710

11/29/2022, 4:53 PM
I am using Rancher to deploy the cluster in Harvester with RKE2. Both Rancher and Harvester were clean installs.

full-plastic-79795

11/29/2022, 4:54 PM

lively-translator-30710

11/29/2022, 5:04 PM
Interesting… Will try that next. On a previous installation of 2.6.3/1.0.3 we didn’t need to make these settings.
I tried adding iptables. I also tried using Cilium. No change - same results with both.

salmon-city-57654

11/30/2022, 10:34 AM
Hi @lively-translator-30710, could you provide the Rancher version and the RKE2 version? Also, providing the rke2-server, rancher-system-agent, and kernel logs from the VM that hosts the guest cluster would be very helpful. Thanks!
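A rough sketch of commands that could collect these from inside the guest cluster VM, assuming systemd/journald and the unit names used by RKE2 provisioning (adjust to what is actually installed):
# Service logs for the RKE2 server and the Rancher system agent
sudo journalctl -u rke2-server --no-pager > rke2-server.log
sudo journalctl -u rancher-system-agent --no-pager > rancher-system-agent.log
# Kernel log, for OOM / disk / network errors
sudo journalctl -k --no-pager > kernel.log
# On images that also write syslog, the same messages land there
sudo grep -E "rke2|rancher-system-agent" /var/log/syslog | tail -n 200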

lively-translator-30710

11/30/2022, 1:44 PM
The current test was on Rancher 2.7.0 (same problem on 2.6.9) and Harvester 1.1.0 - so the RKE2 version that comes native with the release. Harvester is running on bare metal - Dell blades. I’ll work on getting the logs.

salmon-city-57654

12/01/2022, 2:11 AM
Hi @lively-translator-30710, yes, please get the related logs from this guest cluster. We have tested with Rancher 2.7.0 + RKE2 v1.24.7, and it works. Please also check your RKE2 version.

lively-translator-30710

12/02/2022, 3:30 PM
Greetings, to help find the correct logs, which pods do I need to get the logs from?

salmon-city-57654

12/04/2022, 2:29 PM
Hi @lively-translator-30710, for the logs on the guest cluster, you can find the VM on the related Harvester cluster. Then log in to this VM and try to get the rke2-server, rancher-system-agent, and kernel logs. Thanks!
Hi @lively-translator-30710, one more thing you could check: the guest cluster VM and the imported Harvester cluster should be able to reach each other over the network. Could you check that? Thanks!
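A hedged sketch of what that connectivity check might look like from inside the guest VM; the Rancher URL and Harvester address below are placeholders, not values from this thread:
# Can the node reach the Rancher server it registers with? (/ping should return "pong")
curl -vk https://<rancher-server-url>/ping
# Can it reach the Harvester management address / VIP?
ping -c 3 <harvester-vip>
# Does DNS resolve the Rancher hostname from inside the VM?
nslookup <rancher-server-hostname>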

lively-translator-30710

12/05/2022, 7:09 PM
The rke2-server and rancher-system-agent logs seem to be in syslog. This is the most ‘interesting’ error in those logs:
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898741 vs 70898775)"
Dec 5 18:53:26 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:26Z" level=error msg="error syncing 'fleet-default/pax-test-1-bootstrap-template-bpxsr-machine-plan': handler secret-watch: secret received was too old, requeuing"
Dec 5 18:53:31 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:53:31Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (70898775 vs 70898810)"
This is the other error that shows up:
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: open /var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt: no such file or directory"
Dec 5 18:50:13 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:13Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error loading CA cert for probe (kube-scheduler) /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: open /var/lib/rancher/rke2/server/tls/kube-scheduler/kube-scheduler.crt: no such file or directory"
Dec 5 18:50:19 pax-test-1-pool1-2a9c45bc-lc8vz rancher-system-agent[2075]: time="2022-12-05T18:50:19Z" level=error msg="error while appending ca cert to pool for probe kube-scheduler"
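Those probe errors say the cert files do not exist yet, which usually means rke2-server has not finished generating its TLS material; a rough sketch of what could be checked on the node, assuming default RKE2 paths:
# Is rke2-server actually running, or stuck restarting?
systemctl status rke2-server --no-pager
# Have the certs the probes look for been written yet?
sudo ls -l /var/lib/rancher/rke2/server/tls/kube-scheduler /var/lib/rancher/rke2/server/tls/kube-controller-manager
# Are the control-plane containers up? (crictl and socket paths are RKE2 defaults)
sudo /var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps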

salmon-city-57654

12/08/2022, 3:24 PM
Hi @lively-translator-30710, sorry for the late update. I think the above errors are caused by the cluster agent's inability to connect. Could you share the VLAN config of the guest cluster VM and your Harvester network config?
BTW, I tested again with Rancher v2.7/v2.6.9 and RKE2 v1.24.8. It seems to work well.

lively-translator-30710

12/12/2022, 3:25 PM
Greetings… I think it was some form of timing or timeout issue. The blades we were running Harvester on were using HDDs. We replaced them with SSDs on Friday, reloaded Harvester, and can now create clusters from Rancher. Now we just need to do the same on 30-ish more blades…

salmon-city-57654

12/12/2022, 5:05 PM
OK, from your description it looks like a timeout issue. Deploying on SSDs is recommended because of etcd. Thanks for your effort in trying another device for this.
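For anyone wanting to verify a disk is fast enough for etcd before swapping hardware, a rough sketch of the commonly cited fio fsync-latency check (the directory and sizes are assumptions; see the etcd docs for the exact thresholds):
# Run against a directory on the disk you want to test (path is a placeholder)
mkdir -p /data/etcd-disk-test
fio --directory=/data/etcd-disk-test --name=etcd-fsync-test --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300
# Guidance: the 99th percentile fdatasync latency should be well under 10 ms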