# harvester
w
I figured the machines would at least attempt to be created, but it doesn't look like that's happening. Rancher is on 2.7.9 and Harvester is on 1.3. Perhaps there is an API incompatibility?
There are plenty of resources and no relevant events.
g
It might be worth checking the Rancher cluster.
Rancher runs k8s Jobs that carry the details of how to provision the VM. I would check those Job and pod logs, and also the Rancher logs themselves, to see what is likely causing the issue.
w
Do you have any tips on what jobs/pods to look for specifically? I don't appear to have any new jobs/pods specific to those clusters. I do have logs in capi-controller-manager but they aren't terribly informative.
I0418 11:04:24.806401       1 machine_controller_noderef.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6-sgm88" namespace="fleet-default" name="test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6-sgm88" reconcileID=fc613019-8396-4a0f-ad6e-fe964f229592 MachineSet="fleet-default/test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6" MachineDeployment="fleet-default/test-harvester-virtual-cluster-pool1" Cluster="fleet-default/test-harvester-virtual-cluster" HarvesterMachine="fleet-default/test-harvester-virtual-cluster-pool1-2f8b5c8a-htx8n"
I0418 11:04:25.019718       1 machine_controller_phases.go:286] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6-fz6kp" namespace="fleet-default" name="test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6-fz6kp" reconcileID=284bd614-e94e-4ad2-8d38-f0313614abd9 MachineSet="fleet-default/test-harvester-virtual-cluster-pool1-bdc9cb86dxsqzs6" MachineDeployment="fleet-default/test-harvester-virtual-cluster-pool1" Cluster="fleet-default/test-harvester-virtual-cluster" HarvesterMachine="fleet-default/test-harvester-virtual-cluster-pool1-2f8b5c8a-tzdwq"
Here is what the logs look like when I create a new cluster:
2024/04/18 12:40:47 http: TLS handshake error from 10.42.1.0:33218: EOF
2024/04/18 12:40:47 http: TLS handshake error from 10.42.1.0:33226: EOF
I0418 12:40:47.626997       1 machine_controller_phases.go:219] "Waiting for bootstrap provider to generate data secret and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=7a8664a4-56da-4745-a403-49cfba6987bf MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" RKEBootstrap="fleet-default/test-bootstrap-template-vqvhb"
I0418 12:40:47.687054       1 machine_controller_phases.go:286] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=7a8664a4-56da-4745-a403-49cfba6987bf MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" HarvesterMachine="fleet-default/test-pool1-74c8ba76-5g6rx"
I0418 12:40:47.687124       1 machine_controller_noderef.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=7a8664a4-56da-4745-a403-49cfba6987bf MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" HarvesterMachine="fleet-default/test-pool1-74c8ba76-5g6rx"
2024/04/18 12:40:47 http: TLS handshake error from 10.42.1.0:33242: EOF
I0418 12:40:47.847197       1 machine_controller_phases.go:219] "Waiting for bootstrap provider to generate data secret and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=bc8d91ce-eeae-4318-aff7-6e0ef7c2b2e6 MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" RKEBootstrap="fleet-default/test-bootstrap-template-vqvhb"
I0418 12:40:47.892738       1 machine_controller_phases.go:286] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=bc8d91ce-eeae-4318-aff7-6e0ef7c2b2e6 MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" HarvesterMachine="fleet-default/test-pool1-74c8ba76-5g6rx"
I0418 12:40:47.892806       1 machine_controller_noderef.go:54] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/test-pool1-5bb5c766d9xspx6h-lvxs4" namespace="fleet-default" name="test-pool1-5bb5c766d9xspx6h-lvxs4" reconcileID=bc8d91ce-eeae-4318-aff7-6e0ef7c2b2e6 MachineSet="fleet-default/test-pool1-5bb5c766d9xspx6h" MachineDeployment="fleet-default/test-pool1" Cluster="fleet-default/test" HarvesterMachine="fleet-default/test-pool1-74c8ba76-5g6rx"
I'm curious about the TLS handshake errors
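Since the capi-controller-manager output repeats the same few messages on every reconcile, it may help to collapse it per Machine. The following is only a rough sketch, assuming the lines follow the klog layout seen in the logs above; the function names `parse_klog` and `summarize` are made up for illustration. Lines in another format, such as the TLS handshake errors, are skipped rather than interpreted.

```python
import re
from collections import Counter

# klog header: severity letter + MMDD, time, thread/PID, "file:line]", then a
# quoted message followed by key="value" pairs.
KLOG_RE = re.compile(
    r'^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+\d+ '
    r'(?P<src>[^\]]+)\] "(?P<msg>[^"]*)"(?P<rest>.*)$'
)
# Only quoted values are captured; unquoted ones (e.g. reconcileID) are ignored.
KV_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_klog(line):
    """Return a dict for a structured klog line, or None if it doesn't match."""
    m = KLOG_RE.match(line)
    if m is None:
        return None
    return {
        "severity": m.group("sev"),
        "source": m.group("src"),
        "message": m.group("msg"),
        "fields": dict(KV_RE.findall(m.group("rest"))),
    }


def summarize(lines):
    """Count (Machine, message) pairs so repeated reconcile loops collapse."""
    counts = Counter()
    for line in lines:
        rec = parse_klog(line)
        if rec is not None:
            counts[(rec["fields"].get("Machine", "?"), rec["message"])] += 1
    return counts
```

Feeding the saved log lines to `summarize` gives one counter entry per Machine and "Waiting for ..." message, which makes it easier to see which step a Machine is stuck on.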
g
When a cluster is created, a corresponding MachineTemplate and Machine custom resource are created.
They should be in the fleet-default namespace, and I would expect a Job to be created in the same namespace.
This Job makes the API calls to Harvester to provision the VMs, and the logs of its pod would have info about what is going on.
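The check described above could be scripted roughly as follows. This is a sketch under assumptions: it only builds `kubectl` command lines, the helper names are hypothetical, and the fully qualified resource names (in particular `harvestermachines.rke-machine.rancher.io`) are a guess that should be confirmed with `kubectl api-resources` on the Rancher local cluster.

```python
import subprocess

NAMESPACE = "fleet-default"

# Resource names are assumptions; confirm them with `kubectl api-resources`.
RESOURCES = [
    "jobs",
    "pods",
    "machines.cluster.x-k8s.io",
    "harvestermachines.rke-machine.rancher.io",
]


def get_command(resource, namespace=NAMESPACE):
    """Build the kubectl argv that lists one resource type in a namespace."""
    return ["kubectl", "get", resource, "-n", namespace, "-o", "wide"]


def logs_command(job_name, namespace=NAMESPACE):
    """Build the kubectl argv that prints the logs of a Job's pod."""
    return ["kubectl", "logs", f"job/{job_name}", "-n", namespace]


def run(argv):
    """Run a command and return stdout plus stderr; needs kubectl on PATH."""
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr
```

Calling `run(get_command(r))` for each entry in `RESOURCES` shows whether any provisioning Job exists, and `run(logs_command(name))` with a Job name taken from that output prints what the Job logged.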
w
Ok I'll check for that, thanks for the tip. I didn't see any jobs created but I'll verify. hopefully I'll see something there
g
If you delete and recreate the cluster, a Job should be created for each VM that makes up the cluster.
w
I've checked both clusters across all namespaces, and no Job is being created.
I checked on local and on the Harvester cluster, with owner creds on both.
This appears to have been fixed after a Rancher upgrade to 2.8.3 👍 (from 2.7.9)