Hi, I'm working with a home lab Harvester cluster ...
# harvester
h
Hi, I'm working with a home lab Harvester cluster and encountering an issue with a newly added worker node. Cluster Setup: • Existing Harvester cluster (3 nodes acting as masters/management). • Initially installed at v1.1, upgraded successfully to v1.4. • Recently added a fourth node intended as a worker. It's running the same v1.4 Harvester version as the upgraded nodes. Problem: When I provision a new VM (specifically as part of provisioning an RKE2 cluster via Rancher) and it spins up on the new worker node (node 4), its initial setup script fails. The specific error seen inside the VM during boot is:
Copy code
Error while connecting to Rancher to verify CA necessity. Sleeping for 5 seconds and trying again.
This loop prevents the VM setup from completing, and consequently, the VM doesn't get an IP address. Debugging Performed: 1. I suspect this error originates from the Rancher System Agent
install.sh
script, possibly around the CA verification logic (link for reference: https://github.com/rancher/system-agent/blob/main/install.sh#L721C1-L762C2). 2. Inside the failed VM's
virt-launcher
pod shell, I ran the
curl
command that's likely causing the issue. The output was: 3. verify result: 20 4. curl exit: 60 5.
curl exit 60
typically indicates an SSL certificate problem (like a self-signed certificate that isn't trusted). Key Observation: VMs provisioned on the original three nodes (which were upgraded to 1.4) using the same process and image do not experience this error and provision successfully. The issue seems isolated to VMs running specifically on the newly added v1.4 worker node. Question: Does anyone have experience with this type of CA verification failure (
curl exit 60
) when adding a new node to an existing, upgraded Harvester cluster? Could this be related to how the new node is trusting the cluster's CA, or perhaps an issue with the Rancher endpoint accessibility/trustworthiness only from VMs on that specific node? Any pointers or suggestions on how to diagnose this further would be greatly appreciated! Thanks!
t
wrapping my head around the problem. you have a 4 node harvester cluster, and you are seeing the error on creating VMs with Rancher? Typically the SSL problems stems from Rancher and not Harvester. Do you have a legit cert on Rancher?
h
No, it's been this self signed cert from the start? Will it work when i put a let's encrypt TLS termination reverse proxy in front of rancher?
t
it should. if you use rancher to deploy you should be able to check the rancher-agent service on the vm to see the exact error. then try and curl the url
to see the exact error. My guess is “cert signed by unknown authority”
h
Gonna try that first, thought rancher with self signed certs was the way to go since it's been working without problems since the start. It might be the simplest solution and should solve it. Thanks @thousands-advantage-10804!
t
Here is a vid to show you the cloud-init phase of creating vms. You can check
/var/log/cloud-init-output.log
for the output
r
same/similar problem after 1.5 upgrade when trying to add new node. Agent is not able to connect VIP. (and issue is not that Rancher TLS strict etc)
On existing node CA is available also in OS level (curl to VIP works) but not in new node.