# harvester
p
Greetings, all. I'm hoping someone can provide direction on how to troubleshoot an issue where a pool vm created via Rancher UI is stuck in Reconciling - waiting for agent to check in and apply initial plan. If this has been answered elsewhere, a link to that thread would be appreciated!
b
Could be lots of reasons for that.
Is your rancher instance accessible from the VM Network on Harvester? Do you have any Firewalls or port restrictions outside of the VM? Does the VM have all the requirements installed for Rancher to provision it properly?
p
I would say yes, it's accessible from Harvester, because I'm importing Harvester into Rancher without an issue. Rancher can spin up and delete VMs in Harvester, but for some reason it doesn't complete the provisioning process. I thought it might be due to my having my own DNS server, but I was able to create a cloud-init file for MicroOS that adds my DNS server to the mix. If I access the Rancher-created VM, I can ping hosts via IP or DNS name. I do have an nginx LB in front of Rancher, so I'm wondering if there's some kind of config issue there. I'm forwarding 80, 443, and 6443 through to the Rancher nodes. I don't know how (or what) to test to make sure the Rancher provisioning requests are routing properly. If I create a k3s cluster manually on Harvester and import it to Rancher, it seems to be fine. I just can't get Rancher to complete the provisioning process on its own.
b
> Rancher can spin up and delete vms in Harvester,
That just means that Rancher can reach the Harvester cluster API.
> I do have an nginx LB in front of Rancher so I'm wondering if there's some kind of config issue there. I'm forwarding 80,443,6443 through to the Rancher nodes.
I think the docs specify an additional port?
Are you able to log into the VM that is stuck waiting to register?
> I think the docs specify an additional port?
My notes show that you also need to include 9345, though I don't know if that's needed for provisioning and registration or not.
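One way to test whether the LB is actually passing these ports is to probe them from the stuck VM itself. This is only a rough sketch: `rancher.example.com` is a placeholder for whatever hostname the VMs use to reach Rancher, `check_port` and `check_rancher_ports` are made-up helper names, and the port list is just the ones mentioned in this thread (whether 9345 matters here is not confirmed).

```shell
# Placeholder hostname: replace with the address the VMs use to reach Rancher.
RANCHER_HOST="${RANCHER_HOST:-rancher.example.com}"

# check_port HOST PORT: prints "open" or "closed"; returns 1 when closed.
# Relies on bash's /dev/tcp redirection and coreutils' timeout.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
    return 1
  fi
}

# check_rancher_ports: probe every port discussed in the thread.
check_rancher_ports() {
  local port
  for port in 80 443 6443 9345; do
    printf '%s:%s %s\n' "$RANCHER_HOST" "$port" "$(check_port "$RANCHER_HOST" "$port")"
  done
}

# Usage on the VM:
#   RANCHER_HOST=rancher.example.com check_rancher_ports
```

A port showing "open" only proves the TCP connection is accepted by the LB; it says nothing about certificates or whether the backend behind it is healthy.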
r
Are you able to access the VM via ssh? I would suggest viewing the logs on the VM. If you have a LB in front of Rancher then there might be a problem with certificates.
p
> Are you able to log into the VM that is stuck waiting to register?
Yes, I'm able to log in to all the VMs that get spun up.
What logs should I be reviewing? I haven't found much info on what logs to review or how.
b
Check your routes to the rancher instance and journald logs.
You can just run journalctl -f and see if things complain.
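If the full journal is too noisy, it can help to narrow it down to lines that mention the provisioning pieces. A minimal sketch, assuming a keyword list that is only a guess at what is useful (it is not an official list), with `agent_lines` as a made-up helper name:

```shell
# agent_lines: keep only lines from stdin that mention the provisioning pieces.
# The keyword pattern is an assumption; adjust it to whatever shows up on the VM.
agent_lines() {
  # "|| true" keeps the exit status at 0 when nothing matches.
  grep -iE 'rancher|system-agent|cloud-init' || true
}

# Usage on the stuck VM (current boot only, no pager):
#   journalctl -b --no-pager | agent_lines
```

Empty output suggests the agent never started or was never installed, which is a different problem from an agent that starts and fails to connect.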
p
The journal on Rancher or the new vms? Would the journal on the Rancher cluster be the same across nodes?
I updated nginx to pass through 9345, we'll see if that has any impact.
b
On the new VMs.
It should have the rancher system agent trying to check in.
p
The last message in the journal of one of the new vms is: Startup finished in 130ms. Safe to ASSume the agent isn't being loaded for some reason?
qemu-guest-agent is showing as active
b
Yeah, that's a different thing
systemctl status rancher-system-agent.service
p
Service can't be found, so it's not being loaded. I wonder if that's because I'm using MicroOS and it needs to use transactional-update? One might think that SUSE would make sure their own stuff works with their other stuff...
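For anyone scripting this check: "unit not found" can be told apart from "installed but stopped" by the exit code of `systemctl status`. A small sketch based on the exit codes listed in the systemctl man page (older systemd versions may behave differently); `describe_unit_status` is a made-up helper name:

```shell
# describe_unit_status CODE: translate the exit code of `systemctl status UNIT`.
describe_unit_status() {
  case "$1" in
    0) echo "active" ;;
    3) echo "loaded but not active" ;;
    4) echo "unit not found" ;;
    *) echo "other (exit code $1)" ;;
  esac
}

# Usage on the VM:
#   systemctl status rancher-system-agent.service >/dev/null 2>&1
#   describe_unit_status "$?"
```

"unit not found" matches what was seen here and points at the install step never having run, rather than at the agent failing to reach Rancher.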