adamant-kite-43734
07/09/2024, 6:35 PM
acoustic-addition-45641
07/09/2024, 6:35 PM
fleet-default pods error out with:
Downloading driver from https://<public_rancher_URL>/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
ls: cannot access 'docker-machine-driver-*': No such file or directory
downloaded file failed sha256 checksum
download of driver from https://<public_rancher_URL>/assets/docker-machine-driver-harvester failed
I have verified that the docker-machine-driver-harvester file is present both by visiting the /assets API URL and by running ls commands against the Rancher pods. I can also curl the file from other VMs in Harvester, so I do not believe there is a network path issue between Harvester and Rancher.
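Since the log above ends with a sha256 failure rather than a connection error, one thing that could be compared is the hash of the file as actually fetched against the checksum the driver is expected to have. A minimal Python sketch, assuming the expected checksum is known from wherever the driver is configured; the function names `sha256_hex`, `fetch`, and `checksum_matches` are made up for illustration and are not part of Rancher:

```python
import hashlib
import ssl
import urllib.request
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def fetch(url: str, cafile: Optional[str] = None, timeout: float = 30.0) -> bytes:
    """Download url, verifying TLS against cafile (system trust store if None)."""
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        return resp.read()


def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of data with an expected hex digest, ignoring case and whitespace."""
    return sha256_hex(data) == expected_hex.strip().lower()
```

Calling `fetch()` with the driver URL from a pod or VM on the same network as the machine pod, then `checksum_matches()` on the result, would show whether the bytes that arrive are the expected ones. A digest of e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855 is the SHA-256 of empty input, which would mean nothing was downloaded at all.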
What I suspect is occurring, but am unsure how to prove or troubleshoot, is that Fleet spins up a machine pod in fleet-default that attempts to pull down the machine driver using the public URL of the Rancher cluster. This request goes through the NGINX load balancer (which is fine) and reaches a Rancher pod. However, I suspect that the Rancher pod then tries to return its response DIRECTLY to the Fleet machine pod rather than through the NGINX load balancer. That would result in an asymmetric path, which would cause the download of the Harvester node driver from the Rancher pod to the Fleet machine pod to fail.
Again, this is just a suspicion, but I am trying to find out:
1. How to prove that the asymmetric path is the root cause of my issue (Fleet machine pod unable to download the Harvester machine driver from the Rancher pod).
2. How to resolve the issue, i.e. is there a way to trick Fleet into realizing that the Rancher pod that hosts the Harvester driver is in the same Rancher management cluster, and have it use a clusterIP (or similar) to avoid reaching out to the external NGINX load balancer?
Again, hoping someone has run into this before and can provide some pointers.
Thanks!
gifted-breakfast-73755
08/26/2024, 9:19 PM
failureMessage: |-
Failure detected from referenced resource rke-machine.cattle.io/v1, Kind=TritonMachine with name "chad-test-1-pool1-8ec441f2-6bdp9": Downloading driver from http://localhost/assets/docker-machine-driver-triton
Doing /etc/rancher/ssl
ls: cannot access 'docker-machine-driver-*': No such file or directory
downloaded file failed sha256 checksum
download of driver from http://localhost/assets/docker-machine-driver-triton failed
failureReason: CreateError
Seems like it's related to the SSL cert (https://github.com/rancher/machine/blob/9183b3ff738e16ece4391a2e6bcc8ef88889e8ae/package/download_driver.sh#L15). Did you ever figure this out?
acoustic-addition-45641
08/26/2024, 9:28 PM
gifted-breakfast-73755
08/26/2024, 10:12 PM
download_driver.sh use -k but don't know how to patch the rancher/machine image with containerd. I may just spoof the domain locally with /etc/hosts for development and use the SSL cert from our other rancher instance.
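Before patching the image, it may be worth confirming that certificate verification really is what fails. A rough Python sketch, assuming the Rancher hostname and an optional CA bundle path are known; `make_context` and `tls_handshake_ok` are hypothetical helper names, and the `insecure` flag only mirrors what curl's -k does (skip certificate verification):

```python
import socket
import ssl
from typing import Optional, Tuple


def make_context(cafile: Optional[str] = None, insecure: bool = False) -> ssl.SSLContext:
    """Build a TLS context; insecure=True disables verification, like curl -k."""
    context = ssl.create_default_context(cafile=cafile)
    if insecure:
        # check_hostname has to be turned off before verify_mode can be CERT_NONE
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def tls_handshake_ok(host: str, port: int = 443, cafile: Optional[str] = None,
                     insecure: bool = False, timeout: float = 10.0) -> Tuple[bool, str]:
    """Attempt a TLS handshake and report whether it succeeded and why not."""
    context = make_context(cafile, insecure)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return True, "handshake succeeded"
    except ssl.SSLError as exc:  # includes certificate verification failures
        return False, f"TLS error: {exc}"
    except OSError as exc:  # refused, unreachable, timed out, DNS failure
        return False, f"connection error: {exc}"
```

If the handshake fails with the default settings but succeeds with `insecure=True` or with the right `cafile`, that points at the certificate rather than the network path.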