This message was deleted Rancher Users #general


# general

adamant-kite-43734

07/09/2024, 6:35 PM

This message was deleted.

acoustic-addition-45641

07/09/2024, 6:35 PM

• Rancher v2.8.2 (5 nodes, VMs on VMware, running for close to a year)
• Harvester v1.3.1 (two different clusters, each less than a week old, on dedicated physical hardware)
• Rancher is front-ended by NGINX, which provides health checks and SSL offload and also hosts the public TLS certificate.

Rancher can deploy clusters to downstream OpenStack and VMware environments with no issues. However, when attempting to deploy Kubernetes clusters to the recently imported Harvester clusters, no VMs are created, and the associated `fleet-default` pods error out with:


Downloading driver from https://<public_rancher_URL>/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
ls: cannot access 'docker-machine-driver-*': No such file or directory
downloaded file  failed sha256 checksum
download of driver from https://<public_rancher_URL>/assets/docker-machine-driver-harvester failed

I have verified that the `docker-machine-driver-harvester` file is present, both by visiting the /assets API URL and by running `ls` against the Rancher pods. I can also `curl` the file from other VMs in Harvester, so I do not believe there is a network path issue between Harvester and Rancher.

What I suspect is occurring, but am unsure how to prove or troubleshoot, is that Fleet spins up a machine pod in `fleet-default` that attempts to pull down the machine driver using the public URL of the Rancher cluster. This request goes through the NGINX load balancer (which is fine) and reaches a Rancher pod. However, I suspect that the Rancher pod then tries to return its response DIRECTLY to the Fleet machine pod rather than back through the NGINX load balancer. That would be an asymmetric path, which would cause the download of the Harvester node driver from the Rancher pod to the Fleet machine pod to fail.

Again, this is just a suspicion, but I am trying to find out:
1. How to prove that the asymmetric path is the root cause of my issue (Fleet machine pod unable to download the Harvester machine driver from the Rancher pod).
2. How to resolve the issue, i.e. is there a way to make Fleet recognize that the Rancher pod hosting the Harvester driver is in the same Rancher management cluster and use a ClusterIP (or similar) instead of reaching out to the external NGINX load balancer?

Again, hoping someone has run into this before and can provide some pointers. Thanks!
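One way to narrow this down is to repeat the download by hand from a pod in the same cluster as the machine pods and compare the result against the checksum you expect. The sketch below is only an illustration of that idea, not what `download_driver.sh` does internally. `fetch`, `sha256_hex`, and `checksum_matches` are hypothetical helper names, and the expected checksum is a value you would supply yourself.

```python
import hashlib
import ssl
import urllib.request
from typing import Optional


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of data with an expected hex digest (case-insensitive)."""
    return sha256_hex(data) == expected_hex.strip().lower()


def fetch(url: str, cafile: Optional[str] = None, timeout: float = 10.0) -> bytes:
    """Download url over HTTPS with certificate verification enabled.

    cafile is an optional path to a CA bundle; None uses the system trust store.
    """
    context = ssl.create_default_context(cafile=cafile)
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()


# Example usage (placeholders, not run here):
#   data = fetch("https://<public_rancher_URL>/assets/docker-machine-driver-harvester")
#   print(len(data), sha256_hex(data))
#   print(checksum_matches(data, "<expected sha256>"))
```

If `fetch` raises before any bytes arrive, the checksum failure in the log is probably a symptom of a failed download rather than a corrupted file. The `ls: cannot access` line in the log points the same way, although that reading is an inference.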

gifted-breakfast-73755

08/26/2024, 9:19 PM

@acoustic-addition-45641 I'm running into something similar when trying to create an RKE2 cluster using a custom node driver while running Rancher locally in Docker Desktop:


    failureMessage: |-
      Failure detected from referenced resource rke-machine.cattle.io/v1, Kind=TritonMachine with name "chad-test-1-pool1-8ec441f2-6bdp9": Downloading driver from http://localhost/assets/docker-machine-driver-triton
      Doing /etc/rancher/ssl
      ls: cannot access 'docker-machine-driver-*': No such file or directory
      downloaded file  failed sha256 checksum
      download of driver from http://localhost/assets/docker-machine-driver-triton failed
    failureReason: CreateError

Seems like it's related to the SSL cert (https://github.com/rancher/machine/blob/9183b3ff738e16ece4391a2e6bcc8ef88889e8ae/package/download_driver.sh#L15). Did you ever figure this out?
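To test the certificate theory separately from the driver script, a small probe can report whether a request fails specifically at TLS verification. This is a rough sketch using only the Python standard library. `classify` and `probe` are hypothetical names, and the three labels they return are arbitrary.

```python
import ssl
import urllib.error
import urllib.request
from typing import Optional


def classify(exc: BaseException) -> str:
    """Map an exception from urlopen to a coarse failure label."""
    # urlopen wraps low-level socket/TLS errors in URLError; unwrap if so.
    reason = exc.reason if isinstance(exc, urllib.error.URLError) else exc
    if isinstance(reason, ssl.SSLCertVerificationError):
        return "tls-verify-failed"
    if isinstance(reason, ssl.SSLError):
        return "tls-error"
    return "other-error"


def probe(url: str, cafile: Optional[str] = None, timeout: float = 10.0) -> str:
    """Return "ok" if url can be fetched with verification on, else a label."""
    context = ssl.create_default_context(cafile=cafile)
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout):
            return "ok"
    except OSError as exc:  # URLError and ssl.SSLError are OSError subclasses
        return classify(exc)


# Example usage (placeholder URL and CA path, not run here):
#   print(probe("https://<public_rancher_URL>/assets/docker-machine-driver-harvester"))
#   print(probe("https://<public_rancher_URL>/assets/...", cafile="/path/to/ca.pem"))
```

If the probe returns `tls-verify-failed` with the default trust store but `ok` once the right CA file is supplied, that would support the self-signed certificate explanation.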

acoustic-addition-45641

08/26/2024, 9:28 PM

Unfortunately, I have not. I plan to spin up a test Rancher manager instance that handles HTTPS in the cluster rather than using an external load balancer. This aligns more closely with the supported architecture. I just need to make time to do it.

gifted-breakfast-73755

08/26/2024, 10:12 PM

Sounds good. I tested on a separate Rancher instance that has a valid SSL cert and did not encounter the error, so for me it is specific to the self-signed cert Rancher generates when running locally. I tried to find a way to make `download_driver.sh` use `-k`, but I don't know how to patch the `rancher/machine` image with containerd. I may just spoof the domain locally with `/etc/hosts` for development and use the SSL cert from our other Rancher instance.
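If the `/etc/hosts` route is taken, it helps to confirm that the spoofed name really resolves to the intended address before retrying the cluster creation. A minimal sketch follows. `resolved_ipv4` and `resolves_to` are hypothetical helpers, and the hostname in the usage comment is a made-up example.

```python
import socket
from typing import Set


def resolved_ipv4(hostname: str) -> Set[str]:
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if hostname resolves to expected_ip; False on mismatch or lookup failure."""
    try:
        return expected_ip in resolved_ipv4(hostname)
    except socket.gaierror:
        return False


# Example usage (hypothetical hostname, not run here):
#   print(resolves_to("rancher.example.com", "127.0.0.1"))
```

This only checks the resolver on the machine where it runs. A pod inside the cluster uses its own DNS configuration, so the same check would need to be run from there as well.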
