# harvester
b
After installing the fresh v1.6 all looks good (I can create VMs and such), but when I try to provision an RKE2 cluster in Harvester from Rancher, it gets stuck in the Updating state. The clusternamepoolname-machine-provision-xxxxx pods have this in the logs:
```
Downloading driver from https://myrancher.com/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
Curl failed with error code 7
ls: docker-machine-driver-*: No such file or directory
downloaded file  failed sha256 checksum
download of driver from https://myrancher.com/assets/docker-machine-driver-harvester failed
```
What am I missing?
When I hit the same URL in a browser, it downloads a 52.8 MB file.
This looks like exactly this bug: https://github.com/rancher/rancher/issues/49635 (just upgraded my Rancher to 2.12).
My Rancher cert is issued by Let's Encrypt and is valid... I had to set
agent-tls-mode: system-store
(was hitting another complication), not sure if that can affect this.
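For reference, a quick sanity check that can be run from a cluster node to confirm the asset endpoint is reachable from there (URL taken from the logs above; the output path is arbitrary):

```shell
# Fetch the driver the provisioner is trying to download and inspect its checksum.
# -f makes curl fail on HTTP errors, -L follows redirects.
curl -fSL https://myrancher.com/assets/docker-machine-driver-harvester \
  -o /tmp/docker-machine-driver-harvester
sha256sum /tmp/docker-machine-driver-harvester
```

If this works from the node but fails from pods, the problem is almost certainly in-cluster DNS or networking rather than Rancher itself.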
b
Ah, I think when I hit this before it was DNS fallthrough with CoreDNS.
Yeah, be sure you have
cluster.local
in your CoreDNS config, but also your
rancher.fqdn.com
in the fallthrough there too. I think the object is
kubectl edit configmap/coredns -n kube-system
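For context, the kubernetes plugin stanza that this advice is aiming for looks roughly like this (rancher.fqdn.com stands in for the real Rancher address):

```
kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods insecure
    fallthrough in-addr.arpa ip6.arpa rancher.fqdn.com
    ttl 30
}
```

Adding the FQDN to the `fallthrough` line tells CoreDNS to pass queries for that name on to the next plugin (typically `forward`) instead of answering NXDOMAIN itself.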
b
```
kubectl edit configmap/rke2-coredns-rke2-coredns -n kube-system
```
```yaml
apiVersion: v1
data:
  Corefile: |-
    .:53 {
        errors
        health {
            lameduck 10s
        }
        ready
        kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa myrancher.com
            ttl 30
        }
        prometheus 0.0.0.0:9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: rke2-coredns
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-07-14T17:55:03Z"
  labels:
    app.kubernetes.io/instance: rke2-coredns
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rke2-coredns
    helm.sh/chart: rke2-coredns-1.42.302
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: rke2-coredns-rke2-coredns
  namespace: kube-system
  resourceVersion: "22735246"
  uid: 28283b3e-adfa-4e06-9334-b1c9f9fa43dd
```
Added rancher.fqdn.com - same error... do I need to restart anything?
Restarted the Rancher cluster, didn't help.
b
Just to be clear, you didn't literally add
rancher.fqdn.com
but you added your address.
And the
myrancher.com
on the
fallthrough
line matches it?
You have
cluster.local
in there twice too,
which doesn't seem right.
b
Yes, added my real domain (should I add the IP address there too?). Taking care of the cluster.local duplication...
b
No, the next step is to shell into the provisioner pod and make sure you can curl the file.
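A sketch of that check (the namespace and pod name here are illustrative; Rancher typically creates the machine-provision jobs in fleet-default, but verify with `kubectl get pods -A`):

```shell
# Find the short-lived provisioner pod, then curl the driver from inside it.
kubectl get pods -A | grep machine-provision
# Substitute the namespace and pod name found above:
kubectl -n fleet-default exec -it <machine-provision-pod> -- \
  curl -v https://myrancher.com/assets/docker-machine-driver-harvester -o /tmp/driver
```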
b
It gets killed in seconds, is there a way to slow the execution?
b
Deploy another pod into the same namespace.
I like the swiss-army-knife one, but anything with some tools would work.
b
I normally use
rancherlabs/swiss-army-knife
but it's probably similar
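A throwaway debug pod along those lines (image as mentioned above; the namespace is an example):

```shell
# Start an interactive pod with network tools, removed automatically on exit.
kubectl -n kube-system run debug --rm -it \
  --image=rancherlabs/swiss-army-knife -- bash
# Inside the pod, test name resolution and the download:
#   nslookup myrancher.com
#   curl -v https://myrancher.com/assets/docker-machine-driver-harvester -o /tmp/driver
```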
b
curl: (6) Could not resolve host: myrancher.com
b
So DNS isn't working
Did you roll out CoreDNS after making your edits?
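One way to do that (deployment name taken from the ConfigMap above; verify yours with `kubectl get deploy -n kube-system`):

```shell
# Restart CoreDNS so it picks up the edited Corefile, then wait for the rollout.
kubectl -n kube-system rollout restart deployment/rke2-coredns-rke2-coredns
kubectl -n kube-system rollout status deployment/rke2-coredns-rke2-coredns
```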
b
redeploying..
b
🙂
And your myrancher.com address resolves from the host CLI, right?
b
Yes, from everywhere except the kube-system ns.
b
That seems odd. Just that namespace?
In
default
it works?
Do you have security apps or anything like that installed in your cluster?
b
Restarting rke2-coredns-rke2-coredns didn't help; no security apps, just the default RKE2 + Rancher.
It does not resolve in default either.
b
OK, so it's not doing the fallthrough. You can resolve other addresses though, correct? Like, can you ping google.com?
And can you ping the IP of your FQDN?
b
Yes, google.com is pingable,
and I can ping the IP of myrancher.com.
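A quick way to tell whether CoreDNS or the upstream resolver is dropping the name (run from a debug pod in the cluster; 8.8.8.8 is just an example upstream):

```shell
# Query through the cluster DNS (CoreDNS) vs. directly against an upstream server.
nslookup myrancher.com            # goes through CoreDNS
nslookup myrancher.com 8.8.8.8    # bypasses CoreDNS entirely
# If only the second query succeeds, the problem is in CoreDNS forwarding/fallthrough;
# if both fail, the issue is upstream of the cluster.
```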
b
Here's the patch of our CoreDNS for the Helm chart:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    prometheus:
      service:
        enabled: true
      monitor:
        enabled: true
    replicaCount: 3
    servers:
    - zones:
      - zone: .
      port: 53
      # If serviceType is nodePort you can specify nodePort here
      # nodePort: 30053
      # hostPort: 53
      plugins:
      - name: errors
      # Serves a /health endpoint on :8080, required for livenessProbe
      - name: health
        configBlock: |-
          lameduck 5s
      # Serves a /ready endpoint on :8181, required for readinessProbe
      - name: ready
      # Required to query kubernetes API for data
      - name: kubernetes
        parameters: rke.aristotle.ucsb.edu cluster.local in-addr.arpa ip6.arpa
        configBlock: |-
          pods insecure
          fallthrough in-addr.arpa ip6.arpa rke.aristotle.ucsb.edu
          ttl 30
      # Serves a /metrics endpoint on :9153, required for serviceMonitor
      - name: prometheus
        parameters: 0.0.0.0:9153
      - name: forward
        parameters: . /etc/resolv.conf
      - name: cache
        parameters: 30
      - name: loop
      - name: reload
      - name: loadbalance
```
I'm not sure where else to point you to get the fallthrough working.
b
I will dig into CoreDNS. Thanks a lot for your help and diagnosis/pointers, greatly appreciated! 🍻
The DNS is working now; it looks like there was some misconfiguration in my router firewall.
I missed this: https://help.mikrotik.com/docs/spaces/ROS/pages/3211299/NAT#NAT-HairpinNAT (DNS started to work after I switched forward to . 8.8.8.8 and then back to what it was 🙂). The curl worked from the host because there was another network link connected to another NIC (used for management and not available to containers for some reason); when I disconnected it, it stopped working.
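For anyone hitting the same thing: hairpin NAT on RouterOS needs a masquerade rule for LAN-to-LAN traffic in addition to the usual dst-nat rule, so clients inside the LAN can reach the public address of a host on the same LAN. A minimal sketch, assuming a 192.168.88.0/24 LAN (adjust addresses to your network; see the linked MikroTik docs):

```
/ip firewall nat add chain=srcnat src-address=192.168.88.0/24 \
    dst-address=192.168.88.0/24 action=masquerade
```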