# harvester
b
After installing the fresh v1.6 all looks good (I can create VMs and such), but when I try to provision an RKE2 cluster in Harvester from Rancher, it gets stuck in the Updating state. The clusternamepoolname-machine-provision-xxxxx pods have this in the logs:
```
Downloading driver from https://myrancher.com/assets/docker-machine-driver-harvester
Doing /etc/rancher/ssl
Curl failed with error code 7
ls: docker-machine-driver-*: No such file or directory
downloaded file  failed sha256 checksum
download of driver from https://myrancher.com/assets/docker-machine-driver-harvester failed
```
What am I missing?
When I hit the same URL in a browser, it downloads a 52.8 MB file.
This looks like exactly this bug: https://github.com/rancher/rancher/issues/49635 (just upgraded my Rancher to 2.12).
My Rancher cert is issued by Let's Encrypt and is valid... I had to set
agent-tls-mode: system-store
(was hitting another complication), not sure if that can affect this.
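For reference, a quick sanity check that can be run from a cluster node to confirm the asset endpoint is reachable from there (URL taken from the logs above; the output path is arbitrary):

```shell
# Fetch the driver the provisioner is trying to download and inspect its checksum.
# -f makes curl fail on HTTP errors, -L follows redirects.
curl -fSL https://myrancher.com/assets/docker-machine-driver-harvester \
  -o /tmp/docker-machine-driver-harvester
sha256sum /tmp/docker-machine-driver-harvester
```

If this works from the node but fails from pods, the problem is almost certainly in-cluster DNS or networking rather than Rancher itself.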
b
Ah, I think when I hit this before it was DNS fallthrough with CoreDNS.
Yeah, be sure you have
cluster.local
in your CoreDNS config, but also your
rancher.fqdn.com
in the fallthrough there too. I think the object is
kubectl edit configmap/coredns -n kube-system
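For context, the kubernetes plugin stanza that this advice is aiming for looks roughly like this (rancher.fqdn.com stands in for the real Rancher address):

```
kubernetes cluster.local in-addr.arpa ip6.arpa {
    pods insecure
    fallthrough in-addr.arpa ip6.arpa rancher.fqdn.com
    ttl 30
}
```

Adding the FQDN to the `fallthrough` line tells CoreDNS to pass queries for that name on to the next plugin (typically `forward`) instead of answering NXDOMAIN itself.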
b
```
kubectl edit configmap/rke2-coredns-rke2-coredns -n kube-system
```
```yaml
apiVersion: v1
data:
  Corefile: |-
    .:53 {
        errors
        health {
            lameduck 10s
        }
        ready
        kubernetes cluster.local cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa myrancher.com
            ttl 30
        }
        prometheus 0.0.0.0:9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: rke2-coredns
    meta.helm.sh/release-namespace: kube-system
  creationTimestamp: "2025-07-14T17:55:03Z"
  labels:
    app.kubernetes.io/instance: rke2-coredns
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: rke2-coredns
    helm.sh/chart: rke2-coredns-1.42.302
    k8s-app: kube-dns
    kubernetes.io/cluster-service: "true"
    kubernetes.io/name: CoreDNS
  name: rke2-coredns-rke2-coredns
  namespace: kube-system
  resourceVersion: "22735246"
  uid: 28283b3e-adfa-4e06-9334-b1c9f9fa43dd
```
Added rancher.fqdn.com - same error... do I need to restart anything?
Restarted the Rancher cluster, didn't help.
b
Just to be clear, you didn't literally add
rancher.fqdn.com
but you added your address.
And the
myrancher.com
on the
fallthrough
line matches it?
You have
cluster.local
in there twice too,
which doesn't seem right.
b
Yes, added my real domain (should I add the IP address there too?). Taking care of the cluster.local duplication...
b
No, the next step is to shell into the provisioner pod and make sure you can curl the file.
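A sketch of that check (the namespace and pod name here are illustrative; Rancher typically creates the machine-provision jobs in fleet-default, but verify with `kubectl get pods -A`):

```shell
# Find the short-lived provisioner pod, then curl the driver from inside it.
kubectl get pods -A | grep machine-provision
# Substitute the namespace and pod name found above:
kubectl -n fleet-default exec -it <machine-provision-pod> -- \
  curl -v https://myrancher.com/assets/docker-machine-driver-harvester -o /tmp/driver
```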
b
It gets killed in seconds, is there a way to slow the execution?
b
Deploy another pod into the same namespace.
I like the swiss-army-knife one, but anything with some tools would work.
b
I normally use
rancherlabs/swiss-army-knife
but it's probably similar
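A throwaway debug pod along those lines (image as mentioned above; the namespace is an example):

```shell
# Start an interactive pod with network tools, removed automatically on exit.
kubectl -n kube-system run debug --rm -it \
  --image=rancherlabs/swiss-army-knife -- bash
# Inside the pod, test name resolution and the download:
#   nslookup myrancher.com
#   curl -v https://myrancher.com/assets/docker-machine-driver-harvester -o /tmp/driver
```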
b
curl: (6) Could not resolve host: myrancher.com
b
So DNS isn't working
Did you roll out CoreDNS after making your edits?
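One way to do that (deployment name taken from the ConfigMap above; verify yours with `kubectl get deploy -n kube-system`):

```shell
# Restart CoreDNS so it picks up the edited Corefile, then wait for the rollout.
kubectl -n kube-system rollout restart deployment/rke2-coredns-rke2-coredns
kubectl -n kube-system rollout status deployment/rke2-coredns-rke2-coredns
```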
b
redeploying..
b
🙂
And your myrancher.com address resolves from the host CLI, right?
b
Yes, from everywhere except the kube-system ns.
b
That seems odd. Just that namespace?
In
default
it works?
Do you have security apps or anything like that installed in your cluster?
b
Restarting rke2-coredns-rke2-coredns didn't help; no security apps, just the default RKE2 + Rancher.
It does not resolve in default either.
b
OK, so it's not doing the fallthrough. You can resolve other addresses though, correct? Like, can you ping google.com?
And can you ping the IP of your FQDN?
b
Yes, google.com is pingable,
and I can ping the IP of myrancher.com.
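A quick way to tell whether CoreDNS or the upstream resolver is dropping the name (run from a debug pod in the cluster; 8.8.8.8 is just an example upstream):

```shell
# Query through the cluster DNS (CoreDNS) vs. directly against an upstream server.
nslookup myrancher.com            # goes through CoreDNS
nslookup myrancher.com 8.8.8.8    # bypasses CoreDNS entirely
# If only the second query succeeds, the problem is in CoreDNS forwarding/fallthrough;
# if both fail, the issue is upstream of the cluster.
```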
b
Here's the patch of our CoreDNS for the Helm chart:
```yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-coredns
  namespace: kube-system
spec:
  valuesContent: |-
    prometheus:
      service:
        enabled: true
      monitor:
        enabled: true
    replicaCount: 3
    servers:
    - zones:
      - zone: .
      port: 53
      # If serviceType is nodePort you can specify nodePort here
      # nodePort: 30053
      # hostPort: 53
      plugins:
      - name: errors
      # Serves a /health endpoint on :8080, required for livenessProbe
      - name: health
        configBlock: |-
          lameduck 5s
      # Serves a /ready endpoint on :8181, required for readinessProbe
      - name: ready
      # Required to query kubernetes API for data
      - name: kubernetes
        parameters: rke.aristotle.ucsb.edu cluster.local in-addr.arpa ip6.arpa
        configBlock: |-
          pods insecure
          fallthrough in-addr.arpa ip6.arpa rke.aristotle.ucsb.edu
          ttl 30
      # Serves a /metrics endpoint on :9153, required for serviceMonitor
      - name: prometheus
        parameters: 0.0.0.0:9153
      - name: forward
        parameters: . /etc/resolv.conf
      - name: cache
        parameters: 30
      - name: loop
      - name: reload
      - name: loadbalance
```
I'm not sure where else to point you to get the fallthrough working.
b
I will dig into CoreDNS. Thanks a lot for your help and diagnosis/pointers, greatly appreciated! 🍻
The DNS is working now; it looks like there was some misconfiguration in my router firewall.
I missed this: https://help.mikrotik.com/docs/spaces/ROS/pages/3211299/NAT#NAT-HairpinNAT (DNS started to work after I switched forward to . 8.8.8.8 and then back to what it was 🙂). The curl worked from the host because there was another network link connected to another NIC (used for management and not available to containers for some reason); when I disconnected it, it stopped working.
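For anyone hitting the same thing: hairpin NAT on RouterOS needs a masquerade rule for LAN-to-LAN traffic in addition to the usual dst-nat rule, so clients inside the LAN can reach the public address of a host on the same LAN. A minimal sketch, assuming a 192.168.88.0/24 LAN (adjust addresses to your network; see the linked MikroTik docs):

```
/ip firewall nat add chain=srcnat src-address=192.168.88.0/24 \
    dst-address=192.168.88.0/24 action=masquerade
```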