stale-painting-80203

03/24/2023, 10:57 PM
Just tried to create a downstream RKE2 cluster and see the following pods go into a CrashLoop. The cluster shows as active in Rancher, so what's the impact on the cluster functioning correctly, and how do I fix the crash?
kube-system           helm-install-rke2-ingress-nginx-tp5xd                   0/1     CrashLoopBackOff   45 (58s ago)    
kube-system           helm-install-rke2-metrics-server-rq8mc                  0/1     CrashLoopBackOff   45 (104s ago)
Both seem to have the same error:
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-ingress-nginx /tmp/rke2-ingress-nginx.tgz
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
+ exit

+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-metrics-server /tmp/rke2-metrics-server.tgz
Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://10.43.0.1:443/version": dial tcp 10.43.0.1:443: i/o timeout
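For reference, 10.43.0.1 is the ClusterIP of the default kubernetes Service (the in-cluster apiserver endpoint, given serviceCIDR 10.43.0.0/16). A quick way to reproduce the timeout outside the pods, assuming curl is available on the affected node (a diagnostic sketch, not from the original logs; substitute a real server node IP):

# Hit the in-cluster apiserver ClusterIP the same way the helm-install job does
curl -k --max-time 5 https://10.43.0.1:443/version
# Compare against a server node's apiserver directly
curl -k --max-time 5 https://<server-node-ip>:6443/version

If the direct node IP works but the ClusterIP times out, the apiserver itself is fine and it's the service routing between nodes (kube-proxy DNAT plus the overlay network) that's being blocked.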

creamy-pencil-82913

03/24/2023, 11:26 PM
well it looks like the pods can't reach the in-cluster apiserver endpoint, so that's going to break pretty much everything
are you sure you got all the correct ports open between nodes?
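A quick way to check from each node, assuming nc (netcat) is installed (a sketch; substitute your own node IPs):

# From an agent node, probe the server node's RKE2 ports
nc -vz -w 3 <server-node-ip> 6443    # Kubernetes API
nc -vz -w 3 <server-node-ip> 9345    # RKE2 supervisor API (node registration)
nc -vz -w 3 <server-node-ip> 10250   # kubelet
# Note: the VXLAN overlay (UDP 8472 for Canal/Flannel) won't show up in a TCP probe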

stale-painting-80203

03/24/2023, 11:29 PM
Calico and fleet-agent pods were running fine on that node. Anyway, I just drained that node, the crashing pods got moved to another node, and they came up running. I was mostly curious as to the purpose of those crashing pods
Whoops, spoke too soon. Now they are crashing on the other node
Removed the iptables rules completely to see if it was something to do with the ports, and now all pods are working, so it's something to do with ports getting blocked.
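(For anyone curious, "removing the rules completely" amounts to something like this; it opens the node up entirely, so treat it strictly as a temporary test, not my exact commands:)

# Default-accept first, so flushing a DROP policy doesn't cut existing connections
iptables -P INPUT ACCEPT
# Flush all rules in the filter table
iptables -F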
I enabled the ports as per https://ranchermanager.docs.rancher.com/v2.6/getting-started/installation-and-upgrade/installation-requirements/port-requirements:
firewall_allowed_tcp_ports:
  - "22"          # Node driver SSH provisioning
  - "80"          # http
  - "443"         # https
  - "2376"        # Node driver Docker daemon TLS port
  - "2379"        # etcd client requests
  - "2380"        # etcd peer communication
  - "6443"        # RKE2 Kubernetes API
  - "8443"        # Rancher webhook
  - "9099"        # Canal/Flannel livenessProbe/readinessProbe
  - "9100"        # Default port required by Monitoring to scrape metrics from Linux node-exporters
  - "9443"        # Rancher webhook
  - "9345"        # RKE2 Kubernetes API
  - "9796"        # Default port required by Monitoring to scrape metrics from Windows node-exporters
  - "10250"       # Metrics server communication with all nodes API
  - "10254"       # Ingress controller livenessProbe/readinessProbe
firewall_allowed_udp_ports:
  - "8472"        # Canal/Flannel VXLAN overlay networking
firewall_additional_rules:   # TCP/UDP NodePort port range
  - "iptables -A INPUT -p tcp --match multiport --dports 30000:32767 -j ACCEPT"
  - "iptables -A INPUT -p udp --match multiport --dports 30000:32767 -j ACCEPT"

creamy-pencil-82913

03/25/2023, 6:32 PM
Those are the docs for an old version of Rancher; you should reference the RKE2 docs: https://docs.rke2.io/install/requirements#networking

stale-painting-80203

03/27/2023, 6:54 PM
Thanks for pointing me to the RKE2 docs; they called out additional ports which needed to be opened. I ran a couple of tests creating clusters, and it seems to be working!
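For anyone landing here later, a sketch of the CNI-specific entries from the RKE2 networking docs that the Rancher 2.6 list above doesn't include. These are the Calico ones, which only apply if the cluster runs Calico rather than the default Canal, so double-check the linked RKE2 requirements page for your setup:

firewall_allowed_tcp_ports:
  - "179"         # Calico BGP (only if BGP peering is enabled)
  - "5473"        # Calico Typha
  - "9098"        # Calico Typha health checks
firewall_allowed_udp_ports:
  - "4789"        # Calico VXLAN overlay networking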