#rke2

narrow-noon-75604

06/08/2022, 7:18 PM
Hi, I am trying to install RKE2 v1.23.6 on a RHEL 8 cluster (1 server & 4 agents) using the air-gap tarball method. The installation succeeds, and the nodes are added properly and reach the Ready state.
kubectl get nodes
NAME           STATUS                     ROLES                       AGE    VERSION
rke2-master    Ready,SchedulingDisabled   control-plane,etcd,master   112m   v1.23.6+rke2r2
rke2-worker1   Ready                      <none>                      108m   v1.23.6+rke2r2
rke2-worker2   Ready                      <none>                      108m   v1.23.6+rke2r2
rke2-worker3   Ready                      <none>                      108m   v1.23.6+rke2r2
All the pods in the "kube-system" & "calico-system" namespaces are up and running properly.
kubectl get po -n calico-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-69cb5d9c7b-xllwx   1/1     Running   0          113m
calico-node-86pcd                          1/1     Running   0          110m
calico-node-dxtjk                          1/1     Running   0          110m
calico-node-lb8n7                          1/1     Running   0          113m
calico-node-ld8mf                          1/1     Running   0          110m
calico-typha-86c8b747c4-rqfhk              1/1     Running   0          113m
calico-typha-86c8b747c4-xs7zs              1/1     Running   0          110m
kubectl get po -n kube-system
NAME                                                    READY   STATUS      RESTARTS   AGE
cloud-controller-manager-rke2-master                    1/1     Running     0          114m
etcd-rke2-master                                        1/1     Running     0          114m
helm-install-rke2-calico-crd-j44qb                      0/1     Completed   0          114m
helm-install-rke2-calico-hv4xg                          0/1     Completed   1          114m
helm-install-rke2-coredns-btsgx                         0/1     Completed   0          114m
helm-install-rke2-ingress-nginx-c99sw                   0/1     Completed   0          114m
helm-install-rke2-metrics-server-j5gjq                  0/1     Completed   0          114m
kube-apiserver-rke2-master                              1/1     Running     0          113m
kube-controller-manager-rke2-master                     1/1     Running     0          114m
kube-proxy-rke2-master                                  1/1     Running     0          114m
kube-proxy-rke2-worker1                                 1/1     Running     0          110m
kube-proxy-rke2-worker2                                 1/1     Running     0          110m
kube-proxy-rke2-worker3                                 1/1     Running     0          110m
kube-scheduler-rke2-master                              1/1     Running     0          114m
rke2-coredns-rke2-coredns-69c8f974c-gvwqg               1/1     Running     0          7m56s
rke2-coredns-rke2-coredns-69c8f974c-qmq8m               1/1     Running     0          7m56s
rke2-coredns-rke2-coredns-autoscaler-65c9bb465d-4g4sw   1/1     Running     0          114m
rke2-ingress-nginx-controller-2bhbc                     1/1     Running     0          113m
rke2-ingress-nginx-controller-gnpk5                     1/1     Running     0          110m
rke2-ingress-nginx-controller-znh8z                     1/1     Running     0          110m
rke2-ingress-nginx-controller-ztpjf                     1/1     Running     0          110m
rke2-metrics-server-6564db4569-m5kcc                    1/1     Running     0          113m
I have deployed Kafka, but the pods are going into "CrashLoopBackOff" with a DNS error:
java.net.UnknownHostException: zookeeper.msgbus.svc: Temporary failure in name resolution
The logs of the coredns pods show a lot of errors like these:
[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:51070->yy.yy.yy.yy:53: read: no route to host
[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:42332->yy.yy.yy.yy:53: read: no route to host
[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:57092->yy.yy.yy.yy:53: read: no route to host
I have opened port 53 for both TCP and UDP traffic and am not sure about the cause of this issue. Any suggestions would be appreciated, and let me know if you need any more debug logs.
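To see which destination the failing DNS queries are aimed at, the error lines above can be grouped by source and destination. The following is only a minimal sketch: the function name `unreachable_upstreams` is hypothetical, it assumes the log lines were saved from `kubectl logs` on a coredns pod, and the regular expression only handles the IPv4-style `SRC:PORT->DST:PORT` form shown in this thread.

```python
import re
from collections import Counter

# Matches the "read udp SRC:PORT->DST:PORT: ... no route to host" part of a
# CoreDNS error line (IPv4-style addresses only; an assumption of this sketch).
_ROUTE_ERR = re.compile(
    r"read udp (?P<src>[^\s:]+):(?P<sport>\d+)->(?P<dst>[^\s:]+):(?P<dport>\d+): "
    r".*no route to host"
)


def unreachable_upstreams(log_lines):
    """Count 'no route to host' errors per (source, destination:port) pair."""
    counts = Counter()
    for line in log_lines:
        match = _ROUTE_ERR.search(line)
        if match:
            counts[(match["src"], f'{match["dst"]}:{match["dport"]}')] += 1
    return counts


# The three (masked) lines posted above.
sample = [
    "[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:51070->yy.yy.yy.yy:53: read: no route to host",
    "[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:42332->yy.yy.yy.yy:53: read: no route to host",
    "[ERROR] plugin/errors: 2 4686730224998678655.5699833978585684595. HINFO: read udp xx.xx.xx.xx:57092->yy.yy.yy.yy:53: read: no route to host",
]

for (src, dst), n in unreachable_upstreams(sample).most_common():
    print(f"{n} errors: {src} -> {dst}")  # prints "3 errors: xx.xx.xx.xx -> yy.yy.yy.yy:53"
```

If every error points at the same destination, that address is the resolver the coredns pod cannot reach, which narrows down where the traffic is being dropped.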

creamy-pencil-82913

06/08/2022, 8:13 PM
this usually means that CNI traffic between nodes is being blocked or otherwise dropped. This results in DNS traffic from the node the pod is running on not being able to pass to the node where the coredns pod is running. Can you confirm that all the correct ports are open between nodes?
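One way to check this from a node is to probe the TCP ports on each of the other nodes. The sketch below is an illustration only: the port list is an assumption based on the RKE2 networking requirements for a Calico setup and should be verified against the docs for your version, `probe_tcp` and `check_node` are hypothetical helper names, and the hostname in the usage line is taken from the `kubectl get nodes` output earlier in the thread.

```python
import errno
import socket

# ASSUMPTION: TCP ports commonly listed for RKE2 with Calico; confirm them
# against the RKE2 networking requirements before relying on this list.
ASSUMED_TCP_PORTS = {
    9345: "RKE2 supervisor API (server nodes only)",
    6443: "Kubernetes API (server nodes only)",
    10250: "kubelet",
    5473: "Calico Typha",
    179: "Calico BGP (only when BGP mode is used)",
}


def probe_tcp(host, port, timeout=2.0):
    """Try a TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            return "no route to host"
        return f"error: {exc}"


def check_node(host, ports=ASSUMED_TCP_PORTS):
    """Probe every port in `ports` on `host` and return {port: status}."""
    return {port: probe_tcp(host, port) for port in ports}


if __name__ == "__main__":
    # Hostname from the thread's node list; replace with your own node names.
    for port, status in check_node("rke2-worker1").items():
        print(f"{port:>5} {ASSUMED_TCP_PORTS[port]:<45} {status}")
```

A "refused" result means the packet reached the node but nothing is listening there, while "no route to host" or "timeout" more often points to a firewall rejecting or dropping the traffic. This only covers TCP; overlay traffic such as Calico VXLAN (4789/udp, also an assumption to verify) cannot be confirmed with a TCP connect.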

narrow-noon-75604

06/09/2022, 2:15 AM
@creamy-pencil-82913 not all the ports are open between the nodes. Only the specific ports mentioned in the Rancher document are opened in the firewall. https://rancher.com/docs/rancher/v2.5/en/installation/resources/advanced/firewall/
Also, I came across some known issues related to firewalld mentioned on the RKE2 page, https://docs.rke2.io/known_issues/#firewalld-conflicts-with-default-networking I am using the Calico network stack in my cluster. Do I need to disable firewalld in order to use Calico, or is there any way to use Calico with firewalld enabled?

creamy-pencil-82913

06/09/2022, 4:28 AM
did you open the Rancher/RKE required ports, or the RKE2 required ports, specific to whatever CNI you’re using? https://docs.rke2.io/install/requirements/#networking
and yes, you should disable firewalld. It is not supported by most of the CNI projects.

narrow-noon-75604

06/09/2022, 4:34 AM
We are planning to use RKE2 in the production platform, so it is not possible to disable firewalld there. Can you please let me know if there is any way to use RKE2 with firewalld?

creamy-pencil-82913

06/09/2022, 4:34 AM
in production, you have it behind an actual firewall or security groups, right?
Kubernetes isn’t really designed to have the nodes just exposed directly to the internet. Normally you deploy it to a protected network, and expose everything through an external load-balancer and/or ingress.

narrow-noon-75604

06/09/2022, 4:45 AM
All the ports are open on the server node as per the document you shared - https://docs.rke2.io/install/requirements/#networking
@creamy-pencil-82913 can you please share the issue/ticket link stating that firewalld should be disabled in order to support the Calico network?

creamy-pencil-82913

06/09/2022, 5:00 AM
https://projectcalico.docs.tigera.io/getting-started/kubernetes/requirements
If your Linux distribution comes with installed Firewalld or another iptables manager it should be disabled. These may interfere with rules added by Calico and result in unexpected behavior.
As I said, this is a requirement of the CNI projects, not rancher or rke2 itself.
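To confirm whether firewalld is actually running on a node, something like the following could be run on each one. It is a minimal sketch that assumes a systemd-based host with `systemctl` on the PATH; `service_is_active` is a hypothetical helper name, not part of RKE2 or Calico.

```python
import subprocess


def service_is_active(unit="firewalld"):
    """Return True if systemd reports `unit` as active on this host."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", "--quiet", unit],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except FileNotFoundError:
        # No systemctl binary here, so this sketch treats the unit as inactive.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    state = "active" if service_is_active("firewalld") else "not active"
    print(f"firewalld is {state} on this node")
```

A "not active" result only rules out firewalld itself; other iptables managers or rules left behind by it are not detected by this check.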

narrow-noon-75604

06/09/2022, 5:54 AM
Thanks for all the info @creamy-pencil-82913

hundreds-hairdresser-46043

06/10/2022, 8:48 AM
I have the same problem on CentOS 8 Stream with firewalld disabled. It does not happen on Ubuntu or Oracle Linux 8.