
steep-london-53093

02/19/2023, 10:19 AM
Hi folks, I am trying to set up an HA RKE2 cluster (advertise-address is the load-balancer address) using the Cilium CNI, without the kube-proxy component, from the stable channel on Ubuntu 22.04.2. Internet access is via an HTTP proxy. The first master node has been created. Problem: all pods start, but CoreDNS keeps failing its probes. Can you give me any hint to find the problem?
Events:                                                                                                                                                                                                                                                       
  Type     Reason     Age                     From     Message                                                                                                                                                                                                
  ----     ------     ----                    ----     -------                                                                                                                                                                                                
  Warning  Unhealthy  22m (x1150 over 19h)    kubelet  Liveness probe failed: Get "http://10.42.0.160:8080/health": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Warning  BackOff    7m27s (x3234 over 19h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  2m27s (x3001 over 19h)  kubelet  Readiness probe failed: Get "http://10.42.0.160:8181/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Configuration:
write-kubeconfig-mode: "0644"
advertise-address:
  - 192.168.0.12
cni:
  - cilium
disable-kube-proxy: true
disable:
  - rke2-canal
  - rke2-kube-proxy
# Take an etcd snapshot every 6 hours
etcd-snapshot-schedule-cron: "0 */6 * * *"
# Keep 56 etcd snapshots (2 weeks at 4 per day)
etcd-snapshot-retention: 56
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"
tls-san:
  - 192.168.0.12
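
(For reference, a few commands that often help narrow down this kind of CoreDNS probe failure on a Cilium-managed RKE2 node; a sketch assuming a default install, so paths and workload names may differ.)
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml                  # kubectl itself lives under /var/lib/rancher/rke2/bin
kubectl -n kube-system get pods -o wide                        # state of the cilium and rke2-coredns pods
kubectl -n kube-system exec ds/cilium -- cilium status         # agent health and kube-proxy replacement mode
kubectl -n kube-system exec ds/cilium -- cilium-health status  # node/endpoint connectivity as seen by Cilium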

creamy-pencil-82913

02/19/2023, 6:20 PM
You don't need to use an external loadbalancer for the advertise-address to make it HA. In fact that's probably what's causing you problems. Just deploy 3 server nodes and as many agents as you need.
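
(For what it's worth, a minimal config.yaml for the additional server nodes in that topology might look roughly like this; the first-server IP and token are placeholders:)
# /etc/rancher/rke2/config.yaml on the 2nd and 3rd servers (sketch)
server: https://<first-server-ip>:9345     # RKE2 supervisor port, not 6443
token: <contents of /var/lib/rancher/rke2/server/node-token on the first server>
cni:
  - cilium
disable-kube-proxy: true
disable:
  - rke2-kube-proxy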

steep-london-53093

02/19/2023, 6:48 PM
I uninstalled the node and installed it again without advertise-address, but the result is the same. Also, etcd has some strange records in the log:
Feb 19 18:46:38 k8s-master-01 rke2[184220]: time="2023-02-19T18:46:38Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp 192.168.0.13:9345: connect: no route to host"
Feb 19 18:46:38 k8s-master-01 rke2[184220]: time="2023-02-19T18:46:38Z" level=error msg="Remotedialer proxy error" error="dial tcp 192.168.0.13:9345: connect: no route to host"
192.168.0.13 is the address of this node.
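
(A "no route to host" for the node's own address usually points at nothing listening yet, or a reject rule, rather than actual routing; a couple of checks on the same host, as a sketch:)
ss -tlnp | grep 9345                        # is the rke2 supervisor port listening?
curl -vk https://192.168.0.13:9345/ping     # the supervisor should respond once rke2-server is fully up
ip route get 192.168.0.13                   # confirm the kernel really has a route to this address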

creamy-pencil-82913

02/20/2023, 6:05 AM
Are your nodes all on the same network, and can reach each other at their primary IP addresses? Do you have the correct ports opened? Have you disabled firewalld or ufw?
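
(The ports that usually matter for an RKE2 server with Cilium, plus a quick way to check them; treat this as a starting point rather than an exhaustive list:)
systemctl is-active ufw firewalld 2>/dev/null
ss -tlnp | egrep ':6443|:9345|:2379|:2380|:10250'
# 6443 apiserver, 9345 rke2 supervisor, 2379-2380 etcd, 10250 kubelet;
# between nodes Cilium also needs 8472/udp (VXLAN) and 4240/tcp (health checks) by default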

steep-london-53093

02/20/2023, 6:23 AM
This is a single node, on the 192.168.0.0/24 network. ufw is disabled.
The node has a single network interface.
So, in the log I showed earlier, etcd can't connect to a port on the same server.
Additional cilium config:
root@k8s-master-01:~# cat /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: strict
    k8sServiceHost: 192.168.0.12
    k8sServicePort: 6443
    cni:
      chainingMode: "none"
Firewall disabled:
root@k8s-master-01:~# ufw status verbose
Status: inactive
root@k8s-master-01:~# iptables -L -n -v
Chain INPUT (policy ACCEPT 14367 packets, 52M bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         

Chain OUTPUT (policy ACCEPT 4777 packets, 332K bytes)
 pkts bytes target     prot opt in     out     source               destination
@creamy-pencil-82913 can you give me any hint? Maybe this problem is connected with networking (Cilium)?
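
(One detail that may be worth double-checking: k8sServiceHost in the HelmChartConfig still points at 192.168.0.12, the load-balancer address that was dropped, while this node is 192.168.0.13. A sketch of the same config with the host switched to an address the agent can actually reach:)
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    kubeProxyReplacement: strict
    k8sServiceHost: 192.168.0.13   # the node's own address, since the LB at .12 is gone
    k8sServicePort: 6443
    cni:
      chainingMode: "none"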

powerful-elephant-25838

04/07/2023, 1:45 PM
Similar problem here, no way to get Cilium working without kube-proxy. @steep-london-53093 have you resolved it?
Has anyone managed to get an RKE2 downstream cluster working with Cilium and without kube-proxy? It does not seem to work in any way: CoreDNS is not working and the API is not reachable.
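
(For anyone verifying a kube-proxy-less setup, two quick checks from inside the Cilium agent; a sketch, resource names may differ per install:)
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i kubeproxyreplacement
kubectl -n kube-system exec ds/cilium -- cilium service list      # ClusterIP services (including the apiserver) should list backends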