# rke2
m
Hi, I've been struggling with trying to bootstrap a cluster for the last few weeks at this point and I'm not sure what's causing the issue. I've stripped the cluster down to a single server node to try to get even one node running... The node itself says that it is Ready in kubectl; however, looking at the actual pods, there are several in crash loops.
Accidentally sent before I finished... I'm trying to run the cluster with Calico as the CNI, and checking the logs in the Calico controller I get several lines saying:
Error getting cluster information config ClusterInformation="default" error=Get "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: i/o timeout
The pods that are in crash loops are mostly Helm deployments for nginx-ingress and metrics-server, both also complaining about timeouts to 10.43.0.1. I can see that the kube-apiserver is running and healthy, and there is an Endpoint and EndpointSlice for the node's IP address, but not for 10.43.0.1. I can also curl/wget the API server at https://node-ip:6443, but not via the service CIDR.
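A quick sketch of the symptom, assuming any test image with curl in it (curlimages/curl here is just an example):
Copy code
# works from the host (node IP):
curl -k -m 5 https://10.32.64.100:6443/healthz
# times out from inside the pod network (service VIP):
kubectl run probe --rm -it --restart=Never --image=curlimages/curl -- \
  curl -k -m 5 https://10.43.0.1/healthz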
n
do the CNI pods run OK?
m
As far as I can tell, yes: the calico-node pod is marked as Ready and I can't see anything immediately jumping out at me in its logs.
The calico-kube-controllers pod, however, is not ready, as it cannot contact the apiserver.
n
calico-node pods in the calico-system namespace must be running 1/1
m
(screenshot attached)
They are running 1/1 for calico-node
n
did you do any special configuration to the Calico CNI?
m
I've currently got a HelmChartConfig in the manifests directory set to:
Copy code
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    installation:
      calicoNetwork:
        mtu: 1440
but outside of that, no customisation.
I had the same issues without this customisation, though.
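For what it's worth, one way to confirm the override actually landed, assuming the operator's default Installation resource and the vxlan.calico interface:
Copy code
kubectl get installation default -o jsonpath='{.spec.calicoNetwork.mtu}'
# and the MTU calico actually programmed on its interface:
ip link show vxlan.calico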
n
And kube-proxy is running? There is either some network policy in place, or, for some unknown reason, the NAT rules are not set.
m
kube-proxy is running 1/1, and checking the logs I notice an initial RBAC issue:
Copy code
E0824 13:27:27.838076       1 server.go:704] "Failed to retrieve node info" err="nodes \"soc-bmm-prod-sehby1-00\" is forbidden: User \"system:kube-proxy\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
E0824 13:27:28.895193       1 server.go:704] "Failed to retrieve node info" err="nodes \"soc-bmm-prod-sehby1-00\" is forbidden: User \"system:kube-proxy\" cannot get resource \"nodes\" in API group \"\" at the cluster scope"
I0824 13:27:31.175331       1 server.go:715] "Successfully retrieved node IP(s)" IPs=["10.32.64.100"]
E0824 13:27:31.175803       1 server.go:245] "Kube-proxy configuration may be incomplete or incorrect" err="nodePortAddresses is unset; NodePort connections will be accepted on all local IPs. Consider using `--nodeport-addresses primary`"
I0824 13:27:31.273326       1 server.go:254] "kube-proxy running in dual-stack mode" primary ipFamily="IPv4"
I0824 13:27:31.273455       1 server_linux.go:145] "Using iptables Proxier"
I0824 13:27:31.282489       1 proxier.go:243] "Setting route_localnet=1 to allow node-ports on localhost; to change this either disable iptables.localhostNodePorts (--iptables-localhost-nodeports) or set nodePortAddresses (--nodeport-addresses) to filter loopback addresses" ipFamily="IPv4"
I0824 13:27:31.295145       1 server.go:516] "Version info" version="v1.33.3+rke2r1"
I0824 13:27:31.295212       1 server.go:518] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
I0824 13:27:31.311702       1 config.go:440] "Starting serviceCIDR config controller"
I0824 13:27:31.311866       1 shared_informer.go:350] "Waiting for caches to sync" controller="serviceCIDR config"
I0824 13:27:31.311964       1 config.go:199] "Starting service config controller"
I0824 13:27:31.311982       1 shared_informer.go:350] "Waiting for caches to sync" controller="service config"
I0824 13:27:31.312024       1 config.go:105] "Starting endpoint slice config controller"
I0824 13:27:31.312040       1 shared_informer.go:350] "Waiting for caches to sync" controller="endpoint slice config"
I0824 13:27:31.312149       1 config.go:329] "Starting node config controller"
I0824 13:27:31.312174       1 shared_informer.go:350] "Waiting for caches to sync" controller="node config"
I0824 13:27:31.412580       1 shared_informer.go:357] "Caches are synced" controller="node config"
I'm not sure how to check the NAT rules, unfortunately 😅
n
could you share rke2.config (without the token) and kubectl get pods -A?
m
Copy code
# Ansible managed

node-name: "soc-bmm-prod-sehby1-00"
node-ip: "10.32.64.100"
node-external-ip: "10.32.64.100"

cni: calico

write-kubeconfig-mode: "0640"

tls-san-security: true
tls-san:
- "10.32.64.99"
- "10.32.64.100"
- "10.32.64.101"
- "10.32.64.102"
- "10.32.64.103"
- "10.32.64.104"

cluster-cidr: "10.42.0.0/16"
service-cidr: "10.43.0.0/16"
service-node-port-range: "30000-32767"
cluster-dns: "10.43.0.10"

lb-server-port: "6444"

disable-cloud-controller: true
and the pods:
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ sudo kubectl get pods -A --kubeconfig /etc/rancher/rke2/rke2.yaml
NAMESPACE         NAME                                                   READY   STATUS             RESTARTS         AGE
calico-system     calico-kube-controllers-7c4fc78f4c-xfcjj               0/1     CrashLoopBackOff   12 (2m15s ago)   52m
calico-system     calico-node-vlhn5                                      1/1     Running            0                52m
calico-system     calico-typha-665ff8dd87-h2xkj                          1/1     Running            0                52m
kube-system       etcd-soc-bmm-prod-sehby1-00                            1/1     Running            0                53m
kube-system       helm-install-rke2-calico-99fxg                         0/1     Completed          2                53m
kube-system       helm-install-rke2-calico-crd-t6nsh                     0/1     Completed          0                53m
kube-system       helm-install-rke2-coredns-tpfpf                        0/1     Completed          0                53m
kube-system       helm-install-rke2-ingress-nginx-mpp8b                  0/1     CrashLoopBackOff   12 (96s ago)     53m
kube-system       helm-install-rke2-metrics-server-dj7fk                 0/1     CrashLoopBackOff   12 (97s ago)     53m
kube-system       helm-install-rke2-runtimeclasses-j8772                 0/1     CrashLoopBackOff   12 (2m6s ago)    53m
kube-system       helm-install-rke2-snapshot-controller-cb4tm            0/1     CrashLoopBackOff   12 (2m19s ago)   53m
kube-system       helm-install-rke2-snapshot-controller-crd-vsfzh        0/1     CrashLoopBackOff   12 (2m5s ago)    53m
kube-system       kube-apiserver-soc-bmm-prod-sehby1-00                  1/1     Running            0                53m
kube-system       kube-controller-manager-soc-bmm-prod-sehby1-00         1/1     Running            0                53m
kube-system       kube-proxy-soc-bmm-prod-sehby1-00                      1/1     Running            0                53m
kube-system       kube-scheduler-soc-bmm-prod-sehby1-00                  1/1     Running            0                53m
kube-system       kube-vip-7724d                                         1/1     Running            0                53m
kube-system       rke2-coredns-rke2-coredns-65dc69968-jk5xc              0/1     Running            14 (5m18s ago)   52m
kube-system       rke2-coredns-rke2-coredns-autoscaler-68d5f76f7-85bhq   0/1     CrashLoopBackOff   15 (3m33s ago)   52m
tigera-operator   tigera-operator-fcbdc5c89-j9swz                        1/1     Running            0                52m
n
try taking this out:
disable-cloud-controller: true
You must have a cloud controller, and I do not see that you have any.
m
even if I'm not running in a cloud?
n
even then
m
I'll just nuke and re-bootstrap and see what happens
n
the built-in cloud controller is kind of a fake controller, but some things must be set up by it anyway.
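one symptom when it is disabled: nodes can keep the uninitialized taint, which blocks normal scheduling. a quick check (sketch):
Copy code
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# look for node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule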
m
looks like the same thing is happening after removing that from the config
wait, sorry, ignore that, didn't save the config template 🙃 gonna re-re-bootstrap again
ok, this time there is now a cloud-controller-manager pod running. I was watching the logs of calico-kube-controllers and it still seems to be timing out when trying to contact the apiserver:
Copy code
2025-08-24 14:36:57.928 [INFO][1] typha/cmdwrapper.go 56: Starting /usr/bin/kube-controllers
2025-08-24 14:36:58.057 [INFO][15] kube-controllers/main.go 95: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
2025-08-24 14:36:58.064 [INFO][15] kube-controllers/winutils.go 149: Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.
2025-08-24 14:36:58.066 [INFO][15] kube-controllers/main.go 119: Ensuring Calico datastore is initialized
2025-08-24 14:37:28.734 [ERROR][15] kube-controllers/client.go 339: Error getting cluster information config ClusterInformation="default" error=Get "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: i/o timeout
2025-08-24 14:37:28.734 [INFO][15] kube-controllers/client.go 261: Unable to initialize ClusterInformation error=Get "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: i/o timeout
2025-08-24 14:37:58.487 [INFO][15] kube-controllers/client.go 267: Unable to initialize default Tier error=Post "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/tiers": context deadline exceeded
2025-08-24 14:37:58.487 [INFO][15] kube-controllers/client.go 273: Unable to initialize adminnetworkpolicy Tier error=client rate limiter Wait returned an error: context deadline exceeded
2025-08-24 14:37:58.488 [INFO][15] kube-controllers/client.go 279: Unable to initialize baselineadminnetworkpolicy Tier error=client rate limiter Wait returned an error: context deadline exceeded
2025-08-24 14:37:58.488 [INFO][15] kube-controllers/main.go 126: Failed to initialize datastore error=Get "https://10.43.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default": dial tcp 10.43.0.1:443: i/o timeout
2025-08-24 14:37:58.488 [FATAL][15] kube-controllers/main.go 139: Failed to initialize Calico datastore
2025-08-24 14:37:58.497 [ERROR][1] typha/cmdwrapper.go 80: Failed so send signal to wrapped command process error=os: process already finished
2025-08-24 14:37:58.498 [ERROR][1] typha/cmdwrapper.go 80: Failed so send signal to wrapped command process error=os: process already finished
Stream closed EOF for calico-system/calico-kube-controllers-df6777989-zk9wz (calico-kube-controllers)
Re-fetching the pod list:
Copy code
NAMESPACE         NAME                                                   READY   STATUS      RESTARTS      AGE
calico-system     calico-kube-controllers-df6777989-zk9wz                0/1     Running     2 (36s ago)   3m20s
calico-system     calico-node-x6nkp                                      1/1     Running     0             3m20s
calico-system     calico-typha-647c647c64-vhkdd                          1/1     Running     0             3m20s
kube-system       cloud-controller-manager-soc-bmm-prod-sehby1-00        1/1     Running     0             4m19s
kube-system       etcd-soc-bmm-prod-sehby1-00                            1/1     Running     0             4m19s
kube-system       helm-install-rke2-calico-4xhxd                         0/1     Completed   2             4m28s
kube-system       helm-install-rke2-calico-crd-5mxr4                     0/1     Completed   0             4m28s
kube-system       helm-install-rke2-coredns-ld7hv                        0/1     Completed   0             4m28s
kube-system       helm-install-rke2-ingress-nginx-bzksz                  1/1     Running     2 (40s ago)   4m28s
kube-system       helm-install-rke2-metrics-server-9rmzm                 1/1     Running     2 (50s ago)   4m28s
kube-system       helm-install-rke2-runtimeclasses-65qfv                 1/1     Running     2 (51s ago)   4m28s
kube-system       helm-install-rke2-snapshot-controller-crd-98blg        1/1     Running     2 (37s ago)   4m28s
kube-system       helm-install-rke2-snapshot-controller-dc9rh            1/1     Running     2 (47s ago)   4m28s
kube-system       kube-apiserver-soc-bmm-prod-sehby1-00                  1/1     Running     0             4m19s
kube-system       kube-controller-manager-soc-bmm-prod-sehby1-00         1/1     Running     0             4m19s
kube-system       kube-proxy-soc-bmm-prod-sehby1-00                      1/1     Running     0             4m19s
kube-system       kube-scheduler-soc-bmm-prod-sehby1-00                  1/1     Running     0             4m19s
kube-system       kube-vip-66htq                                         1/1     Running     0             4m27s
kube-system       rke2-coredns-rke2-coredns-65dc69968-t7rdg              0/1     Running     1 (32s ago)   3m58s
kube-system       rke2-coredns-rke2-coredns-autoscaler-68d5f76f7-cmkk7   1/1     Running     1 (47s ago)   3m58s
tigera-operator   tigera-operator-fcbdc5c89-dpjvc                        1/1     Running     0             3m36s
Not sure if it's helpful info, but checking ip route I don't see any routes for 10.43.0.1:
Copy code
default via 10.32.64.1 dev datacenter0 proto static
10.32.64.0/24 dev datacenter0 proto kernel scope link src 10.32.64.100
10.42.208.192 dev calid0cef6a153a scope link
blackhole 10.42.208.192/26 proto 80
10.42.208.193 dev cali63a35f672b7 scope link
10.42.208.194 dev calia397e8fab13 scope link
10.42.208.195 dev cali08f167ef405 scope link
10.42.208.196 dev calie9395c23e6e scope link
10.42.208.197 dev calid55f83deec4 scope link
10.42.208.198 dev cali9f72b8b301a scope link
10.42.208.199 dev cali1d401316366 scope link
n
no, there won't be any. This is translated via NAT, something like this:
Copy code
iptables -L -t nat -n | grep 10.43.0.1
KUBE-SVC-NPX46M4PTMTKRN6Y  6    --  0.0.0.0/0            10.43.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-SVC-YFPH5LFNKP7E3G4L  17   --  0.0.0.0/0            10.43.0.10           /* kube-system/rke2-coredns-rke2-coredns:udp-53 cluster IP */ udp dpt:53
KUBE-SVC-PUNXDRXNIM3ELMDM  6    --  0.0.0.0/0            10.43.0.10           /* kube-system/rke2-coredns-rke2-coredns:tcp-53 cluster IP */ tcp dpt:53
KUBE-MARK-MASQ  6    -- !10.42.0.0/16         10.43.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-MARK-MASQ  6    -- !10.42.0.0/16         10.43.0.10           /* kube-system/rke2-coredns-rke2-coredns:tcp-53 cluster IP */ tcp dpt:53
KUBE-MARK-MASQ  17   -- !10.42.0.0/16         10.43.0.10           /* kube-system/rke2-coredns-rke2-coredns:udp-53 cluster IP */ udp dpt:53
m
I only get:
Copy code
sudo iptables -L -t nat -n | grep 10.43.0.1
KUBE-SVC-NPX46M4PTMTKRN6Y  6    --  0.0.0.0/0            10.43.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
KUBE-MARK-MASQ  6    -- !10.42.0.0/16         10.43.0.1            /* default/kubernetes:https cluster IP */ tcp dpt:443
n
this is ok.
looks like some Calico global or k8s network policy is in place.
maybe give it a try without setting node-ip: "10.32.64.100" and node-external-ip: "10.32.64.100"; just a minimal config, these should not be needed.
and without cluster-dns: "10.43.0.10"
m
I did try previously without setting either of those, but that was also with the cloud controller disabled. I'll give it a whirl with those also removed 🙂
n
did you also remove the etcd content when re-bootstrapping? To ensure all previous objects are removed.
m
Yeah, I have an Ansible script that removes the following dirs recursively:
• /etc/rancher
• /var/lib/rancher
• /etc/cni/net.d
• /var/lib/kubelet
and also reboots the node before bootstrapping again.
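(RKE2's bundled cleanup scripts cover much the same ground, for reference; the path depends on the install method:)
Copy code
# tarball installs: /usr/local/bin, RPM installs: /usr/bin
rke2-killall.sh      # stop all rke2 processes and tear down CNI interfaces
rke2-uninstall.sh    # remove binaries, data dirs and config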
Ok config is now:
Copy code
# Ansible managed

node-name: "soc-bmm-prod-sehby1-00"

cni: calico

write-kubeconfig-mode: "0640"

tls-san-security: true
tls-san:
- "10.32.64.99"
- "10.32.64.100"
- "10.32.64.101"
- "10.32.64.102"
- "10.32.64.103"
- "10.32.64.104"
Just waiting for the pods to come up now.
Still i/o timeouts going to 10.43.0.1, unfortunately 😞
Also just gave it a go with the Canal CNI just to check, and I get the same 10.43.0.1 i/o timeouts. Going to switch back to Calico to keep things consistent here, but yeah 😕
n
are you sure there is no NetworkPolicy or Calico GlobalNetworkPolicy?
m
There are no NetworkPolicies or GlobalNetworkPolicies defined in the cluster.
I did try adding a GlobalNetworkPolicy to allow access to the kube-apiserver from all services, but that did not make much of a difference.
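Roughly what I tried, as a sketch (the name is mine):
Copy code
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: allow-apiserver-egress
spec:
  selector: all()
  egress:
    - action: Allow
      protocol: TCP
      destination:
        nets: ["10.43.0.1/32"]
        ports: [443]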
n
are you able to trace packets using tcpdump?
You could check whether a packet is leaving the cali interface and whether it is then sent to any other interface, or anything like that:
tcpdump -ni any
(but avoid port 22 if connecting via ssh)
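or narrow the capture to the service VIP to cut the noise, something like:
Copy code
tcpdump -ni any 'host 10.43.0.1 and tcp port 443'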
m
Did a tcpdump specifically on the 10.43.0.1 address, and I can see that traffic over lo seems to be working, but anything coming from the cali interfaces doesn't get an acknowledgement:
Copy code
21:52:11.290555 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2231431:2231577, ack 2841378, win 19442, options [nop,nop,TS val 740001056 ecr 3146050799], length 146
21:52:11.408139 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [.], ack 2841881, win 19442, options [nop,nop,TS val 740001173 ecr 3146051143], length 0
21:52:11.416261 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2231577:2231667, ack 2841881, win 19442, options [nop,nop,TS val 740001181 ecr 3146051143], length 90
21:52:11.417021 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2231667:2232129, ack 2841881, win 19442, options [nop,nop,TS val 740001182 ecr 3146051151], length 462
21:52:11.998311 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.38884: Flags [P.], seq 817:988, ack 1, win 611, options [nop,nop,TS val 740001763 ecr 3146049516], length 171
21:52:12.082844 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [.], ack 6036, win 611, options [nop,nop,TS val 740001848 ecr 3146051817], length 0
21:52:12.095261 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 6677:6767, ack 6036, win 611, options [nop,nop,TS val 740001860 ecr 3146051817], length 90
21:52:12.095500 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 6767:7187, ack 6036, win 611, options [nop,nop,TS val 740001861 ecr 3146051817], length 420
21:52:13.047216 cali7787a67fd2b In  IP 10.42.208.194.47170 > 10.43.0.1.https: Flags [S], seq 332905808, win 64390, options [mss 1370,sackOK,TS val 1800120780 ecr 0,nop,wscale 7], length 0
21:52:13.097518 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [.], ack 6497, win 611, options [nop,nop,TS val 740002863 ecr 3146052832], length 0
21:52:13.109879 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 7187:7277, ack 6497, win 611, options [nop,nop,TS val 740002875 ecr 3146052832], length 90
21:52:13.110092 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 7277:7697, ack 6497, win 611, options [nop,nop,TS val 740002875 ecr 3146052832], length 420
21:52:13.421045 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [.], ack 2842384, win 19442, options [nop,nop,TS val 740003186 ecr 3146053155], length 0
21:52:13.429747 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2232129:2232219, ack 2842384, win 19442, options [nop,nop,TS val 740003195 ecr 3146053155], length 90
21:52:13.430031 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2232219:2232681, ack 2842384, win 19442, options [nop,nop,TS val 740003195 ecr 3146053165], length 462
21:52:14.063490 cali7787a67fd2b In  IP 10.42.208.194.47170 > 10.43.0.1.https: Flags [S], seq 332905808, win 64390, options [mss 1370,sackOK,TS val 1800121797 ecr 0,nop,wscale 7], length 0
21:52:14.113343 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [.], ack 6958, win 611, options [nop,nop,TS val 740003878 ecr 3146053848], length 0
21:52:14.123966 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 7697:7787, ack 6958, win 611, options [nop,nop,TS val 740003889 ecr 3146053848], length 90
21:52:14.124701 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 7787:8207, ack 6958, win 611, options [nop,nop,TS val 740003890 ecr 3146053848], length 420
21:52:15.087558 cali7787a67fd2b In  IP 10.42.208.194.47170 > 10.43.0.1.https: Flags [S], seq 332905808, win 64390, options [mss 1370,sackOK,TS val 1800122821 ecr 0,nop,wscale 7], length 0
21:52:15.128489 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [.], ack 7419, win 611, options [nop,nop,TS val 740004894 ecr 3146054863], length 0
21:52:15.138606 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 8207:8297, ack 7419, win 611, options [nop,nop,TS val 740004904 ecr 3146054863], length 90
21:52:15.138825 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53594: Flags [P.], seq 8297:8717, ack 7419, win 611, options [nop,nop,TS val 740004904 ecr 3146054863], length 420
21:52:15.183994 calie602664b6a1 In  IP 10.42.208.195.44934 > 10.43.0.1.https: Flags [S], seq 280413440, win 64390, options [mss 1370,sackOK,TS val 1410499711 ecr 0,nop,wscale 7], length 0
21:52:15.351205 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.38884: Flags [P.], seq 988:1126, ack 1, win 611, options [nop,nop,TS val 740005116 ecr 3146051733], length 138
21:52:15.432701 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [.], ack 2842887, win 19442, options [nop,nop,TS val 740005198 ecr 3146055167], length 0
21:52:15.444039 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2232681:2232771, ack 2842887, win 19442, options [nop,nop,TS val 740005209 ecr 3146055167], length 90
21:52:15.444764 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2232771:2233233, ack 2842887, win 19442, options [nop,nop,TS val 740005210 ecr 3146055179], length 462
21:52:15.648552 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.38884: Flags [P.], seq 1126:5253, ack 1, win 611, options [nop,nop,TS val 740005414 ecr 3146055086], length 4127
21:52:15.649415 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.38884: Flags [P.], seq 5253:9132, ack 1, win 611, options [nop,nop,TS val 740005414 ecr 3146055384], length 3879
21:52:15.649844 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2233233:2237360, ack 2842887, win 19442, options [nop,nop,TS val 740005415 ecr 3146055180], length 4127
21:52:15.650235 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [P.], seq 2237360:2241239, ack 2842887, win 19442, options [nop,nop,TS val 740005415 ecr 3146055385], length 3879
21:52:15.691549 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.53596: Flags [.], ack 2842935, win 19442, options [nop,nop,TS val 740005457 ecr 3146055386], length 0
21:52:15.695533 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.38884: Flags [.], ack 49, win 611, options [nop,nop,TS val 740005461 ecr 3146055386], length 0
n
what is the underlying OS? RedHat? Ubuntu?
m
Linux soc-bmm-prod-sehby1-00 6.12.34+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.12.34-1+rpt1~bookworm (2025-06-26) aarch64 GNU/Linux
Running Debian with systemd-networkd and iptables.
n
could you confirm that, before starting the cluster, iptables (and all tables: nat, mangle, raw) is empty? I.e., there are no rules loaded.
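something like this should print only the default chains with ACCEPT policies if so:
Copy code
for t in filter nat mangle raw; do echo "== $t =="; iptables -S -t "$t"; done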
m
also, it's using an SSD as its main drive, no SD card 🙂
n
and also, could you show route -n when the cluster is running?
m
there is an iptables-persistent state loaded on boot, so iptables is not empty when RKE2 starts. The save file contains the following:
Copy code
# Ansible managed
*filter
# :INPUT DROP [1:197]
# :FORWARD ACCEPT [0:0]
# :OUTPUT ACCEPT [267:10716]

# Allow all localhost traffic
-A INPUT -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT

# SSH
-A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 22 -m state --state ESTABLISHED -j ACCEPT

# DNS
-A OUTPUT -p udp -m udp --dport 53 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 53 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 853 -j ACCEPT

# HTTP
-A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 80 -m state --state ESTABLISHED -j ACCEPT

# Kubernetes API
-A INPUT -p tcp -m tcp --dport 6443 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 6443 -m state --state ESTABLISHED -j ACCEPT

# HTTPS
-A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 443 -m state --state ESTABLISHED -j ACCEPT

#       RKE2 supervisor API
-A INPUT -p tcp -m tcp --dport 9345 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 9345 -m state --state ESTABLISHED -j ACCEPT

#       kubelet metrics
-A INPUT -p tcp -m tcp --dport 10250 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 10250 -m state --state ESTABLISHED -j ACCEPT

# etcd client port
-A INPUT -p tcp -m tcp --dport 2379 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 2379 -m state --state ESTABLISHED -j ACCEPT

# etcd peer port
-A INPUT -p tcp -m tcp --dport 2380 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 2380 -m state --state ESTABLISHED -j ACCEPT

# etcd metrics port
-A INPUT -p tcp -m tcp --dport 2381 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 2381 -m state --state ESTABLISHED -j ACCEPT

# NodePort port range
-A INPUT -p tcp -m multiport --dports 30000:32767 -j ACCEPT
-A OUTPUT -p tcp -m multiport --sports 30000:32767 -m state --state ESTABLISHED -j ACCEPT

# Calico CNI with BGP
-A INPUT -p tcp -m tcp --dport 179 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 179 -m state --state ESTABLISHED -j ACCEPT

# Calico CNI with VXLAN
-A INPUT -p udp -m udp --dport 4789 -j ACCEPT
-A OUTPUT -p udp -m udp --sport 4789 -m state --state ESTABLISHED -j ACCEPT

# Calico CNI with Typha
-A INPUT -p tcp -m tcp --dport 5473 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 5473 -m state --state ESTABLISHED -j ACCEPT

# Calico Typha health checks
-A INPUT -p tcp -m tcp --dport 9098 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 9098 -m state --state ESTABLISHED -j ACCEPT

# Calico health checks
-A INPUT -p tcp -m tcp --dport 9099 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 9099 -m state --state ESTABLISHED -j ACCEPT

# Allow existing traffic
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

-P INPUT DROP
-P FORWARD ACCEPT
-P OUTPUT ACCEPT

COMMIT
as for ip route once the cluster is running:
Copy code
default via 10.32.64.1 dev datacenter0 proto static 
10.32.64.0/24 dev datacenter0 proto kernel scope link src 10.32.64.100 
10.42.208.192 dev cali5447c4b1272 scope link 
blackhole 10.42.208.192/26 proto 80 
10.42.208.193 dev calicabe44beaa2 scope link 
10.42.208.194 dev cali7787a67fd2b scope link 
10.42.208.195 dev calie602664b6a1 scope link 
10.42.208.196 dev cali1fc3c929199 scope link 
10.42.208.197 dev cali9caf7447c01 scope link 
10.42.208.198 dev cali2093edb2ca5 scope link 
10.42.208.199 dev cali13764af579c scope link
n
and is IP forwarding enabled? (cat /proc/sys/net/ipv4/ip_forward must return 1)
m
yes it is 1
n
then I would try again with completely empty iptables rules and all chains set to ACCEPT.
I mean before RKE2 starts (not meaning to fight with the Calico-deployed rules 😉)
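a sketch of a full reset (careful: this also drops your SSH rules):
Copy code
iptables -P INPUT ACCEPT; iptables -P FORWARD ACCEPT; iptables -P OUTPUT ACCEPT
for t in filter nat mangle raw; do iptables -t "$t" -F; iptables -t "$t" -X; done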
m
I'll give it a whirl. I'm pretty sure that was the case before I added the rules, so I have a feeling it ain't gonna work, but it's always good to try again and be sure 🙂
n
I think you should see the packet sent back to the originating cali interface; since you clearly see retransmissions, for some reason it is not being routed back to the cali interface but rather dropped.
m
still getting i/o timeouts and un-ACKed requests from the cali interfaces T.T
Copy code
22:28:24.868971 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 27960:28050, ack 25457, win 610, options [nop,nop,TS val 1009298248 ecr 1466265991], length 90
22:28:24.869153 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 28050:28467, ack 25457, win 610, options [nop,nop,TS val 1009298248 ecr 1466265991], length 417
22:28:25.039727 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2430300:2430389, ack 409634, win 29563, options [nop,nop,TS val 1009298419 ecr 1466266185], length 89
22:28:25.040310 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2430389:2437629, ack 409634, win 29563, options [nop,nop,TS val 1009298419 ecr 1466266185], length 7240
22:28:25.040355 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2437629:2441486, ack 409634, win 29563, options [nop,nop,TS val 1009298419 ecr 1466266185], length 3857
22:28:25.041092 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2441486:2441517, ack 409634, win 29563, options [nop,nop,TS val 1009298420 ecr 1466266185], length 31
22:28:25.042529 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [.], ack 409730, win 29563, options [nop,nop,TS val 1009298421 ecr 1466266194], length 0
22:28:25.060791 calia34a6eff98d In  IP 10.42.208.194.48602 > 10.43.0.1.https: Flags [S], seq 2328333618, win 64390, options [mss 1370,sackOK,TS val 2477254100 ecr 0,nop,wscale 7], length 0
22:28:25.060855 datacenter0 Out IP 10.43.0.1.https > 10.42.208.194.48602: Flags [S.], seq 2697682218, ack 2328333619, win 65160, options [mss 1460,sackOK,TS val 3122549240 ecr 2477248969,nop,wscale 7], length 0
22:28:25.120763 cali38909b96b7c In  IP 10.42.208.197.45376 > 10.43.0.1.https: Flags [S], seq 3696161761, win 64390, options [mss 1370,sackOK,TS val 668299995 ecr 0,nop,wscale 7], length 0
22:28:25.120835 datacenter0 Out IP 10.43.0.1.https > 10.42.208.197.45376: Flags [S.], seq 1085029372, ack 3696161762, win 65160, options [mss 1460,sackOK,TS val 1702030389 ecr 668295904,nop,wscale 7], length 0
22:28:25.280756 calia6eb079446c In  IP 10.42.208.195.33938 > 10.43.0.1.https: Flags [S], seq 4150976786, win 64390, options [mss 1370,sackOK,TS val 908273061 ecr 0,nop,wscale 7], length 0
22:28:25.280855 datacenter0 Out IP 10.43.0.1.https > 10.42.208.195.33938: Flags [S.], seq 2244103980, ack 4150976787, win 65160, options [mss 1460,sackOK,TS val 538658040 ecr 908267936,nop,wscale 7], length 0
22:28:25.287898 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2441517:2441679, ack 409730, win 29563, options [nop,nop,TS val 1009298667 ecr 1466266194], length 162
22:28:25.840378 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2441679:2441989, ack 409730, win 29563, options [nop,nop,TS val 1009299219 ecr 1466266481], length 310
22:28:25.875056 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [.], ack 25915, win 610, options [nop,nop,TS val 1009299254 ecr 1466267026], length 0
22:28:25.889523 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 28467:28557, ack 25915, win 610, options [nop,nop,TS val 1009299268 ecr 1466267026], length 90
22:28:25.889759 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 28557:28974, ack 25915, win 610, options [nop,nop,TS val 1009299269 ecr 1466267026], length 417
22:28:25.893820 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [.], ack 410230, win 29563, options [nop,nop,TS val 1009299273 ecr 1466267045], length 0
22:28:25.906828 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2441989:2442060, ack 410230, win 29563, options [nop,nop,TS val 1009299286 ecr 1466267045], length 71
22:28:25.907057 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2442060:2442519, ack 410230, win 29563, options [nop,nop,TS val 1009299286 ecr 1466267059], length 459
22:28:26.144764 cali38909b96b7c In  IP 10.42.208.197.45376 > 10.43.0.1.https: Flags [S], seq 3696161761, win 64390, options [mss 1370,sackOK,TS val 668301019 ecr 0,nop,wscale 7], length 0
22:28:26.144846 datacenter0 Out IP 10.43.0.1.https > 10.42.208.197.45376: Flags [S.], seq 1085029372, ack 3696161762, win 65160, options [mss 1460,sackOK,TS val 1702031413 ecr 668295904,nop,wscale 7], length 0
22:28:26.668809 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2442519:2442645, ack 410230, win 29563, options [nop,nop,TS val 1009300048 ecr 1466267059], length 126
22:28:26.681653 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2442645:2442763, ack 410230, win 29563, options [nop,nop,TS val 1009300060 ecr 1466267821], length 118
22:28:26.753033 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2442763:2442866, ack 410230, win 29563, options [nop,nop,TS val 1009300132 ecr 1466267833], length 103
22:28:26.784415 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2442866:2443007, ack 410230, win 29563, options [nop,nop,TS val 1009300163 ecr 1466267905], length 141
22:28:26.880732 datacenter0 Out IP 10.43.0.1.https > 10.42.208.193.52352: Flags [S.], seq 2279102316, ack 4084434733, win 65160, options [mss 1460,sackOK,TS val 2734520835 ecr 144408429,nop,wscale 7], length 0
22:28:26.880801 cali0def3d64446 In  IP 10.42.208.193.52352 > 10.43.0.1.https: Flags [S], seq 4084434732, win 64390, options [mss 1370,sackOK,TS val 144415554 ecr 0,nop,wscale 7], length 0
22:28:26.880841 datacenter0 Out IP 10.43.0.1.https > 10.42.208.193.52352: Flags [S.], seq 2279102316, ack 4084434733, win 65160, options [mss 1460,sackOK,TS val 2734520835 ecr 144408429,nop,wscale 7], length 0
22:28:26.892124 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [.], ack 26373, win 610, options [nop,nop,TS val 1009300271 ecr 1466268043], length 0
22:28:26.900346 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 28974:29063, ack 26373, win 610, options [nop,nop,TS val 1009300279 ecr 1466268043], length 89
22:28:26.900508 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.56974: Flags [P.], seq 29063:29480, ack 26373, win 610, options [nop,nop,TS val 1009300279 ecr 1466268043], length 417
22:28:26.913351 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2443007:2443189, ack 410230, win 29563, options [nop,nop,TS val 1009300292 ecr 1466267936], length 182
22:28:26.946975 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2443189:2443348, ack 410230, win 29563, options [nop,nop,TS val 1009300326 ecr 1466268065], length 159
22:28:26.979175 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2443348:2443445, ack 410230, win 29563, options [nop,nop,TS val 1009300358 ecr 1466268099], length 97
22:28:27.072732 datacenter0 Out IP 10.43.0.1.https > 10.42.208.194.48602: Flags [S.], seq 2697682218, ack 2328333619, win 65160, options [mss 1460,sackOK,TS val 3122551252 ecr 2477248969,nop,wscale 7], length 0
22:28:27.072810 calia34a6eff98d In  IP 10.42.208.194.48602 > 10.43.0.1.https: Flags [S], seq 2328333618, win 64390, options [mss 1370,sackOK,TS val 2477256112 ecr 0,nop,wscale 7], length 0
22:28:27.072855 datacenter0 Out IP 10.43.0.1.https > 10.42.208.194.48602: Flags [S.], seq 2697682218, ack 2328333619, win 65160, options [mss 1460,sackOK,TS val 3122551252 ecr 2477248969,nop,wscale 7], length 0
22:28:27.149551 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2443445:2443870, ack 410230, win 29563, options [nop,nop,TS val 1009300528 ecr 1466268131], length 425
22:28:27.149760 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2443870:2444295, ack 410230, win 29563, options [nop,nop,TS val 1009300529 ecr 1466268301], length 425
22:28:27.207724 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2444295:2444524, ack 410230, win 29563, options [nop,nop,TS val 1009300586 ecr 1466268302], length 229
22:28:27.248748 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [.], ack 410265, win 29563, options [nop,nop,TS val 1009300628 ecr 1466268360], length 0
22:28:27.296750 datacenter0 Out IP 10.43.0.1.https > 10.42.208.195.33938: Flags [S.], seq 2244103980, ack 4150976787, win 65160, options [mss 1460,sackOK,TS val 538660056 ecr 908267936,nop,wscale 7], length 0
22:28:27.296802 calia6eb079446c In  IP 10.42.208.195.33938 > 10.43.0.1.https: Flags [S], seq 4150976786, win 64390, options [mss 1370,sackOK,TS val 908275077 ecr 0,nop,wscale 7], length 0
22:28:27.296844 datacenter0 Out IP 10.43.0.1.https > 10.42.208.195.33938: Flags [S.], seq 2244103980, ack 4150976787, win 65160, options [mss 1460,sackOK,TS val 538660056 ecr 908267936,nop,wscale 7], length 0
22:28:27.400522 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2444524:2444668, ack 410265, win 29563, options [nop,nop,TS val 1009300779 ecr 1466268360], length 144
22:28:27.530570 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2444668:2444836, ack 410265, win 29563, options [nop,nop,TS val 1009300909 ecr 1466268593], length 168
22:28:27.537804 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2444836:2444991, ack 410265, win 29563, options [nop,nop,TS val 1009300917 ecr 1466268682], length 155
22:28:27.555367 lo    In  IP 10.43.0.1.https > soc-bmm-prod-sehby1-00.37640: Flags [P.], seq 2444991:2445147, ack 410265, win 29563, options [nop,nop,TS val 1009300934 ecr 1466268690], length 156
I can confirm that iptables was clear and all chains were set to ACCEPT before rke2 was initially started 🙂
n
Copy code
22:28:27.072810 calia34a6eff98d In  IP 10.42.208.194.48602 > 10.43.0.1.https: Flags [S], seq 2328333618, win 64390, options [mss 1370,sackOK,TS val 2477256112 ecr 0,nop,wscale 7], length 0
22:28:27.072855 datacenter0 Out IP 10.43.0.1.https > 10.42.208.194.48602: Flags [S.], seq 2697682218, ack 2328333619, win 65160, options [mss 1460,sackOK,TS val 3122551252 ecr 2477248969,nop,wscale 7], length 0
however, this one looks like the packet has been sent out to the outer network instead of back to calia34a6eff98d. I suppose datacenter0 is the real ethernet interface.
m
yes, it's the physical one
n
if you can, check: the pod probably listens on some port, like a health port.
telnet 10.42.208.194 [that_port]
should connect from the host. And you should also see, using tcpdump, that the packet is correctly routed to the pod (and it should probably be routed anyway, even if the pod is not actually listening on that port).
m
it's just stuck there trying.
I assume it's only got that port open for a short period of time before the pod crashes, though?
n
yes, maybe. But can you see, using tcpdump, that it is routed to the cali interface?
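e.g. capture on the interface from your earlier dump directly:
Copy code
tcpdump -ni calia34a6eff98d 'tcp port 443'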
m
my tcpdump-fu is not amazing, unfortunately, so I'm not too sure exactly what I need to look for 😅
n
I think those lo packets are not relevant. What does this show?
Copy code
ip route show table all | grep 10.42.
m
Copy code
ip route show table all | grep 10.42.
10.42.208.192 dev cali5f9c11d3630 scope link 
blackhole 10.42.208.192/26 proto 80 
10.42.208.193 dev cali0def3d64446 scope link 
10.42.208.194 dev calia34a6eff98d scope link 
10.42.208.195 dev calia6eb079446c scope link 
10.42.208.196 dev cali86737a89e0c scope link 
10.42.208.197 dev cali38909b96b7c scope link 
10.42.208.198 dev cali7f88475d8bf scope link 
10.42.208.199 dev calia60e0215849 scope link 
local 10.42.208.200 dev vxlan.calico table local proto kernel scope host src 10.42.208.200
n
ip route get 10.42.208.194
m
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ ip route get 10.42.208.194
10.42.208.194 via 10.32.64.1 dev datacenter0 table 3264 src 10.32.64.100 uid 3206 
    cache
n
yes, this is bad
it should show something like:
Copy code
root@kub-a5:~# ip route get 10.42.114.108
10.42.114.108 dev cali48eafa66be7 src 10.16.60.14 uid 0
    cache
root@kub-a5:~#
m
is my systemd-networkd config potentially getting involved there?
n
i.e., directly via the cali interface
Copy code
ip route show table 3264
m
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ ip route show table 3264
default via 10.32.64.1 dev datacenter0 proto static metric 100
n
Copy code
ip rule show
m
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ ip rule show
0:      from all lookup local
100:    from all lookup 3264 proto static
32766:  from all lookup main
32767:  from all lookup default
The actual route config on the network interface has:
Copy code
[Route]
Destination=0.0.0.0/0
Gateway=10.32.64.1
Metric=100
Table=3264
Scope=global
Would that catch-all destination potentially be getting in the way here? 😛
n
so this is the problem. The rule 100: takes precedence over the 'normal' routing table (ip route), and everything is routed to the datacenter0 NIC.
yes, this is it.
if you remove/fix it, there are two options: 1. it will start to work, 2. no network will work at all 😉
m
yeah, I previously had issues with the 2nd option, but it's super easy to fix if I mess it up, as I at least have physical access :D
I had Calico at one point nuking the default route, so I had no networking at all T.T Just restarting the node with the changes, let's see if I can connect back to it :3
ok, so I removed the Destination on the network interface, but it's still routing to datacenter0 for the cali interfaces 🤔
n
I would remove the table 3264 completely; it should not be there.
maybe use the name 'main' instead of 3264
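a sketch of the [Route] section without the custom table; omitting Table= uses the kernel's main table (where Calico installs its routes), and omitting Destination= makes it a default route:
Copy code
[Route]
Gateway=10.32.64.1
Metric=100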
m
just starting the cluster now; I've removed that whole routing table and rule.
logs in the calico controller are looking better :3
route looks better for the cali interface too 😄
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ ip route get 10.42.208.209
10.42.208.209 dev cali7f88475d8bf src 10.32.64.100 uid 3206 
    cache
Copy code
carbon@soc-bmm-prod-sehby1-00:~ $ sudo kubectl get pods --namespace kube-system --kubeconfig /etc/rancher/rke2/rke2.yaml
NAME                                                   READY   STATUS      RESTARTS         AGE
cloud-controller-manager-soc-bmm-prod-sehby1-00        1/1     Running     8 (3m ago)       42m
etcd-soc-bmm-prod-sehby1-00                            1/1     Running     5                42m
helm-install-rke2-calico-crd-l5xxh                     0/1     Completed   0                42m
helm-install-rke2-calico-zwb6m                         0/1     Completed   2                42m
helm-install-rke2-coredns-xnwwp                        0/1     Completed   0                42m
helm-install-rke2-ingress-nginx-5qstv                  0/1     Completed   12               42m
helm-install-rke2-metrics-server-kfd2x                 0/1     Completed   12               42m
helm-install-rke2-runtimeclasses-n2g6z                 0/1     Completed   12               42m
helm-install-rke2-snapshot-controller-crd-f6rc8        0/1     Completed   12               42m
helm-install-rke2-snapshot-controller-w2t6p            0/1     Completed   13               42m
kube-apiserver-soc-bmm-prod-sehby1-00                  1/1     Running     4                42m
kube-controller-manager-soc-bmm-prod-sehby1-00         1/1     Running     6 (3m4s ago)     42m
kube-proxy-soc-bmm-prod-sehby1-00                      1/1     Running     0                2m33s
kube-scheduler-soc-bmm-prod-sehby1-00                  1/1     Running     3 (3m14s ago)    42m
kube-vip-r869p                                         1/1     Running     5 (3m14s ago)    42m
rke2-coredns-rke2-coredns-65dc69968-t8vsk              1/1     Running     12               42m
rke2-coredns-rke2-coredns-autoscaler-68d5f76f7-pn8mh   1/1     Running     12 (3m14s ago)   42m
rke2-ingress-nginx-controller-jf5x8                    1/1     Running     0                90s
rke2-metrics-server-69bdccfdd9-cwwdv                   1/1     Running     0                107s
rke2-snapshot-controller-696989ffdd-t5rxr              1/1     Running     0                96s
is looking good too 😄
right, seems stable after a few minutes, so I'm gonna nuke it and try to form a 3-server cluster now to see if that works :3
n
it should, if the routing table is not messed up on any node 😉 and you need the cloud controller too
m
found a race condition in my ansible script now 😅 and I need to go to bed, so I'll get back to you later on whether it was all successful. I really appreciate the help 🙂
So it looks like when the iptables rules allow it, I can spin the cluster up and join multiple servers together 🙂 I am now hitting another issue where RKE2 injects all of its rules into iptables, including one that blocks all localnet traffic from entering the cluster; this seems to cause issues when using tools like kubectl to talk to the server, as it can't reach the instance...
I think I have figured that one out... I was running into a race condition where sometimes my ansible script had already made a connection before RKE2 had injected all of the iptables rules, so that connection was matching under the ESTABLISHED rule I had. Changing the script to use kubectl against the server IP or VIP resolves that 😄
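i.e. something like this instead of relying on an already-open connection (the server address here is just an example using one of the tls-san IPs):
Copy code
kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml --server https://10.32.64.99:6443 get nodes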