# rke2
Hello. Wondering if I can get some assistance setting up a new test cluster to play with Trident as my CSI. Unfortunately it seems I am having some kind of network issue with Calico before I can even get Trident up and healthy. I have 3 nodes, 1 of which is the master. All run a minimal RHEL 8 base OS. The firewall and fapolicyd are disabled. NetworkManager is running, but I have added the recommended entries so it won't interfere with Calico. I am using RKE2 v1.31.12. SELinux is on, but RKE2 is configured to support it. The Trident install itself goes fine: it spins up a Trident controller pod and 3 node pods, one per k8s node. The Trident node pod on the same host as the controller seems fine; the other 2 can't reach the controller.
I used the NetApp KB to test connectivity from a test pod and confirmed it can't reach anything outside itself. The gateway IP shows up as an APIPA (169.254.x.x) address inside the pod.
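For reference, the check was roughly this (a sketch from memory; the image and pod name are just what I happened to use):

```sh
# throwaway test pod to poke at the pod network from the inside
kubectl run curl-pod --image=nicolaka/netshoot --restart=Never -- sleep 3600
kubectl exec curl-pod -- ip route                 # default gw came back as a 169.254.x.x (APIPA) address
kubectl exec curl-pod -- ping -c 3 10.42.199.4    # trident-controller pod IP - no reply
```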
I have a few production clusters set up with RKE2/Calico/RHEL in an identical configuration and have never had issues with inter-node comms there. I made sure all my sysctl settings for IP forwarding were set to 1, which was usually the culprit when I did hit issues.
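These are the standard kernel keys I checked (nothing RKE2-specific):

```sh
sysctl net.ipv4.ip_forward            # expect 1
sysctl net.ipv4.conf.all.forwarding   # expect 1
# make it persistent if it isn't already
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/90-ip-forward.conf
sudo sysctl --system
```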
```
NAME                                  READY   STATUS    RESTARTS         AGE     IP              NODE        NOMINATED NODE   READINESS GATES
curl-pod                              1/1     Running   2 (9m40s ago)    29m     10.42.242.212   csi-test1   <none>           <none>
trident-controller-75ccc7c5f5-s9cs5   6/6     Running   0                3h15m   10.42.199.4     csi-test3   <none>           <none>
trident-node-linux-8tr66              1/2     Running   39 (5m30s ago)   3h15m   10.7.54.52      csi-test2   <none>           <none>
trident-node-linux-cx5w9              2/2     Running   2 (3h13m ago)    3h15m   10.7.54.53      csi-test3   <none>           <none>
trident-node-linux-p6kvl              1/2     Running   39 (5m38s ago)   3h15m   10.7.54.51      csi-test1   <none>           <none>
```
Looking a bit closer, I see the IPs assigned to the Trident node pods are the ones assigned to the k8s nodes, not internal pod IPs, so maybe that's where it falls down. My production clusters use Longhorn, so I can't really compare.
```
# kubectl get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  creationTimestamp: "2025-09-18T05:50:54Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tigera-operator
  name: default-ipv4-ippool
  resourceVersion: "920"
  uid: 220fc13b-6c52-4048-9edd-46aa7fbc1843
spec:
  allowedUses:
  - Workload
  - Tunnel
  assignmentMode: Automatic
  blockSize: 26
  cidr: 10.42.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always
```
I don't see much difference between the IPPool above and the one on my working production cluster (accounting for the v1.29 vs v1.31 comparison).
```
# k get nodes -o wide
NAME        STATUS   ROLES                       AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                KERNEL-VERSION                  CONTAINER-RUNTIME
csi-test1   Ready    control-plane,etcd,master   12d   v1.31.12+rke2r1   10.7.54.51    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
csi-test2   Ready    <none>                      12d   v1.31.12+rke2r1   10.7.54.52    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
csi-test3   Ready    <none>                      12d   v1.31.12+rke2r1   10.7.54.53    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
```
nodes seem happy
The curl-pod (test pod) can ping the node IPs (10.7.54.x) but can't ping the Trident controller. So I'm not sure why it got a correct pod network IP and the others did not. I'm assuming they were all supposed to get 10.42.x.x IPs, which I'm not even sure of.
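In hindsight, one useful check is which address calico-node autodetected on each host; Calico records it as an annotation on the Node object (annotation name from memory):

```sh
for n in csi-test1 csi-test2 csi-test3; do
  echo -n "$n: "
  kubectl get node "$n" -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}{"\n"}'
done
```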
Tried setting the FelixConfiguration defaultEndpointToHostAction to Accept instead of Drop, and that did nothing to help.
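For reference, I patched it like this (the FelixConfiguration CRD is editable with plain kubectl on RKE2's Calico):

```sh
kubectl patch felixconfiguration default --type merge \
  -p '{"spec":{"defaultEndpointToHostAction":"Accept"}}'
```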
Got confirmation that the Trident node pods are supposed to be getting the node IP, so all that looks good.
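If anyone wants to verify the same thing: the node pods report the node IP because the daemonset runs on the host network. Something like this shows it (namespace assumed to be trident):

```sh
kubectl -n trident get pod trident-node-linux-8tr66 \
  -o custom-columns='NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork,IP:.status.podIP'
```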
No NetworkPolicy or GlobalNetworkPolicy is defined in any namespace, which should mean "allow all".
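Checked with:

```sh
kubectl get networkpolicy -A
kubectl get globalnetworkpolicies.crd.projectcalico.org
```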
well.. damn
Turns out it was the 2nd NIC on the servers.
As soon as I disconnected that NIC from each host in vCenter, everything synced up.
I kind of need that NIC, since I want it dedicated to NFS traffic on its own VLAN, but at least I know what the current issue is.
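In case it helps anyone else hitting this: rather than removing the NIC, Calico's IPv4 autodetection can be pinned to the management network so it ignores the NFS VLAN. On RKE2 the bundled rke2-calico chart can be overridden with a HelmChartConfig; a sketch below, assuming the management subnet is 10.7.54.0/24 (based on my node IPs) and that the chart exposes the operator's `installation` values (adjust for your chart version):

```yaml
# /var/lib/rancher/rke2/server/manifests/rke2-calico-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    installation:
      calicoNetwork:
        nodeAddressAutodetectionV4:
          # assumed management subnet; an interface match (e.g. interface: ens192) should also work
          cidrs:
            - "10.7.54.0/24"
```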