# rke2
Hello. Wondering if I can get some assistance setting up a new test cluster to play with Trident as my CSI. Unfortunately it seems I am having some kind of network issue with Calico before I can even get Trident up and healthy. I have 3 nodes, 1 of which is the master. All run a minimal RHEL 8 base OS. The firewall and fapolicyd are disabled. NetworkManager is running, but I have added the recommended entries so it won't interfere with Calico. I am using RKE2 v1.31.12. SELinux is on, but RKE2 is configured to support it. The Trident install itself goes fine: it spins up a Trident controller pod and 3 node pods, one per k8s node. The Trident node pod on the same host as the controller seems fine; the other 2 can't reach the controller.
I used the NetApp KB to test connectivity from a test pod and confirmed it can't reach anything outside itself. The gateway IP shows up as an APIPA (169.254.x.x) address inside the pod.
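For reference, the check was roughly this (a sketch from memory; the image and pod name are just what I happened to use):

```sh
# throwaway test pod to poke at the pod network from the inside
kubectl run curl-pod --image=nicolaka/netshoot --restart=Never -- sleep 3600
kubectl exec curl-pod -- ip route                 # default gw came back as a 169.254.x.x (APIPA) address
kubectl exec curl-pod -- ping -c 3 10.42.199.4    # trident-controller pod IP - no reply
```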
I have a few production clusters set up with RKE2/Calico/RHEL in an identical configuration and have never had issues with inter-node comms there. I made sure all my sysctl settings for IP forwarding were set to 1, which was usually the culprit when I did hit issues.
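These are the standard kernel keys I checked (nothing RKE2-specific):

```sh
sysctl net.ipv4.ip_forward            # expect 1
sysctl net.ipv4.conf.all.forwarding   # expect 1
# make it persistent if it isn't already
echo "net.ipv4.ip_forward = 1" | sudo tee /etc/sysctl.d/90-ip-forward.conf
sudo sysctl --system
```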
```
NAME                                  READY   STATUS    RESTARTS         AGE     IP              NODE        NOMINATED NODE   READINESS GATES
curl-pod                              1/1     Running   2 (9m40s ago)    29m     10.42.242.212   csi-test1   <none>           <none>
trident-controller-75ccc7c5f5-s9cs5   6/6     Running   0                3h15m   10.42.199.4     csi-test3   <none>           <none>
trident-node-linux-8tr66              1/2     Running   39 (5m30s ago)   3h15m   10.7.54.52      csi-test2   <none>           <none>
trident-node-linux-cx5w9              2/2     Running   2 (3h13m ago)    3h15m   10.7.54.53      csi-test3   <none>           <none>
trident-node-linux-p6kvl              1/2     Running   39 (5m38s ago)   3h15m   10.7.54.51      csi-test1   <none>           <none>
```
Looking a bit closer, I see the IPs assigned to the Trident node pods are the ones assigned to the k8s nodes, not internal pod IPs, so maybe that's where it falls down. My production clusters use Longhorn, so I can't really compare.
```
# kubectl get ippool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  creationTimestamp: "2025-09-18T05:50:54Z"
  generation: 1
  labels:
    app.kubernetes.io/managed-by: tigera-operator
  name: default-ipv4-ippool
  resourceVersion: "920"
  uid: 220fc13b-6c52-4048-9edd-46aa7fbc1843
spec:
  allowedUses:
  - Workload
  - Tunnel
  assignmentMode: Automatic
  blockSize: 26
  cidr: 10.42.0.0/16
  ipipMode: Never
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Always
```
I don't see much difference between the IPPool above and the one on my working production cluster (accounting for the v1.29 vs v1.31 comparison).
```
# k get nodes -o wide
NAME        STATUS   ROLES                       AGE   VERSION           INTERNAL-IP   EXTERNAL-IP   OS-IMAGE                                KERNEL-VERSION                  CONTAINER-RUNTIME
csi-test1   Ready    control-plane,etcd,master   12d   v1.31.12+rke2r1   10.7.54.51    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
csi-test2   Ready    <none>                      12d   v1.31.12+rke2r1   10.7.54.52    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
csi-test3   Ready    <none>                      12d   v1.31.12+rke2r1   10.7.54.53    <none>        Red Hat Enterprise Linux 8.10 (Ootpa)   4.18.0-553.70.1.el8_10.x86_64   containerd://2.0.5-k3s2
```
nodes seem happy
The curl-pod (test pod) can ping the node IPs (10.7.54.x) but can't ping the Trident controller. So I'm not sure why it got a correct pod network IP and the others did not. I'm assuming they were all supposed to get 10.42.x.x IPs, which I'm not even sure of.
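In hindsight, one useful check is which address calico-node autodetected on each host; Calico records it as an annotation on the Node object (annotation name from memory):

```sh
for n in csi-test1 csi-test2 csi-test3; do
  echo -n "$n: "
  kubectl get node "$n" -o jsonpath='{.metadata.annotations.projectcalico\.org/IPv4Address}{"\n"}'
done
```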
Tried setting the FelixConfiguration defaultEndpointToHostAction to Accept instead of Drop, and that did nothing to help.
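For reference, I patched it like this (the FelixConfiguration CRD is editable with plain kubectl on RKE2's Calico):

```sh
kubectl patch felixconfiguration default --type merge \
  -p '{"spec":{"defaultEndpointToHostAction":"Accept"}}'
```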
Got confirmation that the Trident node pods are supposed to be getting the node IP, so all that looks good.
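If anyone wants to verify the same thing: the node pods report the node IP because the daemonset runs on the host network. Something like this shows it (namespace assumed to be trident):

```sh
kubectl -n trident get pod trident-node-linux-8tr66 \
  -o custom-columns='NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork,IP:.status.podIP'
```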
No NetworkPolicy or GlobalNetworkPolicy is defined in any namespace, which should mean "allow all".
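Checked with:

```sh
kubectl get networkpolicy -A
kubectl get globalnetworkpolicies.crd.projectcalico.org
```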
well.. damn
Turns out it was the 2nd NIC on the servers.
As soon as I disconnected that NIC from each host in vCenter, everything synced up.
I kind of need that NIC, since I want it dedicated to NFS traffic on its own VLAN, but at least I know what the current issue is.
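In case it helps anyone else hitting this: rather than removing the NIC, Calico's IPv4 autodetection can be pinned to the management network so it ignores the NFS VLAN. On RKE2 the bundled rke2-calico chart can be overridden with a HelmChartConfig; a sketch below, assuming the management subnet is 10.7.54.0/24 (based on my node IPs) and that the chart exposes the operator's `installation` values (adjust for your chart version):

```yaml
# /var/lib/rancher/rke2/server/manifests/rke2-calico-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-calico
  namespace: kube-system
spec:
  valuesContent: |-
    installation:
      calicoNetwork:
        nodeAddressAutodetectionV4:
          # assumed management subnet; an interface match (e.g. interface: ens192) should also work
          cidrs:
            - "10.7.54.0/24"
```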