# rke2
a
upstream project docs are usually where I’d start…
f
I've run across that document. But maybe I'm not interpreting it right?
That cross-subnet feature: if there happens to be a routable 10.42.0.0 network, would it stop encapsulating the packet and just send it off to that other network?
Am I understanding that correctly?
c
all your Linux nodes generally know about is a default route, right?
the CNI knows that it has one of the node CIDRs locally, and it adds specific routes for CIDRs assigned to other nodes.
Everything else just goes to the default route
Unless you are for some reason pushing your entire network topology’s routing table down to each and every node via BGP or something?
it would certainly be a best practice to not have your cluster-cidr overlap with an actual network range in use on your network, otherwise pods running in Kubernetes won’t be able to reach those hosts, as they will have higher-priority routes via the CNI.
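For reference, a rough sketch of the routing described above, as it might look on one node; the CIDRs and interface names are illustrative and assume the default canal/flannel VXLAN setup:

```
ip route
# default via 192.168.1.1 dev eth0                              <- everything else
# 10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1   <- this node's pod CIDR
# 10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink               <- pod CIDR assigned to another node
```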
f
We just have default RKE2, so we have 10.42.0.0 for pods and 10.43.0.0 for services. We tried netcat and tcpdump: server1 to server2 udp traffic works, pod on server1 to server2 udp traffic works, but pod on server1 to pod on server2 udp traffic doesn't.
they do have a routable 10.42 network..
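A hypothetical reproduction of those three tests; the pod names, IPs, and port are placeholders, and it assumes nc is available inside the pods:

```
# node to node udp (works)
nc -u -l 9999                           # on server2
echo hello | nc -u SERVER2_IP 9999      # on server1

# pod on server1 to server2 udp (works)
kubectl exec -it pod-on-server1 -- sh -c 'echo hello | nc -u SERVER2_IP 9999'

# pod on server1 to pod on server2 udp (fails)
kubectl exec -it pod-on-server2 -- nc -u -l 9999
kubectl exec -it pod-on-server1 -- sh -c 'echo hello | nc -u POD_ON_SERVER2_IP 9999'
```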
c
are you sure you’ve opened the vxlan ports in both directions?
all that the outer network will see is UDP vxlan traffic between the two nodes
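For context, RKE2's default canal CNI carries VXLAN over UDP 8472; one way to confirm which port the VXLAN device is actually using (flannel.1 is the usual device name, assumed here):

```
ip -d link show flannel.1    # look for "vxlan id ... dstport 8472" in the output
```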
f
yeah, all firewalls are disabled.
server 1 and 2 are in the same subnet
c
are you seeing the vxlan traffic from server1 show up on server2?
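A sketch of how that check might look, capturing on both nodes at once; eth0 and the IPs are placeholders for the real uplink interface and node addresses:

```
# on server2, while repeating the pod-to-pod test:
tcpdump -ni eth0 udp port 8472 and host SERVER1_IP

# on server1, at the same time:
tcpdump -ni eth0 udp port 8472 and host SERVER2_IP
```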
f
when we did a pod-to-pod connection, we saw the traffic go out of the server1 interface, but never saw the traffic arrive on the server2 interface.
server2 never saw that broadcast udp traffic
c
well if it’s leaving server1 destined for server2, and not showing up on server2, that sounds like an issue with the network
f
that's the part that we have been trying to explain to the customer we have 🙂 it's been great fun.
c
You said the local firewall is disabled, does that include any external firewalling tools like security groups?
f
so if there is a routable 10.42 network somewhere. that would cause issue, right?
c
no, that would not
f
oh damn ok
c
the vxlan traffic is peer to peer. Leaves server1 destined for server2.
f
they keep saying there are no firewalls between the 2
c
it’s a direct mesh
f
we disabled firewalld
disabled selinux to try. that didn't work either
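Quick sanity checks for that, if it helps; note that the CNI and kube-proxy still program iptables rules even with firewalld off, which is expected:

```
systemctl is-active firewalld   # expect "inactive" (or an error if it isn't installed)
getenforce                      # expect "Permissive" or "Disabled"
```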
c
what is the underlying infrastructure? vmware? ec2?
f
they did have ip_forward disabled. but we fixed that.
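A sketch of verifying and persisting that setting; the sysctl.d filename is just a placeholder:

```
sysctl net.ipv4.ip_forward                                          # should print net.ipv4.ip_forward = 1
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/90-ip-forward.conf   # the file makes it persistent
sysctl --system                                                     # apply it now
```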
vmware in aws I guess
c
uhh what
well anyways, if it’s in AWS have them check security groups, they probably forgot to open the vxlan ports there.
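If plain EC2 security groups are in play, the rule would look something like this; the group ID is a placeholder, and any NSX-T/VMC firewalling would be a separate check:

```
# allow VXLAN (UDP 8472) between nodes that share the same security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODEGROUP \
  --protocol udp --port 8472 \
  --source-group sg-NODEGROUP
```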
f
yeah.. not sure why they use vmware in aws
I'll keep pushing on that agenda.
c
maybe they just like spending money
f
Well... it is an unnamed large energy company, so... probably.
c
anyways, keep poking at the vxlan stuff. you should see unicast vxlan udp packets leaving one server, destined for the other. and then see the same packet show up on the interface on the other server.
it’s all just peer to peer packet encapsulation
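One way to see that mesh directly: each remote node shows up as a forwarding entry on the local VXLAN device (flannel.1 assumed; the MAC and IP below are illustrative):

```
bridge fdb show dev flannel.1
# aa:bb:cc:dd:ee:ff dst SERVER2_IP self permanent
```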
f
ok thanks Brandon! Much appreciated!
So, this was interesting. We turned off general offloading on the network card and everything started working. RHEL 8.8 on VNC.
c
EL kernels have for ages been shipping with a bug that causes hardware offload to corrupt vxlan checksums on vmware hosts
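The commonly cited workarounds for that are along these lines; the interface names (flannel.1 for the VXLAN device, ens192 for the uplink) are assumptions about this environment, not confirmed:

```
ethtool -k ens192 | grep -E 'udp|checksum'         # inspect current offload settings
ethtool -K flannel.1 tx-checksum-ip-generic off    # disable checksum offload on the VXLAN device
ethtool -K ens192 tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
```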
on VNC? or vmware?
f
VMware Cloud? Maybe I'm using the acronym wrong. I'm not familiar with any of this.
They definitely use NSX-T.
c
vnc is a remote desktop thing
f
Ok then they are probably saying VMC. Hard to tell over voice 🙂