# rke2
a
upstream project docs are usually where I’d start…
f
I've run across that document. But maybe I'm not interpreting it right?
That cross-subnet feature: if there happens to be a routable 10.42.0.0 network, would it stop encapsulating the packet and just send it off to that other network?
Am I understanding that correctly?
c
all your Linux nodes generally know about is a default route, right?
the CNI knows that it has one of the node CIDRs locally, and it adds specific routes for CIDRs assigned to other nodes.
Everything else just goes to the default route
Unless you are for some reason pushing your entire network topology’s routing table down to each and every node via BGP or something?
it would certainly be a best practice to not have your cluster-cidr overlap with an actual network range in use on your network, otherwise pods running in Kubernetes won’t be able to reach those hosts, as they will have higher-priority routes via the CNI.
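For reference, a rough sketch of the routing described above, as it might look on one node; the CIDRs and interface names are illustrative and assume the default canal/flannel VXLAN setup:

```
ip route
# default via 192.168.1.1 dev eth0                              <- everything else
# 10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1   <- this node's pod CIDR
# 10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink               <- pod CIDR assigned to another node
```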
f
We just have default RKE2, so we have 10.42.0.0 for pods and 10.43.0.0 for services. We tried netcat and tcpdump: server1 to server2 udp traffic works, pod on server1 to server2 udp traffic works, but pod on server1 to pod on server2 udp traffic doesn't.
they do have a routable 10.42 network..
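A hypothetical reproduction of those three tests; the pod names, IPs, and port are placeholders, and it assumes nc is available inside the pods:

```
# node to node udp (works)
nc -u -l 9999                           # on server2
echo hello | nc -u SERVER2_IP 9999      # on server1

# pod on server1 to server2 udp (works)
kubectl exec -it pod-on-server1 -- sh -c 'echo hello | nc -u SERVER2_IP 9999'

# pod on server1 to pod on server2 udp (fails)
kubectl exec -it pod-on-server2 -- nc -u -l 9999
kubectl exec -it pod-on-server1 -- sh -c 'echo hello | nc -u POD_ON_SERVER2_IP 9999'
```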
c
are you sure you’ve opened the vxlan ports in both directions?
all that the outer network will see is UDP vxlan traffic between the two nodes
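For context, RKE2's default canal CNI carries VXLAN over UDP 8472; one way to confirm which port the VXLAN device is actually using (flannel.1 is the usual device name, assumed here):

```
ip -d link show flannel.1    # look for "vxlan id ... dstport 8472" in the output
```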
f
yeah, all firewalls are disabled.
server 1 and 2 are in the same subnet
c
are you seeing the vxlan traffic from server1 show up on server2?
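A sketch of how that check might look, capturing on both nodes at once; eth0 and the IPs are placeholders for the real uplink interface and node addresses:

```
# on server2, while repeating the pod-to-pod test:
tcpdump -ni eth0 udp port 8472 and host SERVER1_IP

# on server1, at the same time:
tcpdump -ni eth0 udp port 8472 and host SERVER2_IP
```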
f
when we did a pod-to-pod connection, we saw the traffic go out of the server1 interface, but never saw the traffic arrive on the server2 interface.
server2 never saw that broadcast udp traffic
c
well if it’s leaving server1 destined for server2, and not showing up on server2, that sounds like an issue with the network
f
that's the part that we have been trying to explain to the customer we have 🙂 it's been great fun.
c
You said the local firewall is disabled, does that include any external firewalling tools like security groups?
f
so if there is a routable 10.42 network somewhere. that would cause issue, right?
c
no, that would not
f
oh damn ok
c
the vxlan traffic is peer to peer. Leaves server1 destined for server2.
f
they keep saying there are no firewalls between the 2
c
it’s a direct mesh
f
we disabled firewalld
disabled selinux to try. that didn't work either
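Quick sanity checks for that, if it helps; note that the CNI and kube-proxy still program iptables rules even with firewalld off, which is expected:

```
systemctl is-active firewalld   # expect "inactive" (or an error if it isn't installed)
getenforce                      # expect "Permissive" or "Disabled"
```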
c
what is the underlying infrastructure? vmware? ec2?
f
they did have ip_forward disabled. but we fixed that.
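A sketch of verifying and persisting that setting; the sysctl.d filename is just a placeholder:

```
sysctl net.ipv4.ip_forward                                          # should print net.ipv4.ip_forward = 1
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/90-ip-forward.conf   # the file makes it persistent
sysctl --system                                                     # apply it now
```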
vmware in aws I guess
c
uhh what
well anyways, if it’s in AWS have them check security groups, they probably forgot to open the vxlan ports there.
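If plain EC2 security groups are in play, the rule would look something like this; the group ID is a placeholder, and any NSX-T/VMC firewalling would be a separate check:

```
# allow VXLAN (UDP 8472) between nodes that share the same security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-NODEGROUP \
  --protocol udp --port 8472 \
  --source-group sg-NODEGROUP
```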
f
yeah.. not sure why they use vmware in aws
I'll keep pushing on that agenda.
c
maybe they just like spending money
f
Well... it is an unnamed large energy company, so... probably.
c
anyways, keep poking at the vxlan stuff. you should see unicast vxlan udp packets leaving one server, destined for the other. and then see the same packet show up on the interface on the other server.
it’s all just peer to peer packet encapsulation
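One way to see that mesh directly: each remote node shows up as a forwarding entry on the local VXLAN device (flannel.1 assumed; the MAC and IP below are illustrative):

```
bridge fdb show dev flannel.1
# aa:bb:cc:dd:ee:ff dst SERVER2_IP self permanent
```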
f
ok thanks Brandon! Much appreciated!
So, this was interesting. We turned off general offloading on the network card and everything started working. RHEL 8.8 on VNC.
c
EL kernels have for ages been shipping with a bug that causes hardware offload to corrupt vxlan checksums on vmware hosts
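The commonly cited workarounds for that are along these lines; the interface names (flannel.1 for the VXLAN device, ens192 for the uplink) are assumptions about this environment, not confirmed:

```
ethtool -k ens192 | grep -E 'udp|checksum'         # inspect current offload settings
ethtool -K flannel.1 tx-checksum-ip-generic off    # disable checksum offload on the VXLAN device
ethtool -K ens192 tx-udp_tnl-segmentation off tx-udp_tnl-csum-segmentation off
```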
on VNC? or vmware?
f
VMware Cloud? Maybe I'm using the acronym wrong. I'm not familiar with any of this.
They definitely use NSX-T.
c
vnc is a remote desktop thing
f
Ok then they are probably saying VMC. Hard to tell over voice 🙂