# harvester
w
this is an overview of our network structure. the external IPs in the IP pool are in an external network, and somehow that needs to communicate with the internal network. where do we best configure this?
r
Hi, @wooden-area-49191. What is the network to which the two RKE2 VMs are attached? Is it the Vlan-42 network? If that’s the case, and the LB IP pool is created based on the Vlan-62 network, the LB IP allocation will work, but the connectivity will not.
Is the left-hand-side RKE2 box a control-plane node of the guest cluster?
w
Yes- both RKE2 clusters have a control plane in the guest clusters. The IP Pool allocation works but the connectivity does not. Host unreachable.
f
The idea, as with most LBs, is that the LB has a leg on both VLANs and lets traffic in over IPv4 and IPv6 that way. The LB is the smart part.
w
I’m guessing there should be some form of setting in the load balancer that is required so it knows it’s supposed to send traffic from one network (external) to the vlan-42 network?
r
Do you see the LB IP configured on the RKE2 node’s interface? (need to ssh into the VM)
w
Great question- one sec
f
We can also simplify it with one physical nic with all vlans.
r
Harvester does not support VLAN trunk on VM Network yet
w
yes- the RKE2 node has the assigned IP on the eth0 interface.
inet 193.180.173.23/32 scope global eth0
       valid_lft forever preferred_lft forever
r
thanks, things got a bit clearer. So the LB IP is under VLAN 62 and the management IP is under VLAN 42, and they’re both bound to the same interface, eth0. Is that correct?
w
exactly
r
Does the VM have another interface, say eth1, which attaches to vlan62?
w
we have tried that also but it didn’t work.
r
you’ll need that vlan62 interface to work
f
Should we just set everything up on one nic, Christian, to make it simpler?
r
if you’re going to use only one nic, you can’t have multiple vlans
w
ok, let’s try this again @flat-librarian-14243
i’m adding another nic to the RKE2 cluster now with vlan62.
r
please find out the interface name, we’ll need that for the kube-vip daemonset
w
Ok! It’s still reconciling
Now it’s up. The second interface name is drumroll…. eth1
but now the lb ip is not assigned to either eth0 or eth1
the eth1 has a DHCP assigned address- this can be removed though
r
that won’t be necessary. please edit the kube-vip daemonset on the RKE2 cluster
w
ok
r
we need to add `eth1` to the `vip_interface` environment variable
by default, there’s no value for that env var
and kube-vip will use the interface which has the default gw configured. in your case, that will be eth0, so we need to specify the one we want to use
w
so vip_interface here should be eth1?
r
from
- name: svc_enable
  value: "true"
- name: vip_arp
  value: "true"
- name: vip_cidr
  value: "32"
- name: vip_interface
- name: vip_leaderelection
  value: "false"
to
- name: svc_enable
  value: "true"
- name: vip_arp
  value: "true"
- name: vip_cidr
  value: "32"
- name: vip_interface
  value: "eth1"
- name: vip_leaderelection
  value: "false"
and wait for the pod to roll out the new config
w
done!
r
great, is the LB type svc already there?
f
This will be so fun!
@red-king-19196, the LB handles IPv6 right?
We can easily set up a pool
r
It should be IPv4-only atm
f
Oh crap
That's really bad. 😞
Because the config says IP, not IPv4
w
time="2024-03-19T08:50:43Z" level=info msg="Starting kube-vip.io [v0.6.0]"
time="2024-03-19T08:50:43Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[false], Services:[true]"
time="2024-03-19T08:50:43Z" level=fatal msg="eth1 is not valid interface, reason: get eth1 failed, error: Link not found"
the daemonset crashes
should we set up all clusters with second interface?
r
how many nodes are there for the guest cluster?
w
or should we move the vlan-62 and vlan-42 to the mgmt network?
we can remove all of them, we have just restarted the whole cluster on version 1.3
r
only mgmt nodes need to attach to the additional network, i.e. vlan62
w
ok
makes sense
r
yeah because kube-vip pods are running on mgmt nodes only
so you need to make sure they have the same network config
w
just to make sure I’m not misunderstanding- we are talking about the guest clusters now- not the actual physical nodes right?
r
yes, the RKE2 VMs
w
ok thanks.
They are all reconciling now- should take a few minutes. The daemonset will retry until restarted right?
r
wait a second
did you modify the one on the harvester cluster?
sorry i should’ve made it clear: it is the one on the guest cluster we need to modify
w
ah ok makes more sense. i’ll revert
The main kube-vip in the Harvester cluster has now been restored. The kube-vip in the RKE2 guest cluster has been configured. I’m waiting for the reconciliation to be finished. We’ll keep our fingers crossed now 🙂
the ingress controller seems to be in a weird state
I tried setting up a new RKE2 cluster but got the same error with nginx
r
is there anything wrong mentioned in the ingress controller’s log ?
w
Yes. This is the error that prevents the pods from being created:
r
hmmm, this is weird. do you see any calico pod running on the guest cluster?
w
i’ve just confirmed- the host has a file called nodename at that location.
yes, the calico pods seem to be running fine
but the ingress controllers do not start
r
What’s your RKE2/Rancher/Harvester version?
w
Clusters: v1.27.11+rke2r1 Rancher: v2.8.2 Harvester: 1.3.0
Found it! I had forgotten the iptables addon, which is required for RKE2 clusters for some reason.
But the original problem still remains. It still can’t connect to the load balancer
The load balancer gets correctly updated with correct ip and the network is now available on the node. What is the next step to troubleshooting?
f
Let's see if we can start to reach them. I guess they don't answer ICMP?
BTW if the IP for the LB doesn't come from a DHCP server but an internal pool, where do I enter the gateway info?
r
Is the LB IP assigned to the interface attached to Vlan62?
BTW if the IP for the LB doesn’t come from a DHCP server but an internal pool, where do I enter the gateway info?
Gateway info can be configured when creating the LB IP pool.
Like this
w
Exactly- this is where we set the gateway and it’s set to the correct gateway in the network.
f
Exciting.
I will see if I have done something wrong.
r
Is the kube-vip daemonset (the one in the guest cluster) updated with the Vlan62 interface?
w
yes
i can feel that we are so close now. I will celebrate when this is done.
f
Agree, let's see what we can do, can we set up something that answers?
w
I’m now trying to create a completely new cluster just to make sure we have everything set up correctly.
Sorry. On the new cluster setup from scratch with kube-vip configured and iptables correctly in place - it still has problems connecting to the load balancer.
Any other ideas on how to move forward?
f
Do you have a service running on port 80, connected to the LB, so we can test and curl it internally?
r
from where do you try to connect to the load balancer? is it possible to `ping`/`curl` from a pod or vm that is on the vlan62 network?
f
We can connect from the router/switch directly on the same vlan, other VMs on the same vlan as well.
As you hear I am mostly worried that I screwed up networking 😉
r
So we can be sure that L2 is working fine
What’s the CNI used for the RKE2 guest cluster?
w
we are using calico as CNI. And yes, the service is running. It’s accessible from the cluster, and if we put a public IP address on the node and expose the ingress through the node we can reach it directly. The kube-vip receives the correct IP from the load balancer (provided by harvester cloud provider):
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 6e:6d:f6:b9:8f:4f brd ff:ff:ff:ff:ff:ff
    inet 172.30.11.185/24 brd 172.30.11.255 scope global dynamic noprefixroute eth0
       valid_lft 175sec preferred_lft 175sec
    inet6 fe80::6c6d:f6ff:feb9:8f4f/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether da:d0:eb:63:bf:11 brd ff:ff:ff:ff:ff:ff
    inet 193.180.173.32/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2a12:5bc2::a1aa:9364:d5f4:ad06/64 scope global dynamic noprefixroute
       valid_lft 2591938sec preferred_lft 604738sec
    inet6 fe80::ff93:6d84:53d6:ef84/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
We still can not find the root problem. The clusters have now been reset from scratch again
f
So eth1 can reach the router and other things on the network and on the internet. But we can't reach eth1 from the outside or on the same subnet. It should be on vlan62 and we can reach other things on vlan 62. Hovering over the 443/HTTPS that is in endpoints gives the external IP.
It all looks good.
w
What can we try to narrow it down? Anyone have an idea?
f
OK after adding:
ip route add 193.180.173.0/24 via 193.180.173.32 dev eth1
we can ping it from the outside, but we can't reach the services. But the default GW is 172.30.11.1 and the public IP there is on another /24 network. In my world the LB should have two legs: one with the public IP and a default GW, and one internal with just its internal network where it can reach the services. Or am I thinking wrong?
Also, the cool thing is that the machine gets an IPv6 address and that works great from the internet, but no services run on it.
Yes, it's a routing issue, I might need your guidance to see if we have designed it wrong.
w
We tried to make the problem a bit less complicated, so we created a vm from harvester and assigned a load balancer to it in harvester. The load balancer assigns an ip from the pool but no traffic arrives at the vm. How can we narrow it down further?
f
I am ready to redesign around whatever works well for Rancher; I just want to get this going when we are so close 🙂
@red-king-19196, I also see that nodes can get an internal and an external address. How does it know which is internal and which is external, RFC1918 addresses?
w
Has anyone any recommendations on how to configure routing in gateways in order for load balancers to work properly in harvester?
f
Good morning.
r
vm load-balancing is a bit different from k8s guest cluster load-balancing. besides that, an iptables rule for each load balancer IP drops all traffic other than on the specified ports, so simply pinging the load balancer IP will never work.
f
Thanks!
r
I did some experiments in my home lab, and it worked without any issues. But I’m unsure if I have a similar setup to yours. If you’d like to provide more information about your network infra configurations, I’d love to help look into it.
f
I am working to do a drawing to show how it's set up.
r
According to your previously provided configuration:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 6e:6d:f6:b9:8f:4f brd ff:ff:ff:ff:ff:ff
    inet 172.30.11.185/24 brd 172.30.11.255 scope global dynamic noprefixroute eth0
       valid_lft 175sec preferred_lft 175sec
    inet6 fe80::6c6d:f6ff:feb9:8f4f/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether da:d0:eb:63:bf:11 brd ff:ff:ff:ff:ff:ff
    inet 193.180.173.32/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2a12:5bc2::a1aa:9364:d5f4:ad06/64 scope global dynamic noprefixroute
       valid_lft 2591938sec preferred_lft 604738sec
    inet6 fe80::ff93:6d84:53d6:ef84/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
Is it possible to manually bind an IP address from Vlan62 to eth1? I assume 193.180.173.32 is within the subnet on Vlan62.
f
We can do that. I just wonder if you give an IP via a pool, how do you assign a route out?
r
the connected route should be fine.
f
My plan/thought is that vlan 62 in the LB is the default way out, and it has a leg in 42. For the pods we can just use vlan 42 and a default way out there via a GW. Is that OK?
r
Do you mean you’re going to set up two default gateways?
f
One on each vlan. That was my thought. But only use one default gateway per machine.
r
IIRC you can only set up one default route per routing table. If there are multiple default routes on a routing table, only one will be used.
f
I know that. There is only one per machine.
r
So, is the default route in the VM configured with the gateway IP in Vlan62?
f
If it’s a customer’s VM, yes.
If it’s a VM running a cluster, no. Only the load balancers should have that.
r
The load balancer is just a /32 IP address
f
How does it get a route out? It has two legs, one in each vlan to handle traffic and has the default GW created for vlan 62.
w
Both networks have the same machine as their default gw (but with different IP addresses). I think what is needed is a route between the networks on the GW, right?
f
Default GW for the LB must be via vlan 62, traffic in and traffic out the same way.
Zespre, what do you mean that the LB is a /32? It does use the subnet size described by the pool, right? Otherwise it can't reach the GW.
This is how the vlans are set up:
But the LB, is that a pod? I can't find the public interface attached to it.
I can see that the LB gets an IP but I am not sure what interface on what pod (or VM) it binds it to.
w
Where is the /32 ip supposed to be registered? On the LB pod, host machine or VM?
f
Zespre, is this your ticket? https://github.com/harvester/harvester/issues/5486 This might be the issue we have.
r
The allocated /32 LB IP addresses will be configured on the mgmt-br interface on one of the management nodes. The LB custom resources themselves are just denotations; they’re not actual entities that live on the system. What runs in the LB Pod is not the LB component that deals with traffic load balancing; it’s actually the LB controller, which reconciles the LB custom resources.
w
Ok, so we need a nic on the vm in the same vlan to receive the traffic? We have a limited number of IPs in the public network. Does it need to have an IP on the public network?
I’m sorry if I’m not understanding this correctly:)
f
Do we have any documentation on what kinds of network setups are supported? It seems quite advanced to set up something so simple.
r
The GH ticket 5486 was created to track the issue we have here. It would be great if you could help to provide more details of your use case on the ticket.

The current version of the Harvester cloud provider and load balancer requires the LB IP allocated from the configured IPPool to be routable on the VM Network where those VMs are attached. In your case, the IPPool has to be configured with a range of IPs on the VLAN 42 network.

However, it’s possible to create another IPPool with a range of IPs on the VLAN 62 network and allocate LB IPs for LB svcs on the guest cluster. It’s just that those LB IPs will be bound to the VM’s NIC, which is attached to VLAN 42. The LB IP is then not accessible from outside, because an IP that belongs to the subnet on VLAN 62 ends up sitting on the network attached to VLAN 42.

The “workaround” I suggested a few weeks ago is to have all the VMs of the guest cluster also attach to the VLAN 62 network. After that, configure kube-vip to listen on the NIC attached to the VLAN 62 network, instead of the default one (attached to the VLAN 42 network). In that case, the traffic from outside can be routed and reach the LB IP (now assigned to the NIC on VLAN 62).

The tricky part is that the return traffic still goes back via the NIC attached to the VLAN 42 network due to the default GW setting. This is the limitation I have observed so far.
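[Editor's note] The routability mismatch described above can be sanity-checked with Python's stdlib `ipaddress` module (a sketch; the VLAN 42 subnet is taken from the `ip a` output for eth0 earlier in the thread, and the LB IP from the first paste):

```python
import ipaddress

vlan42 = ipaddress.ip_network("172.30.11.0/24")  # VM Network subnet (eth0 earlier)
lb_ip = ipaddress.ip_address("193.180.173.23")   # LB IP from the VLAN 62 pool

# The LB IP gets bound to the NIC attached to VLAN 42, but it is not part of
# that subnet, so nothing on VLAN 42 will route or ARP toward it.
print(lb_ip in vlan42)  # False
```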
w
Ok thanks a lot for the clarification on the issue. As I interpret it - would it be easier if we use the management network in the guest cluster? If so, how can we choose the management network in Rancher?
r
That’s another limitation: you can’t choose what network to be the management network for the to-be-created guest cluster on the Rancher dashboard. IIRC, if you attach multiple networks to the VMs, the one with a default GW will be picked as the management network. So in your case, the management network will always be the VLAN 42 network.
w
This is something I had already understood (not at all clear from the documentation, btw), but the management network I can choose when creating VMs in harvester is not available to pick when creating clusters in rancher. Why is that?
r
Seems there is a misunderstanding with the term “management network”
w
And also, what type of route do I need to provide elsewhere if we want to use the default gw on 42 but run kube-vip on 62?
r
do you have screenshots that I can check with?
w
I can leave the management network selected in harvester but not in rancher.
I have tried to recreate it through default/mgmt but it’s not working
r
what type of route do I need to provide elsewhere if we want to use the default gw on 42 but run kube-vip on 62?
You’ll need to make sure packets can be routed to the VM’s VLAN62 NIC via network infrastructure
w
So I’m not sure how anyone is using the load balancer feature today..
r
That “management Network” cannot be used in Rancher Integration because it’s actually the pod network on Harvester cluster
it’s internal only to the Harvester cluster
w
Ok but can you explain how to set up an example that works for us to use the load balancer in rancher without any network routes elsewhere?
Meaning if we don’t need any other vlans etc
If we have external IPs in a pool - how can we assign them correctly so they can be used with the harvester cloud provider to the clusters?
The 62 network has 254 external IPs we want to use, but hopefully through the load balancer
r
Firstly, I’ll create a new VM with two networks attached: one default/customer and the other default/public. Then, I’ll assign IP addresses to the VM for the two NICs, respectively, and try to ping those two IP addresses from outside to see if that works.
w
We don’t want to assign these to nodes directly
Yes this works
r
Then it should work. The difference is that the LB IP is a /32 IP, but the ingress traffic should be routed to the VM without problem.
Do you have a live environment? We might need to check if the traffic arrives at the VM.
`ping` doesn’t work because there’s an iptables rule dropping that kind of packet.
w
So the vms in the guest cluster should have two nics: one in 42 (customers) and one in 62 (public); the default gw is in 42, and 62 shouldn’t have any ip or gw?
I have a live environment yes
r
yeah gw is on 42
might need to tcpdump on the 62's interface
w
The way we have tested it is with curl and an empty Nginx on the test vm
We can curl the ip from the physical harvester nodes but not from outside
r
is the ip in the range of 62?
w
Yes
r
great, then we have a pair of control and treatment groups.
i highly suggest capturing packet using tcpdump on the guest cluster VM
something like
tcpdump -ennvi eth1 port 80
to see if the traffic is actually coming into the node
w
Hm when I test the basic vm on 62 now I seem to be able to reach it from the harvester nodes but not from outside
Sorry, I had specified /28 instead of /24 in the fixed IP. Restarting.
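[Editor's note] Why the prefix-length typo matters: with /28 the host's on-link range may no longer contain the gateway. A sketch with Python's stdlib `ipaddress` module, reusing the VM address from earlier in the thread and assuming the VLAN 62 gateway is 193.180.173.1 (the actual gateway address is not shown in the thread):

```python
import ipaddress

gw = ipaddress.ip_address("193.180.173.1")  # assumed VLAN 62 gateway

wrong = ipaddress.ip_interface("193.180.173.32/28")  # on-link: .32 - .47
right = ipaddress.ip_interface("193.180.173.32/24")  # on-link: .0 - .255

# With /28 the gateway looks off-link, so the host cannot ARP for it
# and traffic toward the default route blackholes.
print(gw in wrong.network)  # False
print(gw in right.network)  # True
```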
r
one question: is the vlan 42 network an isolated network? i remember that it’s a private cidr, but can it access outside via NAT or something?
i ask because the return traffic will go through vlan 42 since the default gw is on it
w
after restart and changing the ip from /28 to /24 it’s still the same error
r
still cannot access the 62 ip on the VM from outside world?
w
when you say isolated network, what do you mean? it’s a standard internal ip range on a separate vlan; nothing outside can reach it
i can access the 62 from rancher nodes but not from outside no
Is it because they are on different nics?
r
when you say isolated network- what to you mean?
let me rephrase my question: when running a basic vm attached to the vlan42 network, can the vm access outside?
w
ah yes. 42 is the default way out to the internet from the VMs; they can access the internet for updating packages etc
the private network 1001 is isolated from internet
r
ok cool. then the return traffic shouldn’t be a problem
so where are we now? we cannot access the 62 ip on the basic vm attached to both 42 and 62 network, is that right?
i mean from outside
w
precisely. we can only access them from inside
r
yeah, that’s basically what the Active statuses indicated on the dashboard. It means both 42 and 62’s gw are reachable from harvester’s perspective.
what about creating a basic VM but this time attach it to only 62 network. and see if you can ping it from outside
w
Good idea- lets try it. But then we need to specify default gw on the network
r
yeah, we need to set 62's gw for the vm
w
It’s starting up now
It’s working
i can reach it from outside
but what does this mean?
i can not have all nodes on 62 with default gw on that network
r
ok so let me summarize here: • we cannot access the 62 ip on the vm from outside when its default gw is on 42 • however, we can access the 62 ip on the vm from outside when its default gw is on 62
w
yes
r
> i can not have all nodes on 62 with default gw on that network
yeah i know that
Since this is an asymmetric routing, things could go wrong with some devices. The ingress (from VM’s perspective) traffic is on 62's NIC, but the egress traffic leaves the VM through 42's NIC. To prove it is actually the case, we have to run tcpdump on the VM that attached to both networks.
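[Editor's note] The asymmetry described above can be modeled as a toy longest-prefix match over the two-NIC VM's routes (addresses are from the thread; the VLAN 42 default gateway 172.30.11.1 was mentioned earlier). This is an illustrative sketch, not how the kernel is implemented:

```python
import ipaddress

# Toy route table for the two-NIC VM: (prefix, egress interface).
routes = [
    (ipaddress.ip_network("172.30.11.0/24"), "eth0"),    # VLAN 42, on-link
    (ipaddress.ip_network("193.180.173.0/24"), "eth1"),  # VLAN 62, on-link
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),         # default via 172.30.11.1
]

def egress(dst: str) -> str:
    """Longest-prefix match: pick the most specific route containing dst."""
    dst_ip = ipaddress.ip_address(dst)
    _, iface = max(
        ((net, ifc) for net, ifc in routes if dst_ip in net),
        key=lambda r: r[0].prefixlen,
    )
    return iface

# A request from an external client comes in on eth1 (VLAN 62), but the
# reply only matches the default route, so it leaves via eth0 (VLAN 42).
print(egress("94.254.5.205"))  # eth0
```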
w
ok lets do that
This is what is being received on the node with two nics:
06:11:16.233475 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64279 > 193.180.173.9.80: Flags [P.], cksum 0xbe38 (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 357830547 ecr 1531826598], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*

06:11:17.395156 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64279 > 193.180.173.9.80: Flags [P.], cksum 0xb9af (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 357831708 ecr 1531826598], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*
r
does tcpdump listen on eth1 (vlan 62)?
w
yes
r
seems only capturing ingress packets (as we thought)
could you also listen on eth0? running a separate tcpdump command is fine. we want to see if the return traffic is there
w
ok
tcpdump -ennvi enp1s0 port 80
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
06:16:05.503367 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    193.180.173.9.80 > 94.254.5.205.64439: Flags [S.], cksum 0xd3b7 (incorrect -> 0x598f), seq 3133257661, ack 3845955479, win 65160, options [mss 1460,sackOK,TS val 1532117933 ecr 4188442700,nop,wscale 7], length 0
06:16:05.517028 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 58879, offset 0, flags [DF], proto TCP (6), length 52)
    193.180.173.9.80 > 94.254.5.205.64439: Flags [.], cksum 0xd3af (incorrect -> 0x847b), ack 78, win 509, options [nop,nop,TS val 1532117947 ecr 4188442717], length 0
193.180.173.9.80 > 94.254.5.205.64439: Flags [P.], cksum 0xd70a (incorrect -> 0x0800), seq 1:860, ack 78, win 509, options [nop,nop,TS val 1532117948 ecr 4188442717], length 859: HTTP, length: 859
	HTTP/1.1 200 OK
	Server: nginx/1.18.0 (Ubuntu)
	Date: Wed, 17 Apr 2024 06:16:05 GMT
	Content-Type: text/html
	Content-Length: 612
	Last-Modified: Wed, 17 Apr 2024 05:53:26 GMT
	Connection: keep-alive
	ETag: "661f63d6-264"
	Accept-Ranges: bytes

	<!DOCTYPE html>
	<html>
	<head>
	<title>Welcome to nginx!</title>
	<style>
r
cool, so nginx indeed replied
w
sure. and it works when I curl from the harvester physical nodes.
r
so we need to check which hop drops the response packet
w
great
r
maybe there’s a firewall that keeps track of all the connection states. it observes that the incoming response packet does not have a corresponding state and drops the packet. that could be the reason.
the last thing to check (from harvester’s perspective) is to tcpdump on the NIC of the physical node, see if that resp packet leaves harvester intact
need to find out which node the vm runs on
w
OK
i’ll check on the physical node where the vm runs
r
and then tcpdump on the uplink interface of the external clusternetwork
w
hm not sure how to know which it is..
r
navigate to the clusternetwork page on the dashboard
w
ok
tcpdump -ennvi mgmt-br port 80
tcpdump: listening on mgmt-br, link-type EN10MB (Ethernet), snapshot length 262144 bytes
06:32:32.024215 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 82: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [S], cksum 0x6781 (correct), seq 1252897823, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 3493801674 ecr 0,sackOK,eol], length 0
06:32:32.041166 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 70: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [.], cksum 0x6e11 (correct), ack 3760628793, win 2058, options [nop,nop,TS val 3493801696 ecr 1533104462], length 0
06:32:32.041169 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 147: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [P.], cksum 0xd39e (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 3493801696 ecr 1533104462], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*

06:32:32.186408 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 147: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [P.], cksum 0xd30d (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 3493801841 ecr 1533104462], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*
this ^ is from the physical node 7 where the vm is running
no egress traffic is captured
What could be the reason for the traffic to not arrive here when we know it is sent from the vm?
r
sorry i was on a call. back now
i remember the customers network is on the external clusternetwork, so it can’t be mgmt-br
You can follow the steps below to find the uplink in your environment
or you can just listen on external-bo
w
Ok 👌🏻 let me check
Ok, on external-bo I can see the packets, and it retries sending them a few times (but they aren’t received):
193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x7811), ack 78, win 509, options [nop,nop,TS val 1536724833 ecr 2047885249,nop,nop,sack 1 {1:78}], length 0
07:32:54.428081 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 929: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31472, offset 0, flags [DF], proto TCP (6), length 911)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [P.], cksum 0xd70a (incorrect -> 0xae0f), seq 1:860, ack 78, win 509, options [nop,nop,TS val 1536726860 ecr 2047885249], length 859: HTTP, length: 859
	HTTP/1.1 200 OK
	Server: nginx/1.18.0 (Ubuntu)
	Date: Wed, 17 Apr 2024 07:32:40 GMT
	Content-Type: text/html
	Content-Length: 612
	Last-Modified: Wed, 17 Apr 2024 05:53:26 GMT
	Connection: keep-alive
	ETag: "661f63d6-264"
	Accept-Ranges: bytes

	<!DOCTYPE html>
	<html>
	<head>
	<title>Welcome to nginx!</title>
	<style>
	    body {
	        width: 35em;
	        margin: 0 auto;
	        font-family: Tahoma, Verdana, Arial, sans-serif;
	    }
	</style>
	</head>
	<body>
	<h1>Welcome to nginx!</h1>
	<p>If you see this page, the nginx web server is successfully installed and
	working. Further configuration is required.</p>

	<p>For online documentation and support please refer to
	<a href="http://nginx.org/">nginx.org</a>.<br/>
	Commercial support is available at
	<a href="http://nginx.com/">nginx.com</a>.</p>

	<p><em>Thank you for using nginx.</em></p>
	</body>
	</html>
07:32:54.523337 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 82: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31473, offset 0, flags [DF], proto TCP (6), length 64)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x677e), ack 78, win 509, options [nop,nop,TS val 1536726955 ecr 2047887370,nop,nop,sack 1 {1:78}], length 0
07:32:56.644935 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 82: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31474, offset 0, flags [DF], proto TCP (6), length 64)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x56ec), ack 78, win 509, options [nop,nop,TS val 1536729076 ecr 2047889491,nop,nop,sack 1 {1:78}], length 0
Could this be the problem that we have another bonding mode than active-backup?
r
so it must be some devices dropping the packet on the returning path, between the harvester node and the source
w
OK. This is very helpful
r
Could this be the problem that we have another bonding mode than active-backup?
unlikely. because in a basic vm setup it works without issues
w
ok thanks
I’m not an expert in routing or networks (as you might have noticed 🙂), but so I understand when I should talk to my network admins: what could they test to verify the problem on their side? Routing tables? tcpdump on the router, etc.?
r
the first thing to check i would suggest is the gw of 42 network
i don’t know what your network infra looks like, but it’s very likely that the returning packets are dropped in the first hop after they leave harvester.
yeah, if your network admin could do tcpdump on the gw interface of the 42 network that would be helpful
and probably do the same on the default gw on that router to see if the packets are routed successfully