# harvester
w
this is an overview of our network structure. the external IPs in the IP pool are in an external network, and somehow that needs to communicate with the internal network. where do we best configure this?
r
Hi, @wooden-area-49191. What is the network to which the two RKE2 VMs are attached? Is it the Vlan-42 network? If that’s the case, and the LB IP pool is created based on the Vlan-62 network, the LB IP allocation will work, but the connectivity will not.
Is the left-hand-side RKE2 box a control-plane node of the guest cluster?
w
Yes- both RKE2 clusters have a control plane in the guest clusters. The IP Pool allocation works but the connectivity does not. Host unreachable.
f
The idea, as with most LBs, is that the LB has a leg on both VLANs and lets traffic in over IPv4 and IPv6 that way. The LB is the smart part.
w
I’m guessing there should be some form of setting in the load balancer that is required so it knows it’s supposed to send traffic from one network (external) to the vlan-42 network?
r
Do you see the LB IP configured on the RKE2 node’s interface? (need to ssh into the VM)
w
Great question- one sec
f
We can also simplify it with one physical nic with all vlans.
r
Harvester does not support VLAN trunk on VM Network yet
w
yes- the RKE2 node has the assigned IP on the eth0 interface.
inet 193.180.173.23/32 scope global eth0
       valid_lft forever preferred_lft forever
r
thanks, things got a bit clearer. So the LB IP is under VLAN 62 and the management IP is under VLAN 42, and they’re both bound to the same interface, eth0. Is that correct?
w
exactly
r
Does the VM have another interface, say eth1, which attaches to vlan62?
w
we have tried that also but it didn’t work.
r
you’ll need that vlan62 interface to work
f
Should we just set everything up on one nic, Christian, to make it simpler?
r
if you’re going to use only one nic, you can’t have multiple vlans
w
ok, let’s try this again @flat-librarian-14243
i’m adding another nic to the RKE2 cluster now with vlan62.
r
please find out the interface name, we’ll need that for the kube-vip daemonset
w
Ok! It’s still reconciling
Now it’s up. The second interface name is drumroll…. eth1
but now the lb ip is not assigned to either eth0 or eth1
the eth1 has a DHCP assigned address- this can be removed though
r
that won’t be necessary. please edit the kube-vip daemonset on the RKE2 cluster
w
ok
r
we need to add `eth1` to the `vip_interface` environment variable
by default, there’s no value for that env var
and kube-vip will use the interface which has the default gw configured. in your case, that will be eth0, so we need to specify the one we want to use
w
so vip_interface here should be eth1?
r
from
- name: svc_enable
  value: "true"
- name: vip_arp
  value: "true"
- name: vip_cidr
  value: "32"
- name: vip_interface
- name: vip_leaderelection
  value: "false"
to
- name: svc_enable
  value: "true"
- name: vip_arp
  value: "true"
- name: vip_cidr
  value: "32"
- name: vip_interface
  value: "eth1"
- name: vip_leaderelection
  value: "false"
and wait for the pod to roll out the new config
w
done!
r
great, is the LB type svc already there?
f
This will be so fun!
@red-king-19196, the LB handles IPv6 right?
We can easily set up a pool
r
It should be IPv4-only atm
f
Oh crap
That's really bad. 😞
Because the config says IP, not IPv4
w
time="2024-03-19T08:50:43Z" level=info msg="Starting kube-vip.io [v0.6.0]"
time="2024-03-19T08:50:43Z" level=info msg="namespace [kube-system], Mode: [ARP], Features(s): Control Plane:[false], Services:[true]"
time="2024-03-19T08:50:43Z" level=fatal msg="eth1 is not valid interface, reason: get eth1 failed, error: Link not found"
the daemonset crashes
should we set up all clusters with second interface?
r
how many nodes are there for the guest cluster?
w
or should we move the vlan-62 and vlan-42 to the mgmt network?
we can remove all of them, we have just restarted the whole cluster on version 1.3
r
only mgmt nodes need to attach to the additional network, i.e. vlan62
w
ok
makes sense
r
yeah because kube-vip pods are running on mgmt nodes only
so you need to make sure they have the same network config
w
just to make sure I’m not misunderstanding- we are talking about the guest clusters now- not the actual physical nodes right?
r
yes, the RKE2 VMs
w
ok thanks.
They are all reconciling now- should take a few minutes. The daemonset will retry until restarted right?
r
wait a second
did you modify the one on the harvester cluster?
sorry i should’ve made it clear: it is the one on the guest cluster we need to modify
w
ah ok makes more sense. i’ll revert
The main kube-vip in the Harvester cluster has now been restored. The kube-vip in the RKE2 guest cluster has been configured. I’m waiting for the reconciliation to be finished. We’ll keep our fingers crossed now 🙂
the ingress controller seems to be in a weird state
I tried setting up a new RKE2 cluster but got the same error with nginx
r
is there anything wrong mentioned in the ingress controller’s log ?
w
Yes. This is the error that prevents the pods from being created:
r
hmmm, this is weird. do you see any calico pod running on the guest cluster?
w
i’ve just confirmed- the host has a file called nodename at that location.
yes, the calico pods seem to be running fine
but the ingress controllers do not start
r
What’s your RKE2/Rancher/Harvester version?
w
Clusters: v1.27.11+rke2r1 Rancher: v2.8.2 Harvester: 1.3.0
Found it! I had forgotten the iptables addon, which is required for RKE2 clusters for some reason.
But the original problem still remains. It still can’t connect to the load balancer
The load balancer gets correctly updated with correct ip and the network is now available on the node. What is the next step to troubleshooting?
f
Let's see if we can start to reach them. I guess they don't answer ICMP?
BTW if the IP for the LB doesn't come from a DHCP server but an internal pool, where do I enter the gateway info?
r
Is the LB IP assigned to the interface attached to Vlan62?
BTW if the IP for the LB doesn’t come from a DHCP server but an internal pool, where do I enter the gateway info?
Gateway info can be configured when creating the LB IP pool.
Like this
w
Exactly- this is where we set the gateway and it’s set to the correct gateway in the network.
f
Exciting.
I will see if I have done something wrong.
r
Is the kube-vip daemonset (the one in the guest cluster) updated with the Vlan62 interface?
w
yes
i can feel that we are so close now. I will celebrate when this is done.
f
Agree, let's see what we can do, can we set up something that answers?
w
I’m now trying to create a completely new cluster just to make sure we have everything set up correctly.
Sorry. On the new cluster setup from scratch with kube-vip configured and iptables correctly in place - it still has problems connecting to the load balancer.
Any other ideas on how to move forward?
f
Do you have a service running on port 80, connected to the LB, so we can test and curl it internally?
r
from where do you try to connect to the load balancer? is it possible to `ping`/`curl` from a pod or vm that is on the vlan62 network?
f
We can connect from the router/switch directly on the same vlan, other VMs on the same vlan as well.
As you hear I am mostly worried that I screwed up networking 😉
r
So we can be sure that L2 is working fine
What’s the CNI used for the RKE2 guest cluster?
w
we are using calico as CNI. And yes, the service is running. It’s accessible from the cluster, and if we put a public IP address on the node and expose the ingress through the node we can reach it directly. The kube-vip receives the correct IP from the load balancer (provided by harvester cloud provider):
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 6e:6d:f6:b9:8f:4f brd ff:ff:ff:ff:ff:ff
    inet 172.30.11.185/24 brd 172.30.11.255 scope global dynamic noprefixroute eth0
       valid_lft 175sec preferred_lft 175sec
    inet6 fe80::6c6d:f6ff:feb9:8f4f/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether da:d0:eb:63:bf:11 brd ff:ff:ff:ff:ff:ff
    inet 193.180.173.32/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2a12:5bc2::a1aa:9364:d5f4:ad06/64 scope global dynamic noprefixroute
       valid_lft 2591938sec preferred_lft 604738sec
    inet6 fe80::ff93:6d84:53d6:ef84/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
We still can not find the root problem. The clusters have now been reset from scratch again
f
So eth1 can reach the router and other things on the network and on the internet. But we can't reach eth1 from the outside or on the same subnet. It should be on vlan62 and we can reach other things on vlan 62. Hovering over the 443/HTTPS that is in endpoints gives the external IP.
It all looks good.
w
What can we try to narrow it down? Anyone have an idea?
f
OK after adding:
ip route add 193.180.173.0/24 via 193.180.173.32 dev eth1
we can ping it from the outside, but we can't reach the services. But the default GW is 172.30.11.1 and the public IP there is on another /24 network. In my world the LB should have two legs: one with the public IP and a default GW, and one internal with just its internal network where it can reach the services. Or am I thinking wrong?
Also, the cool thing is that the machine gets an IPv6 address and that works great from the internet, but no services run on it.
Yes, it's a routing issue, I might need your guidance to see if we have designed it wrong.
w
We tried to make the problem a bit less complicated, so we created a vm from harvester and assigned a load balancer to it in harvester. The load balancer assigns an ip from the pool but no traffic arrives at the vm. How can we narrow it down further?
f
I am ready to redesign around whatever works well for Rancher; I just want to get this going when we are so close 🙂
@red-king-19196, I also see that nodes can get an internal and an external address. How does it know which is internal and which is external, RFC1918 addresses?
w
Has anyone any recommendations on how to configure routing in gateways in order for load balancers to work properly in harvester?
f
Good morning.
r
vm load-balancing is a bit different from k8s guest cluster load-balancing. besides that, an iptables rule for each load balancer IP drops all traffic other than on the specified ports, so simply pinging the load balancer IP will never work.
f
Thanks!
r
I did some experiments in my home lab, and it worked without any issues. But I’m unsure if I have a similar setup to yours. If you’d like to provide more information about your network infra configurations, I’d love to help look into it.
f
I am working to do a drawing to show how it's set up.
r
According to your previously provided configuration:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether 6e:6d:f6:b9:8f:4f brd ff:ff:ff:ff:ff:ff
    inet 172.30.11.185/24 brd 172.30.11.255 scope global dynamic noprefixroute eth0
       valid_lft 175sec preferred_lft 175sec
    inet6 fe80::6c6d:f6ff:feb9:8f4f/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP qlen 1000
    link/ether da:d0:eb:63:bf:11 brd ff:ff:ff:ff:ff:ff
    inet 193.180.173.32/32 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 2a12:5bc2::a1aa:9364:d5f4:ad06/64 scope global dynamic noprefixroute
       valid_lft 2591938sec preferred_lft 604738sec
    inet6 fe80::ff93:6d84:53d6:ef84/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
Is it possible to manually bind an IP address from Vlan62 to eth1? I assume 193.180.173.32 is within the subnet on Vlan62.
f
We can do that. I just wonder if you give an IP via a pool, how do you assign a route out?
r
the connected route should be fine.
f
My plan/thought is that vlan 62 in the LB is the default way out, and it has a leg in 42. For the pods we can just use vlan 42 and a default way out there via a GW. Is that OK?
r
Do you mean you’re going to set up two default gateways?
f
One on each vlan. That was my thought. But only use one default gateway per machine.
r
IIRC you can only set up one default route per routing table. If there are multiple default routes on a routing table, only one will be used.
f
I know that. There is only one per machine.
r
So, is the default route in the VM configured with the gateway IP in Vlan62?
f
If it’s a customer’s VM, yes.
If it’s a VM running a cluster, no. Only the load balancers should have that.
r
The load balancer is just a /32 IP address
f
How does it get a route out? It has two legs, one in each vlan to handle traffic and has the default GW created for vlan 62.
w
Both networks have the same machine as their default gw (but with different IP addresses). I think what is needed is a route between the networks on the GW, right?
f
Default GW for the LB must be via vlan 62, traffic in and traffic out the same way.
Zespre, what do you mean that the LB is a /32? It does use the subnet size described by the pool, right? Otherwise it can't reach the GW.
This is how the vlans are set up:
But the LB, is that a pod? I can't find the public interface attached to it.
I can see that the LB gets an IP but I am not sure what interface on what pod (or VM) it binds it to.
w
Where is the /32 ip supposed to be registered? On the LB pod, host machine or VM?
f
Zespre, is this your ticket? https://github.com/harvester/harvester/issues/5486 This might be the issue we have.
r
The allocated /32 LB IP addresses will be configured on the mgmt-br interface on one of the management nodes. The LB custom resources themselves are just denotations; they’re not actual entities that live on the system. What runs in the LB Pod is not the LB component that deals with traffic load balancing; it’s actually the LB controller, which reconciles the LB custom resources.
w
Ok, so we need a nic on the vm in the same vlan to receive the traffic? We have a limited number of IPs in the public network. Does it need to have an IP on the public network?
I’m sorry if I’m not understanding this correctly:)
f
Do we have any documentation on what kinds of network setups are supported? It seems quite advanced to set up something so simple.
r
The GH ticket 5486 was created to track the issue we have here. It would be great if you could help to provide more details of your use case on the ticket.

The current version of the Harvester cloud provider and load balancer requires the LB IP allocated from the configured IPPool to be routable on the VM Network where those VMs are attached. In your case, the IPPool has to be configured with a range of IPs on the VLAN 42 network.

However, it’s possible to create another IPPool with a range of IPs on the VLAN 62 network and allocate LB IPs for LB svcs on the guest cluster. It’s just that those LB IPs will be bound to the VM’s NIC, which is attached to VLAN 42. The LB IP is then not accessible from outside, because an IP that belongs to the subnet on VLAN 62 ends up sitting on the network attached to VLAN 42.

The “workaround” I suggested a few weeks ago is to have all the VMs of the guest cluster also attach to the VLAN 62 network. After that, configure kube-vip to listen on the NIC attached to the VLAN 62 network, instead of the default one (attached to the VLAN 42 network). In that case, the traffic from outside can be routed and reach the LB IP (now assigned to the NIC on VLAN 62).

The tricky part is that the return traffic still goes back via the NIC attached to the VLAN 42 network due to the default GW setting. This is the limitation I have observed so far.
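[Editor's note] The routability mismatch described above can be sanity-checked with Python's stdlib `ipaddress` module (a sketch; the VLAN 42 subnet is taken from the `ip a` output for eth0 earlier in the thread, and the LB IP from the first paste):

```python
import ipaddress

vlan42 = ipaddress.ip_network("172.30.11.0/24")  # VM Network subnet (eth0 earlier)
lb_ip = ipaddress.ip_address("193.180.173.23")   # LB IP from the VLAN 62 pool

# The LB IP gets bound to the NIC attached to VLAN 42, but it is not part of
# that subnet, so nothing on VLAN 42 will route or ARP toward it.
print(lb_ip in vlan42)  # False
```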
w
Ok thanks a lot for the clarification on the issue. As I interpret it - would it be easier if we use the management network in the guest cluster? If so, how can we choose the management network in Rancher?
r
That’s another limitation: you can’t choose what network to be the management network for the to-be-created guest cluster on the Rancher dashboard. IIRC, if you attach multiple networks to the VMs, the one with a default GW will be picked as the management network. So in your case, the management network will always be the VLAN 42 network.
w
This is something I had already understood (not at all clear from the documentation, btw), but the management network I can choose when creating VMs in harvester is not available to pick when creating clusters in rancher. Why is that?
r
Seems there is a misunderstanding with the term “management network”
w
And also, what type of route do I need to provide elsewhere if we want to use the default gw on 42 but run kube-vip on 62?
r
do you have screenshots that I can check with?
w
I can leave the management network selected in harvester but not in rancher.
I have tried to recreate it through default/mgmt but it’s not working
r
what type of route do I need to provide elsewhere if we want to use the default gw on 42 but run kube-vip on 62?
You’ll need to make sure packets can be routed to the VM’s VLAN62 NIC via network infrastructure
w
So I’m not sure how anyone is using the load balancer feature today..
r
That “management Network” cannot be used in Rancher Integration because it’s actually the pod network on Harvester cluster
it’s internal only to the Harvester cluster
w
Ok but can you explain how to set up an example that works for us to use the load balancer in rancher without any network routes elsewhere?
Meaning if we don’t need any other vlans etc
If we have external IPs in a pool - how can we assign them correctly so they can be used with the harvester cloud provider to the clusters?
The 62 network has 254 external IPs we want to use, but hopefully through the load balancer
r
Firstly, I’ll create a new VM with two networks attached: one default/customer and the other default/public. Then, I’ll assign IP addresses to the VM for the two NICs, respectively, and try to ping those two IP addresses from outside to see if that works.
w
We don’t want to assign these to nodes directly
Yes this works
r
Then it should work. The difference is that the LB IP is a /32 IP, but the ingress traffic should be routed to the VM without problem.
Do you have a live environment? We might need to check if the traffic arrives at the VM.
`ping` doesn’t work because there’s an iptables rule dropping that kind of packet.
w
So the vms in the guest cluster should have two nics: one in 42 (customers) and one in 62 (public); the default gw is in 42, and 62 shouldn’t have any ip or gw?
I have a live environment yes
r
yeah gw is on 42
might need to tcpdump on the 62's interface
w
The way we have tested it is with curl and an empty Nginx on the test vm
We can curl the ip from the physical harvester nodes but not from outside
r
is the ip in the range of 62?
w
Yes
r
great, then we have a pair of control and treatment groups.
i highly suggest capturing packet using tcpdump on the guest cluster VM
something like
tcpdump -ennvi eth1 port 80
to see if the traffic is actually coming into the node
w
Hm when I test the basic vm on 62 now I seem to be able to reach it from the harvester nodes but not from outside
Sorry, I had specified /28 instead of /24 in the fixed IP. Restarting.
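[Editor's note] Why the prefix-length typo matters: with /28 the host's on-link range may no longer contain the gateway. A sketch with Python's stdlib `ipaddress` module, reusing the VM address from earlier in the thread and assuming the VLAN 62 gateway is 193.180.173.1 (the actual gateway address is not shown in the thread):

```python
import ipaddress

gw = ipaddress.ip_address("193.180.173.1")  # assumed VLAN 62 gateway

wrong = ipaddress.ip_interface("193.180.173.32/28")  # on-link: .32 - .47
right = ipaddress.ip_interface("193.180.173.32/24")  # on-link: .0 - .255

# With /28 the gateway looks off-link, so the host cannot ARP for it
# and traffic toward the default route blackholes.
print(gw in wrong.network)  # False
print(gw in right.network)  # True
```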
r
one question: is the vlan 42 network an isolated network? i remember that it’s a private cidr, but can it access outside via NAT or something?
i ask because the return traffic will go through vlan 42 since the default gw is on it
w
after restart and changing the ip from /28 to /24 it’s still the same error
r
still cannot access the 62 ip on the VM from outside world?
w
when you say isolated network, what do you mean? it’s a standard internal ip range on a separate vlan; nothing outside can reach it
i can access the 62 from rancher nodes but not from outside no
Is it because they are on different nics?
r
when you say isolated network- what to you mean?
let me rephrase my question: when running a basic vm attached to the vlan42 network, can the vm access outside?
w
ah yes. 42 is the default way out to the internet from the VMs; they can access the internet for updating packages etc
the private network 1001 is isolated from internet
r
ok cool. then the return traffic shouldn’t be a problem
so where are we now? we cannot access the 62 ip on the basic vm attached to both 42 and 62 network, is that right?
i mean from outside
w
precisely. we can only access them from inside
r
yeah, that’s basically what the Active statuses indicated on the dashboard. It means both 42 and 62’s gw are reachable from harvester’s perspective.
what about creating a basic VM but this time attach it to only 62 network. and see if you can ping it from outside
w
Good idea- lets try it. But then we need to specify default gw on the network
r
yeah, we need to set 62's gw for the vm
w
It’s starting up now
It’s working
i can reach it from outside
but what does this mean?
i can not have all nodes on 62 with default gw on that network
r
ok so let me summarize here: • we cannot access the 62 ip on the vm from outside when its default gw is on 42 • however, we can access the 62 ip on the vm from outside when its default gw is on 62
w
yes
r
> i can not have all nodes on 62 with default gw on that network
yeah i know that
Since this is an asymmetric routing, things could go wrong with some devices. The ingress (from VM’s perspective) traffic is on 62's NIC, but the egress traffic leaves the VM through 42's NIC. To prove it is actually the case, we have to run tcpdump on the VM that attached to both networks.
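[Editor's note] The asymmetry described above can be modeled as a toy longest-prefix match over the two-NIC VM's routes (addresses are from the thread; the VLAN 42 default gateway 172.30.11.1 was mentioned earlier). This is an illustrative sketch, not how the kernel is implemented:

```python
import ipaddress

# Toy route table for the two-NIC VM: (prefix, egress interface).
routes = [
    (ipaddress.ip_network("172.30.11.0/24"), "eth0"),    # VLAN 42, on-link
    (ipaddress.ip_network("193.180.173.0/24"), "eth1"),  # VLAN 62, on-link
    (ipaddress.ip_network("0.0.0.0/0"), "eth0"),         # default via 172.30.11.1
]

def egress(dst: str) -> str:
    """Longest-prefix match: pick the most specific route containing dst."""
    dst_ip = ipaddress.ip_address(dst)
    _, iface = max(
        ((net, ifc) for net, ifc in routes if dst_ip in net),
        key=lambda r: r[0].prefixlen,
    )
    return iface

# A request from an external client comes in on eth1 (VLAN 62), but the
# reply only matches the default route, so it leaves via eth0 (VLAN 42).
print(egress("94.254.5.205"))  # eth0
```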
w
ok lets do that
This is what is being received on the node with two nics:
06:11:16.233475 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64279 > 193.180.173.9.80: Flags [P.], cksum 0xbe38 (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 357830547 ecr 1531826598], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*

06:11:17.395156 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype IPv4 (0x0800), length 143: (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64279 > 193.180.173.9.80: Flags [P.], cksum 0xb9af (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 357831708 ecr 1531826598], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*
r
does tcpdump listen on eth1 (vlan 62)?
w
yes
r
seems only capturing ingress packets (as we thought)
could you also listen on eth0? running a separate tcpdump command is fine. we want to see if the return traffic is there
w
ok
tcpdump -ennvi enp1s0 port 80
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
06:16:05.503367 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    193.180.173.9.80 > 94.254.5.205.64439: Flags [S.], cksum 0xd3b7 (incorrect -> 0x598f), seq 3133257661, ack 3845955479, win 65160, options [mss 1460,sackOK,TS val 1532117933 ecr 4188442700,nop,wscale 7], length 0
06:16:05.517028 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype IPv4 (0x0800), length 66: (tos 0x0, ttl 64, id 58879, offset 0, flags [DF], proto TCP (6), length 52)
    193.180.173.9.80 > 94.254.5.205.64439: Flags [.], cksum 0xd3af (incorrect -> 0x847b), ack 78, win 509, options [nop,nop,TS val 1532117947 ecr 4188442717], length 0
193.180.173.9.80 > 94.254.5.205.64439: Flags [P.], cksum 0xd70a (incorrect -> 0x0800), seq 1:860, ack 78, win 509, options [nop,nop,TS val 1532117948 ecr 4188442717], length 859: HTTP, length: 859
	HTTP/1.1 200 OK
	Server: nginx/1.18.0 (Ubuntu)
	Date: Wed, 17 Apr 2024 06:16:05 GMT
	Content-Type: text/html
	Content-Length: 612
	Last-Modified: Wed, 17 Apr 2024 05:53:26 GMT
	Connection: keep-alive
	ETag: "661f63d6-264"
	Accept-Ranges: bytes

	<!DOCTYPE html>
	<html>
	<head>
	<title>Welcome to nginx!</title>
	<style>
r
cool, so nginx indeed replied
w
sure. and it works when I curl from the harvester physical nodes.
r
so we need to check which hop drops the response packet
w
great
r
maybe there’s a firewall that keeps track of all the connection states. it observes that the incoming response packet does not have a corresponding state and drops the packet. that could be the reason.
the last thing to check (from harvester’s perspective) is to tcpdump on the NIC of the physical node, see if that resp packet leaves harvester intact
need to find out which node the vm runs on
w
OK
i’ll check on the physical node where the vm runs
r
and then tcpdump on the uplink interface of the external clusternetwork
w
hm not sure how to know which it is..
r
navigate to the clusternetwork page on the dashboard
w
ok
tcpdump -ennvi mgmt-br port 80
tcpdump: listening on mgmt-br, link-type EN10MB (Ethernet), snapshot length 262144 bytes
06:32:32.024215 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 82: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [S], cksum 0x6781 (correct), seq 1252897823, win 65535, options [mss 1460,nop,wscale 6,nop,nop,TS val 3493801674 ecr 0,sackOK,eol], length 0
06:32:32.041166 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 70: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [.], cksum 0x6e11 (correct), ack 3760628793, win 2058, options [nop,nop,TS val 3493801696 ecr 1533104462], length 0
06:32:32.041169 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 147: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [P.], cksum 0xd39e (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 3493801696 ecr 1533104462], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*

06:32:32.186408 74:83:ef:3f:33:15 > 12:d1:c0:a5:26:fe, ethertype 802.1Q (0x8100), length 147: vlan 62, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 53, id 0, offset 0, flags [DF], proto TCP (6), length 129)
    94.254.5.205.64970 > 193.180.173.9.80: Flags [P.], cksum 0xd30d (correct), seq 0:77, ack 1, win 2058, options [nop,nop,TS val 3493801841 ecr 1533104462], length 77: HTTP, length: 77
	GET / HTTP/1.1
	Host: 193.180.173.9
	User-Agent: curl/7.86.0
	Accept: */*
this ^ is from the physical node 7 where the vm is running
no egress traffic is captured
What could be the reason for the traffic to not arrive here when we know it is sent from the vm?
r
sorry i was on a call. back now
i remember the customers network is on the external clusternetwork, so it can’t be mgmt-br
You can follow the steps below to find the uplink in your environment
or you can just listen on external-bo
w
Ok 👌🏻 let me check
Ok, on external-bo I can see the packets, and it retries sending them a few times (but they aren’t received):
193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x7811), ack 78, win 509, options [nop,nop,TS val 1536724833 ecr 2047885249,nop,nop,sack 1 {1:78}], length 0
07:32:54.428081 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 929: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31472, offset 0, flags [DF], proto TCP (6), length 911)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [P.], cksum 0xd70a (incorrect -> 0xae0f), seq 1:860, ack 78, win 509, options [nop,nop,TS val 1536726860 ecr 2047885249], length 859: HTTP, length: 859
	HTTP/1.1 200 OK
	Server: nginx/1.18.0 (Ubuntu)
	Date: Wed, 17 Apr 2024 07:32:40 GMT
	Content-Type: text/html
	Content-Length: 612
	Last-Modified: Wed, 17 Apr 2024 05:53:26 GMT
	Connection: keep-alive
	ETag: "661f63d6-264"
	Accept-Ranges: bytes

	<!DOCTYPE html>
	<html>
	<head>
	<title>Welcome to nginx!</title>
	<style>
	    body {
	        width: 35em;
	        margin: 0 auto;
	        font-family: Tahoma, Verdana, Arial, sans-serif;
	    }
	</style>
	</head>
	<body>
	<h1>Welcome to nginx!</h1>
	<p>If you see this page, the nginx web server is successfully installed and
	working. Further configuration is required.</p>

	<p>For online documentation and support please refer to
	<a href="http://nginx.org/">nginx.org</a>.<br/>
	Commercial support is available at
	<a href="http://nginx.com/">nginx.com</a>.</p>

	<p><em>Thank you for using nginx.</em></p>
	</body>
	</html>
07:32:54.523337 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 82: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31473, offset 0, flags [DF], proto TCP (6), length 64)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x677e), ack 78, win 509, options [nop,nop,TS val 1536726955 ecr 2047887370,nop,nop,sack 1 {1:78}], length 0
07:32:56.644935 d2:c4:5e:68:25:bd > bc:24:11:e2:48:5f, ethertype 802.1Q (0x8100), length 82: vlan 42, p 0, ethertype IPv4 (0x0800), (tos 0x0, ttl 64, id 31474, offset 0, flags [DF], proto TCP (6), length 64)
    193.180.173.9.80 > 94.254.5.205.49633: Flags [.], cksum 0xd3bb (incorrect -> 0x56ec), ack 78, win 509, options [nop,nop,TS val 1536729076 ecr 2047889491,nop,nop,sack 1 {1:78}], length 0
Could this be the problem that we have another bonding mode than active-backup?
r
so it must be some devices dropping the packet on the returning path, between the harvester node and the source
w
OK. This is very helpful
r
Could this be the problem that we have another bonding mode than active-backup?
unlikely. because in a basic vm setup it works without issues
w
ok thanks
I’m not an expert in routing or networks (as you might have noticed 🙂), but so I understand when I should talk to my network admins: what could they test to verify the problem on their side? Routing tables? tcpdump on the router, etc.?
r
the first thing to check i would suggest is the gw of 42 network
i don’t know what your network infra looks like, but it’s very likely that the returning packets are dropped in the first hop after they leave harvester.
yeah, if your network admin could do tcpdump on the gw interface of the 42 network that would be helpful
and probably do the same on the default gw on that router to see if the packets are routed successfully