# k3s
a
a
I am installing it with the following options: shell: "curl -sfL https://get.k3s.io | K3S_TOKEN={{ k3s_token }} INSTALL_K3S_EXEC='server --cluster-init -i {{ internal_ip }} --flannel-iface {{ nic_name }}.4000 --node-external-ip {{ internal_ip }} --disable=servicelb --disable-network-policy' sh -"
where internal_ip is the internal IP address on VLAN 4000 (192.168.x.x). That is the same VLAN in which the LoadBalancer IPs are presented, in the 159.x.x.x range.
Is there anyone in this slack providing support?
m
You might try wireguard instead of vxlan for flannel.
a
Hi Scott. Yes, I tried that just out of desperation, and even host-gw mode; all of them do exactly the same thing.
m
I wonder if it's a routing issue. Can you curl from the same layer2 vlan?
"The pods are also accessible from inside the cluster private IP range via CURL from another pod." suggests that you can
In this case, it seems like those IPs are missing a route from the outside world
a
Yes, I can run curl on one of the hosts (192.168.x.x) and it reaches the pod just fine. Same from another pod in the cluster.
m
Sounds like you have layer2 connectivity but don't have the necessary routing for outside requests to find their way in
a
That's what it feels like to me. But where would that route come from? I can boot one of those hosts in recovery mode (light OS) and configure Netplan to directly assign the LB IP to the host, and it is then reachable across the Internet.
m
I don't have any experience with Hetzner at all, or with its BGP policy for bare metal.
a
so it feels like for some reason the IP-to-MAC mapping is never propagated beyond the local vSwitch
m
Right - that makes sense if you don't have BGP routing to it
So you're able to connect over layer2, but not from outside of it
a
I haven't actually tried BGP because they indicated it wouldn't work, I am only using L2, but I have a mind to try BGP
m
If you don't do BGP, MetalLB will only work over layer2
So that explains your problem
a
that makes sense
I was following this guide which implied it worked: https://mlohr.com/kubernetes-cluster-on-hetzner-bare-metal-servers/
I'll try doing a BGP advertisement and see what happens now.
m
You need to have BGP setup with your network ops team or provider and then do this: https://metallb.universe.tf/configuration/#bgp-configuration
Layer2 can work, by the way, if you expose everything through an external balancer that can talk on that subnet and is exposed somehow to the outside world so it acts as a proxy. That doesn't sound like what you're trying to architect, though
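For reference, a minimal sketch of what the MetalLB BGP setup linked above might look like with the CRD-based API (MetalLB v0.13+). The ASNs and peer address are placeholders, not values from this conversation; the pool reuses the public /29 that comes up later in the thread:
```
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: upstream-router
  namespace: metallb-system
spec:
  myASN: 64512          # ASN assigned to the cluster (placeholder)
  peerASN: 64513        # ASN of the provider/router side (placeholder)
  peerAddress: 10.0.0.1 # BGP-speaking router to peer with (placeholder)
---
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: public-pool
  namespace: metallb-system
spec:
  addresses:
    - 159.69.172.24/29
---
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: public-adv
  namespace: metallb-system
spec:
  ipAddressPools:
    - public-pool
```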
a
Hmm, it feels like if Hetzner tells me it won't work, I am going to struggle to get any AS information from them
m
But you can do what you're doing and proxy requests through something like haproxy, nginx, or a cloud provider load balancer
a
Yes, I was trying to avoid using their cloud LB because it is pretty limited in what it can do
m
Yup, they tend to be pretty limited. Perhaps you can install haproxy on that subnet and expose services through it.
a
I'm very familiar with the Azure LB, but it is really designed for fronting Azure VNets and not external networks
I am using HA Proxy in the cluster for some things, but the risk is I end up with a single point of failure still
m
Or connect a VPN to your internal network and add routing that way, but then it'll pass through your network to Hetzner's which adds extra bandwidth and latency and dependencies.
You can do haproxy HA with keepalived
You route traffic through the VIP shared between the haproxy nodes.
That's what we're doing with our Rancher stuff
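As a rough illustration of that haproxy + keepalived pattern, a minimal keepalived config on the two haproxy boxes might look something like this; the interface name, VIP, and password are placeholders, and haproxy itself runs identically on both nodes while keepalived just moves the VIP:
```
vrrp_instance haproxy_vip {
    state MASTER            # BACKUP on the second haproxy node
    interface eth0          # interface carrying the VIP (placeholder)
    virtual_router_id 51
    priority 101            # lower (e.g. 100) on the BACKUP node
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass changeme  # placeholder
    }
    virtual_ipaddress {
        203.0.113.10/24     # the shared VIP that DNS points at (placeholder)
    }
}
```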
a
I might have to do something like that VPN in the interim. I've been bashing my head for days on how the L2 IP-to-MAC mapping got propagated to the routers, and the answer is it isn't, without BGP ๐Ÿ™‚
m
I've done it all of the above ways. It's job security having to learn about all this stuff ๐Ÿ™‚
a
In your HA proxy config, are those running in the cluster exposed via a Daemonset and node ports or something?
m
If you're not a DevOps Engineer yet you will be by the time you get it all setup ๐Ÿ˜‚
a
My head is hurting from everything I've had to learn in the last week alone to move my fully functioning dApp off Azure k8s, Mongo Atlas and Vercel onto dedicated hosts
m
I generally use ClusterIP and then expose through Ingress. DNS is pointed at haproxy which routes the request back to the cluster. cert-manager to handle the automatic letsencrypt for everything.
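A hedged sketch of that ClusterIP + Ingress + cert-manager pattern, assuming the Traefik ingress controller that k3s ships by default and a ClusterIssuer named letsencrypt-prod; the hostname and service name are placeholders:
```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod   # assumes a ClusterIssuer with this name exists
spec:
  ingressClassName: traefik
  tls:
    - hosts:
        - app.example.com
      secretName: my-app-tls          # cert-manager creates/renews this secret
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app          # a plain ClusterIP service
                port:
                  number: 80
```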
a
I started at Ingress, but moved to Services just to rule out Traefik. Will move back now that I know the issue
m
Since you are using MetalLB, you can use LoadBalancer-type services, and you'll need to add config in haproxy to route traffic to the MetalLB IP however you want to do that. But to do host-based routing you'll need to do layer7, so you'll want to have certbot or something like that on haproxy.
It's less work to not use MetalLB and use ClusterIP instead this way, since the ingress controller already handles the name based routing for you.
MetalLB is helpful if you have BGP routing and/or if you want to do something like hyperconverged or kubevirt with Multus and do full on VMs inside your k8s.
MetalLB is used in OpenStack+Kubernetes, for example
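For the MetalLB option above, a minimal haproxy fragment doing a plain layer-4 (TCP) pass-through to the LoadBalancer IP might look like this; the MetalLB IP here is a placeholder, and a real config also needs global/defaults sections:
```
frontend https_in
    bind *:443
    mode tcp
    option tcplog
    default_backend metallb_https

backend metallb_https
    mode tcp
    # forward straight to the MetalLB-assigned LoadBalancer IP;
    # TLS and host-based routing stay with the in-cluster ingress
    server lb1 159.69.172.27:443 check
```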
a
I started at ClusterIP before discovering MetalLB, but my issue there was I still needed an external load balancer to direct the traffic at the different instances on the nodes
m
If you don't want to do an external load balancer you can also set DNS A records to the nodes running your ingress handlers with a low TTL, but you risk disruptions when nodes go down since there's no active health checking that way
a
I started with that DNS round-robin solution too, but DNS is not fault-tolerance aware. If a node goes down, it will still keep sending the bad IP to clients
m
Absolutely correct. That's the problem that load balancers solve.
a
Yeah, exactly... so in your scenario, where do you normally sit your haproxy? On the same cluster, or an additional box?
m
Outside of the cluster somewhere that can talk to both the cluster and can listen publicly. The cluster doesn't need to be able to listen publicly - it just needs to be able to talk to haproxy and other cluster nodes.
a
Gotcha. Just had a thought: why does the route from an external client to my cluster work if I boot a host into recovery mode and create an 02-netplan.yaml with an IP address from any of my L2 IP ranges? What is advertising the route there? I am not setting up BGP
m
One thing to note is that if you don't use something like MetalLB, you'll need to set service type to ClusterIP if they're set to LoadBalancer in helm charts, or else they'll be stuck in pending forever.
โœ… 1
a
That would suggest the vSwitch feature in Hetzner is already doing the advertisement and presenting those interfaces to my hosts: https://docs.hetzner.com/robot/dedicated-server/network/vswitch
m
I'm guessing the node external IP has a route but your BGP address pool doesn't
The k3s default loadbalancer uses the node externalIP by default, which is why the DNS trick works
a
No, I can get to any of the 159.x.x.x addresses if I assign them as IPs in Netplan on a NIC
it is only when I let MetalLB assign the IP to a MAC dynamically that it doesn't work
m
Yeah, unless you do BGP, MetalLB will only work over Layer2.
a
at least I can ping that IP or traceroute to it, and see it reaches my server's NIC
m
Layer2 is link-local
You can do Layer2 to an external load balancer and expose things that way or setup BGP to get external routes in
a
I wonder if Ubuntu is then doing something different when you stick it into Netplan
m
BGP and layer2 are network-level stuff
Are you familiar with OSI model?
haproxy can run at layer 4 or 7. MetalLB without BGP runs at layer2.
layer4 = tcp/udp mode. layer7 = http/https mode. (in haproxy speak)
a
Yes, learned the OSI model about 20 years ago ๐Ÿ™‚
m
Yup - these are old tricks and they're still relevant ๐Ÿ™‚
Containers are also similar to bsd jail chroots
a
Definitely. The mystery for me is why it works in Netplan without K8s. Which suggests the route is advertised already by the vSwitch
m
Yeah, when you get into Hetzner, you've left my wheelhouse
a
their prices can't be beaten, really... so worth the pain
a
64GB RAM, 8 cores, 512GB NVMe dedicated servers which are very fast, for about 40 euros
m
If Hetzner is like AWS, you probably have a route section where you can define routes for your cloud networking to a gateway
a
No, you just order a subnet range for your existing VNET (vSwitch), and it is mapped via the primary interface of the server. So no routing config is accessible to the user, but it does work
m
"You can use any private IP addresses for free within the VLAN. Plus, you can order additional public subnets (IPv4 and IPv6) by going to the
IPs
menu tab."
Is this IP pool that you used public or private?
(Even if it is public, you still need a route)
a
Yep... so my 192.168.x.x range is in VLAN 4000, which I have just used. The IPs menu is where I have ordered a routable subnet in the 159.x.x.x range, which is mapped to all servers in the vSwitch
m
192.168.x.x is not public
a
no.. that is my private range... just for host to host comms for k3s
159.x.x.x is the LB public range
m
"*Public subnet* You need to create an additional routing table for the public subnet so you can configure another default gateway." Ref. https://docs.hetzner.com/robot/dedicated-server/network/vswitch/#traffic
That's interesting that it has you set the route table yourself and isn't providing it over DHCP.
a
I believe I am doing that in Netplan like this:
network:
  version: 2
  ethernets:
    enp41s0:
      dhcp4: no
  vlans:
    enp41s0.4000:
      id: 4000
      link: enp41s0
      mtu: 1400
      addresses:
        - 192.168.100.2/24
      routes:
        - to: 0.0.0.0/0
          via: 159.69.172.25
          table: 1
          on-link: true
      routing-policy:
        - from: 159.69.172.24/29
          to: 10.43.0.0/16
          table: 254
          priority: 0
        - from: 159.69.172.24/29
          to: 10.42.0.0/16
          table: 254
          priority: 0
        - from: 159.69.172.24/29
          table: 1
          priority: 10
        - to: 159.69.172.24/29
          table: 1
          priority: 10
m
Should be able to check with
route -n
and
ip r
a
I am at the limits of my Linux knowledge here. Glad I have you for help. This is route -n:
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         65.109.113.129  0.0.0.0         UG    0      0        0 enp41s0
10.42.0.0       10.42.0.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.0.0       0.0.0.0         255.255.0.0     U     0      0        0 flannel-wg
10.42.1.0       10.42.1.0       255.255.255.0   UG    0      0        0 flannel.1
10.42.2.0       0.0.0.0         255.255.255.0   U     0      0        0 cni0
159.69.172.0    0.0.0.0         255.255.255.0   U     0      0        0 enp41s0.4000
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.100.0   0.0.0.0         255.255.255.0   U     0      0        0 enp41s0.4000
m
I'm more used to RHEL and SUSE, so not very familiar with netplan
a
so shouldn't I be seeing a 0.0.0.0 via 159.69.172.25 here (gw)?
m
159.69.172.0   0.0.0.0        255.255.255.0  U    0     0       0 enp41s0.4000
What's your
ip r
look like
a
Odd. I am still seeing a flannel-wg route here, even though I disabled that again and went back to vxlan
default via 65.109.113.129 dev enp41s0 proto static onlink
10.42.0.0/24 via 10.42.0.0 dev flannel.1 onlink
10.42.0.0/16 dev flannel-wg scope link
10.42.1.0/24 via 10.42.1.0 dev flannel.1 onlink
10.42.2.0/24 dev cni0 proto kernel scope link src 10.42.2.1
159.69.172.0/24 dev enp41s0.4000 proto kernel scope link src 159.69.172.26
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.100.0/24 dev enp41s0.4000 proto kernel scope link src 192.168.100.2
m
no metrics?
a
that is all ip r comes back with for me
m
scope link on 159.69.172.0/24 confirms layer2 routing
a
that's very interesting. so that means it's an L2 link-local address only
m
So, that's as far as the host is concerned
a
going to reboot this server into recovery mode and see if it still has the working config, and try and run those commands there
m
Looks like you need to define the gateway for it if you want it to be publicly routable - https://docs.hetzner.com/robot/dedicated-server/network/vswitch/#server-configuration-linux
It looks like you followed the netplan docs but the netplan docs don't setup the gateway in the example
โœ… 1
a
Thanks for all your help, Scott. That has been incredibly helpful for me. I'm going to put it into recovery mode and see if the working config is there. If it isn't, I'll follow the Hetzner guide to set up routing to it, then run the route and ip r commands to check the output. From there I might be able to work out how to set up Netplan, though my problem may be that if I want the IPs to float, I can't assign them to any single host
๐Ÿ‘ 1
m
Good luck!
a
That sounds like my problem then. I wasn't fully familiar enough with the ip command to know I was missing anything.
Really appreciate it. I am so much further along now
๐Ÿ‘ 1
Odd. In recovery mode I followed the Hetzner docs, using the ip commands to set up the VLAN and the extra public IP. On the first ping from my remote laptop it responded, then immediately went to request timeout. Perhaps one of the other nodes got upset, or the network detected a spanning-tree loop and blocked it
m
Interesting. Sounds like progress, though. Does it still work over layer2?
It's possible ICMP isn't supported - you might try curl
Or you might try
traceroute -T
a
Hi, sorry, needed sleep ๐Ÿ™‚
It is still working via L2 from the other hosts, but not via L3. And it is working properly now remotely; it must have taken time to propagate
Just got to work out how to get that routing config into Netplan now, without hard coding the IP to a single host
Hey @miniature-salesclerk-33951 So I have had to add the IP to the NICs in Netplan, plus the routes, and now I have identical output from ip route and route -n on both a standalone host in recovery mode that is pingable, and a k3s node that is also pingable now and that I can traceroute to.
๐ŸŽ‰ 1
However, the traffic still does not get to the pod when called externally. I just get a connection timed out unless I access it from one of the nodes within the local network. This is pointing more and more at a CNI issue, as suspected by the MetalLB guys I spoke to.
m
Is it possible the mtu isn't set correctly? It might explain why ICMP gets through but not larger packets.
a
I have it set to 1400, but there is a note in that Hetzner article about Netplan not setting it. Though that article is talking about a much older version of Netplan
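One quick way to sanity-check whether 1400 really is the effective MTU on that path (assumes Linux iputils ping; 1372 bytes of payload plus 28 bytes of IP/ICMP headers equals 1400):
```
# Do-not-fragment pings against the public IP from earlier in the thread
ping -M do -s 1372 159.69.172.26   # should succeed if the path MTU is 1400
ping -M do -s 1373 159.69.172.26   # should fail with "message too long" if 1400 is the limit
```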
When I statically assign the IP to the hosts using Netplan or the ip commands, it does respond to pings, and if you do a traceroute you can see it goes to the Internet interface on VLAN 4000 as it should:
 1  192.168.1.1 (192.168.1.1)  11.420 ms  4.466 ms  4.265 ms
 2  185.232.119.252 (185.232.119.252)  19.061 ms  16.024 ms  15.818 ms
 3  185.232.119.144 (185.232.119.144)  19.660 ms  15.404 ms  185.232.119.146 (185.232.119.146)  15.642 ms
 4  185.232.119.128 (185.232.119.128)  17.313 ms  185.232.119.130 (185.232.119.130)  17.298 ms  185.232.119.128 (185.232.119.128)  16.695 ms
 5  195.66.227.209 (195.66.227.209)  18.449 ms  15.226 ms  26.395 ms
 6  core6.par.hetzner.com (213.239.252.169)  24.291 ms  25.731 ms  25.546 ms
 7  213-239-245-217.clients.your-server.de (213.239.245.217)  41.778 ms  core11.nbg1.hetzner.com (213.239.252.173)  36.272 ms  36.181 ms
 8  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.186)  37.504 ms  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.62)  33.721 ms  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.186)  36.834 ms
 9  static.26.172.69.159.clients.your-server.de (159.69.172.26)  61.126 ms  57.870 ms  56.397 ms
If, however, I do a traceroute to an IP managed by MetalLB, then it hits the frontend interface on 65.x (but doesn't respond to pings, which I think is right for MetalLB):
 1  192.168.1.1 (192.168.1.1)  6.053 ms  4.201 ms  5.441 ms
 2  185.232.119.252 (185.232.119.252)  17.600 ms  15.897 ms  16.190 ms
 3  185.232.119.144 (185.232.119.144)  17.489 ms  15.798 ms  185.232.119.146 (185.232.119.146)  14.850 ms
 4  185.232.119.130 (185.232.119.130)  59.486 ms  14.223 ms  12.832 ms
 5  195.66.227.209 (195.66.227.209)  14.902 ms  14.471 ms  16.598 ms
 6  core6.par.hetzner.com (213.239.252.169)  23.860 ms  23.795 ms  22.142 ms
 7  core11.nbg1.hetzner.com (213.239.252.173)  38.614 ms  core12.nbg1.hetzner.com (213.239.252.253)  36.700 ms  35.750 ms
 8  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.186)  38.128 ms  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.62)  39.776 ms  vswitchgw.juniper2.dc1.nbg1.hetzner.com (213.239.245.186)  38.918 ms
 9  static.48.13.108.65.clients.your-server.de (65.108.13.48)  62.170 ms  56.627 ms  55.951 ms
It also looks like statically assigning the IP to the NIC, which is required to populate the default route, prevents that IP from being allocated by MetalLB. It skipped it and started at .27 instead
Plus, all of my networking is running on VLAN 4000 on the 192.168.x.x addresses. That includes complex apps like Mongo, Kafka, ArgoCD, etc., all working just fine without errors. It is only this ingress that has issues
m
That makes sense - you'd have an IP conflict.
It might be something where you need to set the gateway in your MetalLB config for the ip address pool
a
There isn't a gateway option in MetalLB, is there? I can't see anything like that in the IPAddressPool or L2Advertisement CRDs
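For reference, the L2 side of MetalLB is configured with just these two objects, and neither CRD exposes a gateway field; the pool name is a placeholder and the pool itself has the same IPAddressPool shape shown earlier in the thread:
```
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: public-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - public-pool       # references an IPAddressPool holding the 159.x.x.x range
```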
m
Looking at the docs, if there's an externalTrafficPolicy on the service, that could be a factor, too
I ran into this recently on JupyterHub helm deployments
a
I have externalTrafficPolicy set to Cluster. My understanding is all that controls is how the CNI (flannel) routes the traffic, either to the same node or to any node
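For context, a hedged sketch of the Service shape being discussed (name, selector, and ports are placeholders). With Local, only nodes that host a ready endpoint answer for the LB IP and the client source IP is preserved; with Cluster, any node can accept the traffic and forward it on:
```
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  type: LoadBalancer
  externalTrafficPolicy: Cluster   # or Local
  selector:
    app: my-app
  ports:
    - port: 443
      targetPort: 8443
```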
m
Yeah, I was confusing that with egress policy, which is different
a
Trying it set to Local, though I don't think it will help
Yep, that makes no difference. It does get to the node though, so it isn't an external routing problem, which also suggests MetalLB is fine. I can see the packets being received with tcpdump, but they're just not forwarded to the pods
Something very odd: using mtr with --tcp from my laptop to any service port I am exposing via the LB shows the packets stop at the vSwitch network device (Juniper) just before my server. If I run the same test against a port that the LB is not listening on, then it also shows the host's reverse DNS name and IP as the last hop. I can change the service port while it is running, and as soon as I do, the packet will then get through. Doesn't matter what the port is, same result. It is like the host just swallows the packet if it is being listened for, without responding back to mtr
m
that's weird
Oh!
"Since wireguard is a Layer3 vpn, almost all load-balancers will not work, this includes kube-vip and metalLB." - https://www.reddit.com/r/selfhosted/comments/mu6et4/has_anyone_setup_k3s_over_wireguard_is_it_possible/ So you might try going back to vxlan
a
Thanks. I am using VXLAN already. I only switched to wireguard briefly a few days ago to rule out vxlan issues ๐Ÿ˜ž
Dumb question time (I'm resorting to those now). What is the difference between agents and servers? I only have 3 nodes and want HA for things like etcd, so I need 3x servers, right, and no agents?
m
Generally in Kubernetes of any flavor, you have three control-plane nodes that also run etcd. You have worker nodes outside of that that don't run etcd, which you schedule workloads on. k3s is a little special in that it can run on just one node. It is possible to have your control-plane nodes schedule workloads if all you have is three machines to work with.
So there is a minimum of 3 nodes necessary to do HA over etcd
If you keep those 3 servers schedulable, then they are also functioning as worker nodes. Not ideal for larger production, but probably OK for a quick prototype or home-lab situation.
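To make the server/agent distinction concrete, hedged install sketches in the same style as the installer command earlier in the thread; the token and server address are placeholders:
```
# First server: starts the cluster with embedded etcd
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --cluster-init

# Additional servers: join the etcd cluster (3 servers total gives etcd HA)
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server --server https://<first-server>:6443

# A worker-only agent, if you ever add one: no etcd, no control plane
curl -sfL https://get.k3s.io | K3S_URL=https://<first-server>:6443 K3S_TOKEN=<token> sh -
```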
a
Thanks for the confirmation, that is what I expected. These are monster nodes with a lot of hardware at their disposal to run a single app, so it sounds ideal for my scenario ๐Ÿ˜‰
๐Ÿ™‚ 1
I'm still puzzled by the fact I get back more responses than I am sending. This could be corrupting the transmission for more complex protocols like HTTP.
ARPING 159.69.172.28 from 192.168.100.2 enp41s0.4000
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  2.768ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  0.878ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  4.089ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  0.888ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  13.128ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  0.930ms
Unicast reply from 159.69.172.28 [A8:A1:59:94:10:28]  5.925ms
Sent 4 probes (1 broadcast(s))
Received 7 response(s)
I just deleted the advertisement, rebooted my nodes, and recreated the L2 advertisement. Now I can see double the responses on two of the three nodes, and on the third node, which actually has the IP, it isn't resolving
m
I think that makes sense. The assigned IP should listen on any nodes which have a MetalLB controller. It's possible that the metallb controllers were currently in a deployment rollout and the old pods hadn't been pruned off yet on two of them?
a
I completely rebuilt the OS and redeployed, this time using Calico as the networking stack, to rule out Flannel, which the networking guys were convinced was the problem. Almost the same issue, but a little more interesting. I added a new host into the mix and repurposed one of the others. Before, each of the three hosts was in the same location, but each in a different data centre. The new three hosts are spread across only 2 data centres. The two hosts in the same DC can arping the LoadBalancer IP. The other host, in a different DC, cannot. This vSwitch is designed to allow you to connect all your servers together in the same VLAN, even in different countries. But MetalLB is not propagating any MAC address changes beyond the local DC switch, which is probably right. I've no idea how this can get resolved. I think I can request co-location, which would introduce a single point of failure, and may still not advertise the IP beyond the switch.
Oh, this is actually a little more complex. The two hosts that can arping the address are in the same DC, but they are NOT in the same DC as where the address resolves to. The host that cannot arping the address is the one hosting it. Also, the other host that is not in the cluster, but is in the same VLAN and a 3rd DC, also cannot arping them
m
I'm curious why you replaced flannel with Calico and not Cilium? With Cilium, you can remove kube-proxy from the equation, which might help.
There's a guide for k3s + cilium + metallb https://cilium.io/blog/2020/04/29/cilium-with-rancher-labs-k3s/
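A hedged sketch of that k3s + Cilium (kube-proxy-free) combination, based on current k3s and cilium CLI flags rather than the exact commands in the linked guide, so worth verifying against the versions you deploy:
```
# Start k3s without flannel, kube-proxy, or the built-in service LB
curl -sfL https://get.k3s.io | K3S_TOKEN=<token> sh -s - server \
  --cluster-init \
  --flannel-backend=none \
  --disable-network-policy \
  --disable-kube-proxy \
  --disable=servicelb

# Then install Cilium with kube-proxy replacement enabled
cilium install \
  --set kubeProxyReplacement=true \
  --set k8sServiceHost=<first-server-ip> \
  --set k8sServicePort=6443
```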
a
I'm up for anything, so I will try that. I only did it to rule out Flannel. Is Flannel also using kube-proxy? Thanks for the guide. This is pretty painless now as it is all scripted via Ansible; it takes about 15 mins to have all servers built and all apps and infra deployed. If it actually worked, that would be a bonus ๐Ÿ™‚
Oh, it uses Vagrant. Lol, something else to learn, why not ๐Ÿ™‚
Vagrant is only used to install k3s by the looks of it, so I can probably ignore that and use Ansible. Should be pretty painless to switch over
I've just noticed something in the mtr from my laptop: static.48.13.108.65.clients.your-server.de. That last hop it reaches is the correct host that has that MAC. Is the MAC address for the frontend IP (65.x.x.x) going to be the same as for the LoadBalancer IP being advertised? They are on the same NIC
Holy sh*t. It works using Cilium as the network stack. Maybe kube-proxy was the issue after all.
๐ŸŽ‰ 1
Thanks for that suggestion!