# k3s
a
This message was deleted.
c
because with ipv6 a /48 would be huuuuuuuuuge
and the controller-manager has to store a bitmask of all the allocated IPs in that range. With something larger than a /108 the bitmask is like, several megs of memory.
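For scale, here is a back-of-the-envelope sketch in Python of that bitmap cost (one bit per allocatable IP in the service range; this is illustrative only, not the actual controller-manager implementation):

```python
# Back-of-the-envelope for a bitmap allocator: one bit per allocatable
# address in the service range (illustrative only, not the real k8s code).
def bitmap_bytes(prefix_len: int, addr_bits: int = 128) -> int:
    addresses = 2 ** (addr_bits - prefix_len)
    return addresses // 8

print(bitmap_bytes(108))  # 131072 bytes  (~128 KiB for 2**20 addresses)
print(bitmap_bytes(104))  # 2097152 bytes (~2 MiB, "several megs" territory)
print(bitmap_bytes(48))   # ~1.5e23 bytes for 2**80 addresses -- not feasible
```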
c
It's really not all that huge if you follow the actual specifications... It's also the smallest range you can reliably announce over BGP...
And the issue with several megs of memory... in 2022... is what?
c
You know how big IPv6 is right
c
Yes...
c
a /16 with IPv4 is 2^16 = 65,536 four-byte addresses. A /48 with IPv6 is 2^80 = 1,208,925,819,614,629,174,706,176 128-bit addresses
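Those figures are easy to double-check with Python's ipaddress module (the prefixes below are arbitrary examples):

```python
import ipaddress

print(ipaddress.ip_network("10.0.0.0/16").num_addresses)    # 65536 = 2**16
print(ipaddress.ip_network("2001:db8::/48").num_addresses)  # 1208925819614629174706176 = 2**80
```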
c
That's completely ignoring everything about the ipv6 specification on subnet allocations...
c
Why would you need 2^80 services in your cluster
This isn’t even something that gets routed outside the cluster, it is literally just for the addresses assigned for ClusterIP services
and the IPAM needs to keep track of every single one of those allocations
c
Each L2 segment, as an example, should have a unique /64. Because of how Kubernetes works, that for the most part means each pod should have a /64 to itself...
c
You’re talking about pod cidrs, but you asked about the service-cluster-ip-range which is something else entirely
service-cluster-ip-range controls the CIDR block allocated for ClusterIP services.
c
And yes, that's for ClusterIP services... which every other service type builds on... So even if you do a LoadBalancer type, it will still have a ClusterIP
c
Yep
c
So it's NOT just ClusterIP services... It's ALL services...
c
Are you really going to have 1.2 million billion billion ClusterIPs?
And I’m not sure what BGP announcements have to do with it, like I said these aren’t routed outside the cluster
they just exist as the target of KubeProxy rules
And either way arguing with me won’t help, the Kubernetes authors decided it would be ludicrous to ever need more than 2^20 ClusterIPs
c
Again... If you follow the specifications, each of them would have a /64 of their own because they're different L2s. So it's not billions. A full /48 is 64k /64s total...
c
No, they don’t
Each node does not allocate out of a range of ClusterIPs
You are confusing that with pod IPs
c
And they are routed outside if you use things like Calico as the CNI
c
ClusterIP services are centrally allocated out of the controller manager
You are again confusing that with pod IPs
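To illustrate what "centrally allocated" means here, a toy sketch (nothing like the real allocator's code; the range below is just k3s's default IPv4 service CIDR): every Service, whatever its type, gets one ClusterIP out of a single shared range, and the allocator has to remember each one.

```python
import ipaddress

class ToyServiceIPAllocator:
    """Toy model of central ClusterIP allocation; not the real implementation."""

    def __init__(self, service_cidr: str):
        self.network = ipaddress.ip_network(service_cidr)
        self.allocated = set()  # has to track every ClusterIP ever handed out

    def allocate(self) -> str:
        for ip in self.network.hosts():
            if ip not in self.allocated:
                self.allocated.add(ip)
                return str(ip)
        raise RuntimeError("service-cluster-ip-range exhausted")

alloc = ToyServiceIPAllocator("10.43.0.0/16")  # k3s's default service range
print(alloc.allocate())  # 10.43.0.1 -- e.g. for a ClusterIP service
print(alloc.allocate())  # 10.43.0.2 -- e.g. for a LoadBalancer service (it still gets one)
```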
c
Again, you're ignoring that ClusterIP is used for ALL services, regardless of whether they're ClusterIP or LoadBalancer or whatever.
c
How am I?
They are not allocated per node, they are not routed outside the cluster
Only pod IP CIDRs are sub-allocated for nodes
c
You keep pointing to ClusterIP services... It's all services, not just ClusterIP services, because all services have a ClusterIP and this limitation affects all of them.
c
If you want to go argue with the Kubernetes maintainers why you need to have more than 1048576 services in a cluster you are welcome to do so
c
What I want is for the specs to be followed and for the assigned range to be announceable.
But so what you're saying is that it's an upstream limitation that would break compatibility?
Or upstream is the wrong word here since it's not a fork, but still
c
It is a limitation enforced by the Kubernetes controller manager itself. The service IPAM will refuse to start if you try to get it to track an ipv6 CIDR larger than /108
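Roughly the shape of the check being described, sketched in Python (the real validation lives in the upstream Kubernetes code, but the effect is: no more than 2^20 addresses, which for IPv6 means nothing wider than a /108):

```python
import ipaddress

MAX_HOST_BITS = 20  # upstream tracks at most 2**20 service IPs

def validate_service_cidr(cidr: str) -> None:
    """Sketch of the size limit described above; not the actual upstream code."""
    net = ipaddress.ip_network(cidr)
    host_bits = net.max_prefixlen - net.prefixlen
    if host_bits > MAX_HOST_BITS:
        raise ValueError(
            f"{cidr} spans 2**{host_bits} addresses; "
            f"the service IPAM only handles up to 2**{MAX_HOST_BITS}"
        )

validate_service_cidr("fd00:abcd::/108")  # OK: exactly 2**20 addresses
validate_service_cidr("fd00:abcd::/48")   # raises: 2**80 addresses
```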
c
Right, but doesn't k3s have its own controller manager?
c
No, we just run the Kubernetes apiserver, controller-manager, scheduler, and so on
We run other things in addition to that, but we also use all of the core upstream stuff
c
Ah right. So then yea it's them that I need to ask I guess.
c
As I said earlier, announcing the service CIDR is not a logical thing to do anyway, the service IPs don’t actually exist anywhere, they are just the target of iptables rules managed by kube-proxy. You will never find anything that actually listens on or handles traffic to those IPs, and you are not expected to pass traffic to those IPs except from within the cluster.
That’s why they’re called cluster IPs, you’re not SUPPOSED to get at them from outside the cluster. I highly doubt upstream will want to change that.
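In other words, a ClusterIP is just a lookup key that gets rewritten to a real pod endpoint on the way through; a deliberately oversimplified Python sketch of the idea (made-up addresses, and nothing like kube-proxy's actual internals):

```python
import random

# ClusterIP -> real pod endpoints; traffic to the ClusterIP is DNAT'd to one
# of these. Nothing ever listens on the ClusterIP itself. (Made-up addresses.)
service_endpoints = {
    ("10.43.17.5", 80): [("10.42.0.12", 8080), ("10.42.1.7", 8080)],
}

def rewrite_destination(dst_ip: str, dst_port: int):
    """Pick a real pod endpoint for a packet addressed to a ClusterIP."""
    return random.choice(service_endpoints[(dst_ip, dst_port)])

print(rewrite_destination("10.43.17.5", 80))  # e.g. ('10.42.0.12', 8080)
```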
c
That is true for ClusterIP... It's not true for, as an example, LoadBalancer, which is explicitly designed to be exposed.
c
kube-proxy fakes those IPs with iptables or IPVS, some other CNIs will do the same with eBPF… but they don’t really exist anywhere
c
And Calico does it with... BGP...
c
Calico routes traffic between pods with BGP, not cluster IPs. As far as I know.
c
You have seriously only just scratched the surface of Calico then... It does NOT just do traffic between pods... It also handles services and external peering, both with the underlying network on the nodes, and you can also do BGP peering with external service providers.
And of course, while I could do a /108 as the service range and announce the full /48... The only thing that accomplishes is that now there are 60 bits that are completely dead space, completely negating one of the goals of IPv6, which was obfuscation through big ranges so that you can't just guess the IP of a service... That's why the smallest subnet size is a /64
c
If you’re using Calico then you should be using their pod and service IPAMs and not the core Kubernetes one anyways? So the limitation doesn’t even affect you.
They have their own thing for managing blocks that doesn’t use the kubernetes built-in IPAM at all
c
Hm? Don't those settings have to match? I've had issues in the past when I used Weave when it didn't match 😕
c
I guess I’ve not tried to use it as the service IPAM instead, just as the node IPAM
I see that Tigera does have a blog post up about advertising the service ip range via BGP but I note that they don’t talk about doing it for ipv6, just ipv4. https://www.tigera.io/blog/advertising-kubernetes-service-ips-with-calico-and-bgp/
c
Yea because you can't announce a /108... There's no provider that would peer with you with that size and most would probably outright block further communication if you even tried it 🙂
c
This seems really dumb though, it just ECMPs the cluster cidr across all the nodes, it doesn’t even try to steer traffic to the nodes actually hosting the endpoints for the service.
so it still has to go through iptables or whatnot and then get bounced over to the pod
c
It doesn't need to. Calico builds a VXLAN mesh that it routes all traffic over, so regardless of which node the pod is running on, it'll still reach it...
c
I am still not sure why you would want to let things directly connect to clusterip services from outside the cluster anyway
yes, but what even is the point of pushing this routing outside the cluster if it still has to bounce around between the nodes.
exposing clusterips outside the cluster is considered an antipattern anyway
you’re intended to use loadbalancers and/or ingress to get into the cluster, and then clusterips within it.
c
Well, if I expose the pod network directly, then there's no load balancing. That's exactly why a load balancer is used. But that still needs an IP and uses the service range.
But it seems Calico indeed does not care at all about the settings defined during cluster setup. So I guess the problem is moot, even if I consider it a weird limitation 🙂
c
you’re just going to send all your traffic from outside the cluster directly to pods somehow and not use loadbalancers at all?
c
MetalLB
c
Yeah you don't need to advertise the ClusterIP for that, you can just get MetalLB to do the BGP peering. It will even do better than Calico and specifically send traffic to nodes with endpoints for the service.
That would be a way better approach than trying to advertise your ClusterIP range with equal weight to all the nodes.
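For reference, the "send traffic to nodes with endpoints" behaviour corresponds to setting externalTrafficPolicy: Local on the LoadBalancer Service; a hedged sketch with the official Kubernetes Python client (made-up names and ports, and it assumes MetalLB is already configured for BGP):

```python
from kubernetes import client, config

config.load_kube_config()

# LoadBalancer Service whose external traffic is only delivered via nodes
# that actually host endpoints for it (externalTrafficPolicy: Local).
# The app label, ports, and namespace are made up for the example.
svc = client.V1Service(
    metadata=client.V1ObjectMeta(name="demo-lb"),
    spec=client.V1ServiceSpec(
        type="LoadBalancer",
        external_traffic_policy="Local",
        selector={"app": "demo"},
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)
```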