# k3s
c
ClusterIP addresses are handled by kube-proxy, not the CNI
r
hmm
c
Unless the CNI includes a kube-proxy replacement, which flannel does not.
You won't see routes, just iptables entries
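A quick way to see that for yourself, assuming kube-proxy is in its default iptables mode (the chain name below is kube-proxy's standard one; for an IPv6 ClusterIP the rules live in ip6tables, and <service-cidr> is just a placeholder):
ip6tables -t nat -S KUBE-SERVICES          # ClusterIPs exist only as NAT rules
ip -6 route show | grep '<service-cidr>'   # so the routing table shows nothing for them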
r
ok, thanks for the hint, I'll see if I can figure out what's amiss in iptables.
Here's the error I'm getting. The IP it's trying to talk to is the first address of the service CIDR range (the kubernetes API).
$   kubectl logs kustomize-controller-666f8f4b5f-ppwhk --previous
{"level":"info","ts":"2023-06-06T18:40:59.283Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"error","ts":"2023-06-06T18:41:29.308Z","logger":"setup","msg":"unable to create controller","controller":"kustomize-controller","error":"failed setting index fields: failed to get API group resources: unable to retrieve the complete list of server APIs: <http://kustomize.toolkit.fluxcd.io/v1|kustomize.toolkit.fluxcd.io/v1>: Get \"https://[fdde:91f3:4d41:3100::1]:443/apis/kustomize.toolkit.fluxcd.io/v1\": dial tcp [fdde:91f3:4d41:3100::1]:443: i/o timeout"}
When I check iptables, I see a rule:
-A KUBE-SERVICES -d fdde:91f3:4d41:3100::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
The chain that rule jumps to says:
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s fdde:91f4:4d41:3000::/56 -d fdde:91f3:4d41:3100::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
-A KUBE-SVC-NPX46M4PTMTKRN6Y -m comment --comment "default/kubernetes:https -> [2602:814:4000:3::15]:6443" -j KUBE-SEP-CTIXJTB53XIITEK7
and KUBE-MARK-MASQ sets 0x4000:
-A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
Since this pod (kustomize controller) has a Pod IP inside the PodCIDR:
IPs:
  IP:           10.42.3.8
  IP:           fdde:91f4:4d41:3003::8
It is getting ignored by NPX46M because the negated source match would fail. Either I'm having a brainfart and it's obvious what's wrong, or there's something amiss here.
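One note on reading those chains, assuming kube-proxy's usual layout: the ! -s rule only decides whether a packet gets marked for masquerading; nothing is dropped when it doesn't match. The DNAT toward the real apiserver address lives in the endpoint chain, and the 0x4000 mark is only consumed in KUBE-POSTROUTING, so both are worth a look:
ip6tables -t nat -S KUBE-SEP-CTIXJTB53XIITEK7   # should contain the DNAT to [2602:814:4000:3::15]:6443
ip6tables -t nat -S KUBE-POSTROUTING            # traffic marked 0x4000 gets MASQUERADEd here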
c
what are your cluster and service cidrs set to?
r
one sec and I'll grab the master provisioning command from history
c
any other components you’ve replaced, or args you’ve added, would be good to call out as well
r
curl -sfL https://get.k3s.io | sh -s - --flannel-backend=wireguard-native --node-ip=:: --cluster-cidr=10.42.0.0/16,fdde:91f4:4d41:3000::/56 --service-cidr=10.43.0.0/16,fdde:91f3:4d41:3100::/112 --disable traefik --disable servicelb --disable local-storage --datastore-endpoint=postgres://k3s-prod:redacted@redacted:15432/redacted?sslmode=disable
I adjusted the line above to fix a typo in the datastore-endpoint call.
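As a quick sketch for double-checking what actually got applied, plain kubectl shows both sides (nothing k3s-specific here):
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.podCIDRs   # per-node pod CIDR allocations
kubectl get svc kubernetes -o wide                                                # always the first IP of the service CIDR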
c
that all looks normal. Is the pod in question running on a server, or on an agent node?
r
on an agent node.
I have a hunch as to what's wrong -- that ULA prefix is also partially routable inside the network. I'm replacing both the pod CIDR and the service CIDR with completely isolated ULA prefixes
going to rebuild and see if behavior continues
c
wg uses a separate port for ipv4 and ipv6 tunnels, you might check that both are open on the off chance that’s causing problems
the routing overlap seems more likely though
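To rule the ports out, something like this on each node should do it, assuming flannel's wireguard-native defaults of UDP 51820 for the IPv4 tunnel and 51821 for the IPv6 tunnel, and the flannel-wg / flannel-wg-v6 interface names:
wg show flannel-wg listen-port
wg show flannel-wg-v6 listen-port
ss -ulnp | grep -E ':5182[01]'           # confirm the node is listening on both UDP ports
wg show flannel-wg latest-handshakes     # recent handshakes on both tunnels means the peers can reach each other
wg show flannel-wg-v6 latest-handshakes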
r
Crap, that wasn't it. Same behavior. Created cluster with:
curl -sfL https://get.k3s.io | sh -s - --flannel-backend=wireguard-native --cluster-cidr=fdc0:9e85:e99e::/56,10.42.0.0/16 --service-cidr=fd62:a374:67c2::/112,10.43.0.0/16 --node-ip=2602:814:4000:3::15,10.174.3.15 --node-external-ip=2602:814:4000:3::15,23.154.40.15 --disable traefik --disable servicelb --disable local-storage --datastore-endpoint=postgres://k3s-prod:redacted@redacted:15432/redacted?sslmode=disable
error from kustomize controller:
$   kubectl logs kustomize-controller-666f8f4b5f-6n986 --previous
{"level":"info","ts":"2023-06-06T20:58:30.895Z","logger":"controller-runtime.metrics","msg":"Metrics server is starting to listen","addr":":8080"}
{"level":"error","ts":"2023-06-06T20:59:00.900Z","logger":"setup","msg":"unable to create controller","controller":"kustomize-controller","error":"failed setting index fields: failed to get API group resources: unable to retrieve the complete list of server APIs: <http://kustomize.toolkit.fluxcd.io/v1|kustomize.toolkit.fluxcd.io/v1>: Get \"https://[fd62:a374:67c2::1]:443/apis/kustomize.toolkit.fluxcd.io/v1\": dial tcp [fd62:a374:67c2::1]:443: i/o timeout"}
rules:
-A KUBE-SERVICES -d fd62:a374:67c2::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
-A KUBE-SVC-NPX46M4PTMTKRN6Y ! -s fdc0:9e85:e99e::/56 -d fd62:a374:67c2::1/128 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
Using tcpdump I do see traffic from fdc0:: (the pod CIDR) attempting to reach the service CIDR IP (it's a TCP SYN to 443).
regarding wireguard, all four nodes show connectivity to all the other nodes, so I think wireguard line proto is up at least
Is there a way I can see the running flannel config file?
oh I see it, /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
You know what's weird? The only packets I see on flannel-wg-v6 are going to a prefix that isn't in the wg allowed IPs.
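One way to line those up, using the path and interface name from above:
cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json   # what flannel thinks the networks are
wg show flannel-wg-v6 allowed-ips                          # what each peer may carry over the v6 tunnel
ip -6 route show dev flannel-wg-v6                         # what the kernel will actually route into it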
c
is wireguard using the wrong IPs for the tunnels?
r
22:22:31.993897 IP6 2602:814:4000:3::f203.10250 > fdc0:9e85:e99e::2.41458: Flags [S.], seq 3442809166, ack 2077726981, win 64704, options [mss 1360,sackOK,TS val 2234894331 ecr 914303222,nop,wscale 7], length 0
root@tpi2n4:/var/lib/rancher/k3s/agent/etc/flannel# wg show all dump | grep e99e
flannel-wg-v6   qs8Zk8eg1/RVM0O/TNKcMaPkvEsv7HVtvpDae7pSpm0=    (none)  [2602:814:4000:3:7c18:f6ff:feff:f872]:51821     fdc0:9e85:e99e:1::/64   1686086490      6000    1744    25
flannel-wg-v6   jUhUEL2XdIujWzegVZ3wMasDCUo7HLPYmuZT7lUmsxc=    (none)  [2602:814:4000:3:cc73:97ff:feed:86ee]:51821     fdc0:9e85:e99e:3::/64   1686086549      2900    5212    25
flannel-wg-v6   Ny1UczukarE44YGyuLlymHVRW6HVCqewEXK0GcVULHg=    (none)  [2602:814:4000:3:a81d:89ff:fe59:f2]:51821       fdc0:9e85:e99e:6::/64   1686086453      2944    4384    25
flannel-wg-v6   /s8psXkYy60uqydwl8J1HWszgr0O20NDsst7VcVzy14=    (none)  [2602:814:4000:3:f00f::de75]:51821      fdc0:9e85:e99e::/64     1686086476      7216    164432  25
and on the master:
root@k3smstr-202306:~# ip -6 a l |grep e99e
    inet6 fdc0:9e85:e99e::/128 scope global
    inet6 fdc0:9e85:e99e::1/64 scope global
... what's ::2? It seems like it'd be on the master node.
and to be clear, this is a brand-new ULA generator prefix that is nowhere else on my network -- I just plopped it into k3s to provision without any network setup
c
you can do
kubectl get service -A -o wide
to see what those service ClusterIPs are for
kubernetes is always +1 in the range, and dns is +10
for pods, the node local IPAM just allocates them sequentially out of the range given to the host. You should be able to find those in the pod list.
it is odd that there appear to be missing peers though?
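A quick sketch for mapping any of these addresses back to an owner, using the values from this thread (standard kubectl only):
kubectl get svc -A -o wide | grep 'fd62:a374:67c2::'   # service ClusterIPs
# pod IPs; .status.podIPs carries both families for dual-stack pods
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.status.podIPs[*].ip}{"\n"}{end}' | grep 'fdc0:9e85:e99e::2'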
r
that's the thing though...
$   kubectl get services -o wide -A
NAMESPACE     NAME                      TYPE        CLUSTER-IP             EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
default       kubernetes                ClusterIP   fd62:a374:67c2::1      <none>        443/TCP                  52m   <none>
kube-system   kube-dns                  ClusterIP   10.43.0.10             <none>        53/UDP,53/TCP,9153/TCP   51m   k8s-app=kube-dns
kube-system   metrics-server            ClusterIP   fd62:a374:67c2::bf29   <none>        443/TCP                  51m   k8s-app=metrics-server
flux-system   notification-controller   ClusterIP   fd62:a374:67c2::5202   <none>        80/TCP                   31m   app=notification-controller
flux-system   source-controller         ClusterIP   fd62:a374:67c2::e97a   <none>        80/TCP                   31m   app=source-controller
flux-system   webhook-receiver          ClusterIP   fd62:a374:67c2::1f8    <none>        80/TCP                   31m   app=notification-controller
⎈ june-2023/flux-system  ~/
  $   kubectl get nodes -o wide -A
NAME             STATUS   ROLES                  AGE   VERSION        INTERNAL-IP             EXTERNAL-IP           OS-IMAGE                         KERNEL-VERSION   CONTAINER-RUNTIME
tpi2n2           Ready    <none>                 48m   v1.26.5+k3s1   2602:814:4000:3::f202   <none>                Debian GNU/Linux 11 (bullseye)   6.1.14           containerd://1.7.1-k3s1
tpi2n4           Ready    <none>                 48m   v1.26.5+k3s1   2602:814:4000:3::f204   <none>                Debian GNU/Linux 11 (bullseye)   6.2.14           containerd://1.7.1-k3s1
tpi2n1           Ready    <none>                 48m   v1.26.5+k3s1   2602:814:4000:3::f201   <none>                Debian GNU/Linux 11 (bullseye)   6.1.14           containerd://1.7.1-k3s1
tpi2n3           Ready    <none>                 48m   v1.26.5+k3s1   2602:814:4000:3::f203   <none>                Debian GNU/Linux 11 (bullseye)   6.1.14           containerd://1.7.1-k3s1
k3smstr-202306   Ready    control-plane,master   52m   v1.26.5+k3s1   2602:814:4000:3::15     2602:814:4000:3::15   Debian GNU/Linux 11 (bullseye)   6.1.21-v8+       containerd://1.7.1-k3s1
c
Why do you have peers for 0, 1, 3, and 6?
there would not normally be gaps in the ranges like that
r
I cannot answer that, sadly, I'm not sure.
c
are you wiping the datastore between tests?
r
deleting the pgsql database, yes. If you try to re-bootstrap against an already-provisioned database, it will fail to start.
c
… unless you pass a token, which is what you’re expected to do if you want to reuse the DB
but in this case wiping it is good
r
Right, I mean, when I completely delete the cluster and reprovision, I run
k3s-agent-uninstall.sh
on all agents, run
k3s-uninstall.sh
on the master, delete the pgsql datastore, create a new database, run the k3s server install, then use
k3s token create
to make a join token
all four nodes are using the same join token -- that's ok, right?
c
yeah thats fine
if you start your server with the same --token you can still use
k3s token create
to make join tokens for the agents. It’ll just keep the DB from erroring out when you uninstall/reinstall but don’t wipe the DB
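For what it's worth, a sketch of that flow, with the k3s flags as documented and placeholder values:
# start the server with a fixed --token so the datastore can be reused across reinstalls
curl -sfL https://get.k3s.io | sh -s - server \
  --token 'some-long-random-secret' \
  --datastore-endpoint 'postgres://user:pass@host:5432/db'
# then mint agent join tokens as usual
k3s token create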
hmm, the server has an external IP but the agents do not? And the external IP is the same as the internal IP?
You might see if dropping that makes things less confused
r
hmm, ok, I'll try that. I have to leave for an appointment but I'll try it tomorrow and letcha know how it works out. Thanks for the help!
caught this in the latest iteration, ha
Jun 07 03:08:42 k3smstr-202306 k3s[50185]: E0607 03:08:42.288556   50185 fieldmanager.go:210] "[SHOULD NOT HAPPEN] failed to update managedFields" err="failed to convert new object (/k3smstr-202306; /v1, Kind=Node) to smd typed: .status.addresses: duplicate entries for key [type=\"InternalIP\"]" VersionKind="/, Kind=" namespace="" name="k3smstr-202306"
every time we say something "should not happen" -- surprise!
I rebuilt using vxlan this morning instead of wireguard-native and everything's working just fine now. There may be some issue with wireguard. If there's someone on the dev team who wants to dig deeper into this, let me know; I'm willing to delete/recreate a few more times before I start putting workloads on it.