# k3s
a
b
What do you get when you run `kubectl get nodes -o wide`?
a
```
NAME       STATUS   ROLES    AGE    VERSION         INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
node3      Ready    <none>   271d   v1.19.16+k3s1   10.0.1.4      10.0.1.4      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
node2      Ready    <none>   271d   v1.19.16+k3s1   10.0.1.3      10.0.1.3      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
node1      Ready    <none>   271d   v1.19.16+k3s1   10.0.1.2      10.0.1.2      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
control1   Ready    master   271d   v1.19.16+k3s1   10.0.0.6      10.0.0.6      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
control2   Ready    master   271d   v1.19.16+k3s1   10.0.0.5      10.0.0.5      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
control3   Ready    master   271d   v1.19.16+k3s1   10.0.0.7      10.0.0.7      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.4.11-k3s1
```
The systemd service is running with this command on the masters:
```
k3s server --tls-san gate.nellcorp.com \
--datastore-endpoint ${DB_URL} \
--flannel-backend=host-gw \
--token ${TOKEN} \
--advertise-address=${NODE_IP} \
--node-ip=${NODE_IP} \
--node-external-ip=${NODE_IP} \
--flannel-iface=ens10 \
--node-taint=k3s-controlplane=true:NoSchedule \
--private-registry=/home/ubuntu/.k3s/registries.yaml \
--kube-apiserver-arg=token-auth-file=${TOKEN_PATH} \
--kubelet-arg=cluster-dns=1.1.1.1 \
--kubelet-arg=cluster-domain=cluster.local
```
c
Why are you overriding the kubelet’s cluster-dns and cluster-domain settings?
Also, that is a very old and unsupported release of K3s. 1.22 is just about to go end-of-life; everything older than that has been unsupported for months, if not longer.
a
Will update in a few. I was only overriding those while diagnosing the issue, as I initially assumed it was a DNS issue; this was happening even without the override.
Cluster updated, still no cni0 on the masters:
```
NAME       STATUS   ROLES                  AGE    VERSION        INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
control3   Ready    control-plane,master   272d   v1.24.4+k3s1   10.0.0.7      10.0.0.7      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
node1      Ready    <none>                 272d   v1.24.4+k3s1   10.0.1.2      10.0.1.2      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
node2      Ready    <none>                 272d   v1.24.4+k3s1   10.0.1.3      10.0.1.3      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
node3      Ready    <none>                 272d   v1.24.4+k3s1   10.0.1.4      10.0.1.4      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
control1   Ready    control-plane,master   272d   v1.24.4+k3s1   10.0.0.6      10.0.0.6      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
control2   Ready    control-plane,master   272d   v1.24.4+k3s1   10.0.0.5      10.0.0.5      Ubuntu 20.04.3 LTS   5.4.0-126-generic   containerd://1.6.6-k3s1
```
I guess my question is, is this expected behavior? Should master nodes not have cni0?
c
no, they should have the same CNI bits as the agents.
Can you start the server with --debug and post the k3s-server logs from startup onwards?
I suspect something is going wrong with your flannel config, although it’s odd that the nodes would be coming Ready with a broken CNI
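A minimal sketch of capturing that, assuming K3s runs as the usual `k3s` systemd unit implied above:
```
# Add --debug to the existing k3s server flags in the unit file, then
# restart the service and collect its logs from startup onwards:
sudo systemctl daemon-reload
sudo systemctl restart k3s
sudo journalctl -u k3s --since "2 minutes ago" > k3s-server-debug.log
```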
a
@creamy-pencil-82913 I've removed logs from before enabling debug, let me know if this is enough. Thanks for helping!
Also, re: the flannel config, I'm not really configuring it other than setting the flannel-backend to host-gw and the flannel-iface to the interface on each node that connects to all the other nodes. I'm doing that because I don't want any traffic to leave these 2 subnets.
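For reference, a sketch of those two settings written as a K3s config file instead of CLI flags, assuming the same interface name ens10 as in the server command above (K3s also reads `/etc/rancher/k3s/config.yaml`):
```
# Equivalent of --flannel-backend=host-gw --flannel-iface=ens10,
# placed in the K3s config file on each server node:
sudo tee /etc/rancher/k3s/config.yaml <<'EOF'
flannel-backend: host-gw
flannel-iface: ens10
EOF
```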
Also, here is what I think could be the issue:
```
Sep 27 10:25:10 control1 k3s[8063]: time="2022-09-27T10:25:10Z" level=debug msg="Creating the CNI conf in directory /var/lib/rancher/k3s/agent/etc/cni/net.d"
Sep 27 10:25:10 control1 k3s[8063]: time="2022-09-27T10:25:10Z" level=debug msg="Creating the flannel configuration for backend host-gw in file /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json"
Sep 27 10:25:10 control1 k3s[8063]: time="2022-09-27T10:25:10Z" level=debug msg="The flannel configuration is {\n\t\"Network\": \"10.42.0.0/16\",\n\t\"EnableIPv6\": false,\n\t\"EnableIPv4\": true,\n\t\"IPv6Network\": \"::/0\",\n\t\"Backend\": {\n\t\"Type\": \"host-gw\"\n}\n}\n"
```
So the flannel config is being set in /var/lib/rancher/k3s/agent, but isn't this directory only read by k3s in agent mode? As in, would the server not just ignore it?
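For what it's worth, the server runs an embedded agent unless it was started with --disable-agent, so those paths are used on the control-plane nodes too. A quick check, using the paths from the debug log above:
```
# On a control-plane node: inspect what the embedded agent wrote
sudo ls -l /var/lib/rancher/k3s/agent/etc/cni/net.d/
sudo cat /var/lib/rancher/k3s/agent/etc/flannel/net-conf.json
```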
And this is probably what's preventing the interface from being set up:
```
Sep 27 12:03:17 control1 k3s[12756]: time="2022-09-27T12:03:17Z" level=info msg="Running flannel backend."
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.173846   12756 route_network.go:55] Watching for new subnet leases
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.191596   12756 route_network.go:92] Subnet added: 10.42.4.0/24 via 10.0.1.3
Sep 27 12:03:17 control1 k3s[12756]: E0927 12:03:17.192455   12756 route_network.go:167] Error adding route to {Ifindex: 3 Dst: 10.42.4.0/24 Src: <nil> Gw: 10.0.1.3 Flags: [] Table: 0 Realm: 0}: network is unreachable
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.192769   12756 route_network.go:92] Subnet added: 10.42.5.0/24 via 10.0.1.4
Sep 27 12:03:17 control1 k3s[12756]: E0927 12:03:17.193238   12756 route_network.go:167] Error adding route to {Ifindex: 3 Dst: 10.42.5.0/24 Src: <nil> Gw: 10.0.1.4 Flags: [] Table: 0 Realm: 0}: network is unreachable
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.193490   12756 route_network.go:92] Subnet added: 10.42.3.0/24 via 10.0.1.2
Sep 27 12:03:17 control1 k3s[12756]: E0927 12:03:17.193935   12756 route_network.go:167] Error adding route to {Ifindex: 3 Dst: 10.42.3.0/24 Src: <nil> Gw: 10.0.1.2 Flags: [] Table: 0 Realm: 0}: network is unreachable
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.194213   12756 route_network.go:92] Subnet added: 10.42.1.0/24 via 65.21.182.167
Sep 27 12:03:17 control1 k3s[12756]: E0927 12:03:17.194817   12756 route_network.go:167] Error adding route to {Ifindex: 3 Dst: 10.42.1.0/24 Src: <nil> Gw: 65.21.182.167 Flags: [] Table: 0 Realm: 0}: network is unreachable
Sep 27 12:03:17 control1 k3s[12756]: I0927 12:03:17.195117   12756 route_network.go:92] Subnet added: 10.42.2.0/24 via 65.108.53.74
Sep 27 12:03:17 control1 k3s[12756]: E0927 12:03:17.195838   12756 route_network.go:167] Error adding route to {Ifindex: 3 Dst: 10.42.2.0/24 Src: <nil> Gw: 65.108.53.74 Flags: [] Table: 0 Realm: 0}: network is unreachable
```
It can't find a route to the podCIDR via the worker nodes, which is strange, as I can ping any one of the nodes from the master.
I also suspect this is due to the masters being in a different subnet and thus needing a gateway to reach the workers. But it seems like the workers set up routes on their own.
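A sketch of how to check what the kernel on a master can actually reach on-link, using the interface name (ens10) and addresses from the output above:
```
# On control1: which path would be used to reach a worker's internal IP?
ip route get 10.0.1.3
# Routes bound to the flannel interface; host-gw needs the worker IPs on-link here
ip route show dev ens10
# Any pod-CIDR routes that flannel did manage to install
ip route show | grep 10.42
```
host-gw installs routes like `10.42.4.0/24 via 10.0.1.3`; the kernel rejects them with "network is unreachable" when 10.0.1.3 is not directly reachable on the same L2 segment, even if it is pingable through a gateway.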
b
If you use host-gw, all nodes (masters and workers) need to be in the same network, otherwise the routes cannot be created. That's the log you are getting.
Why did you choose host-gw instead of the default vxlan?
a
I wanted simplicity, and I had already set up an internal network across the nodes. This is on Hetzner, so I set up a vSwitch.
b
https://github.com/flannel-io/flannel/blob/master/Documentation/backends.md#host-gw
> Use host-gw to create IP routes to subnets via remote machine IPs. Requires direct layer2 connectivity between hosts running flannel.
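A minimal illustration of that requirement, replaying by hand what flannel attempted in the log above (expected to fail from a master in 10.0.0.0/24):
```
# What the host-gw backend effectively attempts on control1. The gateway of a
# host-gw route must itself be on-link, so this fails across the subnet boundary
# with "Network is unreachable" (or "Nexthop has invalid gateway" on newer iproute2):
sudo ip route add 10.42.4.0/24 via 10.0.1.3
```
The default vxlan backend avoids this by encapsulating pod traffic in UDP between node IPs, so it only needs L3 reachability between nodes.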
a
Also, I believe I got it to work by deploying a pod on a master node. I had to add a toleration for k3s-controlplane: true to a test pod, and as soon as the pod was scheduled, cni0 suddenly appeared on the master node.
I think this is very unintuitive and also defeats the point of master nodes, no? I mean, if I don't schedule a pod on them, networking breaks?
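For reference, a sketch of that kind of test pod; the pod name and image are placeholders, and the toleration matches the custom taint from the server flags (k3s-controlplane=true:NoSchedule):
```
# Hypothetical test pod that tolerates the custom control-plane taint and is
# steered to a master via a nodeSelector on the hostname label:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: cni-test            # placeholder name
spec:
  nodeSelector:
    kubernetes.io/hostname: control1
  tolerations:
  - key: k3s-controlplane
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: sleep
    image: busybox:1.36
    command: ["sleep", "3600"]
EOF
```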
b
Don't you have pods like coredns or traefik deployed on the masters?
a
No, those apparently also do not have tolerations for the controlplane taint.
b
If you choose your own taints for the node, I guess you should change the toleration of the pods, right?
I can see that the pods have tolerations for these taints:
```
- effect: NoSchedule
  key: node-role.kubernetes.io/control-plane
  operator: Exists
- effect: NoSchedule
  key: node-role.kubernetes.io/master
  operator: Exists
```
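One way to confirm that on a live cluster, assuming the default packaged workloads in kube-system:
```
# Show the tolerations carried by the bundled CoreDNS and Traefik deployments
kubectl -n kube-system get deploy coredns -o jsonpath='{.spec.template.spec.tolerations}'; echo
kubectl -n kube-system get deploy traefik -o jsonpath='{.spec.template.spec.tolerations}'; echo
```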
a
Ah, interesting, I believe the taint was changed here: https://github.com/rancher/docs/issues/2707
RTFM, I guess.
--node-taint CriticalAddonsOnly=true:NoExecute
Will try this.
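That is, something like this for the server unit, keeping the other flags from the original command and dropping the diagnostic kubelet DNS overrides (a sketch, not verified on this cluster):
```
k3s server --tls-san gate.nellcorp.com \
  --datastore-endpoint ${DB_URL} \
  --flannel-backend=host-gw \
  --token ${TOKEN} \
  --advertise-address=${NODE_IP} \
  --node-ip=${NODE_IP} \
  --node-external-ip=${NODE_IP} \
  --flannel-iface=ens10 \
  --node-taint=CriticalAddonsOnly=true:NoExecute \
  --private-registry=/home/ubuntu/.k3s/registries.yaml \
  --kube-apiserver-arg=token-auth-file=${TOKEN_PATH}
```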
Working! I was using the wrong taint. Thanks a lot @creamy-pencil-82913 and @bland-account-99790!
🙌 1