# k3s
c
The goal is to let hosts access cluster pods.
That’s kind of an anti-pattern. Hosts outside the cluster should access pods through services or ingresses; you really don’t want things directly poking at pods.
b
Hi @creamy-pencil-82913. The thing is that when I deployed cert-manager, I got an error message:
```
Not ready: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
```
I applied the value `hostNetwork=true`, which basically solved the issue. The same thing happened with `actions-runner-controller`: after I deployed it, the pods were stuck and the error was:
```
failed calling webhook "mutate.runnerdeployment.actions.summerwind.dev ... context deadline exceeded
```
After `hostNetwork=true` it started to work correctly. Usually this means the control plane is somehow unable to access the service cluster IP associated with the admission webhook server (of ARC). Possible causes: (1) running the apiserver as a binary without making service cluster IPs available to the host network, or (2) running the apiserver in a pod where the pod network (i.e. CNI plugin installation and config) is broken, so control-plane pods like kube-apiserver can't access ARC's admission webhook server pod(s) on what are probably data-plane nodes. So I was looking for fixes or points that I forgot to check and apply.
I forgot to mention that I'm interested only in the nodes/hosts of the cluster being able to reach the pods and services.
c
That should already work without any extra configuration. Any node in the cluster should be able to reach services in the cluster even from the host network namespace; that’s what kube-proxy handles.
Are you by any chance running the server nodes with --disable-agent?
Have you confirmed that you don’t have something blocking vxlan traffic between nodes?
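As a quick sanity check of that claim, one could try reaching a cluster service from a node's host shell. A sketch, assuming the default k3s cluster DNS IP of 10.43.0.10 and the standard cert-manager webhook service name:

```shell
# Look up the ClusterIP of the webhook service (names here match a default
# cert-manager install; adjust for your deployment).
WEBHOOK_IP=$(kubectl -n cert-manager get svc cert-manager-webhook \
  -o jsonpath='{.spec.clusterIP}')

# From the host network namespace, the ClusterIP should be reachable via
# kube-proxy's rules. -k because the webhook's cert is for its in-cluster name.
curl -k --max-time 5 "https://${WEBHOOK_IP}:443/" \
  && echo "service reachable from host"

# Cluster DNS should also answer from the host:
dig +short +time=2 cert-manager-webhook.cert-manager.svc.cluster.local @10.43.0.10
```

If the `curl` times out from a node while the webhook pod itself is healthy, that points at the same pod-network problem the webhook errors do.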
b
Hmm, I don't think I applied `--disable-agent`; I can see all cluster nodes (masters and agents) with `kubectl get nodes`. As well:
```
25: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether ae:a3:01:34:dd:21 brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.1/24 brd 10.42.0.255 scope global cni0
       valid_lft forever preferred_lft forever

29: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 3e:f8:87:e3:bc:0a brd ff:ff:ff:ff:ff:ff
    inet 10.42.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
```
> Have you confirmed that you don’t have something blocking vxlan traffic between nodes?

That's a good question; to be honest, I didn't check that. Is there any guide on how to verify that nothing is blocking vxlan traffic between nodes?
c
If pods running on one node can’t access pods on other nodes, or if you can’t access a service when the pods for it are only running on other nodes, then something is preventing vxlan from working correctly.
Some hosts are also affected by a kernel bug that corrupts vxlan packets; you might try the `ethtool` command mentioned here: https://github.com/flannel-io/flannel/issues/1279
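The workaround from that flannel issue disables TX checksum offload on the vxlan device. A sketch, assuming the interface is named `flannel.1` as in the `ip addr` output above:

```shell
# Disable TX checksum offloading on the flannel vxlan device
# (workaround for the kernel bug that corrupts vxlan packets).
sudo ethtool -K flannel.1 tx-checksum-ip-generic off

# Note: this does not survive a reboot; a systemd unit or udev rule
# is needed to make it persistent.

# While testing cross-node pod connectivity, watch for vxlan traffic
# actually arriving (flannel's vxlan backend uses UDP port 8472):
sudo tcpdump -ni any udp port 8472 -c 10
```

If `tcpdump` on the receiving node shows no packets while pods on another node are trying to talk, the traffic is being dropped somewhere between the hosts.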
b
On the run, I checked the firewall rules that were added. Port 8472, which is used for vxlan, was whitelisted, but for TCP, not UDP. Perhaps that was the mistake; I need to verify it. I'll also definitely try `ethtool` for debugging. Thank you! I will check and post updates here later on, so other community members have a quick solution for such cases 😄
c
yeah that would do it, vxlan is udp
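On RHEL with firewalld, correcting that rule would look something like this (a sketch; the active zone and whether the TCP rule exists vary per setup):

```shell
# Drop the incorrect TCP rule and allow flannel vxlan on UDP 8472 instead.
sudo firewall-cmd --permanent --remove-port=8472/tcp
sudo firewall-cmd --permanent --add-port=8472/udp
sudo firewall-cmd --reload

# Verify the active rules:
sudo firewall-cmd --list-ports
```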
w
If you need the host to do something with pods, use a static pod.
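For reference, a static pod is just a manifest dropped into the kubelet's manifest directory, which on k3s defaults to `/var/lib/rancher/k3s/agent/pod-manifests` (visible in the kubelet flags in the logs below). A minimal sketch with a hypothetical pod name:

```shell
# Write a minimal static pod manifest; the kubelet on this node picks it up
# automatically, no apiserver interaction needed to create it.
sudo tee /var/lib/rancher/k3s/agent/pod-manifests/host-helper.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: host-helper          # hypothetical name
  namespace: kube-system
spec:
  hostNetwork: true          # share the host's network namespace
  containers:
  - name: helper
    image: busybox:1.36
    command: ["sleep", "infinity"]
EOF
```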
b
Hi guys, I found an issue related to misconfiguration. I was using k3s 1.24 while the host OS was RHEL 8.7, which is not supported: https://www.suse.com/suse-k3s/support-matrix/all-supported-versions/k3s-v1-24/ So I manually upgraded the cluster to version 1.26 using the binary: https://www.suse.com/suse-k3s/support-matrix/all-supported-versions/k3s-v1-24/ Now, when I try to check a pod's logs, I get this:
```
Defaulted container "lb-tcp-80" out of: lb-tcp-80, lb-tcp-443
Error from server: Get "https://10.154.146.197:10250/containerLogs/kube-system/svclb-traefik-04de87bd-dfwxq/lb-tcp-80?follow=true": proxy error from 127.0.0.1:6443 while dialing 10.154.146.197:10250, code 503: 503 Service Unavailable
```
Do you know what could cause such an error message?
c
Are you using Rancher, or just k3s by itself? Rancher doesn't support 1.25 or 1.26 yet. If 1.24 was working for you on EL8, I would probably have recommended just staying on that version.
b
Hi @creamy-pencil-82913, I'm using just k3s without Rancher. A very annoying error has been bothering me for quite a long time. I'm not sure where the issue is, but when I set `hostNetwork=true` on cert-manager this error disappears (it feels like something is wrong with my k3s network):
```
Not ready: Internal error occurred: failed calling webhook "webhook.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s": context deadline exceeded
```
For a better understanding of what's going on, I decided to install K3S from scratch. I have installed the K3S server on a single node - all pods are up and running:
```
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE   IP          NODE                   NOMINATED NODE   READINESS GATES
kube-system   coredns-5c6b6c5476-6n6kd                  1/1     Running     0          55m   10.42.0.7   yc3134.danskenet.net   <none>           <none>
kube-system   helm-install-traefik-42dgt                0/1     Completed   1          65m   10.42.0.6   yc3134.danskenet.net   <none>           <none>
kube-system   helm-install-traefik-crd-6jrsx            0/1     Completed   0          65m   10.42.0.5   yc3134.danskenet.net   <none>           <none>
kube-system   local-path-provisioner-5d56847996-dm797   1/1     Running     0          65m   10.42.0.3   yc3134.danskenet.net   <none>           <none>
kube-system   metrics-server-7b67f64457-rqnrp           1/1     Running     0          65m   10.42.0.2   yc3134.danskenet.net   <none>           <none>
kube-system   svclb-traefik-fa4448fb-xlm9c              2/2     Running     0          54m   10.42.0.8   yc3134.danskenet.net   <none>           <none>
kube-system   traefik-56b8c5fb5c-lh4kn                  1/1     Running     0          54m   10.42.0.9   yc3134.danskenet.net   <none>           <none>
```
Then I started installing K3S on the second master node and the installation failed.
```
Job for k3s.service failed because the control process exited with error code.
See "systemctl status k3s.service" and "journalctl -xe" for details.
```
I tried telnet on both nodes. On the first node (master1):
```
telnet 10.154.146.193 2379
Trying 10.154.146.193...
Connected to 10.154.146.193.
Escape character is '^]'.
^CConnection closed by foreign host.
```
On master2:
```
telnet 10.154.146.193 2379
Trying 10.154.146.193...
telnet: connect to address 10.154.146.193: No route to host
```
I didn't have this issue with 1.24. Also, in our internal system, all ports required for k3s are opened. Perhaps something is wrong with the k3s iptables rules? Is there any possibility to open all ports for the subnets?
journalctl -xe:
```
Mar 27 15:34:53 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:53+02:00" level=info msg="Managed etcd cluster not yet initialized"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=warning msg="Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash. Use the full token from the server's node-token file to enable Cluster CA validation."
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Reconciling bootstrap data between datastore and disk"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg=start
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="schedule, now=2023-03-27T15:34:54+02:00, entry=1, next=2023-03-28T00:00:00+02:00"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Tunnel server egress proxy mode: agent"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Running kube-apiserver --advertise-port=6443 --allow-privileged=true --anonymous-auth=false --api-audiences=https://kubernetes.default.svc.cluster.local,k3s --authorization-mode=Node,RBAC --bind-address=127.0.0.1 --cert-dir=/var/lib/rancher/k3s/server/tls/temporary-certs --client-ca-file=/var/lib/rancher/k3s/server/tls/client-ca.crt --egress-selector-config-file=/var/lib/rancher/k3s/server/etc/egress-selector-config.yaml --enable-admission-plugins=NodeRestriction --enable-aggregator-routing=true --enable-bootstrap-token-auth=true --etcd-cafile=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt --etcd-certfile=/var/lib/rancher/k3s/server/tls/etcd/client.crt --etcd-keyfile=/var/lib/rancher/k3s/server/tls/etcd/client.key --etcd-servers=https://127.0.0.1:2379 --feature-gates=JobTrackingWithFinalizers=true --kubelet-certificate-authority=/var/lib/rancher/k3s/server/tls/server-ca.crt --kubelet-client-certificate=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.crt --kubelet-client-key=/var/lib/rancher/k3s/server/tls/client-kube-apiserver.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --profiling=false --proxy-client-cert-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.crt --proxy-client-key-file=/var/lib/rancher/k3s/server/tls/client-auth-proxy.key --requestheader-allowed-names=system:auth-proxy --requestheader-client-ca-file=/var/lib/rancher/k3s/server/tls/request-header-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6444 --service-account-issuer=https://kubernetes.default.svc.cluster.local --service-account-key-file=/var/lib/rancher/k3s/server/tls/service.key --service-account-signing-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --service-node-port-range=30000-32767 --storage-backend=etcd3 --tls-cert-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305 --tls-private-key-file=/var/lib/rancher/k3s/server/tls/serving-kube-apiserver.key"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Running kube-scheduler --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --bind-address=127.0.0.1 --kubeconfig=/var/lib/rancher/k3s/server/cred/scheduler.kubeconfig --profiling=false --secure-port=10259"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Running kube-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --bind-address=127.0.0.1 --cluster-cidr=10.42.0.0/16 --cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-client-cert-file=/var/lib/rancher/k3s/server/tls/client-ca.nochain.crt --cluster-signing-kubelet-client-key-file=/var/lib/rancher/k3s/server/tls/client-ca.key --cluster-signing-kubelet-serving-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-kubelet-serving-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --cluster-signing-legacy-unknown-cert-file=/var/lib/rancher/k3s/server/tls/server-ca.nochain.crt --cluster-signing-legacy-unknown-key-file=/var/lib/rancher/k3s/server/tls/server-ca.key --configure-cloud-routes=false --controllers=*,tokencleaner,-service,-route,-cloud-node-lifecycle --feature-gates=JobTrackingWithFinalizers=true --kubeconfig=/var/lib/rancher/k3s/server/cred/controller.kubeconfig --profiling=false --root-ca-file=/var/lib/rancher/k3s/server/tls/server-ca.crt --secure-port=10257 --service-account-private-key-file=/var/lib/rancher/k3s/server/tls/service.current.key --service-cluster-ip-range=10.43.0.0/16 --use-service-account-credentials=true"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Running cloud-controller-manager --allocate-node-cidrs=true --authentication-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --authorization-kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --bind-address=127.0.0.1 --cloud-config=/var/lib/rancher/k3s/server/etc/cloud-config.yaml --cloud-provider=k3s --cluster-cidr=10.42.0.0/16 --configure-cloud-routes=false --controllers=*,-route --kubeconfig=/var/lib/rancher/k3s/server/cred/cloud-controller.kubeconfig --leader-elect-resource-name=k3s-cloud-controller-manager --node-status-update-frequency=1m0s --profiling=false"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Server node token is available at /var/lib/rancher/k3s/server/token"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="To join server node to cluster: k3s server -s https://10.158.146.17:6443 -t ${SERVER_NODE_TOKEN}"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Agent node token is available at /var/lib/rancher/k3s/server/agent-token"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="To join agent node to cluster: k3s agent -s https://10.158.146.17:6443 -t ${AGENT_NODE_TOKEN}"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Wrote kubeconfig /etc/rancher/k3s/k3s.yaml"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Run: k3s kubectl"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="certificate CN=ed3137.edkanet.com signed by CN=k3s-server-ca@1679919549: notBefore=2023-03-27 12:19:09 +0000 UTC notAfter=2024-03-26 13:34:54 +0000 UTC"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="certificate CN=system:node:ed3137.edkanet.com,O=system:nodes signed by CN=k3s-client-ca@1679919549: notBefore=2023-03-27 12:19:09 +0000 UTC notAfter=2024-03-26 13:34:54 +0000 UTC"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=warning msg="Host resolv.conf includes loopback or multicast nameservers - kubelet will use autogenerated resolv.conf with nameserver 8.8.8.8"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Module overlay was already loaded"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Module nf_conntrack was already loaded"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Module br_netfilter was already loaded"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Module iptable_nat was already loaded"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Module iptable_filter was already loaded"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Using private registry config file at /etc/rancher/k3s/registries.yaml"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=warning msg="SELinux is enabled on this host, but k3s has not been started with --selinux - containerd SELinux support is disabled"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
Mar 27 15:34:54 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:54+02:00" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
Mar 27 15:34:55 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:55+02:00" level=info msg="containerd is now running"
Mar 27 15:34:55 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:55+02:00" level=info msg="Connecting to proxy" url="wss://127.0.0.1:6443/v1-k3s/connect"
Mar 27 15:34:55 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:55+02:00" level=info msg="Running kubelet --address=0.0.0.0 --allowed-unsafe-sysctls=net.ipv4.ip_forward,net.ipv6.conf.all.forwarding --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=ed3137.edkanet.com --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/var/lib/rancher/k3s/agent/etc/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
Mar 27 15:34:55 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:55+02:00" level=info msg="Handling backend connection request [ed3137.edkanet.com]"
Mar 27 15:34:55 ed3137.edkanet.com k3s[1492151]: time="2023-03-27T15:34:55+02:00" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
```
c
Make sure you have all the required ports open between nodes. Sounds like CNI traffic between nodes is getting blocked.
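The inbound ports k3s needs between server nodes include 6443/tcp (apiserver), 2379-2380/tcp (embedded etcd), 10250/tcp (kubelet metrics/logs) and 8472/udp (flannel vxlan). A rough reachability check from one node toward another (the peer IP here matches the telnet test above; adjust as needed):

```shell
PEER=10.154.146.193   # the other node's IP

# TCP ports: nc exits 0 if the port accepts connections.
for p in 6443 2379 2380 10250; do
  nc -z -w 3 "$PEER" "$p" && echo "tcp/$p open" || echo "tcp/$p BLOCKED"
done

# UDP can't be probed reliably with nc alone; instead, capture vxlan
# traffic on the receiving node while generating cross-node pod traffic:
sudo tcpdump -ni any udp port 8472 -c 5
```

"No route to host" from telnet, as seen above, usually means an ICMP rejection from a firewall rather than a routing problem, which again points at host firewall rules.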