# k3s
f
Just as a follow-up. These are the labels and annotations on the node:
```
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=k3s
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=phchbs-st32018
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=true
                    node-role.kubernetes.io/master=true
                    node.kubernetes.io/instance-type=k3s
Annotations:        alpha.kubernetes.io/provided-node-ip: <MY-IPV4>
                    csi.volume.kubernetes.io/nodeid: {"driver.longhorn.io":"phchbs-st32018"}
                    flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"7e:65:80:f9:9f:cb"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: <MY-IPV4>
                    k3s.io/hostname: phchbs-st32018
                    k3s.io/internal-ip: <MY-IPV4>
                    k3s.io/node-args:
                      ["server","--disable","traefik","--flannel-iface","ens160","--data-dir","/opt/k3s","--prefer-bundled-bin","--resolv-conf","/etc...
                    k3s.io/node-config-hash: VWJ7GMRNIEVMZ5NDM7NJVQP7REBOJBVASIV4F273RFU4UY3GQXIA====
                    k3s.io/node-env:
                      {"K3S_DATA_DIR":"/opt/k3s/data/3fcd4fcf3ae2ba4d577d4ee08ad7092538cd7a7f0da701efa2a8807d44a25f66","K3S_KUBECONFIG_MODE":"644"}
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
```
c
Did you upgrade directly from 1.26 to 1.29, or did you step through 1.27 and 1.28 first?
Have you configured dual-stack cluster-cidr and service-cidr ranges, or is the cluster ipv4 only?
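(For reference: a dual-stack k3s server passes comma-separated IPv4,IPv6 ranges to both flags, while an ipv4-only cluster passes just the IPv4 ranges. A minimal sketch, with placeholder IPv6 prefixes:)
```
# Sketch of a dual-stack k3s server invocation; the IPv6 prefixes below are
# placeholders, not recommendations for this particular cluster.
k3s server \
  --cluster-cidr=10.42.0.0/16,2001:cafe:42::/56 \
  --service-cidr=10.43.0.0/16,2001:cafe:43::/112
```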
f
I upgraded directly from 1.26 to 1.29. To upgrade I usually run `k3s-killall.sh` first, and then re-run the installer with the desired new k3s version. Should I go through each version instead? The cluster is ipv4 only, and I checked from the k3s service logs (systemd) that only ipv4 CIDR ranges are passed as `--service-cluster-ip-range` or `--cluster-cidr`.
When attempting to upgrade to a new version of K3s, the Kubernetes version skew policy applies. Ensure that your plan does not skip intermediate minor versions when upgrading.
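In practice, stepping through each minor with the install script looks roughly like the sketch below; the version strings are placeholders, so substitute real patch releases and wait for the node to come back Ready between steps.
```
# Placeholder versions: upgrade one minor at a time, checking node health in between.
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.27.x+k3s1 sh -
kubectl get nodes        # wait for Ready on 1.27
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.28.x+k3s1 sh -
kubectl get nodes        # wait for Ready on 1.28
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION=v1.29.x+k3s1 sh -
```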
f
To give you a better overview, the node is behind a company proxy configured using the default `http_proxy`/`https_proxy` variables. Since the upgrade, it seems that all the networking operations are trying to resolve the proxy domain using ipv6 and that is failing. The workaround I found so far is to use the proxy IP directly instead of the domain, but I'm not sure it won't change in the future, and I didn't have this issue before the upgrade although I was using the same proxy.
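For context on how k3s consumes those variables: the install script copies the proxy settings from the install environment into an env file next to the systemd unit, and k3s/containerd pick them up from there. A rough sketch of such a file, with placeholder values:
```
# Illustrative /etc/systemd/system/k3s.service.env; the proxy host and
# NO_PROXY entries are placeholders, adjust to the real environment.
HTTP_PROXY=http://myproxy.com:8080
HTTPS_PROXY=http://myproxy.com:8080
NO_PROXY=127.0.0.0/8,10.42.0.0/16,10.43.0.0/16,.cluster.local
```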
c
I can’t say I’ve ever seen the outcome you’re describing due to skipping minors though.
What is the output of `ip addr` in the pods? Things running in pods should be smart enough to not try to connect to ipv6 addresses if they don’t have that address family available. However, you said you’re getting “could not resolve host” errors, not “could not connect to host”… which suggests that DNS lookups are failing?
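One way to capture both data points without a shell on the pod, assuming kubectl access and an image that ships the usual tools (pod name and namespace are placeholders):
```
# Placeholders: substitute a real pod name and namespace.
kubectl exec -n default mypod -- ip addr
kubectl exec -n default mypod -- nslookup myproxy.com
```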
f
This is `ip addr` from a random nginx pod in the cluster (sorry for the screenshot but I can't copy/paste on this node)
> However, you said you’re getting “could not resolve host” errors, not “could not connect to host”… which suggests that DNS lookups are failing?
Yes, it's failing to resolve the domain of the HTTP proxy, but that doesn't happen if I force ipv4, for example by running `curl -4 http://myproxy.com`. It seems that even for DNS lookups it's trying to resolve `AAAA` records instead of `A`, and I don't have any `AAAA` entry for the proxy.
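A quick way to separate what the DNS server returns from what the client asks for, assuming dig and getent are available in the pod image (hostname is a placeholder):
```
# Record types as the DNS server sees them (placeholder hostname).
dig A myproxy.com +short
dig AAAA myproxy.com +short
# What the libc resolver (getaddrinfo) hands back to clients like curl.
getent ahosts myproxy.com
```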
Btw, from the results of `ip addr` it seems like the pod has an ipv6 address as well, but why is that the case? I never configured dual-stack here
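Worth checking whether that address is only link-local: the kernel assigns an fe80::/64 address to every interface whenever IPv6 isn't disabled, independent of any Kubernetes dual-stack configuration, so its presence alone doesn't imply a routable pod IPv6 address. For example (pod name is a placeholder):
```
# A "scope link" fe80:: address is kernel-assigned; only a global-scope
# address would indicate dual-stack pod networking.
kubectl exec mypod -- ip -6 addr show dev eth0
```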
c
This is all client behavior and not controlled by Kubernetes, but normally it will resolve both A and AAAA and then use whichever ones are appropriate for the available address families.
You might try adding this to /etc/sysctl.conf and rebooting?
```
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
```
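If a reboot isn't convenient, the same settings can usually be applied at runtime and then persisted:
```
# Apply immediately ...
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
# ... then reload persisted settings from /etc/sysctl.conf and /etc/sysctl.d
sysctl --system
```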
f
> This is all client behavior and not controlled by Kubernetes
I see the same behaviour also for Kubernetes operations like image pulling though.
c
You said it wasn’t affecting things that run on the node itself?
Image pulls happen outside Kubernetes, in containerd, which runs on the node, not in a container.
If it is happening on the node itself, then you have an OS-level problem.
Try those sysctls and see what happens
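One way to confirm that is to exercise k3s's bundled containerd directly on the node, outside of any pod; if the proxy's hostname fails to resolve there too, the problem is at the node/containerd level. The image below is just an example:
```
# Pull via the embedded CRI tooling ...
k3s crictl pull docker.io/library/nginx:latest
# ... or via the embedded containerd client.
k3s ctr images pull docker.io/library/nginx:latest
```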
f
You're right, I was considering containerd as part of Kubernetes itself, but I get your point here. I can confirm though that it doesn't happen on the OS, outside of containerd. The same commands I tested, such as `curl` or `ping`, would fail in the pods but not on the node itself, and I get the same DNS lookup errors in containerd when pulling images.
> try those sysctls and see what happens
I'll give it a try tomorrow and let you know. Thanks a lot for your help here, really appreciated!
@creamy-pencil-82913 just wanted to give you an update about this. In the end I restarted the server and I wasn't able to reproduce the issue anymore, regardless of the `sysctl` ipv6 configs. Still not sure what caused it in the first place, but it wasn't related to K3s in the end. Thanks again for your help 👍