adamant-kite-43734
05/17/2024, 8:58 PM

creamy-pencil-82913
05/17/2024, 9:04 PM

quiet-memory-19288
05/17/2024, 9:05 PM

creamy-pencil-82913
05/17/2024, 9:05 PM

creamy-pencil-82913
05/17/2024, 9:06 PM

quiet-memory-19288
05/17/2024, 9:52 PM

quiet-memory-19288
05/17/2024, 9:54 PM
- name: "Install k3s"
  shell: >
    curl -sfL https://get.k3s.io |
    INSTALL_K3S_CHANNEL=v1.25.16+k3s2 sh -s - \
      --disable=coredns,servicelb,traefik,local-storage,metrics-server \
      --node-ip=10.146.0.254 \
      --write-kubeconfig-mode 644 --flannel-iface=k3s_interface \
      --tls-san=10.43.0.1 \
      --log=error
  become: true
  tags: k3s,k3s_install
this is the old version of the install if that is interesting.
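One side note on that task: INSTALL_K3S_CHANNEL normally expects a channel name (stable, latest, or something like v1.25), while pinning an exact release is usually done with INSTALL_K3S_VERSION. For comparison, a minimal sketch of the same flags expressed declaratively in /etc/rancher/k3s/config.yaml (values copied from the task above, nothing new assumed):

# /etc/rancher/k3s/config.yaml -- sketch only, same settings as the CLI flags above
disable:
  - coredns
  - servicelb
  - traefik
  - local-storage
  - metrics-server
node-ip: 10.146.0.254
write-kubeconfig-mode: "644"
flannel-iface: k3s_interface
tls-san:
  - 10.43.0.1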
quiet-memory-19288
05/17/2024, 9:54 PM

quiet-memory-19288
05/17/2024, 10:15 PM
taint messages on the pods:
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698187 725 event.go:307] "Event occurred" object="joby-recorder/hrr-compress-7d49fc7cf5-5gs7w" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-compress-7d49fc7cf5-5gs7w"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698233 725 event.go:307] "Event occurred" object="joby-recorder/hrr-record-1" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-record-1"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698271 725 event.go:307] "Event occurred" object="joby-recorder/hrr-upload-notify-b75555d96-69phc" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-upload-notify-b75555d96-69phc"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.697549 725 taint_manager.go:106] "NoExecuteTaintManager is deleting pod" pod="joby-recorder/hrr-record-0"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698466 725 event.go:307] "Event occurred" object="joby-recorder/hrr-system-7644d4dcc9-k4lbt" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-system-7644d4dcc9-k4lbt"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698510 725 event.go:307] "Event occurred" object="joby-recorder/hrr-api-tunnel-6d8bdbcfb8-dwdft" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-api-tunnel-6d8bdbcfb8-dwdft"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698544 725 event.go:307] "Event occurred" object="joby-recorder/hrr-record-0" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-record-0"
But nothing to say why it's hung. And it can't get un-hung in this state. There are a LOT of logs, so I am probably missing things.
not much for containerd:
hrr@hrr13:~$ journalctl -u k3s | grep containerd
May 17 21:59:16 hrr13 k3s[725]: time="2024-05-17T21:59:16Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
May 17 21:59:16 hrr13 k3s[725]: time="2024-05-17T21:59:16Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
May 17 21:59:17 hrr13 k3s[725]: time="2024-05-17T21:59:17Z" level=info msg="containerd is now running"
May 17 21:59:17 hrr13 k3s[725]: time="2024-05-17T21:59:17Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=hrr13 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=10.146.0.254 --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
May 17 21:59:17 hrr13 k3s[725]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
May 17 21:59:17 hrr13 k3s[725]: I0517 21:59:17.929755 725 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.11-k3s2" apiVersion="v1"
May 17 21:59:17 hrr13 k3s[725]: E0517 21:59:17.974789 725 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
hrr@hrr13:~$
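The TaintManagerEviction events mean the node object has picked up a NoExecute taint (typically node.kubernetes.io/not-ready or node.kubernetes.io/unreachable once the control plane decides the node is unhealthy). A quick way to see which taint is actually set and what the node-level events say, assuming kubectl still answers on this server, is something like:

kubectl get node hrr13 -o jsonpath='{.spec.taints}'
kubectl describe node hrr13 | grep -i -A5 taints
kubectl get events -A --field-selector involvedObject.kind=Node --sort-by=.lastTimestamp | tail -n 20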
quiet-memory-19288
05/17/2024, 10:16 PM
cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache"
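That cri_stats_provider error is often just a transient startup message (it fires before cAdvisor has populated its cache), but if it is suspected, a cheap sanity check is whether the overlayfs snapshotter path exists and its backing filesystem has free space. A sketch, assuming the default k3s data dir shown in the log above:

df -h /var/lib/rancher/k3s/agent/containerd
findmnt -T /var/lib/rancher/k3s/agent/containerd
sudo ls /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs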
quiet-memory-19288
05/17/2024, 10:42 PM
before we rebooted.
So my idea is: despite the system rename, since we didn't restart clean, k3s still called its node by the original name, and then after the restart it came up with the new name and is like WTF!?
And maybe the old version 1.25 doesn't track the node name the same way?
Or maybe the old k3s-uninstall.sh was cleaner, and now I need to do something more extreme so it knows its node name is 'new'?
Any of that sound reasonable/possible, @creamy-pencil-82913?
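If the guess about a stale node object under the old hostname is right, one way to check (a sketch; the node names below are placeholders) is to list what the cluster currently thinks its nodes are, and remove an orphaned entry for the old name so the renamed node can re-register:

kubectl get nodes -o wide
kubectl delete node <old-hostname>   # only if it shows up as a stale/NotReady duplicate
# k3s can also be pinned to a name explicitly instead of following the hostname,
# e.g. --node-name=<name> on the server, or node-name: <name> in /etc/rancher/k3s/config.yaml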
creamy-pencil-82913
05/18/2024, 4:32 AM