# k3s
c
you’d have to look at the logs (containerd.log and k3s log from journald) to see why they’re not getting deleted.
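For reference, on a stock k3s install both of those logs can be pulled roughly like this (the paths below are the installer defaults; adjust if the data dir has been moved):
Copy code
# k3s service log (the kubelet runs in the same process, so it all lands in this unit)
journalctl -u k3s --no-pager | tail -n 200
# the embedded containerd writes its own log under the k3s data dir
sudo tail -n 200 /var/lib/rancher/k3s/agent/containerd/containerd.log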
q
Ok, ty! I’ll check it out.
c
I see that you’re logging to a file called “error” so the relevant k3s logs might not be in journald…
is that on purpose?
q
Yeah, but I can remove it for this. Just an FYI, I reverted to the old version we had: k3s --version reports k3s version v1.25.16+k3s2 (c31827ee), go version go1.20.10. And we can't repro it at all. I'll remove that line and test out on 1.28 again on a non-production system.
I always feel a bit shaky without coredns, but we really do only run our own stuff and don't have containers talking to each other.
Copy code
- name: "Install k3s"
  shell: >
    curl -sfL <https://get.k3s.io> |
    INSTALL_K3S_CHANNEL=v1.25.16+k3s2 sh -s - \
    --disable=coredns,servicelb,traefik,local-storage,metrics-server \
    --node-ip=10.146.0.254 \
    --write-kubeconfig-mode 644 --flannel-iface=k3s_interface \
    --tls-san=10.43.0.1 \
    --log=error
  become: true
  tags: k3s,k3s_install
this is the old version of the install if that is interesting.
Maybe one of these system-level deployments is needed in more modern k3s versions.
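If it helps to double-check what that playbook actually produced, the get.k3s.io installer bakes the flags into a systemd unit, so something like the following should show the effective config and which packaged add-ons are still deployed (generic commands, not specific to this setup):
Copy code
# flags the installer wrote into the k3s service
systemctl cat k3s | grep -A15 ExecStart
# packaged components that survived the --disable list
kubectl get pods -n kube-system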
ok, I don't see any errors related to k3s itself, but I do see the taint messages on the pods:
Copy code
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698187     725 event.go:307] "Event occurred" object="joby-recorder/hrr-compress-7d49fc7cf5-5gs7w" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-compress-7d49fc7cf5-5gs7w"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698233     725 event.go:307] "Event occurred" object="joby-recorder/hrr-record-1" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-record-1"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698271     725 event.go:307] "Event occurred" object="joby-recorder/hrr-upload-notify-b75555d96-69phc" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-upload-notify-b75555d96-69phc"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.697549     725 taint_manager.go:106] "NoExecuteTaintManager is deleting pod" pod="joby-recorder/hrr-record-0"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698466     725 event.go:307] "Event occurred" object="joby-recorder/hrr-system-7644d4dcc9-k4lbt" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-system-7644d4dcc9-k4lbt"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698510     725 event.go:307] "Event occurred" object="joby-recorder/hrr-api-tunnel-6d8bdbcfb8-dwdft" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-api-tunnel-6d8bdbcfb8-dwdft"
May 17 22:04:30 hrr13 k3s[725]: I0517 22:04:30.698544     725 event.go:307] "Event occurred" object="joby-recorder/hrr-record-0" fieldPath="" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod joby-recorder/hrr-record-0"
But nothing to say why it's hung, and it can't get un-hung in this state. There are a LOT of logs so I am probably missing things. Not much for containerd:
Copy code
hrr@hrr13:~$ journalctl -u k3s | grep containerd
May 17 21:59:16 hrr13 k3s[725]: time="2024-05-17T21:59:16Z" level=info msg="Logging containerd to /var/lib/rancher/k3s/agent/containerd/containerd.log"
May 17 21:59:16 hrr13 k3s[725]: time="2024-05-17T21:59:16Z" level=info msg="Running containerd -c /var/lib/rancher/k3s/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/k3s/agent/containerd"
May 17 21:59:17 hrr13 k3s[725]: time="2024-05-17T21:59:17Z" level=info msg="containerd is now running"
May 17 21:59:17 hrr13 k3s[725]: time="2024-05-17T21:59:17Z" level=info msg="Running kubelet --address=0.0.0.0 --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/k3s/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.43.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --feature-gates=CloudDualStackNodeIPs=true --healthz-bind-address=127.0.0.1 --hostname-override=hrr13 --kubeconfig=/var/lib/rancher/k3s/agent/kubelet.kubeconfig --node-ip=10.146.0.254 --node-labels= --pod-infra-container-image=rancher/mirrored-pause:3.6 --pod-manifest-path=/var/lib/rancher/k3s/agent/pod-manifests --read-only-port=0 --resolv-conf=/run/systemd/resolve/resolv.conf --serialize-image-pulls=false --tls-cert-file=/var/lib/rancher/k3s/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/k3s/agent/serving-kubelet.key"
May 17 21:59:17 hrr13 k3s[725]: Flag --containerd has been deprecated, This is a cadvisor flag that was mistakenly registered with the Kubelet. Due to legacy concerns, it will follow the standard CLI deprecation timeline before being removed.
May 17 21:59:17 hrr13 k3s[725]: I0517 21:59:17.929755     725 kuberuntime_manager.go:257] "Container runtime initialized" containerRuntime="containerd" version="v1.7.11-k3s2" apiVersion="v1"
May 17 21:59:17 hrr13 k3s[725]: E0517 21:59:17.974789     725 cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache" mountpoint="/var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs"
hrr@hrr13:~$
well that last line?
cri_stats_provider.go:448] "Failed to get the info of the filesystem with mountpoint" err="unable to find data in memory cache"
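That cri_stats_provider message is usually benign; it tends to appear once right after startup, before cadvisor has cached stats for the snapshotter path. To rule out an actual filesystem problem, a quick check could be:
Copy code
# confirm the overlayfs snapshotter dir exists and its filesystem has space
df -h /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs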
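The TaintManagerEviction events above mean the pods' node carries a NoExecute taint (typically node.kubernetes.io/unreachable or node.kubernetes.io/not-ready). A rough way to see which taint is driving it and what the evicted pods are stuck on:
Copy code
# list each node with its taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
# see which node the stuck pods are bound to and why they won't finish terminating
kubectl get pods -n joby-recorder -o wide
kubectl describe pod -n joby-recorder hrr-record-0 | tail -n 30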
Fudge, I may have found it. We have a 'master image' with k3s installed on it on all the embedded systems. But when we get ready for production we change the hostname and re-install k3s, and it looks like it's not a clean uninstall:
Copy code
$ kubectl get pods -o json -n joby-recorder | jq -r '.items[] | .metadata.name + " " + .spec.nodeName'
hrr-record-0 mysystem999
hrr-compress-7d49fc7cf5-5gs7w mysystem999
hrr-api-tunnel-6d8bdbcfb8-dwdft mysystem999
hrr-upload-notify-b75555d96-69phc mysystem999
hrr-system-7644d4dcc9-k4lbt mysystem999
hrr-record-1 mysystem999
hrr-upload-notify-b75555d96-46kwc mysystem13
hrr-api-tunnel-6d8bdbcfb8-s5hfh mysystem13
hrr-compress-7d49fc7cf5-k9xfb mysystem13
hrr-system-7644d4dcc9-rz5r7 mysystem13
In this example, all the pods belonging to the old node mysystem999 are tainted forever and can't go away. The odd thing is, the system rename happened before we deployed anything, but we did deploy before we rebooted. So my idea is: despite the system rename, since we didn't restart clean, k3s still called its node the original name, and then after the restart it came up with the new name and is like WTF!? And maybe the old version 1.25 doesn't track the node name the same? Or maybe the old k3s-uninstall.sh was cleaner and now I need to do something more extreme so it knows its node name is 'new'? Any of that sound reasonable/possible @creamy-pencil-82913?
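If that theory is right, the cluster should still have a Node object registered under the old hostname, with the stuck pods bound to it. A rough way to confirm and clean up, assuming the old name really is gone for good:
Copy code
# the stale node should show up NotReady under the old hostname
kubectl get nodes -o wide
# deleting it lets the controllers finish evicting/rescheduling its pods
kubectl delete node mysystem999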
c
yeah, I would definitely be careful about what you leave behind if you’re preparing for system imaging. Clusters aren’t really meant to be cloned like that.
✔️ 1
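For what it's worth, a minimal sketch of the kind of pre-imaging cleanup being suggested here, using the script and paths a default get.k3s.io install lays down (verify against your own layout before baking it into the image process):
Copy code
# full teardown: stops the service and removes containers, images, and cluster state
sudo /usr/local/bin/k3s-uninstall.sh
# belt and braces: make sure no node identity (node password, certs) rides along in the master image
sudo rm -rf /var/lib/rancher/k3s /etc/rancher/k3s /etc/rancher/node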