# general
w
Hello everyone. Some months ago I installed a single-node k3s cluster with Traefik and Let's Encrypt, plus a MySQL DB, two WordPress websites, and one Django website. It is just for some personal projects and for learning some Kubernetes along the way. Everything ran smoothly until last week, when all the websites went down. Today I started investigating, and the cluster seems to be unstable: the k3s server appears to be restarting constantly, and I don't know where to start. My first steps so far:
• I have free memory, disk, and CPU.
• Checked the k3s version:
  ◦ 1.31.5
• The systemd status:
  ◦ Loaded: loaded (/etc/systemd/system/k3s.service; enabled; preset: enabled)
  ◦ Active: activating (start) since Thu 2025-07-31 18:35:29 CEST; 1min 29s ago (this is always restarting)
  ◦ Docs: https://k3s.io
  ◦ Process: 14422 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service 2>/dev/null (code=exited, status=0/SUCCESS)
  ◦ Process: 14424 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
  ◦ Process: 14427 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
  ◦ Main PID: 14429 (k3s-server)
  ◦ Tasks: 441
  ◦ Memory: 3.1G (peak: 5.1G)
  ◦ CPU: 2min 36.649s
  ◦ CGroup: /system.slice/k3s.service
• And my journal logs:
root@vmi2453314:~# journalctl -u k3s -xe
Jul 31 18:38:07 vmi2453314 k3s[14429]: I0731 18:38:07.459437 14429 metrics.go:299] "Failed to get storage metrics" storage_cluster_id="etcd-0" err="context deadline exceeded"
Jul 31 18:38:20 vmi2453314 k3s[14429]: I0731 18:38:20.463378 14429 metrics.go:299] "Failed to get storage metrics" storage_cluster_id="etcd-0" err="context deadline exceeded"
Jul 31 18:38:23 vmi2453314 k3s[14429]: time="2025-07-31T18:38:23+02:00" level=info msg="Waiting for API server to become available to start kube-scheduler"
Jul 31 18:38:23 vmi2453314 k3s[14429]: time="2025-07-31T18:38:23+02:00" level=info msg="Waiting for API server to become available to start cloud-controller-manager"
Jul 31 18:38:23 vmi2453314 k3s[14429]: time="2025-07-31T18:38:23+02:00" level=info msg="Waiting for API server to become available"
Jul 31 18:38:25 vmi2453314 k3s[14429]: time="2025-07-31T18:38:25+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:51450: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:25 vmi2453314 k3s[14429]: time="2025-07-31T18:38:25+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:51432: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:25 vmi2453314 k3s[14429]: time="2025-07-31T18:38:25+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:51416: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:25 vmi2453314 k3s[14429]: time="2025-07-31T18:38:25+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:51434: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:25 vmi2453314 k3s[14429]: time="2025-07-31T18:38:25+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:51466: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:25 vmi2453314 k3s[14429]: E0731 18:38:25.487717 14429 remote_available_controller.go:448] "Unhandled Error" err="v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.42.0.135:1025>
Jul 31 18:38:31 vmi2453314 k3s[14429]: I0731 18:38:31.538689 14429 controller.go:615] quota admission added evaluator for: endpoints
Jul 31 18:38:33 vmi2453314 k3s[14429]: time="2025-07-31T18:38:33+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:41060: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:33 vmi2453314 k3s[14429]: time="2025-07-31T18:38:33+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:41044: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:33 vmi2453314 k3s[14429]: time="2025-07-31T18:38:33+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:41032: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:33 vmi2453314 k3s[14429]: time="2025-07-31T18:38:33+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:41030: dial tcp 10.42.0.135:10250: connect: no route to host"
Jul 31 18:38:33 vmi2453314 k3s[14429]: time="2025-07-31T18:38:33+02:00" level=error msg="Sending HTTP/1.1 502 response to 127.0.0.1:41020: dial tcp 10.42.0.135:10250: connect: no route to host"
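Most of the journal output above is the same 502 error repeated against a single pod address. A minimal sketch for condensing that kind of output into a short tally before sharing it, assuming journal lines in the usual Go form `dial tcp HOST:PORT: connect: REASON`. The function name `count_dial_failures` and the script name `dial_failures.py` are hypothetical, chosen only for illustration:

```python
import re
import sys
from collections import Counter

# Matches the "dial tcp" part of a k3s proxy error, for example:
#   ... dial tcp 10.42.0.135:10250: connect: no route to host"
# The colon after the port is optional so slightly mangled lines still match.
DIAL_ERROR = re.compile(
    r'dial tcp (?P<addr>[0-9.]+:[0-9]+):? connect: (?P<reason>[^"]+)'
)


def count_dial_failures(lines):
    """Count failed backend connections, keyed by (address, reason)."""
    counts = Counter()
    for line in lines:
        match = DIAL_ERROR.search(line)
        if match:
            key = (match.group("addr"), match.group("reason").strip())
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Reads journal text from stdin and prints one line per unreachable backend.
    for (addr, reason), n in count_dial_failures(sys.stdin).most_common():
        print(f"{n:6d}  {addr}  {reason}")
```

It could be fed with something like `journalctl -u k3s --no-pager | python3 dial_failures.py`. A few summary lines are easier to post in a channel than the raw journal, and they show at a glance which pod IP is unreachable.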
c
Please don’t paste whole pages of logs into the Slack channel. If you need to share logs, open a GH issue and attach them, or share them via a pastebin service.
w
Sorry 🫡 It seems the problem was related to kube-prometheus. As soon as I deleted its services and pods, the cluster was fine again...