adamant-kite-43734
01/03/2025, 2:04 AM
bumpy-mechanic-40986
01/03/2025, 2:08 AM
bumpy-mechanic-40986
01/03/2025, 2:11 AM
creamy-pencil-82913
01/03/2025, 3:25 AM
bumpy-mechanic-40986
01/03/2025, 3:27 AM
kubernetes.default.svc
bumpy-mechanic-40986
01/03/2025, 3:35 AM
*Error scraping target:* Get "https://10.42.43.72:6443/metrics": context deadline exceeded
The two VMs are running on the same host, but they are on different disks. I tried migrating the third back onto the same host as the other two and it seemed to be in a worse position, so I migrated it back.
My end goal is three hosts, each with a single VM, but I'm keen to narrow down what's happening before committing to setting up the third host.
bumpy-mechanic-40986
01/03/2025, 3:37 AM
*Error scraping target:* Get "https://10.42.43.72:6443/metrics": context deadline exceeded
*Error scraping target:* Get "https://10.42.43.70:6443/metrics": context deadline exceeded
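(Note: one quick way to separate a network problem from a Prometheus scrape-config problem here, assuming shell access on any other node — the IP and timeout below are illustrative:

# hit the same apiserver metrics endpoint Prometheus is scraping
curl -k --max-time 5 https://10.42.43.72:6443/metrics

An HTTP 401/403 response would mean the path is fine and only scrape auth is off; a hang ending in a timeout reproduces the "context deadline exceeded" at the network level.)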
creamy-pencil-82913
01/03/2025, 5:46 AM
bumpy-mechanic-40986
01/03/2025, 5:50 AM
bumpy-mechanic-40986
01/03/2025, 5:51 AM
creamy-pencil-82913
01/03/2025, 5:59 AM
bumpy-mechanic-40986
01/03/2025, 6:11 AM
bumpy-mechanic-40986
01/03/2025, 6:12 AM
bumpy-mechanic-40986
01/04/2025, 10:49 PM
10.42.43.70 is the node on which I doubled the vCPU as part of testing.
10.42.43.71, which is the only node that consistently reports as healthy in Prometheus, responds in 150-300ms, so I can only assume those values are totally ok.
So could it just be that my Prometheus deployment is broken in some form? But that wouldn't explain why I'm also having issues with Portainer?
bumpy-mechanic-40986
01/05/2025, 12:02 AM
kubectl logs -n kube-system metrics-server-7cbbc464f4-wpvpq
is filled with:
"Failed to scrape node, timeout to access kubelet" err="Get \"<https://10.42.43.71:10250/metrics/resource>\": context deadline exceeded" node="k3s02" timeout="10s"
bumpy-mechanic-40986
01/05/2025, 12:08 AM
bumpy-mechanic-40986
01/05/2025, 12:14 AM
*Error scraping target:* Get "https://10.42.43.71:6443/metrics": context deadline exceeded
bumpy-mechanic-40986
01/05/2025, 12:17 AM
bumpy-mechanic-40986
01/05/2025, 12:39 AM
creamy-pencil-82913
01/05/2025, 1:14 AM
bumpy-mechanic-40986
01/05/2025, 1:19 AM
bumpy-mechanic-40986
01/05/2025, 1:21 AM
bumpy-mechanic-40986
01/05/2025, 1:36 AM
kubectl run debug-pod --rm -it --image=nicolaka/netshoot --overrides='{"spec": { "nodeSelector": {"kubernetes.io/hostname": "pve01"}}}' -- /bin/bash
If you don't see a command prompt, try pressing enter.
debug-pod:~# ping 10.42.43.70
PING 10.42.43.70 (10.42.43.70) 56(84) bytes of data.
^C
--- 10.42.43.70 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2039ms
debug-pod:~# ping 10.42.43.71
PING 10.42.43.71 (10.42.43.71) 56(84) bytes of data.
64 bytes from 10.42.43.71: icmp_seq=1 ttl=63 time=0.528 ms
64 bytes from 10.42.43.71: icmp_seq=2 ttl=63 time=0.163 ms
^C
--- 10.42.43.71 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1009ms
rtt min/avg/max/mdev = 0.163/0.345/0.528/0.182 ms
debug-pod:~# ping 10.42.43.72
PING 10.42.43.72 (10.42.43.72) 56(84) bytes of data.
^C
--- 10.42.43.72 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4133ms
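(Note: repeating the same pings from the hosts themselves, outside any pod, would show whether the loss sits in the pod/overlay network or in the underlying VM networking — the addresses are the node IPs from above:

# run on each host/VM directly, not in a debug pod
ping -c 3 10.42.43.70
ping -c 3 10.42.43.72
# confirm which interface the kernel would use to reach a peer
ip route get 10.42.43.70

Loss from the hosts too would put the problem below Kubernetes entirely, at the bridge/VLAN layer.)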
bumpy-mechanic-40986
01/05/2025, 1:50 AM
vmbr0.43@vmbr0 where it was vmbr0 when they were originally registered to the cluster
bumpy-mechanic-40986
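(Note: if a node's primary interface name has changed since it registered, one option — a sketch, assuming a default flannel-backed k3s install and that vmbr0.43 is now the correct interface — is to stop relying on interface auto-detection and pin it in /etc/rancher/k3s/config.yaml on each node:

# illustrative values; set node-ip to each node's own address
flannel-iface: vmbr0.43
node-ip: 10.42.43.70

followed by a restart of the k3s service on that node.)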
01/05/2025, 2:10 AM
bumpy-mechanic-40986
01/06/2025, 3:36 AM
Prometheus hasn't reported kubernetes.default.svc as down since.
Thanks for rubber ducking @creamy-pencil-82913, particularly as the issue ended up not being related to or caused by k3s at all!