little-father-38603
11/28/2022, 1:57 PM
"Failed to scrape node" err="Get \"https://***:10250/metrics/resource\": context deadline exceeded" node="node-that-stopped-reporting"
kubectl top node shows:
node-that-stopped-working <unknown> <unknown> <unknown> <unknown>
HPAs are not working because of this, and I'm not sure what other issues this could cause.
If I restart the node, the reporting starts working again for a few days, and then it stops.
The machines are powerful enough, and this problem started when the cluster was still empty, so I doubt it is resource-related.
Even now that the cluster is running some workloads, there are plenty of resources available.
I have no idea what could cause this, but I noticed something interesting:
kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/stats/summary
takes a really long time (~2 minutes) on those nodes that don't work, while it's fast (~2 seconds) on a working node.
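To compare nodes side by side, the same request can be timed in a loop. This is only a rough sketch: the function names `time_endpoint` and `check_all_nodes` and the `KUBECTL` variable are made up for illustration, and the 180s timeout is an arbitrary value chosen to be longer than the ~2 minutes seen on the slow nodes.

```shell
#!/usr/bin/env bash
# Rough sketch: time a kubelet endpoint on each node through the API server proxy.
# KUBECTL is a hypothetical override hook; it defaults to the normal kubectl binary.
KUBECTL="${KUBECTL:-kubectl}"

# time_endpoint NODE [PATH]
# Prints "<node> ok <seconds>s" or "<node> failed <seconds>s".
time_endpoint() {
  local node="$1" path="${2:-stats/summary}" start end status
  start=$(date +%s)
  # --request-timeout bounds the wait so one stuck node does not hang the loop
  # (180s is an assumed value, not a recommendation).
  if "$KUBECTL" get --raw "/api/v1/nodes/${node}/proxy/${path}" \
      --request-timeout=180s >/dev/null 2>&1; then
    status="ok"
  else
    status="failed"
  fi
  end=$(date +%s)
  echo "${node} ${status} $((end - start))s"
}

# check_all_nodes [PATH]
# Runs time_endpoint for every node in the cluster.
check_all_nodes() {
  local node
  # "kubectl get nodes -o name" prints entries like "node/<name>"
  for node in $("$KUBECTL" get nodes -o name); do
    time_endpoint "${node#node/}" "${1:-stats/summary}"
  done
}

# Usage (not run automatically):
#   check_all_nodes                    # times /stats/summary on every node
#   check_all_nodes metrics/resource   # times the path from the scrape error
```

Passing `metrics/resource` checks the same kubelet path that appears in the scrape error above, so it should show whether the slowness is specific to `stats/summary` or affects both.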
Any kind of help is greatly appreciated!