little-father-38603 11/28/2022, 1:57 PM
kubectl top node shows:
node-that-stopped-working <unknown> <unknown> <unknown> <unknown>
and the metrics-server logs show:
"Failed to scrape node" err="Get \"https://***:10250/metrics/resource\": context deadline exceeded" node="node-that-stopped-reporting"
HPAs are not working because of this, and I'm not sure what other issues it could cause. If I restart the node, reporting starts working again for a few days and then stops. The machines are powerful enough, and the problem started when the cluster was still empty, so I doubt it is resource-related. Even now that the cluster is running some workloads, there are plenty of resources available. I have no idea what could cause this, but I noticed something interesting:
kubectl get --raw /api/v1/nodes/$NODE_NAME/proxy/stats/summary
takes a really long time (~2 minutes) on the nodes that don't work, while it's fast (~2 seconds) on a working node. Any kind of help is greatly appreciated!
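To compare the stats/summary response time across nodes side by side, something like the following could help. It is only a rough sketch: the function names time_node_summary and time_all_nodes are made up for illustration, it assumes bash and a kubectl context that can list nodes and use the node proxy, and timing is rounded to whole seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (names are illustrative, not part of kubectl).

# Time one request to the kubelet stats/summary endpoint via the API server proxy.
# Prints: "<node> <elapsed>s <ok|failed>"
time_node_summary() {
  local node="$1" start end status
  start=$(date +%s)
  if kubectl get --raw "/api/v1/nodes/${node}/proxy/stats/summary" >/dev/null 2>&1; then
    status=ok
  else
    status=failed
  fi
  end=$(date +%s)
  echo "${node} $((end - start))s ${status}"
}

# Run the same check against every node in the cluster, one after another.
time_all_nodes() {
  local node
  for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
    time_node_summary "$node"
  done
}
```

Calling time_all_nodes after sourcing the script prints one line per node, so the slow nodes should stand out from the ones that answer in a couple of seconds.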