# rke2
c
Have you looked at any of the logs on the downstream cluster nodes? Like, rancher system agent logs, or rke2 server logs?
If all you're looking at is Rancher, you will have no idea what's going on.
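For reference, a minimal sketch of how to pull those logs on a downstream RKE2 node, assuming a standard systemd-based install (use the server or agent unit depending on the node role):

```bash
# Rancher system agent (provisioning / plan execution) logs
journalctl -u rancher-system-agent -f

# RKE2 logs: rke2-server on control-plane/etcd nodes, rke2-agent on workers
journalctl -u rke2-server -f
journalctl -u rke2-agent -f

# Kubelet and containerd logs live under the RKE2 data dir
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log
tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log
```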
p
I've looked at just about everything.
There's nothing that jumps out.
c
If it looks like it's up, what do the cattle cluster agent pod logs say on the downstream cluster?
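A rough sketch of how to pull those pod logs, assuming direct kubectl access to the downstream cluster (e.g. via its local kubeconfig at /etc/rancher/rke2/rke2.yaml); the `app=cattle-cluster-agent` label selector is what default installs use, adjust if yours differs:

```bash
# cattle-cluster-agent runs in the cattle-system namespace on the downstream cluster
kubectl -n cattle-system get pods
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=200
```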
p
```
time="2024-09-12T20:29:52Z" level=error msg="Error during subscribe websocket: close sent"
E0912 20:33:06.566017      39 leaderelection.go:340] Failed to update lock optimitically: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused, falling back to slow path
E0912 20:33:06.566984      39 leaderelection.go:347] error retrieving resource lock kube-system/cattle-controllers: Get "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused
E0912 20:33:08.568114      39 leaderelection.go:340] Failed to update lock optimitically: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused, falling back to slow path
E0912 20:33:08.569026      39 leaderelection.go:347] error retrieving resource lock kube-system/cattle-controllers: Get "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused
E0912 20:33:10.568364      39 leaderelection.go:340] Failed to update lock optimitically: Put "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused, falling back to slow path
E0912 20:33:10.569437      39 leaderelection.go:347] error retrieving resource lock kube-system/cattle-controllers: Get "https://10.43.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cattle-controllers?timeout=15m0s": dial tcp 10.43.0.1:443: connect: connection refused
E0912 20:33:14.464303      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.464552      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.464571      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *v3.Cluster: unknown (get clusters.meta.k8s.io)
E0912 20:33:14.465990      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.466061      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.466114      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.466163      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.466858      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.467159      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
E0912 20:33:14.468751      39 reflector.go:150] pkg/mod/k8s.io/client-go@v0.30.1/tools/cache/reflector.go:232: Failed to watch *summary.SummarizedObject: unknown
time="2024-09-12T20:33:27Z" level=info msg="Updating TLS secret for cattle-system/serving-cert (count: 36): map[<http://field.cattle.io/projectId:c-m-p7k54glf:p-ncp4l|field.cattle.io/projectId:c-m-p7k54glf:p-ncp4l> <http://listener.cattle.io/cn-10.42.100.207:10.42.100.207|listener.cattle.io/cn-10.42.100.207:10.42.100.207> <http://listener.cattle.io/cn-10.42.100.221:10.42.100.221|listener.cattle.io/cn-10.42.100.221:10.42.100.221> <http://listener.cattle.io/cn-10.42.100.226:10.42.100.226|listener.cattle.io/cn-10.42.100.226:10.42.100.226> <http://listener.cattle.io/cn-10.42.100.236:10.42.100.236|listener.cattle.io/cn-10.42.100.236:10.42.100.236> <http://listener.cattle.io/cn-10.42.100.240:10.42.100.240|listener.cattle.io/cn-10.42.100.240:10.42.100.240> <http://listener.cattle.io/cn-10.42.100.245:10.42.100.245|listener.cattle.io/cn-10.42.100.245:10.42.100.245> <http://listener.cattle.io/cn-10.42.100.246:10.42.100.246|listener.cattle.io/cn-10.42.100.246:10.42.100.246> <http://listener.cattle.io/cn-10.42.100.247:10.42.100.247|listener.cattle.io/cn-10.42.100.247:10.42.100.247> <http://listener.cattle.io/cn-10.42.209.144:10.42.209.144|listener.cattle.io/cn-10.42.209.144:10.42.209.144> <http://listener.cattle.io/cn-10.42.209.146:10.42.209.146|listener.cattle.io/cn-10.42.209.146:10.42.209.146> <http://listener.cattle.io/cn-10.42.209.166:10.42.209.166|listener.cattle.io/cn-10.42.209.166:10.42.209.166> <http://listener.cattle.io/cn-10.42.209.167:10.42.209.167|listener.cattle.io/cn-10.42.209.167:10.42.209.167> <http://listener.cattle.io/cn-10.42.209.168:10.42.209.168|listener.cattle.io/cn-10.42.209.168:10.42.209.168> <http://listener.cattle.io/cn-10.42.209.171:10.42.209.171|listener.cattle.io/cn-10.42.209.171:10.42.209.171> <http://listener.cattle.io/cn-10.42.209.175:10.42.209.175|listener.cattle.io/cn-10.42.209.175:10.42.209.175> <http://listener.cattle.io/cn-10.42.209.176:10.42.209.176|listener.cattle.io/cn-10.42.209.176:10.42.209.176> <http://listener.cattle.io/cn-10.42.219.112:10.42.219.112|listener.cattle.io/cn-10.42.219.112:10.42.219.112> <http://listener.cattle.io/cn-10.42.219.114:10.42.219.114|listener.cattle.io/cn-10.42.219.114:10.42.219.114> <http://listener.cattle.io/cn-10.42.219.115:10.42.219.115|listener.cattle.io/cn-10.42.219.115:10.42.219.115> <http://listener.cattle.io/cn-10.42.219.117:10.42.219.117|listener.cattle.io/cn-10.42.219.117:10.42.219.117> <http://listener.cattle.io/cn-10.42.219.121:10.42.219.121|listener.cattle.io/cn-10.42.219.121:10.42.219.121> <http://listener.cattle.io/cn-10.42.219.125:10.42.219.125|listener.cattle.io/cn-10.42.219.125:10.42.219.125> <http://listener.cattle.io/cn-10.42.219.65:10.42.219.65|listener.cattle.io/cn-10.42.219.65:10.42.219.65> <http://listener.cattle.io/cn-10.42.219.66:10.42.219.66|listener.cattle.io/cn-10.42.219.66:10.42.219.66> <http://listener.cattle.io/cn-10.42.219.72:10.42.219.72|listener.cattle.io/cn-10.42.219.72:10.42.219.72> <http://listener.cattle.io/cn-10.42.219.80:10.42.219.80|listener.cattle.io/cn-10.42.219.80:10.42.219.80> <http://listener.cattle.io/cn-10.42.219.82:10.42.219.82|listener.cattle.io/cn-10.42.219.82:10.42.219.82> <http://listener.cattle.io/cn-10.42.219.92:10.42.219.92|listener.cattle.io/cn-10.42.219.92:10.42.219.92> <http://listener.cattle.io/cn-10.42.219.94:10.42.219.94|listener.cattle.io/cn-10.42.219.94:10.42.219.94> <http://listener.cattle.io/cn-10.42.219.96:10.42.219.96|listener.cattle.io/cn-10.42.219.96:10.42.219.96> 
<http://listener.cattle.io/cn-10.42.219.97:10.42.219.97|listener.cattle.io/cn-10.42.219.97:10.42.219.97> <http://listener.cattle.io/cn-10.42.219.98:10.42.219.98|listener.cattle.io/cn-10.42.219.98:10.42.219.98> <http://listener.cattle.io/cn-127.0.0.1:127.0.0.1|listener.cattle.io/cn-127.0.0.1:127.0.0.1> <http://listener.cattle.io/cn-localhost:localhost|listener.cattle.io/cn-localhost:localhost> <http://listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system|listener.cattle.io/cn-rancher.cattle-system:rancher.cattle-system> <http://listener.cattle.io/fingerprint:SHA1=B42813C17138FE73F69F0402F0E41C90D1C85719]|listener.cattle.io/fingerprint:SHA1=B42813C17138FE73F69F0402F0E41C90D1C85719]>"
W0912 20:33:28.252424      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
W0912 20:33:29.993128      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
time="2024-09-12T20:34:36Z" level=info msg="Updating workload [cattle-monitoring-system/rancher-monitoring-prometheus-node-exporter] with public endpoints [[{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":9796,\"protocol\":\"TCP\",\"podName\":\"cattle-monitoring-system:rancher-monitoring-prometheus-node-exporter-lm7dl\",\"allNodes\":false}]]"
time="2024-09-12T20:34:36Z" level=info msg="Updating workload [kube-system/rke2-ingress-nginx-controller] with public endpoints [[{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":80,\"protocol\":\"TCP\",\"podName\":\"kube-system:rke2-ingress-nginx-controller-nbfsn\",\"allNodes\":false},{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":443,\"protocol\":\"TCP\",\"podName\":\"kube-system:rke2-ingress-nginx-controller-nbfsn\",\"allNodes\":false},{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":5432,\"protocol\":\"TCP\",\"podName\":\"kube-system:rke2-ingress-nginx-controller-nbfsn\",\"allNodes\":false},{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":3306,\"protocol\":\"TCP\",\"podName\":\"kube-system:rke2-ingress-nginx-controller-nbfsn\",\"allNodes\":false},{\"nodeName\":\":legrosdell\",\"addresses\":[\"10.0.16.200\"],\"port\":1433,\"protocol\":\"TCP\",\"podName\":\"kube-system:rke2-ingress-nginx-controller-nbfsn\",\"allNodes\":false}]]"
E0912 20:35:19.373584      39 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: stale GroupVersion discovery: custom.metrics.k8s.io/v1beta1
time="2024-09-12T20:35:19Z" level=error msg="Failed to read API for groups map[custom.metrics.k8s.io/v1beta1:stale GroupVersion discovery: custom.metrics.k8s.io/v1beta1]"
E0912 20:35:20.695065      39 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: stale GroupVersion discovery: custom.metrics.k8s.io/v1beta1, metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1
W0912 20:35:20.963599      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
time="2024-09-12T20:35:20Z" level=error msg="Failed to read API for groups map[<http://custom.metrics.k8s.io/v1beta1:stale|custom.metrics.k8s.io/v1beta1:stale> GroupVersion discovery: <http://custom.metrics.k8s.io/v1beta1|custom.metrics.k8s.io/v1beta1> <http://metrics.k8s.io/v1beta1:stale|metrics.k8s.io/v1beta1:stale> GroupVersion discovery: <http://metrics.k8s.io/v1beta1]|metrics.k8s.io/v1beta1]>"
E0912 20:35:22.307678      39 gvks.go:69] failed to sync schemas: unable to retrieve the complete list of server APIs: <http://metrics.k8s.io/v1beta1|metrics.k8s.io/v1beta1>: stale GroupVersion discovery: <http://metrics.k8s.io/v1beta1|metrics.k8s.io/v1beta1>
W0912 20:35:22.872957      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
time="2024-09-12T20:35:23Z" level=error msg="Failed to read API for groups map[<http://metrics.k8s.io/v1beta1:stale|metrics.k8s.io/v1beta1:stale> GroupVersion discovery: <http://metrics.k8s.io/v1beta1|metrics.k8s.io/v1beta1>]"
W0912 20:35:24.701344      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
W0912 20:35:52.295505      39 warnings.go:70] v1 ComponentStatus is deprecated in v1.19+
```
There's that stale metrics.k8s.io GroupVersion error that keeps coming back in the logs, but I found nothing about it anywhere, so...
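The thread never pins down where those come from, but stale GroupVersion discovery errors usually mean an aggregated API (metrics-server, prometheus-adapter) is not answering discovery. A hedged way to check on the downstream cluster; the workload names below assume the default rke2 and rancher-monitoring charts:

```bash
# Aggregated APIs should show Available=True; False generally explains the stale discovery errors
kubectl get apiservice v1beta1.metrics.k8s.io v1beta1.custom.metrics.k8s.io
kubectl describe apiservice v1beta1.custom.metrics.k8s.io

# The backing workloads (names are assumptions based on default charts)
kubectl -n kube-system get pods | grep metrics-server
kubectl -n cattle-monitoring-system get pods | grep prometheus-adapter
```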
c
what specifically are those from?
p
I don't know, I binned it. I cordoned my only working node, and now even after restoring from a snapshot, Fleet doesn't want to start up its agent.
It's not my production cluster so nothing is burning, but if it were my production cluster, I'd be shot on the spot.
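If the node is still cordoned after the restore, a quick sketch of things worth checking; the namespace and label below are what default downstream installs use, adjust if yours differs:

```bash
# Re-enable scheduling on the node that was cordoned (<node-name> is a placeholder)
kubectl get nodes
kubectl uncordon <node-name>

# fleet-agent runs in cattle-fleet-system on downstream clusters
kubectl -n cattle-fleet-system get pods
kubectl -n cattle-fleet-system logs -l app=fleet-agent --tail=100
```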
I think things started burning when I was fucking around with snapshotting and restoring after trying to update rke2: https://github.com/rancher/rancher/issues/46994
I don't know if it's related, but for example the etcdrestore entry in the YAML never goes away.
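A hedged way to look at that lingering restore entry from the upstream (Rancher local) cluster; the fleet-default namespace and the etcdSnapshotRestore field path are assumptions based on a default provisioning-v2 setup, and <cluster-name> is a placeholder:

```bash
# The provisioning Cluster object carries the restore request under spec.rkeConfig
kubectl -n fleet-default get clusters.provisioning.cattle.io
kubectl -n fleet-default get clusters.provisioning.cattle.io <cluster-name> -o yaml \
  | grep -A5 etcdSnapshotRestore
```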
OK, I'm back. About those cluster metrics, I've also found that rancher-webhook says:
```
failed to sync schemas: unable to retrieve the complete list of server APIs: custom.metrics.k8s.io/v1beta1: stale GroupVersion discovery: custom.metrics.k8s.io/v1bet
```
And for the record, I can't create a new cluster from scratch either (rke2 1.30, all defaults). The new node also says "waiting for join url". Do you think it may be related to this? https://rancher-users.slack.com/archives/C3ASABBD1/p1726156914270439 I may have an issue with my upstream cluster (my production cluster is still running, thankfully).
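For the "waiting for join url" symptom, a rough sketch of what can be checked from the new node; <rancher-url> is a placeholder for the Rancher server URL, and /ping and /cacerts are standard Rancher endpoints:

```bash
# What the provisioning agent on the new node is doing
journalctl -u rancher-system-agent --no-pager | tail -n 100

# Confirm the node can actually reach the Rancher server URL
curl -sk https://<rancher-url>/ping        # should return "pong"
curl -sk https://<rancher-url>/cacerts     # should return the Rancher CA cert
```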
FUCKING FIXED IIIIT
ON THE UPSTREAM CLUSTER THE rancher ENDPOINT WENT FROM 172.17.0.2 TO 172.17.0.3 AT SOME UNKNOWN POINT (update to 2.9.1?) BUT THERE WAS NOTHING ON 172.17.0.3
SWITCHED BACK TO .2 AND EVERYONE IS ALIVE NOOOW
ALL OF THIS BECAUSE OF THAT EFFIN EFFEEEEEEEEEEER
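For anyone hitting the same thing, a hedged way to spot a stale endpoint like this on the upstream (local) cluster is to compare the rancher Service endpoints against the running pod IPs; the app=rancher label assumes a default Helm install:

```bash
# Run against the upstream/local cluster where Rancher itself is installed
kubectl -n cattle-system get endpoints rancher
kubectl -n cattle-system get pods -l app=rancher -o wide   # pod IPs should match the endpoints
```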
thank you @creamy-pencil-82913 for answering, it was much appreciated 🙏