fierce-australia-70779
04/14/2025, 5:06 PM
We recently upgraded RKE2 from 1.31.3 to 1.31.7. The upgrade seemed to work fine and the Rancher cluster and its two child clusters appear to be working correctly - no problems with any workloads. That said, if we log into Rancher and click on the name of one of the child clusters (e.g. dev), we receive an "Error: Unknown schema for type: namespace" error. Although not exactly the same, the error looks similar to the image below (I borrowed this image from a different GitHub issue). We're not quite sure why this error is happening and it doesn't seem like something that an RKE2 upgrade should trigger.
Looking into it a bit further, if I look at the logs of any of the cattle-cluster-agent pods on the child cluster, there are a lot of errors of the form:
level=error msg="failed to sync schemas: the server is currently unable to handle the request"
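For reference, this is roughly how I'm pulling those logs - just a sketch, assuming the agent still runs as the standard cattle-cluster-agent deployment in the cattle-system namespace with the usual app label:

    # tail recent logs from all cattle-cluster-agent pods on the child cluster
    kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=200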
I've tried increasing the log level on the cattle-cluster-agent deployment by setting the CATTLE_TRACE environment variable to true, but this hasn't yielded any additional relevant logging information. These clusters are running in an air-gapped environment and so I can't readily share any logs in their entirety.
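In case it matters, I set it roughly like this (again assuming the default deployment name and namespace; kubectl set env just patches the env var onto the deployment and triggers a rollout):

    # enable trace logging on the cluster agent
    kubectl -n cattle-system set env deployment/cattle-cluster-agent CATTLE_TRACE=true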
Looking in the web browser's logs, I can see that when we click on one of these clusters, it sends a request to hxxps://rancher.ourdomain.com/k8s/clusters/clusterid-goes-here/v1/schema. Comparing the results of this request between a known working cluster and the currently misbehaving cluster, the data returned for the misbehaving cluster is a lot smaller. On the working cluster, the results contain descriptions of virtually every resource type in the cluster; on the non-working cluster, lots of types are missing, including built-in Kubernetes types such as Namespace, which seems to explain the UI error in the screenshot.
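If anyone wants to reproduce that comparison, this is the rough shape of it - a sketch only, assuming a Rancher API token with access to both clusters and that the response is a standard collection with a data array (the token and cluster IDs below are placeholders):

    TOKEN="token-xxxxx:secret"                       # placeholder Rancher API token
    for CLUSTER in c-m-workingid c-m-brokenid; do    # placeholder cluster IDs
      # count how many schema entries each cluster returns
      curl -sk -H "Authorization: Bearer $TOKEN" \
        "https://rancher.ourdomain.com/k8s/clusters/$CLUSTER/v1/schema" | jq '.data | length'
    done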
As an additional data point, we've tried creating a brand new cluster in Rancher after having performed the upgrade (our cleverly named debug cluster) and this cluster works perfectly fine - no errors in the cattle-cluster-agent on the new cluster at all. Suffice it to say, Rancher seems to be generally working fine - it just seems to have some trouble communicating with the existing clusters for some reason.
So - has anyone else experienced anything like this, and does anyone have suggestions for additional troubleshooting/debugging that can be done? Is there an easy way to tell Rancher to "re-onboard" an existing cluster in order to make sure its cattle-cluster-agent deployment is set up correctly to talk to Rancher?
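For concreteness, the sort of thing I have in mind is along these lines (purely a sketch, assuming the default cattle-system namespace and deployment name - I don't know whether this is the supported approach):

    # force the agent pods to be recreated and re-register with Rancher
    kubectl -n cattle-system rollout restart deployment/cattle-cluster-agent
    # then watch the rollout and re-check the agent logs
    kubectl -n cattle-system rollout status deployment/cattle-cluster-agent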
Environment:
• Rancher 2.10.3
• RKE2 1.31.7 on Ubuntu 22.04
• vSphere 7 on-prem