#general


stocky-fall-82040

05/02/2023, 7:05 PM
Is there a Release Candidate helm chart for 2.6.7-rc9 that is available for download?

future-night-17486

05/02/2023, 9:23 PM
try this:
➜  ~ helm search repo rancher-latest -l --devel | grep 2.6.7-rc9
rancher-latest/rancher	2.6.7-rc9       	v2.6.7-rc9       	Install Rancher Server to manage Kubernetes clu...
➜  ~ helm repo list
NAME          	URL
rancher-latest	https://releases.rancher.com/server-charts/latest
Note that RC versions are unstable.
The Rancher UI shows the latest version for each supported minor version, which is why v1.20.15-rancher2-2 is shown rather than v1.20.15-rancher2-1. Rancher still supports v1.20.15-rancher2-1: you can still edit an existing cluster in Rancher, you just cannot create a new cluster with that version.

stocky-fall-82040

05/02/2023, 10:12 PM
Jack, you're amazing. Thank you for taking the time to answer my question. I do, in fact, now see the release candidate. I must have had something incorrect in my helm CLI parameters or in how I added the repo. I'm trying to upgrade from Rancher 2.5.17 (v1.20.15-rancher2-1), where both my local (Rancher HA) cluster and all downstream clusters (all RKE1) are running the same version, v1.20.15-rancher2-1. After I upgrade the local cluster to Rancher 2.6.11 (v1.24.10-rancher4-1), the cattle cluster agent and node agents appear to update on my downstream clusters, but the clusters are stuck in a Cluster Not Reporting / Disconnected state. With Rancher 2.6.11, v1.20.15-rancher2-2 is the lowest version supported, so I was assuming this problem had to do with an incompatibility between my local and downstream cluster k8s versions.
Is there something I should be looking for in the logs? The cluster agent and node agent aren't throwing errors. The log stops with a "Starting Plan Monitor" message. All pods are in a ready/running state. No firewalls or services have been restarted. It's the strangest thing and the Kubernetes version difference is the only thing I can think of that looks suspicious. This has worked in other clusters I've managed as well.

future-night-17486

05/02/2023, 10:54 PM
The supported k8s version range of Rancher's local cluster (where Rancher is installed) and the supported range for downstream clusters are not required to be the same, although in practice they happen to be. In your case, Rancher 2.6.11 with the local cluster on v1.24.10-rancher4-1 and downstream clusters on v1.20.15-rancher2-2 should be within the supported version range. Are all nodes Active? In the downstream cluster, do you see the cluster-agent and node-agent upgraded to the 2.6.11 image tag? The node-agent is a DaemonSet; can you check the logs of all its pods to see if there is any error?
Also, can you check if there is any error in the cluster-agent pods?
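If it helps, the checks above can be run roughly like this (a sketch, assuming a default agent install in the cattle-system namespace; the node-agent pod label may differ by Rancher version):

```shell
# All nodes should be Ready in the downstream cluster
kubectl get nodes

# Confirm both agents were bumped to the 2.6.11 image tag
kubectl -n cattle-system get deploy cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
kubectl -n cattle-system get ds cattle-node-agent \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

# cattle-node-agent is a DaemonSet (one pod per node); scan every pod's
# logs for errors. The app=cattle-agent label is an assumption here.
for pod in $(kubectl -n cattle-system get pods -l app=cattle-agent -o name); do
  echo "--- $pod"
  kubectl -n cattle-system logs "$pod" --tail=200 | grep -iE 'error|fail' || true
done

# And the cluster-agent logs
kubectl -n cattle-system logs deploy/cattle-cluster-agent --tail=200 | grep -iE 'error|fail'
```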

stocky-fall-82040

05/02/2023, 11:46 PM
So technically, where I was headed, where I'm experiencing problems, should work. All local/HA cluster nodes are active and healthy. The node agent on the downstream clusters gets upgraded to 2.6.11. I will double-check all pod logs, but I was not seeing any errors previously. It's as if the agent checks in, but the local/Rancher cluster never renegotiates comms.
I did upgrade from 2.5.17 to 2.6.11 without any issues in a test environment. The downstream clusters are reachable and are still running v1.20.15-rancher2-1. Something in my other cluster just isn't right, and I will have to dig into logs to find out. I will recheck the cluster-agent pods. I do know that if I restore my local/Rancher HA cluster to 2.5.17, the downstream clusters respond and work as if they'd never been changed.
I tried all night until I had to roll back this morning. The cluster and node agents did not automatically swap to 2.6.11. I was able to patch/upgrade those to 2.6.11 by running the v3/import/*-<cluster_id>.yaml. The agent logs show good connections. What caught my eye are errors dealing with fleet, the fleet-default namespace, and permissions. We do nothing with fleet directly, but there are errors like:
• "In Rancher: 403 clusters.management.cattle.io cluster is forbidden: user cannot get resource clusters in API group management.cattle.io in the cluster scope"
• "failed to register agent: secrets "fleet-agent" is forbidden: User "system:serviceaccount:fleet-system:fleet-agent" cannot get resource "secrets" in API group "" in namespace "fleet-system": RBAC: clusterrole.rbac.authorization.k8s.io "fleet-system-fleet-agent-role" not found"
• "Receiving 403 forbidden: user cannot get resource clusters in API group management.cattle.io at the cluster scope"
In Cluster Management, when I click on Cluster / Related Resources, I see "Err Applied" errors under Refers To for the Cluster in the fleet-default namespace: something to do with the provisioning.cattle.io cluster, the fleet-default namespace, and lack of cluster-owner permissions.
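The missing ClusterRole named in those errors can be checked directly. A sketch, using the names taken from the error messages (note that on Rancher 2.6 the fleet agent may live in cattle-fleet-system rather than fleet-system):

```shell
# Does the ClusterRole the fleet-agent complains about exist?
kubectl get clusterrole fleet-system-fleet-agent-role

# Inspect the fleet-agent service account and any fleet role bindings
kubectl -n fleet-system get serviceaccount fleet-agent
kubectl get clusterrolebinding | grep -i fleet

# Fleet agent logs for further RBAC errors
kubectl -n fleet-system logs deploy/fleet-agent --tail=100
```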

future-night-17486

05/03/2023, 7:55 PM
Hi Jonathan, sorry to hear that the upgrade did not work. From your newest update, it sounds like a reported GitHub issue, although not 100% the same, and a workaround exists according to the comments in that issue.

stocky-fall-82040

05/05/2023, 9:30 PM
I was able to work through the fleet issues, but I'm still faced with Cluster Agent Not Reporting messages. The cluster agent upgrades never begin after the upstream cluster has finished upgrading. In other environments, I receive this message, then the downstream clusters begin to update, reconcile, and report in like normal. Do you know what component or logs I should look into post-upgrade to determine why the upstream cluster is unable to contact the downstream clusters? Is there some hook or other feature that runs to trigger this event?
In the end, I destroyed everything and rebuilt from scratch. Those clusters have been patched since Rancher 2.1. I'm assuming artifacts and other "stuff" had been hanging around or were causing issues. After the complete rebuild and data restore, things work fine. We are working through converting from Catalogs to Repos now. Only a few hand-built helm charts are showing up in the Repo. Any ideas on where to poke around to look for repo sync logs? I don't know if it's an index.yaml parse issue or something wrong with the helm charts. Always a mystery.
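For the repo sync question: Rancher 2.6 tracks each chart repository as a ClusterRepo object, and its status conditions usually surface index download or parse failures. A sketch (<repo-name>, <repo-url>, and the chart path are placeholders):

```shell
# List configured repositories and their sync state
kubectl get clusterrepos.catalog.cattle.io

# Conditions on a specific repo show index.yaml download/parse errors
kubectl describe clusterrepo.catalog.cattle.io <repo-name>

# The Rancher server pods do the index parsing; grep their logs
kubectl -n cattle-system logs -l app=rancher --tail=500 | grep -i <repo-name>

# Sanity-check the index.yaml and the hand-built charts locally
helm repo add repo-test https://<repo-url>
helm search repo repo-test --versions
helm lint <path-to-chart>
```

If only some charts show up, running `helm lint` on the missing ones may point at a malformed Chart.yaml or index entry, which would explain why they are skipped while the rest sync fine.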