bright-agency-92586

08/05/2022, 6:42 PM
Hello everyone. I'm facing an issue with Rancher where both the eks-operator and Rancher pods are crashing due to a boundsError in the buildUpstreamClusterState function. I assume this is related to one of the EKS clusters we have added. I'd like to remove those clusters from Rancher; however, since many of them were provisioned through Rancher, deleting them from Rancher also deletes the whole cluster from AWS. Is there any way to prevent this on version 2.5.15? I was able to prevent it on another Rancher instance running 2.6 by editing the clusters.management.cattle.io, clusters.provisioning.cattle.io and EKSClusterConfig resources and removing their finalizers, but clusters.provisioning.cattle.io and EKSClusterConfig do not seem to exist on 2.5.15. Any help would be greatly appreciated.
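For reference, a minimal sketch of that 2.6 finalizer-removal workaround, run with kubectl against the Rancher management cluster. The object names (c-abc123, my-cluster) and the fleet-default / cattle-global-data namespaces are assumptions; substitute the names from your own setup:

    # Clear finalizers so that deleting these objects does not cascade
    # into destroying the backing EKS cluster in AWS.
    kubectl patch clusters.management.cattle.io c-abc123 \
      --type=merge -p '{"metadata":{"finalizers":[]}}'
    kubectl -n fleet-default patch clusters.provisioning.cattle.io my-cluster \
      --type=merge -p '{"metadata":{"finalizers":[]}}'
    kubectl -n cattle-global-data patch eksclusterconfigs.eks.cattle.io my-cluster \
      --type=merge -p '{"metadata":{"finalizers":[]}}'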

agreeable-waiter-30020

08/05/2022, 8:17 PM
On the clusters.management.cattle.io object there is an eksConfig field that should contain an imported field. Set imported to true, let that settle (nothing much should happen beyond some CRDs getting updated), then delete the cluster from Rancher; that should remove it from Rancher without deleting it from EKS. Be aware, though, that some Rancher-managed things are left over. I am not sure how those leftovers will affect anything when/if you try to add the cluster back to Rancher.
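For illustration, a sketch of that change, assuming the management cluster object is named c-abc123 (a hypothetical name) and the field sits at spec.eksConfig.imported:

    # Mark the cluster as imported so that deleting it from Rancher
    # detaches it rather than destroying the EKS cluster in AWS.
    kubectl patch clusters.management.cattle.io c-abc123 \
      --type=merge -p '{"spec":{"eksConfig":{"imported":true}}}'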

bright-agency-92586

08/08/2022, 12:17 PM
Hi Donnie, thanks a lot for that; I'll run some tests with it today. For cleaning up the cluster created by the 2.6 instance, I just deleted everything under the cattle-system, cattle-fleet-system, cattle-impersonation-system and local namespaces, then deleted the namespaces themselves. After importing the cluster again, some resources were marked as already existing (mostly the CustomResourceDefinitions), some got patched, and most were created from scratch; after a few seconds the cluster was available again.
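A sketch of that cleanup, run with kubectl against the downstream cluster itself (not against Rancher); since deleting a namespace removes everything inside it, the two steps collapse into one:

    # Remove the Rancher agent namespaces before re-importing the cluster.
    # "local" here is a namespace of that name on the downstream cluster,
    # not Rancher's local management cluster.
    for ns in cattle-system cattle-fleet-system cattle-impersonation-system local; do
      kubectl delete namespace "$ns"
    done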

agreeable-waiter-30020

08/08/2022, 3:56 PM
Sorry, let me clarify my remarks. When I said "Rancher-managed things left over" I meant Rancher-managed things in AWS (service roles, VPC, launch templates). Since those were created by Rancher, they may not get cleaned up properly via the method I mentioned above.

bright-agency-92586

08/08/2022, 7:00 PM
Oh, I see. I actually want the cluster to remain functionally the same on the AWS side, so the roles, launch templates, auto scaling groups and other objects remaining there actually works for me. Thanks a lot for the advice.
Hi Donnie, I tried the steps you mentioned in a separate dev environment and they worked perfectly. However, on a few clusters in the prod environment I noticed the "imported" field is not there. Digging further, it seems the driverName field somehow switched from EKS to amazonelasticcontainerservice, and on one cluster the provider in the UI shows as ElasticContainerService as well. I have no idea how it could have been switched, but apparently it always showed as EKS in the UI until we started having this issue, so it may be related.
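One way to spot which cluster objects carry the unexpected driver value, run against the Rancher management cluster; the .status.driver field path is an assumption based on where the driver is usually reported:

    # List each management cluster with its reported driver so the ones
    # showing amazonelasticcontainerservice instead of EKS stand out.
    kubectl get clusters.management.cattle.io \
      -o custom-columns=NAME:.metadata.name,DRIVER:.status.driver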

tall-school-18125

08/09/2022, 10:30 PM
If you remove a Rancher-provisioned cluster from Rancher, its RBAC will be frozen in time. There is no supported way to gracefully detach a Rancher-provisioned cluster; only removing imported clusters is supported. https://github.com/rancher/rancher/issues/25234

bright-agency-92586

08/10/2022, 12:37 PM
Hi Catherine, we don't actually want to detach the clusters; it's just the only thing we could think of to stop Rancher from constantly crashing. We submitted a bug report three weeks ago: https://github.com/rancher/rancher/issues/38377 If we could stop Rancher from crashing without removing any cluster, that would be even better for us. Thanks for the reply.

tall-school-18125

08/10/2022, 3:30 PM
I'm not sure if this is the same error, but v2.6.6 included a fix for a performance bug in which Rancher would crash repeatedly: https://github.com/rancher/rancher/issues/37250 Edit: never mind, I just saw you are on Rancher v2.5.x; that bug does not appear to be present in 2.5.x.

bright-agency-92586

08/10/2022, 5:22 PM
Hi Catherine, Yes, we are using version 2.5.15. Kubectl access works perfectly, it's just the main Rancher pods that keep crashing and restarting.