eager-refrigerator-66976

09/12/2022, 2:22 PM
Hey guys! Has anyone had issues with Rancher deleting a managed custom cluster on an attempt to modify the cluster configuration? I've submitted a bug here: https://github.com/rancher/rancher/issues/38833. This is very scary… my clusters are just destroyed when I modify the cluster configuration…
🙏 1

brash-planet-10109

09/12/2022, 2:27 PM
OMG, is it? Are you sure it is because of the modifications?

eager-refrigerator-66976

09/13/2022, 8:03 AM
I am pretty sure that any modification to the cluster configuration applied via Terraform occasionally causes cluster deletion…
I've lost prod clusters a couple of times already just by applying a manifest change…
I was doing some more testing, and this doesn't happen if I modify the Cluster k8s CR that Rancher creates; it was very stable for ~2k iterations.
I've also tried to reproduce this by applying cluster modifications via the Rancher API directly, and this was also very stable for 2k+ iterations.
So far this happens for me only if I change my cluster configuration with the Terraform provider.

brash-planet-10109

09/13/2022, 8:08 AM
That's really scary. I will use the UI then for provisioning and managing state.

eager-refrigerator-66976

09/13/2022, 8:09 AM
Well, this doesn't scale if you have to manage many clusters in a GitOps manner.
👍 1
But yeah, this is a disaster: the first time it happened to me, I was applying a label change in a production cluster and it just destroyed the whole cluster in seconds.
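One possible safety net for a GitOps pipeline is to inspect the Terraform plan before applying it and refuse to continue if a cluster resource would be deleted or replaced. Below is a minimal sketch, assuming the plan has been exported with `terraform show -json` and that the clusters are declared as `rancher2_cluster` or `rancher2_cluster_v2` resources; the function name `destructive_changes` and the sample addresses are hypothetical. It only catches deletions that actually appear in the plan, and nothing in this thread establishes whether the deletions described here do.

```python
# Resource types to protect; adjust to whatever the configuration really uses (assumption).
GUARDED_TYPES = {"rancher2_cluster", "rancher2_cluster_v2"}


def destructive_changes(plan, guarded_types=GUARDED_TYPES):
    """Return addresses of guarded resources that the plan would delete or replace.

    `plan` is the dict parsed from `terraform show -json <planfile>`.
    A replacement is reported by Terraform as a delete plus a create,
    so checking for "delete" in the action list covers both cases.
    """
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if rc.get("type") in guarded_types and "delete" in actions:
            flagged.append(rc.get("address"))
    return flagged


# Hypothetical plan fragment for illustration only.
sample_plan = {
    "resource_changes": [
        {"address": "rancher2_cluster.prod", "type": "rancher2_cluster",
         "change": {"actions": ["delete", "create"]}},
        {"address": "rancher2_cluster.dev", "type": "rancher2_cluster",
         "change": {"actions": ["update"]}},
    ]
}

flagged = destructive_changes(sample_plan)
if flagged:
    print("Refusing to apply; plan deletes or replaces:", ", ".join(flagged))
```

In a pipeline this could run between `terraform plan -out=tfplan` and `terraform apply tfplan`, with the JSON from `terraform show -json tfplan` parsed by `json.load` and the job failing whenever the returned list is non-empty.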

brash-planet-10109

09/13/2022, 8:12 AM
Yeah, correct. Please share your analysis after resolution.

best-microphone-20624

09/15/2022, 12:05 AM
Have you considered using a k8s gitops cd tool to apply cluster custom resource updates from git as opposed to using terraform? Unfortunately the rancher experience for gitops provisioning and managing custom clusters is not great. We try to use our IaaS tool as little as possible for this and use our k8s gitops tool as much as possible for managing the clusters and their addons.

eager-refrigerator-66976

09/15/2022, 7:55 AM
@best-microphone-20624 yeah, this is something I am currently considering using instead, as using Terraform is just dangerous. The thing is, this might not actually be a Terraform provider issue: all it does is call the Norman (Rancher API) endpoints, so this might be an API bug.

best-microphone-20624

09/15/2022, 12:07 PM
I would expect fleet-based cluster mgmt at scale to be better tested than terraform-based cluster mgmt. It may be that terraform uses the same underlying apis but does so in a less reliable manner.