# rke2
If you ever get it working please let me know!
You need to deploy the AWS cloud provider and disable the default rke2 cloud provider. You will also need to delete and re-register all your nodes after doing that, as the provider ID cannot be changed once it is set.
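A quick way to see the provider ID point in practice: each node records it in spec.providerID, and once set it cannot be changed, which is why the nodes have to be re-registered. A minimal check, assuming kubectl access to the downstream cluster (the column labels here are just illustrative):

```
# Show the providerID recorded on each node; for the AWS provider it should
# look like aws:///<availability-zone>/<instance-id>.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID
```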
@creamy-rainbow-46562
Well, some time later... almost everything works after recreating the cluster. In the following blocks you can find the relevant code for this case. The Terraform module invocation to create the cluster:
```
resource "rancher2_cluster_v2" "cluster" {
  ...
  rke_config {
    machine_selector_config {
      config = {
        # https://docs.rke2.io/reference/server_config
        cloud-provider-name         = "aws"
        kubelet-arg                 = "cloud-provider=aws" # I tried "external" but the cluster didn't start
        kube-apiserver-arg          = "cloud-provider=aws" # I tried "external" but the cluster didn't start
        kube-controller-manager-arg = "cloud-provider=aws" # I tried "external" but the cluster didn't start
      }
    }
    ...
  }
  ...
}
```
Then the Terraform invocation to install the AWS cloud provider Helm chart:
```
data "template_file" "helm-ccm-values-template-file" {
  template = file("${path.module}/helm/ccm.yaml")
}

resource "helm_release" "aws-cloud-controller-manager" {
  name             = "aws-cloud-controller-manager"
  namespace        = "kube-system"
  repository       = "https://kubernetes.github.io/cloud-provider-aws"
  chart            = "aws-cloud-controller-manager"
  create_namespace = false
  wait             = true
  version          = "0.0.8"

  values = [
    data.template_file.helm-ccm-values-template-file.rendered
  ]
}
```
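One thing elided above: these helm_release resources assume a helm provider that already points at the freshly created downstream cluster. A minimal sketch of that wiring, assuming the kubeconfig comes from the kube_config attribute of the rancher2_cluster_v2 resource (the resource names and file path are illustrative, not taken from the original module):

```
# Sketch only: persist the kubeconfig exposed by rancher2_cluster_v2 and point
# the helm provider at it. How the kubeconfig is obtained may differ in the
# real module; this is just one way to close the gap.
resource "local_sensitive_file" "kubeconfig" {
  content  = rancher2_cluster_v2.cluster.kube_config
  filename = "${path.module}/kubeconfig.yaml"
}

provider "helm" {
  kubernetes {
    config_path = local_sensitive_file.kubeconfig.filename
  }
}
```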
In this ccm.yaml I kept almost everything at the defaults, but I also set the cloud provider:
```
namespace: "kube-system"

args:
  - --v=2
  - --cloud-provider=aws

...
```
And I did the same with the autoscaler:
```
data "template_file" "helm-as-values-template-file" {
  template = file("${path.module}/helm/as.yaml")

  vars = {
    # a lot of substitutions that matter for my setup, including the ASG names to watch and so on
  }
}

resource "helm_release" "autoscaler" {
  name             = "cluster-autoscaler"
  namespace        = "kube-system"
  repository       = "https://kubernetes.github.io/autoscaler"
  chart            = "cluster-autoscaler"
  create_namespace = false
  wait             = true
  version          = "9.32.0"

  values = [
    data.template_file.helm-as-values-template-file.rendered
  ]
}
```
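As a side note, the key values could also be passed inline instead of through the as.yaml template; the cluster-autoscaler chart supports tag-based ASG auto-discovery via autoDiscovery.clusterName. A hedged alternative sketch, not the setup used above (the region and cluster name are placeholders):

```
# Alternative sketch: inline set values instead of a templated as.yaml.
# With autoDiscovery.clusterName the autoscaler watches any ASG tagged with
# k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name>.
resource "helm_release" "autoscaler_inline" {
  name       = "cluster-autoscaler"
  namespace  = "kube-system"
  repository = "https://kubernetes.github.io/autoscaler"
  chart      = "cluster-autoscaler"
  version    = "9.32.0"

  set {
    name  = "cloudProvider"
    value = "aws"
  }
  set {
    name  = "awsRegion"
    value = "eu-west-1" # placeholder region
  }
  set {
    name  = "autoDiscovery.clusterName"
    value = "my-rke2-cluster" # placeholder cluster name
  }
}
```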
And with that, things work... well, almost everything works. The AWS ASG, with the tags in the right places, can be watched by the k8s autoscaler, which knows when an instance dies and is replaced and removes it from the cluster. HOWEVER, the "ghost" of the machine remains (like in the attached image), which keeps the cluster status stuck at "updating" and locks it against any other changes.
k8s version: v1.26.8+rke2r1
Rancher version: 2.7.6
Do you guys have any idea what I could possibly be missing?
I am interested to know how to get rid of ghost machines in rke2. I have an AWS ASG set up to provision new downstream nodes, and when I scale down worker nodes, EC2 and Rancher clean up the node quickly and cleanly. But then the machine remains as "node not found", and I have to manually delete it with
`kubectl delete machine xyz -n fleet-default`
in the local cluster. What's the reason for this, and what's the best way to clean up ghost machines after the ASG scales down?
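For reference, those orphaned objects live in the local (Rancher management) cluster as cluster-api Machines, so they can be listed before being deleted manually. A minimal sketch, with placeholder names:

```
# List the machines for the downstream cluster; "ghost" machines show an empty
# or stale NODENAME / a stuck phase.
kubectl get machines.cluster.x-k8s.io -n fleet-default

# Delete a specific orphaned machine (placeholder name), same as the command above.
kubectl delete machines.cluster.x-k8s.io <ghost-machine-name> -n fleet-default
```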