s

sparse-vr-29074

03/14/2023, 1:31 PM
Hey everyone! I'm having an issue with Rancher cluster management: nodes stay stuck in the "Deleting" state. The underlying VM is removed, but the node stays in the "Deleting" state in the machine pool. More information in the thread.
Here are the versions I'm using:
• Rancher v2.7.1
• Helm v2.16.8-rancher2
• Machine v0.15.0-rancher95
To "kick" the deletion process I have to remove the finalizers from the machines resource, which is a no-go for me:
  creationTimestamp: "2023-03-14T09:35:13Z"
  deletionGracePeriodSeconds: 0
  deletionTimestamp: "2023-03-14T13:05:56Z"
  finalizers:
  - machine.cluster.x-k8s.io
Any help would be very welcome, I've been stuck on this issue for a while! Thanks
b

big-hydrogen-97240

03/14/2023, 1:38 PM
The finalizer you are showing there is the Cluster API finalizer. If you don't want to remove the finalizer, you can annotate the machine object. That will cause the machine to go back through the controllers, and the Cluster API controllers should remove the finalizer.
s

sparse-vr-29074

03/14/2023, 1:41 PM
Hmm, interesting! Thanks. Could you give me an example of an annotation, please?
b

big-hydrogen-97240

03/14/2023, 1:43 PM
It doesn't matter what the annotation is. There are likely already annotations on the machine object, you can just add an additional one.
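A minimal sketch of the suggestion above, not taken from the thread: it builds a JSON merge patch that adds one extra annotation whose value is a timestamp, so each run changes the object. The annotation key `example.com/reconcile-trigger` and the function name are hypothetical, chosen only for illustration; whether the Cluster API controllers then clear the finalizer depends on the setup.

```python
import json
from datetime import datetime, timezone

# Hypothetical annotation key, used for illustration only.
# Per the advice above, the actual key and value do not matter.
TRIGGER_KEY = "example.com/reconcile-trigger"


def build_annotation_patch(now=None):
    """Return a JSON merge patch body that sets one annotation to a UTC timestamp."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"metadata": {"annotations": {TRIGGER_KEY: stamp}}}


if __name__ == "__main__":
    # Print the patch so it can be handed to whatever client applies it.
    print(json.dumps(build_annotation_patch()))
```

The printed JSON could be applied with something like `kubectl patch machines.cluster.x-k8s.io <name> -n <namespace> --type merge -p '<json>'`, where the name and namespace are placeholders for your own machine object. A merge patch leaves the existing annotations and the finalizer untouched.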
s

sparse-vr-29074

03/14/2023, 1:45 PM
Here is my current YAML:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  annotations:
    machine.cluster.x-k8s.io/exclude-node-draining: 'true'
  creationTimestamp: '2023-03-14T09:35:13Z'
  deletionGracePeriodSeconds: 0
  deletionTimestamp: '2023-03-14T13:05:56Z'
  finalizers:
    - machine.cluster.x-k8s.io
This happens in almost every cluster lifecycle, which is really annoying. I need a programmatic solution, not just a workaround, and I cannot modify the machine resource code base.
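As a starting point for handling this programmatically, here is a hedged sketch that flags a Machine object as stuck when it has a `deletionTimestamp`, still carries finalizers, and has been in that state longer than a threshold. The 30-minute threshold and the function names are assumptions for illustration; the sample dictionary mirrors the YAML above.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse a Kubernetes-style UTC timestamp such as '2023-03-14T13:05:56Z'."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_stuck(machine, now, max_age=timedelta(minutes=30)):
    """Return True if the object is marked for deletion, still has finalizers,
    and has been waiting longer than max_age (an arbitrary, assumed threshold)."""
    meta = machine.get("metadata", {})
    deleted_at = meta.get("deletionTimestamp")
    if not deleted_at or not meta.get("finalizers"):
        return False
    return now - parse_ts(deleted_at) > max_age


# Sample object mirroring the Machine YAML shown in the thread.
example_machine = {
    "apiVersion": "cluster.x-k8s.io/v1beta1",
    "kind": "Machine",
    "metadata": {
        "creationTimestamp": "2023-03-14T09:35:13Z",
        "deletionTimestamp": "2023-03-14T13:05:56Z",
        "finalizers": ["machine.cluster.x-k8s.io"],
    },
}

if __name__ == "__main__":
    print(is_stuck(example_machine, datetime.now(timezone.utc)))
```

The input could come from parsing the output of `kubectl get machines.cluster.x-k8s.io -A -o json` and looping over its `items` list. Machines flagged this way are candidates for the annotation approach rather than for finalizer removal.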
b

big-hydrogen-97240

03/14/2023, 1:48 PM
I would expect there to be a few more annotations on the object. For my own understanding, is this a "custom cluster" or a "provisioned cluster"?
s

sparse-vr-29074

03/14/2023, 1:49 PM
This is a custom cluster created with the help of the Outscale node driver.
b

big-hydrogen-97240

03/14/2023, 1:50 PM
So you created the VMs with that driver and then ran a command provided by Rancher on them? Or do you use that docker-machine driver in Rancher?
s

sparse-vr-29074

03/14/2023, 1:52 PM
I use this docker-machine driver in Rancher, and I create the cluster with a Helm chart: https://github.com/outscale/osc-k8s-rancher/tree/main/osc-rke2-cluster-template/charts
b

big-hydrogen-97240

03/14/2023, 1:52 PM
OK. That sounds like a provisioned cluster to me, using a custom docker-machine driver.
s

sparse-vr-29074

03/14/2023, 1:53 PM
yes indeed
b

big-hydrogen-97240

03/14/2023, 1:54 PM
Are you deleting the VMs outside of Rancher and this is causing them to get stuck in this way? Or are you deleting the nodes from Rancher?
s

sparse-vr-29074

03/14/2023, 1:54 PM
I'm deleting the cluster by uninstalling the Helm chart.
If I physically remove the VM, Rancher loses track of it as well.
Should I try removing the cluster from Rancher as a test?
b

big-hydrogen-97240

03/14/2023, 1:56 PM
Deleting the Helm chart should be equivalent to deleting the cluster from Rancher.
s

sparse-vr-29074

03/14/2023, 1:57 PM
OK, good to know!
b

big-hydrogen-97240

03/14/2023, 1:59 PM
Does the machine object have an ownerReference on it?
s

sparse-vr-29074

03/14/2023, 2:00 PM
ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: MachineSet
    name: fhn-dev-rancher-dev-zone-dev1-control-b5dfbbf5d
    uid: 53e6daeb-0ff2-429f-ac79-3373a8c3b17e
b

big-hydrogen-97240

03/14/2023, 2:01 PM
OK. That appears normal. I would open an issue on GitHub describing the problem you are having. If it is easily reproducible for you, provide all the steps.
s

sparse-vr-29074

03/14/2023, 2:02 PM
OK, I'm going to do that! Thanks a lot for your time and help!
b

big-hydrogen-97240

03/14/2023, 2:03 PM
Yeah, no problem. And good luck! Some of these custom docker-machine drivers can be tricky.
s

sparse-vr-29074

03/14/2023, 2:04 PM
So it seems!