# k3s
c
Hello all, I am running an HA k3s cluster with 3 control-plane nodes and I am setting up a rolling upgrade script to roll out one control-plane node at a time. Here is what I do for each of the 3 nodes (a rough sketch follows the list):
• create a new node, then drain the original node
• delete the node
• terminate the EC2 instance
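For reference, a minimal sketch of those per-node steps, assuming kubectl and the AWS CLI are configured; the node name and instance ID are placeholders, not values from the thread, and creating the replacement node (ASG, Terraform, etc.) is not shown:

# hedged sketch of the per-node rotation (placeholders, not the actual script)
OLD_NODE="<old-node-name>"            # placeholder
OLD_INSTANCE_ID="<old-instance-id>"   # placeholder

# drain the original node
kubectl drain "$OLD_NODE" --ignore-daemonsets --delete-emptydir-data

# delete the Kubernetes node object (k3s is then expected to remove its etcd member)
kubectl delete node "$OLD_NODE"

# terminate the EC2 instance
aws ec2 terminate-instances --instance-ids "$OLD_INSTANCE_ID"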
Most of the time it works, but sometimes the new node does not register properly and I need to do the following to remove the old node:
+----------------------------+--------+--------------+---------------------------+
|          ENDPOINT          | HEALTH |     TOOK     |           ERROR           |
+----------------------------+--------+--------------+---------------------------+
| https://10.46.232.150:2379 |   true |  10.703449ms |                           |
|  https://10.46.233.60:2379 |   true |  13.320709ms |                           |
|  https://10.46.232.22:2379 |   true |  30.889352ms |                           |
|  https://10.46.233.42:2379 |  false | 5.001702693s | context deadline exceeded |
+----------------------------+--------+--------------+---------------------------+
+------------------+---------+------------------------------------------------------+----------------------------+----------------------------+------------+
|        ID        | STATUS  |                         NAME                         |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+------------------------------------------------------+----------------------------+----------------------------+------------+
|   e85825167aeb74 | started |  ip-10-46-232-22.eu-west-1.compute.internal-850ae29a |  https://10.46.232.22:2380 |  https://10.46.232.22:2379 |      false |
|  b08c569b39de238 | started |  ip-10-46-233-60.eu-west-1.compute.internal-6e3a5329 |  https://10.46.233.60:2380 |  https://10.46.233.60:2379 |      false |
| cd8c7e2146c24d33 | started |  ip-10-46-233-42.eu-west-1.compute.internal-368aaa88 |  https://10.46.233.42:2380 |  https://10.46.233.42:2379 |      false |
| e0f0b05b7170b845 | started | ip-10-46-232-150.eu-west-1.compute.internal-0ec17b5a | https://10.46.232.150:2380 | https://10.46.232.150:2379 |      false |
+------------------+---------+------------------------------------------------------+----------------------------+----------------------------+------------+
=> remove the failing node:
etcdctl member remove cd8c7e2146c24d33
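For context, with k3s's embedded etcd the tables above are the output of etcdctl endpoint health and etcdctl member list. etcdctl is not bundled with k3s, so the sketch below assumes it is installed on a server node and uses the default k3s embedded-etcd certificate paths; adjust if yours differ:

# run on one of the k3s server (etcd) nodes; default k3s embedded-etcd cert locations
export ETCDCTL_API=3
export ETCDCTL_CACERT=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt
export ETCDCTL_CERT=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt
export ETCDCTL_KEY=/var/lib/rancher/k3s/server/tls/etcd/server-client.key
export ETCDCTL_ENDPOINTS=https://127.0.0.1:2379

etcdctl endpoint health --cluster -w table
etcdctl member list -w table

# remove the stale member by ID, as above
etcdctl member remove cd8c7e2146c24d33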
I am using v1.32.1+k3s1. Is this expected? I would like not to have to clean this up manually 😞
Is it because I terminate the old node too quickly after deleting it from the cluster?
As soon as I delete the failing etcd node, the new node appears
c
It sounds like you're deleting the old node before the new node has finished joining. I would probably suggest waiting for the new node to show as Ready in kubectl get node before deleting the old one. Can't say for sure without logs though.
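For example, that wait could be scripted roughly like this; the node name is a placeholder:

# block until the replacement control-plane node reports Ready (name is a placeholder)
NEW_NODE="<new-node-name>"
kubectl wait --for=condition=Ready "node/$NEW_NODE" --timeout=10m
kubectl get node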
c
I am creating a new node, waiting for it to be ready, then draining the old node, then deleting the old node, then terminating the old node. Sometimes the old node is kept in etcd. Then the new "control-plane" node cannot be added until I delete the entry manually
c
When you delete the old node, there will be logs of k3s trying to remove it from etcd. Find out why that is failing.
Check logs on all nodes, including the one you are deleting
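On a default systemd-based install the server logs live in the k3s unit, so checking each etcd node could look like the following sketch (assumes journalctl and the default service name):

# follow the k3s server logs (includes the embedded etcd and the node controllers)
journalctl -u k3s -f

# or, after a failed rotation, grep the relevant window for etcd member removal activity
journalctl -u k3s --since "1 hour ago" | grep -iE "etcd|remove"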
c
ok, the etcd leader node?
ok, I'll check that, thanks
c
no. k3s doesn’t care which node is the etcd leader. etcd takes care of voting on that by itself.
c
ok, can I restrict the log analysis to my 3/4 control-plane nodes?
or do I need the agent nodes too?
c
Just the etcd nodes
and like I said, including the one you deleted
c
yes, thanks
c
Catch it before you terminate the instance and see what it says after it’s deleted. You should see one of the other etcd nodes trying to remove it from the cluster after the Kubernetes node resource is deleted.
c
ok
I'll do that tomorrow, it is late here. Thanks a lot, I wanted to be sure that I was doing what is expected.
@creamy-pencil-82913 I did manage to reproduce my issue once, but in all the logs I don't really know what string to look for. Would you know which string I can search for to see when the node is removed from etcd?
I found that:
    logrus.Infof("Starting managed etcd member removal controller")
    nodes.OnChange(ctx, "managed-etcd-controller", e.sync)
    nodes.OnRemove(ctx, "managed-etcd-controller", e.onRemove)
}

var (
    removalAnnotation         = "etcd." + version.Program + ".cattle.io/remove"
    removedNodeNameAnnotation = "etcd." + version.Program + ".cattle.io/removed-node-name"
)
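Based on that snippet, and assuming version.Program resolves to k3s, those annotations become etcd.k3s.cattle.io/remove and etcd.k3s.cattle.io/removed-node-name. One hedged way to watch the removal flow is to check whether they show up on the node object and to grep the server logs around the deletion; the node name is a placeholder:

# check whether the removal annotations from the snippet appear on the node object
OLD_NODE="<old-node-name>"   # placeholder
kubectl get node "$OLD_NODE" -o yaml | grep "etcd.k3s.cattle.io"

# and search the server logs for the controller's messages around deletion time
journalctl -u k3s | grep -i "member removal"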
c
Thanks a lot. It will be much easier
p
Since I set up the "graceful node shutdown" feature in the kubelet, it seems that I can no longer reproduce my problem.
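For anyone hitting the same thing: on the kubelet side, graceful node shutdown is driven by the shutdownGracePeriod fields of KubeletConfiguration. A rough sketch follows; the durations and file path are examples only, and how the file is handed to the k3s-managed kubelet (commonly via a kubelet-arg pointing at a config file) depends on the setup:

# hypothetical: write a KubeletConfiguration drop-in enabling graceful node shutdown
# (durations and path are examples; wiring it into the k3s-managed kubelet is setup-dependent)
cat > /etc/rancher/k3s/kubelet-custom.yaml <<'EOF'
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 120s
shutdownGracePeriodCriticalPods: 30s
EOF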