# rke2
b
Is there any way to automatically schedule etcd to defrag the database? I would have thought that it would happen automatically, but that doesn't seem to be the case.
c
Defrag is disruptive (it stops all IO) and should not be done while the database is in use. RKE2 defrags the database every time the service is restarted. If you have something alerting on etcd datastore fragmentation being over some arbitrary threshold, I would probably just ignore/suppress it. There are monitoring packages out there with terrible default thresholds. It is normal for there to be free pages within the allocated space: as etcd hits a steady state of things being created and deleted, the unused pages allocated from disk will turn over a bit. If you're constantly defragging for no good reason, etcd will just have to go reallocate that space from disk again anyway.
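(For what it's worth, if you do want to trigger that restart-time defrag deliberately, a minimal sketch, assuming the standard rke2-server systemd unit and doing one server node at a time so quorum is kept:)
# Restarting the RKE2 server service defrags the local etcd database on startup,
# per the behavior described above. Do one server node at a time.
sudo systemctl restart rke2-server.service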
b
Hm, yeah, the default Prometheus settings for RKE2/Elemental start griping when the fragmentation ratio is over 50%. Typically I see it trigger every other day or so; I run a defrag once and it's fine for another day or two. But today I saw one cluster (Harvester) go from 1.1G to ~200 MB.
Uh, correction: 1.1 GiB to 81 MiB.
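(For context, that kind of alert is typically derived from etcd's own total-size vs. in-use gauges. A quick way to eyeball them on a server node, as a sketch: this assumes etcd is serving /metrics on its client port, which it does by default, and reuses the RKE2 client cert paths shown further down in this thread.)
# Dump the two gauges the fragmentation alert is usually computed from:
# total allocated database size vs. size actually in use.
curl -s \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  https://127.0.0.1:2379/metrics | grep -E '^etcd_mvcc_db_total_size(_in_use)?_in_bytes'
# Fragmentation ratio is roughly 1 - (in_use / total); the alert fires past ~50%.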
c
What is that figure? Unused space?
b
Let me grab the output
c
Or actual db size?
b
db size
Switched to context "compute".
Getting compute etcd Status
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://128.111.126.103:2379 | 20a7fa08b68045e4 |  3.5.16 |  267 MB |     false |      false |        30 |  997461295 |          997461295 |        |
| https://128.111.126.108:2379 | 57fcd8cd8bf11a0e |  3.5.16 |  1.1 GB |      true |      false |        30 |  997461295 |          997461295 |        |
| https://128.111.126.102:2379 | ee8de2b884379670 |  3.5.16 |  230 MB |     false |      false |        30 |  997461295 |          997461295 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Defragging the etcd in the compute cluster via etcd-kube10
Finished defragmenting etcd member[https://128.111.126.103:2379]
Finished defragmenting etcd member[https://128.111.126.108:2379]
Finished defragmenting etcd member[https://128.111.126.102:2379]
Getting compute etcd Health
+------------------------------+--------+------------+-------+
|           ENDPOINT           | HEALTH |    TOOK    | ERROR |
+------------------------------+--------+------------+-------+
| https://128.111.126.102:2379 |   true | 5.251518ms |       |
| https://128.111.126.108:2379 |   true | 6.502341ms |       |
| https://128.111.126.103:2379 |   true | 7.157662ms |       |
+------------------------------+--------+------------+-------+
Getting compute etcd Status
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|           ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://128.111.126.103:2379 | 20a7fa08b68045e4 |  3.5.16 |   81 MB |     false |      false |        30 |  997461460 |          997461460 |        |
| https://128.111.126.108:2379 | 57fcd8cd8bf11a0e |  3.5.16 |   81 MB |      true |      false |        30 |  997461460 |          997461460 |        |
| https://128.111.126.102:2379 | ee8de2b884379670 |  3.5.16 |   81 MB |     false |      false |        30 |  997461461 |          997461461 |        |
+------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Essentially running:
kubectl -n kube-system exec -it ${etcdnode} -- etcdctl --endpoints 127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key endpoint status --cluster -w table
and
kubectl -n kube-system exec -it ${etcdnode} -- etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt defrag --cluster
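(Since the original question was about automating this: the two commands above could be gated on an actual fragmentation check so a defrag only runs when there is something to reclaim. A rough sketch follows; the jq expression, the 0.5 threshold, and the reliance on the dbSize/dbSizeInUse fields in etcdctl's JSON status output are assumptions here, not anything RKE2 ships, and -it is dropped since it isn't interactive. Per the discussion below, though, it's probably fine not to bother at all.)
# Compute the worst fragmentation across members: 1 - (dbSizeInUse / dbSize).
frag=$(kubectl -n kube-system exec ${etcdnode} -- etcdctl \
  --endpoints 127.0.0.1:2379 \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  endpoint status --cluster -w json \
  | jq '[.[] | 1 - (.Status.dbSizeInUse / .Status.dbSize)] | max')

# Only defrag when more than half of the allocated space is free pages.
if awk -v f="$frag" 'BEGIN { exit !(f > 0.5) }'; then
  kubectl -n kube-system exec ${etcdnode} -- etcdctl \
    --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
    --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
    --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
    defrag --cluster
fi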
c
It's just directly proportional to how much you're storing in there. etcd will not return space to the OS by default, so if a bunch of events or other temporary resources cause it to need to store 1 GB of data, 1 GB is where the file will stay, even after those resources are deleted and the pages are freed.
Hopefully 1 GB of disk space isn't going to break the camel's back; that's nothing these days. I would just leave it alone. It's not a problem and it's not hurting anything.
b
The way the warnings were worded, it seemed like there was a bunch of stale data it was hanging onto that could cause issues, and that the defrag only keeps what's active/relevant.
c
No, it's not a problem. The pages are just there for when etcd needs them, without it having to grow the file on disk.
The alert as a whole is garbage as far as I'm concerned.
b
👍 thx
It's good to know. 🙂