future-gigabyte-33261
05/25/2025, 11:31 AMfuture-gigabyte-33261
05/25/2025, 2:17 PMbland-article-62755
05/25/2025, 2:52 PMbland-article-62755
05/25/2025, 2:53 PMfuture-gigabyte-33261
05/25/2025, 8:50 PMfuture-gigabyte-33261
05/25/2025, 8:51 PMfuture-gigabyte-33261
05/25/2025, 8:53 PMfuture-gigabyte-33261
05/25/2025, 8:54 PMfuture-gigabyte-33261
05/25/2025, 8:56 PMfuture-gigabyte-33261
05/25/2025, 8:59 PMred-king-19196
05/26/2025, 3:33 AMfuture-gigabyte-33261
05/26/2025, 2:17 PMbland-article-62755
05/26/2025, 4:24 PMfuture-gigabyte-33261
05/26/2025, 4:33 PMprehistoric-morning-49258
05/26/2025, 7:12 PMfuture-gigabyte-33261
05/27/2025, 4:25 PMfuture-gigabyte-33261
05/27/2025, 4:26 PMfew-appointment-23216
05/29/2025, 8:38 AMfew-appointment-23216
05/29/2025, 8:53 AMbland-article-62755
05/29/2025, 2:09 PMbland-article-62755
05/29/2025, 2:10 PMbland-article-62755
05/29/2025, 2:11 PM
watch kubectl get events -A
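If the raw event stream is too noisy to watch, a sorted one-shot listing can be easier to scan (a small variation, not from the original exchange):
# Cluster-wide events, ordered by most recent activity
kubectl get events -A --sort-by=.lastTimestamp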
future-gigabyte-33261
05/29/2025, 2:16 PMbland-article-62755
05/29/2025, 2:16 PMbland-article-62755
05/29/2025, 2:16 PMfuture-gigabyte-33261
05/29/2025, 2:17 PMfuture-gigabyte-33261
05/29/2025, 2:17 PMfuture-gigabyte-33261
05/29/2025, 2:18 PMbland-article-62755
05/29/2025, 2:19 PMprehistoric-morning-49258
05/29/2025, 2:19 PMfuture-gigabyte-33261
05/29/2025, 2:19 PMbland-article-62755
05/29/2025, 2:20 PMfuture-gigabyte-33261
05/29/2025, 2:20 PMfuture-gigabyte-33261
05/29/2025, 2:20 PMfuture-gigabyte-33261
05/29/2025, 2:20 PMbland-article-62755
05/29/2025, 2:20 PMfuture-gigabyte-33261
05/29/2025, 2:21 PMfuture-gigabyte-33261
05/29/2025, 2:21 PMfuture-gigabyte-33261
05/29/2025, 2:22 PMfuture-gigabyte-33261
05/29/2025, 2:22 PMfuture-gigabyte-33261
05/29/2025, 2:22 PMfuture-gigabyte-33261
05/29/2025, 2:24 PMbland-article-62755
05/29/2025, 2:26 PMfuture-gigabyte-33261
05/29/2025, 2:27 PMbland-article-62755
05/29/2025, 2:27 PM
Check systemctl for the server processes and make sure they're running properly, and/or journald
bland-article-62755
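A minimal sketch of that check on a Harvester management node (assuming the rke2-server unit; worker-only nodes run rke2-agent instead):
# Check the RKE2 server unit and tail its recent logs
systemctl status rke2-server
journalctl -u rke2-server --since "1 hour ago" --no-pager | tail -n 100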
05/29/2025, 2:27 PMfuture-gigabyte-33261
05/29/2025, 2:28 PMfuture-gigabyte-33261
05/29/2025, 2:29 PMfuture-gigabyte-33261
05/29/2025, 2:32 PMfuture-gigabyte-33261
05/29/2025, 2:33 PMbland-article-62755
05/29/2025, 2:35 PMbland-article-62755
05/29/2025, 2:35 PMfuture-gigabyte-33261
05/29/2025, 2:35 PMbland-article-62755
05/29/2025, 2:36 PMfuture-gigabyte-33261
05/29/2025, 2:37 PMfuture-gigabyte-33261
05/29/2025, 2:37 PMfuture-gigabyte-33261
05/29/2025, 2:37 PMbland-article-62755
05/29/2025, 2:37 PMfuture-gigabyte-33261
05/29/2025, 2:39 PMfuture-gigabyte-33261
05/29/2025, 2:39 PMfuture-gigabyte-33261
05/29/2025, 2:39 PMfuture-gigabyte-33261
05/29/2025, 2:39 PMbland-article-62755
05/29/2025, 2:40 PM
but basically we only restarted the nodes a couple of times.
future-gigabyte-33261
05/29/2025, 2:40 PMfuture-gigabyte-33261
05/29/2025, 2:41 PMbland-article-62755
05/29/2025, 2:41 PMfuture-gigabyte-33261
05/29/2025, 2:41 PMfuture-gigabyte-33261
05/29/2025, 2:42 PMfuture-gigabyte-33261
05/29/2025, 2:42 PMfuture-gigabyte-33261
05/29/2025, 2:42 PMfuture-gigabyte-33261
05/29/2025, 2:42 PMbland-article-62755
05/29/2025, 2:42 PMfuture-gigabyte-33261
05/29/2025, 2:43 PMbland-article-62755
05/29/2025, 2:43 PMbland-article-62755
05/29/2025, 2:44 PMbland-article-62755
05/29/2025, 2:47 PMbland-article-62755
05/29/2025, 2:49 PMfuture-gigabyte-33261
05/29/2025, 2:52 PMfuture-gigabyte-33261
05/29/2025, 2:52 PMbland-article-62755
05/29/2025, 2:53 PM
--cluster-reset, then restore one of the snapshots
bland-article-62755
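For RKE2 that is roughly the documented reset-and-restore flow; a sketch with the snapshot name as a placeholder (on Harvester, confirm the supported recovery procedure before running this):
# On one management node: stop the server, reset the cluster from a saved etcd snapshot, then start it again
systemctl stop rke2-server
ls /var/lib/rancher/rke2/server/db/snapshots/
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name>
systemctl start rke2-server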
05/29/2025, 2:53 PMbland-article-62755
05/29/2025, 2:54 PMfuture-gigabyte-33261
05/29/2025, 2:54 PMfuture-gigabyte-33261
05/29/2025, 2:54 PMfuture-gigabyte-33261
05/29/2025, 2:55 PMfuture-gigabyte-33261
05/29/2025, 2:55 PMbland-article-62755
05/29/2025, 2:56 PMfuture-gigabyte-33261
05/29/2025, 2:57 PMfuture-gigabyte-33261
05/29/2025, 3:00 PMbland-article-62755
05/29/2025, 3:00 PM
kubectl -n kube-system exec -it $(kubectl -n kube-system get pod -l component=etcd --no-headers -o custom-columns=NAME:.metadata.name | head -1) -- etcdctl --endpoints 127.0.0.1:2379 --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key endpoint status --cluster -w table
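A shorter check that lists only the active alarms (same pod selection and certificate paths as the command above):
kubectl -n kube-system exec -it $(kubectl -n kube-system get pod -l component=etcd --no-headers -o custom-columns=NAME:.metadata.name | head -1) -- etcdctl \
  --endpoints 127.0.0.1:2379 \
  --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  alarm list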
bland-article-62755
05/29/2025, 3:01 PMfuture-gigabyte-33261
05/29/2025, 3:01 PMfuture-gigabyte-33261
05/29/2025, 3:03 PMbland-article-62755
05/29/2025, 3:04 PMbland-article-62755
05/29/2025, 3:05 PMbland-article-62755
05/29/2025, 3:05 PMbland-article-62755
05/29/2025, 3:05 PMbland-article-62755
05/29/2025, 3:05 PMfuture-gigabyte-33261
05/29/2025, 3:05 PMbland-article-62755
05/29/2025, 3:05 PMbland-article-62755
05/29/2025, 3:06 PMfuture-gigabyte-33261
05/29/2025, 3:06 PMbland-article-62755
05/29/2025, 3:06 PMbland-article-62755
05/29/2025, 3:07 PMfuture-gigabyte-33261
05/29/2025, 3:07 PMbland-article-62755
05/29/2025, 3:07 PMbland-article-62755
05/29/2025, 3:08 PMbland-article-62755
05/29/2025, 3:10 PM
Run df -h on the nodes and see if there's anything you can do to clear up more space.
bland-article-62755
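A few starting points for that (the paths are assumptions about the usual RKE2/Harvester layout):
df -h
du -xsh /var/lib/rancher/* 2>/dev/null | sort -h | tail
# If the container image store is what's large, unused images can usually be pruned
crictl rmi --prune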
05/29/2025, 3:10 PMfuture-gigabyte-33261
05/29/2025, 3:10 PMfuture-gigabyte-33261
05/29/2025, 3:11 PMbland-article-62755
05/29/2025, 3:11 PM
cat /etc/os-release
future-gigabyte-33261
05/29/2025, 3:12 PMfuture-gigabyte-33261
05/29/2025, 3:12 PMfuture-gigabyte-33261
05/29/2025, 3:12 PMbland-article-62755
05/29/2025, 3:12 PMfuture-gigabyte-33261
05/29/2025, 3:13 PMfuture-gigabyte-33261
05/29/2025, 3:16 PM
harvester-node01:~ # ./brian-script.sh
Getting etcd Status
{"level":"warn","ts":"2025-05-29T15:15:35.007994Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc0007141e0/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.10.100.30:2379: connect: connection refused\""}
Failed to get the status of endpoint <https://10.10.100.30:2379> (context deadline exceeded)
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| <https://10.10.100.10:2379> | 4df3314745170102 | 3.5.16 | 2.1 GB | false | false | 620 | 124220055 | 124220055 | memberID:5616887342432715010 |
| | | | | | | | | | alarm:NOSPACE |
| <https://10.10.100.20:2379> | d76c47a87e7992ad | 3.5.16 | 2.1 GB | true | false | 620 | 124220182 | 124220182 | memberID:5616887342432715010 |
| | | | | | | | | | alarm:NOSPACE |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
command terminated with exit code 1
Defragging the etcd in the current cluster via etcd-harvester-node01
{"level":"warn","ts":"2025-05-29T15:15:40.196025Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc00031c000/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to defragment etcd member[<https://10.10.100.10:2379>] (context deadline exceeded)
{"level":"warn","ts":"2025-05-29T15:15:45.197574Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc00031c000/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.10.100.30:2379: connect: connection refused\""}
Failed to defragment etcd member[<https://10.10.100.30:2379>] (context deadline exceeded)
{"level":"warn","ts":"2025-05-29T15:15:50.202868Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc00031c000/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Failed to defragment etcd member[<https://10.10.100.20:2379>] (context deadline exceeded)
command terminated with exit code 1
Getting etcd Health
{"level":"warn","ts":"2025-05-29T15:15:55.375067Z","logger":"client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc00026e000/10.10.100.20:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
{"level":"warn","ts":"2025-05-29T15:15:55.375035Z","logger":"client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc00027a000/10.10.100.30:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.10.100.30:2379: connect: connection refused\""}
+---------------------------+--------+--------------+---------------------------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+---------------------------+--------+--------------+---------------------------+
| <https://10.10.100.10:2379> | false | 6.953365ms | Active Alarm(s): NOSPACE |
| <https://10.10.100.20:2379> | false | 5.001969326s | context deadline exceeded |
| <https://10.10.100.30:2379> | false | 5.001875971s | context deadline exceeded |
+---------------------------+--------+--------------+---------------------------+
Error: unhealthy cluster
command terminated with exit code 1
Getting etcd Status
{"level":"warn","ts":"2025-05-29T15:16:00.572917Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:63","msg":"retrying of unary invoker failed","target":"<etcd-endpoints://0xc0004fe1e0/127.0.0.1:2379>","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing: dial tcp 10.10.100.30:2379: connect: connection refused\""}
Failed to get the status of endpoint <https://10.10.100.30:2379> (context deadline exceeded)
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
| <https://10.10.100.10:2379> | 4df3314745170102 | 3.5.16 | 2.1 GB | false | false | 620 | 124220498 | 124220498 | memberID:5616887342432715010 |
| | | | | | | | | | alarm:NOSPACE |
| <https://10.10.100.20:2379> | d76c47a87e7992ad | 3.5.16 | 2.1 GB | true | false | 620 | 124220669 | 124220669 | memberID:5616887342432715010 |
| | | | | | | | | | alarm:NOSPACE |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------------------------------+
command terminated with exit code 1
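The NOSPACE alarm means the etcd backend database has hit its space quota, and etcd will refuse writes until the alarm is cleared. The standard recovery (per the etcd maintenance docs) is compact, then defrag, then disarm the alarm; run from inside one of the etcd pods with the same certificate paths as above, a sketch looks like:
ETCDCTL="etcdctl --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key"
# 1. compact the key-value history up to the current revision
REV=$($ETCDCTL endpoint status --write-out=json | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]+' | head -1)
$ETCDCTL compaction "$REV"
# 2. defragment to return the freed pages to the filesystem
$ETCDCTL defrag
# 3. clear the NOSPACE alarm once space has been reclaimed
$ETCDCTL alarm disarm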
bland-article-62755
05/29/2025, 3:17 PMbland-article-62755
05/29/2025, 3:18 PMfuture-gigabyte-33261
05/29/2025, 3:19 PMbland-article-62755
05/29/2025, 3:19 PMbland-article-62755
05/29/2025, 3:19 PMfuture-gigabyte-33261
05/29/2025, 3:20 PMbland-article-62755
05/29/2025, 3:23 PM
kubectl -n kube-system get pod -l component=etcd
future-gigabyte-33261
05/29/2025, 3:24 PMbland-article-62755
05/29/2025, 3:28 PM
kubectl -n kube-system exec -it etcd-harvester-node02 -- etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt defrag
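Without --endpoints that defrags only the member the pod talks to by default (127.0.0.1:2379). Each member has to be defragmented in turn, for example (endpoints taken from the status output above, skipping the member that is refusing connections):
# defrag blocks the member while it runs, so do the members one at a time
for ep in https://10.10.100.10:2379 https://10.10.100.20:2379; do
  kubectl -n kube-system exec etcd-harvester-node02 -- etcdctl \
    --endpoints "$ep" \
    --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
    --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
    --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key \
    defrag
done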
future-gigabyte-33261
05/29/2025, 3:29 PMbland-article-62755
05/29/2025, 3:29 PM
kubectl get nodes ?
future-gigabyte-33261
05/29/2025, 3:30 PMfuture-gigabyte-33261
05/29/2025, 3:31 PMbland-article-62755
05/29/2025, 3:34 PMbland-article-62755
05/29/2025, 3:35 PMbland-article-62755
05/29/2025, 3:37 PMbland-article-62755
05/29/2025, 3:41 PMfuture-gigabyte-33261
05/29/2025, 3:41 PMfuture-gigabyte-33261
05/29/2025, 3:44 PMfuture-gigabyte-33261
05/29/2025, 3:57 PMfuture-gigabyte-33261
05/29/2025, 3:59 PMred-king-19196
06/03/2025, 4:05 PMred-king-19196
06/03/2025, 4:06 PMred-king-19196
06/03/2025, 4:07 PM
rke2-master-plan and rke2-worker-plan
red-king-19196
06/03/2025, 4:07 PM
kubectl -n cattle-system get plans
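To see whether the plans have actually selected nodes and produced jobs, something like this can help (the upgrade.cattle.io/plan label on the jobs is an assumption about how the system-upgrade-controller labels them):
kubectl -n cattle-system get plans -o wide
kubectl -n cattle-system get jobs -l upgrade.cattle.io/plan=hvst-upgrade-grcnc-prepare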
red-king-19196
06/03/2025, 4:08 PMred-king-19196
06/03/2025, 4:10 PM
$ kubectl -n cattle-system get plans hvst-upgrade-grcnc-prepare -o yaml
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  annotations:
    sim.harvesterhci.io/creationTimestamp: "2025-05-25T12:00:39Z"
  creationTimestamp: "2025-05-25T12:00:39Z"
  generation: 1
  labels:
    harvesterhci.io/upgrade: hvst-upgrade-grcnc
    harvesterhci.io/upgradeComponent: node
  name: hvst-upgrade-grcnc-prepare
  namespace: cattle-system
  resourceVersion: "1910"
  uid: 20f139cc-1370-41f2-92b2-e3454dead2a7
spec:
  concurrency: 1
  jobActiveDeadlineSecs: 3600
  nodeSelector:
    matchExpressions:
    - key: upgrade.cattle.io/disable
      operator: Exists
    - key: upgrade.cattle.io/disable
      operator: Exists
    - key: upgrade.cattle.io/disable
      operator: Exists
    - key: upgrade.cattle.io/disable
      operator: Exists
... the list goes on and on ...
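A quick way to see how many duplicated upgrade.cattle.io/disable entries have piled up in that selector (requires jq):
kubectl -n cattle-system get plan hvst-upgrade-grcnc-prepare -o json \
  | jq '.spec.nodeSelector.matchExpressions | length'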
red-king-19196
06/03/2025, 4:13 PM
❯ kubectl -n harvester-system get upgrades hvst-upgrade-grcnc -o yaml
apiVersion: harvesterhci.io/v1beta1
kind: Upgrade
metadata:
  annotations:
    harvesterhci.io/auto-cleanup-system-generated-snapshot: "true"
    harvesterhci.io/replica-replenishment-wait-interval: "600"
    sim.harvesterhci.io/creationTimestamp: "2025-05-25T11:22:06Z"
  creationTimestamp: "2025-05-25T11:22:06Z"
  finalizers:
  - wrangler.cattle.io/harvester-upgrade-controller
  generateName: hvst-upgrade-
  generation: 1
  labels:
    harvesterhci.io/latestUpgrade: "true"
    harvesterhci.io/upgradeState: UpgradingNodes
  name: hvst-upgrade-grcnc
  namespace: harvester-system
  resourceVersion: "2387"
  uid: 4edce737-efd1-4000-aa75-5066137c1e0d
spec:
  logEnabled: true
  version: v1.4.2
status:
  conditions:
  - status: Unknown
    type: Completed
  - status: "True"
    type: LogReady
  - status: "True"
    type: ImageReady
  - status: "True"
    type: RepoReady
  - lastUpdateTime: "2025-05-25T22:13:13Z"
    status: "True"
    type: NodesPrepared
  - status: "True"
    type: SystemServicesUpgraded
  - status: Unknown
    type: NodesUpgraded
  imageID: harvester-system/hvst-upgrade-grcnc
  nodeStatuses:
    harvester-node03:
      state: Images preloading
  previousVersion: v1.4.1
  repoInfo: |
    release:
      harvester: v1.4.2
      harvesterChart: 1.4.2
      os: Harvester v1.4.2
      kubernetes: v1.31.4+rke2r1
      rancher: v2.10.1
      monitoringChart: 103.1.1+up45.31.1
      minUpgradableVersion: v1.4.1
  upgradeLog: hvst-upgrade-grcnc-upgradelog
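With the upgrade stuck at "Images preloading" on harvester-node03, the per-node state and the upgrade-related pods are the things to keep watching; a sketch (the pod label selector is an assumption):
kubectl -n harvester-system get upgrade hvst-upgrade-grcnc -o jsonpath='{.status.nodeStatuses}'; echo
kubectl -n harvester-system get pods -l harvesterhci.io/upgrade=hvst-upgrade-grcnc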
red-king-19196
06/03/2025, 4:14 PMfuture-gigabyte-33261
06/13/2025, 10:20 PM