# rke2
rke2-server doesn't exist as a service. rancher-system-agent.service does (this is one of the "waiting for" nodes)
root@prod-cp-53a68577-62tkz:/home/elan# systemctl status  rancher-system-agent.service
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-10-30 01:25:09 UTC; 1min 1s ago
       Docs: https://www.rancher.com
   Main PID: 1781 (rancher-system-)
      Tasks: 13 (limit: 9477)
     Memory: 12.0M
        CPU: 148ms
     CGroup: /system.slice/rancher-system-agent.service
             └─1781 /usr/local/bin/rancher-system-agent sentinel

Oct 30 01:25:09 prod-cp-53a68577-62tkz systemd[1]: Started Rancher System Agent.
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Starting remote watch of plans"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: E1030 01:25:09.921285    1781 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Oct 30 01:25:10 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:10Z" level=info msg="Starting /v1, Kind=Secret controller"
prod-cp-53a68577-6nzzb:/ # etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379/ --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt member list
b1a35304b2000c6e, started, prod-cp-53a68577-ltsss-eed6eb90, https://192.168.10.224:2380, https://192.168.10.224:2379, false
c60256a5ceb9dd11, started, prod-cp-53a68577-6nzzb-de634ddb, https://192.168.10.203:2380, https://192.168.10.203:2379, false
etcd looks fine
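For anyone who wants to script that check rather than eyeball it, here is a rough sketch. It assumes the comma-separated `member list` output shown above, with the status in the second field. `check_etcd_members` is a made-up helper name, not part of etcdctl or rke2.

```shell
#!/bin/sh
# Hypothetical helper (not an etcdctl feature): reads `etcdctl member list`
# output in the default comma-separated format on stdin and checks that the
# second field (status) of every member is "started".
check_etcd_members() {
  awk -F', ' '
    NF >= 2 {
      total++
      if ($2 != "started") { bad++; print "not started: " $1 }
    }
    END {
      printf "%d/%d members started\n", total - bad, total
      # Non-zero exit if any member is not started or no members were seen.
      exit ((bad > 0 || total == 0) ? 1 : 0)
    }
  '
}

# Usage, with the same TLS flags as the command above:
#   etcdctl <tls flags> member list | check_etcd_members
```

This only confirms that the listed members report "started". It says nothing about a node that never joined, so a control-plane node still waiting to register would not show up here at all.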
2023/10/30 01:50:44 [INFO] [machineprovision] fleet-default/prod-cp-53a68577-5t44s: reconciling machine job
2023-10-30T01:50:44.767917469Z 2023/10/30 01:50:44 [ERROR] error syncing 'fleet-default/prod-cp-53a68577-5t44s': handler machine-provision-remove: machines.cluster.x-k8s.io "prod-cp-65f5b4b5bf-q5282" not found, requeuing
heh wow
if you accidentally disable a node driver it deletes all clusters.
fyi in case anyone was wondering
i'm not sure why the behavior is to delete your clusters.
welp, that restore worked! It's impressive
ok, so I restored to an older backup because i broke the rancher db lol. That brought the downstream cluster online again, but it didn't sync up with the snapshots backed up to s3, so i curl'd a downloaded snapshot onto an etcd node and it sync'd up... it's the latest one i have. Now it's restoring the latest snapshot 😄 🤞🏼
looking good heh
i love how cluster autoscaler freaked out and spun up like 15 nodes lol