This message was deleted Rancher Users #rke2

# rke2

adamant-kite-43734

10/30/2023, 1:09 AM

This message was deleted.

wonderful-rain-13345

10/30/2023, 1:17 AM

rke2-server doesn't exist as a service. rancher-system-agent.service does (this is one of the nodes stuck waiting)

wonderful-rain-13345

10/30/2023, 1:26 AM

root@prod-cp-53a68577-62tkz:/home/elan# systemctl status  rancher-system-agent.service
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2023-10-30 01:25:09 UTC; 1min 1s ago
       Docs: https://www.rancher.com
   Main PID: 1781 (rancher-system-)
      Tasks: 13 (limit: 9477)
     Memory: 12.0M
        CPU: 148ms
     CGroup: /system.slice/rancher-system-agent.service
             └─1781 /usr/local/bin/rancher-system-agent sentinel

Oct 30 01:25:09 prod-cp-53a68577-62tkz systemd[1]: Started Rancher System Agent.
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Rancher System Agent version v0.3.3 (9e827a5) is starting"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:09Z" level=info msg="Starting remote watch of plans"
Oct 30 01:25:09 prod-cp-53a68577-62tkz rancher-system-agent[1781]: E1030 01:25:09.921285    1781 memcache.go:206] couldn't get resource list for management.cattle.io/v3:
Oct 30 01:25:10 prod-cp-53a68577-62tkz rancher-system-agent[1781]: time="2023-10-30T01:25:10Z" level=info msg="Starting /v1, Kind=Secret controller"

wonderful-rain-13345

10/30/2023, 1:41 AM

prod-cp-53a68577-6nzzb:/ # etcdctl --cert /var/lib/rancher/rke2/server/tls/etcd/server-client.crt --key /var/lib/rancher/rke2/server/tls/etcd/server-client.key --endpoints https://127.0.0.1:2379/ --cacert /var/lib/rancher/rke2/server/tls/etcd/server-ca.crt member list
b1a35304b2000c6e, started, prod-cp-53a68577-ltsss-eed6eb90, https://192.168.10.224:2380, https://192.168.10.224:2379, false
c60256a5ceb9dd11, started, prod-cp-53a68577-6nzzb-de634ddb, https://192.168.10.203:2380, https://192.168.10.203:2379, false

etcd looks fine
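
For anyone who wants to check a member list like the one above without eyeballing it, here is a minimal Python sketch. It assumes the default (simple) `etcdctl member list` output with six comma-separated columns per member: ID, status, name, peer URLs, client URLs, is-learner. The names `EtcdMember`, `parse_member_list`, and `summarize` are made up for illustration and are not part of etcd, RKE2, or Rancher.

```python
from dataclasses import dataclass
from typing import Dict, List

# The two member rows from the log above, pasted in as sample input.
SAMPLE = """\
b1a35304b2000c6e, started, prod-cp-53a68577-ltsss-eed6eb90, https://192.168.10.224:2380, https://192.168.10.224:2379, false
c60256a5ceb9dd11, started, prod-cp-53a68577-6nzzb-de634ddb, https://192.168.10.203:2380, https://192.168.10.203:2379, false
"""


@dataclass
class EtcdMember:  # hypothetical helper type, not an etcd API
    member_id: str
    status: str
    name: str
    peer_urls: str
    client_urls: str
    is_learner: bool


def parse_member_list(output: str) -> List[EtcdMember]:
    """Parse simple-format `etcdctl member list` output.

    Assumption: each member advertises exactly one peer URL and one
    client URL, so every row has six fields. Other rows are skipped.
    """
    members = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 6:
            continue  # blank line, prompt, or a row this sketch cannot handle
        members.append(
            EtcdMember(
                member_id=fields[0],
                status=fields[1],
                name=fields[2],
                peer_urls=fields[3],
                client_urls=fields[4],
                is_learner=(fields[5] == "true"),
            )
        )
    return members


def summarize(members: List[EtcdMember]) -> Dict[str, object]:
    """Count voting members and work out quorum and failure tolerance."""
    voters = [m for m in members if not m.is_learner]
    started = [m for m in voters if m.status == "started"]
    quorum = len(voters) // 2 + 1  # majority of voting members
    return {
        "voters": len(voters),
        "started_voters": len(started),
        "quorum": quorum,
        "fault_tolerance": len(voters) - quorum,
        "has_quorum": len(started) >= quorum,
    }


if __name__ == "__main__":
    print(summarize(parse_member_list(SAMPLE)))
```

With the two rows above, this reports 2 voters, a quorum of 2, and a fault tolerance of 0. So both members are started and the cluster has quorum, but a two-member etcd cluster cannot lose either member without losing quorum.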

wonderful-rain-13345

10/30/2023, 1:51 AM

2023/10/30 01:50:44 [INFO] [machineprovision] fleet-default/prod-cp-53a68577-5t44s: reconciling machine job
2023-10-30T01:50:44.767917469Z 2023/10/30 01:50:44 [ERROR] error syncing 'fleet-default/prod-cp-53a68577-5t44s': handler machine-provision-remove: machines.cluster.x-k8s.io "prod-cp-65f5b4b5bf-q5282" not found, requeuing

wonderful-rain-13345

10/30/2023, 2:02 AM

heh wow

wonderful-rain-13345

10/30/2023, 2:02 AM

if you accidentally disable a node driver it deletes all clusters.

wonderful-rain-13345

10/30/2023, 2:02 AM

fyi in case anyone was wondering

wonderful-rain-13345

10/30/2023, 2:06 AM

i'm not sure why the behavior is to delete your clusters.

wonderful-rain-13345

10/30/2023, 3:04 AM

welp, that restore worked! It's impressive

wonderful-rain-13345

10/30/2023, 3:26 AM

ok, so i restored to an older backup because i broke the rancher db lol. that brought the downstream cluster online again, but it didn't sync up with the s3 backed up snapshots, so i curl'd a downloaded snapshot to an etcd node, and it sync'd up... it's the latest one i have. Now it's restoring the latest snapshot 😄 🤞🏼

🙌 1

wonderful-rain-13345

10/30/2023, 3:32 AM

looking good heh

wonderful-rain-13345

10/30/2023, 3:32 AM

i love how cluster autoscaler freaked out and spun up like 15 nodes lol
