# rke2
w
Something is blocking the provisioning. I've tried removing the node pools and making new ones-- didn't help.
I'm not even sure where the logs are for seeing what is blocked
i have a MachineDeployment that gets created, state is Active, but no MachineSets.
thankfully i have backups of my rancher db too 😄
do snapshots get deleted from s3 after they are restored?!
I see the snapshot in rancher UI, but not in my bucket?
says
--cluster-reset-restore-path=
why is it trying to look on the file system AND s3, failing when it finds it in S3 but not on the file system, then removing it from s3? and now it can't restore because it's only on the file system?
m
IIRC, it defaults to S3 even if you have local. There was an S3 disable arg to get it to use local instead of S3.
Note that you still need the agent token to be present, so if you didn't back that up before deleting all the nodes you might be screwed
But if you have the agent token and a local backup you should be able to make it work by bootstrapping it and then adding the other CP and agent nodes. Not a bad idea to restore the other CP nodes from the same backup before rejoining them
w
Docs seem to say you don't need token if it's an existing node hmm
Good thing I have backups of all 3 nodes+rancher 😂
I'm using vSphere so I've been snapshotting them and reverting in like 10 secs when it fails haha
Thanks Scott
m
The token gets written to the RKE2 systemd service and in /var/lib/rancher/rke2/agent/. If RKE2 is uninstalled and those get deleted then you need to supply the token again.
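(For reference, a quick way to check whether the token is still around on a server node -- a hedged sketch using standard rke2 paths; Rancher-provisioned nodes may keep it in the config written by rancher-system-agent:)
```
# join token on an existing server node
cat /var/lib/rancher/rke2/server/node-token

# if it has to be supplied again, it can go in the rke2 config file:
# /etc/rancher/rke2/config.yaml
#   token: <value from above>
```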
w
yeah makes sense, i should be good because i'm not removing anything, and have the nodes
πŸ‘ 1
is the token the same for all nodes?
do i still do cluster reset?
m
Token is same for all nodes
w
so the 2 104-day ones are the ones i started with
i had 3, but rancher deleted one for me 😄
the 16hr running one i guess got replicated lol
with the error ones, i think RKE2 gets uninstalled when that happens.
m
Pass
--etcd-s3=false
for it to use the local data instead of s3
w
ohh thank you!
so what i am trying to do:
• Restore rancher to a known good working state prior to the breakage -- my cluster was stuck reconciling, waiting for a control plane, worker, etcd, however it wouldn't make any lol
• once rancher is operational for the cluster again, then restore the etcd backup from a future time hah
m
Then
--cluster-reset-restore-path=
is the full path to your local backup
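(Putting the two flags together, the reset/restore on the first node ends up looking roughly like this -- the snapshot filename is a placeholder for whatever is actually sitting in the snapshots directory:)
```
systemctl stop rke2-server
rke2 server \
  --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/<snapshot-name> \
  --etcd-s3=false
```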
w
yeah it was strange. without the s3 flag it looked at S3 and complained the key didn't exist. then i corrected it to match the s3 path, it found it, then it complained the s3 path didn't exist in the snapshots directory
m
Probably in /var/lib/rancher/rke2/server/db/snapshots
w
yeah that is correct path
so if i only have 1 node do i still need to do a reset?
or just a restore
m
Good question. It's been a minute since I've done it
If you're down to one node, then I believe the answer is yes - https://docs.rke2.io/backup_restore#cluster-reset
w
Ok
Ty
m
Remove the server/db directory on the other CP nodes and agents before starting the RKE2 service
Except add the
--etcd-s3=false
option when you restore on the first node
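(Roughly what that looks like on the other etcd/control-plane nodes once the first node is back up -- a sketch assuming Rancher-provisioned nodes where rancher-system-agent is present:)
```
# on each remaining etcd / control-plane node, before starting rke2 again
systemctl stop rke2-server
rm -rf /var/lib/rancher/rke2/server/db
systemctl start rke2-server

# and make sure the Rancher agent is running so provisioning can catch up
systemctl restart rancher-system-agent
```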
w
thank you Scott
m
Good luck!
w
hehe yeah
i mean it's just my home lab but i really liked that pet
i had just gotten everything working and solved my metalb/arp stuff
I have messed up so many clusters in rancher w/ failed reconciles. First time i broke a cluster when restoring a snapshot
m
Oh man, yeah, I hope you can save it 😬
w
heh, yeah I need to find some good docs on setting up cilium w/o kubeproxy
Cp? You mean etcd?
m
CP = control plane, assuming you are running etcd on the control plane nodes.
w
ahh no
each in their own pools
m
Ah, then the etcd nodes
w
😉 yeah that's what i figured
m
Any luck?
w
heh i restored a backup of my rancher node because it started returning 404. I restored from 2 days ago.
root@rancher-02:/var/log# kubectl get pods --all-namespaces
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-crd-kxrsn            0/1     Completed   0          76m
kube-system   helm-install-traefik-cfqmc                0/1     Completed   2          76m
kube-system   local-path-provisioner-687d6d7765-7qx8p   1/1     Running     0          76m
kube-system   svclb-traefik-0abeebb0-bnk28              2/2     Running     0          73m
kube-system   coredns-7b5bbc6644-bc8dm                  1/1     Running     0          76m
kube-system   traefik-64b96ccbcd-v728v                  1/1     Running     0          73m
kube-system   metrics-server-667586758d-ht9nn           1/1     Running     0          76m
my rancher is gone?
lol so i use kind, and postgres backs it. so the only thing i can think of is my db is borked? from a 2 day ago backup, where it was known good? lol
i wonder if it's because i forgot to shutdown rancher vm before restoring DB
hmm strange. rancher wrote 400mb to it
HAHAHA
i have 2 postgres pods on that host. they "switched" ports somehow and one has an empty kine db.
{"level":"info","ts":"2023-10-02T22:03:10.874835Z","caller":"rafthttp/transport.go:355","msg":"removed remote peer","local-member-id":"82b4997c08526da6","removed-remote-peer-id":"53b8bb1243338979"}
panic: removed all voters

goroutine 229 [running]:
go.etcd.io/etcd/raft/v3.(*raft).applyConfChange(0x0?, {0x0, {0xc003202d10, 0x1, 0x1}, {0x0, 0x0, 0x0}})
        /go/pkg/mod/github.com/k3s-io/etcd/raft/v3@v3.5.9-k3s1/raft.go:1633 +0x1d4
go.etcd.io/etcd/raft/v3.(*node).run(0xc0008bd7a0)
        /go/pkg/mod/github.com/k3s-io/etcd/raft/v3@v3.5.9-k3s1/node.go:360 +0xaf7
created by go.etcd.io/etcd/raft/v3.RestartNode
        /go/pkg/mod/github.com/k3s-io/etcd/raft/v3@v3.5.9-k3s1/node.go:244 +0x24a
ok removing etcd/* worked
odd, it's trying to talk to other etcd nodes
root@production-home-etcd-646e4b00-qvm24:/var/lib/rancher/rke2/server/db# rke2 server   --cluster-reset --cluster-reset-restore-path=/var/lib/rancher/rke2/server/db/snapshots/etcd-snapshot-production-home-etcd-646e4b00-qvm24-1696104001 --etcd-s3=false
WARN[0000] not running in CIS mode
INFO[0000] Applying Pod Security Admission Configuration
INFO[0000] Static pod cleanup in progress
INFO[0000] Logging temporary containerd to /var/lib/rancher/rke2/agent/containerd/containerd.log
INFO[0000] Running temporary containerd /var/lib/rancher/rke2/bin/containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/containerd.sock --state /run/k3s/containerd --root /var/lib/rancher/rke2/agent/containerd
INFO[0010] Static pod cleanup completed successfully
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-agent-load-balancer.json: no such file or directory
WARN[0010] remove /var/lib/rancher/rke2/agent/etc/rke2-api-server-agent-load-balancer.json: no such file or directory
INFO[0010] Starting rke2 v1.26.8+rke2r1 (6fc8479d8b95283b1422ad77cb3da6c9132374d2)
FATA[0016] starting kubernetes: preparing server: failed to get CA certs: Get "https://172.16.1.167:9345/cacerts": dial tcp 172.16.1.167:9345: connect: no route to host
that ip is an old etcd
deleted them from rancher
i'll try this later. strange
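(That "failed to get CA certs" error usually means the node's rke2 config still has a server: entry pointing at a node that no longer exists. A quick, hedged check -- on Rancher-provisioned nodes the config is typically split across config.yaml.d:)
```
grep -r "server:" /etc/rancher/rke2/
# e.g. config.yaml.d/50-rancher.yaml may still point at the old etcd node's IP
```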
m
Were you trying to manage the same cluster with both Rancher and kind or were you running kind somewhere else with a postgres database running in RKE2?
Yeah, I mentioned clearing /var/lib/rancher/rke2/server/db before joining the other nodes. The old pod data might still be there until the dust settles. Were you using local hostpath for storage or something else? Longhorn, Ceph, etc?
If you joined the other etcd nodes before deleting the stale data, then the stale data might win over the backup if it hits quorum
w
The node it was trying to reach is dead. Data is vSphere csi
It was looking for a node that existed at the time of the backup.
rke2 server --cluster-reset causes a panic lol
had to clear out the etcd directory
πŸ‘ 1
m
I recall something about this: the etcd member controller runs on the etcd leader, so if the current etcd leader goes down, deleting a node wouldn't actually remove it from etcd. It impacted clusters with separate etcd and CP nodes
There was an issue for it in github recently. I think the stale members can be manually removed with etcdctl
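(If it comes to manually removing a stale member, it looks roughly like this -- a sketch; rke2 doesn't ship etcdctl, so you'd need to grab the binary separately, and the TLS paths assume a default rke2 layout. <member-id> is a placeholder:)
```
# list members using the rke2-managed etcd client certs
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member list

# then drop the dead one by the ID shown in the first column
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/rke2/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/rke2/server/tls/etcd/server-client.key \
  member remove <member-id>
```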
w
Hmm so if I add a node to etcd will it rebuild the machine or just deploy it on the existing?
Heh will find out just made the change
Haha it deployed a new node
hmm
cattle.io/cn-2600_1700_1ce0_c4bf__e0-e9acbc:2600:1700:1ce0:c4bf::e0
listener.cattle.io/cn-__1-f16284:::1
listener.cattle.io/cn-kubernetes:kubernetes
listener.cattle.io/cn-kubernetes.default:kubernetes.default
listener.cattle.io/cn-kubernetes.default.svc:kubernetes.default.svc
listener.cattle.io/cn-kubernetes.default.svc.cluster.local:kubernetes.default.svc.cluster.local
listener.cattle.io/cn-localhost:localhost
listener.cattle.io/cn-production-home-cp-1c6758a3-bkl59:production-home-cp-1c6758a3-bkl59
listener.cattle.io/cn-production-home-cp-1c6758a3-ftmr6:production-home-cp-1c6758a3-ftmr6
listener.cattle.io/cn-production-home-cp-1c6758a3-wbptl:production-home-cp-1c6758a3-wbptl
listener.cattle.io/cn-production-home-cp-47e6a5f9-2bqk9:production-home-cp-47e6a5f9-2bqk9
listener.cattle.io/cn-production-home-cp-47e6a5f9-2m44h:production-home-cp-47e6a5f9-2m44h
listener.cattle.io/cn-production-home-cp-47e6a5f9-7tg6c:production-home-cp-47e6a5f9-7tg6c
listener.cattle.io/cn-production-home-cp-47e6a5f9-97vst:production-home-cp-47e6a5f9-97vst
listener.cattle.io/cn-production-home-cp-47e6a5f9-9mt25:production-home-cp-47e6a5f9-9mt25
listener.cattle.io/cn-production-home-cp-47e6a5f9-9tgs6:production-home-cp-47e6a5f9-9tgs6
listener.cattle.io/cn-production-home-cp-47e6a5f9-brq2h:production-home-cp-47e6a5f9-brq2h
listener.cattle.io/cn-production-home-cp-47e6a5f9-hp4m9:production-home-cp-47e6a5f9-hp4m9
listener.cattle.io/cn-production-home-cp-47e6a5f9-snvp4:production-home-cp-47e6a5f9-snvp4
listener.cattle.io/cn-production-home-cp-47e6a5f9-vjpfl:production-home-cp-47e6a5f9-vjpfl
listener.cattle.io/cn-production-home-cp-47e6a5f9-wbbwv:production-home-cp-47e6a5f9-wbbwv
listener.cattle.io/cn-production-home-cp-b91dac01-2ckzj:production-home-cp-b91dac01-2ckzj
listener.cattle.io/cn-production-home-cp-b91dac01-9b55j:production-home-cp-b91dac01-9b55j
listener.cattle.io/cn-production-home-cp-b91dac01-g4kb9:production-home-cp-b91dac01-g4kb9
listener.cattle.io/cn-production-home-cp-b91dac01-xrgrg:production-home-cp-b91dac01-xrgrg
listener.cattle.io/cn-production-home-cp-bd9306a2-pw7h4:production-home-cp-bd9306a2-pw7h4
listener.cattle.io/cn-production-home-cp-bd9306a2-vcdwb:production-home-cp-bd9306a2-vcdwb
listener.cattle.io/cn-production-home-cp-bd9306a2-wvfgr:production-home-cp-bd9306a2-wvfgr
listener.cattle.io/cn-production-home-cp-de9b0e1e-7r9t5:production-home-cp-de9b0e1e-7r9t5
listener.cattle.io/cn-production-home-cp-de9b0e1e-8fwxt:production-home-cp-de9b0e1e-8fwxt
listener.cattle.io/cn-production-home-etcd-646e4b00-dd7sd:production-home-etcd-646e4b00-dd7sd
listener.cattle.io/cn-production-home-etcd-646e4b00-k5rhp:production-home-etcd-646e4b00-k5rhp
listener.cattle.io/cn-production-home-etcd-646e4b00-p5p4n:production-home-etcd-646e4b00-p5p4n
listener.cattle.io/cn-production-home-etcd-646e4b00-qvm24:production-home-etcd-646e4b00-qvm24
listener.cattle.io/cn-production-home-etcd-646e4b00-wtmt2:production-home-etcd-646e4b00-wtmt2
listener.cattle.io/cn-production-home-etcd-646e4b00-zk2sb:production-home-etcd-646e4b00-zk2sb
listener.cattle.io/fingerprint:SHA1=D66CAF4BE3B4AA14BB9890D3FCF178F745D7020A]"
Those nodes are the old ones, and there are no CP nodes.
heh
so i removed all the etcd nodes.
rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required.
I restored etcd only from snapshot. Error applying plan -- check rancher-system-agent.service logs on node for more information
haha
ok. strange. very strange, I am back where i started here, unable to deploy anything. It's waiting for stuff to get registered but won't actually spawn anything. No MachineSets or anything
heh you require etcd to restore the cluster configuration, which lives within rancher?
m
Re "rkecontrolplane was already initialized but no etcd machines exist that have plans, indicating the etcd plane has been entirely replaced. Restoration from etcd snapshot is required." - This is when you would use
--cluster-reset
with the snapshot restore.
Ah, this is a downstream cluster you provisioned with Rancher and not a standalone RKE2 cluster. In the future, I would have first attempted the etcd restore from Cluster Management in Rancher. I think you can still save it though if you get etcd restored and then the rancher-system-agent running on the nodes.
w
restore in rancher is how i got into this mess 🙂
hmm ok
m
Yeah, I understand 🙂
I've had a Rancher restore go sideways because of an s3 problem, which is how I discovered the --etcd-s3=false flag
w
lol
does it not download the file first then make changes?
i can see it now:
1. reset the cluster
2. download the backup, but it fails
3. ???
m
reset cluster and restore happen together
w
but the file needs to be downloaded first?
m
The local backups done through the Rancher UI should already be there
w
sure
m
In /var/lib/rancher/rke2/server/db/snapshots, iirc
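(Quick sanity check of what's actually on disk before pointing the restore at it:)
```
ls -lh /var/lib/rancher/rke2/server/db/snapshots/
```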
w
i gotcha, just saying it's weird S3 can make a restore break a cluster.
if the file was downloaded first, before changes were made, it shouldn't be an issue
🤷🏼
I wonder if thats what happened to me
m
If you don't specify --etcd-s3=false and s3 download fails for some reason (bad cert, s3 is down), then what will happen is etcd will end up defaulting to empty and will come up with no pods, etc.
w
wow that is insane
why wouldn't the first step be to stage the file!?
"if file download fails, bail out and make no changes"
m
No idea - I don't grok why it defaults to s3 if the local backup already exists.
w
either way
why mutate state of the environment if the backup download fails or the backup fails to unzip or whatever
m
How it is and why it is are two different questions. I can take a crack at how it is from my experience. The why, someone else will need to answer, lol.
w
heh
i'm just sayin.. doesn't seem safe
I've had every cluster I've ever made using rancher fail because it allowed me to break etcd.
I'm thinking rancher isn't the best way to manage a cluster unfortunately.
m
Best very much depends on your use case. I discovered this when my team was intentionally trying to break and restore a cluster we spun up strictly for that purpose. I've never had to do this in production or long-term Rancher clusters 🤞
But that said, you'll want to cluster-reset on one etcd node with --etcd-s3=false, pointing to your local snapshot. Then join the other etcd nodes after clearing /var/lib/rancher/rke2/server/db/etcd. Then join the control plane nodes. You should also start/restart the rancher-system-agent on those nodes as you go.
w
heh i have no nodes now lol
m
Yeah, this is why you're bootstrapping and cluster-resetting. --cluster-reset = ignore any etcd node history and just make a new cluster with this one node
w
right but can't run that if no nodes
😉
m
So, I'm hoping that the nodes will make it back on their own, but you can also do this manually with RKE2 and import the cluster into Rancher after the fact.
w
i reset, had a working single etcd node. It was still looking for old nodes, it didn't clean those up in the rke2 service logs. I then tried to restore a backup of etcd only, and it broke the etcd node. I tried to create a new one, here i am
Don't you lose functionality because you are importing a cluster?
m
w
thanks Scott
m
"The ability to see a read-only version of the cluster's configuration arguments and environment variables used to launch each node in the cluster" - I think this is the biggest difference is you can't edit the cluster.yaml directly in the web view
Very much at your own risk, it's still technically possible:
kubectl edit cluster.management.cattle.io -n fleet-local
or
kubectl edit cluster.management.cattle.io -n fleet-default
I don't know the ramifications of touching it, though 🙂
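(A lower-risk way to just inspect it, using get instead of edit -- <cluster-name> is a placeholder for whatever the cluster object is called:)
```
kubectl get cluster.management.cattle.io -n fleet-default
kubectl get cluster.management.cattle.io <cluster-name> -n fleet-default -o yaml
```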
w
i have a backup lol
m
Lol
w
heh what should i change
m
Probably nothing 🙂.
At this point you just need to get your data back
w
yeah
it is strange how with 0 nodes rancher will not make a new one.
m
You have zero machines to work with, so I'm assuming you are not using Elemental for provisioning them.
w
no.
What is elemental?
I use vsphere in rancher
m
Elemental is an immutable operating system from SUSE where you define the OS with cloud-data; it generates an ISO that you then boot the machine with, and Rancher can manage the OS itself from there. So when you delete an Elemental-managed machine, Elemental can be set up to reinstall the OS and reboot it so that it's ready to be provisioned back into a cluster
w
i'll read about it
m
cloud-data is like cloud-init user-data. It doesn't mean it has to run in the cloud. I'm deploying bare metal RKE2 nodes with it now.
w
I'm familiar.
how does that differ from using ubuntu images that I don't configure and letting rancher deploy rke2?
m
To be clear, I don't think it fixes your current situation at this very moment.
w
oh yea i get it 😄
or are you saying it's kinda like using a container image and storing state outside of containers in volumes?
m
They're immutable and Rancher can handle the updates. It's sort of like Red Hat CoreOS or Fedora CoreOS, but the under-the-hood architecture is significantly different.
w
and node identity is preserved
m
It's not even "like" - the OS is literally a container image.
w
"we can just swap out the chassis"
hmm where have i seen that before... 🤔
m
TalosLinux?
w
RancherOS 😄
m
lol
w
I got burned by RancherOS, k3OS
m
I've never used those. I have used RHCOS and TalosLinux before, though
w
"welp folks, thanks, been fun, but this is now EOL"
m
Elemental is sort of a mash up of RHCOS and Talos.
w
i also wanna be clear-- I can appreciate how difficult orchestration and scheduling is.
I work at a public cloud company 🙂
m
Like the deployment and architecture is similar to Talos, but it's still SUSE based and therefore resembles RHCOS in the sense that you can ssh to it and it still feels like a Linux environment until you try to make any changes to it, that is 😅
Nice!
w
and i don't want folks to think i'm trashing the team here either.
This stuff is extremely complex and there are SO MANY edge cases
m
I work for a very large public organization and I'm using Rancher now in no small part because public cloud costs scale with the usage 😅
You might appreciate this - We might be one of the last shops with Eucalyptus still running as well.
w
hehe
I haven't used it, but i've heard of it
m
Yeah - 10 years ago, Eucalyptus and OpenStack were the two major private cloud platforms. Eucalyptus got bought by HP and more or less died on the vine. Rackspace got bought by a VC and... technically still exists, but it's not the same company it once was.
Anyway, Euca is/was basically an on-prem, private AWS built on top of libvirt and ceph.
I have a couple of k3s instances running in Eucalyptus right now, hehe
w
hehe ok i restored the backup of rancher
it keeps spawning new machines because it can't find any. up to 300
m
I assume s/machines/pods - so that's good!
w
no
m
New VMs?
w
yeah so apparently if rancher can't find any nodes, it'll just keep making them
m
Makes sense - at this point, I would let Rancher try to make it work from here
Ah, yeah, you're probably using the VMWare provisioner
w
huh
yeah
but still, if the max in a node pool is N, why would it try to do 300
m
It looks like Rancher has enough to work with to rebuild things.
And it looks like 256 etcd machines?
Also, congratulations on having a home cluster that can provision 300 VMs on the fly
w
You mean moneypit
Yeah, buying those old Dell servers is great. The servers are older than my kids
So like where are the docs to just do rke2 without rancher?
m
That one etcd server says 106 days
So I think the restore worked. I don't understand why it provisioned 200-something etcd nodes
That doesn't seem right
w
Oh I restored a backup of rancher db
Right lol
m
And you had 256 etcd nodes?
No wonder you broke etcd!
w
No 3
m
Ah, then the 256 etcd nodes remains a mystery
w
Yeah
Rancher did it
m
but if it can get down to 3, Rancher should have enough to go on to rebuild things from there
How big is the machinepool for etcd?
If you go to edit cluster, can you verify how many etcd nodes are in the pool?
w
I will when I get home. I went to go pick up the kids
How do you manage your RKE2 deployments? CI? Is there tooling for it?
Kustomize?
m
Deploying things on RKE2 or deploying RKE2 itself?
Deploying things on RKE2, mainly with helm/git with Jenkins for the CI/CD, leveraging different clusters for dev/staging/prod environments. For some basic stuff, like automating cert-manager, I've started using fleet from Rancher. For deploying RKE2 itself, mainly doing Elemental on bare metal. I define all the OS specific stuff in a registration endpoint and provision the baremetal nodes with them via BMC and use Rancher/Fleet/Elemental to manage it from there. https://github.com/rancher/elemental-operator#inventory-management
For k3s stuff, I deploy OpenSUSE Leap Transactional Server since it pretty well includes everything k3s needs right out of box and then use the https://github.com/rancher/system-upgrade-controller to automate the OS (based on the SLE Micro example) and k3s upgrades from there.
I've been running k3s much longer than Rancher itself and a lot of how something was deployed/managed came down to its particular use case.
# kubectl get node
NAME                   STATUS   ROLES                  AGE    VERSION
<Redacted Hostname>    Ready    control-plane,master   656d   v1.27.6+k3s1
^^ This cluster is running on CentOS Stream 9.
At my previous two employers, I administered OpenShift and briefly did here as well, before replacing it with Rancher. It was interesting coming to k3s from OpenShift, since everything about k3s is so much simpler. Also do a whole lot of GKE at the moment, as well.
w
Cool
m
The GKE stuff is all currently deployed through Terraform, but we're moving away from it. Jenkins+Ansible AWX is where most of the GitOps is happening now.
w
Why use k3s over k8s?
m
Like vanilla k8s or another distribution?
I initially picked k3s for something where I needed to deploy something for infrastructure support that I wanted to live internally and needed to run on Kubernetes but didn't have a cluster handy yet. I also liked that k3s ran natively with a BYO Linux distro and didn't depend on bringing up a bunch of VMs. And since it was initially a prototype, it was nice to be able to start simple with one node and then scale it out later if we needed to. k3s, like OpenShift, is also opinionated out of box, but in sort of the exact opposite direction of OpenShift which throws every bell and whistle at you. k3s was a quick and easy way to get from scratch to kubectl on a fresh server.
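(For context, that "scratch to kubectl" path is basically the documented one-line installer -- run on the server itself, pipe-to-shell at your own comfort level:)
```
curl -sfL https://get.k3s.io | sh -
# a minute or so later, as root on that node:
kubectl get node
```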
w
in general k8s
like what was in the "5" that were removed
m
There are other tiny distributions now (k0s from Mirantis, for example), but at the time most other small, quick and dirty, kubernetes deals ran VMs and/or were more geared for a developer's laptop than a server environment.
AFAIK, the big thing k3s removes from k8s out of box is a bunch of cloud-provider specific drivers, that can be added back in. k8s upstream includes all the Google, Azure, Amazon, etc. stuff regardless of where you deploy it. k3s defaults to local storage and if you want more than that, deploy a CSI. But k3s is also opinionated in that it gives you a CNI (Flannel) and a default storageClass (host path) out of box, where you have to bring those yourself with default k8s.
w
ok
i'm gonna give rancher one more try
and i'm never touching the etcd node pool again.
m
lol
w
but for real though, you should not be able to accidentally delete all etcd nodes in a cluster.
m
out of curiosity, why not just run etcd on the control plane nodes?
That's how most k8s setups are.
w
My CP nodes tend to run hot
i basically had no workloads running but they ate up 8gb of ram fast and high cpu
m
Aren't they all VMs?
w
yea
m
Different ESXi hosts?
w
but so if they are eating ram and CPU, then etcd suffers
not all of them
but yea
i keep them on the smaller side so they can be migrated across the esx cluster
i run vsan at home 😄
m
Generally control plane nodes shouldn't be taking up that many resources.
Running the API and etcd is most all of what they should be doing if you have worker nodes.
w
i guess i can experiment
but also, if i had CP + etcd, i'd have been able to add nodes more easily i imagine?
m
What are you doing for storage?
Is it a NAS?
w
no
vSan
m
Is there local affinity to the VM for the storage with vSan?
I ask because I recently had some interesting latency issues with ceph rbd backed disks for etcd under load.
w
oh
yeah i'm not seeing latency issues
m
cpu load and disk IO look the same at first glance when you're looking at top, etc. Unless you're scheduling a bunch of stuff on your control plane nodes and/or downloading a bunch of huge images on them for some reason, they shouldn't be using much in terms of resources.
w
tldr "yes"
nope not at all.
m
Yeah, at my last employer, we used VMWare with PureStorage and for the most part it was well-behaved, but taking snapshots with memory could definitely impact things.
w
oh vsan is insanely amazing
m
We're using Longhorn with RKE2 here with data affinity set and that's worked out well. It's pretty cool to have something like that on bare-metal where the pod storage is local to the host. Most everywhere else, it's a ceph shop, for better or worse. Ceph can handle a whole lot more data, but with a lot more complexity, latency, overhead, etc. For Rancher stuff, mainly using Ceph RGW for S3 backups of etcd, rancher, longhorn, etc.
w
ceph is very hard
we run very large ceph deployments
like "support millions of VMs" large
thankfully not my department
m
I'm guessing you're an OpenStack shop?
w
nope
m
FWIW, I just deleted ~3,000 GCP disks this past week with one of my SLA cleanup scripts. I'm glad I didn't try deleting that much from a ceph cluster all at once and possibly triggering a rebalance in the process.
w
i think our deletes are async
I know they are for object storage buckets
m
Yeah, I don't think RGW gets hit quite as hard as RBD does when you do that
RGW = S3 object store; RBD = block storage (like EBS)
w
yep
m
LINSTOR (aka piraeus) is another one I'd like to check out. It does local storage but uses DRBD under the hood to handle the replication where Longhorn uses iscsi.
We had a support contract with LINBIT at my last employer and they were great to work with.
We also used Portworx by Pure and it worked... but it definitely had some pain points. Pure's model is very much about paying a bunch of money to make a problem go away. The Pure stuff played well with VMWare, though, and it was ridiculously fast.
w
Heh I gave up.
Even with a single node and seeing no other nodes in etcd, it showed running in rancher, but I couldn't join new nodes. I could add new etcd nodes and they'd join and replicate the data across the cluster, however they wouldn't show as running in rancher. I couldn't spin up control plane nodes either
m
Did you start the rancher-system-agent on them?
w
I just made a new cluster 😂