#rke2

faint-airport-83518

05/19/2022, 11:17 PM
With rke2, we don't need leader election anymore? (https://github.com/rancher/rke2/issues/349) I tried pulling the logic out of https://github.com/rancherfederal/rke2-azure-tf/blob/main/modules/custom_data/files/rke2-init.sh#L51-L71 and am getting an error.
This happens when trying to add a new node to an existing cluster:
May 19 23:20:53 vm-server01 rke2[14190]: time="2022-05-19T23:20:53Z" level=info msg="Failed to test data store connection: context deadline exceeded"
May 19 23:20:55 vm-server01 rke2[14190]: time="2022-05-19T23:20:55Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
May 19 23:21:00 vm-server01 rke2[14190]: time="2022-05-19T23:21:00Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error"
May 19 23:21:05 vm-server01 rke2[14190]: {"level":"warn","ts":"2022-05-19T23:21:05.809Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc001c90000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""}
@gray-lawyer-73831 heeeelllp

gray-lawyer-73831

05/19/2022, 11:34 PM
So, you still need a “leader node” or “bootstrap node” that is brought up without the server flag. You just don't have to wait until it's fully up before joining the others anymore. You still can wait, and you'll see fewer errors if you do. If you don't wait, the errors are just transient and go away once the first node is up.
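The sketch below shows the bootstrap/joining split. It assumes each node is configured through /etc/rancher/rke2/config.yaml:

- The bootstrap server gets only the shared token.
- Every other server also gets a server entry pointing at the first node's supervisor port. That is 9345, the port seen in the readyz errors above.

The function name write_rke2_config and its arguments are hypothetical. They are not part of rke2 or of the rke2-azure-tf module.

```bash
#!/usr/bin/env bash
# Sketch: write an RKE2 config for either the bootstrap ("leader") server or a
# joining server. The function and its inputs are hypothetical; values would
# typically be rendered into the node's custom data (e.g. by Terraform).

write_rke2_config() {
  local is_bootstrap="$1"   # "true" on the first server only
  local server_url="$2"     # e.g. https://<first-server>:9345 (supervisor port)
  local token="$3"          # shared cluster token, identical on every server
  local config_file="${4:-/etc/rancher/rke2/config.yaml}"

  mkdir -p "$(dirname "$config_file")"
  {
    echo "token: ${token}"
    # Only joining servers point at an existing server. The bootstrap node
    # initializes etcd itself, so it must not get a "server" entry.
    if [ "$is_bootstrap" != "true" ]; then
      echo "server: ${server_url}"
    fi
  } > "$config_file"
}
```

After writing the config, a joining node would start the service, e.g. systemctl enable --now rke2-server.service. It does not need a sleep or wait loop first. Per the answer above, errors logged until the bootstrap node is up should be transient.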

faint-airport-83518

05/19/2022, 11:35 PM
Ah, okay. So we just don't need that obnoxious wait time.
Alright... let me see if I can figure out some kind of logic.
What happens if the "leader" is deleted?

gray-lawyer-73831

05/19/2022, 11:38 PM
The cluster should actually continue to operate as normal! The leader is mostly needed to initialize etcd. Once that's done, the cluster is happy to swap out nodes and no longer needs that special node.

faint-airport-83518

05/19/2022, 11:39 PM
I guess it's weird, then, because my cluster is still there; I just changed the init script and added new nodes.
Anyway, I'll tweak it to just remove the sleep time and see if it works.
I think it worked eventually, but for some reason the other servers didn't join the cluster for over an hour.
If anyone ever looks at this thread: my issue was due to someone putting set -e at the top of this script. That makes the script exit as soon as any command fails.
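With set -e, bash exits as soon as a command returns a non-zero status. So any command that is expected to fail while the first server is still coming up will end the whole init script. A hypothetical readiness probe would be one example.

The small illustration below uses hypothetical function names. Each demo runs in a separate bash process, so the calling shell's options and context do not change the outcome.

```bash
#!/usr/bin/env bash
# Illustration of how "set -e" aborts a script at the first failing command,
# and how "|| ..." marks an expected failure as handled. Hypothetical names.

demo_set_e() {
  # Separate bash process: the caller's shell options do not leak in.
  bash -c '
    set -e
    echo "step 1"
    false          # stands in for any failing command, e.g. an early probe
    echo "step 2"  # never reached: set -e already exited with status 1
  '
}

demo_tolerant() {
  bash -c '
    set -e
    echo "step 1"
    false || echo "probe failed, continuing"  # handled, so set -e does not exit
    echo "step 2"
  '
}
```

Running demo_set_e prints only "step 1" and returns status 1. Running demo_tolerant prints all three lines and returns 0. Removing set -e, as in this thread, is the blunt fix. Handling the specific commands that are allowed to fail keeps the safety net for everything else.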