# k3s
c
are you starting prod1 with --cluster-init instead of with --server=prod2, which is what tells it to join the existing cluster?
The only reason you’d get a new cluster with etcd on prod1 is if you started it with --cluster-init
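For context, a minimal sketch of how the two flags differ, using placeholder hostnames and token:

```
# First server only: bootstrap a brand-new embedded etcd cluster
k3s server --cluster-init --token <token>

# Every additional server: join the existing cluster instead of creating a new one
k3s server --server https://prod2:6443 --token <token>
```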
g
I have tried both, same result each time
I also tried starting prod1 with --cluster-init and restarting 2+3 with --server=prod1
c
I would:
• uninstall K3s
• Make sure the data (specifically the certs and etcd db) are removed from disk
• Reinstall and ensure that you’re passing --token=x and --server=prod2 or --server=prod3 to get it to join the existing cluster
👍 1
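A rough sketch of the teardown half of that advice on prod1, assuming the standard get.k3s.io install script was used (so the uninstaller exists at the usual path):

```
# Stops k3s and removes its on-disk state, including the old certs and etcd db
# under /var/lib/rancher/k3s and the config under /etc/rancher/k3s
/usr/local/bin/k3s-uninstall.sh

# Sanity-check: these should no longer exist before reinstalling
ls -ld /var/lib/rancher/k3s /etc/rancher/k3s
```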
g
I'm wondering if the configuration could have been cached somewhere and is causing issues, but I'm not sure
Thank you, I will try that next. The cluster is limping along on 2 nodes for now, so I'm waiting until after hours to avoid downtime.
c
check the cli args in the systemd unit, and /etc/rancher/k3s/config.yaml
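A quick way to inspect both places on each node (config.yaml is optional and may not exist):

```
# Show the full systemd unit, including the ExecStart args and any drop-ins
systemctl cat k3s

# Show the optional config file; flags can also be set here instead of on the CLI
cat /etc/rancher/k3s/config.yaml
```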
g
I have a k3s.yaml there, but no config.yaml. To be clear, should I expect it to show prod1 as the server when checking from prod2 (since I used prod1 in the startup command), or localhost? Not sure what to look for. Here's the startup command from systemctl cat k3s on prod2:
```
ExecStart=/usr/local/bin/k3s \
    server \
        '--server' \
        'https://ldi-mech-prod1:6443' \
        '--token' \
        <redacted> \
        '--disable' \
        'traefik' \
```
and the server in the k3s.yaml there is https://127.0.0.1:6443.
Thanks again, your recommendations are very helpful here; our operations engineer who set this all up retired and left no docs.
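For what it's worth, k3s.yaml is the kubeconfig k3s writes for the local API server, so the 127.0.0.1 entry there is expected and separate from the join configuration. A quick check, assuming the default path:

```
# k3s.yaml is a kubeconfig pointing at the local apiserver, not a server config
k3s kubectl --kubeconfig /etc/rancher/k3s/k3s.yaml cluster-info
```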
c
that’s what I would expect to see if the cluster was originally built with prod1 as the first node, and prod2 and prod3 joining. You can remove the --server or --cluster-init options from any of the servers once the cluster is up and running and it won’t make a difference.
You now want to join prod1 to prod2 or prod3 though, so set up your args appropriately.
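A sketch of what that could look like on prod1 once the old state is removed; the ldi-mech-prod2 hostname and the --disable traefik flag are assumptions carried over from the prod2 unit above, so substitute your real hostname and token:

```
# Reinstall k3s on prod1 as a server that joins the existing cluster via prod2
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://ldi-mech-prod2:6443 \
  --token <redacted> \
  --disable traefik
```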
g
Wanted to come back and say thanks. I was a bit in panic mode yesterday and didn't realize I was trying things out of order; I reread this conversation today and was able to fix it just by running the uninstaller and the correct installation command on the impacted node. If you'll be at KubeCon this year, let me buy you a beer or something