# rke2
a
I’m trying to add an etcd-only (dedicated etcd) node to a cluster. It’s a small cluster of Ubuntu 20.04.6 servers. So far I have one node with roles “control-plane,etcd,master” and another with roles “control-plane,master”. I can run “kubectl get nodes” and see them, but the etcd-only node is NotReady, and I see a bunch of errors when I run journalctl -xef. RKE2 is v1.28.10+rke2r1. One of the errors is “panic: bootstrap data already found and encrypted with different token”.
c
It sounds like there’s an issue with the token on the second server… I would probably uninstall RKE2 from that node, run kubectl delete node to remove it from the cluster, then reinstall and rejoin it with the correct token.
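A minimal sketch of that uninstall / delete / reinstall sequence, assuming a tarball install (so the uninstall script is /usr/local/bin/rke2-uninstall.sh) and that the node is meant to be etcd-only. The function names write_etcd_only_config and rejoin_etcd_node are hypothetical helpers, not part of RKE2. The token would be the one in /var/lib/rancher/rke2/server/node-token on the first server, and the server URL uses the RKE2 supervisor port 9345.

```shell
#!/usr/bin/env bash
# Sketch only: assumes a tarball install and a node meant to be etcd-only.
# Both functions are hypothetical helpers, not part of RKE2.

# Write config.yaml for a dedicated etcd node that joins an existing server.
#   $1 = config directory (normally /etc/rancher/rke2)
#   $2 = URL of an existing server on the supervisor port, e.g. https://cz1:9345
#   $3 = cluster token, copied from /var/lib/rancher/rke2/server/node-token on the first server
write_etcd_only_config() {
  local dir="$1" server="$2" token="$3"
  mkdir -p "$dir"
  cat > "$dir/config.yaml" <<EOF
server: ${server}
token: ${token}
disable-apiserver: true
disable-controller-manager: true
disable-scheduler: true
EOF
  # The token is a secret, so keep the file private.
  chmod 600 "$dir/config.yaml"
}

# Run as root on the broken node. Separately, from a healthy server, remove the
# stale Node object with: kubectl delete node <node-name>
#   $1 = server URL, $2 = cluster token
rejoin_etcd_node() {
  local server="$1" token="$2"
  # Remove the half-joined install, including its local etcd data.
  /usr/local/bin/rke2-uninstall.sh
  # Reinstall the same version as the rest of the cluster.
  curl -sfL https://get.rke2.io | INSTALL_RKE2_VERSION="v1.28.10+rke2r1" sh -
  write_etcd_only_config /etc/rancher/rke2 "$server" "$token"
  # Dedicated etcd nodes still run the rke2-server service.
  systemctl enable --now rke2-server.service
}
```

The config is written as a separate function so it can be inspected before anything destructive runs; rejoin_etcd_node is the part that actually touches the machine.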
I’m not sure how it would even get into the cluster with the wrong token though.
just to be clear, you have an all-roles server, and a control-plane-only server, and the etcd-only node does not show up in the cluster at all? Or it does show up but is NotReady?
a
The etcd-only node was half-joined to the cluster; it was in the NotReady state.
ChatGPT told me to do this:
kubectl config set-cluster tzz-yo --server=https://zt1.tzz.yo:6443 --certificate-authority=/etc/docker/certs.d/docker.tzz.yo:5000/ca.crt
kubectl config set-credentials msh --client-certificate=/etc/docker/certs.d/docker.tzz.yo:5000/ca.crt --client-key=/var/lib/rancher/rke2/server/tls/etcd/client.key
kubectl config set-context tzz-yo --cluster=tzz --user=msh
kubectl config use-context tzz-yo
but my user can’t read /etc/docker/certs.d/docker.tzz.yo:5000/ca.crt no matter what permissions I set
ChatGPT confuses RKE2 with generic upstream K8s, sadly; it makes references to /etc/kubernetes/.
I broke the shit out of my ability to talk to my cluster with those AI-generated commands.
🙂
So I ran cp /etc/rancher/rke2/rke2.conf /home/msh/.kube/config, and I can still talk to the cluster:
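For reference, a sketch of the usual RKE2 way to give a non-root user cluster access without hand-written kubectl config commands: copy rke2.conf, fix ownership and permissions, and optionally replace the https://127.0.0.1:6443 address RKE2 writes into it so the file also works from another machine. The helper install_user_kubeconfig is hypothetical; RKE2 ships its own kubectl in /var/lib/rancher/rke2/bin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: install a per-user kubeconfig from RKE2's rke2.conf.
#   $1 = source kubeconfig (normally /etc/rancher/rke2/rke2.conf, root-only by default)
#   $2 = target home directory (e.g. /home/msh)
#   $3 = optional owner to chown to (needs root); empty string skips chown
#   $4 = optional API server URL to replace the local https://127.0.0.1:6443
install_user_kubeconfig() {
  local src="$1" home="$2" owner="${3:-}" server="${4:-}"
  local dest="$home/.kube/config"
  mkdir -p "$home/.kube"
  cp "$src" "$dest"
  if [ -n "$server" ]; then
    # rke2.conf points at the local API server; rewrite it for remote use.
    sed -i "s#https://127.0.0.1:6443#${server}#" "$dest"
  fi
  if [ -n "$owner" ]; then
    # "user:" also sets the group to that user's login group (GNU chown).
    chown -R "$owner:" "$home/.kube"
  fi
  # The file holds client credentials, so keep it private.
  chmod 600 "$dest"
}
```

Run as root on a server node, e.g. install_user_kubeconfig /etc/rancher/rke2/rke2.conf /home/msh msh, then add /var/lib/rancher/rke2/bin to PATH as msh before running kubectl get nodes.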
NAME   STATUS   ROLES                       AGE     VERSION
cz1    Ready    control-plane,etcd,master   2d22h   v1.28.10+rke2r1
cz2    Ready    control-plane,master        47h     v1.28.10+rke2r1
The etcd-only host is gone from the node list.
The etcd node keeps logging:
Jun 01 00:04:26 cp0 systemd[1]: rke2-server.service: Found left-over process 837 (containerd-shim) in control group while starting unit. Ignoring.
Jun 01 00:04:26 cp0 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 01 00:04:26 cp0 systemd[1]: rke2-server.service: Found left-over process 844 (containerd-shim) in control group while starting unit. Ignoring.
Jun 01 00:04:26 cp0 systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=warning msg="not running in CIS mode"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Applying Pod Security Admission Configuration"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Starting rke2 v1.28.10+rke2r1 (b0d0d687d98f4fa015e7b30aaf2807b50edcc5d7)"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Managed etcd cluster bootstrap already complete and initialized"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Reconciling bootstrap data between datastore and disk"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Successfully reconciled with datastore"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg=start
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="schedule, now=2024-06-01T00:04:26Z, entry=1, next=2024-06-01T12:00:00Z"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Starting etcd for existing cluster member"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Defragmenting etcd database"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="etcd data store connection OK"
Jun 01 00:04:26 cp0 rke2[26308]: time="2024-06-01T00:04:26Z" level=info msg="Saving cluster bootstrap data to datastore"
Jun 01 00:04:26 cp0 rke2[26308]: panic: bootstrap data already found and encrypted with different token
I think I need to wipe out the etcd database entirely, just on the etcd-only node.
c
Yeah, just stop rke2-server, rm -rf /var/lib/rancher/rke2/server/db, then start it again on the node that won’t join.
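The stop / rm -rf / start sequence as a small script, a sketch assuming the default data directory /var/lib/rancher/rke2. The function name and the SYSTEMCTL override (which lets the service commands be swapped out, e.g. for testing outside systemd) are hypothetical additions, not RKE2 features. It is meant only for the node that won’t join: on cz1, the only other etcd member here, deleting server/db would take the cluster’s datastore with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reset the local etcd data of a node that failed to join.
#   $1 = RKE2 data directory (default /var/lib/rancher/rke2)
# SYSTEMCTL can name a replacement for the systemctl command (e.g. for testing).
reset_local_etcd() {
  local data_dir="${1:-/var/lib/rancher/rke2}"
  local systemctl_cmd="${SYSTEMCTL:-systemctl}"
  "$systemctl_cmd" stop rke2-server.service
  # Removes only this node's etcd data; other files under server/ and
  # /etc/rancher/rke2/config.yaml are left in place.
  rm -rf "$data_dir/server/db"
  "$systemctl_cmd" start rke2-server.service
}
```

After it starts again, journalctl -u rke2-server -f on that node shows whether it now gets past "Saving cluster bootstrap data to datastore".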
a
do I need to run a cluster-init with v1.28.10+rke2r1?
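On the cluster-init question, a hedged sketch of how the config files usually differ in the RKE2 HA setup docs, as I understand them: the first server has no server: line and bootstraps embedded etcd by itself, while every later server (including an etcd-only one) adds server: pointing at the first server’s port 9345, with the same token. The helper write_server_config and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write config.yaml for an RKE2 server node.
#   $1 = config directory (normally /etc/rancher/rke2)
#   $2 = shared cluster token (same value on every server)
#   $3 = optional URL of an existing server, e.g. https://cz1:9345;
#        omit it on the first server, which then bootstraps embedded etcd
write_server_config() {
  local dir="$1" token="$2" join_url="${3:-}"
  mkdir -p "$dir"
  {
    echo "token: ${token}"
    if [ -n "$join_url" ]; then
      # Joining servers register through the supervisor port (9345).
      echo "server: ${join_url}"
    fi
  } > "$dir/config.yaml"
  chmod 600 "$dir/config.yaml"
}
```

The only difference between the first server and the rest is the server: line; role-specific options such as the disable-* settings for an etcd-only node would be appended on top.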
I realized I had installed a Rancher HTTP GUI Docker image last week and forgotten about it until today. I could use it to import a cluster, but I don’t really have a cluster yet.
BTW, I completely wiped the etcd-only VM and reinstalled it without any restrictions on which processes/daemons can run, so it’s a plain server.
I have only one control-plane VM at the moment.
How do I add nodes to the default local-node cluster, or should I build a new cluster using the CLI only and then see whether the GUI can import it as a generic K8s cluster?