adamant-kite-43734
06/13/2024, 5:02 PM

hundreds-evening-84071
06/13/2024, 5:24 PM

nutritious-tomato-14686
06/13/2024, 5:25 PMkubectl drain
and then kubectl delete node
) + rke2-killall.sh
. Then readd it as an agent.nutritious-tomato-14686
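
A minimal sketch of that sequence (the node name, server address, and token are placeholders; drain flags depend on what is running on the node):

    # From a machine with kubectl access to the cluster:
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
    kubectl delete node <node-name>

    # On the node being converted:
    rke2-killall.sh

    # Point the agent at an existing server in /etc/rancher/rke2/config.yaml
    #   server: https://<server-ip>:9345
    #   token: <cluster-token>
    systemctl enable --now rke2-agent.service
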
06/13/2024, 5:26 PMdisable-etcd: true
is equivalent to --disable-etcd=true
in the CLIlittle-country-11254
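
For reference, the two equivalent forms side by side (the config path is the RKE2 default):

    # /etc/rancher/rke2/config.yaml on the server node:
    #   disable-etcd: true
    # is read the same way as passing the flag on the command line:
    rke2 server --disable-etcd=true
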
little-country-11254
06/13/2024, 5:27 PM

little-country-11254
06/13/2024, 6:55 PM
rke2-killall.sh doesn't clean up everything? When I run the script and then try to run the agent, I still end up with control-plane components.
Unless re-adding an agent involves something I'm missing?
My procedure is:
• Cordon
• Drain
• Delete node
• rke2-killall
• Start agent
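
A way to see why that happens (a sketch; paths are the RKE2 defaults): rke2-killall.sh kills processes and pods but leaves the data directory on disk, so the old server-role state is still there when the agent starts.

    # On the node, after running rke2-killall.sh:
    ls /var/lib/rancher/rke2/server/                 # etcd data, TLS certs, manifests from the old server role
    ls /var/lib/rancher/rke2/agent/pod-manifests/    # static pod manifests (kube-apiserver, etcd, ...) still present
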
nutritious-tomato-14686
06/13/2024, 6:56 PM
rm -rf /var/lib/rancher/rke2/, should clean up all the etcd and server internal configuration.
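
A sketch of that cleanup (assumes the default data dir; /etc/rancher/rke2/ with config.yaml and the token is a separate path and is not touched by this):

    # On the node, with the services already stopped/killed:
    rm -rf /var/lib/rancher/rke2/
    # The data dir is re-created from scratch on the next start:
    systemctl start rke2-agent.service
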
nutritious-tomato-14686
06/13/2024, 6:56 PM

little-country-11254
06/13/2024, 6:58 PM

hundreds-evening-84071
06/13/2024, 7:04 PM

hundreds-evening-84071
06/13/2024, 7:04 PM

nutritious-tomato-14686
06/13/2024, 7:04 PM

hundreds-evening-84071
06/13/2024, 7:05 PM

nutritious-tomato-14686
06/13/2024, 7:07 PM
rke2-uninstall.sh and then reinstall rke2 as an agent... I wasn't sure what networking/host setup Joseph is running. Sometimes people have airgap setups or registry redirects.
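
A sketch of that route (rke2-uninstall.sh removes the binaries, data, and config so the node starts clean; in an airgapped setup the tarball install method would replace the curl step):

    # On the node:
    rke2-uninstall.sh
    # Reinstall as an agent only:
    curl -sfL https://get.rke2.io | INSTALL_RKE2_TYPE="agent" sh -
    # Re-create /etc/rancher/rke2/config.yaml with server: and token:, then:
    systemctl enable --now rke2-agent.service
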
little-country-11254
06/13/2024, 7:07 PM

little-country-11254
06/13/2024, 8:37 PM
1. Cordon and drain the node.
2. Delete the node: kubectl delete node <node name>
3. Stop rke2-agent / rke2-server (we had cases where an rke2-server was running and in others the agent was running; it became messy).
4. Run the kill-all script.
5. Clear these two directories (could be different and contextual):
    a. rm -rf /var/lib/kubelet/pods/*
    b. rm -rf /var/lib/rancher/rke2/
6. Start the agent.
... then the directories are re-created and the node re-joins the cluster as a worker. We had to make sure rke2-server was masked and disabled on worker nodes; without clearing those two paths nothing would have changed.
Thank you for the assist, we are grateful 🙏 How we got here was originally caused by rke2 certs expiring and not being auto-renewed via the server & agent reboot, along with unmasking of server and agent processes where they were masked. Long day 😄
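
Condensed into commands, the procedure above looks roughly like this (the node name is a placeholder, paths assume the default data dir, and only whichever of the two services actually exists on the node needs stopping):

    kubectl cordon <node-name>
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
    kubectl delete node <node-name>

    # On the node:
    systemctl stop rke2-server.service rke2-agent.service
    rke2-killall.sh
    rm -rf /var/lib/kubelet/pods/* /var/lib/rancher/rke2/

    # Keep the server role from coming back on this worker:
    systemctl disable rke2-server.service
    systemctl mask rke2-server.service

    systemctl enable --now rke2-agent.service

Masking the server unit is what prevents the node from accidentally rejoining as a control-plane member after a reboot.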