# vsphere
s
how many control plane nodes do you have?
l
3 control plane nodes, but none of them is running now
s
you are getting this error when? scaling up? can you explain the scenario
l
I edited the cluster config to increase the node CPU and memory, then the new nodes were stuck in "Waiting for init node". I deleted those nodes, and after that it started throwing the above error.
s
oh okay, got it. so what happens is: you deploy control plane nodes in odd numbers because of the Raft consensus algorithm that etcd uses (with 3 members, etcd needs a quorum of 2, so it only tolerates losing 1). the etcd state was on those nodes, and when they got deleted (while stuck at the init step) that state got deleted as well. now, assuming you deleted all three nodes, the replacement nodes come up, try to find the existing etcd plane, can't, and so they ask for a backup to restore etcd from
so one thing you can try: go to the Rancher backup and restore section in the documentation and check whether a snapshot folder was created on your control plane nodes
if it is present then you can try restoring it; otherwise scale these nodes down to one, then delete that last node so a fresh control plane node gets created, then scale back to 3
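(A minimal sketch of that check, assuming a Rancher-provisioned RKE1 cluster, where recurring snapshots land in /opt/rke/etcd-snapshots by default; on RKE2 the default is /var/lib/rancher/rke2/server/db/snapshots. The SSH user and IP below are placeholders.)

```bash
# Look for locally stored etcd snapshots on each control plane node.
# Default snapshot dirs: RKE1 -> /opt/rke/etcd-snapshots
#                        RKE2 -> /var/lib/rancher/rke2/server/db/snapshots
ssh <user>@<control-plane-ip> 'ls -lh /opt/rke/etcd-snapshots'
```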
l
And since no backup is available, it's impossible to restore, right? What do you suggest to avoid this situation in the future? How many nodes should I pick, considering the application load was handled well by 3 nodes with 16 GB and 8 vCPUs each?
s
well, to be honest it's better to segregate the roles so that etcd runs separately; that way etcd will not get affected if you change any config on the master nodes. second thing: when provisioning a cluster you can take backups of etcd, so at least you'll sort of have a state to go back to in case of errors. also, which platform are you using to provision this cluster on?
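(A minimal sketch of enabling recurring etcd snapshots, assuming an RKE1-style cluster config; the same options appear under the cluster's YAML in the Rancher UI. The interval and retention values here are only examples.)

```yaml
# cluster.yml excerpt: recurring etcd snapshots every 12 hours, keep the last 6
services:
  etcd:
    backup_config:
      enabled: true
      interval_hours: 12
      retention: 6
```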
l
Running on vSphere, and running a separate etcd is currently not possible.
s
okay, and how are you assigning IPs to the VMs? I'm assuming you are using vApps?
do you have sufficient IPs and resources on your datastore? I'm just trying to solve this node-stuck-at-init problem for you first
l
Yes, it is automatically taking IPs from the network pool
s
my setup is also the same, with three master + etcd role nodes, and I have enabled the backup for etcd
that works for me
this problem only arises when you either delete all the nodes or scale down to an even number of control plane nodes
l
Where are you storing the backups?
s
well as of now locally only
l
So currently I can go with an odd number of nodes, right? And taking regular backups of etcd will save me from such problems in the future.
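(For the regular-backup part, a hedged sketch: besides recurring snapshots, you can take a one-off snapshot right before risky changes. The command assumes an RKE1 CLI-managed cluster; on a Rancher-UI-provisioned cluster the equivalent is the cluster's "Take Snapshot" action. The snapshot name is a placeholder.)

```bash
# One-off etcd snapshot before editing the cluster config (RKE1 CLI)
rke etcd snapshot-save --config cluster.yml --name pre-resize
```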
s
yes bro. also, when you change the config of the control plane nodes, never delete all the nodes at once, as that removes the etcd plane and you will face the same error
just keep an eye on the vSphere logs of any node that gets stuck in the init phase
the logs on vSphere give more clarity on why it happens
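(A hedged sketch of a pre-check before deleting any control plane node, assuming an RKE1 node where etcd runs in a Docker container named etcd, which is how Rancher's own docs inspect it; run this on one of the remaining control plane nodes.)

```bash
# Confirm the remaining etcd members are present and healthy
docker exec etcd etcdctl member list
docker exec etcd etcdctl endpoint health
```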
l
Ok, thanks a lot for the help. Any idea what the reason is for the node getting stuck in init?
s
well, it gets stuck in the init phase when it doesn't get proper resources. seems more like a vSphere-related error to me, because when Rancher shows "waiting for agent to apply initial plan", that's the point where vCenter provides network to the node using the cloud-init file
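(A small sketch of where to look on the stuck VM itself, via the vSphere console if SSH isn't up yet; cloud-init's status command and log paths are standard.)

```bash
# Did cloud-init finish, hang, or error out?
cloud-init status --long
# Output of the user-data / vApp provisioning steps
tail -n 50 /var/log/cloud-init-output.log
# cloud-init's own internal log
tail -n 50 /var/log/cloud-init.log
```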
feel free to reach out bro i will be around trying to make this community more active
👍 2
l
How can I go about segregating the etcd nodes on vSphere? Any suggestions?
I tried scaling the nodes down to 1 and then deleting the last node, but the new node is still stuck with the same "etcd error".
s
bro you have to create 3 separate pools for the 3 roles: master, worker, and etcd
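(A rough sketch of what segregated pools look like in Rancher's RKE2 provisioning YAML; in the RKE1 UI it's the same idea as three node pools with one role checkbox each. Pool names, quantities, and the vSphere machine config name are placeholders.)

```yaml
apiVersion: provisioning.cattle.io/v1
kind: Cluster
spec:
  rkeConfig:
    machinePools:
      - name: etcd-pool          # dedicated etcd nodes
        quantity: 3              # keep this odd for quorum
        etcdRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: my-vsphere-template   # placeholder
      - name: cp-pool            # control plane only
        quantity: 2
        controlPlaneRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: my-vsphere-template
      - name: worker-pool        # workloads
        quantity: 3
        workerRole: true
        machineConfigRef:
          kind: VmwarevsphereConfig
          name: my-vsphere-template
```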
l
Then how do they all connect? Will that be done automatically?
s
yes bro, they connect automatically
Rancher configures them all by itself
l
Perfect, I will give it a try. Thanks 👍
s
yeah 😄