# harvester
b
So we've been having rare but occasional hiccups with networking in our Harvester clusters. Typically all our hosts have 2 SFPs (25G or 100G link speeds depending on the host/cluster), with 1 of them dedicated to the mgmt network and the other dedicated to trunking in all the VM networks/VLANs. When we originally designed the clusters a few years ago, we didn't know (or you couldn't at the time) that you could peel the VM networks off the mgmt interface instead of having to have a separate link for them. For stability/redundancy it seems better to have the two 100G links in a LAG for the mgmt link and use that lagged connection for the VM networks. Our bandwidth, even combined, seems to be well within the limits of a single connection, but the havoc caused by a bad card or SFP failure seems much more detrimental. Here are my questions, if anyone has opinions or guesses:
• Any foreseeable issues with that design?
• What's the best way to reconfigure the networking for each node?
  ◦ Edit the /oem/90_custom.yaml and reboot?
  ◦ Write a new file and reboot? (Are there any validation options or tools that might generate this?)
  ◦ Reinstall?
    ▪︎ Can you leave the node in and just reinstall with the same IP/name, or does it need to be removed first?
  ◦ Live patch and edit/update the yaml?
• Anything else come to mind, or general advice?
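For context, what I'm picturing on the install-config side (the part that ends up rendered into /oem/90_custom.yaml) is roughly the below. This is only a sketch based on the documented Harvester config schema; the interface names, addresses, and bond options are placeholders, not our actual values:

```yaml
# Hypothetical sketch: both SFPs bonded for the management interface.
# Interface names (ens1f0/ens1f1) and bond options are examples only.
install:
  management_interface:
    interfaces:
      - name: ens1f0   # first 100G SFP
      - name: ens1f1   # second 100G SFP
    method: static
    ip: 10.0.0.11
    subnet_mask: 255.255.255.0
    gateway: 10.0.0.1
    bond_options:
      mode: 802.3ad    # LACP; the switch side needs a matching LAG/port-channel
      miimon: 100
```

The VM networks would then just be VLANs on a cluster network riding on that same bond.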
b
i will warn you that your vm network can only have one configuration at a time. i just built new nodes, and i had to add all of them, shut everything down, and change the config to swing everything over to the new nodes.
i'm hoping that the kube-ovn stuff coming in 1.6.0 will make it easier to reconfigure networking. 🤷
b
yeah, I figure that'd likely have to be our plan as well. Or at least do half the nodes at a time and restart the VMs to flip them over to the new network.
b
also just fyi, it appears that live migrations still happen over the management network unless you patch the kubevirt resources to use another network.
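roughly the shape of that patch, for reference. this is just a sketch of the kubevirt CR's dedicated migration network setting; the CR name/namespace and the network name here are assumptions (in harvester the CR is usually `kubevirt` in `harvester-system`):

```yaml
# hypothetical snippet of the KubeVirt CR spec; "migration-net" would be a
# NetworkAttachmentDefinition on the non-mgmt network
spec:
  configuration:
    migrations:
      network: migration-net
```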
b
I think that should be fine. We're essentially going to be reducing the Cluster Network Config down to the mgmt link, but not the VM Network mgmt ... if that makes sense. (confusing because it's named the same in both places.)
b
it matters to me because my nodes now have 1gig for the management network and 10gig for the VM & storage. i just wanted to mention it. 🙂
b
I think storage happens over that by default as well right?
b
over the management network, yeah. i flipped it to use one of the VLANs on my VM network.
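that's the harvester `storage-network` setting, fwiw. rough sketch only; the vlan, cluster network name, and IP range here are made up:

```yaml
# hypothetical storage-network Setting; longhorn gets addresses from "range"
# on the given VLAN of the named cluster network
apiVersion: harvesterhci.io/v1beta1
kind: Setting
metadata:
  name: storage-network
value: '{"vlan":100,"clusterNetwork":"vm","range":"192.168.100.0/24"}'
```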
b
Yeah basically we're going from 100G for each, to two combined 100G.
b
nice. i don't get to have nice hardware like that anymore. 😉
b
Well it's only nice if it works....
Currently it's not working all the time, hence the LAG to try to help shore things up.
Plus we hit a really awful bug with the Broadcom NICs.
b
at $previous_job we had Cisco UCS with lots of uplinks. when we initially put everything in, we had a batch of bad cables. that was nearly impossible to diagnose. i hope for your sake you don't have a problem like that. 🙂
b
It might be. Or the SFPs...
tl;dr: virtio guests got reduced down to dial-up/old DSL speeds.
b
ouch
b
But only the guests, and only for the particular model we had purchased. It got announced on the kernel list, but no one tagged Broadcom.
b
We are looking to do something similar quite soon. Currently each node has 5 NICs: 1 mgmt, 2 compute, 2 storage, and VM migrations use the storage network. For better resiliency we are going to use the 2 compute NICs for the mgmt cluster network and create a new compute virtual machine network from that. Our rough plan is:
1. Label current nodes with oldcompute: true
2. Update the current compute networking configuration to only run on nodes with that label.
3. We use PXE boot to install Harvester, so we will update the management NIC config in the configuration files for that.
4. Reinstall each node with the same IP but a different name (we use node-index-install_datetime as a format)
5. Create a new compute virtual machine network from the mgmt cluster network
6. Move VMs across one by one until the node is full, then repeat with the other nodes
We will lose the ability to only run the new compute network on a subset of nodes, but we don't foresee that being an issue (sketch of the label/nodeSelector bit below).
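For steps 1 and 2, the per-node scoping would look roughly like this as a Harvester VlanConfig. A sketch only; the cluster network name, NIC names, bond options, and the oldcompute label are placeholders for our setup:

```yaml
# hypothetical VlanConfig pinning the existing compute uplink to labeled nodes
# (nodes labeled e.g. via: kubectl label node <node> oldcompute=true)
apiVersion: network.harvesterhci.io/v1beta1
kind: VlanConfig
metadata:
  name: compute-old
spec:
  clusterNetwork: compute          # existing compute cluster network
  nodeSelector:
    oldcompute: "true"             # only nodes still on the old layout
  uplink:
    nics:
      - ens1f0
      - ens1f1
    bondOptions:
      mode: 802.3ad
      miimon: 100
```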
b
When you do 4 - since it has the same IP does it replace the old node, or do you remove it before you do the install?
b
Yep we remove it - I'll add step 3.5 - drain the node and then delete it
For other reasons we have done this numerous times before and never hit an issue reusing the same IP with a different name
If you are also PXE booting, make sure that your node-0 PXE boot config gets changed from CREATE to JOIN! Otherwise you will have a bad time and end up with two clusters.
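For reference, that's the `install.mode` field in the Harvester config served over PXE; roughly like this (the VIP and token are placeholders):

```yaml
# hypothetical join config for a node being reinstalled into the existing cluster;
# mode: create here would bootstrap a brand new cluster instead
scheme_version: 1
server_url: https://<cluster-vip>:443   # existing cluster's management URL
token: <cluster-token>
install:
  mode: join
```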
b
Ah, we're doing PXE but just running through the normal TUI installer.
b
I think it should be possible to do what you said
> Edit the /oem/90_custom.yaml
but we didn't have confidence that something wouldn't come along at some point and overwrite the file with what we specified at install time
so we're opting for the full reinstall of the node
b
Yeah, the switches our boxes are connected to need a full FW update and reboot, so we're looking at a bit of downtime anyway.
b
For our use case the VMs are part of HA deployments, so hopefully we can avoid any downtime if we are in control of when things get shut down and moved (unlike the upgrade process...)