# harvester
h
I have a 3 node cluster, and when I build VMs I randomly end up with VMs that do not get a DHCP address. I am creating 9 VMs and 3 populate on each host. Every time I run my terraform apply, one host's VMs do not get DHCP addresses. It's not always the same host... I just did a couple more rebuilds today and the problem moved between two hosts each time. I know these are pretty vague details and I am prepared to provide more if anyone is able/willing to give a hand.
h
Make sure you have enough IPs, check that the VM is indeed fully up, and check that the VLAN has a DHCP server. How do you know they don’t have an IP? Are you checking via the Harvester UI or opening a console to check?
h
I checked with the DHCP server admin and they have thousands available. All 9 of the VMs are on the same VLAN. I checked both in the console and by logging into the VM, and also attempted to run dhclient -r to get an IP, and it does not get one. Though on this exact same host I had VMs with IP addresses earlier today.
h
I doubt it’s related to the host; it’s more likely the switch port. Maybe it’s not recognized as a valid port for virtualization? Check the Port Security posture. There could be ACLs or rogue device detection.
b
Make sure it's actually not getting an IP address and not just qemu-guest-agent failing to update the IPs properly in the VM/KubeVirt objects.
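A quick way to cross-check this, assuming kubectl access to the cluster: the IP column on a VMI is populated by qemu-guest-agent, so an empty value can just mean the agent isn't reporting, not that the guest has no address.

    # IPs shown here come from qemu-guest-agent; blank doesn't necessarily mean no lease
    kubectl get vmi -A -o wide
    # Inside the guest, check the interface directly
    ip -4 addr show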
t
Have you tried deploying VMs by hand? Because it moved between hosts, I wonder if it is Terraform…
h
I ended up getting sidetracked for a few days with a production issue, but I am back... I narrowed down the issue a bit more. The node that has the VIP is always the node whose VMs fail to get DHCP IP addresses.
h
Sounds like your DHCP or network is trying to block rogue network devices.
b
Does the DHCP service see the request? If so, does it respond?
t
Are you VLAN tagging?
b
All excellent questions. The Port Security posture that Alejandro brought up earlier is still relevant too.
h
Not tagging. On the Harvester node that has the VIP I get these messages:
mgmt-br: received packet on mgmt-bo with own address as source address
On a VM that lives on this hypervisor, when I attempt to get a lease there is no error but also no IP granted... dhclient -4 -r -v gives just standard output, but nothing happens...
It's airgapped so I can't copy/paste, sorry for typos.
No port security either.
The network has just one VLAN; I have no control over how it's set up.
I am on a pretty crappy Ruckus switch.
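One way to see whether the DHCP DISCOVER ever leaves that node and whether an OFFER comes back, assuming the bridge name mgmt-br from the log message above:

    # On the affected node, watch DHCP traffic on the management bridge
    tcpdump -ni mgmt-br port 67 or port 68
    # In the VM, trigger a fresh request while the capture is running
    dhclient -4 -v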
t
Sounds like the port may not be set up correctly. Can you have the network team check that the ports are all identical?
b
Does the VM MAC show up in
/var/log/messages
on your DHCP server?
That would at least point to traffic issues getting out vs getting back in.
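If the DHCP server were a Linux box logging to syslog, something like this would show whether the request arrives at all; the MAC below is just a placeholder.

    # Placeholder MAC - substitute the VM's actual one from the Harvester UI
    grep -i "52:54:00:aa:bb:cc" /var/log/messages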
h
The Ruckus switch was reset to factory defaults a couple of months back. This problem follows the VIP: if the VIP is on server 1, those VMs do not get IPs; if the VIP moves to server 2, then the VMs on server 1 now get IP addresses. This makes me think it's not a port config issue on the switch. The DHCP server is controlled by someone else and it's a Windows server. I can have them check if my MAC shows up on the list, though.
t
Oh. Is the VIP on the same subnet as the rest of the IPs?
h
Do you have other network ports that have many VMs with IPs? As in, do the other nodes have multiple VMs with their IPs assigned?
h
The VIP and the VMs are on the same network; I only have one network to work with.
b
I think he meant to make sure the VIP was in the same subnet as the host nodes.
h
it is
for sure
b
Oh, I had something weird once. What's the slash on your subnet?
Lower than /24?
Er... CIDR notation (clearly I'm not a network person, I'm a sysadmin).
Oh, one other question, are you setting your VIP via DHCP or statically?
h
It's a /23.
VIP was set via DHCP.
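A quick way to confirm the prefix actually configured on each node matches that /23, again assuming the mgmt-br bridge name from earlier:

    # Prefix length and routes on the management bridge
    ip -4 addr show dev mgmt-br
    ip route show dev mgmt-br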
b
Is the VIP the .0 address in the second /24 of that /23? I had issues with that before.
DHCP makes it smell like some sort of ACL or firewall.
h
It's .107, so not the .0. Not sure about the firewall question; I will work with the system admins and see what they say.
b
Because I think it'll be reporting more than one MAC address for the port, and it seems like something is stomping on traffic when that happens.
t
Can you make the VIP static?
h
How would I go about making that static in a running cluster?
t
On the controller node(s), update
/oem/harvester.config
I would also triple-check that the VIP and “network” have the exact same subnet mask.
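A rough sketch of where to look; the exact field names vary by Harvester version, so verify against the Harvester configuration docs before editing anything.

    # On a control-plane node, inspect the current VIP settings
    grep -i vip /oem/harvester.config
    # Switching from DHCP to static roughly means changing the VIP mode field
    # from "dhcp" to "static" and pinning the VIP field to a fixed address --
    # confirm the exact keys for your version in the Harvester docs first.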