# harvester
t
it is not “throttled”, but it is an overlay, encapsulated network, meaning the MTU is smaller for the VMs than for the hosts. Also, are you using the same network card for both the hosts and the VMs?
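For example, you can compare the MTU on a host NIC with the interface inside a VM using something like this (interface names are just placeholders, adjust them to your setup):

```bash
# On the Harvester host: physical NIC, typically MTU 1500 (or 9000 with jumbo frames)
ip link show enp5s0 | grep -o 'mtu [0-9]*'

# Inside the VM: the overlay/bridged interface is often lower (e.g. 1450 with VXLAN encapsulation)
ip link show eth0 | grep -o 'mtu [0-9]*'
```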
w
Hi @thousands-advantage-10804, I've got 2x10Gb and 1x1Gb NICs; the 10Gb ports are used for storage and management. I've set up a network for the 1Gb ports to route traffic to the VMs, so the VMs use their own set of ports for communicating out. Since our line is 1Gb I figured this would be better, saving the 10Gb NICs for the hosts/storage. I'm not 100% sure we have the VLANs set up properly; storage I believe was tagged, but I'm planning to do a bit of an audit on the network anyway.

It is stuff like this that has me thinking about moving away to bare metal k8s / Ceph with Rancher instead. We've a decreasing need for VMs, but we're not in a rush, and the flexibility that Harvester gives us does make it possible for us to experiment with a wide range of setups. Ultimately our stack is very much used for development and experimentation.

I spotted this yesterday when running a backup, as the speeds were not what was expected; hence wanting to dig into this a bit and understand why it is an order of magnitude slower than expected. Any pointers on things to check configuration-wise are greatly appreciated.

This setup has 5 nodes, each with 2x SFP+ 10Gb NICs and a 1Gb RJ45, plus another separate RJ45 for remote management. Each machine has a mix of storage with a minimum of 1TiB NVMe, 1TiB SSD and 20TiB HDD; we're using Longhorn, generally with 3 replicas for all volumes.

We've a separate 4-node low-power cluster (3 control plane, 1 worker) with 1Gb networking and 1TiB SSDs dedicated as a Rancher management cluster with this Harvester cluster attached; we use this to provision k8s clusters on Harvester alongside running a bunch of dedicated VMs.
e
Hi @worried-state-78253, can you provide a few more details about how your Harvester cluster is set up? Particularly the details of how the VM Network is set up, how the network interfaces on the VMs in question are set up, and what kind of network infrastructure you have to support your Harvester cluster (i.e. switches and routers and how they are connected, DHCP servers...).

At or below 1000Base-T speeds, processing power probably isn't a concern, unless you're using some sort of SBC or otherwise unsuitable hardware. Even with a slightly reduced MTU due to overlay networking the throughput should be way higher.

Use `ethtool` to confirm that the interfaces are all running in the expected modes. You wrote that you're using RJ45-based cabling for the network hardware in question. There are different qualities of cabling; for 1000Base-T anything worse than CAT5e is probably a waste of time. Note that CAT5 != CAT5e. You can also try to find the bottleneck by using tools like `iperf3` to double-check that the line speeds between your nodes roughly match your expectations.

If your hardware is good and you're sure that there aren't any weird things configured on your switches (QoS...), there are some gotchas when setting up VMs and their networks. One thing to look out for is the device type of the virtual network device. There are several models to choose from: `e1000`, `rtl8139`... and they all have pros and cons. If your guest operating system supports it, go with the paravirtualized `virtio` for best performance in most cases. The other ones are software-emulated versions of popular real hardware, which offer better compatibility with operating systems that don't ship with `virtio` drivers (Windows), but they can be a performance bottleneck.

The other thing to look out for is the setup of the virtual network. The defaults may not be best for performance. If you can, set up the network to be bridged rather than NAT'ed, but this requires e.g. an external DHCP server or similar provisions to configure networking for your guest VMs.

---

If you're looking to go beyond 10GBase-T speeds you should consider doing PCI passthrough of a virtual function of the network card (i.e. SR-IOV or whatever equivalent your hardware supports). This requires careful hardware-specific setup, beginning with enabling the necessary processor features in your EFI firmware and loading the required kernel modules. But if done right it should be possible to get the full performance out of your hardware from within a VM.
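To make the `ethtool`/`iperf3` checks concrete, here is a rough sketch (interface names and addresses are placeholders, and `iperf3` may need to be installed on the nodes first):

```bash
# Confirm the negotiated link speed, duplex and link state on each node:
sudo ethtool enp5s0 | grep -E 'Speed|Duplex|Link detected'

# Measure raw throughput between two nodes.
# On node A (server):
iperf3 -s
# On node B (client), pointing at node A's address:
iperf3 -c 192.168.X.Y -t 30

# Repeat the client run from inside a VM to compare host-to-host vs VM-to-host throughput.
```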
w
I'm currently auditing things. I've just got the MACs from every machine's interfaces and I'm about to cross-check this with the switch to see what it's seeing them connected as. Really annoyingly, my 10Gb SFP+ switch needs resetting to connect the management software (Ubiquiti USW Pro), so I'm going to shut all the VMs down a bit later and get that working again, as that's where the VLANs are set up for the management/storage network.

Basically there are additional 1Gb RJ45 ports (I'm using Cat6 cables), and those are all part of a network that's bridged to from the VMs. These are assigned to a cluster network (default), which is attached to the VM network (tenant), which is attached to the VMs using virtio as a bridge.

A lot to unpack with your very detailed and timely response as I'm presently auditing and documenting our network config.
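In case it helps anyone doing a similar audit, here is a rough way to pull the interface MACs from each host over SSH (the host list is a placeholder; I'm SSHing in as the rancher user):

```bash
# Print interface name and MAC address for every interface on each host
for host in 192.168.X.Y 192.168.X.Z; do
  echo "== $host =="
  ssh -o ConnectTimeout=5 "rancher@$host" \
    'for i in /sys/class/net/*; do printf "%-12s %s\n" "$(basename "$i")" "$(cat "$i/address")"; done'
done
```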
BINGO! `ethtool` does indeed show 100Mb/s as the link speed! So there is the problem!!! Need to check why that's dropped down now...
Wow... just unplugging it and plugging it back in and it's now showing 1Gb/s... wonder how/what could have caused it to switch down. Think I'm going to see if I can get these NICs monitored next!
Handy little script if anyone needs it: you can set an array of host IPs/NICs and it'll report back... think I'll adapt this to run from our monitoring server and keep an eye on them 🙂 No idea how they dropped; almost all of them had dropped to 100Mb/s, which was super strange...
```bash
#!/bin/bash

# List of hosts and interfaces (format: ip_or_hostname:interface)
HOSTS_AND_IFACES=(
  "192.168.X.Y:enp5s0"
)

# SSH options (add -i /path/to/key if needed)
SSH_OPTS="-o BatchMode=yes -o ConnectTimeout=5"

echo "|_. Host |_. Interface |_. Link Detected |_. Speed |_. Duplex |_. Status |"
for entry in "${HOSTS_AND_IFACES[@]}"; do
  host="${entry%%:*}"
  iface="${entry##*:}"

  # Run ethtool remotely
  output=$(ssh $SSH_OPTS "rancher@$host" "LANG=C sudo ethtool $iface 2>/dev/null")
  if [ $? -ne 0 ] || [[ -z "$output" ]]; then
    echo "| $host | $iface | ERROR | - | - | - |"
    continue
  fi

  link=$(echo "$output" | awk -F': ' '/Link detected:/ {print $2}')
  speed=$(echo "$output" | awk -F': ' '/Speed:/ {print $2}')
  duplex=$(echo "$output" | awk -F': ' '/Duplex:/ {print $2}')
  status="OK"
  if [[ "$link" != "yes" ]]; then
    status="NO LINK"
  elif [[ "$speed" == "Unknown!" || -z "$speed" ]]; then
    status="NO SPEED"
  fi

  echo "| $host | $iface | $link | $speed | $duplex | $status |"
done
```

And the icing on the cake:
```
root@runner-f-1:~# speedtest-cli --secure
Retrieving speedtest.net configuration...
...
Retrieving speedtest.net server list...
Selecting best server based on ping...
Hosted by Community Fibre Limited (London) [49.78 km]: 11.87 ms
Testing download speed................................................................................
Download: 509.55 Mbit/s
Testing upload speed......................................................................................................
Upload: 566.69 Mbit/s
While not "full speed", thats a 500% improvement! I suspect the test is actually limited - and the speed is comparable to the earlier test from a machine outside the virtual network!