# harvester
a
This message was deleted.
w
Are you asking which OS Harvester uses as its base? If so, it's a customised version of SLE Micro
b
No, I was asking what OS was being used on the VM that was having errors.
t
ethtool doesn't work for this. The max inside the VM is 256 because qemu/kvm sets the default limit to 256. To bump it up, you need to modify the qemu command line to add rx_queue_size and tx_queue_size to the device args by putting the config I referenced in the libvirt XML.
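A minimal sketch of what that looks like in the libvirt domain XML, assuming a bridge-backed virtio NIC (the bridge name and queue count are placeholders):

```xml
<interface type='bridge'>
  <source bridge='br0'/>
  <model type='virtio'/>
  <!-- defaults are 256; both attributes accept values up to 1024.
       tx_queue_size is reported to only take effect with vhost-user backends. -->
  <driver name='vhost' queues='4' rx_queue_size='1024' tx_queue_size='1024'/>
</interface>
```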
I already bumped up the limit on the physical interfaces, but that is not where the packets are being dropped. I have a script which monitors the interfaces on the host OS side and inside the container. It checks the physical, veth, net*, and tap* interfaces, and the taps are where the packets are being dropped because the VM can't respond fast enough sometimes. Bumping up the tx/rx queue sizes is supposed to help with that, but I don't see any way I can accomplish that with Harvester besides using my own packaged KubeVirt. I am going to take another look at macvtap today; maybe that is a solution to the issue I am running into. I can't use SR-IOV due to the issue with PCI devices being assigned incorrectly by KubeVirt, and some of my VM images don't support it anyway.
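Roughly what such a monitor does, as a sketch (the interface name patterns and reading the /sys counters directly are assumptions, not the exact script):

```sh
#!/bin/sh
# Print RX/TX drop counters for the physical, veth, net* and tap* interfaces.
# Run on the host; for the tap inside the virt-launcher container, run it via
# kubectl exec / nsenter into that network namespace.
for dev in /sys/class/net/eth* /sys/class/net/veth* /sys/class/net/net* /sys/class/net/tap*; do
  [ -e "$dev/statistics/rx_dropped" ] || continue
  printf '%-16s rx_dropped=%-10s tx_dropped=%s\n' \
    "${dev##*/}" \
    "$(cat "$dev/statistics/rx_dropped")" \
    "$(cat "$dev/statistics/tx_dropped")"
done
```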
b
What kind of traffic are we talking about here?
We have virtual machines doing 2Gbps+ in normal traffic.
t
This is a late reply, but the issue in my case is a device with an Atom processor and the number of packets per second. Large packets and throughput aren't as much of an issue; I start experiencing drops around 50-55K+ pps. At full packet size that is nearly maxing out the 1G pipe, but the test case I have to satisfy uses a smaller average packet size and is more concerned with the point at which packets start dropping.
b
Linux OS? What does your sysctl look like? Are you using 9000 MTU on your layer 3 interfaces? What network card? Are you using all the offloading capabilities?
t
Packets are being dropped on the virtual side, on the tap inside the container that connects to the virtio device inside the VM. MTU doesn't help; these are small packets, around 400 bytes.
Fedora 38 test VMs. Offloading on the physical side (Intel I350) is on, but again, the packet drops are on the virtual side, not the physical. The VM isn't processing the packets fast enough; I believe it is bottlenecked by the Atom C3958's speed, based on a combination of ksoftirqd, %si, and %st. Bumping up the rx queue size to 1024 has given about a 10-15% increase; I think I can push around 60-65K pps now with drop rates similar to what I saw before at the lower pps. I'm starting to look at whether macvtap is an option to reduce the CPU utilization. This is also UDP traffic, tested both with a Spirent and with a shell script that spawns multiple parallel iperf3 tests to create multiple flows. Single flows max out a single core quicker because all traffic goes through a single virtio multiqueue thread and the single ksoftirqd inside the VM consumes 99% of a core. With multiple flows, the load is spread across the multiple virtio multiqueue threads and multiple ksoftirqd threads inside the VM.
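The multi-flow generator is essentially this (addresses, ports, rate, packet length, and flow count are placeholders, not the exact script):

```sh
#!/bin/sh
# Spawn several parallel iperf3 UDP clients so the flows hash across the
# virtio multiqueue queues and the guest's ksoftirqd threads.
TARGET=192.0.2.10      # VM under test (example address)
FLOWS=4                # e.g. one flow per guest queue/vCPU
for i in $(seq 1 "$FLOWS"); do
  # -l 400 matches the ~400-byte packets mentioned above; each flow needs
  # a matching "iperf3 -s -p <port>" listener inside the VM.
  iperf3 -c "$TARGET" -p $((5200 + i)) -u -b 200M -l 400 -t 60 &
done
wait
```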
The 1024 rx queue would definitely help if this were bursty traffic, but this is a load test I have been given, and I have to see how much performance I can get out of this box compared to a previous solution running on the exact same hardware. Even if I have to use something else like macvtap to get closer to the performance number I am trying to hit, I would likely still leave the config in place to use the larger rx queue to handle bursts better.
b
Have you increased the txqueuelen of the interface?
What does ethtool -l <interface> say?
Have you tried enabling zero copy?
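If "zero copy" here means vhost_net's TX zero-copy (an assumption about which knob is meant), it's a host-side module parameter along these lines:

```sh
# Check / enable vhost_net TX zero-copy on the host. The parameter has to be
# set at module load time, so the reload fails while VMs are still using it.
cat /sys/module/vhost_net/parameters/experimental_zcopytx
modprobe -r vhost_net && modprobe vhost_net experimental_zcopytx=1
```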
t
ethtool -l inside the VM shows 4 combined since this is a quad-core VM. I am limited in what changes I can make inside the VM due to the test requirements; I'm testing with Fedora mainly while tuning the system, but the real test uses a VNF appliance. Early on, I bumped up the txqueuelen on the physicals, before I tracked down that the packet drops were occurring on the tap that is inside the container (the tap faces the VM). I did enable zero copy on the host OS side, but that didn't help my pps because the packets are being dropped on the virtual side. I have run the VM both on dedicated cores using CPUManager and not dedicated, to allow the multiqueue threads to move around. In the non-dedicated test, other processes such as kube-apiserver spinning up would trigger packet loss, so I'm running dedicated now. I've tested with IRQs/interrupts in the host OS not dedicated and also pinned to a dedicated core that I isolated from Kubernetes use by excluding it from the defaultCpuSet, disabling that core temporarily via 'chcpu -d 15' so that kubelet would start without complaining that the default set didn't include the core.
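The core-isolation workaround described above, sketched as commands (the core number is the example from the message; how kubelet is started on the node is left out):

```sh
chcpu -d 15     # take CPU 15 offline before (re)starting kubelet, so its default
                # cpuset doesn't include the core being reserved for IRQs
# ... start kubelet / the node service here ...
chcpu -e 15     # bring the core back online afterwards and pin the NIC IRQs to it
```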
b
Found this:
There is no "ring buffer" of packets on a virtio-net device. This means that if the TUN/TAP device's TX queue fills up because the VM is not receiving (either fast enough or at all) then there is nowhere for new packets to go, and the hypervisor sees TX loss on the tap. If you notice TX loss on a TUN/TAP, increase the tap txqueuelen to avoid that, similar to increasing the RX ring buffer to stop receive loss on a physical NIC.
What do you have for receive settings on these sysctls?
net.core.rmem_max = 2147483647
net.core.wmem_max = 2147483647
net.core.netdev_max_backlog = 2500
net.core.somaxconn = 65000
net.core.default_qdisc = fq
sysctl -p | grep net.core.
So it seems: increase the rx and tx queues on the physical interface to their max, set txqueuelen on the tap interface to 10000, then make sure you have enough UDP buffer in the VM: net.core.rmem_max.
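Put together, that suggestion amounts to something like this (interface names and ring sizes are examples; check `ethtool -g` for the real maximums):

```sh
ethtool -G eth0 rx 4096 tx 4096            # physical NIC ring buffers up to their max
ip link set dev tap1 txqueuelen 10000      # tap facing the VM
sysctl -w net.core.rmem_max=2147483647     # UDP receive buffer ceiling inside the VM
```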
t
Thanks for the input. That sounds like an old doc. qemu support for rx_queue_size/tx_queue_size on virtio network devices is documented in multiple places; the default is 256, and these options were added to allow bumping the size up to a max of 1024. I just found one place saying tx_queue_size is only good for vhost-user, so that explains why I only saw the rx size increase inside the VM when I changed both in my sidecar script. The default net.core values were much lower; I just bumped them up to the values listed above and reran the test, and there was no real change in pps before the drops. I bumped the sysctl values up in the host OS and tested, then also bumped them up inside the VM and tested again. txqueuelen on the tap can't be changed. The log below is from inside the container; I tried 100/1000/10000 and got the same error.
fedora-05-01:/ # ip link set tap1 txqueuelen 1000 
RTNETLINK answers: Operation not permitted
b
Not sure what else you ever figured out, but I was able to get multiqueue enabled by adding networkInterfaceMultiqueue: true under domain: devices: in the VM spec (see the sketch below).
That increased the ring count from 1 to 8 on all interfaces.
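For anyone searching later, this is roughly where that field sits in a VMI spec (other fields elided; for a VirtualMachine object it lives under spec.template.spec.domain.devices):

```yaml
spec:
  domain:
    devices:
      networkInterfaceMultiqueue: true   # queue count follows the guest vCPU count
```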