# harvester
n
Hi, can you give more detailed info on which VLAN networks each VM is connected to and which VLAN is not accessible? Also, is your mgmt VLAN on a custom VLAN? How are you trying to communicate with the other VLANs?
c
So the VMs are connected to multiple VLANs, but only one VLAN cannot communicate; the others are working just fine. All the VMs are on the same physical host, and even a plain ping between IPv6 addresses in the same /64 is not working. I am not 100% certain how the host's physical interfaces are configured, as I did not do that installation/configuration, but here is the output from the host (all of the container interfaces have been omitted from the output):
2: eno49: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mgmt-bo state UP group default qlen 4096
    link/ether 48:df:37:92:ee:20 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
3: eno50: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 9000 qdisc mq master mgmt-bo state UP group default qlen 4096
    link/ether 48:df:37:92:ee:20 brd ff:ff:ff:ff:ff:ff permaddr 48:df:37:92:ee:28
    altname enp4s0f1
4: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 2048
    link/ether 30:e1:71:53:92:2c brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
5: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 2048
    link/ether 30:e1:71:53:92:2d brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
6: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN group default qlen 2048
    link/ether 30:e1:71:53:92:2e brd ff:ff:ff:ff:ff:ff
    altname enp2s0f2
7: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 9000 qdisc mq state DOWN group default qlen 2048
    link/ether 30:e1:71:53:92:2f brd ff:ff:ff:ff:ff:ff
    altname enp2s0f3
8: ens1f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 8192
    link/ether b8:83:03:6c:41:14 brd ff:ff:ff:ff:ff:ff
    altname enp8s0f0np0
9: ens1f1np1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP group default qlen 8192
    link/ether b8:83:03:6c:41:15 brd ff:ff:ff:ff:ff:ff
    altname enp8s0f1np1
10: mgmt-br: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 48:df:37:92:ee:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.214.201/24 brd 192.168.214.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
11: mgmt-bo: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue master mgmt-br state UP group default qlen 1000
    link/ether 48:df:37:92:ee:20 brd ff:ff:ff:ff:ff:ff
n
Even if all the VMs are on the same host, the physical NICs would be connected to an external switch, and each of the VMs will have VLAN networks created. We could use a physical NIC from the mgmt cluster network, or we could exclusively select a physical NIC, create a cluster network/VLAN config on top of it, and have the VLAN networks use that. So can you detail what the three VLAN IDs on the VM are and what subnets they are in? Are you using a separate cluster network for connecting them? Can I have access to your setup?
c
So we determined that, when creating the network configuration, if we specifically add a MAC address for the adapter, then it works just fine.
@brainy-whale-97450 Need you to answer his question above.
b
Each host has 2 x 40G LACP for management and normal VM traffic. There are several VLANs in our service provider network mapped to them, depending on cluster and location. Each host machine also has 2 x 100G Mellanox adapters dedicated to SR-IOV. Each of them is connected to a Cisco Nexus 9k with trunks and 9216 MTU.
n
Since ping is not working between VMs on a specific host, we need to look into what subnet you are pinging in and what VLAN network it is using. Can you check the subnet allocated for the VLAN VM network that is not working, and whether the ping is to a different subnet or within the same subnet? I do not think logs help here; we need to trace the traffic via tcpdump to understand where the packets are dropped. If you are pinging within the same subnet, we need to check whether the VLANs are configured correctly at the external switch (Cisco Nexus); if the ping is to a different subnet, then we need to check whether inter-VLAN routing is properly configured on the Cisco Nexus switch. Since ping stops working only after a reboot of this VM, we can first look at the VM interface to see if the IP is configured correctly and the VM has a route for the destination IP.
b
This is not the issue.
The issue is that randomly some VLANs do not communicate. It seems to be Mellanox VF specific.
It has happened many times in various clusters.
VLAN and network configuration is certainly not the issue.
We have tried enabling trust for the VFs, using all_multi, and many other things. Not sure if it is a Rancher issue or even a Harvester issue, but I do not remember having these issues with VMware. But maybe that was only bnx2x.
n
Did you check where exactly the packets are getting dropped? Are they reaching the Cisco switches?
b
It seems to be the ones we cannot see, the VF-to-VF traffic.
n
Since @cuddly-restaurant-47972 said configuring a MAC during VM network configuration worked fine, what is that MAC? Is there a static ARP entry added somewhere?
I think identifying where the packets drop will be helpful to isolate the issue. Is the packet not going out of the physical NIC from Harvester towards the Cisco switches, or is it reaching the external switch but getting dropped there? From what you say, I understand that your suspicion is that the packet is not going out of Harvester towards the external switch.
If you have the setup details, I can try to check what is happening.
b
We used random MACs, and there is no static ARP or anything like that.
n
You mean you created a VM with a VLAN network (with a random MAC specified from the UI), rebooted, and you did not see any issue on this VLAN?
c
Yes, so a static MAC (that was randomly generated) was assigned using the netplan network configuration file, and the network works properly after that.
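For reference, a minimal netplan sketch of that kind of configuration (the interface name enp1s0, VLAN ID 100, the MAC, and the addresses below are illustrative placeholders, not values from this environment):
# /etc/netplan/60-static-mac.yaml  (illustrative file name)
network:
  version: 2
  ethernets:
    enp1s0:                             # the VF as seen inside the guest
      dhcp4: false
      macaddress: "52:54:00:12:34:56"   # randomly generated once, then pinned here
  vlans:
    vlan100:
      id: 100
      link: enp1s0
      macaddress: "52:54:00:12:34:56"   # pin a static MAC on the VLAN interface as well
      addresses:
        - "192.0.2.10/24"
        - "2001:db8:100::10/64"
Running netplan apply (or rebooting) picks up the pinned MAC.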
n
Do you have more information from the VM pod (kubectl exec -it pod-name /bin/bash and then ip addr show), and the output of ip addr show from the VM's guest OS during the issue state?
b
So the long-term way to solve it is to set all VFs to "trust on" and also to add a static MAC to each VLAN interface. After that, no more issues.
machine01-atl01:~ # cat /oem/mellanox.yaml
name: "Set MTU and Queuing"
stages:
  boot:
    - commands:
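        # Mellanox 100G SR-IOV uplinks (ens1f0np0 / ens1f1np1): jumbo MTU, deeper tx queues and rings, TC hardware offload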
        - /sbin/ip link set dev ens1f0np0 mtu 9000
        - /sbin/ip link set dev ens1f1np1 mtu 9000
        - /sbin/ip link set dev ens1f0np0 txqueuelen 8192
        - /sbin/ip link set dev ens1f1np1 txqueuelen 8192
        - /usr/sbin/ethtool -G ens1f0np0 rx 8192 tx 8192
        - /usr/sbin/ethtool -G ens1f1np1 rx 8192 tx 8192
        - /usr/sbin/ethtool -K ens1f0np0 hw-tc-offload on
        - /usr/sbin/ethtool -K ens1f1np1 hw-tc-offload on
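        # 40G LACP management bond members (eno49 / eno50): jumbo MTU, queue length, ring buffers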
        - /sbin/ip link set dev eno49 mtu 9000
        - /sbin/ip link set dev eno50 mtu 9000
        - /sbin/ip link set dev eno49 txqueuelen 4096
        - /sbin/ip link set dev eno50 txqueuelen 4096
        - /usr/sbin/ethtool -G eno49 rx 4096 tx 4096
        - /usr/sbin/ethtool -G eno50 rx 4096 tx 4096
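        # Remaining onboard ports (eno1-eno4) and the mgmt-bo bond: jumbo MTU, queue lengths, ring buffers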
        - /sbin/ip link set dev eno1 mtu 9000
        - /sbin/ip link set dev eno2 mtu 9000
        - /sbin/ip link set dev eno3 mtu 9000
        - /sbin/ip link set dev eno4 mtu 9000
        - /sbin/ip link set dev eno1 txqueuelen 2048
        - /sbin/ip link set dev eno2 txqueuelen 2048
        - /sbin/ip link set dev eno3 txqueuelen 2048
        - /sbin/ip link set dev eno4 txqueuelen 2048
        - /sbin/ip link set dev mgmt-bo mtu 9000
        - /usr/sbin/ethtool -G eno1 rx 2047 tx 511 rx-jumbo 1023
        - /usr/sbin/ethtool -G eno2 rx 2047 tx 511 rx-jumbo 1023
        - /usr/sbin/ethtool -G eno3 rx 2047 tx 511 rx-jumbo 1023
        - /usr/sbin/ethtool -G eno4 rx 2047 tx 511 rx-jumbo 1023
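        # Mark all 16 VFs on both Mellanox ports as trusted (part of the long-term workaround described above)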
        - /sbin/ip link set dev ens1f0np0 vf 0 trust on
        - /sbin/ip link set dev ens1f0np0 vf 1 trust on
        - /sbin/ip link set dev ens1f0np0 vf 2 trust on
        - /sbin/ip link set dev ens1f0np0 vf 3 trust on
        - /sbin/ip link set dev ens1f0np0 vf 4 trust on
        - /sbin/ip link set dev ens1f0np0 vf 5 trust on
        - /sbin/ip link set dev ens1f0np0 vf 6 trust on
        - /sbin/ip link set dev ens1f0np0 vf 7 trust on
        - /sbin/ip link set dev ens1f0np0 vf 8 trust on
        - /sbin/ip link set dev ens1f0np0 vf 9 trust on
        - /sbin/ip link set dev ens1f0np0 vf 10 trust on
        - /sbin/ip link set dev ens1f0np0 vf 11 trust on
        - /sbin/ip link set dev ens1f0np0 vf 12 trust on
        - /sbin/ip link set dev ens1f0np0 vf 13 trust on
        - /sbin/ip link set dev ens1f0np0 vf 14 trust on
        - /sbin/ip link set dev ens1f0np0 vf 15 trust on
        - /sbin/ip link set dev ens1f1np1 vf 15 trust on
        - /sbin/ip link set dev ens1f1np1 vf 14 trust on
        - /sbin/ip link set dev ens1f1np1 vf 13 trust on
        - /sbin/ip link set dev ens1f1np1 vf 12 trust on
        - /sbin/ip link set dev ens1f1np1 vf 11 trust on
        - /sbin/ip link set dev ens1f1np1 vf 10 trust on
        - /sbin/ip link set dev ens1f1np1 vf 9 trust on
        - /sbin/ip link set dev ens1f1np1 vf 8 trust on
        - /sbin/ip link set dev ens1f1np1 vf 7 trust on
        - /sbin/ip link set dev ens1f1np1 vf 6 trust on
        - /sbin/ip link set dev ens1f1np1 vf 5 trust on
        - /sbin/ip link set dev ens1f1np1 vf 4 trust on
        - /sbin/ip link set dev ens1f1np1 vf 3 trust on
        - /sbin/ip link set dev ens1f1np1 vf 2 trust on
        - /sbin/ip link set dev ens1f1np1 vf 1 trust on
        - /sbin/ip link set dev ens1f1np1 vf 0 trust on