# general
• Containers on different hosts were unable to communicate over the Flannel overlay network.
• tcpdump showed "bad udp cksum" errors:
12:08:04.657660 IP (tos 0x0, ttl 64, id 6305, offset 0, flags [DF], proto UDP (17), length 154)
    ccperf1-ch-s1-z3-0.openstack.na-us-1.cloud.sap.8301 > ccperf1-ch-s1-z3-1.node.perf1-ms-cc.consul.mu.devlab.cc.int.ariba.com.8301: [bad udp cksum 0x93a0 -> 0xf7a3!] UDP, length 126
12:08:05.790492 IP (tos 0x0, ttl 64, id 6466, offset 0, flags [none], proto UDP (17), length 110)
    ccperf1-ch-s1-z3-0.openstack.na-us-1.cloud.sap.40809 > ccperf1-ch-s1-z3-1.node.perf1-ms-cc.consul.mu.devlab.cc.int.ariba.com.8472: [bad udp cksum 0x2f9f -> 0xf321!] OTV, flags [I] (0x08), overlay 0, instance 1
IP (tos 0x0, ttl 63, id 10042, offset 0, flags [DF], proto TCP (6), length 60)
    10.1.2.0.57732 > sampleapp-java-v-192ed3e-17-1707307092.service.default.perf1-ms-cc.consul.mu.devlab.cc.int.ariba.com.http: Flags [S], cksum 0x6432 (incorrect -> 0x27b5), seq 1690897648, win 64390, options [mss 1370,sackOK,TS val 2711604204 ecr 0,nop,wscale 7], length 0
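For a longer capture it may help to count how many packets tcpdump flags rather than reading them one by one. The following is only a minimal sketch: `count_bad_cksum` is a hypothetical helper name, and it assumes verbose tcpdump text like the lines above is piped in on stdin. Keep in mind that when TX checksum offload is enabled, a capture taken on the sending host typically shows bad checksums on outgoing packets because the checksum is filled in later, so a capture on the receiving host is usually more telling.

```shell
# Count lines that tcpdump flagged with a bad UDP checksum or an incorrect TCP checksum.
# Reads capture text on stdin; count_bad_cksum is a hypothetical helper name.
count_bad_cksum() {
  # grep -c exits non-zero when nothing matches, so "|| true" keeps the
  # function usable under "set -e" while still printing 0.
  grep -c -E 'bad udp cksum|cksum 0x[0-9a-f]+ \(incorrect' || true
}

# Example (run on the receiving host; ens192 is the NIC name used in this thread):
#   sudo tcpdump -i ens192 -vv -nn -c 200 udp port 8472 2>/dev/null | count_bad_cksum
```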
• We followed https://github.com/kubernetes/kubernetes/pull/92035#issuecomment-1329950771 and, after disabling TX checksum offloading with the commands below, containers on different hosts were able to communicate.
ethtool -K flannel.1 tx-checksum-ip-generic off
ethtool -K ens192 tx-checksum-ip-generic off
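To confirm what the two commands above actually changed, the current offload state can be read back with `ethtool -k` (lowercase). This is a small sketch, not a fix: `tx_checksum_state` is a hypothetical helper name, and it assumes the usual `ethtool -k` layout of one `feature: on|off` pair per line.

```shell
# Print only the TX checksum feature lines from `ethtool -k` output on stdin,
# with leading indentation removed. tx_checksum_state is a hypothetical helper name.
tx_checksum_state() {
  grep -E '^[[:space:]]*tx-checksum' | sed -E 's/^[[:space:]]+//'
}

# Example (interface names taken from the commands above):
#   for dev in flannel.1 ens192; do
#     echo "== $dev"
#     ethtool -k "$dev" | tx_checksum_state
#   done
```

Comparing the output before and after the change, per interface, makes it easier to tell which of the two settings the connectivity fix and the slowdown each depend on.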
• However, we see performance degradation after disabling checksum offloading, so we are looking for an alternative solution.
• OS and kernel version:
ccloud@ccperf1-ch-s1-z2-0:~$ uname -r
5.15.0-76-generic
ccloud@ccperf1-ch-s1-z2-0:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.2 LTS"
• The comment at https://github.com/flannel-io/flannel/issues/1279#issuecomment-647715533 describes another workaround that does not require turning off checksum offload. UDP port 8472 is the default port for Flannel's encapsulating packets, and the rule clears the mark to avoid SNAT on the encapsulating packet, so there is no double SNAT. However, this workaround did not help us. Do we need to tweak anything here?
sudo iptables -A OUTPUT -p udp -m udp --dport 8472 -j MARK --set-xmark 0x0
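One way to check whether this rule is matching anything is to look at its packet counter. Without a `-t` option, `iptables -A OUTPUT` appends the rule to the `filter` table, so that is where the counter would appear. A minimal sketch, assuming `iptables -vnL` output is piped in on stdin; `vxlan_rule_hits` is a hypothetical helper name.

```shell
# Print the packet counter and target of every rule that matches UDP
# destination port 8472, given `iptables -vnL <chain>` output on stdin.
# vxlan_rule_hits is a hypothetical helper name.
vxlan_rule_hits() {
  awk '/udp dpt:8472/ { print $1, $3 }'
}

# Example:
#   sudo iptables -vnL OUTPUT | vxlan_rule_hits
```

A counter that stays at 0 while cross-host traffic is flowing would suggest the rule never sees the encapsulated packets; a rising counter would suggest it matches but does not change the outcome.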
• Current iptables NAT rules for Flannel:
# iptables -t nat -vnL | grep flannel
 629K   65M FLANNEL-POSTRTG  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* flanneld masq */
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            mark match 0x4000/0x4000 /* flanneld masq */
19457 1167K RETURN     all  --  *      *       10.1.43.0/24         10.1.0.0/16          /* flanneld masq */
    0     0 RETURN     all  --  *      *       10.1.0.0/16          10.1.43.0/24         /* flanneld masq */
 195K   12M RETURN     all  --  *      *      !10.1.0.0/16          10.1.43.0/24         /* flanneld masq */
    0     0 MASQUERADE  all  --  *      *       10.1.0.0/16         !224.0.0.0/4          /* flanneld masq */ random-fully
    0     0 MASQUERADE  all  --  *      *      !10.1.0.0/16          10.1.0.0/16          /* flanneld masq */ random-fully
• Any suggestions or ideas for fixing container communication between hosts without disabling checksum offloading?