This message was deleted Rancher Users #k3s


# k3s

adamant-kite-43734

12/07/2022, 6:55 AM

This message was deleted.

able-mechanic-45652

12/07/2022, 6:56 AM

For example, our backend application fetches the container image from our internal registry but gets stuck in the PodInitializing phase; the pod logs do not show anything.

creamy-pencil-82913

12/07/2022, 6:57 AM

Well yeah if it's still initializing it hasn't run yet, there aren't going to be any logs

creamy-pencil-82913

12/07/2022, 6:57 AM

Have you checked the k3s service and containerd logs?

able-mechanic-45652

12/07/2022, 7:00 AM

k3s service logs did not show anything relevant

able-mechanic-45652

12/07/2022, 7:00 AM

how to check containerd logs?

able-mechanic-45652

12/07/2022, 7:06 AM

It's a bit odd, as the pod initialization should only run the DB migrations of our Node app. Since these migrations are already in the database, it should be a quick check. Currently it has been sitting there doing nothing for the past 10 minutes.

able-mechanic-45652

12/07/2022, 7:10 AM

OK, got something: it seems the migration times out when setting up the Node Express app. It tries to access Redis, but I guess it times out.

able-mechanic-45652

12/07/2022, 7:10 AM

    node:events:491
          throw er; // Unhandled 'error' event
          ^

    Error: getaddrinfo EAI_AGAIN core-redis
        at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
    Emitted 'error' event on RedisClient instance at:
        at RedisClient.on_error (/home/node/node_modules/redis/index.js:342:14)
        at Socket.<anonymous> (/home/node/node_modules/redis/index.js:223:14)
        at Socket.emit (node:events:513:28)
        at emitErrorNT (node:internal/streams/destroy:157:8)
        at emitErrorCloseNT (node:internal/streams/destroy:122:3)
        at processTicksAndRejections (node:internal/process/task_queues:83:21) {
      errno: -3001,
      code: 'EAI_AGAIN',
      syscall: 'getaddrinfo',
      hostname: 'core-redis'
    }

able-mechanic-45652

12/07/2022, 7:11 AM

The odd part, again, is that core-redis is running in the same cluster.

able-mechanic-45652

12/07/2022, 7:20 AM

Using nc from the app pod, this looks like a DNS issue: core-redis does not resolve, but connecting to the service IP and port opens the connection as expected.
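The name-lookup half of that nc check can be approximated in a few lines of Python run inside the pod. This is a minimal sketch, not something from the thread: `classify_lookup` is a hypothetical helper name, and it only separates a temporary resolver failure (the `EAI_AGAIN` seen in the Node stack trace) from other lookup errors.

```python
import socket


def classify_lookup(hostname, resolve=socket.getaddrinfo):
    """Classify a name lookup as 'ok', 'dns-temporary-failure' or 'dns-error'.

    `resolve` defaults to the system resolver; it is a parameter only so the
    function can be exercised without a real DNS server.
    """
    try:
        resolve(hostname, None)
    except socket.gaierror as exc:
        # EAI_AGAIN means the resolver could not get an answer right now,
        # which usually points at the DNS server being unreachable.
        if exc.errno == socket.EAI_AGAIN:
            return "dns-temporary-failure"
        return "dns-error"
    return "ok"
```

If `classify_lookup("core-redis")` reports a temporary failure while a direct connection to the service IP works, the problem is more likely on the path between the pod and CoreDNS than in Redis itself.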

able-mechanic-45652

12/07/2022, 7:20 AM

coredns is running

creamy-pencil-82913

12/07/2022, 8:21 AM

Possibly https://github.com/flannel-io/flannel/issues/1279#issuecomment-634124089

able-mechanic-45652

12/08/2022, 6:40 AM

@creamy-pencil-82913 I might have found a clue in dmesg: there seem to be a lot of reject messages.

able-mechanic-45652

12/08/2022, 6:40 AM

    # iptables -L -n | grep 8472
    ACCEPT tcp -- 10.21.9.141 0.0.0.0/0 state NEW tcp dpt:8472
    ACCEPT tcp -- 10.21.9.142 0.0.0.0/0 state NEW tcp dpt:8472

able-mechanic-45652

12/08/2022, 6:41 AM

shouldn't that rule cover all cluster members? It's missing master (.140) and one other worker node (.147)

able-mechanic-45652

12/08/2022, 6:54 AM

    # cat /etc/sysconfig/iptables
    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [51:25393]
    -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    -A INPUT -p icmp -j ACCEPT
    -A INPUT -i lo -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp -s 10.21.9.141,10.21.9.142 --dport 8472 -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 30000:32767 -j ACCEPT
    -A INPUT -s 10.42.0.0/16 -j ACCEPT
    -A INPUT -d 10.42.0.0/16 -j ACCEPT
    -A INPUT -j LOG --log-prefix "IPTables-Reject-Input: " --log-level 4
    -A INPUT -j REJECT --reject-with icmp-host-prohibited
    -A FORWARD -s 10.42.0.0/16 -j ACCEPT
    -A FORWARD -d 10.42.0.0/16 -j ACCEPT
    -A FORWARD -j LOG --log-prefix "IPTables-Reject-Forward: " --log-level 4
    -A FORWARD -j REJECT --reject-with icmp-host-prohibited
    -A OUTPUT -o lo -j ACCEPT
    COMMIT
    *nat
    :PREROUTING ACCEPT [17:964]
    :INPUT ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :POSTROUTING ACCEPT [0:0]
    COMMIT
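On the question of whether the 8472 rule covers all cluster members, a small script can answer it mechanically. The sketch below is an illustration only: `accepted_sources` and `missing_nodes` are hypothetical helper names, the parser handles just the simple `-A INPUT ... -j ACCEPT` rules of the kind shown in this ruleset, and it matches `-s` addresses as exact strings (no CIDR handling). It also checks the protocol, which matters because flannel's VXLAN traffic on port 8472 is UDP, while the rule here is written for TCP.

```python
import shlex


def _value(tokens, flag):
    """Return the token following `flag`, or None if the flag is absent."""
    return tokens[tokens.index(flag) + 1] if flag in tokens else None


def accepted_sources(rules_text, port, proto):
    """Collect source IPs that INPUT ACCEPT rules allow for proto/port.

    Returns None when a matching rule has no -s restriction (any source).
    """
    sources = set()
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        if tokens[:2] != ["-A", "INPUT"] or tokens[-2:] != ["-j", "ACCEPT"]:
            continue
        if _value(tokens, "-p") != proto or _value(tokens, "--dport") != str(port):
            continue
        src = _value(tokens, "-s")
        if src is None:
            return None
        sources.update(src.split(","))
    return sources


def missing_nodes(rules_text, port, proto, nodes):
    """List the nodes that no matching ACCEPT rule lets in."""
    allowed = accepted_sources(rules_text, port, proto)
    if allowed is None:
        return []
    return [node for node in nodes if node not in allowed]


# The 8472 rule from the ruleset above and the four node IPs from the thread.
RULES = (
    "-A INPUT -p tcp -m state --state NEW -m tcp "
    "-s 10.21.9.141,10.21.9.142 --dport 8472 -j ACCEPT\n"
)
NODES = ["10.21.9.140", "10.21.9.141", "10.21.9.142", "10.21.9.147"]

print(missing_nodes(RULES, 8472, "tcp", NODES))  # the two nodes not listed
print(missing_nodes(RULES, 8472, "udp", NODES))  # every node: no UDP rule at all
```

Run against this ruleset, the TCP check reports `.140` and `.147` as missing, and the UDP check reports all four nodes, since no rule accepts UDP 8472 from anyone.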

creamy-pencil-82913

12/08/2022, 8:09 AM

That’s not managed by k3s or kubernetes. Technically you’re supposed to turn off firewalld or any other manually configured iptables-based firewall and just let Kubernetes manage all of it.

creamy-pencil-82913

12/08/2022, 8:09 AM

If you are going to manage your own rules it’s on you to make sure everything gets opened up properly, but the steps to do that aren’t really well documented anywhere.

able-mechanic-45652

12/08/2022, 8:12 AM

Yeah, based on the documentation I was not sure how to manage the firewall, k3s seems to add rules but I wasn't sure how to set the rules for it

able-mechanic-45652

12/08/2022, 8:14 AM

I fixed the rules a bit and no longer see iptables reject messages related to k3s so that should be ok now

creamy-pencil-82913

12/08/2022, 8:19 AM

kubernetes makes extensive use of iptables for almost everything relating to passing traffic between pods and services

creamy-pencil-82913

12/08/2022, 8:19 AM

you can’t really interact with that very much, it just kind of does its own thing

able-mechanic-45652

12/08/2022, 8:21 AM

can you make any rules for iptables or should the /etc/sysconfig/iptables be left with default empty rules?

creamy-pencil-82913

12/08/2022, 8:22 AM

You can but if you block stuff in there it may well break things, as you noticed

able-mechanic-45652

12/08/2022, 8:26 AM

    *filter
    :INPUT ACCEPT [0:0]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [51:25393]
    -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    -A INPUT -p icmp -j ACCEPT
    -A INPUT -i lo -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 6443 -j ACCEPT
    -A INPUT -p udp -m udp -s 10.21.9.140,10.21.9.141,10.21.9.142,10.21.9.147 --dport 8472 -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp -s 10.21.9.140,10.21.9.141,10.21.9.142,10.21.9.147 --dport 10250 -j ACCEPT
    -A INPUT -p tcp -m state --state NEW -m tcp --dport 30000:32767 -j ACCEPT
    -A INPUT -s 10.42.0.0/16 -j ACCEPT
    -A INPUT -d 10.42.0.0/16 -j ACCEPT
    -A INPUT -j LOG --log-prefix "IPTables-Reject-Input: " --log-level 4
    -A INPUT -j REJECT --reject-with icmp-host-prohibited
    -A FORWARD -s 10.42.0.0/16 -j ACCEPT
    -A FORWARD -d 10.42.0.0/16 -j ACCEPT
    -A FORWARD -j LOG --log-prefix "IPTables-Reject-Forward: " --log-level 4
    -A FORWARD -j REJECT --reject-with icmp-host-prohibited
    -A OUTPUT -o lo -j ACCEPT
    COMMIT
    *nat
    :PREROUTING ACCEPT [17:964]
    :INPUT ACCEPT [0:0]
    :OUTPUT ACCEPT [0:0]
    :POSTROUTING ACCEPT [0:0]
    COMMIT

able-mechanic-45652

12/08/2022, 8:27 AM

There's my current rule set. I guess any problematic rules would show up as reject log entries.
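One way to act on that guess is to tally the reject log entries by source, protocol and destination port, so a port that is still blocked stands out. The sketch below is an assumption-laden illustration: `tally_rejects` is a hypothetical helper, the prefix is the `--log-prefix` string from the ruleset in this thread, and it relies on the `SRC=`, `PROTO=` and `DPT=` key=value fields that the kernel LOG target normally writes.

```python
import re
from collections import Counter

PREFIX = "IPTables-Reject-Input: "
FIELD = re.compile(r"(\w+)=(\S*)")


def tally_rejects(lines, prefix=PREFIX):
    """Count rejected packets per (source, protocol, destination port)."""
    counts = Counter()
    for line in lines:
        _, found, rest = line.partition(prefix)
        if not found:
            continue  # not one of our LOG entries
        fields = dict(FIELD.findall(rest))
        # DPT is absent for protocols without ports (e.g. ICMP), giving None.
        key = (fields.get("SRC"), fields.get("PROTO"), fields.get("DPT"))
        counts[key] += 1
    return counts
```

Feeding it the lines of `dmesg` output (for example `tally_rejects(sys.stdin)` in a script that dmesg is piped into) and printing `most_common()` gives a short list of what the firewall is still turning away.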

able-mechanic-45652

12/08/2022, 8:27 AM

Though still the pods won't come up

able-mechanic-45652

12/08/2022, 8:28 AM

The relevant problem seems to be: Readiness probe failed: dial tcp 10.42.1.115:3000: connect: connection refused

able-mechanic-45652

12/08/2022, 8:29 AM

I'm guessing the firewall should either let that pass or log the issue.

creamy-pencil-82913

12/08/2022, 8:34 AM

If your rules are interfering with the kubelet’s access to pods for health checking purposes, that’s a problem

creamy-pencil-82913

12/08/2022, 8:34 AM

although that error sounds more like a problem with whatever’s running in the pod

able-mechanic-45652

12/08/2022, 8:36 AM

I fail to see how the iptables would block the health check, or rather reject the health check without logging the rejection
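The exact connect error is a useful hint here. A `REJECT --reject-with icmp-host-prohibited` rule typically surfaces to the client as "no route to host", whereas "connection refused" usually means the target answered with a TCP reset because nothing was listening on that port. The sketch below, with the hypothetical helper name `probe`, separates those cases from the node; it is an illustration under those assumptions, not a tool mentioned in the thread.

```python
import errno
import socket


def probe(host, port, timeout=2.0):
    """Try a TCP connect and report how it ended."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timeout"  # packets silently dropped, or host very slow
    except ConnectionRefusedError:
        return "refused"  # reset received: reachable, but nothing listening
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"  # e.g. an ICMP reject from a firewall
        raise
```

Calling `probe(pod_ip, port)` from the node that hosts the pod and getting `"refused"` would fit the suggestion that the process inside the pod is not listening yet, rather than the firewall rejecting the probe.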

able-mechanic-45652

12/08/2022, 8:36 AM

The pod itself is OK: it's running the same container as before the reinstall, and the health check is pretty much a Node Express server responding with a static 'ok' string.

able-mechanic-45652

12/08/2022, 8:48 AM

"Unable to attach or mount volumes: unmounted volumes=[storage], unattached volumes=[kube-api-access-24vkq storage]: timed out waiting for the condition" Any ideas what this is trying to do?

creamy-pencil-82913

12/08/2022, 8:51 AM

what is the volume named ‘storage’ on that pod?

creamy-pencil-82913

12/08/2022, 8:51 AM

is that a PVC or something?

able-mechanic-45652

12/08/2022, 8:58 AM

Yeah, the pod has a PVC on core-data-claim, which is backed by a PV.

able-mechanic-45652

12/08/2022, 8:59 AM

get pvc / get pv would seem to show them bound so I guess they should be ok

creamy-pencil-82913

12/08/2022, 9:53 AM

looks like they’re not mounting, is there anything interesting in the kubelet logs?

able-mechanic-45652

12/22/2022, 6:23 AM

For future reference: the root cause seemed to be a ClamAV pod running as a DaemonSet in a different namespace. It probably corrupted the DB files or something and put a huge load on the cluster, which I did not spot. This slowed other operations down so much that they did not complete. After removing ClamAV, the cluster started to behave as expected.
