# k3s
a
For example, our backend application fetches its container image from our internal registry but then gets stuck in the PodInitializing phase. The pod logs do not show anything.
c
Well yeah if it's still initializing it hasn't run yet, there aren't going to be any logs
Have you checked the k3s service and containerd logs?
a
k3s service logs did not show anything relevant
How do I check the containerd logs?
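For reference, a minimal sketch of one way to look at them. It assumes a stock k3s install, where containerd writes to a log file under the k3s data directory; the path below is that assumed default and will differ if k3s was started with a custom `--data-dir`. The helper name `recent_log_errors` is made up for this example.

```shell
# Assumed default location on a stock k3s install (override via the environment if needed).
CONTAINERD_LOG="${CONTAINERD_LOG:-/var/lib/rancher/k3s/agent/containerd/containerd.log}"

# Print the last N (default 20) lines of a log file that mention an error or failure.
# usage: recent_log_errors FILE [N]
recent_log_errors() {
  grep -iE 'error|fail' "$1" | tail -n "${2:-20}"
}
```

Typical use would be `recent_log_errors "$CONTAINERD_LOG"` on the node where the pod is scheduled. On systemd installs the k3s service logs themselves are usually read with `journalctl -u k3s` (server) or `journalctl -u k3s-agent` (agent), and for a pod stuck in PodInitializing the init container output can be read with `kubectl logs <pod> -c <init-container-name>`.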
It's a bit odd, as the pod initialization should only run the DB migrations of our Node app. Since these migrations are already in the database, it should be a quick check. Currently it has been sitting there doing nothing for the past 10 minutes.
OK, got something: it seems the migration times out when setting up the Node Express app. It tries to access Redis, but I guess that times out.
node:events:491
      throw er; // Unhandled 'error' event
      ^

Error: getaddrinfo EAI_AGAIN core-redis
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:109:26)
Emitted 'error' event on RedisClient instance at:
    at RedisClient.on_error (/home/node/node_modules/redis/index.js:342:14)
    at Socket.<anonymous> (/home/node/node_modules/redis/index.js:223:14)
    at Socket.emit (node:events:513:28)
    at emitErrorNT (node:internal/streams/destroy:157:8)
    at emitErrorCloseNT (node:internal/streams/destroy:122:3)
    at processTicksAndRejections (node:internal/process/task_queues:83:21) {
  errno: -3001,
  code: 'EAI_AGAIN',
  syscall: 'getaddrinfo',
  hostname: 'core-redis'
}
The odd part, again, is that core-redis is running in the same cluster.
Using nc from the app pod, this looks like a DNS issue: core-redis does not resolve, but connecting to the service IP and port opens the connection as expected.
CoreDNS is running.
@creamy-pencil-82913 I might have found a clue in dmesg: there seem to be a lot of reject messages.
# iptables -L -n | grep 8472
ACCEPT tcp -- 10.21.9.141 0.0.0.0/0 state NEW tcp dpt:8472
ACCEPT tcp -- 10.21.9.142 0.0.0.0/0 state NEW tcp dpt:8472
Shouldn't that rule cover all cluster members? It's missing the master (.140) and one other worker node (.147).
# cat /etc/sysconfig/iptables
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [51:25393]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp -s 10.21.9.141,10.21.9.142 --dport 8472 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 30000:32767 -j ACCEPT
-A INPUT -s 10.42.0.0/16 -j ACCEPT
-A INPUT -d 10.42.0.0/16 -j ACCEPT
-A INPUT -j LOG --log-prefix "IPTables-Reject-Input: " --log-level 4
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -s 10.42.0.0/16 -j ACCEPT
-A FORWARD -d 10.42.0.0/16 -j ACCEPT
-A FORWARD -j LOG --log-prefix "IPTables-Reject-Forward: " --log-level 4
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A OUTPUT -o lo -j ACCEPT
COMMIT
*nat
:PREROUTING ACCEPT [17:964]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
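A quick way to answer the "which nodes are missing" question is to check the rules file mechanically. The sketch below is only an illustration: the function name `missing_nodes` is made up, it only understands single-port `--dport` rules with an explicit `-s` source list, and a rule that is open to all sources would still be reported as missing. Note also that the k3s documentation lists port 8472 for Flannel VXLAN as UDP, so checking the `udp` protocol is likely the more relevant test here.

```shell
# Print each node IP that has no ACCEPT rule for the given protocol and port
# in an iptables-save style rules file.
# usage: missing_nodes RULES_FILE PROTO PORT IP...
missing_nodes() {
  rules="$1"; proto="$2"; port="$3"
  shift 3
  for ip in "$@"; do
    # First grep selects ACCEPT rules for this protocol/port,
    # second grep looks for the IP as a whole word (so .14 does not match .140).
    if ! grep -E -- "-p $proto .*--dport $port .*-j ACCEPT" "$rules" | grep -qwF -- "$ip"; then
      echo "$ip"
    fi
  done
}
```

With the file as pasted above, `missing_nodes /etc/sysconfig/iptables tcp 8472 10.21.9.140 10.21.9.141 10.21.9.142 10.21.9.147` would print 10.21.9.140 and 10.21.9.147, and the same call with `udp` would print all four addresses, since the file contains no UDP rule for that port.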
c
That’s not managed by k3s or kubernetes. Technically you’re supposed to turn off firewalld or any other manually configured iptables-based firewall and just let Kubernetes manage all of it.
If you are going to manage your own rules it’s on you to make sure everything gets opened up properly, but the steps to do that aren’t really well documented anywhere.
a
Yeah, based on the documentation I was not sure how to manage the firewall. k3s seems to add rules of its own, but I wasn't sure how to set up my rules for it.
I fixed the rules a bit and no longer see iptables reject messages related to k3s, so that should be OK now.
c
kubernetes makes extensive use of iptables for almost everything relating to passing traffic between pods and services
you can’t really interact with that very much, it just kind of does its own thing
a
Can you add any iptables rules at all, or should /etc/sysconfig/iptables be left with the default empty rule set?
c
You can but if you block stuff in there it may well break things, as you noticed
a
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [51:25393]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 6443 -j ACCEPT
-A INPUT -p udp -m udp -s 10.21.9.140,10.21.9.141,10.21.9.142,10.21.9.147 --dport 8472 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp -s 10.21.9.140,10.21.9.141,10.21.9.142,10.21.9.147 --dport 10250 -j ACCEPT
-A INPUT -p tcp -m state --state NEW -m tcp --dport 30000:32767 -j ACCEPT
-A INPUT -s 10.42.0.0/16 -j ACCEPT
-A INPUT -d 10.42.0.0/16 -j ACCEPT
-A INPUT -j LOG --log-prefix "IPTables-Reject-Input: " --log-level 4
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -s 10.42.0.0/16 -j ACCEPT
-A FORWARD -d 10.42.0.0/16 -j ACCEPT
-A FORWARD -j LOG --log-prefix "IPTables-Reject-Forward: " --log-level 4
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
-A OUTPUT -o lo -j ACCEPT
COMMIT
*nat
:PREROUTING ACCEPT [17:964]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
:POSTROUTING ACCEPT [0:0]
COMMIT
There's my current rule set. I guess any problematic rules would show up as reject log entries.
The pods still won't come up, though.
The relevant problem seems to be: Readiness probe failed: dial tcp 10.42.1.115:3000: connect: connection refused
I'm guessing the firewall should either let that pass or log the rejection.
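To test the assumption that anything blocked would be logged, the kernel log can be summarised by source, protocol and destination port. This is a rough sketch: `summarize_rejects` is a made-up helper name, it relies on the `IPTables-Reject-` prefixes set by the LOG rules in this rule set, and it silently skips entries without a `DPT=` field (ICMP, for example). Also worth keeping in mind: with `--reject-with icmp-host-prohibited`, a blocked TCP connection typically shows up on the client as "no route to host" rather than "connection refused".

```shell
# Summarise kernel log lines written by the iptables LOG rules.
# Reads log text on stdin and prints "count SRC PROTO DPT", most frequent first.
summarize_rejects() {
  grep -F 'IPTables-Reject-' |
    sed -n 's/.* SRC=\([^ ]*\) .* PROTO=\([^ ]*\) .* DPT=\([^ ]*\).*/\1 \2 \3/p' |
    sort | uniq -c | sort -rn
}
```

It could be fed from the kernel ring buffer with `dmesg | summarize_rejects`, or from the journal with `journalctl -k | summarize_rejects`. If the pod IP and probe port never appear in the output, the firewall is probably not what is refusing the readiness probe.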
c
If your rules are interfering with the kubelet’s access to pods for health checking purposes, that’s a problem,
although that error sounds more like a problem with whatever’s running in the pod.
a
I fail to see how the iptables would block the health check, or rather reject the health check without logging the rejection
The pod itself is OK: it's running the same container as before the reinstall, and the health check is pretty much a Node Express server responding with a static 'ok' string.
"Unable to attach or mount volumes: unmounted volumes=[storage], unattached volumes=[kube-api-access-24vkq storage]: timed out waiting for the condition" Any ideas what this is trying to do?
c
what is the volume named ‘storage’ on that pod?
is that a PVC or something?
a
Yeah, the pod has a PVC, core-data-claim, which is backed by a PV.
get pvc / get pv show both as Bound, so I guess they should be OK.
c
looks like they’re not mounting, is there anything interesting in the kubelet logs?
a
For future reference: the root cause here seemed to be a ClamAV pod running as a DaemonSet in a different namespace. It had probably corrupted its DB files or something similar, and it put a huge load on the cluster that I did not spot. That load slowed other operations down to the point where they did not complete. After removing ClamAV, the cluster started to behave as expected.