# rke2
c
are you sure that networking between the different nodes is working properly? I suspect that coredns is now running on some of the new nodes, but your networking is broken and requests against coredns pods on the new nodes are failing.
o
so I went to two hosts that didn't have any kubernetes instances running or installed, then ran the process to install the linux agents on the two hosts.
in rancher, i see the two new worker nodes as part of the cluster
and i don't see any coredns / dns related pods running on that node at all
i do see rke2-canal running on both of the worker nodes, but nothing dns
ah, now i see some on the other 2 nodes..
sorry, on one of the new nodes, not both
and yes, it looks like all of the requests are hitting the coredns pod on the worker node. do i just terminate that pod so the traffic switches back to the master? and should the worker nodes have coredns on them as well?
c
I would probably fix your networking, rather than playing whack-a-mole by killing pods and hoping they get rescheduled onto different nodes.
o
i didn't alter networking to start with. everything changed with the installation/introduction of the worker nodes
c
yes but clearly the network between your old nodes and your new nodes isn’t working, because you can’t access coredns when it’s running on the new nodes
o
think it was an issue hiding up until now? this was a one-node cluster prior to adding the worker nodes. or are you saying at the OS level? also, should coredns be on each node? because i only see it on 2/3, so if it needs to be on all 3 that's another thing i'll have to investigate
c
no, it doesn’t need to be on all the nodes. It should work no matter where it’s running. If DNS doesn’t work for pods on a node when there’s not a coredns replica running on that node, then networking between your nodes is broken - something is causing the packets to get dropped when they go between nodes.
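One way to test this directly is to send a DNS query to each coredns pod IP in turn and see which ones answer. The following is a minimal sketch using only the Python standard library, not an official RKE2 tool. The function names are hypothetical, and it assumes you already have the pod IPs (for example from `kubectl -n kube-system get pods -o wide`), the default `cluster.local` cluster domain, and that you run it from a node or pod that should be able to reach pod IPs.

```python
import socket
import struct


def build_dns_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for an A record."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE=1 (A record), QCLASS=1 (IN)
    return header + qname + struct.pack("!HH", 1, 1)


def dns_server_answers(server_ip: str, name: str,
                       timeout: float = 2.0, port: int = 53) -> bool:
    """Return True if server_ip sends back a DNS response over UDP."""
    query = build_dns_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(query, (server_ip, port))
            reply, _ = sock.recvfrom(512)
        except OSError:
            # Timeouts and unreachable hosts both end up here
            return False
    # A response echoes the transaction ID and has the QR bit set
    return len(reply) >= 12 and reply[:2] == query[:2] and bool(reply[2] & 0x80)
```

Calling `dns_server_answers(pod_ip, "kubernetes.default.svc.cluster.local")` for each coredns pod IP should return True everywhere if inter-node networking is healthy. If it only returns True for replicas on the same node you are testing from, packets between nodes are being dropped.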
o
roger that - just had me curious as the two new nodes were just added today, so this networking issue is new to me. i'm looking into it now. thanks brandon
random question, just wanting to check the steps that were taken. we should be able to install the linux agent by itself, right? or does the server part need to be installed first THEN the worker part?
c
there is no such thing. There is just RKE2, and you run it as a server, or as an agent.
o
okay thought so, just wanted to make sure i wasn't underthinking the situation
FWIW: compared basic fundamental networking stuff at the OS level of the hosts to start with.. come to find out, firewalld was running on the two hosts i used as worker nodes.. disabling/turning it off solved the connectivity issues
c
that’d do it. you could also check the port requirements in the docs, and open those.
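Before and after opening ports, a quick reachability check from one node to another can confirm what the firewall is actually letting through. This is a rough sketch, not an official tool: the function names are hypothetical, and the port list is an assumption recalled from the RKE2 networking requirements, so verify it against the current docs. It only covers TCP; the Canal VXLAN overlay traffic (UDP 8472, as far as I recall) cannot be verified with a plain connect and is a likely suspect for dropped pod-to-pod packets.

```python
import socket

# Assumed TCP ports, recalled from the RKE2 docs; confirm against the current docs.
RKE2_TCP_PORTS = {
    9345: "RKE2 supervisor API (server nodes)",
    6443: "Kubernetes API (server nodes)",
    10250: "kubelet (all nodes)",
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out), or unreachable
        return False


def check_host(host: str, ports=RKE2_TCP_PORTS, timeout: float = 2.0) -> dict:
    """Map each port in ports to whether it is reachable on host."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Running `check_host("<server-node-ip>")` from a worker, and `check_host("<worker-node-ip>", {10250: "kubelet"})` from the server, gives a quick picture of which TCP ports are still blocked once firewalld is turned back on.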
o
yeah, that's the plan. this is the first project i have to be reliant on a separate IT team to do these things.. so, hacky for now, but it works!