# rke2
a
k
are you using a DISA STIG by any chance
f
I can ask them. I notice they were adhering to some CIS benchmarks. Not sure if they have delved further into STIG items.
k
i mean for the servers that are more secure
f
Yeah I'm talking about those. I will check with them.
Is there a particular STIG item that comes to mind?
k
regardless, start with checking the fapolicyd and sysctl settings again. you will need /etc/fapolicyd/rules.d/80-rke2.rules:
```
allow perm=any all : dir=/var/lib/rancher/
allow perm=any all : dir=/opt/cni/
allow perm=any all : dir=/run/k3s/
allow perm=any all : dir=/var/lib/kubelet/
```
```
cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
```
and then
```
systemctl restart systemd-sysctl
```
and if you are using a DISA STIG'd image, check in the /etc/sysctl.d/ folder and make sure that one is the last applied
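A quick way to sanity-check that "last applied" point: systemd-sysctl reads `*.conf` fragments in lexicographic order, and the last file to set a key wins. The sketch below simulates this with made-up file names in a scratch directory; on a real box you'd inspect `/etc/sysctl.d/` the same way.

```shell
# Simulate a STIG fragment vs. the RKE2 CIS fragment (file names are
# hypothetical). systemd-sysctl applies fragments in sorted order, so
# the last file to set a key determines the live value.
dir=$(mktemp -d)
echo "net.ipv4.ip_forward = 0" > "$dir/50-stig.conf"
echo "net.ipv4.ip_forward = 1" > "$dir/60-rke2-cis.conf"
winner=$(ls "$dir" | sort | tail -n 1)   # the fragment applied last
echo "$winner takes effect:"
grep ip_forward "$dir/$winner"
rm -rf "$dir"
```

On the actual host, `ls /etc/sysctl.d/ | sort` shows the apply order, and grepping for the contested key across `/etc/sysctl.d/` and `/usr/lib/sysctl.d/` shows who sets it.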
f
Yeah I've tried to compare the full output of sysctl -a from a working cluster to this non-working one. I will check on the fapolicyd. Thanks much!
๐Ÿ‘ 1
r
I got the behavior you're describing when I had firewalld turned on with CentOS 7. I've heard people say they've gotten RKE2 working with ufw on, but firewalld supposedly is explicitly incompatible (even though they both use the kernel's netfilter under the hood).
There's also an RPM you have to install if you have SELinux set to enforcing. You should be able to find instructions on that on the RKE2 docs page.
f
They don't have fapolicyd running. We also turn off firewalld and inject that SELinux RPM.
r
Might you have something else odd with SELinux? For example, when I set up some KVM VMs I had to set an SELinux boolean to allow them to read from my NFS share. The only other thing I can think of would be to check your flannel logs and/or check whether anything is odd with your IPs, or whether the IP spaces intersect in inconvenient ways (a former employer of mine used the same IP range Docker uses by default for something, and they didn't want to change it, so I always had to tweak that in Docker). Defaults with Kubernetes would be 10.42 & 10.43.
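A crude first pass on that overlap question: compare the site's network against RKE2's defaults (10.42.0.0/16 for the cluster CIDR, 10.43.0.0/16 for services). The `host_net` value below is a hypothetical example that deliberately collides; substitute the real node network.

```shell
# Flag a network that lands inside the default 10.42/10.43 ranges.
# host_net is a made-up example chosen to collide.
host_net="10.42.5.0/24"
prefix=$(echo "$host_net" | cut -d. -f1-2)
if [ "$prefix" = "10.42" ] || [ "$prefix" = "10.43" ]; then
  overlap=yes
else
  overlap=no
fi
echo "overlap=$overlap"
```

This only catches the obvious /16 collision; anything subtler (routes, VPN ranges) needs `ip route` on the nodes.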
f
We set SELinux to permissive mode to try that and it didn't work unfortunately. The Flannel logs I have not checked. I will get on that. I will also ask them if they use those networks elsewhere. I know they have a Diamanti cluster but I'm not sure what ranges that uses.
Thanks for the tips Bill!
k
do you have an example of what you're trying to do from one node to another, and the nodes themselves can communicate, just not the pods?
f
3 node system, 1 is control-plane, 2 and 3 are agents. Ran busybox on node3. I can't ping pods on node1 or node2. Same if I were to spawn the busybox on 1 or 2. They can't ping the other pods. We noticed this when we couldn't get dns from coredns pods. We can only reach coredns if the pod was on the same server as the busybox.
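For reference, a minimal version of that cross-node test might look like the sketch below. The pod name, busybox tag, label selector, and the target pod IP are all illustrative, not taken from the conversation; the `kubectl` guard just lets the snippet no-op off-cluster.

```shell
# Hypothetical repro: run a busybox pod, find a coredns pod on a DIFFERENT
# node, and ping it by pod IP. Names/IPs here are illustrative only.
ns=default
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$ns" run pingtest --image=busybox:1.36 --restart=Never -- sleep 3600
  # list coredns pods with their IPs and nodes (coredns typically carries
  # the k8s-app=kube-dns label):
  kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
  # substitute a coredns pod IP that lives on a different node:
  kubectl -n "$ns" exec pingtest -- ping -c 3 10.42.0.10
fi
```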
k
i'm assuming there's no network policies in place, are you trying to ping across namespaces
f
yeah. just a stock RKE2 install. No policies are created yet.
coredns is in kube-system. We ran busybox in default.
we couldn't do an nslookup either connecting to the service
k
can you ping the coredns pod by pod ip
f
nope
only if it's on the same physical server
Nothing interesting found in the flannel logs
k
sorry I can't think of much else, maybe something went weird with the install since ip_forwarding wasn't enabled initially
f
No worries, I'm out of options to look at as well. I greatly appreciate the tips you gave me, Eric! I'll try tackling this more tomorrow.
๐Ÿ‘ 1
k
maybe some other 3rd-party software, if it's "more secure"; hard to say without knowing more about the image
f
Yeah, I've been trying to knock those out as well. They had VMware Carbon Black (not enforcing), Dynatrace (in infra-only mode), and Puppet (not enforcing anymore).
Puppet was initially messing with ip_forwarding
Thought we had it fixed. But nope!
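To re-verify after a Puppet run, a minimal check (standard Linux paths, nothing Puppet-specific) is to read the live value and then look for any persisted fragment that would flip it back on the next sysctl pass:

```shell
# Live value: 1 means forwarding is enabled, which RKE2 needs.
ipf=$(cat /proc/sys/net/ipv4/ip_forward)
echo "net.ipv4.ip_forward=$ipf"
# Any persisted setting that wins on the next apply?
grep -rn "ip_forward" /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null \
  || echo "no explicit setting found"
```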
k
ah, in my experience i'd just re-verify they are truly disabled, a good restart helps sometimes too
f
I think we restarted a few times today and reverified. lol
😂 1
So. Much. Fun.
Anyhoot. Have a great day Eric. I'm logging off.
r
At this point I think the next thing I'd try is to sniff outgoing and incoming traffic on two nodes you're trying to talk between. One benefit of canal (being calico + flannel) is that, if I recall correctly, they're both fairly debuggable externally. I seem to recall you could join another node to Calico with a BGP client, and with flannel the `route` command would show the IP ranges going to the flannel network interfaces for each other node (which might not be there if you didn't add the config for NetworkManager that's in some of the docs but not all of them).
๐Ÿ‘ 1
f
That's going to be our next step. Hopefully today.