# rke2
a
k
are you using a DISA STIG by any chance
f
I can ask them. I notice they were adhering to some CIS benchmarks. Not sure if they have delved further into STIG items.
k
i mean for the servers that are more secure
f
Yeah I'm talking about those. I will check with them.
Is there a particular STIG item that comes to mind?
k
regardless, start with checking the fapolicyd and sysctl settings again. you will need /etc/fapolicyd/rules.d/80-rke2.rules:
```
allow perm=any all : dir=/var/lib/rancher/
allow perm=any all : dir=/opt/cni/
allow perm=any all : dir=/run/k3s/
allow perm=any all : dir=/var/lib/kubelet/
```
```
cp -f /usr/share/rke2/rke2-cis-sysctl.conf /etc/sysctl.d/60-rke2-cis.conf
```
and then
```
systemctl restart systemd-sysctl
```
and if you are using a DISA STIG'd image, check in the /etc/sysctl.d/ folder and make sure that one is the last applied
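A quick way to sanity-check that "last applied" point: systemd-sysctl reads `*.conf` fragments in lexicographic order, and the last file to set a key wins. The sketch below simulates this with made-up file names in a scratch directory; on a real box you'd inspect `/etc/sysctl.d/` the same way.

```shell
# Simulate a STIG fragment vs. the RKE2 CIS fragment (file names are
# hypothetical). systemd-sysctl applies fragments in sorted order, so
# the last file to set a key determines the live value.
dir=$(mktemp -d)
echo "net.ipv4.ip_forward = 0" > "$dir/50-stig.conf"
echo "net.ipv4.ip_forward = 1" > "$dir/60-rke2-cis.conf"
winner=$(ls "$dir" | sort | tail -n 1)   # the fragment applied last
echo "$winner takes effect:"
grep ip_forward "$dir/$winner"
rm -rf "$dir"
```

On the actual host, `ls /etc/sysctl.d/ | sort` shows the apply order, and grepping for the contested key across `/etc/sysctl.d/` and `/usr/lib/sysctl.d/` shows who sets it.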
f
Yeah I've tried to compare the full output of sysctl -a from a working cluster to this non-working one. I will check on the fapolicyd. Thanks much!
๐Ÿ‘ 1
r
I got the behavior you're describing when I had firewalld turned on with CentOS 7. I've heard people say they've gotten RKE2 working with ufw on, but firewalld supposedly is explicitly incompatible (even though they both use the kernel's netfilter under the hood).
There's also an RPM you have to install if you have SELinux set to enforcing. You should be able to find instructions on that on the RKE2 docs page.
f
They don't have fapolicyd running. We also turn off firewalld and inject that SELinux RPM.
r
Might you have something else odd with SELinux? For example, when I set up some KVM VMs I had to set an SELinux boolean to allow them to read from my NFS share. The only other thing I can think of would be to check your flannel logs and/or check whether anything is odd with your IPs, or whether the IP spaces intersect in inconvenient ways (a former employer of mine used the same IP range Docker uses by default for something, and they didn't want to change it, so I always had to tweak that in Docker). Defaults with Kubernetes would be 10.42 & 10.43.
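A crude first pass on that overlap question: compare the site's network against RKE2's defaults (10.42.0.0/16 for the cluster CIDR, 10.43.0.0/16 for services). The `host_net` value below is a hypothetical example that deliberately collides; substitute the real node network.

```shell
# Flag a network that lands inside the default 10.42/10.43 ranges.
# host_net is a made-up example chosen to collide.
host_net="10.42.5.0/24"
prefix=$(echo "$host_net" | cut -d. -f1-2)
if [ "$prefix" = "10.42" ] || [ "$prefix" = "10.43" ]; then
  overlap=yes
else
  overlap=no
fi
echo "overlap=$overlap"
```

This only catches the obvious /16 collision; anything subtler (routes, VPN ranges) needs `ip route` on the nodes.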
f
We set SELinux to permissive mode to try that and it didn't work unfortunately. The Flannel logs I have not checked. I will get on that. I will also ask them if they use those networks elsewhere. I know they have a Diamanti cluster but I'm not sure what ranges that uses.
Thanks for the tips Bill!
k
do you have an example of what you're trying to do from one node to another, and the nodes themselves can communicate, just not the pods?
f
3 node system, 1 is control-plane, 2 and 3 are agents. Ran busybox on node3. I can't ping pods on node1 or node2. Same if I were to spawn the busybox on 1 or 2. They can't ping the other pods. We noticed this when we couldn't get dns from coredns pods. We can only reach coredns if the pod was on the same server as the busybox.
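For reference, a minimal version of that cross-node test might look like the sketch below. The pod name, busybox tag, label selector, and the target pod IP are all illustrative, not taken from the conversation; the `kubectl` guard just lets the snippet no-op off-cluster.

```shell
# Hypothetical repro: run a busybox pod, find a coredns pod on a DIFFERENT
# node, and ping it by pod IP. Names/IPs here are illustrative only.
ns=default
if command -v kubectl >/dev/null 2>&1; then
  kubectl -n "$ns" run pingtest --image=busybox:1.36 --restart=Never -- sleep 3600
  # list coredns pods with their IPs and nodes (coredns typically carries
  # the k8s-app=kube-dns label):
  kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
  # substitute a coredns pod IP that lives on a different node:
  kubectl -n "$ns" exec pingtest -- ping -c 3 10.42.0.10
fi
```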
k
i'm assuming there's no network policies in place, are you trying to ping across namespaces
f
yeah. just a stock RKE2 install. No policies are created yet.
coredns is in kube-system. We ran busybox in default.
we couldn't do an nslookup either connecting to the service
k
can you ping the coredns pod by pod ip
f
nope
only if it's on the same physical server
Nothing interesting found in the flannel logs
k
sorry I can't think of much else, maybe something went weird with the install since ip_forwarding wasn't enabled initially
f
No worries, I'm out of options to look at as well. I greatly appreciate the tips you gave me, Eric! I'll try tackling this more tomorrow.
๐Ÿ‘ 1
k
maybe some other 3rd-party software, if it's "more secure"; hard to say without knowing more about the image
f
Yeah, I've been trying to knock those out as well. They had VMware Carbon Black (not enforcing), Dynatrace (in infra-only mode), and Puppet (not enforcing anymore).
Puppet was initially messing with ip_forwarding
Thought we had it fixed. But nope!
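To re-verify after a Puppet run, a minimal check (standard Linux paths, nothing Puppet-specific) is to read the live value and then look for any persisted fragment that would flip it back on the next sysctl pass:

```shell
# Live value: 1 means forwarding is enabled, which RKE2 needs.
ipf=$(cat /proc/sys/net/ipv4/ip_forward)
echo "net.ipv4.ip_forward=$ipf"
# Any persisted setting that wins on the next apply?
grep -rn "ip_forward" /etc/sysctl.conf /etc/sysctl.d/ 2>/dev/null \
  || echo "no explicit setting found"
```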
k
ah, in my experience i'd just re-verify they are truly disabled, a good restart helps sometimes too
f
I think we restarted a few times today and reverified. lol
😂 1
So. Much. Fun.
Anyhoot. Have a great day Eric. I'm logging off.
r
At this point I think the next thing I'd try is to sniff outgoing and incoming traffic on two nodes you're trying to talk between. One benefit of canal (being calico + flannel) is that, if I recall correctly, they're both fairly debuggable externally. I seem to recall you could join another node to Calico with a BGP client, and with flannel the `route` command would show the IP ranges going to the flannel network interfaces for each other node (which might not be there if you didn't add the config for NetworkManager that's in some of the docs but not all of them).
๐Ÿ‘ 1
f
That's going to be our next step. Hopefully today.