# general
c
Can you read through the comments at https://github.com/rancher/rke2/issues/851 and see if anything here matches what you’ve done or are experiencing? Sounds very similar.
t
Thanks Brandon, I actually found that issue, but figured it does not really match my use case, because I do not have SELinux in enforcing mode, nor do I use any non-standard network plugins.
c
Are you using your own containerd?
t
Nothing of the sort, it's a stock installation of SLE Micro.
This is what I put together myself from the previous attempts, hence I am a bit confused about what's different this time 🙂 https://w3.nue.suse.com/~gpfuetzenreuter/init-rke-node.sh.txt
c
you might look at the containerd log file and see if there's anything else in there that might suggest what's gone wrong
t
is it possible I don't even have containerd?
```
rancher-har-nue-01:~ # ls /var/log/containers/
rancher-har-nue-01:~ # rpm -qa|grep container
container-selinux-2.188.0-150400.1.8.noarch
```
c
The containerd log, not the container logs
/var/lib/rancher/rke2/agent/containerd/containerd.log
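A minimal sketch for pulling only the error-level lines out of that log, assuming the default RKE2 path above. The function name `containerd_errors` is hypothetical, made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show the most recent error-level lines from the
# containerd log that RKE2 writes. Defaults to the RKE2 path above;
# pass another file as $1 and a line count as $2 to override.
containerd_errors() {
  local log="${1:-/var/lib/rancher/rke2/agent/containerd/containerd.log}"
  local n="${2:-20}"   # how many matching lines to show
  # containerd writes key=value log lines, so errors carry "level=error"
  grep -F 'level=error' "$log" | tail -n "$n"
}
```

On a node, `containerd_errors | less -S` keeps long lines unwrapped, which helps with the very long RunPodSandbox errors.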
so you do have selinux stuff on here but it’s not in enforcing mode?
what mode is it in?
t
oh, sorry, I forgot containerd was shipped together with rke2. containerd.log shows some of this:
```
time="2022-12-05T19:26:40.803544362Z" level=warning msg="cleanup warnings time=\"2022-12-05T19:26:40Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=17802\ntime=\"2022-12-05T19:26:40Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3e36f5b6a0971ee3b62b5597f9c9931d8e58edc45b9a55ecf509272f6bb5a1a2/init.pid: no such file or directory\"\n"
time="2022-12-05T19:26:40.803953105Z" level=error msg="copy shim log" error="read /proc/self/fd/21: file already closed"
time="2022-12-05T19:26:40.818880484Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-rancher-har-nue-01,Uid:e18aa5e5b83a5a3c56d78e4054612394,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown"
```
If you want, I can upload the full file, but it seems similar to the kubelet log?
SELinux shows as permissive; I have not configured anything regarding it:
```
rancher-har-nue-01:~ # getenforce
Permissive
```
c
what versions of SLE Micro and RKE2 are you using?
t
```
rancher-har-nue-01:~ # rpm -qa|grep rke
rke2-selinux-0.11-1.sle.noarch
rke2-common-1.23.14~rke2r1-0.x86_64
rke2-server-1.23.14~rke2r1-0.x86_64
rancher-har-nue-01:~ # grep PRETTY /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Micro 5.3"
```
c
I think it is related to weird SELinux stuff. Either make sure you have all the SELinux-related packages installed (latest rke2-selinux) and rke2 started with `selinux: true`, or remove the other SELinux bits.
https://github.com/containerd/containerd/issues/5864#issuecomment-898625687 suggests that it is related to using selinux contexts that don’t exist, which would make sense if you were missing some selinux bits. Permissive mode doesn’t block anything, but you still need to set up the contexts properly.
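A minimal sketch of the config change being suggested, assuming the default RKE2 config path `/etc/rancher/rke2/config.yaml` and a flat `key: value` file. The helper name `ensure_rke2_selinux` is hypothetical. As noted above, the flag alone is not enough: the rke2-selinux and container-selinux policies still have to be installed so the contexts exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make sure the RKE2 config sets "selinux: true".
# Defaults to the standard RKE2 config path; pass another path as $1.
# Assumes a flat "key: value" YAML file (no nested selinux key).
ensure_rke2_selinux() {
  local cfg="${1:-/etc/rancher/rke2/config.yaml}"
  mkdir -p "$(dirname "$cfg")"
  touch "$cfg"
  if grep -qE '^selinux:' "$cfg"; then
    # Key already present: force its value to true (GNU sed in-place edit)
    sed -i -E 's/^selinux:.*/selinux: true/' "$cfg"
  else
    # Add a newline first if the file does not already end with one
    [ -s "$cfg" ] && [ -n "$(tail -c1 "$cfg")" ] && echo >> "$cfg"
    printf 'selinux: true\n' >> "$cfg"
  fi
}

# On a server node (as root), then restart so the setting is picked up:
#   ensure_rke2_selinux && systemctl restart rke2-server
```

Rewriting an existing `selinux:` line instead of always appending avoids ending up with a duplicate key in the YAML.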
t
Interesting. I have now tried it with `selinux: true` added to config.yaml, but it still gets stuck in the same loop. What I notice is this, on one of my existing installations (same setup, just slightly older versions):
```
ls -RZ /var/lib/rancher/|grep container_var_lib
....
system_u:object_r:container_var_lib_t:s0 rke2
    system_u:object_r:container_var_lib_t:s0 agent
    system_u:object_r:container_var_lib_t:s0 bin
    system_u:object_r:container_var_lib_t:s0 server
....
```
whereas on the new one:
```
ls -RZ /var/lib/rancher/|grep container_var_lib
<empty>
rancher-har-nue-01:~ # restorecon -Rvn /var/lib/rancher/|grep container_var_lib
<empty>
```
the old/working one has these versions
```
rancher-prv-01:~ # rpm -qa|egrep 'selinux|rke'
selinux-policy-targeted-20210716-150400.2.3.noarch
patterns-microos-selinux-5.3.3-150400.1.1.x86_64
libselinux1-3.1-150400.1.69.x86_64
selinux-policy-20210716-150400.2.3.noarch
container-selinux-2.188.0-150400.1.2.noarch
selinux-tools-3.1-150400.1.69.x86_64
rke2-selinux-0.9-1.sle.noarch
rke2-common-1.23.9~rke2r1-0.x86_64
rke2-server-1.23.9~rke2r1-0.x86_64
```
c
did you install the rke2-selinux package that I mentioned above?
t
yes, it is installed
c
ah but no container-selinux
that should be a dependency for the rke2-selinux package
did you get any errors when installing it?
you appear to be missing a bunch of selinux related stuff, compared to your working node at least
t
it is.
```
rancher-har-nue-01:~ # rpm -q container-selinux
container-selinux-2.188.0-150400.1.8.noarch
rancher-har-nue-01:~ # rpm -q rke2-selinux
rke2-selinux-0.11-1.sle.noarch
rancher-har-nue-01:~ # rpm -q --requires rke2-selinux |grep container
container-selinux >= 2.164.2-1.1
```
c
ah ok. I didn’t see it in the package list you posted above
t
I think I confused you by using different grep patterns, sorry 😄
c
yeah
I would probably just compare packages between the two, see if there’s anything else you’re missing. It is very odd that you are not getting any container contexts set on the binaries
what contexts DO the rke2 binaries have?
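A minimal sketch of that package comparison, looking only at package names so that version bumps don't drown out genuinely missing packages. It assumes each list was generated on its node with `rpm -qa --qf '%{NAME}\n'` and copied to one machine; the function and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare installed package *names* between two nodes.
# Each input file is assumed to come from:
#   rpm -qa --qf '%{NAME}\n' > pkgs-<node>.txt
compare_pkg_names() {
  # comm -3 hides lines common to both files; it needs sorted input.
  # Column 1 (no indent): only on the first node.
  # Column 2 (tab-indented): only on the second node.
  comm -3 <(sort -u "$1") <(sort -u "$2")
}

# Example:
#   compare_pkg_names pkgs-rancher-prv-01.txt pkgs-rancher-har-nue-01.txt
```

To compare versions as well, drop `--qf` when generating the lists so the full name-version-release.arch strings are compared.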
t
The package sets seem to be the same, just the versions are newer, but I guess others are using the same packages too. On the binary there's indeed another oddity:
```
old:
rancher-prv-01:~ # ls -Z /usr/bin/rke2
system_u:object_r:container_runtime_exec_t:s0 /usr/bin/rke2
new:
rancher-har-nue-01:~ # ls -Z /usr/bin/rke2
system_u:object_r:bin_t:s0 /usr/bin/rke2
```
and `restorecon -Rvn /usr/bin/rke2` doesn't return anything 😕
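With `-n`, `restorecon` only reports labels it would change, so empty output suggests the loaded policy itself maps `/usr/bin/rke2` to `bin_t`, i.e. the rke2/container file-context rules don't seem to be in effect. A minimal sketch for checking labels against the types seen on the working node follows; the expected types come from the outputs in this thread, not from an authoritative list, and the helper names are hypothetical. (Tools such as `matchpathcon`, if the libselinux utilities are installed, can show what the policy expects for a path.)

```shell
#!/usr/bin/env bash
# Hypothetical helpers: compare a path's SELinux type with an expected one.

# Print the type field (3rd ':'-separated field) of a context string such as
# system_u:object_r:container_runtime_exec_t:s0
selinux_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# check_label PATH EXPECTED_TYPE
check_label() {
  local ctx
  # GNU stat's %C format prints the file's SELinux security context
  ctx=$(stat -c '%C' "$1") || return 2
  if [ "$(selinux_type "$ctx")" = "$2" ]; then
    echo "OK  $1 ($ctx)"
  else
    echo "BAD $1 is $ctx, expected type $2"
    return 1
  fi
}

# Example usage on an RKE2 node (types taken from the working node above):
#   check_label /usr/bin/rke2 container_runtime_exec_t
#   check_label /var/lib/rancher/rke2 container_var_lib_t
```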
heh, I tried to hack around it by copying the contexts in /etc/selinux from the old node to the new one, but I forgot that this is SLE Micro, where /usr/bin is read-only.
```
rancher-har-nue-01:~ # kubectl get node
NAME                STATUS  ROLES                      AGE   VERSION
rancher-har-nue-01  Ready   control-plane,etcd,master  118s  v1.23.14+rke2r1
```
Some combination of copying /etc/selinux from the other setup, installing the libselinux and container-selinux devel packages from security:SELinux, ignoring some relabel errors during boot, copying the rke2 binaries to /usr/local/bin, and running restorecon repeatedly on the latter and on /var/lib/rancher (the labels kept getting reset) made it come up now. Hm, I'd rather not keep it this MacGyvered. I wonder if something in the packages changed between versions that stopped the SELinux container policies from installing correctly on their own.