thankful-balloon-877

12/05/2022, 4:16 PM
Hi, I am installing Rancher with RKE2 on SLE Micro using the RPM package. I did this twice before, and it always worked great: installing the package, running
systemctl start rke2-server
, and waiting for it to come up. This time, the service will not come up - in the kubelet.log file I find several entries of this:
E1205 13:53:15.683661    2726 remote_runtime.go:209] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown"
E1205 13:53:15.683713    2726 kuberuntime_sandbox.go:70] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown" pod="kube-system/etcd-rancher-har-nue-01"
E1205 13:53:15.683746    2726 kuberuntime_manager.go:833] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown" pod="kube-system/etcd-rancher-har-nue-01"
E1205 13:53:15.683802    2726 pod_workers.go:951] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"etcd-rancher-har-nue-01_kube-system(e18aa5e5b83a5a3c56d78e4054612394)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"etcd-rancher-har-nue-01_kube-system(e18aa5e5b83a5a3c56d78e4054612394)\\\": rpc error: code = Unknown desc = failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown\"" pod="kube-system/etcd-rancher-har-nue-01" podUID=e18aa5e5b83a5a3c56d78e4054612394
E1205 13:53:15.723238    2726 kubelet.go:2466] "Error getting node" err="node \"rancher-har-nue-01\" not found"
Am I right in thinking that this is my issue? If yes, any ideas what is happening here and where that "invalid argument: unknown" could come from?

creamy-pencil-82913

12/05/2022, 5:53 PM
Can you read through the comments at https://github.com/rancher/rke2/issues/851 and see if anything here matches what you’ve done or are experiencing? Sounds very similar.

thankful-balloon-877

12/05/2022, 6:10 PM
Thanks Brandon, I actually found that issue, but figured it does not really match my use case, because I do not have SELinux in enforcing mode and I do not use any non-standard network plugins either.

creamy-pencil-82913

12/05/2022, 6:26 PM
Are you using your own containerd?

thankful-balloon-877

12/05/2022, 6:26 PM
Nothing of the sort, it's a stock installation of SLE Micro.
This is the script I put together from the previous attempts, hence I am a bit confused about what's different this time 🙂 https://w3.nue.suse.com/~gpfuetzenreuter/init-rke-node.sh.txt

creamy-pencil-82913

12/05/2022, 7:00 PM
You might look at the containerd log file and see if there’s anything else in there that might suggest what’s gone wrong.

thankful-balloon-877

12/05/2022, 7:16 PM
is it possible I don't even have containerd?
rancher-har-nue-01:~ # ls /var/log/containers/ 
rancher-har-nue-01:~ # rpm -qa|grep container 
container-selinux-2.188.0-150400.1.8.noarch

creamy-pencil-82913

12/05/2022, 7:25 PM
The containerd log, not the container logs
/var/lib/rancher/rke2/agent/containerd/containerd.log
so you do have selinux stuff on here but it’s not in enforcing mode?
what mode is it in?

thankful-balloon-877

12/05/2022, 7:28 PM
oh, sorry, I forgot containerd was shipped together with rke2. containerd.log shows some of this:
time="2022-12-05T19:26:40.803544362Z" level=warning msg="cleanup warnings time=\"2022-12-05T19:26:40Z\" level=info msg=\"starting signal loop\" namespace=k8s.io pid=17802\ntime=\"2022-12-05T19:26:40Z\" level=warning msg=\"failed to read init pid file\" error=\"open /run/k3s/containerd/io.containerd.runtime.v2.task/k8s.io/3e36f5b6a0971ee3b62b5597f9c9931d8e58edc45b9a55ecf509272f6bb5a1a2/init.pid: no such file or directory\"\n"
time="2022-12-05T19:26:40.803953105Z" level=error msg="copy shim log" error="read /proc/self/fd/21: file already closed"
time="2022-12-05T19:26:40.818880484Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:etcd-rancher-har-nue-01,Uid:e18aa5e5b83a5a3c56d78e4054612394,Namespace:kube-system,Attempt:0,} failed, error" error="failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: write /proc/self/attr/keycreate: invalid argument: unknown"
If you want I can upload the full file, but it seems to show the same error as the kubelet log.
SELinux shows as permissive; I have not configured anything related to it.
rancher-har-nue-01:~ # getenforce 
Permissive

creamy-pencil-82913

12/05/2022, 7:29 PM
what versions of SLE Micro and RKE2 are you using?

thankful-balloon-877

12/05/2022, 7:29 PM
rancher-har-nue-01:~ # rpm -qa|grep rke 
rke2-selinux-0.11-1.sle.noarch 
rke2-common-1.23.14~rke2r1-0.x86_64 
rke2-server-1.23.14~rke2r1-0.x86_64 
rancher-har-nue-01:~ # grep PRETTY /etc/os-release 
PRETTY_NAME="SUSE Linux Enterprise Micro 5.3"

creamy-pencil-82913

12/05/2022, 7:32 PM
I think it is related to weird selinux stuff. Either make sure you have all the selinux-related packages installed (latest rke2-selinux) and rke2 started with
selinux: true
or remove the other selinux bits.
https://github.com/containerd/containerd/issues/5864#issuecomment-898625687 suggests that it is related to using selinux contexts that don’t exist, which would make sense if you were missing some selinux bits. Permissive mode doesn’t block anything, but you still need to set up the contexts properly.
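For reference, a minimal sketch of how the `selinux: true` option could be added. It assumes the default RKE2 config location `/etc/rancher/rke2/config.yaml`, which is not stated in this thread; the function name `enable_selinux_option` is made up for illustration.

```shell
# Assumption: RKE2 reads its config from /etc/rancher/rke2/config.yaml by default.
# Override CONFIG to point somewhere else (for example, a scratch file).
CONFIG="${CONFIG:-/etc/rancher/rke2/config.yaml}"

# Hypothetical helper: append "selinux: true" only if no selinux key exists yet,
# so running it twice does not duplicate the line.
enable_selinux_option() {
    mkdir -p "$(dirname "$CONFIG")"
    touch "$CONFIG"
    if ! grep -q '^selinux:' "$CONFIG"; then
        echo 'selinux: true' >> "$CONFIG"
    fi
}
```

After calling `enable_selinux_option`, the rke2-server service would need a restart to pick up the change. As the rest of the thread shows, the option alone does not help if the SELinux contexts themselves are missing.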

thankful-balloon-877

12/05/2022, 7:54 PM
interesting. I now tried it with
selinux: true
added to config.yaml, but that seems to get stuck with the same loop. what I notice is that on one of my existing installations (same setup just slightly older versions)
ls -RZ /var/lib/rancher/|grep container_var_lib
....
system_u:object_r:container_var_lib_t:s0 rke2 
    system_u:object_r:container_var_lib_t:s0 agent 
    system_u:object_r:container_var_lib_t:s0 bin 
    system_u:object_r:container_var_lib_t:s0 server
....
whereas on the new one
ls -RZ /var/lib/rancher/|grep container_var_lib
<empty>
rancher-har-nue-01:~ # restorecon -Rvn /var/lib/rancher/|grep container_var_lib
<empty>
the old/working one has these versions
rancher-prv-01:~ # rpm -qa|egrep 'selinux|rke' 
selinux-policy-targeted-20210716-150400.2.3.noarch 
patterns-microos-selinux-5.3.3-150400.1.1.x86_64 
libselinux1-3.1-150400.1.69.x86_64 
selinux-policy-20210716-150400.2.3.noarch 
container-selinux-2.188.0-150400.1.2.noarch 
selinux-tools-3.1-150400.1.69.x86_64 
rke2-selinux-0.9-1.sle.noarch 
rke2-common-1.23.9~rke2r1-0.x86_64 
rke2-server-1.23.9~rke2r1-0.x86_64

creamy-pencil-82913

12/05/2022, 7:57 PM
did you install the rke2-selinux package that I mentioned above?

thankful-balloon-877

12/05/2022, 7:57 PM
yes, it is installed

creamy-pencil-82913

12/05/2022, 7:57 PM
ah but no container-selinux
that should be a dependency for the rke2-selinux package
did you get any errors when installing it?
you appear to be missing a bunch of selinux related stuff, compared to your working node at least

thankful-balloon-877

12/05/2022, 7:58 PM
it is.
rancher-har-nue-01:~ # rpm -q container-selinux 
container-selinux-2.188.0-150400.1.8.noarch 
rancher-har-nue-01:~ # rpm -q rke2-selinux 
rke2-selinux-0.11-1.sle.noarch
rancher-har-nue-01:~ # rpm -q --requires rke2-selinux |grep container 
container-selinux >= 2.164.2-1.1

creamy-pencil-82913

12/05/2022, 7:58 PM
ah ok. I didn’t see it in the package list you posted above

thankful-balloon-877

12/05/2022, 7:58 PM
I think I confused you by using different grep patterns, sorry 😄

creamy-pencil-82913

12/05/2022, 7:58 PM
yeah
I would probably just compare packages between the two, see if there’s anything else you’re missing. It is very odd that you are not getting any container contexts set on the binaries
what contexts DO the rke2 binaries have?
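One way to compare the package sets without being misled by different grep patterns is to diff the package names only. This is a sketch, not something run in this thread; the file names `old-node.txt` and `new-node.txt` and the function name `diff_package_lists` are hypothetical.

```shell
# On each node, dump the installed package names (without versions), e.g.:
#   rpm -qa --qf '%{NAME}\n' > old-node.txt    # on the working node
#   rpm -qa --qf '%{NAME}\n' > new-node.txt    # on the failing node

# Hypothetical helper: print names that appear in only one of the two lists.
# Unindented lines exist only in the first file, tab-indented lines only in
# the second.
diff_package_lists() {
    a_sorted=$(mktemp)
    b_sorted=$(mktemp)
    sort -u "$1" > "$a_sorted"
    sort -u "$2" > "$b_sorted"
    comm -3 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}
```

Calling `diff_package_lists old-node.txt new-node.txt` prints nothing when both nodes carry the same set of package names, which leaves version differences as the next thing to look at.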

thankful-balloon-877

12/05/2022, 8:01 PM
The package sets seem to be the same, just the versions are newer, but I guess others are using the same packages too. On the binary there is indeed another oddity:
old:
rancher-prv-01:~ # ls -Z /usr/bin/rke2 
system_u:object_r:container_runtime_exec_t:s0 /usr/bin/rke2
new:
rancher-har-nue-01:~ # ls -Z /usr/bin/rke2 
system_u:object_r:bin_t:s0 /usr/bin/rke2
and
restorecon -Rvn /usr/bin/rke2
doesn't return anything 😕
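The two outputs above differ only in the third colon-separated field of the context, the SELinux type (`container_runtime_exec_t` on the old node, `bin_t` on the new one). A small sketch for pulling that field out so the two nodes can be compared by script; the function names are made up for illustration.

```shell
# Hypothetical helper: extract the type (third colon-separated field) from an
# SELinux context string such as "system_u:object_r:bin_t:s0".
selinux_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# Hypothetical helper: context of a path as reported by "ls -Z" (first column).
# On a system without SELinux labels, ls prints "?" instead of a context.
file_context() {
    ls -Zd "$1" | awk '{print $1}'
}
```

For example, `selinux_type "$(file_context /usr/bin/rke2)"` would be expected to print `container_runtime_exec_t` on the working node and `bin_t` on the failing one, going by the output pasted above.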
heh, I tried to hack around it by copying the contexts in /etc/selinux from the old to the new one - I forgot, this is SLE Micro, /usr/bin is read-only
rancher-har-nue-01:~ # kubectl get node 
NAME                STATUS  ROLES                      AGE   VERSION 
rancher-har-nue-01  Ready   control-plane,etcd,master  118s  v1.23.14+rke2r1
Some combination of copying /etc/selinux from the other setup, installing the libselinux and container-selinux devel packages from security:SELinux, ignoring some relabel errors during boot, copying the rke2 binaries to /usr/local/bin, and repeated restorecon runs on the latter and on /var/lib/rancher (it would keep resetting) made it come up now. Hm, I would rather not keep it so MacGyvered. I wonder if something in the packages changed between versions so that the SELinux container policies no longer install correctly by themselves.