# general
a
This message was deleted.
c
wait are you trying to register an existing cluster with the rancher server, or deploy an entirely new cluster?
b
new cluster
c
you’ll see that until rke2 starts up. If it’s not starting, check the rke2-server logs
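Here is a minimal bash sketch for reading those logs. It assumes RKE2 runs as the `rke2-server` systemd unit. RKE2 logs some blocking conditions at `level=info`, as the overlayfs message further down shows, so the filter matches on wording as well as on level. The function name and keyword list are hypothetical, not an official set.

```shell
# Print only rke2 log lines that look like problems. Matching is
# case-insensitive and also catches info-level lines that describe
# failures (e.g. "server is not ready", "failed to mount overlay").
# --line-buffered keeps output flowing when following a live journal.
# The keyword list is a guess; extend it as needed.
rke2_problem_lines() {
  grep --line-buffered -iE 'level=(error|fatal)|fail|cannot|not ready|timeout|deadline exceeded'
}
```

Usage: `journalctl -u rke2-server --no-pager | rke2_problem_lines`, or `journalctl -u rke2-server -f | rke2_problem_lines` to follow it live.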
b
ok, all log entries for rke2 are at info level, so it is hard for me to tell which of the errors is causing this issue, since I am not familiar with the details of the installation process. I can see this:
Waiting to retrieve agent configuration; server is not ready: \"overlayfs\" snapshotter cannot be enabled for \"/var/lib/rancher/rke2/agent/containerd\", try using \"fuse-overlayfs\" or \"native\": failed to mount overlay: invalid argument"
and
Failed to test data store connection: context deadline exceeded
c
yeah that would do it
What OS is this, and what is the backing filesystem for /var/lib/rancher/rke2? Why is overlayfs broken?
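One way to answer the backing-filesystem question from a shell is this small bash sketch. `stat -f -c %T` prints the type of the filesystem that contains a path, for example `tmpfs` or `xfs`. The helper names are hypothetical. Treating only `tmpfs` and `ramfs` as RAM-backed is an assumption made for this diskless case.

```shell
# Print the filesystem type that holds a path. Defaults to the RKE2
# data directory; pass /var/lib if that directory does not exist yet.
fs_type_of() {
  stat -f -c %T "${1:-/var/lib/rancher/rke2}"
}

# Succeed if a filesystem type name means the data lives only in RAM.
is_memory_backed() {
  case "$1" in
    tmpfs|ramfs) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `fs=$(fs_type_of /var/lib) && is_memory_backed "$fs" && echo "/var/lib is on $fs (RAM only)"`.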
b
it is SUSE Enterprise
c
which version
b
SUSE Linux Enterprise Server 15 SP3
15.3
c
that should work, we test our releases on that. Is there anything else going on with this node? Has it been hardened, has a shared filesystem at /var/lib, anything like that?
b
I can see certs under this directory
/var/lib/rancher/rke2/server/tls/
c
yes it is complaining about the overlayfs for containerd, not the certs
b
ah
c
but you won’t get all the certs created until the pods can run, and that requires that containerd be functional
b
this is a diskless machine
fs is in ram
c
and that is what the agent is complaining about
b
not sure, this may be the issue
c
that is not going to work…
you need actual disk, not just a tmpfs
the cluster requires storing data to a persistent filesystem, I’m not sure how a diskless ephemeral node would even work?
b
yeah, well, I am just testing things, not really hoping to have a production system or anything
ok, thank you I will see if I can use another machine or maybe mount a block device from the network
c
I would probably test on a more traditional install. like, one with a disk.
b
I have mounted a Ceph RBD on the node; how can I clean up the previous install?
c
run the uninstall script?
for both rke2 and the agent
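Here is a sketch of that cleanup in bash. It assumes both uninstall scripts sit under `/usr/local/bin`. That matches `SA_INSTALL_PREFIX=/usr/local` and `INSTALL_RKE2_TAR_PREFIX=/usr/local` in the agent log further down. An RPM-based RKE2 install would place `rke2-uninstall.sh` elsewhere. The function name and the agent-first order are choices made for this example, not a documented procedure.

```shell
# Run the rancher-system-agent and RKE2 uninstall scripts, if present.
# Assumption: both live under $prefix/bin (default /usr/local).
run_uninstallers() {
  local prefix="${1:-/usr/local}"
  local script
  # Remove the agent first so it cannot re-apply the RKE2 plan while
  # RKE2 is being removed (a precaution, not a documented requirement).
  for script in rancher-system-agent-uninstall.sh rke2-uninstall.sh; do
    if [ -x "$prefix/bin/$script" ]; then
      echo "running $prefix/bin/$script"
      "$prefix/bin/$script" || return 1
    else
      echo "skipping $script (not found under $prefix/bin)"
    fi
  done
}
```

Usage (as root): `run_uninstallers`, or `run_uninstallers /usr` for a different install prefix.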
b
ok, got a block device on the host, still giving the same error
-- Logs begin at Wed 2023-01-11 21:00:16 CET, end at Wed 2023-01-11 21:28:03 CET. --
Jan 11 21:22:18 nid003205 systemd[1]: Started Rancher System Agent.
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Rancher System Agent version v0.2.13 (4fa9427) is starting"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Using directory /var/lib/rancher/agent/work for work"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Starting remote watch of plans"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Starting /v1, Kind=Secret controller"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Detected first start, force-applying one-time instruction set"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="[Applyinator] Applying one-time instructions for plan with checksum 1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="[Applyinator] Extracting image rancher/system-agent-installer-rke2:v1.24.8-rke2r1 to directory /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Using private registry config file at /etc/rancher/agent/registries.yaml"
Jan 11 21:22:18 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:18+01:00" level=info msg="Pulling image index.docker.io/rancher/system-agent-installer-rke2:v1.24.8-rke2r1"
Jan 11 21:22:19 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:19+01:00" level=info msg="Extracting file installer.sh to /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/installer.sh"
Jan 11 21:22:19 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:19+01:00" level=info msg="Extracting file rke2.linux-amd64.tar.gz to /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/rke2.linux-amd64.tar.gz"
Jan 11 21:22:20 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:20+01:00" level=info msg="Extracting file sha256sum-amd64.txt to /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/sha256sum-amd64.txt"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="Extracting file run.sh to /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/run.sh"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[Applyinator] Running command: sh [-c run.sh]"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + SA_INSTALL_PREFIX=/usr/local"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + mkdir -p /var/lib/rancher/rke2"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + SAI_FILE_DIR=/var/lib/rancher/rke2/system-agent-installer"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RESTART_STAMP_FILE=/var/lib/rancher/rke2/system-agent-installer/rke2_restart_stamp"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RKE2_SA_ENV_FILE_NAME=rke2-sa.env"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' '!' -d /var/lib/rancher/rke2/system-agent-installer ']'"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + mkdir -p /var/lib/rancher/rke2/system-agent-installer"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + check_target_mountpoint"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + mountpoint -q ''"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + check_target_ro"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + touch /usr/local/.rke2-ro-test"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + rm -rf /usr/local/.rke2-ro-test"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + test 0 -ne 0"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + SYSTEMD_BASE_PATH=/usr/local/lib/systemd/system"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RKE2_SA_ENV_FILE_PATH=/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RKE2_SA_ENV_SRV_REF=EnvironmentFile=-/var/lib/rancher/rke2/system-agent-installer/rke2-sa.env
"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' -f /var/lib/rancher/rke2/system-agent-installer/rke2_restart_stamp ']'"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' -n 14e342151591d92e3a3c812770b163f1bf24bfc0eec9e8099b20e59979b84725 ']'"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' '' '!=' 14e342151591d92e3a3c812770b163f1bf24bfc0eec9e8099b20e59979b84725 ']'"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RESTART=true"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + env INSTALL_RKE2_ARTIFACT_PATH=/var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0 INSTALL_RKE2_TAR_PREFIX=/usr/local installer.sh"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stdout]: [INFO]  staging local checksums from /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/sha256sum-amd64.txt"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stdout]: [INFO]  staging tarball from /var/lib/rancher/agent/work/20230111-212218/1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0/rke2.linux-amd64.tar.gz"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stdout]: [INFO]  verifying tarball"
Jan 11 21:22:21 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:21+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stdout]: [INFO]  unpacking tarball file to /usr/local"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' -f /var/lib/rancher/rke2/system-agent-installer/rke2-sa.env ']'"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + OLD_ENV_FILE_PATH_HASH=e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + install -m 600 /dev/null /var/lib/rancher/rke2/system-agent-installer/rke2-sa.env"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: ++ env"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: ++ grep '^RKE2_'"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: ++ true"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + RKE2_ENV="
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: + '[' -n '' ']'"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: ++ env"
Jan 11 21:22:22 nid003205 rancher-system-agent[35560]: time="2023-01-11T21:22:22+01:00" level=info msg="[1f46a0b078bf7b84542151fcf4267b598c701ff891f92444bcef90f7fbb47a7f_0:stderr]: ++ grep -Ei '^(NO|HTTP|HTTPS)_PROXY'"
c
where did you mount the block device?
b
the block device is mounted at /var/lib/rancher
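The error names `/var/lib/rancher/rke2/agent/containerd`. One way to check whether the new mount accepts overlay mounts at all is to try one by hand. This bash sketch must run as root. It only approximates what containerd's overlayfs snapshotter checks. The function name and default path are assumptions.

```shell
# Try a throwaway overlay mount on the filesystem under $1.
# Returns 0 if the mount works, 1 if mount (or umount) fails,
# 2 if the scratch directories cannot be created. Needs root.
overlay_probe() {
  local base="${1:-/var/lib/rancher/rke2/agent}"
  local dir rc=0
  dir=$(mktemp -d "$base/overlay-probe.XXXXXX") || return 2
  mkdir "$dir/lower" "$dir/upper" "$dir/work" "$dir/merged" || { rm -rf "$dir"; return 2; }
  if mount -t overlay overlay \
      -o "lowerdir=$dir/lower,upperdir=$dir/upper,workdir=$dir/work" \
      "$dir/merged"; then
    # Unmount before deleting so rm never descends into a live mount.
    umount "$dir/merged" || return 1
  else
    rc=1
  fi
  rm -rf "$dir"
  return "$rc"
}
```

Usage: `overlay_probe /var/lib/rancher && echo "overlay OK"`. If it fails, `dmesg | tail` usually shows the kernel's reason for rejecting the overlay mount.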
c
I suspect that this is probably going to be a lot more trouble than it's worth. Does your environment not offer nodes with traditional block volume storage for the root filesystem?
stop looking at the rancher-system-agent logs, that’s not going to have anything interesting. everything you want is in the rke2-server logs
b
with k3s it works...
c
once that comes up the agent will be happy. the agent logs will not tell you anything different
b
ok, thank you for the tip, I am checking the rke2-server logs
I can see this
Jan 11 21:23:12 nid003205 rke2[35719]: time="2023-01-11T21:23:12+01:00" level=error msg="error syncing 'kube-system/rke2-metrics-server': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"rke2-metrics-server\" not found, requeuing"
and
Jan 11 21:23:11 nid003205 rke2[35719]: time="2023-01-11T21:23:11+01:00" level=error msg="error syncing 'kube-system/rke2-ingress-nginx': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"rke2-ingress-nginx\" not found, requeuing"
and
Jan 11 21:23:10 nid003205 rke2[35719]: time="2023-01-11T21:23:10+01:00" level=error msg="error syncing 'kube-system/rke2-coredns': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"rke2-coredns\" not found, requeuing"
also
Jan 11 21:23:09 nid003205 rke2[35719]: time="2023-01-11T21:23:09+01:00" level=error msg="error syncing 'kube-system/rke2-calico': handler helm-controller-chart-registration: helmcharts.helm.cattle.io \"rke2-calico\" not found, requeuing"
c
that's all normal
it sounds like it is coming up though?
what do you get from
kubectl get pod -A
b
-bash: kubectl: command not found
c
you’ll need to put the rke2 binaries in your path
export PATH=/var/lib/rancher/rke2/bin/:$PATH
the agent has handled the first 3 steps for you
b
nid003205:~ # kubectl get pod -A
The connection to the server localhost:8080 was refused - did you specify the right host or port?
c
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
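The two exports above can be wrapped in a small bash function so they are easy to re-run in a new shell. The paths are the ones given in this thread. The function name is hypothetical. The function skips the PATH change if the directory is already listed.

```shell
# Make RKE2's bundled kubectl and the node's admin kubeconfig usable
# in the current shell. Safe to call more than once.
rke2_kubectl_env() {
  case ":$PATH:" in
    *:/var/lib/rancher/rke2/bin:*|*:/var/lib/rancher/rke2/bin/:*) ;;
    *) export PATH="/var/lib/rancher/rke2/bin:$PATH" ;;
  esac
  export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
}
```

Usage: `rke2_kubectl_env && kubectl get pod -A`.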
b
nid003205:~ # kubectl get pod -A
NAMESPACE         NAME                                                    READY   STATUS             RESTARTS       AGE
calico-system     calico-kube-controllers-75979f6c8f-s4xp8                1/1     Running            0              33m
calico-system     calico-node-jmjkb                                       0/1     Running            0              33m
calico-system     calico-typha-9fbfb8cb6-bbht5                            1/1     Running            0              33m
cattle-system     cattle-cluster-agent-77ccc6f98d-lchpn                   0/1     CrashLoopBackOff   11 (39s ago)   34m
cattle-system     kube-api-auth-kk862                                     1/1     Running            0              34m
kube-system       cloud-controller-manager-nid003205                      1/1     Running            0              34m
kube-system       etcd-nid003205                                          1/1     Running            0              34m
kube-system       helm-install-rke2-calico-5lrdl                          0/1     Completed          2              34m
kube-system       helm-install-rke2-calico-crd-fdhkn                      0/1     Completed          0              34m
kube-system       helm-install-rke2-coredns-lbsnq                         0/1     Completed          0              34m
kube-system       helm-install-rke2-ingress-nginx-42gpw                   0/1     Completed          0              34m
kube-system       helm-install-rke2-metrics-server-xqsfr                  0/1     Completed          0              34m
kube-system       kube-apiserver-nid003205                                1/1     Running            0              34m
kube-system       kube-controller-manager-nid003205                       1/1     Running            0              34m
kube-system       kube-proxy-nid003205                                    1/1     Running            0              34m
kube-system       kube-scheduler-nid003205                                1/1     Running            0              34m
kube-system       rke2-coredns-rke2-coredns-58fd75f64b-rlw84              1/1     Running            0              34m
kube-system       rke2-coredns-rke2-coredns-autoscaler-768bfc5985-tcm7n   1/1     Running            0              34m
kube-system       rke2-ingress-nginx-controller-q4rlb                     1/1     Running            0              33m
kube-system       rke2-metrics-server-67697454f8-q82bj                    1/1     Running            0              33m
tigera-operator   tigera-operator-5dd8cf7c89-5wq4n                        1/1     Running            0              33m
ah
the cattle-cluster-agent logs say it can't access the Rancher server
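This bash/awk sketch pulls the unhealthy pods out of a listing like the one above before you read their logs. It assumes the default `kubectl get pod -A` column order (NAMESPACE, NAME, READY, STATUS, ...). It skips `Completed` pods such as the `helm-install-*` jobs. The function name is hypothetical.

```shell
# Read `kubectl get pod -A` output on stdin and print
# "namespace/name status" for every pod whose ready count is below
# its container count, ignoring pods that finished (STATUS=Completed).
not_ready_pods() {
  awk 'NR > 1 && $4 != "Completed" {
    split($3, r, "/")
    if (r[1] + 0 < r[2] + 0) print $1 "/" $2, $4
  }'
}
```

Usage: `kubectl get pod -A | not_ready_pods`. Then run `kubectl -n cattle-system logs <pod-name> --previous` to see the output of the last crashed container.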
c
well that would do it.
does it say why?
b
ERROR: https://rancher-test.cscs.ch/ping is not accessible (Could not resolve host: rancher-test.cscs.ch)
c
do you have a real DNS entry for that?
b
but then
curl -k https://rancher-test.cscs.ch/ping
pong
c
or did you just put it in the hosts file on the node
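This bash/awk sketch answers that question on the node. It checks whether a name appears in a hosts-format file and ignores comments. Suppose the name is only in the node's `/etc/hosts`. That would fit the symptoms: `curl` works on the host, while the lookup from inside the pod is answered by CoreDNS instead. The function name is hypothetical.

```shell
# Succeed if NAME appears as a hostname or alias in a hosts-format
# file (default /etc/hosts). Text after '#' is ignored.
in_hosts_file() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '
    { sub(/#.*/, "") }   # strip comments; fields are re-split
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit !found }
  ' "$file"
}
```

Usage: `in_hosts_file rancher-test.cscs.ch && echo "rancher-test.cscs.ch is pinned in /etc/hosts"`.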
b
so I get a response from the server
I ticked the insecure checkbox when copying the command to register the node
c
are there any errors in the
rke2-coredns-rke2-coredns-58fd75f64b-rlw84
pod logs?
looks like for some reason that can’t be resolved from within a pod
b
I see a few of those in the rke2-coredns container
[ERROR] plugin/errors: 2 rancher-test.cscs.ch. A: read udp 172.16.62.129:43013->10.92.100.225:53: i/o timeout
[ERROR] plugin/errors: 2 rancher-test.cscs.ch. A: read udp 172.16.62.129:45646->10.92.100.225:53: i/o timeout
[ERROR] plugin/errors: 2 rancher-test.cscs.ch. AAAA: read udp 172.16.62.129:39659->10.92.100.225:53: i/o timeout
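This sed sketch lists which upstream resolvers CoreDNS is timing out against. It extracts the address after `->` in `i/o timeout` lines like the ones above. It assumes this exact CoreDNS error format. The function name is hypothetical.

```shell
# Read CoreDNS logs on stdin and print each distinct upstream
# address:port that appears in an "i/o timeout" error.
coredns_timeout_upstreams() {
  sed -n 's/.*->\([^ ]*\): i\/o timeout.*/\1/p' | sort -u
}
```

Usage, with the pod name from this thread: `kubectl -n kube-system logs rke2-coredns-rke2-coredns-58fd75f64b-rlw84 | coredns_timeout_upstreams`.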
c
is that your internal dns server?
yeah that would sure do it
do you have something blocking that?
b
ah ok, so the dns config is not inherited from the host?
c
it is…
coredns uses the host’s DNS servers
that should be the same as what the host is using?
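This bash/awk sketch shows which DNS servers the host itself is configured with. You can compare them with the `10.92.100.225` address in the CoreDNS errors. It prints the `nameserver` entries of a resolv.conf-format file. Reading `/etc/resolv.conf` by default is an assumption. On hosts running systemd-resolved, the real upstreams are usually listed in `/run/systemd/resolve/resolv.conf` instead. The function name is hypothetical.

```shell
# Print the nameserver addresses listed in a resolv.conf-format file.
resolv_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Usage: run `resolv_nameservers`, then query the same server directly from the host with `dig @10.92.100.225 rancher-test.cscs.ch +time=2 +tries=1`.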
b
ok, then a firewall is blocking the connection
c
firewalld/ufw/et cetera should be disabled. they are not supported for use with rke2.
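This bash sketch checks the two services named above. It assumes they are managed by systemd under the unit names `firewalld` and `ufw`. It only covers local firewalls on the node. A network firewall in front of the DNS server, as suspected below, will not show up here. The function name is hypothetical.

```shell
# Print the name of each local firewall service that systemd reports
# as active. Prints nothing if none are active (or systemd is absent).
active_firewalls() {
  local svc
  for svc in firewalld ufw; do
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
      echo "$svc"
    fi
  done
}
```

Usage: `active_firewalls`. Any output means a local firewall is still running.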
b
local firewall on the host is disabled, ok, I will check with the guy managing the network tomorrow
thx!
thank you for your patience 🙂
c
gl!