# general
c
check the pod args? I suspect you’ve got a bad kube-apiserver-arg in your config
Also, you can just look at /var/log/pods, you don’t need to jump through hoops with crictl
check /var/lib/rancher/rke2/agent/pod-manifests and see what arg is ending up as --=true, then go look at your config.yaml and see how you ended up with that.
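The check above can be sketched as a one-liner. The sample fragment below stands in for the args list in /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml (paths assume the default data-dir); on a real node you would grep the manifest itself:

```shell
# Write a sample args fragment like the one in the real static pod manifest,
# then grep it for flags with an empty name (they render as "--=...").
cat <<'EOF' > /tmp/sample-apiserver-args.yaml
- --admission-control-config-file=/etc/rancher/rke2/rke2-pss.yaml
- --=true
- --allow-privileged=true
EOF
grep -n -- '--=' /tmp/sample-apiserver-args.yaml
# On a real node:
# grep -n -- '--=' /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml
```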
g
Ah, thank you. I'm brand new to RKE2 so I'm used to using docker on RKE1.
spec:
  containers:
  - args:
    - --admission-control-config-file=/etc/rancher/rke2/rke2-pss.yaml
    - --=true
    - --allow-privileged=true
    - --anonymous-auth=false
    - --api-audiences=https://kubernetes.default.svc.cluster.local,rke2
    - --authorization-mode=Node,RBAC
    - --bind-address=0.0.0.0
The only cluster config I did when creating the cluster through the Rancher UI was changing the Container Network from calico to canal (since that's what I'm used to on RKE1, not sure if that is a good idea or not) and, under Add-On Config, changing flannel.iface: '' to flannel.iface: net1
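For anyone making the same change outside the Rancher UI, the equivalent on a standalone RKE2 install would be a HelmChartConfig override for the canal chart; treat the exact values layout here as an assumption about the rke2-canal chart:

```yaml
# Assumed shape of the rke2-canal override; "net1" mirrors the iface above.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-canal
  namespace: kube-system
spec:
  valuesContent: |-
    flannel:
      iface: net1
```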
Oh, I think I know what happened. Before I saw your response I was planning to see if there was a way to configure data-dir, so I clicked Add under Advanced > Additional API Server Args but never filled out the field. I just went in to edit the cluster in the Rancher UI and removed the empty argument, so hopefully that does it.
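For reference, this is roughly how an empty entry would look in the resulting /etc/rancher/rke2/config.yaml (an illustrative sketch, not the actual file from this cluster):

```yaml
# Hypothetical fragment: an empty kube-apiserver-arg entry is what ends up
# rendered as "--=true" in the static pod manifest.
kube-apiserver-arg:
  - ""   # unfilled "Additional API Server Args" field in the Rancher UI
```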
That fixed it, thank you @creamy-pencil-82913!
c
glad to hear it!
g
@creamy-pencil-82913 Any suggestions for the cluster seemingly stuck on Waiting for probes: kube-controller-manager?
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# /var/lib/rancher/rke2/bin/crictl ps -a
CONTAINER           IMAGE               CREATED             STATE               NAME                       ATTEMPT             POD ID              POD
cecb08c222d0b       ba59c048c1040       14 minutes ago      Running             cloud-controller-manager   0                   99f7e4aa15d6a       cloud-controller-manager-chad-test-control-d9b756cd-kptv5
b7fd8d0eeda0e       aef054cc887e5       14 minutes ago      Running             kube-scheduler             0                   988774415a5fa       kube-scheduler-chad-test-control-d9b756cd-kptv5
61581396ecea3       aef054cc887e5       14 minutes ago      Running             kube-apiserver             0                   79c2463065f3c       kube-apiserver-chad-test-control-d9b756cd-kptv5
97883d9c6c254       aef054cc887e5       58 minutes ago      Running             kube-proxy                 0                   728498d5d4f66       kube-proxy-chad-test-control-d9b756cd-kptv5
cfdec2e871f2e       3497a01296944       58 minutes ago      Running             etcd                       0                   b31afb962f0e1       etcd-chad-test-control-d9b756cd-kptv5
c
check the rancher-system-agent logs in journald and see what it’s waiting for?
g
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# systemctl status rancher-system-agent
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-08-27 21:18:06 UTC; 16min ago
       Docs: https://www.rancher.com
   Main PID: 11094 (rancher-system-)
      Tasks: 9 (limit: 2268)
     Memory: 116.5M (peak: 186.7M)
        CPU: 3.365s
     CGroup: /system.slice/rancher-system-agent.service
             └─11094 /usr/local/bin/rancher-system-agent sentinel

Aug 27 21:33:12 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:33:12Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (2793387>
Aug 27 21:33:12 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:33:12Z" level=error msg="error syncing 'fleet-default/chad-test-bootstrap-template-s46ct-machine-plan': handler sec>
Aug 27 21:34:06 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:06Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:06 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:06Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:11 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:11Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:11 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:11Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:16 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:16Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:16 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:16Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:21 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:21Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:21 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:21Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
c
I don’t see kube-controller-manager running though. That usually means you don’t have enough memory or CPU for the kubelet to schedule the static pod. Are you sure this node meets system requirements?
g
It is a pretty small VM: 2 vCPUs, 2 GB RAM, 50 GB disk
According to https://docs.rke2.io/install/requirements#linuxwindows I need at least 4 GB of RAM, so I can change over to that, but would that be related to the error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls> error?
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# free -m
               total        used        free      shared  buff/cache   available
Mem:            1940         763         152           1        1204        1176
Swap:              0           0           0
c
yes, 1.9GB is definitely not enough.
The kube-controller-manager hasn’t started yet, because you don’t have enough memory for the pod to be admitted. So health checks that try to use things it creates on startup (like certificates) will fail until it runs.
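As a quick sanity check before provisioning, something like this sketch can flag undersized nodes; the 3900 MB threshold is just an approximation of the documented 4 GB minimum:

```shell
# Sketch: compare total memory against the ~4 GB documented minimum for an
# RKE2 server node. Reads MemTotal (in kB) from /proc/meminfo.
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
if [ "$total_mb" -lt 3900 ]; then
  echo "only ${total_mb} MB RAM: below the recommended 4 GB for an RKE2 server"
else
  echo "memory OK: ${total_mb} MB"
fi
```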
g
Ah ok, I will try again tomorrow with 4 GB+ memory and report back. Thanks!
That worked, good call.