# general
c
check the pod args? I suspect you’ve got a bad kube-apiserver-arg in your config
Also, you can just look at /var/log/pods, you don’t need to jump through hoops with crictl
check /var/lib/rancher/rke2/agent/pod-manifests and see what arg is ending up as --=true, then go look at your config.yaml and see how you ended up with that.
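The check above can be sketched as a one-liner. The sample fragment below stands in for the args list in /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml (paths assume the default data-dir); on a real node you would grep the manifest itself:

```shell
# Write a sample args fragment like the one in the real static pod manifest,
# then grep it for flags with an empty name (they render as "--=...").
cat <<'EOF' > /tmp/sample-apiserver-args.yaml
- --admission-control-config-file=/etc/rancher/rke2/rke2-pss.yaml
- --=true
- --allow-privileged=true
EOF
grep -n -- '--=' /tmp/sample-apiserver-args.yaml
# On a real node:
# grep -n -- '--=' /var/lib/rancher/rke2/agent/pod-manifests/kube-apiserver.yaml
```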
g
Ah, thank you. I'm brand new to RKE2 so I'm used to using docker on RKE1.
spec:
  containers:
  - args:
    - --admission-control-config-file=/etc/rancher/rke2/rke2-pss.yaml
    - --=true
    - --allow-privileged=true
    - --anonymous-auth=false
    - --api-audiences=https://kubernetes.default.svc.cluster.local,rke2
    - --authorization-mode=Node,RBAC
    - --bind-address=0.0.0.0
The only cluster config I did when creating the cluster through the Rancher UI was changing the Container Network from calico to canal (since that's what I'm used to on RKE1, not sure if that is a good idea or not) and, under Add-On Config, changing flannel.iface: '' to flannel.iface: net1
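For anyone making the same change outside the Rancher UI, the equivalent on a standalone RKE2 install would be a HelmChartConfig override for the canal chart; treat the exact values layout here as an assumption about the rke2-canal chart:

```yaml
# Assumed shape of the rke2-canal override; "net1" mirrors the iface above.
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-canal
  namespace: kube-system
spec:
  valuesContent: |-
    flannel:
      iface: net1
```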
Oh, I think I know what happened. Before I saw your response I was planning to see if there was a way to configure data-dir, so I clicked Add under Advanced > Additional API Server Args but never filled out the field. I just went in to edit the cluster in the Rancher UI and removed the empty argument, so hopefully that does it.
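For reference, this is roughly how an empty entry would look in the resulting /etc/rancher/rke2/config.yaml (an illustrative sketch, not the actual file from this cluster):

```yaml
# Hypothetical fragment: an empty kube-apiserver-arg entry is what ends up
# rendered as "--=true" in the static pod manifest.
kube-apiserver-arg:
  - ""   # unfilled "Additional API Server Args" field in the Rancher UI
```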
That fixed it, thank you @creamy-pencil-82913!
c
glad to hear it!
g
@creamy-pencil-82913 Any suggestions for the cluster seemingly stuck on Waiting for probes: kube-controller-manager?
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# /var/lib/rancher/rke2/bin/crictl ps -a
CONTAINER           IMAGE               CREATED             STATE               NAME                       ATTEMPT             POD ID              POD
cecb08c222d0b       ba59c048c1040       14 minutes ago      Running             cloud-controller-manager   0                   99f7e4aa15d6a       cloud-controller-manager-chad-test-control-d9b756cd-kptv5
b7fd8d0eeda0e       aef054cc887e5       14 minutes ago      Running             kube-scheduler             0                   988774415a5fa       kube-scheduler-chad-test-control-d9b756cd-kptv5
61581396ecea3       aef054cc887e5       14 minutes ago      Running             kube-apiserver             0                   79c2463065f3c       kube-apiserver-chad-test-control-d9b756cd-kptv5
97883d9c6c254       aef054cc887e5       58 minutes ago      Running             kube-proxy                 0                   728498d5d4f66       kube-proxy-chad-test-control-d9b756cd-kptv5
cfdec2e871f2e       3497a01296944       58 minutes ago      Running             etcd                       0                   b31afb962f0e1       etcd-chad-test-control-d9b756cd-kptv5
c
check the rancher-system-agent logs in journald and see what it’s waiting for?
g
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# systemctl status rancher-system-agent
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; preset: enabled)
     Active: active (running) since Tue 2024-08-27 21:18:06 UTC; 16min ago
       Docs: https://www.rancher.com
   Main PID: 11094 (rancher-system-)
      Tasks: 9 (limit: 2268)
     Memory: 116.5M (peak: 186.7M)
        CPU: 3.365s
     CGroup: /system.slice/rancher-system-agent.service
             └─11094 /usr/local/bin/rancher-system-agent sentinel

Aug 27 21:33:12 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:33:12Z" level=error msg="[K8s] received secret to process that was older than the last secret operated on. (2793387>
Aug 27 21:33:12 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:33:12Z" level=error msg="error syncing 'fleet-default/chad-test-bootstrap-template-s46ct-machine-plan': handler sec>
Aug 27 21:34:06 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:06Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:06 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:06Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:11 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:11Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:11 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:11Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:16 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:16Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:16 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:16Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
Aug 27 21:34:21 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:21Z" level=error msg="error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls>
Aug 27 21:34:21 chad-test-control-d9b756cd-kptv5 rancher-system-agent[11094]: time="2024-08-27T21:34:21Z" level=error msg="error while appending ca cert to pool for probe kube-controller-manager"
c
I don’t see kube-controller-manager running though. That usually means you don’t have enough memory or CPU for the kubelet to schedule the static pod. Are you sure this node meets system requirements?
g
It is a pretty small VM: 2 vCPUs, 2 GB RAM, 50 GB disk
According to https://docs.rke2.io/install/requirements#linuxwindows I need at least 4 GB of RAM, so I can change over to that, but would that be related to the error loading CA cert for probe (kube-controller-manager) /var/lib/rancher/rke2/server/tls> error?
root@chad-test-control-d9b756cd-kptv5:/var/lib/rancher/rke2/bin# free -m
               total        used        free      shared  buff/cache   available
Mem:            1940         763         152           1        1204        1176
Swap:              0           0           0
c
yes, 1.9GB is definitely not enough.
The kube-controller-manager hasn’t started yet, because you don’t have enough memory for the pod to be admitted. So health checks that try to use things it creates on startup (like certificates) will fail until it runs.
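As a quick sanity check before provisioning, something like this sketch can flag undersized nodes; the 3900 MB threshold is just an approximation of the documented 4 GB minimum:

```shell
# Sketch: compare total memory against the ~4 GB documented minimum for an
# RKE2 server node. Reads MemTotal (in kB) from /proc/meminfo.
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
if [ "$total_mb" -lt 3900 ]; then
  echo "only ${total_mb} MB RAM: below the recommended 4 GB for an RKE2 server"
else
  echo "memory OK: ${total_mb} MB"
fi
```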
g
Ah ok, I will try again tomorrow with 4 GB+ memory and report back. Thanks!
That worked, good call.