# rke2
i
Hi all, has anyone faced this issue while adding RKE2 cluster nodes to Rancher?

[ERROR] error syncing 'c-m-csvjq78f': handler cluster-deploy: cluster context c-m-csvjq78f is unavailable, requeuing
[INFO] [planner] rkecluster fleet-default/cluster-main: configuring bootstrap node(s) custom-6cccc7462912: waiting for cluster agent to connect
[ERROR] error syncing '_all_': handler user-controllers-controller: userControllersController: failed to set peers for key _all_: failed to start user controllers for cluster c-m-csvjq78f: ClusterUnavailable 503: cluster not found, requeuing
c
log in to one of the existing nodes and see why the cattle-cluster-agent pod isn’t connected to rancher
i
It looks like the cattle-cluster-agent is not deployed yet. I'm seeing only these in rancher-system-agent.service:

$ journalctl -u rancher-system-agent.service -f
Started rancher-system-agent.service - Rancher System Agent.
time="2025-10-02T19:40:06Z" level=info msg="Rancher System Agent version v0.3.13 (5a64be2) is starting"
time="2025-10-02T19:40:06Z" level=info msg="Using directory /var/lib/rancher/agent/work for work"
time="2025-10-02T19:40:06Z" level=info msg="Starting remote watch of plans"
time="2025-10-02T19:40:06Z" level=info msg="Starting /v1, Kind=Secret controller"

In the Rancher UI it is waiting for the cluster agent to connect. I've checked the connectivity from the master and worker nodes to the Rancher server endpoint and it looks good. I see a few other people have reported the same issue:
https://slack-archive.rancher.com/t/28536525/since-we-ve-updated-rancher-to-2-11-newly-created-clusters-u
https://slack-archive.rancher.com/t/27160825/hi-i-m-having-an-issue-when-i-deploy-rancher-using-the-docke
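For reference, a quick way to sanity-check that connectivity from a node (the hostname is a placeholder; /ping and /cacerts are standard Rancher endpoints):

```
# Replace rancher.example.com with your Rancher server URL
curl -sk https://rancher.example.com/ping     # a healthy Rancher replies "pong"
curl -sk https://rancher.example.com/cacerts  # should return Rancher's CA certificate
```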
c
cattle cluster agent, not rancher system agent
cattle cluster agent runs as a pod IN the cluster. You’d need to look on an existing server node that is already up.
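For example, from a server node that is already up, something along these lines (default RKE2 paths; adjust if your install differs):

```
# Use RKE2's bundled kubectl and admin kubeconfig on a server node
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl -n cattle-system get pods
/var/lib/rancher/rke2/bin/kubectl -n cattle-system logs deploy/cattle-cluster-agent --tail=100
```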
i
Yeah, it looks like the cluster is stuck in the bootstrap phase (waiting for the cluster agent to connect). It seems the core Kubernetes control plane isn't fully initialized, so the cattle-cluster-agent is not deployed yet.
c
You said you were trying to add nodes. Is this an existing functional cluster that you are trying to add nodes to?
Or are you just trying to bring the cluster up for the first time?
i
I’m trying to bring it up for the first time. I’m deploying rancher using docker compose and trying to add master and worker nodes to it.
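Roughly like this, a minimal sketch (the image tag, container name, and volume name are placeholders):

```
# docker-compose.yml (sketch): single-node Rancher for testing only
services:
  rancher:
    image: rancher/rancher:latest   # pin a specific version in practice
    container_name: rancher
    privileged: true                # the Rancher container requires privileged mode
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - rancher-data:/var/lib/rancher
volumes:
  rancher-data:
```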
c
Well, for starters, running Rancher in Docker isn't technically supported for anything except basically toy deployments. But it should generally work.
Did you add nodes with all roles to the cluster? Etcd, control plane, and worker?
Check the logs on the server (etcd and control-plane) nodes to see why it's not coming up.
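e.g. on the server node, the usual places to look (standard RKE2 log locations):

```
# RKE2 service logs
journalctl -u rke2-server -f
# kubelet and containerd logs (default RKE2 paths)
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log
tail -f /var/lib/rancher/rke2/agent/containerd/containerd.log
```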
i
I have two nodes: one master (etcd, control plane) and one worker.
I ran into a couple of issues at first but resolved them with the steps below.

[FATAL] Aborting system-agent installation due to requested strict CA verification with no CA checksum provided

Fix: Go to Rancher Global Settings > agent-tls-mode > change the value from Strict to System Store.

----------------------------------------

time="2025-09-30T01:57:52Z" level=fatal msg="invalid value provided for --profile flag"

Fix: Changed the security compliance profile from cis-1.23 to cis.

----------------------------------------

time="2025-09-30T02:04:46Z" level=fatal msg="invalid kernel parameter value vm.overcommit_memory=0 - expected 1\ninvalid kernel parameter value kernel.panic=-1 - expected 10\ninvalid kernel parameter value kernel.panic_on_oops=0 - expected 1\n"

Fix:
Step 1: Create a sysctl configuration file. Create a new file in /etc/sysctl.d/ (e.g., 90-rke2.conf); this is the standard way to apply permanent system tunings.

sudo vi /etc/sysctl.d/90-rke2.conf

# RKE2/kubelet required settings
vm.overcommit_memory = 1
kernel.panic = 10
kernel.panic_on_oops = 1

Step 2: Load the new configuration.

sudo sysctl -p /etc/sysctl.d/90-rke2.conf

Step 3: Restart RKE2. With the kernel parameters now matching the expected values, the RKE2 server should pass its internal checks and start successfully.

sudo systemctl restart rke2-server
sudo journalctl -u rke2-server -f

----------------------------------------

time="2025-09-30T02:11:56Z" level=fatal msg="missing required: user: unknown user etcd\nmissing required: group: unknown group etcd\n"
rke2-server.service: Main process exited, code=exited, status=1/FAILURE

Fix:
Step 1: Create the etcd group.

sudo groupadd --system etcd

Step 2: Create the etcd user.

sudo useradd --system \
  --shell /sbin/nologin \
  --comment "etcd service user" \
  --gid etcd \
  etcd

Step 3: Verify and restart.

# Verify the user and group are set up
id etcd

# Restart RKE2
sudo systemctl restart rke2-server
sudo journalctl -u rke2-server -f
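For reference, on a standalone RKE2 node the profile would live in /etc/rancher/rke2/config.yaml; a minimal sketch (illustrative only, since Rancher manages this config for provisioned clusters):

```
# /etc/rancher/rke2/config.yaml (sketch)
profile: cis   # newer RKE2 releases accept "cis"; older ones used versioned values like "cis-1.23"
```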
Now I’m stuck at (waiting for cluster agent to connect)
c
you’re trying to set this up as a hardened cluster (profile: cis) without having hardened the base os first?
i
Yes, that is correct.
c
if you’re just getting started, you might try not doing that, first?
but, if you have everything working, the rke2-server service should be running, and you should be able to use kubectl to interact with the cluster and see what pods are running or not running
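for example, assuming default RKE2 paths:

```
systemctl status rke2-server
export KUBECONFIG=/etc/rancher/rke2/rke2.yaml
/var/lib/rancher/rke2/bin/kubectl get nodes
/var/lib/rancher/rke2/bin/kubectl get pods -A   # watch for pods stuck in Pending or CrashLoopBackOff
```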
i
oh got it. I’ll try that.
Thanks