# rke2
s
Has anyone had issues setting up a new cluster where the rancher-system-agent gets stuck on
msg="Starting /v1, Kind=Secret controller"
? Running it on Ubuntu 24 minimal.
c
it’s not stuck. that node just doesn’t have anything to do yet, because Rancher hasn’t given it anything to do. Figure out what Rancher thinks the cluster is waiting on.
probably waiting on something to come up on another node
s
It's a fresh cluster and this is the first node.
etcd and control plane, not worker.
c
has it installed and started rke2 yet?
s
Rancher is just saying it's waiting for the node.
Nope.
c
has the node been able to actually talk to rancher?
what do the full logs say
also check the pod logs on the rancher side, there may be something there as well
s
It's L2 adjacent to rancher so no firewall blocking.
c
node hasn’t been created yet, so something’s gone wrong.
check the full rancher-system-agent logs, and the capi and rancher pod logs
s
Not seeing anything related to that cluster in rancher logs.
Using lsof -i -n -o -P, I see the node has an established connection to the Rancher server on port 443.
c
did you look at the capi controller logs?
s
Looking through them now.
I0407 19:11:13.012827       1 machine_controller_phases.go:306] "Waiting for infrastructure provider to create machine infrastructure and report status.ready" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-7e6bf12a948f" namespace="fleet-default" name="custom-7e6bf12a948f" reconcileID="fe538cf8-25c9-402b-baeb-d44f6c213670" Cluster="fleet-default/infrastructure" Cluster="fleet-default/infrastructure" CustomMachine="fleet-default/custom-7e6bf12a948f"
I0407 19:11:13.032519       1 machine_controller_noderef.go:60] "Waiting for infrastructure provider to report spec.providerID" controller="machine" controllerGroup="cluster.x-k8s.io" controllerKind="Machine" Machine="fleet-default/custom-7e6bf12a948f" namespace="fleet-default" name="custom-7e6bf12a948f" reconcileID="fe538cf8-25c9-402b-baeb-d44f6c213670" Cluster="fleet-default/infrastructure" Cluster="fleet-default/infrastructure" CustomMachine="fleet-default/custom-7e6bf12a948f"
c
is this a custom cluster, or ?
s
Yes custom.
c
there’s something in some log that you’re missing
s
Agreed.
But every log I look at just basically states it's waiting on the node.
And the node has nothing for logging other than what I posted.
c
what output did you get when you ran the install command on the node?
s
Some secret log somewhere else on the node?
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 33705    0 33705    0     0   707k      0 --:--:-- --:--:-- --:--:--  715k
[INFO]  Label: cattle.io/os=linux
[INFO]  Role requested: etcd
[INFO]  Role requested: controlplane
[INFO]  CA strict verification is set to false
[INFO]  Using default agent configuration directory /etc/rancher/agent
[INFO]  Using default agent var directory /var/lib/rancher/agent
[INFO]  Determined CA is not necessary to connect to Rancher
[INFO]  Successfully tested Rancher connection
[INFO]  Downloading rancher-system-agent binary from https://#############.#######.####/assets/rancher-system-agent-amd64
[INFO]  Successfully downloaded the rancher-system-agent binary.
[INFO]  Downloading rancher-system-agent-uninstall.sh script from https://#############.#######.####/assets/system-agent-uninstall.sh
[INFO]  Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO]  Generating Cattle ID
[INFO]  Successfully downloaded Rancher connection information
[INFO]  systemd: Creating service file
[INFO]  Creating environment file /etc/systemd/system/rancher-system-agent.env
[INFO]  Enabling rancher-system-agent.service
Created symlink /etc/systemd/system/multi-user.target.wants/rancher-system-agent.service → /etc/systemd/system/rancher-system-agent.service.
[INFO]  Starting/restarting rancher-system-agent.service
c
check the rest of the logs on the rancher cluster for that machine id
there’s some log somewhere that will tell you what it’s looking for
s
All of the logs from rancher with the machine id:
2025/04/07 18:28:47 [INFO] EnsureSecretForServiceAccount: waiting for secret [fleet-default:custom-7e6bf12a948f-machine-plan-token-kgnj7] for service account [fleet-default:custom-7e6bf12a948f-machine-plan] to be populated with token
2025/04/07 18:28:47 [INFO] EnsureSecretForServiceAccount: got the service account token for service account [fleet-default:custom-7e6bf12a948f-machine-plan] in 45.708823ms
2025/04/07 18:28:47 [INFO] [rke2configserver] fleet-default/custom-7e6bf12a948f machineID: 418740b92a6b873f5c255e3564ab4d4faa1316df296b63818b8558a4cf1e8fe delivering planSecret custom-7e6bf12a948f-machine-plan with token secret fleet-default/custom-7e6bf12a948f-machine-plan-token-kgnj7 to system-agent from plan service account watch
2025/04/07 18:28:47 [INFO] EnsureSecretForServiceAccount: waiting for secret [fleet-default:custom-7e6bf12a948f-machine-bootstrap-token-6pzs8] for service account [fleet-default:custom-7e6bf12a948f-machine-bootstrap] to be populated with token
2025/04/07 18:28:47 [INFO] EnsureSecretForServiceAccount: got the service account token for service account [fleet-default:custom-7e6bf12a948f-machine-bootstrap] in 12.419947ms
Ohhh geez.. 😑 I forgot to add the worker node...
Thank you for the assist. Sorry it ended up being something stupid.
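
For anyone landing here with the same symptom: the fix in this thread was registering a node with the worker role. A quick way to catch this early is to read the "Role requested" lines that the registration script prints on each node and compare them against the roles the cluster needs. Below is a minimal Python sketch of that check. It assumes a custom cluster will not start provisioning until etcd, controlplane, and worker are all covered (inferred from how this thread was resolved, not from the Rancher source). The names `REQUIRED_ROLES`, `requested_roles`, and `missing_roles` are made up for illustration.

```python
import re

# Assumption: provisioning waits until every one of these roles is
# covered by at least one registered node.
REQUIRED_ROLES = {"etcd", "controlplane", "worker"}

# Matches lines such as "[INFO]  Role requested: etcd" from the install output.
ROLE_LINE = re.compile(r"^\[INFO\]\s+Role requested:\s+(\S+)\s*$")


def requested_roles(install_output: str) -> set:
    """Return the roles one node asked for, based on its install output."""
    roles = set()
    for line in install_output.splitlines():
        match = ROLE_LINE.match(line.strip())
        if match:
            roles.add(match.group(1))
    return roles


def missing_roles(install_outputs) -> set:
    """Return the required roles that no node in the list has requested."""
    seen = set()
    for output in install_outputs:
        seen |= requested_roles(output)
    return REQUIRED_ROLES - seen


# Excerpt of the install output posted earlier in this thread.
sample = """[INFO]  Label: cattle.io/os=linux
[INFO]  Role requested: etcd
[INFO]  Role requested: controlplane
[INFO]  CA strict verification is set to false"""

print(sorted(missing_roles([sample])))  # ['worker']
```

Pass in the saved install output from every node in the cluster. An empty result means all three roles have been requested by at least one node, so the cause of the wait is probably somewhere else.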