# general
p
Check also the rke2-server.service logs on the node.
Nothing in what you've shared sounds like a breakage; that reflector error isn't very telling on its own
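(A typical way to tail those logs on a systemd host — a generic sketch, not specific to this setup:)
Copy code
# follow the rke2-server unit logs, starting with the last 200 lines
sudo journalctl -u rke2-server.service -n 200 -f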
f
These are the logs from the rke2-server.service
p
I think it's just still booting
wait no
can you restart rancher-system-agent by any chance?
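(A minimal restart plus log check, assuming a standard systemd install of the agent:)
Copy code
sudo systemctl restart rancher-system-agent.service
sudo journalctl -u rancher-system-agent.service -f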
f
I have tried to leave this for more than 24 hours and it stays in this state.
I've tried to reboot the system 4 times; there was a thread somewhere that suggested it could resolve the issue.
Tried to restart that service now again
p
There may be something weird going on with fleet, although the error should be different... go to your local cluster, enable "show all namespaces" at the top, go to More Resources > Fleet > Clusters, then force update your non-working cluster (URL: (rancherurl)/dashboard/c/local/explorer/fleet.cattle.io.cluster)
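(The rough CLI equivalent for inspecting Fleet cluster state, assuming kubectl access to the upstream cluster:)
Copy code
# list Fleet cluster objects and their readiness across all namespaces
kubectl get clusters.fleet.cattle.io -A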
f
I've set restrictedAdmin on the helm deploy so that rancher does not manage my local k8s cluster, as I don't want rancher to manage my EKS cluster.
p
you can also check the logs of the fleet-controller-manager => fleet-agentmanagement ( /dashboard/c/local/explorer/apps.deployment/cattle-fleet-system/fleet-controller )
Fleet will still be used to manage your remote cluster
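(Roughly the same logs via kubectl, assuming the default cattle-fleet-system namespace; the container layout depends on the Fleet version:)
Copy code
kubectl -n cattle-fleet-system logs deploy/fleet-controller -c fleet-agentmanagement --tail=100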
f
okay. let me see
p
If fleet fails to communicate, you get a cluster that works from the Kubernetes POV but a rancher UI showing non-working things
To be more precise, I don't see a fleet agent pod in the list you shared earlier, so that could be a lead
f
I don't see a fleet-controller-manager, only a fleet-controller
p
yes, sorry
click on it (not view logs directly) and check logs for the 'fleet-agentmanagement' one
f
Thank you. I see some connection errors here. It's trying to connect to the public endpoint, and I have security-group rules there that only allow specific sources. Let me fix that.
Copy code
fleet-agentmanagement time="2024-10-22T14:03:49Z" level=error msg="error syncing 'fleet-local/local': handler import-cluster: Get \"https://rancher.a.b.c/k8s/clusters/local/version?timeout=15s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers), requeuing"
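(A quick reachability check from the node side; rancher.a.b.c stands in for the real Rancher URL, as in the log above:)
Copy code
# Rancher answers /ping with "pong" when it is reachable
curl -vk --max-time 15 https://rancher.a.b.c/ping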
p
oooh that sounds bad
It looks like fleet has failed to initialize upstream, but it might be because you're using the restrictedAdmin mode
do you have a cattle-fleet-local-system namespace in the upstream cluster?
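(Quick check, assuming kubectl access to the upstream cluster:)
Copy code
kubectl get namespace cattle-fleet-local-system
kubectl -n cattle-fleet-local-system get pods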
m
@fancy-art-11312 Because you only have one node, it might be running only some, not all, of the control plane, etcd, and worker roles. Once the cluster has at least one control plane, one etcd, and one worker node, it will be ready. Nothing is wrong... please add your worker, etcd, or control plane node
p
I don't think that's the issue
f
I did select all the roles
Copy code
.... --etcd --controlplane --worker
m
If you selected all roles, that would not be the case.
p
but your kubectl get nodes output doesn't show edge-node-1 as a worker
Copy code
kubectl get nodes
NAME         STATUS   ROLES                              AGE    VERSION
gra-node-1   Ready    control-plane,etcd,master,worker   172d   v1.30.4+rke2r1
gra-node-2   Ready    control-plane,etcd,master,worker   98d    v1.30.4+rke2r1
rbx-node-1   Ready    control-plane,etcd,master,worker   176d   v1.30.4+rke2r1
sbg-node-1   Ready    control-plane,etcd,master,worker   168d   v1.30.4+rke2r1
sbg-node-2   Ready    control-plane,etcd,master,worker   99d    v1.30.4+rke2r1
mine for example
It might be because fleet doe
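(The ROLES column is derived from node-role.kubernetes.io/* labels; a quick way to inspect them on the new node — edge-node-1 is the node name from this thread:)
Copy code
kubectl get node edge-node-1 --show-labels | tr ',' '\n' | grep node-role.kubernetes.io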
m
@fancy-art-11312 please add one worker node; your cluster might then be ready.
f
Should I maybe try to recreate the cluster? I fixed the fleet-controller connection issue
p
If you fixed it, your cluster might appear online all by itself
m
Your cluster does not have a worker node.
f
the roles on the node still do not include worker
m
when you add a worker, the cluster will be ready
f
or I can try to just add a worker
let me do that
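(That would be the same registration command with only the worker role; placeholders as in the original, the real URL and token come from the cluster's registration page in the Rancher UI:)
Copy code
curl -fL https://a.b.c/system-agent-install.sh | sudo sh -s - --server https://rancher.a.b.c --token abc --worker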
m
please
f
I got some new errors; it might be because I set
restrictedAdmin=true
in the helm chart
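(For reference, that chart value can be toggled on an existing install — a sketch assuming the standard release name, chart repo, and namespace:)
Copy code
helm upgrade rancher rancher-latest/rancher \
  --namespace cattle-system \
  --reuse-values \
  --set restrictedAdmin=true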
p
where is that error coming from..?
f
I see it on the UI
p
is that cluster upstream or the downstream one you tried to create?
f
Not really sure what you mean. I created a new cluster in the UI, and then ran the registration command on the nodes that are on the edge.
p
mtn-cluster
Is that upstream (the rancher cluster) or downstream (the one you're trying to make) ?
f
The one I am trying to make
p
oh. Weird indeed then
f
Im going to try to create a new cluster. Maybe there's something not configured correctly due to the connection issue it had.
Yay
Thank you sooo much @powerful-librarian-10572. Really appreciate your help troubleshooting my issue!
p
I have no idea why you had to recreate the whole cluster, but you're welcome
f
🙇
m
@fancy-art-11312 By the way, did you do any extra configuration on your fresh Ubuntu 24 VM, or just register the node?
f
Only updates, and then registered the node
m
Cooooool👍. Thanks @fancy-art-11312
f
Copy code
#cloud-config
package_update: true
package_upgrade: true
package_reboot_if_required: true
packages:
  - vim

# Manage /etc/hosts with cloud-init.
# On every boot, /etc/hosts will be re-written from
# ``/etc/cloud/templates/hosts.tmpl``.
manage_etc_hosts: true

# Setting hostname
preserve_hostname: false
hostname: edge-node-2

users:
  - name: maarten
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    ssh_authorized_keys:
      - ecdsa-sha2-nistp521 abc

runcmd:
  - curl -fL https://a.b.c/system-agent-install.sh | sudo sh -s - --server https://rancher.a.b.c --label 'cattle.io/os=linux' --token abc --etcd --controlplane --worker
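(On recent cloud-init releases the user-data can be validated before use; user-data.yaml is a hypothetical local copy of the config above:)
Copy code
cloud-init schema --config-file user-data.yaml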