# rke2
e
We are using Rancher release 2.6.8 and are trying to deploy a k8s 1.22 cluster.
We have got past this issue, but now we get “Waiting for agent to check in and apply initial plan”. When we go onto the node we can’t see any containers, which is strange.
c
what kind of containers are you looking for?
have you checked the rancher-system-agent and rke2-server logs in journald?
e
I would have expected to see a cluster agent image or something? When I say containers, I mean there are no Docker images on the node.
There was also no cluster agent service
I had to manually run the install.sh script for the service to get created
c
when you provision rke2 nodes from rancher, it installs rancher-system-agent as a systemd unit. That in turn installs RKE2 and manages the configuration on that node. Once RKE2 is up, cattle-cluster-agent is deployed to the cluster and checks in with the Rancher management cluster.
So, if things are working, you should see systemd units for both rancher-system-agent and rke2-server
If not, check the journald logs for those same units
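A minimal sketch of the check described above, assuming a systemd host with journald. The unit names come from this thread; the function name `check_units` is only illustrative.

```shell
#!/bin/sh
# Unit names as mentioned in the discussion above.
UNITS="rancher-system-agent rke2-server"

# check_units: print the state and the last 50 journal lines for each unit.
# Requires systemd; returns 1 if systemctl is not available.
check_units() {
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not found; this host does not appear to use systemd" >&2
        return 1
    fi
    for unit in $UNITS; do
        echo "== $unit =="
        # is-active exits non-zero for inactive or missing units; keep going.
        systemctl is-active "$unit" || true
        journalctl -u "$unit" --no-pager -n 50 || true
    done
}
```

Run `check_units` as root on the node. A unit that reports `inactive` together with an empty journal suggests the install step never ran at all.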
RKE2 doesn’t use Docker. You shouldn’t have Docker installed on the node, and you will never see any Docker images on the node nor Docker containers running.
e
Thanks @creamy-pencil-82913 we are using zypper to install containerd as part of our VMware template, is this something we should still do?
c
no. RKE2 bundles its own version of containerd that has been modified to support our registry mirror rewrites.
You shouldn’t have containerd, docker, cri-o, or anything else already installed and running.
just a bare node.
e
Thank you, no cloud init either?
c
Well, reasonably bare. Nothing beyond the normal packages that you would need to bootstrap a node on that infra provider. That includes cloud-init on most platforms.
s
In my cluster for rke2 rancher-system-agent is not installed.
c
in your cluster, or on your node
Did you provision the cluster via Rancher?
s
I ssh onto the node and it doesn’t exist. Yeah provisioned via Rancher
Now stuck in “waiting for agent to check in and apply initial plan”
c
Did you do a custom cluster (which gives you a command to run on existing nodes), or try to provision via one of the infrastructure providers?
s
In the Rancher UI: create cluster, switched RKE1 to RKE2, selected VMware vSphere, filled in the machine pools info etc. and created.
c
Check the cloud-init logs and see if there’s anything in there about why the agent failed to install
s
Just seeing a bunch of DEBUG messages. Nothing relating to an error
c
do you see it attempting to install any rancher-related components, or download and run a script from your Rancher server?
s
I ran cat cloud-init.log | grep rancher and got nothing
c
what does the cloud-init userdata script provided to this node look like?
s
How would I check that? Not sure we set anything related to cloud-init user data. In the template we created we ran zypper install cloud-init. We also installed clone-master-clean-up and ran it. In the Rancher UI we didn’t set anything related to cloud-init.
c
injecting userdata is how Rancher tells the node to install things once it comes up
s
How would I do/set that?
c
it is done for you by Rancher when you ask it to provision nodes
s
Ah ok. How can I check what it is?
s
Is it in the /etc/cloud/cloud.cfg file? Sorry, bit of a newbie here
c
instance
The /var/lib/cloud/instance directory is a symbolic link that points to the most recently used instance-id directory. This folder contains the information cloud-init received from datasources, including vendor and user data. This can be helpful to review to ensure the correct data was passed.
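As a sketch of how to review what was passed to the node, the helper below prints the user data stored under that directory. The file name `user-data.txt` is the one referred to later in this thread; the function name `show_userdata` and its optional directory argument are illustrative assumptions.

```shell
#!/bin/sh
# show_userdata [DIR]: print the user data cloud-init stored for the
# current instance. DIR defaults to the symlink described above.
show_userdata() {
    dir="${1:-/var/lib/cloud/instance}"
    file="$dir/user-data.txt"
    if [ ! -r "$file" ]; then
        echo "cannot read $file (try running as root)" >&2
        return 1
    fi
    cat "$file"
}
```

For example, `show_userdata | grep -n -A3 runcmd` would show whether the user data contains a `runcmd` section and what it is supposed to execute.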
it’s the paragraph I linked directly to
s
Ah ok cool yes in there now
Starts with #cloud-config Got a bunch of stuff in it…
c
find anything interesting?
oh I see the DM.
ok so what’s that install script have in it?
just follow the crumbs here, figure out where something’s not working
s
The script is huge! :)
Seems like maybe that script isn’t getting run?
When I run that script manually it creates the rancher-system-agent…
Which gives a bunch of errors loading certs
c
the userdata showed that it was supposed to be run
you’d have to look at the cloud-init logs to figure out why it didn’t
but yeah, that’s what you should see until the cluster comes up
you can now check the rke2-server logs, since that should be in the process of starting
s
Ok cool will look into that. Thanks for the help
Got a path to the rke2-server logs?
c
in journald, same as the agent…
s
journalctl -u rke2-server?
c
yeah should be
s
Just says no entries
c
look at the rancher-system-agent logs then, see if it logged an error running the rke2 install
s
Would there be any reason the install.sh script isn’t executable? We are seeing that it never gets run. When we ssh onto the node we need to run chmod +x install.sh and then we can run the script.
c
that sounds like an issue with cloud-init or something on your nodes. What are you using for the base image?
We test all provisioning stuff as part of every release, on all the supported operating systems, so I know it works.
s
We are using sles15sp4
e
We were told to use the SLE-15-SP4-Full-x86_64-GM-Media1.iso, however we have also been made aware of a more container-friendly ISO? The version of cloud-init we are using is the one installed by: zypper install -y cloud-init-21.4-150100.8.58.1
s
The cloud init logs say changing ownership of /usr/local/custom_script/install.sh to 0:0
c
ownership is fine, I’m just curious why it’s not being executed at all
does it show it attempting to run it?
e
Is there a log we could check to find that? The journalctl log has no entries, which I’m assuming is because the install script hasn’t been run to create the service etc.
c
it would all be in the cloud-init log, since that’s what should be responsible for executing it
your screenshot showed that the script was present in the cloud-init userdata, right?
s
It doesn’t show any attempt to run it. It just says it writes the script and changes the permissions. Yeah, in the user data file there is a setting path: /usr/local/custom_script/install.sh and a runcmd entry: - sh /usr/local/custom_script/install.sh
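One way to confirm that the script was written but never executed is to list every cloud-init log line that mentions it. This is a sketch: the default log paths are the usual cloud-init locations and may differ on your image, and the function name `find_script_mentions` is illustrative.

```shell
#!/bin/sh
# find_script_mentions SCRIPT [LOG...]: print "file:line:text" for every
# log line that mentions SCRIPT. With no LOG arguments, the usual
# cloud-init log locations are searched (an assumption about the image).
find_script_mentions() {
    script="$1"
    shift
    [ "$#" -gt 0 ] || set -- /var/log/cloud-init.log /var/log/cloud-init-output.log
    for log in "$@"; do
        [ -r "$log" ] || continue
        # /dev/null as a second file forces grep to prefix the file name.
        grep -n -F -- "$script" "$log" /dev/null
    done
    return 0
}
```

Calling `find_script_mentions /usr/local/custom_script/install.sh` should show the write and ownership-change entries. If nothing later in the logs refers to running the script, the stage that executes `runcmd` probably never started.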
e
In the user-data.txt we can see it has “groups - staff”. When checking this group on the instance, that group has “staff1000:docker”. Does that look accurate? It does look like this is the user getting created to run the script, but we’re unsure why it doesn’t.
c
I would expect it to get run as root, not as staff but I’m not sure
e
Interesting, that’s what was in the user data and we haven’t overwritten anything 🙁
Would you be available for a 30 min zoom session tomorrow?
c
no, sorry. That’s beyond what I’m allowed to do for community support. This is best-effort slack chat only 😉
also I’m a K3s/RKE2 engineer, if you were on with support they’d probably hook you up with someone from the Rancher team since that’s where all the provisioning stuff is handled.
e
No worries, we already have a support ticket opened (we pay for rancher/Suse support)
c
oh. then you should be working with them, they’re going to be way more hands-on than I am 😉
e
Your support has been pretty good 😂😂 thank you 😁
s
Looks like we figured out why the install script wasn’t running. We were only enabling the cloud-init service. If we also enable cloud-init-local, cloud-config, and cloud-final, then it kicks off the install script.
c
that’d do it!
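A sketch of the fix for a VM template, assuming the unit names match the services named above with a `.service` suffix (worth confirming with `systemctl list-unit-files 'cloud-*'` on your image). The function names are illustrative. As far as cloud-init's usual stage layout goes, files are written in an early stage while `runcmd` scripts are executed in the final stage, which would explain why the script was created but never run when only cloud-init was enabled.

```shell
#!/bin/sh
# The four cloud-init stages as systemd units (names assumed from the thread).
CLOUD_INIT_UNITS="cloud-init-local.service cloud-init.service cloud-config.service cloud-final.service"

# enable_cloud_init_units: enable every stage so it runs on next boot.
# Run as root inside the template before converting or cloning it.
enable_cloud_init_units() {
    # Word splitting of the unit list is intentional.
    systemctl enable $CLOUD_INIT_UNITS
}

# report_cloud_init_units: print "unit: state" for each stage.
# The state is empty if systemctl is unavailable.
report_cloud_init_units() {
    for unit in $CLOUD_INIT_UNITS; do
        printf '%s: %s\n' "$unit" "$(systemctl is-enabled "$unit" 2>/dev/null || true)"
    done
}
```

After running `enable_cloud_init_units`, `report_cloud_init_units` should list all four units as `enabled` before the template is saved.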