It looks like those aren’t installed/available...
# rke2
c
It looks like those aren’t installed/available by default, but the VMs seem to be able to ping one another without issue.
h
Does the service start?
systemctl status rke2-server.service
If it's not running?
journalctl -u rke2-server.service
Have you looked at that output?
c
Unfortunately, `systemctl status rke2-server.service` returns that there is no service by that name (could not be found), and `journalctl` similarly reports nothing. I’ll run a quick `find` to see if that service exists to begin with. However, I should ask, should I be checking this on a specific node or should it be on all nodes?
h
Are you creating the RKE2 cluster from the Rancher UI, or downloading the tarball and running it?
The rke2-server service will be on all nodes:
• if they have all roles (etcd/CP/worker)
• if they have the etcd/CP role
👍 1
c
I’m using the Rancher API to create the RKE2 cluster in a Triton datacenter using a custom driver (again, works with Ubuntu just fine).
We use the `/v1/provisioning.cattle.io.clusters` endpoint and pass a JSON payload with all the data needed to create the cluster.
c
If the services aren't present then the install is failing for some reason. Check rancher-system-agent service logs. If that is not present either then the cloud-init userdata script that installs the agent is not running for some reason. Make sure your image supports cloud-init.
👍 1
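The check order described above (rke2-server, then rancher-system-agent, then cloud-init) can be sketched as a small shell helper that walks the chain from the first stage to the last and prints the first unit that is missing. This is only an illustrative sketch: the function names `unit_exists` and `first_missing_stage` are made up for this example, and it assumes a systemd-based image where the units carry the names mentioned in this thread.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, unit names are taken from the thread.

# Return 0 if systemd knows a unit file by this name.
# `systemctl cat` exits non-zero when no unit file can be found.
unit_exists() {
    systemctl cat "$1" >/dev/null 2>&1
}

# Print the first unit in the provisioning chain that is missing, or "none".
# Assumed order: cloud-init runs the userdata script, the script installs
# rancher-system-agent, and the agent then installs rke2-server.
first_missing_stage() {
    for unit in cloud-init.service rancher-system-agent.service rke2-server.service; do
        if ! unit_exists "$unit"; then
            echo "$unit"
            return 0
        fi
    done
    echo "none"
}
```

Calling `first_missing_stage` on the affected VM would point at the stage to dig into; for the stage just before it, `journalctl -u <unit>` is the place to look for errors.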
c
Thank you for the info! I’ll take a look around more and report back
Quick report back: there is no `rancher-system-agent` service, but `cloud-init` seems to have completed running as a service:
Finished cloud-init.service - Initial cloud-init job (metadata service crawler).
However, I do see `sudo sh /tmp/install_script.sh` still running, and I just noticed that `sleep 5` was also in the `ps -ef` output and has the parent PID of the `install_script.sh` process. Looks like `sleep 5` is in the `download_rancher_file` function, so it’s failing to download (or at least validate that it can download) with this curl:
curl $noproxy --connect-timeout 60 --max-time 300 --write-out "%{http_code}\n" ${CURL_BIN_CAFLAG} ${CURL_LOG} -fL "${url}" -o "${CATTLE_AGENT_BIN_PREFIX}/bin/${name}"
I’ll check this out and see if I can find out what it’s doing.
Quick `while true; do ps -ef | grep curl | grep -v grep; done` shows that it’s trying to connect to `"${rancher}/healthz"` and DNS isn’t resolving the domain. Thank you for the tips and being a sounding board while I try to look at and explain stuff, hah. I’ll report back again if I figure out what I need to do to resolve this.
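To confirm a finding like this directly on the node, a check along the following lines may help. It is a rough sketch, not part of the Rancher install script: `url_host` and `check_rancher_reachable` are hypothetical helper names, and the Rancher server URL is whatever the node was given. It first tests name resolution with `getent hosts` and only then requests the `/healthz` path seen above.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Extract the hostname from a URL such as https://host:8443/healthz.
# Does not handle userinfo (user@host) or bracketed IPv6 literals.
url_host() {
    host=${1#*://}     # drop the scheme
    host=${host%%/*}   # drop the path
    host=${host%%:*}   # drop the port
    printf '%s\n' "$host"
}

# Check DNS first, then the health endpoint, so the two failure modes
# are reported separately.
check_rancher_reachable() {
    server_url=$1
    host=$(url_host "$server_url")
    if ! getent hosts "$host" >/dev/null; then
        echo "DNS: cannot resolve $host"
        return 1
    fi
    # -f makes curl fail on HTTP errors; TLS problems also end up here.
    if ! curl -fsS --connect-timeout 10 "$server_url/healthz" >/dev/null; then
        echo "HTTP: request to $server_url/healthz failed"
        return 2
    fi
    echo "ok"
}
```

If the first step reports a DNS failure, the resolver configuration the image receives (for example `/etc/resolv.conf`) is the next thing to compare against the working Ubuntu image.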