b

brainy-tomato-18651

10/10/2022, 3:02 PM
Hi, I’m currently facing an issue with my Rancher (v2.6.5) and I hope you can help me. I deployed an RKE2 cluster (hosted in AWS) and one of my etcd nodes failed. When I connected to the node locally, I found this error:
-- An ExecStart= process belonging to unit rke2-server.service has exited.
--
-- The process' exit code is 'exited' and its exit status is 1.
Oct 10 14:54:03 my-etcd-node1 systemd[1]: *rke2-server.service: Failed with result 'exit-code'.*
-- Subject: Unit failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- The unit rke2-server.service has entered the 'failed' state with result 'exit-code'.
Oct 10 14:54:03 my-etcd-node1 systemd[1]: *Failed to start Rancher Kubernetes Engine v2 (server).*
-- Subject: A start job for unit rke2-server.service has failed
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- A start job for unit rke2-server.service has finished with a failure.
--
-- The job identifier is 68238 and the job result is failed.
Can anybody guide me on the solution?
a

agreeable-oil-87482

10/10/2022, 3:06 PM
What do the logs state?
b

brainy-tomato-18651

10/10/2022, 3:08 PM
mmm I’m not quite sure. How can I get them?
d

damp-engine-97478

10/10/2022, 3:30 PM
Use the Kubernetes dashboard and go to Jobs. There you will see all the running jobs. On the right side of each job there are three dots; click them and you will see the logs. The latest error that occurred will be at the end.
b

brainy-tomato-18651

10/10/2022, 3:36 PM
The CAPI cluster is not available, so I can’t get through to it to view the jobs.
a

agreeable-oil-87482

10/10/2022, 3:37 PM
journalctl -u rke2-server
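The journal for this unit can be long, so it may help to narrow it down to the fatal lines. A minimal sketch, assuming a systemd host where the rke2-server unit exists; fatal_lines is a hypothetical helper name, not part of RKE2:

```shell
# Hypothetical helper: keep only fatal-level lines from log text on stdin.
# "|| true" keeps the exit status at 0 when nothing matches.
fatal_lines() {
  grep -F 'level=fatal' || true
}

# On the node: show the last few fatal lines from the rke2-server journal.
# The guard only skips this step on machines without journalctl.
if command -v journalctl >/dev/null 2>&1; then
  journalctl -u rke2-server --no-pager | fatal_lines | tail -n 5
fi
```

The level=fatal line is usually the one that explains why the unit exited with status 1.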
b

brainy-tomato-18651

10/10/2022, 4:08 PM
This is what I got
Sorry, it’s too long to paste in the chat
Oct 07 21:32:40 my-etcd-node1 sh[489]: + /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service
Oct 07 21:32:40 my-etcd-node1 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)...
Oct 07 21:32:40 my-etcd-node1 sh[494]: Failed to get unit file state for nm-cloud-setup.service: No such file or directory
Oct 07 21:32:42 my-etcd-node1 rke2[548]: time="2022-10-07T21:32:42Z" level=warning msg="not running in CIS mode"
Oct 07 21:32:42 my-etcd-node1 rke2[548]: time="2022-10-07T21:32:42Z" level=fatal msg="failed to parse system-default-registry: registries must be valid RFC 3986 URI authorities: http://34.233.44.173.nip.io/"
Oct 07 21:32:42 my-etcd-node1 systemd[1]: rke2-server.service: Main process exited, code=exited, status=1/FAILURE
Oct 07 21:32:42 my-etcd-node1 systemd[1]: rke2-server.service: Failed with result 'exit-code'.
Oct 07 21:32:42 my-etcd-node1 systemd[1]: Failed to start Rancher Kubernetes Engine v2 (server).
Oct 07 21:32:47 my-etcd-node1 systemd[1]: rke2-server.service: Scheduled restart job, restart counter is at 1.
Oct 07 21:32:47 my-etcd-node1 systemd[1]: Stopped Rancher Kubernetes Engine v2 (server).
Oct 07 21:32:47 my-etcd-node1 systemd[1]: Starting Rancher Kubernetes Engine v2 (server)...
This is what I’ve found
I can’t connect to the cluster through the API
a

agreeable-oil-87482

10/10/2022, 5:50 PM
Looks like you deleted the attachment.
b

brainy-tomato-18651

10/10/2022, 6:00 PM
Yes, sorry, I didn’t realize there was confidential information in it, so I had to delete it. But note that I pasted some important info about the issue I’m having.
a

agreeable-oil-87482

10/10/2022, 6:06 PM
Looks like an invalid parameter for your registry
b

brainy-tomato-18651

10/10/2022, 6:09 PM
Yes, I noticed that it all started when I configured a repository whose URL was an IP address. I then removed that configuration and went back to the default configuration with Docker Hub, but I’m still facing the same issue. I don’t know how to fix it.
a

agreeable-oil-87482

10/10/2022, 6:11 PM
It looks like it's still trying to use your nip.io address. You don't need an http:// or https:// prefix. Just host:port
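To illustrate the host:port point: the fatal message in the log says the value must be a URI authority, which carries no scheme and no path. A rough sketch of that check follows. The names is_bare_registry and registry.example.com are made up for illustration, and this is a simplification, not a full RFC 3986 validation or the check RKE2 itself performs:

```shell
# Hypothetical check: succeed only if the value looks like host or host:port,
# meaning it has no scheme (http://, https://) and no path or trailing slash.
is_bare_registry() {
  case "$1" in
    ''|*://*|*/*) return 1 ;;  # empty, has a scheme, or contains a slash
    *) return 0 ;;
  esac
}

# Placeholder values, one with a scheme and trailing slash, one bare.
for value in 'http://registry.example.com/' 'registry.example.com:5000'; do
  if is_bare_registry "$value"; then
    echo "ok:     $value"
  else
    echo "reject: $value"
  fi
done
```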
b

brainy-tomato-18651

10/10/2022, 6:11 PM
So I missed the port…right?
Because it was 10.x.x.x.nip.io
Something like that
a

agreeable-oil-87482

10/10/2022, 6:12 PM
Looks like you added an http prefix
b

brainy-tomato-18651

10/10/2022, 6:12 PM
Yes. To be honest, at this point I don’t remember exactly, but I believe I entered it with http
a

agreeable-oil-87482

10/10/2022, 6:15 PM
Yeah. You need to remove that
b

brainy-tomato-18651

10/10/2022, 6:16 PM
Yes, I removed it from the Rancher UI, but it seems the etcd node is stuck on that error
a

agreeable-oil-87482

10/10/2022, 6:19 PM
Does /etc/rancher/rke2/config.yaml exist on that node?
b

brainy-tomato-18651

10/10/2022, 6:20 PM
Let me check it
When I cat that file, it says: cat: /etc/rancher/rke2/config.yaml: Is a directory
a

agreeable-oil-87482

10/10/2022, 6:23 PM
Does the registries.yaml list that registry with the http prefix?
b

brainy-tomato-18651

10/10/2022, 6:23 PM
Oh it is there?
Let me check it
This is what I got: {"configs":{},"mirrors":null}
when I did a cat
a

agreeable-oil-87482

10/10/2022, 6:25 PM
Grep that folder for that registry reference
b

brainy-tomato-18651

10/10/2022, 6:26 PM
Sorry, I’m kind of new. How can I do that?
Just grep -rl "string" /etc/rancher/rke2/ ?
Any idea how to fix this? 🤔
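That form of the grep command is the right idea, as long as the quotes are plain ASCII quotes. A minimal sketch of the search, assuming the stale value still contains nip.io as in the log above; find_registry_refs is a hypothetical helper name:

```shell
# Hypothetical helper: list files under a directory that contain a fixed string.
# -r recurses into subdirectories, -l prints only file names,
# -F treats the search text literally (so the dots in nip.io are not wildcards).
find_registry_refs() {
  grep -rlF -- "$1" "$2" 2>/dev/null || true
}

# On the node: search the directory named in the thread for the old address.
# The guard only skips this step on machines where the path does not exist.
if [ -d /etc/rancher/rke2 ]; then
  find_registry_refs 'nip.io' /etc/rancher/rke2
fi
```

Any file it prints is a candidate for still holding the old value with the http:// prefix. Because the search is recursive, it also covers the case where config.yaml turns out to be a directory.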