# rke2
c
did you try to change the token value?
something causing the token to be changed is really the only possible cause of that message.
b
I did not try to change the token. Before this we saw fatal errors like this:
Aug 13 14:25:23 notice k3s: time="2023-08-13T14:25:23Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 13 14:43:34 notice k3s: time="2023-08-13T14:43:34Z" level=fatal msg="leaderelection lost for rke2"
Aug 13 14:50:23 notice k3s: time="2023-08-13T14:50:23Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 13 19:01:33 notice k3s: time="2023-08-13T19:01:33Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 13 20:09:24 notice k3s: time="2023-08-13T20:09:24Z" level=fatal msg="leaderelection lost for rke2"
Aug 13 22:48:29 notice k3s: time="2023-08-13T22:48:29Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 14 00:51:37 notice k3s: time="2023-08-14T00:51:37Z" level=fatal msg="leaderelection lost for rke2"
Aug 14 00:56:27 notice k3s: time="2023-08-14T00:56:27Z" level=fatal msg="clusterrole: EnsureRBACPolicy failed: unable to initialize roles: timed out waiting for the condition"
Aug 14 00:59:40 notice k3s: time="2023-08-14T00:59:40Z" level=fatal msg="clusterrole: EnsureRBACPolicy failed: unable to initialize roles: timed out waiting for the condition"
Aug 14 01:12:03 notice k3s: time="2023-08-14T01:12:03Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 14 02:35:26 notice k3s: time="2023-08-14T02:35:26Z" level=fatal msg="leaderelection lost for rke2-etcd"
Aug 14 04:44:22 notice k3s: time="2023-08-14T04:44:22Z" level=fatal msg="leaderelection lost for rke2"
Aug 14 17:00:44 notice k3s: time="2023-08-14T17:00:44Z" level=fatal msg="leaderelection lost for rke2"
c
why does it say k3s but you’re running rke2?
b
It's an artifact of history... we started on k3s and moved to rke2
c
Something else is going on with this node. The token can be specified in the config; if it is not then it is written to disk in the token file. The token must match the contents of the datastore.
Did you somehow use the same datastore when switching from k3s to rke2?
b
nope, we used SQLite with k3s (which IMHO worked much better for our single-node use case)
A lot of fingers pointing to etcd, which seems very displeased with the disk we have it running on.
c
yeah, the leader election messages would point to the datastore timing out on operations, but that would not account for the errors about the bootstrap data having a different token.
Did someone perhaps try to do some cleanup or move things around and accidentally delete the token file?
b
What should the path to the token file be?
c
if you didn’t specify it in the config, one is generated for you and stored at
/var/lib/rancher/rke2/server/token
b
$ ls -l /data/rancher/rke2/server/token
-rw------- 1 root root 109 Aug 14 04:44 /data/rancher/rke2/server/token
Access: 2023-06-21 17:36:33.832657697 +0000
Modify: 2023-08-14 04:44:43.806223505 +0000
Change: 2023-08-14 04:44:43.806223505 +0000
 Birth: -
c
what happened at 4:44 AM?
sure looks like the file got deleted and a new token generated, that doesn’t match what’s in the datastore.
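A rough sketch of how automation could notice this kind of silent token replacement before restarting the service: record a checksum of the token file once, then compare against it later. The function names and the checksum file location are hypothetical, not part of RKE2; only `sha256sum` and `cut` from coreutils are assumed.

```shell
# Hypothetical helpers, not part of RKE2.

# record_token_sum TOKEN_FILE SUM_FILE
# Store the current SHA-256 of the token file in SUM_FILE.
record_token_sum() {
    sha256sum < "$1" | cut -d' ' -f1 > "$2"
}

# token_unchanged TOKEN_FILE SUM_FILE
# Returns 0 if the token still matches the recorded checksum,
# 1 if it differs, 2 if either file cannot be read.
token_unchanged() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    current=$(sha256sum < "$1" | cut -d' ' -f1)
    [ "$current" = "$(cat "$2")" ]
}

# Example (paths are assumptions based on the data-dir used in this thread):
#   record_token_sum /data/rancher/rke2/server/token /root/rke2-token.sha256
#   token_unchanged  /data/rancher/rke2/server/token /root/rke2-token.sha256 \
#       || echo "token file changed or missing" >&2
```

Keeping the checksum outside the RKE2 data directory means a cleanup job that wipes that directory does not also wipe the reference value.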
b
Checking, but it was probably the Ansible run that drives our k8s deployment tooling.
I'll run the tooling manually and see if it generates a new token. (although I don't think it'll run because rke2 is in a restart loop...)
@creamy-pencil-82913 is there a way to "go back"?
c
having an old copy of the datastore won’t help. The token is used to encrypt the data in the datastore, and that data hasn’t changed. You’ve changed the token on disk, so you can no longer decrypt the bootstrap data.
b
OK. I've run
sudo mv /data/rancher/rke2/server/db /data/rancher/rke2/server/db.old
and that got RKE2 running...
c
you’ll have lost all your old cluster data though
b
Understood, I just need to get the machine back into service. We have automation that sets up rke2 and applies all the k8s manifests. blowing away a single node is not that big of a deal.
Without the prior token there's nothing we could have done anyway? correct?
c
yeah, you need the token. If you’re not specifying it in the config, you should back it up somewhere alongside the snapshots in case you ever want to do a restore.
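A minimal sketch of that backup, assuming the token is not set in the config and lives in the token file. `backup_token` and the backup directory are hypothetical names for illustration; the token path in the example is the one from this thread's custom data-dir.

```shell
# Hypothetical helper, not part of RKE2.

# backup_token TOKEN_FILE BACKUP_DIR
# Copy the server token into BACKUP_DIR under a UTC-timestamped name
# and print the path of the copy. The copy is readable by its owner only.
backup_token() {
    token_file="$1"
    backup_dir="$2"
    if [ ! -r "$token_file" ]; then
        echo "token file not readable: $token_file" >&2
        return 1
    fi
    mkdir -p "$backup_dir" || return 1
    chmod 700 "$backup_dir"
    dest="$backup_dir/token.$(date -u +%Y%m%dT%H%M%SZ)"
    # -p keeps the original mode and timestamps on the copy
    cp -p "$token_file" "$dest" || return 1
    chmod 600 "$dest"
    echo "$dest"
}

# Example (run as root; the destination is an assumption):
#   backup_token /data/rancher/rke2/server/token /root/rke2-backups
```

The copy should end up wherever the etcd snapshots are shipped to, and off the node itself, since a snapshot is of no use without the token that matches it.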