# general
c
what kubernetes distro? what is the specific error you’re getting?
h
v1.26.7+rke2r1
Error from agent nodes: Oct 04 13:49:40 [hostnameremoved] rke2[2949459]: time="2024-10-04T13:49:40-07:00" level=error msg="CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
At several points while attempting to solve this, we used the script referenced in some of the help docs. As it gets toward the end and makes the HTTP request, it gets stuck with a 401 Unauthorized
c
did you update the token in the node config?
h
No
c
https://docs.rke2.io/security/certificates
If a new root CA is required, the rotation will be disruptive. The rke2 certificate rotate-ca --force option must be used, all nodes (servers and agents) will need to be reconfigured to use the new token value, and pods will need to be restarted to trust the new root CA.
Since you’re getting an error about an untrusted CA, I’m assuming you broke the root of trust and the CA hash has changed.
l
Yes, it appears that way
c
that is also called out later on that page
If you used the --force option or changed the root CA, ensure that any nodes that were joined with a secure token are reconfigured to use the new token value, prior to being restarted. The token may be stored in a .env file, systemd unit, or config.yaml, depending on how the node was configured during initial installation.
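For illustration, a minimal sketch of what updating the token on an agent might look like, assuming the default config location and placeholder values (the actual token is whatever the rotation script printed):
Copy code
# /etc/rancher/rke2/config.yaml on an agent (illustrative placeholders)
server: https://my-server-hostname:9345
token: <new token value printed by the rotation script>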
l
I really appreciate it. Just to clarify, we need to run that on the agents as well?
c
any nodes
l
This is valuable information!
c
yes, that's why it's in the docs lol
h
@creamy-pencil-82913 Boris and Aaron are my colleagues thank you for engaging with us on this.
c
everything should go a lot more smoothly if you’re using the same root CA. If you’re changing the root CA then things are much more complicated.
w
When running that on the agents we see this
Copy code
# rke2 certificate rotate-ca --path /var/lib/rancher/rke2
FATA[0000] open /var/lib/rancher/rke2/server/token: no such file or directory
do we need to copy that from the main server?
c
what?
Why are you rotating again?
You need to reconfigure them to update the token. Not rotate again.
Please go sit down and re-read that page
1. Pick a server node to rotate on
2. Run the script to generate new certs
3. Run the rotate-ca command to load the new certs into the datastore
4. Update the token on ALL the nodes to include the new token value that the script printed
5. Restart the service on ALL the nodes, servers first, then agents
Also, that page specifically tells you to use a temp dir to hold the new certs and not overwrite the stuff in /var/lib/rancher/rke2; you should never find yourself running the rotate-ca command against the current data dir
👍 1
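As a rough sketch of those steps, using the commands quoted from the docs later in this thread and assuming the default systemd unit names:
Copy code
# 1-3. On one server: generate new cross-signed certs in a temp dir, then load them into the datastore
curl -sL https://github.com/k3s-io/k3s/raw/master/contrib/util/rotate-default-ca-certs.sh | PRODUCT=rke2 bash -
rke2 certificate rotate-ca --path=/var/lib/rancher/rke2/server/rotate-ca

# 4. On ALL nodes: update the token (e.g. in /etc/rancher/rke2/config.yaml) to the value the script printed

# 5. Restart the service on ALL nodes, servers first, then agents
systemctl restart rke2-server   # on server nodes
systemctl restart rke2-agent    # on agent nodes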
w
Sorry, I've read the page several times. We did verify that the token in the agents' /etc/rancher/rke2/config.yaml matches the end portion of the value on the server in /var/lib/rancher/rke2/server/token
when we try to run ANY rke2 certificate rotate command on an agent it gives this error
Copy code
# rke2 certificate rotate
FATA[0000] open /var/lib/rancher/rke2/server/token: no such file or directory
That's why we thought it was only run on the server and not the agents
our server starts, our agents won't
we rotated because our certs were expired.
c
v1.26 is pretty old
w
I was wondering about updating...
l
hahaha
it is pretty old
c
you don’t need to do a rotate. You just need to update the token and then restart.
There was a bug where certificate rotate would fail on agents because it was trying to rotate files that only exist on the server, but that is long fixed
l
I updated the token in config.yaml on the agent node, which leads me to a question. Do I use the node-token in the rke2-server?
c
You’re not even on the last patch for 1.26.x, you might try at least getting on the latest patch release for that minor.
❤️ 1
l
sorry it has been a long day
c
If you didn’t start the server with a --agent-token value then there is not a separate agent-only token, you’d just want to use the server token for joining both servers and agents.
they’re all the same thing if you didn’t set up a separate agent token value
Copy code
root@rke2-server-1:/# ls -la /var/lib/rancher/rke2/server/*token 
lrwxrwxrwx 1 root root  34 Oct  4 20:23 /var/lib/rancher/rke2/server/agent-token -> /var/lib/rancher/rke2/server/token
lrwxrwxrwx 1 root root  34 Oct  4 20:23 /var/lib/rancher/rke2/server/node-token -> /var/lib/rancher/rke2/server/token
-rw------- 1 root root 109 Oct  4 20:23 /var/lib/rancher/rke2/server/token
l
I saw that 🙂. I see some people use just the last part of the token (first:middle:last)
is that correct, or should I use the entire length of the token?
c
We haven’t migrated this content over to the rke2 docs yet, but you should read https://docs.k3s.io/cli/token#token-format
since it’s the secure token TLS bootstrapping process that you’re running into problems with
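Roughly, per that k3s page, the full secure token embeds a hash of the cluster CA certificate, which is why an agent holding a token with the old CA hash fails bootstrap after the root CA changes; a sketch of the two formats:
Copy code
# Full ("secure") format: K10 prefix, hash of the cluster CA cert, then credentials
K10<CA-CERT-HASH>::<USERNAME>:<PASSWORD>
# Short format: just the password portion; skips the CA hash check during bootstrap
<PASSWORD>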
w
ya, after validating we're still getting the same error
Copy code
Oct 04 14:52:03  rke2[3223401]: time="2024-10-04T14:52:03-07:00" level=error msg="CA cert validation failed: Get \"https://127.0.0.1:6444/cacerts\": tls: failed to verify certificate: x509: certificate signed by unknown authority"
we updated the agent nodes to use the entire token and restarted it, but it gives the same error still
We think it may actually have something to do with the rke2-ingress-nginx-controller terminating SSL with a different cert
c
it shouldn’t, no. ingress has nothing to do with the connection to the supervisor or apiserver. It is not in that path in any way.
👍 1
Have you restarted all the servers already?
On the agent, what do you get from:
Copy code
curl -ks https://SERVER:9345/cacerts | openssl x509 -noout -text
echo QUIT | openssl s_client -connect SERVER:9345 | openssl x509 -noout -text
where SERVER is the host you’re using as the server: address in the agent config
l
you mean the actual servers correct?
not just the agent service?
for restart?
c
by restart the servers, I mean restart the rke2-server service on the server nodes
you know which nodes are servers and which are agents right?
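Concretely, a hedged sketch of that restart, assuming the standard unit names:
Copy code
# On each server node first
systemctl restart rke2-server
# Then on each agent node
systemctl restart rke2-agent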
h
you know which nodes are servers and which are agents right?
Yes, we are good with this one...
w
Copy code
$ echo QUIT | openssl s_client -connect `hostname`:9345 | openssl x509 -noout -text
depth=1 CN = rke2-server-ca@1727823995
Copy code
$ curl -ks https://`hostname`:9345/cacerts | openssl x509 -noout -text
        Subject: CN = rke2-server-ca@1728070064
l
we only have one server node and the rest are agent nodes
c
Copy code
Issuer: CN = rke2-server-ca@1727823995
        Validity
            Not Before: Oct  1 23:06:35 2024 GMT
            Not After : Oct  2 00:12:11 2025 GMT
        Subject: O = rke2, CN = rke2


        Issuer: CN = rke2-server-ca@1728070064
        Validity
            Not Before: Oct  4 19:27:44 2024 GMT
            Not After : Oct  2 19:27:44 2034 GMT
        Subject: CN = rke2-server-ca@1728070064
The server cert isn’t signed by the new cluster CA, it’s still signed by the old one.
w
that's why I thought it was the loadbalancer
i.e. rke2-ingress-controller
l
earlier, when I ran the script to rotate, it ran perfectly, except when I ran rke2 certificate rotate-ca --path=/home/mydir/oct4certs/rotate-ca --force
c
the ingress runs on ports 80 and 443. the supervisor and apiserver are on 9345 and 6443. They are completely unrelated.
🙌 1
👍 1
Did you put an external load-balancer in front of port 9345?
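If in doubt, a quick way to confirm that the two listeners present different certs (ports as described above; SERVER is a placeholder for your server host), mirroring the openssl commands used earlier in the thread:
Copy code
# Supervisor cert on 9345, signed by the cluster CA
echo QUIT | openssl s_client -connect SERVER:9345 2>/dev/null | openssl x509 -noout -subject -issuer
# Ingress-terminated cert on 443, unrelated to the supervisor CA
echo QUIT | openssl s_client -connect SERVER:443 2>/dev/null | openssl x509 -noout -subject -issuer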
l
it gave me a 401 error for the IP:port/cacerts address
w
I don't think we placed anything in front
c
So you did the rotate-ca, did you restart the rke2-server service after that?
w
yes
several times
h
We will restart it again as we follow this path explicitly. Just to be on the safe side.
c
ok. This may be something that we fixed later on, the version you’re on is pretty old. But try doing this on the server:
Copy code
rm /var/lib/rancher/rke2/server/tls/dynamic-cert.json; kubectl delete secret -n kube-system rke2-serving
then restart the rke2-server service
then run those two openssl commands again and see if the CAs match
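Put together, a sketch of that sequence on the server, reusing the openssl checks from earlier (hostname and port assumed to be the defaults):
Copy code
# Drop the cached serving cert so it is re-issued from the CA stored in the datastore
rm /var/lib/rancher/rke2/server/tls/dynamic-cert.json
kubectl delete secret -n kube-system rke2-serving
systemctl restart rke2-server

# Compare the cert presented on 9345 against the cluster CA served at /cacerts
echo QUIT | openssl s_client -connect `hostname`:9345 | openssl x509 -noout -issuer
curl -ks https://`hostname`:9345/cacerts | openssl x509 -noout -subject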
l
thank you for that. The initial command worked like a charm, but when I tried restarting the server service it threw the following:
level=fatal msg="/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt, /var/lib/rancher/rke2/server/tls/service.key, /var/lib/rancher/rke2/server/tls/etcd/peer-ca.key, /var/lib/rancher/rke2/server/tls/server-ca.crt, /var/lib/rancher/rke2/server/tls/server-ca.key, /var/lib/rancher/rke2/server/tls/client-ca.crt, /var/lib/rancher/rke2/server/tls/client-ca.key, /var/lib/rancher/rke2/server/tls/etcd/server-ca.key, /var/lib/rancher/rke2/server/tls/request-header-ca.crt, /var/lib/rancher/rke2/server/tls/request-header-ca.key, /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt newer than datastore and could cause a cluster outage. Remove the file(s) from disk and restart to be recreated from datastore."
it failed the restart and I found that error in /var/log
c
yeah, that's because you ran the script pointed at the existing data dir and overwrote the files
l
ohh
c
you’re supposed to generate them in a temp dir and then run the rotate-ca command to load them in
so now you gotta go clean those files up and let it extract them from the datastore again
l
ok, could I solve that by removing the tls dir?
c
just delete them and restart it, it should be ok
l
ok, that's what I thought
I appreciate that!
c
maybe just rename the tls dir instead of deleting it
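A minimal sketch of that cleanup, assuming the default data dir (the backup name is arbitrary):
Copy code
# Move the overwritten certs aside so rke2 re-extracts the rotated ones from the datastore
mv /var/lib/rancher/rke2/server/tls /var/lib/rancher/rke2/server/tls.bak
systemctl restart rke2-server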
w
Per https://docs.rke2.io/security/certificates should we try this?
Copy code
# Create updated CA certs and keys, cross-signed by the current CAs.
# This script will create a new temporary directory containing the updated certs, and output the new token values.
curl -sL https://github.com/k3s-io/k3s/raw/master/contrib/util/rotate-default-ca-certs.sh | PRODUCT=rke2 bash -

# Load the updated certs into the datastore; see the script output for the updated token values.
rke2 certificate rotate-ca --path=/var/lib/rancher/rke2/server/rotate-ca
c
… isn't that what you already did?
that's how you got here in the first place, right? https://rancher-users.slack.com/archives/C3ASABBD1/p1728075981005419?thread_ts=1728073415.718259&cid=C3ASABBD1 That’s the same script
Just move that dir out of the way so it can re-extract the certs that you already updated from the datastore out to disk again.