03/24/2023, 9:47 AM
Hello guys 🙂, I have a problem changing the TLS settings on my downstream cluster. We run Rancher as a single Docker container, with the embedded K3s as the local cluster. Information:
Rancher server: 2.7.1
RKE2: v1.24.9+rke2r2 (2f4571a879954e1ea8d4560023eaf57c567df737)
Go: go1.18.7b7
Kubernetes: v1.24.9+rke2r2
OS: Ubuntu 22.04 LTS
Description: I replaced the certificates on the Rancher Docker container, moving from self-signed certificates to ones issued by a custom CA. I followed the documentation for step "4. Reconfigure Rancher agents to trust the private CA" and chose method 2, injecting the custom CA checksum into the Rancher deployments/daemonsets. For a reason I don't understand, there is no […] in the cattle-system namespace, so I didn't edit the Rancher agent, only the […]. I then restarted my Rancher Docker container with the new certificates bind-mounted into it. What happened? The cluster went into the 'Updating' state, and the master nodes are in state […] with the message "Waiting for plan to be applied". Here is the YAML status of the […] kind for my stuck master node:
    - lastTransitionTime: '2022-10-14T09:53:34Z'
      status: 'True'
      type: Ready
    - lastTransitionTime: '2022-10-14T09:53:32Z'
      status: 'True'
      type: BootstrapReady
    - lastTransitionTime: '2022-10-14T09:56:29Z'
      status: 'True'
      type: InfrastructureReady
    - lastTransitionTime: '2023-03-01T14:13:52Z'
      status: 'True'
      type: NodeHealthy
    - lastTransitionTime: '2022-10-14T09:53:38Z'
      status: 'True'
      type: PlanApplied
    - lastTransitionTime: '2023-03-24T09:14:48Z'
      message: waiting for plan to be applied
      reason: Waiting
      status: Unknown
      type: Reconciled
  infrastructureReady: true
  lastUpdated: '2022-10-14T09:57:38Z'
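To see at a glance which condition is blocking, it helps to filter the status for conditions that are not True. A minimal sketch in Python, assuming the conditions list has already been loaded into dictionaries (for example with a YAML parser, not shown); the sample data is copied from the status above without timestamps, and unhealthy_conditions is a hypothetical helper name.

```python
from typing import Any

# Conditions copied from the status above (lastTransitionTime fields omitted).
CONDITIONS: list[dict[str, Any]] = [
    {"type": "Ready", "status": "True"},
    {"type": "BootstrapReady", "status": "True"},
    {"type": "InfrastructureReady", "status": "True"},
    {"type": "NodeHealthy", "status": "True"},
    {"type": "PlanApplied", "status": "True"},
    {
        "type": "Reconciled",
        "status": "Unknown",
        "reason": "Waiting",
        "message": "waiting for plan to be applied",
    },
]


def unhealthy_conditions(conditions: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Return (type, status, message) for every condition whose status is not "True"."""
    return [
        (c["type"], c["status"], c.get("message", ""))
        for c in conditions
        if c.get("status") != "True"
    ]


if __name__ == "__main__":
    for ctype, status, message in unhealthy_conditions(CONDITIONS):
        print(f"{ctype}: {status} ({message})")
    # prints "Reconciled: Unknown (waiting for plan to be applied)"
```

For this node only Reconciled is not True, while PlanApplied last transitioned on 2022-10-14. That fits the reading that the node itself is healthy but has not applied the newer plan.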
So I dug into the machine logs but found nothing in […]; it was empty for all Rancher/RKE services. So I decided to run rancher-system-agent sentinel by hand, and here we go:
rancher-system-agent sentinel
INFO[0000] Rancher System Agent version v0.2.13 (4fa9427) is starting 
INFO[0000] Using directory /var/lib/rancher/agent/work for work 
INFO[0000] Starting remote watch of plans               
INFO[0000] Initial connection to Kubernetes cluster failed with error Get "<>": x509: certificate signed by unknown authority, removing CA data and trying again 
panic: error while connecting to Kubernetes cluster with nullified CA data: Get "<>": x509: certificate signed by unknown authority


goroutine 10 [running]:
…(*watcher).start(0xc0002be280, {0x18bd5c0?, 0xc0002b8740})
    /go/src/ +0x9b4
created by …
    /go/src/ +0x155
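The log lines describe a two-step fallback: the agent first connects with the CA data it was configured with, and when that fails with an x509 error it drops the CA data and retries. With no CA data, a Go TLS client presumably falls back to the host's system trust store, which would be consistent with the workaround of adding the CA to the node's trust store. The sketch below is a simplified Python model of that control flow, not the agent's actual Go code; TLSError, connect_with_fallback, fake_connect and the "old-ca"/"new-ca" labels are all hypothetical, and it assumes the agent still holds the old CA.

```python
from typing import Callable, Optional


class TLSError(Exception):
    """Stand-in for Go's x509 'certificate signed by unknown authority' error."""


def connect_with_fallback(
    connect: Callable[[Optional[str]], str], configured_ca: Optional[str]
) -> str:
    """Simplified, hypothetical model of the fallback described by the log lines.

    1. Try with the CA data the agent was configured with.
    2. On a TLS error, retry with no CA data (assumed: system trust store).
    3. If that fails too, give up; the real agent panics at this point.
    """
    try:
        return connect(configured_ca)
    except TLSError:
        pass  # the real agent logs the failure here and removes the CA data
    try:
        return connect(None)
    except TLSError as err:
        raise RuntimeError("connection failed with nullified CA data") from err


def fake_connect(server_ca: str, trusted_by_system: bool) -> Callable[[Optional[str]], str]:
    """Hypothetical server stub: accepts the matching CA, or no CA if the OS trusts it."""

    def connect(ca: Optional[str]) -> str:
        if ca == server_ca:
            return "connected with configured CA"
        if ca is None and trusted_by_system:
            return "connected with system trust store"
        raise TLSError("certificate signed by unknown authority")

    return connect


if __name__ == "__main__":
    # Agent still holds the old CA; the server now presents a cert from the new one.
    try:
        connect_with_fallback(fake_connect("new-ca", trusted_by_system=False), "old-ca")
    except RuntimeError as err:
        print("panic:", err)
    # Same situation after adding the new CA to the node's trust store.
    print(connect_with_fallback(fake_connect("new-ca", trusted_by_system=True), "old-ca"))
```

In this model the trust-store workaround only rescues the second attempt; the agent's configured CA data stays stale, which is why fresh nodes still hit the same error.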
So this means my custom CA certificate didn't propagate to the host. As a workaround, I added the certificate to the node's local OpenSSL trust store, and after that it worked fine. But every time I scale up with a new node, the RKE2 install fails. It also triggers a rollout of the […] deployment with the old certificate checksum in the […] variable. Do you have any advice or tips? Thank you so muuuch 🙂
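Method 2 works by putting a checksum of the CA certificate into the agents' environment, so a rollout that carries the old value will keep trusting the old CA. A minimal Python sketch for spotting a stale value, assuming the checksum is the hex SHA-256 digest of the CA PEM and that the variable is CATTLE_CA_CHECKSUM; the exact bytes Rancher hashes (for example trailing newlines) should be checked against the Rancher documentation. The helper names, the container name "agent" and the placeholder PEM are hypothetical.

```python
import hashlib
from typing import Any, Optional

# Assumption: the env var Rancher agents read the CA checksum from.
CHECKSUM_ENV = "CATTLE_CA_CHECKSUM"


def ca_checksum(ca_pem: str) -> str:
    """Hex SHA-256 digest of the CA PEM text, byte for byte (whitespace matters)."""
    return hashlib.sha256(ca_pem.encode("utf-8")).hexdigest()


def checksum_in_deployment(
    deployment: dict[str, Any], env_name: str = CHECKSUM_ENV
) -> Optional[str]:
    """Return the value of env_name from the first container that sets it, else None."""
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env.get("name") == env_name:
                return env.get("value")
    return None


def checksum_is_stale(deployment: dict[str, Any], ca_pem: str) -> bool:
    """True when the deployment's checksum does not match the CA agents should trust."""
    return checksum_in_deployment(deployment) != ca_checksum(ca_pem)


if __name__ == "__main__":
    # Placeholder PEM, not a real certificate.
    new_ca = "-----BEGIN CERTIFICATE-----\nPLACEHOLDER\n-----END CERTIFICATE-----\n"
    deployment = {
        "spec": {"template": {"spec": {"containers": [
            {"name": "agent", "env": [{"name": CHECKSUM_ENV, "value": "old-checksum"}]}
        ]}}}
    }
    print(checksum_is_stale(deployment, new_ca))  # True: still carries the old value
```

Because the digest is taken over raw bytes, two PEM files that differ only in a trailing newline produce different checksums, so compute it from the same source Rancher serves.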


03/24/2023, 11:41 AM
First, you don’t have the node agent daemonset because you’re using RKE2 for your downstream clusters; the node agent is only used with RKE. Second, step 4 appears to be designed to get the agents to reconnect and pick up the new deployment from Rancher. That deployment should contain the new CA checksum, but it sounds like it does not.
I know that doesn’t sound very helpful, but inspecting the secrets and Rancher settings will hopefully tell you where the problem is.
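Kubernetes stores Secret values base64-encoded under data, so checking whether a Secret holds the certificate you expect means decoding it first. A minimal Python sketch, assuming the Secret was exported as JSON (for example with kubectl get secret -o json) and that the certificate sits under the tls.crt key, the usual key for kubernetes.io/tls Secrets; which Rancher secret to inspect is left open, and secret_value, normalize_pem and same_pem are hypothetical helpers.

```python
import base64
import json
from typing import Any


def secret_value(secret: dict[str, Any], key: str = "tls.crt") -> str:
    """Decode one base64-encoded entry from a Secret's data map."""
    return base64.b64decode(secret["data"][key]).decode("utf-8")


def normalize_pem(pem: str) -> str:
    """Drop surrounding and per-line whitespace so formatting differences are ignored."""
    return "\n".join(line.strip() for line in pem.strip().splitlines())


def same_pem(a: str, b: str) -> bool:
    """Compare two PEM texts while ignoring whitespace and line-ending differences."""
    return normalize_pem(a) == normalize_pem(b)


if __name__ == "__main__":
    # Placeholder PEM, not a real certificate.
    expected = "-----BEGIN CERTIFICATE-----\nPLACEHOLDER\n-----END CERTIFICATE-----\n"
    # Shape of `kubectl get secret <name> -o json` output, trimmed to the relevant field.
    raw = json.dumps({"data": {"tls.crt": base64.b64encode(expected.encode()).decode()}})
    secret = json.loads(raw)
    print(same_pem(secret_value(secret), expected))  # True
```

The whitespace-insensitive comparison is deliberate for checking "is this the same certificate"; a checksum, by contrast, changes with any byte difference.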


03/24/2023, 12:36 PM
Thanks for your answer 🙂 I'll check my secrets in the cattle-system namespace. Do you know whether the […] secrets are relevant? Otherwise, I've already set up […] with the new CA certificate and […] with the new key/cert.