gifted-agent-35161
03/24/2023, 9:47 AM
rancher/server : 2.7.1
rke2 : v1.24.9+rke2r2 (2f4571a879954e1ea8d4560023eaf57c567df737)
go : version go1.18.7b7
k8s : v1.24.9+rke2r2
OS : Ubuntu 22.04 LTS
Description :
I changed the certificates on my Docker Rancher container from self-signed to ones issued by a custom CA, following the documentation here: https://ranchermanager.docs.rancher.com/getting-started/installation-and-upgrade/resources/update-rancher-certificate
For step 4, "Reconfigure Rancher agents to trust the private CA", I chose method 2: injecting the custom CA checksum into the Rancher agent deployments/daemonsets.
For a reason I don't understand, there is no daemonset/cattle-node-agent in the cattle-system namespace, so I did not edit the node agent, only the cattle-cluster-agent deployment.
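For reference, a minimal sketch of how the checksum for method 2 can be computed, assuming CATTLE_CA_CHECKSUM is the SHA-256 hex digest of the CA certificate file exactly as served by Rancher (this mirrors a plain `sha256sum` on that file). The function names and the `cacert.pem` path are hypothetical.

```python
import hashlib


def ca_checksum(pem_bytes: bytes) -> str:
    """Return the SHA-256 hex digest of the raw CA certificate bytes.

    Assumption: the agent compares this digest against CATTLE_CA_CHECKSUM.
    The bytes must match the served file exactly, including trailing newlines.
    """
    return hashlib.sha256(pem_bytes).hexdigest()


def ca_checksum_from_file(path: str) -> str:
    """Read a PEM file in binary mode and return its SHA-256 hex digest."""
    with open(path, "rb") as handle:
        return ca_checksum(handle.read())
```

Calling `ca_checksum_from_file("cacert.pem")` and comparing the result with the value currently set on the cattle-cluster-agent deployment shows whether the deployment still carries the old checksum.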
Afterwards I restarted my Rancher Docker container with the new certificates bound to it.
What happened?
The cluster went into the 'Updating' state, and the master nodes are in state Reconciling, with the message "Waiting for plan to be applied".
Here is the YAML status of the Machine object for my stuck master node:
conditions:
  - lastTransitionTime: '2022-10-14T09:53:34Z'
    status: 'True'
    type: Ready
  - lastTransitionTime: '2022-10-14T09:53:32Z'
    status: 'True'
    type: BootstrapReady
  - lastTransitionTime: '2022-10-14T09:56:29Z'
    status: 'True'
    type: InfrastructureReady
  - lastTransitionTime: '2023-03-01T14:13:52Z'
    status: 'True'
    type: NodeHealthy
  - lastTransitionTime: '2022-10-14T09:53:38Z'
    status: 'True'
    type: PlanApplied
  - lastTransitionTime: '2023-03-24T09:14:48Z'
    message: waiting for plan to be applied
    reason: Waiting
    status: Unknown
    type: Reconciled
infrastructureReady: true
lastUpdated: '2022-10-14T09:57:38Z'
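To spot the stuck condition quickly in a status like this, here is a small sketch that filters out every condition whose status is not 'True'. It assumes the status has already been loaded into a plain dict (for example from `kubectl ... -o json`); the helper name `pending_conditions` is hypothetical and the sample data is trimmed from the status above.

```python
from typing import Any, Dict, List


def pending_conditions(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the conditions whose status is anything other than 'True'."""
    return [c for c in status.get("conditions", []) if c.get("status") != "True"]


# Trimmed sample taken from the Machine status shown above.
status = {
    "conditions": [
        {"lastTransitionTime": "2022-10-14T09:53:34Z", "status": "True", "type": "Ready"},
        {
            "lastTransitionTime": "2023-03-24T09:14:48Z",
            "message": "waiting for plan to be applied",
            "reason": "Waiting",
            "status": "Unknown",
            "type": "Reconciled",
        },
    ],
    "infrastructureReady": True,
}

for condition in pending_conditions(status):
    print(condition["type"], condition["status"], condition.get("message", ""))
```

Only the Reconciled condition is reported here, which matches the "Waiting for plan to be applied" message shown in the UI.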
So I dug into the machine's logs and found nothing in journalctl: it was empty for all the Rancher / RKE services. So I decided to launch rancher-system-agent sentinel by hand, and here we go:
rancher-system-agent sentinel
INFO[0000] Rancher System Agent version v0.2.13 (4fa9427) is starting
INFO[0000] Using directory /var/lib/rancher/agent/work for work
INFO[0000] Starting remote watch of plans
INFO[0000] Initial connection to Kubernetes cluster failed with error Get "https://192.168.10.203/version": x509: certificate signed by unknown authority, removing CA data and trying again
panic: error while connecting to Kubernetes cluster with nullified CA data: Get "https://192.168.10.203/version": x509: certificate signed by unknown authority
goroutine 10 [running]:
github.com/rancher/system-agent/pkg/k8splan.(*watcher).start(0xc0002be280, {0x18bd5c0?, 0xc0002b8740})
/go/src/github.com/rancher/system-agent/pkg/k8splan/watcher.go:99 +0x9b4
created by github.com/rancher/system-agent/pkg/k8splan.Watch
/go/src/github.com/rancher/system-agent/pkg/k8splan/watcher.go:63 +0x155
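The same trust failure can be reproduced from the node without the agent. Below is a rough sketch using only the Python standard library; it is not what rancher-system-agent does internally, and the function name `tls_verify_error` and the CA file path in the usage note are hypothetical.

```python
import socket
import ssl
from typing import Optional


def tls_verify_error(
    host: str,
    port: int = 443,
    cafile: Optional[str] = None,
    timeout: float = 5.0,
) -> Optional[str]:
    """Attempt a verified TLS handshake and report why verification failed.

    Returns None when the certificate chain is trusted, otherwise the
    verification message. With cafile=None the system trust store is used.
    Non-TLS failures (refused connection, timeout) are raised as OSError.
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host):
                return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message
```

Running `tls_verify_error("192.168.10.203")` once with the system trust store and once with `cafile="/path/to/custom-ca.pem"` shows whether the host trusts the Rancher endpoint only when the custom CA is supplied explicitly.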
So my custom CA certificate had not propagated to the host. As a "workaround", I added the certificate to the node's local OpenSSL trust store, and it worked fine afterwards.
But every time I try to scale up with a new node, it fails to install rke2. It also triggers a rollout of the cattle-cluster-agent deployment with the old certificate checksum in the CATTLE_CA_CHECKSUM variable.
Do you have any advice or tips? Thank you so muuuch 🙂
big-hydrogen-97240
03/24/2023, 11:41 AM
gifted-agent-35161
03/24/2023, 12:36 PM
tls-rancher and tls-rancher-internal-ca are relevant? Otherwise, I've already set up tls-ca with the new CA certificate and tls-rancher-ingress with the new key/cert.