# rke2
p
Hi. I have RKE2 1.31.3 running provisioned by Rancher. On one of the nodes I see
Waiting for probes: kube-controller-manager, kube-scheduler
but rke2-server seems to be functioning fine
Here are the logs from kube-scheduler, for example
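(Note: the two components named in the "Waiting for probes" message expose local healthz endpoints that the probes check. Assuming the default secure ports for kube-controller-manager and kube-scheduler, a rough way to test them by hand on the affected node is:)

```shell
# Hit the local healthz endpoints directly (default ports 10257/10259;
# adjust if the cluster uses custom component arguments).
curl -sk https://127.0.0.1:10257/healthz; echo   # kube-controller-manager
curl -sk https://127.0.0.1:10259/healthz; echo   # kube-scheduler
```

If these return `ok` but the probes still report waiting, the problem is more likely on the agent/probe side than in the components themselves.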
c
check the rancher-system-agent logs on the node to see what errors the probes are getting
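(For reference, on a Rancher-provisioned RKE2 node the agent runs as a systemd unit, so its logs can be read with journalctl:)

```shell
# Follow the rancher-system-agent logs live on the node
journalctl -u rancher-system-agent -f

# Or dump the last 200 lines without paging
journalctl -u rancher-system-agent --no-pager -n 200
```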
p
@creamy-pencil-82913 here are the logs for the rancher-system-agent service
c
There's only 6 seconds of logs here
did you just restart the service or something?
p
No, a few hours ago. That's all the logs I see after restarting the service
c
I suspect you’re looking at the wrong node then
p
The logs show the master-1 node, exactly the same node where I have this issue
c
in what logs
c
right but how did you map the machine
custom-c5855…
to host
cluster-master-1
? I really suspect you’re on the wrong machine. Have you checked the other ones?
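(One way to map CAPI machine names to actual node names is to list the Machine objects on the Rancher management cluster; this is a sketch assuming the cluster lives in the `fleet-default` namespace:)

```shell
# List machines with the node each one is bound to (status.nodeRef)
kubectl -n fleet-default get machines.cluster.x-k8s.io \
  -o custom-columns='MACHINE:.metadata.name,NODE:.status.nodeRef.name,PHASE:.status.phase'
```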
p
Yes. In the table I clearly see that the node name is cluster-master-1
In Cluster Management, when I open the YAML for this node, I see the following conditions
phase: Running
  v1beta2:
    conditions:
      - lastTransitionTime: '2025-09-20T09:25:49Z'
        message: ''
        observedGeneration: 3
        reason: Available
        status: 'True'
        type: Available
      - lastTransitionTime: '2025-09-20T09:25:49Z'
        message: ''
        observedGeneration: 3
        reason: Ready
        status: 'True'
        type: Ready
      - lastTransitionTime: '2025-05-03T06:59:35Z'
        message: ''
        observedGeneration: 3
        reason: Ready
        status: 'True'
        type: BootstrapConfigReady
      - lastTransitionTime: '2025-05-03T06:59:35Z'
        message: ''
        observedGeneration: 3
        reason: Ready
        status: 'True'
        type: InfrastructureReady
      - lastTransitionTime: '2025-09-20T09:25:49Z'
        message: ''
        observedGeneration: 3
        reason: NodeHealthy
        status: 'True'
        type: NodeHealthy
      - lastTransitionTime: '2025-09-20T09:25:49Z'
        message: ''
        observedGeneration: 3
        reason: NodeReady
        status: 'True'
        type: NodeReady
      - lastTransitionTime: '2025-05-03T06:59:35Z'
        message: ''
        observedGeneration: 3
        reason: NotPaused
        status: 'False'
        type: Paused
      - lastTransitionTime: '2025-05-03T06:59:35Z'
        message: ''
        observedGeneration: 3
        reason: NotDeleting
        status: 'False'
        type: Deleting
Also, when I try to perform any action on the cluster, I receive this error
Internal error occurred: failed calling webhook "rancher.cattle.io.clusters.provisioning.cattle.io": failed to call webhook: an error on the server ("{\"kind\":\"AdmissionReview\",\"apiVersion\":\"admission.k8s.io/v1\",\"request\":{\"uid\":\"f202f69b-38be-437b-8975-2abdeb8d3221\",\"kind\":{\"group\":\"provisioning.cattle.io\",\"version\":\"v1\",\"kind\":\"Cluster\"},\"resource\":{\"group\":\"provisioning.cattle.io\",\"version\":\"v1\",\"resource\":\"clusters\"},\"requestKind\":{\"group\":\"provisioning.cattle.io\",\"version\":\"v1\",\"kind\":\"Cluster\"},\"requestResource\":{\"group\":\"provisioning.cattle.io\",\"version\":\"v1\",\"resource\":\"clusters\"},\"name\":\"cluster\",\"namespace\":\"fleet-default\",\"operation\":\"UPDATE\",\"userInfo\":{\"username\":\"user-9626r\",\"groups\":[\"system:authenticated\",\"system:cattle:authenticated\"],\"extra\":{\"principalid\":[\"local://user-9626r\"],\"requesthost\":[\"rancher.cluster.com\"],\"requesttokenid\":[\"token-kzst9\"],\"username\":[\"admin\"]}},\"object\":{\"apiVersion\":\"provisioning.cattle.io/v1\",\"kind\":\"Cluster\",\"metadata\":{\"annotations\":{\"provisioning.cattle.io/management-cluster-display-name\":\"cluster\"},\"creationTimestamp\":null,\"managedFields\":[{\"apiVersion\":\"provisioning.cattle.io/v1\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:spec\":{\"f:rkeConfig\":{\"f:machinePoolDefaults\":{}}}},\"manager\":\"rancher-v2.8.1-secret-migrator\",\"operation\":\"Update\",\"time\":\"2024-01-25T22:12:06Z\"},{\"apiVersion\":\"provisioning.cattle.io/v1\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:status\":{\".\":{},\"f:agentDeployed\":{},\"f:clientSecretName\":{},\"f:clusterName\":{},\"f:conditions\":{\".\":{},\"k:{\\\"type\\\":\\\"Connected\\\"}\":{\"f:lastUpdateTime\":{},\"f:status\":{}},\"k:{\\\"type\\\":\\\"Created\\\"}\":{\"f:lastUpdateTime\":{}},\"k:{\\\"type\\\":\\\"Provisioned\\\"}\":{\"f:lastUpdateTime\":{},\"f:message\":{}},\"k:{\\\"type\\\":\\\"RKECluster\\\"}\":{\"f:lastUpdateTime\":{}},\"k:{\\\"type\\\":\\\"Ready\\\"}\":{\"f:lastUpdateTime\":{},\"f:status\":{}},\"k:{\\\"type\\\":\\\"Updated\\\"}\":{\"f:lastUpdateTime\":{},\"f:message\":{}}},\"f:fleetWorkspaceName\":{},\"f:observedGeneration\":{},\"f:ready\":{}}},\"manager\":\"rancher\",\"operation\":\"Update\",\"subresource\":\"status\",\"time\":\"2025-09-20T09:41:23Z\"},{\"apiVersion\":\"provisioning.cattle.io/v1\",\"fieldsType\":\"FieldsV1\",\"fieldsV1\":{\"f:metadata\":{\"f:annotations\":{\"f:provisioning.cattle.io/manag") has prevented the request from succeeding
BTW: the certificates had expired on the nodes, so on each node I had to do a manual certificate rotation
systemctl stop rke2-server    # or rke2-agent on agent nodes
rke2 certificate rotate
systemctl start rke2-server   # or rke2-agent on agent nodes
Could there be any issues related to that, e.g. some certs that are still not valid? Also, I checked the rancher-system-agent logs on the other nodes and they look the same
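(To check whether any certificates are still expired after the rotation, the expiry dates can be inspected directly with openssl; the paths below assume a default RKE2 install under /var/lib/rancher/rke2 on a server node:)

```shell
# Print the notAfter date of each RKE2 server TLS certificate
for crt in /var/lib/rancher/rke2/server/tls/*.crt /var/lib/rancher/rke2/server/tls/*/*.crt; do
  [ -f "$crt" ] || continue
  echo "$crt: $(openssl x509 -noout -enddate -in "$crt" 2>/dev/null)"
done

# Or flag only certificates that are expired or expiring within 24h
for crt in /var/lib/rancher/rke2/server/tls/*.crt; do
  openssl x509 -noout -checkend 86400 -in "$crt" >/dev/null 2>&1 || echo "EXPIRING/EXPIRED: $crt"
done
```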
I already tried removing the broken master node and adding a new one 😞. Now master-2 is in this state. Looks like I'll have to wait for a new Rancher release before this gets resolved