abundant-hair-58573

01/27/2023, 5:25 PM
We had an issue with Rancher when trying to upgrade K3s, so we're attempting to restore from a backup. We created a new 2-node K3s cluster and followed the steps here. The restore operation finished, but Rancher will not come back up. When I look at the Rancher pod logs I see this:
2023/01/27 17:04:31 [ERROR] failed to start cluster controllers c-mmcz2: context canceled
2023/01/27 17:04:55 [ERROR] error syncing 'c-mmcz2': handler cluster-deploy: Get "https://<ip-address>:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/01/27 17:06:26 [INFO] Stopping cluster agent for c-mmcz2
2023/01/27 17:06:26 [ERROR] failed to start cluster controllers c-mmcz2: context canceled
2023/01/27 17:06:35 [INFO] Stopping cluster agent for c-cssrh
2023/01/27 17:06:35 [ERROR] failed to start cluster controllers c-cssrh: context canceled
2023/01/27 17:06:41 [ERROR] error syncing 'c-cssrh': handler cluster-deploy: Get "https://<ip-address>:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
I believe c-cssrh is our RKE cluster that Rancher was managing before the crash, and we're hoping it can just rejoin Rancher once we get it back up. The IP address it's trying to connect to is a control plane node in that cluster. I'm not sure what c-mmcz2 is. The IP address it's trying to connect to is no longer around and I don't know what it was. Possibly it was an old Rancher manager, and c-mmcz2 is the name of the Rancher cluster in K3s that was corrupted? Is there somewhere else to look for more detailed logs?
I also see this towards the beginning of the log:
[ERROR] error syncing 'cattle-fleet-system/helm-operation-6rkmb': handler helm-operation: an error on the server ("container not found (\"proxy\")") has prevented the request from succeeding (get pods helm-operation-6rkmb), requeuing
I restarted one of the 2 Rancher manager nodes, and this is in the container log for Rancher:
2023/01/27 17:45:15 [ERROR] Failed to connect to peer wss://10.42.1.13/v3/connect [local ID=10.42.0.24]: websocket: bad handshake
2023/01/27 17:45:18 [ERROR] Failed to connect to peer wss://10.42.0.22/v3/connect [local ID=10.42.0.24]: websocket: bad handshake
2023/01/27 17:45:19 [INFO] Handling backend connection request [10.42.0.22]
2023/01/27 17:45:19 [INFO] Handling backend connection request [10.42.1.13]
2023/01/27 17:45:19 [INFO] Stopping cluster agent for local
2023/01/27 17:45:19 [INFO] Shutting down /v1, Kind=Namespace workers
2023/01/27 17:45:19 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=Role workers
2023/01/27 17:45:19 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=RoleBinding workers
2023/01/27 17:45:19 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=ClusterRole workers
2023/01/27 17:45:19 [INFO] Shutting down rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding workers
2023/01/27 17:45:19 [INFO] Shutting down /v1, Kind=ServiceAccount workers
2023/01/27 17:45:19 [INFO] Shutting down /v1, Kind=Secret workers
2023/01/27 17:45:19 [INFO] Registering istio for cluster "local"
2023/01/27 17:45:19 [INFO] Starting cluster controllers for local
2023/01/27 17:45:19 [INFO] Starting management.cattle.io/v3, Kind=ClusterScan controller
2023/01/27 17:45:19 [INFO] Starting rke.cattle.io/v1, Kind=ETCDSnapshot controller
2023/01/27 17:45:19 [INFO] Starting management.cattle.io/v3, Kind=ClusterAlert controller
2023/01/27 17:45:19 [INFO] Starting management.cattle.io/v3, Kind=Notifier controller
2023/01/27 17:45:19 [INFO] Starting management.cattle.io/v3, Kind=ClusterLogging controller
2023/01/27 17:45:20 [INFO] Starting cluster agent for local [owner=true]
2023/01/27 17:45:20 [INFO] Starting apiregistration.k8s.io/v1, Kind=APIService controller
2023/01/27 17:45:20 [INFO] Starting cluster controllers for local
2023/01/27 17:45:20 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
2023/01/27 17:45:20 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=ServiceAccount controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=ResourceQuota controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=ConfigMap controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Node controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Secret controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Namespace controller
2023/01/27 17:45:20 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=Role controller
2023/01/27 17:45:20 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=LimitRange controller
2023/01/27 17:45:20 [INFO] Starting management.cattle.io/v3, Kind=ProjectAlert controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Service controller
2023/01/27 17:45:20 [INFO] Starting apps/v1, Kind=Deployment controller
2023/01/27 17:45:20 [INFO] Starting cluster controllers for local
2023/01/27 17:45:20 [INFO] Starting management.cattle.io/v3, Kind=ProjectAlertRule controller
2023/01/27 17:45:20 [INFO] Starting management.cattle.io/v3, Kind=ClusterAlertRule controller
2023/01/27 17:45:20 [INFO] Starting management.cattle.io/v3, Kind=ClusterAlertGroup controller
2023/01/27 17:45:20 [INFO] Starting management.cattle.io/v3, Kind=ProjectAlertGroup controller
2023/01/27 17:45:20 [INFO] Starting batch/v1, Kind=Job controller
2023/01/27 17:45:20 [INFO] Starting batch/v1beta1, Kind=CronJob controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Pod controller
2023/01/27 17:45:20 [INFO] Starting apps/v1, Kind=DaemonSet controller
2023/01/27 17:45:20 [INFO] Starting apps/v1, Kind=StatefulSet controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=Event controller
2023/01/27 17:45:20 [INFO] Starting /v1, Kind=ReplicationController controller
2023/01/27 17:45:20 [INFO] Starting apps/v1, Kind=ReplicaSet controller
I0127 17:45:33.544407      35 trace.go:205] Trace[472941069]: "Reflector ListAndWatch" name:pkg/mod/github.com/rancher/client-go@v1.23.3-rancher2/tools/cache/reflector.go:168 (27-Jan-2023 17:44:50.222) (total time: 43322ms):
Trace[472941069]: ---"Objects listed" error:<nil> 43322ms (17:45:33.544)
Trace[472941069]: [43.322085588s] [43.322085588s] END
2023/01/27 17:46:24 [INFO] Stopping cluster agent for c-cssrh
2023/01/27 17:46:24 [ERROR] failed to start cluster controllers c-cssrh: context canceled
2023/01/27 17:46:36 [INFO] Stopping cluster agent for c-mmcz2
2023/01/27 17:46:36 [ERROR] failed to start cluster controllers c-mmcz2: context canceled
I'd at least like to be able to get to the Rancher UI.
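In case it helps anyone compare notes, here is a minimal sketch (not part of Rancher) for tallying the [ERROR] lines per downstream cluster ID from a saved copy of the Rancher pod log. The function name `count_errors_by_cluster` and the regex are my own assumptions, based only on the log lines pasted above. It only recognizes IDs of the `c-xxxxx` form seen here.

```python
import re
from collections import Counter

# Assumption: cluster IDs look like "c-" plus five lowercase letters/digits,
# as in the pasted log (c-mmcz2, c-cssrh). Other ID formats are not matched.
ERROR_WITH_CLUSTER = re.compile(r"\[ERROR\].*?\b(c-[a-z0-9]{5})\b")


def count_errors_by_cluster(lines):
    """Return a Counter mapping cluster ID -> number of [ERROR] lines naming it."""
    counts = Counter()
    for line in lines:
        match = ERROR_WITH_CLUSTER.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the log excerpt above.
    sample = [
        "2023/01/27 17:04:31 [ERROR] failed to start cluster controllers c-mmcz2: context canceled",
        "2023/01/27 17:06:26 [INFO] Stopping cluster agent for c-mmcz2",
        "2023/01/27 17:06:26 [ERROR] failed to start cluster controllers c-mmcz2: context canceled",
        "2023/01/27 17:06:35 [ERROR] failed to start cluster controllers c-cssrh: context canceled",
    ]
    for cluster_id, n in count_errors_by_cluster(sample).most_common():
        print(cluster_id, n)
```

To run it against a real log, pass the lines of a saved log file (for example `open("rancher.log")`, where the file name is hypothetical) instead of the sample list.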