adamant-kite-43734
10/03/2023, 8:13 PM
few-memory-46527
10/03/2023, 8:23 PM
fast-piano-59234
few-memory-46527
10/05/2023, 11:10 AM
fast-piano-59234
few-memory-46527
10/05/2023, 11:52 AM
few-memory-46527
10/06/2023, 12:16 PM
[rke-admin@poclphusamaster ~]$ kubectl -n ingress-nginx logs -l app=ingress-nginx
I1006 11:38:29.158073 7 main.go:101] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I1006 11:38:29.209486 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"388086", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W1006 11:38:31.447094 7 controller.go:1076] Service "cattle-system/rancher" does not have any active Endpoint.
I1006 11:38:50.127347 7 leaderelection.go:258] successfully acquired lease ingress-nginx/ingress-controller-leader-nginx
I1006 11:38:50.127471 7 status.go:84] "New leader elected" identity="nginx-ingress-controller-2hj82"
I1006 11:38:50.143367 7 status.go:300] "updating Ingress status" namespace="cattle-system" ingress="rancher" currentValue=[] newValue=[{IP:172.27.16.66 Hostname: Ports:[]}]
I1006 11:38:50.148472 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"388260", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I1006 11:38:50.199425 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.047s renderingIngressLength:1 renderingIngressTime:0s admissionTime:17.9kBs testedConfigurationSize:0.047}
I1006 11:38:50.199471 7 main.go:101] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I1006 11:38:50.203551 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"388267", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
fast-piano-59234
Service "cattle-system/rancher" does not have any active Endpoint.
is probably a good lead. What EC2 instance type(s) are you using?
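A quick way to confirm that lead is to look at the Endpoints object behind the service (a minimal sketch; the namespace and service name are taken from the warning itself):

# Empty ENDPOINTS output means no ready pod currently matches the service selector
kubectl -n cattle-system get endpoints rancher
# Shows the selector in use plus any service-related events
kubectl -n cattle-system describe svc rancher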
few-memory-46527
10/08/2023, 10:57 AM
few-memory-46527
10/09/2023, 5:07 AM
fast-piano-59234
where omkar.com is pointing to one of the public IPs of an EC2 instance.
, are you using it as a load balancer or something? You need to be more specific about your setup so we can match your intentions with your setup and figure out where this is causing issues.
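To make the DNS side concrete, one way to check what the name actually resolves to from the client (a sketch; substitute whichever hostname you are really using):

# Compare the answer with the EC2 public IP you expect
dig +short omkar.com
nslookup omkar.com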
fast-piano-59234
few-memory-46527
10/09/2023, 8:53 AM
[rke-admin@poclphusamaster ~]$ kubectl -n ingress-nginx logs -l app=ingress-nginx
W1009 06:12:29.622770 7 controller.go:1076] Service "cattle-system/rancher" does not have any active Endpoint.
I1009 06:12:29.764479 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.142s renderingIngressLength:1 renderingIngressTime:0s admissionTime:17.9kBs testedConfigurationSize:0.142}
I1009 06:12:29.764581 7 main.go:101] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I1009 06:12:29.787702 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1722126", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
W1009 06:12:29.790538 7 controller.go:1076] Service "cattle-system/rancher" does not have any active Endpoint.
I1009 06:12:50.021381 7 status.go:300] "updating Ingress status" namespace="cattle-system" ingress="rancher" currentValue=[] newValue=[{IP:172.27.16.66 Hostname: Ports:[]}]
I1009 06:12:50.035999 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1722296", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I1009 06:12:50.168049 7 admission.go:149] processed ingress via admission controller {testedIngressLength:1 testedIngressTime:0.108s renderingIngressLength:1 renderingIngressTime:0.016s admissionTime:17.9kBs testedConfigurationSize:0.124}
I1009 06:12:50.168149 7 main.go:101] "successfully validated configuration, accepting" ingress="rancher/cattle-system"
I1009 06:12:50.178851 7 event.go:282] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"cattle-system", Name:"rancher", UID:"2b8db858-dfe2-4f42-acac-4576ff93a400", APIVersion:"networking.k8s.io/v1", ResourceVersion:"1722303", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
To get the svc details
[rke-admin@poclphusamaster ~]$ kubectl get svc -A
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
cattle-fleet-system gitjob ClusterIP 10.43.141.158 <none> 80/TCP 4d18h
cattle-system rancher ClusterIP 10.43.53.160 <none> 80/TCP,443/TCP 4d18h
cattle-system rancher-webhook ClusterIP 10.43.167.160 <none> 443/TCP 4d18h
cattle-system webhook-service ClusterIP 10.43.95.239 <none> 443/TCP 4d18h
cert-manager cert-manager ClusterIP 10.43.51.208 <none> 9402/TCP 172m
cert-manager cert-manager-webhook ClusterIP 10.43.88.69 <none> 443/TCP 172m
default kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 4d18h
ingress-nginx ingress-nginx-controller-admission ClusterIP 10.43.102.5 <none> 443/TCP 4d18h
kube-system kube-dns ClusterIP 10.43.0.10 <none> 53/UDP,53/TCP,9153/TCP 4d18h
kube-system metrics-server ClusterIP 10.43.18.165 <none> 443/TCP 4d18h
These are the pods
[rke-admin@poclphusamaster ~]$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cattle-fleet-local-system fleet-agent-78f694664b-cwvcr 1/1 Running 8 (160m ago) 4d18h
cattle-fleet-system fleet-controller-6666887949-xjsrf 1/1 Running 216 (160m ago) 4d18h
cattle-fleet-system gitjob-7b97c9c7fd-hsl5z 1/1 Running 8 (160m ago) 4d18h
cattle-system rancher-6bcbdd6cb7-77n4w 1/1 Running 148 (160m ago) 4d18h
cattle-system rancher-6bcbdd6cb7-jcx5k 1/1 Running 248 (160m ago) 4d18h
cattle-system rancher-6bcbdd6cb7-xvvtz 1/1 Running 149 (160m ago) 4d18h
cattle-system rancher-webhook-5d4f5b7f6d-thvnf 1/1 Running 6 (160m ago) 4d18h
cert-manager cert-manager-57d89b9548-m2j4c 1/1 Running 1 (160m ago) 173m
cert-manager cert-manager-cainjector-5bcf77b697-69nhk 1/1 Running 1 (160m ago) 173m
cert-manager cert-manager-webhook-9cb88bd6d-hr7w7 1/1 Running 1 (160m ago) 173m
ingress-nginx nginx-ingress-controller-2hj82 1/1 Running 1 (160m ago) 2d21h
kube-system calico-kube-controllers-5685fbd9f7-4xx52 1/1 Running 1 (160m ago) 2d21h
kube-system canal-shx8c 2/2 Running 2 (160m ago) 2d21h
kube-system coredns-8578b6dbdd-f9wch 1/1 Running 1 (160m ago) 2d21h
kube-system coredns-autoscaler-f7b68ccb7-sfm9s 1/1 Running 1 (160m ago) 2d21h
kube-system kube-vip-ds-t2ltj 1/1 Running 11 (160m ago) 4d18h
kube-system metrics-server-6bc7854fb5-44m94 1/1 Running 1 (160m ago) 2d21h
kube-system rke-coredns-addon-deploy-job--1-zkfq7 0/1 Completed 0 4d18h
kube-system rke-ingress-controller-deploy-job--1-bjgvc 0/1 Completed 0 4d18h
kube-system rke-metrics-addon-deploy-job--1-6j72t 0/1 Completed 0 4d18h
kube-system rke-network-plugin-deploy-job--1-qk4j7 0/1 Completed 0 4d18h
To get the ingress
[rke-admin@poclphusamaster ~]$ kubectl -n cattle-system get ingress
NAME CLASS HOSTS ADDRESS PORTS AGE
rancher nginx hurancher.omkar.org 172.27.16.66 80, 443 4d18h
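One way to cross-check the earlier "no active Endpoint" warning against this output is to compare the service selector with the pod labels and ready counts (a minimal sketch; names come from the listings above):

# The label selector the rancher service matches on
kubectl -n cattle-system get svc rancher -o jsonpath='{.spec.selector}'
# Pods carrying the app=rancher label, with readiness shown in the READY column
kubectl -n cattle-system get pods -l app=rancher -o wide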
few-memory-46527
10/09/2023, 8:56 AM
few-memory-46527
10/09/2023, 2:25 PM
few-memory-46527
10/09/2023, 2:27 PM
[rke-admin@poclphusamaster ~]$ kubectl -n cattle-system logs -l app=rancher
2023/10/09 14:12:00 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:12:41 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:13:26 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:14:11 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:14:55 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:15:41 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:16:24 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:17:06 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:17:47 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 14:19:52 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/09 13:59:27 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:32 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:37 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:42 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:47 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:52 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:57 [ERROR] Failed to connect to peer wss://10.42.0.41/v3/connect [local ID=10.42.0.39]: dial tcp 10.42.0.41:443: connect: connection refused
2023/10/09 13:59:59 [INFO] Handling backend connection request [10.42.0.41]
W1009 14:05:59.523895 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
W1009 14:15:32.527095 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2023/10/09 14:17:31 [ERROR] error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin 39cf64a120af1737af61b201312012815ce4c252 error: exit status 128, detail: error: Server does not allow request for unadvertised object 39cf64a120af1737af61b201312012815ce4c252
, requeuing
2023/10/09 14:18:05 [INFO] Stopping cluster agent for c-mtn95
2023/10/09 14:18:05 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/09 14:19:31 [ERROR] error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin 39cf64a120af1737af61b201312012815ce4c252 error: exit status 128, detail: error: Server does not allow request for unadvertised object 39cf64a120af1737af61b201312012815ce4c252
, requeuing
2023/10/09 14:20:16 [INFO] Stopping cluster agent for c-mtn95
2023/10/09 14:20:16 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/09 14:21:32 [ERROR] error syncing 'rancher-partner-charts': handler helm-clusterrepo-ensure: git -C /var/lib/rancher-data/local-catalogs/v2/rancher-partner-charts/8f17acdce9bffd6e05a58a3798840e408c4ea71783381ecd2e9af30baad65974 fetch origin 39cf64a120af1737af61b201312012815ce4c252 error: exit status 128, detail: error: Server does not allow request for unadvertised object 39cf64a120af1737af61b201312012815ce4c252
, requeuing
I have been stuck on this for a long time now. Requesting you to guide me on this.
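The recurring "cluster agent disconnected" error points at the agent running in the downstream cluster; a minimal sketch for checking it from there, assuming direct kubectl access to that cluster:

# Is the agent up, and what is it logging?
kubectl -n cattle-system get pods -l app=cattle-cluster-agent
kubectl -n cattle-system logs -l app=cattle-cluster-agent --tail=50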
few-memory-46527
10/10/2023, 7:31 AM
fast-piano-59234
few-memory-46527
10/10/2023, 6:12 PM
few-memory-46527
10/10/2023, 6:14 PM
2023/10/10 18:04:59 [ERROR] failed to start cluster controllers c-mtn95: context canceled
W1010 18:05:33.515067 33 warnings.go:80] policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
2023/10/10 18:07:18 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:07:18 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:09:10 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:09:10 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:11:14 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:11:14 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:13:27 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:13:27 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:07:32 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:07:32 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:09:33 [INFO] Stopping cluster agent for c-mtn95
2023/10/10 18:09:33 [ERROR] failed to start cluster controllers c-mtn95: context canceled
2023/10/10 18:10:01 [ERROR] error syncing 'c-mtn95': handler cluster-deploy: Get "https://172.27.16.68:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/10 18:11:38 [INFO] Stopping cluster agent for c-mtn95
few-memory-46527
10/10/2023, 6:48 PM
fast-piano-59234
What is the cattle-cluster-agent pod container logging? What is it pointing to (the CATTLE_SERVER environment variable)?
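A sketch for pulling that value, assuming the agent runs as the usual cattle-cluster-agent deployment on the downstream cluster:

# The Rancher server URL the agent was registered against
kubectl -n cattle-system get deployment cattle-cluster-agent \
  -o jsonpath='{.spec.template.spec.containers[0].env[?(@.name=="CATTLE_SERVER")].value}'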
few-memory-46527
10/11/2023, 5:55 PM
few-memory-46527
10/12/2023, 1:23 PM
few-memory-46527
10/14/2023, 12:52 PM
few-memory-46527
10/14/2023, 12:53 PM
fast-piano-59234
What is kube-vip doing on this cluster? Is it not interfering with the route I just described? The goal when root-causing is to eliminate every possible cause until the issue is found, so if there is no reason to have it running, you should remove it for now until you can access the UI.
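A sketch for both steps (the DaemonSet name kube-vip-ds is inferred from the pod kube-vip-ds-t2ltj listed earlier; deleting it assumes you can reinstall it afterwards):

# See what kube-vip is configured to manage (VIP address, interface, mode)
kubectl -n kube-system get daemonset kube-vip-ds -o yaml
# Take it out of the equation temporarily
kubectl -n kube-system delete daemonset kube-vip-ds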
Support options can be found on https://www.suse.com/support/
few-memory-46527
10/16/2023, 10:14 AM
fast-piano-59234
few-memory-46527
10/17/2023, 10:13 AM
fast-piano-59234
few-memory-46527
10/20/2023, 3:32 AM
fast-piano-59234
curl -v https://hurancher.zeomega.org
might need to add -k if it's an invalid certificate
few-memory-46527
10/20/2023, 1:01 PM
few-memory-46527
10/20/2023, 6:32 PM
fast-piano-59234
What is 172.27.250.78, and what is the network path / which component is in between from the machine running curl to 172.27.250.78, and from there to 172.27.16.66?
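A sketch for mapping that path hop by hop from the client side (plain traceroute; the two IPs are the ones in question above):

# From the machine running curl to the intermediate address
traceroute 172.27.250.78
# From the intermediate host, if you have a shell there, to the ingress node
traceroute 172.27.16.66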
few-memory-46527
10/21/2023, 2:57 AM
fast-piano-59234
curl https://hurancher.zeomega.org --resolve hurancher.zeomega.org:443:172.27.16.66
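--resolve pins the hostname to that IP for this single request, so it bypasses DNS and whatever answers at 172.27.250.78 and talks straight to the ingress node. Combined with the flags from earlier, a sketch:

# Verbose TLS handshake and response, tolerating an untrusted certificate
curl -vk https://hurancher.zeomega.org --resolve hurancher.zeomega.org:443:172.27.16.66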
few-memory-46527
10/23/2023, 2:44 PM
[rke-admin@poclphusamaster ~]$ kubectl -n cattle-system logs -l app=rancher
2023/10/23 14:36:39 [INFO] Starting apiregistration.k8s.io/v1, Kind=APIService controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=LimitRange controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=Namespace controller
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=Secret controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=ServiceAccount controller
2023/10/23 14:36:39 [INFO] Starting cluster agent for local [owner=true]
2023/10/23 14:37:53 [INFO] Stopping cluster agent for c-hpwz8
2023/10/23 14:37:53 [ERROR] failed to start cluster controllers c-hpwz8: context canceled
time="2023-10-23 14:37:02" level=error msg="Failed to get HPA for project c-hpwz8:p-4q9vw err=Unknown schema type [horizontalPodAutoscaler]"
time="2023-10-23 14:37:02" level=error msg="Failed to get Pod for project c-hpwz8:p-4q9vw err=Unknown schema type [pod]"
time="2023-10-23 14:37:02" level=error msg="Failed to get Namespaces for project c-hpwz8:p-fmkr5 err=Unknown schema type [namespace]"
time="2023-10-23 14:37:02" level=error msg="Failed to get Workload for project c-hpwz8:p-fmkr5 err=Unknown schema type [workload]"
time="2023-10-23 14:37:02" level=error msg="Failed to get HPA for project c-hpwz8:p-fmkr5 err=Unknown schema type [horizontalPodAutoscaler]"
time="2023-10-23 14:37:02" level=error msg="Failed to get Pod for project c-hpwz8:p-fmkr5 err=Unknown schema type [pod]"
2023/10/23 14:37:38 [ERROR] error syncing 'c-hpwz8': handler cluster-deploy: Get "https://172.27.16.69:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/23 14:37:59 [INFO] Stopping cluster agent for c-hpwz8
2023/10/23 14:37:59 [ERROR] failed to start cluster controllers c-hpwz8: context canceled
2023/10/23 14:38:17 [ERROR] error syncing 'c-hpwz8': handler cluster-deploy: Get "https://172.27.16.69:6443/apis/apps/v1/namespaces/cattle-system/daemonsets/cattle-node-agent": cluster agent disconnected, requeuing
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding controller
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=ClusterRole controller
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=RoleBinding controller
2023/10/23 14:36:39 [INFO] Starting rbac.authorization.k8s.io/v1, Kind=Role controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=Namespace controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=Secret controller
2023/10/23 14:36:39 [INFO] Starting /v1, Kind=ServiceAccount controller
2023/10/23 14:36:44 [INFO] Handling backend connection request [10.42.0.28]
2023/10/23 14:38:22 [INFO] Stopping cluster agent for c-hpwz8
2023/10/23 14:38:22 [ERROR] failed to start cluster controllers c-hpwz8: context canceled
In the downstream cluster I can see that there are 2 snapshots
[rke-admin@Poclphusanode2 etcd-snapshots]$ ll
total 2192
-rw------- 1 root root 1423633 Oct 23 09:35 c-hpwz8-rl-6p2wh_2023-10-23T14:34:50Z.zip
-rw------- 1 root root 815925 Oct 20 23:07 c-hpwz8-rl-75hmd_2023-10-21T04:07:04Z.zip
I am aware of the command that I can run on the Rancher node, i.e. rke etcd snapshot-restore --name snapshot.db --config cluster.yml, and I know that if I want to recover the downstream cluster I can do it through the UI, but the UI is not accessible now. How can I restore the downstream cluster if I do not have the cluster.yml file there?
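For reference, the command shape with one of the snapshots listed above would be roughly (a sketch only; it still presumes a cluster.yml describing the downstream cluster, which is exactly the missing piece here):

# The snapshot name is typically the file name without the .zip extension
rke etcd snapshot-restore --name c-hpwz8-rl-6p2wh_2023-10-23T14:34:50Z --config cluster.yml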
fast-piano-59234
few-memory-46527
10/25/2023, 1:05 PM