ambitious-plastic-3551
11/16/2022, 5:44 PM
ambitious-plastic-3551
11/16/2022, 5:44 PM
sparse-fireman-14239
11/17/2022, 8:14 PM
node-taint:
- "CriticalAddonsOnly=true:NoExecute"
The kubelet is started with what I assume is the correct argument, though:
--register-with-taints=CriticalAddonsOnly=true:NoExecute
Adding the taint with kubectl works fine.
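For comparison, a minimal sketch of setting the same taint through the RKE2 config file (standard file path and key; the kubectl check and <node-name> are just illustrative). One thing worth checking: registration-time taints are only applied when the Node object is first created, so a node that has already joined the cluster won't pick them up from a changed config or kubelet flag alone.

# /etc/rancher/rke2/config.yaml
node-taint:
  - "CriticalAddonsOnly=true:NoExecute"

# verify after the node (re)registers; <node-name> is a placeholder
kubectl get node <node-name> -o jsonpath='{.spec.taints}'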
sparse-dusk-81900
11/18/2022, 9:29 AM
sparse-dusk-81900
11/18/2022, 9:31 AM
We upgraded Rancher to v2.6.9 and the custom RKE2 clusters to v1.24.7. However, both clusters are now in “Updating” state, as each has a single node/machine with status waiting for plan to be applied. What's the best way to troubleshoot this? I've already checked the rancher-system-agent.service on the VMs in question but didn't find anything suspicious. Also, cluster operations like a manually triggered cert rotation after the upgrade to v1.24.7 run successfully, even on the affected nodes/machines. Because of that, it just looks like an old status from a previous sync is keeping both clusters in “Updating” state.
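Not an official procedure, just a sketch of where that status usually surfaces, assuming the standard Rancher v2.6 provisioning-v2 layout (the fleet-default namespace and resource names below are the usual defaults; <cluster-name> is a placeholder). Run against the Rancher management cluster:

kubectl -n fleet-default get clusters.provisioning.cattle.io
kubectl -n fleet-default get clusters.provisioning.cattle.io <cluster-name> -o yaml   # status conditions
kubectl -n fleet-default get machines.cluster.x-k8s.io -o wide
kubectl -n fleet-default get secrets | grep machine-plan   # per-machine plan secrets
journalctl -u rancher-system-agent -u rke2-server -f   # on the node that is "waiting"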
sparse-fireman-14239
11/18/2022, 11:13 AM
early-engineer-43393
11/29/2022, 11:18 AM
waiting: waiting for viable init node
Has anyone seen this before? We are not even sure how we can troubleshoot it, as we have no VM spun up to investigate and no other output from the logs. Thanks
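A rough way to see which machine Rancher currently considers the init node (assuming Rancher's provisioning-v2 object layout; the label key for the init node varies between versions, hence grep rather than a precise selector, and the grep patterns are only illustrative):

kubectl -n fleet-default get machines.cluster.x-k8s.io -o wide
kubectl -n fleet-default get secrets --show-labels | grep -i init-node
kubectl -n cattle-system logs deploy/rancher | grep -i "init node"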
witty-engineer-12406
11/30/2022, 11:52 AM
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: dummy-cr
rules:
  - nonResourceURLs: ["/healthz", "/readyz", "/livez"]
    verbs: ["get"]
  - apiGroups:
      - ""
    resources: ["pods", "pods/exec"]
    verbs: ["get", "delete", "create", "exec", "list"]
  - apiGroups:
      - ""
    resources: ["configmaps"]
    verbs: ["create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: dummy-crb
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: dummy-cr
subjects:
  - kind: ServiceAccount
    name: dummy-sa
    namespace: dummy-demo
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dummy-sa
  namespace: dummy-demo
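A quick way to sanity-check what the manifest above grants once applied (plain kubectl impersonation; note that exec access is matched against the create verb on pods/exec, so the extra "exec" verb in the rule has no effect either way):

kubectl auth can-i list pods --as=system:serviceaccount:dummy-demo:dummy-sa
kubectl auth can-i create pods --subresource=exec --as=system:serviceaccount:dummy-demo:dummy-sa
kubectl auth can-i get /healthz --as=system:serviceaccount:dummy-demo:dummy-sa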
square-policeman-85866
11/30/2022, 3:29 PM
square-policeman-85866
12/01/2022, 9:17 AM
numerous-nail-55802
12/02/2022, 11:37 AM
gentle-petabyte-40055
12/03/2022, 5:53 AM
gifted-eye-43916
12/05/2022, 2:32 PM
worried-plastic-58654
12/07/2022, 9:07 PM
To install RKE2 Rancher in AWS EC2, which OS is recommended: openSUSE Leap 15.4, Ubuntu, CentOS, or another?
worried-plastic-58654
12/08/2022, 3:21 PM
boundless-eye-27124
12/09/2022, 12:43 AM
able-engineer-22050
12/09/2022, 10:49 AM
able-engineer-22050
12/09/2022, 10:51 AM
refined-scientist-20236
12/09/2022, 11:10 AM
hundreds-evening-84071
12/12/2022, 9:04 PM
best-microphone-20624
12/13/2022, 9:08 PM
boundless-eye-27124
12/14/2022, 2:41 AM
forbidden sysctl: "net.ipv4.tcp_rmem" not allowlisted
I already patched the PSP, but I'm still getting this error.
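That wording usually comes from the kubelet's own sysctl allowlist rather than from the PSP, so the node config may need the sysctl too. A hedged sketch for RKE2 (the kubelet-arg config key and the kubelet --allowed-unsafe-sysctls flag are standard; whether this is the actual cause here is an assumption):

# /etc/rancher/rke2/config.yaml on the nodes running the workload
kubelet-arg:
  - "allowed-unsafe-sysctls=net.ipv4.tcp_rmem"

Then restart rke2-server / rke2-agent on those nodes and keep the matching allowedUnsafeSysctls entry in the PSP.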
square-policeman-85866
12/14/2022, 10:05 AM
ambitious-plastic-3551
12/14/2022, 8:23 PM
ambitious-plastic-3551
12/14/2022, 8:23 PM
ambitious-plastic-3551
12/14/2022, 9:34 PM
agreeable-art-61329
12/14/2022, 11:46 PM
connection refused on port 9345 of the VIP. Any thoughts?
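Port 9345 is the RKE2 supervisor/registration port, so one way to narrow this down (<VIP> is a placeholder; the readyz path is the same one rke2 itself polls):

# on each server node: is anything listening on 9345?
ss -tlnp | grep 9345
systemctl status rke2-server

# from the joining node: does the VIP forward 9345 at all?
curl -vk https://<VIP>:9345/v1-rke2/readyz

Any HTTP response, even a 401 or 500, means the TCP path through the VIP works; connection refused usually means the load balancer isn't forwarding 9345 or no server is listening behind it. An HA setup needs both 6443 and 9345 forwarded to the server nodes.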
silly-jordan-81965
12/15/2022, 12:08 PM
lemon-ability-39482
12/19/2022, 9:57 AM
I ran cp -ar /var/lib/rancher/rke2 /mnt/data/ to preserve all attributes, then modified /etc/rancher/rke2/config.yaml and added the line data-dir: /mnt/data/rke2. This seems to work on agent/worker nodes. On server nodes, however, it looks like the necessary Kubernetes containers can't start. In the log of rke2-server, I keep getting the message Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:9345/v1-rke2/readyz: 500 Internal Server Error, while /mnt/data/rke2/agent/containerd/containerd.log looks like this:
time="2022-12-19T09:32:29.103553429+01:00" level=info msg="CreateContainer within sandbox \"478658988f888b30063a9127fb124abd38385967b796e15016675930bbb6cf88\" for container &ContainerMetadata{Name:cloud-controller-manager,Attempt:23,}"
time="2022-12-19T09:32:29.154564201+01:00" level=info msg="CreateContainer within sandbox \"478658988f888b30063a9127fb124abd38385967b796e15016675930bbb6cf88\" for &ContainerMetadata{Name:cloud-controller-manager,Attempt:23,} returns container id \"4a3fe66d16a18ac6397dafc147a68b5bbe9bda1d7d4f7f7ce5e7f95e3a49b84b\""
time="2022-12-19T09:32:29.154953048+01:00" level=info msg="StartContainer for \"4a3fe66d16a18ac6397dafc147a68b5bbe9bda1d7d4f7f7ce5e7f95e3a49b84b\""
time="2022-12-19T09:32:29.276767961+01:00" level=info msg="StartContainer for \"4a3fe66d16a18ac6397dafc147a68b5bbe9bda1d7d4f7f7ce5e7f95e3a49b84b\" returns successfully"
time="2022-12-19T09:32:29.691126028+01:00" level=info msg="shim disconnected" id=4a3fe66d16a18ac6397dafc147a68b5bbe9bda1d7d4f7f7ce5e7f95e3a49b84b
time="2022-12-19T09:32:29.691184071+01:00" level=warning msg="cleaning up after shim disconnected" id=4a3fe66d16a18ac6397dafc147a68b5bbe9bda1d7d4f7f7ce5e7f95e3a49b84b namespace=<http://k8s.io|k8s.io>
time="2022-12-19T09:32:29.691196163+01:00" level=info msg="cleaning up dead shim"
time="2022-12-19T09:32:29.708945825+01:00" level=warning msg="cleanup warnings time=\"2022-12-19T09:32:29+01:00\" level=info msg=\"starting signal loop\" namespace=<http://k8s.io|k8s.io> pid=3497353 runtime=io.containerd.runc.v2\n"
time="2022-12-19T09:32:30.049467561+01:00" level=info msg="RemoveContainer for \"9e40fb319e648b964c90cc77975c6cf7400aac36e53eb6354f738ad31995ce3c\""
time="2022-12-19T09:32:30.056618827+01:00" level=info msg="RemoveContainer for \"9e40fb319e648b964c90cc77975c6cf7400aac36e53eb6354f738ad31995ce3c\" returns successfully"
There are similar messages for kube-apiserver, etcd and kube-controller-manager.
If I remove the data-dir line from my config, it all works again.
Am I doing something wrong here? Some help would be much appreciated.
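Not an answer, but a hedged way to see why those control-plane containers keep exiting (the paths assume the relocated data dir; the containerd socket below is RKE2's usual fixed location under /run and is not expected to move with data-dir, but verify on your node; <container-id> is a placeholder):

# list the crashing containers and read their logs
/mnt/data/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps -a
/mnt/data/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock logs <container-id>

# the static pod logs also land under /var/log/pods regardless of data-dir
ls /var/log/pods/ | grep -E 'cloud-controller-manager|kube-apiserver|etcd'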
creamy-pencil-82913
12/19/2022, 11:30 AM
creamy-pencil-82913
12/19/2022, 11:30 AM