# rke2
a
b
Most likely transient errors here too:
```shell
grep level=error /var/lib/rancher/rke2/agent/containerd/containerd.log
```
- https://gist.githubusercontent.com/iosifnicolae2/4fdb96ba55ca3c285f54f970b846382d/raw/aa5eae900c6268e3cb5664a7ff46ab15392fd5ab/gistfile1.txt
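To check whether these containerd errors really are transient, it can help to collapse them by message and count them. This is a minimal sketch. It assumes containerd's usual logrus-style line format (`time="..." level=error msg="..."`), which the `grep level=error` above relies on. The function name `summarize_containerd_errors` is hypothetical.

```shell
# Hypothetical helper: summarize_containerd_errors reads containerd log lines
# (logrus format, e.g. time="..." level=error msg="..." ...) on stdin and
# prints each distinct error message with its count, most frequent first.
summarize_containerd_errors() {
  # Keep only level=error lines and extract the msg="..." value;
  # escaped quotes (\") inside the message are kept as part of it.
  sed -E -n 's/.*level=error msg="(([^"\\]|\\.)*)".*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Usage (on an RKE2 node):
#   summarize_containerd_errors < /var/lib/rancher/rke2/agent/containerd/containerd.log
```

A handful of messages that each appear only once or twice around node startup usually fit the "transient" reading better than one message repeating for the whole log.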
Not sure what to extract from here:
```shell
grep -C 2 '^E' /var/lib/rancher/rke2/agent/logs/kubelet.log
```
- https://gist.githubusercontent.com/iosifnicolae2/8d6b456b18889dff94c0f8415b3c231f/raw/a4a9a1a13bb8df6115e68fb36b2bc6496b1b34b6/gistfile1.txt
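When a kubelet log is too noisy to read line by line, grouping the error lines by the source location that emitted them often shows which subsystem is actually complaining. This is a minimal sketch. It assumes the standard klog header layout (`Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg`), where error lines start with `E`, which is the same convention the `grep '^E'` above uses. The function name `summarize_kubelet_errors` is hypothetical.

```shell
# Hypothetical helper: summarize_kubelet_errors reads kubelet (klog) lines on
# stdin, keeps only error lines (prefix "E"), and counts them per source
# location (file.go:line), most frequent first.
# klog header layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] msg
summarize_kubelet_errors() {
  sed -E -n 's/^E[0-9]{4} [0-9:.]+ +[0-9]+ ([^]]+)] .*/\1/p' \
    | sort | uniq -c | sort -rn
}

# Usage (on an RKE2 node):
#   summarize_kubelet_errors < /var/lib/rancher/rke2/agent/logs/kubelet.log
```

The top few `file.go:line` entries can then be grepped for individually, instead of reading the whole log.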
etcd is up and running:
```shell
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
/var/lib/rancher/rke2/bin/crictl exec "$etcdcontainer" sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=table"
```
```
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://192.168.3.157:2379 | 47c09bf8da7b4a15 |   3.5.7 |   17 MB |      true |      false |         2 |       9111 |               9111 |        |
+----------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
```shell
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
etcdcontainer=$(/var/lib/rancher/rke2/bin/crictl ps --label io.kubernetes.container.name=etcd --quiet)
/var/lib/rancher/rke2/bin/crictl exec "$etcdcontainer" sh -c "ETCDCTL_ENDPOINTS='https://127.0.0.1:2379' ETCDCTL_CACERT='/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt' ETCDCTL_CERT='/var/lib/rancher/rke2/server/tls/etcd/server-client.crt' ETCDCTL_KEY='/var/lib/rancher/rke2/server/tls/etcd/server-client.key' ETCDCTL_API=3 etcdctl endpoint health --cluster --write-out=table"
```
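The etcd status and health checks above repeat the same container lookup and TLS paths. Here is a minimal sketch of a reusable wrapper. It assumes the default RKE2 paths from those commands and that the `etcdctl` in the etcd image accepts the standard `--endpoints`, `--cacert`, `--cert` and `--key` flags. The function name `rke2_etcdctl` and the `CRICTL` and `RKE2_ETCD_TLS_DIR` overrides are hypothetical, not RKE2 settings.

```shell
# Hypothetical helper: rke2_etcdctl runs etcdctl inside the RKE2 etcd static
# pod and passes its own arguments through to etcdctl.
# Paths default to the RKE2 locations used in the commands above; CRICTL and
# RKE2_ETCD_TLS_DIR are optional overrides assumed for this sketch only.
rke2_etcdctl() {
  local crictl="${CRICTL:-/var/lib/rancher/rke2/bin/crictl}"
  local tls="${RKE2_ETCD_TLS_DIR:-/var/lib/rancher/rke2/server/tls/etcd}"
  local id

  export CRI_CONFIG_FILE="${CRI_CONFIG_FILE:-/var/lib/rancher/rke2/agent/etc/crictl.yaml}"

  # Find the running etcd container by its Kubernetes container-name label.
  id="$("$crictl" ps --label io.kubernetes.container.name=etcd --quiet)" || return 1
  if [ -z "$id" ]; then
    echo "rke2_etcdctl: no running etcd container found" >&2
    return 1
  fi

  # etcdctl flags instead of ETCDCTL_* variables (API v3 is the default in etcd 3.4+).
  "$crictl" exec "$id" etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert="$tls/server-ca.crt" \
    --cert="$tls/server-client.crt" \
    --key="$tls/server-client.key" \
    "$@"
}
```

With it, the two checks become `rke2_etcdctl endpoint status --cluster --write-out=table` and `rke2_etcdctl endpoint health --cluster --write-out=table`. Passing flags directly avoids the nested `sh -c` quoting and no longer depends on a shell being present in the etcd image.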
Rancher provisioning logs:
```
[INFO ] waiting for infrastructure ready
[INFO ] waiting for viable init node
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for agent to check in and apply initial plan
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for probes: calico, etcd, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for probes: calico, kube-apiserver, kube-controller-manager, kube-scheduler
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for probes: calico
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for cluster agent to connect
[INFO ] non-ready bootstrap machine(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z and join url to be available on bootstrap node
[INFO ] configuring bootstrap node(s) iosif-kube-1-pool1-69bf8cf6bd-nqb6z: waiting for cluster agent to connect
```
Note: we're deploying an RKE2 cluster on Harvester.
```shell
/var/lib/rancher/rke2/bin/kubectl logs -n cattle-system cattle-cluster-agent-f66966868-z42js | grep -C 2 level=error
```
- https://gist.github.com/iosifnicolae2/7bace512a89f9164b14d4c445f3b4548
```
level=error msg="Failed to dial steve aggregation server: read tcp 10.42.88.100:32868->192.168.3.253:443: use of closed network connection"
```
I'm not sure whether we should be concerned about this error.
Logs from pod `fleet-controller-77dbcb4978-9j696` in Rancher's embedded Kubernetes cluster:
```
2023-07-25T12:41:10.211759314Z time="2023-07-25T12:41:10Z" level=error msg="error syncing 'fleet-local/local': handler import-cluster: Get \"https://10.43.176.48/k8s/clusters/local/version?timeout=15s\": dial tcp 10.43.176.48:443: connect: connection refused, requeuing"
```
https://gist.githubusercontent.com/iosifnicolae2/3c3b8e94e76d31b2fa443a52cd93cca4/raw/1850bed95c745cacb2fe1a8c533fe5f8998e112c/gistfile1.txt
Finally, I found the issue:
- we've deployed Rancher in a Harvester VM
- the solution was to configure the VM's default interface to use an MTU of 1400:
```yaml
"network":
  "ethernets":
    "enp1s0":
      "dhcp4": true
      "mtu": 1400
  "version": 2
```
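If an MTU mismatch is suspected, a quick check is to ping across the path with fragmentation disallowed. This is a minimal sketch. It assumes Linux iputils `ping`, where `-M do` sets the don't-fragment bit and `-s` sets the ICMP payload size. The function names are hypothetical. The arithmetic is as follows: an IPv4 ICMP echo adds a 20-byte IP header and an 8-byte ICMP header, so a 1400-byte MTU leaves room for a 1372-byte payload.

```shell
# Hypothetical helpers for checking whether full-size packets cross a path.
# An IPv4 ICMP echo carries a 20-byte IP header plus an 8-byte ICMP header,
# so the largest payload that fits in one unfragmented packet is MTU - 28.
icmp_payload_for_mtu() {
  echo $(( $1 - 28 ))
}

# probe_path_mtu HOST MTU: send 3 pings whose IP packets are exactly MTU
# bytes, with the "don't fragment" bit set (iputils ping: -M do).
# Exit status is non-zero if no reply comes back.
probe_path_mtu() {
  ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}
```

For example, take the Rancher address seen in the cattle-cluster-agent error, 192.168.3.253. If `probe_path_mtu 192.168.3.253 1500` fails from an affected VM while `probe_path_mtu 192.168.3.253 1400` succeeds, the effective path MTU is probably somewhere between the two. To try a value before persisting it in the network config, `ip link set dev enp1s0 mtu 1400` applies it to the running interface. That change does not survive a reboot.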