gorgeous-alarm-2311
05/15/2023, 11:10 AM
14 and 30 are not in the cluster.yml, etcd doesn’t report them, and --etcd-servers is correctly set to https://192.168.253.47:2379

gorgeous-alarm-2311
05/15/2023, 11:11 AM
kubectl get no mis-reporting?

gorgeous-alarm-2311
05/15/2023, 11:14 AM
kubectl get no is misreporting.
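
A minimal way to cross-check this, assuming etcdctl is available on an etcd node (the endpoint is taken from the thread; the cert paths below are illustrative RKE-style defaults, not confirmed here): compare etcd’s member list with what the API server reports, then delete any stale Node objects.

# List the members etcd actually knows about (cert paths are assumptions)
etcdctl --endpoints=https://192.168.253.47:2379 \
  --cacert=/etc/kubernetes/ssl/kube-ca.pem \
  --cert=/etc/kubernetes/ssl/kube-etcd.pem \
  --key=/etc/kubernetes/ssl/kube-etcd-key.pem \
  member list

# Compare against the node list the API server reports
kubectl get nodes -o wide

# If kubectl lists a node that no longer exists, remove the stale object
kubectl delete node <stale-node-name>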

gorgeous-alarm-2311
05/15/2023, 11:16 AM

ancient-portugal-61276
05/15/2023, 11:39 AM

quaint-leather-88230
05/15/2023, 11:48 AM
Comparing rke2 and k3s. So far I’ve found these differences between them:
• k3s is optimized for edge computing
• k3s has diverged from upstream k8s, while rke2 stays much closer to upstream
• k3s is more lightweight, stripped down for edge computing; rke2 is light, but not that light
• rke2 is security-oriented, built for FIPS compliance
• k3s has better community support
• I also noticed k3s really starts as a single binary and controls cluster components internally via its own supervisor, while rke2 creates static pods for the internal cluster components (see the sketch below). Not sure which option is better.
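
One way to see that last difference on a live node (the manifest path is the rke2 default; the exact file listing is an assumption):

# rke2: control-plane components are static pods managed by the kubelet
ls /var/lib/rancher/rke2/agent/pod-manifests/
# typically: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml

# k3s: the same components run inside the single k3s server process
ps -C k3s -o pid,cmd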
Our use case is cloud computing: deploying web applications, data pipelines, etc. We don’t really need FIPS compliance, and we run kubernetes on very powerful nodes, so resources aren’t really an issue.
We have about 40 nodes in a single cluster right now, but on the new infrastructure we’ll split them across 5 clusters (for geo-redundancy, plus complete isolation for some specific apps).
We’re deploying the clusters via ansible + kustomize + helm (no helm-controller, just helm template + applying the resulting manifest after a short diff/review).
Given these facts, what distribution would you run? k3s or rke2? Which do you think has a “brighter” future?
Thanks ✌️
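
A minimal sketch of the render/diff/apply flow described above (release name, chart path, and values file are placeholders):

# Render the chart to a plain manifest, no controller involved
helm template my-release ./charts/my-app -f values-prod.yaml > rendered.yaml

# Review the diff; kubectl diff exits non-zero when changes are pending
kubectl diff -f rendered.yaml

# Apply after review
kubectl apply -f rendered.yaml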

gorgeous-alarm-2311
05/15/2023, 12:04 PM
rke channel?

kind-air-74358
05/15/2023, 1:26 PM

aloof-analyst-42202
05/15/2023, 1:39 PM

narrow-honey-55422
05/15/2023, 1:39 PM

early-farmer-20948
05/15/2023, 2:07 PM

swift-hair-47673
05/15/2023, 3:05 PM[ERROR] Failed to set up SSH tunneling for host [10.0.10.107]: Can't retrieve Docker Info: error during connect: Get "<http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info>": can not build dialer to [c-p9cqp:m-2bd0a9380d15]
[ERROR] Removing host [10.0.10.107] from node lists
A restart of the affected node is required to recover it.
The restart of all containers is very worrying and it's something I have never seen before. I've tried to search about the behaviour but no luck so far.
Has anyone seen this behaviour before?rich-journalist-45109
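
That error usually means RKE's SSH user can't reach the Docker daemon on the host. A quick check, assuming the host from the log (the user name and exact commands are illustrative):

# Can the SSH user RKE uses talk to Docker on the node?
ssh <ssh-user>@10.0.10.107 docker info

# On the node itself: is the daemon answering on its socket?
curl --unix-socket /var/run/docker.sock http://localhost/v1.24/info

# The SSH user must be in the docker group for RKE's tunnel to work
ssh <ssh-user>@10.0.10.107 id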

rich-journalist-45109
05/15/2023, 3:27 PM

freezing-hairdresser-79403
05/16/2023, 7:40 AM

straight-fountain-2279
05/16/2023, 10:43 AM

straight-fountain-2279
05/16/2023, 10:43 AM

aloof-analyst-42202
05/16/2023, 11:56 AM

boundless-zebra-36646
05/16/2023, 12:31 PM

enough-pencil-16731
05/16/2023, 2:15 PM

proud-motherboard-51785
05/16/2023, 2:18 PM

swift-hair-47673
05/16/2023, 2:23 PM

freezing-action-16232
05/16/2023, 7:26 PM

ancient-energy-15842
05/16/2023, 8:38 PM
AgentDeployed False, and it seems that my control plane is broken. The thing is, I can’t create more control plane nodes because they get stuck with "Waiting for registering with kubernetes". Rancher version is 2.7.3 and Kubernetes version is 1.24.13.
SSHing into those machines I see just 2 containers with docker ps -a:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ed435976827f rancher/rancher-agent:v2.7.3 "run.sh --no-registe…" 10 minutes ago Exited (0) 10 minutes ago share-mnt
44860d74081d rancher/rancher-agent:v2.7.3 "run.sh --server htt…" 10 minutes ago Up 10 minutes admiring_yalow
Taking a look at the logs of the rancher-agent that's still running, I get:
INFO: Arguments: --server https://REDACTED --token REDACTED -r -n m-644rr
INFO: Environment: CATTLE_ADDRESS=172.31.37.252 CATTLE_AGENT_CONNECT=true CATTLE_INTERNAL_ADDRESS= CATTLE_NODE_NAME=m-644rr CATTLE_SERVER=https://REDACTED CATTLE_TOKEN=REDACTED
INFO: Using resolv.conf: nameserver 127.0.0.53 options edns0 trust-ad search us-east-2.compute.internal
WARN: Loopback address found in /etc/resolv.conf, please refer to the documentation how to configure your cluster to resolve DNS properly
INFO: https://REDACTED/ping is accessible
INFO: REDACTED resolves to REDACTED
time="2023-05-16T20:33:41Z" level=info msg="Listening on /tmp/log.sock"
time="2023-05-16T20:33:41Z" level=info msg="Rancher agent version v2.7.3 is starting"
time="2023-05-16T20:33:41Z" level=info msg="Option worker=false"
time="2023-05-16T20:33:41Z" level=info msg="Option requestedHostname=m-644rr"
time="2023-05-16T20:33:41Z" level=info msg="Option dockerInfo={PFIV:LBRE:AMMF:JNWU:4XQZ:GPJN:4FSD:4O6A:U336:VFBT:7WVD:AKOS 2 1 0 1 1 overlay2 [[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true] [userxattr false]] [] {[local] [bridge host ipvlan macvlan null overlay] [] [awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} true true false false true true true true true true true true false 32 false 39 2023-05-16T20:33:41.064129161Z json-file systemd 2 0 5.19.0-1025-aws Ubuntu 22.04.2 LTS 22.04 linux x86_64 <https://index.docker.io/v1/> 0xc0011f0a10 4 16629444608 [] /var/lib/docker control-plane-5 [provider=amazonec2] false 20.10.23 map[io.containerd.runc.v2:{runc [] <nil>} io.containerd.runtime.v1.linux:{runc [] <nil>} runc:{runc [] <nil>}] runc { inactive false [] 0 0 <nil> []} false docker-init {3dce8eb055cbb6872793272b4f20ed16117344f8 3dce8eb055cbb6872793272b4f20ed16117344f8} {v1.1.7-0-g860f061 v1.1.7-0-g860f061} {de40ad0 de40ad0} [name=apparmor name=seccomp,profile=default name=cgroupns] [] []}"
time="2023-05-16T20:33:41Z" level=info msg="Option customConfig=map[address:172.31.37.252 internalAddress: label:map[] roles:[] taints:[]]"
time="2023-05-16T20:33:41Z" level=info msg="Option etcd=false"
time="2023-05-16T20:33:41Z" level=info msg="Option controlPlane=false"
time="2023-05-16T20:33:41Z" level=info msg="Connecting to <wss://REDACTED/v3/connect> with token starting with c5brthtz9nwjwnmrqr5spckpw45"
time="2023-05-16T20:33:41Z" level=info msg="Connecting to proxy" url="<wss://REDACTED/v3/connect>"
time="2023-05-16T20:33:41Z" level=info msg="Requesting kubelet certificate regeneration"
time="2023-05-16T20:33:41Z" level=info msg="Starting plan monitor, checking every 120 seconds"
time="2023-05-16T20:35:41Z" level=info msg="Requesting kubelet certificate regeneration"
Any ideas why they get stuck at "Requesting kubelet certificate regeneration"?
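
When custom-cluster nodes hang at registration like this, a common remedy is to fully clean the node and re-run the registration command from the Rancher UI. A rough sketch along the lines of Rancher's node-cleanup guidance (verify against the docs for your version; the paths are the usual suspects, not confirmed for this setup):

# Remove all containers left by previous registration attempts
docker rm -f $(docker ps -qa)

# Remove leftover state directories
sudo rm -rf /etc/kubernetes /opt/cni /opt/rke /run/calico /run/flannel \
  /var/lib/calico /var/lib/cni /var/lib/etcd /var/lib/kubelet \
  /var/lib/rancher /var/run/calico

# Reboot, then re-run the docker run ... rancher/rancher-agent command
# copied from the Rancher UI for the control plane role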

high-salesclerk-5768
05/17/2023, 7:00 AM

stale-nest-88025
05/17/2023, 7:48 AM

stale-nest-88025
05/17/2023, 7:52 AM

swift-library-66826
05/17/2023, 8:29 AM

wide-diamond-18008
05/17/2023, 8:54 AM

fierce-tomato-30072
05/17/2023, 9:56 AM

fierce-tomato-30072
05/17/2023, 10:07 AM
{"level":"info","ts":"2023-05-17T08:07:29.022Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
{"level":"warn","ts":"2023-05-17T08:07:29.022Z","caller":"etcdmain/etcd.go:75","msg":"failed to verify flags","error":"open /var/lib/rancher/rke2/server/db/etcd/config: permission denied"}
Step 4: I restarted rke2-server on the 3 master servers and got an async error
Step 5: On the first master I executed rke2-uninstall
Step 6: I removed etcd on the master nodes to start the etcd cluster
Step 7: I tried to remove the first master node to reconfigure the join, but it failed (kubectl delete nodes master-01)
Can anyone help me get the first master back into the cluster?
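
Assuming the first master really was wiped by rke2-uninstall, it can usually be re-joined as an additional server pointing at a surviving master. A sketch, with the surviving master's address and the token as placeholders:

# On a surviving master: read the cluster join token
sudo cat /var/lib/rancher/rke2/server/node-token

# On the wiped first master: reinstall rke2 and join as an extra server
curl -sfL https://get.rke2.io | sudo sh -
sudo mkdir -p /etc/rancher/rke2
cat <<'EOF' | sudo tee /etc/rancher/rke2/config.yaml
server: https://<surviving-master>:9345
token: <node-token-from-above>
EOF
sudo systemctl enable --now rke2-server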