# amazon
c
EKS using EC2 instances, or an EKS cluster built with Fargate?
c
What containers do you see running on your EKS node(s)? When the nodes get stuck in "waiting", that usually means the agent is trying to poll or resolve access to something and it will just run in a loop. Sometimes I've found the logs on the Rancher server side, sometimes on the agent side, and sometimes in the kubelet logs.
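A few concrete places to look, as a rough sketch (paths assume RKE2's default install locations on the downstream node):
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n cattle-system logs deploy/cattle-cluster-agent   # agent-side logs
journalctl -u rke2-server -f   # RKE2 service logs (use rke2-agent on worker-only nodes)
tail -f /var/lib/rancher/rke2/agent/logs/kubelet.log   # kubelet logs as written by RKE2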
w
I reinstalled Rancher directly on top of RKE2 (on 2 EC2 instances) and tried to create a cluster, but am running into the same problem (Rancher says: "[Waiting] configuring bootstrap node(s) custom-93e446a91a4b: waiting for probes: kubelet"). Looking at the node where I'm trying to deploy the cluster, I can see that the cluster has been created, but 2 pods are unhappy:
NAMESPACE       NAME                                    READY   STATUS              RESTARTS       AGE
cattle-system   cattle-cluster-agent-6988b48fd5-gzbkv   0/1     ContainerCreating   0              101m
cattle-system   cattle-cluster-agent-7c887c6f7b-pm5wl   0/1     CrashLoopBackOff    6 (105m ago)   112m
and "/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml logs cattle-cluster-agent-6988b48fd5-gzbkv -n cattle-system" displays:
Error from server: Get "https://10.252.12.47:10250/containerLogs/cattle-system/cattle-cluster-agent-6988b48fd5-gzbkv/cluster-register": dial tcp 127.0.0.1:9345: connect: connection refused
(10.252.12.47 is the IP of the node)
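A quick way to check what is actually listening on the node (a sketch; 6443 is the kube-apiserver, 9345 the RKE2 supervisor, 10250 the kubelet):
ss -tlnp | grep -E '6443|9345|10250'
/var/lib/rancher/rke2/bin/crictl --runtime-endpoint unix:///run/k3s/containerd/containerd.sock ps   # containers actually running under RKE2's containerd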
c
Do you have all the correct ports open in your AWS security group attached to the instances?
and the EKS nodes?
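For reference, these are roughly the ports Rancher/RKE2 need open between the nodes, as a sketch using the AWS CLI (sg-0123456789abcdef0 is a placeholder security group ID):
for p in 443 6443 9345 10250 2379 2380; do
  aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port "$p" --source-group sg-0123456789abcdef0
done
# Canal/Flannel VXLAN overlay traffic
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol udp --port 8472 --source-group sg-0123456789abcdef0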
w
yes, the security group attached to the EC2 instances allows ALL (ingress/egress).
on the node where I tried to install RKE2 (using the CLI provided by Rancher), I have:
# /var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml cluster-info
E0710 19:20:53.244086   97898 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:20:53.262613   97898 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:20:53.267281   97898 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:20:53.271121   97898 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/rke2-coredns-rke2-coredns:udp-53/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml get nodes
E0710 19:21:44.809057   98261 memcache.go:287] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:21:44.827734   98261 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:21:44.831533   98261 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
E0710 19:21:44.834755   98261 memcache.go:121] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
NAME              STATUS     ROLES                       AGE     VERSION
ip-10-252-12-47   NotReady   control-plane,etcd,master   3h31m   v1.26.6+rke2r1
but it cannot join Rancher
c
What's the size of the instance you run RKE2 on?
w
they are t2.medium, 20GB storage
I managed to make it work by running this on the node on which I'm deploying the cluster from Rancher:
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n cattle-system patch deployments cattle-cluster-agent --patch '{"spec": {"template": {"spec": {"hostAliases": [{"hostnames":["fab-rancher.local"],"ip": "10.252.12.18"}]}}}}'
I'm using self-signed certficates, and somehow coreDns is not working
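For reference, that hostAliases patch just injects an /etc/hosts entry (fab-rancher.local → 10.252.12.18) into the agent pods, so they can reach the Rancher URL even while in-cluster DNS is unhealthy; one way to check it took effect:
/var/lib/rancher/rke2/bin/kubectl --kubeconfig /etc/rancher/rke2/rke2.yaml -n cattle-system get deploy cattle-cluster-agent -o jsonpath='{.spec.template.spec.hostAliases}{"\n"}'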
c
For a 1-node cluster, a t2.medium feels a bit tight for running Rancher. I personally run Rancher on a 1-node K3s cluster on a t3a.large instance.
c
If you are using a self-signed cert for Rancher, then that may be your issue. Generally I put an ALB in front of Rancher with an ACM cert, or use the Let's Encrypt integration to generate an LE cert. I'm not aware of any way to tell the Rancher agent to ignore the CA or accept any cert.
You could also try this project to get a signed LE cert on a local address/node: https://www.getlocalcert.net/
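For the Let's Encrypt route, the Rancher Helm chart can request the certificate itself; a rough sketch, assuming cert-manager is already installed and using placeholder hostname/email values:
helm upgrade --install rancher rancher-latest/rancher \
  --namespace cattle-system --create-namespace \
  --set hostname=rancher.example.com \
  --set ingress.tls.source=letsEncrypt \
  --set letsEncrypt.email=admin@example.com \
  --set letsEncrypt.ingress.class=nginx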
w
Thanks for the replies. I re-installed Rancher on RKE2 on EC2, behind a network load balancer, using the FQDN of the NLB in the Rancher config for the time being, and so far so good: deploying a K8s cluster on AWS EC2 from Rancher works (with the right user :-) )