# rke2
c
```
providerID rke2://testjs2-worker-8bkz5-2dwvt is invalid for EC2 instances
```
You need to deploy your cluster with the AWS cloud provider if you want to use the AWS LB controller. Either that, or manually set the provider ID on your nodes via the kubelet arg. By default the RKE2 stub cloud provider will set the providerID to `rke2://NODENAME`, which does not match the format that the AWS LB controller needs to look up the instance.
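For context, the difference shows up in the Node object's `spec.providerID`. A rough sketch of the two formats (the availability zone and instance ID below are made-up placeholders, not values from this thread):
```
# providerID as set by the RKE2 stub cloud provider (rejected by the AWS LB controller)
spec:
  providerID: rke2://testjs2-worker-8bkz5-2dwvt
---
# providerID in the format the AWS cloud provider sets: aws:///<availability-zone>/<instance-id>
spec:
  providerID: aws:///us-east-1a/i-0123456789abcdef0
```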
d
thx, do you have any pointers on how to proceed with your second suggestion? (manually set the provider ID)
c
https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/ search for the `--provider-id` string. You'd need to determine what format AWS wants, and how to set it correctly on a per-node basis. It would probably be easier to build a new cluster with the correct cloud provider deployed; since the providerID cannot be changed on existing nodes, you'll have to rebuild anyway.
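If you do go the manual route on RKE2, the flag can be passed per node through the RKE2 config file instead of editing the kubelet invocation directly. A minimal sketch, assuming the `aws:///<availability-zone>/<instance-id>` format with placeholder values; it has to be in place before the node first registers, since the providerID can't be changed afterwards:
```
# /etc/rancher/rke2/config.yaml on each node
# (substitute that node's own availability zone and EC2 instance ID; these values are placeholders)
kubelet-arg:
  - "provider-id=aws:///us-east-1a/i-0123456789abcdef0"
```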
d
Do I get this correctly? This Amazon cloud provider needs to be installed on the Rancher cluster (parent)? If that's the case, I think this won't work in my setup, since Rancher is deployed in an on-prem env and not in AWS.
c
No, you need to deploy the cloud provider to the cluster that you want to use the LB controller on, not on the Rancher management cluster.
d
ok, will give it a try
p
I desperately need help accomplishing this when deploying an RKE2 cluster on EC2. I've tried setting the cloud provider to default and to AWS when creating the cluster (with the drop-down in the screenshot) and had no luck. I spin up the cluster, install the AWS CNI, and then create the ingress, but I get the same error as above.
Any help HUGELY appreciated! 🙏
d
In my case I use Terraform to deploy the downstream cluster to AWS.
```
resource "rancher2_cluster_v2" "rke2_cluster" {
  rke_config {
    ...
    # Deploy the out-of-tree AWS cloud controller manager via a HelmChart manifest
    additional_manifest   = <<EOF
---
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: aws-cloud-controller-manager
  namespace: kube-system
spec:
  chart: aws-cloud-controller-manager
  repo: https://kubernetes.github.io/cloud-provider-aws
  targetNamespace: kube-system
  bootstrap: true
  valuesContent: |-
    hostNetworking: true
    nodeSelector:
      node-role.kubernetes.io/control-plane: "true"
    args:
      - --configure-cloud-routes=false
      - --v=5
      - --cloud-provider=aws
EOF
    # Select the AWS cloud provider for the cluster
    machine_global_config = <<-EOF
      cloud-provider-name: aws
    EOF
    # Control-plane/etcd nodes: disable the built-in stub cloud controller and
    # run the components with cloud-provider=external
    machine_selector_config {
      config = <<-EOF
        disable-cloud-controller: true
        kube-apiserver-arg:
          - cloud-provider=external
        kube-controller-manager-arg:
          - cloud-provider=external
        kubelet-arg:
          - cloud-provider=external
      EOF
      machine_label_selector {
        match_expressions {
          key      = "rke.cattle.io/control-plane-role"
          operator = "In"
          values   = ["true"]
        }
      }
    }
    # Worker nodes: only the kubelet needs cloud-provider=external
    machine_selector_config {
      config = <<-EOF
        kubelet-arg:
          - cloud-provider=external
      EOF
      machine_label_selector {
        match_expressions {
          key      = "rke.cattle.io/worker-role"
          operator = "In"
          values   = ["true"]
        }
      }
    }
  }
}
```
Hope this helps. Also, I'm using rke2 v1.31.3+rke2r1; I had issues with older versions.
c
@purple-match-66532 you are using a pretty old version of Kubernetes. Why are you still on 1.26?
p
Hmm... seemed like it was the latest available for RKE2 on EC2?
Am I already off track? 😬
d
My Rancher installation gets updated automatically, so I don't know how you ended up like this. I use the Docker image and keep it updated.
p
Interesting... taking a look at what version I'm using/how to get a newer one. I based mine on the quickstart terraform (https://ranchermanager.docs.rancher.com/getting-started/quick-start-guides/deploy-rancher-manager/aws)
Alright, using the latest image from my computer heh
c
If 1.26 is the latest available version in the UI then you are also on an old version of Rancher. All of the Kubernetes versions you’re seeing listed there are end of life.
p
to deploy a modern k8s version to AWS
c
What version of Rancher are you using and how did you install it?
p
Previously with the AWS quickstart Terraform, but this time with:
```
sudo docker run --privileged -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher
```
Is the guide up to date? There's a bit of ambiguity in the out-of-tree section of the guide: is it saying that I have to select Amazon in the dropdown and set those rkeConfig settings in the cluster deployment YAML before creating? And is it saying I then have to install the aws-cloud-controller-manager, with 3 options to do so (manifest, Helm, UI)? Experimenting with the options here and will report back for posterity/anyone in the future, but I'd appreciate any insights! And huge thanks for your time!
c
Don't use the rancher/rancher image; it is not supported and only designed for very lightweight proof of concept. You should deploy the Rancher Helm chart to a cluster. K3s is a good choice if you must use Docker, as it will run a full Kubernetes cluster in Docker. Also, don't use the latest tag: make sure you specify a version of that image if you are going to use it. We do not maintain the latest tag pointing at anything in particular; it's probably out of date.
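For what it's worth, one way to sketch that on K3s is a HelmChart manifest picked up by K3s's bundled Helm controller. The hostname and password below are placeholders, the cattle-system namespace and cert-manager have to exist first, and the Rancher docs' usual route is the helm CLI, so treat this only as an illustration:
```
# Hypothetical HelmChart manifest for installing Rancher on a K3s cluster.
# Drop it into /var/lib/rancher/k3s/server/manifests/ and the bundled Helm controller applies it.
apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: rancher
  namespace: kube-system
spec:
  repo: https://releases.rancher.com/server-charts/latest
  chart: rancher
  targetNamespace: cattle-system   # namespace (and cert-manager) must already exist
  valuesContent: |-
    hostname: rancher.example.com  # placeholder hostname
    bootstrapPassword: admin1234   # placeholder password
    replicas: 1
```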
p
Ok, using the Helm chart against my cluster running with Rancher Desktop. For the out-of-tree provider steps to modify the YAML for etcd, Control Plane, and Worker: do I just add all of these if all 3 nodes have all 3 roles (so three `config` entries under `rkeConfig -> machineSelectorConfig`)? Sorry if that is a really dumb question.
Or are those meant to be patched after deployment of the cluster?
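For reference, here is roughly where those entries sit in the provisioning cluster YAML, reconstructed from the Terraform example earlier in the thread (the selector keys and config values come from that example; the cluster name and namespace are illustrative):
```
apiVersion: provisioning.cattle.io/v1
kind: Cluster
metadata:
  name: my-rke2-cluster          # placeholder name
  namespace: fleet-default
spec:
  rkeConfig:
    machineGlobalConfig:
      cloud-provider-name: aws
    machineSelectorConfig:
      # control-plane / etcd nodes
      - config:
          disable-cloud-controller: true
          kube-apiserver-arg:
            - cloud-provider=external
          kube-controller-manager-arg:
            - cloud-provider=external
          kubelet-arg:
            - cloud-provider=external
        machineLabelSelector:
          matchExpressions:
            - key: rke.cattle.io/control-plane-role
              operator: In
              values: ["true"]
      # worker nodes
      - config:
          kubelet-arg:
            - cloud-provider=external
        machineLabelSelector:
          matchExpressions:
            - key: rke.cattle.io/worker-role
              operator: In
              values: ["true"]
```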
Ok, added the configs under there and used the Rancher on my Rancher Desktop, which is v2.10.1 (and k8s 1.31). Seems like it might be stuck on "Configuring bootstrap node(s) dev-apps-dev-apps-pool-0-5scq8-scv6d: waiting for agent to check in and apply initial plan". One question regarding the cluster ID step: when/where do we provide that to Rancher? 🤔
c
you don’t
p
Gotcha. Hmm, definitely something wrong, as it's still stuck on "configuring bootstrap node(s) ***the_node_name***: waiting for agent to check in and apply initial plan". Investigating, but I followed https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/kubernetes-clusters-in-rancher-setup/set-up-cloud-providers/amazon to a T.
c
Log in to the node and check on the services and cluster status.
Check the rke2-server and rancher-system-agent logs in journald. Do `kubectl get node -o wide` and see what it says.
p
Re: the guide, am I supposed to select External or Amazon in the dropdown for Cloud Provider? (I have been adding all the rkeConfig stuff.)
Given Rancher 2.10.1 and K8s v1.31.3+rke2r1, I assumed External due to the warning at the top re: 1.27 or above needing to be out-of-tree. Should I forgo the rkeConfig changes given this?
c
Set it to AWS and then override to external, and deploy the chart as described in the docs.
p
Ok, going with that. The question was due to this step: 2. "Select Amazon if relying on the above mechanism to set the provider ID. Otherwise, select External (out-of-tree) cloud provider, which sets `--cloud-provider=external` for Kubernetes components." And the message at the top: "In Kubernetes 1.27 and later, you must use an out-of-tree AWS cloud provider. In-tree cloud providers have been deprecated."
Ok, lots of progress. Deployed a new cluster as suggested above with:
```
# Kubernetes version to use for Rancher server cluster
rancher_kubernetes_version = "v1.31.3+k3s1"

# Rancher server version (format: v0.0.0)
rancher_version = "2.10.1"

# Kubernetes version to use for managed workload cluster
workload_kubernetes_version = "v1.31.4+rke2r1"
```
Now I'm seeing: 1. Failures to pull from ECR, seemingly because it's not using the instance profile for some reason and doesn't have credentials. 2. Failures from the AWS cloud controller DaemonSet:
```
Invalidating discovery information
k8s.io/client-go@v0.27.0/tools/cache/reflector.go:231: forcing resync
successfully renewed lease kube-system/cloud-controller-manager
lock is held by pool-2-****** and has not yet expired
failed to acquire lease kube-system/cloud-controller-manager
lock is held by pool-2-****** and has not yet expired
failed to acquire lease kube-system/cloud-controller-manager
successfully renewed lease kube-system/cloud-controller-manager
successfully renewed lease kube-system/cloud-controller-manager
lock is held by pool-2-****** and has not yet expired
failed to acquire lease kube-system/cloud-controller-manager
```