# cluster-api
orange-shoe-42900
The RKE2 control plane and bootstrap providers have examples for AWS and vSphere, and I've successfully used the AWS CAPA infrastructure provider. I think the issue is that the Azure CAPZ provider doesn't seem to let you open the extra ports RKE2 needs (9345, 10250, 2379/2380, etc.); it only opens 6443 on the API load balancer. In the AWS example, the extra ports are opened as ingress rules. Since RKE2 was recently added as an official provider in the upstream CAPI docs, I was expecting there to be a way to make it work with the Azure provider, but I've had no luck.
I've asked the same question in the Kubernetes CAPZ channel, and they asked me to create a GitHub issue to track this: https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/5511. Still, if anybody has any information I'd appreciate it!
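For reference, here is a minimal sketch of the kind of NSG rule this thread later lands on for the control-plane subnet of an AzureCluster; the rule name and priority are illustrative, and this only covers node-to-node traffic, not the ports exposed on the API load balancer:

```yaml
# Sketch only: opening RKE2's registration port (9345) on the control-plane
# subnet NSG of an AzureCluster. Name and priority are illustrative.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureCluster
metadata:
  name: my-cluster
spec:
  networkSpec:
    subnets:
    - name: control-plane-subnet
      role: control-plane
      securityGroup:
        securityRules:
        - name: allow_port_9345
          description: Allow port 9345 for RKE2 server registration
          direction: Inbound
          priority: 2202
          protocol: Tcp
          action: Allow
          source: '*'
          sourcePorts: '*'
          destination: '*'
          destinationPorts: "9345"
```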
fast-bear-60513
Hi there, we have some documentation coming with the next release: https://turtles.docs.rancher.com/turtles/next/en/user/clusterclass.html. The ClusterClass example is just a simple reference implementation that uses community images. The documentation I linked also provides a ClusterResourceSet to apply the Azure cloud provider. Note, however, that we are still iterating on this, so it may change a bit before it's released.
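As a rough illustration of the ClusterResourceSet mechanism mentioned above (the names and label here are placeholders, not the ones from the Turtles docs):

```yaml
# Sketch only: a ClusterResourceSet that applies a ConfigMap of manifests
# (e.g. the Azure cloud provider) to workload clusters matching the selector.
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: azure-cloud-provider
  namespace: default
spec:
  clusterSelector:
    matchLabels:
      cloud: azure            # placeholder label; set it on the Cluster object
  strategy: ApplyOnce
  resources:
  - kind: ConfigMap
    name: azure-cloud-provider-manifests   # ConfigMap holding the CCM manifests
```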
orange-shoe-42900
Thanks Andrea. I'm curious whether the YAML you linked has been verified as actually deploying a cluster. It looks almost identical to what I have built, but the Azure nodes never join the cluster, which I think is because the AzureCluster CR isn't listening on the required RKE2 ports. I'll try the example you sent and see if I have better luck.
fast-bear-60513
We validate this with e2e tests. Are you using the Azure RKE2 example or the AKS one? For RKE2, the updated doc uses HelmApps to install Calico and the Azure cloud provider; this does require Rancher 2.11, however.
orange-shoe-42900
I'm using RKE2. I can create a Rancher 2.11 management cluster and try it out.
fast-bear-60513
You can also initialize the cluster manually without HelmApps; just install the Azure CCM and Calico.
The initial provisioning should work if you use the provided ClusterClass; the nodes will just stay tainted until the CCM is installed.
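For context, the taint in question is the standard external cloud-provider one; a node provisioned with `cloud-provider: external` looks roughly like this until the Azure CCM initializes it:

```yaml
# Excerpt of a Node spec before the cloud controller manager has run.
# The CCM removes this taint (and sets spec.providerID) once it has
# initialized the node, after which workloads can be scheduled there.
spec:
  taints:
  - key: node.cloudprovider.kubernetes.io/uninitialized
    value: "true"
    effect: NoSchedule
```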
orange-shoe-42900
Okay. I know in the past I had the CCM installed and configured, and I even verified that the providerID was present on the node, but it was still failing to spin up new nodes. I'll revisit today and see what happens; hopefully I have better luck. Thanks for the help!
limited-football-68766
Hey @orange-shoe-42900 - I was able to successfully provision a CAPZ cluster from a ClusterClass using `registrationMethod: control-plane-endpoint` with this PR and with multiple control plane machines. You can probably already provision a cluster with the `internal-first` registration method, but that configuration option can't be changed later and has issues with upgrades.
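For anyone following along, the field being discussed lives on the RKE2 control plane object; a minimal (abbreviated) sketch, with the object name being illustrative:

```yaml
# Sketch only: where registrationMethod is set. "control-plane-endpoint"
# registers agents through the API server load balancer, which is why the
# load balancer also needs to expose RKE2's port 9345 (see the
# additionalAPIServerLBPorts discussion later in this thread).
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlane
metadata:
  name: rke2-control-plane
spec:
  registrationMethod: control-plane-endpoint
  # ...other required fields (replicas, version, infrastructureRef, etc.) omitted
```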
orange-shoe-42900
Thanks! Yeah, I saw your PR on the GitHub issue I created. I'm still having some issues with the template that @fast-bear-60513 linked in this thread, though I do have some differences: I am deploying into AzureUSGovernment, which requires a slightly different setup for the CAPIProvider. I don't think that's a huge issue and should be solvable by setting some environment variables on the CAPIProvider resource. My current issue is that the RKE2ControlPlane resource is not actually spinning up any VMs in Azure. I haven't had much time to look into it, but I can try to get to it this week.
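I have not verified this against Azure Government myself, but as a sketch of the "environment variables on the CAPIProvider" idea: the variable name and value below are assumptions based on CAPZ's sovereign-cloud support, not something confirmed in this thread, so check them against the Turtles and CAPZ docs for your versions.

```yaml
# Sketch only (assumptions): a Rancher Turtles CAPIProvider for CAPZ that
# passes AZURE_ENVIRONMENT so the provider targets the US Government cloud.
apiVersion: turtles-capi.cattle.io/v1alpha1
kind: CAPIProvider
metadata:
  name: azure
  namespace: capz-system
spec:
  name: azure
  type: infrastructure
  variables:
    AZURE_ENVIRONMENT: AzureUSGovernmentCloud   # assumption: CAPZ sovereign-cloud setting
```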
fast-bear-60513
If you are using the ClusterClass example, you need to set up an AzureClusterIdentity for the Cluster. It's possible the machines are not being provisioned in Azure because of that; I'd check the CAPZ logs. RKE2ControlPlane has no part in it, assuming you can see the CAPI Machine in the Provisioning state.
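For reference, a minimal AzureClusterIdentity sketch that the `identityRef` in the AzureClusterTemplate can point at (service-principal based; the IDs and referenced Secret are placeholders):

```yaml
# Sketch only: a service-principal AzureClusterIdentity named to match the
# identityRef (name: cluster-identity) used later in this thread.
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterIdentity
metadata:
  name: cluster-identity
  namespace: default
spec:
  type: ServicePrincipal
  tenantID: "<tenant-id>"
  clientID: "<client-id>"
  clientSecret:
    name: cluster-identity-secret   # plain Secret holding the SP client secret
    namespace: default
  allowedNamespaces: {}             # empty object: usable from any namespace
```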
orange-shoe-42900
I can see the machine in Provisioning. I do suspect part of my issue is that I am deploying in Azure Government; there is extra configuration of the Azure Service Operator that needs to be done to make that work. I have unfortunately not been able to work on this much lately, but I hope to get back to it soon. Thanks again for all the help!
👍 1
@limited-football-68766 I finally got around to revisiting this and I'm having good success. Thanks for getting that PR into the cluster-api-provider-azure GitHub project! They have not released a version with your fixes yet, so I did have to build the provider components and images locally for now. I now have a cluster running in Azure with the RKE2 control plane/bootstrap providers, which is a huge success! The only thing I had to do was change the health check probe on the control plane load balancer from an HTTPS probe to a TCP probe, and I had to do that manually. The HTTPS probe was trying to hit a /readyz endpoint on the node, which is not served up by RKE2 (as far as I am aware). Is that something you ran into in your testing?
I could not find anything in the updated CRD spec(s) to override the health check probe.
fast-bear-60513
Which version of RKE2 are you using? Also, how did you configure the newly added `additionalAPIServerLBPorts` in the AzureCluster? Our tests based on this class still use the `internal-first` registrationMethod.
FYI, I also tried building the Azure provider with the latest changes and I managed to succeed. I already prepared a PR: https://github.com/rancher/turtles/compare/fix_azure_rke2_registration_method. Mainly by not defining the RKE2 registrationMethod (it will default to control-plane-endpoint) and adding the additional rule to the AzureCluster:
```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureClusterTemplate
metadata:
  name: azure-cluster
spec:
  template:
    spec:
      identityRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureClusterIdentity
        name: cluster-identity
      networkSpec:
        subnets:
        - name: control-plane-subnet
          role: control-plane
          securityGroup:
            securityRules:
            - action: Allow
              description: Allow port 9345 for RKE2
              destination: '*'
              destinationPorts: "9345"
              direction: Inbound
              name: allow_port_9345
              priority: 2203
              protocol: Tcp
              source: '*'
              sourcePorts: '*'
        - name: node-subnet
          natGateway:
            name: node-natgateway
          role: node
        additionalAPIServerLBPorts:
        - name: rke2
          port: 9345
```
```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: RKE2ControlPlaneTemplate
metadata:
  name: rke2-control-plane
spec:
  template:
    spec:
      rolloutStrategy:
        type: "RollingUpdate"
        rollingUpdate:
          maxSurge: 1
      agentConfig: {}
      serverConfig:
        cni: none
        cloudProviderName: external
        disableComponents:
          kubernetesComponents:
          - cloudController
        kubeAPIServer:
          extraArgs:
          - --anonymous-auth=true
      files:
        - owner: root:root
          path: /etc/kubernetes/azure.json
          permissions: "0644"
```
limited-football-68766
The RKE2-provisioned API server uses `--anonymous-auth=false` by default, which I think is what causes the API server health checks against /readyz to fail. You may need to specify
```yaml
kubeAPIServer:
  extraArgs:
  - --anonymous-auth=true
```
to allow the health checks to pass.
☝️ 1
But if you find the current changes insufficient, feel free to open a PR/issue upstream so we can look further into it.
The best example of a fully tested configuration is https://github.com/kubernetes-sigs/cluster-api-provider-azure/blob/main/templates/cluster-template-clusterclass-rke2.yaml, which is exercised in the upstream e2e tests. If it misses something, we should look into that too.
orange-shoe-42900
Setting anonymous auth to true is not something my team can do, as it would violate certain STIG hardening rules. I will play around with different settings and see what I can figure out.
👍 1