# vsphere
h
We ended up deploying the vSphere CSI and CPI manually, straight from the upstream sources, after heaps of problems with the Rancher helm charts... With this method we have zero issues, and it is easily repeatable across all our cluster deployments.
a
Thanks @hallowed-breakfast-56871 - I have been beating on this issue for the past few days, and I built another cluster to try out the manual vSphere install. I hit issues with the node selectors on the generic vSphere install - did you run into anything like that installing on a cluster built from Rancher, by any chance?
h
To cut another long story short, we also no longer use Rancher to provision clusters on vSphere, basically due to bugs, crashes, and issues connecting to our private cloud. We instead Terraform our RKE2 clusters, then enroll them in Rancher while bootstrapping. I've done this manual CSI and CPI method on 6 clusters now; all work fine, and it actually gave me more granular control over my storage options with vSphere. Please also note that we are not using vSAN for storage.
Also note that you must set up your clusters without a cloud provider.
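For reference, "without a cloud provider" on RKE2 roughly means telling the kubelet that an external provider will initialise the nodes. A minimal sketch from my notes (assuming RKE2's standard config.yaml and its kubelet-arg passthrough - adjust to your setup):
Copy code
# /etc/rancher/rke2/config.yaml, on every node
# Nodes come up with the node.cloudprovider.kubernetes.io/uninitialized
# taint until the CPI initialises them.
kubelet-arg:
  - cloud-provider=external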
a
You have great timing. I just built a new cluster and was trying to install with the helm chart, and it was not having it. I will tear it down and rebuild it without a cloud provider. For clarity's sake, are you doing these steps on your end:
1. Installing the helm chart for the CPI (link #1 above)
2. Installing the CSI per link #2 above
The reason I ask about the order is that, now that I am doing this manually in testing, I came across this: https://docs.vmware.com/en/VMware-vSphere-Container-Storage-Plug-in/2.0/vmware-vsphe[…]etting-started/GUID-0AB6E692-AA47-4B6A-8CEA-38B754E16567.html That link shows how to install the CPI yet a THIRD way, so I think I will try the order above in a new cluster without a cloud provider, but wanted to verify with you. Thanks Josh!
h
So, just checking my notes / wiki... I do the CPI first, which is a helm chart now looked after by Kubernetes:
https://kubernetes.github.io/cloud-provider-vsphere
Then I do the CSI, which is a deployment pulled from the kubernetes-sigs git repo:
https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.6.0/manifests/vanilla/vsphere-csi-driver.yaml
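In rough outline it looks like this (a sketch only - the angle-bracket values are placeholders for your own vCenter details, and the CSI driver also expects its config secret to exist first, per the driver docs):
Copy code
# 1. CPI via the upstream helm chart
helm repo add vsphere-cpi https://kubernetes.github.io/cloud-provider-vsphere
helm repo update
helm upgrade --install vsphere-cpi vsphere-cpi/vsphere-cpi \
  --namespace kube-system \
  --set config.enabled=true \
  --set config.vcenter=<vcenter-fqdn> \
  --set config.username=<user> \
  --set config.password=<password> \
  --set config.datacenter=<datacenter>

# 2. CSI via the upstream manifest (pull it down and edit first if needed)
kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.6.0/manifests/vanilla/vsphere-csi-driver.yaml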
a
thanks... I have the new cluster build running now without any cloud provider, so I will do that first part via the helm link you provided above and let you know how I make out. Appreciate the notes on this - they have been helpful!
h
No worries. I would normally give out my full wiki page, but I'd need to redact a whole bunch. I still can if you get stuck.
a
np....I think you have set me on the right path. I'll hit you up if I get stuck - thank you very much for all the help and insight!
Well, I thought I would have more luck, but after creating a new Rancher cluster (Ubuntu 20, 1 etcd, 1 CP, 3 workers) with no external cloud provider, deploying the helm chart (or doing it manually) creates the daemonset in kube-system, but it's empty and no pods get created - is that expected from what you have tested? I assumed I'd see controller pods running, and for the life of me I am not able to diagnose why this is the outcome. I feel I am missing something, as I know the Rancher CPI would instantiate some controller pods, so I was curious whether this rang any bells from your testing? thx!
h
Is the vsphere-cloud-controller-manager running?
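Quickest check is something like:
Copy code
kubectl get pods -n kube-system | grep -i vsphere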
a
Nope...nothing like that is running - which is exactly what I was expecting to see
It's like the helm chart deployed OK, the DS was created empty, and it thinks things are OK...haha. But clearly there is no controller pod running, which is odd
h
Perhaps try removing and re-installing. I've personally not seen that before. The CPI is the easier part.
a
yeah - I have found it through Rancher to be the quickest and easiest part. I've tried a few times to delete/uninstall through helm and redo it, but it ends up in the same state each time - it's really strange, and of course nothing shows an actual error
h
Do you have a tool like OpenLens which you can dig around with?
a
unfortunately, I don't have that
h
Ha, that's a shame, it's very good at showing errors. Can you describe the DaemonSet? Or is it straight up not being made by helm?
Ah. I think I see. Your node labels don't match a normal RKE2 install.
This is from one of my working clusters. I think the issue you have is regarding your controlplane taint. It should be control-plane to match best practice, and what this chart requires.
a
Copy code
[root@vsphere-cpi]# helm upgrade --install vsphere-cpi vsphere-cpi/vsphere-cpi --namespace kube-system --set config.enabled=true --set config.vcenter=hci-vcenter.domain.local --set config.username=administrator@vsphere.local --set config.password=XXX --set config.datacenter=DataCenter
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/klaughman/rubrikk8s-vspherecsi.cfg
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/klaughman/rubrikk8s-vspherecsi.cfg
Release "vsphere-cpi" does not exist. Installing it now.
NAME: vsphere-cpi
LAST DEPLOYED: Thu Dec 22 15:12:01 2022
NAMESPACE: kube-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing vsphere-cpi.

Your release is named vsphere-cpi.

[root@vsphere-cpi]# helm list -n kube-system
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /home/klaughman/rubrikk8s-vspherecsi.cfg
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /home/klaughman/rubrikk8s-vspherecsi.cfg
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
vsphere-cpi     kube-system     1               2022-12-22 15:12:01.300807621 -0500 EST deployed        vsphere-cpi-1.25.0      1.25.0




[root@vsphere-cpi]# kubectl get ds -n kube-system
NAME          DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE
canal         5         5         5       5            5           kubernetes.io/os=linux   59m
vsphere-cpi   0         0         0       0            0           <none>                   48s
[root@cetech-lnx01 vsphere-cpi]# kubectl describe ds vsphere-cpi -n kube-system
Name:           vsphere-cpi
Selector:       app=vsphere-cpi
Node-Selector:  <none>
Labels:         app=vsphere-cpi
                app.kubernetes.io/managed-by=Helm
                component=cloud-controller-manager
                tier=control-plane
                vsphere-cpi-infra=daemonset
Annotations:    deprecated.daemonset.template.generation: 1
                meta.helm.sh/release-name: vsphere-cpi
                meta.helm.sh/release-namespace: kube-system
Desired Number of Nodes Scheduled: 0
Current Number of Nodes Scheduled: 0
Number of Nodes Scheduled with Up-to-date Pods: 0
Number of Nodes Scheduled with Available Pods: 0
Number of Nodes Misscheduled: 0
Pods Status:  0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app=vsphere-cpi
                    component=cloud-controller-manager
                    release=vsphere-cpi
                    tier=control-plane
                    vsphere-cpi-infra=daemonset
  Service Account:  cloud-controller-manager
  Containers:
   vsphere-cpi:
    Image:      gcr.io/cloud-provider-vsphere/cpi/release/manager:v1.25.0
    Port:       <none>
    Host Port:  <none>
    Args:
      --cloud-provider=vsphere
      --v=2
      --cloud-config=/etc/cloud/vsphere.conf
    Environment:  <none>
    Mounts:
      /etc/cloud from vsphere-config-volume (ro)
  Volumes:
   vsphere-config-volume:
    Type:               ConfigMap (a volume populated by a ConfigMap)
    Name:               vsphere-cloud-config
    Optional:           false
  Priority Class Name:  system-node-critical
Events:                 <none>
h
Yeah, try adding the correct taint for your control plane and see what that does.
Copy code
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
      - matchExpressions:
          - key: node-role.kubernetes.io/master
            operator: Exists
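That affinity only matches nodes carrying one of those role labels, so it's worth checking what your nodes actually have - a quick way (sketch):
Copy code
kubectl get nodes -L node-role.kubernetes.io/control-plane -L node-role.kubernetes.io/master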
a
OK...that could be - I believe this image is using an RKE1 setup
h
Either that, or add master.
Yeah, RKE1 is still using the old taint for controlplane. I think the world has since moved on to control-plane
a
ok....that is good to know. Admittedly, I have a good amount still to learn with Rancher, so this is helpful on that front. So in your snippet above, are you saying to manually add the taint to the control-plane node like this:
Copy code
[root@cetech-lnx01 vsphere-cpi]# kubectl describe nodes | egrep "Taints:|Name:"
Name:               rubrikk8s-csi-cp1
Taints:             node-role.kubernetes.io/controlplane=true:NoSchedule
Name:               rubrikk8s-csi-etcd1
Taints:             node-role.kubernetes.io/etcd=true:NoExecute
Name:               rubrikk8s-csi-wkr1
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Name:               rubrikk8s-csi-wkr2
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
Name:               rubrikk8s-csi-wkr3
Taints:             node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
[root@cetech-lnx01 vsphere-cpi]#
[root@cetech-lnx01 vsphere-cpi]# kubectl taint node rubrikk8s-csi-cp1 node-role.kubernetes.io/control-plane=true:NoSchedule
h
Yup, that should do it.
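One more thing from my notes: adding the new taint won't remove the old RKE1-style one, so you may want to drop that too. The trailing dash removes a taint (node name is yours from above):
Copy code
kubectl taint node rubrikk8s-csi-cp1 node-role.kubernetes.io/controlplane=true:NoSchedule-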
a
So I just got a new cluster to finish, changed the taint on the controlplane to control-plane, and the CPI installed through helm - THANK YOU!
Now, on the CSI driver install I am hitting an issue I saw before, and this feels like a silly question from me, but the CSI pods have this as a selector in their pod definition. I have this cluster with 1 CP, 1 etcd, and 3 workers:
Copy code
QoS Class:                   BestEffort
Node-Selectors:              node-role.kubernetes.io/control-plane=
Tolerations:                 node-role.kubernetes.io/control-plane:NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
However, when applying the CSI driver manifest, the vsphere-csi-controller pods never seem to schedule:
Copy code
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  105s  default-scheduler  0/5 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/etcd: true}, that the pod didn't tolerate, 1 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  15s   default-scheduler  0/5 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/etcd: true}, that the pod didn't tolerate, 1 node(s) had taint {node.cloudprovider.kubernetes.io/uninitialized: true}, that the pod didn't tolerate, 3 node(s) didn't match Pod's node affinity/selector.
What's throwing me off is that the node selector is just blank: node-role.kubernetes.io/control-plane= I must be missing something with my taints, but I can't seem to toggle anything to match, and it's stumping me - did you have any thoughts on that one 🙂
h
Yeah, I've seen this too
One sec, I have it in my notes...
Copy code
Next we need to pull down the latest deployment from the Kubernetes GitHub. We do this as we need to remove the node toleration, and edit the replica count if we are deploying to a single node. 
wget https://raw.githubusercontent.com/kubernetes-sigs/vsphere-csi-driver/v2.6.0/manifests/vanilla/vsphere-csi-driver.yaml

Edit any requirements (e.g. replicas, or removing the node toleration) in the yaml.
You might want to edit lines like the below to read "true". Then remove all tolerations after.

nodeSelector:
  node-role.kubernetes.io/control-plane: ""
So in that yaml you pulled down, check for tolerations, and remove them if required.
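Alternatively (just a sketch, reusing your node name from earlier) you could leave the manifest alone and give the control-plane node the label the selector is asking for. The empty value after = is deliberate - the pod's nodeSelector wants the key with an empty value:
Copy code
# Sets the node-role.kubernetes.io/control-plane label to "" on the node,
# which is exactly what "Node-Selectors: node-role.kubernetes.io/control-plane=" matches.
kubectl label node rubrikk8s-csi-cp1 node-role.kubernetes.io/control-plane=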
a
ahh....ok...this is helpful.....I was running this one right from the raw URL as in the docs, but let me pull it down and hack through it. I will let you know how I make out, either tonight or tomorrow. Can't thank you enough for all your advice and help here, Josh - thank you!
the empty "=" was killing me. I read that it infers true and was like, WTF should I set these to...haha
h
No worries. Happy to help!