# vsphere
m
Thank ya! I will check this out tomorrow. :)
👍 1
@square-orange-60123 - I got it working on my Test env. Thanks so much! While we did not have a path for the volume datastore, the YAML was not quite the same as in that example. 🙂 I aligned the formatting, and that did it. So now to get it working on an older cluster that has been around for longer… We have 2 datacenters, so here is the format I ended up with:
```yaml
cloud_provider:
    name: vsphere
    vsphereCloudProvider:
      global:
        insecure-flag: true
      virtual_center:
        vsphere.org:
          user: rancher_user_acct
          password: 'password'
          datacenters: 'DC1, DC2'
      workspace:
        server: vsphere.org
        folder: /DC1/vm/Test VMs/Linux
        default-datastore: kube_dev
        datacenter: DC1
```
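For anyone following along, a StorageClass that provisions onto that kube_dev datastore would look something like this (the class name is illustrative):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: vsphere-kube-dev
provisioner: kubernetes.io/vsphere-volume
reclaimPolicy: Delete
parameters:
  diskformat: thin
  datastore: kube_dev
  fstype: xfs
```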
s
glad I could help!
m
Appreciate it, we do pay for Prod support but it hadn’t gotten to that point yet. 😉
😃 1
Ok, I tried this setup on an older cluster (but at the same prior versions tested above) in our Pre environment - and I’m hitting the same issue once more.
So, I left the volumes in a “pending” state with that error in my original post and re-reconciled the cluster - and they provisioned!
Something has got to be screwed up elsewhere in my yaml format. Will continue to investigate.
👍 1
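One thing I plan to double-check is key casing, since the RKE cluster YAML uses snake_case keys (the camelCase spellings belong to the API form), and mixing the two seems like an easy way for a setting to be quietly ignored. For example:

```yaml
cloud_provider:
  name: vsphere
  vsphere_cloud_provider:   # snake_case here; vsphereCloudProvider is the API-side spelling
    global:
      insecure-flag: true
```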
Well, this should work. It does on a brand new cluster - but not when upgrading an existing cluster, using the same node templates:
```yaml
# 
# Cluster Config
# 
default_pod_security_policy_template_id: unrestricted
docker_root_dir: /var/lib/docker
enable_cluster_alerting: true
enable_cluster_monitoring: true
enable_network_policy: true
local_cluster_auth_endpoint:
  enabled: false
# 
# Rancher Config
# 
rancher_kubernetes_engine_config:
  addon_job_timeout: 30
  addons: |-
    ---
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: vsphere-kube-datastore-dc1
    provisioner: kubernetes.io/vsphere-volume
    reclaimPolicy: Delete
    parameters:
      diskformat: thin
      datastore: kube_pre_dc1
      fstype: xfs
    ---
    kind: StorageClass
    apiVersion: storage.k8s.io/v1
    metadata:
      name: vsphere-kube-datastore-dc2
    provisioner: kubernetes.io/vsphere-volume
    reclaimPolicy: Delete
    parameters:
      diskformat: thin
      datastore: kube_pre_dc2
      fstype: xfs
  authentication:
    strategy: x509
  bastion_host:
    ssh_agent_auth: false
  cloud_provider:
    name: vsphere
    vsphere_cloud_provider:
      global:
        insecure-flag: true
        soap-roundtrip-count: 0
      virtual_center:
        vsphere.org:
          datacenters: 'DC1, DC2'
          soap-roundtrip-count: 0
          user: rancher_user
          password: 'password'
      workspace:
        datacenter: DC2
        default-datastore: kube_pre_dc2
        folder: /DC2/vm/Preprod VMs/Linux
        server: vsphere.org
  ignore_docker_version: true
# 
# # Currently only nginx ingress provider is supported.
# # To disable ingress controller, set `provider: none`
# # To enable ingress on specific nodes, use the node_selector, eg:
#    provider: nginx
#    node_selector:
#      app: ingress
# 
  ingress:
    default_backend: true
    http_port: 0
    https_port: 0
    options:
      compute-full-forwarded-for: 'true'
      hsts-include-subdomains: 'false'
      hsts-max-age: '31536000'
      proxy-body-size: '0'
      use-forwarded-headers: 'true'
    provider: nginx
  kubernetes_version: v1.20.15-rancher2-1
  monitoring:
    provider: metrics-server
    replicas: 1
# 
#   If you are using calico on AWS
# 
#    network:
#      plugin: calico
#      calico_network_provider:
#        cloud_provider: aws
# 
# # To specify flannel interface
# 
#    network:
#      plugin: flannel
#      flannel_network_provider:
#      iface: eth1
# 
# # To specify flannel interface for canal plugin
# 
#    network:
#      plugin: canal
#      canal_network_provider:
#        iface: eth1
# 
  network:
    mtu: 0
    options:
      flannel_backend_type: vxlan
    plugin: canal
  restore:
    restore: false
  rotate_encryption_key: false
# 
#    services:
#      kube-api:
#        service_cluster_ip_range: 10.43.0.0/16
#      kube-controller:
#        cluster_cidr: 10.42.0.0/16
#        service_cluster_ip_range: 10.43.0.0/16
#      kubelet:
#        cluster_domain: cluster.local
#        cluster_dns_server: 10.43.0.10
# 
  services:
    etcd:
      backup_config:
        enabled: true
        interval_hours: 6
        retention: 28
        safe_timestamp: false
        timeout: 300
      creation: 6h
      extra_args:
        election-timeout: '5000'
        heartbeat-interval: '500'
      gid: 0
      retention: 72h
      snapshot: false
      uid: 0
    kube_api:
      always_pull_images: false
      extra_args:
        default-not-ready-toleration-seconds: '60'
        default-unreachable-toleration-seconds: '60'
      pod_security_policy: true
      secrets_encryption_config:
        enabled: false
      service_node_port_range: 30000-32767
    kube_controller:
      extra_args:
        node-monitor-grace-period: 30s
        node-monitor-period: 10s
        v: '3'
    kubelet:
      extra_args:
        node-status-update-frequency: 5s
      fail_swap_on: false
      generate_serving_certificate: false
  ssh_agent_auth: false
  upgrade_strategy:
    drain: false
    max_unavailable_controlplane: '1'
    max_unavailable_worker: 10%
    node_drain_input:
      delete_local_data: true
      force: true
      grace_period: 120
      ignore_daemon_sets: true
      timeout: 240
scheduled_cluster_scan:
  enabled: true
  scan_config:
    cis_scan_config:
      debug_master: false
      debug_worker: false
      override_benchmark_version: rke-cis-1.5
      profile: permissive
  schedule_config:
    cron_schedule: 0 19 * * *
    retention: 7
windows_prefered_cluster: false
```
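A PVC that consumes one of the StorageClasses from the addons above would be roughly (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  storageClassName: vsphere-kube-datastore-dc2
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
```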
s
what are you upgrading to?
m
Rancher 2.4.xx to Rancher 2.5.xx. Then the clusters from K8s 1.18.xx to 1.20.xx
s
interesting, what exactly did you need to do to re-reconcile the cluster(s) with the error?
m
I just updated something in the cluster YAML, and it triggered the cluster reconciliation/update and the pending PVCs then provisioned. However, creating another new PVC still fails until another cluster update.
👍 1
I went ahead and tried an upgrade to K8s v1.19 on an existing cluster and I’m seeing the same problem there too. 😞
Still no luck. On a new cluster it works fine. In comparing the cloud-configs on the ROS nodes of each cluster (new, upgraded and not) I see no differences. One other interesting artifact I’m seeing is that on the upgraded Rancher Server instance, restarted worker nodes do not come back up: the kubelet pod continuously restarts and there is nothing under /etc/kubernetes on the node… A restarted etcd/control plane node comes back up fine.