# general
Perhaps someone has a moment to help me with a helm chart question?
I am trying to override some of the settings in the nvidia/gpu-operator chart. I have beaten it down to just one small nested section that continues to vex me. I am trying to deploy to an air-gapped environment, so I have to replace all of the nvcr.io and k8s.io registry entries with my own, along with an imagePullSecret. This particular section is a nested subchart, and overriding it doesn't seem to be as straightforward as it was for the "regular" pods:
node-feature-discovery:
  master:
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
  worker:
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
  gc:
    tolerations:
      - key: "CriticalAddonsOnly"
        operator: "Exists"
      - key: "nvidia.com/gpu"
        operator: "Exists"
        effect: "NoSchedule"
      - key: "sku"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"
I have tried adding "image:" at the top of this section and even inside each subsection (master, worker, gc), but the chart just ignores it.
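For illustration, this is roughly the shape a top-level override under the subchart key might take. It is only a sketch under assumptions: that the node-feature-discovery subchart reads a single `image` block with `repository` and `tag` keys, and that it takes `imagePullSecrets` as a list of `name:` objects rather than the plain string list used by `validator`. None of those key names are confirmed here against the subchart's own values for 24.6.2, and the registry path and tag are placeholders.

```yaml
# Sketch only: key names are assumed, not verified against the subchart's values.yaml.
node-feature-discovery:
  image:
    # Placeholder path in the private registry (hypothetical)
    repository: <your-registry>/nfd/node-feature-discovery
    # Placeholder tag; use whatever tag was mirrored
    tag: <mirrored-tag>
  # Assumed format: list of objects, unlike the string list under "validator"
  imagePullSecrets:
    - name: private-registry
  master:
    tolerations: []   # existing tolerations stay as they are
```

If the subchart uses different key names, Helm passes the unknown keys through without complaint, which would look the same as the override being ignored.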
They spin up fine, but only because they can access the remote image.
And to be clear, these settings are applied in my values.yaml that I apply with the helm chart:
helm install -f values.yaml --wait gpu-operator -n gpu-resources --create-namespace nvidia/gpu-operator --version=24.6.2
An example of a "normal" pod config in this same values.yaml file that's working perfectly:
validator:
  repository: gitlab.actd.lab:5055/aess-edge/cdao/devsecops/ironseed/nvidia/operator-validator
  image: operator-validator
  version: v24.6.2
  imagePullPolicy: Always
  imagePullSecrets: ["private-registry"]
In case anyone wants the chart:
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia