refined-beach-98863

08/30/2022, 10:36 AM
Hello, the Rancher operations pods in my cluster (Rancher v2.6.7) stay stuck in Pending for no apparent reason, so I can't install or update apps through the UI. Has anyone encountered anything similar?

agreeable-oil-87482

08/30/2022, 11:13 AM
Describe the pod; the output should indicate why it's pending

refined-beach-98863

08/30/2022, 11:19 AM
I've tried, but cannot find anything useful
Name:         helm-operation-fv7pk
Namespace:    cattle-system
Priority:     0
Node:         aks-agentpool-18555919-vmss000004/10.240.0.7
Start Time:   Tue, 30 Aug 2022 10:54:33 +0000
Labels:       pod-impersonation.cattle.io/token=bszj5k7hcl4vtnpt6r2v2kmp5rfs24ckd5jc5hzjn2gwgqtrr4mlnb
Annotations:  pod-impersonation.cattle.io/cluster-role: pod-impersonation-helm-op-k29lv
Status:       Pending
IP:
IPs:          <none>
Containers:
  helm:
    Container ID:
    Image:         rancher/shell:v0.1.18
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      helm-cmd
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      KUBECONFIG:  /home/shell/.kube/config
    Mounts:
      /home/shell/.kube/config from user-kubeconfig (ro,path="config")
      /home/shell/helm from data (ro)
  proxy:
    Container ID:
    Image:         rancher/shell:v0.1.18
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      kubectl proxy --disable-filter || true
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      KUBECONFIG:  /root/.kube/config
    Mounts:
      /root/.kube/config from admin-kubeconfig (ro,path="config")
      /var/run/secrets/kubernetes.io/serviceaccount from pod-impersonation-helm-op-ntwmn-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-operation-8vdtr
    Optional:    false
  admin-kubeconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      impersonation-helm-op-admin-kubeconfig-k269r
    Optional:  false
  user-kubeconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      impersonation-helm-op-user-kubeconfig-gdz2q
    Optional:  false
  pod-impersonation-helm-op-ntwmn-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pod-impersonation-helm-op-ntwmn-token
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     cattle.io/os=linux:NoSchedule
                 node-role.kubernetes.io/controlplane=true:NoSchedule
                 node-role.kubernetes.io/etcd=true:NoExecute
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From  Message
  ----    ------     ----       ----  -------
  Normal  Scheduled  <unknown>        Successfully assigned cattle-system/helm-operation-fv7pk to aks-agentpool-18555919-vmss000004
Just `Scheduled`
Applying a busybox to the cluster through `kubectl` works correctly.

agreeable-oil-87482

08/30/2022, 11:24 AM
Can that node pull down `rancher/shell:v0.1.18`?

refined-beach-98863

08/30/2022, 11:28 AM
Hmm, I don't know how I would test that specifically
Huh, that may actually be the problem
Is there a way I can add node selectors to the rancher-operation pods?
It seems only two of the nodes in the cluster have been correctly identified by Rancher?

agreeable-oil-87482

08/30/2022, 12:12 PM
Shouldn't need to add node selectors, it should just work on any worker node. I'd have a look at `aks-agentpool-18555919-vmss000004`'s `kubelet` logs. You could also test with a random nginx workload and define a nodeSelector for `aks-agentpool-18555919-vmss000004` to see if the same happens
What's the health of the pods running on those nodes you don't get metrics for?
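For the nginx test suggested above, a minimal sketch of a pod pinned to that node could look like this. It assumes the node carries the standard `kubernetes.io/hostname` label with its node name as the value (worth confirming with `kubectl get nodes --show-labels`); the pod name `nginx-pin-test` is made up for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pin-test            # hypothetical name, pick anything
spec:
  nodeSelector:
    # Assumption: the hostname label value matches the node name
    kubernetes.io/hostname: aks-agentpool-18555919-vmss000004
  containers:
    - name: nginx
      image: nginx                # any small public image works for this check
```

Apply it with `kubectl apply -f` and describe the pod afterwards. If it also hangs in `ContainerCreating`, the problem is more likely with the node than with Rancher.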

refined-beach-98863

08/30/2022, 12:27 PM
The pods seem to be running fine
The webservers are responding
I'm not allowed to run `kubectl exec` on them though
I'll try getting a shell into one of the nodes and see what I can find out
Seems to be an AKS issue
`kubectl top nodes` also returns status `unknown`
And `kubectl debug node/<vm00004>` returns `failed to pull and unpack image`
May have to look at this with Microsoft
Thanks for your help though