# general
a
Describe the pod - should allude to why it's pending
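Something along these lines, assuming the pending pod is the `helm-operation-*` one in `cattle-system` (names taken from the output below; a sketch, not the exact commands used):
```
# Find the pending pod and describe it; the Events section at the
# bottom usually hints at why it is stuck in Pending
kubectl -n cattle-system get pods -o wide
kubectl -n cattle-system describe pod helm-operation-fv7pk
```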
r
I've tried, but cannot find anything useful
```
Name:         helm-operation-fv7pk
Namespace:    cattle-system
Priority:     0
Node:         aks-agentpool-18555919-vmss000004/10.240.0.7
Start Time:   Tue, 30 Aug 2022 10:54:33 +0000
Labels:       pod-impersonation.cattle.io/token=bszj5k7hcl4vtnpt6r2v2kmp5rfs24ckd5jc5hzjn2gwgqtrr4mlnb
Annotations:  pod-impersonation.cattle.io/cluster-role: pod-impersonation-helm-op-k29lv
Status:       Pending
IP:
IPs:          <none>
Containers:
  helm:
    Container ID:
    Image:         rancher/shell:v0.1.18
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      helm-cmd
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      KUBECONFIG:  /home/shell/.kube/config
    Mounts:
      /home/shell/.kube/config from user-kubeconfig (ro,path="config")
      /home/shell/helm from data (ro)
  proxy:
    Container ID:
    Image:         rancher/shell:v0.1.18
    Image ID:
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      kubectl proxy --disable-filter || true
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:
      KUBECONFIG:  /root/.kube/config
    Mounts:
      /root/.kube/config from admin-kubeconfig (ro,path="config")
      /var/run/secrets/kubernetes.io/serviceaccount from pod-impersonation-helm-op-ntwmn-token (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  helm-operation-8vdtr
    Optional:    false
  admin-kubeconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      impersonation-helm-op-admin-kubeconfig-k269r
    Optional:  false
  user-kubeconfig:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      impersonation-helm-op-user-kubeconfig-gdz2q
    Optional:  false
  pod-impersonation-helm-op-ntwmn-token:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  pod-impersonation-helm-op-ntwmn-token
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     cattle.io/os=linux:NoSchedule
                 node-role.kubernetes.io/controlplane=true:NoSchedule
                 node-role.kubernetes.io/etcd=true:NoExecute
                 node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age        From  Message
  ----    ------     ----       ----  -------
  Normal  Scheduled  <unknown>        Successfully assigned cattle-system/helm-operation-fv7pk to aks-agentpool-18555919-vmss000004
```
Just `Scheduled`
Applying a busybox to the cluster through `kubectl` works correctly.
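For reference, a minimal version of that busybox sanity check could look like this (pod name and image tag are illustrative):
```
# Throwaway pod: if it schedules, pulls the image and starts, basic scheduling is fine
kubectl run busybox-test --image=busybox:1.36 --restart=Never -- sleep 3600
kubectl get pod busybox-test -o wide
kubectl delete pod busybox-test
```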
a
Can that node pull down `rancher/shell:v0.1.18`?
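One way to test it directly (a sketch; the pod name is made up, and `--overrides` pins the pod to the node via `nodeName`, bypassing the scheduler):
```
# Run the same image pinned to the suspect node and watch whether the pull succeeds
kubectl -n cattle-system run shell-pull-test \
  --image=rancher/shell:v0.1.18 --restart=Never \
  --overrides='{"apiVersion":"v1","spec":{"nodeName":"aks-agentpool-18555919-vmss000004"}}' \
  --command -- sleep 600
# ErrImagePull / ImagePullBackOff events here would confirm a pull problem on that node
kubectl -n cattle-system describe pod shell-pull-test
```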
r
Hmm, I don't know how I would test that specifically
Huh, that may actually be the problem
Is there a way I can add node selectors to the Rancher helm-operation pods?
Could it be because of this? It seems only two of the nodes in the cluster have been correctly identified by Rancher.
a
Shouldn't need to add node selectors, it should just work on any worker node. I'd have a look at `aks-agentpool-18555919-vmss000004`'s kubelet logs. You could also test with a random nginx workload and define a nodeSelector for `aks-agentpool-18555919-vmss000004` to see if the same thing happens (see the sketch below).
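A sketch of that nginx test (it assumes the node's `kubernetes.io/hostname` label matches its name, which is normally the case on AKS; pod name and image tag are made up):
```
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: nginx-node-test
spec:
  nodeSelector:
    kubernetes.io/hostname: aks-agentpool-18555919-vmss000004
  containers:
  - name: nginx
    image: nginx:1.23
EOF
# If this also hangs in ContainerCreating, the problem is the node itself, not the Rancher pods
kubectl get pod nginx-node-test -o wide
kubectl describe pod nginx-node-test
```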
What's the health of the pods running on those nodes you don't get metrics for?
r
The pods seem to be running fine
The webservers are responding
I'm not allowed to run `kubectl exec` on them though
I'll try getting a shell into one of the nodes and see what I can find out
Seems to be an AKS issue
`kubectl top nodes` also returns status `unknown`
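Worth looking at the node objects directly as well; nothing cluster-specific assumed here:
```
kubectl get nodes -o wide                                  # NotReady / unknown status shows up here
kubectl describe node aks-agentpool-18555919-vmss000004    # check Conditions and recent events
```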
And `kubectl debug node/<vm00004>` returns `failed to pull and unpack image`
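Since `kubectl debug node` itself has to pull an image onto that node, an alternative is Azure's run-command (a sketch only: resource group, VMSS name and instance id are placeholders derived from the node name, and it assumes `crictl` is set up on the node and you have Azure CLI access):
```
# Ask the node's container runtime to pull the image directly
az vmss run-command invoke \
  --resource-group <node-resource-group> \
  --name aks-agentpool-18555919-vmss \
  --instance-id <instance-id> \
  --command-id RunShellScript \
  --scripts "crictl pull docker.io/rancher/shell:v0.1.18"
```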
May have to look at this with Microsoft
Thanks for your help though