# harvester
**clean-cpu-90380:**
hi Rajesh, do the other pods have similar behavior, like CrashLoopBackOff or not running?
**prehistoric-balloon-31801:**
Are the 3 nodes in the same subnet?
**high-alligator-99144:**
@clean-cpu-90380 No, only one of the 3 pods of the `longhorn-csi-plugin` DaemonSet. Looks like it's the one on the second node in the cluster (from the above output):
```
Node:                 harvestnode2/10.221.17.214
```
@prehistoric-balloon-31801 Yes, all 3 nodes belong to the same subnet. However, the VIP was configured with a different subnet mask (I'll have to check why it's configured like that). Is that a problem?
```
VIP: 10.221.17.213/32
harvestnode1: 10.221.17.211/22
harvestnode2: 10.221.17.214/22
harvestnode3: 10.221.17.215/22
```
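As a quick sanity check (my addition, not from the thread): the `/32` on the VIP is just the host mask used when the address is added to the interface; what matters is whether the VIP address itself falls inside the nodes' `/22` segment. A minimal sketch, assuming `python3` is available on the node:

```shell
# Hypothetical check: does the VIP address fall inside the nodes' /22 network?
# 10.221.17.211/22 is harvestnode1's address from the listing above; the VIP's
# own /32 mask is deliberately ignored -- only the address matters here.
python3 - <<'EOF'
import ipaddress
vip = ipaddress.ip_address("10.221.17.213")
node_net = ipaddress.ip_network("10.221.17.211/22", strict=False)  # -> 10.221.16.0/22
print("VIP in node subnet:", vip in node_net)
EOF
# prints: VIP in node subnet: True
```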
```
harvestnode1:/home/rancher # ip a
95: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 20:67:7c:e9:fc:0c brd ff:ff:ff:ff:ff:ff
    inet 10.221.17.211/22 brd 10.221.19.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
```
I think `harvestnode2` is the leader, holding its node IP as well as the VIP:
```
harvestnode2:/home/rancher # ip a
31: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 20:67:7c:ef:fb:78 brd ff:ff:ff:ff:ff:ff
    inet 10.221.17.214/22 brd 10.221.19.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet 10.221.17.213/32 scope global mgmt-br
       valid_lft forever preferred_lft forever
```
```
harvestnode3:/home/rancher # ip a
27: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 20:67:7c:e1:df:24 brd ff:ff:ff:ff:ff:ff
    inet 10.221.17.215/22 brd 10.221.19.255 scope global mgmt-br
       valid_lft forever preferred_lft forever
```
The VIP configuration screen doesn't accept an IP with a subnet mask.
**ambitious-daybreak-95996:**
I think the /32 is OK -- at least, my Harvester VIP is a /32 as well, and I didn't specify a subnet when installing (although my hosts are all /24, not /22).
Can you ping the VIP? Also, are the Harvester pods all OK? (`kubectl get pods -n harvester-system`)
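The two checks suggested above can be sketched as a short script (my sketch, not from the thread: the VIP address is taken from earlier in the conversation, and the `--field-selector` simply narrows the listing to pods that are not in the Running phase):

```shell
#!/bin/sh
# Hypothetical health-check sketch; requires access to the Harvester cluster.
VIP=10.221.17.213   # assumed from the VIP listing earlier in the thread

# 1. Is the VIP reachable at all?
ping -c 3 "$VIP"

# 2. Any harvester-system pods that are not in the Running phase?
kubectl get pods -n harvester-system --field-selector=status.phase!=Running
```

An empty listing from the second command is the healthy case.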
**high-alligator-99144:**
@ambitious-daybreak-95996 Yes, I see that the k8s cluster is up and running (`kubectl get nodes` shows all 3 nodes). Also, all the pods in the `harvester-system` namespace are running OK.
**prehistoric-balloon-31801:**
@red-king-19196 any thoughts?
**red-king-19196:**
I once encountered a similar issue when putting nodes in different VLANs, but this case seems not to be the same. @high-alligator-99144 Could you help verify whether `longhorn-csi-plugin` is the only failing pod?
**high-alligator-99144:**
@red-king-19196 As I mentioned, only one of the three `longhorn-csi-plugin` pods was failing (CrashLoopBackOff). We are now reinstalling all 3 nodes. Will let you know if the issue persists. Thanks!
I was expecting the first 3 nodes to be control-plane nodes if we select the "Default Role", but the 2nd node came up as a worker node when we redeployed. Is that fine?
**red-king-19196:**
After the third node joins, the second node will be promoted.
**high-alligator-99144:**
Unable to log in to the Harvester UI (login is timing out). There are 3 control-plane + 3 worker nodes in the cluster:
```
harvestnode1:/home/rancher # kubectl get nodes -o wide
NAME             STATUS   ROLES                       AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE           KERNEL-VERSION                  CONTAINER-RUNTIME
harvestnode1     Ready    control-plane,etcd,master   2d2h   v1.27.10+rke2r1   10.230.17.205   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
harvestnode2     Ready    control-plane,etcd,master   47h    v1.27.10+rke2r1   10.230.17.207   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
harvestnode3     Ready    control-plane,etcd,master   47h    v1.27.10+rke2r1   10.230.17.208   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
harvestworker1   Ready    <none>                      24h    v1.27.10+rke2r1   10.230.17.209   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
harvestworker2   Ready    <none>                      41h    v1.27.10+rke2r1   10.230.17.210   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
harvestworker3   Ready    <none>                      47h    v1.27.10+rke2r1   10.230.17.211   <none>        Harvester v1.3.0   5.14.21-150400.24.108-default   containerd://1.7.11-k3s2
```
The Grafana pod seems to be waiting on something:
```
harvestnode1:/home/rancher # kubectl get pod  -n cattle-monitoring-system
NAME                                                     READY   STATUS     RESTARTS            AGE
alertmanager-rancher-monitoring-alertmanager-0           2/2     Running    0                   16h
prometheus-rancher-monitoring-prometheus-0               3/3     Running    0                   <invalid>
rancher-monitoring-grafana-d6f466988-87bg9               0/4     Init:0/2   0                   16h
rancher-monitoring-kube-state-metrics-7659b76cc4-lg5rf   1/1     Running    0                   19h
rancher-monitoring-operator-595476bc84-v52lq             1/1     Running    2 (19h ago)         19h
rancher-monitoring-prometheus-adapter-55dc9ccd5d-d6xvk   1/1     Running    3 (19h ago)         19h
rancher-monitoring-prometheus-node-exporter-2lntt        1/1     Running    0                   19h
rancher-monitoring-prometheus-node-exporter-2p4b8        1/1     Running    1 (18h ago)         19h
rancher-monitoring-prometheus-node-exporter-2rmdv        1/1     Running    1 (<invalid> ago)   19h
rancher-monitoring-prometheus-node-exporter-knrkp        1/1     Running    2 (16h ago)         19h
rancher-monitoring-prometheus-node-exporter-tbllh        1/1     Running    0                   19h
rancher-monitoring-prometheus-node-exporter-vpfcw        1/1     Running    0                   19h
```
Some failures in volume attachment:
```
harvestnode1:/home/rancher # kubectl describe pod rancher-monitoring-grafana-d6f466988-87bg9 -n cattle-monitoring-system
...
Events:
  Type     Reason              Age                              From                     Message
  ----     ------              ----                             ----                     -------
  Warning  FailedAttachVolume  2m40s (x385 over 16h)            attachdetach-controller  AttachVolume.Attach failed for volume "pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823" : rpc error: code = DeadlineExceeded desc = volume pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 failed to attach to node harvestnode2 with attachmentID csi-b329d93503d85e2d8b2b740acd622783fc1b9a3c6d37c50db7f0c59929a4625c
  Warning  FailedMount         <invalid> (x436 over <invalid>)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[storage], unattached volumes=[storage], failed to process volumes=[]: timed out waiting for the condition
```
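One way to dig further into a `FailedAttachVolume` like this (my sketch, not from the thread: the PV name is copied from the event above, and the Longhorn resources are assumed to live in the `longhorn-system` namespace, as on a default Harvester install):

```shell
#!/bin/sh
# Hypothetical debugging steps for the FailedAttachVolume event above.
PV=pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823

# Kubernetes-side VolumeAttachment objects referencing this PV
kubectl get volumeattachment | grep "$PV"

# Longhorn's own view of the volume (robustness, owner node, state)
kubectl -n longhorn-system get volumes.longhorn.io "$PV"

# Logs from the CSI attacher sidecars, which drive the attach RPCs
kubectl -n longhorn-system logs -l app=csi-attacher --tail=50
```

If Longhorn shows the volume as attached to a different node than the scheduler expects, that mismatch is usually the next thing to chase.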
```
harvestnode1:/home/rancher # kubectl get pv,pvc
NAME                                                        CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                                                                                     STORAGECLASS         REASON   AGE
persistentvolume/pvc-0e6befd7-c2af-4598-8c22-b2e6b80444e7   5Gi        RWO            Delete           Bound    cattle-monitoring-system/alertmanager-rancher-monitoring-alertmanager-db-alertmanager-rancher-monitoring-alertmanager-0   harvester-longhorn            19h
persistentvolume/pvc-861e4f12-6321-4a8f-9809-447dd42f6586   50Gi       RWO            Delete           Bound    cattle-monitoring-system/prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0           harvester-longhorn            19h
persistentvolume/pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823   2Gi        RWO            Delete           Bound    cattle-monitoring-system/rancher-monitoring-grafana                                                                       harvester-longhorn            18h
```