adamant-kite-43734 [03/20/2024, 1:50 AM]

clean-cpu-90380 [03/20/2024, 2:45 AM]

prehistoric-balloon-31801 [03/20/2024, 2:48 AM]

high-alligator-99144 [03/20/2024, 3:33 AM]
Node: harvestnode2/10.221.17.214
@prehistoric-balloon-31801 Yes, all 3 nodes belong to the same subnet. However, VIP was configured with a different subnet (I'll have to check why it's configured like that). Is that a problem?
VIP: 10.221.17.213/32
harvestnode1: 10.221.17.211/22
harvestnode2: 10.221.17.214/22
harvestnode3: 10.221.17.215/22
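A quick way to sanity-check that VIP from any node: the /32 prefix only creates a host route on whichever node holds the address, so what matters is that 10.221.17.213 itself falls inside the nodes' 10.221.16.0/22 network (it does). A minimal sketch using standard iproute2/ping tooling:
ip route get 10.221.17.213     # should resolve directly via mgmt-br, not through a gateway
ping -c 3 10.221.17.213        # the VIP should answer from whichever node currently holds it
ip -4 addr show dev mgmt-br    # on the holder, the VIP shows up as an extra /32 address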
high-alligator-99144 [03/20/2024, 3:37 AM]
harvestnode1:/home/rancher # ip a
95: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:e9:fc:0c brd ff:ff:ff:ff:ff:ff
inet 10.221.17.211/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
I think harvestnode2 is the leader, holding the node IP as well as the VIP:
harvestnode2:/home/rancher # ip a
31: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:ef:fb:78 brd ff:ff:ff:ff:ff:ff
inet 10.221.17.214/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 10.221.17.213/32 scope global mgmt-br
valid_lft forever preferred_lft forever
harvestnode3:/home/rancher # ip a
27: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:e1:df:24 brd ff:ff:ff:ff:ff:ff
inet 10.221.17.215/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
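To cross-check which node currently holds the VIP beyond eyeballing ip a, kube-vip's leader-election lease in kube-system can be inspected. A minimal sketch; the lease name depends on the kube-vip mode and may differ between Harvester releases:
kubectl -n kube-system get lease | grep -i plndr    # kube-vip lease(s), e.g. plndr-svcs-lock
kubectl -n kube-system get lease plndr-svcs-lock -o jsonpath='{.spec.holderIdentity}{"\n"}'
# The holder identity should match the node carrying the /32 VIP on mgmt-br (harvestnode2 above).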
high-alligator-99144 [03/20/2024, 5:57 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]
kubectl get pods -n harvester-system

high-alligator-99144 [03/20/2024, 6:51 AM]
kubectl get nodes shows all 3 nodes. Also, all the pods in the harvester-system ns are running ok.
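Since harvester-system looked healthy while the actual failure turned out to be in another namespace, a cluster-wide check is a useful complement. A minimal sketch with standard kubectl filtering:
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
# CrashLoopBackOff pods often still report phase=Running, so also scan the STATUS column:
kubectl get pods -A | grep -Ev 'Running|Completed'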
prehistoric-balloon-31801 [03/20/2024, 7:40 AM]

red-king-19196 [03/20/2024, 7:52 AM]
Is longhorn-csi-plugin the only failing pod?

high-alligator-99144 [03/20/2024, 9:24 AM]
longhorn-csi-plugin was failing (CrashLoopBackOff). Now we are reinstalling all 3 nodes. Will let you know if the issue persists. Thanks!
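Before a full reinstall, pulling the crashing CSI plugin pod's previous logs usually narrows down the cause. A minimal sketch, assuming Harvester's bundled Longhorn in the longhorn-system namespace; the pod name placeholder is hypothetical:
kubectl -n longhorn-system get pods -l app=longhorn-csi-plugin -o wide
kubectl -n longhorn-system describe pod <longhorn-csi-plugin-pod>       # events and restart reason
kubectl -n longhorn-system logs <longhorn-csi-plugin-pod> --previous --all-containers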
high-alligator-99144 [03/21/2024, 5:03 AM]

red-king-19196 [03/21/2024, 5:05 AM]

high-alligator-99144 [03/23/2024, 11:10 AM]
harvestnode1:/home/rancher # kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
harvestnode1 Ready control-plane,etcd,master 2d2h v1.27.10+rke2r1 10.230.17.205 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestnode2 Ready control-plane,etcd,master 47h v1.27.10+rke2r1 10.230.17.207 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestnode3 Ready control-plane,etcd,master 47h v1.27.10+rke2r1 10.230.17.208 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker1 Ready <none> 24h v1.27.10+rke2r1 10.230.17.209 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker2 Ready <none> 41h v1.27.10+rke2r1 10.230.17.210 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker3 Ready <none> 47h v1.27.10+rke2r1 10.230.17.211 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
The grafana pod seems to be waiting on something -
harvestnode1:/home/rancher # kubectl get pod -n cattle-monitoring-system
NAME READY STATUS RESTARTS AGE
alertmanager-rancher-monitoring-alertmanager-0 2/2 Running 0 16h
prometheus-rancher-monitoring-prometheus-0 3/3 Running 0 <invalid>
rancher-monitoring-grafana-d6f466988-87bg9 0/4 Init:0/2 0 16h
rancher-monitoring-kube-state-metrics-7659b76cc4-lg5rf 1/1 Running 0 19h
rancher-monitoring-operator-595476bc84-v52lq 1/1 Running 2 (19h ago) 19h
rancher-monitoring-prometheus-adapter-55dc9ccd5d-d6xvk 1/1 Running 3 (19h ago) 19h
rancher-monitoring-prometheus-node-exporter-2lntt 1/1 Running 0 19h
rancher-monitoring-prometheus-node-exporter-2p4b8 1/1 Running 1 (18h ago) 19h
rancher-monitoring-prometheus-node-exporter-2rmdv 1/1 Running 1 (<invalid> ago) 19h
rancher-monitoring-prometheus-node-exporter-knrkp 1/1 Running 2 (16h ago) 19h
rancher-monitoring-prometheus-node-exporter-tbllh 1/1 Running 0 19h
rancher-monitoring-prometheus-node-exporter-vpfcw 1/1 Running 0 19h
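The <invalid> AGE values above usually mean the clock on one of the nodes (or on the machine running kubectl) is skewed or has jumped, which is worth ruling out since it can cause problems of its own. A minimal sketch; assumes systemd and chrony, which may vary by install:
timedatectl status     # "System clock synchronized" should be yes on every node
chronyc tracking       # offset against the NTP source, if chronyd is in use
date -u                # compare across all nodes; they should agree to within a second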
Some failures in volume attachment -
harvestnode1:/home/rancher # kubectl describe pod rancher-monitoring-grafana-d6f466988-87bg9 -n cattle-monitoring-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 2m40s (x385 over 16h) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823" : rpc error: code = DeadlineExceeded desc = volume pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 failed to attach to node harvestnode2 with attachmentID csi-b329d93503d85e2d8b2b740acd622783fc1b9a3c6d37c50db7f0c59929a4625c
Warning FailedMount <invalid> (x436 over <invalid>) kubelet Unable to attach or mount volumes: unmounted volumes=[storage], unattached volumes=[storage], failed to process volumes=[]: timed out waiting for the condition
harvestnode1:/home/rancher # kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-0e6befd7-c2af-4598-8c22-b2e6b80444e7 5Gi RWO Delete Bound cattle-monitoring-system/alertmanager-rancher-monitoring-alertmanager-db-alertmanager-rancher-monitoring-alertmanager-0 harvester-longhorn 19h
persistentvolume/pvc-861e4f12-6321-4a8f-9809-447dd42f6586 50Gi RWO Delete Bound cattle-monitoring-system/prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0 harvester-longhorn 19h
persistentvolume/pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 2Gi RWO Delete Bound cattle-monitoring-system/rancher-monitoring-grafana harvester-longhorn 18h
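To dig further into the FailedAttachVolume for the grafana PVC, the VolumeAttachment object and Longhorn's own volume CR are the next places to look. A minimal sketch; the PVC name and attachment ID come from the events above, and the CRD names assume a stock Harvester/Longhorn install:
kubectl get volumeattachments.storage.k8s.io | grep pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823
kubectl describe volumeattachment csi-b329d93503d85e2d8b2b740acd622783fc1b9a3c6d37c50db7f0c59929a4625c   # attach error details
kubectl -n longhorn-system get volumes.longhorn.io pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 -o yaml      # Longhorn's view of the volume state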