adamant-kite-43734 [03/20/2024, 1:50 AM]

clean-cpu-90380 [03/20/2024, 2:45 AM]

prehistoric-balloon-31801 [03/20/2024, 2:48 AM]

high-alligator-99144 [03/20/2024, 3:33 AM]
Node: harvestnode2/10.221.17.214
@prehistoric-balloon-31801 Yes, all 3 nodes belong to the same subnet. However, VIP was configured with a different subnet (I'll have to check why it's configured like that). Is that a problem?
VIP: 10.221.17.213/32
harvestnode1: 10.221.17.211/22
harvestnode2: 10.221.17.214/22
harvestnode3: 10.221.17.215/22
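A quick way to sanity-check that VIP from any node: the /32 prefix only creates a host route on whichever node holds the address, so what matters is that 10.221.17.213 itself falls inside the nodes' 10.221.16.0/22 network (it does). A minimal sketch using standard iproute2/ping tooling:
ip route get 10.221.17.213     # should resolve directly via mgmt-br, not through a gateway
ping -c 3 10.221.17.213        # the VIP should answer from whichever node currently holds it
ip -4 addr show dev mgmt-br    # on the holder, the VIP shows up as an extra /32 address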
high-alligator-99144 [03/20/2024, 3:37 AM]
harvestnode1:/home/rancher # ip a
95: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:e9:fc:0c brd ff:ff:ff:ff:ff:ff
inet 10.221.17.211/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
I think harvestnode2 is the leader, holding the node IP as well as the VIP:
harvestnode2:/home/rancher # ip a
31: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:ef:fb:78 brd ff:ff:ff:ff:ff:ff
inet 10.221.17.214/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
inet 10.221.17.213/32 scope global mgmt-br
valid_lft forever preferred_lft forever
harvestnode3:/home/rancher # ip a
27: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 20:67:7c:e1:df:24 brd ff:ff:ff:ff:ff:ff
inet 10.221.17.215/22 brd 10.221.19.255 scope global mgmt-br
valid_lft forever preferred_lft forever
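To cross-check which node currently holds the VIP beyond eyeballing ip a, kube-vip's leader-election lease in kube-system can be inspected. A minimal sketch; the lease name depends on the kube-vip mode and may differ between Harvester releases:
kubectl -n kube-system get lease | grep -i plndr    # kube-vip lease(s), e.g. plndr-svcs-lock
kubectl -n kube-system get lease plndr-svcs-lock -o jsonpath='{.spec.holderIdentity}{"\n"}'
# The holder identity should match the node carrying the /32 VIP on mgmt-br (harvestnode2 above).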
high-alligator-99144 [03/20/2024, 5:57 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]

ambitious-daybreak-95996 [03/20/2024, 6:10 AM]
kubectl get pods -n harvester-system

high-alligator-99144 [03/20/2024, 6:51 AM]
kubectl get nodes shows all 3 nodes. Also, all the pods in the harvester-system ns are running ok.
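Since harvester-system looked healthy while the actual failure turned out to be in another namespace, a cluster-wide check is a useful complement. A minimal sketch with standard kubectl filtering:
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
# CrashLoopBackOff pods often still report phase=Running, so also scan the STATUS column:
kubectl get pods -A | grep -Ev 'Running|Completed'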
prehistoric-balloon-31801 [03/20/2024, 7:40 AM]

red-king-19196 [03/20/2024, 7:52 AM]
Is longhorn-csi-plugin the only failing pod?

high-alligator-99144 [03/20/2024, 9:24 AM]
longhorn-csi-plugin was failing (CrashLoopBackOff). Now we are reinstalling all 3 nodes. Will let you know if the issue persists. Thanks!
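Before a full reinstall, pulling the crashing CSI plugin pod's previous logs usually narrows down the cause. A minimal sketch, assuming Harvester's bundled Longhorn in the longhorn-system namespace; the pod name placeholder is hypothetical:
kubectl -n longhorn-system get pods -l app=longhorn-csi-plugin -o wide
kubectl -n longhorn-system describe pod <longhorn-csi-plugin-pod>       # events and restart reason
kubectl -n longhorn-system logs <longhorn-csi-plugin-pod> --previous --all-containers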
high-alligator-99144 [03/21/2024, 5:03 AM]

red-king-19196 [03/21/2024, 5:05 AM]

high-alligator-99144 [03/23/2024, 11:10 AM]
harvestnode1:/home/rancher # kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
harvestnode1 Ready control-plane,etcd,master 2d2h v1.27.10+rke2r1 10.230.17.205 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestnode2 Ready control-plane,etcd,master 47h v1.27.10+rke2r1 10.230.17.207 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestnode3 Ready control-plane,etcd,master 47h v1.27.10+rke2r1 10.230.17.208 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker1 Ready <none> 24h v1.27.10+rke2r1 10.230.17.209 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker2 Ready <none> 41h v1.27.10+rke2r1 10.230.17.210 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
harvestworker3 Ready <none> 47h v1.27.10+rke2r1 10.230.17.211 <none> Harvester v1.3.0 5.14.21-150400.24.108-default <containerd://1.7.11-k3s2>
The grafana pod seems to be waiting on something -
harvestnode1:/home/rancher # kubectl get pod -n cattle-monitoring-system
NAME READY STATUS RESTARTS AGE
alertmanager-rancher-monitoring-alertmanager-0 2/2 Running 0 16h
prometheus-rancher-monitoring-prometheus-0 3/3 Running 0 <invalid>
rancher-monitoring-grafana-d6f466988-87bg9 0/4 Init:0/2 0 16h
rancher-monitoring-kube-state-metrics-7659b76cc4-lg5rf 1/1 Running 0 19h
rancher-monitoring-operator-595476bc84-v52lq 1/1 Running 2 (19h ago) 19h
rancher-monitoring-prometheus-adapter-55dc9ccd5d-d6xvk 1/1 Running 3 (19h ago) 19h
rancher-monitoring-prometheus-node-exporter-2lntt 1/1 Running 0 19h
rancher-monitoring-prometheus-node-exporter-2p4b8 1/1 Running 1 (18h ago) 19h
rancher-monitoring-prometheus-node-exporter-2rmdv 1/1 Running 1 (<invalid> ago) 19h
rancher-monitoring-prometheus-node-exporter-knrkp 1/1 Running 2 (16h ago) 19h
rancher-monitoring-prometheus-node-exporter-tbllh 1/1 Running 0 19h
rancher-monitoring-prometheus-node-exporter-vpfcw 1/1 Running 0 19h
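The <invalid> AGE values above usually mean the clock on one of the nodes (or on the machine running kubectl) is skewed or has jumped, which is worth ruling out since it can cause problems of its own. A minimal sketch; assumes systemd and chrony, which may vary by install:
timedatectl status     # "System clock synchronized" should be yes on every node
chronyc tracking       # offset against the NTP source, if chronyd is in use
date -u                # compare across all nodes; they should agree to within a second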
Some failures in volume attachment -
harvestnode1:/home/rancher # kubectl describe pod rancher-monitoring-grafana-d6f466988-87bg9 -n cattle-monitoring-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedAttachVolume 2m40s (x385 over 16h) attachdetach-controller AttachVolume.Attach failed for volume "pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823" : rpc error: code = DeadlineExceeded desc = volume pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 failed to attach to node harvestnode2 with attachmentID csi-b329d93503d85e2d8b2b740acd622783fc1b9a3c6d37c50db7f0c59929a4625c
Warning FailedMount <invalid> (x436 over <invalid>) kubelet Unable to attach or mount volumes: unmounted volumes=[storage], unattached volumes=[storage], failed to process volumes=[]: timed out waiting for the condition
harvestnode1:/home/rancher # kubectl get pv,pvc
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-0e6befd7-c2af-4598-8c22-b2e6b80444e7 5Gi RWO Delete Bound cattle-monitoring-system/alertmanager-rancher-monitoring-alertmanager-db-alertmanager-rancher-monitoring-alertmanager-0 harvester-longhorn 19h
persistentvolume/pvc-861e4f12-6321-4a8f-9809-447dd42f6586 50Gi RWO Delete Bound cattle-monitoring-system/prometheus-rancher-monitoring-prometheus-db-prometheus-rancher-monitoring-prometheus-0 harvester-longhorn 19h
persistentvolume/pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 2Gi RWO Delete Bound cattle-monitoring-system/rancher-monitoring-grafana harvester-longhorn 18h
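To dig further into the FailedAttachVolume for the grafana PVC, the VolumeAttachment object and Longhorn's own volume CR are the next places to look. A minimal sketch; the PVC name and attachment ID come from the events above, and the CRD names assume a stock Harvester/Longhorn install:
kubectl get volumeattachments.storage.k8s.io | grep pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823
kubectl describe volumeattachment csi-b329d93503d85e2d8b2b740acd622783fc1b9a3c6d37c50db7f0c59929a4625c   # attach error details
kubectl -n longhorn-system get volumes.longhorn.io pvc-b1720fdd-daa9-479c-b72a-c0541d9b8823 -o yaml      # Longhorn's view of the volume state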