# harvester
a
Are those 3 nodes all bare-metal servers? Are the node IPs statically configured or assigned via DHCP? Can you SSH into harv-node3 and run ip link, ip addr, ps aux, etc. for a basic check?
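For example, something like this (just a rough sketch; the output path is only an example) captures all of those into one file that is easy to share:
# collect basic node state into a single file
{
  echo "== ip link =="; ip link
  echo "== ip addr =="; ip addr
  echo "== ps aux ==";  ps aux
} > /tmp/harv-node3-basic-check.txt 2>&1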
h
Yes, all 3 are bare-metal servers.
harv-node3:~ # ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt-bo state UP mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:35 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:36 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f2
5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:37 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f3
6: eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 5c:b9:01:88:d3:58 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
7: eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 5c:b9:01:88:d3:59 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f1
8: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
9: mgmt-bo: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt-br state UP mode DEFAULT group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
12: cali065d8d4f96d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-9e2ed625-fae0-49dc-ec12-caefbe4c6452
13: cali34b4da13991@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-b7e2ea77-20ad-902a-96f0-fb3f945dcb51
14: cali41cee558337@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-32c3828b-8058-faba-af4e-c0a60199bd26
15: cali6621a458522@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-71ba6db1-a75b-2569-4714-6799e8f5324e
16: cali1c6b0251bea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-8d91872a-d179-3049-0d44-5b3e97beee3b
17: calib8cacd0fc8f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-8041579e-7853-2514-034f-d960709184eb
18: calicd6e7ba08c7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-831602f1-cfb3-c053-586b-ba6a7a923d81
19: cali5d7c5499bb2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d57fe904-0c89-60ba-7b84-f9eaaf11c354
20: cali05c2796687d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-07505d94-0798-d934-2dc5-97fc5a482496
21: cali00b0b1be40c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0ede8ff7-25d7-4fdc-0d5f-f355adc666e1
22: cali24516197f54@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-7a23c03c-da43-72e0-d324-6da4750af0e5
23: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether ee:be:f7:67:6b:1a brd ff:ff:ff:ff:ff:ff
24: califf19f5f2263@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-4881797c-d04a-1565-2364-9056722aaccd
25: cali58f8343a0b8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-632ba981-951a-b1e5-760a-f667a0106603
26: caliefc9e4a102d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-ed915ed3-684f-abdd-fc19-1b052b22e1af
27: calicf76bb2beb9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-b80d12cc-6350-c79a-f6d1-67e95d6eed80
harv-node3:~ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master mgmt-bo state UP group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f0
3: eno2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:18:82:80:ed:35 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f1
4: eno3: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:18:82:80:ed:36 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f2
5: eno4: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 94:18:82:80:ed:37 brd ff:ff:ff:ff:ff:ff
    altname enp2s0f3
6: eno49: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 5c:b9:01:88:d3:58 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f0
    inet6 fe80::5eb9:1ff:fe88:d358/64 scope link
       valid_lft forever preferred_lft forever
7: eno50: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 5c:b9:01:88:d3:59 brd ff:ff:ff:ff:ff:ff
    altname enp4s0f1
    inet6 fe80::5eb9:1ff:fe88:d359/64 scope link
       valid_lft forever preferred_lft forever
8: mgmt-br: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
    inet 172.26.50.138/27 brd 172.26.50.159 scope global mgmt-br
       valid_lft forever preferred_lft forever
    inet6 fe80::9618:82ff:fe80:ed34/64 scope link
       valid_lft forever preferred_lft forever
9: mgmt-bo: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt-br state UP group default qlen 1000
    link/ether 94:18:82:80:ed:34 brd ff:ff:ff:ff:ff:ff
12: cali065d8d4f96d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-9e2ed625-fae0-49dc-ec12-caefbe4c6452
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
13: cali34b4da13991@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-b7e2ea77-20ad-902a-96f0-fb3f945dcb51
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
14: cali41cee558337@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-32c3828b-8058-faba-af4e-c0a60199bd26
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
15: cali6621a458522@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-71ba6db1-a75b-2569-4714-6799e8f5324e
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
16: cali1c6b0251bea@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-8d91872a-d179-3049-0d44-5b3e97beee3b
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
17: calib8cacd0fc8f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-8041579e-7853-2514-034f-d960709184eb
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
18: calicd6e7ba08c7@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-831602f1-cfb3-c053-586b-ba6a7a923d81
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
19: cali5d7c5499bb2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d57fe904-0c89-60ba-7b84-f9eaaf11c354
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
20: cali05c2796687d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-07505d94-0798-d934-2dc5-97fc5a482496
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
21: cali00b0b1be40c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-0ede8ff7-25d7-4fdc-0d5f-f355adc666e1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
22: cali24516197f54@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-7a23c03c-da43-72e0-d324-6da4750af0e5
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
23: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether ee:be:f7:67:6b:1a brd ff:ff:ff:ff:ff:ff
    inet 10.52.2.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::ecbe:f7ff:fe67:6b1a/64 scope link
       valid_lft forever preferred_lft forever
24: califf19f5f2263@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-4881797c-d04a-1565-2364-9056722aaccd
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
25: cali58f8343a0b8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-632ba981-951a-b1e5-760a-f667a0106603
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
26: caliefc9e4a102d@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-ed915ed3-684f-abdd-fc19-1b052b22e1af
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
27: calicf76bb2beb9@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-b80d12cc-6350-c79a-f6d1-67e95d6eed80
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link
       valid_lft forever preferred_lft forever
All IPs are static
a
Please also run:
systemctl status rancher-system-agent.service
systemctl status rancherd.service
systemctl status rke2-agent.service
systemctl status rke2-server.service
ps aux | grep kubelet
And please try to ping 172.26.50.135 from harv-node3.
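If it is easier, a loop like this (just a sketch) runs all of the checks in one go:
# status of each service, then the kubelet process and a ping to 172.26.50.135
for svc in rancher-system-agent rancherd rke2-agent rke2-server; do
  echo "===== $svc ====="
  systemctl status --no-pager "$svc.service"
done
ps aux | grep kubelet
ping -c 4 172.26.50.135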
h
harv-node3:~ # ping 172.26.50.135
PING 172.26.50.135 (172.26.50.135) 56(84) bytes of data.
64 bytes from 172.26.50.135: icmp_seq=1 ttl=64 time=0.155 ms
64 bytes from 172.26.50.135: icmp_seq=2 ttl=64 time=0.169 ms

harv-node3:~ # systemctl status rancher-system-agent.service
● rancher-system-agent.service - Rancher System Agent
     Loaded: loaded (/etc/systemd/system/rancher-system-agent.service; enabled; vendor preset: disabled)
     Active: active (running) since Thu 2022-10-20 11:47:30 UTC; 22h ago

harv-node3:~ # systemctl status rancherd.service
● rancherd.service - Rancher Bootstrap
     Loaded: loaded (/lib/systemd/system/rancherd.service; enabled; vendor preset: enabled)
     Active: inactive (dead) since Thu 2022-10-20 11:47:43 UTC; 22h ago
       Docs: https://github.com/rancher/rancherd
   Main PID: 2676 (code=exited, status=0/SUCCESS)

Oct 20 11:47:30 harv-node3 rancherd[2676]: time="2022-10-20T11:47:30Z" level=info msg="[stdout]: [INFO]  Starting/restarti>
Oct 20 11:47:30 harv-node3 rancherd[2676]: time="2022-10-20T11:47:30Z" level=info msg="No image provided, creating empty w>
Oct 20 11:47:30 harv-node3 rancherd[2676]: time="2022-10-20T11:47:30Z" level=info msg="Running command: /usr/bin/rancherd >
Oct 20 11:47:30 harv-node3 rancherd[2676]: time="2022-10-20T11:47:30Z" level=info msg="[stderr]: time=\"2022-10-20T11:47:3>
Oct 20 11:47:31 harv-node3 rancherd[2676]: time="2022-10-20T11:47:31Z" level=info msg="[stderr]: time=\"2022-10-20T11:47:3>
Oct 20 11:47:43 harv-node3 rancherd[2676]: time="2022-10-20T11:47:43Z" level=info msg="[stderr]: time=\"2022-10-20T11:47:4>
Oct 20 11:47:43 harv-node3 rancherd[2676]: time="2022-10-20T11:47:43Z" level=info msg="[stderr]: time=\"2022-10-20T11:47:4>
Oct 20 11:47:43 harv-node3 rancherd[2676]: time="2022-10-20T11:47:43Z" level=info msg="Successfully Bootstrapped Rancher (>
Oct 20 11:47:43 harv-node3 systemd[1]: rancherd.service: Succeeded.
Oct 20 11:47:43 harv-node3 systemd[1]: Finished Rancher Bootstrap.


harv-node3:~ # systemctl status rke2-agent.service
● rke2-agent.service - Rancher Kubernetes Engine v2 (agent)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-agent.service; disabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/rke2-agent.service.d
             └─override.conf
     Active: inactive (dead)
       Docs: https://github.com/rancher/rke2#readme

Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 7721 (containerd-shim) remains running after unit >
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 7758 (containerd-shim) remains running after unit >
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 7774 (containerd-shim) remains running after unit >
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 9693 (containerd-shim) remains running after unit >
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 9752 (containerd-shim) remains running after unit >
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 19512 (containerd-shim) remains running after unit>
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 19658 (containerd-shim) remains running after unit>
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 19700 (containerd-shim) remains running after unit>
Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 46440 (containerd-shim) remains running after unit>
Oct 20 18:37:01 harv-node3 systemd[1]: Stopped Rancher Kubernetes Engine v2 (agent).


harv-node3:~ # systemctl status rke2-server.service
● rke2-server.service - Rancher Kubernetes Engine v2 (server)
     Loaded: loaded (/usr/local/lib/systemd/system/rke2-server.service; enabled; vendor preset: disabled)
    Drop-In: /etc/systemd/system/rke2-server.service.d
             └─override.conf
     Active: activating (start) since Fri 2022-10-21 10:29:19 UTC; 4min 16s ago
       Docs: https://github.com/rancher/rke2#readme
    Process: 32105 ExecStartPre=/bin/sh -xc ! /usr/bin/systemctl is-enabled --quiet nm-cloud-setup.service (code=exited, s>
    Process: 32107 ExecStartPre=/sbin/modprobe br_netfilter (code=exited, status=0/SUCCESS)
    Process: 32108 ExecStartPre=/sbin/modprobe overlay (code=exited, status=0/SUCCESS)
    Process: 32109 ExecStartPre=/usr/sbin/harv-update-rke2-server-url server (code=exited, status=0/SUCCESS)
   Main PID: 32111 (rke2)
      Tasks: 166
     CGroup: /system.slice/rke2-server.service
             ├─ 1352 /var/lib/rancher/rke2/data/v1.24.6-rke2r1-c4dcecfdb87b/bin/containerd-shim-runc-v2 -namespace k8s.io >
             ├─32111 /usr/local/bin/rke2 server
             ├─32156 containerd -c /var/lib/rancher/rke2/agent/etc/containerd/config.toml -a /run/k3s/containerd/container>
             └─32235 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency>

Oct 21 10:33:16 harv-node3 rke2[32111]: time="2022-10-21T10:33:16Z" level=info msg="Waiting to retrieve kube-proxy configu>
Oct 21 10:33:20 harv-node3 rke2[32111]: time="2022-10-21T10:33:20Z" level=info msg="Waiting for API server to become avail>
Oct 21 10:33:20 harv-node3 rke2[32111]: time="2022-10-21T10:33:20Z" level=info msg="Waiting for etcd server to become avai>
Oct 21 10:33:20 harv-node3 rke2[32111]: {"level":"warn","ts":"2022-10-21T10:33:20.417Z","logger":"etcd-client","caller":"v>
Oct 21 10:33:20 harv-node3 rke2[32111]: time="2022-10-21T10:33:20Z" level=info msg="Failed to test data store connection: >
Oct 21 10:33:21 harv-node3 rke2[32111]: {"level":"warn","ts":"2022-10-21T10:33:21.617Z","logger":"etcd-client","caller":"v>
Oct 21 10:33:21 harv-node3 rke2[32111]: time="2022-10-21T10:33:21Z" level=error msg="Failed to check local etcd status for>
Oct 21 10:33:21 harv-node3 rke2[32111]: time="2022-10-21T10:33:21Z" level=info msg="Waiting to retrieve kube-proxy configu>
Oct 21 10:33:26 harv-node3 rke2[32111]: time="2022-10-21T10:33:26Z" level=info msg="Waiting to retrieve kube-proxy configu>
Oct 21 10:33:31 harv-node3 rke2[32111]: time="2022-10-21T10:33:31Z" level=info msg="Waiting to retrieve kube-proxy configu>


harv-node3:~ # ps aux | grep kubelet
nobody    3664  0.6  0.0 727580 28340 ?        Ssl  Oct20   9:24 /bin/node_exporter --path.procfs=/host/proc --path.sysfs=/host/sys --path.rootfs=/host/root --web.listen-address=0.0.0.0:9796 --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|var/lib/docker/.+|var/lib/kubelet/.+)($|/) --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|iso9660|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs)$
root     32235  4.1  0.0 835596 119700 ?       Sl   10:29   0:11 kubelet --volume-plugin-dir=/var/lib/kubelet/volumeplugins --file-check-frequency=5s --sync-frequency=30s --address=0.0.0.0 --alsologtostderr=false --anonymous-auth=false --authentication-token-webhook=true --authorization-mode=Webhook --cgroup-driver=systemd --client-ca-file=/var/lib/rancher/rke2/agent/client-ca.crt --cloud-provider=external --cluster-dns=10.53.0.10 --cluster-domain=cluster.local --container-runtime-endpoint=unix:///run/k3s/containerd/containerd.sock --containerd=/run/k3s/containerd/containerd.sock --eviction-hard=imagefs.available<5%,nodefs.available<5% --eviction-minimum-reclaim=imagefs.available=10%,nodefs.available=10% --fail-swap-on=false --healthz-bind-address=127.0.0.1 --hostname-override=harv-node3 --kubeconfig=/var/lib/rancher/rke2/agent/kubelet.kubeconfig --log-file=/var/lib/rancher/rke2/agent/logs/kubelet.log --log-file-max-size=50 --logtostderr=false --node-labels=harvesterhci.io/managed=true,rke.cattle.io/machine=920fc48d-3f2b-42e0-9001-5f8e8d492dbd --pod-infra-container-image=index.docker.io/rancher/pause:3.6 --pod-manifest-path=/var/lib/rancher/rke2/agent/pod-manifests --read-only-port=0 --resolv-conf=/etc/resolv.conf --serialize-image-pulls=false --stderrthreshold=FATAL --tls-cert-file=/var/lib/rancher/rke2/agent/serving-kubelet.crt --tls-private-key-file=/var/lib/rancher/rke2/agent/serving-kubelet.key
root     36847  0.0  0.0   7680   780 pts/0    S+   10:34   0:00 grep kubelet
a
@high-alligator-99144 did you ever perform any operations on those nodes?
@prehistoric-balloon-31801 any idea about a situation like this:
harv-node1: the first node in the cluster
harv-node3: the second node in the cluster
  Oct 20 11:47:43 harv-node3 systemd[1]: Finished Rancher Bootstrap.

  It first ran as an agent node, until Oct 20 18:37:01.

harv-node2: the third node, joined the cluster.

harv-node3 then switched roles from agent to server:

Oct 20 18:37:01 harv-node3 systemd[1]: rke2-agent.service: Unit process 46440 (containerd-shim) remains running after unit>
Oct 20 18:37:01 harv-node3 systemd[1]: Stopped Rancher Kubernetes Engine v2 (agent).

  It then became an rke2 server,
  but rke2-server.service has continuous errors.
@high-alligator-99144 could you upload the full rke2 logs from harv-node3? Thanks:
journalctl --unit=rke2-agent > rke2-agent.log
journalctl --unit=rke2-server > rke2-server.log
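If it is easier to attach a single file, a quick sketch like this (file names are only examples) bundles both logs:
# dump the full unit logs and pack them into one archive for the issue
journalctl --unit=rke2-agent --no-pager > rke2-agent.log
journalctl --unit=rke2-server --no-pager > rke2-server.log
tar czf rke2-logs-harv-node3.tar.gz rke2-agent.log rke2-server.log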
h
did you ever perform any operations on those nodes?
No. harv-node1 was created first, followed by harv-node2 and harv-node3. When harv-node3 was added, harv-node2 went into an Unavailable/Cordoned state. I tried looking at some Kubernetes services but couldn't find anything, so I redeployed harv-node2. That is when harv-node3 became Unavailable.
No other operations were performed.
a
What's the VIP of your cluster?
It should be different from every node IP.
h
172.26.50.136 -- VIP
172.26.50.135 -- harv-node1
172.26.50.137 -- harv-node2
172.26.50.138 -- harv-node3
p
Looks like etcd or the API server can't become ready. Could you go to harv-node3 and try:
sudo -i
crictl ps -a | grep etcd
crictl ps -a | grep kube-api

# logs of these two containers:
crictl logs <container_id>
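A small loop like this (just a sketch; it assumes the container names contain "etcd" and "kube-apiserver") writes the logs of both to files:
# save the logs of the etcd and kube-apiserver containers (any state) to /tmp
for name in etcd kube-apiserver; do
  for cid in $(crictl ps -a --name "$name" -q); do
    crictl logs "$cid" > "/tmp/${name}-${cid}.log" 2>&1
  done
done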
h
The API server runs only on the master node IMO (harv-node1 here):
harv-node3:~ # crictl ps -a | grep kube-api
harv-node3:~ #

======
harv-node2:~ # crictl ps -a | grep kube-api
harv-node2:~ #

======
harv-node1:~ # crictl ps -a | grep kube-api
fe7d7e5003920       0a5067ab04e9a       4 days ago          Running             kube-apiserver                     0                   6e9fcceccfc41       kube-apiserver-harv-node1
harv-node1:~ #
Just discovered that the etcd service is not running at all on harv-node2 and is in an Exited state on harv-node3.
harv-node3:~ # crictl ps -a | grep etcd
CONTAINER           IMAGE               CREATED             STATE               NAME                              ATTEMPT             POD ID              POD
33797c85a507d       f0af64efd7e38       3 minutes ago       Exited              etcd                              3031                26ee3636ba4e0       etcd-harv-node3

harv-node3:~ # crictl logs 33797c85a507d
{"level":"info","ts":"2022-10-25T07:37:27.735Z","caller":"etcdmain/config.go:339","msg":"loaded server configuration, other configuration command line flags and environment variables will be ignored if provided","path":"/var/lib/rancher/rke2/server/db/etcd/config"}
{"level":"info","ts":"2022-10-25T07:37:27.735Z","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--config-file=/var/lib/rancher/rke2/server/db/etcd/config"]}
{"level":"warn","ts":"2022-10-25T07:37:27.735Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"config","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
{"level":"warn","ts":"2022-10-25T07:37:27.735Z","caller":"etcdmain/etcd.go:446","msg":"found invalid file under data directory","filename":"name","data-dir":"/var/lib/rancher/rke2/server/db/etcd"}
{"level":"info","ts":"2022-10-25T07:37:27.735Z","caller":"etcdmain/etcd.go:116","msg":"server has been already initialized","data-dir":"/var/lib/rancher/rke2/server/db/etcd","dir-type":"member"}
{"level":"info","ts":"2022-10-25T07:37:27.735Z","caller":"embed/etcd.go:131","msg":"configuring peer listeners","listen-peer-urls":["<https://127.0.0.1:2380>","<https://172.26.50.138:2380>"]}
{"level":"info","ts":"2022-10-25T07:37:27.735Z","caller":"embed/etcd.go:479","msg":"starting with peer TLS","tls-info":"cert = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.crt, key = /var/lib/rancher/rke2/server/tls/etcd/peer-server-client.key, client-cert=, client-key=, trusted-ca = /var/lib/rancher/rke2/server/tls/etcd/peer-ca.crt, client-cert-auth = true, crl-file = ","cipher-suites":[]}
{"level":"info","ts":"2022-10-25T07:37:27.752Z","caller":"embed/etcd.go:139","msg":"configuring client listeners","listen-client-urls":["<https://127.0.0.1:2379>","<https://172.26.50.138:2379>"]}
{"level":"info","ts":"2022-10-25T07:37:27.752Z","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.4","git-sha":"Not provided (use ./build instead of go build)","go-version":"go1.16.10b7","go-os":"linux","go-arch":"amd64","max-cpu-set":48,"max-cpu-available":48,"member-initialized":false,"name":"harv-node3-1fec3f96","data-dir":"/var/lib/rancher/rke2/server/db/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/rancher/rke2/server/db/etcd/member","force-new-cluster":false,"heartbeat-interval":"500ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["<http://localhost:2380>"],"listen-peer-urls":["<https://127.0.0.1:2380>","<https://172.26.50.138:2380>"],"advertise-client-urls":["<https://172.26.50.138:2379>"],"listen-client-urls":["<https://127.0.0.1:2379>","<https://172.26.50.138:2379>"],"listen-metrics-urls":["<http://127.0.0.1:2381>"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"harv-node1-483f6823=<https://172.26.50.135:2380>,harv-node3-1fec3f96=<https://172.26.50.138:2380>","initial-cluster-state":"existing","initial-cluster-token":"etcd-cluster","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":true,"corrupt-check-time-interval":"0s","auto-compaction-mode":"","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
{"level":"info","ts":"2022-10-25T07:37:27.752Z","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/rancher/rke2/server/db/etcd/member/snap/db","took":"179.93µs"}
{"level":"warn","ts":"2022-10-25T07:37:27.870Z","caller":"etcdserver/cluster_util.go:79","msg":"failed to get cluster response","address":"<https://172.26.50.135:2380/members>","error":"Get \"<https://172.26.50.135:2380/members>\": Service Unavailable"}
{"level":"info","ts":"2022-10-25T07:37:27.874Z","caller":"embed/etcd.go:368","msg":"closing etcd server","name":"harv-node3-1fec3f96","data-dir":"/var/lib/rancher/rke2/server/db/etcd","advertise-peer-urls":["<http://localhost:2380>"],"advertise-client-urls":["<https://172.26.50.138:2379>"]}
{"level":"info","ts":"2022-10-25T07:37:27.874Z","caller":"embed/etcd.go:370","msg":"closed etcd server","name":"harv-node3-1fec3f96","data-dir":"/var/lib/rancher/rke2/server/db/etcd","advertise-peer-urls":["<http://localhost:2380>"],"advertise-client-urls":["<https://172.26.50.138:2379>"]}
{"level":"fatal","ts":"2022-10-25T07:37:27.874Z","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"<http://go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/server/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225|go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/go/src/go.etcd.io/etcd/server/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/go/src/go.etcd.io/etcd/server/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/main.go:32\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225>"}
@ancient-pizza-13099 @prehistoric-balloon-31801 Any inputs after looking at the logs?
a
Did you set an NTP server on each node? I'm not sure the time is synced between those 3 nodes.
h
Yes, all 3 nodes were set up with the same NTP server during installation.
a
What is the disk type in those servers: SSD, NVMe, or HDD?
h
harv-node1:~ # for ip in {172.26.50.135,172.26.50.137,172.26.50.138}; do echo $ip; ssh rancher@$ip date; done
172.26.50.135
Wed Oct 26 10:11:59 UTC 2022
172.26.50.137
Wed Oct 26 10:12:00 UTC 2022
172.26.50.138
Wed Oct 26 10:12:00 UTC 2022
Servers have HDD
harv-node1:~ # lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0     3G  1 loop /
sda      8:0    0   1.1T  0 disk
├─sda1   8:1    0     1M  0 part
├─sda2   8:2    0    50M  0 part /oem
├─sda3   8:3    0    15G  0 part /run/initramfs/cos-state
├─sda4   8:4    0     8G  0 part
├─sda5   8:5    0   100G  0 part /usr/local
└─sda6   8:6    0 994.7G  0 part /var/lib/harvester/defaultdisk
sdb      8:16   0   1.1T  0 disk
sdc      8:32   0   1.1T  0 disk
sdd      8:48   0   1.1T  0 disk
sde      8:64   0   1.1T  0 disk
sdf      8:80   0   1.1T  0 disk
sdg      8:96   0   1.1T  0 disk
sdh      8:112  0    50G  0 disk /var/lib/kubelet/pods/c81c61bd-8168-4608-9fab-22146a816956/volume-subpaths/pvc-8b2a3484-2da1-460b-9b54-41082e23f85d/prometheus/2
sdk      8:160  0     2G  0 disk /var/lib/kubelet/pods/373cf7db-6878-40d1-8025-4134f622bb76/volumes/kubernetes.io~csi/pvc-241c5d34-79b4-465d-8907-52b46176da62/mount
sdl      8:176  0     5G  0 disk /var/lib/kubelet/pods/77af1b93-f4c1-4a1f-aa0f-eb8028fe8010/volume-subpaths/pvc-8243f740-1c7f-4e9d-8565-fd435daf4f5b/alertmanager/2
a
@high-alligator-99144 do you know the rough HDD speed, 5400 or 7200 RPM?
@prehistoric-balloon-31801 could "etcd/API server is not ready" be caused by a slow HDD?
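A rough way to check the disk (a sketch; /dev/sda and the test path are assumptions) is to read the reported rotation rate and measure sync-write latency, which is what etcd is sensitive to:
# rotation rate as reported by the drive (needs smartmontools)
smartctl -i /dev/sda | grep -i 'rotation rate'
# crude sync-write latency test (writes ~2.3 MB, then cleans up)
dd if=/dev/zero of=/var/lib/rancher/ddtest.img bs=2300 count=1000 oflag=dsync
rm -f /var/lib/rancher/ddtest.img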
h
The HDD spindle speed is 10K rpm
p
Not likely, judging from the log; I believe etcd should start even with an HDD.
👍 1
a
@high-alligator-99144 Your issue seems to be similar to https://github.com/harvester/harvester/issues/3039
It says: "When the third node is added the second node will get cordoned when the promotion is triggered and then kubelet stops responding." Did this happen in your environment?
If yes, was the promotion triggered automatically or manually from the WebUI?
h
Yes, that's exactly what I described earlier. It was triggered automatically. I have not done any operations from the WebUI apart from logging in to it.
a
Could you also log an issue and attach the support-bundle file? It seems some unknown issue has been triggered.
h
Sure thing. Since the bug is already filed, I think I should attach the support-bundle to it.
a
@high-alligator-99144 could you run
ls /tmp -alt
on harv-node1, thanks.
Also:
cat /etc/mtab,  df -H
@high-alligator-99144 please skip my last 2 messages.
h
Ok, no problem
a
From our QA's test, it can recover automatically. How about the situation in your cluster?
h
No, in my setup, the node didn't recover even after 7 days.
a
Thanks. I guess there are cases where the situation does not recover automatically.
@high-alligator-99144 I filed a new ticket to track your issue, https://github.com/harvester/harvester/issues/3091; feel free to add comments / more detailed information, thanks.
h
Thanks. Glad to contribute in whatever way I can. Let me know if you need any other info to be pulled out of the cluster/nodes.
👍 1
a
We need some more logs; I will figure out how to get them from the node, thanks. https://github.com/harvester/harvester/issues/3091#issuecomment-1301255783 "The etcd and apiserver are never coming up. Can you collect the kubelet/containerd logs, and the apiserver/etcd pod logs?"
h
collect the kubelet/containerd logs, and the apiserver/etcd pod logs
I assume these are available in /var/log/containers/ and /var/log/pods/?
a
Yes, the entries in /var/log/containers/ are symlinks pointing to /var/log/pods/...
harv31:~ # ls /var/log/pods/ | grep etcd
kube-system_etcd-harv31_e18aa5e5b83a5a3c56d78e4054612394

harv31:~ # ls /var/log/pods/ | grep apiserver
kube-system_kube-apiserver-harv31_4874a08227e8932676b83ca998a390f3
The kubelet log is in /var/lib/rancher/rke2/agent/logs/kubelet.log.
Please fetch those logs and attach them to the GitHub issue, thanks.
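For example (a rough sketch; the containerd log path is the usual RKE2 location and the pod log directory names will differ per node), something like this collects everything into one archive:
# bundle kubelet, containerd, etcd and kube-apiserver logs from this node
tar czf harv-node3-logs.tar.gz \
  /var/lib/rancher/rke2/agent/logs/kubelet.log \
  /var/lib/rancher/rke2/agent/containerd/containerd.log \
  /var/log/pods/kube-system_etcd-* \
  /var/log/pods/kube-system_kube-apiserver-*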
ps aux | grep " kube-apiserver " ps aux | grep " etcd " ps aux | grep " kubelet "
Those commands will check whether the related processes are running there; maybe etcd did not start up.
Also: cat /var/lib/rancher/rke2/server/db/etcd/config
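A compact way to run those checks together (just a sketch):
# report whether each expected control-plane process is running, then show the etcd config
for p in kube-apiserver etcd kubelet; do
  pgrep -x "$p" > /dev/null && echo "$p: running" || echo "$p: NOT running"
done
cat /var/lib/rancher/rke2/server/db/etcd/config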
h
Attached the requested logs/configs in the bug.
👍 1
a
etcd is not running on harv-node3. Judging from ps aux | grep " kube-apiserver ", kube-apiserver is also not running on harv-node3; it relies on the etcd on harv-node3.
h
I thought kube-apiserver is supposed to run only on the master (harv-node1). I think I had highlighted that etcd isn't running on both harv-node2 and harv-node3.