# general
h
@brash-cpu-62691 I've typically seen this error when there's an issue with the etcd cluster or the networking within your k8s setup. Did you check the status of the etcd nodes? etcdctl endpoint health should show it.
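If etcd is serving TLS, etcdctl also needs the cert flags. A rough sketch; the paths here assume k3s-managed certs under the default /var/lib/rancher/k3s data dir and may differ on your setup:
Copy code
# Health-check etcd over TLS; the cert paths below assume k3s-managed
# certs under the default data dir, adjust them for your setup.
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key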
b
@hundreds-battery-84841 Yes
Copy code
etcdctl endpoint health
it's showing unhealthy. We tried an etcd reset but got another issue: https://github.com/k3s-io/k3s/issues/7825
h
What was the issue?
Do you have the logs?
b
@hundreds-battery-84841 Logs:
Copy code
Jun 26 04:33:34 k3s[1510]: {"level":"warn","ts":"2023-06-26T04:33:34.490Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000d8a000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Jun 26 04:33:34 k3s[1510]: time="2023-06-26T04:33:34Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Jun 26 04:33:34 k3s[1510]: time="2023-06-26T04:33:34Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
h
What's the output of sudo systemctl status etcd?
b
Copy code
Unit etcd.service could not be found.
K3s has embedded etcd
h
How about systemctl status k3s?
Or journalctl -u k3s
b
Copy code
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.826Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 is starting a new election at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.826Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 became pre-candidate at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 received MsgPreVoteResp from 6cc15128bccbdf39 at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 [logterm: 12, index: 24579453] sent MsgPreVote request to 298b911665c13aa0 at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 [logterm: 12, index: 24579453] sent MsgPreVote request to f875d12643597ed2 at term 12"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.878Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"298b911665c13aa0","rtt":"0s","error":"dial tcp 10.47.0.4:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.878Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"298b911665c13aa0","rtt":"0s","error":"dial tcp 10.47.0.4:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.883Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f875d12643597ed2","rtt":"0s","error":"dial tcp 10.47.0.5:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.883Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f875d12643597ed2","rtt":"0s","error":"dial tcp 10.47.0.5:2380: connect: connection refused"}
Jun 27 03:27:13 k3s[23842]: time="2023-06-27T03:27:13Z" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jun 27 03:27:14 k3s[23842]: time="2023-06-27T03:27:14Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
h
It appears etcd is having trouble establishing a connection to other etcd peers in the cluster. Etcd uses port 2380 for peer-to-peer communication. The IP addresses 10.47.0.4 and 10.47.0.5 seem to be the other etcd peers that your current node is trying to communicate with. The connection refused error typically means that nothing is listening on the IP and port you're trying to connect to.
Test with telnet 10.47.0.4 2380 to see what comes up from that.
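If telnet isn't on the box, nc or curl can probe the same port; a quick sketch:
Copy code
# Probe the etcd peer port on each remote peer from this node.
# "connection refused" means nothing is listening there; a TLS
# handshake error from curl would mean etcd is actually up on that port.
nc -zv 10.47.0.4 2380
nc -zv 10.47.0.5 2380
curl -kv --max-time 5 https://10.47.0.4:2380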
Umm: dial tcp 10.47.0.4:2380: connect: connection refused continues to indicate that the node cannot establish a connection with the other nodes in the cluster. Additionally, etcd is trying to start a new election, which likely means it's not receiving heartbeats from the other etcd members, a further indication of network or connectivity issues.
Also, the "server is not ready" message indicates that the k3s server is not able to start up properly, possibly due to the issues with etcd.
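It might also be worth dumping the member list to see exactly which peer URLs this node has registered; a sketch with the same cert-path caveat as before:
Copy code
# List members and their peer URLs; stale entries here would explain
# the failed raft dials. Cert paths assume the default k3s data dir.
ETCDCTL_API=3 etcdctl member list -w table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key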
b
yup
tried to reset etcd too but no luck
h
Have you tried restarting k3s?
b
yes
same error as shared above
h
And there are no firewall rules?
That you are aware of?
Can you show the content of the etcd.conf file?
b
No firewall
checking for content
Copy code
advertise-client-urls: https://10.46.0.4:2379
client-transport-security:
  cert-file: /application-volume/k3s/server/tls/etcd/server-client.crt
  client-cert-auth: true
  key-file: /application-volume/k3s/server/tls/etcd/server-client.key
  trusted-ca-file: /application-volume/k3s/server/tls/etcd/server-ca.crt
data-dir: /var/lib/etcd
election-timeout: 5000
experimental-initial-corrupt-check: true
heartbeat-interval: 500
listen-client-urls: https://127.0.0.1:2379,https://10.46.0.4:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://10.46.0.4:2380
log-outputs:
- stderr
logger: zap
name: master-1
peer-transport-security:
  cert-file: /application-volume/k3s/server/tls/etcd/peer-server-client.crt
  client-cert-auth: true
  key-file: /application-volume/k3s/server/tls/etcd/peer-server-client.key
  trusted-ca-file: /application-volume/k3s/server/tls/etcd/peer-ca.crt
snapshot-count: 10000
h
Can you see the etcd ports with netstat -tuln?
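(or with ss, if netstat isn't installed; a quick sketch:)
Copy code
# ss is the modern netstat replacement; filter for the etcd ports.
ss -tuln | grep -E ':(2379|2380|2381)'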
b
Copy code
tcp       69      0 127.0.0.1:2379          0.0.0.0:*               LISTEN
tcp        0      0 10.46.0.4:2379          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN
tcp        0      0 10.46.0.4:2380          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN
Yes
h
Umm: the etcd configuration seems correct. The listen-client-urls and listen-peer-urls fields are both correctly set up to listen on the right IP addresses and ports. However, the errors
dial tcp 10.47.0.4:2380: connect: connection refused
and
dial tcp 10.47.0.5:2380: connect: connection refused
suggest that the peer nodes with IP addresses 10.47.0.4 and 10.47.0.5, which are different from the IP address specified in your configuration (10.46.0.4), are not accessible or not running etcd.
Make sure to verify that the etcd configurations on the nodes 10.47.0.4 and 10.47.0.5 match your expectations. In particular, verify that the listen-peer-urls and listen-client-urls fields are correctly set.
From what I can see there might be a discrepancy in your cluster configuration. If the IP addresses in the log messages (10.47.0.4 and 10.47.0.5) are stale or incorrect, then there may be a misconfiguration in your k3s setup: the nodes might not have been correctly added to the cluster, or the cluster configuration might not have been updated to reflect changes to the node IP addresses. Verify your cluster configuration to ensure it matches your current setup.
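If the member list confirms stale 10.47.x peer URLs while the nodes now live on 10.46.x, etcd lets you repoint a member's peer URL in place. A hedged sketch: the member ID below is taken from your logs, the new address is a placeholder you'd replace with the peer's real current IP, and the cert paths come from the config you shared:
Copy code
# Repoint a stale member at its current peer address. Verify the member
# ID against 'etcdctl member list' first.
# <new-peer-ip> is a placeholder, not a known value.
ETCDCTL_API=3 etcdctl member update 298b911665c13aa0 \
  --peer-urls=https://<new-peer-ip>:2380 \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/application-volume/k3s/server/tls/etcd/server-ca.crt \
  --cert=/application-volume/k3s/server/tls/etcd/server-client.crt \
  --key=/application-volume/k3s/server/tls/etcd/server-client.key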
b
ok sure, I will update you in some time
h
Ok
@brash-cpu-62691 any luck?