# general
h
@brash-cpu-62691 I've typically seen this error when there's an issue with the etcd cluster or the networking within your k8s setup. Did you check the status of the etcd nodes? etcdctl endpoint health should show it.
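If etcd is serving TLS, etcdctl also needs the cert flags. A rough sketch; the paths here assume k3s-managed certs under the default /var/lib/rancher/k3s data dir and may differ on your setup:
Copy code
# Health-check etcd over TLS; the cert paths below assume k3s-managed
# certs under the default data dir, adjust them for your setup.
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key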
b
@hundreds-battery-84841 Yes
Copy code
etcdctl endpoint health
it's showing unhealthy. We tried an etcd reset but got another issue: https://github.com/k3s-io/k3s/issues/7825
h
What was the issue?
Do you have the logs?
b
@hundreds-battery-84841 Logs:
Copy code
Jun 26 04:33:34 k3s[1510]: {"level":"warn","ts":"2023-06-26T04:33:34.490Z","logger":"etcd-client","caller":"v3@v3.5.3-k3s1/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000d8a000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
Jun 26 04:33:34 k3s[1510]: time="2023-06-26T04:33:34Z" level=error msg="Failed to check local etcd status for learner management: context deadline exceeded"
Jun 26 04:33:34 k3s[1510]: time="2023-06-26T04:33:34Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
h
What's the output of sudo systemctl status etcd?
b
Copy code
Unit etcd.service could not be found.
K3s has embedded etcd
h
How about systemctl status k3s?
Or journalctl -u k3s
b
Copy code
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.826Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 is starting a new election at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.826Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 became pre-candidate at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 received MsgPreVoteResp from 6cc15128bccbdf39 at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 [logterm: 12, index: 24579453] sent MsgPreVote request to 298b911665c13aa0 at term 12"}
Jun 27 03:27:09 k3s[23842]: {"level":"info","ts":"2023-06-27T03:27:09.827Z","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"6cc15128bccbdf39 [logterm: 12, index: 24579453] sent MsgPreVote request to f875d12643597ed2 at term 12"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.878Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"298b911665c13aa0","rtt":"0s","error":"dial tcp 10.47.0.4:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.878Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"298b911665c13aa0","rtt":"0s","error":"dial tcp 10.47.0.4:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.883Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_SNAPSHOT","remote-peer-id":"f875d12643597ed2","rtt":"0s","error":"dial tcp 10.47.0.5:2380: connect: connection refused"}
Jun 27 03:27:12 k3s[23842]: {"level":"warn","ts":"2023-06-27T03:27:12.883Z","caller":"rafthttp/probing_status.go:68","msg":"prober detected unhealthy status","round-tripper-name":"ROUND_TRIPPER_RAFT_MESSAGE","remote-peer-id":"f875d12643597ed2","rtt":"0s","error":"dial tcp 10.47.0.5:2380: connect: connection refused"}
Jun 27 03:27:13 k3s[23842]: time="2023-06-27T03:27:13Z" level=info msg="Tunnel server egress proxy waiting for runtime core to become available"
Jun 27 03:27:14 k3s[23842]: time="2023-06-27T03:27:14Z" level=info msg="Waiting to retrieve kube-proxy configuration; server is not ready: https://127.0.0.1:6443/v1-k3s/readyz: 500 Internal Server Error"
h
It appears etcd is having trouble establishing a connection to other etcd peers in the cluster. Etcd uses port 2380 for peer-to-peer communication. The IP addresses 10.47.0.4 and 10.47.0.5 seem to be the other etcd peers that your current node is trying to communicate with. The connection refused error typically means that nothing is listening on the IP and port you're trying to connect to.
Test with telnet 10.47.0.4 2380 to see what comes up from that.
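If telnet isn't on the box, nc or curl can probe the same port; a quick sketch:
Copy code
# Probe the etcd peer port on each remote peer from this node.
# "connection refused" means nothing is listening there; a TLS
# handshake error from curl would mean etcd is actually up on that port.
nc -zv 10.47.0.4 2380
nc -zv 10.47.0.5 2380
curl -kv --max-time 5 https://10.47.0.4:2380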
Umm: dial tcp 10.47.0.4:2380: connect: connection refused continues to indicate that the node cannot establish a connection with the other nodes in the cluster. Additionally, etcd is trying to start a new election, which likely means it's not receiving heartbeats from the other etcd members, a further indication of network or connectivity issues.
Also, the "server is not ready" message indicates that the k3s server is not able to start up properly, possibly due to the issues with etcd.
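It might also be worth dumping the member list to see exactly which peer URLs this node has registered; a sketch with the same cert-path caveat as before:
Copy code
# List members and their peer URLs; stale entries here would explain
# the failed raft dials. Cert paths assume the default k3s data dir.
ETCDCTL_API=3 etcdctl member list -w table \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key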
b
yup
tried to reset etcd too but no luck
h
Have you tried restarting k3s?
b
yes
same error as shared above
h
And there are no firewall rules?
That you are aware of?
Can you show the content of the etcd.conf file?
b
No firewall
checking for content
Copy code
advertise-client-urls: https://10.46.0.4:2379
client-transport-security:
  cert-file: /application-volume/k3s/server/tls/etcd/server-client.crt
  client-cert-auth: true
  key-file: /application-volume/k3s/server/tls/etcd/server-client.key
  trusted-ca-file: /application-volume/k3s/server/tls/etcd/server-ca.crt
data-dir: /var/lib/etcd
election-timeout: 5000
experimental-initial-corrupt-check: true
heartbeat-interval: 500
listen-client-urls: https://127.0.0.1:2379,https://10.46.0.4:2379
listen-metrics-urls: http://127.0.0.1:2381
listen-peer-urls: https://127.0.0.1:2380,https://10.46.0.4:2380
log-outputs:
- stderr
logger: zap
name: master-1
peer-transport-security:
  cert-file: /application-volume/k3s/server/tls/etcd/peer-server-client.crt
  client-cert-auth: true
  key-file: /application-volume/k3s/server/tls/etcd/peer-server-client.key
  trusted-ca-file: /application-volume/k3s/server/tls/etcd/peer-ca.crt
snapshot-count: 10000
h
Can you see the etcd ports with netstat -tuln?
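(or with ss, if netstat isn't installed; a quick sketch:)
Copy code
# ss is the modern netstat replacement; filter for the etcd ports.
ss -tuln | grep -E ':(2379|2380|2381)'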
b
Copy code
tcp       69      0 127.0.0.1:2379          0.0.0.0:*               LISTEN
tcp        0      0 10.46.0.4:2379          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2380          0.0.0.0:*               LISTEN
tcp        0      0 10.46.0.4:2380          0.0.0.0:*               LISTEN
tcp        0      0 127.0.0.1:2381          0.0.0.0:*               LISTEN
Yes
h
Umm: the etcd configuration seems correct. The listen-client-urls and listen-peer-urls fields are both correctly set up to listen on the right IP addresses and ports. However, the errors
dial tcp 10.47.0.4:2380: connect: connection refused
and
dial tcp 10.47.0.5:2380: connect: connection refused
suggest that the peer nodes with IP addresses 10.47.0.4 and 10.47.0.5, which are different from the IP address specified in your configuration (10.46.0.4), are not accessible or not running etcd.
Make sure to verify that the etcd configurations on the nodes 10.47.0.4 and 10.47.0.5 match your expectations. In particular, verify that the listen-peer-urls and listen-client-urls fields are correctly set.
From what I can see there might be a discrepancy in your cluster configuration. If the IP addresses in the log messages (10.47.0.4 and 10.47.0.5) are stale or incorrect, then there may be a misconfiguration in your k3s setup: the nodes might not have been correctly added to the cluster, or the cluster configuration might not have been updated to reflect changes to the node IP addresses. Verify your cluster configuration to ensure it matches your current setup.
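If the member list confirms stale 10.47.x peer URLs while the nodes now live on 10.46.x, etcd lets you repoint a member's peer URL in place. A hedged sketch: the member ID below is taken from your logs, the new address is a placeholder you'd replace with the peer's real current IP, and the cert paths come from the config you shared:
Copy code
# Repoint a stale member at its current peer address. Verify the member
# ID against 'etcdctl member list' first.
# <new-peer-ip> is a placeholder, not a known value.
ETCDCTL_API=3 etcdctl member update 298b911665c13aa0 \
  --peer-urls=https://<new-peer-ip>:2380 \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/application-volume/k3s/server/tls/etcd/server-ca.crt \
  --cert=/application-volume/k3s/server/tls/etcd/server-client.crt \
  --key=/application-volume/k3s/server/tls/etcd/server-client.key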
b
ok sure, I will update you in some time
h
Ok
@brash-cpu-62691 any luck?