adamant-kite-43734
08/21/2024, 10:48 PM
creamy-pencil-82913
08/21/2024, 11:05 PM
abundant-hair-58573
08/22/2024, 1:16 AM
{
"level": "warn",
"ts": "2024-08-22T01:14:26.354997Z",
"caller": "etcdserver/server.go:2085",
"msg": "failed to publish local member to cluster through raft",
"local-member-id": "7ebc27a47a696333",
"local-member-attributes": "{Name:ip-10-114-49-88.ec2.internal-e975046f ClientURLs:[<https://10.114.49.88:2379>]}",
"request-path": "/0/members/7ebc27a47a696333/attributes",
"publish-timeout": "15s",
"error": "etcdserver: request timed out"
}
abundant-hair-58573
08/22/2024, 1:18 AM
abundant-hair-58573
08/22/2024, 1:39 AM
Running kubectl get nodes from either of those 2, I see this:
kubectl get nodes
E0822 01:38:20.050581 7582 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": net/http: TLS handshake timeout
E0822 01:38:30.052144 7582 memcache.go:265] couldn't get current server API group list: Get "https://127.0.0.1:6443/api?timeout=32s": net/http: TLS handshake timeout
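Those TLS handshake timeouts against 127.0.0.1:6443 usually mean the local kube-apiserver is wedged, commonly because the etcd behind it has lost quorum. A quick way to see what's actually running, assuming RKE2's default paths for the bundled crictl and its config (adjust if your install differs):

# Assumption: default RKE2 locations for crictl and its config file.
export CRI_CONFIG_FILE=/var/lib/rancher/rke2/agent/etc/crictl.yaml
/var/lib/rancher/rke2/bin/crictl ps -a | grep -E 'kube-apiserver|etcd'

# Probe the apiserver's TLS endpoint directly; even a 401/403 means the
# handshake itself completes, while a hang matches the errors above.
curl -vk --max-time 10 https://127.0.0.1:6443/healthz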
abundant-hair-58573
08/22/2024, 1:45 AM
Error: context deadline exceeded
when trying to run the etcdctl commands in the etcd pods, which I know indicates the etcd instance is unhealthy. Makes sense: only one etcd node appears to be behaving properly, which means it doesn't have quorum.
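For reference, a minimal health and membership check, assuming RKE2's default etcd certificate paths (an assumption; they may differ) and using the two node IPs seen in this thread, with the third member added as needed:

# Assumption: RKE2 keeps etcd client certs under this directory.
ETCD_TLS=/var/lib/rancher/rke2/server/tls/etcd
etcdctl --endpoints=https://10.114.49.88:2379,https://10.114.49.102:2379 \
  --cacert=$ETCD_TLS/server-ca.crt \
  --cert=$ETCD_TLS/server-client.crt \
  --key=$ETCD_TLS/server-client.key \
  endpoint health

# If at least one member answers, 'member list' shows current membership:
etcdctl --endpoints=https://10.114.49.102:2379 \
  --cacert=$ETCD_TLS/server-ca.crt \
  --cert=$ETCD_TLS/server-client.crt \
  --key=$ETCD_TLS/server-client.key \
  member list -w table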
creamy-pencil-82913
08/22/2024, 2:02 AM
abundant-hair-58573
08/22/2024, 2:10 AM
{
"level": "warn",
"ts": "2024-08-22T01:56:44.770927Z",
"caller": "etcdserver/cluster_util.go:288",
"msg": "failed to reach the peer URL",
"address": "<https://10.114.49.88:2380/version>",
"remote-member-id": "7ebc27a47a696333",
"error": "Get \"<https://10.114.49.88:2380/version>\": dial tcp 10.114.49.88:2380: connect: connection refused"
}
And then here's a snippet from the log of the etcd pod running on 10.114.49.88, the one the good etcd node can't connect to. I restarted this one about 10 minutes ago.
{"level":"info","ts":"2024-08-22T01:56:48.519303Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
{"level":"info","ts":"2024-08-22T01:56:48.520298Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"http","address":"127.0.0.1:2382"}
{"level":"info","ts":"2024-08-22T01:56:48.520358Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
{"level":"info","ts":"2024-08-22T01:56:48.521483Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc","address":"127.0.0.1:2379"}
{"level":"info","ts":"2024-08-22T01:56:48.522852Z","caller":"embed/serve.go:103","msg":"ready to serve client requests"}
{"level":"info","ts":"2024-08-22T01:56:48.523995Z","caller":"embed/serve.go:250","msg":"serving client traffic securely","traffic":"grpc","address":"10.114.49.88:2379"}
{"level":"info","ts":"2024-08-22T01:56:48.525162Z","caller":"etcdmain/main.go:44","msg":"notifying init daemon"}
{"level":"info","ts":"2024-08-22T01:56:48.525183Z","caller":"etcdmain/main.go:50","msg":"successfully notified init daemon"}
abundant-hair-58573
08/22/2024, 2:12 AM
abundant-hair-58573
08/22/2024, 2:29 AM{"level":"warn","ts":"2024-08-22T01:56:20.758795Z","caller":"etcdserver/cluster_util.go:155","msg":"failed to get version","remote-member-id":"7ebc27a47a696333","error":"Get \"<https://10.114.49.88:2380/version>\": dial tcp 10.114.49.88:2380: connect: connection refused"}
{"level":"warn","ts":"2024-08-22T01:56:20.880189Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55648","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.880905Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55658","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.892152Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55686","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.892637Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55672","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.932421Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55700","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.994648Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55716","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:20.994998Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55704","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:21.102371Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55732","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:21.103052Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55722","server-name":"","error":"remote error: tls: bad certificate"}
{"level":"warn","ts":"2024-08-22T01:56:21.268047Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55740","server-name":"","error":"read tcp 10.114.49.102:2380->10.114.49.88:55740: read: connection reset by peer"}
{"level":"warn","ts":"2024-08-22T01:56:21.269205Z","caller":"embed/config_logging.go:169","msg":"rejected connection","remote-addr":"10.114.49.88:55738","server-name":"","error":"read tcp 10.114.49.102:2380->10.114.49.88:55738: read: connection reset by peer"}
creamy-pencil-82913
08/22/2024, 2:47 AM
creamy-pencil-82913
08/22/2024, 2:48 AM
Run rke2 server --cluster-reset to set etcd membership back to a single node only, then delete the db from the other two nodes and rejoin them to the cluster.
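A sketch of that sequence, assuming RKE2's default data directory under /var/lib/rancher/rke2; treat it as an outline of the documented reset flow rather than an exact script for this cluster:

# On the one surviving etcd node:
systemctl stop rke2-server
rke2 server --cluster-reset        # resets etcd membership to just this member
systemctl start rke2-server

# On each of the other two etcd nodes:
systemctl stop rke2-server
rm -rf /var/lib/rancher/rke2/server/db    # drop the stale etcd data
systemctl start rke2-server               # the node rejoins and re-syncs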
abundant-hair-58573
08/22/2024, 2:54 AM
abundant-hair-58573
08/22/2024, 2:55 AM
creamy-pencil-82913
08/22/2024, 4:01 AM
abundant-hair-58573
08/23/2024, 2:53 PM
I ran rke2 server --cluster-reset, which gave an error about the server flag. So I deleted the server line from /etc/rancher/rke2/config.yaml.d/50-rancher.yaml and continued with the reset, which seemed to work. I tried deleting the DB on another etcd node and starting the rke2-server process, and now my control planes say they're down. I suspect it's because that server setting is still in their rke2 configs. Do I need to delete that from the control plane and other etcd node configs too? That server is actually one of the bad ones that I'm trying to bring back online; I guess it was the initial bootstrap server for my cluster?
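One way to see which nodes are still pointed at the old bootstrap server is to grep the RKE2 config and its drop-ins on each server node (50-rancher.yaml is the Rancher-managed drop-in from this thread; other file names are possible):

# Run on every control-plane/etcd node; shows where 'server:' currently points.
grep -Rn "server:" /etc/rancher/rke2/config.yaml /etc/rancher/rke2/config.yaml.d/ 2>/dev/null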
abundant-hair-58573
08/23/2024, 2:54 PM
abundant-hair-58573
08/30/2024, 5:56 PM
I had to update /etc/rancher/rke2/config.yaml.d/50-rancher.yaml to reflect the new etcd node; the value it had was an old etcd node. The docs don't mention that step; maybe it isn't necessary in some configurations, but it definitely was in mine.
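For illustration, the kind of value being changed here; the IP below is the healthy node from this thread and 9345 is RKE2's default supervisor port, but both are assumptions about what's right for a given cluster:

# In /etc/rancher/rke2/config.yaml.d/50-rancher.yaml on the other server nodes,
# point 'server:' at a server node that is currently healthy and joined, e.g.:
#
#   server: https://10.114.49.102:9345
#
# then restart the service so the change takes effect:
systemctl restart rke2-server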
abundant-hair-58573
08/30/2024, 5:58 PM
creamy-pencil-82913
08/30/2024, 6:31 PM
abundant-hair-58573
09/01/2024, 4:46 PM