# general
q
hello all. i have a bad master node. it was a 9 node cluster (3 masters, 3 linux workers and 3 windows workers), and one of my masters had a disk issue, so i'm trying to replace it, but i keep getting the error below when i try to join it to the cluster. note: i did delete the bad node from the cluster, and i'm trying to join with the same node name / ip. it seems like it's timing out downloading "something".

it's a custom rancher cluster (rke2 1.24.10), and i'm using the "Registration" link on an ubuntu box to join the cluster. i still have 2 healthy master nodes; i'm trying to replace the failed 3rd master node.

my windows node gets the same error. this is what i see in the event log:
failed to run server: unable to parse connection info file: error gathering file information for file C:/var/lib/rancher/agent/rancher2_connection_info.json: CreateFile C:/var/lib/rancher/agent/rancher2_connection_info.json: The system cannot find the file specified.

here's the full output:
[INFO]  Label: cattle.io/os=linux
[INFO]  Role requested: etcd
[INFO]  Role requested: controlplane
[INFO]  Using default agent configuration directory /etc/rancher/agent
[INFO]  Using default agent var directory /var/lib/rancher/agent
[INFO]  Determined CA is necessary to connect to Rancher
[INFO]  Successfully downloaded CA certificate
[INFO]  Value from https://rancher.mk8s.mydomain.com/cacerts is an x509 certificate
[INFO]  Successfully tested Rancher connection
[INFO]  Downloading rancher-system-agent binary from https://rancher.mk8s.mydomain.com/assets/rancher-system-agent-amd64
[INFO]  Successfully downloaded the rancher-system-agent binary.
[INFO]  Downloading rancher-system-agent-uninstall.sh script from https://rancher.mk8s.mydomain.com/assets/system-agent-uninstall.sh
[INFO]  Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO]  Generating Cattle ID
[INFO]  Cattle ID was already detected as 3f4f57f4efdc465df1d3ed41e2ebe063806dccf8bf554196c1a38073aff47ef. Not generating a new one.
curl: (28) Operation timed out after 60000 milliseconds with 0 bytes received
[ERROR]  000 received while downloading Rancher connection information. Sleeping for 5 seconds and trying again
looks like it's this command that's timing out. any ideas?
+ curl --connect-timeout 60 --max-time 60 --write-out '%{http_code}\n' --cacert /tmp/tmp.kjGYl1D9Pr -sS -H 'Authorization: Bearer xz2tpjw6m8pvrpmrvs96sldpjxd4zvh8l2mt8dcfk4zzlmqwwlcrkq' -H 'X-Cattle-Id: 3f4f57f4efdc465df1d3ed41e2ebe063806dccf8bf554196c1a38073aff47ef' -H 'X-Cattle-Role-Etcd: true' -H 'X-Cattle-Role-Control-Plane: true' -H 'X-Cattle-Role-Worker: false' -H 'X-Cattle-Node-Name: ' -H 'X-Cattle-Address: ' -H 'X-Cattle-Internal-Address: ' -H 'X-Cattle-Labels: cattle.io/os=linux' -H 'X-Cattle-Taints: ' https://rancher.mk8s.mydomain.com/v3/connect/agent -o /var/lib/rancher/agent/rancher2_connection_info.json
i've also tried with a totally new node name and a new ip address, same result. rancher itself is running on a single k3s node.
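One way to narrow down where the request stalls is to probe the Rancher server directly from the new node and look at per-phase timings. The sketch below is only an illustration: `probe_rancher` and `describe_http_code` are hypothetical helper names, and it assumes your Rancher version serves a `/ping` endpoint that answers `pong`. A `000` status from curl means no HTTP response came back at all, which matches the "0 bytes received" in the log above.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for checking whether the Rancher server
# answers at all, and how long each phase of the request takes.

# Translate curl's %{http_code} into a rough diagnosis.
describe_http_code() {
  case "$1" in
    000) echo "no HTTP response (timeout, DNS, TLS or connection failure)" ;;
    2??) echo "success" ;;
    401|403) echo "auth rejected (check the registration token)" ;;
    *) echo "HTTP $1" ;;
  esac
}

# Hit Rancher's /ping endpoint (assumed to reply "pong") and print timings,
# so a slow TLS handshake can be told apart from a server that never replies.
probe_rancher() {
  local url="$1" cacert="$2" out code
  # curl still prints --write-out on failure, so capture it even if curl exits non-zero.
  out=$(curl --connect-timeout 10 --max-time 20 -sS --cacert "$cacert" \
    --write-out '\n%{http_code} connect=%{time_connect}s tls=%{time_appconnect}s first_byte=%{time_starttransfer}s' \
    "${url}/ping") || true
  # The last line holds the write-out; its first field is the HTTP status.
  code=$(printf '%s\n' "$out" | tail -n 1 | cut -d' ' -f1)
  printf '%s\n' "$out"
  echo "diagnosis: $(describe_http_code "$code")"
}
```

Usage would look like `probe_rancher https://rancher.mk8s.mydomain.com /path/to/rancher-ca.pem`, where the CA file path is a placeholder (the install script's `/tmp/tmp.*` copy is temporary). If `/ping` answers quickly but `/v3/connect/agent` still hangs, the problem is more likely on the Rancher side than in the network path.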