# rke2
a
This message was deleted.
c
You’ve provided no context for what you’re running or how, but it looks like curl can’t connect to the rancher server to download the install script?
or it is otherwise timing out when pulling down something
q
it seems like it's timing out downloading "something"
it's a custom rancher cluster (rke2 1.24.10). i'm using the "Registration" link on an ubuntu box to join the cluster. i still have 2 healthy master nodes, and i'm trying to replace the failed 3rd master node.
my windows node gets the same error, this is what i see in event log:
failed to run server: unable to parse connection info file: error gathering file information for file C:/var/lib/rancher/agent/rancher2_connection_info.json: CreateFile C:/var/lib/rancher/agent/rancher2_connection_info.json: The system cannot find the file specified.
here's the full output:
[INFO]  Label: cattle.io/os=linux
[INFO]  Role requested: etcd
[INFO]  Role requested: controlplane
[INFO]  Using default agent configuration directory /etc/rancher/agent
[INFO]  Using default agent var directory /var/lib/rancher/agent
[INFO]  Determined CA is necessary to connect to Rancher
[INFO]  Successfully downloaded CA certificate
[INFO]  Value from https://rancher.mk8s.mydomain.com/cacerts is an x509 certificate
[INFO]  Successfully tested Rancher connection
[INFO]  Downloading rancher-system-agent binary from https://rancher.mk8s.mydomain.com/assets/rancher-system-agent-amd64
[INFO]  Successfully downloaded the rancher-system-agent binary.
[INFO]  Downloading rancher-system-agent-uninstall.sh script from https://rancher.mk8s.mydomain.com/assets/system-agent-uninstall.sh
[INFO]  Successfully downloaded the rancher-system-agent-uninstall.sh script.
[INFO]  Generating Cattle ID
[INFO]  Cattle ID was already detected as 3f4f57f4efdc465df1d3ed41e2ebe063806dccf8bf554196c1a38073aff47ef. Not generating a new one.
curl: (28) Operation timed out after 60000 milliseconds with 0 bytes received
[ERROR]  000 received while downloading Rancher connection information. Sleeping for 5 seconds and trying again
@creamy-pencil-82913 ^^
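For anyone reading the output above: the `000` in the `[ERROR]` line is what curl's `%{http_code}` reports when no HTTP response was received at all, and exit status 28 is curl's timeout code. A minimal sketch of how the two values could be interpreted together; `explain_curl_result` is a hypothetical helper, not something from the Rancher install script:

```shell
#!/bin/sh
# Hypothetical helper (not part of the Rancher install script): turn a curl
# exit status plus the %{http_code} value into a short human-readable hint.
explain_curl_result() {
    exit_status=$1
    http_code=$2

    case "$exit_status" in
        0)  ;;  # transfer finished; fall through to the HTTP status check
        6)  echo "dns: could not resolve host"; return 0 ;;
        7)  echo "connect: could not connect to host"; return 0 ;;
        28) echo "timeout: no complete response within --max-time"; return 0 ;;
        60) echo "tls: server certificate could not be verified"; return 0 ;;
        *)  echo "curl failed with exit status $exit_status"; return 0 ;;
    esac

    case "$http_code" in
        000) echo "no HTTP response received" ;;
        2??) echo "ok: HTTP $http_code" ;;
        *)   echo "server answered with HTTP $http_code" ;;
    esac
}
```

For the failure in this thread, `explain_curl_result 28 000` prints the timeout hint. Since the CA certificate and the agent binary downloaded fine from the same host, DNS, routing and TLS are probably not the problem, which points at the `/v3/connect/agent` request itself.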
c
hmm. you might check and see if perhaps you need to clean up something on the rancher side?
I don’t know what would cause it to download the script, but then hang when trying to get the cluster connection info. I suspect something is confused on the rancher side.
q
should i try restarting rancher maybe?
c
I’m not sure, I’m an RKE2 dev, not Rancher. You might check the rancher logs and see if there’s anything related to what you’re trying to do in there?
q
hmm.. looks like there's memory pressure on the rancher node. i'll try looking at that.
so it's not memory. i killed a few things, and the node is all green now (rancher), but i'm still getting the same issue. i set -x on the install script, and i can see the error is:
+ curl --connect-timeout 60 --max-time 60 --write-out %{http_code}\n --cacert /tmp/tmp.kjGYl1D9Pr -sS -H Authorization: Bearer xz2tpjw6m8pvrpmrvs96sldpjxd4zvh8l2mt8dcfk4zzlmqwwlcrkq -H X-Cattle-Id: 3f4f57f4efdc465df1d3ed41e2ebe063806dccf8bf554196c1a38073aff47ef -H X-Cattle-Role-Etcd: true -H X-Cattle-Role-Control-Plane: true -H X-Cattle-Role-Worker: false -H X-Cattle-Node-Name:  -H X-Cattle-Address:  -H X-Cattle-Internal-Address:  -H X-Cattle-Labels: cattle.io/os=linux -H X-Cattle-Taints:  https://rancher.mk8s.mydomain.com/v3/connect/agent -o /var/lib/rancher/agent/rancher2_connection_info.json
any thoughts? i also tried posting in #developer
c
maybe just general, since it’s an issue with rancher, not rke2… but I will say that the rancher developers don’t have much time to answer questions here.
is this the same URL you’re downloading the install script from?
https://rancher.mk8s.mydomain.com/v3/connect/agent
q
yes
c
is there still a node/machine in rancher for this host?
q
not sure i got you. under machines in cluster-management, the old node is gone. the new node is in there, but "pending"
just for giggles i extended the timeout in the script to 600, still fails.
it's just really odd that rancher would just hang like that.
these nodes have been running for 180+ days w/ no issues too.
c
yeah there are multiple resources for each thing. nodes, machines, and so on.
make sure you’ve cleaned up both the old resources, and the new pending one, and then try again?
you might also run the rancher-system-agent uninstall script, and clean out anything under /etc/rancher to get a new cattle ID?
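The cleanup suggested above could be sketched roughly like this. It is a dry run by default and only prints what it would do. Assumptions, none confirmed in this thread: the uninstall script is installed at `/usr/local/bin/rancher-system-agent-uninstall.sh`, and clearing the agent config directory named in the install log (`/etc/rancher/agent`) is enough to drop the stored cattle ID. `run` and `reset_agent_state` are made-up names.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0, execute) the suggested cleanup steps.
# ASSUMPTION: the uninstall script path below is a guess; check where the
# install script actually placed it on your node.
# ASSUMPTION: removing the agent config dir is what forces a new cattle ID.
UNINSTALL_SCRIPT=${UNINSTALL_SCRIPT:-/usr/local/bin/rancher-system-agent-uninstall.sh}
AGENT_CONFIG_DIR=${AGENT_CONFIG_DIR:-/etc/rancher/agent}
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

reset_agent_state() {
    if [ -x "$UNINSTALL_SCRIPT" ]; then
        run "$UNINSTALL_SCRIPT"
    else
        echo "uninstall script not found at $UNINSTALL_SCRIPT" >&2
    fi
    run rm -rf "$AGENT_CONFIG_DIR"
}
```

Source the file and call `reset_agent_state` to see the planned steps. Setting `DRY_RUN=0` (as root) executes them, after which the registration command from Rancher can be run again.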
q
tried a whole new node w/ different ip and hostname, same result. 😞