billowy-apple-60989
04/28/2025, 1:48 PM
cattle-cluster-agent pod on the downstream cluster, at which point it will reconnect again.
In the agent pod logs there are some timeout errors when connecting to the upstream Rancher instance, but at some point the agent seems to just "give up" reconnecting.
time="2025-04-27T21:24:42Z" level=warning msg="[850] encountered error \"write tcp 10.239.24.70:54512->172.30.103.51:443: i/o timeout\" while writing error \"tunnel disconnect\" to close remotedialer"
time="2025-04-27T21:24:42Z" level=warning msg="[850] encountered error \"write tcp 10.239.24.70:54512->172.30.103.51:443: i/o timeout\" while writing error \"io: read/write on closed pipe\" to close remotedialer"
time="2025-04-27T21:24:42Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.239.24.70:54512->172.30.103.51:443: i/o timeout"
time="2025-04-27T21:34:00Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.239.24.70:52390->172.30.103.51:443: i/o timeout"
time="2025-04-27T21:34:10Z" level=error msg="Failed to dial steve aggregation server: dial tcp 172.30.103.51:443: i/o timeout"
time="2025-04-27T21:39:47Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.239.24.70:42104->172.30.103.51:443: i/o timeout"
time="2025-04-27T22:23:04Z" level=info msg="Downloading repo index from https://azure.github.io/secrets-store-csi-driver-provider-azure/charts/index.yaml"
time="2025-04-27T22:26:42Z" level=error msg="Failed to dial steve aggregation server: read tcp 10.239.24.70:34800->172.30.103.51:443: i/o timeout"
time="2025-04-27T22:26:52Z" level=error msg="Failed to dial steve aggregation server: dial tcp: lookup our.rancher.hostname: i/o timeout"
time="2025-04-27T22:27:02Z" level=error msg="Failed to dial steve aggregation server: dial tcp: lookup our.rancher.hostname: i/o timeout"
time="2025-04-27T22:27:12Z" level=error msg="Failed to dial steve aggregation server: dial tcp: lookup our.rancher.hostname: i/o timeout"
time="2025-04-27T22:27:22Z" level=error msg="Failed to dial steve aggregation server: dial tcp 172.30.103.51:443: i/o timeout"
time="2025-04-27T22:27:32Z" level=error msg="Failed to dial steve aggregation server: dial tcp 172.30.103.51:443: i/o timeout"
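The timeouts in this excerpt come in three flavours: DNS lookups that time out (`dial tcp: lookup ...`), TCP connects that time out (`dial tcp 172.30.103.51:443`), and reads/writes that time out on an already-open connection (`read tcp` / `write tcp`). In case it helps with triage, here is a rough sketch for tallying them from saved agent logs. It assumes logrus-style lines exactly like the ones pasted here; the function names and category labels are made up for illustration and are not part of Rancher.

```python
import re
from collections import Counter

# Extracts the msg="..." field, allowing escaped quotes (\") inside it.
MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')


def classify(line):
    """Return a coarse timeout category for one log line, or None.

    The category names are illustrative labels only.
    """
    match = MSG_RE.search(line)
    if match is None:
        return None
    msg = match.group(1)
    if "i/o timeout" not in msg:
        return None
    if "lookup " in msg:
        # e.g. "dial tcp: lookup our.rancher.hostname: i/o timeout"
        return "dns_lookup_timeout"
    if "dial tcp" in msg:
        # e.g. "dial tcp 172.30.103.51:443: i/o timeout"
        return "tcp_dial_timeout"
    if "read tcp" in msg or "write tcp" in msg:
        # Timeout on a connection that had already been established.
        return "established_conn_timeout"
    return "other_timeout"


def tally(lines):
    """Count timeout categories across an iterable of log lines."""
    return Counter(kind for kind in map(classify, lines) if kind)
```

Calling `tally(open("agent.log"))` on a saved copy of the agent pod logs (the file name is just a placeholder) gives a per-category count, which shows whether the failures are mostly name resolution or mostly connectivity to the Rancher endpoint.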
Any pointers on how to resolve this? Is there some value I can tweak to increase the timeout or the number of retry attempts, perhaps?
Thanks 🙏