This message was deleted.
# longhorn-storage
a
This message was deleted.
f
Longhorn does not care if the worker nodes are from the same vendor or not. But the network condition matters. If network fluctuation happens, some replicas will become failed, which is caused by a timeout between volume engine and replicas. And I guess this is one reason why there are some replica failures in your cluster. If it’s unavoidable, you can check why the rebuilding is not triggered automatically. You can check longhorn manager logs for details. I am thinking if Longhorn is trying to wait and reuse failed replica then the rebuilding is not happened immediately.
q
But replicas are all on-premise in the same rack. It’s a strange correlation that the issue happen when the connection to master nodes is down. I am puzzled since it’s designed to not be dependent on those nodes. I’ll continue to have a look at the logs. Thanks sharing your thoughts!