# longhorn-storage
c
👋 We have an app that orchestrates "static" pods on the cluster, just like the Longhorn instance manager pods. Those "static" pods are used as dev environments and mount Longhorn volumes. They are considered stateful. Occasionally our system load becomes high and the Longhorn instance manager restarts. That causes I/O errors in the "static" pods on the same node, and we then have to restart those pods. Is there something we can do to make this more resilient? We cannot guarantee that the Longhorn instance manager never restarts. A brief disruption is OK as long as the pods recover from it quickly. Having to restart the pods is quite disruptive.
c
Could you put health checks on your static pods so that they restart automatically? Same as you would with any normally orchestrated pod.
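As a rough illustration of what such a health check might look like, here is a minimal sketch of a script that an exec-style liveness probe could run inside the pod. It assumes the Longhorn volume is mounted at a path such as `/data` (a hypothetical path, as is the function name `volume_is_healthy`); it writes a small file, forces it to disk with `fsync`, reads it back, and exits non-zero if any step raises an I/O error.

```python
import os
import sys
import tempfile


def volume_is_healthy(mount_path: str) -> bool:
    """Return True if a small write, fsync, and read round trip succeeds under mount_path."""
    try:
        # Create the probe file on the volume itself (hypothetical mount path).
        fd, probe_path = tempfile.mkstemp(prefix=".probe-", dir=mount_path)
    except OSError:
        # Missing, read-only, or failing mount.
        return False
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"ok")
            f.flush()
            # fsync pushes the write to the block device, so a broken
            # volume is more likely to surface as an OSError here.
            os.fsync(f.fileno())
        with open(probe_path, "rb") as f:
            return f.read() == b"ok"
    except OSError:
        return False
    finally:
        # Best-effort cleanup; ignore errors on a failing volume.
        try:
            os.unlink(probe_path)
        except OSError:
            pass


if __name__ == "__main__":
    # Assumed default mount path; pass the real one as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/data"
    sys.exit(0 if volume_is_healthy(path) else 1)
```

A Kubernetes exec liveness probe treats a non-zero exit status as a failure, so a script along these lines would let the kubelet restart the container once the mount starts returning I/O errors. Note that the read-back may be served from the page cache, so the `fsync` call is the step doing most of the detection work.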
c
You are right that a restart brings the pod back to normal. However, because the pods are stateful, restarting them causes issues. I read that Rook daemons are more resilient to a single daemon restart, so I was wondering whether Longhorn can achieve the same.