So, this is a 3-node cluster.
The test was shutting down one of the nodes.
I tweaked the RKE2 settings for K8s:
kubelet:
node-status-update-frequency=4s (from 10s)
controller-manager:
node-monitor-period=4s (from 5s)
node-monitor-grace-period=16s (from 40s)
pod-eviction-timeout=30s (from 5m)
kube-apiserver:
default-not-ready-toleration-seconds=30s
default-unreachable-toleration-seconds=30s
K8s then marks the node as unreachable after about 1 minute. At the same time, the node goes unreachable in Rancher and in Harvester, and shows as 'Down' on the Node page of the Longhorn UI.
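As a rough sanity check on the "about 1 minute" figure, here is a minimal sketch of how the timers listed above might add up. It is a simplified model, not a description of what the controller-manager actually does internally: it assumes the node is marked unreachable once the grace period has passed since the last heartbeat, that the controller notices within one monitor period, and that pods are evicted once the default toleration expires. The function name `eviction_window` is made up for illustration.

```python
# Timer values copied from the settings listed above, in seconds.
NODE_MONITOR_PERIOD = 4         # controller-manager: node-monitor-period
NODE_MONITOR_GRACE_PERIOD = 16  # controller-manager: node-monitor-grace-period
UNREACHABLE_TOLERATION = 30     # kube-apiserver: default-unreachable-toleration-seconds


def eviction_window(grace, monitor_period, toleration):
    """Return (earliest, latest) seconds from the last heartbeat to pod eviction.

    Assumption: the node is marked unreachable somewhere between `grace` and
    `grace + monitor_period` seconds after its last heartbeat, and pods are
    evicted `toleration` seconds after that.
    """
    marked_earliest = grace
    marked_latest = grace + monitor_period
    return marked_earliest + toleration, marked_latest + toleration


earliest, latest = eviction_window(
    NODE_MONITOR_GRACE_PERIOD, NODE_MONITOR_PERIOD, UNREACHABLE_TOLERATION
)
print(f"pods evicted roughly {earliest}-{latest}s after the last heartbeat")
```

Under those assumptions the window comes out a little under a minute, which is in the same ballpark as what I saw. It does not account for the extra 5-7 minutes that the volume stays attached.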
So that's pretty quick. At this point the node is down in Longhorn, but the volume remains attached to the dead node for around 5-7 minutes (feels like another default timer somewhere). While this is happening, K8s keeps reporting the error "Failed to attach PVC... to pod...".
Then the volume detaches and reattaches to a remaining node. Repairing the degraded volume and binding it to the new pod/VMI takes a few more minutes.