adamant-kite-43734
12/28/2020, 9:08 AM
rhythmic-energy-29295
01/11/2021, 1:15 PM
Volume Attachment Recovery Policy: immediate
Pod Deletion Policy When Node is Down: delete-both-statefulset-and-deployment-pod
I also changed the default tolerations of the nginx pod to:
tolerations:
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 0
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 0
So Kubernetes won't tolerate a node shutdown at all: the pod is evicted as soon as the node gets either taint, instead of after the default 300 seconds.
Then I force-shut-down the machine currently running the nginx pod and followed the process from that moment until nginx was up and running again with the Longhorn PVC.
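To make the effect of `tolerationSeconds: 0` concrete, here is a minimal Python sketch of how a NoExecute taint and a pod's tolerations decide the eviction delay. It is a simplified port of the upstream matching rules (effect, key, then `Exists`/`Equal`), not Kubernetes' actual code; the names `Taint`, `Toleration`, `tolerates` and `eviction_delay` are hypothetical and only used for illustration. It assumes a single taint and the 300 s default that the DefaultTolerationSeconds admission plugin adds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass
class Toleration:
    key: str = ""
    operator: str = ""  # "" behaves like "Equal"
    value: str = ""
    effect: str = ""  # "" matches every effect
    toleration_seconds: Optional[int] = None  # None: tolerate forever


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified mirror of the upstream toleration/taint matching rules."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.key and tol.key != taint.key:
        return False
    if tol.operator in ("", "Equal"):
        return tol.value == taint.value
    return tol.operator == "Exists"


def eviction_delay(tolerations: List[Toleration], taint: Taint) -> Optional[int]:
    """Seconds a pod stays on a node after a NoExecute taint is applied.

    0 means immediate eviction, None means the pod is never evicted.
    """
    matching = [t for t in tolerations if tolerates(t, taint)]
    if not matching:
        return 0  # an untolerated NoExecute taint evicts right away
    bounded = [t.toleration_seconds for t in matching if t.toleration_seconds is not None]
    if not bounded:
        return None  # every matching toleration is unbounded
    return max(0, min(bounded))  # shortest bound wins; values <= 0 act like 0


UNREACHABLE = Taint(key="node.kubernetes.io/unreachable", effect="NoExecute")

# Roughly what the DefaultTolerationSeconds admission plugin adds to a pod
default_tols = [Toleration(key=UNREACHABLE.key, operator="Exists",
                           effect="NoExecute", toleration_seconds=300)]
# The customized toleration from the nginx pod
custom_tols = [Toleration(key=UNREACHABLE.key, operator="Exists",
                          effect="NoExecute", toleration_seconds=0)]

print(eviction_delay(default_tols, UNREACHABLE))  # 300
print(eviction_delay(custom_tols, UNREACHABLE))   # 0
```

Note that the delay counts from when the taint is applied, so the time the node controller needs to mark the node NotReady/unreachable (roughly the 37 s observed below) still comes on top of it.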
The timeline looks like this:
power off machine -> approx. 37 s -> node marked NotReady -> new pod immediately in ContainerCreating on another node for approx. 2 min
It seems we're getting two recurring errors that delay the pod creation:
Multi-Attach error for volume "pvc-879f95fd-e69a-4c67-8b8c-d9fbca183edd" Volume is already exclusively attached to one node and can't be attached to another
Unable to attach or mount volumes: unmounted volumes=[nginx-storage], unattached volumes=[nginx-storage default-token-swdnp]: timed out waiting for the condition
I also noticed that two Longhorn workloads are sometimes unavailable, but I'm assuming they're not relevant to my case:
longhorn-driver-deployer
longhorn-ui
This is what the Rancher events look like: