
wonderful-appointment-6480

01/22/2023, 8:30 PM
Hi. I recently installed Longhorn 1.34 on my k3s cluster. I have 3 master nodes, and one of them is tainted with NoSchedule (node.kubernetes.io/unschedulable:NoSchedule). I still see this node when I go to Longhorn UI -> Nodes. It appears as not schedulable, but it leaves me unable to create backups to NFS and causes all sorts of weird issues. The only way to get rid of that node is to directly remove the CR that represents it, but it comes back at some point (and since it's not schedulable, I'll have instance managers that are unable to deploy on it). Any suggestions why this happens? I also have "*Disable Scheduling On Cordoned Node*" checked.

famous-journalist-11332

01/27/2023, 5:02 AM
Did you manually add the taint node.kubernetes.io/unschedulable:NoSchedule, or was it populated by Kubernetes itself?

wonderful-appointment-6480

01/27/2023, 5:37 AM
Hi @famous-journalist-11332, I used kubectl to add the taint even before deploying Longhorn. It works, as I don't have anything other than k8s stuff deployed to it.

famous-journalist-11332

01/27/2023, 5:45 AM
The issue is that this taint doesn't stop DaemonSet pods from being scheduled on that node. Kubernetes automatically lets DaemonSet pods tolerate the taint node.kubernetes.io/unschedulable:NoSchedule. As a result, Longhorn manager (which is a DaemonSet) will deploy a pod there, and that pod will recreate the Longhorn node CR if it sees that the CR is missing. Proposal: use a different taint than the system one (node.kubernetes.io/unschedulable:NoSchedule).
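A rough sketch of what swapping the taint could look like, in case it helps. The function name `retaint_node` and the taint `dedicated=control-plane` are placeholders made up for illustration, not something Longhorn or Kubernetes requires; any key outside the built-in `node.kubernetes.io/...` set should behave the same way, since DaemonSet pods only get automatic tolerations for the system taints.

```shell
# Hypothetical helper: replace the system taint with a custom one that
# DaemonSet pods do not tolerate automatically.
# "dedicated=control-plane" is a placeholder key/value; pick your own.
retaint_node() {
  node="$1"
  # A trailing "-" removes the taint with that key and effect.
  # kubectl reports an error here if the taint is not present on the node.
  kubectl taint nodes "$node" node.kubernetes.io/unschedulable:NoSchedule-
  # Add the custom NoSchedule taint in its place.
  kubectl taint nodes "$node" dedicated=control-plane:NoSchedule
}
```

Call it as `retaint_node <node-name>`. Anything that should still run on that node would then need an explicit toleration for the custom taint.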

wonderful-appointment-6480

01/27/2023, 10:55 AM
Is that general behavior with NoSchedule and DaemonSets, or only with Longhorn? Once I apply a new NoSchedule taint and the node is "down" in the Longhorn UI, is it safe to click to remove it?
I don't see any specific toleration on the longhorn-manager DaemonSet. I would like to understand this a bit better; I'd appreciate it if you could elaborate, @famous-journalist-11332.

famous-journalist-11332

01/27/2023, 11:32 PM
The DaemonSet YAML doesn't have it. However, when the DaemonSet controller creates pods for the DaemonSet, it automatically populates the toleration. You can check it by looking at the YAML of the pod (not the DaemonSet) with:
kubectl get pod <daemonset-pod-name> -oyaml
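To narrow that output down to just the tolerations, a jsonpath query can help. This is only a sketch: the function name `show_pod_tolerations` is made up for illustration, and it assumes the pod lives in the `longhorn-system` namespace unless a second argument says otherwise.

```shell
# Hypothetical helper: print key, operator and effect of each toleration
# on a pod, one per line, tab-separated.
# The namespace defaults to "longhorn-system" (an assumption; pass another
# namespace as the second argument if your install differs).
show_pod_tolerations() {
  pod="$1"
  ns="${2:-longhorn-system}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.tolerations[*]}{.key}{"\t"}{.operator}{"\t"}{.effect}{"\n"}{end}'
}
```

Run it as `show_pod_tolerations <daemonset-pod-name>`. On a DaemonSet pod the list should include an entry for node.kubernetes.io/unschedulable with effect NoSchedule, even though the DaemonSet spec itself does not declare it.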
> Is that general behavior with NoSchedule and DaemonSets, or only with Longhorn?
> Once I apply a new NoSchedule taint and the node is "down" in the Longhorn UI, is it safe to click to remove it?
This behavior happens with any DaemonSet. My proposal is to use a different taint than the system one (node.kubernetes.io/unschedulable:NoSchedule).

wonderful-appointment-6480

01/28/2023, 7:02 AM
Oh, I was always under the impression that the toleration exists on the DaemonSet, because that's what I see for other DaemonSets in my environment. I guess the implicit one works the same way, and that's why I couldn't figure this out myself so far. Thank you @famous-journalist-11332