# harvester
a support bundle would be a nice place to get started
I put the node into maintenance mode and restarted it, and it came back up looking healthier - however it is now cordoned. Restarting the VMs (only 2 on there) migrated them to other hosts, and the stack is running with this one node cordoned. We were experimenting with using 1 replica on cluster workers: since the workers are throw-away, the thought was that if something went wrong in a worker the pods would migrate to a healthy worker and a new one could be spun up, and at the same time the I/O benefit of not having a replica sounded good on paper - shoot me down if this is not the done thing and could have been a contributing factor.
Also - we run MinIO in k8s, and for its storage we also reduced replicas to 1; since MinIO handles replication itself, the assumption was that MinIO would take care of it. That wasn't working well, with dreadful I/O speeds - those containers were in part hosted on these workers. All of this seems happy now that the new node is out of the equation.
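For context on the single-replica experiment: in Harvester these volumes are backed by Longhorn, and the replica count normally comes from the StorageClass. A minimal sketch of a single-replica class, assuming the stock Longhorn provisioner (the class name here is made up):

```
# Hypothetical StorageClass with a single Longhorn replica per volume.
kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: single-replica          # example name, not a Harvester default
provisioner: driver.longhorn.io # Longhorn CSI provisioner used by Harvester
parameters:
  numberOfReplicas: "1"         # one copy of each volume, on one node
  staleReplicaTimeout: "30"
EOF
```

With one replica a volume lives entirely on a single node, so cordoning or losing that node takes the volume offline with it - that's the trade-off being weighed above.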
However I don’t seem to be able to uncordon the node. It will come up and then quickly become cordoned again.
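For reference, the kubectl side of poking at this looks roughly like the following (a sketch - the node name n5 is a stand-in for whatever `kubectl get nodes` reports in your cluster):

```
# Cordoned nodes show SchedulingDisabled in the STATUS column
kubectl get nodes

# Clear the cordon manually; if it comes straight back, something
# (e.g. maintenance mode or a Harvester controller) is still acting on the node
kubectl uncordon n5

# Check for taints and maintenance-related annotations left on the node
kubectl describe node n5 | grep -i -A3 taints
kubectl get node n5 -o jsonpath='{.metadata.annotations}'
```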
Will generate a support bundle now. This node has a different motherboard but the same CPU, different NIC names, and more storage but the same types.
Where would be the right place to look for logs in this situation of not being able to uncordon the host? Are there pods in the Harvester cluster I can look at to see what's happening? Can't see anything in the UI to help explain what's happening.
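If it helps, the usual places to check from kubectl look something like this (a sketch; the namespaces and the harvester deployment name are the Harvester defaults, so verify them in your cluster):

```
# Node events often say why a node was cordoned or kept unschedulable
kubectl describe node n5 | grep -i -A10 events

# Harvester's own controllers run in the harvester-system namespace
kubectl -n harvester-system get pods
kubectl -n harvester-system logs deploy/harvester | grep -i -E 'cordon|maintenance'

# Longhorn (the storage layer) runs in longhorn-system
kubectl -n longhorn-system get pods
```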
Might have worked out the problem... might... When looking at the logs of the pod kube-proxy-n5 I could see it wanted to be IP .200, which was different from what was reported in the UI, so I updated my network so it was assigned .200 (rather than .61). Looks like it's sorting itself out now... it's picked up the new IP and gone into maintenance on startup, and stats have just started appearing, so hoping it will wake up soon...
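The Lens check above translates to roughly this with kubectl (assuming the node is named n5, to match the kube-proxy-n5 pod; the .200/.61 addresses are the ones from this thread):

```
# kube-proxy runs per node in kube-system; find the one for this node
kubectl -n kube-system get pods -o wide | grep kube-proxy

# Its logs showed which node IP it expected (.200 here)
kubectl -n kube-system logs kube-proxy-n5 | grep -i -E 'node|address'

# Compare with the addresses the node object actually reports
kubectl get node n5 -o jsonpath='{.status.addresses}'
```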
Yep - that's fixed it. I manually disabled maintenance mode, assuming Harvester enabled that as a precaution with things changing! The host looks good now, so this is resolved. For anyone else that gets this - what helped me was looking at the Harvester cluster in Lens at the logs of the related kube-proxy pod, where the reason was more obvious!