# harvester
p
Not to be that person, but this doesn't seem to belong in the harvester channel as it's more vcluster + k3s related, no? I don't know how to change the debug level, though the k3s docs likely have the answer.
e
No worries. This is the rancher-vcluster addon in Harvester. Experimental, so it may not be supported anywhere, but since it's a Harvester addon that has started failing to start, I thought I'd report it here. If this is the wrong place, happy to ask elsewhere.
p
Ah okay, my bad! Actually it's quite cool that there's a vcluster addon. As an aside, Loft Labs does have a Slack of their own in case you need more vcluster-specific help. I used to run plain vcluster k3s instances before but never had that problem. When I'm in the office tomorrow, I can give it a try on my Harvester cluster there and see if I can replicate it.
e
Thank you. I've run this for quite some time. The only other time I had an issue was when the volume filled, and it gave the same errors as this -- which appeared to be something looping, causing the etcd database to grow rapidly. Stopping all my running Fleet jobs stopped the growth, and after expanding the disk all was well. That was a couple of months ago. So when it died today I thought disk again, but it's fine. I'm going to try messing with the pod config to add --v=5 to the k3s server command.
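Roughly along these lines, for anyone curious -- the statefulset and container names below are guesses, not necessarily what the addon creates, so treat it as a sketch of where the flag goes rather than exact config:

```yaml
# Sketch only: find the actual workload first, e.g.
#   kubectl get statefulsets -A | grep -i vcluster
# then kubectl edit it and append the verbosity flag to the k3s server args.
spec:
  template:
    spec:
      containers:
        - name: vcluster            # assumed container name
          args:
            - server
            # ...existing k3s server flags stay as-is...
            - --v=5                 # klog verbosity; --debug also enables k3s debug logging
```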
p
Alright, give that a shot. I take it this was a previously working cluster which suddenly had this error, and that it's not a new vcluster?
e
Yes, and the typical 'I didn't change anything' -- which, since I've been so focused on other things, is truly the case 😄 I just noticed a service was acting up and wanted to check it, then saw Rancher was down. I did have an ingress certificate expiring this morning, but it went unhealthy last night, which now makes it complicated to renew the cert. If it was a cert issue, though, I'd expect some sort of logging about certificates. The guest cluster itself is happy as can be, I just naturally can't manage it since Rancher (vcluster) is down.
p
Uh huh, do check the cert though, because if you set it through the Rancher UI it's also used for your cluster. I'm not 100% sure on that, though.
e
Just thought I'd share: it was simply startup time! After increasing the debug logs and seeing nothing, I figured maybe it was simply not starting in time. I increased the resource limits and the liveness thresholds, and after nearly 10 minutes it came alive. Guessing I was already approaching that threshold, so when the pod was recreated for any reason it then failed to come up within the default window. Also, since this runs on 3x mini PCs, there's not exactly endless compute 😄 -- but I will be moving to 3x R740s soon, so that won't be an issue. Now to fix the certs :)
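For reference, the fields to bump are along these lines -- container name and numbers are illustrative, not the exact values used here; the point is just giving the k3s container more headroom and a liveness probe that tolerates a slow start:

```yaml
# Sketch only: adjust names and values to your setup.
spec:
  template:
    spec:
      containers:
        - name: vcluster                 # assumed container name
          resources:
            limits:
              cpu: "2"                   # example values
              memory: 4Gi
          livenessProbe:
            initialDelaySeconds: 60
            periodSeconds: 10
            failureThreshold: 60         # 60 x 10s = ~10 minutes of grace before a restart
```

A startupProbe with a generous failureThreshold would be the tidier way to get the same effect, since it only applies while the pod is starting up.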
p
Ah okay okay, that's awesome! 10 minutes to start is... impressive, haha. I faintly remember timeout issues on my old Pi cluster! But Harvester on mini PCs sounds rough. Either way, glad to hear that solved it, and that you have a very big upgrade upcoming.
e
They've been quite good actually, running since 1.2 or earlier I think. They're i5-10500s, so 12 threads, 64GB of memory, a 2TB SSD, and a 2TB M.2 drive each. Changed the WiFi 6AX for 2.5GbE Ethernet. The biggest annoyance is the lack of 10GbE for storage, it's pretty easy to saturate the network. So I was a bit shocked, but I did have that bug/issue some months ago that made my k3s db explode in size, so it's > 7GB or something now, and I haven't bothered to figure out whether it's possible to reduce that 😄