RKE2 upgrade observation: I upgraded a cluster from v1.24.3+rke2r1 to v1.24.6+rke2r1, and now I see this and it does not go away. I'm not sure why, or how to fix it. There are no logs for this (the jobs) that I can see.
Does anyone have any clue about this?
11/11/2022, 6:15 PM
Is your cluster out of resources, i.e. you don't have enough cores available to run everything?
You should be able to see why they are pending.
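The pending reason is recorded in the pod's `FailedScheduling` event, which `kubectl describe pod <name>` shows under Events. The Python sketch below splits such a message into per-reason node counts. It assumes the message follows the `N node(s) <reason>` layout quoted later in this thread; the function name is hypothetical and this is not part of any Kubernetes tooling.

```python
import re


def summarize_failed_scheduling(message: str) -> dict:
    """Map each scheduler rejection reason to the number of nodes it covers.

    Assumes a message shaped like "0/8 nodes are available: 5 node(s) <reason>, ...".
    """
    # Drop the trailing preemption summary; it only repeats the node count.
    main = message.split(". preemption:", 1)[0]
    # Keep only the part after the "0/N nodes are available:" header, if present.
    if "nodes are available:" in main:
        main = main.split("nodes are available:", 1)[1]
    reasons = {}
    # Individual reasons are separated by ", " followed by "<count> node(s)".
    for item in re.split(r",\s*(?=\d+ node\(s\))", main.strip()):
        match = re.match(r"(\d+) node\(s\) (.+)", item.strip())
        if match:
            reason = match.group(2).rstrip(".")
            reasons[reason] = reasons.get(reason, 0) + int(match.group(1))
    return reasons


# Usage with the scheduler message quoted in this thread.
print(summarize_failed_scheduling(
    "5 node(s) didn't match Pod's node affinity/selector. preemption: "
    "0/8 nodes are available: 8 Preemption is not helpful for scheduling."
))
```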
11/12/2022, 11:07 AM
I assume you're hitting a bug in 1.24.6 where the CNI certificates are not renewed. It's fixed in 1.24.7.
11/14/2022, 7:03 AM
@sparse-fireman-14239 I will do the same upgrade to 1.24.7 and see if the problem goes away. Feedback to follow. Thank you for the hint.
@creamy-pencil-82913 This is a clean cluster (no pods other than the core cluster components and storage; the nodes each have 4 vCPUs and 16 GB of memory), so I doubt it is a resource issue.
@sparse-fireman-14239 The cluster is now running v1.24.7+rke2r1 and I still have this problem, but now I can see the cause. It seems our taint setup is interfering, for example:
5 node(s) didn't match Pod's node affinity/selector. preemption: 0/8 nodes are available: 8 Preemption is not helpful for scheduling.
The strange thing is that a clean build works (no errors, everything deploys), but an upgrade causes issues with the 3 jobs, and those 3 jobs then run into problems with the taints.
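To illustrate why a job can end up with no eligible node, here is a minimal Python sketch of the two checks the scheduler message refers to: a pod must tolerate every hard (NoSchedule/NoExecute) taint on a node and match every `nodeSelector` label. It is a simplified model, not the scheduler's code: nodeAffinity expressions are not modelled, only the first failing check per node is reported, and the check order is chosen for illustration. All node names, labels, and taint keys below are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule, PreferNoSchedule or NoExecute


@dataclass
class Toleration:
    key: str = ""            # empty key with operator "Exists" matches any key
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches any effect


@dataclass
class Node:
    name: str
    labels: dict = field(default_factory=dict)
    taints: list = field(default_factory=list)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified toleration-matching rule."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def rejection_reason(node: Node, node_selector: dict, tolerations: list) -> Optional[str]:
    """Return why this node is filtered out, or None if the pod fits."""
    for taint in node.taints:
        if taint.effect == "PreferNoSchedule":
            continue  # soft taint: affects scoring, not filtering
        if not any(tolerates(t, taint) for t in tolerations):
            return f"had untolerated taint {{{taint.key}: {taint.value}}}"
    # nodeSelector: every key/value pair must be present in the node's labels.
    if any(node.labels.get(k) != v for k, v in node_selector.items()):
        return "didn't match Pod's node affinity/selector"
    return None


def explain(nodes, node_selector, tolerations) -> Counter:
    """Count rejection reasons across all nodes, like the scheduler summary."""
    return Counter(
        reason
        for node in nodes
        if (reason := rejection_reason(node, node_selector, tolerations)) is not None
    )


# Hypothetical 8-node cluster: 3 tainted nodes plus 5 workers lacking the label.
tainted = [Node(f"cp-{i}", taints=[Taint("dedicated", "control")]) for i in range(3)]
workers = [Node(f"w-{i}", labels={"pool": "general"}) for i in range(5)]
print(explain(tainted + workers, {"pool": "jobs"}, []))
```

In this toy setup every node rejects the pod for one reason or the other, which is the same shape as the "0/8 nodes" outcome above. Adding a matching toleration only moves the tainted nodes into the selector-mismatch bucket, so both the tolerations and the node selector of the jobs would need to line up with the cluster's taints and labels.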