# general
a
Hello all, I'm working through the automated upgrade to 1.32.3 on an air-gapped cluster. I've got all container images preloaded on the server node, and so far all pods have come up successfully except for the 'apply-server-plan-on-<nodeName>-with-<hash>' pod, which is complaining it can't pull its image. 'crictl images' shows that the very image the pod is failing to pull is already in my local cache. I can't seem to find the pullPolicy for this image anywhere to see if it was set to 'Always', and I'm wondering if anyone knows what I'm missing. thanks!
c
… what's the image?
a
right, of course. sorry. it's this one: rancher/kubectl:v1.30.3
and here's what crictl has:
docker.io/rancher/kubectl    v1.30.3    ec3217f60bc33    16.4MB
c
are you sure you have the image on the node in question? what do you mean by “in your local cache”?
a
My understanding is that images loaded into containerd and visible with either 'nerdctl images' or 'crictl images' are stored in what is known as the image cache. On non-air-gapped systems this removes the need to download the image from upstream on every pod creation.
On air-gapped systems, we load images into this cache so that no image pulls are necessary. This is what rke2 does with images in /var/lib/rancher/rke2/agent/images/, IIRC. It has worked for every other image I've preloaded that way.
I can confirm the node has the image like so:
# crictl images | grep kubectl
docker.io/rancher/kubectl    v1.30.3    ec3217f60bc33    16.4MB
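For context, the preload step I'm using looks roughly like this (a sketch; the tarball name assumes the standard rke2 release artifact, and rke2 imports anything in this directory into the containerd image store on startup):
# on each node, before (re)starting the rke2 service
mkdir -p /var/lib/rancher/rke2/agent/images/
cp rke2-images.linux-amd64.tar.zst /var/lib/rancher/rke2/agent/images/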
c
it's not an image cache. it's the containerd image store
check the pod yaml to see what the image pull policy is
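something like this should show the effective policy (assuming the controller's default system-upgrade namespace; the pod name is the placeholder from above):
kubectl -n system-upgrade get pod apply-server-plan-on-<nodeName>-with-<hash> \
  -o jsonpath='{.spec.containers[*].imagePullPolicy}'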
a
I see where it is in system-upgrade-controller.yaml now. it was hiding from my grep because it's set as an env var...
I got the imagePullPolicy fixed for the kubectl image, but I can't seem to do the same for the rke2-upgrade image specified in the upgrade plan here:
upgrade:
  image: rancher/rke2-upgrade
version: v1.32.3-rke2r1
The pod is created with an imagePullPolicy of 'Always' and I'd like it to be 'ifNotPresent'.
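For reference, here's a minimal sketch of where that fragment sits, following the server-plan sample from the RKE2 docs (names and the nodeSelector come from the sample and may differ in your setup; tolerations omitted):
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
spec:
  concurrency: 1
  cordon: true
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.32.3-rke2r1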
c
The default for the controller is IfNotPresent. If you'd changed that for some reason in the controller config, you may have to delete and recreate the plans for the controller to update the job after it's been restarted with your desired policy setting.
a
I have not changed anything in the controller config. I followed the example provided here: https://docs.rke2.io/upgrades/automated_upgrade, and in system-upgrade-controller.yaml it has this for the kubectl pod:
SYSTEM_UPGRADE_JOB_IMAGE_PULL_POLICY: Always
which I have changed to:
SYSTEM_UPGRADE_JOB_IMAGE_PULL_POLICY: ifNotPresent
I'm no longer getting an image pull error as I was before, but no jobs are created when I apply the plan, and no upgrade takes place. I'm on 1.29.15 and attempting to upgrade to 1.32.3.
sorry, please ignore. I got a clue and checked the logs; this was an uncaught syntax error in my config (the value is case-sensitive, so it has to be 'IfNotPresent', not 'ifNotPresent').
However, the default pullPolicy in system-upgrade-controller.yaml is 'Always', not 'IfNotPresent'.
c
The default for the controller is IfNotPresent. The sample deployment sets an env var to change that, as an example of passing config via env vars. But that should be fairly easy to figure out, since it's set in the deployment spec.
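a sketch of what that looks like in the manifest (in recent releases the env is fed to the controller from a ConfigMap via envFrom; the name here follows the upstream manifest as I recall it, and the value is case-sensitive):
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-controller-env
  namespace: system-upgrade
data:
  # the sample ships "Always"; valid values are Always, IfNotPresent, Never
  SYSTEM_UPGRADE_JOB_IMAGE_PULL_POLICY: "IfNotPresent"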
a
ah, I understand now. thanks for that. The master upgraded successfully, but the agent nodes have not. There is an agent-plan, but no jobs. The log from the system-upgrade-controller does not contain any errors to explain it.
c
does the agent node-selector match any nodes?
it is not an error for a plan’s selector to not match anything
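a quick way to check (the rke2-upgrade label name comes from the sample agent plan):
kubectl get nodes -l rke2-upgrade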
a
I'm using the sample upgrade plan, and did not realize it had an optional match on the rke2-upgrade label enabled. based on the instructions I had assumed it would go on to upgrade all agent nodes. my bad. thanks for the help, tho!
this is all so new to me, but after finding the problem, it seems so obvious.
c
I believe it even calls it out in the yaml, if you read what you’re applying before you use it
# Optionally limit the upgrade to nodes that have an "rke2-upgrade" label, and
# exclude nodes where the label value is "disabled" or "false". To upgrade all
# agent nodes, remove the following two items.
- {key: rke2-upgrade, operator: Exists}
- {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
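so the fix is either to delete those two items from the agent plan's matchExpressions, or to opt nodes in by labeling them (node name is a placeholder):
kubectl label node <agentNodeName> rke2-upgrade=true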