# rke2
f
oh, I think I understand: we just need to grab this container and add it to a plan every time we want to update
g
Yep exactly! Or you can upgrade based on a channel, but doing that you’d want an automated solution to pull the rke2-upgrade containers into your private registry as well; otherwise the upgrade will of course fail because the image can't be pulled
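(For reference, a hedged sketch of what a channel-based server Plan might look like, assuming the public RKE2 stable channel URL; in an airgap the rke2-upgrade images resolved from the channel would still need to be mirrored into the private registry first:)
Copy code
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  # resolve the target version from a release channel instead of pinning one
  channel: https://update.rke2.io/v1-release/channels/stable
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
  upgrade:
    image: rancher/rke2-upgrade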
f
got it, thanks. Do you know if nodes need to reboot after being "upgraded"? We're running in an Azure VM scale set - essentially the equivalent of an AWS autoscaling group. Just trying to weigh our options on what's least painful: maintaining some kind of ansible/bash script to drain + kill VMs, or implementing this
g
Nope, just the rke2 process needs to be restarted (which, if you are using automated upgrade, it will handle all that for you).
f
How about any complications with SELinux? Or we should be good to go with the rke2-selinux package?
g
should be good to go with the rke2-selinux package
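(Side note, a minimal sketch: with the rke2-selinux policy package installed on the host, SELinux support is toggled in the rke2 config rather than in the upgrade plans. The path and key below are the standard rke2 ones:)
Copy code
# /etc/rancher/rke2/config.yaml
selinux: true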
f
alright, I'll give it a shot. appreciate it.
👍 1
@gray-lawyer-73831 Should I be pulling these images from docker-hub or are you guys hosting somewhere else? It doesn't look like the latest release is here: https://hub.docker.com/r/rancher/rke2-upgrade/tags?page=1 nvm, looks like it's just a different naming convention: v1.23.6-rc4+rke2r1 vs v1.23.6-rc4-rke2r1
f
Kind of a stupid side question.. what's the difference between v1.21, 1.22, 1.23?
correlation to k8s version, I'm guessing?
yeah, that was a dumb question heh
g
correlation to k8s version, I’m guessing?
Yep! 😄
f
Hey - so I got this thing deployed in my airgap and all my images mirrored... plans deployed etc. but it doesn't look like it's doing anything?
I see this env var that I kustomized in -
SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE
and I'm using a private registry.. I'm wondering if I need to put any imagePullSecrets somewhere?
g
what’s your current rke2 version, and what version or channel do you have in your plan?
f
Copy code
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: rke2-system-upgrade-controller
  namespace: bigbang
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: rke2-system-upgrade-controller-repo
  path: .
  prune: true
  images:
  - name: rancher/system-upgrade-controller
    newName: private.registry.internal/rancher/system-upgrade-controller
    newTag: v0.9.1
  patches:
    - patch: |-
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: default-controller-env
        data:
          SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE: private.registry.internal/rancher/kubectl:v1.23.6
      target:
        kind: ConfigMap
    - patch: |-
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: system-upgrade-controller
          namespace: system-upgrade
        spec:
          template:
            spec:
              imagePullSecrets:
                - name: private-registry 
      target:
        kind: Deployment
I have the images airgapped with those tags.. and it's mirrored correctly, etc, the deployment is deployed
my plan:
my plan is also kustomized in -
Copy code
patches:
  - target:
      kind: Plan
    patch: |-
      apiVersion: upgrade.cattle.io/v1
      kind: Plan
      metadata:
        name: whatever
      spec:
        version: v1.22.9-rke2r1
I'm just using the default and patching in the versions
g
so, from the initial thing you posted, do you have a namespace
system-upgrade
and do you see the
system-upgrade-controller
deployed in that namespace?
f
yeah
g
and what’s the current rke2 version running in the cluster?
f
v1.22.6+rke2r1
I may need to update the image paths in my plans now that I'm looking at it..
👍 1
I'm guessing I'll need image pull secrets too?
g
Yeah so you have the prereqs, next is just making sure the Plan is up to snuff
f
Copy code
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.23.1+rke2r2
---
# Agent plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: agent
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: NotIn, values: ["true"]}
  prepare:
    args:
    - prepare
    - server-plan
    image: rancher/rke2-upgrade
  serviceAccountName: system-upgrade
  cordon: true
  drain:
    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.23.1+rke2r2
I have this; I'm just kustomizing over it
g
You might yeah, but if all the images necessary for v1.22.9 are present in your private registry then you might not
f
well, uhh
is there anywhere in here I can fit in some secrets?
( if needed)
not totally sure what a plan does tbh
g
I don’t think there’s anywhere to put in secrets actually
f
fudge
g
but that shouldn’t block anything
the Plans tell system-upgrade-controller what to do
f
alright, cool - I wasn't sure if it was trying to do some other pod standup or something
how about this other rancher/kubectl container?
g
When it works, it will create Jobs (which will create pods) to upgrade
in my plans, I don’t mess with the kubectl version at all, but you should be able to adjust it as you’ve done. I think that is used in the jobs that get deployed
but my knowledge in this area is a little bit fuzzy
here are known working plans though (not necessarily for airgap, but should be the same):
Copy code
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-server
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  version: v1.22.8-rke2r1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
  #drain:
  #  force: true
  upgrade:
    image: rancher/rke2-upgrade
---
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: rke2-agent
  namespace: system-upgrade
  labels:
    rke2-upgrade: agent
spec:
  concurrency: 2
  version: v1.22.8-rke2r1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/master, operator: NotIn, values: ["true"]}
  serviceAccountName: system-upgrade
  prepare:
    image: rancher/rke2-upgrade
    args: ["prepare", "rke2-server"]
  drain:
    force: true
  upgrade:
    image: rancher/rke2-upgrade
f
so I kustomized our image path over this ->
Copy code
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-controller-env
  namespace: system-upgrade
data:
  SYSTEM_UPGRADE_CONTROLLER_DEBUG: "false"
  SYSTEM_UPGRADE_CONTROLLER_THREADS: "2"
  SYSTEM_UPGRADE_JOB_ACTIVE_DEADLINE_SECONDS: "900"
  SYSTEM_UPGRADE_JOB_BACKOFF_LIMIT: "99"
  SYSTEM_UPGRADE_JOB_IMAGE_PULL_POLICY: "Always"
  SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE: "rancher/kubectl:v1.21.9"
  SYSTEM_UPGRADE_JOB_PRIVILEGED: "true"
  SYSTEM_UPGRADE_JOB_TTL_SECONDS_AFTER_FINISH: "900"
  SYSTEM_UPGRADE_PLAN_POLLING_INTERVAL: "15m"
just all of our images usually need a secret to be pulled
g
Your best bet is checking throughout https://github.com/rancher/system-upgrade-controller to see if there is some imagepullsecret support
f
ahh so that's like a uh, host volume secret mount or something
g
airgap of course always makes things more complicated, and to be honest, it’s probably safer in airgap to do what I call a “manual upgrade”
because there could be cases where you don’t have the images necessary for an upgrade, and then these automated upgrades try to pull anyway, and end up breaking your cluster
f
haven't had time to learn go yet heh
g
yeah that configmap controls the system-upgrade-controller deployment, which is the brains (the controller) behind how it applies the plans to upgrade your cluster
f
I THINK.. I need a secret in here somewhere so something can spin up a pod with this container?
KubectlImage
- this guy. yeah I'm not seeing anywhere to add
imagePullSecrets
to whatever pod spec
g
It’s possible that doesn’t exist and would need an enhancement to system-upgrade-controller in general. It wouldn’t hurt if you want to create an issue in that repo with the details of what you think you’d need, and if it does exist, someone with more knowledge of this than me can respond and hopefully point you in the right direction. Or if it doesn’t, it can get added and supported eventually 🙂
f
cool - thanks for poking around for me. I'll try to put one in. Just didn't know what I was looking at/for tbh :^)
g
Sorry I’m not much help here! It’s fun debugging this stuff though. Thank you!
f
it's weird, I think everything is deployed.. it's just not doing anything lol
from the rancher/system-upgrade-controller:v0.9.1 pod:
Copy code
system-upgrade-controller-5bd59b74fc-hnqn6 W0505 17:56:39.022311       1 client_config.go:552] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
system-upgrade-controller-5bd59b74fc-hnqn6 time="2022-05-05T17:56:39Z" level=info msg="Applying CRD plans.upgrade.cattle.io"
system-upgrade-controller-5bd59b74fc-hnqn6 time="2022-05-05T17:56:40Z" level=info msg="Starting /v1, Kind=Node controller"
system-upgrade-controller-5bd59b74fc-hnqn6 time="2022-05-05T17:56:40Z" level=info msg="Starting /v1, Kind=Secret controller"
system-upgrade-controller-5bd59b74fc-hnqn6 time="2022-05-05T17:56:40Z" level=info msg="Starting batch/v1, Kind=Job controller"
system-upgrade-controller-5bd59b74fc-hnqn6 time="2022-05-05T17:56:40Z" level=info msg="Starting upgrade.cattle.io/v1, Kind=Plan controller"
g
I’d try to minimize your plans, maybe only give it a server plan for now. When those are applied and it is a new version and working, you should see jobs/pods created
specifically minimize the nodeSelector in there
f
okay - I'll try pulling off those two rke2-upgrade labels? what would normally put them there though?
Copy code
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: rke2-upgrade, operator: Exists}
      - {key: rke2-upgrade, operator: NotIn, values: ["disabled", "false"]}
      # When using k8s version 1.19 or older, swap control-plane with master
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.23.1+rke2r2
g
Copy code
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: server-plan
  namespace: system-upgrade
  labels:
    rke2-upgrade: server
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  serviceAccountName: system-upgrade
  cordon: true
#  drain:
#    force: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.23.1-rke2r2
Try that
I think I see the issue
the
version
is confusing with these Plans
it should be a dash instead of a plus
f
🤪
hey I'm getting some errors from OPA gatekeeper blocking some stuff, that's progress
🎉 1
time to go create some new exceptions..weee
g
😄
f
I see a job
🦜 1
yeah in the job container spec it's referencing my airgapped image:
Copy code
- name: SYSTEM_UPGRADE_PLAN_LATEST_VERSION
  value: v1.22.9-rke2r1
image: private.registry/rancher/rke2-upgrade:v1.22.9-rke2r1
imagePullPolicy: Always
name: upgrade
it just looks like it's stuck maybe
g
you don’t see any pods created? yeah probably stuck from something.. maybe those secrets here too 🤔
f
yeah I look at the job and there's no pods in there
hold on... OPA gatekeeper was blocking privileged containers even through my wildcard exceptions...
Copy code
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  113s  default-scheduler  0/7 nodes are available: 3 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector.
  Warning  FailedScheduling  52s   default-scheduler  0/7 nodes are available: 3 node(s) had taint {CriticalAddonsOnly: true}, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity/selector.
ahhh super close
🙏 1
I think those taints are part of our RKE2 deployment... I don't remember why..
alright, hopefully I can add a toleration to this thing
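(A hedged sketch of tolerating that taint from within the Plan itself, assuming the Plan CRD's tolerations field is available in the controller version in use:)
Copy code
spec:
  tolerations:
    # let the upgrade job pods schedule onto the tainted server nodes
    - key: CriticalAddonsOnly
      operator: Exists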
ahhh!!!!
I got past gatekeeper + taints and tolerations
Copy code
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  66s                default-scheduler  Successfully assigned system-upgrade/apply-server-plan-on-vm-gvzonecil2zackrke2server000002--1-6qvnr to vm-gvzonecil2zackrke2server000002
  Normal   Pulling    25s (x3 over 67s)  kubelet            Pulling image "private.registry/rancher/kubectl:v1.23.6"
  Warning  Failed     24s (x3 over 66s)  kubelet            Failed to pull image "private.registry/rancher/kubectl:v1.23.6": rpc error: code = Unknown desc = failed to pull and unpack image "private.registry/rancher/kubectl:v1.23.6": failed to resolve reference "private.registry/rancher/kubectl:v1.23.6": pulling from host zarf.c1.internal failed with status code [manifests v1.23.6]: 401 Unauthorized
  Warning  Failed     24s (x3 over 66s)  kubelet            Error: ErrImagePull
  Normal   BackOff    13s (x3 over 66s)  kubelet            Back-off pulling image "private.registry/rancher/kubectl:v1.23.6"
  Warning  Failed     13s (x3 over 66s)  kubelet            Error: ImagePullBackOff
I'm mirroring private.registry -> zarf.c1.internal
just need those imagePullSecrets
g
You might be able to edit the job directly and add those! Also, maybe putting the credentials in registries.yaml to avoid messing around with imagepullsecrets entirely
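(A registries.yaml sketch for that approach; the credentials are placeholders and the file lives at /etc/rancher/rke2/registries.yaml on every node:)
Copy code
mirrors:
  private.registry:
    endpoint:
      - "https://private.registry"
configs:
  private.registry:
    auth:
      username: registry-user   # placeholder
      password: registry-pass   # placeholder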
f
I added the credentials to registries.yaml on one of the nodes and it upgrades
only problem is it's a pain because the registry creds are randomly generated after the rke2 cluster is up and running heh
we need to figure out a way to generate the secrets and have our private registry use them..
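(One hedged option, assuming the generated credentials can be captured at install time: template them into a dockerconfigjson Secret named private-registry, which is what the kustomization patches later reference; the data value below is a placeholder:)
Copy code
apiVersion: v1
kind: Secret
metadata:
  name: private-registry
  namespace: system-upgrade
type: kubernetes.io/dockerconfigjson
data:
  # base64 of a docker config.json containing the registry auth (placeholder)
  .dockerconfigjson: eyJhdXRocyI6e319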
g
Ahh interesting setup! So you bring the cluster up using the tarball then I assume, and then run a private registry within the cluster itself?
f
using a tool called zarf which packages up all of our artifacts and hosts them in a docker registry and gitea server
so it's a fat tarball with everything that we can bring into an airgap
g
makes sense. So it’s much easier to just use imagepullsecrets for everything then and ensure to include that in the manifests
okay for one of the nodes that hasn’t upgraded yet, are you able to edit the job directly and add the imagepullsecrets?
or maybe just the pod
f
on the job? let me try
I can't change it on the pod because you can't change a pod spec for imagePullSecrets
Copy code
# pods "apply-server-plan-on-vm-gvzonecil2zackrke2server000000--1-h4pnm" was not valid:
# * spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds`, `spec.tolerations` (only additions to existing tolerations) or `spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
#   core.PodSpec{
#       ... // 11 identical fields
#       NodeName:         "vm-gvzonecil2zackrke2server000000",
#       SecurityContext:  &{HostNetwork: true, HostPID: true, HostIPC: true},
# -     ImagePullSecrets: []core.LocalObjectReference{{Name: "private-registry"}},
# +     ImagePullSecrets: nil,
#       Hostname:         "",
#       Subdomain:        "",
#       ... // 14 identical fields
#   }

🤦 1
g
right
f
and I can't change the job
and I can't have the plan put it in the job
g
darn, okay. I’m going to do some internal snooping around to see if there’s a way. Would you create an issue in system-upgrade-controller repo as well? I think this is something that we don’t currently have but clearly it could be nice to include
f
yeah I don't think it should be a heavy lift to add it to the job templating for the pod spec
💯 1
not that I'm a go developer
g
It shouldn’t be, but I can’t guarantee that it’ll get completed at all or at least anytime soon since there are always a lot of other priorities going on, but let’s see what we can do! 💪 Thank you for debugging on this too, and I’m glad we found something that works even if it’s a pain right now
hey
f
huh
interesting
can I kustomize that in?
I'm afk a bit - will try later for surs
g
I think you probably can, but I’ve never done that before so I’m not sure! I asked someone a lot smarter than me and he pointed me there 🙂
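(Judging from the kustomization that follows, the trick is attaching imagePullSecrets to the ServiceAccount the Plans use, so the upgrade pods inherit the pull secret; a minimal standalone sketch:)
Copy code
apiVersion: v1
kind: ServiceAccount
metadata:
  name: system-upgrade
  namespace: system-upgrade
imagePullSecrets:
  - name: private-registry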
f
damn, that worked
nice
g
beautiful
f
Copy code
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: rke2-system-upgrade-controller
  namespace: bigbang
spec:
  interval: 1m
  sourceRef:
    kind: GitRepository
    name: rke2-system-upgrade-controller-repo
  path: .
  prune: true
  images:
  - name: rancher/system-upgrade-controller
    newName: private.registry/rancher/system-upgrade-controller
    newTag: v0.9.1
  patches:
    - patch: |-
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: default-controller-env
        data:
          SYSTEM_UPGRADE_JOB_KUBECTL_IMAGE: private.registry/rancher/kubectl:v1.22.6
      target:
        kind: ConfigMap
    - patch: |-
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: system-upgrade-controller
          namespace: system-upgrade
        spec:
          template:
            spec:
              imagePullSecrets:
                - name: private-registry 
      target:
        kind: Deployment
    - patch: |-
        apiVersion: v1
        kind: ServiceAccount
        metadata:
          name: system-upgrade
          namespace: system-upgrade
        imagePullSecrets:
        - name: private-registry
      target:
        kind: ServiceAccount
thanks for helping me out man
still going to keep the issue up, but this is totally workable compared to shoving it into registries.yaml for us
g
Yeah I’m glad we got something going! I’ll comment on the issue that doing this works too 🙂
f
so now that I got this thing working.. I'm getting a lot of errors on my worker nodes upgrading regarding not being able to evict pods.. so looking into that now
Copy code
drain evicting pod logging/logging-ek-es-master-0
drain evicting pod istio-system/passthrough-ingressgateway-7879ff64db-kh86m
drain evicting pod gatekeeper-system/gatekeeper-controller-manager-5bd878c895-4sbrp
drain error when evicting pods/"passthrough-ingressgateway-7879ff64db-kh86m" -n "istio-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
drain error when evicting pods/"gatekeeper-controller-manager-5bd878c895-4sbrp" -n "gatekeeper-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
drain error when evicting pods/"logging-ek-es-master-0" -n "logging" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
a bunch of my operators/helm charts had pod disruption budgets - so we are scaling up 🙂
💪 1
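(Illustrative sketch only, names and labels assumed: a PodDisruptionBudget whose minAvailable equals the current replica count never allows an eviction, so the drain retries forever; scaling the workload above minAvailable, or loosening the budget, lets the drain complete:)
Copy code
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: gatekeeper-controller-manager
  namespace: gatekeeper-system
spec:
  minAvailable: 1          # with a single replica this blocks every eviction
  selector:
    matchLabels:
      control-plane: controller-manager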