Accidentally put this in General - <https://ranche...
# rke2
a
c
That log message should be showing the node name (as it would show in `kubectl get nodes`), not the IP. Did something happen that caused the node name to change?
a
Sorry, yes it's the node name, our nodes are called ip-xxx-xxx-xxx-xxx.domain
c
Is the name that it says it's waiting for the same as the node name in `kubectl get nodes`?
a
actually no, in `kubectl get nodes` it has ip-xxx-xxx-xxx-xxx.us-iso-east-1.compute.internal. But I have that set in `/etc/rancher/rke2/config.yaml.d/99-aws-id.yaml` with
```yaml
kubelet-arg+:
  - --hostname-override=ip-xxx-xxx-xxx-xxx.us-iso-east-1.compute.internal
kube-proxy-arg+:
  - --hostname-override=ip-xxx-xxx-xxx-xxx.us-iso-east-1.compute.internal
node-name: ip-xxx-xxx-xxx-xxx.us-iso-east-1.compute.internal
node-label+:
  - node-type=controlplane
```
c
uhhh yeah don't do that
that is what the `node-name: xxx` option is for. If you just go poking at the hostname override in individual component args, rke2 itself will not be aware of that.
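For reference, a config that relies only on `node-name` (no per-component hostname-override args) might look roughly like the sketch below. The filename is the one mentioned above; the FQDN is a placeholder, not taken from this cluster.
```yaml
# /etc/rancher/rke2/config.yaml.d/99-aws-id.yaml -- illustrative sketch only
# node-name is the rke2-level setting for the node's registered name (what is
# being recommended above), instead of overriding hostnames per component.
node-name: ip-10-0-0-1.us-iso-east-1.compute.internal   # placeholder FQDN
node-label+:
  - node-type=controlplane
```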
or wait, I am confused. Is that indentation how you have it? I misread it because node-name and node-label are indented when they should not be.
a
hmm... this has always worked though, it was fine going from 1.27 to 1.28, and it works in a new 1.32 cluster I built. I think you might have pointed me towards that a couple years ago when I was struggling to get the AWS cloud controller working
no, ignore the formatting, I'm transcribing by hand across networks; that's all in an air-gapped network
c
I do suspect it has something to do with your node name and hostname override settings.
a
ok, I'll double check the formatting and all that. My last day on this job is tomorrow, I was just trying to run people through the rke2 upgrade process real quick to make sure my documentation was correct!
Brand-new control planes join the cluster just fine with the way I have it formatted. That's been in our Terraform for years. This only happened with this specific upgrade 🤷
c
is that all that you have set in your config?
so just to be clear, you have node-name set to `ip-xxx-xxx-xxx-xxx.us-iso-east-1.compute.internal` but the log says it is looking for `ip-xxx-xxx-xxx-xxx` without the FQDN?
a
Yep, that's all that I have in my config. I'm trying the upgrade in our non-air-gapped network now and getting the same thing. Straight copy/paste here from `/etc/rancher/rke2/config.yaml.d/99-aws-id.yaml`:
```yaml
kubelet-arg+:
  - --hostname-override=ip-xxx-xxx-xxx-xxx.ec2.internal
kube-proxy-arg+:
  - --hostname-override=ip-xxx-xxx-xxx-xxx.ec2.internal
node-name: ip-xxx-xxx-xxx-xxx.ec2.internal
```
This is from the rke2-server log on the control plane that's trying to upgrade:
```
rke2[27169]: time="2025-07-10T14:05:26Z" level=info msg="Waiting for control-plane node ip-xxx-xxx-xxx-xxx.domain.org startup: nodes \"ip-xxx-xxx-xxx-xxx.domain.org\" not found"
```
This is going from 1.28.15 to 1.29.15. Our cloud controller manager was still at 1.27.x from our initial install; I didn't upgrade it way back when I upgraded from 1.27.x to 1.28.15. First thing I did here was upgrade the cloud controller manager to 1.28.11, then I just added the plan to the SUC:
```yaml
# Server plan
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: controlplane-plan-v1-29-15
  namespace: cattle-system
  labels:
    rke2-upgrade: controlplane
spec:
  concurrency: 1
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: In, values: ["true"]}
  tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Equal"
    effect: "NoSchedule"
  - key: "CriticalAddonsOnly"
    operator: "Equal"
    value: "true"
    effect: "NoExecute"
  serviceAccountName: system-upgrade-controller
  cordon: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.29.15+rke2r1
```
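A companion agent/worker plan typically sits alongside a server plan like this. The sketch below is modeled on the upstream RKE2 / system-upgrade-controller docs example and is illustrative only; the plan name, concurrency, and selector are assumptions, not taken from this cluster.
```yaml
# Illustrative agent plan (assumed names; adjust namespace/serviceAccount to
# match your SUC install). The prepare step waits for the server plan to
# complete before worker nodes are upgraded.
apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: agent-plan-v1-29-15
  namespace: cattle-system
  labels:
    rke2-upgrade: agent
spec:
  concurrency: 2
  nodeSelector:
    matchExpressions:
      - {key: node-role.kubernetes.io/control-plane, operator: NotIn, values: ["true"]}
  prepare:
    args:
      - prepare
      - controlplane-plan-v1-29-15
    image: rancher/rke2-upgrade
  serviceAccountName: system-upgrade-controller
  cordon: true
  upgrade:
    image: rancher/rke2-upgrade
  version: v1.29.15+rke2r1
```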
Hmmm, I see this in the kubelet log on that node:
```
I0710 14:18:01.299173   27479 status_manager.go:877] "Failed to update status for pod" pod="kube-system/kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal" err="failed to patch status \"{\\\"metadata\\\":{\\\"uid\\\":\\\"a1c6fb9d-34b3-45a5-9adf-d50451828562\\\"},\\\"status\\\":{\\\"$setElementOrder/conditions\\\":[{\\\"type\\\":\\\"PodReadyToStartContainers\\\"},{\\\"type\\\":\\\"Initialized\\\"},{\\\"type\\\":\\\"Ready\\\"},{\\\"type\\\":\\\"ContainersReady\\\"},{\\\"type\\\":\\\"PodScheduled\\\"}],\\\"conditions\\\":[{\\\"lastProbeTime\\\":null,\\\"lastTransitionTime\\\":\\\"2025-07-10T14:04:31Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"PodReadyToStartContainers\\\"},{\\\"lastTransitionTime\\\":\\\"2025-07-10T14:04:49Z\\\",\\\"status\\\":\\\"True\\\",\\\"type\\\":\\\"Ready\\\"},{\\\"lastTransitionTime\\\":\\\"2025-07-10T14:04:49Z\\\",\\\"type\\\":\\\"ContainersReady\\\"}],\\\"containerStatuses\\\":[{\\\"containerID\\\":\\\"containerd://ca9ff807b8758ff432cb1d5b355dc79259311198edad8f4de046885f376b46d5\\\",\\\"image\\\":\\\"docker-remote.artifactory.domain.org/rancher/hardened-kubernetes:v1.29.15-rke2r1-build20250312\\\",\\\"imageID\\\":\\\"docker-remote.artifactory.domain.org/rancher/hardened-kubernetes@sha256:34aaaf8700ef979929c3b1dbfb2d8de2b25c00a68a6a6b540293d6f576cb89fd\\\",\\\"lastState\\\":{},\\\"name\\\":\\\"kube-proxy\\\",\\\"ready\\\":true,\\\"restartCount\\\":0,\\\"started\\\":true,\\\"state\\\":{\\\"running\\\":{\\\"startedAt\\\":\\\"2025-07-10T14:04:30Z\\\"}}}],\\\"hostIPs\\\":[{\\\"ip\\\":\\\"10.114.49.20\\\"}]}}\" for pod \"kube-system\"/\"kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal\": pods \"kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal\" is forbidden: node \"ip-xxx-xxx-xxx-xxx.domain.org\" can only update pod status for pods with spec.nodeName set to itself"
```
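The last clause of that error is the interesting part: it looks like the NodeRestriction admission check, which only lets a node patch status for pods whose spec.nodeName matches the node identity it authenticated with. A rough sketch of the mismatch, using the placeholder names from the log:
```yaml
# Illustrative sketch only: the kube-proxy static pod is bound to the node
# name the kubelet registered with (the ec2.internal name)...
apiVersion: v1
kind: Pod
metadata:
  name: kube-proxy-ip-xxx-xxx-xxx-xxx.ec2.internal
  namespace: kube-system
spec:
  nodeName: ip-xxx-xxx-xxx-xxx.ec2.internal
# ...while the node identity in the error is ip-xxx-xxx-xxx-xxx.domain.org,
# so the apiserver refuses the status patch from that node's credentials.
```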
Ok, I think this was a bug with rke2 v1.29.15. Just for a sanity check I tried upgrading from 1.28.15 to 1.29.9 (just picked a patch version at random) and that worked with the exact same config
I'm not going to bother putting in a ticket since this is a relatively old version of RKE2 at this point, unless you'd like me to
c
Hmm, that would not be any bug I'm aware of. If you go to 1.29.15 after 1.29.9, does it work OK? But yeah, it would not get fixed either way; 1.29 has been EOL for a while and is not getting any more releases.
a
Good question, we'll have to give that a shot and see if it makes a difference. We'll want to get as close as we can to the latest 1.29 patch version anyway when we upgrade that cluster to 1.30