Has anyone seen "weirdness" in `v1.29.15+rke2r1` a...
# rke2
b
Has anyone seen "weirdness" in `v1.29.15+rke2r1` around `rke2-ingress-nginx` and `HelmChartConfig` (from which we need `controller.hostPort.enabled: false`)? The Helm values history shows every revision with `controller.hostPort.enabled: false`, but sometimes (not always) the DS will have the host ports (80, 443) defined, which conflicts with another process that runs on the system. If I toggle `false` -> `true` -> `false` it does the right thing. I was able to roll back to the DS with the hostPort defined.
```
$ kubectl rollout -n kube-system undo ds/rke2-ingress-nginx-controller --to-revision 1
daemonset.apps/rke2-ingress-nginx-controller rolled back
$ kubectl get ds -n kube-system rke2-ingress-nginx-controller -o yaml | grep host
          hostPort: 80
          hostPort: 443
```
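Side note: `kubectl rollout history` lists which DS revisions exist to undo to, and can dump a given revision's pod template, e.g.:

```
$ kubectl rollout history ds/rke2-ingress-nginx-controller -n kube-system
$ kubectl rollout history ds/rke2-ingress-nginx-controller -n kube-system --revision=1
```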
c
How are you deploying the HelmChartConfig to your cluster? That makes me suspect that somehow it is bringing up the cluster without the HelmChartConfig in place, then you’re creating it later?
Either that or you have something that’s deleting it or mangling the values
If you have a new revision of the chart that DOES NOT have it set, but the old revision does… then somehow the chart config is getting dropped.
b
> If you have a new revision of the chart that DOES NOT have it set, but the old revision does… then somehow the chart config is getting dropped.
Can you elaborate, please?
c
Well, you had the value configured earlier, but now do not. The only way that would happen is if it was removed from the HelmChartConfig, or if the HelmChartConfig itself was removed.
There are no cases where it will update the chart but ignore values from HelmChartConfig.
b
There def was a point where what was deployed in the DS (`hostPort: 80`) and what was in the `HelmChartConfig` (`hostPort.enabled: false`) did not match. It did not resolve even after doing a `sudo kubectl rollout restart ds rke2-ingress-nginx-controller -n kube-system` ...
c
If you edit the DS directly, it will stay that way until the next time the chart gets upgraded, which happens pretty regularly as part of RKE2 releases. This is generally why folks are discouraged from editing things that are managed by helm - it'll work fine, until the next time you touch the chart or chart values.
Helm does not constantly monitor things and make sure that they stay in sync. So it’ll stay drifted until you forget about it.
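One way to spot that kind of drift is to diff what helm last rendered against the live objects - a sketch, using the release name and namespace from this thread:

```
# Render the manifests helm recorded for the release and diff against the cluster;
# exit status 1 means the live objects have drifted from what helm installed.
helm get manifest rke2-ingress-nginx -n kube-system | kubectl diff -n kube-system -f -
```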
b
We render `rke2/server/manifests/rke2-ingress-nginx-config.yaml` with our deployment tooling to get that config in place. AFAIK we don't edit the DS directly or anything like that.
I just did the `kubectl rollout -n kube-system undo ds/rke2-ingress-nginx-controller --to-revision 1` to show that it was indeed that way at some point in the past.
To make things more interesting, this does not happen 1:1. The majority of the time it just works, but sometimes it does not, and it causes all sorts of problems.
c
I’d recommend checking the rke2 logs in journald and the helm job pod logs when it happens. If you want additional info, you can run rke2 with `debug: true` in the config.
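For example, in the standard RKE2 config file:

```yaml
# /etc/rancher/rke2/config.yaml
# Verbose logging; restart the rke2-server service for it to take effect.
debug: true
```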
b
k
I'll give that a try next week
have a good weekend @creamy-pencil-82913!
👋 1
c
There are no circumstances where the controller will ignore values from a HelmChartConfig, but if you are using external tooling to create or manage that file, it’s possible that there are some timing conditions present with how it is managed that cause it to not exist or have partial content at times.
The controller doesn’t read the file directly; its contents get synced into the cluster periodically, and then the controller picks up changes to the resource.
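Both halves of that sync can be inspected directly. A sketch, assuming the Addon is named after the manifest file (the owner-gvk/owner-name annotations shown later in this thread match this):

```
# The file on disk is synced into an Addon object in the cluster...
kubectl get addons.k3s.cattle.io -n kube-system rke2-ingress-nginx-config -o yaml
# ...and the controller then reacts to changes on the HelmChartConfig itself
kubectl get helmchartconfig -n kube-system rke2-ingress-nginx -o yaml
```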
b
Good to know...
I have another instance of this @creamy-pencil-82913
```
$ rke2 --version
rke2 version v1.33.1+rke2r1 (01d605e84711a636d407f6a87060425373b9f09e)
go version go1.24.2 X:boringcrypto
```
c
ok?
Did you get the logs I asked for last time?
b
Just to recap... This file is created by our config mgmt tooling:
```
$ sudo TZ=UTC stat .../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
  File: /.../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
  Size: 546       	Blocks: 8          IO Block: 4096   regular file
Device: fc10h/64528d	Inode: 52724142    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-10-08 13:03:00.601603657 +0000
Modify: 2025-10-08 13:03:00.601603657 +0000
Change: 2025-10-08 13:03:00.601603657 +0000
 Birth: -
```
With contents:
```yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      watchIngressWithoutClass: false
      allowSnippetAnnotations: true
      hostPort:
        enabled: false
      publishService:
        enabled: true
      service:
        enabled: true
        type: LoadBalancer
        ports:
          http: 30480
          https: 30443
        ipFamilyPolicy: SingleStack
        ipFamilies: [IPv4]
      config:
        worker-processes: 2
```
This is correct:
```yaml
kubectl get helmchartconfigs -n kube-system rke2-ingress-nginx -o yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4TPwUrDQBDG8VcJc05q0wSbLngouSgeBb14mWymyZrNbNiZtkrpu0sogh5qj8u3/PjPCXByrxTFBQYDPflxYVHV08KFu0MOKQyOWzDwSH6se4xaB965DlIYSbFFRTAnQOagqC6wzM/QfJBVIV1EF36BbpYgvbqHI1PMusMABoZC/qSkybPj9mHbtoFvEowjgYE40Cpz3EUSybhz/JnZn/zbgExoZ2XYN5TJlyiNcE7BY0P+3zN7lB4MVKt8l+e0KcoK7bKpLBZ52dq82BTrZlWum/tyWSHmM3o1GC7blRaZyM4lB/R7kjqwEisYsIE1Bu8pmndOkiOq7Z8u7JvTPuy19ihikh16ofkLeh+OL+ymiXTLHOB8/g4AAP//wMdctx0CAAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: rke2-ingress-nginx-config
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2025-10-08T13:03:25Z"
  generation: 1
  labels:
    objectset.rio.cattle.io/hash: 821f11e9348ac0b8ca314dc13937b247b6408aa1
  name: rke2-ingress-nginx
  namespace: kube-system
  resourceVersion: "954"
  uid: b5fcd032-0371-4569-873e-6bce14f47030
spec:
  valuesContent: |-
    controller:
      watchIngressWithoutClass: false
      allowSnippetAnnotations: true
      hostPort:
        enabled: false
      publishService:
        enabled: true
      service:
        enabled: true
        type: LoadBalancer
        ports:
          http: 30480
          https: 30443
        ipFamilyPolicy: SingleStack
        ipFamilies: [IPv4]
      config:
        worker-processes: 2
```
And we have:
```
helm ls --all -f '^rke2-ingress-nginx$' --namespace kube-system --output json
[{"name":"rke2-ingress-nginx","namespace":"kube-system","revision":"1","updated":"2025-10-08 13:03:26.74367985 +0000 UTC","status":"deployed","chart":"rke2-ingress-nginx-4.12.103","app_version":"1.12.1"}]
```
But `hostPort` is still present:
```
kubectl get ds -n kube-system rke2-ingress-nginx-controller -o yaml | grep host
          hostPort: 80
          hostPort: 443
kubectl logs -n kube-system jobs/helm-install-rke2-ingress-nginx --timestamps
```
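Also worth capturing when this happens: the values helm actually recorded for the release, e.g.:

```
# Should include hostPort.enabled: false if the HelmChartConfig was picked up
helm get values rke2-ingress-nginx -n kube-system
# Or for a specific revision:
helm get values rke2-ingress-nginx -n kube-system --revision 1
```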
c
I’m not super familiar with the ingress-nginx helm chart, as it’s not something we maintain… but it looks to me like your values are changing the service ports, not the daemonset ports?
I see that you’ve set `hostPort.enabled: false` but idk what that actually does to the rendered DS.
I suspect that, if you create a DS with hostPort set in the port spec, and then later update that port spec to remove the hostPort key, Kubernetes will not actually remove it. There are some fields that are hard to update like that.
You could try to reproduce this by creating a standalone DS with `kubectl apply`, then remove the hostPort field and apply it again - and see if it gets removed or not. I bet it won’t.
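A minimal sketch of that experiment (the name and image here are placeholders):

```yaml
# repro-ds.yaml - apply once as-is, then delete the hostPort line and re-apply
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hostport-repro
spec:
  selector:
    matchLabels:
      app: hostport-repro
  template:
    metadata:
      labels:
        app: hostport-repro
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 8080   # remove this line between the two applies
```

Then `kubectl apply -f repro-ds.yaml`, delete the `hostPort` line, apply again, and check with `kubectl get ds hostport-repro -o yaml | grep hostPort` whether the field survived.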
https://github.com/kubernetes/kubernetes/issues/117689#issuecomment-1529139182 suggests that you could try setting the port value to null instead of setting enabled to false.
Regardless, I think this is just helm/kubernetes being fussy about field updates, not any bug in rke2 or the helm controller.
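If that is the failure mode, a one-off JSON patch can remove the stuck field explicitly. A sketch - it assumes ports[0] and ports[1] are the http/https entries (consistent with the grep output above), and it is exactly the kind of direct DS edit that will be overwritten at the next chart upgrade:

```
kubectl -n kube-system patch ds rke2-ingress-nginx-controller --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/0/hostPort"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/1/hostPort"}
]'
```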
b
This only happens like 1 in 100 times; it usually works just fine. Did you see the failure and attempted recovery in the logs?
If I toggle `hostPort` from `false` -> `true` -> `false` it clears the fault. (`/.../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml`)
It seems like the first install did not have a values flag:
```
helm_update install --set-string global.clusterCIDR=192.0.0.0/24 --set-string global.clusterCIDRv4=192.0.0.0/24 --set-string global.clusterDNS=203.0.113.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/data/rancher/rke2 --set-string global.serviceCIDR=203.0.113.0/24 --set-string global.systemDefaultIngressClass=ingress-nginx
```
Then it looks like the install failed, but was successful on re-install?
```
++ helm ls --all -f '^rke2-ingress-nginx$' --namespace kube-system --output json
++ jq -r '"\(.[0].chart),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=rke2-ingress-nginx-4.12.103,pending-install
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-1-000-HelmChartConfig-ValuesContent.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ rke2-ingress-nginx-4.12.103 =~ ^(|null)$ ]]
+ [[ pending-install =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]
+ echo Previous helm job was interrupted, updating status from pending-install to failed
Previous helm job was interrupted, updating status from pending-install to failed
+ echo 'Resetting helm release status from '\''pending-install'\'' to '\''failed'\'''
+ helm set-status rke2-ingress-nginx failed --namespace kube-system
2025/10/08 13:03:26 release rke2-ingress-nginx status updated
+ [[ pending-install == \p\e\n\d\i\n\g\-\u\p\g\r\a\d\e ]]
+ STATUS=failed
+ [[ failed =~ ^deployed$ ]]
+ [[ failed =~ ^(deleted|failed|null|unknown)$ ]]
+ [[ reinstall == \r\e\i\n\s\t\a\l\l ]]
+ echo 'Uninstalling failed helm chart'
+ helm uninstall rke2-ingress-nginx --namespace kube-system --wait
release "rke2-ingress-nginx" uninstalled
+ echo Deleted
Deleted
+ echo 'Installing helm chart'
+ helm install --set-string global.clusterCIDR=192.0.0.0/24 --set-string global.clusterCIDRv4=192.0.0.0/24 --set-string global.clusterDNS=203.0.113.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/data/rancher/rke2 --set-string global.serviceCIDR=203.0.113.0/24 --set-string global.systemDefaultIngressClass=ingress-nginx rke2-ingress-nginx /tmp/rke2-ingress-nginx.tgz --values /config/values-1-000-HelmChartConfig-ValuesContent.yaml
NAME: rke2-ingress-nginx
LAST DEPLOYED: Wed Oct  8 13:03:26 2025
...
```
Where is the source for the `helm-install-rke2-ingress-nginx` script?
Ah!
`image: rancher/klipper-helm:v0.9.5-build20250306`
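(The script is the entrypoint of that image, from the rancher/klipper-helm repo. To confirm which image a given install job ran:)

```
kubectl -n kube-system get job helm-install-rke2-ingress-nginx \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```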
c
The issue is that your “management tooling” is dropping the HelmChartConfig manifest too late, or at least is occasionally doing so. Sometimes it gets dropped before the chart is installed, and everything works great. Other times it gets dropped after the chart has already been installed, and then it gets updated. Sometimes it drops WHILE the chart is being installed, and it interrupts the existing chart job.
Ideally your tooling would drop all your AddOn manifests before RKE2 is started, so that the chart values are consistently available when the chart is initially installed.
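I.e. roughly this ordering in the provisioning sequence - a sketch, with `/data/rancher/rke2` taken from the `rke2DataDir` in the install log above:

```
# 1. Write all HelmChartConfig/AddOn manifests first...
install -D -m 0644 rke2-ingress-nginx-config.yaml \
  /data/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
# 2. ...then start the server, so the values exist for the initial chart install
systemctl start rke2-server.service
```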
b
I am pretty sure that file exists in all cases before rke2 is started, but I'd have to check timestamps to verify.
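A quick way to verify that ordering after the fact (assuming the default `rke2-server` unit name):

```
# Manifest mtime vs. when the rke2-server service last started
stat -c '%y  %n' /data/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
systemctl show rke2-server -p ActiveEnterTimestamp
```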