Has anyone seen "weirdness" in `v1.29.15+rke2r1` a...
# rke2
b
Has anyone seen "weirdness" in `v1.29.15+rke2r1` around `rke2-ingress-nginx` and `HelmChartConfig` (from which we need `controller.hostPort.enabled: false`)? The Helm values history shows every revision with `controller.hostPort.enabled: false`, but sometimes (not always) the DS will have the host ports (80, 443) defined, which conflicts with another process that runs on the system. If I toggle `false` -> `true` -> `false` it does the right thing. I was able to roll back to the DS with the hostPort defined.
```
$ kubectl rollout -n kube-system undo ds/rke2-ingress-nginx-controller --to-revision 1
daemonset.apps/rke2-ingress-nginx-controller rolled back
$ kubectl get ds -n kube-system rke2-ingress-nginx-controller -o yaml | grep host
          hostPort: 80
          hostPort: 443
```
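Side note: `kubectl rollout history` lists which DS revisions exist to undo to, and can dump a given revision's pod template, e.g.:

```
$ kubectl rollout history ds/rke2-ingress-nginx-controller -n kube-system
$ kubectl rollout history ds/rke2-ingress-nginx-controller -n kube-system --revision=1
```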
c
How are you deploying the HelmChartConfig to your cluster? That makes me suspect that somehow it is bringing up the cluster without the HelmChartConfig in place, then you’re creating it later?
Either that or you have something that’s deleting it or mangling the values
If you have a new revision of the chart that DOES NOT have it set, but the old revision does… then somehow the chart config is getting dropped.
b
> If you have a new revision of the chart that DOES NOT have it set, but the old revision does… then somehow the chart config is getting dropped.
Can you elaborate, please?
c
Well, you had the value configured earlier, but now do not. The only way that would happen is if it was removed from the HelmChartConfig, or if the HelmChartConfig itself was removed.
There are no cases where it will update the chart but ignore values from HelmChartConfig.
b
There def was a point where what was deployed in the DS (`hostPort: 80`) and what was in the `HelmChartConfig` (`hostPort.enabled: false`) did not match. It did not resolve even after doing a `sudo kubectl rollout restart ds rke2-ingress-nginx-controller -n kube-system` ...
c
If you edit the DS directly, it will stay that way until the next time the chart gets upgraded, which happens pretty regularly as part of RKE2 releases. This is generally why folks are discouraged from editing things that are managed by helm - it'll work fine, until the next time you touch the chart or chart values.
Helm does not constantly monitor things and make sure that they stay in sync. So it’ll stay drifted until you forget about it.
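One way to spot that kind of drift is to diff what helm last rendered against the live objects - a sketch, using the release name and namespace from this thread:

```
# Render the manifests helm recorded for the release and diff against the cluster;
# exit status 1 means the live objects have drifted from what helm installed.
helm get manifest rke2-ingress-nginx -n kube-system | kubectl diff -n kube-system -f -
```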
b
We render `rke2/server/manifests/rke2-ingress-nginx-config.yaml` with our deployment tooling to get that config in place. AFAIK we don't edit the DS directly or anything like that.
I just did the `kubectl rollout -n kube-system undo ds/rke2-ingress-nginx-controller --to-revision 1` to show that it was indeed that way at some point in the past.
To make things more interesting, this does not happen 1:1. The majority of the time it just works, but sometimes it does not, and it causes all sorts of problems.
c
I’d recommend checking the rke2 logs in journald and the helm job pod logs when it happens. If you want additional info, you can run rke2 with `debug: true` in the config.
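For example, in the standard RKE2 config file:

```yaml
# /etc/rancher/rke2/config.yaml
# Verbose logging; restart the rke2-server service for it to take effect.
debug: true
```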
b
k
I'll give that a try next week
have a good weekend @creamy-pencil-82913!
👋 1
c
There are no circumstances where the controller will ignore values from a HelmChartConfig, but if you are using external tooling to create or manage that file, it’s possible that there are some timing conditions present with how it is managed that cause it to not exist or have partial content at times.
The controller doesn’t read the file directly; its contents get synced into the cluster periodically, and then the controller picks up changes to the resource.
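Both halves of that sync can be inspected directly. A sketch, assuming the Addon is named after the manifest file (the owner-gvk/owner-name annotations shown later in this thread match this):

```
# The file on disk is synced into an Addon object in the cluster...
kubectl get addons.k3s.cattle.io -n kube-system rke2-ingress-nginx-config -o yaml
# ...and the controller then reacts to changes on the HelmChartConfig itself
kubectl get helmchartconfig -n kube-system rke2-ingress-nginx -o yaml
```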
b
Good to know...
I have another instance of this @creamy-pencil-82913
```
$ rke2 --version
rke2 version v1.33.1+rke2r1 (01d605e84711a636d407f6a87060425373b9f09e)
go version go1.24.2 X:boringcrypto
```
c
ok?
Did you get the logs I asked for last time?
b
Just to recap... This file is created by our config mgmt tooling:
```
$ sudo TZ=UTC stat .../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
  File: /.../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
  Size: 546       	Blocks: 8          IO Block: 4096   regular file
Device: fc10h/64528d	Inode: 52724142    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-10-08 13:03:00.601603657 +0000
Modify: 2025-10-08 13:03:00.601603657 +0000
Change: 2025-10-08 13:03:00.601603657 +0000
 Birth: -
```
With contents:
```yaml
---
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    controller:
      watchIngressWithoutClass: false
      allowSnippetAnnotations: true
      hostPort:
        enabled: false
      publishService:
        enabled: true
      service:
        enabled: true
        type: LoadBalancer
        ports:
          http: 30480
          https: 30443
        ipFamilyPolicy: SingleStack
        ipFamilies: [IPv4]
      config:
        worker-processes: 2
```
This is correct:
```yaml
kubectl get helmchartconfigs -n kube-system rke2-ingress-nginx -o yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  annotations:
    objectset.rio.cattle.io/applied: H4sIAAAAAAAA/4TPwUrDQBDG8VcJc05q0wSbLngouSgeBb14mWymyZrNbNiZtkrpu0sogh5qj8u3/PjPCXByrxTFBQYDPflxYVHV08KFu0MOKQyOWzDwSH6se4xaB965DlIYSbFFRTAnQOagqC6wzM/QfJBVIV1EF36BbpYgvbqHI1PMusMABoZC/qSkybPj9mHbtoFvEowjgYE40Cpz3EUSybhz/JnZn/zbgExoZ2XYN5TJlyiNcE7BY0P+3zN7lB4MVKt8l+e0KcoK7bKpLBZ52dq82BTrZlWum/tyWSHmM3o1GC7blRaZyM4lB/R7kjqwEisYsIE1Bu8pmndOkiOq7Z8u7JvTPuy19ihikh16ofkLeh+OL+ymiXTLHOB8/g4AAP//wMdctx0CAAA
    objectset.rio.cattle.io/id: ""
    objectset.rio.cattle.io/owner-gvk: k3s.cattle.io/v1, Kind=Addon
    objectset.rio.cattle.io/owner-name: rke2-ingress-nginx-config
    objectset.rio.cattle.io/owner-namespace: kube-system
  creationTimestamp: "2025-10-08T13:03:25Z"
  generation: 1
  labels:
    objectset.rio.cattle.io/hash: 821f11e9348ac0b8ca314dc13937b247b6408aa1
  name: rke2-ingress-nginx
  namespace: kube-system
  resourceVersion: "954"
  uid: b5fcd032-0371-4569-873e-6bce14f47030
spec:
  valuesContent: |-
    controller:
      watchIngressWithoutClass: false
      allowSnippetAnnotations: true
      hostPort:
        enabled: false
      publishService:
        enabled: true
      service:
        enabled: true
        type: LoadBalancer
        ports:
          http: 30480
          https: 30443
        ipFamilyPolicy: SingleStack
        ipFamilies: [IPv4]
      config:
        worker-processes: 2
```
And we have:
```
helm ls --all -f '^rke2-ingress-nginx$' --namespace kube-system --output json
[{"name":"rke2-ingress-nginx","namespace":"kube-system","revision":"1","updated":"2025-10-08 13:03:26.74367985 +0000 UTC","status":"deployed","chart":"rke2-ingress-nginx-4.12.103","app_version":"1.12.1"}]
```
But `hostPort` is still present:
```
kubectl get ds -n kube-system rke2-ingress-nginx-controller -o yaml | grep host
          hostPort: 80
          hostPort: 443
kubectl logs -n kube-system jobs/helm-install-rke2-ingress-nginx --timestamps
```
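Also worth capturing when this happens: the values helm actually recorded for the release, e.g.:

```
# Should include hostPort.enabled: false if the HelmChartConfig was picked up
helm get values rke2-ingress-nginx -n kube-system
# Or for a specific revision:
helm get values rke2-ingress-nginx -n kube-system --revision 1
```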
c
I’m not super familiar with the ingress-nginx helm chart, as it’s not something we maintain… but it looks to me like your values are changing the service ports, not the daemonset ports?
I see that you’ve set `hostPort.enabled: false` but idk what that actually does to the rendered DS.
I suspect that, if you create a DS with hostPort set in the port spec, and then later update that port spec to remove the hostPort key, Kubernetes will not actually remove it. There are some fields that are hard to update like that.
You could try to reproduce this by creating a standalone DS with `kubectl apply`, then remove the hostPort field and apply it again - and see if it gets removed or not. I bet it won’t.
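A minimal sketch of that experiment (the name and image here are placeholders):

```yaml
# repro-ds.yaml - apply once as-is, then delete the hostPort line and re-apply
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hostport-repro
spec:
  selector:
    matchLabels:
      app: hostport-repro
  template:
    metadata:
      labels:
        app: hostport-repro
    spec:
      containers:
      - name: web
        image: nginx
        ports:
        - containerPort: 80
          hostPort: 8080   # remove this line between the two applies
```

Then `kubectl apply -f repro-ds.yaml`, delete the `hostPort` line, apply again, and check with `kubectl get ds hostport-repro -o yaml | grep hostPort` whether the field survived.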
https://github.com/kubernetes/kubernetes/issues/117689#issuecomment-1529139182 suggests that you could try setting the port value to null instead of setting enabled to false.
Regardless, I think this is just helm/kubernetes being fussy about field updates, not any bug in rke2 or the helm controller.
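If that is the failure mode, a one-off JSON patch can remove the stuck field explicitly. A sketch - it assumes ports[0] and ports[1] are the http/https entries (consistent with the grep output above), and it is exactly the kind of direct DS edit that will be overwritten at the next chart upgrade:

```
kubectl -n kube-system patch ds rke2-ingress-nginx-controller --type=json -p='[
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/0/hostPort"},
  {"op": "remove", "path": "/spec/template/spec/containers/0/ports/1/hostPort"}
]'
```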
b
This only happens like 1 in 100 times; it usually works just fine. Did you see the failure and attempted recovery in the logs?
If I toggle `hostPort` from `false` -> `true` -> `false` it clears the fault. (`/.../rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml`)
It seems like the first install did not have a values flag:
```
helm_update install --set-string global.clusterCIDR=192.0.0.0/24 --set-string global.clusterCIDRv4=192.0.0.0/24 --set-string global.clusterDNS=203.0.113.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/data/rancher/rke2 --set-string global.serviceCIDR=203.0.113.0/24 --set-string global.systemDefaultIngressClass=ingress-nginx
```
Then it looks like the install failed, but was successful on re-install?
```
++ helm ls --all -f '^rke2-ingress-nginx$' --namespace kube-system --output json
++ jq -r '"\(.[0].chart),\(.[0].status)"'
++ tr '[:upper:]' '[:lower:]'
+ LINE=rke2-ingress-nginx-4.12.103,pending-install
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-1-000-HelmChartConfig-ValuesContent.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ rke2-ingress-nginx-4.12.103 =~ ^(|null)$ ]]
+ [[ pending-install =~ ^(pending-install|pending-upgrade|pending-rollback|uninstalling)$ ]]
+ echo Previous helm job was interrupted, updating status from pending-install to failed
Previous helm job was interrupted, updating status from pending-install to failed
+ echo 'Resetting helm release status from '\''pending-install'\'' to '\''failed'\'''
+ helm set-status rke2-ingress-nginx failed --namespace kube-system
2025/10/08 13:03:26 release rke2-ingress-nginx status updated
+ [[ pending-install == \p\e\n\d\i\n\g\-\u\p\g\r\a\d\e ]]
+ STATUS=failed
+ [[ failed =~ ^deployed$ ]]
+ [[ failed =~ ^(deleted|failed|null|unknown)$ ]]
+ [[ reinstall == \r\e\i\n\s\t\a\l\l ]]
+ echo 'Uninstalling failed helm chart'
+ helm uninstall rke2-ingress-nginx --namespace kube-system --wait
release "rke2-ingress-nginx" uninstalled
+ echo Deleted
Deleted
+ echo 'Installing helm chart'
+ helm install --set-string global.clusterCIDR=192.0.0.0/24 --set-string global.clusterCIDRv4=192.0.0.0/24 --set-string global.clusterDNS=203.0.113.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/data/rancher/rke2 --set-string global.serviceCIDR=203.0.113.0/24 --set-string global.systemDefaultIngressClass=ingress-nginx rke2-ingress-nginx /tmp/rke2-ingress-nginx.tgz --values /config/values-1-000-HelmChartConfig-ValuesContent.yaml
NAME: rke2-ingress-nginx
LAST DEPLOYED: Wed Oct  8 13:03:26 2025
...
```
Where is the source for the `helm-install-rke2-ingress-nginx` script?
Ah!
`image: rancher/klipper-helm:v0.9.5-build20250306`
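(The script is the entrypoint of that image, from the rancher/klipper-helm repo. To confirm which image a given install job ran:)

```
kubectl -n kube-system get job helm-install-rke2-ingress-nginx \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```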
c
The issue is that your “management tooling” is dropping the HelmChartConfig manifest too late, or at least is occasionally doing so. Sometimes it gets dropped before the chart is installed, and everything works great. Other times it gets dropped after the chart has already been installed, and then it gets updated. Sometimes it drops WHILE the chart is being installed, and it interrupts the existing chart job.
Ideally your tooling would drop all your AddOn manifests before RKE2 is started, so that the chart values are consistently available when the chart is initially installed.
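I.e. roughly this ordering in the provisioning sequence - a sketch, with `/data/rancher/rke2` taken from the `rke2DataDir` in the install log above:

```
# 1. Write all HelmChartConfig/AddOn manifests first...
install -D -m 0644 rke2-ingress-nginx-config.yaml \
  /data/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
# 2. ...then start the server, so the values exist for the initial chart install
systemctl start rke2-server.service
```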
b
I am pretty sure that file exists in all cases before rke2 is started, but I'd have to check timestamps to verify.
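A quick way to verify that ordering after the fact (assuming the default `rke2-server` unit name):

```
# Manifest mtime vs. when the rke2-server service last started
stat -c '%y  %n' /data/rancher/rke2/server/manifests/rke2-ingress-nginx-config.yaml
systemctl show rke2-server -p ActiveEnterTimestamp
```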