# rke2
c
Can you post the full output? It looks like it's stuck in a state that the helm job pod doesn't handle; I'm curious what that is.
you could try doing
kubectl delete helmchart -n kube-system rke2-coredns
That should trigger an uninstall of the chart. Then restart rke2 on one of the servers and it should put it back.
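Roughly this, as a sketch (assuming rke2-server is managed by systemd on that node):
# delete the HelmChart custom resource; the helm controller will uninstall the release
kubectl delete helmchart -n kube-system rke2-coredns
# restart the server so it re-applies the bundled manifest from
# /var/lib/rancher/rke2/server/manifests and recreates the HelmChart
systemctl restart rke2-server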
b
k, I'll give that a try shortly
if coredns isn't running, are we sure the uninstall will work? i.e. the controller wouldn't be able to resolve kubernetes.svc or whatever and therefore wouldn't do anything..
if [[ ${KUBERNETES_SERVICE_HOST} =~ .*:.* ]]; then
	echo "KUBERNETES_SERVICE_HOST is using IPv6"
	CHART="${CHART//%\{KUBERNETES_API\}%/[${KUBERNETES_SERVICE_HOST}]:${KUBERNETES_SERVICE_PORT}}"
else
	CHART="${CHART//%\{KUBERNETES_API\}%/${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}}"
fi

set +v -x
+ [[ true != \t\r\u\e ]]
+ [[ '' == \1 ]]
+ [[ '' == \v\2 ]]
+ [[ -f /config/ca-file.pem ]]
+ [[ -n '' ]]
+ shopt -s nullglob
+ helm_content_decode
+ set -e
+ ENC_CHART_PATH=/chart/rke2-coredns.tgz.base64
+ CHART_PATH=/tmp/rke2-coredns.tgz
+ [[ ! -f /chart/rke2-coredns.tgz.base64 ]]
+ base64 -d /chart/rke2-coredns.tgz.base64
+ CHART=/tmp/rke2-coredns.tgz
+ set +e
+ [[ install != \d\e\l\e\t\e ]]
+ helm_repo_init
+ grep -q -e 'https\?://'
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
+ [[ /tmp/rke2-coredns.tgz == stable/* ]]
+ [[ -n '' ]]
+ helm_update install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16
+ [[ helm_v3 == \h\e\l\m\_\v\3 ]]
++ helm_v3 ls --all -f '^rke2-coredns$' --namespace kube-system --output json
++ tr '[:upper:]' '[:lower:]'
++ jq -r '"\(.[0].app_version),\(.[0].status)"'
+ LINE=1.9.3,uninstalling
+ IFS=,
+ read -r INSTALLED_VERSION STATUS _
+ VALUES=
+ for VALUES_FILE in /config/*.yaml
+ VALUES=' --values /config/values-10_HelmChartConfig.yaml'
+ [[ install = \d\e\l\e\t\e ]]
+ [[ 1.9.3 =~ ^(|null)$ ]]
+ [[ uninstalling =~ ^(pending-install|pending-upgrade|pending-rollback)$ ]]
+ [[ uninstalling == \d\e\p\l\o\y\e\d ]]
+ [[ uninstalling =~ ^(deleted|failed|null|unknown)$ ]]
+ echo 'Installing helm_v3 chart'
+ helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-coredns /tmp/rke2-coredns.tgz --values /config/values-10_HelmChartConfig.yaml
Error: INSTALLATION FAILED: cannot re-use a name that is still in use
that's the full log
c
ah, it got stuck uninstalling somehow
you can try deleting it to trigger a retry of the delete, or use the helm cli to do it manually
or worst case scenario delete the helm secret
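for example, a sketch of both options (release and secret names per the chart above; assumes the helm CLI / kubectl are pointed at the cluster):
# option 1: finish the stuck uninstall manually with the helm CLI
helm uninstall rke2-coredns -n kube-system
# option 2 (worst case): delete the helm release secrets so the next install starts clean
kubectl delete secrets -n kube-system -l owner=helm,name=rke2-coredns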
b
I was planning on that last option
unless you want me to try something else first..
just in the middle of something else so will be a few minutes
c
nah, I’d probably just nuke the secret
b
ok, seems to have done the trick FYI
I'll keep an eye on it for a bit
it seems to be in some sort of loop… constantly running helm-install-rke2-coredns-pg5k2
like, it's successfully deployed, but that job keeps running and completing, and I'm not sure why it keeps going non-stop
c
Do you have different content in the rke2-coredns manifest on different nodes?
what do you see if you do
kubectl describe addon -n kube-system rke2-coredns
?
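and if you want to compare the manifest across server nodes, something like this on each one (a sketch, assuming the default rke2 data dir):
# if the hashes differ between servers, each one will keep re-applying its own version
sha256sum /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml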
b
Copy code
kubectl describe addon -n kube-system rke2-coredns
Name:         rke2-coredns
Namespace:    kube-system
Labels:       <none>
Annotations:  <none>
API Version:  k3s.cattle.io/v1
Kind:         Addon
Metadata:
  Creation Timestamp:  2023-05-12T20:33:39Z
  Generation:          2
  Resource Version:    986083903
  UID:                 e04c254f-eef8-465a-9989-31267358160f
Spec:
  Checksum:  46e8acef2cd2a7d6cd576f90325a27836d4f8a0e1360243bc419064a12e35078
  Source:    /var/lib/rancher/rke2/server/manifests/rke2-coredns.yaml
Status:
  Gvks:
    Group:    helm.cattle.io
    Kind:     HelmChart
    Version:  v1
Events:       <none>
c
hmm no events from it being redeployed
the pod logs show it succeeding, but the job keeps re-running?
b
it does complete
but keeps running every minute or 2
for example I'm at revision 21 already lol
after wiping those secrets
c
something is changing it then
is the job being updated? or is the job just re-running the pod because it's failing?
b
pod is not failing
jobs appear to be deleted and then recreated
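roughly how I'm seeing that, for what it's worth (a sketch):
# watch the install job disappear and come back
kubectl get jobs -n kube-system -w | grep helm-install-rke2-coredns
# the creationTimestamp resets each time the job is recreated
kubectl get job -n kube-system helm-install-rke2-coredns -o jsonpath='{.metadata.creationTimestamp}{"\n"}'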
c
what about
kubectl describe helmchart -n kube-system rke2-coredns
have you deployed a HelmChartConfig to customize the coredns config?
b
no
so this is a cluster migrated from rke1 -> rke2; it appears the metrics-server is showing the same behavior
sh.helm.release.v1.rke2-metrics-server.v2965     helm.sh/release.v1                    1      4m19s
sh.helm.release.v1.rke2-metrics-server.v2966     helm.sh/release.v1                    1      3m19s
sh.helm.release.v1.rke2-metrics-server.v2967     helm.sh/release.v1                    1      2m20s
sh.helm.release.v1.rke2-metrics-server.v2968     helm.sh/release.v1                    1      80s
sh.helm.release.v1.rke2-metrics-server.v2969     helm.sh/release.v1                    1      19s
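(that listing came from something along these lines, sorted oldest first; may not be the exact command I ran:)
kubectl get secrets -n kube-system -l owner=helm,name=rke2-metrics-server --sort-by=.metadata.creationTimestamp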
we converted it on Friday so it's been a few days...
c
oh
yeah, rke1->rke2 migrations are still highly experimental and not supported
we really recommend people stand up new clusters and migrate workloads over; we are not currently planning on moving forward with direct conversion support
b
based on the speed of progress on the tool I'm guessing it's likely to never be supported 😉
regardless, we've done several others and I haven't noticed this issue
c
yeah we took a shot at it, but given there is no good way to roll back if you run into problems, our support org didn’t want to be on the hook for a bunch of potential outages caused by a tool that doesn’t have a back-out option.
b
understood completely, no complaints here, we have meticulous notes/details about how to do it for our setup and we're walking through them slowly
c
if you look at the rke2-server logs there should be something in there describing why it’s updating the helm job
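e.g. something like this on a server node (a sketch, assuming systemd):
# follow the rke2 server logs and filter for the coredns chart
journalctl -u rke2-server -f | grep -i rke2-coredns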
b
I'm just glad something exists to migrate, migrating workloads would be a larger nightmare for us 😞
I'll check out the logs and see what we can discover, gimme a few and I'll report back
May 16 17:53:04 na01lkubrchd03 rke2[2402]: I0516 17:53:04.287517    2402 event.go:294] "Event occurred" object="kube-system/rke2-coredns" fieldPath="" kind="HelmChart" apiVersion="helm.cattle.io/v1" type="Normal" reason="ApplyJob" message="Applying HelmChart using Job kube-system/helm-install-rke2-coredns"
May 16 17:53:04 na01lkubrchd03 rke2[2402]: time="2023-05-16T17:53:04-04:00" level=error msg="error syncing 'kube-system/rke2-coredns': handler helm-controller-chart-registration: DesiredSet - Replace Wait batch/v1, Kind=Job kube-system/helm-install-rke2-coredns for helm-controller-chart-registration kube-system/rke2-coredns, requeuing"
ok, sorry to bother, it was a manual deployment of the k3s-helm-operator fighting with the rke2 built-in stuff
apparently that step in the meticulous notes got forgotten on this cluster 😞
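in case it helps anyone else, roughly how we spotted it (a sketch; the name/namespace of the standalone deployment depends on how it was installed):
# look for a second helm controller fighting with the one built into rke2
kubectl get deployments -A | grep -iE 'helm-(controller|operator)'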
c
yall were running the k3s helm-controller on RKE1!?
b
yeah 😄
we use it for some internal deployments (not cluster services, but actual workloads)