# rke2
m
helm-install-rke2-snapshot-controller-lvzz6 is in CrashLoopBackOff:
```
helm_v3 install --set-string global.clusterCIDR=10.42.0.0/16 --set-string global.clusterCIDRv4=10.42.0.0/16 --set-string global.clusterDNS=10.43.0.10 --set-string global.clusterDomain=cluster.local --set-string global.rke2DataDir=/var/lib/rancher/rke2 --set-string global.serviceCIDR=10.43.0.0/16 rke2-snapshot-controller /tmp/rke2-snapshot-controller.tgz
Error: INSTALLATION FAILED: execution error at (rke2-snapshot-controller/templates/validate-install-crd.yaml:13:7): Required CRDs are missing. Please install the corresponding CRD chart before installing this chart.
```
helm-install-rke2-snapshot-controller-crd-sd7xf is stuck in Running:
```
E0404 18:58:24.099514       1 reflector.go:140] github.com/kubernetes-csi/external-snapshotter/client/v6/informers/externalversions/factory.go:117: Failed to watch *v1.VolumeSnapshotClass: failed to list *v1.VolumeSnapshotClass: the server could not find the requested resource (get volumesnapshotclasses.snapshot.storage.k8s.io)

[main] 2024/04/04 18:37:45 Storage driver is Secret
[main] 2024/04/04 18:37:45 Max history per release is 0
$HELM_HOME has been configured at /home/klipper-helm/.helm.
Not installing Tiller due to 'client-only' flag having been set
++ jq -r '.Releases | length'
++ timeout -s KILL 30 helm_v2 ls --all '^rke2-snapshot-controller-crd$' --output json
```
This causes the snapshot validation webhook to be stuck in CrashLoopBackOff as well.
Any ideas what would cause it?
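A quick way to narrow this down, sketched under the assumption that you have kubectl access and that rke2 names its chart jobs `helm-install-<chart>`, is to check whether the VolumeSnapshot CRDs the controller chart validates against actually exist yet:
```
# Check for the CRDs that validate-install-crd.yaml looks for; if these are
# missing, the rke2-snapshot-controller chart will refuse to install.
kubectl get crd \
  volumesnapshotclasses.snapshot.storage.k8s.io \
  volumesnapshotcontents.snapshot.storage.k8s.io \
  volumesnapshots.snapshot.storage.k8s.io

# If they are missing, inspect the CRD chart's install job (name assumed from
# rke2's helm-install-<chart> convention).
kubectl -n kube-system get job helm-install-rke2-snapshot-controller-crd
kubectl -n kube-system describe job helm-install-rke2-snapshot-controller-crd
```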
c
how is the snapshot validation webhook installed if the snapshot controller chart hasn’t installed yet?
ah right, the CRDs are in the controller chart, but the webhook doesn’t bundle them
sorry, it’s been a bit since I touched that chart
I would probably try to figure out why that `helm_v2 ls` is hanging, that is unusual. Is there some problem with the node that the job pod is running on?
That should time out and get killed after 30 seconds and then move on to just checking the normal v3 chart, but whatever’s causing v2 to fail is probably an issue for v3 as well. You’ll want to get the full log.
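A minimal sketch of how that full log could be captured, assuming the pod and job names shown earlier in the thread:
```
# Full log of the stuck CRD install pod (random suffix taken from the thread above).
kubectl -n kube-system logs helm-install-rke2-snapshot-controller-crd-sd7xf --tail=-1

# Or address the pod through its job so the suffix doesn't matter.
kubectl -n kube-system logs job/helm-install-rke2-snapshot-controller-crd --all-containers
```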
m
The node was healthy and in a Ready state. I was trying to see if it was a deeper failure, so I tried deleting the helm-install jobs that were stuck. They then re-ran and completed successfully, and the webhook recovered afterwards. But our tests time out after 5 minutes, so it was stuck until then; that was the full log.
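For reference, that recovery step roughly amounts to the following, assuming the jobs use rke2's `helm-install-<chart>` naming (the Helm controller recreates any job that is deleted):
```
# Delete the stuck install jobs; the Helm controller recreates and re-runs them.
kubectl -n kube-system delete job \
  helm-install-rke2-snapshot-controller-crd \
  helm-install-rke2-snapshot-controller

# Watch the replacement jobs run to completion.
kubectl -n kube-system get jobs -w
```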
After the node reports ‘Ready’ we run this, and it times out on the 2nd condition every so often:
```
wait_for_condition \
  "Nodes to report 'Ready' status" \
  "kubectl wait --for=condition=Ready nodes --all --timeout=120s"
sleep 5
wait_for_condition \
  "kube-system Jobs to complete" \
  "kubectl -n kube-system wait --for=condition=complete --timeout=600s jobs --all"
wait_for_condition \
  "kube-system Deployments to report available" \
  "kubectl -n kube-system wait --for=condition=available --timeout=600s deployments --all"
```