This message was deleted Rancher Users #general

# general

adamant-kite-43734

01/24/2023, 1:16 PM

This message was deleted.

acceptable-printer-7134

01/24/2023, 1:17 PM

we see some rancher reconciliation error with respect to grafana

level=error msg="error syncing 'monitoring/sh.helm.release.v1.grafana.v49': handler helm-app-secret: failed to create monitoring/grafana catalog.cattle.io/v1, Kind=App for helm-app monitoring/sh.helm.release.v1.grafana.v49: etcdserver: request is too large, requeuing"

acceptable-printer-7134

01/24/2023, 1:31 PM

how to avoid this reconciliation by rancher agent

acceptable-printer-7134

01/24/2023, 2:20 PM

This is affecting our prod cluster. @fast-piano-59234 any pointer? or anyone from rancher

acceptable-printer-7134

01/27/2023, 5:42 AM

trying my luck again here if anyone from rancher can help

level=error msg="error syncing 'monitoring/sh.helm.release.v1.grafana.v49': handler helm-app-secret: failed to create monitoring/grafana catalog.cattle.io/v1, Kind=App for helm-app monitoring/sh.helm.release.v1.grafana.v49: etcdserver: request is too large, requeuing"

acceptable-printer-7134

01/27/2023, 6:25 AM

not sure what cluster-agent trying to do - we still have that problem.

acceptable-printer-7134

01/27/2023, 7:44 AM

sorry for tagging you @fast-piano-59234 but really stuck with this. can you please help? is there a way we can stop rancher-cluster-agent from performing this sync?

acceptable-printer-7134

02/07/2023, 2:20 PM

posted this 14 days back still no response from Rancher. Not sure if we have anyone from rancher still supporting this channel? it's causing load on API Server.

witty-honey-18052

02/07/2023, 2:30 PM

have you tried uninstalling the monitoring?

acceptable-printer-7134

02/07/2023, 2:34 PM

in general i understand it's the grafana helm release metadata size issue, as helm keeps release info in the form of a secret, helm.release.v1.grafana.v49 in this case. but why does the rancher agent keep syncing it, causing issues on the API?
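Editor's sketch, not part of the original thread: Helm 3 stores each release in a Secret whose `release` field is gzip-compressed JSON wrapped in base64, and the Kubernetes API base64-encodes Secret data once more. The Python below decodes such a payload and compares its stored size with its expanded size. The function names, the returned keys, and the use of etcd's default 1.5 MiB request limit are assumptions for illustration. Whether the expanded release is what pushes Rancher's App object over the limit is a possible explanation only; the thread does not confirm it.

```python
import base64
import gzip
import json

# Assumption: etcd's default --max-request-bytes (1.5 MiB); a cluster may override it.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1572864


def decode_helm_release(secret_release_field: str) -> dict:
    """Decode the `release` field of a Helm 3 release Secret as returned by the Kubernetes API."""
    helm_encoded = base64.b64decode(secret_release_field)  # undo the Kubernetes Secret encoding
    compressed = base64.b64decode(helm_encoded)            # undo Helm's own base64 layer
    return json.loads(gzip.decompress(compressed))         # gunzip and parse the release JSON


def release_sizes(secret_release_field: str) -> dict:
    """Compare the stored (compressed) payload size with the expanded release JSON size."""
    helm_encoded_bytes = len(base64.b64decode(secret_release_field))
    expanded_bytes = len(json.dumps(decode_helm_release(secret_release_field)))
    return {
        "helm_encoded_bytes": helm_encoded_bytes,
        "expanded_bytes": expanded_bytes,
        "expanded_over_limit": expanded_bytes > ETCD_DEFAULT_MAX_REQUEST_BYTES,
    }
```

The input would be the value at `.data.release` of the Secret named in the error message, read with whatever client is at hand. A release that bundles large dashboard JSON can compress well enough to fit in the Secret while its expanded form is far larger.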

acceptable-printer-7134

02/07/2023, 2:35 PM

btw yes @witty-honey-18052 uninstalling the grafana release does help, but that's not something we can afford at this time.

acceptable-printer-7134

02/07/2023, 2:35 PM

how can we avoid rancher agent doing any reconciliation in this case?

witty-honey-18052

02/07/2023, 2:37 PM

yea, it's obviously not ideal, but that's what i was checking, wondering if it was an issue with the helm chart, or an upgrade carrying over an object that's now too large

witty-honey-18052

02/07/2023, 2:37 PM

my thought was does a fresh install of the monitoring stack resolve it

witty-honey-18052

02/07/2023, 2:38 PM

i'm not sure why that isn't backing off

acceptable-printer-7134

02/07/2023, 2:38 PM

yes, earlier releases didn't have this issue. actually the dashboards json being deployed might be the cause in our case. we have a plan to migrate to a better monitoring architecture in the near future.

witty-honey-18052

02/07/2023, 2:39 PM

could be if that's over 1mb

witty-honey-18052

02/07/2023, 2:39 PM

from what i'm reading that's going to be a general etcd issue, not limited to rancher

witty-honey-18052

02/07/2023, 2:40 PM

but it seems like there should be an error back-off regardless
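Editor's sketch, not part of the original thread: the "error back-off" mentioned here usually means spacing retries further apart after each failure, up to a ceiling, so a permanently failing item does not load the API server. The Python below computes such a capped exponential schedule. It is a generic illustration with hypothetical names and parameter values, not Rancher's requeue logic, and the thread does not establish how the agent actually paces its retries.

```python
def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 300.0, attempts: int = 8) -> list:
    """Return the wait (in seconds) before each retry: base * factor**n, capped at `cap`."""
    delays = []
    for attempt in range(attempts):
        # Grow the delay geometrically, but never beyond the cap.
        delays.append(min(cap, base * factor ** attempt))
    return delays
```

For example, `backoff_delays(base=1.0, factor=2.0, cap=10.0, attempts=6)` yields waits that double from one second and then hold at the ten second cap.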

witty-honey-18052

02/07/2023, 2:40 PM

have you opened a GH issue?

witty-honey-18052

02/07/2023, 2:40 PM

(i'm also guessing you don't have paid support w/suse?)

acceptable-printer-7134

02/07/2023, 2:41 PM

the object being large is a common issue, i agree. but the rancher agent repeatedly requeuing it seems to be the cause.

acceptable-printer-7134

02/07/2023, 2:43 PM

gonna file one (re: "have you opened a GH issue?"). but since it's causing load on the API even in prod, i wanted to check if we have some workaround other than uninstalling that release

witty-honey-18052

02/07/2023, 2:46 PM

I find the gh issues get surfaced a little better. support tickets obv the better option if available

witty-honey-18052

02/07/2023, 2:46 PM

did this start in 2.7 or did you upgrade after to try to resolve it?

acceptable-printer-7134

02/07/2023, 2:47 PM

i have used rancher in the past, but it's new in this organisation and we are using 2.7 directly, for the first time here.

acceptable-printer-7134

02/07/2023, 2:48 PM

the issue started appearing after the latest grafana release, which definitely tells me it's an object size issue. but if i can stop rancher from doing any reconciliation, that would be helpful in this case.

acceptable-printer-7134

02/07/2023, 2:49 PM

Also, i feel this is related in some way: https://github.com/rancher/rancher/issues/32939

acceptable-printer-7134

02/07/2023, 2:49 PM

i am also curious to know what catalog.cattle.io/v1 holds?

acceptable-printer-7134

02/07/2023, 2:50 PM

helm-app-secret - these are rancher terms i believe

witty-honey-18052

02/07/2023, 2:58 PM

and monitoring probably isn't working at all anyways right now, right? or is it just failing to upgrade?

acceptable-printer-7134

02/07/2023, 3:00 PM

just failing to upgrade i believe. but major issue is this

acceptable-printer-7134

02/07/2023, 3:03 PM

seems to me that if the job kubernetes-apiservers can be disabled in prometheus, that won't perform any POST on the API server.

stocky-account-63046

02/08/2023, 10:09 AM

@acceptable-printer-7134 This slack is primarily for users of Rancher to get together, share their stories and support each other. Members of Rancher are here helping out where they can, but it's primarily for community support.

👍 1
