# opni
b
I refreshed the page a little later and encountered the following error:
```
[404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}],"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] belongs to the node [gGByDRYPQ62b7SHze4agiQ] which isn't part of the cluster and there is no record of the task","caused_by":{"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] isn't running and hasn't stored its results","caused_by":{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}}},"status":404}: opensearch request unsuccessful
```
Thoughts?
The manager pod logs reported the following:
```
[07:14:07] ERROR failed to update status {"controller": "loggingcluster", "controllerGroup": "core.opni.io", "controllerKind": "LoggingCluster", "LoggingCluster": {"name":"logging-9zwfl","namespace":"opni"}, "namespace": "opni", "name": "logging-9zwfl", "reconcileID": "7699b389-25e2-4833-9b67-318609b8420c", "error": "LoggingCluster.core.opni.io \"logging-9zwfl\" not found"}
github.com/rancher/opni/pkg/resources/loggingcluster.(*Reconciler).Reconcile.func1
        github.com/rancher/opni/pkg/resources/loggingcluster/loggingcluster.go:67
github.com/rancher/opni/pkg/resources/loggingcluster.(*Reconciler).Reconcile
        github.com/rancher/opni/pkg/resources/loggingcluster/loggingcluster.go:93
github.com/rancher/opni/controllers.(*CoreLoggingClusterReconciler).Reconcile
        github.com/rancher/opni/controllers/core_loggingcluster_controller.go:54
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
```
Encountered this error trying to reinitialize Logging
```
[07:23:48] ERROR Reconciler error {"controller": "multiclusterrolebinding", "controllerGroup": "logging.opni.io", "controllerKind": "MulticlusterRoleBinding", "MulticlusterRoleBinding": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "a2ec19d3-e8a7-41c8-8b5d-05a0482a44a3", "error": "dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
```
@famous-dusk-36365 @bright-oil-36284 @brief-jordan-43130 Is there a way to manually get Logging reinstalled without uninstalling the entire opni server?
b
Are you able to uninstall the logging backend via the opni dashboard? If not, I suggest trying to delete the opensearch.opster.io CRDs, followed by the logging.opni.io CRDs.
Note that you may have to edit some of the finalizers on those resources before deleting them.
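For reference, a rough sketch of that cleanup; the grep pattern and the `<kind>`/`<name>`/`<crd-name>` placeholders are illustrative, so check which CRDs and custom resources you actually have before deleting anything:
```
# List the operator and logging CRDs
kubectl get crd | grep -E 'opensearch\.opster\.io|logging\.opni\.io'

# For any custom resource stuck on deletion, clear its finalizers first
kubectl -n opni patch <kind> <name> --type=merge -p '{"metadata":{"finalizers":[]}}'

# Then delete the CRDs themselves
kubectl delete crd <crd-name>
```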
b
I cannot uninstall from the opni dashboard. I'll try deleting the CRDs with sensitivity to the finalizers and let you know.
f
I've not seen that error before. The internalopni user is hard-coded into the cluster creation (an internal admin is a requirement for the upstream Opensearch operator; I have an issue open there to switch to using admin certs rather than a user). The user is created as part of creating the cluster. If I had to guess, I would say that all the data nodes in the opensearch cluster lost their storage, and hence the security and other internal indices are missing.
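One way to check whether the internal indices are still there is to list indices from inside a data pod. This is only a sketch: it assumes curl is available in the OpenSearch image and that you have credentials for an admin-level user (`<user>:<password>` are placeholders, not values from this thread):
```
# List indices on the local OpenSearch node; the security/internal indices
# should appear here if the data survived.
kubectl -n opni exec opni-data-0 -- \
  curl -sk -u '<user>:<password>' 'https://localhost:9200/_cat/indices?v'
```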
There should be an OpniOpensearch resource; that is the parent resource for the logging cluster; deleting that should uninstall logging
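A sketch of how to find and delete that resource; the resource plural and the instance name `opni` are assumptions, so verify them with the first two commands:
```
# Discover the exact resource name registered for the OpniOpensearch kind
kubectl api-resources | grep -i opniopensearch

# List the instances, then delete the one backing the logging install
kubectl get opniopensearches -A
kubectl -n opni delete opniopensearches opni
```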
c
In 0.11.1, do remote agents connect directly to opensearch, or does external agent traffic now go through the gateway before reaching opensearch?
b
everything goes through the gateway
f
We use OpenTelemetry OTLP to transmit the logging data. The agent works as a proxy through the gateway to an OpenTelemetry collector in the central cluster. This is what then writes the logs to Opensearch.
b
@famous-dusk-36365 Deleting the OpniOpensearch CR indeed seems to have uninstalled logging. Thanks for the guidance. @bright-oil-36284 Thanks to you as well.
After the successful uninstall, I tried to install but encountered the following error: dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving. Do you have any suggestions? Here's the related svc definition for your review:
```
$ k -n opni get svc opni-opensearch-svc
NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                               AGE
opni-opensearch-svc   ClusterIP   10.43.252.43   <none>        9200/TCP,9300/TCP,9600/TCP,9650/TCP   60s
```
FYI, in lieu of a LoadBalancer, I am running on RKE2 with the following manifest applied:
```
cat <<EOF | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    tcp:
      4000: "opni/opni:4000"
      9090: "opni/opni:9090"
      12080: "opni/opni-admin-dashboard:12080"
EOF
```
Seems to be working ok for the gateway, alerting, and cortex.
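For reference, a quick external sanity check of the TCP passthrough ports from the manifest above; `<node-ip>` is a placeholder for a node running the RKE2 ingress controller:
```
# Confirm the forwarded ports accept connections from outside the cluster
nc -zv <node-ip> 4000
nc -zv <node-ip> 9090
nc -zv <node-ip> 12080
```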
f
Hmm, that's weird; it looks like a timeout from the DNS server.
What is the status field of the opni-opensearch-svc object?
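For anyone following along, the full object, including its status, can be dumped with:
```
kubectl -n opni get svc opni-opensearch-svc -o yaml
```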
b
```
spec:
  clusterIP: 10.43.252.43
  clusterIPs:
  - 10.43.252.43
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  - name: metrics
    port: 9600
    protocol: TCP
    targetPort: 9600
  - name: rca
    port: 9650
    protocol: TCP
    targetPort: 9650
  selector:
    opster.io/opensearch-cluster: opni
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
Do you think rke2-ingress-nginx needs add'l configuration to support opensearch?
f
I don't think so; all the communication to opensearch should be internal to the cluster, so it shouldn't need to go through the ingress.
Is there a pod in the cluster where you can do a dig on opni-opensearch-svc? Also, what is the status of the opensearch pods in the cluster?
And one last question: what's the storageclass you're using, and how many kubernetes nodes make up the cluster?
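One way to run that DNS check from inside the cluster is with a throwaway pod; the image and pod name below are just examples:
```
# Resolve the service's cluster DNS name from a temporary busybox pod
kubectl -n opni run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup opni-opensearch-svc.opni.svc.cluster.local
```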
b
Single-node cluster.
```
k get storageclass
NAME                        PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
directpv-min-io (default)   directpv-min-io   Delete          WaitForFirstConsumer   true                   47d
```
```
k get po -n opni
NAME                                                   READY   STATUS      RESTARTS      AGE
cortex-all-0                                           1/1     Running     0             8h
opni-agent-7445897576-842cf                            3/3     Running     9 (9h ago)    4d17h
opni-alertmanager-alerting-0                           2/2     Running     12 (9h ago)   4d15h
opni-bootstrap-0                                       1/1     Running     0             28m
opni-data-0                                            1/1     Running     0             28m
opni-gateway-57d44d8c77-9m9nf                          1/1     Running     8 (9h ago)    4d17h
opni-kube-prometheus-stack-operator-86b57c4f69-zp4pf   1/1     Running     2 (9h ago)    4d17h
opni-kube-state-metrics-5df8ccf5-tb958                 1/1     Running     2 (9h ago)    4d17h
opni-manager-56d6c75644-k2dt6                          2/2     Running     4 (9h ago)    4d17h
opni-nats-0                                            2/2     Running     4 (9h ago)    4d17h
opni-nats-1                                            2/2     Running     4 (9h ago)    4d17h
opni-nats-2                                            2/2     Running     4 (9h ago)    4d17h
opni-otel-preprocessor-c7f9848c7-dt7bj                 1/1     Running     0             31m
opni-prometheus-node-exporter-28twl                    1/1     Running     2 (9h ago)    4d17h
opni-quorum-0                                          0/1     Running     0             28m
opni-quorum-1                                          0/1     Running     0             28m
opni-securityconfig-update-hjb5w                       0/1     Completed   0             28m
prom-agent-opni-prometheus-agent-0                     2/2     Running     0             8h
```
f
So none of the opensearch pods are ready. If the persistent volumes weren't deleted before you reinstalled the cluster, they may have some old data which could be causing issues. What are the logs in the data pod?
b
```
$ k get pvc -n opni
NAME                                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-cortex-all-0                                     Bound    pvc-bd411ef6-8988-440a-bda9-f38def7474c8   64Gi       RWO            directpv-min-io   8h
data-opni-nats-0                                      Bound    pvc-01a649f4-5e67-4b80-bda7-19b00c582c43   5Gi        RWO            directpv-min-io   4d18h
data-opni-nats-1                                      Bound    pvc-7a70af08-67bd-472d-b079-8c2302debaca   5Gi        RWO            directpv-min-io   4d18h
data-opni-nats-2                                      Bound    pvc-febf140e-6fe6-4342-80c7-d0528150d3f4   5Gi        RWO            directpv-min-io   4d18h
data-opni-quorum-0                                    Bound    pvc-7a526df8-7f13-4e5a-8526-2d60f0e5fc93   5Gi        RWO            directpv-min-io   4d15h
data-opni-quorum-1                                    Bound    pvc-7d11ec80-c3f2-4b26-97bc-6776b47f6cf0   5Gi        RWO            directpv-min-io   4d15h
opni-alertmanager-data-opni-alertmanager-alerting-0   Bound    pvc-0b4b199e-cc92-4708-891a-391c591236f3   5Gi        RWO            directpv-min-io   4d15h
opni-plugin-cache                                     Bound    pvc-abb70e02-9858-4c37-be78-c68ef064a512   8Gi        RWO            directpv-min-io   4d18h
```
Are the data-opni-quorum pvcs related to opensearch?
b
yeah
b
Let me uninstall logging again, clear the pvcs, then try again. One minute...
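For reference, clearing the leftover quorum PVCs might look like this, assuming (per the discussion above) that the data-opni-quorum-* claims are the OpenSearch ones:
```
# Remove the leftover OpenSearch quorum volumes after uninstalling logging
kubectl -n opni delete pvc data-opni-quorum-0 data-opni-quorum-1
```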
Immediately upon reinstall, the following error message displayed again: dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving
Anything else that might be lingering that deleting the OpniOpensearch CR doesn't catch?
The admin UI just replaced its message with the following: dial tcp 10.43.30.107:9200: connect: connection refused
Then the message changed to: [503 Service Unavailable] OpenSearch Security not initialized.: opensearch request unsuccessful
f
Yeah it will take a little while for the opensearch cluster to become available. The quorum nodes are basically small controlplane nodes that are required for leader elections.
The PVCs are the only things that should be left behind.
Also, I misread the pods above; it looks like the data pod was ready but the quorum pods were not. If it's in the same situation now, it would be good to see logs from one of those pods.
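A quick way to watch just the opensearch pods, using the selector from the service spec posted above:
```
kubectl -n opni get pods -l opster.io/opensearch-cluster=opni -w
```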
b
```
$ k logs -n opni --tail=10 opni-data-0
[2023-09-05T22:10:28,570][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-job-scheduler-lock/vlAT4nDuSYa8fDV2WFnOsQ]
[2023-09-05T22:10:28,608][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:28,709][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,537][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-ism-managed-index-history-2023.09.05-1/05VeIvMdSdqBSvRODz6feQ]
[2023-09-05T22:10:29,542][INFO ][o.o.c.m.MetadataCreateIndexService] [opni-data-0] [.opendistro-ism-managed-index-history-2023.09.05-1] creating index, cause [api], templates [], shards [1]/[1]
[2023-09-05T22:10:29,576][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-ism-managed-index-history-2023.09.05-1/05VeIvMdSdqBSvRODz6feQ]
[2023-09-05T22:10:29,588][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,659][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:47,444][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-data-0] Canceling sweep ism plugin version job
[2023-09-05T22:10:52,057][INFO ][o.o.j.s.JobScheduler ] [opni-data-0] Will delay 171151 miliseconds for next execution of job logs-v0.5.4-000001
```
f
That looks OK. What are the quorum pods saying?
b
```
$ k logs -n opni --tail=10 opni-quorum-0
[2023-09-05T22:10:29,571][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,657][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-0] Detected cluster change event for destination migration
[2023-09-05T22:10:43,479][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:10:47,452][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-quorum-0] Canceling sweep ism plugin version job
[2023-09-05T22:11:15,199][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:11:45,183][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:12:15,544][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:12:45,573][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:13:15,284][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:13:45,169][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
```
```
$ k logs -n opni --tail=10 opni-quorum-1
[2023-09-05T22:05:47,526][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:09:48,143][INFO ][o.o.c.s.ClusterSettings ] [opni-quorum-1] updating [plugins.index_state_management.metadata_migration.status] from [0] to [1]
[2023-09-05T22:09:48,164][INFO ][o.o.i.i.ManagedIndexCoordinator] [opni-quorum-1] Canceling metadata moving job because of cluster setting update.
[2023-09-05T22:09:48,164][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:09:54,701][INFO ][o.o.j.s.JobSweeper ] [opni-quorum-1] Running full sweep
[2023-09-05T22:10:28,566][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:28,698][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:29,572][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:29,657][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:47,455][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-quorum-1] Canceling sweep ism plugin version job
```
f
Could you try deleting the opni-quorum-0 pvc (it will not delete immediately due to finalizers) and then deleting the opni-quorum-0 pod?
If that doesn't work, I suspect there's a bug with the generation/storage of the internalopni user. I'll need to double-check the code to confirm what's going on there.
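The steps described above might look something like this (the PVC name comes from the earlier output; --wait=false returns immediately while the finalizer holds the deletion):
```
kubectl -n opni delete pvc data-opni-quorum-0 --wait=false
kubectl -n opni delete pod opni-quorum-0
```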
b
```
$ k -n opni get po opni-quorum-0
NAME            READY   STATUS    RESTARTS   AGE
opni-quorum-0   1/1     Running   0          2m28s
```
```
$ k -n opni get pvc data-opni-quorum-0
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-opni-quorum-0   Bound    pvc-87c82a2d-3d9b-43e9-8660-f7a536d80063   5Gi        RWO            directpv-min-io   2m31s
```
Looks like the pod and pvc redeployed ok, but I'm still getting this error in the UI:
```
[404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}],"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] belongs to the node [gGByDRYPQ62b7SHze4agiQ] which isn't part of the cluster and there is no record of the task","caused_by":{"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] isn't running and hasn't stored its results","caused_by":{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}}},"status":404}: opensearch request unsuccessful
```
I'll let you do add'l troubleshooting. Thanks for your assistance.
Please post here if you discover the problem.
Best regards!
c
I opened issue https://github.com/rancher/opni/issues/1695 to track this problem.
👍 1
b
I tried updating my working, ephemeral configuration w/ 3 replicas to a persistent configuration w/ 3 replicas and encountered the following error:
```
[04:08:04] INFO Generating certificates {"controller": "opensearchcluster", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchCluster", "OpenSearchCluster": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "11a160f3-966e-4979-b16f-b5848f1a51a6", "interface": "http"}
[04:08:04] INFO updating existing ism {"controller": "multiclusterrolebinding", "controllerGroup": "logging.opni.io", "controllerKind": "MulticlusterRoleBinding", "MulticlusterRoleBinding": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "c3431a0e-56c2-4e53-b119-21012136273a", "policy": "log-policy"}
[04:08:04] INFO Observed a panic in reconciler: runtime error: index out of range [0] with length 0 {"controller": "opensearchcluster", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchCluster", "OpenSearchCluster": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "11a160f3-966e-4979-b16f-b5848f1a51a6"}
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 880 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:115 +0x1e5
panic({0x488cea0?, 0xc003f9b260?})
        runtime/panic.go:914 +0x21f
opensearch.opster.io/pkg/reconcilers.(*ClusterReconciler).reconcileNodeStatefulSet(_, {{0xc001b3818c, 0x4}, 0x3, {0xc001b38190, 0x4}, {0xc0016ffc20, 0xc0016ffc50, {0x0, 0x0, ...}}, ...}, ...)
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/pkg/reconcilers/cluster.go:308 +0x1f18
opensearch.opster.io/pkg/reconcilers.(*ClusterReconciler).Reconcile(0xc00046abd0)
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/pkg/reconcilers/cluster.go:120 +0x9b0
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).reconcilePhaseRunning(0xc001030a50, {0x63c5490, 0xc0016ff890})
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/controllers/opensearchController.go:320 +0x7ad
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).Reconcile(0xc001030a50, {0x63c5490, 0xc0016ff890}, {{{0xc001b380e4, 0x4}, {0xc001b380e0, 0x4}}})
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/controllers/opensearchController.go:141 +0x779
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x63c5490?, {0x63c5490?, 0xc0016ff890?}, {{{0xc001b380e4?, 0x40b50a0?}, {0xc001b380e0?, 0x6399528?}}})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000fafb80, {0x63c54c8, 0xc000f65860}, {0x45f69c0?, 0xc001b8c020?})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314 +0x365
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000fafb80, {0x63c54c8, 0xc000f65860})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 375
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:222 +0x565
```
Should I open a separate issue for this problem, where the opni-manager repeatedly crashes with the above panic?
f
Yes please. That looks like something in the upstream operator which we will need to look at.
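If it helps for the new issue, the full panic can usually be captured from the manager pod with something like the following; the deployment name is taken from the pod list earlier in the thread, and --previous only returns output once the container has restarted after the crash:
```
kubectl -n opni logs deploy/opni-manager --all-containers --previous > opni-manager-panic.log
```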