# opni
b
I refreshed the page a little later and encountered the following error:
```
[404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}],"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] belongs to the node [gGByDRYPQ62b7SHze4agiQ] which isn't part of the cluster and there is no record of the task","caused_by":{"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] isn't running and hasn't stored its results","caused_by":{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}}},"status":404}: opensearch request unsuccessful
```
Thoughts?
The manager pod logs reported the following:
```
[07:14:07] ERROR failed to update status {"controller": "loggingcluster", "controllerGroup": "core.opni.io", "controllerKind": "LoggingCluster", "LoggingCluster": {"name":"logging-9zwfl","namespace":"opni"}, "namespace": "opni", "name": "logging-9zwfl", "reconcileID": "7699b389-25e2-4833-9b67-318609b8420c", "error": "LoggingCluster.core.opni.io \"logging-9zwfl\" not found"}
github.com/rancher/opni/pkg/resources/loggingcluster.(*Reconciler).Reconcile.func1
        github.com/rancher/opni/pkg/resources/loggingcluster/loggingcluster.go:67
github.com/rancher/opni/pkg/resources/loggingcluster.(*Reconciler).Reconcile
        github.com/rancher/opni/pkg/resources/loggingcluster/loggingcluster.go:93
github.com/rancher/opni/controllers.(*CoreLoggingClusterReconciler).Reconcile
        github.com/rancher/opni/controllers/core_loggingcluster_controller.go:54
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
```
Encountered this error trying to reinitialize Logging
```
[07:23:48] ERROR Reconciler error {"controller": "multiclusterrolebinding", "controllerGroup": "logging.opni.io", "controllerKind": "MulticlusterRoleBinding", "MulticlusterRoleBinding": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "a2ec19d3-e8a7-41c8-8b5d-05a0482a44a3", "error": "dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:324
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226
```
@famous-dusk-36365 @bright-oil-36284 @brief-jordan-43130 Is there a way to manually get Logging reinstalled without uninstalling the entire opni server?
b
Are you able to uninstall the logging backend via the opni dashboard? If not, I suggest trying to delete the opensearch.opster.io CRDs, followed by the logging.opni.io CRDs.
Note that you may have to edit some of the finalizers on those resources before deleting them.
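For reference, a rough sketch of that cleanup; the grep pattern and the `<kind>`/`<name>`/`<crd-name>` placeholders are illustrative, so check which CRDs and custom resources you actually have before deleting anything:
```
# List the operator and logging CRDs
kubectl get crd | grep -E 'opensearch\.opster\.io|logging\.opni\.io'

# For any custom resource stuck on deletion, clear its finalizers first
kubectl -n opni patch <kind> <name> --type=merge -p '{"metadata":{"finalizers":[]}}'

# Then delete the CRDs themselves
kubectl delete crd <crd-name>
```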
b
I cannot uninstall from the opni dashboard. I'll try deleting the CRDs with sensitivity to the finalizers and let you know.
f
I've not seen that error before. The internalopni user is hard-coded into the cluster creation (an internal admin is a requirement for the upstream Opensearch operator; I have an issue open there to switch to using admin certs rather than a user). The user is created as part of creating the cluster. If I had to guess, I would say that all the data nodes in the opensearch cluster lost their storage, and hence the security and other internal indices are missing.
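One way to check whether the internal indices are still there is to list indices from inside a data pod. This is only a sketch: it assumes curl is available in the OpenSearch image and that you have credentials for an admin-level user (`<user>:<password>` are placeholders, not values from this thread):
```
# List indices on the local OpenSearch node; the security/internal indices
# should appear here if the data survived.
kubectl -n opni exec opni-data-0 -- \
  curl -sk -u '<user>:<password>' 'https://localhost:9200/_cat/indices?v'
```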
There should be an OpniOpensearch resource; that is the parent resource for the logging cluster; deleting that should uninstall logging
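A sketch of how to find and delete that resource; the resource plural and the instance name `opni` are assumptions, so verify them with the first two commands:
```
# Discover the exact resource name registered for the OpniOpensearch kind
kubectl api-resources | grep -i opniopensearch

# List the instances, then delete the one backing the logging install
kubectl get opniopensearches -A
kubectl -n opni delete opniopensearches opni
```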
c
In 0.11.1, do remote agents connect directly to opensearch, or does external agent traffic now go through the gateway before reaching opensearch?
b
everything goes through the gateway
f
We use OpenTelemetry OTLP to transmit the logging data. The agent works as a proxy through the gateway to an OpenTelemetry collector in the central cluster. This is what then writes the logs to Opensearch.
b
@famous-dusk-36365 Deleting the OpniOpensearch CR indeed seems to have uninstalled logging. Thanks for the guidance. @bright-oil-36284 Thanks to you as well.
After the successful uninstall, I tried to install but encountered the following error: dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving. Do you have any suggestions? Here's the related svc definition for your review:
```
$ k -n opni get svc opni-opensearch-svc
NAME                  TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                               AGE
opni-opensearch-svc   ClusterIP   10.43.252.43   <none>        9200/TCP,9300/TCP,9600/TCP,9650/TCP   60s
```
FYI, in lieu of a LoadBalancer, I am running on RKE2 with the following manifest applied:
```
cat <<EOF | kubectl apply -f -
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-ingress-nginx
  namespace: kube-system
spec:
  valuesContent: |-
    tcp:
      4000: "opni/opni:4000"
      9090: "opni/opni:9090"
      12080: "opni/opni-admin-dashboard:12080"
EOF
```
Seems to be working ok for the gateway, alerting, and cortex.
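For reference, a quick external sanity check of the TCP passthrough ports from the manifest above; `<node-ip>` is a placeholder for a node running the RKE2 ingress controller:
```
# Confirm the forwarded ports accept connections from outside the cluster
nc -zv <node-ip> 4000
nc -zv <node-ip> 9090
nc -zv <node-ip> 12080
```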
f
Hmm, that's weird; it looks like a timeout from the DNS server.
What is the status field of the opni-opensearch-svc object?
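For anyone following along, the full object, including its status, can be dumped with:
```
kubectl -n opni get svc opni-opensearch-svc -o yaml
```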
b
```
spec:
  clusterIP: 10.43.252.43
  clusterIPs:
  - 10.43.252.43
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: http
    port: 9200
    protocol: TCP
    targetPort: 9200
  - name: transport
    port: 9300
    protocol: TCP
    targetPort: 9300
  - name: metrics
    port: 9600
    protocol: TCP
    targetPort: 9600
  - name: rca
    port: 9650
    protocol: TCP
    targetPort: 9650
  selector:
    opster.io/opensearch-cluster: opni
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```
Do you think rke2-ingress-nginx needs add'l configuration to support opensearch?
f
I don't think so; all the communication to opensearch should be internal to the cluster, so it shouldn't need to go through the ingress.
Is there a pod in the cluster where you can do a dig on opni-opensearch-svc? Also, what is the status of the opensearch pods in the cluster?
And one last question: what's the storageclass you're using, and how many kubernetes nodes make up the cluster?
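One way to run that DNS check from inside the cluster is with a throwaway pod; the image and pod name below are just examples:
```
# Resolve the service's cluster DNS name from a temporary busybox pod
kubectl -n opni run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
  nslookup opni-opensearch-svc.opni.svc.cluster.local
```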
b
Single-node cluster.
```
k get storageclass
NAME                        PROVISIONER       RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
directpv-min-io (default)   directpv-min-io   Delete          WaitForFirstConsumer   true                   47d
```
```
k get po -n opni
NAME                                                   READY   STATUS      RESTARTS      AGE
cortex-all-0                                           1/1     Running     0             8h
opni-agent-7445897576-842cf                            3/3     Running     9 (9h ago)    4d17h
opni-alertmanager-alerting-0                           2/2     Running     12 (9h ago)   4d15h
opni-bootstrap-0                                       1/1     Running     0             28m
opni-data-0                                            1/1     Running     0             28m
opni-gateway-57d44d8c77-9m9nf                          1/1     Running     8 (9h ago)    4d17h
opni-kube-prometheus-stack-operator-86b57c4f69-zp4pf   1/1     Running     2 (9h ago)    4d17h
opni-kube-state-metrics-5df8ccf5-tb958                 1/1     Running     2 (9h ago)    4d17h
opni-manager-56d6c75644-k2dt6                          2/2     Running     4 (9h ago)    4d17h
opni-nats-0                                            2/2     Running     4 (9h ago)    4d17h
opni-nats-1                                            2/2     Running     4 (9h ago)    4d17h
opni-nats-2                                            2/2     Running     4 (9h ago)    4d17h
opni-otel-preprocessor-c7f9848c7-dt7bj                 1/1     Running     0             31m
opni-prometheus-node-exporter-28twl                    1/1     Running     2 (9h ago)    4d17h
opni-quorum-0                                          0/1     Running     0             28m
opni-quorum-1                                          0/1     Running     0             28m
opni-securityconfig-update-hjb5w                       0/1     Completed   0             28m
prom-agent-opni-prometheus-agent-0                     2/2     Running     0             8h
```
f
So none of the opensearch pods are ready. If the persistent volumes weren't deleted before you reinstalled the cluster, they may have some old data which could be causing issues. What are the logs in the data pod?
b
```
$ k get pvc -n opni
NAME                                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-cortex-all-0                                     Bound    pvc-bd411ef6-8988-440a-bda9-f38def7474c8   64Gi       RWO            directpv-min-io   8h
data-opni-nats-0                                      Bound    pvc-01a649f4-5e67-4b80-bda7-19b00c582c43   5Gi        RWO            directpv-min-io   4d18h
data-opni-nats-1                                      Bound    pvc-7a70af08-67bd-472d-b079-8c2302debaca   5Gi        RWO            directpv-min-io   4d18h
data-opni-nats-2                                      Bound    pvc-febf140e-6fe6-4342-80c7-d0528150d3f4   5Gi        RWO            directpv-min-io   4d18h
data-opni-quorum-0                                    Bound    pvc-7a526df8-7f13-4e5a-8526-2d60f0e5fc93   5Gi        RWO            directpv-min-io   4d15h
data-opni-quorum-1                                    Bound    pvc-7d11ec80-c3f2-4b26-97bc-6776b47f6cf0   5Gi        RWO            directpv-min-io   4d15h
opni-alertmanager-data-opni-alertmanager-alerting-0   Bound    pvc-0b4b199e-cc92-4708-891a-391c591236f3   5Gi        RWO            directpv-min-io   4d15h
opni-plugin-cache                                     Bound    pvc-abb70e02-9858-4c37-be78-c68ef064a512   8Gi        RWO            directpv-min-io   4d18h
```
Are the data-opni-quorum pvcs related to opensearch?
b
yeah
b
Let me uninstall logging again, clear the pvcs, then try again. One minute...
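For reference, clearing the leftover quorum PVCs might look like this, assuming (per the discussion above) that the data-opni-quorum-* claims are the OpenSearch ones:
```
# Remove the leftover OpenSearch quorum volumes after uninstalling logging
kubectl -n opni delete pvc data-opni-quorum-0 data-opni-quorum-1
```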
Immediately upon reinstall, the following error message displayed again: dial tcp: lookup opni-opensearch-svc on 10.43.0.10:53: server misbehaving
Anything else that might be lingering that deleting the OpniOpensearch CR doesn't catch?
The admin UI just replaced its message with the following: dial tcp 10.43.30.107:9200: connect: connection refused
Then the message changed to: [503 Service Unavailable] OpenSearch Security not initialized.: opensearch request unsuccessful
f
Yeah it will take a little while for the opensearch cluster to become available. The quorum nodes are basically small controlplane nodes that are required for leader elections.
The PVCs are the only things that should be left behind.
Also, I misread the pods above; it looks like the data pod was ready but the quorum pods were not. If it's in the same situation now, it would be good to see logs from one of those pods.
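A quick way to watch just the opensearch pods, using the selector from the service spec posted above:
```
kubectl -n opni get pods -l opster.io/opensearch-cluster=opni -w
```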
b
```
$ k logs -n opni --tail=10 opni-data-0
[2023-09-05T22:10:28,570][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-job-scheduler-lock/vlAT4nDuSYa8fDV2WFnOsQ]
[2023-09-05T22:10:28,608][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:28,709][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,537][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-ism-managed-index-history-2023.09.05-1/05VeIvMdSdqBSvRODz6feQ]
[2023-09-05T22:10:29,542][INFO ][o.o.c.m.MetadataCreateIndexService] [opni-data-0] [.opendistro-ism-managed-index-history-2023.09.05-1] creating index, cause [api], templates [], shards [1]/[1]
[2023-09-05T22:10:29,576][INFO ][o.o.p.PluginsService ] [opni-data-0] PluginService:onIndexModule index:[.opendistro-ism-managed-index-history-2023.09.05-1/05VeIvMdSdqBSvRODz6feQ]
[2023-09-05T22:10:29,588][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,659][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-data-0] Detected cluster change event for destination migration
[2023-09-05T22:10:47,444][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-data-0] Canceling sweep ism plugin version job
[2023-09-05T22:10:52,057][INFO ][o.o.j.s.JobScheduler ] [opni-data-0] Will delay 171151 miliseconds for next execution of job logs-v0.5.4-000001
```
f
That looks OK. What are the quorum pods saying?
b
```
$ k logs -n opni --tail=10 opni-quorum-0
[2023-09-05T22:10:29,571][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-0] Detected cluster change event for destination migration
[2023-09-05T22:10:29,657][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-0] Detected cluster change event for destination migration
[2023-09-05T22:10:43,479][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:10:47,452][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-quorum-0] Canceling sweep ism plugin version job
[2023-09-05T22:11:15,199][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:11:45,183][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:12:15,544][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:12:45,573][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:13:15,284][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
[2023-09-05T22:13:45,169][WARN ][o.o.s.a.BackendRegistry ] [opni-quorum-0] Authentication finally failed for internalopni from 10.42.53.178:43004
```
```
$ k logs -n opni --tail=10 opni-quorum-1
[2023-09-05T22:05:47,526][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:09:48,143][INFO ][o.o.c.s.ClusterSettings ] [opni-quorum-1] updating [plugins.index_state_management.metadata_migration.status] from [0] to [1]
[2023-09-05T22:09:48,164][INFO ][o.o.i.i.ManagedIndexCoordinator] [opni-quorum-1] Canceling metadata moving job because of cluster setting update.
[2023-09-05T22:09:48,164][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:09:54,701][INFO ][o.o.j.s.JobSweeper ] [opni-quorum-1] Running full sweep
[2023-09-05T22:10:28,566][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:28,698][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:29,572][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:29,657][INFO ][o.o.a.u.d.DestinationMigrationCoordinator] [opni-quorum-1] Detected cluster change event for destination migration
[2023-09-05T22:10:47,455][INFO ][o.o.i.i.PluginVersionSweepCoordinator] [opni-quorum-1] Canceling sweep ism plugin version job
```
f
Could you try deleting the opni-quorum-0 pvc (it will not delete immediately due to finalizers) and then deleting the opni-quorum-0 pod?
If that doesn't work, I suspect there's a bug with the generation/storage of the internalopni user. I'll need to double-check the code to confirm what's going on there.
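The steps described above might look something like this (the PVC name comes from the earlier output; --wait=false returns immediately while the finalizer holds the deletion):
```
kubectl -n opni delete pvc data-opni-quorum-0 --wait=false
kubectl -n opni delete pod opni-quorum-0
```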
b
```
$ k -n opni get po opni-quorum-0
NAME            READY   STATUS    RESTARTS   AGE
opni-quorum-0   1/1     Running   0          2m28s
```
```
$ k -n opni get pvc data-opni-quorum-0
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
data-opni-quorum-0   Bound    pvc-87c82a2d-3d9b-43e9-8660-f7a536d80063   5Gi        RWO            directpv-min-io   2m31s
```
Looks like the pod and pvc redeployed ok, but I'm still getting this error in the UI:
```
[404 Not Found] {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}],"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] belongs to the node [gGByDRYPQ62b7SHze4agiQ] which isn't part of the cluster and there is no record of the task","caused_by":{"type":"resource_not_found_exception","reason":"task [gGByDRYPQ62b7SHze4agiQ:288] isn't running and hasn't stored its results","caused_by":{"type":"index_not_found_exception","reason":"no such index [.tasks]","index":".tasks","resource.id":".tasks","resource.type":"index_expression","index_uuid":"_na_"}}},"status":404}: opensearch request unsuccessful
```
I'll let you do add'l troubleshooting. Thanks for your assistance.
Please post here if you discover the problem.
Best regards!
c
I opened issue https://github.com/rancher/opni/issues/1695 to track this problem.
👍 1
b
I tried updating my working, ephemeral configuration w/ 3 replicas to a persistent configuration w/ 3 replicas and encountered the following error:
```
[04:08:04] INFO Generating certificates {"controller": "opensearchcluster", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchCluster", "OpenSearchCluster": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "11a160f3-966e-4979-b16f-b5848f1a51a6", "interface": "http"}
[04:08:04] INFO updating existing ism {"controller": "multiclusterrolebinding", "controllerGroup": "logging.opni.io", "controllerKind": "MulticlusterRoleBinding", "MulticlusterRoleBinding": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "c3431a0e-56c2-4e53-b119-21012136273a", "policy": "log-policy"}
[04:08:04] INFO Observed a panic in reconciler: runtime error: index out of range [0] with length 0 {"controller": "opensearchcluster", "controllerGroup": "opensearch.opster.io", "controllerKind": "OpenSearchCluster", "OpenSearchCluster": {"name":"opni","namespace":"opni"}, "namespace": "opni", "name": "opni", "reconcileID": "11a160f3-966e-4979-b16f-b5848f1a51a6"}
panic: runtime error: index out of range [0] with length 0 [recovered]
        panic: runtime error: index out of range [0] with length 0

goroutine 880 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:115 +0x1e5
panic({0x488cea0?, 0xc003f9b260?})
        runtime/panic.go:914 +0x21f
opensearch.opster.io/pkg/reconcilers.(*ClusterReconciler).reconcileNodeStatefulSet(_, {{0xc001b3818c, 0x4}, 0x3, {0xc001b38190, 0x4}, {0xc0016ffc20, 0xc0016ffc50, {0x0, 0x0, ...}}, ...}, ...)
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/pkg/reconcilers/cluster.go:308 +0x1f18
opensearch.opster.io/pkg/reconcilers.(*ClusterReconciler).Reconcile(0xc00046abd0)
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/pkg/reconcilers/cluster.go:120 +0x9b0
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).reconcilePhaseRunning(0xc001030a50, {0x63c5490, 0xc0016ff890})
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/controllers/opensearchController.go:320 +0x7ad
opensearch.opster.io/controllers.(*OpenSearchClusterReconciler).Reconcile(0xc001030a50, {0x63c5490, 0xc0016ff890}, {{{0xc001b380e4, 0x4}, {0xc001b380e0, 0x4}}})
        opensearch.opster.io@v0.0.0-00010101000000-000000000000/controllers/opensearchController.go:141 +0x779
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x63c5490?, {0x63c5490?, 0xc0016ff890?}, {{{0xc001b380e4?, 0x40b50a0?}, {0xc001b380e0?, 0x6399528?}}})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:118 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc000fafb80, {0x63c54c8, 0xc000f65860}, {0x45f69c0?, 0xc001b8c020?})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:314 +0x365
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc000fafb80, {0x63c54c8, 0xc000f65860})
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:265 +0x1c9
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:226 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 375
        sigs.k8s.io/controller-runtime@v0.15.0/pkg/internal/controller/controller.go:222 +0x565
```
Should I open a separate issue for this problem, where the opni-manager repeatedly crashes with the above panic?
f
Yes please. That looks like something in the upstream operator which we will need to look at.
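If it helps for the new issue, the full panic can usually be captured from the manager pod with something like the following; the deployment name is taken from the pod list earlier in the thread, and --previous only returns output once the container has restarted after the crash:
```
kubectl -n opni logs deploy/opni-manager --all-containers --previous > opni-manager-panic.log
```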