[BUG] Hybrid Search with single shard fails intermittently #1139

Closed
owaiskazi19 opened this issue Jan 23, 2025 · 1 comment · Fixed by #1140
Labels: bug, hybrid search

Comments

@owaiskazi19
Member

What is the bug?

Caused by: java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 2
        at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.updateOriginalFetchResults(NormalizationProcessorWorkflow.java:311) ~[?:?]
        at org.opensearch.neuralsearch.processor.NormalizationProcessorWorkflow.execute(NormalizationProcessorWorkflow.java:98) ~[?:?]
        at org.opensearch.neuralsearch.processor.NormalizationProcessor.hybridizeScores(NormalizationProcessor.java:66) ~[?:?]
        at org.opensearch.neuralsearch.processor.AbstractScoreHybridizationProcessor.process(AbstractScoreHybridizationProcessor.java:49) ~[?:?]
        at org.opensearch.search.pipeline.Pipeline.runSearchPhaseResultsTransformer(Pipeline.java:276) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.search.pipeline.PipelinedRequest.transformSearchPhaseResults(PipelinedRequest.java:47) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.onPhaseDone(AbstractSearchAsyncAction.java:813) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        at org.opensearch.action.search.AbstractSearchAsyncAction.successfulShardExecution(AbstractSearchAsyncAction.java:663) ~[opensearch-3.0.0-SNAPSHOT.jar:3.0.0-SNAPSHOT]
        ... 31 more

The exception is thrown at this line:

ScoreDoc scoreDoc = topDocs.scoreDocs[i + fromValueForSingleShard];

Here fromValueForSingleShard is -1. Tracing it back, the -1 comes from the default initialization of from in OpenSearch's SearchSourceBuilder:
https://github.com/opensearch-project/OpenSearch/blob/c6dfc65ea0bccc9cfe66bc4248d09b42d7430d0e/server/src/main/java/org/opensearch/search/builder/SearchSourceBuilder.java#L169
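To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the neural-search code and not the fix merged in #1140: the class FromOffsetSketch, the helpers normalizeFrom and pageOfScores, and the use of plain floats in place of ScoreDoc are all hypothetical. It only assumes what is described above, namely that from keeps its -1 default when the request does not set it, and shows one possible guard (treating a negative offset as 0).

```java
import java.util.Arrays;

public class FromOffsetSketch {
    // Assumption taken from the issue text: "from" stays at -1 when the request does not set it.
    static final int DEFAULT_FROM = -1;

    // Hypothetical guard: treat any negative offset as "start from the first hit".
    static int normalizeFrom(int from) {
        return Math.max(from, 0);
    }

    // Simplified stand-in for reading scoreDocs[i + from]; floats replace ScoreDoc objects.
    static float[] pageOfScores(float[] scores, int from, int size) {
        int start = normalizeFrom(from);
        int end = Math.min(scores.length, start + size);
        if (start >= end) {
            return new float[0]; // offset is past the last hit
        }
        return Arrays.copyOfRange(scores, start, end);
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.4f};
        try {
            // Unguarded access, analogous to scoreDocs[i + fromValueForSingleShard] with i = 0.
            float unguarded = scores[0 + DEFAULT_FROM];
            System.out.println(unguarded);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded access failed: " + e.getMessage());
        }
        // Guarded access returns both hits instead of throwing.
        System.out.println("guarded: " + Arrays.toString(pageOfScores(scores, DEFAULT_FROM, 10)));
    }
}
```

Clamping at the point of use is only one option; the unset value could equally be resolved to 0 where it is first read from the request.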

How can one reproduce the bug?

  1. Create a search pipeline
{
  "description": "Post processor for hybrid search",
  "phase_results_processors": [
    {
      "normalization-processor": {
        "normalization": {
          "technique": "min_max"
        },
        "combination": {
          "technique": "arithmetic_mean",
          "parameters": {
            "weights": [
              0.3,
              0.7
            ]
          }
        }
      }
    }
  ]
}
  2. Create an index with the mapping below and a single shard
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "method": {
          "engine": "lucene",
          "space_type": "l2",
          "name": "hnsw",
          "parameters": {}
        }
      },
      "messages": {
        "type": "text"
      }
    }
  }
}
  3. Send the search request
{
    "query": {
        "hybrid": {
            "queries": [
                {
                    "match": {
                        "text": {
                            "query": "horse"
                        }
                    }
                },
                {
                    "neural": {
                        "passage_embedding": {
                            "query_text": "Hi world",
                            "model_id": "16cjlZQB07K5dzBKpHMP",
                            "k": 2
                        }
                    }
                }
            ]
        }
    }
}

What is the expected behavior?

The search should return a successful response.

What is your host/environment?

Operating system, version.

Do you have any screenshots?

If applicable, add screenshots to help explain your problem.

Do you have any additional context?

{
    "error": {
        "root_cause": [],
        "type": "search_phase_execution_exception",
        "reason": "The phase has failed",
        "phase": "query",
        "grouped": true,
        "failed_shards": [],
        "caused_by": {
            "type": "array_index_out_of_bounds_exception",
            "reason": "Index -1 out of bounds for length 2"
        }
    },
    "status": 500
}
owaiskazi19 added the bug, hybrid search, and untriaged labels on Jan 23, 2025
@vibrantvarun
Member

Thanks @owaiskazi19 for logging and reproducing the bug. We synced offline on the fix, and @owaiskazi19 is working on raising the PR.
