New OpenSearch API source implementation #5024

Merged: 10 commits merged on Oct 11, 2024

@sb2k16 sb2k16 commented Oct 5, 2024

Description

For Data Prepper to support the OpenSearch Document APIs, it needs a new source similar to the existing http source. This pull request implements that source, named opensearch_api. The source supports the Bulk operation of the Document API.

This pull request includes the following:

  1. Path and HTTP methods:
    • POST _bulk
    • POST <index>/_bulk
  2. Optional URL parameters:
    • pipeline and routing (TODO: handle the pipeline parameter on the sink side)
  3. Multi-line JSON Bulk Request payload
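
For context on the payload format in item 3: the OpenSearch Bulk API takes newline-delimited JSON, where each action line may be followed by a document line and the body ends with a newline. Below is a minimal sketch of assembling such a body, assuming each line is already serialized as single-line JSON. `BulkBodySketch` and `toBulkBody` are hypothetical names used for illustration and are not part of this PR.

```java
import java.util.List;

// Hypothetical helper for illustration only; not code from this pull request.
final class BulkBodySketch {
    private BulkBodySketch() {}

    /**
     * Joins pre-serialized, single-line JSON strings into a newline-delimited
     * bulk body. Every line, including the last, is terminated by '\n'.
     */
    static String toBulkBody(List<String> jsonLines) {
        StringBuilder body = new StringBuilder();
        for (String line : jsonLines) {
            if (line.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("each entry must be single-line JSON");
            }
            body.append(line).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // An index action with its document, followed by a delete action (no document line).
        String body = toBulkBody(List.of(
                "{\"index\":{\"_index\":\"movies\",\"_id\":\"1\"}}",
                "{\"title\":\"Rush\",\"year\":2013}",
                "{\"delete\":{\"_index\":\"movies\",\"_id\":\"2\"}}"));
        System.out.print(body);
    }
}
```

The index name and document fields are made-up sample data. Whether the new source enforces the trailing newline the way OpenSearch does is not stated in this description.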

An example pipeline configuration using the opensearch_api source:

simple-sample-pipeline:
  source:
    opensearch_api:
      path: "/opensearch"
      port: 9202
  sink:
   ...
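
A client could target a source configured this way roughly as sketched here. The sketch assumes the bulk endpoints are served under the configured path (that is, `/opensearch/_bulk` and `/opensearch/<index>/_bulk`), that the source listens over plain HTTP, and that `application/json` is an accepted content type; none of these is confirmed by this description. `BulkRequestSketch` and its methods are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical client-side helper for illustration only; not code from this pull request.
final class BulkRequestSketch {
    private BulkRequestSketch() {}

    /** Builds the bulk URI; a null or empty index selects the index-less form. */
    static URI bulkUri(String host, int port, String basePath, String index) {
        String indexPart = (index == null || index.isEmpty()) ? "" : "/" + index;
        // Assumption: the source exposes _bulk beneath the configured path.
        return URI.create("http://" + host + ":" + port + basePath + indexPart + "/_bulk");
    }

    /** Wraps a newline-delimited JSON body in a POST request. */
    static HttpRequest bulkRequest(URI uri, String ndjsonBody) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json") // assumed content type
                .POST(HttpRequest.BodyPublishers.ofString(ndjsonBody))
                .build();
    }

    public static void main(String[] args) {
        URI uri = bulkUri("localhost", 9202, "/opensearch", "movies");
        HttpRequest request = bulkRequest(uri,
                "{\"index\":{\"_id\":\"1\"}}\n{\"title\":\"Rush\"}\n");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send the request, pass it to `java.net.http.HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the pipeline is running.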

Issues Resolved

Contributes to #248

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
  • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@DataPrepperPlugin(name = "opensearch_api", pluginType = Source.class, pluginConfigurationType = OpenSearchAPISourceConfig.class)
public class OpenSearchAPISource extends BaseHttpSource<Record<Event>> {
    private static final String SOURCE_NAME = "OpenSearch API";
    private static final String HTTP_HEALTH_CHECK_PATH = "/";

Does the health check return any data?

What I meant by my comment from the other PR is that we should return the same JSON format that the OpenSearch root endpoint returns.

e.g.

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -d opensearchproject/opensearch:2.11.1
curl -k -u "admin:admin" 'https://localhost:9200/'
{
  "name" : "9c3406b376ef",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "ud9QWcTcQruIwlEIjIpYdg",
  "version" : {
    "distribution" : "opensearch",
    "number" : "2.11.1",
    "build_type" : "tar",
    "build_hash" : "6b1986e964d440be9137eba1413015c31c5a7752",
    "build_date" : "2023-11-29T21:45:35.524809067Z",
    "build_snapshot" : false,
    "lucene_version" : "9.7.0",
    "minimum_wire_compatibility_version" : "7.10.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "The OpenSearch Project: https://opensearch.org/"
}

We may not need to return all of that; we could return {} to start. What I want to avoid is a health response that will not match the root endpoint going forward.

I'm ok with just removing the health endpoint for the sake of this PR if you want.
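
As an illustration of the "return {} to start" option, here is a minimal sketch of a root endpoint that answers with an empty JSON object and returns 404 for every other path. It uses the JDK's built-in `com.sun.net.httpserver.HttpServer` as a stand-in and does not reflect how the source in this PR is implemented; `RootEndpointSketch`, `statusFor`, and `ROOT_BODY` are hypothetical names.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not the source's real HTTP stack.
final class RootEndpointSketch {
    /** Placeholder body; it could later grow toward the OpenSearch root response shape. */
    static final String ROOT_BODY = "{}";

    private RootEndpointSketch() {}

    /** Only the exact root path is served; everything else is a 404. */
    static int statusFor(String path) {
        return "/".equals(path) ? 200 : 404;
    }

    /** Starts a server on the loopback interface; port 0 picks a free port. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        // createContext("/") matches by prefix, so the handler re-checks the path.
        server.createContext("/", exchange -> {
            int status = statusFor(exchange.getRequestURI().getPath());
            if (status != 200) {
                exchange.sendResponseHeaders(status, -1); // -1 means no response body
                exchange.close();
                return;
            }
            byte[] body = ROOT_BODY.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Removing the context registration altogether corresponds to the other option mentioned above, dropping the endpoint for now.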

Thanks @dlvenable. It makes sense not to return a response that does not match the OpenSearch root endpoint response. I have disabled the health check. If customers try to access it, Data Prepper will respond with a 404.

dlvenable previously approved these changes Oct 10, 2024

@dlvenable dlvenable left a comment

Thank you @sb2k16. This is a great contribution and a great new feature for Data Prepper!

kkondaka previously approved these changes Oct 10, 2024
@sb2k16 sb2k16 dismissed stale reviews from kkondaka and dlvenable via 68b623b October 10, 2024 23:50
@sb2k16 sb2k16 force-pushed the opensearch-api-async branch from 68b623b to 190b10e Compare October 10, 2024 23:51
sb2k16 and others added 10 commits October 10, 2024 16:52
Signed-off-by: Souvik Bose <[email protected]>
This reverts commit c52f584.

@sb2k16 sb2k16 force-pushed the opensearch-api-async branch from 190b10e to 71b35a6 Compare October 10, 2024 23:52
@kkondaka kkondaka merged commit 37aaac8 into opensearch-project:main Oct 11, 2024
73 of 74 checks passed