[Blogpost] Configurable Automation for OpenSearch ML Use Cases #2698

Merged: 11 commits, Apr 8, 2024


owaiskazi19
Member

Description

Configurable Automation for OpenSearch ML Use Cases for 2.13

Issues Resolved

#2697

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

@minalsha minalsha left a comment

Suggestion: let's add all the team members to the blog post.

kolchfa-aws and others added 2 commits April 3, 2024 11:54
Signed-off-by: Fanit Kolchina <[email protected]>
Signed-off-by: owaiskazi19 <[email protected]>
Collaborator

@natebower natebower left a comment

@owaiskazi19 @kolchfa-aws Editorial review complete. Please see my comments and changes and let me know if you have any questions. Thanks!

_community_members/amitgalitz.md (outdated, resolved)
_community_members/hnyng.md (outdated, resolved)
_community_members/jpalis.md (outdated, resolved)
_community_members/ohltyler.md (outdated, resolved)
}
```

With Flow Framework, we've simplified this complex setup process, enabling you to focus on your tasks without the burden of navigating complex APIs. Our goal is for you to use OpenSearch seamlessly, unlocking new possibilities in your projects.
Collaborator

"realizing" or "uncovering" instead of "unlocking"?


## Additional default use cases

You can explore more default use cases by viewing [substitution templates](https://github.com/opensearch-project/flow-framework/tree/2.13/src/main/resources/substitutionTemplates) with their corresponding [defaults](https://github.com/opensearch-project/flow-framework/tree/2.13/src/main/resources/defaults).
Collaborator

"and" their corresponding defaults?

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: kolchfa-aws <[email protected]>
@kolchfa-aws
Collaborator

@natebower Thank you for the review. I addressed your comments and accepted your suggestions.

Member

@dbwiddis dbwiddis left a comment

LGTM, a few suggestions.

Comment on lines 27 to 30
1. Create a connector for a remote model, specifying pre- and post-processing functions.
1. Register an embedding model using the connector ID obtained in the previous step.
1. Configure an ingest pipeline to generate vector embeddings using the model ID of the registered model.
1. Create a k-NN index and add the pipeline created in the previous step.
Member

@dbwiddis dbwiddis Apr 4, 2024

It's not clear to the reader how complex this is, particularly since the same steps are essentially repeated on lines 38-40. We need to highlight in the sentence above (line 25) that these require 4 separate API calls, perhaps adding the words "copy and paste" when referring to "using the X ID".
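
To make the hand-off in the four steps concrete, here is a minimal Python sketch that only builds the four requests and sends nothing. It shows where the connector ID and model ID have to be carried over from earlier responses. The endpoint paths reflect my understanding of the ML Commons, ingest pipeline, and index APIs; the helper name `build_manual_setup`, the pipeline name, the field names, the abbreviated connector body, and the dimension of 1024 are placeholders assumed for illustration and are not taken from the blog post.

```python
def build_manual_setup(connector_id, model_id,
                       pipeline="nlp-ingest-pipeline", index="my-nlp-index"):
    """Return the four manual setup requests as (method, path, body) tuples.

    connector_id and model_id stand in for values that would be copied by
    hand from the responses to steps 1 and 2 (hypothetical placeholders).
    """
    return [
        # Step 1: create a connector; the response would contain connector_id.
        # The body is heavily abbreviated here.
        ("POST", "/_plugins/_ml/connectors/_create",
         {"name": "example connector", "protocol": "http"}),
        # Step 2: register a remote model using the connector ID from step 1;
        # the response would lead to model_id.
        ("POST", "/_plugins/_ml/models/_register",
         {"name": "example embedding model",
          "function_name": "remote",
          "connector_id": connector_id}),
        # Step 3: ingest pipeline that embeds text using the model ID from step 2.
        ("PUT", f"/_ingest/pipeline/{pipeline}",
         {"processors": [{"text_embedding": {
             "model_id": model_id,
             "field_map": {"passage_text": "passage_embedding"}}}]}),
        # Step 4: k-NN index that uses the pipeline from step 3 by default.
        ("PUT", f"/{index}",
         {"settings": {"index.knn": True, "default_pipeline": pipeline},
          "mappings": {"properties": {
              "passage_embedding": {"type": "knn_vector", "dimension": 1024},
              "passage_text": {"type": "text"}}}}),
    ]
```

Calling `build_manual_setup("<connector ID>", "<model ID>")` returns four entries, and steps 2 through 4 each depend on a value produced by an earlier step, which is the copy-and-paste burden the comment asks the post to spell out.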

1. Configure an ingest pipeline to generate vector embeddings using the model ID of the registered model.
1. Create a k-NN index and add the pipeline created in the previous step.

This complex setup typically required you to be familiar with the OpenSearch ML Commons APIs. However, we are simplifying this experience through the Flow Framework plugin. Let's demonstrate how the plugin simplifies this process using the preceding semantic search example.
Member

Don't think we need "typically"


```json
{
"create_index.name": "my-nlp-index"
Member

@dbwiddis dbwiddis Apr 4, 2024

It's confusing to have this separated from the API call on line 47, particularly with the response JSON in between. I think this goes in the block under line 49, but even I'm not sure. Make it clear, perhaps by including it as line 50 after describing that it's optional, or by repeating the whole REST call with both lines.

Member Author

I will remove the JSON completely.

Once the workflow is provisioned, you can ingest documents into the index created by the workflow:

```json
PUT /my-nlp-index/_doc/1
Member

We should probably use the path without the document ID included. Having the 1 here and the s1 in the "id" field is confusing. (We can leave out the "id" field as well since it's not the same as the "_id".)
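
A small Python sketch of the two request shapes being discussed, under the assumption that the document body uses a `passage_text` field (that field name and the helper `index_request` are illustrative and not taken from the post). With an explicit ID the request is a `PUT` to `/<index>/_doc/<id>`; without one it is a `POST` to `/<index>/_doc`, and OpenSearch generates the `_id`.

```python
def index_request(index, document, doc_id=None):
    """Return (method, path, body) for indexing a single document.

    With doc_id, the caller chooses the document's _id (PUT).
    Without it, OpenSearch generates the _id (POST).
    """
    if doc_id is None:
        return "POST", f"/{index}/_doc", document
    return "PUT", f"/{index}/_doc/{doc_id}", document
```

For example, `index_request("my-nlp-index", {"passage_text": "Hello world"})` yields the ID-free form suggested here, which avoids showing two different identifiers side by side.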

"neural": {
"passage_embedding": {
"query_text": "Hi world",
"k": 100
Member

Maybe a smaller k?

Member Author

It's a lowercase k only

Member

I meant a number less than 100, heh. Like "k": 10

Member Author

Done
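
For reference, a minimal Python sketch of the query body with the smaller `k` agreed on in this thread. The helper name `neural_query` is hypothetical. The optional `model_id` argument is an assumption on my part: the snippet under review omits it, presumably because the provisioned workflow supplies a default model, so it is only added when passed explicitly.

```python
def neural_query(query_text, k=10, field="passage_embedding", model_id=None):
    """Build a neural query body that returns the k nearest neighbors."""
    clause = {"query_text": query_text, "k": k}
    if model_id is not None:
        # Only needed when no default model is configured for the index.
        clause["model_id"] = model_id
    return {"query": {"neural": {field: clause}}}
```

`neural_query("Hi world")` produces the same shape as the reviewed snippet, with `"k": 10` in place of `"k": 100`.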

Signed-off-by: owaiskazi19 <[email protected]>
Member

@dbwiddis dbwiddis left a comment

Looks really good. Some minor suggestions.

meta_description: Explore the simplicity of integrating Machine Learning capabilities within OpenSearch through an innovative and groundbreaking framework designed to simplify complex setup tasks.
---

In OpenSearch, to use machine learning (ML) offerings, such as semantic, hybrid, and multimodal search, you often have to grapple with complex setup and preprocessing tasks. Additionally, you must write verbose queries, which can be a time-consuming and error-prone process.
Member

This sentence has a lot of commas. I tried to rewrite it to make it better but couldn't really do much better. So I guess it's fine as is! :|

Member Author

@kolchfa-aws any inputs here?


In OpenSearch, to use machine learning (ML) offerings, such as semantic, hybrid, and multimodal search, you often have to grapple with complex setup and preprocessing tasks. Additionally, you must write verbose queries, which can be a time-consuming and error-prone process.

In this blog post, we introduce the OpenSearch Flow Framework plugin, [released in version 2.13](https://opensearch.org/blog/2.13-is-ready-for-download/) and designed to streamline this cumbersome process. By using this plugin, you can simplify complex setups with just one click. We've provided automated templates, enabling you to create connectors, register models, deploy them, and register agents and tools through a single API call. This eliminates the complexity of calling multiple APIs and orchestrating setups based on the responses.
Member

Nit: "click" is kind of GUI centric and we're still API. Can we maybe say "one simple API call?"
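
To illustrate what "one API call" could look like, here is a hedged Python sketch that builds (but does not send) a single Flow Framework request. The `use_case` and `provision` query parameters and the endpoint path reflect my understanding of the plugin's workflow API in 2.13 and should be checked against the plugin documentation; the helper name `workflow_request` is hypothetical. The `create_index.name` override key is the one shown elsewhere in this PR.

```python
from urllib.parse import urlencode


def workflow_request(use_case, overrides=None, provision=True):
    """Return (method, path, body) for creating a workflow from a default use case.

    overrides is an optional dict of default values to replace,
    for example {"create_index.name": "my-nlp-index"}.
    """
    query = urlencode({"use_case": use_case,
                       "provision": str(provision).lower()})
    return "POST", f"/_plugins/_flow_framework/workflow?{query}", dict(overrides or {})
```

A call such as `workflow_request("<use case name>", {"create_index.name": "my-nlp-index"})` stands in for the connector, model, pipeline, and index requests that would otherwise be issued separately.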


## Before the Flow Framework plugin

Previously, setting up semantic search involves *4 separate API* calls outlined in the [semantic search documentation](https://opensearch.org/docs/latest/search-plugins/semantic-search/):
Member

Put the closing * after "calls"

Signed-off-by: owaiskazi19 <[email protected]>
Collaborator

@natebower natebower left a comment

@kolchfa-aws Final edits

@@ -22,14 +22,14 @@ In this blog post, we introduce the OpenSearch Flow Framework plugin, [released

## Before the Flow Framework plugin

-Previously, setting up semantic search involved the steps outlined in the [semantic search documentation](https://opensearch.org/docs/latest/search-plugins/semantic-search/):
+Previously, setting up semantic search involves *4 separate API* calls outlined in the [semantic search documentation](https://opensearch.org/docs/latest/search-plugins/semantic-search/):
Collaborator

"involved four". Comma after "calls".

"create_index.name": "my-nlp-index"
}
```
Note: The workflow in the previous step creates a default k-NN index. The default index name is `my-nlp-index`
Collaborator

Add terminating period.

Signed-off-by: Fanit Kolchina <[email protected]>
@kolchfa-aws
Collaborator

Thank you for reviewing the edits, @natebower! I've addressed your final comments.

@pajuric
pajuric commented Apr 8, 2024

@nateynateynate @krisfreedain - Blog is ready to publish today.

Member

@nateynateynate nateynateynate left a comment

Let's push this live.

@nateynateynate nateynateynate merged commit c9fd1a2 into opensearch-project:main Apr 8, 2024
5 checks passed
@owaiskazi19
Member Author

@nateynateynate thanks for pushing it live. I don't see @dbwiddis and @jackiehanyang listed as authors, though I added them in the PR. Any idea why?

8 participants