Initial commit for RAG in py-ml #427

Open: hmumtazz wants to merge 42 commits into main

@hmumtazz hmumtazz commented Nov 15, 2024

Description

[Describe what this change achieves]

Issues Resolved

[List any issues this PR will resolve]

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@dhrubo-os
Collaborator

DCO is missing

@brianf-aws brianf-aws left a comment

Really cool stuff here! 📖 🤖 💬 Let's refactor some of the code so it's reusable across classes, and let's apply the single-responsibility principle (SRP) so that no class is burdened with doing too much at once.

Make sure to add documentation to files and methods.
Also, I see there is already a query Python file in py-ml (https://github.com/opensearch-project/opensearch-py-ml/blame/main/opensearch_py_ml/query.py). If possible, maybe we can consolidate this with the existing code?

Lastly, let's come up with a great description of the feature. You put a lot of effort into this, so let's make the PR description visually appealing (diagrams, usage instructions, GIFs, a concise summary, emojis). You can use opensearch-project/neural-search#933 as inspiration. Talk to @dhrubo-os about whether we should open an issue and then link this PR to it; I'm not sure if that's too much.

Great work!

Comment on lines 27 to 51
self.aws_region = config.get('region')
self.index_name = config.get('index_name')

Is there any input validation here? Maybe we can catch bad values earlier so they don't become a headache later.
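
One way to catch this early is to validate the config before any attributes are assigned. The sketch below is a minimal illustration, not the PR's code: `validate_config` and `REQUIRED_KEYS` are hypothetical names, and only the `region` and `index_name` keys come from the snippet under review.

```python
from typing import Any, Mapping

# Keys taken from the reviewed snippet; a real config may require more.
REQUIRED_KEYS = ("region", "index_name")


def validate_config(config: Mapping[str, Any]) -> None:
    """Raise ValueError naming every required key that is missing or blank."""
    missing = [
        key
        for key in REQUIRED_KEYS
        if not str(config.get(key) or "").strip()
    ]
    if missing:
        raise ValueError(
            "Missing required config values: " + ", ".join(missing)
        )
```

Calling such a check at the top of the constructor would surface a missing `region` or `index_name` immediately, rather than at the first request that needs it.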

@dhrubo-os
Collaborator

DCO is missing.

@hmumtazz force-pushed the rag_pipeline branch 2 times, most recently from ced6a59 to 3260f3a, on November 21, 2024 09:41
…ffering a suggested default value with the flexibility for users to enter a custom value if needed.

Signed-off-by: hmumtazz <[email protected]>
hmumtazz and others added 7 commits December 5, 2024 13:37
Removed references to serverless. Now the code only supports managed and open-source service types.
Removed the menu option for serverless.
For search methods, removed “neural” and only use “semantic” search.
Added a prompt to choose between semantic search with LLM and semantic search without LLM.
If semantic with LLM is chosen and the service type is managed, the code will prompt for LLM configuration.
If open-source is chosen, we skip AWS/Bedrock configurations and do not prompt for LLM registration since that requires AWS Bedrock.

Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
Signed-off-by: hmumtazz <[email protected]>
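
The prompt flow described in the commit message above can be sketched as a small decision function. This is a hedged illustration only: `plan_setup_steps` and the step names are hypothetical, and the assumption that the managed service type always gets an AWS configuration step is inferred from the message rather than stated in it.

```python
from typing import List

# The two service types that remain after serverless was removed.
SERVICE_TYPES = ("managed", "open-source")


def plan_setup_steps(service_type: str, use_llm: bool) -> List[str]:
    """Return the ordered setup prompts for a service type and search mode."""
    if service_type not in SERVICE_TYPES:
        raise ValueError(f"Unsupported service type: {service_type!r}")

    steps: List[str] = []
    if service_type == "managed":
        # Assumption: managed deployments need AWS settings up front.
        steps.append("aws_config")
        if use_llm:
            # LLM registration requires AWS Bedrock, so it is managed-only.
            steps.append("llm_config")
    # Open-source skips both AWS/Bedrock steps; search is always semantic.
    steps.append("semantic_search")
    return steps
```

Keeping the branching in one pure function like this makes the menu logic easy to unit test without simulating user input.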

@brianf-aws brianf-aws left a comment

Let's add a newline to the following files; that's probably why the check is not able to pick up the license header.

No license header found in:

  • opensearch_py_ml/ml_commons/rag_pipeline/__init__.py
  • opensearch_py_ml/ml_commons/rag_pipeline/rag/__init__.py
  • opensearch_py_ml/ml_commons/rag_pipeline/rag/rag.py
  • opensearch_py_ml/ml_commons/rag_pipeline/rag/ml_models/__init__.py

@dhrubo-os
Collaborator

@hmumtazz lint is still failing:


Run nox -s lint
  nox -s lint
  shell: /usr/bin/bash -e {0}
  env:
    pythonLocation: /opt/hostedtoolcache/Python/3.12.7/x64
    LD_LIBRARY_PATH: /opt/hostedtoolcache/Python/3.12.7/x64/lib
nox > Running session lint
nox > Creating virtual environment (virtualenv) using python3 in .nox/lint
nox > python -m pip install black flake8 mypy isort numpy
nox > python utils/lint/license-headers.py check setup.py noxfile.py opensearch_py_ml/ utils/ tests/
All files had license header
nox > black --check --target-version=py38 setup.py noxfile.py opensearch_py_ml/ utils/ tests/
All done! ✨ 🍰 ✨
180 files would be left unchanged.
nox > isort --check --profile=black setup.py noxfile.py opensearch_py_ml/ utils/ tests/
nox > flake8 --ignore=E501,W503,E402,E712,E203 setup.py noxfile.py opensearch_py_ml/ utils/ tests/
opensearch_py_ml/ml_commons/rag_pipeline/rag/serverless.py:189:96: E226 missing whitespace around arithmetic operator
nox > Command flake8 --ignore=E501,W503,E402,E712,E203 setup.py noxfile.py opensearch_py_ml/ utils/ tests/ failed with exit code 1
nox > Session lint failed.
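
The log does not show the offending line, so the following is only a hypothetical illustration of what E226 flags and how to fix it; `next_page_offset` is an invented name. Note that E226 is part of flake8's default ignore list, and passing `--ignore` replaces that list, which is likely why the check is active in this run.

```python
def next_page_offset(page: int, size: int) -> int:
    """Return the starting offset of the page after `page`."""
    # E226 would flag the unspaced form: page*size+size
    # Adding spaces around each arithmetic operator satisfies the check.
    return page * size + size
```

Running `nox -s lint` locally before pushing should confirm whether the fix clears the failure.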

@dhrubo-os
Collaborator

Integration tests are failing too! Please fix both the lint and the integration test failures.

@dhrubo-os
Collaborator

@hmumtazz, could you please add more details to the PR description? In addition, I remember you had a video of this feature. If possible, let's add that to the PR as well so that people from the community can also review your PR. Thanks.
