Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ADR: Vector store for RAG #168

Merged
merged 2 commits into from
Jan 9, 2025
Merged

Conversation

anastasds
Copy link
Contributor

@anastasds anastasds commented Dec 19, 2024

@dmartinol
Copy link

What about moving adrs/ at the root folder, with README and the template files, and have a rag/ subfolder?
Apart from that, lgtm

@anastasds
Copy link
Contributor Author

That would probably need org-wide approval and we don't have that. I'll bring it up with the wider team next year to advocate for adoption of the format.

Copy link
Contributor

@jwm4 jwm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the decomposition into Context, Decision, Status, and Consequences. It seems much smoother and more natural than the approach used in #164 which feels awkward and cumbersome to me. I have some minor change requests here, but mostly I like the direction of both the ADR template and the proposed Milvus ADR.

docs/rag/adrs/template.md Outdated Show resolved Hide resolved
docs/rag/adrs/adr-001-vectordb.md Outdated Show resolved Hide resolved
.gitignore Show resolved Hide resolved
@anastasds
Copy link
Contributor Author

I like the decomposition into Context, Decision, Status, and Consequences. It seems much smoother and more natural than the approach used in #164 which feels awkward and cumbersome to me.

I'm glad the approach resonates. Within reason, people should take priority over processes and not the other way around. This is a core tenet of the Agile Manifesto which, for better or for worse, has been a solid philosophical framework for ways of working for software teams since 2004:

Individuals and interactions over processes and tools

https://agilemanifesto.org/

@anastasds anastasds changed the title ADR 001: Vector store for RAG ADR: Vector store for RAG Dec 20, 2024

## Context

One of the first choices to make in implementing RAG is to choose an initial vector store to develop against. Though the usage of frameworks like LangChain or Haystack make it easy to swap vector databases, we need a working end to end implementation for RAG that is tested against and available to install with InstructLab. There are many options (see [here](https://docs.haystack.deepset.ai/docs/choosing-a-document-store)).
Copy link
Contributor Author

@anastasds anastasds Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do I disable the below linting bot comment? It seems to me that this is a useless nitpick for markdown which will be rendered and in which whitespace does not matter. @nathan-weinberg maybe you can help here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also be fine with shutting off the linting, which I also find kind of tedious. However, I don't feel strongly. The markdownlint plugin in VSCode has made it a lot easier for me to comply with the linting in my ADRs since I can see the issues before submitting.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can run make md-lint locally to catch things locally so you don't need to wait for the GitHub CI to run

Copy link
Contributor

@jwm4 jwm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a couple more comments. Also, I do think there is a lot of controversy around the Milvus dependencies specifically, but I am OK with waiting to see if someone else wants to challenge this decision on those grounds.

I see that this PR will need an update to the spellcheck dictionary and a couple of spelling corrections to pass the spell check. Also, I see that it is not passing DCO. I tried using the DCO guidance to bring my last dev-doc into compliance with DCO and it didn't work, but I do think it can be done with enough github expertise (just more than I have). So I don't think it is ready to merge. However, the only open item from me is the title line of the template.

docs/rag/adrs/template.md Outdated Show resolved Hide resolved

## Context

One of the first choices to make in implementing RAG is to choose an initial vector store to develop against. Though the usage of frameworks like LangChain or Haystack make it easy to swap vector databases, we need a working end to end implementation for RAG that is tested against and available to install with InstructLab. There are many options (see [here](https://docs.haystack.deepset.ai/docs/choosing-a-document-store)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also be fine with shutting off the linting, which I also find kind of tedious. However, I don't feel strongly. The markdownlint plugin in VSCode has made it a lot easier for me to comply with the linting in my ADRs since I can see the issues before submitting.

Copy link
Contributor

@jwm4 jwm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issues I raised earlier are all resolved now, so I am happy with this in its current form.

@jwm4 jwm4 mentioned this pull request Jan 6, 2025
Signed-off-by: Anastas Stoyanovsky <[email protected]>
Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this all looks really good to me. I really appreciate the detail you put in here about ADRs in general.

@anastasds
Copy link
Contributor Author

@cdoern @nathan-weinberg thanks both! I don't have a merge button, so, if you would be so kind, that would get the template and guidance merged, and a subsequent ADR will record any changes regarding vector stores based on situational / informational changes.

Signed-off-by: Anastas Stoyanovsky <[email protected]>
@nathan-weinberg nathan-weinberg merged commit c71e488 into instructlab:main Jan 9, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants