RAG ingestion and chat pipelines #161

dmartinol · 2024-12-02T14:52:22Z

Introduces `ilab` command changes to support the RAG ingestion and chat pipelines:

* RAG conversion: a new command to process customer documentation, sourced either from the knowledge taxonomy or from actual user documents.
* RAG ingestion: a new command to generate and ingest embeddings from pre-processed documents into a configured vector store.
* RAG chat: augments the context of the chat command with the results of a similarity search executed on the ingested database, to increase the accuracy of the response.
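
As a rough illustration of the ingestion and chat steps listed above, here is a minimal sketch. It assumes a toy bag-of-words "embedding" and an in-memory list as the vector store; the function names (`embed`, `ingest`, `retrieve`, `augment`) are hypothetical and are not the InstructLab implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def ingest(chunks: list) -> list:
    """Ingestion: embed each pre-processed chunk into an in-memory 'vector store'."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(store: list, query: str, top_k: int = 1) -> list:
    """Similarity search: return the top_k chunks closest to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def augment(store: list, question: str, top_k: int = 1) -> str:
    """Chat: prepend the retrieved context to the user question."""
    context = "\n".join(retrieve(store, question, top_k))
    return f"Context:\n{context}\n\nQuestion: {question}"


store = ingest([
    "InstructLab uses docling to convert PDF files",
    "The vector store holds document embeddings",
])
prompt = augment(store, "which tool converts pdf files")
print(prompt)
```

A real pipeline would swap `embed` for an embedding model and the list for the configured vector store; the control flow stays the same.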

anastasds · 2024-12-02T18:57:15Z

I think that the chat / options probably deserve some discussion and also look like they may not be a priority, but, other than that, this looks reasonable to me.

franciscojavierarceo · 2024-12-03T16:22:02Z

docs/cli/ilab-rag-retrieval.md

+allowed. Therefore, we also propose alternative approaches to run the same RAG pipelines using existing `ilab` commands or
+other provided tools.
+
+### 3.1 RAG Ingestion Pipeline Command


How will this be handled for fine tuning?

I would expect many of the components to end up being re-invented here.

Are you asking about fine-tuning of the embedding model or the response-generation model or both?

Here is the plan regarding fine-tuning of the response-generation model:

There are no plans to make changes to the existing capability in InstructLab for synthetic data generation (SDG) and fine-tuning the response-generation model from that synthetic data.

That existing capability includes a preprocessing step that is part of the ilab data generate command which fetches source documents (e.g., PDF files) and processes them using docling.

In RAG ingestion and chat pipelines #161 we propose to separate that preprocessing into its own step.

The outputs of that step will be used as inputs for the capabilities in RAG for vectorizing and indexing that same content (the source documents).

Ideally there will also be some way to put documents in directly without having to run the SDG preprocessing, but that is lower priority than just getting the primary flow working.

Fine-tuning the embedding model is out of scope for the MVP, but in the future I think we expect that the outputs of SDG would also be useful as training data for an embedding model (e.g., a cross-encoder model that really needs query / response pairs for fine-tuning). Alternatively, maybe we just use the extracted text for fine tuning a basic single-text encoder.

franciscojavierarceo · 2024-12-03T16:34:17Z

docs/cli/ilab-rag-retrieval.md

+| **TODO** evaluation framework options | | | |
+
+Equivalent YAML document for the newly proposed options:
+```yaml


This looks like it could pretty easily be structured in Feast.

jwm4

This document is a good start, but needs a lot more input from a lot of stakeholders, especially stakeholders who work on the existing command-line interface.

jwm4 · 2024-12-03T17:06:48Z

docs/cli/ilab-rag-retrieval.md

+### 3.1 RAG Ingestion Pipeline Command
+The proposal is to add a `rag` subgroup under the `data` group, with an `ingest` command, like:
+```
+ilab data rag ingest /path/to/docs/folder


Some thoughts on this:

I guess broadly speaking, I was expecting the proposal for how this should be reflected in the command-line interface to come from members of the engine team, e.g., @cdoern . However, I guess it is fine for us to propose things here and iterate with them.

I don't like rag ingest here. I think we want something that describes what we're doing here, which is building an index rather than bringing in the term "RAG" which describes the feature but not really what this specific step is doing.

I'm not sure how to respond to the /path/to/docs/folder part. We definitely want some sort of affordance around a flow where you do an ilab data generate and then ilab just knows where the outputs of that step are rather than you needing to specify it. However, some other affordance for being able to override that location also makes sense to me. So maybe if the folder is optional that solves this?

We need to figure out how this fits in with the broader refactor being considered in Refactor preprocessing and postprocessing in SDG #155

Also, I would like a flow some day where you can just point this step at source documents and it runs docling for you, but that's lower priority than the flow that is more tightly connected with SDG (or at least SDG preprocessing).

> I don't like rag ingest here

Me neither, but I was waiting for the closure of the discussion on the related command at Knowledge doc ingestion #148 that, IIUC, should be the preliminary step before running the embedding ingestion. Depending on the selected verb, we can update this proposal accordingly (maybe something like `ilab data index` or `ilab data generate index`?).

> I'm not sure how to respond to the /path/to/docs/folder part

Again, this followed the proposal in the other PR, which has both `--input` and `--output` options.

> We definitely want some sort of affordance around a flow where you do an ilab data generate and then ilab just knows where the outputs of that step are rather than you needing to specify it

If this is a valid use case, then yes, the parameter will be optional. We have to think carefully about how to auto-detect the JSON docs in this case, as the datasets folder is "versioned" for each data generate execution, so I assume the requirement is to pick all the files from the latest `documents-*` subfolder.
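
The auto-detection described above could look roughly like the following sketch. It assumes the `documents-*` folder names carry a lexicographically sortable suffix (for example a timestamp), which this thread does not confirm; `latest_documents_dir` and `resolve_input` are hypothetical helper names.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_documents_dir(datasets_dir: Path) -> Optional[Path]:
    """Return the `documents-*` subfolder whose name sorts last, or None if there is none.

    Assumption: the suffix after `documents-` sorts chronologically.
    """
    candidates = [p for p in datasets_dir.glob("documents-*") if p.is_dir()]
    return max(candidates, key=lambda p: p.name, default=None)


def resolve_input(explicit: Optional[Path], datasets_dir: Path) -> list:
    """Use the explicit folder when given, otherwise fall back to the latest documents folder."""
    folder = explicit if explicit is not None else latest_documents_dir(datasets_dir)
    return sorted(folder.glob("*.json")) if folder is not None else []


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("documents-2024-12-01", "documents-2024-12-03"):
        (root / name).mkdir()
        (root / name / "doc.json").write_text("{}")
    found = resolve_input(None, root)
    found_names = [p.parent.name for p in found]

print(found_names)
```

Making the input folder optional this way keeps the override path (`explicit`) and the default path in one place.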

anastasds · 2024-12-03T17:21:15Z

@jwm4

> This document is a good start, but needs a lot more input from a lot of stakeholders

I think I need to clarify my position when I commented with a general "looks good to me" - I think that you raise some very valid points, but also I am operating under the assumption that this document serves as a proposal for "directionally where to head right now" and that any less-than-high-level details can, will, and probably should change as we understand the problem domain better during execution. I think that a useful modus operandi is to get a few key stakeholders to give a general approval and that that is enough to get started, and then have continuous feedback cycles all the time going forward to course correct as necessary. Analysis paralysis is a real effect that is best avoided.

Not trying to beat a dead horse but this is partly why I keep advocating for atomic decision records like ADRs over all-encompassing design docs like this. A general development roadmap is a necessary thing to have, but nobody will ever have enough information to design a full system specification, especially in the context of a marketplace and a large development organization. The only constant is change.

dmartinol · 2024-12-05T18:03:12Z

I will soon publish an updated version with the outcome of the discussion with the ilab Runtime (aka CLI) team.

dmartinol · 2024-12-06T17:33:51Z

@cdoern Could you please TAL and involve relevant people?

anastasds · 2024-12-06T18:12:00Z

There are many design decisions being made here that appear to be in a bit of a vacuum and so increase complexity of product usage and configuration while there are opportunities to streamline it instead.

@jwm4 I think we need to dedicate a significant effort to work through these as a group. I left comments on what I saw in a first pass.

jwm4

This is starting to look good to me. I still have some minor disagreements about technical details (see comments below) but mostly this is feeling like it is on the right track.

bbrowning

Overall I have some concerns about this approach, especially in light of the current changes happening in SDG. I think a lot of this approach is based on where SDG was and not where SDG is going, but this work wouldn't land in SDG until after we've reconciled with the research changes, have the ability to create custom Pipeline Blocks, expect users to create and execute their own Pipelines, and split out data preprocessing from data generation from data postprocessing.

I think the entire approach to generating vector embeddings and populating those in a vector database could probably be handled with the existing (post-reconcile with Research fork) SDG code along with a custom Pipeline Block implementation or two. We don't document how to do this yet, as the code is just landing, but that's our designated extension mechanism to do any random thing you want during a data generation pipeline.

anastasds · 2024-12-09T19:37:32Z

@bbrowning

> a custom Pipeline Block implementation or two

What would the purpose of indexing generated data be?

bbrowning · 2024-12-09T20:06:09Z

> @bbrowning
>
> > a custom Pipeline Block implementation or two
>
> What would the purpose of indexing generated data be?

I don't mean indexing generated data - I mean using our pipelines concept to run a RAG pipeline that generates embeddings, populates a vector db, whatever you need - as opposed to calling an LLM for inference and data generation. Pipelines take an input dataset and have a sequence of Blocks that get executed in order: the first block gets each input sample as input, transforms those samples in some way, and outputs samples, and the next block gets those new samples as its input. Today we mostly use this for transforming data in datasets, building prompts and calling LLMs for inference, but you could also use this concept to tokenize text and insert into a vector db. A RAG pipeline just becomes another set of pipelines shipped with the product versus code custom and specific to the RAG use-case, other than perhaps some RAG-specific Blocks we'd like to ship in the product itself.

It may be hard to understand how this all works without understanding the code of SDG including the upcoming changes to it, but we should at least try to use the designed SDG extension points of custom Blocks for part of this I think.
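
To make the Pipeline/Block idea above concrete, here is a minimal sketch of a pipeline whose blocks chunk text, attach an embedding, and write to a vector store. This is not the SDG API: the `Block` signature, `run_pipeline`, and the three block functions are hypothetical, and the "embedding" is a placeholder.

```python
from typing import Callable, Iterable

Sample = dict
# Hypothetical Block signature: takes a list of samples, returns a list of samples.
Block = Callable[[list], list]


def run_pipeline(blocks: Iterable[Block], dataset: list) -> list:
    """Run blocks in order; each block's output is the next block's input."""
    samples = dataset
    for block in blocks:
        samples = block(samples)
    return samples


def chunk_block(samples: list) -> list:
    """Split each document into chunks of at most four words."""
    out = []
    for s in samples:
        words = s["text"].split()
        for i in range(0, len(words), 4):
            out.append({"doc": s["doc"], "text": " ".join(words[i:i + 4])})
    return out


def embed_block(samples: list) -> list:
    """Attach a placeholder 'embedding' (word lengths) to each chunk."""
    return [{**s, "embedding": [len(w) for w in s["text"].split()]} for s in samples]


vector_db: list = []  # stand-in for a real vector database


def store_block(samples: list) -> list:
    """Insert every sample into the stand-in vector database and pass it through."""
    vector_db.extend(samples)
    return samples


result = run_pipeline(
    [chunk_block, embed_block, store_block],
    [{"doc": "a.pdf", "text": "one two three four five six"}],
)
print(len(vector_db))
```

The point of the sketch is only that RAG ingestion fits the same "samples in, samples out" shape as the data-generation blocks described in the comment.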

anastasds · 2024-12-09T20:12:30Z

Ah, you mean creating a new pipeline for this that has nothing to do with SDG. That sounds like it might be a very flexible solution, but at the cost of understandability etc.

franciscojavierarceo · 2024-12-10T15:45:26Z

> There are many design decisions being made here that appear to be in a bit of a vacuum and so increase complexity of product usage and configuration while there are opportunities to streamline it instead.
>
> @jwm4 I think we need to dedicate a significant effort to work through these as a group. I left comments on what I saw in a first pass.

I very much agree with this conclusion.

It also has extraordinary consequences for our enterprise customers at the RHOAI scale.

dmartinol · 2024-12-11T10:16:02Z

@jwm4 @anastasds integrated changes from yesterday's meeting. Should we move it from Draft to Ready?

jwm4

This is mostly looking good to me. I am requesting a few minor changes.

jwm4 · 2025-01-08T17:47:39Z

docs/rag/ilab-rag-retrieval.md

+| Option Description | Default Value | CLI Flag | Environment Variable |
+|--------------------|---------------|----------|----------------------|
+| Location folder of user documents. In case it's missing, the taxonomy is navigated to look for updated knowledge documents.|  | `--input` | `ILAB_PROCESS_INPUT` |
+| Location folder of processed documents. |  | `--output` | `ILAB_PROCESS_OUTPUT` |


@anastasds, should `ILAB_PROCESS_INPUT` and `ILAB_PROCESS_OUTPUT` be `ILAB_CONVERT_INPUT` and `ILAB_CONVERT_OUTPUT`, since process was renamed to convert?

Yes, missed that, thanks - submitted fix in dmartinol#4

Merged, thanks!

nathan-weinberg

A few questions/comments but overall LGTM - won't block on anything I've stated here

nathan-weinberg · 2025-01-08T19:12:51Z

docs/rag/ilab-rag-retrieval.md

+(RAG) artifacts within `InstructLab`. The proposed changes introduce new commands and options for the embedding ingestion
+and RAG-based chat pipelines:
+
+* A new `ilab rag` command group, feature gated behind a `ILAB_DEV_PREVIEW` environment variable.


What's the point of the feature gate?

@cdoern and @bbrowning were concerned that if this were released as dev preview without a gate that it would set expectations that this command would continue to exist, but in reality we haven't really converged on a long term CLI for this functionality.

Gotcha - personally I would simply prefer some kind of user alert (i.e., something like "NOTE: This is an experimental command at this time - once fully supported this warning will go away") over an env var-based feature gate - but if there was a previous convo about this I won't muck up the works, I don't feel that strongly about it 😄

Honestly, going in and trying to implement this, it seems a lot simpler to put a warning on it. Especially since some of the new options are for existing command groups, e.g. chat.rag.enabled. And don't we want users trying out previews?

I would also prefer a warning, but @cdoern and @bbrowning seemed pretty firm in their insistence on a feature gate.

After an offline discussion yesterday, where we landed is that experimental options would cause the application to simply exit with an error message if used without the dev flag being set.
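
The agreed behavior (exit with an error when an experimental option is used without the dev flag) might be sketched as below. The variable name `ILAB_DEV_PREVIEW` comes from the dev doc; which values count as "set" is an assumption here, and `require_dev_preview` is a hypothetical helper, not the actual `ilab` code.

```python
import os
import sys

DEV_PREVIEW_VAR = "ILAB_DEV_PREVIEW"  # name taken from the dev doc


def dev_preview_enabled(env=None) -> bool:
    """Assumption: the gate is open when the variable holds a truthy string."""
    env = os.environ if env is None else env
    return env.get(DEV_PREVIEW_VAR, "").strip().lower() in {"1", "true", "yes"}


def require_dev_preview(option: str, env=None) -> None:
    """Exit with an error message if an experimental option is used without the flag."""
    if not dev_preview_enabled(env):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"error: {option} is experimental; set {DEV_PREVIEW_VAR} to enable it")


# Passes silently when the flag is set.
require_dev_preview("ilab rag convert", env={DEV_PREVIEW_VAR: "true"})
```

Calling `require_dev_preview("chat.rag.enabled", env={})` would raise `SystemExit`, which also covers experimental options that live under existing command groups.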

nathan-weinberg · 2025-01-08T19:13:02Z

docs/rag/ilab-rag-retrieval.md

+and RAG-based chat pipelines:
+
+* A new `ilab rag` command group, feature gated behind a `ILAB_DEV_PREVIEW` environment variable.
+* A new `ilab rag` sub-command  group to process customer documentation.


Suggested change

-* A new `ilab rag` sub-command group to process customer documentation.
+* A new `ilab rag` sub-command group to process user documentation.

nathan-weinberg · 2025-01-08T19:25:17Z

@dmartinol Please squash commits before this is merged, TIA

cdoern

generally looks good! just a few comments on env var names and values.

jwm4

The concerns I raised in my last review appear to be resolved, so I am happy for this to merge now.

jwm4

I've reviewed the latest round of changes and this still looks good to me. I think it is ready for final oversight and merging. cc: @instructlab/oversight-committee

reidliu41 · 2024-12-17T13:23:27Z

docs/cli/ilab-rag-retrieval.md

+```
+ilab model chat --rag
+```
+


I think it would be better to have a list or check option to see which documents or collection names it currently has, if it can use multiple.

Is it one collection name per document, or one collection name for multiple documents?

> I think it would be better to have a list or check option to see which documents or collection names it currently has, if it can use multiple.

Makes sense, but such command extensions can probably be discussed after an initial rollout, WDYT?

> Is it one collection name per document, or one collection name for multiple documents?

The ingestion logic puts all docs in the configured collection name. Of course, we can run it multiple times, each time targeting a different collection.
At inference time (chat), there's only the option to configure the collection name at startup, but it might be useful to allow changing it dynamically during the conversation.
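
The collection semantics described in this reply, together with the requested listing option, can be illustrated with a small sketch. `InMemoryStore` and its methods are hypothetical stand-ins for whatever vector store is configured; the collection names are examples only.

```python
from collections import defaultdict


class InMemoryStore:
    """Hypothetical stand-in for a vector store that groups documents by collection."""

    def __init__(self) -> None:
        self._collections = defaultdict(list)

    def ingest(self, docs: list, collection: str) -> None:
        """One ingestion run: every document goes into the single configured collection."""
        self._collections[collection].extend(docs)

    def list_collections(self) -> dict:
        """Report each collection name with its document count."""
        return {name: len(docs) for name, docs in self._collections.items()}


store = InMemoryStore()
store.ingest(["a.json", "b.json"], collection="default")  # first run
store.ingest(["c.json"], collection="manuals")  # second run, different collection
print(store.list_collections())
```

So one collection holds many documents, and separate collections come from separate ingestion runs.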

reidliu41 · 2024-12-17T13:35:16Z

docs/cli/ilab-rag-retrieval.md

+│ Press Alt (or Meta) and Enter or Esc Enter to end multiline input.                                                                │
+╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
+```
+


[S] can switch to [M] and [default] can switch to [cli_helper] if the user wants to change them.
If it just shows [RAG], I think it would be better to show that RAG is already enabled as prompt output after the user enables it.

reidliu41 · 2024-12-17T13:39:09Z

docs/cli/ilab-rag-retrieval.md

+
+Equivalent YAML document for the newly proposed options:
+```yaml
+chat:


maybe chat --> rag?

chat: rag:

reidliu41 · 2024-12-17T13:57:18Z

docs/cli/ilab-rag-retrieval.md

+Options:
+  --input DIRECTORY  The folder with user documents to process.
+  --help             Show this message and exit.```
+```


Is it possible to accept either a directory or a single document? Passing a single document seems like it might be a common case for users.

Makes sense as a future enhancement.

reidliu41 · 2024-12-17T14:02:45Z

docs/cli/ilab-rag-retrieval.md

+#### Command Purpose
+Generate the embeddings from the pre-processed documents.
+* In case of Model Training path, the documents are located in the location specified by the `generate.output_dir` configuration key
+  (e.g. `_HOME_/.local/share/instructlab/datasets`).


It might be better to separate these too. For example, if users run this several times, the outputs will pile up in `_HOME_/.local/share/instructlab/datasets`. Even though `ilab data list` can show and separate them, it would be hard to manage as the number grows.

The doc also states: "In particular, only the latest folder with name starting by documents- will be explored."
Future enhancements can provide support to point to a specific document/dataset.

Adds new `ilab rag convert` command which takes in input documents from a taxonomy or a directory of original document files (e.g., PDFs) and outputs Docling JSON for each input document.

**Issue resolved by this Pull Request:** Resolves #2890

**Dev Docs related to this Pull Request:** Link to Dev Doc or PR: instructlab/dev-docs#161

**Checklist:**

- [x] **Commit Message Formatting**: Commit titles and messages follow guidelines in the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [x] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.

Approved-by: nathan-weinberg
Approved-by: cdoern

danmcp

I think it would be good to see:

* An answer to my newbie excalidraw question and some resolution on that really large file
* An approval (or a delegated approval) from @bbrowning and @cdoern, since they lead affected areas and have given feedback
* As @nathan-weinberg suggested, the PR needs to be squashed of intermediate commits

docs/rag/images/rag-ingestion-and-chat.excalidraw

Signed-off-by: Daniele Martinoli <[email protected]> Signed-off-by: Anastas Stoyanovsky <[email protected]>

bbrowning · 2025-01-21T20:44:43Z

Regarding my approval, and to @danmcp's point, on the surface it doesn't look like anything here expects or requires changes in SDG today. Am I reading the dev doc correctly? Things look entirely self-contained with the new ilab rag set of commands, and future work to reuse the same document ingestion code paths between SDG and the RAG work is deferred and out of scope of this dev doc?

The proposal seems reasonable as-written, given that it's a separate command set and gated behind a feature flag. However, it also seems to mostly or entirely impact the instructlab/instructlab repository with the new commands and code to support the experimental workflow, so I'd defer more towards the most active maintainers there to approve this, with @cdoern being a good person to wrangle those.

dmartinol · 2025-01-21T22:02:21Z

> Regarding my approval, and to @danmcp's point, on the surface it doesn't look like anything here expects or requires changes in SDG today. Am I reading the dev doc correctly?

You are right. No changes are expected in the SDG package.

> future work to reuse the same document ingestion code paths between SDG and the RAG work is deferred and out of scope of this dev doc?

Correct: as you said, this is out of scope.

Resolves #2876

Depends on #2832

Dev doc: instructlab/dev-docs#161

**Checklist:**

- [X] **Commit Message Formatting**: Commit titles and messages follow guidelines in the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [x] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [x] Documentation has been updated, if necessary.
- [x] Unit tests have been added, if necessary.
- [x] Functional tests have been added, if necessary.
- [x] E2E Workflow tests have been added, if necessary.

Approved-by: cdoern
Approved-by: nathan-weinberg

bbrowning · 2025-01-22T13:46:38Z

@dmartinol Thanks for the clarification. This is approved from my point-of-view, as it pertains to impacts to SDG. I don't see anything proposed here that blocks or prevents us from reconciling the newer document ingestion work for RAG with the older way we do things in SDG as part of a future effort. And, having this behind an experimental feature gate gives us the ability to get real-world feedback from users about this feature as well as the newer document ingestion and conversion here without locking us in if we need to pivot based on that feedback.

For actual GitHub approvals, I still defer to the maintainers most involved day-to-day in the instructlab/instructlab code repo, as that's where this work will land.

cdoern

After re-reading and given the work happening in InstructLab already, this all makes sense to me.

It might make sense to do a follow up to this doc or perhaps a new doc on how we aim to evolve the ingestion code to be more standardized between RAG and SDG in the future.

cdoern · 2025-01-22T14:14:34Z

docs/rag/ilab-rag-retrieval.md

+This flow is designed for users who aim to train their own models and leverage the source documents that support knowledge submissions to enhance the chat context:
+![model-training](./images/rag-model-training.png)
+
+**Note**: documents are processed using `instructlab-sdg` package and are defined using the docling v1 schema.


this makes sense, but it might be worth calling out we are looking to unify this code into a different ingestion package eventually as this is not an ideal solution

nathan-weinberg · 2025-01-22T16:08:39Z

If it's good with @danmcp I'm happy to merge this

danmcp

Thanks @dmartinol for the updates and @cdoern and @bbrowning for the reviews.

…2903)

**Issue resolved by this Pull Request:** Resolves #2875

Addresses #2957

Depends on #2832

**Dev Docs related to this Pull Request:** Link to Dev Doc or PR: instructlab/dev-docs#161

**Checklist:**

- [x] **Commit Message Formatting**: Commit titles and messages follow guidelines in the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/#summary).
- [ ] [Changelog](https://github.com/instructlab/instructlab/blob/main/CHANGELOG.md) updated with breaking and/or notable changes for the next minor release.
- [ ] Documentation has been updated, if necessary.
- [x] Unit tests have been added, if necessary.
- [ ] Functional tests have been added, if necessary.
- [ ] E2E Workflow tests have been added, if necessary.

**Unit test code in the next commit**

Approved-by: nathan-weinberg
Approved-by: cdoern
Approved-by: alinaryan

nathan-weinberg requested review from cdoern and nathan-weinberg December 2, 2024 14:57

dmartinol commented Dec 3, 2024

franciscojavierarceo reviewed Dec 3, 2024

jwm4 suggested changes Dec 3, 2024

dmartinol marked this pull request as draft December 5, 2024 18:01

anastasds reviewed Dec 6, 2024

jwm4 suggested changes Dec 6, 2024

bbrowning reviewed Dec 9, 2024

dmartinol mentioned this pull request Dec 10, 2024: [FOR SHARING PURPOSES ONLY] RAG ingestion and chat pipelines instructlab/instructlab#2736 (Closed, 11 tasks)

jwm4 suggested changes Jan 8, 2025

dmartinol force-pushed the rag-embed-and-chat branch 2 times, most recently from 3956a88 to ce2ac7f, January 8, 2025 17:00

jwm4 reviewed Jan 8, 2025

nathan-weinberg approved these changes Jan 8, 2025

anastasds mentioned this pull request Jan 8, 2025: feat: Retrieval augmented generation for chat instructlab/instructlab#2886 (Merged, 6 tasks)

dmartinol force-pushed the rag-embed-and-chat branch from a9184fe to 109a8b9, January 8, 2025 22:08

anastasds mentioned this pull request Jan 9, 2025: [RAG][Dev] Implement ilab rag convert instructlab/instructlab#2890 (Closed)

cdoern reviewed Jan 9, 2025

jwm4 approved these changes Jan 9, 2025

jwm4 mentioned this pull request Jan 10, 2025: feat(rag): Add lab rag convert CLI command instructlab/instructlab#2902 (Merged, 6 tasks)

dmartinol mentioned this pull request Jan 10, 2025: feat: Ingest document embeddings for Retrieval augmented generation instructlab/instructlab#2903 (Merged, 6 tasks)

jwm4 approved these changes Jan 14, 2025

reidliu41 suggested changes Jan 15, 2025

danmcp requested changes Jan 18, 2025

dmartinol force-pushed the rag-embed-and-chat branch from 777d3a6 to acf3883, January 18, 2025 07:09

RAG ingest and chat ADR (commit c471c61)

dmartinol force-pushed the rag-embed-and-chat branch from acf3883 to c471c61, January 18, 2025 07:12

cdoern approved these changes Jan 22, 2025

danmcp approved these changes Jan 22, 2025

danmcp merged commit 97844ab into instructlab:main Jan 22, 2025

4 checks passed
