[R-263] Roadmap - v0.2 #1009
Comments
We lack chunk quality metrics as of today. It will be good to see some chunk quality evaluation metrics. |
hey @rajib76, thanks for chipping in 🙂 could you explain a bit more about how you're measuring quality here? maybe an example too if possible? |
One of the hard problems in RAG today is determining the right chunk size. If a chunk talks about multiple concepts, it is very difficult to find the most relevant chunk for the question. I was looking for a metric that tells whether a chunk is atomic, i.e. whether it talks about only one concept. The semantic chunking approach did not work, as the embedding model itself has a semantic dissonance. |
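@jjmachan @rajib76 metrics like chunk_attribution and chunk_utilization (as referenced here: https://docs.rungalileo.io/galileo/gen-ai-studio-products/guardrail-store/chunk-attribution) could help to quantify chunk quality. We already have relevance scores (from vector DBs or keyword search engines) to measure chunk relevance with respect to the query, but metrics that quantify how much of the chunk was used can be helpful. I can take this up if you find them useful. I found it interesting, and it could help decide how many chunks to retrieve. |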
This will be useful, Aakash. It will help in chunk tuning.
…On Sun, Aug 18, 2024 at 11:03 AM Aakash Thatte ***@***.***> wrote:
@jjmachan <https://github.com/jjmachan> @rajib76
<https://github.com/rajib76> metrics like chunk_attribution and
chunk_utilization (as referenced here
<https://docs.rungalileo.io/galileo/gen-ai-studio-products/guardrail-store/chunk-attribution>)
could help to quantify chunk quality. We already have relevance scores (from
vector DBs or keyword search engines) to measure chunk relevance with
respect to the query. But metrics to quantify how much of the chunk was
used can be helpful. I can take this up if you find them useful. I found
it interesting, and it could help decide how many chunks to retrieve.
|
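To make the chunk_attribution / chunk_utilization idea from the comment above concrete, here is a minimal sketch. It is an assumption-heavy illustration, not the Galileo implementation and not an existing ragas metric: it uses plain token overlap between a retrieved chunk and the generated response as a stand-in for "how much of the chunk was used". The function names and the `threshold` parameter are hypothetical.

```python
import re


def _tokens(text):
    """Lowercase alphanumeric tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_utilization(chunk, response):
    """Fraction of the chunk's distinct tokens that also appear in the response.

    Lexical-overlap proxy only (an assumption for illustration); a real
    metric would more likely use an LLM judge or semantic similarity.
    """
    chunk_tokens = set(_tokens(chunk))
    if not chunk_tokens:
        return 0.0
    response_tokens = set(_tokens(response))
    return len(chunk_tokens & response_tokens) / len(chunk_tokens)


def chunk_attribution(chunk, response, threshold=0.2):
    """True if utilization reaches the (hypothetical) threshold."""
    return chunk_utilization(chunk, response) >= threshold


chunks = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
response = "The capital of France is Paris."

scores = [chunk_utilization(c, response) for c in chunks]
attributed = [chunk_attribution(c, response) for c in chunks]
print(scores)      # [1.0, 0.0]
print(attributed)  # [True, False]
```

Under this proxy, a low share of attributed chunks across many queries would hint that too many chunks are being retrieved, and consistently low utilization on attributed chunks would hint that chunks are larger than they need to be.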
@jjmachan hey, thanks for the superb library and effort! I want to use RAGAS to evaluate my open-source RAG application, which has its own custom chunker and retriever. Would you consider it feasible to add support for custom chunks to the synthetic data generator? Right now I can't really use ragas fully because I have to rely on the chunks generated by ragas instead of my own chunker. |
hey @Twist333d - thanks for the kind words ❤️ do you want to give that a go? |
yep @jjmachan shoot it of course! Btw, I've just set up RAGAS to be used with Weave, and another feature request came up: it would be great if you supported a much easier integration with tracing & eval suites such as Weave by W&B. |
Several more feature requests:
|
Prompt
#1012
From SyncLinear.com | R-263