docs: testset generation (explodinggradients#1373)
1 parent 3f5bcae · commit 3d13974 · 8 changed files with 369 additions and 122 deletions.
@@ -0,0 +1,18 @@
# Testset Generation

Curating a high-quality test dataset is crucial for evaluating the performance of your AI application.

## Characteristics of an Ideal Test Dataset

- Contains high-quality data samples
- Covers a wide variety of scenarios observed in the real world
- Contains enough samples to derive statistically significant conclusions
- Is continually updated to prevent data drift

Curating such a dataset manually can be time-consuming and expensive. Ragas provides a set of tools to generate synthetic test datasets for evaluating your AI applications.
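As a minimal sketch of what that looks like in code (mirroring the getting-started notebook added in this commit; the `ragas.experimental.testset` import path and the `test_size` argument are taken from that notebook and may change as the API evolves):

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper
from ragas.experimental.testset import TestsetGenerator

# Load the source documents (any LangChain document loader works here).
docs = DirectoryLoader("Sample_Docs_Markdown/", glob="**/*.md").load()

# Wrap an LLM for ragas and generate a synthetic testset from the documents.
generator = TestsetGenerator(llm=LangchainLLMWrapper(ChatOpenAI(model_name="gpt-4o")))
dataset = generator.generate_with_langchain_docs(docs, test_size=10)
```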
<div class="grid cards" markdown>

- :fontawesome-solid-database: [__RAG__ for evaluating retrieval augmented generation pipelines](rag.md)
- :fontawesome-solid-robot: [__Agents or Tool use__ for evaluating agent workflows](agents.md)

</div>
@@ -0,0 +1,31 @@
# Testset Generation for RAG

In a RAG application, users interact with a set of documents through your application, and they may ask many different kinds of questions. From the perspective of a RAG system, these queries can generally be classified into two types:
## Two fundamental query types in RAG

- Specific queries
    - Queries directly answerable by referring to a single context
    - “What is the value of X in Report FY2020?”
- Abstract queries
    - Queries that can only be answered by referring to multiple documents
    - “What is the revenue trend for Company X from FY2020 through FY2023?”

Synthesizing specific queries is relatively easy, since only a single context is needed to generate the query. Abstract queries, however, require multiple contexts. **The fundamental question, then, is how to select the right set of chunks from which to generate abstract queries.** Different types of abstract queries require different types of contexts. For example,

- Abstract queries comparing two entities in a specific domain require contexts that contain information about both entities.
    - “Compare the revenue growth of Company X and Company Y from FY2020 through FY2023”
- Abstract queries about a topic discussed across different contexts require contexts that contain information about that topic.
    - “What are the different strategies used by companies to increase revenue?”

To solve this problem, Ragas uses a knowledge-graph-based approach to testset generation.
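As a purely conceptual illustration of why a knowledge graph helps (this is not the Ragas implementation or its API), consider a toy graph whose nodes are chunks and whose edges connect chunks that share an entity; connected groups of chunks then serve as the multi-chunk contexts that abstract queries need:

```python
from collections import defaultdict
from itertools import combinations

# Toy chunks with hand-labelled entities; in practice an LLM or NER model
# would extract the entities from each chunk.
chunks = {
    "c1": {"text": "Company X revenue grew 10% in FY2020.", "entities": {"Company X", "FY2020"}},
    "c2": {"text": "Company X revenue grew 15% in FY2023.", "entities": {"Company X", "FY2023"}},
    "c3": {"text": "Company Y revenue was flat in FY2023.", "entities": {"Company Y", "FY2023"}},
}

# Build the graph: an edge between two chunks means they share at least one entity.
edges = defaultdict(set)
for a, b in combinations(chunks, 2):
    if chunks[a]["entities"] & chunks[b]["entities"]:
        edges[a].add(b)
        edges[b].add(a)

# Chunks linked through shared entities form candidate contexts for abstract
# queries such as comparisons or trends across documents.
for node, neighbours in sorted(edges.items()):
    print(node, "->", sorted(neighbours))
```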
## Knowledge Graph Creation

## Scenario Generation
@@ -0,0 +1,317 @@
# Testset Generation for RAG

[Open in google colab](https://colab.research.google.com/github/explodinggradients/ragas/blob/main/docs/getstarted/rag_testset_generation.ipynb)
## Requirements

1. Install ragas
2. Load documents from hub
```python
! pip install git+https://github.com/explodinggradients/ragas.git
```
```python
! git clone https://huggingface.co/datasets/explodinggradients/Sample_Docs_Markdown
```

```
Cloning into 'Sample_Docs_Markdown'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 10 (delta 0), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (10/10), 103.01 KiB | 4.68 MiB/s, done.
```
## Load documents
```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader

path = "Sample_Docs_Markdown/"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
```
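A quick sanity check is useful here (an illustrative addition, not part of the original notebook). Note that `DirectoryLoader` parses files with Unstructured by default, so the `unstructured` package may also need to be installed:

```python
# Illustrative check (not in the original notebook): confirm documents were loaded.
print(len(docs))         # number of markdown files matched by the glob
print(docs[0].metadata)  # LangChain Document metadata, e.g. the source path
```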
## Setup LLM
```python
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper

openai_model = LangchainLLMWrapper(ChatOpenAI(model_name="gpt-4o"))
```
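`ChatOpenAI` reads its credentials from the standard `OPENAI_API_KEY` environment variable, which the notebook assumes is already set; if it is not, set it before constructing the model:

```python
import os

# Use your own key; alternatively export OPENAI_API_KEY in the shell before
# launching the notebook so no key appears in the notebook itself.
os.environ["OPENAI_API_KEY"] = "sk-..."
```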
## Run test generation
```python
from ragas.experimental.testset import TestsetGenerator

generator = TestsetGenerator(llm=openai_model)
dataset = generator.generate_with_langchain_docs(docs, test_size=10)
```
## Export
```python
dataset.to_hf_dataset().to_pandas()
```

|   | user_input | reference_contexts | reference |
|---|------------|--------------------|-----------|
| 0 | What strategies does DeepSeekMoE employ to ach... | [1. Introduction\n\nRecent research and practi... | DeepSeekMoE employs two principal strategies t... |
| 1 | How Mixture-Of-Experts architecture make Trans... | [2. Preliminaries: Mixture-Of-Experts For Tran... | The Mixture-Of-Experts (MoE) architecture make... |
| 2 | How the DeepSeekMoE architecture optimize expe... | [3. Deepseekmoe Architecture\n\nOn top of the ... | The DeepSeekMoE architecture optimizes expert ... |
| 3 | What are the key components and strategies inv... | [4. Validation Experiments 4.1. Experimental S... | The key components and strategies involved in ... |
| 4 | How DeepSeekMoE 16B perform better with effici... | [5. Scaling Up To Deepseekmoe 16B\n\nWith the ... | DeepSeekMoE 16B performs better with efficient... |
| 5 | How does DeepSeekMoE 16B achive resorce-effici... | [6. Alignment For Deepseekmoe 16B\n\nPrevious ... | DeepSeekMoE 16B achieves resource-efficient pe... |
| 6 | What factors contribute to performance and eff... | [7. Deepseekmoe 145B Ongoing\n\nEncouraged by ... | The performance and efficiency of DeepSeekMoE ... |
| 7 | What are the key advancements in Mixture of Ex... | [8. Related Work\n\nThe Mixture of Experts (Mo... | Key advancements in Mixture of Experts (MoE) t... |
| 8 | What key features and advantages of DeepSeekMo... | [9. Conclusion\n\nIn this paper, we introduce ... | Key features and advantages of DeepSeekMoE in ... |
| 9 | How does DeepSeekMoE demonstrate competitive p... | [Appendices A. Overview Of Hyper-Parameters\n\... | DeepSeekMoE demonstrates competitive performan... |
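To persist the generated testset for later evaluation runs, the same DataFrame can be written out with standard pandas calls (a convenience step, not part of the original notebook; the filename is arbitrary):

```python
# Export the generated testset; to_hf_dataset().to_pandas() is the same call
# used above, and to_csv is standard pandas.
df = dataset.to_hf_dataset().to_pandas()
df.to_csv("rag_testset.csv", index=False)
```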