docs: testset generation (explodinggradients#1373)
1 parent 3f5bcae · commit 3d13974 · 8 changed files with 369 additions and 122 deletions.
@@ -0,0 +1,18 @@
# Testset Generation

Curating a high-quality test dataset is crucial for evaluating the performance of your AI application.

## Characteristics of an Ideal Test Dataset

- Contains high-quality data samples
- Covers a wide variety of scenarios observed in the real world
- Contains enough samples to derive statistically significant conclusions
- Is continually updated to prevent data drift

Curating such a dataset manually can be time-consuming and expensive. Ragas provides a set of tools to generate synthetic test datasets for evaluating your AI applications.
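As a minimal sketch of what that looks like in code (mirroring the getting-started notebook added in this commit; the `ragas.experimental.testset` import path and the `test_size` argument are taken from that notebook and may change as the API evolves):

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper
from ragas.experimental.testset import TestsetGenerator

# Load the source documents (any LangChain document loader works here).
docs = DirectoryLoader("Sample_Docs_Markdown/", glob="**/*.md").load()

# Wrap an LLM for ragas and generate a synthetic testset from the documents.
generator = TestsetGenerator(llm=LangchainLLMWrapper(ChatOpenAI(model_name="gpt-4o")))
dataset = generator.generate_with_langchain_docs(docs, test_size=10)
```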
<div class="grid cards" markdown>

- :fontawesome-solid-database: [__RAG__ for evaluating retrieval augmented generation pipelines](rag.md)
- :fontawesome-solid-robot: [__Agents or Tool use__ for evaluating agent workflows](agents.md)

</div>
@@ -0,0 +1,31 @@
# Testset Generation for RAG

In a RAG application, users interact with a set of documents through your application, and they may ask many different kinds of questions. From the perspective of a RAG system, these queries can generally be classified into two types:
## Two fundamental query types in RAG

- Specific queries
    - Queries directly answerable by referring to a single context
    - “What is the value of X in Report FY2020?”
- Abstract queries
    - Queries that can only be answered by referring to multiple documents
    - “What is the revenue trend for Company X from FY2020 through FY2023?”

Synthesizing specific queries is relatively easy, since only a single context is needed to generate the query. Abstract queries, however, require multiple contexts. **The fundamental question, then, is how to select the right set of chunks from which to generate abstract queries.** Different types of abstract queries require different types of contexts. For example,

- Abstract queries comparing two entities in a specific domain require contexts that contain information about both entities.
    - “Compare the revenue growth of Company X and Company Y from FY2020 through FY2023”
- Abstract queries about a topic discussed across different contexts require contexts that contain information about that topic.
    - “What are the different strategies used by companies to increase revenue?”

To solve this problem, Ragas uses a knowledge-graph-based approach to testset generation.
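As a purely conceptual illustration of why a knowledge graph helps (this is not the Ragas implementation or its API), consider a toy graph whose nodes are chunks and whose edges connect chunks that share an entity; connected groups of chunks then serve as the multi-chunk contexts that abstract queries need:

```python
from collections import defaultdict
from itertools import combinations

# Toy chunks with hand-labelled entities; in practice an LLM or NER model
# would extract the entities from each chunk.
chunks = {
    "c1": {"text": "Company X revenue grew 10% in FY2020.", "entities": {"Company X", "FY2020"}},
    "c2": {"text": "Company X revenue grew 15% in FY2023.", "entities": {"Company X", "FY2023"}},
    "c3": {"text": "Company Y revenue was flat in FY2023.", "entities": {"Company Y", "FY2023"}},
}

# Build the graph: an edge between two chunks means they share at least one entity.
edges = defaultdict(set)
for a, b in combinations(chunks, 2):
    if chunks[a]["entities"] & chunks[b]["entities"]:
        edges[a].add(b)
        edges[b].add(a)

# Chunks linked through shared entities form candidate contexts for abstract
# queries such as comparisons or trends across documents.
for node, neighbours in sorted(edges.items()):
    print(node, "->", sorted(neighbours))
```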
## Knowledge Graph Creation

## Scenario Generation
@@ -0,0 +1,317 @@
# Testset Generation for RAG

[Open in google colab](https://colab.research.google.com/github/explodinggradients/ragas/blob/main/docs/getstarted/rag_testset_generation.ipynb)
## Requirements

1. Install ragas
2. Load documents from hub
```python
! pip install git+https://github.com/explodinggradients/ragas.git
```
```python
! git clone https://huggingface.co/datasets/explodinggradients/Sample_Docs_Markdown
```

```
Cloning into 'Sample_Docs_Markdown'...
remote: Enumerating objects: 10, done.
remote: Counting objects: 100% (6/6), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 10 (delta 0), reused 0 (delta 0), pack-reused 4 (from 1)
Unpacking objects: 100% (10/10), 103.01 KiB | 4.68 MiB/s, done.
```
## Load documents
```python
from langchain_community.document_loaders import DirectoryLoader, TextLoader

path = "Sample_Docs_Markdown/"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
```
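A quick sanity check is useful here (an illustrative addition, not part of the original notebook). Note that `DirectoryLoader` parses files with Unstructured by default, so the `unstructured` package may also need to be installed:

```python
# Illustrative check (not in the original notebook): confirm documents were loaded.
print(len(docs))         # number of markdown files matched by the glob
print(docs[0].metadata)  # LangChain Document metadata, e.g. the source path
```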
## Setup LLM
```python
from langchain_openai import ChatOpenAI
from ragas.llms.base import LangchainLLMWrapper

openai_model = LangchainLLMWrapper(ChatOpenAI(model_name="gpt-4o"))
```
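`ChatOpenAI` reads its credentials from the standard `OPENAI_API_KEY` environment variable, which the notebook assumes is already set; if it is not, set it before constructing the model:

```python
import os

# Use your own key; alternatively export OPENAI_API_KEY in the shell before
# launching the notebook so no key appears in the notebook itself.
os.environ["OPENAI_API_KEY"] = "sk-..."
```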
## Run test generation
```python
from ragas.experimental.testset import TestsetGenerator

generator = TestsetGenerator(llm=openai_model)
dataset = generator.generate_with_langchain_docs(docs, test_size=10)
```
## Export
```python
dataset.to_hf_dataset().to_pandas()
```

|   | user_input | reference_contexts | reference |
|---|------------|--------------------|-----------|
| 0 | What strategies does DeepSeekMoE employ to ach... | [1. Introduction\n\nRecent research and practi... | DeepSeekMoE employs two principal strategies t... |
| 1 | How Mixture-Of-Experts architecture make Trans... | [2. Preliminaries: Mixture-Of-Experts For Tran... | The Mixture-Of-Experts (MoE) architecture make... |
| 2 | How the DeepSeekMoE architecture optimize expe... | [3. Deepseekmoe Architecture\n\nOn top of the ... | The DeepSeekMoE architecture optimizes expert ... |
| 3 | What are the key components and strategies inv... | [4. Validation Experiments 4.1. Experimental S... | The key components and strategies involved in ... |
| 4 | How DeepSeekMoE 16B perform better with effici... | [5. Scaling Up To Deepseekmoe 16B\n\nWith the ... | DeepSeekMoE 16B performs better with efficient... |
| 5 | How does DeepSeekMoE 16B achive resorce-effici... | [6. Alignment For Deepseekmoe 16B\n\nPrevious ... | DeepSeekMoE 16B achieves resource-efficient pe... |
| 6 | What factors contribute to performance and eff... | [7. Deepseekmoe 145B Ongoing\n\nEncouraged by ... | The performance and efficiency of DeepSeekMoE ... |
| 7 | What are the key advancements in Mixture of Ex... | [8. Related Work\n\nThe Mixture of Experts (Mo... | Key advancements in Mixture of Experts (MoE) t... |
| 8 | What key features and advantages of DeepSeekMo... | [9. Conclusion\n\nIn this paper, we introduce ... | Key features and advantages of DeepSeekMoE in ... |
| 9 | How does DeepSeekMoE demonstrate competitive p... | [Appendices A. Overview Of Hyper-Parameters\n\... | DeepSeekMoE demonstrates competitive performan... |
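To persist the generated testset for later evaluation runs, the same DataFrame can be written out with standard pandas calls (a convenience step, not part of the original notebook; the filename is arbitrary):

```python
# Export the generated testset; to_hf_dataset().to_pandas() is the same call
# used above, and to_csv is standard pandas.
df = dataset.to_hf_dataset().to_pandas()
df.to_csv("rag_testset.csv", index=False)
```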