Custom prompt for test set generation #1121

Open
harvey1992 opened this issue Jul 22, 2024 · 20 comments
Labels: module-testsetgen (Module testset generation), question (Further information is requested)

Comments

@harvey1992

Is there a way to customize a prompt using the TestsetGenerator class? I looked at the documentation, and it only provides examples with default prompts used under the hood. I also looked at the code associated with the test generation class, but I didn't understand how to do it, if it is possible.

harvey1992 added the question (Further information is requested) label on Jul 22, 2024
dosubot added the module-testsetgen (Module testset generation) label on Jul 22, 2024

dosubot bot commented Jul 22, 2024

Hello @harvey1992! I'm here to assist you with any questions you have and help resolve any issues you encounter. If you need any assistance, feel free to ask.

Yes, you can customize a prompt using the TestsetGenerator class by defining a Prompt object with specific instructions, examples, input keys, output key, output type, and language. Here is an example of how a prompt is customized:

from ragas.llms.prompt import Prompt

reasoning_question_prompt = Prompt(
    name="reasoning_question",
    instruction="""Complicate the given question by rewriting question into a multi-hop reasoning question based on the provided context.
    Answering the question should require the reader to make multiple logical connections or inferences using the information available in given context.
    Rules to follow when rewriting question:
    1. Ensure that the rewritten question can be answered entirely from the information present in the contexts.
    2. Do not frame questions that contains more than 15 words. Use abbreviation wherever possible.
    3. Make sure the question is clear and unambiguous.
    4. phrases like 'based on the provided context','according to the context',etc are not allowed to appear in the question.""",
    examples=[
        {
            "question": "What is the capital of France?",
            "context": "France is a country in Western Europe. It has several cities, including Paris, Lyon, and Marseille. Paris is not only known for its cultural landmarks like the Eiffel Tower and the Louvre Museum but also as the administrative center.",
            "output": "Linking the Eiffel Tower and administrative center, which city stands as both?",
        },
        {
            "question": "What does the append() method do in Python?",
            "context": "In Python, lists are used to store multiple items in a single variable. Lists are one of 4 built-in data types used to store collections of data. The append() method adds a single item to the end of a list.",
            "output": "If a list represents a variable collection, what method extends it by one item?",
        },
    ],
    input_keys=["question", "context"],
    output_key="output",
    output_type="str",
    language="english",
)

This example shows how to create a prompt for rewriting questions into multi-hop reasoning questions. You can customize the instruction, examples, input_keys, output_key, output_type, and language to fit your specific needs.

Additionally, you can use the TestsetGenerator class to generate synthetic test data with different types of questions, such as reasoning, conditioning, and multi-context questions. Here is an example of how to customize the test data distribution:

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# documents = load your documents

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

This example demonstrates how to customize the distribution of different question types in the generated test data [1][2].


@harvey1992 (Author)

@dosu you didn't actually use the custom prompt in the test generation.


dosubot bot commented Jul 22, 2024

To integrate the custom prompt into the TestsetGenerator for generating test sets, you need to define the prompt and then use it within the TestsetGenerator. Here is an example of how you can integrate the custom prompt:

  1. Define the Custom Prompt: Ensure the prompt is defined as shown below.
  2. Integrate into TestsetGenerator: Use the defined prompt within the TestsetGenerator.

Here is an example code snippet:

from ragas.llms.prompt import Prompt
from ragas.testset.generator import TestsetGenerator

# Define the custom prompt
custom_prompt = Prompt(
    name="custom-prompt",
    instruction="Create one or more statements from each sentence in the given answer.",
    examples=[
        {
            "question": "Cadmium Chloride is slightly soluble in this chemical, it is also called what?",
            "answer": "alcohol",
            "statements in json": """{
                                        "statements": [
                                        "Cadmium Chloride is slightly soluble in alcohol."
                                        ]
                                    }""",
        },
        {
            "question": "Were Hitler and Benito Mussolini of the same nationality?",
            "answer": "Sorry, I can't provide answer to that question.",
            "statements in json": """{
                                        "statements": []
                                    }""",
        },
    ],
    input_keys=["question", "answer"],
    output_key="statements in json"
)

# Initialize the TestsetGenerator with the custom prompt
testset_generator = TestsetGenerator(prompt=custom_prompt)

# Generate test sets using the custom prompt
test_sets = testset_generator.generate_test_sets()

This code assumes that the TestsetGenerator class has a method generate_test_sets that uses the provided prompt to generate test sets. If the TestsetGenerator class or its methods differ, you may need to adjust the integration accordingly [1][2].


@jjmachan (Member) commented Jul 24, 2024

@harvey1992 as of today, customizing the testset generation is hard; you would have to reimplement the evolutions yourself.

What modifications were you planning to make, though?

@harvey1992 (Author) commented Jul 24, 2024

@jjmachan The default instructions for test generation make the LLM produce questions that read like exam questions. I wanted to modify the instructions so the output is closer to what an actual human might ask.

@jjmachan (Member)

@harvey1992 gotcha!

You can actually modify the prompts quite easily: all you have to do is create a new Prompt with your own instructions and then replace simple.seed_question_prompt with it. You can do the same for the other evolutions too.

Let me know if this solves the issue?

@harvey1992 (Author)

Could you show me a simple example?

@jjmachan (Member)

I'm not sure about your use case, but something you could try is:

from ragas.testset.evolutions import simple

simple.seed_question_prompt

and then use the Prompt Objects guide in the Ragas docs to create the new one.
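
For illustration, a minimal sketch of that approach (the instruction and example below are placeholders you would tune to your own domain; the Prompt fields follow the examples shown earlier in this thread):

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple

# inspect the default seed-question prompt first
print(simple.seed_question_prompt)

# build a replacement with your own instruction and few-shot examples
my_seed_question_prompt = Prompt(
    name="seed_question",
    instruction=(
        "Generate a question a real user would naturally ask, answerable "
        "entirely from the given context and focused on the given keyphrase."
    ),
    examples=[
        {
            "context": "Paris is the capital of France and home to the Louvre Museum.",
            "keyphrase": "Louvre Museum",
            "question": "Which city is the Louvre Museum in?",
        },
    ],
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str",
)

# swap it in; the simple evolution will now use your prompt
simple.seed_question_prompt = my_seed_question_prompt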

If you want, we can jump on a call to better understand your use case and help you out too.

@atr-ip commented Jul 30, 2024

@jjmachan I have tried it, and on its own it is not enough to change the generated questions. Will it be possible in the future to generate your own question-answer pairs with your own prompts for a given context?

Here is my example:

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple


own_seed_question_prompt = Prompt(
    name="seed_question",
    instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
    examples=[
        {
            "context": "Goal The XY was set out to replace another system, ABC. Right now it is only available to perform a partial role of the original system, the visualization of input data. It is out of scope for XY to perform further operational tasks.",
            "keyphrase": "XY",
            "question": "What was the initial purpose of XY project?",
        },
        {
            "context": " 8904 William Shakespeare 8905 Jane Austen    8983 Mark Twain  8916 Charles Dickens   8918 Leo Tolstoy    8920 Fyodor Dostoevsky  8921 George Orwell  8969 Ernest Hemingway  8974 Franz Kafka   8977 J. R. R. Tolkien   8978 Oscar Wilde 8989 Alexander Pushkin    8994 Victor Hugo   8995 John Steinbeck  8996 Walt Whitman    8997 Charles Baudelaire   8998 Virginia Woolf",
            "keyphrase": "Alexander Pushkin",
            "question": "What is Alexander Pushkins' port?",
        },
        {
            "context": "MaximumLinesWritten float Maximum number of lines written in a single poem Lines/Day MinutesOfInspiration float Sum of the durations of all individual writing sessions in the BIN and the period of 15 minutes. NumberOfPoems int Number of poems completed RejectionReasons string Reasons for the rejection of poems by publishers. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. BlockReasons string Reasons for the writer's block. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. StationaryMinutesOfInspiration float The duration of time measured in minutes in which the poet has been classified to be stationary and inspired for the BIN and the period of 15 minutes. Minutes",
            "keyphrase": "metaphor usage",
            "question": "In poetry analysis, which metric determines the frequency of metaphor usage in a given poem?",
        },
    ],
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str"
)


simple.seed_question_prompt = own_seed_question_prompt

@jjmachan (Member)

Hey @atr-ip, I don't think I'm quite understanding the question. Today you can do what you mentioned above: the simple evolution has the prompt you specified, and the generation process will follow the instructions in that prompt.

Or is it not working that way and you are facing some other problem?

From what I understand, the root concern is that you need more control over the generated testsets. This is something we are hoping to address with #1016, so do keep a tab on that one 🙂

@harvey1992 (Author) commented Jul 31, 2024

@jjmachan I am doing something similar to this example from the documentation:

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

I am currently using the default instructions to generate this dataset. However, I would like to slightly modify how the questions are generated based on their type (simple, multi-context, reasoning). You mentioned using simple.seed_question_prompt, but I'm unsure how to proceed since I can't pass my custom prompt to the class in any way.

@jjmachan (Member) commented Aug 2, 2024

@harvey1992 Mixing what @atr-ip wrote above with your code, you could try something like:

from ragas.llms.prompt import Prompt
from ragas.testset.evolutions import simple


own_seed_question_prompt = Prompt(
    name="seed_question",
    instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
    examples=[
        {
            "context": "Goal The XY was set out to replace another system, ABC. Right now it is only available to perform a partial role of the original system, the visualization of input data. It is out of scope for XY to perform further operational tasks.",
            "keyphrase": "XY",
            "question": "What was the initial purpose of XY project?",
        },
        {
            "context": " 8904 William Shakespeare 8905 Jane Austen    8983 Mark Twain  8916 Charles Dickens   8918 Leo Tolstoy    8920 Fyodor Dostoevsky  8921 George Orwell  8969 Ernest Hemingway  8974 Franz Kafka   8977 J. R. R. Tolkien   8978 Oscar Wilde 8989 Alexander Pushkin    8994 Victor Hugo   8995 John Steinbeck  8996 Walt Whitman    8997 Charles Baudelaire   8998 Virginia Woolf",
            "keyphrase": "Alexander Pushkin",
            "question": "What is Alexander Pushkins' port?",
        },
        {
            "context": "MaximumLinesWritten float Maximum number of lines written in a single poem Lines/Day MinutesOfInspiration float Sum of the durations of all individual writing sessions in the BIN and the period of 15 minutes. NumberOfPoems int Number of poems completed RejectionReasons string Reasons for the rejection of poems by publishers. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. BlockReasons string Reasons for the writer's block. The reason is given as text with associated number to show how many occurrences are seen in the BIN for the period of 15 minutes. StationaryMinutesOfInspiration float The duration of time measured in minutes in which the poet has been classified to be stationary and inspired for the BIN and the period of 15 minutes. Minutes",
            "keyphrase": "metaphor usage",
            "question": "In poetry analysis, which metric determines the frequency of metaphor usage in a given poem?",
        },
    ],
    input_keys=["context", "keyphrase"],
    output_key="question",
    output_type="str"
)


simple.seed_question_prompt = own_seed_question_prompt # now the prompt has been changed

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

# Change resulting question type distribution
distributions = {
    simple: 0.5,
    multi_context: 0.4,
    reasoning: 0.1
}

# use generator.generate_with_llamaindex_docs if you use llama-index as document loader
testset = generator.generate_with_langchain_docs(documents, 10, distributions) 
testset.to_pandas()

Let me know if that works?

@harvey1992 (Author)

@jjmachan I modified the open-source code and got it working for my needs. Thanks!

@jjmachan (Member) commented Aug 5, 2024

@harvey1992 glad you are unblocked, but do keep a tab on #1016; we will be improving customization further there.

@fschuh commented Aug 7, 2024

(quoting @jjmachan's Aug 2 comment and code example above)

I use a similar solution to customize some of the prompts so they work with the less powerful LLMs I run locally.
However, just updating the internal Ragas prompt variables isn't enough: you also need to reload the Ragas testset modules, because the original prompts have already been imported by the Ragas code.

Here's what I do to reload the modules and have my custom prompts kick in:

import importlib

from ragas.llms.prompt import Prompt
import ragas.testset.prompts as ragas_prompts
import ragas.testset.filters
import ragas.testset.evolutions
import ragas.testset.extractor
import ragas.testset.generator

# declare all custom ragas prompts
keyphrase_extraction_prompt = Prompt(
    name="keyphrase_extraction",
    instruction="Extract the top 3 to 5 keyphrases from the provided text, focusing on the most significant and distinctive aspects. Only output JSON, and nothing else.",
    examples=[
        {
            "text": "A black hole is a region of spacetime where gravity is so strong that nothing, including light and other electromagnetic waves, has enough energy to escape it. The theory of general relativity predicts that a sufficiently compact mass can deform spacetime to form a black hole.",
            "output": {
                "keyphrases": [
                    "Black hole",
                    "Region of spacetime",
                    "Strong gravity",
                    "Light and electromagnetic waves",
                    "Theory of general relativity",
                ]
            },
        },
        {
            "text": "The Great Wall of China is an ancient series of walls and fortifications located in northern China, built around 500 years ago. This immense wall stretches over 13,000 miles and is a testament to the skill and persistence of ancient Chinese engineers.",
            "output": {
                "keyphrases": [
                    "Great Wall of China",
                    "Ancient fortifications",
                    "Northern China",
                ]
            },
        },
    ],
    input_keys=["text"],
    output_key="output",
    output_type="json",
)

def apply_custom_prompts():
    # override ragas prompts here
    ragas_prompts.keyphrase_extraction_prompt = keyphrase_extraction_prompt

    importlib.reload(ragas.testset.filters)
    importlib.reload(ragas.testset.evolutions)
    importlib.reload(ragas.testset.extractor)
    importlib.reload(ragas.testset.generator)
    # from this point onwards, Ragas will use our custom prompts

This still feels like a brittle hack, though, as these internal variables could change or disappear in future versions of Ragas.
It would be nice if the public API allowed custom prompts in some form, without having to fiddle with the Ragas internals.

@harvey1992 (Author)

@fschuh I updated the evolution file and generator file to allow custom prompts to be passed via arguments.

@CosaroLisa commented Aug 9, 2024

@harvey1992

@fschuh I updated the evolution file and generator file to allow custom prompts to be passed via arguments.

Could you give a simple example of that?

@harvey1992 (Author)

@CosaroLisa I don't want to post a ton of code here, but in the evolution file I created a setter that sets the prompt for the given evolution type, and then I added the initialization piece to the init_evolution method in the generator class.
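
For illustration, here is a hypothetical, self-contained sketch of the kind of change described above; the class and method names are simplified stand-ins, not the actual ragas source:

from dataclasses import dataclass, field

@dataclass
class Evolution:
    """Stand-in for a ragas evolution (e.g. simple, reasoning)."""
    seed_question_prompt: str = "default seed-question prompt"

    def set_seed_question_prompt(self, prompt: str) -> None:
        # the added setter: swap the seed-question prompt without touching internals
        self.seed_question_prompt = prompt

@dataclass
class Generator:
    """Stand-in for TestsetGenerator, extended to accept custom prompts."""
    custom_prompts: dict = field(default_factory=dict)

    def init_evolution(self, name: str, evolution: Evolution) -> Evolution:
        # forward a custom prompt to the evolution, if one was supplied
        if name in self.custom_prompts:
            evolution.set_seed_question_prompt(self.custom_prompts[name])
        return evolution

# usage
gen = Generator(custom_prompts={"simple": "my domain-specific seed prompt"})
evo = gen.init_evolution("simple", Evolution())
print(evo.seed_question_prompt)  # -> my domain-specific seed prompt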

@tomi-neuron

Hi @jjmachan, is my understanding correct that if I want to fine-tune the generator for a specific domain, it is enough to change the simple.seed_question_prompt examples to my custom ones? Since this is the "seed" from which the other question types (reasoning, conditional, multi-context) are evolved (and, if I'm not mistaken, it is also the question used for the "simple" question type), is changing simple.seed_question_prompt sufficient and the right way to go?

I have a set of ground-truth question/answer/context triplets and would like to use them as few-shot examples for the generator, so that the original seed questions from which the other questions are rephrased/evolved are closer to my domain. However, the keyphrases are also given in the seed_question_prompt examples. Is it possible to use the KeyphraseExtractor for this, or is it better to do it manually for the few-shot ground truth?

@derekhsu commented Sep 6, 2024

No; if you want to use a customized keyphrase extractor, you can create your own DocumentStore and pass it in via the docstore parameter.
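
For anyone looking for a starting point, here is a rough sketch of that idea against the ragas 0.1.x internals. The InMemoryDocumentStore and KeyphraseExtractor arguments below mirror what TestsetGenerator.from_langchain builds by default, but they are assumptions to verify against your installed version:

from langchain.text_splitter import TokenTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset.docstore import InMemoryDocumentStore
from ragas.testset.extractor import KeyphraseExtractor
from ragas.testset.generator import TestsetGenerator

generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings()

wrapped_llm = LangchainLLMWrapper(generator_llm)
wrapped_embeddings = LangchainEmbeddingsWrapper(embeddings)

# customized keyphrase extractor; "extractor_prompt" as the attribute holding
# the prompt is an assumption, so check the KeyphraseExtractor source in your version
keyphrase_extractor = KeyphraseExtractor(llm=wrapped_llm)
keyphrase_extractor.extractor_prompt = keyphrase_extraction_prompt  # e.g. @fschuh's custom Prompt above

# build the docstore yourself so your extractor is used instead of the default
docstore = InMemoryDocumentStore(
    splitter=TokenTextSplitter(chunk_size=1024, chunk_overlap=0),
    embeddings=wrapped_embeddings,
    extractor=keyphrase_extractor,
)

# passing docstore stops from_langchain from constructing the default one
generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
    docstore=docstore,
)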
