diff --git a/docs/_static/langsmith_dashboard.png b/docs/_static/langsmith_dashboard.png
new file mode 100644
index 000000000..4c3209db2
Binary files /dev/null and b/docs/_static/langsmith_dashboard.png differ
diff --git a/docs/howtos/integrations/_langsmith.md b/docs/howtos/integrations/_langsmith.md
deleted file mode 100644
index d936c1f43..000000000
--- a/docs/howtos/integrations/_langsmith.md
+++ /dev/null
@@ -1,75 +0,0 @@
-# Langsmith
-## Dataset and Tracing Visualisation
-
-[Langsmith](https://docs.smith.langchain.com/) in a platform for building production-grade LLM applications from the langchain team. It helps you with tracing, debugging and evaluting LLM applications.
-
-The langsmith + ragas integrations offer 2 features
-1. View the traces of ragas `evaluator`
-2. Use ragas metrics in langchain evaluation - (soon)
-
-
-## Tracing ragas metrics
-
-since ragas uses langchain under the hood all you have to do is setup langsmith and your traces will be logged.
-
-to setup langsmith make sure the following env-vars are set (you can read more in the [langsmith docs](https://docs.smith.langchain.com/#quick-start)
-
-```bash
-export LANGCHAIN_TRACING_V2=true
-export LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
-export LANGCHAIN_API_KEY=
-export LANGCHAIN_PROJECT= # if not specified, defaults to "default"
-```
-
-Once langsmith is setup, just run the evaluations as your normally would
-
-
-```python
-from datasets import load_dataset
-from ragas.metrics import context_precision, answer_relevancy, faithfulness
-from ragas import evaluate
-
-
-fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
-
-result = evaluate(
-    fiqa_eval["baseline"].select(range(3)),
-    metrics=[context_precision, faithfulness, answer_relevancy],
-)
-
-result
-```
-
-    Found cached dataset fiqa (/home/jjmachan/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)
-
-
-
-    0%| | 0/1 [00:00
+export LANGCHAIN_PROJECT= # Defaults to "default" if not set
+```
+
+## Getting the Dataset
+
+When creating an evaluation dataset or an individual evaluation instance, make sure the field names match the schema used in `SingleTurnSample` or `MultiTurnSample`.
+
+
+```python
+from ragas import EvaluationDataset
+
+
+dataset = [
+    {
+        "user_input": "Which CEO is widely recognized for democratizing AI education through platforms like Coursera?",
+        "retrieved_contexts": [
+            "Andrew Ng, CEO of Landing AI, is known for his pioneering work in deep learning and for democratizing AI education through Coursera."
+        ],
+        "response": "Andrew Ng is widely recognized for democratizing AI education through platforms like Coursera.",
+        "reference": "Andrew Ng, CEO of Landing AI, is known for democratizing AI education through Coursera.",
+    },
+    {
+        "user_input": "Who is Sam Altman?",
+        "retrieved_contexts": [
+            "Sam Altman, CEO of OpenAI, has advanced AI research and advocates for safe, beneficial AI technologies."
+        ],
+        "response": "Sam Altman is the CEO of OpenAI and advocates for safe, beneficial AI technologies.",
+        "reference": "Sam Altman, CEO of OpenAI, has advanced AI research and advocates for safe AI.",
+    },
+    {
+        "user_input": "Who is Demis Hassabis and how did he gain prominence?",
+        "retrieved_contexts": [
+            "Demis Hassabis, CEO of DeepMind, is known for developing systems like AlphaGo that master complex games."
+        ],
+        "response": "Demis Hassabis is the CEO of DeepMind, known for developing systems like AlphaGo.",
+        "reference": "Demis Hassabis, CEO of DeepMind, is known for developing AlphaGo.",
+    },
+    {
+        "user_input": "Who is the CEO of Google and Alphabet Inc., praised for leading innovation across Google's product ecosystem?",
+        "retrieved_contexts": [
+            "Sundar Pichai, CEO of Google and Alphabet Inc., leads innovation across Google's product ecosystem."
+        ],
+        "response": "Sundar Pichai is the CEO of Google and Alphabet Inc., praised for leading innovation across Google's product ecosystem.",
+        "reference": "Sundar Pichai, CEO of Google and Alphabet Inc., leads innovation across Google's product ecosystem.",
+    },
+    {
+        "user_input": "How did Arvind Krishna transform IBM?",
+        "retrieved_contexts": [
+            "Arvind Krishna, CEO of IBM, transformed the company by focusing on cloud computing and AI solutions."
+        ],
+        "response": "Arvind Krishna transformed IBM by focusing on cloud computing and AI solutions.",
+        "reference": "Arvind Krishna, CEO of IBM, transformed the company through cloud computing and AI.",
+    },
+]
+
+evaluation_dataset = EvaluationDataset.from_list(dataset)
+```
+
+## Tracing Ragas Metrics
+
+Run the Ragas evaluation on your dataset, and the traces will appear in your LangSmith dashboard under the specified project name (or "default" if none was set).
+
+
+```python
+from ragas import evaluate
+from ragas.llms import LangchainLLMWrapper
+from langchain_openai import ChatOpenAI
+from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness
+
+llm = ChatOpenAI(model="gpt-4o-mini")
+evaluator_llm = LangchainLLMWrapper(llm)
+
+result = evaluate(
+    dataset=evaluation_dataset,
+    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
+    llm=evaluator_llm,
+)
+
+result
+```
+
+Output
+```
+Evaluating: 0%| | 0/15 [00:00