diff --git a/docs/howtos/customisations/aws-bedrock.ipynb b/docs/howtos/customisations/aws-bedrock.ipynb new file mode 100644 index 000000000..9aef4035b --- /dev/null +++ b/docs/howtos/customisations/aws-bedrock.ipynb @@ -0,0 +1,453 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "7c249b40", + "metadata": {}, + "source": [ + "# Using Amazon Bedrock\n", + "\n", + "Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model best suited for your use case.\n", + "\n", + "This tutorial will show you how to use Amazon Bedrock endpoints and LangChain." + ] + }, + { + "cell_type": "markdown", + "id": "2e63f667", + "metadata": {}, + "source": [ + ":::{Note}\n", + "This guide is for folks who are using the Amazon Bedrock endpoints. Check the [evaluation guide](../../getstarted/evaluation.md) if you're using OpenAI endpoints.\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "e54b5e01", + "metadata": {}, + "source": [ + "### Load sample dataset" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "b658e02f", + "metadata": {}, + "outputs": [], + "source": [ + "# data\n", + "from datasets import load_dataset\n", + "\n", + "fiqa_eval = load_dataset(\"explodinggradients/fiqa\", \"ragas_eval\")\n", + "fiqa_eval" + ] + }, + { + "cell_type": "markdown", + "id": "d4b8a69c", + "metadata": {}, + "source": [ + "Let's import the metrics we are going to use" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "f17bcf9d", + "metadata": {}, + "outputs": [], + "source": [ + "from ragas.metrics import (\n", + " context_precision,\n", + " answer_relevancy,\n", + " faithfulness,\n", + " context_recall,\n", + ")\n", + "from ragas.metrics.critique import harmfulness\n", + "\n", + "# list of metrics we're going to use\n", + "metrics = [\n", + " faithfulness,\n", + " answer_relevancy,\n", + " 
context_recall,\n", + " context_precision,\n", + " harmfulness,\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "f1201199", + "metadata": {}, + "source": [ + "Now let's swap out the default `ChatOpenAI` with `BedrockChat`. Initialize a new instance of `BedrockChat` with the `model_id` of the model you want to use. You will also have to change the `BedrockEmbeddings` in the metrics that use them, which in our case is `answer_relevancy`.\n", + "\n", + "In order to use the new `BedrockChat` LLM instance with Ragas metrics, you have to create a new instance of `RagasLLM` using the `ragas.llms.LangchainLLM` wrapper. It's a simple wrapper around LangChain that makes LangChain LLM/Chat instances compatible with how Ragas metrics will use them." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "40406a26", + "metadata": {}, + "outputs": [], + "source": [ + "from ragas.llms import LangchainLLM\n", + "from langchain.chat_models import BedrockChat\n", + "from langchain.embeddings import BedrockEmbeddings\n", + "\n", + "config = {\n", + " \"credentials_profile_name\": \"your-profile-name\", # E.g. \"default\"\n", + " \"region_name\": \"your-region-name\", # E.g. 
\"us-east-1\"\n", + " \"model_id\": \"your-model-id\", # E.g. \"anthropic.claude-v2\"\n", + " \"model_kwargs\": {\"temperature\": 0.4},\n", + "}\n", + "\n", + "bedrock_model = BedrockChat(\n", + " credentials_profile_name=config[\"credentials_profile_name\"],\n", + " region_name=config[\"region_name\"],\n", + " endpoint_url=f\"https://bedrock-runtime.{config['region_name']}.amazonaws.com\",\n", + " model_id=config[\"model_id\"],\n", + " model_kwargs=config[\"model_kwargs\"],\n", + ")\n", + "# wrapper around bedrock_model\n", + "ragas_bedrock_model = LangchainLLM(bedrock_model)\n", + "# patch answer_relevancy with the new RagasLLM instance\n", + "answer_relevancy.llm = ragas_bedrock_model\n", + "\n", + "# init and change the embeddings\n", + "# only for answer_relevancy\n", + "bedrock_embeddings = BedrockEmbeddings(\n", + " credentials_profile_name=config[\"credentials_profile_name\"],\n", + " region_name=config[\"region_name\"],\n", + ")\n", + "# embeddings can be used as-is\n", + "answer_relevancy.embeddings = bedrock_embeddings" + ] + }, + { + "cell_type": "markdown", + "id": "44641e41", + "metadata": {}, + "source": [ + "This replaces the default LLM of `answer_relevancy` with the Amazon Bedrock endpoint. Now, with a little `setattr` magic, let's change it for all the other metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "52d9f5f3", + "metadata": {}, + "outputs": [], + "source": [ + "for m in metrics:\n", + " setattr(m, \"llm\", ragas_bedrock_model)" + ] + }, + { + "cell_type": "markdown", + "id": "8d6ecd5a", + "metadata": {}, + "source": [ + "### Evaluation\n", + "\n", + "Running the evaluation is as simple as calling `evaluate()` on the `Dataset` with the metrics of your choice."
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "22eb6f97", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evaluating with [faithfulness]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|█████████████████████████████████████████████████████████████| 2/2 [01:22<00:00, 41.24s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evaluating with [answer_relevancy]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|█████████████████████████████████████████████████████████████| 2/2 [01:21<00:00, 40.59s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evaluating with [context_recall]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|█████████████████████████████████████████████████████████████| 2/2 [00:46<00:00, 23.22s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evaluating with [context_precision]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|█████████████████████████████████████████████████████████████| 2/2 [00:59<00:00, 29.85s/it]\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "evaluating with [harmfulness]\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|█████████████████████████████████████████████████████████████| 2/2 [00:33<00:00, 16.96s/it]\n" + ] + }, + { + "data": { + "text/plain": [ + "{'faithfulness': 0.9428, 'answer_relevancy': 0.7860, 'context_recall': 0.2296, 'context_precision': 0.0000, 'harmfulness': 0.0000}" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from ragas import evaluate\n", + "import nest_asyncio # CHECK NOTES\n", + "\n", + "# NOTES: Only used when running on a jupyter notebook, otherwise comment or remove this 
line.\n", + "nest_asyncio.apply()\n", + "\n", + "result = evaluate(\n", + " fiqa_eval[\"baseline\"],\n", + " metrics=metrics,\n", + ")\n", + "\n", + "result" + ] + }, + { + "cell_type": "markdown", + "id": "a2dc0ec2", + "metadata": {}, + "source": [ + "And there you have it, all the scores you need. Each metric measures a different part of your pipeline.\n", + "\n", + "Now, if you want to dig into the results and find examples where your pipeline performed poorly or really well, you can easily convert the result into a pandas DataFrame and use your standard analytics tools!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "8686bf53", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + " | question | \n", + "contexts | \n", + "answer | \n", + "ground_truths | \n", + "faithfulness | \n", + "answer_relevancy | \n", + "context_recall | \n", + "context_precision | \n", + "harmfulness | \n", + "
---|---|---|---|---|---|---|---|---|---|
0 | \n", + "How to deposit a cheque issued to an associate... | \n", + "[Just have the associate sign the back and the... | \n", + "\\nThe best way to deposit a cheque issued to a... | \n", + "[Have the check reissued to the proper payee.J... | \n", + "1.0 | \n", + "0.930311 | \n", + "0.263158 | \n", + "0.0 | \n", + "0 | \n", + "
1 | \n", + "Can I send a money order from USPS as a business? | \n", + "[Sure you can. You can fill in whatever you w... | \n", + "\\nYes, you can send a money order from USPS as... | \n", + "[Sure you can. You can fill in whatever you w... | \n", + "1.0 | \n", + "0.984122 | \n", + "0.363636 | \n", + "0.0 | \n", + "0 | \n", + "
2 | \n", + "1 EIN doing business under multiple business n... | \n", + "[You're confusing a lot of things here. Compan... | \n", + "\\nYes, it is possible to have one EIN doing bu... | \n", + "[You're confusing a lot of things here. Compan... | \n", + "1.0 | \n", + "0.883872 | \n", + "0.363636 | \n", + "0.0 | \n", + "0 | \n", + "
3 | \n", + "Applying for and receiving business credit | \n", + "[Set up a meeting with the bank that handles y... | \n", + "\\nApplying for and receiving business credit c... | \n", + "[\"I'm afraid the great myth of limited liabili... | \n", + "1.0 | \n", + "0.518287 | \n", + "0.363636 | \n", + "0.0 | \n", + "0 | \n", + "
4 | \n", + "401k Transfer After Business Closure | \n", + "[The time horizon for your 401K/IRA is essenti... | \n", + "\\nIf your employer has closed and you need to ... | \n", + "[You should probably consult an attorney. Howe... | \n", + "1.0 | \n", + "0.779471 | \n", + "0.000000 | \n", + "0.0 | \n", + "0 | \n", + "