docs: Fixed most of the broken links (#1830)
sahusiddharth authored Jan 10, 2025
1 parent 6478a6e commit 91393e6
Showing 25 changed files with 663 additions and 278 deletions.
2 changes: 1 addition & 1 deletion docs/concepts/components/eval_dataset.md
@@ -20,7 +20,7 @@ An evaluation dataset consists of:

- **Define Clear Objectives**: Identify the specific aspects of the AI application that you want to evaluate and the scenarios you want to test. Collect data samples that reflect these objectives.

- **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data]().
- **Collect Representative Data**: Ensure that the dataset covers a diverse range of scenarios, user inputs, and expected responses to provide a comprehensive evaluation of the AI application. This can be achieved by collecting data from various sources or [generating synthetic data](./../../howtos/customizations/index.md#testset-generation).

- **Quality and Size**: Aim for a dataset that is large enough to provide meaningful insights but not so large that it becomes unwieldy. Ensure that the data is of high quality and accurately reflects the real-world scenarios you want to evaluate.
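
For instance, once such samples have been collected, they can be loaded into Ragas as shown below. This is a minimal sketch assuming the `EvaluationDataset.from_list` helper and the single-turn field names (`user_input`, `retrieved_contexts`, `response`, `reference`) used elsewhere in these docs; the sample content itself is invented.

```python
from ragas import EvaluationDataset

# Hypothetical, hand-curated samples purely for illustration.
samples = [
    {
        "user_input": "What was the company's revenue growth in 2023?",
        "retrieved_contexts": ["Revenue grew 12% year-over-year in 2023."],
        "response": "Revenue grew by 12% in 2023.",
        "reference": "The company reported 12% revenue growth in 2023.",
    },
]

eval_dataset = EvaluationDataset.from_list(samples)
print(eval_dataset)
```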

7 changes: 4 additions & 3 deletions docs/concepts/index.md
@@ -9,23 +9,24 @@

Discover the various components used within Ragas.

Components like [Prompt Object](components/index.md#prompt-object), [Evaluation Dataset](components/index.md#evaluation-dataset) and [more..](components/index.md)
Components like [Prompt Object](components/prompt.md), [Evaluation Dataset](components/eval_dataset.md) and [more..](components/index.md)


- :material-ruler-square:{ .lg .middle } [__Ragas Metrics__](metrics/index.md)

---

Explore available metrics and understand how they work.

Metrics for evaluating [RAG](metrics/index.md/#retrieval-augmented-generation), [Agentic workflows](metrics/index.md/#agents-or-tool-use-cases) and [more..](metrics/index.md/#list-of-available-metrics).
Metrics for evaluating [RAG](metrics/available_metrics/index.md#retrieval-augmented-generation), [Agentic workflows](metrics/available_metrics/index.md#agents-or-tool-use-cases) and [more..](metrics/available_metrics/index.md#list-of-available-metrics).

- :material-database-plus:{ .lg .middle } [__Test Data Generation__](test_data_generation/index.md)

---

Generate high-quality datasets for comprehensive testing.

Algorithms for synthesizing data to test [RAG](test_data_generation/index.md#retrieval-augmented-generation), [Agentic workflows](test_data_generation/index.md#agents-or-tool-use-cases)
Algorithms for synthesizing data to test [RAG](test_data_generation/rag.md), [Agentic workflows](test_data_generation/agents.md)


- :material-chart-box-outline:{ .lg .middle } [__Feedback Intelligence__](feedback/index.md)
8 changes: 4 additions & 4 deletions docs/concepts/metrics/overview/index.md
@@ -18,14 +18,14 @@ A metric is a quantitative measure used to evaluate the performance of a AI appl

    **LLM-based metrics**: These metrics use an LLM underneath to do the evaluation. One or more LLM calls may be performed to arrive at the score or result. These metrics can be somewhat non-deterministic, as the LLM might not always return the same result for the same input. On the other hand, these metrics have been shown to be more accurate and closer to human evaluation.

All LLM-based metrics in ragas inherit from the `MetricWithLLM` class. These metrics expect an [LLM]() object to be set before scoring.
All LLM-based metrics in ragas inherit from the `MetricWithLLM` class. These metrics expect an LLM object to be set before scoring.

```python
from ragas.metrics import FactualCorrectness
scorer = FactualCorrectness(llm=evaluation_llm)
```

Each LLM-based metric also has prompts associated with it, written using the [Prompt Object]().
Each LLM-based metric also has prompts associated with it, written using the [Prompt Object](./../../components/prompt.md).
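
As an illustration, such a prompt object could be declared roughly as follows. This is a hedged sketch assuming the `PydanticPrompt` interface from `ragas.prompt`; the class, field values, and example data here are hypothetical, not a prompt shipped with any particular metric.

```python
from pydantic import BaseModel
from ragas.prompt import PydanticPrompt


class AnswerInput(BaseModel):
    question: str
    answer: str


class Verdict(BaseModel):
    verdict: int  # 1 if the answer addresses the question, otherwise 0


class AnswerRelevanceJudgement(PydanticPrompt[AnswerInput, Verdict]):
    # Instruction, typed input/output models, and few-shot examples live on the prompt object.
    instruction = "Return 1 if the answer addresses the question, otherwise 0."
    input_model = AnswerInput
    output_model = Verdict
    examples = [
        (
            AnswerInput(question="Where is the Eiffel Tower?", answer="It is in Paris."),
            Verdict(verdict=1),
        )
    ]
```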


    **Non-LLM-based metrics**: These metrics do not use an LLM underneath to do the evaluation. They are deterministic and can be used to evaluate the performance of the AI application without an LLM. They rely on traditional methods such as string similarity, BLEU score, etc. For the same reason, these metrics are known to have a lower correlation with human evaluation.
@@ -34,7 +34,7 @@ All LLM based metrics in ragas are inherited from `Metric` class.

**Metrics can be broadly classified into two categories based on the type of data they evaluate**:

    **Single-turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single-turn evaluation inherit from the `SingleTurnMetric` class and are scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample]() object as input.
    **Single-turn metrics**: These metrics evaluate the performance of the AI application based on a single turn of interaction between the user and the AI. All metrics in ragas that support single-turn evaluation inherit from the [SingleTurnMetric][ragas.metrics.base.SingleTurnMetric] class and are scored using the `single_turn_ascore` method. They also expect a [Single Turn Sample][ragas.dataset_schema.SingleTurnSample] object as input.

```python
from ragas.metrics import FactualCorrectness
scorer = FactualCorrectness()
await scorer.single_turn_ascore(sample)
```
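
The `sample` passed above is a single-turn sample. A minimal sketch of how one might be constructed (the field values are invented for illustration, and the scorer still needs an evaluation LLM configured as shown earlier):

```python
from ragas.dataset_schema import SingleTurnSample

# Hypothetical data purely for illustration.
sample = SingleTurnSample(
    user_input="When was the first Super Bowl played?",
    response="The first Super Bowl was played on January 15, 1967.",
    reference="The first Super Bowl was held on January 15, 1967.",
)
```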

    **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi-turn evaluation inherit from the `MultiTurnMetric` class and are scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample]() object as input.
    **Multi-turn metrics**: These metrics evaluate the performance of the AI application based on multiple turns of interaction between the user and the AI. All metrics in ragas that support multi-turn evaluation inherit from the [MultiTurnMetric][ragas.metrics.base.MultiTurnMetric] class and are scored using the `multi_turn_ascore` method. They also expect a [Multi Turn Sample][ragas.dataset_schema.MultiTurnSample] object as input.

```python
from ragas.metrics import AgentGoalAccuracy
```
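
Continuing that snippet, a multi-turn sample might be assembled and scored roughly as follows. This is a hedged sketch: the conversation content is invented, the message classes are assumed to come from `ragas.messages`, and `AgentGoalAccuracy` is the metric imported above with an evaluation LLM already configured.

```python
from ragas.dataset_schema import MultiTurnSample
from ragas.messages import AIMessage, HumanMessage

# Hypothetical two-turn conversation purely for illustration.
sample = MultiTurnSample(
    user_input=[
        HumanMessage(content="Book a table for two at an Italian place tonight."),
        AIMessage(content="Done. I booked a table for two at Luigi's for 7 pm tonight."),
    ],
    reference="A table for two is booked at an Italian restaurant for tonight.",
)

scorer = AgentGoalAccuracy(llm=evaluation_llm)
await scorer.multi_turn_ascore(sample)
```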
3 changes: 2 additions & 1 deletion docs/concepts/test_data_generation/rag.md
@@ -103,7 +103,7 @@ graph TD

### Extractors

Different extractors are used to extract information from each node that can be used to establish relationships between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like company names and a keyphrase extractor to extract important key phrases present in each node. You can write your own [custom extractors]() to extract the information that is relevant to your domain.
Different extractors are used to extract information from each node that can be used to establish relationships between the nodes. For example, in the case of financial documents, the extractors that can be used are an entity extractor to extract entities like company names and a keyphrase extractor to extract important key phrases present in each node. You can write your own custom extractors to extract the information that is relevant to your domain.

Extractors can be LLM based which are inherited from `LLMBasedExtractor` or rule based which are inherited from `Extractor`.

@@ -165,6 +165,7 @@ graph TD

The extracted information is used to establish the relationship between the nodes. For example, in the case of financial documents, the relationship can be established between the nodes based on the entities present in the nodes.
You can write your own [custom relationship builder]() to establish the relationship between the nodes based on the information that is relevant to your domain.
# Link missing above

#### Example

4 changes: 2 additions & 2 deletions docs/extra/components/choose_evaluator_llm.md
@@ -126,7 +126,7 @@

```python
evaluator_llm = LangchainLLMWrapper(your_llm_instance)
```

For a more detailed guide, check out [the guide on customizing models](../../howtos/customizations/customize_models/).
For a more detailed guide, check out [the guide on customizing models](../../howtos/customizations/customize_models.md).

If you are using LlamaIndex, you can use the `LlamaIndexLLMWrapper` to wrap your LLM so that it can be used with ragas.

@@ -135,6 +135,6 @@

```python
evaluator_llm = LlamaIndexLLMWrapper(your_llm_instance)
```

For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

If you're still not able to use Ragas with your favorite LLM provider, please let us know by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.
4 changes: 2 additions & 2 deletions docs/extra/components/choose_generator_llm.md
@@ -125,7 +125,7 @@

```python
generator_llm = LangchainLLMWrapper(your_llm_instance)
```

For a more detailed guide, check out [the guide on customizing models](../../howtos/customizations/customize_models/).
For a more detailed guide, check out [the guide on customizing models](../../howtos/customizations/customize_models.md).

If you are using LlamaIndex, you can use the `LlamaIndexLLMWrapper` to wrap your LLM so that it can be used with ragas.

@@ -134,6 +134,6 @@

```python
generator_llm = LlamaIndexLLMWrapper(your_llm_instance)
```

For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](../../howtos/integrations/_llamaindex/).
For more information on how to use LlamaIndex, please refer to the [LlamaIndex Integration guide](./../../howtos/integrations/_llamaindex.md).

If you're still not able to use Ragas with your favorite LLM provider, please let us know by commenting on this [issue](https://github.com/explodinggradients/ragas/issues/1617) and we'll add support for it 🙂.
4 changes: 2 additions & 2 deletions docs/getstarted/evals.md
@@ -7,7 +7,7 @@ The purpose of this guide is to illustrate a simple workflow for testing and eva

In this guide, you will evaluate a **text summarization pipeline**. The goal is to ensure that the output summary accurately captures all the key details specified in the text, such as growth figures, market insights, and other essential information.

`ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.
`ragas` offers a variety of methods for analyzing the performance of LLM applications, referred to as [metrics](../concepts/metrics/available_metrics/index.md). Each metric requires a predefined set of data points, which it uses to calculate scores that indicate performance.

### Evaluating using a Non-LLM Metric

@@ -203,7 +203,7 @@ To fix these results, ragas provides a way to align the metric with your prefere
2. **Download**: Save the annotated data using the `Annotated JSON` button in [app.ragas.io](https://app.ragas.io/).
3. **Train**: Use the annotated data to train your custom metric.

To learn more about this, refer to the [train your own metric guide](../howtos/customizations/metrics/train_your_own_metric.md)
To learn more about this, refer to the [train your own metric guide](./../howtos/customizations/metrics/train_your_own_metric.md)

[Download sample annotated JSON](../_static/sample_annotated_summary.json)

2 changes: 1 addition & 1 deletion docs/getstarted/rag_eval.md
@@ -157,7 +157,7 @@ evaluation_dataset = EvaluationDataset.from_list(dataset)

## Evaluate

We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as [evaluator LLM](/docs/howtos/customizations/customize_models.md) for evaluation.
We have successfully collected the evaluation data. Now, we can evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You may choose any model as [evaluator LLM](./../howtos/customizations/customize_models.md) for evaluation.

```python
from ragas import evaluate
```
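
For reference, a full call usually combines the collected dataset, a metric selection, and the evaluator LLM. The following is a hedged sketch; the specific metrics chosen here are illustrative, and any combination of RAG metrics can be passed.

```python
from ragas import evaluate
from ragas.metrics import FactualCorrectness, Faithfulness, LLMContextRecall

# Illustrative metric selection for a RAG pipeline.
result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)
print(result)
```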
19 changes: 13 additions & 6 deletions docs/getstarted/rag_testset_generation.md
@@ -31,7 +31,7 @@ docs = loader.load()

### Choose your LLM

You may choose to use any [LLM of your choice](../howtos/customizations/customize_models.md)
You may choose to use any [LLM of your choice](./../howtos/customizations/customize_models.md)
--8<--
choose_generator_llm.md
--8<--
@@ -55,9 +55,10 @@ Once you have generated a testset, you would want to view it and select the quer

```python
dataset.to_pandas()
```

Output
![testset](./testset_output.png)

You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available for you in the [Integrations](../howtos/integrations/index.md) section.
You can also use other tools like [app.ragas.io](https://app.ragas.io/) or any other similar tools available for you in the [Integrations](./../howtos/integrations/index.md) section.

In order to use the [app.ragas.io](https://app.ragas.io/) dashboard, you need to have an account on [app.ragas.io](https://app.ragas.io/). If you don't have one, you can sign up for one [here](https://app.ragas.io/login). You will also need to have a [Ragas APP token](https://app.ragas.io/settings/api-keys).

@@ -93,6 +94,7 @@

```python
from ragas.testset.graph import KnowledgeGraph

kg = KnowledgeGraph()
```
Output
```
KnowledgeGraph(nodes: 0, relationships: 0)
```
@@ -110,6 +112,7 @@

```python
from ragas.testset.graph import Node, NodeType  # assumed import for the document nodes

for doc in docs:
    kg.nodes.append(
        Node(type=NodeType.DOCUMENT, properties={"page_content": doc.page_content, "document_metadata": doc.metadata})
    )
```
Output
```
KnowledgeGraph(nodes: 10, relationships: 0)
```
@@ -137,6 +140,8 @@

```python
kg.save("knowledge_graph.json")
loaded_kg = KnowledgeGraph.load("knowledge_graph.json")
loaded_kg
```

Output
```
KnowledgeGraph(nodes: 48, relationships: 605)
```
@@ -158,11 +163,13 @@

```python
from ragas.testset.synthesizers import default_query_distribution

query_distribution = default_query_distribution(generator_llm)
```

Output
```
[
(SingleHopSpecificQuerySynthesizer(llm=llm), 0.5),
(MultiHopAbstractQuerySynthesizer(llm=llm), 0.25),
(MultiHopSpecificQuerySynthesizer(llm=llm), 0.25),
(SingleHopSpecificQuerySynthesizer(llm=llm), 0.5),
(MultiHopAbstractQuerySynthesizer(llm=llm), 0.25),
(MultiHopSpecificQuerySynthesizer(llm=llm), 0.25),
]
```

@@ -172,5 +179,5 @@ Now we can generate the testset.

```python
testset = generator.generate(testset_size=10, query_distribution=query_distribution)
testset.to_pandas()
```

Output
![testset](./testset_output.png)
60 changes: 27 additions & 33 deletions docs/howtos/applications/_cost.md
@@ -24,12 +24,10 @@

```python
from ragas.cost import get_token_usage_for_openai

get_token_usage_for_openai(llm_result)
```




TokenUsage(input_tokens=9, output_tokens=9, model='')

Output
```
TokenUsage(input_tokens=9, output_tokens=9, model='')
```


You can define your own parsers or import existing ones. If you would like to suggest a parser for an LLM provider or contribute your own, please check out this [issue](https://github.com/explodinggradients/ragas/issues/1151) 🙂.
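
A custom parser is just a callable that maps an LLM result to a `TokenUsage`. Below is a hedged sketch for a hypothetical provider; the shape of the provider's `llm_output` payload is an assumption made for illustration.

```python
from langchain_core.outputs import LLMResult

from ragas.cost import TokenUsage


def get_token_usage_for_my_provider(llm_result: LLMResult) -> TokenUsage:
    """Hypothetical parser that reads token counts from a provider-specific payload."""
    usage = (llm_result.llm_output or {}).get("token_usage", {})
    return TokenUsage(
        input_tokens=usage.get("prompt_tokens", 0),
        output_tokens=usage.get("completion_tokens", 0),
    )
```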
@@ -47,9 +45,10 @@

```python
dataset = load_dataset("explodinggradients/amnesty_qa", "english_v3")

eval_dataset = EvaluationDataset.from_hf_dataset(dataset["eval"])
```

Repo card metadata block was not found. Setting CardData to empty.

Output
```
Repo card metadata block was not found. Setting CardData to empty.
```

You can pass in the parser to the `evaluate()` function and the cost will be calculated and returned in the `Result` object.

@@ -67,21 +66,19 @@

```python
result = evaluate(
    # ... dataset, metrics and evaluator LLM arguments ...
    token_usage_parser=get_token_usage_for_openai,
)
```


Evaluating: 0%| | 0/20 [00:00<?, ?it/s]

Output
```
Evaluating: 0%| | 0/20 [00:00<?, ?it/s]
```


```python
result.total_tokens()
```




TokenUsage(input_tokens=25097, output_tokens=3757, model='')

Output
```
TokenUsage(input_tokens=25097, output_tokens=3757, model='')
```


You can compute the cost for each run by passing in the cost per token to the `Result.total_cost()` function.
@@ -93,11 +90,7 @@ In this case GPT-4o costs $5 for 1M input tokens and $15 for 1M output tokens.

```python
result.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)
```




1.1692900000000002

Output
```
1.1692900000000002
```


## Token Usage for Testset Generation
@@ -116,10 +112,9 @@

```python
kg = KnowledgeGraph.load("../../../experiments/scratchpad_kg.json")
kg
```




KnowledgeGraph(nodes: 47, relationships: 109)
Output
```
KnowledgeGraph(nodes: 47, relationships: 109)
```

@@ -145,9 +140,7 @@

```python
testset = tg.generate(testset_size=10, token_usage_parser=get_token_usage_for_openai)
testset.total_cost(cost_per_input_token=5 / 1e6, cost_per_output_token=15 / 1e6)
```




0.20967000000000002

Output
```
0.20967000000000002
```
