[ci skip] DOC add section in user guide 147df0d
glemaitre committed Apr 17, 2024
1 parent 172ed7b commit 60bfec1
Showing 34 changed files with 1,987 additions and 212 deletions.
2 changes: 1 addition & 1 deletion .buildinfo
@@ -1,4 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 31829c0d0fa608d27ff762e0d5e45a77
config: 08df6d8976882d5d196038fc120712f9
tags: 645f666f9bcd5a90fca523b33c5a78b7
2 changes: 1 addition & 1 deletion _sources/index.rst.txt
@@ -101,7 +101,7 @@ about the scikit-learn library.
   :titlesonly:

   install
   user_guide
   user_guide/index
   references/index
   auto_examples/index
   whats_new
19 changes: 19 additions & 0 deletions _sources/user_guide/index.rst.txt
@@ -0,0 +1,19 @@
.. title:: User guide

.. _user_guide:

==========
User Guide
==========

Some info regarding RAG

Implementation details
======================

.. toctree::
   :maxdepth: 2

   text_scraping
   information_retrieval
   large_language_model
Original file line number Diff line number Diff line change
@@ -1,43 +1,13 @@
.. title:: User guide

.. _user_guide:

==========
User Guide
==========

Scraping
========

The scraping module provides simple estimators that extract meaningful
documentation from the documentation website.

API documentation
-----------------

:class:`~ragger_duck.scraping.APINumPyDocExtractor` is a more advanced scraper
that uses `numpydoc` and its scraper to extract the documentation. The
`numpydoc` scraper parses the different sections, and we build meaningful
chunks of documentation from the parsed sections. While we don't control
the chunk size, the chunks are built such that each one contains information
about a single parameter and always refers to the class or function. We hope
that scraping in this way removes the ambiguity that could arise when building
chunks without any control.

User Guide documentation
------------------------

:class:`~ragger_duck.scraping.UserGuideDocExtractor` is a scraper that extracts
documentation from the user guide. It is a simple scraper that extracts
text information from the webpage. Additionally, this text can be chunked.
.. _information_retrieval:

=========
Retriever
=========

We differentiate two types of context retrievers: lexical and semantical.

Lexical retrievers
------------------
==================

In lexical retrievers, the idea is to have an exact match between the query and
the documentation.
@@ -49,7 +19,7 @@ During a query, we provide a similarity score between the query and
each documentation chunk seen during training.
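
To make the idea of a lexical similarity score concrete, here is a minimal
sketch of BM25, one common lexical scoring scheme. It is an illustration only:
the function names, the whitespace tokenizer, and the `k1` and `b` defaults are
assumptions made for this example and are not taken from the library.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption); real retrievers use richer analyzers.
    return text.lower().split()


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Score every chunk against the query with the Okapi BM25 formula."""
    tokenized = [tokenize(chunk) for chunk in chunks]
    n_chunks = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_chunks
    # Document frequency: number of chunks that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue  # only exact term matches contribute
            df = doc_freq[term]
            idf = math.log(1 + (n_chunks - df + 0.5) / (df + 0.5))
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


chunks = [
    "fit the model on training data",
    "predict labels for new samples",
    "the score method returns the accuracy",
]
scores = bm25_scores("predict new samples", chunks)
best = max(range(len(chunks)), key=scores.__getitem__)
print(chunks[best])
```

Chunks that share no term with the query get a score of zero, which is the
"exact match" behaviour described above.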

Semantical retrievers
---------------------
=====================

In semantical retrievers, the idea is to have a more flexible match between the
query and the documentation. We use an embedding model to project a document
@@ -65,21 +35,10 @@ As an embedding, we provide a :class:`~ragger_duck.embedding.SentenceTransformer`
that downloads any pre-trained sentence transformer from HuggingFace.
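
As a rough illustration of semantic matching, the sketch below ranks chunks by
cosine similarity between embedding vectors. The three-dimensional vectors are
hand-written toy values standing in for the output of an embedding model, and
`cosine_similarity` is a helper defined here for the example, not a library
function.

```python
import math


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings (assumption); real ones come from an embedding model.
chunk_embeddings = {
    "how to train a classifier": [0.9, 0.1, 0.0],
    "plotting a confusion matrix": [0.1, 0.8, 0.3],
    "installing the package": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunk_embeddings,
    key=lambda chunk: cosine_similarity(query_embedding, chunk_embeddings[chunk]),
    reverse=True,
)
print(ranked[0])
```

Unlike a lexical match, the query does not need to share any word with a chunk
to be considered close to it.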

Reranker: merging lexical and semantical retrievers
---------------------------------------------------
===================================================

If we use both lexical and semantical retrievers, we need to merge the results
of both retrievers. :class:`~ragger_duck.retrieval.RetrieverReranker` performs
this reranking using a cross-encoder model. In our case, the cross-encoder
model is trained on Microsoft Bing query-document pairs and is available on
HuggingFace.
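
The merge-then-rerank step can be sketched as follows. This is a simplified
illustration under stated assumptions: `merge_and_rerank` and `overlap_score`
are hypothetical names, and the word-overlap scorer is only a stand-in for a
cross-encoder, which would score each (query, chunk) pair with a trained model.

```python
def merge_and_rerank(query, lexical_hits, semantic_hits, score_pair, top_k=3):
    # Merge both result lists and drop duplicates, keeping first-seen order.
    candidates = list(dict.fromkeys(lexical_hits + semantic_hits))
    # Score each (query, chunk) pair and keep the best-scoring chunks.
    ranked = sorted(
        candidates, key=lambda chunk: score_pair(query, chunk), reverse=True
    )
    return ranked[:top_k]


def overlap_score(query, chunk):
    # Stand-in scorer (assumption): fraction of query words found in the chunk.
    query_words = set(query.lower().split())
    return len(query_words & set(chunk.lower().split())) / len(query_words)


lexical_hits = ["fit the estimator", "predict with the estimator"]
semantic_hits = ["predict with the estimator", "train a model on data"]
top = merge_and_rerank(
    "how to predict with an estimator",
    lexical_hits,
    semantic_hits,
    overlap_score,
    top_k=2,
)
print(top)
```

Passing the scorer as an argument keeps the merging logic independent of the
model used for reranking.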

Prompting
=========

Prompting for API documentation
-------------------------------

:class:`~ragger_duck.prompt.BasicPromptingStrategy` implements a prompting
strategy to answer documentation questions. We get context by reranking the
search results from lexical and semantical retrievers. Once the context is
retrieved, we ask a Large Language Model (LLM) to answer the question.
13 changes: 13 additions & 0 deletions _sources/user_guide/large_language_model.rst.txt
@@ -0,0 +1,13 @@
.. _large_language_model:

=========
Prompting
=========

Prompting for API documentation
===============================

:class:`~ragger_duck.prompt.BasicPromptingStrategy` implements a prompting
strategy to answer documentation questions. We get context by reranking the
search results from lexical and semantical retrievers. Once the context is
retrieved, we ask a Large Language Model (LLM) to answer the question.
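
A minimal sketch of how retrieved context and a question could be assembled
into a single prompt is given below. The `build_prompt` function and the prompt
wording are assumptions made for illustration; they do not reproduce the
template used by the library.

```python
def build_prompt(query, context_chunks):
    # Number each chunk so that the answer can point back to its source.
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What does fit return?",
    ["fit returns the estimator itself.", "predict returns labels."],
)
print(prompt)
```

The resulting string would then be sent to the LLM, which is expected to
complete the text after ``Answer:``.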
27 changes: 27 additions & 0 deletions _sources/user_guide/text_scraping.rst.txt
@@ -0,0 +1,27 @@
.. _text_scraping:

=============
Text Scraping
=============

The scraping module provides simple estimators that extract meaningful
documentation from the documentation website.

API documentation
=================

:class:`~ragger_duck.scraping.APINumPyDocExtractor` is a more advanced scraper
that uses `numpydoc` and its scraper to extract the documentation. The
`numpydoc` scraper parses the different sections, and we build meaningful
chunks of documentation from the parsed sections. While we don't control
the chunk size, the chunks are built such that each one contains information
about a single parameter and always refers to the class or function. We hope
that scraping in this way removes the ambiguity that could arise when building
chunks without any control.
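
The chunking principle described above, one chunk per parameter that always
names the class or function it belongs to, can be sketched as follows. The
input dictionary stands in for sections that have already been parsed; the
`build_parameter_chunks` function, the `example_pkg.Scaler` name, and the chunk
wording are hypothetical and chosen only for this example.

```python
def build_parameter_chunks(object_name, parameters):
    """Build one self-contained chunk per parameter of `object_name`."""
    chunks = []
    for name, (type_, description) in parameters.items():
        # Repeat the object name in every chunk so it is never ambiguous.
        chunks.append(
            f"Parameter {name} of {object_name}.\n"
            f"{name} is of type: {type_}.\n"
            f"{name} is described as: {description}"
        )
    return chunks


# Hypothetical parsed "Parameters" section: name -> (type, description).
parameters = {
    "copy": ("bool, default=True", "If False, try to avoid a copy."),
    "with_mean": ("bool, default=True", "If True, center the data."),
}
chunks = build_parameter_chunks("example_pkg.Scaler", parameters)
for chunk in chunks:
    print(chunk, end="\n\n")
```

Because each chunk repeats the owning object, a retrieved chunk about `copy`
cannot be confused with a `copy` parameter of another class.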

User Guide documentation
========================

:class:`~ragger_duck.scraping.UserGuideDocExtractor` is a scraper that extracts
documentation from the user guide. It is a simple scraper that extracts
text information from the webpage. Additionally, this text can be chunked.
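
One simple way to chunk extracted text is a fixed-size window with some
overlap between consecutive chunks, sketched below. The `chunk_text` function
and its `chunk_size` and `overlap` parameters are assumptions for this
example and may differ from the options exposed by the extractor.

```python
def chunk_text(text, chunk_size=100, overlap=20):
    """Split `text` into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


text = "abcdefghijklmnopqrstuvwxy"
chunks = chunk_text(text, chunk_size=10, overlap=2)
print(chunks)
```

The overlap keeps a sentence that straddles a chunk boundary at least partly
visible in both neighbouring chunks.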
4 changes: 2 additions & 2 deletions about.html
@@ -146,7 +146,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -293,7 +293,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
4 changes: 2 additions & 2 deletions auto_examples/index.html
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../user_guide.html">
<a class="nav-link nav-internal" href="../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../user_guide.html">
<a class="nav-link nav-internal" href="../user_guide/index.html">
User Guide
</a>
</li>
4 changes: 2 additions & 2 deletions genindex.html
@@ -144,7 +144,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -289,7 +289,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
6 changes: 3 additions & 3 deletions index.html
@@ -146,7 +146,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -295,7 +295,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -430,7 +430,7 @@ <h5 class="sd-card-title">Getting started</h5>
<h5 class="sd-card-title">User guide</h5>
<p class="sd-card-text">The user guide provides in-depth information on the
key concepts of the RAG with useful background information and explanation.</p><div class="custom-button docutils container">
<p><a class="reference internal" href="user_guide.html#user-guide"><span class="std std-ref">To the user guide</span></a></p>
<p><a class="reference internal" href="user_guide/index.html#user-guide"><span class="std std-ref">To the user guide</span></a></p>
</div>
</div>
</div>
10 changes: 5 additions & 5 deletions install.html
@@ -45,7 +45,7 @@
<link rel="author" title="About these documents" href="about.html" />
<link rel="index" title="Index" href="genindex.html" />
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="User Guide" href="user_guide.html" />
<link rel="next" title="User Guide" href="user_guide/index.html" />
<link rel="prev" title="Ragger Duck documentation" href="index.html" />
<meta name="viewport" content="width=device-width, initial-scale=1"/>
<meta name="docsearch:language" content="en"/>
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -479,7 +479,7 @@ <h3>Build the scikit-learn documentation<a class="headerlink" href="#build-the-s
<h3>Train the semantic and lexical retrievers<a class="headerlink" href="#train-the-semantic-and-lexical-retrievers" title="Link to this heading">#</a></h3>
<p>We need to train a set of lexical and semantic retrievers on the API documentation,
the user guide, and the gallery of examples. We will have different retrievers
for each of these types of documentation. You can refer to the <a class="reference internal" href="user_guide.html#user-guide"><span class="std std-ref">User Guide</span></a> for more
for each of these types of documentation. You can refer to the <a class="reference internal" href="user_guide/index.html#user-guide"><span class="std std-ref">User Guide</span></a> for more
details on the strategy used to train the retrievers.</p>
<p>You can launch the training of the retrievers by running the following command:</p>
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span><span class="n">pixi</span> <span class="n">run</span> <span class="o">--</span><span class="n">frozen</span> <span class="n">train</span><span class="o">-</span><span class="n">retrievers</span>
@@ -536,7 +536,7 @@ <h3>Launch the Web Console<a class="headerlink" href="#launch-the-web-console" t
</div>
</a>
<a class="right-next"
href="user_guide.html"
href="user_guide/index.html"
title="next page">
<div class="prev-next-info">
<p class="prev-next-subtitle">next</p>
Binary file modified objects.inv
Binary file not shown.
4 changes: 2 additions & 2 deletions py-modindex.html
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
@@ -292,7 +292,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="user_guide.html">
<a class="nav-link nav-internal" href="user_guide/index.html">
User Guide
</a>
</li>
4 changes: 2 additions & 2 deletions references/embedding.html
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../user_guide.html">
<a class="nav-link nav-internal" href="../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../user_guide.html">
<a class="nav-link nav-internal" href="../user_guide/index.html">
User Guide
</a>
</li>
Original file line number Diff line number Diff line change
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
Original file line number Diff line number Diff line change
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
4 changes: 2 additions & 2 deletions references/generated/ragger_duck.retrieval.BM25Retriever.html
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
Original file line number Diff line number Diff line change
@@ -147,7 +147,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>
@@ -294,7 +294,7 @@


<li class="nav-item">
<a class="nav-link nav-internal" href="../../user_guide.html">
<a class="nav-link nav-internal" href="../../user_guide/index.html">
User Guide
</a>
</li>