Retrieval-Augmented Generation: Impact on Information Retrieval Systems for SMEs

This repository contains the source code for a case study on the effects of Retrieval-Augmented Generation (RAG) on information retrieval systems tailored to small and medium-sized enterprises (SMEs), written as part of my bachelor thesis.

Getting Started

To begin examining the results yourself, follow these steps to set up the environment:

Clone the repository:

git clone https://github.com/nneubacher/Bachelorarbeit.git

Install the necessary dependencies:
```
pip install -r requirements.txt
```

Repository Contents

data/: Directory containing the Stanford Question Answering Dataset (SQuAD).
chromaDB/: Directory containing the vector store with the embeddings of SQuAD.
toChroma.py: Script for embedding and storing data from the data directory into the chromaDB vector store.
RAG.py: Evaluation script for the RAG-based information retrieval system.
noRAG.py: Evaluation script for the base model information retrieval system.
compare.ipynb: Jupyter notebook for interactive analysis and comparison.
compare.py: Python script for comparing total correct predictions at different thresholds.
predictions.json: Output from the RAG-based Information Retrieval system.
predictions_gpt.json: Output from the Information Retrieval system using just the base model.
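
For orientation, the publicly documented SQuAD JSON layout nests questions under paragraphs, which in turn sit under articles. The sketch below flattens such a structure into one record per question, the kind of preparation a script like toChroma.py might do before embedding. The function name `flatten_squad` and the inline sample are illustrative assumptions and are not taken from this repository.

```python
def flatten_squad(squad: dict) -> list[dict]:
    """Flatten a SQuAD-style dict into one record per question."""
    records = []
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                records.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Keep only the answer strings; some entries may have none.
                    "answers": [a["text"] for a in qa.get("answers", [])],
                })
    return records


# Made-up toy sample in the SQuAD layout (not real dataset content).
sample = {
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "The example company was founded in 1999.",
            "qas": [{
                "id": "q1",
                "question": "When was the example company founded?",
                "answers": [{"text": "1999", "answer_start": 35}],
            }],
        }],
    }]
}

records = flatten_squad(sample)
print(records[0]["question"], "->", records[0]["answers"])
```

Each flattened record keeps its context paragraph, so the same list could feed either an embedding step or an evaluation loop.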

Usage

After setup, run compare.py or open compare.ipynb to see how the different configurations of the information retrieval system perform on your datasets and queries.
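
As a rough illustration of what counting correct predictions at different thresholds can look like, the sketch below scores each prediction against a reference answer and counts how many reach a given threshold. It is an assumption-laden sketch, not the logic of compare.py: the token-level F1 score, the helper names `token_f1` and `count_correct`, the dictionary layout, and the toy answers are all hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a prediction and a reference (assumed metric)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def count_correct(predictions: dict, references: dict, threshold: float) -> int:
    """Count questions whose score reaches the threshold."""
    return sum(
        1
        for qid, ref in references.items()
        if token_f1(predictions.get(qid, ""), ref) >= threshold
    )


# Made-up toy data; the real prediction files may use a different layout.
references = {"q1": "Paris", "q2": "Albert Einstein", "q3": "1969"}
rag_predictions = {"q1": "Paris", "q2": "Einstein", "q3": "1969"}
base_predictions = {"q1": "Paris", "q2": "Isaac Newton", "q3": "in 1969"}

for threshold in (0.5, 0.8, 1.0):
    rag = count_correct(rag_predictions, references, threshold)
    base = count_correct(base_predictions, references, threshold)
    print(f"threshold={threshold}: RAG={rag}, base={base}")
```

Sweeping the threshold shows how sensitive the comparison is to the definition of "correct": a strict threshold rewards only exact matches, while a looser one also credits partially overlapping answers.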
