Biomedical RAG Application

A Retrieval-Augmented Generation (RAG) implementation for medical-domain queries, built on BioMistral-7B and other open-source components. The application retrieves relevant medical context and generates grounded responses while keeping all processing local and private.

Features

  • Fully Local Processing: All computations run on-premise without external API calls
  • Domain-Specific Models: Uses medical-specialized language and embedding models
  • Self-hosted Vector Database: Scalable vector storage using Qdrant
  • Interactive Web Interface: Clean UI for easy interaction with the system
  • Document Source Tracking: Provides source context for all generated responses

Architecture

  • LLM: BioMistral-7B (medical domain-specific model)
  • Embeddings: PubMedBERT-based embeddings (medical domain-specific)
  • Vector Database: Qdrant (self-hosted)
  • Framework: LangChain + LlamaCpp
  • API: FastAPI
  • Frontend: HTML/JavaScript with Bootstrap
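
The snippet below is a minimal sketch of how these components fit together, assuming the langchain_community integrations; the embedding model name (NeuML/pubmedbert-base-embeddings) is an assumption and stands in for any 768-dimensional PubMedBERT variant:

from langchain_community.llms import LlamaCpp
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain.chains import RetrievalQA
from qdrant_client import QdrantClient

# Medical-domain embeddings (assumption: a 768-dim PubMedBERT variant)
embeddings = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")

# Self-hosted Qdrant instance started via Docker (see Installation)
client = QdrantClient(url="http://localhost:6333")
vectorstore = Qdrant(client=client, collection_name="vector_db", embeddings=embeddings)

# Local BioMistral-7B via LlamaCpp (generation settings under Technical Details)
llm = LlamaCpp(model_path="biomistral-7b.Q4_K_M.gguf")

# Retrieval chain; return_source_documents enables document source tracking
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
)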

Prerequisites

  • Python 3.8+
  • Docker
  • 16GB+ RAM
  • CPU with AVX2 support (for LlamaCpp)

Installation

  1. Clone the repository:
git clone https://github.com/rudra-singh1/ragChatbot
cd ragChatbot
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: .\venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Download the 4-bit quantized BioMistral-7B model in GGUF format and place it in the project root (the default configuration expects biomistral-7b.Q4_K_M.gguf).
  5. Start Qdrant:
docker pull qdrant/qdrant
docker run -p 6333:6333 qdrant/qdrant

Configuration

  1. Place your medical documents in the data/ directory (supports PDF, TXT)
  2. Update the model path in app.py if needed:
LOCAL_LLM_PATH = "biomistral-7b.Q4_K_M.gguf"

Usage

  1. First, ingest your documents to create vectors:
python ingest.py
  2. Start the application:
uvicorn app:app --reload
  3. Access the web interface at http://localhost:8000

API Endpoints

  • GET /: Main web interface
  • POST /get_response: Query endpoint
    • Input: {"query": "your medical question here"}
    • Returns: {"answer": "response", "context": "source context", "source": "document name"}
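
As a sketch, a query can be issued from Python with the requests library; the question text is illustrative and the JSON shapes follow the description above:

import requests

# POST a medical question to the locally running FastAPI server
resp = requests.post(
    "http://localhost:8000/get_response",
    json={"query": "What are the first-line treatments for hypertension?"},
)
print(resp.json())  # {"answer": ..., "context": ..., "source": ...}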

Technical Details

Vector Database Configuration

  • Collection Name: vector_db
  • Vector Dimension: 768 (PubMedBERT embedding size)
  • Distance Metric: Cosine Similarity
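
A sketch of the ingestion-time collection setup matching these values, assuming the qdrant_client package:

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url="http://localhost:6333")
client.recreate_collection(
    collection_name="vector_db",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),  # PubMedBERT dimension, cosine metric
)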

LLM Configuration

  • Temperature: 0.1
  • Max Tokens: 2048
  • Model: BioMistral-7B (4-bit quantized)
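
These settings map directly onto LangChain's LlamaCpp wrapper; in this sketch the context-window value is an assumption, since the repository does not state it:

from langchain_community.llms import LlamaCpp

llm = LlamaCpp(
    model_path="biomistral-7b.Q4_K_M.gguf",  # 4-bit quantized GGUF build
    temperature=0.1,   # near-deterministic output for medical answers
    max_tokens=2048,   # generation cap
    n_ctx=2048,        # assumption: context window size
)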

Document Processing

  • Chunk Size: 700 tokens
  • Chunk Overlap: 70 tokens
  • Top-k retrieval: 2 documents
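
A sketch of the corresponding chunking and retrieval calls, assuming LangChain's RecursiveCharacterTextSplitter (which counts characters by default, so the token figures above are approximate) and an illustrative DirectoryLoader over data/:

from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

documents = DirectoryLoader("data/").load()  # PDF/TXT sources from the data directory
splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=70)
chunks = splitter.split_documents(documents)

# vectorstore as constructed in the Architecture sketch above
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})  # top-2 chunks per query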

Performance Considerations

  • Average response time: 30-40 seconds on CPU
  • RAM usage: ~8GB during operation
  • Storage: Depends on document volume (vectors typically 20% of raw text size)

Limitations

  • CPU-only implementation (can be extended to GPU)
  • No chat memory/history (stateless queries)
  • Response time dependent on CPU capabilities
  • Limited to medical domain queries

Future Improvements

  • Add chat memory for contextual conversations
  • Implement streaming responses
  • Add GPU support
  • Improve document preview functionality
  • Support more medical document formats
  • Implement authentication
