BioChat

BioChat is a conversational interface for biological databases that pairs a large language model with specialized biological data sources. Researchers and other professionals can query multiple biological databases in natural language instead of learning each database's own API.

Supported Databases and APIs

BioChat integrates with multiple biological databases to provide comprehensive analysis capabilities:

Process FlowChart

Core Databases

NCBI PubMed (E-utilities)
- Literature search and analysis
- Citation information
- Article abstracts
Open Targets Platform
- Target-disease associations
- Drug information
- Clinical trials data
- Safety information
- Expression data
Reactome
- Biological pathways
- Molecular mechanisms
- Disease pathways
Ensembl
- Genetic variants
- Genomic annotations
- Gene information

Molecular Interactions and Networks

STRING-DB
- Protein-protein interactions
- Interaction networks
- Functional associations
IntAct
- Molecular interaction data
- Experimentally verified interactions
- Interaction networks
BioGRID
- Protein-protein interactions
- Genetic interactions
- Chemical associations

Disease and Drug Resources

GWAS Catalog
- Genetic associations
- Trait information
- Study metadata
PharmGKB
- Drug-gene relationships
- Clinical annotations
- Pharmacogenomic data

Protein and Pathway Information

UniProt
- Protein sequences
- Protein structure
- Function annotations
BioCyc
- Metabolic pathways
- Biochemical reactions
- Regulatory networks

Integration Features

Each database provides specific capabilities:

- Literature mining and evidence synthesis (PubMed)
- Drug-target interactions and clinical relevance (Open Targets, PharmGKB)
- Molecular interaction networks (STRING-DB, IntAct, BioGRID)
- Pathway analysis and mechanisms (Reactome, BioCyc)
- Genetic variation and genomic features (Ensembl)
- Disease associations and pharmacogenomics (GWAS Catalog, PharmGKB)
- Protein information and annotations (UniProt)
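
The capability list can also be read in the other direction, from a database to everything it contributes. The snippet below is only an illustrative sketch: the mapping is transcribed from the list in this section, while the names `CAPABILITIES` and `capabilities_of` are hypothetical and do not describe how BioChat routes queries internally.

```python
# Capability-to-database mapping transcribed from the list in this section.
# The variable and function names are hypothetical, for illustration only.
CAPABILITIES = {
    "Literature mining and evidence synthesis": ["PubMed"],
    "Drug-target interactions and clinical relevance": ["Open Targets", "PharmGKB"],
    "Molecular interaction networks": ["STRING-DB", "IntAct", "BioGRID"],
    "Pathway analysis and mechanisms": ["Reactome", "BioCyc"],
    "Genetic variation and genomic features": ["Ensembl"],
    "Disease associations and pharmacogenomics": ["GWAS Catalog", "PharmGKB"],
    "Protein information and annotations": ["UniProt"],
}


def capabilities_of(database: str) -> list:
    """Return every capability that lists the given database."""
    return [cap for cap, dbs in CAPABILITIES.items() if database in dbs]


print(capabilities_of("PharmGKB"))
```

PharmGKB is the only database that appears under two capabilities, so it is the one case where the lookup returns more than one entry.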

Project Structure

biochat/

├── LICENSE
├── README.md
├── commitss.sh
├── config.py
├── htmlcov
│   ├── status.json
│   └── style_cb_8e611ae1.css
├── requirements.txt
├── run_tests.sh
├── setup.py
├── src
│   ├── APIHub.py
│   ├── __init__.py
│   ├── __pycache__
│   ├── api.py
│   ├── orchestrator.py
│   ├── schemas.py
│   └── tool_executor.py
└── tests
    ├── __init__.py
    ├── __pycache__
    ├── conftest.py
    ├── integration
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── conftest.py
    │   ├── test_gwas.py
    │   ├── test_literature.py
    │   ├── test_protein.py
    │   └── test_variants.py
    ├── logs
    ├── pytest.ini
    ├── test_api.py
    ├── test_logger.py
    ├── test_orchestrator.py
    ├── test_requirements.txt
    ├── test_tool_executor.py
    └── utils
        ├── __init__.py
        ├── __pycache__
        └── logger.py

Prerequisites

- Python 3.9 or higher
- OpenAI API key
- NCBI API key
- Valid email address for API access

Installation

Clone the repository:

git clone https://github.com/yourusername/biochat.git
cd biochat

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configuration

Create a .env file in the project root with the following variables:

OPENAI_API_KEY=your_openai_api_key
NCBI_API_KEY=your_ncbi_api_key
CONTACT_EMAIL=your_email@example.com
PORT=8000
HOST=0.0.0.0
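
As a rough sketch of how these variables might be checked at startup, the snippet below validates them using only the standard library. The function name `load_settings`, the choice of which keys are required, and the fallback defaults are assumptions made for illustration; the project's actual `config.py` may behave differently, for example by reading the `.env` file through a helper library.

```python
import os
from typing import Mapping, Optional

# Variable names come from the .env example in this section. Treating the
# first three as required and HOST/PORT as optional is an assumption.
REQUIRED_KEYS = ("OPENAI_API_KEY", "NCBI_API_KEY", "CONTACT_EMAIL")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a settings dict, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    # Hypothetical defaults that mirror the example values in this section.
    settings["HOST"] = env.get("HOST", "0.0.0.0")
    settings["PORT"] = int(env.get("PORT", "8000"))
    return settings
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to exercise in tests without touching the real environment.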

Running the Application

Start the FastAPI server from the project root:

python -m src.api

The API will be available at http://localhost:8000. Access the interactive API documentation at http://localhost:8000/docs.

API Endpoints

The following endpoints are available:

- POST /query: Process a natural language query about biological topics
- GET /history: Retrieve the conversation history
- POST /clear: Clear the conversation history
- GET /health: Check the API's health status

Example Usage

Here's an example of how to use the API:

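import requests

# Send a natural language query to the running BioChat server
response = requests.post(
    "http://localhost:8000/query",
    json={"text": "What are the known genetic variants associated with cystic fibrosis?"},
    timeout=60,  # queries may take a while to answer; adjust as needed
)
response.raise_for_status()  # raise an exception on 4xx/5xx responses

# Print the parsed JSON response
print(response.json())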
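
The remaining endpoints can be called in much the same way. The sketch below builds a request for each of the four endpoints using only the standard library, so it runs without extra dependencies. The helper `build_request` and the `BASE_URL` constant are hypothetical names; the `{"text": ...}` payload follows the example above, and the assumption that `/clear` needs no request body is not confirmed by this README.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # default address given in this README


def build_request(method: str, path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for a BioChat endpoint."""
    data = None
    headers = {}
    if payload is not None:
        # Encode the payload as a JSON body, as in the /query example.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


query = build_request("POST", "/query", {"text": "Which pathways involve BRCA1?"})
history = build_request("GET", "/history")
clear = build_request("POST", "/clear")  # assumed to need no request body
health = build_request("GET", "/health")

# With the server running, any of these can be sent like this:
# with urllib.request.urlopen(health, timeout=10) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```

Building the requests separately from sending them keeps the example runnable even when no server is listening.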

Development Guidelines

When developing new features:

- Place all source code in the src directory
- Add test files in the tests directory
- Follow the package structure for imports:

from src.schemas import LiteratureSearchParams
from src.tool_executor import ToolExecutor

- Update requirements.txt when adding new dependencies
- Maintain consistent error handling and logging practices
- Follow the established code style and documentation standards

Error Handling

The API implements comprehensive error handling:

- Input validation errors return a 422 status code
- Authentication errors return a 401 status code
- Server errors return a 500 status code
- All errors include a detailed error message and a timestamp
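
A client can branch on these status codes. The sketch below shows one way to do that; the function name `explain_status` and the hint strings are illustrative assumptions. Because this README does not specify the field names of the error body, the code deliberately does not parse it.

```python
# The status codes are those listed in this section; the hint strings are
# illustrative and not taken from the BioChat source.
STATUS_HINTS = {
    422: "Input validation failed: check the request body.",
    401: "Authentication failed: check the configured API keys.",
    500: "Server error: retry later or inspect the server logs.",
}


def explain_status(status_code: int) -> str:
    """Map an HTTP status code from the API to a short, human-readable hint."""
    if 200 <= status_code < 300:
        return "OK"
    return STATUS_HINTS.get(status_code, f"Unexpected status {status_code}.")
```

With the `requests` library, this could be used as `print(explain_status(response.status_code))` before attempting to read `response.json()`.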

Testing

To run the test suite:

pytest tests/

Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request

License

This project is licensed under the GNU Affero General Public License; see the LICENSE file for details.

Support

For support, open an issue in the GitHub repository (HeartBioPortal/BioChat) or contact the maintainers.
