BioChat is a conversational interface for biological databases that combines large language models with specialized biological data sources. It lets researchers and other professionals query multiple biological databases in natural language.
BioChat integrates with multiple biological databases to provide comprehensive analysis capabilities:
- NCBI PubMed (E-utilities)
  - Literature search and analysis
  - Citation information
  - Article abstracts
- Open Targets Platform
  - Target-disease associations
  - Drug information
  - Clinical trials data
  - Safety information
  - Expression data
- Reactome
  - Biological pathways
  - Molecular mechanisms
  - Disease pathways
- Ensembl
  - Genetic variants
  - Genomic annotations
  - Gene information
- STRING-DB
  - Protein-protein interactions
  - Interaction networks
  - Functional associations
- IntAct
  - Molecular interaction data
  - Experimentally verified interactions
  - Interaction networks
- BioGRID
  - Protein-protein interactions
  - Genetic interactions
  - Chemical associations
- GWAS Catalog
  - Genetic associations
  - Trait information
  - Study metadata
- PharmGKB
  - Drug-gene relationships
  - Clinical annotations
  - Pharmacogenomic data
- UniProt
  - Protein sequences
  - Protein structure
  - Function annotations
- BioCyc
  - Metabolic pathways
  - Biochemical reactions
  - Regulatory networks
Each database provides specific capabilities:
- Literature mining and evidence synthesis (PubMed)
- Drug-target interactions and clinical relevance (Open Targets, PharmGKB)
- Molecular interaction networks (STRING-DB, IntAct, BioGRID)
- Pathway analysis and mechanisms (Reactome, BioCyc)
- Genetic variation and genomic features (Ensembl)
- Disease associations and pharmacogenomics (GWAS Catalog, PharmGKB)
- Protein information and annotations (UniProt)
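The grouping above can be written down as a small lookup table. The sketch below is illustrative only: the capability keys and the `databases_for` helper are hypothetical names, and the source does not show how BioChat itself routes a query to a database.

```python
# Hypothetical capability keys, taken from the grouping listed above.
# This is not BioChat's actual routing logic.
CAPABILITIES = {
    "literature": ["PubMed"],
    "drug_target": ["Open Targets", "PharmGKB"],
    "interactions": ["STRING-DB", "IntAct", "BioGRID"],
    "pathways": ["Reactome", "BioCyc"],
    "genomics": ["Ensembl"],
    "disease_associations": ["GWAS Catalog", "PharmGKB"],
    "proteins": ["UniProt"],
}


def databases_for(*capabilities):
    """Return the databases covering the given capabilities.

    Duplicates are dropped and first-seen order is preserved.
    """
    seen = []
    for capability in capabilities:
        for database in CAPABILITIES[capability]:
            if database not in seen:
                seen.append(database)
    return seen


print(databases_for("drug_target", "disease_associations"))
# ['Open Targets', 'PharmGKB', 'GWAS Catalog']
```

PharmGKB appears under two capabilities, so the helper de-duplicates rather than listing it twice.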
biochat/
├── LICENSE
├── README.md
├── commitss.sh
├── config.py
├── htmlcov
│   ├── status.json
│   └── style_cb_8e611ae1.css
├── requirements.txt
├── run_tests.sh
├── setup.py
├── src
│   ├── APIHub.py
│   ├── __init__.py
│   ├── __pycache__
│   ├── api.py
│   ├── orchestrator.py
│   ├── schemas.py
│   └── tool_executor.py
└── tests
    ├── __init__.py
    ├── __pycache__
    ├── conftest.py
    ├── integration
    │   ├── __init__.py
    │   ├── __pycache__
    │   ├── conftest.py
    │   ├── test_gwas.py
    │   ├── test_literature.py
    │   ├── test_protein.py
    │   └── test_variants.py
    ├── logs
    ├── pytest.ini
    ├── test_api.py
    ├── test_logger.py
    ├── test_orchestrator.py
    ├── test_requirements.txt
    ├── test_tool_executor.py
    └── utils
        ├── __init__.py
        ├── __pycache__
        └── logger.py
- Python 3.9 or higher
- OpenAI API key
- NCBI API key
- Valid email address for API access
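The NCBI API key and email address correspond to the `api_key` and `email` parameters that NCBI E-utilities accept on each request. As a rough sketch of what a PubMed search URL looks like (the `build_esearch_url` helper is a hypothetical name, and how BioChat builds its own requests is not shown in the source):

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(term, api_key, email, retmax=5):
    """Build an ESearch URL for PubMed (hypothetical helper, illustration only)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the search expression
        "retmode": "json",   # ask for a JSON response
        "retmax": retmax,    # maximum number of IDs to return
        "api_key": api_key,  # NCBI API key from the prerequisites
        "email": email,      # contact email from the prerequisites
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


url = build_esearch_url("cystic fibrosis CFTR", "your_ncbi_api_key", "you@example.com")
print(url)
```

The function only builds the URL, so it can be inspected without network access or a real key.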
- Clone the repository:
git clone https://github.com/yourusername/biochat.git
cd biochat
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
Create a .env file in the project root with the following variables:
OPENAI_API_KEY=your_openai_api_key
NCBI_API_KEY=your_ncbi_api_key
CONTACT_EMAIL=your_email@example.com
PORT=8000
HOST=0.0.0.0
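One way an application could read and validate these variables at startup is sketched below. It assumes the .env values have already been loaded into the process environment (for example by a package such as python-dotenv). The `load_settings` helper, and the choice to treat the three credentials as required and HOST and PORT as optional, are assumptions for illustration, not BioChat's actual config.py.

```python
import os

# Assumed to be mandatory; the source does not say which variables are optional.
REQUIRED = ("OPENAI_API_KEY", "NCBI_API_KEY", "CONTACT_EMAIL")


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ) and validate them."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {
        "openai_api_key": env["OPENAI_API_KEY"],
        "ncbi_api_key": env["NCBI_API_KEY"],
        "contact_email": env["CONTACT_EMAIL"],
        # Fallbacks mirror the example values shown above.
        "host": env.get("HOST", "0.0.0.0"),
        "port": int(env.get("PORT", "8000")),
    }


# Example with an explicit mapping instead of the real environment.
settings = load_settings({
    "OPENAI_API_KEY": "your_openai_api_key",
    "NCBI_API_KEY": "your_ncbi_api_key",
    "CONTACT_EMAIL": "your_email@example.com",
    "PORT": "9000",
})
print(settings["host"], settings["port"])
```

Failing fast on a missing key gives a clearer error at startup than a failed upstream API call later.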
Start the FastAPI server from the project root:
python -m src.api
The API will be available at http://localhost:8000. Access the interactive API documentation at http://localhost:8000/docs.
The following endpoints are available:
- POST /query: Process a natural language query about biological topics
- GET /history: Retrieve the conversation history
- POST /clear: Clear the conversation history
- GET /health: Check the API's health status
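The four endpoints can be exercised with the standard library alone. The sketch below only constructs the requests, so it runs without a server. The `build_request` helper is a hypothetical name, and sending /clear without a body is an assumption, since the source does not document the request schemas beyond the `text` field of /query.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"


def build_request(method, path, payload=None):
    """Build a urllib Request for a BioChat endpoint (hypothetical helper)."""
    data = None
    headers = {"Accept": "application/json"}
    if payload is not None:
        # JSON-encode the body and declare its content type.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


query = build_request("POST", "/query", {"text": "Which pathways involve TP53?"})
history = build_request("GET", "/history")
clear = build_request("POST", "/clear")  # assumed to need no request body
health = build_request("GET", "/health")

for req in (query, history, clear, health):
    print(req.get_method(), req.full_url)
```

To send one of these against a running server, pass it to `urllib.request.urlopen` and decode the response body as JSON.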
Here's an example of how to use the API:
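import requests

# Send a natural language query to the local BioChat server
response = requests.post(
    "http://localhost:8000/query",
    json={"text": "What are the known genetic variants associated with cystic fibrosis?"},
)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
print(response.json())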
When developing new features:
- Place all source code in the src directory
- Add test files in the tests directory
- Follow the package structure for imports:
from src.schemas import LiteratureSearchParams
from src.tool_executor import ToolExecutor
- Update requirements.txt when adding new dependencies
- Maintain consistent error handling and logging practices
- Follow the established code style and documentation standards
The API implements comprehensive error handling:
- Input validation errors return 422 status code
- Authentication errors return 401 status code
- Server errors return 500 status code
- All errors include detailed error messages and timestamps
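On the client side, the status codes listed above can be mapped to a handling decision. This is a minimal sketch: the category names and the `classify_error` and `should_retry` helpers are hypothetical, and treating every 5xx code as a server error goes slightly beyond the single 500 code the list mentions.

```python
def classify_error(status_code):
    """Map an HTTP status code to a coarse category (illustrative names)."""
    if status_code == 422:
        return "validation"      # request body failed input validation
    if status_code == 401:
        return "authentication"  # missing or invalid credentials
    if status_code >= 500:
        return "server"          # failure on the API side
    if status_code >= 400:
        return "client"          # any other client-side error
    return "ok"


def should_retry(status_code):
    """Retry only server errors; validation and auth errors need a fixed request."""
    return classify_error(status_code) == "server"


for code in (200, 401, 422, 500):
    print(code, classify_error(code), should_retry(code))
```

Retrying a 422 or 401 response unchanged would fail the same way again, so only the server category is marked as retryable here.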
To run the test suite:
pytest tests/
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the GNU Affero General Public License (AGPL). See the LICENSE file for details.
For support, please open an issue in the GitHub repository or contact the maintainers.