FastAPI implementation of BAAI/bge-m3 encoder, containerized for scalable Kubernetes deployment.
This project provides a high-performance API for generating sentence embeddings using the BAAI/bge-m3 model. It's built with FastAPI for efficient handling of requests and containerized for easy deployment and scaling in Kubernetes environments.
- Fast and efficient sentence encoding using BAAI/bge-m3 model
- RESTful API built with FastAPI
- Docker containerization for consistent environments
- Kubernetes deployment ready
- Scalable architecture suitable for high-load environments
- Health check endpoints for Kubernetes probes
- Python 3.9+
- Docker
- Kubernetes cluster (for production deployment)
- Clone the repository:

  ```bash
  git clone git@github.com:jeroenherczeg/sentence-encoder-bge-m3.git
  cd sentence-encoder-bge-m3
  ```

- Create a virtual environment and install the dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate
  pip install -r requirements.txt
  ```

- Run the FastAPI server (a sketch of what `main.py` might contain follows these steps):

  ```bash
  uvicorn main:app --reload
  ```

- Access the API at http://localhost:8000 and the interactive docs at http://localhost:8000/docs.
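For orientation, here is a minimal sketch of what `main.py` could look like. This is a hypothetical reconstruction, not the project's actual code: it assumes the model is loaded via `sentence-transformers` and that the request and response fields match the `/encode` contract documented below.

```python
# Hypothetical sketch of main.py, not the project's actual implementation.
from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
# Load the model once at startup so all requests share a single instance.
model = SentenceTransformer("BAAI/bge-m3")

class EncodeRequest(BaseModel):
    sentences: list[str]

@app.post("/encode")
def encode(request: EncodeRequest):
    # encode() returns a NumPy array; convert to lists for JSON serialization.
    embeddings = model.encode(request.sentences)
    return {"encodings": embeddings.tolist()}

@app.get("/liveness")
def liveness():
    return {"status": "alive"}

@app.get("/readiness")
def readiness():
    # Reaching this handler implies the model loaded at startup,
    # so the encoder is ready to serve traffic.
    return {"status": "ready"}
```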
- Build the Docker image:

  ```bash
  docker build -t sentence-encoder-bge-m3:latest .
  ```

- Run the container:

  ```bash
  docker run -p 8000:8000 sentence-encoder-bge-m3:latest
  ```
- Apply the Kubernetes manifests:

  ```bash
  kubectl apply -f kubernetes/
  ```

- Access the service (the method depends on your Kubernetes setup).
Endpoint: `POST /encode`

```bash
curl -X POST "http://localhost:8000/encode" \
  -H "Content-Type: application/json" \
  -d '{"sentences": ["Hello, world!", "This is a test sentence."]}'
```

Request Body:

```json
{
  "sentences": ["Hello, world!", "Another sentence to encode."]
}
```

Response:

```json
{
  "encodings": [
    [0.1, 0.2, 0.3, ...],
    [0.4, 0.5, 0.6, ...]
  ]
}
```
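The endpoint can also be called from Python. Below is a small sketch using the `requests` library; the dimension noted in the comment assumes bge-m3's default 1024-dimensional dense vectors.

```python
import requests

response = requests.post(
    "http://localhost:8000/encode",
    json={"sentences": ["Hello, world!", "Another sentence to encode."]},
)
response.raise_for_status()

encodings = response.json()["encodings"]
print(len(encodings))     # 2, one vector per input sentence
print(len(encodings[0]))  # embedding dimension (1024 for bge-m3 dense vectors)
```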
Endpoint: `GET /readiness`

```bash
curl http://localhost:8000/readiness
```

Endpoint: `GET /liveness`

```bash
curl http://localhost:8000/liveness
```
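These probe endpoints are also handy outside Kubernetes, for example in smoke tests. Here is a hypothetical helper that polls `/readiness` until the service reports ready, assuming it returns HTTP 200 once the model is loaded:

```python
import time
import requests

def wait_until_ready(base_url="http://localhost:8000", timeout=120.0):
    """Poll /readiness until the encoder responds with HTTP 200."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if requests.get(f"{base_url}/readiness", timeout=5).status_code == 200:
                return
        except requests.ConnectionError:
            pass  # server not accepting connections yet
        time.sleep(2)
    raise TimeoutError("encoder did not become ready within the timeout")
```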
This API uses the BAAI/bge-m3 model, which is a state-of-the-art sentence embedding model. It's designed to generate high-quality vector representations of sentences that capture semantic meaning, making it ideal for various natural language processing tasks such as semantic search, text classification, and similarity comparison.
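To illustrate the similarity-comparison use case, the sketch below computes cosine similarity between two embeddings returned by `/encode`. The example sentences are hypothetical, and the snippet assumes the response format shown above.

```python
import numpy as np
import requests

response = requests.post(
    "http://localhost:8000/encode",
    json={"sentences": [
        "How do I reset my password?",
        "Steps to recover a forgotten password",
    ]},
)
a, b = (np.asarray(v) for v in response.json()["encodings"])

# Cosine similarity: values near 1.0 indicate semantically similar sentences.
similarity = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {similarity:.3f}")
```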
The BAAI/bge-m3 model offers a good balance between performance and accuracy. In our testing, it processes approximately 457 sentences per second on a standard CPU. For production environments, we recommend GPU acceleration for higher throughput.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See `LICENSE` for more information.
- BAAI for the bge-m3 model
- FastAPI for the web framework
- Sentence Transformers for the embedding framework
- Docker for containerization
- Kubernetes for orchestration