This repository contains the files and instructions needed to deploy a GPT-2 language model on Kubernetes using Helm. The setup creates a Kubernetes Deployment, Service, and Ingress to expose the model for inference, and packaging everything as a Helm chart keeps the application easy to manage and scale.
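The repository's server code is not reproduced here, but the request/response contract of the inference endpoint (a POST to `/generate-text`, as shown in the curl example below) can be sketched with the Python standard library alone. The handler below stubs out the model call — the real container would invoke the GPT-2 pipeline instead — so names like `generate_stub` and the `generated_text` response field are illustrative assumptions, not the repo's actual code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_stub(prompt: str, max_length: int) -> str:
    # Placeholder for the real GPT-2 call (e.g. a text-generation pipeline);
    # the sketch just echoes the prompt so it stays dependency-free.
    return (prompt + " ...")[:max_length]

class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate-text":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        text = generate_stub(body["prompt"], int(body.get("max_length", 50)))
        payload = json.dumps({"generated_text": text}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

Inside the container, a server like this would be started with something along the lines of `HTTPServer(("0.0.0.0", 8000), GenerateHandler).serve_forever()`.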
- Docker image: the GPT-2 model is packaged in a Docker container for easy deployment.
- Helm chart: a Helm chart (`llm-inference-svc`) is provided to manage the Kubernetes resources.

Prerequisites:
- Kubernetes cluster (Minikube, EKS, GKE, AKS, etc.)
- Helm
- Docker
- kubectl
- Clone the repository:

  ```sh
  git clone [email protected]:dawood9598/gpt-k8s-inference.git
  cd gpt-k8s-inference
  ```
- Deploy with Helm: install or upgrade the Helm chart to deploy the GPT-2 model on your Kubernetes cluster.

  ```sh
  helm upgrade --install llm-inference-svc chart/
  ```
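For scripted deployments (in CI, for example), the Helm step above can be wrapped in a small helper that shells out to the same command. The helper and its default arguments are illustrative, not part of the repository:

```python
import subprocess

def helm_upgrade_cmd(release, chart_path, namespace=None):
    # Same command as the manual step: --install makes the call idempotent,
    # installing on the first run and upgrading on later runs.
    cmd = ["helm", "upgrade", "--install", release, chart_path]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd

def deploy(release="llm-inference-svc", chart_path="chart/", namespace=None):
    # Requires helm on PATH and a kubeconfig pointing at your cluster.
    subprocess.run(helm_upgrade_cmd(release, chart_path, namespace), check=True)
```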
- Access the service: replace `<external-ip>` with the address of your ingress (for example, from `kubectl get ingress`).

  ```sh
  curl -X POST "http://<external-ip>/generate-text" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Once upon a time", "max_length": 50}'
  ```
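The same request can be made from Python. This client mirrors the curl payload above; the shape of the JSON the server returns depends on the actual implementation, so the return value is treated as opaque parsed JSON:

```python
import json
import urllib.request

def generate_text(base_url, prompt, max_length=50):
    """POST a prompt to the service's /generate-text endpoint.

    The payload mirrors the curl example; base_url is e.g.
    "http://<external-ip>" once the ingress is reachable.
    """
    data = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    req = urllib.request.Request(
        f"{base_url}/generate-text",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```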