This repository contains the files and instructions needed to deploy a GPT-2 language model on Kubernetes using Helm. The setup creates a Kubernetes Deployment, Service, and Ingress to expose the model for inference, and packaging everything as a Helm chart keeps the application easy to manage and scale.
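The repository's server code is not reproduced here, but the request/response contract of the inference endpoint (a POST to `/generate-text`, as shown in the curl example below) can be sketched with the Python standard library alone. The handler below stubs out the model call — the real container would invoke the GPT-2 pipeline instead — so names like `generate_stub` and the `generated_text` response field are illustrative assumptions, not the repo's actual code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def generate_stub(prompt: str, max_length: int) -> str:
    # Placeholder for the real GPT-2 call (e.g. a text-generation pipeline);
    # the sketch just echoes the prompt so it stays dependency-free.
    return (prompt + " ...")[:max_length]

class GenerateHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/generate-text":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length))
        text = generate_stub(body["prompt"], int(body.get("max_length", 50)))
        payload = json.dumps({"generated_text": text}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the sketch quiet
```

Inside the container, a server like this would be started with something along the lines of `HTTPServer(("0.0.0.0", 8000), GenerateHandler).serve_forever()`.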
- Docker image: the GPT-2 model is packaged in a Docker container for easy deployment.
- Helm chart: a Helm chart (`llm-inference-svc`) is provided to manage the Kubernetes resources.

Prerequisites:
- Kubernetes cluster (Minikube, EKS, GKE, AKS, etc.)
- Helm
- Docker
- kubectl
- Clone the repository:

  ```sh
  git clone [email protected]:dawood9598/gpt-k8s-inference.git
  cd gpt-k8s-inference
  ```
- Deploy with Helm: install or upgrade the Helm chart to deploy the GPT-2 model on your Kubernetes cluster.

  ```sh
  helm upgrade --install llm-inference-svc chart/
  ```
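For scripted deployments (in CI, for example), the Helm step above can be wrapped in a small helper that shells out to the same command. The helper and its default arguments are illustrative, not part of the repository:

```python
import subprocess

def helm_upgrade_cmd(release, chart_path, namespace=None):
    # Same command as the manual step: --install makes the call idempotent,
    # installing on the first run and upgrading on later runs.
    cmd = ["helm", "upgrade", "--install", release, chart_path]
    if namespace:
        cmd += ["--namespace", namespace]
    return cmd

def deploy(release="llm-inference-svc", chart_path="chart/", namespace=None):
    # Requires helm on PATH and a kubeconfig pointing at your cluster.
    subprocess.run(helm_upgrade_cmd(release, chart_path, namespace), check=True)
```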
- Access the service: replace `<external-ip>` with the address of your ingress (for example, from `kubectl get ingress`).

  ```sh
  curl -X POST "http://<external-ip>/generate-text" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "Once upon a time", "max_length": 50}'
  ```
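The same request can be made from Python. This client mirrors the curl payload above; the shape of the JSON the server returns depends on the actual implementation, so the return value is treated as opaque parsed JSON:

```python
import json
import urllib.request

def generate_text(base_url, prompt, max_length=50):
    """POST a prompt to the service's /generate-text endpoint.

    The payload mirrors the curl example; base_url is e.g.
    "http://<external-ip>" once the ingress is reachable.
    """
    data = json.dumps({"prompt": prompt, "max_length": max_length}).encode()
    req = urllib.request.Request(
        f"{base_url}/generate-text",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```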