# Runpod Serverless GGUF

Builds a Docker image for RunPod serverless workers based on GGUF models from Hugging Face.

Currently, only models stored as a single `.gguf` file are supported (Hugging Face limits individual files to 50 GB, so larger models are split across several files).

## Building

The easiest way to build is the Python script, which prompts for the parameters interactively:

```console
$ python build.py
Enter MODEL (default: TheBloke/CodeLlama-34B-Instruct-GGUF):
Enter QUANTITIZATION (default: Q5_K_S):
Enter IS_70B (yes/no, default: no):
Enter HF_TOKEN:
Enter REPOSITORY_NAME (default: myuser/my-test-image):
Push Image to repository (yes/no, default: no):
Run docker as sudo (yes/no, default: no):
Building Image: myuser/my-test-image:thebloke-codellama-34b-instruct-gguf-Q5_K_S-CUBLAS
```

Alternatively, build it yourself:

```bash
export DOCKER_BUILDKIT=1 # Important: activates BuildKit
export MODEL=TheBloke/CodeLlama-34B-Instruct-GGUF # Hugging Face repository; make sure it contains .gguf files
export QUANTITIZATION=Q5_K_S # Must match the quantization suffix in the filename; this downloads codellama-34b-instruct.Q5_K_S.gguf
export HF_TOKEN=your_hugging_face_token_here

docker build -t runpod-images:dev . \
  --platform linux/amd64 \
  --build-arg HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
  --build-arg MODEL_NAME=$MODEL \
  --build-arg QUANTITIZATION=$QUANTITIZATION
```

If the model is a 70B-parameter model, the following build arg must be added:

```bash
--build-arg IS_70B=True
```
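For example, a complete build for a 70B model might look like this: a minimal sketch, assuming a hypothetical repository and quantization (substitute your own values):

```bash
# Example values only: use a real 70B GGUF repository and a matching quantization suffix.
export MODEL=TheBloke/Llama-2-70B-Chat-GGUF
export QUANTITIZATION=Q4_K_M
export HF_TOKEN=your_hugging_face_token_here

docker build -t runpod-images:dev . \
  --platform linux/amd64 \
  --build-arg HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
  --build-arg MODEL_NAME=$MODEL \
  --build-arg QUANTITIZATION=$QUANTITIZATION \
  --build-arg IS_70B=True
```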

## Testing Image

Create a file `test_input.json` with input values for the model:

```json
{
  "input": {
    "prompt": "Q: What is Python?\nA:",
    "stop": ["Q:", "\n"],
    "max_tokens": 50
  }
}
```

To run the image with the input:

```bash
docker run -v $(pwd)/test_input.json:/test_input.json runpod-images:dev
```
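Once the image behaves as expected, it has to be pushed to a registry RunPod can pull from (the Python script offers this step too). A sketch, assuming the hypothetical repository name from the `build.py` example:

```bash
# "myuser/my-test-image" is a placeholder; use your own registry/repository.
docker tag runpod-images:dev myuser/my-test-image:dev
docker push myuser/my-test-image:dev
```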

## Usage

The following environment variables are available (see the example below):

- `N_CTX`: Context size.
- `NUM_GPU_SHARD`: Number of GPU shards.
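For a local run, these can be passed with Docker's `-e` flag. A minimal sketch; the values here are illustrative, not defaults:

```bash
# Example values: a 4096-token context, sharded across 2 GPUs.
docker run \
  -e N_CTX=4096 \
  -e NUM_GPU_SHARD=2 \
  -v $(pwd)/test_input.json:/test_input.json \
  runpod-images:dev
```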

The `input` key of the POST request body should have the same structure as the `input` key in the test file:

```json
{
  "prompt": "Q: What is Python?\nA:",
  "stop": ["Q:", "\n"],
  "max_tokens": 50
}
```