
# Distributed Inference for LLMs 💻

Run LLMs on a distributed GPU architecture.

Generate text with distributed Llama 3, Falcon (40B+), and BLOOM (176B) (or their derivatives), and fine-tune them for your own tasks, right from your desktop computer or Google Colab:

🐧 **Linux + Anaconda.** Run these commands for NVIDIA GPUs (or follow this for AMD):

```bash
conda install pytorch pytorch-cuda=11.7 -c pytorch -c nvidia
pip install git+https://github.com/bigscience-workshop/petals
python -m petals.cli.run_server petals-team/StableBeluga2
```
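Before starting a server, it can help to confirm that the CUDA build of PyTorch actually sees your GPU; a quick check using PyTorch's standard API:

```python
import torch

# The Petals server hosts transformer blocks on your GPU, so CUDA
# should be available before you start it.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No CUDA GPU detected")
```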

🪟 **Windows + WSL.** Follow this guide on our Wiki.

🐋 **Docker.** Run our Docker image for NVIDIA GPUs (or follow this for AMD):

```bash
sudo docker run -p 31330:31330 --ipc host --gpus all --volume petals-cache:/cache --rm \
    learningathome/petals:main \
    python -m petals.cli.run_server --port 31330 petals-team/StableBeluga2
```

🍏 **macOS + Apple M1/M2 GPU.** Install Homebrew, then run these commands:

```bash
brew install python
python3 -m pip install git+https://github.com/bigscience-workshop/petals
python3 -m petals.cli.run_server petals-team/StableBeluga2
```
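On Apple silicon, PyTorch exposes GPU support through the Metal Performance Shaders (MPS) backend. Whether a given Petals build uses it depends on your versions, but you can at least confirm the backend is available (standard PyTorch API):

```python
import torch

# Reports whether the Metal (MPS) backend can be used on this Mac;
# requires an Apple silicon GPU and a recent PyTorch build.
print(torch.backends.mps.is_available())  # expect: True on M1/M2 Macs
```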

## How to run a sample inference

```python
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

# Choose any model available at https://health.petals.dev
model_name = "petals-team/StableBeluga2"  # This one is fine-tuned Llama 2 (70B)

# Connect to a distributed network hosting model layers
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Run the model as if it were on your computer
inputs = tokenizer("A cat sat", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0]))  # A cat sat on a mat...
```
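For multi-turn generation (e.g., a chatbot), re-encoding the whole conversation on every call is wasteful. Petals supports inference sessions that keep attention caches on the servers between calls; a minimal sketch, assuming `model.inference_session()` and the `session=` argument to `generate()` behave as in the Petals examples:

```python
# Continues the setup above: `tokenizer` and `model` are already loaded.
with model.inference_session(max_length=512) as sess:
    prefix = tokenizer("Human: What do cats sit on?\nAI:", return_tensors="pt")["input_ids"]
    outputs = model.generate(prefix, max_new_tokens=20, session=sess)
    print(tokenizer.decode(outputs[0]))

    # A later turn sends only the new tokens; the session keeps the
    # attention caches for everything generated so far on the servers.
    followup = tokenizer("\nHuman: And dogs?\nAI:", return_tensors="pt")["input_ids"]
    outputs = model.generate(followup, max_new_tokens=20, session=sess)
    print(tokenizer.decode(outputs[0]))
```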

## How does it work?

- You load a small part of the model and host it as part of a swarm; other participants join the network and serve the remaining parts, so together the swarm covers the whole model.

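A rough mental model of the data flow, as illustrative pseudocode only (the `Server` class and routing below are hypothetical, not the Petals API): each server hosts a contiguous slice of transformer blocks, and the client streams activations through a chain of servers that together cover the whole model.

```python
from dataclasses import dataclass

@dataclass
class Server:
    start: int  # first transformer block hosted by this server
    end: int    # one past the last hosted block

    def forward(self, hidden_states):
        # In the real system this runs blocks [start, end) on the
        # server's GPU and returns the activations over the network.
        for i in range(self.start, self.end):
            hidden_states = f"block_{i}({hidden_states})"
        return hidden_states

def run_inference(servers, embeddings):
    # The client computes embeddings locally, routes activations through
    # the chain of servers, then applies the LM head locally.
    hidden_states = embeddings
    for server in sorted(servers, key=lambda s: s.start):
        hidden_states = server.forward(hidden_states)
    return hidden_states

# A toy 8-block model split across three participants:
swarm = [Server(0, 3), Server(3, 6), Server(6, 8)]
print(run_inference(swarm, "embed(A cat sat)"))
```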

**How to use:**

**Useful tools:**

**Advanced guides:**

- Launch a private swarm: guide
- Run a custom model: guide
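If you run a private swarm, clients need to be pointed at your swarm's bootstrap peers instead of the public network. A minimal sketch, assuming the `initial_peers` argument accepted by Petals clients (the multiaddress below is a placeholder; use the one printed by your own bootstrap node):

```python
from petals import AutoDistributedModelForCausalLM

# Placeholder bootstrap peer of your private swarm (hypothetical address).
INITIAL_PEERS = ["/ip4/10.0.0.1/tcp/31337/p2p/QmYourBootstrapPeerID"]

model = AutoDistributedModelForCausalLM.from_pretrained(
    "petals-team/StableBeluga2",
    initial_peers=INITIAL_PEERS,  # connect to the private swarm, not the public one
)
```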

Our results compared to other current SOTA models:

(chart: MMLU score comparison)