A gradio web UI for running Large Language Models like LLaMA, llama.cpp, GPT-J, Pythia, OPT, and GALACTICA.
Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation.
The tags of the Docker image match the releases of the official repository, e.g. "v1.3.1" or "v1.4".
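Pulling and running a tagged image might look like the sketch below. The image name `zjuuu/text-generation-webui` and the port mapping are assumptions for illustration; substitute the actual image name from the registry page, and note that the web UI's default gradio port is 7860.

```shell
# Pick a release tag matching the official repository's releases
TAG="v1.4"

# Image name is a placeholder -- replace with the actual registry path
docker pull zjuuu/text-generation-webui:"$TAG"

# Expose the gradio web UI on port 7860
docker run -p 7860:7860 zjuuu/text-generation-webui:"$TAG"
```

Add `--gpus all` to the `docker run` command if the image is built with CUDA support and you want GPU inference.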
- 3 interface modes: default, notebook, and chat
- Multiple model backends: transformers, llama.cpp, AutoGPTQ, GPTQ-for-LLaMa, ExLlama, RWKV, FlexGen
- Dropdown menu for quickly switching between different models
- LoRA: load and unload LoRAs on the fly, load multiple LoRAs at the same time, train a new LoRA
- Precise instruction templates for chat mode, including Alpaca, Vicuna, Open Assistant, Dolly, Koala, ChatGLM, MOSS, RWKV-Raven, Galactica, StableLM, WizardLM, Baize, Ziya, Chinese-Vicuna, MPT, INCITE, Wizard Mega, KoAlpaca, Vigogne, Bactrian, h2o, and OpenBuddy
- Multimodal pipelines, including LLaVA and MiniGPT-4
- 8-bit and 4-bit inference through bitsandbytes
- CPU mode for transformers models
- DeepSpeed ZeRO-3 inference
- Extensions
- Custom chat characters
- Very efficient text streaming
- Markdown output with LaTeX rendering, to use for instance with GALACTICA
- Nice HTML output for GPT-4chan
- API, including endpoints for websocket streaming (see the examples)
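A blocking API call can be sketched as below. The endpoint path `/api/v1/generate`, the default port 5000, and the parameter names in the request body are assumptions based on the project's api-examples; consult those examples for the authoritative request shape.

```python
import json
import urllib.request


def build_payload(prompt, max_new_tokens=200, temperature=0.7):
    """Assemble a generation request body (parameter names assumed)."""
    return {
        "prompt": prompt,
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
    }


def generate(prompt, host="http://localhost:5000"):
    """POST a prompt to the (assumed) blocking endpoint and return the text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        host + "/api/v1/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["results"][0]["text"]
```

For token-by-token output, the repository's examples also show a websocket streaming endpoint on a separate port.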
To learn how to use the various features, check out the documentation: https://github.com/oobabooga/text-generation-webui/tree/main/docs
- oobabooga/text-generation-webui
- Prebuilt Docker images by zjuuu