Releases: oobabooga/text-generation-webui

v2.2

09 Jan 21:48
e6eda6a

Changes

  • UI:
    • Add a new "Branch chat" option to the chat tab.
    • Add a new "Search chats" menu to the chat tab.
    • Improve handling of markdown lists (#6626). This greatly improves the rendering of lists and nested lists in the UI. Thanks, @mamei16.
    • Reduce the size of HTML and CSS sent to the UI during streaming. This improves performance and reduces CPU usage.
    • Optimize the JavaScript to reduce the CPU usage during streaming.
    • Add a horizontal scrollbar to code blocks that are wider than the chat area.
  • Make responses start faster by removing unnecessary cleanup calls (#6625). This removes a 0.2 second delay for llama.cpp and ExLlamaV2 while also increasing the reported tokens/second.
  • Add a --torch-compile flag for transformers (improves performance).
  • Add a "Static KV cache" option for transformers (improves performance).
  • Connect XTC, DRY, smoothing_factor, and dynatemp to the ExLlamaV2 loader (non-HF).
  • Remove the AutoGPTQ loader (#6641). The project was discontinued, and no wheels had been available for a while. GPTQ models can still be loaded through ExLlamaV2.
  • Streamline the one-click installer by asking one question to NVIDIA users instead of two.
  • Add a --exclude-pattern flag to the download-model.py script (#6542). Thanks, @JackCloudman.
  • Add IPv6 support to the API (#6559). Thanks, @BPplays.
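
As an illustration of the --exclude-pattern flag, the sketch below filters a file list with a regular expression. It assumes the pattern is a regex matched against each filename; `filter_files` and the sample filenames are hypothetical and are not taken from download-model.py.

```python
import re


def filter_files(filenames, exclude_pattern=None):
    """Return the filenames that do not match exclude_pattern (hypothetical helper)."""
    if exclude_pattern is None:
        return list(filenames)
    compiled = re.compile(exclude_pattern)
    # Assumption: the pattern is matched from the start of each filename.
    return [name for name in filenames if not compiled.match(name)]


files = ["config.json", "model.safetensors", "pytorch_model.bin", "tokenizer.json"]
kept = filter_files(files, exclude_pattern=r".*\.bin$")
print(kept)
```

Under these assumptions, a pattern such as `.*\.bin$` would skip the PyTorch checkpoint while keeping the safetensors weights.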

Bug fixes

  • Fix an orjson.JSONDecodeError on page reload.
  • Fix the font size of lists in chat mode.
  • Fix CUDA error on MPS backend during API request (#6572). Thanks, @skywinder.
  • Add UnicodeDecodeError workaround for modules/llamacpp_model.py (#6040). Thanks, @nclok1405.
  • Training_PRO fix: add if 'quantization_config' in shared.model.config.to_dict() (#6640). Thanks, @FartyPants.
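
The Training_PRO fix guards a key lookup with a membership test. A minimal sketch of that pattern follows; `FakeConfig` is a hypothetical stand-in for `shared.model.config`, and `get_quantization_config` is an illustrative helper, not a function from the project.

```python
class FakeConfig:
    """Hypothetical stand-in for a model config object exposing to_dict()."""

    def __init__(self, **fields):
        self._fields = fields

    def to_dict(self):
        return dict(self._fields)


def get_quantization_config(config):
    """Return the quantization settings, or None when the key is absent."""
    config_dict = config.to_dict()
    # Check for the key first so configs without it do not raise KeyError.
    if 'quantization_config' in config_dict:
        return config_dict['quantization_config']
    return None


quantized = FakeConfig(quantization_config={"load_in_4bit": True})
plain = FakeConfig(hidden_size=4096)
print(get_quantization_config(quantized))
print(get_quantization_config(plain))
```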

Backend updates

  • llama-cpp-python: bump to 0.3.6 (llama.cpp commit f7cd13301c2a88f97073fd119072b4cc92c08df1, January 8, 2025).

v2.1

31 Dec 23:48
88a6331
[Before/after screenshots of the Parameters tab]

Changes

  • Organize the Parameters tab (see above): group similar input fields together (sliders, checkboxes, etc.) and create headings for types of parameters (curve shape, curve cutoff), to reduce visual clutter and improve navigation.
  • Organize the Model tab in a similar way.
  • Improve the style of headings, lists, and links in chat messages.
  • Improve the typing cursor | that appears during chat streaming.
  • Slightly improve the chat colors in light mode.
  • Reduce the number of built-in presets from 11 to 6, removing presets that I do not consider useful and adding two new presets that I personally use: Instruct and Creative. The old presets can be found here.

Bug fixes

  • Fix interface loading with dark theme even when 'dark_theme' is set to false (#6614). Thanks @mamei16.
  • Fix newlines in the markdown renderer (#6599). Thanks @mamei16.

Backend updates

  • ExLlamaV2: bump to 0.2.7.

v2.0

19 Dec 02:37
4d466d5

v2.0 - New looks for text-generation-webui!

[Before/after screenshots of the UI]

Changes

  • Improved the UI by pushing Gradio to its limits and making it look like ChatGPT, specifically the early 2023 ChatGPT look (which I think looked better than the current darker theme).
    • I have used chatbot-ui (the "legacy" version, v1.0, April/2023) as a reference for the old ChatGPT styles, and copied a lot of CSS and some icons from there. Credits to chatbot-ui!
    • Mobile support is now much better, with collapsible sidebars added for easier navigation.
    • Better, more readable fonts in instruct mode.
    • Improved "past chats" menu, now in its own sidebar visually separated from the chat area.
    • Converted the top navigation bar (Chat / Default / Notebook, etc.) into a vertical sidebar on the left.
    • Reduced margins and removed borders throughout the UI. The "Parameters" tab looks much tidier now, closer to how Gradio is used in AUTOMATIC1111/stable-diffusion-webui.
    • Updated Gradio from version 4.26.0 to 4.37.1, bringing important security fixes.
    • For people who feel nostalgic about the old colors, a new --old-colors flag has been added to make the UI as similar as possible to its previous look.
  • Improved HTML rendering for lists with sub-lists (sub-items were not previously rendered correctly).
  • Allow more granular KV cache settings (#6561). Thanks @dinerburger.

Bug fixes

  • openai extension fix: Handle Multiple Content Items in Messages (#6528). Thanks @hronoas.
  • Filter whitespaces in downloader fields in model tab (#6518). Thanks @mefich.
  • Fix an issue caused during the installation of tts (#6496). Thanks @Aluisio-Pires.
  • Fix the history upload event in the UI.

Backend updates

  • llama-cpp-python: bump to 0.3.5.
  • ExLlamaV2: bump to 0.2.6.
  • Transformers: bump to 4.47.
  • flash-attention: bump to v2.7.2.post1.
  • Accelerate: bump to 1.2.
  • bitsandbytes: bump to 0.45.

v1.16

25 Oct 04:10
cc8c7ed

Backend updates

  • Transformers: bump to 4.46.
  • Accelerate: bump to 1.0.

Changes

  • Add whisper turbo (#6423). Thanks @SeanScripts.
  • Add RWKV-World instruction template (#6456). Thanks @MollySophia.
  • Minor Documentation update - query cuda compute for docker .env (#6469). Thanks @practical-dreamer.
  • Remove lm_eval and optimum from requirements (they don't seem to be necessary anymore).

Bug fixes

  • Fix llama.cpp loader not being random. Thanks @reydeljuego12345.
  • Fix temperature_last when temperature not in sampler priority (#6439). Thanks @ThisIsPIRI.
  • Make token bans work again on HF loaders (#6488). Thanks @ThisIsPIRI.
  • Fix for systems that have bash in a non-standard directory (#6428). Thanks @LuNeder.
  • Fix intel bug described in #6253 (#6433). Thanks @schorschie.
  • Fix locally compiled llama-cpp-python failing to import.

v1.15

01 Oct 17:48
3b06cb4

Backend updates

  • Transformers: bump to 4.45.
  • ExLlamaV2: bump to 0.2.3.
  • flash-attention: bump to 2.6.3.
  • llama-cpp-python: bump to 0.3.1.
  • bitsandbytes: bump to 0.44.
  • PyTorch: bump to 2.4.1.
  • ROCm: bump wheels to 6.1.2.
  • Remove AutoAWQ, AutoGPTQ, HQQ, and AQLM from requirements.txt:
    • AutoAWQ and AutoGPTQ were removed due to lack of support for PyTorch 2.4.1 and CUDA 12.1.
    • HQQ and AQLM were removed to make the project leaner since they're experimental with limited use.
    • You can still install those libraries manually if you are interested.

Changes

  • Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335). Thanks @p-e-w.
  • Make it possible to sort repetition penalties with "Sampler priority". The new keywords are:
    • repetition_penalty
    • presence_penalty
    • frequency_penalty
    • dry
    • encoder_repetition_penalty
    • no_repeat_ngram
    • xtc (not a repetition penalty but also added in this update)
  • Don't import PEFT unless necessary. This makes the web UI launch faster.
  • Add beforeunload event to add confirmation dialog when leaving page (#6279). Thanks @leszekhanusz.
  • Update API documentation with examples to list/load models (#5902). Thanks @joachimchauvet.
  • Training pro update script.py (#6359). Thanks @FartyPants.
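
To illustrate how a priority list can order samplers, the sketch below sorts sampler names by their position in a user-supplied list. This shows the general idea only, not the project's implementation; `order_samplers` is a hypothetical helper, and the choice to leave unlisted samplers at the end in their original order is an assumption.

```python
def order_samplers(active, priority):
    """Sort active sampler names by their index in priority (hypothetical helper)."""
    rank = {name: index for index, name in enumerate(priority)}
    # sorted() is stable, so unlisted samplers keep their original relative order.
    return sorted(active, key=lambda name: rank.get(name, len(priority)))


priority = ["repetition_penalty", "dry", "xtc", "presence_penalty"]
active = ["xtc", "frequency_penalty", "dry", "repetition_penalty", "no_repeat_ngram"]
print(order_samplers(active, priority))
```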

Bug fixes

  • Fix UnicodeDecodeError for BPE-based Models (especially GLM-4) (#6357). Thanks @GralchemOz.
  • API: Relax multimodal format, fixes HuggingFace Chat UI (#6353). Thanks @Papierkorb.
  • Force /bin/bash shell for conda (#6386). Thanks @Thireus.
  • Do not set value for histories in chat when --multi-user is used (#6317). Thanks @mashb1t.
  • Fix typo in OpenAI response format (#6365). Thanks @jsboige.

v1.14

20 Aug 04:29
073694b

Backend updates

  • llama-cpp-python: bump to 0.2.89.
  • Transformers: bump to 4.44.

Other changes

  • Model downloader: use a single session for all downloaded files to reduce the time to start each download.
  • Add a --tokenizer-dir flag to be used with llamacpp_HF.
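
The model downloader change reuses one HTTP session across files. A rough sketch of the idea follows, assuming a session object with a `get(url)` method; `download_all`, `CountingSession`, and the URLs are hypothetical, and the stand-in session performs no network access.

```python
class CountingSession:
    """Hypothetical stand-in for an HTTP session; records calls instead of fetching."""

    def __init__(self):
        self.requested = []

    def get(self, url):
        self.requested.append(url)
        return f"contents of {url}"


def download_all(session, urls):
    """Fetch every URL through the same session object."""
    return {url: session.get(url) for url in urls}


session = CountingSession()  # created once, reused for all files
urls = ["https://example.com/a.bin", "https://example.com/b.bin"]
results = download_all(session, urls)
print(len(session.requested))
```

With a real HTTP library, a persistent session object (for example `requests.Session`) could take the place of the stand-in, so connection setup is not repeated for each file; response handling would differ from this sketch.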

v1.13

01 Aug 05:28
d011040

Backend updates

  • llama-cpp-python: bump to 0.2.85 (adds Llama 3.1 support).

UI updates

  • Make compress_pos_emb float (#6276). Thanks @hocjordan.
  • Make n_ctx, max_seq_len, and truncation_length numbers rather than sliders, to make it possible to type the context length manually.
  • Improve the style of headings in chat messages.
  • LaTeX rendering:
    • Add back single $ for inline equations.
    • Fix rendering for equations enclosed between \[ and \].
    • Fix rendering for multiline equations.

Bug fixes

  • Fix saving characters through the UI.
  • Fix instruct mode displaying "quotes" as ""double quotes"".
  • Fix chat sometimes not scrolling down after sending a message.
  • Fix the chat "stop" event.
  • Make --idle-timeout work for API requests.

Other changes

  • Model downloader: improve the progress bar by adding the filename, size, and download speed for each downloaded file.
  • Better handle the Llama 3.1 Jinja2 template by not including its optional "tools" headers.

v1.12

25 Jul 15:19
dd97a83

Backend updates

  • Transformers: bump to 4.43 (adds Llama 3.1 support).
  • ExLlamaV2: bump to 0.1.8 (adds Llama 3.1 support).
  • AutoAWQ: bump to 0.2.6 (adds Llama 3.1 support).

UI updates

  • Make text between quote characters colored in chat and chat-instruct modes.
  • Prevent LaTeX from being rendered for inline "$", as that caused problems for phrases like "apples cost $1, oranges cost $2".
  • Make the markdown cache infinite and clear it when switching to another chat. This cache exists because the markdown conversion is CPU-intensive. By making it infinite, messages in a full 128k context will be cached, making the UI more responsive for long conversations.
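
The unbounded markdown cache described above can be sketched with the standard library. This is an illustration under stated assumptions, not the project's code: `render_markdown` is a hypothetical stand-in for the real conversion, and `switch_chat` is a hypothetical hook for the chat-switch event.

```python
from functools import lru_cache

conversion_count = 0


@lru_cache(maxsize=None)  # maxsize=None makes the cache unbounded
def render_markdown(text):
    """Hypothetical stand-in for an expensive markdown-to-HTML conversion."""
    global conversion_count
    conversion_count += 1
    return f"<p>{text}</p>"


def switch_chat():
    """Drop all cached conversions when another chat is selected."""
    render_markdown.cache_clear()


render_markdown("hello")
render_markdown("hello")  # served from the cache, no second conversion
switch_chat()
render_markdown("hello")  # converted again after the cache was cleared
print(conversion_count)
```

Clearing the cache on chat switch keeps memory bounded by the size of one conversation even though the cache itself has no entry limit.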

Bug fixes

  • Fix a race condition that caused the default character to not be loaded correctly on startup.
  • Fix Linux shebangs (#6110). Thanks @LuNeder.

Other changes

  • Make the Google Colab notebook use the one-click installer instead of its own Python environment for better stability.
  • Disable flash-attention on Google Colab by default, as its GPU models do not support it.

v1.11

23 Jul 05:34
d1115f1

UI updates

  • Optimize the UI: events triggered by clicking on buttons, selecting values from dropdown menus, etc., have been refactored to minimize the number of connections made between the UI and the server. As a result, the UI is now significantly faster and more responsive.
  • Use chat-instruct mode by default: most models nowadays are instruction-following models, and this mode automatically uses the model's Jinja2 template to generate the prompt, leading to higher-quality outputs.
  • Improve the style of code blocks in light mode.
  • Increase the font weight of chat messages (for chat and chat-instruct modes).
  • Use gr.Number for RoPE scaling parameters (#6233). Thanks @Vhallo.
  • Don't export the instruction template to settings.yaml on "Save UI defaults to settings.yaml" (it gets ignored and replaced with the model template).

Backend updates

  • llama-cpp-python: bump to 0.2.83 (adds Mistral-Nemo support).

Other changes

  • training: Added ChatML-format.json format example (#5899). Thanks @FartyPants.
  • Customize the subpath for gradio, use with reverse proxy (#5106). Thanks @canoalberto.

Bug fixes

  • Fix an issue where the chat contents sometimes disappear for a split second during streaming (#6247). Thanks @Patronics.
  • Fix the chat UI losing its vertical scrolling position when the input area grows to more than 1 line.

v1.10.1

13 Jul 17:56
0315122

Library updates

  • FlashAttention: bump to v2.6.1. Now Gemma-2 works in ExLlamaV2 with FlashAttention without any quality loss.

Bug fixes

  • Fix for macOS users encountering model load errors with llama.cpp (#6227). Thanks @InvectorGator.