Releases · oobabooga/text-generation-webui
snapshot-2023-11-05
What's Changed
- updated wiki link by @senadev42 in #4415
- Bump AutoAWQ to v0.1.5 by @casper-hansen in #4410
- Bump exllamav2 version to 0.0.7 by @Soefati in #4417
- Bugfix: Updating the shared settings object when loading a model by @ziadloo in #4425
- [Fix] OpenOrca-Platypus2 models should use correct instruction_template when matching against models/config.yaml by @deevis in #4435
- make torch.load a bit safer by @julien-c in #4448
- transformers: Add a flag to force load from safetensors by @julien-c in #4450
- Implement Min P as a sampler option in HF loaders by @kalomaze in #4449 (see the sketch after this list)
- Add temperature_last parameter by @oobabooga in #4472
- Bump AWQ to 0.1.6 by @casper-hansen in #4470
- fixed two links in the ui by @wvanderp in #4452
- add use_flash_attention_2 to param for Model loader Transformers by @fenglui in #4373
- Bump transformers to 4.35.* by @Soefati in #4474
- Merge dev branch by @oobabooga in #4475
- Merge dev branch by @oobabooga in #4476
- Fix openai extension not working because of absent new defaults by @kabachuha in #4477
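Min P (#4449 above) keeps only the tokens whose probability is at least min_p times that of the most likely token, and temperature_last (#4472) lets temperature be applied after such filters. A minimal sketch of the filtering idea, not the project's actual implementation (names are illustrative):

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.05) -> torch.Tensor:
    """Mask out tokens whose probability falls below min_p * max_prob."""
    probs = torch.softmax(logits, dim=-1)
    threshold = probs.max(dim=-1, keepdim=True).values * min_p
    return logits.masked_fill(probs < threshold, float("-inf"))

# With temperature_last enabled, temperature is applied after the filter:
# final_logits = min_p_filter(logits, 0.05) / temperature
```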
New Contributors
- @senadev42 made their first contribution in #4415
- @Soefati made their first contribution in #4417
- @ziadloo made their first contribution in #4425
- @deevis made their first contribution in #4435
- @julien-c made their first contribution in #4448
- @wvanderp made their first contribution in #4452
- @fenglui made their first contribution in #4373
Full Changelog: snapshot-2023-10-29...snapshot-2023-11-05
snapshot-2023-10-29
What's Changed
- Add additive_repetition_penalty sampler setting. by @tdrussell in #3627
- Fix training.py tutorial url by @adrianfiedler in #4367
- Rename additive_repetition_penalty to presence_penalty, add frequency_penalty by @tdrussell in #4376
- Replace hashlib.sha256 with hashlib.file_digest, so we don't need to load entire files into RAM before hashing them. by @LightningDragon in #4383 (see the sketch after this list)
- Fix Gradio warning message regarding custom value by @GuizzyQC in #4391
- Intel Gpu support initialization by @abhilash1910 in #4340
- Update accelerate requirement from ==0.23.* to ==0.24.* by @dependabot in #4400
- Adding platform_system to autoawq by @jamesbraza in #4390
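hashlib.file_digest, adopted in #4383 above, streams a file through the hash in chunks instead of reading it into RAM first. A minimal sketch (the path is illustrative; file_digest requires Python 3.11+):

```python
import hashlib

# Stream the file through SHA-256 without loading it whole into memory.
with open("models/my-model.safetensors", "rb") as f:  # illustrative path
    digest = hashlib.file_digest(f, "sha256")  # Python 3.11+
print(digest.hexdigest())
```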
New Contributors
- @adrianfiedler made their first contribution in #4367
- @LightningDragon made their first contribution in #4383
- @abhilash1910 made their first contribution in #4340
Full Changelog: snapshot-2023-10-22...snapshot-2023-10-29
snapshot-2023-10-22
What's Changed
- Fix for using Torch with CUDA 11.8 by @sammcj in #4298
- openai: fix wrong models list on query present in /v1/models by @hronoas in #4139
- More silero languages by @missionfloyd in #3950
- ExLlamav2_HF: Convert logits to FP32 by @turboderp in #4310
- Support LLaVA v1.5 by @haotian-liu in #4305
- Structured requirements && Python 3.11 support by @mjbogusz in #4233
- Enable special token support for exllamav2 by @JohanAR in #4314
- Add flash-attention 2 for windows by @bdashore3 in #4235
- Docker: Remove explicit CUDA 11.8 Reference by @whiteadam in #4343
- Add a proper documentation by @oobabooga in #3885
- USE_CUDA118 from ENV remains null one_click.py + cuda-toolkit by @mongolu in #4352
- Training PRO a month worth of updates by @FartyPants in #4345
- Support LLaVA v1.5 7B by @cnut1648 in #4348
- Option to select/target additional linear modules/layers in LORA training by @computerman00 in #4178 (see the sketch after this list)
- Allow multiple loaded LoRAs to simultaneously influence the output by @Googulator in #3120
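Targeting extra linear layers in LoRA training (#4178 above) amounts to listing more module names for the adapter to attach to. A minimal sketch using the peft library; the module names are illustrative and depend on the model architecture:

```python
from peft import LoraConfig

# In addition to the usual attention projections, MLP layers such as
# gate_proj/up_proj/down_proj can also receive LoRA adapters.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # model-dependent
    task_type="CAUSAL_LM",
)
```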
New Contributors
- @hronoas made their first contribution in #4139
- @mjbogusz made their first contribution in #4233
- @whiteadam made their first contribution in #4343
- @mongolu made their first contribution in #4352
- @cnut1648 made their first contribution in #4348
- @computerman00 made their first contribution in #4178
- @Googulator made their first contribution in #3120
Full Changelog: snapshot-2023-10-15...snapshot-2023-10-22
snapshot-2023-10-15
Switching to a rolling release model with weekly snapshots.
What's Changed
- Flash attention fix redux. by @Ph0rk0z in #4247
- Bump safetensors from 0.3.2 to 0.4.0 by @dependabot in #4249
- Support LLaVA-LLaMA-2 by @haotian-liu in #3377
- Bump to latest gradio (3.47) by @oobabooga in #4258
- Add HTTPS Support to openai extension by @chuyqa in #4270
- Add ChatML support + Mistral-OpenOrca by @netrunnereve in #4275 (see the sketch after this list)
- Use Pytorch 2.1 exllama wheels by @jllllll in #4285
- Exllamav2 lora support by @Ph0rk0z in #4229
- Relax numpy version requirements by @JohanAR in #4291
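ChatML, supported as of #4275 above, wraps every message in <|im_start|>/<|im_end|> markers with a role tag. A minimal sketch of the prompt format used by models like Mistral-OpenOrca:

```python
def chatml_prompt(system: str, user: str) -> str:
    """Build a ChatML-formatted prompt string."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"  # the model completes from here
    )

print(chatml_prompt("You are a helpful assistant.", "Hello!"))
```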
New Contributors
- @haotian-liu made their first contribution in #3377
- @chuyqa made their first contribution in #4270
Full Changelog: v1.7...snapshot-2023-10-15
v1.7
What's Changed
- Check '--model-dir' for no models warning in one-click-installer by @jllllll in #4067
- Supercharging superbooga by @HideLord in #3272
- Fix old install migration for WSL installer by @jllllll in #4093
- Expand MacOS llama.cpp support in requirements.txt by @jllllll in #4094
- Bump exllamav2 to 0.0.4 and use pre-built wheels by @jllllll in #4095
- Enable NUMA feature for llama_cpp_python by @StoyanStAtanasov in #4040
- fix: add missing superboogav2 dep by @sammcj in #4099
- Delete extensions/Training_PRO/readme.md by @missionfloyd in #4112
- Bump llama-cpp-python to 0.2.7 by @jllllll in #4110
- fix: update superboogav2 requirements.txt by @wangcx18 in #4100
- Update one_click.py to initialize site_packages_path variable by @Psynbiotik in #4118
- Let model downloader download *.tiktoken as well by @happyme531 in #4121
- Bump llama-cpp-python to 0.2.11 by @jllllll in #4142
- Add grammar to transformers and _HF loaders by @oobabooga in #4091 (see the sketch after this list)
- Ignoring custom changes to CMD_FLAGS.txt on update. by @berkut1 in #4181
- Fix off-by-one error in exllama_hf caching logic by @tdrussell in #4145
- AutoAWQ: initial support by @cal066 in #3999
- Bump ExLlamaV2 to 0.0.5 by @turboderp in #4186
- Bump AutoAWQ to v0.1.4 by @casper-hansen in #4203
- Fix python wheels for avx requirements by @AG-w in #4189
- Bump to pytorch 11.8 by @oobabooga in #4209
- Use GPTQ wheels compatible with Pytorch 2.1 by @jllllll in #4210
- Fix CFG init with Llamacpp_HF by @bdashore3 in #4219
- Text Generation: Abort if EOS token is reached by @bdashore3 in #4213
- README for superboogav2 by @jamesbraza in #4212
- Move import in llama_attn_hijack.py by @Ph0rk0z in #4231
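The grammar support added in #4091 above constrains sampling so the output must match a GBNF grammar (the format popularized by llama.cpp). A minimal illustrative grammar; the parameter name grammar_string is an assumption about how the UI and API expose it:

```python
# GBNF grammar restricting the model to a yes/no answer (illustrative).
grammar_string = r"""
root ::= answer
answer ::= "yes" | "no"
"""
```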
New Contributors
- @StoyanStAtanasov made their first contribution in #4040
- @Psynbiotik made their first contribution in #4118
- @turboderp made their first contribution in #4186
- @casper-hansen made their first contribution in #4203
- @AG-w made their first contribution in #4189
- @bdashore3 made their first contribution in #4219
Full Changelog: 1.6.1...v1.7
1.6.1
What's Changed
- Use call for conda deactivate in Windows installer by @jllllll in #4042
- [extensions/openai] Fix error when preparing cache for embedding models by @wangcx18 in #3995
- Create alternative requirements.txt with AMD and Metal wheels by @oobabooga in #4052
- Add a grammar editor to the UI by @oobabooga in #4061
- Avoid importing torch in one-click-installer by @jllllll in #4064
Full Changelog: v1.6...1.6.1
v1.6
The one-click-installers have been merged into the repository. Migration instructions can be found here.
The updated one-click install features an installation size several GB smaller and a more reliable update procedure.
What's Changed
- sd_api_pictures: Widen sliders for image size minimum and maximum by @GuizzyQC in #3326
- Bump exllama module to 0.0.9 by @jllllll in #3338
- Add an extension that makes chat replies longer by @oobabooga in #3363
- add chat instruction config for BaiChuan-chat model by @CrazyShipOne in #3332
- [extensions/openai] +Array input (batched) , +Fixes by @matatonic in #3309
- Add a scrollbar to notebook/default textboxes, improve chat scrollbar style by @jparmstr in #3403
- Add auto_max_new_tokens parameter by @oobabooga in #3419
- Add the --cpu option for llama.cpp to prevent CUDA from being used by @oobabooga in #3432
- Use character settings from API properties if present by @rafa-9 in #3428
- Add standalone Dockerfile for NVIDIA Jetson by @toolboc in #3336
- More models: +StableBeluga2 by @matatonic in #3415
- [extensions/openai] include content-length for json replies by @matatonic in #3416
- Fix llama.cpp truncation by @jparmstr in #3400
- Remove unnecessary chat.js by @missionfloyd in #3445
- Add back silero preview by @missionfloyd by @oobabooga in #3446
- Add SSL certificate support by @oobabooga in #3453
- Bump bitsandbytes to 0.41.1 by @jllllll in #3457
- [Bug fix] Remove html tags form the Prompt sent to Stable Diffusion by @SodaPrettyCold in #3151
- Fix: Mirostat fails on models split across multiple GPUs. by @Ph0rk0z in #3465
- Bump exllama wheels to 0.0.10 by @jllllll in #3467
- Create logs dir if missing when saving history by @jllllll in #3462
- Fix chat message order by @missionfloyd in #3461
- Add Classifier Free Guidance (CFG) for Transformers/ExLlama by @oobabooga in #3325 (see the sketch at the end of this list)
- Refactor everything by @oobabooga in #3481
- Use chat_instruct_command in API by @jllllll in #3482
- Make dockerfile respect specified cuda version by @sammcj in #3474
- Fixed a typo where the llama.cpp model parameter display did not correctly show "rms_norm_eps" by @berkut1 in #3494
- Add option for named cloudflare tunnels by @Fredddi43 in #3364
- Fix superbooga when using regenerate by @oderwat in #3362
- Added the logic for starchat model series by @giprime in #3185
- Streamline GPTQ-for-LLaMa support by @jllllll in #3526
- Add Vicuna-v1.5 detection by @berkut1 in #3524
- ctransformers: another attempt by @cal066 in #3313
- Bump ctransformers wheel version by @jllllll in #3558
- ctransformers: move thread and seed parameters by @cal066 in #3543
- Unify the 3 interface modes by @oobabooga in #3554
- Various ctransformers fixes by @netrunnereve in #3556
- Add "save defaults to settings.yaml" button by @oobabooga in #3574
- Add the --disable_exllama option for AutoGPTQ by @clefever in #3545
- ctransformers: Fix up model_type name consistency by @cal066 in #3567
- Add a "Show controls" button to chat UI by @oobabooga in #3590
- Improved chat scrolling by @oobabooga in #3601
- fixes error when not specifying tunnel id by @ausboss in #3606
- Fix print CSS by @missionfloyd in #3608
- Bump llama-cpp-python by @oobabooga in #3610
- Bump llama_cpp_python_cuda to 0.1.78 by @jllllll in #3614
- Refactor the training tab by @oobabooga in #3619
- llama.cpp: make Stop button work with streaming disabled by @cebtenzzre in #3620
- Unescape last message by @missionfloyd in #3623
- Improve readability of download-model.py by @Thutmose3 in #3497
- Add probability dropdown to perplexity_colors extension by @SeanScripts in #3148
- Add a simple logit viewer by @oobabooga in #3636
- Fix whitespace formatting in perplexity_colors extension. by @tdrussell in #3643
- ctransformers: add mlock and no-mmap options by @cal066 in #3649
- Update requirements.txt by @tkbit in #3651
- Add missing extensions to Dockerfile by @sammcj in #3544
- Implement CFG for ExLlama_HF by @oobabooga in #3666
- Add CFG to llamacpp_HF (second attempt) by @oobabooga in #3678
- ctransformers: gguf support by @cal066 in #3685
- Fix ctransformers threads auto-detection by @jllllll in #3688
- Use separate llama-cpp-python packages for GGML support by @jllllll in #3697
- GGUF by @oobabooga in #3695
- Fix ctransformers model unload by @marella in #3711
- Add ffmpeg to the Docker image by @kelvie in #3664
- accept floating-point alpha value on the command line by @cebtenzzre in #3712
- Bump llama-cpp-python to 0.1.81 by @jllllll in #3716
- Make it possible to scroll during streaming by @oobabooga in #3721
- Bump llama-cpp-python to 0.1.82 by @jllllll in #3730
- Bump ctransformers to 0.2.25 by @jllllll in #3740
- Add max_tokens_second param by @oobabooga in #3533
- Update requirements.txt by @VishwasKukreti in #3725
- Update llama.cpp.md by @q5sys in #3702
- Bump llama-cpp-python to 0.1.83 by @jllllll in #3745
- Update download-model.py (Allow single file download) by @bet0x in #3732
- Allow downloading single file from UI by @missionfloyd in #3737
- Bump exllama to 0.0.14 by @jllllll in #3758
- Bump llama-cpp-python to 0.1.84 by @jllllll in #3854
- Update transformers requirement from ==4.32.* to ==4.33.* by @dependabot in #3865
- Bump exllama to 0.1.17 by @jllllll in #3847
- Exllama new rope settings by @Ph0rk0z in #3852
- fix lora training with alpaca_lora_4bit by @johnsmith...
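Classifier-Free Guidance, added for Transformers/ExLlama in #3325 earlier in this list, steers generation by extrapolating from an unconditional (or negative-prompt) distribution toward the conditional one. A minimal sketch of the core update, not the project's exact code:

```python
import torch.nn.functional as F

def cfg_logits(cond_logits, uncond_logits, guidance_scale=1.5):
    """Classifier-Free Guidance over log-probabilities."""
    cond = F.log_softmax(cond_logits, dim=-1)
    uncond = F.log_softmax(uncond_logits, dim=-1)
    # scale = 1.0 is a no-op; larger values push harder toward the prompt
    return uncond + guidance_scale * (cond - uncond)
```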
v1.5
What's Changed
- Add a detailed extension example and update the extension docs. The example can be found here: example/script.py.
- Introduce a new chat_input_modifier extension function and deprecate the old input_hijack (see the sketch after this list).
- Change rms_norm_eps to 5e-6 for all llama-2 models (previously just llama-2-70b GGML) -- this value reduces the perplexities of the models.
- Remove FlexGen support. It has been made obsolete by the lack of Llama support and the emergence of llama.cpp and 4-bit quantization. I can add it back if it ever gets updated.
- Use the dark theme by default.
- Set the correct instruction template for the model when switching from default/notebook modes to chat mode.
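A minimal sketch of the new chat_input_modifier hook mentioned above, following the pattern in the extension docs (the extension name is illustrative):

```python
# extensions/my_extension/script.py (illustrative location)

def chat_input_modifier(text, visible_text, state):
    """
    Modify the user input in chat mode.
    text is what gets sent to the model; visible_text is what the chat log shows.
    """
    text = text.strip()  # example transformation
    return text, visible_text
```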
Bug fixes
- [extensions/openai] Fixes for: embeddings, tokens, better errors. +Docs update, +Images, +logit_bias/logprobs, +more. by @matatonic in #3122
- Fix typo in README.md by @eltociear in #3286
- README updates and improvements by @netrunnereve in #3198
- Ignore values in training.py which are not string by @Foxtr0t1337 in #3287
v1.4
What's Changed
- Add llama-2-70b GGML support by @oobabooga in #3285
- Bump bitsandbytes to 0.41.0 by @jllllll in #3258 -- faster speeds
- Bump exllama module to 0.0.8 by @jllllll in #3256 -- expanded LoRA support
Bug fixes
Extensions
- [extensions/openai] Fixes for: embeddings, tokens, better errors. +Docs update, +Images, +logit_bias/logprobs, +more. by @matatonic in #3122