Releases: hiyouga/LLaMA-Factory
v0.6.1: Patch release
This patch mainly fixes #2983
In commit 9bec3c9, we built the optimizer and scheduler inside the trainers, which inadvertently introduced a bug: when DeepSpeed was enabled, the trainers in transformers build an optimizer and scheduler before calling the `create_optimizer_and_scheduler` method [1]. The optimizer created by our method then overwrote the original one, while the scheduler did not, so the scheduler kept stepping the discarded optimizer and no longer affected the learning rate of the optimizer actually in use, leading to a regression in the training results. We have fixed this bug in 3bcd41b and 8c77b10. Thanks to @HideLord for helping us identify this critical bug.
[1] https://github.com/huggingface/transformers/blob/v4.39.1/src/transformers/trainer.py#L1877-L1881
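For illustration, here is a minimal sketch of the failure mode in plain PyTorch (not the actual trainer code): the scheduler holds a reference to the optimizer it was built with, so rebinding the name to a new optimizer silently disconnects the schedule.

```python
import torch

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
# The scheduler captures a reference to this optimizer instance.
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: 0.1)

# Rebinding the name to a new optimizer (as the overwrite effectively did)
# leaves the scheduler stepping the orphaned instance.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

scheduler.step()
print(optimizer.param_groups[0]["lr"])  # still 1e-3: the schedule has no effect
```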
v0.6.0: Paper Release, GaLore and FSDP+QLoRA
We released our paper on arXiv! Thanks to all co-authors and to AK for the recommendation
New features
- Support the GaLore algorithm, allowing full-parameter training of a 7B model with less than 24 GB of VRAM
- Support FSDP+QLoRA, enabling QLoRA fine-tuning of a 70B model on 2x 24 GB GPUs
- Support LoRA+ algorithm for better LoRA fine-tuning by @qibaoyuan in #2830
- LLaMA Factory 🤝 vLLM, enjoy 270% inference speed with `--infer_backend vllm` (see the example after this list)
- Add a Colab notebook for getting started easily
- Support pushing fine-tuned models to Hugging Face Hub in web UI
- Support `apply_chat_template` by adding a chat template to the tokenizer after fine-tuning
- Add dockerize support by @S3Studio in #2743 #2849
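As a quick, illustrative example (assuming the `src/cli_demo.py` entry point of this release; the model path and template are placeholders), a vLLM-backed chat session can be launched like this:

```bash
CUDA_VISIBLE_DEVICES=0 python src/cli_demo.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --template default \
    --infer_backend vllm
```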
New models
- Base models
- OLMo (1B/7B)
- StarCoder2 (3B/7B/15B)
- Yi-9B
- Instruct/Chat models
- OLMo-7B-Instruct
New datasets
- Supervised fine-tuning datasets
- Cosmopedia (en)
- Preference datasets
- Orca DPO (en)
Bug fix
- Fix flash_attn in web UI by @cx2333-gt in #2730
- Fix deepspeed runtime error in PPO by @stephen-nju in #2746
- Fix readme ddp instruction by @khazic in #2903
- Fix environment variable in datasets by @SirlyDreamer in #2905
- Fix readme information by @0xez in #2919
- Fix generation config validation by @marko1616 in #2945
- Fix requirements by @rkinas in #2963
- Fix bitsandbytes windows version by @Tsumugii24 in #2967
- Fix #2346 #2642 #2649 #2732 #2735 #2756 #2766 #2775 #2777 #2782 #2798 #2802 #2803 #2817 #2895 #2928 #2936 #2941
v0.5.3: DoRA and AWQ/AQLM QLoRA
New features
- Support DoRA (Weight-Decomposed Low-Rank Adaptation); see the sketch after this list
- Support QLoRA for AWQ/AQLM-quantized models, making 2-bit QLoRA feasible
- Provide some example scripts in https://github.com/hiyouga/LLaMA-Factory/tree/main/examples
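As a minimal sketch of what DoRA toggles at the PEFT level (assuming PEFT's `use_dora` flag on `LoraConfig`, which is how PEFT exposes DoRA; LLaMA-Factory wires this up from its own CLI arguments, and the rank and target modules below are placeholders):

```python
from peft import LoraConfig

# DoRA decomposes each weight update into a magnitude and a direction
# component; in PEFT it is enabled by a single flag on the LoRA config.
dora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # placeholder module names
    use_dora=True,
)
```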
New models
- Base models
- Gemma (2B/7B)
- Instruct/Chat models
- Gemma-it (2B/7B)
Bug fix
v0.5.2: Block Expansion, Qwen1.5 Models
New features
- Support block expansion in LLaMA Pro, see `tests/llama_pro.py` for usage
- Add the `use_rslora` option for the LoRA method
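For context, rsLoRA changes only the scaling applied to the LoRA update: the standard factor `alpha / r` becomes `alpha / sqrt(r)`, which stabilizes training at higher ranks. A one-line comparison:

```python
import math

r, alpha = 8, 16
standard_scaling = alpha / r           # classic LoRA: 2.0
rslora_scaling = alpha / math.sqrt(r)  # rank-stabilized LoRA: ~5.66
```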
New models
- Base models
- Qwen1.5 (0.5B/1.8B/4B/7B/14B/72B)
- DeepSeekMath-7B-Base
- DeepSeekCoder-7B-Base-v1.5
- Orion-14B-Base
- Instruct/Chat models
- Qwen1.5-Chat (0.5B/1.8B/4B/7B/14B/72B)
- MiniCPM-2B-SFT/DPO
- DeepSeekMath-7B-Instruct
- DeepSeekCoder-7B-Instruct-v1.5
- Orion-14B-Chat
- Orion-14B-Long-Chat
- Orion-14B-RAG-Chat
- Orion-14B-Plugin-Chat
New datasets
- Supervised fine-tuning datasets
- SlimOrca (en)
- Dolly (de)
- Dolphin (de)
- Airoboros (de)
- Preference datasets
- Orca DPO (de)
Bug fix
- Fix `torch_dtype` check in export model by @fenglui in #2262
- Add Russian locale to LLaMA Board by @seoeaa in #2264
- Remove manually set `use_cache` in export model by @yhyu13 in #2266
- Fix DeepSpeed Zero3 training with MoE models by @A-Cepheus in #2283
- Add a patch for full training of the Mixtral model using DeepSpeed Zero3 by @ftgreat in #2319
- Fix bug in data pre-processing by @lxsyz in #2411
- Add German sft and dpo datasets by @johannhartmann in #2423
- Add version checking in `test_toolcall.py` by @mini-tiger in #2435
- Enable parsing of SlimOrca dataset by @mnmueller in #2462
- Add tags for models when pushing to hf hub by @younesbelkada in #2474
- Fix #2189 #2268 #2282 #2320 #2338 #2376 #2388 #2394 #2397 #2404 #2412 #2420 #2421 #2436 #2438 #2471 #2481
v0.5.0: Agent Tuning, Unsloth Integration
Congratulations on 10k stars 🎉 Make LLM fine-tuning easier and faster together with LLaMA-Factory ✨
New features
- Support agent tuning for most models; fine-tune any LLM with `--dataset glaive_toolcall` for tool use #2226
- Support function calling in both API and web mode with fine-tuned models, following the OpenAI format (see the example after this list)
- LLaMA Factory 🤝 Unsloth, enjoy 170% LoRA training speed with `--use_unsloth`, see the benchmark here
- Support fine-tuning models on MPS devices #2090
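For instance, once a fine-tuned model is served through the API, it can be queried with OpenAI-style tool definitions. The snippet below is illustrative: the base URL, model name, and tool schema are placeholders, not values shipped with this release.

```python
from openai import OpenAI

# Placeholder: point base_url at your local LLaMA-Factory API server.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for demonstration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="default",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```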
New models
- Base models
- Phi-2 (2.7B)
- InternLM2 (7B/20B)
- SOLAR-10.7B
- DeepseekMoE-16B-Base
- XVERSE-65B-2
- Instruct/Chat models
- InternLM2-Chat (7B/20B)
- SOLAR-10.7B-Instruct
- DeepseekMoE-16B-Chat
- Yuan (2B/51B/102B)
New datasets
- Supervised fine-tuning datasets
- deepctrl dataset
- Glaive function calling dataset v2
Core updates
- Refactor data engine: clearer dataset alignment, easier templating and tool formatting
- Refactor saving logic for models with value head #1789
- Use ruff code formatter for stylish code
Bug fix
- Bump transformers version to 4.36.2 by @ShaneTian in #1932
- Fix requirements by @dasdristanta13 in #2117
- Add Machine-Mindset project by @JessyTsui in #2163
- Fix typo in readme file by @junuMoon in #2194
- Support resize token embeddings with ZeRO3 by @liu-zichen in #2201
- Fix #1073 #1462 #1617 #1735 #1742 #1789 #1821 #1875 #1895 #1900 #1908 #1907 #1909 #1923 #2014 #2067 #2081 #2090 #2098 #2125 #2127 #2147 #2161 #2164 #2183 #2195 #2249 #2260
v0.4.0: Mixtral-8x7B, DPO-ftx, AutoGPTQ Integration
🚨🚨 Core refactor
- Deprecate `checkpoint_dir` and use `adapter_name_or_path` instead
- Replace `resume_lora_training` with `create_new_adapter`
- Move the patches in model loading to `llmtuner.model.patcher`
- Bump to Transformers 4.36.1 to adapt to the Mixtral models
- Broad adaptation of FlashAttention-2 (LLaMA, Falcon, Mistral)
- Temporarily disable LongLoRA due to breaking changes; it will be supported again later
The above changes were made by @hiyouga in #1864
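For example, a LoRA training command would migrate roughly as follows (assuming the `src/train_bash.py` entry point; paths are placeholders, and `create_new_adapter True` corresponds to the old `resume_lora_training False` behavior):

```bash
# Before (<= v0.3.3), illustrative:
#   python src/train_bash.py --checkpoint_dir saves/my-adapter --resume_lora_training False ...
# After (v0.4.0):
python src/train_bash.py \
    --adapter_name_or_path saves/my-adapter \
    --create_new_adapter True
```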
New features
- Add DPO-ftx: mixing fine-tuning gradients into DPO via the `dpo_ftx` argument, suggested by @lylcst in #1347 (comment)
- Integrate AutoGPTQ into model export via the `export_quantization_bit` and `export_quantization_dataset` arguments (see the example after this list)
- Support loading datasets from ModelScope Hub by @tastelikefeet and @wangxingjun778 in #1802
- Support resizing token embeddings with the noisy mean initialization by @hiyouga in a66186b
- Support system column in both alpaca and sharegpt dataset formats
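An illustrative quantized-export command (assuming the `src/export_model.py` entry point of this release; the model path, adapter path, and calibration dataset are placeholders):

```bash
python src/export_model.py \
    --model_name_or_path meta-llama/Llama-2-7b-hf \
    --adapter_name_or_path saves/my-adapter \
    --template default \
    --export_dir exported-gptq-4bit \
    --export_quantization_bit 4 \
    --export_quantization_dataset data/c4_demo.json
```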
New models
- Base models
- Mixtral-8x7B-v0.1
- Instruct/Chat models
- Mixtral-8x7B-Instruct-v0.1
- Mistral-7B-Instruct-v0.2
- XVERSE-65B-Chat
- Yi-6B-Chat
Bug fix
v0.3.3: ModelScope Integration, Reward Server
New features
- Support loading pre-trained models from ModelScope Hub by @tastelikefeet in #1700
- Support launching a reward model server in the demo API by specifying `--stage=rm` in `api_demo.py`
- Support using a reward model server in PPO training by specifying `--reward_model_type api` (see the sketch after this list)
- Support adjusting the shard size of exported models via the `export_size` argument
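Putting the first two features together, a sketch (the model path, port, and reward-model address are placeholders; check the docs for the exact flag that carries the server URL):

```bash
# Launch the reward model server:
python src/api_demo.py --stage=rm \
    --model_name_or_path path/to/reward_model \
    --template default

# Consume it from PPO training via the API:
python src/train_bash.py --stage ppo \
    --reward_model_type api \
    --reward_model http://localhost:8000/v1
```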
New models
- Base models
- DeepseekLLM-Base (7B/67B)
- Qwen (1.8B/72B)
- Instruct/Chat models
- DeepseekLLM-Chat (7B/67B)
- Qwen-Chat (1.8B/72B)
- Yi-34B-Chat
New datasets
- Supervised fine-tuning datasets
- Preference datasets