Commit
Merge branch 'main' into fix-grpo-logits-calc
andyl98 authored Jan 31, 2025
2 parents eced62b + 2ce36ae commit 8ac4802
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/sft_trainer.md
@@ -624,7 +624,7 @@ To learn more about Liger-Kernel, visit their [official repository](https://gith

Pay attention to the following best practices when training a model with that trainer:

- - [`SFTTrainer`] always truncates by default the sequences to the `max_seq_length` argument of the [`SFTTrainer`]. If none is passed, the trainer will retrieve that value from the tokenizer. Some tokenizers do not provide a default value, so there is a check to retrieve the minimum between 1024 and that value. Make sure to check it before training.
+ - [`SFTTrainer`] always truncates by default the sequences to the `max_seq_length` argument of the [`SFTConfig`]. If none is passed, the trainer will retrieve that value from the tokenizer. Some tokenizers do not provide a default value, so there is a check to retrieve the minimum between 1024 and that value. Make sure to check it before training.
- For training adapters in 8bit, you might need to tweak the arguments of the `prepare_model_for_kbit_training` method from PEFT, hence we advise users to use `prepare_in_int8_kwargs` field, or create the `PeftModel` outside the [`SFTTrainer`] and pass it.
- For a more memory-efficient training using adapters, you can load the base model in 8bit, for that simply add `load_in_8bit` argument when creating the [`SFTTrainer`], or create a base model in 8bit outside the trainer and pass it.
- If you create a model outside the trainer, make sure to not pass to the trainer any additional keyword arguments that are relative to `from_pretrained()` method.
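The `max_seq_length` fallback described in the changed line can be sketched as follows. This is a hypothetical helper written for illustration only (`resolve_max_seq_length` is not part of the TRL API); it mirrors the documented behavior of using the explicit argument when given, and otherwise capping the tokenizer's value at 1024:

```python
def resolve_max_seq_length(max_seq_length, tokenizer_model_max_length):
    """Hypothetical helper mirroring the fallback described above;
    not part of the TRL API."""
    if max_seq_length is not None:
        # An explicitly passed max_seq_length always wins.
        return max_seq_length
    # Some tokenizers report a huge model_max_length, so cap at 1024.
    return min(tokenizer_model_max_length, 1024)


print(resolve_max_seq_length(None, 100_000))  # tokenizer value capped at 1024
print(resolve_max_seq_length(512, 100_000))   # explicit argument is used as-is
```

Checking the resolved value before training, as the docs advise, avoids silently truncating sequences at an unexpected length.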
