Squashed commit of the following:

commit 9e9dc96
Author: Maxim Kopecki <[email protected]>
Date:   Wed Jul 10 19:11:13 2024 +0200

    Added missing token kwarg in Peft model loading (#1825)

commit 7ddef5c
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Jul 10 18:26:11 2024 +0200

    Make use of `trust_remote_code` consistent (#1806)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit a9cddf8
Author: Adnan Khan <[email protected]>
Date:   Wed Jul 10 11:25:07 2024 -0400

    Delete unused benchmark.yml workflow. (#1822)

commit 2860ce5
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jul 9 09:22:52 2024 +0200

    DPO Llava 1.5 and PaliGemma support (#1797)

    * llava support dpo
    * add_special_tokens=False only when possible
    * format
    * pali gemma
    * refactor size
    * remove image resize

    ---------

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 30e33bd
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jul 9 05:37:12 2024 +0200

    upgrade gh actions (#1818)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit d5a0d2d
Author: Costa Huang <[email protected]>
Date:   Mon Jul 8 11:12:41 2024 -0400

    Set dev version (#1817)

commit 314e8eb
Author: Puneet Singh Bhooi <[email protected]>
Date:   Mon Jul 8 19:11:36 2024 +0530

    fix broken url in `docs\source\index.mdx` (#1813)

commit e107920
Author: Costa Huang <[email protected]>
Date:   Mon Jul 8 09:38:09 2024 -0400

    0.9.6 release (#1816)

commit 78045de
Author: Alvaro Bartolome <[email protected]>
Date:   Mon Jul 8 01:59:26 2024 +0200

    Fix `TRL_USE_RICH` environment variable handling (#1808)

    * Add `strtobool` custom implementation from `distutils`
    * Fix `TRL_USE_RICH` handling via `strtobool`
    * Run `make precommit`

commit 747612f
Author: Alvaro Bartolome <[email protected]>
Date:   Fri Jul 5 16:28:59 2024 +0200

    Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI (#1807)

    * Fix `torch_dtype` handling through CLI
      The `torch_dtype` is not properly handled when provided via the TRL CLI since it's provided initially as a string, but is then casted to `torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means that those trainers should handle the scenario where `torch_dtype` is a `torch.dtype` too.
    * Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py`
    * Forward contribution credits
    * Run `make precommit`

    ---------

    Co-authored-by: Tash Srivastava <[email protected]>

commit 9e3a35b
Author: Michael <[email protected]>
Date:   Fri Jul 5 07:29:48 2024 -0400

    Remove extra print in reward_trainer.py (#1799)

    `print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call

commit 4402b36
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Jul 4 14:29:25 2024 +0200

    clean examples (#1791)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 78f8228
Author: Noah Tye <[email protected]>
Date:   Wed Jul 3 11:10:50 2024 -0700

    Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794)

    * Preserve token fields when converting TrainingArguments to SFTConfig
      TrainingArguments.to_dict() redacts token fields, so we have to individually copy them over when converting to SFTConfig to avoid breaking push_to_hub functionality.
      Also adds a test.
    * run precommit
    * one-line args_as_dict definition per suggestion from kashif
    * generalize token copying to match TrainingArguments behavior
    * unwrap |= on dict, to support python 3.8
    * use .update instead of |= or for-loop

commit b6af2ed
Author: Kashif Rasul <[email protected]>
Date:   Wed Jul 3 08:29:16 2024 +0200

    add model_init_kwargs to training_args (#1787)

commit cd85b14
Author: Tommaso Buonocore <[email protected]>
Date:   Sat Jun 29 15:35:48 2024 +0200

    Fixed typo in SFT trainer docs (#1788)

    'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets.

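Commit 78045de adds a custom `strtobool` modeled on the one from `distutils`, which was removed from the standard library in Python 3.12. The sketch below assumes the helper keeps the spellings accepted by the old `distutils.util.strtobool` but returns a `bool` instead of `1`/`0`; the `env_flag` helper and its `"false"` default are hypothetical, not TRL's code. The point of such a parser is that a naive `bool(os.environ.get(...))` treats `"0"` as `True`, because any non-empty string is truthy.

```python
import os

# Spellings accepted by the former distutils.util.strtobool.
_TRUE = {"y", "yes", "t", "true", "on", "1"}
_FALSE = {"n", "no", "f", "false", "off", "0"}


def strtobool(value: str) -> bool:
    """Parse a truth string like distutils.util.strtobool, returning a bool."""
    normalized = value.lower()
    if normalized in _TRUE:
        return True
    if normalized in _FALSE:
        return False
    raise ValueError(f"invalid truth value {value!r}")


def env_flag(name: str, default: str = "false") -> bool:
    """Hypothetical helper: read a boolean flag such as TRL_USE_RICH from the environment."""
    return strtobool(os.environ.get(name, default))


if __name__ == "__main__":
    os.environ["TRL_USE_RICH"] = "0"
    print(env_flag("TRL_USE_RICH"))  # False, whereas bool("0") would be True
```

Unknown values raise `ValueError` instead of silently defaulting, so a typo such as `TRL_USE_RICH=ture` fails loudly.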
commit a57544f
Author: Kashif Rasul <[email protected]>
Date:   Thu Jun 27 15:47:58 2024 +0200

    fix docs and examples (#1780)

commit b68ff96
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Jun 26 16:26:37 2024 +0200

    Visual DPO (#1647)

    * Remove extra whitespaces
    * idefics
    * vdpo
    * sft idefics
    * pad with test
    * use prompt instead of tokenizer
    * rm name main
    * support vlm in tokenize row
    * temp fix for regex in lora_target_module
    * format
    * vdpo
    * tmp float16 hard code
    * concatenated_forward support for vision
    * style and new command line
    * all-linear
    * format
    * delete old examples
    * get image
    * upcast
    * new test
    * modified test
    * new strat for tokenizer
    * rm token transfer
    * integrate vision in dpo example
    * format
    * add FDivergenceType back
    * precommit
    * pillow test dep
    * optional prompt
    * `evaluation_strategy` to `eval_strategy`
    * revert vsft change (oos)
    * update test
    * test
    * comment and support more in process
    * update process
    * update doc for vdpo
    * caution about limited support
    * Update docs/source/dpo_trainer.mdx
      Co-authored-by: Kashif Rasul <[email protected]>
    * revert DPO example changes
    * cleaner way to check if a model is vision
    * comment
    * update vdpo example
    * rename

    ---------

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Kashif Rasul <[email protected]>

commit c8c01cc
Author: Mubin Manasia <[email protected]>
Date:   Wed Jun 26 03:23:36 2024 -0600

    Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774)

    * Update sft_config.py
    * Update sft_config.py

commit 3479606
Author: Costa Huang <[email protected]>
Date:   Wed Jun 26 03:18:22 2024 -0400

    Remove the leading space in the tldr preference dataset (#1773)

commit 7965b78
Author: Haozhe Ji <[email protected]>
Date:   Tue Jun 25 22:47:32 2024 +0800

    add Efficient Exact Optimization (EXO) (#1735)

    * add exo
    * fix a detail
    * Update trl/trainer/dpo_trainer.py
    * Update trl/trainer/dpo_trainer.py
    * Update trl/trainer/dpo_trainer.py

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 56bd1bb
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jun 25 16:14:26 2024 +0200

    `evaluation_strategy` to `eval_strategy` (#1771)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 94d53e6
Author: Clara Pohland <[email protected]>
Date:   Mon Jun 24 21:27:00 2024 +0200

    MoE Models: option to add load balancing loss (#1765)

    * KTO: add aux loss
    * use router_aux_loss_coef in KtoTrainer when aux_loss enabled
    * align optional aux_loss in DPO, KTO, CPO, ORPO
    * precommit changes
    * fix KL forward kwargs
    * add aux_loss doku entry
    * apply docs suggestions

    ---------

    Co-authored-by: Clara Luise Pohland <[email protected]>

commit b5be100
Author: Mihir Prabhudesai <[email protected]>
Date:   Mon Jun 24 12:05:44 2024 -0400

    Added Reward Backpropogation Support (#1585)

    * added alignprop template
    * added alignprop support
    * Update alignprop_trainer.mdx
    * Update alignprop_trainer.mdx
    * added better why statement
    * fixed inference code
    * changed self to pipeline
    * removed aesthetic classifier
    * added aesthetic to auxiliary models
    * added unseen prompt logging
    * removed unseen prompt log
    * fixed minor
    * remove not needed import in trl/__init__.py
      Co-authored-by: Younes Belkada <[email protected]>
    * fixed styling
    * updated _toctree

    ---------

    Co-authored-by: Younes Belkada <[email protected]>

commit 6e1652b
Author: Haoran Xu <[email protected]>
Date:   Sun Jun 23 09:54:30 2024 -0700

    Add CPO-SimPO method (#1760)

    * enable cpo-simpo
    * highlight SimPO and CPO-SimPO
    * add test for cpo_alpha
    * formatting
    * Update docs/source/cpo_trainer.mdx

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 65374c6
Author: Costa Huang <[email protected]>
Date:   Fri Jun 21 11:20:54 2024 -0400

    New sentiment and descriptiveness dataset (#1757)

    * push changes
    * handle edge cases where the chosen and the rejected are the same

commit 9956091
Author: Juyoung Suk <[email protected]>
Date:   Fri Jun 21 18:01:08 2024 +0900

    Add dataset_text_field in examples/scripts/sft.py (#1758)

commit 34d273f
Author: Costa Huang <[email protected]>
Date:   Thu Jun 20 13:16:43 2024 -0400

    Support num_train_epochs (#1743)

    * add a test case for num_train_epochs
    * fix ci
    * quick change
    * disable push to hub
    * debug windows ci
    * try another fix
    * skip subprocess tests on windows

commit 3bf9449
Author: Mert Sayar <[email protected]>
Date:   Thu Jun 20 18:22:20 2024 +0300

    Fix masking of response tokens (#1718)

    Current handling of `response_masks` inside `batch_forward_pass` function does not take padding into consideration which results with shape unmatch during masking. Since response mask is a mask tensor of response tokens, response tokens should not be concatenated with a `torch.zeros(query_length)` and masking operation should be done without slicing.

    Remove the concatenation of the response mask, remove the slicing from the response mask since response mask already has the length of `end - start + 1`, which is equal to length of `masks[j, start:end]`.

commit ba6abee
Author: idanshen <[email protected]>
Date:   Thu Jun 20 09:14:16 2024 -0400

    Support for returning past_key_values from the model (#1742)

    * add support for returning past_key_values from the model
    * change order of keys

commit a57e759
Author: 1485840691 <[email protected]>
Date:   Wed Jun 19 18:02:51 2024 +0800

    Integrate f-divergence to DPO (Follow up) (#1610)

    * Step 1: update ppo_trainer and hello_world example
    * Step 2: Refine comments and add parameter type
    * Step 2: Add missing parameter comments
    * Step 1: Organize ptx loss into a function and add ptx_loss to train_stats
    * Step 1 updates: add comment to ptx_loss function, fix a bug and add warning message
    * Step 2: 1) Add ppo_ptx trainig example as ppo; 2) separate pretrain data fetch and iterate
    * Step 2: Remove loss from columns_to_log in ppo_ptx example
    * Remove data set revision in load imbd dataset
    * Run pre-commit and fix format issues
    * Initial draft of f-divergence fn
    * Update f-divergence to avoid overflow
    * fix test errors and comments
    * Add Unit tests for dpo loss with alpha and js div f
    * Adjust format
    * Fix test error
    * Reverse this update
    * Add test cases
    * Reverse un-needed updates
    * Update code style
    * Try to fix code fmt error
    * remove extra end line

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit ae23d40
Author: Shihyueh Hsu <[email protected]>
Date:   Tue Jun 18 22:07:24 2024 +0800

    change the `process` function in the example of DPO (#1753)

    * change the `process` function in the example of DPO
    * fix

commit 83b367b
Author: Younes Belkada <[email protected]>
Date:   Tue Jun 18 11:31:17 2024 +0200

    CI / `KTOTrainer`: Remove old tests (#1750)

    * remove old tests
    * remove datasets
    * Update test_dpo_trainer.py
    * Update test_dpo_trainer.py

commit d1ed730
Author: Michael <[email protected]>
Date:   Mon Jun 17 10:50:21 2024 -0400

    prepare deepspeed accomodate fp16 and bf16 (#1728)

    * prepare deepspeed accomodate fp16 and bf16
    * precommit

commit 8f8e95e
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:49:00 2024 +0200

    CPO / DPO: Fix red CI (#1749)

    * fix red CI
    * precommit

commit 4e23d95
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:41:36 2024 +0200

    fix red CI

commit 50c4620
Author: Kawin <[email protected]>
Date:   Mon Jun 17 07:14:44 2024 -0700

    small KTO fixes (#1734)

    * add warning for imbalanced data
    * update documentation
    * update script commands to be same as in dpo
    * use batch_size KL examples and batch_size target examples to calculate batch_size losses
    * fix deepspeed issue
    * speed up forward with no_grad for KL
    * add some removed metrics
    * Update trl/trainer/kto_trainer.py
    * Update trl/trainer/kto_trainer.py
    * Update trl/trainer/kto_trainer.py
      add reference to paper
      Co-authored-by: lewtun <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * add more detailed comments
    * convert assert to ValueError
    * Update kto_trainer.py
    * precommit formatting
    * remove nans in metrics by gathering across machines
    * fix formatting
    * fix choice of mismatched examples for KL term
    * describe weights
    * fix hanging issue in distributed training
    * linting
    * move metrics to cpu
    * Update trl/trainer/kto_trainer.py
      Co-authored-by: lewtun <[email protected]>
    * Update trl/trainer/kto_trainer.py
    * Update trl/trainer/kto_trainer.py
    * remove kto_pair
    * speed up data processing
    * move bco code inside
    * raise error for kto_pair argument
    * fix formatting

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>
    Co-authored-by: lewtun <[email protected]>
    Co-authored-by: Winnie Xu <[email protected]>

commit 6105d03
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:01:06 2024 +0200

    `TrlParser`: Add ignore extra args option (#1748)

    * add ignore extra args option
    * Update trl/commands/cli_utils.py

commit e247bbd
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 15:16:07 2024 +0200

    CI / core: Pin `numpy` to `!=2.0.0` for CI and to users (#1747)

    * Update setup.py
    * Update setup.py
    * Update setup.py
    * Update test_best_of_n_sampler.py
      dummy commit
    * pin numpy
    * Update tests/test_best_of_n_sampler.py
    * Update setup.py

commit 3d04496
Author: Michael <[email protected]>
Date:   Mon Jun 17 08:43:33 2024 -0400

    better trl parser with yaml config (#1739)

    * working trl parser with config
      correctly overrides yaml config with command line arguments
      adds return_remaining_strings
      when return_remaining_strings is False, raises error if yaml contains extra args that are not in the dataclasses
      simpler and cleaner than previous yaml parsing and merging
      addresses #1733
    * lowercase trlparser

commit 2d244f8
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 11:56:13 2024 +0200

    Workflow: Notify tests results on slack channel (#1744)

    * Update tests-main.yml
    * Update docker-build.yml

commit f5168fd
Author: Igor Melnyk <[email protected]>
Date:   Wed Jun 12 05:54:54 2024 -0400

    adds AOT (#1701)

    * adds AOT
    * Applied format changes
    * added docs and tests

    ---------

    Co-authored-by: Igor Melnyk <[email protected]>

commit 79686e1
Author: jetlime <[email protected]>
Date:   Wed Jun 12 00:35:31 2024 +1000

    ktotrainer: Refuse datasets which contain only one class of labels (#1724)

    * ktotrainer: refuse dataset which contain only one class of labels
    * ktotrainer: document new dataset constraint

commit 34ebc4c
Author: Luc Georges <[email protected]>
Date:   Mon Jun 10 11:17:54 2024 +0200

    feat(ci): add trufflehog secrets detection (#1721)

    * feat(ci): add trufflehog secrets detection
    * fix(ci): remove unnecessary permissions

commit 1d84e2b
Author: Michael <[email protected]>
Date:   Fri Jun 7 11:42:08 2024 +0200

    Fix default padding_value in dpo_config.py (#1692)

    dpo_config default padding value should be None, not 0, otherwise it by default overrides the padding value of any tokenizer to 0

commit 2f71b8b
Author: Michael <[email protected]>
Date:   Fri Jun 7 10:37:27 2024 +0200

    fix yaml parser for derived config classes (#1713)

    fixes #1712
    reformatted cli_utils with ruff

commit 5bcb8ad
Author: Kashif Rasul <[email protected]>
Date:   Fri Jun 7 08:48:17 2024 +0100

    RDPO fix nll loss (#1705)

commit b8b972f
Author: Haoran Xu <[email protected]>
Date:   Thu Jun 6 14:06:47 2024 -0700

    Add a variant of CPO, SimPO (#1703)

    * add a variant of cpo: simpo
    * correct cpo-simpo loss
    * avoid 0 int error in logging
    * add simpo description
    * Update trl/trainer/cpo_trainer.py
      Co-authored-by: Kashif Rasul <[email protected]>
    * fix formatting
    * add test for simpo
    * Update docs/source/cpo_trainer.mdx
      Co-authored-by: Kashif Rasul <[email protected]>
    * add a docstring for simpogamma
    * move simpo description to the above docstring
    * change simpo description in the doc
    * formatting

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 3eb9ccb
Author: Younes Belkada <[email protected]>
Date:   Thu Jun 6 19:33:20 2024 +0200

    set dev version (#1710)

    * Update setup.py
    * Update __init__.py

commit 974b0d3
Author: Costa Huang <[email protected]>
Date:   Thu Jun 6 10:13:00 2024 -0400

    0.9.4 release (#1708)

commit 39a7d1c
Author: Younes Belkada <[email protected]>
Date:   Thu Jun 6 15:50:17 2024 +0200

    SFTTrainer: Fix backward Compatibility issue with `TrainingArguments` (#1707)

    * fix BC
    * fixup

commit 0bdc638
Author: Guilherme Freire <[email protected]>
Date:   Thu Jun 6 14:42:58 2024 +0100

    Fixed doc string and docs for the SFTConfig update (#1706)

commit 275d33b
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 14:34:59 2024 -0400

    0.9.3 release (#1699)

commit c0819ee
Author: Younes Belkada <[email protected]>
Date:   Wed Jun 5 17:29:03 2024 +0200

    Update sft_trainer.py (#1698)

commit a03e7cc
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 11:00:19 2024 -0400

    Release 0.9.2 (#1697)

    * Release: 0.9.0
    * Release

commit a13cb89
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 10:20:54 2024 -0400

    Quick fix on GPT4-eval (#1696)

    * quick fix
    * precommit

commit 84156f1
Author: Quentin Gallouédec <[email protected]>
Date:   Mon Jun 3 20:09:05 2024 +0200

    Fix typo in DPOTrainer's warnings (#1688)

commit 4eb0b90
Author: Alex Brooks <[email protected]>
Date:   Mon Jun 3 10:24:32 2024 -0600

    Skip packing validation (#1673)

    * Add test for skipping preproc if packing=True
      Signed-off-by: Alex-Brooks <[email protected]>
    * Allow skipping of validation for packing=True
      Signed-off-by: Alex-Brooks <[email protected]>
    * Use dummy dataset in no packing preproc test
      Signed-off-by: Alex-Brooks <[email protected]>

    ---------

    Signed-off-by: Alex-Brooks <[email protected]>

commit 6c203f9
Author: Alexey Rozhkov <[email protected]>
Date:   Mon Jun 3 10:16:22 2024 +0100

    Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (#1690)

    * Don't override optimize_device_cache when optimize_cuda_cache is not provided
      Raise an exception when both optimize_cuda_cache and optimize_device_cache are set
    * Minor fix

commit f18253b
Author: Kashif Rasul <[email protected]>
Date:   Mon Jun 3 09:43:02 2024 +0100

    intial RPO loss (#1686)

    * intial RPO loss
    * fix sign
    * clean up

commit 151a452
Author: Samuel <[email protected]>
Date:   Wed May 29 20:29:38 2024 +0200

    Fix max completion length (#1588)

commit 488b502
Author: Younes Belkada <[email protected]>
Date:   Wed May 29 20:19:26 2024 +0200

    fix (#1678)

commit 3c0a10b
Author: Wang, Yi <[email protected]>
Date:   Mon May 27 20:52:20 2024 +0800

    fix dataset load error (#1670)

    Signed-off-by: Wang, Yi <[email protected]>

commit b031adf
Author: Younes Belkada <[email protected]>
Date:   Fri May 24 15:20:16 2024 +0200

    FIX / PPO: Fix `enable_input_require_grads` issues with PPO models (#1664)

    * Update modeling_base.py
    * Update ppo_config.py
    * Update ppo_trainer.py
    * style

commit e7cb597
Author: Costa Huang <[email protected]>
Date:   Thu May 23 11:37:16 2024 -0400

    Fix ppov2 test case (#1661)

    * Fix PPOv2 / RLOO refactor's stuff
    * update terminology to use stop token

commit bc8dfbf
Author: Kashif Rasul <[email protected]>
Date:   Thu May 23 15:28:04 2024 +0200

    update eval_strategy (#1662)

commit e4ed7a3
Author: Sourab Mangrulkar <[email protected]>
Date:   Thu May 23 18:34:22 2024 +0530

    do not upcast adapters when using FSDP+QLoRA (#1654)

commit 9a7efbd
Author: syrn1k <[email protected]>
Date:   Thu May 23 15:58:49 2024 +0300

    🤫 TR-DPO implementation (#1593)

    * 🤫 TR-DPO implementation baseline
    * fix comments
    * docs
    * fix linters
    * test added
    * move configs to DPOConfig
    * fix typo
    * add docs
    * fix import
    * use state.global_step
    * fix order of arguments
    * make sure plugins are not none
    * Update trl/trainer/utils.py
      Co-authored-by: Benjamin Bossan <[email protected]>
    * Update trl/trainer/utils.py
      Co-authored-by: Benjamin Bossan <[email protected]>
    * checking that reference model weights have changed
    * sync_target_model as staticmethod
    * set reference model

    ---------

    Co-authored-by: Nikita Surnachev <[email protected]>
    Co-authored-by: Kashif Rasul <[email protected]>
    Co-authored-by: Benjamin Bossan <[email protected]>

commit b344bce
Author: Anush Kini <[email protected]>
Date:   Thu May 23 18:27:25 2024 +0530

    [DPO] Add 'robust' loss_type (#1653)

    * Initial commit
    * pre-commit fix
    * Minor change to comments
    * Added some documentation on how to use Robust DPO

commit 35e12dc
Author: Nicolinho <[email protected]>
Date:   Thu May 23 14:36:15 2024 +0200

    Fix inheritance order in PPOv2Config (#1659)

    * fix inheritance order in PPOv2Config
    * fix inheritance order in rloo_config

commit 1da6be1
Author: Ali Bakly <[email protected]>
Date:   Thu May 23 14:10:29 2024 +0200

    docs: correct cDPO usage in DPOTrainer (#1655)

commit e249cd8
Author: Younes Belkada <[email protected]>
Date:   Thu May 23 14:10:05 2024 +0200

    add support for training collator (#1658)

commit a02513c
Author: Zach Mueller <[email protected]>
Date:   Thu May 23 06:48:00 2024 -0400

    Apply deprecated `evaluation_strategy` (#1559)

    * Deprecate
    * Update tests/test_dpo_trainer.py

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 13454d2
Author: Costa Huang <[email protected]>
Date:   Wed May 22 08:31:10 2024 -0400

    PPO / Reinforce Trainers (#1540)

    * Add ppov2 trainer
    * make eos trick optional, remove unused args
    * quick fix
    * precommit
    * update debugging script
    * fix out of bound `drop_last=True`; use built-in scheduler
    * Add PPO examples
    * push changes
    * quick change
    * quick change
    * various bug fixes
    * remove unnecessary grad accumulation setting
    * push new changes
    * fix DS3 model saving
    * update ppo.py
    * refactor
    * quick change
    * refactor
    * update ppo trainer
    * refactor
    * quick test
    * add ds2 /ds3 7 processes config
    * add vllm trainer
    * quick change
    * experiment with reward normalization
    * push changes
    * quick push
    * push changes
    * push various changes
    * refactor to use ModelConfig
    * quick change
    * refactor
    * refactor
    * Simplify DS logic
    * quick update
    * remove unnecessary files
    * precommit
    * deepspeed fix; handle edge case when eos_token_id = 0
    * add PPO tldr example
    * add TL;DR example
    * fix undefined var
    * utilize all samples in rloo
    * quick setting
    * remove the unnecessary `value_model`
    * use exact_div
    * allow saving the deepspeed model
    * refactor
    * remove dead code
    * Use some shared utilities
    * add some end-to-end test cases
    * add PPOv2 docs and RLOO docs / tests
    * update docs
    * quikc push
    * fix ci
    * fix type annotation for ci
    * quick update
    * update trainer docs

commit 99f2c94
Author: Sourab Mangrulkar <[email protected]>
Date:   Wed May 15 19:55:46 2024 +0530

    don't cast the trainable lora layers to half precision (#1644)

    * don't cast the trainable lora layers to half precision
    * quality

commit 6401d08
Author: Wing Lian <[email protected]>
Date:   Tue May 14 09:41:07 2024 -0400

    Pairwise Noise Contrastive Alignment (#1632)

    * add NCA paired preference loss
    * chore: lint
    * set more lenient tolerance for integration tests
    * Update tests/test_dpo_trainer.py
    * skip test
    * fix

    ---------

    Co-authored-by: Younes Belkada <[email protected]>
    Co-authored-by: younesbelkada <[email protected]>

commit d632a5b
Author: bartoszzuk <[email protected]>
Date:   Tue May 14 12:25:54 2024 +0200

    Fixed wrong logs prefixes in KTOTrainer (#1641)

    * Fixed wrong logs prefixes in KTOTrainer
    * Pre-commit formating

commit 5aeb752
Author: Tiezhen WANG <[email protected]>
Date:   Fri May 10 23:19:15 2024 +0800

    Update sft_llama2.py to work with the latest API (#1637)

    * Update sft_llama2.py to work with the latest API
      SFTTrainer now takes a STFConfig argument
    * Update dpo_llama2.py
    * precommit

commit b8b8978
Author: Ilya Gusev <[email protected]>
Date:   Fri May 10 15:43:13 2024 +0200

    [ORPO] Correct label mask for pad tokens (#1625)

    * [ORPO] Correct label mask for pad tokens
      Recent [fix](57aebe9) for calculating NLL loss for a whole sequence introduced a bug. When input_ids are copied to labels, pad tokens are not masked.
      This PR aims to path this by masking labels based on the attention mask.
    * -100 -> label_pad_token_id
      Co-authored-by: Kashif Rasul <[email protected]>

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 8799952
Author: Costa Huang <[email protected]>
Date:   Fri May 10 09:32:20 2024 -0400

    visualize rm prediction (#1636)

    * visualize rm prediction
    * quick update
    * quick check
    * quick fix
    * update eval steps

commit 3b4c249
Author: Xiao Yu <[email protected]>
Date:   Fri May 3 18:19:35 2024 -0400

    fixed adding bos and eos token unconditionally (#1591)

    * fixed adding bos and eos token unconditionally
    * fixed typo of tokenizer -> self.tokenizer. Also added update to ORPO
    * fixed code quality, and added BOS/EOS fix to KTO
    * code reformatting with pre-commit run --all-files
    * bug fix: check input id length before checking for EOS/BOS

commit 0347f58
Author: lewtun <[email protected]>
Date:   Fri May 3 15:59:59 2024 +0200

    Fix ZeRO-3 generation context manager (#1617)
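Commit b8b8978 describes the ORPO fix as copying `input_ids` to `labels` and then masking labels "based on the attention mask", with a hard-coded `-100` replaced by `label_pad_token_id`. The plain-Python sketch below illustrates that masking step, assuming right-padded batches given as nested lists; the function name is hypothetical, and the trainer itself operates on tensors. Using the attention mask rather than matching the pad token id also keeps a real EOS token when a tokenizer reuses EOS as its pad token.

```python
from typing import List


def mask_pad_labels(
    input_ids: List[List[int]],
    attention_mask: List[List[int]],
    label_pad_token_id: int = -100,
) -> List[List[int]]:
    """Copy input_ids to labels, replacing every position where attention_mask is 0."""
    labels = []
    for ids, mask in zip(input_ids, attention_mask):
        if len(ids) != len(mask):
            raise ValueError("input_ids and attention_mask rows must have equal length")
        # Keep real tokens; padding positions are ignored by the NLL loss.
        labels.append([tok if m else label_pad_token_id for tok, m in zip(ids, mask)])
    return labels


if __name__ == "__main__":
    batch_ids = [[5, 6, 7, 0], [8, 9, 0, 0]]  # 0 is the pad id in this toy batch
    batch_mask = [[1, 1, 1, 0], [1, 1, 0, 0]]
    print(mask_pad_labels(batch_ids, batch_mask))
    # [[5, 6, 7, -100], [8, 9, -100, -100]]
```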
huggingface · Jul 15, 2024 · e05e0ca
1 parent 9362eff
Showing 90 changed files with 6,337 additions and 757 deletions.
diff --git a/.github/workflows/benchmark.yml b/.github/workflows/benchmark.yml
diff --git a/.github/workflows/clear_cache.yml b/.github/workflows/clear_cache.yml
@@ -10,7 +10,7 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Check out code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
 
       - name: Cleanup
         run: |

diff --git a/.github/workflows/docker-build.yml b/.github/workflows/docker-build.yml
@@ -31,7 +31,7 @@ jobs:
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v1
       - name: Check out code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Login to DockerHub
         uses: docker/login-action@v1
         with:
@@ -45,30 +45,14 @@ jobs:
           push: true
           tags: huggingface/trl-latest-gpu
 
-      - name: Post to a Slack channel
-        id: slack
-        #uses: slackapi/[email protected]
-        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
         with:
-          # Slack channel id, channel name, or user id to post message.
-          # See also: https://api.slack.com/methods/chat.postMessage#channels
-          channel-id: ${{ env.CI_SLACK_CHANNEL }}
-          # For posting a rich message using Block Kit
-          payload: |
-            {
-              "text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-              "blocks": [
-                {
-                  "type": "section",
-                  "text": {
-                    "type": "mrkdwn",
-                    "text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                  }
-                }
-              ]
-            }
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: 🤗 Results of the trl-latest-gpu Docker Image build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
 
   trl-source:
     name: "Latest TRL + HF ecosystem from source"
@@ -87,7 +71,7 @@ jobs:
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v1
       - name: Check out code
-        uses: actions/checkout@v3
+        uses: actions/checkout@v4
       - name: Login to DockerHub
         uses: docker/login-action@v1
         with:
@@ -101,27 +85,11 @@ jobs:
           push: true
           tags: huggingface/trl-source-gpu
 
-      - name: Post to a Slack channel
-        id: slack
-        #uses: slackapi/[email protected]
-        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
         with:
-          # Slack channel id, channel name, or user id to post message.
-          # See also: https://api.slack.com/methods/chat.postMessage#channels
-          channel-id: ${{ env.CI_SLACK_CHANNEL }}
-          # For posting a rich message using Block Kit
-          payload: |
-            {
-              "text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-              "blocks": [
-                {
-                  "type": "section",
-                  "text": {
-                    "type": "mrkdwn",
-                    "text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                  }
-                }
-              ]
-            }
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: 🤗 Results of the trl-source-gpu Docker Image build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}  
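The removed `slackapi/slack-github-action` steps built their Block Kit message inline: a top-level `text` fallback plus one `section` block whose `mrkdwn` text repeats it, formatted as `<title>: <job status>\n<URL>`. For reference, a Python sketch that assembles the same JSON shape; the function name and sample values are hypothetical, and the payload that `huggingface/hf-workflows/.github/actions/post-slack` sends is not shown in this diff and may differ.

```python
import json


def build_slack_payload(title: str, status: str, url: str) -> dict:
    """Mirror the removed inline payload: fallback text plus one mrkdwn section block."""
    message = f"{title}: {status}\n{url}"
    return {
        "text": message,  # plain-text fallback used in notifications
        "blocks": [
            {"type": "section", "text": {"type": "mrkdwn", "text": message}},
        ],
    }


if __name__ == "__main__":
    payload = build_slack_payload(
        "trl-latest-gpu Docker Image build result",
        "success",
        "https://example.com/hypothetical-run",  # placeholder URL
    )
    print(json.dumps(payload, indent=2))
```

Moving this boilerplate into a shared action is what lets each workflow step shrink to four inputs (`slack_channel`, `title`, `status`, `slack_token`).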
diff --git a/.github/workflows/slow-tests.yml b/.github/workflows/slow-tests.yml
@@ -30,7 +30,7 @@ jobs:
       run:
         shell: bash
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Pip install
         run: |
           source activate trl
@@ -66,7 +66,7 @@ jobs:
       run:
         shell: bash
     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4
       - name: Pip install
         run: |
           source activate trl

diff --git a/.github/workflows/stale.yml b/.github/workflows/stale.yml
@@ -12,10 +12,10 @@ jobs:
     env:
       GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
 
     - name: Setup Python
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: 3.8
 

diff --git a/.github/workflows/tests-main.yml b/.github/workflows/tests-main.yml
@@ -16,9 +16,9 @@ jobs:
       fail-fast: false
     runs-on: ${{ matrix.os }}
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
         cache: "pip"
@@ -36,28 +36,11 @@ jobs:
     - name: Test with pytest
       run: |
         make test
-    - name: Post to a Slack channel
+    - name: Post to Slack
       if: always()
-      id: slack
-      #uses: slackapi/[email protected]
-      uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+      uses: huggingface/hf-workflows/.github/actions/post-slack@main
       with:
-        # Slack channel id, channel name, or user id to post message.
-        # See also: https://api.slack.com/methods/chat.postMessage#channels
-        channel-id: ${{ env.CI_SLACK_CHANNEL }}
-        # For posting a rich message using Block Kit
-        payload: |
-          {
-            "text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-            "blocks": [
-              {
-                "type": "section",
-                "text": {
-                  "type": "mrkdwn",
-                  "text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                }
-              }
-            ]
-          }
-      env:
-        SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+        slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+        title: 🤗 Results of the TRL CI on transformers/PEFT main
+        status: ${{ job.status }}
+        slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
@@ -21,15 +21,15 @@ jobs:
         python-version: [3.9]
 
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
         with:
           fetch-depth: 0
           submodules: recursive
       - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v2
+        uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
-      - uses: pre-commit/action@v2.0.3
+      - uses: pre-commit/action@v3.0.1
         with:
           extra_args: --all-files
 
@@ -41,9 +41,9 @@ jobs:
         os: ['ubuntu-latest', 'windows-latest']
     runs-on: ${{ matrix.os }}
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
     - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: ${{ matrix.python-version }}
         cache: "pip"
@@ -63,9 +63,9 @@ jobs:
     needs: check_code_quality
     runs-on: 'ubuntu-latest'
     steps:
-    - uses: actions/checkout@v3
+    - uses: actions/checkout@v4
     - name: Set up Python 3.9
-      uses: actions/setup-python@v4
+      uses: actions/setup-python@v5
       with:
         python-version: '3.9'
         cache: "pip"

diff --git a/.github/workflows/trufflehog.yml b/.github/workflows/trufflehog.yml
@@ -0,0 +1,15 @@
+on:
+  push:
+
+name: Secret Leaks
+
+jobs:
+  trufflehog:
+    runs-on: ubuntu-latest
+    steps:
+    - name: Checkout code
+      uses: actions/checkout@v4
+      with:
+        fetch-depth: 0
+    - name: Secret Scanning
+      uses: trufflesecurity/trufflehog@main
diff --git a/benchmark/benchmark_level1.sh b/benchmark/benchmark_level1.sh
@@ -33,7 +33,7 @@ python benchmark/benchmark.py \
     --slurm-template-path benchmark/trl.slurm_template
 
 python benchmark/benchmark.py \
-    --command "python examples/scripts/reward_modeling.py --model_name_or_path=facebook/opt-350m --output_dir="reward_modeling_anthropic_hh" --per_device_train_batch_size=64 --num_train_epochs=1 --gradient_accumulation_steps=16 --gradient_checkpointing=True --learning_rate=1.41e-5 --report_to="wandb" --remove_unused_columns=False --optim="adamw_torch" --logging_steps=10 --evaluation_strategy="steps" --max_length=512" \
+    --command "python examples/scripts/reward_modeling.py --model_name_or_path=facebook/opt-350m --output_dir="reward_modeling_anthropic_hh" --per_device_train_batch_size=64 --num_train_epochs=1 --gradient_accumulation_steps=16 --gradient_checkpointing=True --learning_rate=1.41e-5 --report_to="wandb" --remove_unused_columns=False --optim="adamw_torch" --logging_steps=10 --eval_strategy="steps" --max_length=512" \
     --num-seeds 3 \
     --start-seed 1 \
     --workers 10 \

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
@@ -27,6 +27,10 @@
     title: Supervised Fine-Tuning
   - local: ppo_trainer
     title: PPO Trainer
+  - local: ppov2_trainer
+    title: PPOv2 Trainer
+  - local: rloo_trainer
+    title: RLOO Trainer
   - local: best_of_n
     title: Best of N Sampling
   - local: dpo_trainer
@@ -37,6 +41,8 @@
     title: CPO Trainer
   - local: ddpo_trainer
     title: Denoising Diffusion Policy Optimization
+  - local: alignprop_trainer
+    title: AlignProp Trainer
   - local: orpo_trainer
     title: ORPO Trainer
   - local: iterative_sft_trainer