Skip to content

Commit

Permalink
Squashed commit of the following:
Browse files Browse the repository at this point in the history
commit 9e9dc96
Author: Maxim Kopecki <[email protected]>
Date:   Wed Jul 10 19:11:13 2024 +0200

    Added missing token kwarg in Peft model loading (#1825)

commit 7ddef5c
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Jul 10 18:26:11 2024 +0200

    Make use of `trust_remote_code` consistent (#1806)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit a9cddf8
Author: Adnan Khan <[email protected]>
Date:   Wed Jul 10 11:25:07 2024 -0400

    Delete unused benchmark.yml workflow. (#1822)

commit 2860ce5
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jul 9 09:22:52 2024 +0200

    DPO Llava 1.5 and PaliGemma support (#1797)

    * llava support dpo

    * add_special_tokens=False only when possible

    * format

    * pali gemma

    * refactor size

    * remove image resize

    ---------

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 30e33bd
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jul 9 05:37:12 2024 +0200

    upgrade gh actions (#1818)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit d5a0d2d
Author: Costa Huang <[email protected]>
Date:   Mon Jul 8 11:12:41 2024 -0400

    Set dev version (#1817)

commit 314e8eb
Author: Puneet Singh Bhooi <[email protected]>
Date:   Mon Jul 8 19:11:36 2024 +0530

    fix broken url in `docs\source\index.mdx` (#1813)

commit e107920
Author: Costa Huang <[email protected]>
Date:   Mon Jul 8 09:38:09 2024 -0400

    0.9.6 release (#1816)

commit 78045de
Author: Alvaro Bartolome <[email protected]>
Date:   Mon Jul 8 01:59:26 2024 +0200

    Fix `TRL_USE_RICH` environment variable handling (#1808)

    * Add `strtobool` custom implementation from `distutils`

    * Fix `TRL_USE_RICH` handling via `strtobool`

    * Run `make precommit`

commit 747612f
Author: Alvaro Bartolome <[email protected]>
Date:   Fri Jul 5 16:28:59 2024 +0200

    Fix `torch_dtype` handling in `{DPO,SFT}Trainer` when provided via CLI (#1807)

    * Fix `torch_dtype` handling through CLI

    The `torch_dtype` is not properly handled when provided via the TRL CLI
    since it's provided initially as a string, but is then casted to
    `torch.dtype` before providing it to the `{DPO,SFT}Trainer`, which means
    that those trainers should handle the scenario where `torch_dtype` is a
    `torch.dtype` too.

    * Add `torch_dtype` tests in `test_{dpo,sft}_trainer.py`

    * Forward contribution credits

    * Run `make precommit`

    ---------

    Co-authored-by: Tash Srivastava <[email protected]>

commit 9e3a35b
Author: Michael <[email protected]>
Date:   Fri Jul 5 07:29:48 2024 -0400

    Remove extra print in reward_trainer.py (#1799)

    `print_rich_table` is called twice and the first call doesn't restrict to `num_print_samples`. Remove the first, extra call

commit 4402b36
Author: Quentin Gallouédec <[email protected]>
Date:   Thu Jul 4 14:29:25 2024 +0200

    clean examples (#1791)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 78f8228
Author: Noah Tye <[email protected]>
Date:   Wed Jul 3 11:10:50 2024 -0700

    Bugfix: Preserve token fields when converting TrainingArguments to SFTConfig (#1794)

    * Preserve token fields when converting TrainingArguments to SFTConfig

    TrainingArguments.to_dict() redacts token fields, so we have to
    individually copy them over when converting to SFTConfig to avoid
    breaking push_to_hub functionality.

    Also adds a test.

    * run precommit

    * one-line args_as_dict definition per suggestion from kashif

    * generalize token copying to match TrainingArguments behavior

    * unwrap |= on dict, to support python 3.8

    * use .update instead of |= or for-loop

commit b6af2ed
Author: Kashif Rasul <[email protected]>
Date:   Wed Jul 3 08:29:16 2024 +0200

    add model_init_kwargs to training_args (#1787)

commit cd85b14
Author: Tommaso Buonocore <[email protected]>
Date:   Sat Jun 29 15:35:48 2024 +0200

    Fixed typo in SFT trainer docs (#1788)

    'STFConfig' instead of 'SFTConfig' appears multiple times in the doc, causing error when running the code snippets.

commit a57544f
Author: Kashif Rasul <[email protected]>
Date:   Thu Jun 27 15:47:58 2024 +0200

    fix docs and examples (#1780)

commit b68ff96
Author: Quentin Gallouédec <[email protected]>
Date:   Wed Jun 26 16:26:37 2024 +0200

    Visual DPO (#1647)

    * Remove extra whitespaces

    * idefics

    * vdpo

    * sft idefics

    * pad with test

    * use prompt instead of tokenizer

    * rm name main

    * support vlm in tokenize row

    * temp fix for regex in lora_target_module

    * format

    * vdpo

    * tmp float16 hard code

    * concatenated_forward support for vision

    * style and new command line

    * all-linear

    * format

    * delete old examples

    * get image

    * upcast

    * new test

    * modified test

    * new strat for tokenizer

    * rm token transfer

    * integrate vision in dpo example

    * format

    * add FDivergenceType back

    * precommit

    * pillow test dep

    * optional prompt

    * `evaluation_strategy` to `eval_strategy`

    * revert vsft change (oos)

    * update test

    * test

    * comment and support more in process

    * update process

    * update doc for vdpo

    * caution about limited support

    * Update docs/source/dpo_trainer.mdx

    Co-authored-by: Kashif Rasul <[email protected]>

    * revert DPO example changes

    * cleaner way to check if a model is vision

    * comment

    * update vdpo example

    * rename

    ---------

    Co-authored-by: Quentin Gallouédec <[email protected]>
    Co-authored-by: Kashif Rasul <[email protected]>

commit c8c01cc
Author: Mubin Manasia <[email protected]>
Date:   Wed Jun 26 03:23:36 2024 -0600

    Fix Documentation Overflow Issues for Long URLs in SFTConfig (#1774)

    * Update sft_config.py

    * Update sft_config.py

commit 3479606
Author: Costa Huang <[email protected]>
Date:   Wed Jun 26 03:18:22 2024 -0400

    Remove the leading space in the tldr preference dataset (#1773)

commit 7965b78
Author: Haozhe Ji <[email protected]>
Date:   Tue Jun 25 22:47:32 2024 +0800

    add Efficient Exact Optimization (EXO) (#1735)

    * add exo

    * fix a detail

    * Update trl/trainer/dpo_trainer.py

    * Update trl/trainer/dpo_trainer.py

    * Update trl/trainer/dpo_trainer.py

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 56bd1bb
Author: Quentin Gallouédec <[email protected]>
Date:   Tue Jun 25 16:14:26 2024 +0200

    `evaluation_strategy` to `eval_strategy` (#1771)

    Co-authored-by: Quentin Gallouédec <[email protected]>

commit 94d53e6
Author: Clara Pohland <[email protected]>
Date:   Mon Jun 24 21:27:00 2024 +0200

    MoE Models: option to add load balancing loss (#1765)

    * KTO: add aux loss

    * use router_aux_loss_coef in KtoTrainer when aux_loss enabled

    * align optional aux_loss in DPO, KTO, CPO, ORPO

    * precommit changes

    * fix KL forward kwargs

    * add aux_loss doku entry

    * apply docs suggestions

    ---------

    Co-authored-by: Clara Luise Pohland <[email protected]>

commit b5be100
Author: Mihir Prabhudesai <[email protected]>
Date:   Mon Jun 24 12:05:44 2024 -0400

    Added Reward Backpropogation Support  (#1585)

    * added alignprop template

    * added alignprop support

    * Update alignprop_trainer.mdx

    * Update alignprop_trainer.mdx

    * added better why statement

    * fixed inference code

    * changed self to pipeline

    * removed aesthetic classifier

    * added aesthetic to auxiliary models

    * added unseen prompt logging

    * removed unseen prompt log

    * fixed minor

    * remove not needed import in trl/__init__.py

    Co-authored-by: Younes Belkada <[email protected]>

    * fixed styling

    * updated _toctree

    ---------

    Co-authored-by: Younes Belkada <[email protected]>

commit 6e1652b
Author: Haoran Xu <[email protected]>
Date:   Sun Jun 23 09:54:30 2024 -0700

    Add CPO-SimPO method (#1760)

    * enable cpo-simpo

    * highlight SimPO and CPO-SimPO

    * add test for cpo_alpha

    * formatting

    * Update docs/source/cpo_trainer.mdx

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 65374c6
Author: Costa Huang <[email protected]>
Date:   Fri Jun 21 11:20:54 2024 -0400

    New sentiment and descriptiveness dataset (#1757)

    * push changes

    * handle edge cases where the chosen and the rejected are the same

commit 9956091
Author: Juyoung Suk <[email protected]>
Date:   Fri Jun 21 18:01:08 2024 +0900

    Add dataset_text_field in examples/scripts/sft.py (#1758)

commit 34d273f
Author: Costa Huang <[email protected]>
Date:   Thu Jun 20 13:16:43 2024 -0400

    Support num_train_epochs (#1743)

    * add a test case for num_train_epochs

    * fix ci

    * quick change

    * disable push to hub

    * debug windows ci

    * try another fix

    * skip subprocess tests on windows

commit 3bf9449
Author: Mert Sayar <[email protected]>
Date:   Thu Jun 20 18:22:20 2024 +0300

    Fix masking of response tokens (#1718)

    Current handling of `response_masks` inside `batch_forward_pass`
    function does not take padding into consideration which results with
    shape unmatch during masking. Since response mask is a mask tensor of
    response tokens, response tokens should not be concatenated with a
    `torch.zeros(query_length)` and masking operation should be done without
    slicing.

    Remove the concatenation of the response mask, remove the slicing from
    the response mask since response mask already has the length of `end -
    start + 1`, which is equal to length of `masks[j, start:end]`.

commit ba6abee
Author: idanshen <[email protected]>
Date:   Thu Jun 20 09:14:16 2024 -0400

    Support for returning past_key_values from the model (#1742)

    * add support for returning past_key_values from the model

    * change order of  keys

commit a57e759
Author: 1485840691 <[email protected]>
Date:   Wed Jun 19 18:02:51 2024 +0800

    Integrate f-divergence to DPO (Follow up) (#1610)

    * Step 1: update ppo_trainer and hello_world example

    * Step 2: Refine comments and add parameter type

    * Step 2: Add missing parameter comments

    * Step 1: Organize ptx loss into a function and add ptx_loss to train_stats

    * Step 1 updates: add comment to ptx_loss function, fix a bug and add warning message

    * Step 2: 1) Add ppo_ptx trainig example as ppo; 2) separate pretrain data fetch and iterate

    * Step 2: Remove loss from columns_to_log in ppo_ptx example

    * Remove data set revision in load imbd dataset

    * Run pre-commit and fix format issues

    * Initial draft of f-divergence fn

    * Update f-divergence to avoid overflow

    * fix test errors and comments

    * Add Unit tests for dpo loss with alpha and js div f

    * Adjust format

    * Fix test error

    * Reverse this update

    * Add test cases

    * Reverse un-needed updates

    * Update code style

    * Try to fix code fmt error

    * remove extra end line

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit ae23d40
Author: Shihyueh Hsu <[email protected]>
Date:   Tue Jun 18 22:07:24 2024 +0800

    change the `process` function in the example of DPO (#1753)

    * change the `process` function in the example of DPO

    * fix

commit 83b367b
Author: Younes Belkada <[email protected]>
Date:   Tue Jun 18 11:31:17 2024 +0200

    CI / `KTOTrainer`: Remove old tests (#1750)

    * remove old tests

    * remove datasets

    * Update test_dpo_trainer.py

    * Update test_dpo_trainer.py

commit d1ed730
Author: Michael <[email protected]>
Date:   Mon Jun 17 10:50:21 2024 -0400

    prepare deepspeed accomodate fp16 and bf16 (#1728)

    * prepare deepspeed accomodate fp16 and bf16

    * precommit

commit 8f8e95e
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:49:00 2024 +0200

    CPO / DPO: Fix red CI (#1749)

    * fix red CI

    * precommit

commit 4e23d95
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:41:36 2024 +0200

    fix red CI

commit 50c4620
Author: Kawin <[email protected]>
Date:   Mon Jun 17 07:14:44 2024 -0700

    small KTO fixes (#1734)

    * add warning for imbalanced data

    * update documentation

    * update script commands to be same as in dpo

    * use batch_size KL examples and batch_size target examples to calculate batch_size losses

    * fix deepspeed issue

    * speed up forward with no_grad for KL

    * add some removed metrics

    * Update trl/trainer/kto_trainer.py

    * Update trl/trainer/kto_trainer.py

    * Update trl/trainer/kto_trainer.py

    add reference to paper

    Co-authored-by: lewtun <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * add more detailed comments

    * convert assert to ValueError

    * Update kto_trainer.py

    * precommit formatting

    * remove nans in metrics by gathering across machines

    * fix formatting

    * fix choice of mismatched examples for KL term

    * describe weights

    * fix hanging issue in distributed training

    * linting

    * move metrics to cpu

    * Update trl/trainer/kto_trainer.py

    Co-authored-by: lewtun <[email protected]>

    * Update trl/trainer/kto_trainer.py

    * Update trl/trainer/kto_trainer.py

    * remove kto_pair

    * speed up data processing

    * move bco code inside

    * raise error for kto_pair argument

    * fix formatting

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>
    Co-authored-by: lewtun <[email protected]>
    Co-authored-by: Winnie Xu <[email protected]>

commit 6105d03
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 16:01:06 2024 +0200

    `TrlParser`: Add ignore extra args option (#1748)

    * add ignore extra args option

    * Update trl/commands/cli_utils.py

commit e247bbd
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 15:16:07 2024 +0200

    CI / core: Pin `numpy` to `!=2.0.0` for CI and to users (#1747)

    * Update setup.py

    * Update setup.py

    * Update setup.py

    * Update test_best_of_n_sampler.py

    dummy commit

    * pin numpy

    * Update tests/test_best_of_n_sampler.py

    * Update setup.py

commit 3d04496
Author: Michael <[email protected]>
Date:   Mon Jun 17 08:43:33 2024 -0400

    better trl parser with yaml config (#1739)

    * working trl parser with config

    correctly overrides yaml config with command line arguments
    adds return_remaining_strings
    when return_remaining_strings is False, raises error if yaml contains
    extra args that are not in the dataclasses
    simpler and cleaner than previous yaml parsing and merging
    addresses #1733

    * lowercase trlparser

commit 2d244f8
Author: Younes Belkada <[email protected]>
Date:   Mon Jun 17 11:56:13 2024 +0200

    Workflow: Notify tests results on slack channel (#1744)

    * Update tests-main.yml

    * Update docker-build.yml

commit f5168fd
Author: Igor Melnyk <[email protected]>
Date:   Wed Jun 12 05:54:54 2024 -0400

    adds AOT (#1701)

    * adds AOT

    * Applied format changes

    * added docs and tests

    ---------

    Co-authored-by: Igor Melnyk <[email protected]>

commit 79686e1
Author: jetlime <[email protected]>
Date:   Wed Jun 12 00:35:31 2024 +1000

    ktotrainer: Refuse datasets which contain only one class of labels (#1724)

    * ktotrainer: refuse dataset which contain only one class of labels

    * ktotrainer: document new dataset constraint

commit 34ebc4c
Author: Luc Georges <[email protected]>
Date:   Mon Jun 10 11:17:54 2024 +0200

    feat(ci): add trufflehog secrets detection (#1721)

    * feat(ci): add trufflehog secrets detection

    * fix(ci): remove unnecessary permissions

commit 1d84e2b
Author: Michael <[email protected]>
Date:   Fri Jun 7 11:42:08 2024 +0200

    Fix default padding_value in dpo_config.py (#1692)

    dpo_config default padding value should be None, not 0, otherwise it by default overrides the padding value of any tokenizer to 0

commit 2f71b8b
Author: Michael <[email protected]>
Date:   Fri Jun 7 10:37:27 2024 +0200

    fix yaml parser for derived config classes (#1713)

    fixes #1712
    reformatted cli_utils with ruff

commit 5bcb8ad
Author: Kashif Rasul <[email protected]>
Date:   Fri Jun 7 08:48:17 2024 +0100

    RDPO fix nll loss (#1705)

commit b8b972f
Author: Haoran Xu <[email protected]>
Date:   Thu Jun 6 14:06:47 2024 -0700

    Add a variant of CPO, SimPO (#1703)

    * add a variant of cpo: simpo

    * correct cpo-simpo loss

    * avoid 0 int error in logging

    * add simpo description

    * Update trl/trainer/cpo_trainer.py

    Co-authored-by: Kashif Rasul <[email protected]>

    * fix formatting

    * add test for simpo

    * Update docs/source/cpo_trainer.mdx

    Co-authored-by: Kashif Rasul <[email protected]>

    * add a docstring for simpogamma

    * move simpo description to the above docstring

    * change simpo description in the doc

    * formatting

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 3eb9ccb
Author: Younes Belkada <[email protected]>
Date:   Thu Jun 6 19:33:20 2024 +0200

    set dev version (#1710)

    * Update setup.py

    * Update __init__.py

commit 974b0d3
Author: Costa Huang <[email protected]>
Date:   Thu Jun 6 10:13:00 2024 -0400

    0.9.4 release (#1708)

commit 39a7d1c
Author: Younes Belkada <[email protected]>
Date:   Thu Jun 6 15:50:17 2024 +0200

    SFTTrainer: Fix backward Compatibility issue with `TrainingArguments` (#1707)

    * fix BC

    * fixup

commit 0bdc638
Author: Guilherme Freire <[email protected]>
Date:   Thu Jun 6 14:42:58 2024 +0100

    Fixed doc string and docs for the SFTConfig update (#1706)

commit 275d33b
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 14:34:59 2024 -0400

    0.9.3 release (#1699)

commit c0819ee
Author: Younes Belkada <[email protected]>
Date:   Wed Jun 5 17:29:03 2024 +0200

    Update sft_trainer.py (#1698)

commit a03e7cc
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 11:00:19 2024 -0400

    Release 0.9.2 (#1697)

    * Release: 0.9.0

    * Release

commit a13cb89
Author: Costa Huang <[email protected]>
Date:   Wed Jun 5 10:20:54 2024 -0400

    Quick fix on GPT4-eval (#1696)

    * quick fix

    * precommit

commit 84156f1
Author: Quentin Gallouédec <[email protected]>
Date:   Mon Jun 3 20:09:05 2024 +0200

    Fix typo in DPOTrainer's warnings (#1688)

commit 4eb0b90
Author: Alex Brooks <[email protected]>
Date:   Mon Jun 3 10:24:32 2024 -0600

    Skip packing validation (#1673)

    * Add test for skipping preproc if packing=True

    Signed-off-by: Alex-Brooks <[email protected]>

    * Allow skipping of validation for packing=True

    Signed-off-by: Alex-Brooks <[email protected]>

    * Use dummy dataset in no packing preproc test

    Signed-off-by: Alex-Brooks <[email protected]>

    ---------

    Signed-off-by: Alex-Brooks <[email protected]>

commit 6c203f9
Author: Alexey Rozhkov <[email protected]>
Date:   Mon Jun 3 10:16:22 2024 +0100

    Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig (#1690)

    * Don't override optimize_device_cache when optimize_cuda_cache is not provided
    Raise an exception when both optimize_cuda_cache and optimize_device_cache are set

    * Minor fix

commit f18253b
Author: Kashif Rasul <[email protected]>
Date:   Mon Jun 3 09:43:02 2024 +0100

    intial RPO loss (#1686)

    * intial RPO loss

    * fix sign

    * clean up

commit 151a452
Author: Samuel <[email protected]>
Date:   Wed May 29 20:29:38 2024 +0200

    Fix max completion length (#1588)

commit 488b502
Author: Younes Belkada <[email protected]>
Date:   Wed May 29 20:19:26 2024 +0200

    fix (#1678)

commit 3c0a10b
Author: Wang, Yi <[email protected]>
Date:   Mon May 27 20:52:20 2024 +0800

    fix dataset load error (#1670)

    Signed-off-by: Wang, Yi <[email protected]>

commit b031adf
Author: Younes Belkada <[email protected]>
Date:   Fri May 24 15:20:16 2024 +0200

    FIX / PPO: Fix `enable_input_require_grads` issues with PPO models (#1664)

    * Update modeling_base.py

    * Update ppo_config.py

    * Update ppo_trainer.py

    * style

commit e7cb597
Author: Costa Huang <[email protected]>
Date:   Thu May 23 11:37:16 2024 -0400

    Fix ppov2 test case (#1661)

    * Fix PPOv2 / RLOO refactor's stuff

    * update terminology to use stop token

commit bc8dfbf
Author: Kashif Rasul <[email protected]>
Date:   Thu May 23 15:28:04 2024 +0200

    update eval_strategy (#1662)

commit e4ed7a3
Author: Sourab Mangrulkar <[email protected]>
Date:   Thu May 23 18:34:22 2024 +0530

    do not upcast adapters when using FSDP+QLoRA (#1654)

commit 9a7efbd
Author: syrn1k <[email protected]>
Date:   Thu May 23 15:58:49 2024 +0300

    🤫 TR-DPO implementation (#1593)

    * 🤫 TR-DPO implementation baseline

    * fix comments

    * docs

    * fix linters

    * test added

    * move configs to DPOConfig

    * fix typo

    * add docs

    * fix import

    * use state.global_step

    * fix order of arguments

    * make sure plugins are not none

    * Update trl/trainer/utils.py

    Co-authored-by: Benjamin Bossan <[email protected]>

    * Update trl/trainer/utils.py

    Co-authored-by: Benjamin Bossan <[email protected]>

    * checking that reference model weights have changed

    * sync_target_model as staticmethod

    * set reference model

    ---------

    Co-authored-by: Nikita Surnachev <[email protected]>
    Co-authored-by: Kashif Rasul <[email protected]>
    Co-authored-by: Benjamin Bossan <[email protected]>

commit b344bce
Author: Anush Kini <[email protected]>
Date:   Thu May 23 18:27:25 2024 +0530

    [DPO] Add 'robust' loss_type (#1653)

    * Initial commit

    * pre-commit fix

    * Minor change to comments

    * Added some documentation on how to use Robust DPO

commit 35e12dc
Author: Nicolinho <[email protected]>
Date:   Thu May 23 14:36:15 2024 +0200

    Fix inheritance order in PPOv2Config (#1659)

    * fix inheritance order in PPOv2Config

    * fix inheritance order in rloo_config

commit 1da6be1
Author: Ali Bakly <[email protected]>
Date:   Thu May 23 14:10:29 2024 +0200

    docs: correct cDPO usage in DPOTrainer (#1655)

commit e249cd8
Author: Younes Belkada <[email protected]>
Date:   Thu May 23 14:10:05 2024 +0200

    add support for training collator (#1658)

commit a02513c
Author: Zach Mueller <[email protected]>
Date:   Thu May 23 06:48:00 2024 -0400

    Apply deprecated `evaluation_strategy` (#1559)

    * Deprecate

    * Update tests/test_dpo_trainer.py

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 13454d2
Author: Costa Huang <[email protected]>
Date:   Wed May 22 08:31:10 2024 -0400

    PPO / Reinforce Trainers (#1540)

    * Add ppov2 trainer

    * make eos trick optional, remove unused args

    * quick fix

    * precommit

    * update debugging script

    * fix out of bound `drop_last=True`; use built-in scheduler

    * Add PPO examples

    * push changes

    * quick change

    * quick change

    * various bug fixes

    * remove unnecessary grad accumulation setting

    * push new changes

    * fix DS3 model saving

    * update ppo.py

    * refactor

    * quick change

    * refactor

    * update ppo trainer

    * refactor

    * quick test

    * add ds2 /ds3 7 processes config

    * add vllm trainer

    * quick change

    * experiment with reward normalization

    * push changes

    * quick push

    * push changes

    * push various changes

    * refactor to use ModelConfig

    * quick change

    * refactor

    * refactor

    * Simplify DS logic

    * quick update

    * remove unnecessary files

    * precommit

    * deepspeed fix; handle edge case when eos_token_id = 0

    * add PPO tldr example

    * add TL;DR example

    * fix undefined var

    * utilize all samples in rloo

    * quick setting

    * remove the unnecessary `value_model`

    * use exact_div

    * allow saving the deepspeed model

    * refactor

    * remove dead code

    * Use some shared utilities

    * add some end-to-end test cases

    * add PPOv2 docs and RLOO docs / tests

    * update docs

    * quikc push

    * fix ci

    * fix type annotation for ci

    * quick update

    * update trainer docs

commit 99f2c94
Author: Sourab Mangrulkar <[email protected]>
Date:   Wed May 15 19:55:46 2024 +0530

    don't cast the trainable lora layers to half precision (#1644)

    * don't cast the trainable lora layers to half precision

    * quality

commit 6401d08
Author: Wing Lian <[email protected]>
Date:   Tue May 14 09:41:07 2024 -0400

    Pairwise Noise Contrastive Alignment (#1632)

    * add NCA paired preference loss

    * chore: lint

    * set more lenient tolerance for integration tests

    * Update tests/test_dpo_trainer.py

    * skip test

    * fix

    ---------

    Co-authored-by: Younes Belkada <[email protected]>
    Co-authored-by: younesbelkada <[email protected]>

commit d632a5b
Author: bartoszzuk <[email protected]>
Date:   Tue May 14 12:25:54 2024 +0200

    Fixed wrong logs prefixes in KTOTrainer (#1641)

    * Fixed wrong logs prefixes in KTOTrainer

    * Pre-commit formating

commit 5aeb752
Author: Tiezhen WANG <[email protected]>
Date:   Fri May 10 23:19:15 2024 +0800

    Update sft_llama2.py to work with the latest API (#1637)

    * Update sft_llama2.py to work with the latest API

    SFTTrainer now takes a STFConfig argument

    * Update dpo_llama2.py

    * precommit

commit b8b8978
Author: Ilya Gusev <[email protected]>
Date:   Fri May 10 15:43:13 2024 +0200

    [ORPO] Correct label mask for pad tokens (#1625)

    * [ORPO] Correct label mask for pad tokens

    Recent [fix](57aebe9) for calculating NLL loss for a whole sequence introduced a bug. When input_ids are copied to labels, pad tokens are not masked.

    This PR aims to path this by masking labels based on the attention mask.

    * -100 -> label_pad_token_id

    Co-authored-by: Kashif Rasul <[email protected]>

    ---------

    Co-authored-by: Kashif Rasul <[email protected]>

commit 8799952
Author: Costa Huang <[email protected]>
Date:   Fri May 10 09:32:20 2024 -0400

    visualize rm prediction (#1636)

    * visualize rm prediction

    * quick update

    * quick check

    * quick fix

    * update eval steps

commit 3b4c249
Author: Xiao Yu <[email protected]>
Date:   Fri May 3 18:19:35 2024 -0400

    fixed adding bos and eos token unconditionally (#1591)

    * fixed adding bos and eos token unconditionally

    * fixed typo of tokenizer -> self.tokenizer. Also added update to ORPO

    * fixed code quality, and added BOS/EOS fix to KTO

    * code reformatting with pre-commit run --all-files

    * bug fix: check input id length before checking for EOS/BOS

commit 0347f58
Author: lewtun <[email protected]>
Date:   Fri May 3 15:59:59 2024 +0200

    Fix ZeRO-3 generation context manager (#1617)
  • Loading branch information
qgallouedec committed Jul 15, 2024
1 parent 9362eff commit e05e0ca
Show file tree
Hide file tree
Showing 90 changed files with 6,337 additions and 757 deletions.
107 changes: 0 additions & 107 deletions .github/workflows/benchmark.yml

This file was deleted.

2 changes: 1 addition & 1 deletion .github/workflows/clear_cache.yml
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@v4

- name: Cleanup
run: |
Expand Down
64 changes: 16 additions & 48 deletions .github/workflows/docker-build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to DockerHub
uses: docker/login-action@v1
with:
Expand All @@ -45,30 +45,14 @@ jobs:
push: true
tags: huggingface/trl-latest-gpu

- name: Post to a Slack channel
id: slack
#uses: slackapi/[email protected]
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: ${{ env.CI_SLACK_CHANNEL }}
# For posting a rich message using Block Kit
payload: |
{
"text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the trl-latest-gpu Docker Image build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

trl-source:
name: "Latest TRL + HF ecosystem from source"
Expand All @@ -87,7 +71,7 @@ jobs:
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v1
- name: Check out code
uses: actions/checkout@v3
uses: actions/checkout@v4
- name: Login to DockerHub
uses: docker/login-action@v1
with:
Expand All @@ -101,27 +85,11 @@ jobs:
push: true
tags: huggingface/trl-source-gpu

- name: Post to a Slack channel
id: slack
#uses: slackapi/[email protected]
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
- name: Post to Slack
if: always()
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: ${{ env.CI_SLACK_CHANNEL }}
# For posting a rich message using Block Kit
payload: |
{
"text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the trl-source-gpu Docker Image build
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
4 changes: 2 additions & 2 deletions .github/workflows/slow-tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Pip install
run: |
source activate trl
Expand Down Expand Up @@ -66,7 +66,7 @@ jobs:
run:
shell: bash
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Pip install
run: |
source activate trl
Expand Down
4 changes: 2 additions & 2 deletions .github/workflows/stale.yml
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: 3.8

Expand Down
33 changes: 8 additions & 25 deletions .github/workflows/tests-main.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,9 +16,9 @@ jobs:
fail-fast: false
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
Expand All @@ -36,28 +36,11 @@ jobs:
- name: Test with pytest
run: |
make test
- name: Post to a Slack channel
- name: Post to Slack
if: always()
id: slack
#uses: slackapi/[email protected]
uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
uses: huggingface/hf-workflows/.github/actions/post-slack@main
with:
# Slack channel id, channel name, or user id to post message.
# See also: https://api.slack.com/methods/chat.postMessage#channels
channel-id: ${{ env.CI_SLACK_CHANNEL }}
# For posting a rich message using Block Kit
payload: |
{
"text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
}
}
]
}
env:
SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
slack_channel: ${{ env.CI_SLACK_CHANNEL }}
title: 🤗 Results of the TRL CI on transformers/PEFT main
status: ${{ job.status }}
slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
14 changes: 7 additions & 7 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
Expand Up @@ -21,15 +21,15 @@ jobs:
python-version: [3.9]

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
with:
fetch-depth: 0
submodules: recursive
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- uses: pre-commit/action@v2.0.3
- uses: pre-commit/action@v3.0.1
with:
extra_args: --all-files

Expand All @@ -41,9 +41,9 @@ jobs:
os: ['ubuntu-latest', 'windows-latest']
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
cache: "pip"
Expand All @@ -63,9 +63,9 @@ jobs:
needs: check_code_quality
runs-on: 'ubuntu-latest'
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Set up Python 3.9
uses: actions/setup-python@v4
uses: actions/setup-python@v5
with:
python-version: '3.9'
cache: "pip"
Expand Down
15 changes: 15 additions & 0 deletions .github/workflows/trufflehog.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
on:
push:

name: Secret Leaks

jobs:
trufflehog:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Secret Scanning
uses: trufflesecurity/trufflehog@main
2 changes: 1 addition & 1 deletion benchmark/benchmark_level1.sh
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ python benchmark/benchmark.py \
--slurm-template-path benchmark/trl.slurm_template

python benchmark/benchmark.py \
--command "python examples/scripts/reward_modeling.py --model_name_or_path=facebook/opt-350m --output_dir="reward_modeling_anthropic_hh" --per_device_train_batch_size=64 --num_train_epochs=1 --gradient_accumulation_steps=16 --gradient_checkpointing=True --learning_rate=1.41e-5 --report_to="wandb" --remove_unused_columns=False --optim="adamw_torch" --logging_steps=10 --evaluation_strategy="steps" --max_length=512" \
--command "python examples/scripts/reward_modeling.py --model_name_or_path=facebook/opt-350m --output_dir="reward_modeling_anthropic_hh" --per_device_train_batch_size=64 --num_train_epochs=1 --gradient_accumulation_steps=16 --gradient_checkpointing=True --learning_rate=1.41e-5 --report_to="wandb" --remove_unused_columns=False --optim="adamw_torch" --logging_steps=10 --eval_strategy="steps" --max_length=512" \
--num-seeds 3 \
--start-seed 1 \
--workers 10 \
Expand Down
6 changes: 6 additions & 0 deletions docs/source/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,10 @@
title: Supervised Fine-Tuning
- local: ppo_trainer
title: PPO Trainer
- local: ppov2_trainer
title: PPOv2 Trainer
- local: rloo_trainer
title: RLOO Trainer
- local: best_of_n
title: Best of N Sampling
- local: dpo_trainer
Expand All @@ -37,6 +41,8 @@
title: CPO Trainer
- local: ddpo_trainer
title: Denoising Diffusion Policy Optimization
- local: alignprop_trainer
title: AlignProp Trainer
- local: orpo_trainer
title: ORPO Trainer
- local: iterative_sft_trainer
Expand Down
Loading

0 comments on commit e05e0ca

Please sign in to comment.