Commit

Merge branch 'huggingface:main' into main
idanshen authored Jun 17, 2024
2 parents d072b34 + d1ed730 commit 32603cc
Showing 15 changed files with 142 additions and 261 deletions.
60 changes: 14 additions & 46 deletions .github/workflows/docker-build.yml

@@ -45,30 +45,14 @@ jobs:
           push: true
           tags: huggingface/trl-latest-gpu

-      - name: Post to a Slack channel
-        id: slack
-        #uses: slackapi/[email protected]
-        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
         with:
-          # Slack channel id, channel name, or user id to post message.
-          # See also: https://api.slack.com/methods/chat.postMessage#channels
-          channel-id: ${{ env.CI_SLACK_CHANNEL }}
-          # For posting a rich message using Block Kit
-          payload: |
-            {
-              "text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-              "blocks": [
-                {
-                  "type": "section",
-                  "text": {
-                    "type": "mrkdwn",
-                    "text": "trl-latest-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                  }
-                }
-              ]
-            }
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: 🤗 Results of the trl-latest-gpu Docker Image build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}

   trl-source:
     name: "Latest TRL + HF ecosystem from source"

@@ -101,27 +85,11 @@ jobs:
           push: true
           tags: huggingface/trl-source-gpu

-      - name: Post to a Slack channel
-        id: slack
-        #uses: slackapi/[email protected]
-        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+      - name: Post to Slack
+        if: always()
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
         with:
-          # Slack channel id, channel name, or user id to post message.
-          # See also: https://api.slack.com/methods/chat.postMessage#channels
-          channel-id: ${{ env.CI_SLACK_CHANNEL }}
-          # For posting a rich message using Block Kit
-          payload: |
-            {
-              "text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-              "blocks": [
-                {
-                  "type": "section",
-                  "text": {
-                    "type": "mrkdwn",
-                    "text": "trl-source-gpu Docker Image build result: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                  }
-                }
-              ]
-            }
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: 🤗 Results of the trl-source-gpu Docker Image build
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
29 changes: 6 additions & 23 deletions .github/workflows/tests-main.yml

@@ -36,28 +36,11 @@ jobs:
       - name: Test with pytest
         run: |
           make test
-      - name: Post to a Slack channel
+      - name: Post to Slack
         if: always()
-        id: slack
-        #uses: slackapi/[email protected]
-        uses: slackapi/slack-github-action@6c661ce58804a1a20f6dc5fbee7f0381b469e001
+        uses: huggingface/hf-workflows/.github/actions/post-slack@main
         with:
-          # Slack channel id, channel name, or user id to post message.
-          # See also: https://api.slack.com/methods/chat.postMessage#channels
-          channel-id: ${{ env.CI_SLACK_CHANNEL }}
-          # For posting a rich message using Block Kit
-          payload: |
-            {
-              "text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}",
-              "blocks": [
-                {
-                  "type": "section",
-                  "text": {
-                    "type": "mrkdwn",
-                    "text": "TRL CI on transformers/PEFT main: ${{ job.status }}\n${{ github.event.pull_request.html_url || github.event.head_commit.url }}"
-                  }
-                }
-              ]
-            }
-        env:
-          SLACK_BOT_TOKEN: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
+          slack_channel: ${{ env.CI_SLACK_CHANNEL }}
+          title: 🤗 Results of the TRL CI on transformers/PEFT main
+          status: ${{ job.status }}
+          slack_token: ${{ secrets.SLACK_CIFEEDBACK_BOT_TOKEN }}
4 changes: 1 addition & 3 deletions docs/source/dpo_trainer.mdx

@@ -109,8 +109,6 @@ The [cDPO](https://ericmitchell.ai/cdpo.pdf) is a tweak on the DPO loss where we

 The [Robust DPO](https://arxiv.org/abs/2403.00409) authors propose an unbiased estimate of the DPO loss that is robust to preference noise in the data. Like in cDPO, assume that the preference labels are noisy with some probability that can be passed to the `DPOTrainer` via `label_smoothing` argument (between 0 and 0.5). Use `loss_type="robust"` to the trainer to use it.

-The [KTO](https://arxiv.org/abs/2402.01306) authors directly maximize the utility of LLM generations instead of the log-likelihood of preferences. To use preference data with KTO, we recommend breaking up the n preferences into 2n examples and using [`KTOTrainer`](kto_trainer) (i.e., treating the data like an unpaired feedback dataset). Although it is possible to pass in `loss_type="kto_pair"` into DPOTrainer, this is a highly simplified version of KTO that we *do not recommend* in most cases. Please use [`KTOTrainer`](kto_trainer) when possible.
-
 The [BCO](https://arxiv.org/abs/2404.04656) authors train a binary classifier whose logit serves as a reward so that the classifier maps {prompt, chosen completion} pairs to 1 and {prompt, rejected completion} pairs to 0. The `DPOTrainer` can be switched to this loss via the `loss_type="bco_pair"` argument.

 The [SPPO](https://arxiv.org/abs/2405.00675) authors claim that SPPO is capable of solving the Nash equilibrium iteratively by pushing the chosen rewards to be as large as 1/2 and the rejected rewards to be as small as -1/2 and can alleviate data sparsity issues. The implementation using loss_type="sppo_hard" approximates this algorithm by employing hard label probabilities, assigning 1 to the winner and 0 to the loser.

@@ -121,7 +119,7 @@ The [TR-DPO](https://arxiv.org/pdf/2404.09656) paper suggests syncing the refere

 The [RPO](https://arxiv.org/abs/2404.19733) paper implements an iterative preference tuning algorithm using a loss related to the RPO loss in this [paper](https://arxiv.org/abs/2405.16436) that essentially consists of the SFT loss on the chosen preferences together with a weighted DPO loss. To use this loss set the `rpo_alpha` in the `DPOConfig` to an appropriate value.

-The [AOT](https://arxiv.org/abs/2406.05882) authors propose to use Distributional Preference Alignment Via Optimal Transport. Traditionally, the alignment algorithms use paired preferences at a sample level, which does not ensure alignment on the distributional level. AOT, on the other hand, can align LLMs on paired or unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. Specifically, `loss_type="aot"` is appropriate for paired datasets, where each prompt has both chosen and rejected responses; `loss_type="aot_pair"` is for unpaired datasets. Note that `loss_type="aot_pair"` is similar in spirit to `loss_type="kto_pair"` that applies unpaired alignment methodology on paired dataset. In a nutshell, `loss_type="aot"` ensures that the log-likelihood ratio of chosen to rejected of the aligned model has higher quantiles than that ratio for the reference model. `loss_type="aot_pair"` ensures that the chosen reward is higher on all quantiles than the rejected reward. Note that in both cases quantiles are obtained via sorting. To fully leverage the advantages of the AOT algorithm, it is important to maximize the per-GPU batch size.
+The [AOT](https://arxiv.org/abs/2406.05882) authors propose to use Distributional Preference Alignment Via Optimal Transport. Traditionally, the alignment algorithms use paired preferences at a sample level, which does not ensure alignment on the distributional level. AOT, on the other hand, can align LLMs on paired or unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. Specifically, `loss_type="aot"` is appropriate for paired datasets, where each prompt has both chosen and rejected responses; `loss_type="aot_pair"` is for unpaired datasets. In a nutshell, `loss_type="aot"` ensures that the log-likelihood ratio of chosen to rejected of the aligned model has higher quantiles than that ratio for the reference model. `loss_type="aot_pair"` ensures that the chosen reward is higher on all quantiles than the rejected reward. Note that in both cases quantiles are obtained via sorting. To fully leverage the advantages of the AOT algorithm, it is important to maximize the per-GPU batch size.

 ## Logging
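All of the loss variants documented above are selected through the same `loss_type` switch. A minimal illustrative sketch follows (not part of this commit; it assumes a TRL release in which `beta` and `loss_type` are fields of `DPOConfig`, and the toy dataset and the `gpt2` checkpoint are placeholders):

```python
# Minimal sketch, not part of this commit. It assumes a TRL release in which
# `beta` and `loss_type` are fields of DPOConfig; earlier releases passed them
# directly to DPOTrainer. The tiny dataset and "gpt2" checkpoint are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# A toy paired-preference dataset with the prompt/chosen/rejected columns
# that DPOTrainer expects.
train_dataset = Dataset.from_dict(
    {
        "prompt": ["What color is the sky?", "Name a prime number."],
        "chosen": [" The sky is blue.", " Two is prime."],
        "rejected": [" The sky is green.", " Nine is prime."],
    }
)

# Any documented variant can go here: "sigmoid" (default), "robust",
# "bco_pair", "sppo_hard", "aot", "aot_pair", ... For "aot", the docs above
# recommend maximizing the per-GPU batch size.
training_args = DPOConfig(
    output_dir="dpo-example",
    per_device_train_batch_size=2,
    max_steps=2,
    beta=0.1,
    loss_type="aot",
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

Swapping the string passed to `loss_type` is the only change needed to move between the variants described in the documentation above.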
2 changes: 1 addition & 1 deletion setup.py

@@ -63,7 +63,7 @@
 REQUIRED_PKGS = [
     "torch>=1.4.0",
     "transformers>=4.31.0",
-    "numpy>=1.18.2",
+    "numpy>=1.18.2,<2.0.0",
     "accelerate",
     "datasets",
     "tyro>=0.5.11",
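A quick way to verify that a local environment satisfies the tightened pin, as an illustrative sketch (not part of this commit; it assumes the `packaging` library is installed):

```python
# Illustrative check, not part of this commit: confirm the installed NumPy
# falls inside the new ">=1.18.2,<2.0.0" range declared in setup.py.
import numpy as np
from packaging.version import Version

installed = Version(np.__version__)
assert Version("1.18.2") <= installed < Version("2.0.0"), (
    f"NumPy {installed} is outside the range TRL now declares"
)
print(f"NumPy {installed} satisfies numpy>=1.18.2,<2.0.0")
```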
2 changes: 1 addition & 1 deletion tests/slow/testing_constants.py

@@ -23,5 +23,5 @@
 GRADIENT_CHECKPOINTING_KWARGS = [None, {"use_reentrant": False}, {"use_reentrant": True}]
 DEVICE_MAP_OPTIONS = [{"": 0}, "auto"]

-DPO_LOSS_TYPES = ["sigmoid", "ipo", "kto_pair"]
+DPO_LOSS_TYPES = ["sigmoid", "ipo"]
 DPO_PRECOMPUTE_LOGITS = [True, False]
6 changes: 0 additions & 6 deletions tests/test_dpo_trainer.py

@@ -86,8 +86,6 @@ def _init_dummy_dataset(self):
             ["t5", "hinge", False],
             ["gpt2", "ipo", False],
             ["t5", "ipo", True],
-            ["gpt2", "kto_pair", True],
-            ["t5", "kto_pair", False],
             ["gpt2", "aot_pair", True],
             ["t5", "aot_pair", False],
             ["gpt2", "aot", True],

@@ -506,10 +504,6 @@ def test_dpo_lora_bf16_autocast_llama(self):
             ["gpt2", "ipo", False, True],
             ["gpt2", "ipo", True, False],
             ["gpt2", "ipo", True, True],
-            ["gpt2", "kto_pair", False, False],
-            ["gpt2", "kto_pair", False, True],
-            ["gpt2", "kto_pair", True, False],
-            ["gpt2", "kto_pair", True, True],
             ["gpt2", "aot_pair", False, False],
             ["gpt2", "aot_pair", False, True],
             ["gpt2", "aot_pair", True, False],
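For context, rows like the ones removed above are entries in a `parameterized.expand` table, where each row expands into one test case. A simplified, hypothetical sketch of that pattern (not TRL's actual test body):

```python
# Simplified, hypothetical sketch of the parameterized-test pattern used here;
# the real TRL test builds a DPOTrainer for each row instead of this assertion.
import unittest

from parameterized import parameterized


class DPOLossTypeSmokeTest(unittest.TestCase):
    @parameterized.expand(
        [
            ["gpt2", "sigmoid", True],
            ["t5", "hinge", False],
            ["gpt2", "ipo", False],
            ["gpt2", "aot_pair", True],  # the "kto_pair" rows were dropped by this commit
        ]
    )
    def test_loss_type_row(self, model_name, loss_type, pre_compute_logits):
        # Each row becomes its own test case with these three positional arguments.
        self.assertIn(loss_type, {"sigmoid", "hinge", "ipo", "aot", "aot_pair", "bco_pair"})
        self.assertIsInstance(pre_compute_logits, bool)


if __name__ == "__main__":
    unittest.main()
```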
(Diffs for the remaining changed files were not loaded in this view.)
