GRPO #2625

qgallouedec · 2025-01-23T16:58:46Z

No description provided.

qgallouedec · 2025-01-24T14:49:17Z

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

batch_size = 4
gradient_accumulation_steps = 2
output_dir = f"GRPO-bsz{batch_size}-grad_acc{gradient_accumulation_steps}-fixed"

training_args = GRPOConfig(
    output_dir=output_dir,
    per_device_train_batch_size=batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    logging_steps=2,
)

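# Small prompt-only test dataset in the standard (plain-text) format.
dummy_dataset = load_dataset("trl-internal-testing/zen", "standard_prompt_only")

# Toy reward: score each completion by its length in characters.
# **kwargs absorbs any extra arguments the trainer may pass to the reward function.
def reward_len(prompts, completions, **kwargs):
    return [len(completion) for completion in completions]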

trainer = GRPOTrainer(
    model="trl-internal-testing/tiny-Qwen2ForCausalLM-2.5",
    reward_funcs=reward_len,
    args=training_args,
    train_dataset=dummy_dataset["train"],
)

trainer.train()

qgallouedec mentioned this issue Jan 23, 2025

[Tracking issue] Wrong loss scaling when accumulating gradient #2617

Open

18 tasks

qgallouedec self-assigned this Jan 23, 2025

qgallouedec linked a pull request Jan 24, 2025 that will close this issue

🥞 Fix GRPO gradient accumulation loss scaling #2647

Merged

5 tasks

qgallouedec closed this as completed in #2647 Jan 24, 2025
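
For readers unfamiliar with the loss-scaling problem named in the linked tracking issue (#2617), here is a minimal, library-free sketch of the general principle. It is an illustration only and does not reproduce what #2647 changed in TRL. The function names (`grad_mse`, `accumulated_grad`) and the toy data are hypothetical. The idea: when gradients from several micro-batches are summed before an optimizer step, each micro-batch loss has to be divided by the number of accumulation steps, otherwise the effective gradient is larger than the full-batch gradient by that factor.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size, scale):
    """Sum micro-batch gradients, optionally dividing each by the step count."""
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        start = i * micro_batch_size
        end = start + micro_batch_size
        g = grad_mse(w, xs[start:end], ys[start:end])
        # Scaling by 1/steps makes the sum equal to the full-batch mean gradient.
        total += g / steps if scale else g
    return total


# Toy data: y = 2x, with the weight initialised away from the solution.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]
w = 0.5

full_batch = grad_mse(w, xs, ys)                      # one batch of 8
scaled = accumulated_grad(w, xs, ys, 4, scale=True)   # 2 micro-batches of 4, scaled
unscaled = accumulated_grad(w, xs, ys, 4, scale=False)  # same, without scaling

print(full_batch, scaled, unscaled)
```

With two accumulation steps, the scaled result matches the full-batch gradient, while the unscaled one is twice as large. This mirrors the `batch_size = 4`, `gradient_accumulation_steps = 2` setting used in the script above, though the script itself exercises the real trainer rather than this toy computation.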
