I was training Qwen2-VL-2B-Instruct on 8×80 GB GPUs (7 for training and 1 for vLLM). The training dataset is the authors' provided GEOQA_R1V_Train_8K dataset (8,031 samples in total).
I set per_device_train_batch_size=1, gradient_accumulation_steps=4, and num_train_epochs=1. In my understanding, the global train batch size would be 1 * 4 * 7 = 28, so the total number of training steps should be 8031 * 1 / 28 ≈ 286.82. But the training log reports a total of 2007 training steps. Is there something wrong with the training script, or did I misunderstand something?
Hi @SpursGoZmy, I found that the total steps should be computed as 8031 / 4 = 2007.75. Regarding the 7 generations: each card is assigned one generation of the same prompt, which means that during a single forward and backward pass the effective number of prompts is 1, not 7. This is the key detail I discovered when reading the code.
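For anyone else confused by this, here is a minimal sketch of the two calculations, assuming the behavior described above (the 7 training GPUs each process one of the 7 generations of the same prompt); all variable names are illustrative, not taken from the actual code:

```python
# Step-count arithmetic for the setup in this thread (hypothetical variable
# names; the one-generation-per-GPU behavior is taken from the comment above).
dataset_size = 8031
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_train_epochs = 1
num_training_gpus = 7   # 8 GPUs total, 1 reserved for vLLM generation
num_generations = 7     # GRPO completions sampled per prompt

# Naive reading: every training GPU holds a *different* prompt.
naive_global_batch = (per_device_train_batch_size
                      * num_training_gpus
                      * gradient_accumulation_steps)
naive_steps = dataset_size * num_train_epochs / naive_global_batch
print(naive_steps)      # ≈ 286.82

# Actual behavior here: the 7 GPUs each hold one of the 7 generations of the
# *same* prompt, so a forward/backward pass consumes a single prompt, and one
# optimizer step consumes gradient_accumulation_steps prompts.
prompts_per_optimizer_step = per_device_train_batch_size * gradient_accumulation_steps
actual_steps = dataset_size * num_train_epochs / prompts_per_optimizer_step
print(actual_steps)     # 2007.75, logged as 2007
```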
This is gold. Thank you very much for your help! I will look into the code to understand it.