--num_gpus
To reproduce:
accelerate launch --mixed_precision bf16 --use_deepspeed --deepspeed_multinode_launcher openmpi --deepspeed_hostfile /etc/mpi/hostfile --same_network --monitor_interval "30" --num_machines "2" --num_processes "2" --main_process_port 29500 --machine_rank "-1" train.py
This translates to:
deepspeed --no_local_rank --hostfile /etc/mpi/hostfile --launcher openmpi --master_port 29500 --num_gpus 1 train.py --deepspeed ./deepspeed_configs/5_ds_z3_config.json --model_name_or_path Qwen/Qwen2.5-0.5B --dataset_name trl-lib/Capybara --learning_rate 2.0e-5 --num_train_epochs 1 --packing --per_device_train_batch_size 2 --gradient_accumulation_steps 8 --gradient_checkpointing --logging_steps 25 --eval_strategy steps --eval_steps 100 --output_dir /mnt/shared/Qwen2-0.5B-SFT --use_liger
This fails with:
openmpi backend does not support limiting num nodes/gpus
DeepSpeed error reference:
https://github.com/microsoft/DeepSpeed/blob/3573858e7ce2c723b8c43231c6c6b0cf97dca2fc/deepspeed/launcher/multinode_runner.py#L141C44-L141C92
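The linked check rejects any explicit node or GPU limit, because with OpenMPI the hostfile, not DeepSpeed, decides process placement. The sketch below is a simplified paraphrase written for this issue, not DeepSpeed's actual code. It assumes DeepSpeed's launcher defaults (-1 for num_nodes and num_gpus, empty strings for include and exclude) and uses a hypothetical function name, validate_openmpi_args. It shows why the "--num_gpus 1" that Accelerate passes is enough to trigger the error.

```python
from types import SimpleNamespace


def validate_openmpi_args(args, name="openmpi"):
    """Hypothetical, simplified paraphrase of the OpenMPI runner's argument check.

    Assumes -1 means "not set" for num_nodes/num_gpus and "" means "not set"
    for include/exclude; any explicit value is rejected.
    """
    if args.include != "" or args.exclude != "":
        raise ValueError(f"{name} backend does not support worker include/exclusion")
    if args.num_nodes != -1 or args.num_gpus != -1:
        raise ValueError(f"{name} backend does not support limiting num nodes/gpus")


# What the translated command effectively sets: --num_gpus 1
passed_by_accelerate = SimpleNamespace(include="", exclude="", num_nodes=-1, num_gpus=1)
# What the OpenMPI runner accepts: nothing limited
left_at_defaults = SimpleNamespace(include="", exclude="", num_nodes=-1, num_gpus=-1)

validate_openmpi_args(left_at_defaults)  # passes silently
try:
    validate_openmpi_args(passed_by_accelerate)
except ValueError as err:
    print(err)  # openmpi backend does not support limiting num nodes/gpus
```

Under these assumptions, leaving --num_gpus unset is the only way the OpenMPI path gets past validation.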
Accelerate should skip passing --num_gpus for OpenMPI-based launchers in the else clause here:
accelerate/src/accelerate/utils/launch.py
Lines 323 to 341 in d6d3e03
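A minimal sketch of the proposed change follows. It is not Accelerate's actual code. The function name build_deepspeed_multinode_cmd and the MPI_LAUNCHERS set are hypothetical. The per-node GPU count formula (num_processes // num_machines) is inferred from the reproduction above, where 2 processes on 2 machines became --num_gpus 1. Only "openmpi" is listed as a launcher that rejects the flag. Other MPI-style launchers may need the same treatment, which would have to be checked against DeepSpeed's multinode_runner.py.

```python
# Hypothetical set of DeepSpeed multi-node launchers that reject --num_gpus.
# Only "openmpi" is confirmed by the error above; others are unverified here.
MPI_LAUNCHERS = {"openmpi"}


def build_deepspeed_multinode_cmd(hostfile, launcher, num_processes, num_machines,
                                  main_process_port, inclusion_filter=None,
                                  exclusion_filter=None):
    """Hypothetical sketch of the multi-node deepspeed command builder."""
    cmd = ["deepspeed", "--no_local_rank",
           "--hostfile", str(hostfile),
           "--launcher", str(launcher),
           "--master_port", str(main_process_port)]
    if exclusion_filter is not None:
        cmd.extend(["--exclude", str(exclusion_filter)])
    elif inclusion_filter is not None:
        cmd.extend(["--include", str(inclusion_filter)])
    elif launcher not in MPI_LAUNCHERS:
        # Per-node GPU count, e.g. 2 processes / 2 machines -> 1.
        # Skipped for MPI launchers, which take placement from the hostfile.
        cmd.extend(["--num_gpus", str(num_processes // num_machines)])
    return cmd


cmd = build_deepspeed_multinode_cmd("/etc/mpi/hostfile", "openmpi", 2, 2, 29500)
print(" ".join(cmd))
# deepspeed --no_local_rank --hostfile /etc/mpi/hostfile --launcher openmpi --master_port 29500
```

With launcher="pdsh" the same call would still append --num_gpus 1. That keeps the current behavior for launchers that accept the flag.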