Fix typo in brats_training_ddp.py (#1678)
Address comments in #1666

### Checks
<!--- Put an `x` in all the boxes that apply, and remove the items that
do not apply -->
- [ ] Avoid including large files in the PR.
- [ ] Clean up long text outputs from code cells in the notebook.
- [ ] For security purposes, please check the contents and remove any
sensitive info such as user names and private keys.
- [ ] Ensure (1) hyperlinks and markdown anchors are working (2) use
relative paths for tutorial repo files (3) put figures and graphs in the
`./figure` folder
- [ ] Notebook runs automatically with `./runner.sh -t <path to .ipynb file>`

Signed-off-by: YunLiu <[email protected]>
KumoLiu authored Mar 27, 2024
1 parent 3a016a0 commit d2c1f05
Showing 1 changed file with 2 additions and 2 deletions.
acceleration/distributed_training/brats_training_ddp.py
@@ -28,7 +28,7 @@
`--nnodes=NUM_NODES`
`--master_addr="localhost"`
`--master_port=1234`
-For more details, refer to https://github.com/pytorch/pytorch/blob/master/torch/distributed/run.py.
+For more details, refer to https://github.com/pytorch/pytorch/blob/main/torch/distributed/run.py.
Alternatively, we can also use `torch.multiprocessing.spawn` to start the program, but in that case, we need to handle
all the above parameters and compute `rank` manually, then pass them to `init_process_group`, etc.
`torchrun` is even more efficient than `torch.multiprocessing.spawn` during training.
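The docstring notes that `torch.multiprocessing.spawn` leaves the rank bookkeeping to the caller. Below is a minimal sketch of that bookkeeping, assuming one process per GPU and a TCP rendezvous at the same `master_addr`/`master_port` used above. `LaunchConfig` and `init_args_for` are hypothetical names, not part of `brats_training_ddp.py`, and `torch` is deliberately not imported so the arithmetic can be checked on its own:

```python
"""Sketch: the rank bookkeeping torchrun does for you, written out by hand
for use with torch.multiprocessing.spawn (torch itself is not imported)."""

from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchConfig:
    # Hypothetical container mirroring the torchrun flags documented above.
    nnodes: int
    nproc_per_node: int
    node_rank: int
    master_addr: str = "localhost"
    master_port: int = 1234


def init_args_for(local_rank: int, cfg: LaunchConfig) -> dict:
    """Return keyword arguments for torch.distributed.init_process_group.

    mp.spawn only passes each child its local index (0..nproc_per_node-1),
    so the global rank has to be derived from the node rank manually.
    """
    if not 0 <= local_rank < cfg.nproc_per_node:
        raise ValueError("local_rank out of range")
    if not 0 <= cfg.node_rank < cfg.nnodes:
        raise ValueError("node_rank out of range")
    # Every node runs the same number of processes (one per GPU, assumed).
    world_size = cfg.nnodes * cfg.nproc_per_node
    # Global rank: all processes on lower-numbered nodes come first.
    rank = cfg.node_rank * cfg.nproc_per_node + local_rank
    return {
        "backend": "nccl",
        "init_method": f"tcp://{cfg.master_addr}:{cfg.master_port}",
        "world_size": world_size,
        "rank": rank,
    }
```

A worker would then call something like `dist.init_process_group(**init_args_for(local_rank, cfg))`, launched via `torch.multiprocessing.spawn(worker, args=(cfg,), nprocs=cfg.nproc_per_node)`, which passes the local index as the worker's first argument. `torchrun` derives these values itself and exports them to each process, so none of this code is needed when using it.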
@@ -42,7 +42,7 @@
Suggest setting exactly the same software environment for every node, especially `PyTorch`, `nccl`, etc.
A good practice is to use the same MONAI docker image for all nodes directly.
Example script to execute this program on every node:
-python -m torchrun --nproc_per_node=NUM_GPUS_PER_NODE --nnodes=NUM_NODES
+torchrun --nproc_per_node=NUM_GPUS_PER_NODE --nnodes=NUM_NODES
--master_addr="localhost" --master_port=1234 brats_training_ddp.py -d DIR_OF_TESTDATA
This example was tested with [Ubuntu 16.04/20.04], [NCCL 2.6.3].
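With the corrected `torchrun` command, each launched process receives its identity through environment variables that `torchrun` exports (`RANK`, `LOCAL_RANK`, `WORLD_SIZE`, plus `MASTER_ADDR` and `MASTER_PORT`). Below is a minimal sketch of reading them, assuming the training script relies on those variables rather than on command-line rank arguments. `WorkerEnv` and `read_torchrun_env` are hypothetical helpers, not code from `brats_training_ddp.py`:

```python
"""Sketch: reading the per-process values that torchrun exports."""

import os
from typing import Mapping, NamedTuple, Optional


class WorkerEnv(NamedTuple):
    rank: int        # global rank across all nodes
    local_rank: int  # index of this process on its node (used as GPU index)
    world_size: int  # NUM_NODES * NUM_GPUS_PER_NODE


def read_torchrun_env(env: Optional[Mapping[str, str]] = None) -> WorkerEnv:
    """Read RANK, LOCAL_RANK and WORLD_SIZE from the environment.

    Accepts an explicit mapping so the helper can be tested without
    actually being launched by torchrun; defaults to os.environ.
    """
    env = os.environ if env is None else env
    try:
        return WorkerEnv(
            rank=int(env["RANK"]),
            local_rank=int(env["LOCAL_RANK"]),
            world_size=int(env["WORLD_SIZE"]),
        )
    except KeyError as missing:
        # A missing variable usually means the script was started with
        # plain `python` instead of through torchrun.
        raise RuntimeError(
            f"{missing} is not set; launch this script with torchrun"
        ) from None
```

A script could then call `dist.init_process_group(backend="nccl")`, whose default `env://` init method reads the same variables, followed by `torch.cuda.set_device(w.local_rank)`. Passing a mapping instead of `os.environ` keeps the helper testable without a launcher.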