Add in FI_EFA_SET_CUDA_SYNC_MEMOPS to efa cheatsheet
Signed-off-by: Sean Smith <[email protected]>
sean-smith committed Mar 6, 2024
1 parent 571a598 commit b1cafd0
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion 1.architectures/efa-cheatsheet.md
@@ -10,11 +10,12 @@ versions of your libfabric.
| `FI_EFA_USE_HUGE_PAGE=0` | Set to 0 when you see `os.fork()` cause `OSError: Cannot allocate memory`. This typically happens with multi-process PyTorch data loaders. Disabling huge pages causes a minor performance hit, but it is needed to prevent fork failures when the operating system runs out of huge pages. |
| `FI_EFA_FORK_SAFE=1` | Not needed for kernel>=5.15. It is still safe to set, though it has no effect. See [ref](https://github.com/ofiwg/libfabric/pull/9112). |
| `FI_EFA_USE_DEVICE_RDMA=1` | Do not set for libfabric>=1.18.0 and aws-ofi-nccl>=1.7.0. Setting it on p4/p5 with the newer software is harmless, but unnecessary. |
| `FI_EFA_SET_CUDA_SYNC_MEMOPS` | Set this to `0` if you see the error `register_rail_mr_buffer:617 NCCL WARN NET/OFI Unable to register memory (type = 2) for device 4. RC: -22, Error: Invalid argument`. |
| `FI_EFA_ENABLE_SHM_TRANSFER=1` | Not needed. This is effectively a no-op; the default already enables SHM transfer. |
| `FI_PROVIDER=efa` | Use for aws-ofi-nccl<=1.5.0 AND p4/p5 instances. |
| `NCCL_PROTO=simple` | Use for aws-ofi-nccl<=1.5.0 and p4/p5 instances. |
| `NCCL_SOCKET_NTHREADS` | Not applicable for EFA. |
| `NCCL_SOCKET_IFNAME` | Set this to `en` to cover both `p5.48xlarge` and `p4d(e).24xlarge`. For other instances check `ifconfig` to see the active network interface. |
| `NCCL_NSOCKS_PERTHREAD` | Not applicable for EFA. |
| `NCCL_MIN_CHANNELS=xxx` | Recommended to leave unset and use the default. For example, on p4d/p4de the number of channels should be 8, which is the minimum for a 4-NIC platform. The reduction message is split first by the number of GPUs in the job, then by the number of channels, so having more channels than necessary produces smaller messages, which starves EFA for data. |
| `NCCL_BUFFSIZE=xxx` | Recommended to leave unset and use the default. |
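
As an illustration, the recommendations in the table above could translate into an environment setup like the following. This is a sketch, not part of the cheatsheet itself: it assumes libfabric>=1.18.0 and aws-ofi-nccl>=1.7.0 on a p4d/p5 instance (so `FI_EFA_USE_DEVICE_RDMA`, `FI_PROVIDER`, and `NCCL_PROTO` stay unset), and the commented-out variable is only for the error case described in its row.

```shell
#!/bin/bash
# Illustrative sketch following the cheatsheet rows above.

# Avoid os.fork() OSError from multi-process PyTorch data loaders.
export FI_EFA_USE_HUGE_PAGE=0

# Harmless on kernel >= 5.15; needed on older kernels.
export FI_EFA_FORK_SAFE=1

# Covers both p5.48xlarge and p4d(e).24xlarge interface names.
export NCCL_SOCKET_IFNAME=en

# Uncomment only if NCCL warns "Unable to register memory ... RC: -22":
# export FI_EFA_SET_CUDA_SYNC_MEMOPS=0

echo "FI_EFA_USE_HUGE_PAGE=${FI_EFA_USE_HUGE_PAGE}"
echo "FI_EFA_FORK_SAFE=${FI_EFA_FORK_SAFE}"
echo "NCCL_SOCKET_IFNAME=${NCCL_SOCKET_IFNAME}"
```

Variables deliberately left at their defaults (`NCCL_MIN_CHANNELS`, `NCCL_BUFFSIZE`, `FI_EFA_ENABLE_SHM_TRANSFER`) are simply omitted, per the table.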
