Commit
init: Avoid hang by forcing SENDRECV in case of neuron v4 API usage
The Neuron v4 API may block indefinitely when used with the RDMA protocol, because communicator creation (a) is a blocking operation by definition in the v4 API and (b) performs a 4-way handshake under the RDMA protocol. Therefore, we force the SENDRECV protocol whenever the Neuron-specific API is used. We do not force the SENDRECV protocol for the NCCL API, since there is no known platform that uses the RDMA protocol with the v4 API. Note that on P5 instances, NCCL needs to support a more recent API anyway.

Signed-off-by: Michael Axtmann <[email protected]>
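
The protocol override described in this commit message can be sketched roughly as follows. This is a minimal illustration only, not the code from the commit: the enum names and the `effective_protocol` helper are hypothetical and were chosen for readability. The assumption is that initialization knows which API flavor is in use and which protocol was requested.

```c
#include <assert.h>

/* Hypothetical identifiers for illustration; not the plugin's real names. */
enum api_kind { API_NCCL, API_NEURON_V4 };
enum protocol { PROTO_SENDRECV, PROTO_RDMA };

/*
 * Return the protocol to initialize with.
 *
 * Communicator creation is blocking in the Neuron v4 API, and the RDMA
 * protocol needs a multi-step handshake to set up a communicator, so the
 * combination can hang. The Neuron v4 path therefore always gets SENDRECV;
 * the NCCL path keeps whatever protocol was requested.
 */
static enum protocol effective_protocol(enum api_kind api,
                                        enum protocol requested)
{
    /* Guard against values outside the enum. */
    assert(requested == PROTO_SENDRECV || requested == PROTO_RDMA);

    if (api == API_NEURON_V4)
        return PROTO_SENDRECV;

    return requested;
}
```

Under these assumptions, a call such as `effective_protocol(API_NEURON_V4, PROTO_RDMA)` yields `PROTO_SENDRECV`, while `effective_protocol(API_NCCL, PROTO_RDMA)` leaves the request unchanged. Keeping the override in one small function makes the special case easy to find and to remove if the blocking behavior ever changes.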