AS_SEND_LAT has a big impact on achieved network busbw #67

YangZhou1997 · 2025-01-11T00:44:14Z

Hi SimAI team,

I found that the value of AS_SEND_LAT has a large impact on the achieved network busbw when running the HPN 7.0 architecture vs. DCN+ architecture example, which mainly tests datacenter network communication.

For example, with an AS_SEND_LAT value of 2 (microseconds), a 512MB allreduce among 256 GPUs achieves 45.7 GB/s. With the default AS_SEND_LAT value of 6, performance drops to 2.5 GB/s. Why does the packet sending latency affect the achieved network bandwidth so much? And what is the right value for mimicking real-world collectives?
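To put the two numbers side by side, here is a rough back-of-the-envelope sketch of the completion times they imply. It assumes SimAI reports busbw using the nccl-tests convention for allreduce (busbw = algbw * 2(n-1)/n) and that 512MB means 512 MiB; both are assumptions on my side, and the helper names are made up for illustration.

```python
def allreduce_bus_factor(n_ranks: int) -> float:
    """Ratio busbw / algbw for allreduce under the nccl-tests convention (assumed)."""
    return 2 * (n_ranks - 1) / n_ranks


def implied_time_ms(size_bytes: int, busbw_gb_per_s: float, n_ranks: int) -> float:
    """Completion time implied by a reported busbw, in milliseconds."""
    # algbw = size / time, so time = size / algbw.
    algbw_gb_per_s = busbw_gb_per_s / allreduce_bus_factor(n_ranks)
    return size_bytes / (algbw_gb_per_s * 1e9) * 1e3


SIZE_BYTES = 512 * 1024 * 1024  # assumes "512MB" means 512 MiB
N_GPUS = 256

# (AS_SEND_LAT in microseconds, reported busbw in GB/s), taken from the runs above.
for send_lat_us, busbw in [(2, 45.7), (6, 2.5)]:
    t_ms = implied_time_ms(SIZE_BYTES, busbw, N_GPUS)
    print(f"AS_SEND_LAT={send_lat_us} us: busbw={busbw} GB/s, implied time ~{t_ms:.1f} ms")
```

Under these assumptions, the gap between the two runs is hundreds of milliseconds, while AS_SEND_LAT only changed by 4 microseconds, which is why the result looks surprising to me.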

A side question: as far as I can tell, the current gen_HPN_7.0_topo_mulgpus_one_link.py only generates a two-layer datacenter network topology (correct me if I am wrong). Is there a script I can use to test the three-layer topology described in the HPN SIGCOMM paper?

Best,
Yang
