AS_SEND_LAT has a big impact on achieved network busbw #67

Open
YangZhou1997 opened this issue Jan 11, 2025 · 0 comments

Hi SimAI team,

I find that the value of AS_SEND_LAT has a large effect on the network busbw achieved when running this HPN 7.0 architecture vs. DCN+ architecture example, which mainly tests datacenter network communication.

For example, with AS_SEND_LAT set to 2 (microseconds), a 512 MB allreduce among 256 GPUs achieves 45.7 GB/s. With the default AS_SEND_LAT value of 6, it drops to 2.5 GB/s. Why does the packet sending latency affect the achieved network bandwidth so much? And what is the right value for mimicking real-world collectives?
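
To put the two figures above in terms of completion time, here is a rough sketch. It assumes the reported busbw follows the nccl-tests convention for allreduce (busbw = algbw * 2(n-1)/n), that "512MB" means 512 * 2^20 bytes, and that GB/s means 10^9 bytes per second. I have not confirmed that SimAI computes busbw this way, and the helper name `allreduce_time_s` is made up for illustration.

```python
def allreduce_time_s(size_bytes: int, n_ranks: int, busbw_gb_per_s: float) -> float:
    """Completion time implied by a reported allreduce bus bandwidth.

    Assumes the nccl-tests convention: busbw = algbw * 2 * (n - 1) / n,
    where algbw = size / time. Whether SimAI uses this is an assumption.
    """
    factor = 2 * (n_ranks - 1) / n_ranks       # allreduce correction factor
    algbw = busbw_gb_per_s * 1e9 / factor      # algorithm bandwidth, bytes/s
    return size_bytes / algbw


SIZE_BYTES = 512 * 2**20   # assumed meaning of "512MB"
N_GPUS = 256

t_lat2 = allreduce_time_s(SIZE_BYTES, N_GPUS, 45.7)  # AS_SEND_LAT = 2
t_lat6 = allreduce_time_s(SIZE_BYTES, N_GPUS, 2.5)   # AS_SEND_LAT = 6 (default)

print(f"AS_SEND_LAT=2: {t_lat2 * 1e3:.1f} ms")
print(f"AS_SEND_LAT=6: {t_lat6 * 1e3:.1f} ms")
print(f"slowdown: {t_lat6 / t_lat2:.1f}x")
```

Under these assumptions, tripling the latency parameter stretches the collective from tens of milliseconds to several hundred, which is far more than a fixed 4 microsecond difference per message would suggest on its own.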

A side question: as far as I can tell, the current gen_HPN_7.0_topo_mulgpus_one_link.py only generates a two-layer datacenter network topology (correct me if I am wrong). Is there a script I can use to test the three-layer topology described in the HPN SIGCOMM paper?

Best,
Yang
