Question about link_num in Generated Topology (HPN 7.0) #57
Comments
In the code, the total number of links is calculated using the following formula:
I noticed that in the first term:
Could you kindly clarify the reasoning behind this division? Does it represent a specific connection pattern, such as each ASW connecting to only half the PSWs? If this division is correct, the link_num of 20 makes sense, but I want to ensure I’m understanding the logic properly. Thanks so much! |
Hi, thanks for your question! HPN 7.0 uses a dual-plane topology, so the PSW count you pass in is the total across both planes. When a single-plane topology is generated, only the first half of the PSWs is used. So if you want to generate a topology with 24 links, -psn should be 2. This is ambiguous, and we will improve it in future versions. |
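To make the arithmetic in this reply concrete, here is a minimal sketch of how the reported counts can be reproduced. It is not the generator's actual code: the function name `estimated_link_num` and its parameters are hypothetical, and the halving of the ASW-to-PSW term is an assumption inferred from the explanation above and from the 20 and 24 link counts reported in this thread.

```python
def estimated_link_num(gpu_num, asw_num, psw_num, nvswitch_num=1):
    """Estimate link_num for a single-plane view of a dual-plane topology.

    Hypothetical reconstruction for illustration only; the real script
    may compute this differently.
    """
    # Assumption: psw_num is the total over both planes, so only half of
    # the ASW-to-PSW links are counted.
    asw_psw_links = asw_num * psw_num // 2
    # One GPU-to-ASW link per GPU, as described in the issue.
    gpu_asw_links = gpu_num
    # Each GPU connects to every NVSwitch (one NVSwitch in the issue).
    gpu_nvswitch_links = gpu_num * nvswitch_num
    return asw_psw_links + gpu_asw_links + gpu_nvswitch_links


print(estimated_link_num(gpu_num=8, asw_num=8, psw_num=1))  # 20
print(estimated_link_num(gpu_num=8, asw_num=8, psw_num=2))  # 24
```

Under these assumptions, a PSW count of 1 yields the 20 links from the original question, and a PSW count of 2 yields the 24 links the reporter expected.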
Thank you for the clarification! I ran the following command to generate the topology:
The resulting topology file content is as follows:
This command generates a topology that utilizes all links across the dual-plane configuration. The resulting link_num is 24, and all the PSWs and their associated links are correctly included. However, I have a question regarding the documentation and the behavior of the topology generator: Documentation Clarity
Thank you for considering this feedback, and I look forward to further improvements in future versions! 🙏 |
Thank you for your valuable suggestions. We would be delighted if you could directly submit a Pull Request to our code repository based on your ideas. We will review it promptly. |
Thank you for your kind response! I have submitted a Pull Request based on the discussed suggestions: PR #61. Please feel free to review it at your convenience. Let me know if there are any further changes or clarifications needed. Thanks again for your support! |
The PR has been merged. If any issues arise, please feel free to contact us again. |
Thank you for pointing this out. I struggled to understand why my simulation was crashing on a simple topology. Note that this behavior was not clear to me even though I was following the tutorial updated by @beatlesforever. I guess the main reason for the confusion is that script
I would suggest you document this behavior directly in the options description and make it also explicit in the tutorial (see PR #66). Eventually, it would be ideal to have the two parameters behave in a congruent way (i.e., if I hope this can help others in the future. |
I have a question about the link_num in the generated HPN 7.0 topology. According to the generated topology file:
The file states that there are 20 links. However, based on the connection logic:
GPU to NVSwitch: There are 8 GPUs (0–7), each connected to one NVSwitch (node 8). This adds up to 8 links.
GPU to ASW: Each GPU (0–7) is also connected to an Aggregation Switch (ASW, nodes 9–16). This should also add 8 links.
ASW to PSW: Each ASW (nodes 9–16) is connected to the Pod Switch (PSW, node 17). This should add 8 links.
However, the topology file specifies only 20 links, which seems to leave out some connections. I may be misunderstanding the logic, so I’d appreciate clarification on the following points:
1. How is the link_num value of 20 calculated?
2. Are there specific links that are excluded in this count? If so, which ones and why?
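The tally behind the expected count can be sketched by enumerating the links exactly as described above. This is only an illustration of the reasoning in this question, not output from the generator: the helper `expected_links` is hypothetical, and the one-to-one pairing of GPU i with ASW node 9 + i is an assumption, since the description only says each GPU connects to an ASW.

```python
def expected_links(gpu_num=8):
    """List the links implied by the connection logic in the question."""
    nvswitch = gpu_num             # node 8
    asw_start = gpu_num + 1        # ASWs are nodes 9..16
    psw = asw_start + gpu_num      # node 17

    links = []
    for gpu in range(gpu_num):
        links.append((gpu, nvswitch))         # GPU to NVSwitch
        links.append((gpu, asw_start + gpu))  # GPU to ASW (assumed 1:1 pairing)
    for asw in range(asw_start, asw_start + gpu_num):
        links.append((asw, psw))              # ASW to PSW
    return links


links = expected_links()
print(len(links))  # 24
```

Counting this way gives 8 + 8 + 8 = 24 links, which is 4 more than the link_num of 20 in the generated file.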