Only half the bandwidth is available in rdma bond mode #10430
Can you pls check if UCX_IB_NUM_PATHS=2 helps? (in case there was a problem with bond device detection)
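For reference, a minimal sketch of how that suggestion might be applied, assuming ucx_perftest as the benchmark (the actual command used in this thread was not captured, so the invocation below is hypothetical):

```sh
# server side (set the variable on both sides)
UCX_IB_NUM_PATHS=2 ucx_perftest -t tag_bw -s 1048576

# client side
UCX_IB_NUM_PATHS=2 ucx_perftest <server-ip> -t tag_bw -s 1048576
```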
@yosefe Hi, it doesn't look effective. client:
server:
What about UCX_IB_NUM_PATHS=8?
@yosefe I set it on both server and client. It's still not working.
Seems that the protocol is still using 2 lanes.
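One way to confirm the lane count is UCX's protocol-selection report; this is a sketch assuming a UCX build with the v2 protocol framework, where UCX_PROTO_INFO=y prints the selected protocol and its lanes at runtime:

```sh
# hypothetical client invocation; the report shows which lanes each protocol uses
UCX_PROTO_INFO=y UCX_IB_NUM_PATHS=8 ucx_perftest <server-ip> -t tag_bw -s 1048576
```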
@yosefe Still not working. client:
server:
@ivanallen you mentioned ib_send_bw gets 200 Gbps. Does ib_read_bw also get 200 Gbps?
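For comparison, both perftest tools can be run the same way; a sketch, assuming the bond's device name is mlx5_bond_0 (adjust to the local ibstat output):

```sh
# SEND bandwidth: run the first command on the server, the second on the client
ib_send_bw -d mlx5_bond_0 --report_gbits
ib_send_bw -d mlx5_bond_0 --report_gbits <server-ip>

# RDMA READ bandwidth, same pattern
ib_read_bw -d mlx5_bond_0 --report_gbits
ib_read_bw -d mlx5_bond_0 --report_gbits <server-ip>
```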
@yosefe read doesn't look as good as send, but it can still exceed 100 Gbps.
Can you try with
@yosefe Still not working. client:
server:
@ivanallen can you pls try applying the following patch to ucx and test with

```diff
diff --git a/src/uct/ib/mlx5/dv/ib_mlx5_dv.c b/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
index 4da30d4f9a..ce2c9be2bd 100644
--- a/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
+++ b/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
@@ -469,6 +469,8 @@ void uct_ib_mlx5_devx_set_qpc_port_affinity(uct_ib_mlx5_md_t *md,
     uct_ib_device_t *dev = &md->super.dev;
     uint8_t tx_port      = dev->first_port;
 
+    return;
+
     if (!(md->flags & UCT_IB_MLX5_MD_FLAG_LAG)) {
         return;
     }
```
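For context: judging from the function name and the LAG flag check, uct_ib_mlx5_devx_set_qpc_port_affinity normally pins each QP to one physical port of the bonded (LAG) device. The added early return skips that logic entirely, presumably to test whether UCX's explicit port affinity, rather than the fabric, is what keeps traffic on a single 100 Gbps link.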
@yosefe I've tried that. It doesn't seem to work. Using multiple threads also doesn't improve bandwidth. |
@ivanallen what about
@yosefe That doesn't work either. client:
server:
@yosefe I added
@yosefe Hi! We suspect it has something to do with the configuration of the switch. However, we also captured ib packets to check the distribution of source ports, and there is a significant difference between ucx and ib_send_bw.
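RoCE LAG/ECMP setups typically spread flows by hashing the RoCEv2 UDP source port, so a capture showing only one or two source ports would explain traffic sticking to one link. A sketch of how the distribution could be checked, assuming a capture point that actually sees the RoCE traffic (e.g. a switch mirror port, since RoCE bypasses the host kernel stack) and a placeholder interface name:

```sh
# RoCEv2 uses UDP destination port 4791; count packets per source addr.port
tcpdump -i eth0 -nn -c 10000 'udp dst port 4791' \
  | awk '{print $3}' | sort | uniq -c | sort -rn | head
```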
Hi @changchengx Thank you for your reply. We have now basically confirmed that it is related to the network configuration, and we are preparing to reconfigure our switches using MLAG.
Describe the bug
In bond mode, the bandwidth is only 100 Gbps; the expected bandwidth is 200 Gbps. Using 2 threads does not improve performance.
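For reference, a hypothetical way the 2-thread measurement might look with ucx_perftest (the actual command was not captured; -T sets the number of test threads):

```sh
ucx_perftest <server-ip> -t tag_bw -s 1048576 -T 2
```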
Steps to Reproduce
Setup and versions
MLNX_OFED_LINUX-5.8-4.1.5.0
ibstat or ibv_devinfo -vv command
Additional information (depending on the issue)
client:
server: