Only half the bandwidth is available in rdma bond mode #10430

Open
ivanallen opened this issue Jan 20, 2025 · 19 comments

ivanallen commented Jan 20, 2025

Describe the bug

In bond mode, ucx_perftest reaches only about 100 Gbps, while the expected bandwidth is 200 Gbps. Using 2 threads does not improve performance.
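
For reference, the figure can be cross-checked against the ucx_perftest client output included further down in this report. This is a minimal sketch, assuming the "bandwidth (MB/s)" column counts 2^20-byte units (which is consistent with the message rate and the 1048576-byte message size in the same rows); the function name is illustrative only.

```python
MESSAGE_SIZE = 1048576  # bytes, from "-s 1048576" on the client command line


def mib_per_s_to_gbps(mib_per_s: float) -> float:
    """Convert MiB/s to Gbit/s (decimal gigabits per second)."""
    return mib_per_s * 2**20 * 8 / 1e9


# Steady-state "average" values taken from the client output in this report.
avg_bw_mib_s = 11035.27
avg_msg_rate = 11035  # msg/s

# Sanity check of the unit assumption: message rate * message size,
# expressed in MiB/s, should land on the bandwidth column.
rate_based_mib_s = avg_msg_rate * MESSAGE_SIZE / 2**20

observed_gbps = mib_per_s_to_gbps(avg_bw_mib_s)
print(f"rate-based bandwidth: {rate_based_mib_s:.0f} MiB/s")
print(f"observed: {observed_gbps:.1f} Gbit/s")
print(f"share of the expected 200 Gbit/s: {observed_gbps / 200:.0%}")
```

With the numbers from this run, the observed rate comes out a little under 100 Gbit/s, i.e. roughly the line rate of a single 100G port rather than of two.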

Steps to Reproduce

  • Command line
# server
UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest

# client
UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.12 -t ucp_am_bw -s 1048576 -n 5000000
  • UCX version used: 1.18.0-rc3
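  • Any UCX environment variables used: UCX_NET_DEVICES=mlx5_bond_0:1, UCX_TLS=rc, UCX_PROTO_ENABLE=y, UCX_PROTO_INFO=y (as in the command lines above)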

Setup and versions

  • OS version (e.g. Linux distro) + CPU architecture (x86_64/aarch64/ppc64le/...)
Linux localhost.localdomain 5.14.0-162.nos.4.el8.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Nov 24 07:51:00 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • For RDMA/IB/RoCE related issues:
    • Driver version:
      MLNX_OFED_LINUX-5.8-4.1.5.0
    • HW information from ibstat or ibv_devinfo -vv command
[root@node12 ucx-1.18.0]# ibv_devinfo -vvv
hca_id: mlx5_bond_0
        transport:                      InfiniBand (0)
        fw_ver:                         22.36.1010
        node_guid:                      a088:c203:00b4:87e6
        sys_image_guid:                 a088:c203:00b4:87e6
        vendor_id:                      0x02c9
        vendor_part_id:                 4125
        hw_ver:                         0x0
        board_id:                       MT_0000000359
        phys_port_cnt:                  1
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffffffffffff000
        max_qp:                         131072
        max_qp_wr:                      32768
        device_cap_flags:               0xed721c36
                                        BAD_PKEY_CNTR
                                        BAD_QKEY_CNTR
                                        AUTO_PATH_MIG
                                        CHANGE_PHY_PORT
                                        PORT_ACTIVE_EVENT
                                        SYS_IMAGE_GUID
                                        RC_RNR_NAK_GEN
                                        MEM_WINDOW
                                        XRC
                                        MEM_MGT_EXTENSIONS
                                        MEM_WINDOW_TYPE_2B
                                        RAW_IP_CSUM
                                        MANAGED_FLOW_STEERING
                                        Unknown flags: 0xC8400000
        max_sge:                        30
        max_sge_rd:                     30
        max_cq:                         16777216
        max_cqe:                        4194303
        max_mr:                         16777216
        max_pd:                         8388608
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                2097152
        max_qp_init_rd_atom:            16
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA (1)
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         16777216
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                0
        max_mcast_grp:                  2097152
        max_mcast_qp_attach:            240
        max_total_mcast_qp_attach:      503316480
        max_ah:                         2147483647
        max_fmr:                        0
        max_srq:                        8388608
        max_srq_wr:                     32767
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             16
        general_odp_caps:
                                        ODP_SUPPORT
                                        ODP_SUPPORT_IMPLICIT
        rc_odp_caps:
                                        SUPPORT_SEND
                                        SUPPORT_RECV
                                        SUPPORT_WRITE
                                        SUPPORT_READ
                                        SUPPORT_SRQ
        uc_odp_caps:
                                        NO SUPPORT
        ud_odp_caps:
                                        SUPPORT_SEND
        xrc_odp_caps:
                                        SUPPORT_SEND
                                        SUPPORT_WRITE
                                        SUPPORT_READ
                                        SUPPORT_SRQ
        completion timestamp_mask:                      0x7fffffffffffffff
        hca_core_clock:                 1000000kHZ
        raw packet caps:
                                        C-VLAN stripping offload
                                        Scatter FCS offload
                                        IP csum offload
                                        Delay drop
        device_cap_flags_ex:            0x30000054ED721C36
                                        RAW_SCATTER_FCS
                                        PCI_WRITE_END_PADDING
                                        Unknown flags: 0x3000004000000000
        tso_caps:
                max_tso:                        262144
                supported_qp:
                                        SUPPORT_RAW_PACKET
        rss_caps:
                max_rwq_indirection_tables:                     524288
                max_rwq_indirection_table_size:                 2048
                rx_hash_function:                               0x1
                rx_hash_fields_mask:                            0x800000FF
                supported_qp:
                                        SUPPORT_RAW_PACKET
        max_wq_type_rq:                 8388608
        packet_pacing_caps:
                qp_rate_limit_min:      1kbps
                qp_rate_limit_max:      100000000kbps
                supported_qp:
                                        SUPPORT_RAW_PACKET
        tag matching not supported

        cq moderation caps:
                max_cq_count:   65535
                max_cq_period:  4095 us

        maximum available device memory:        131072Bytes

        num_comp_vectors:               63
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x04010000
                        port_cap_flags2:        0x0000
                        max_vl_num:             invalid value (0)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           1
                        gid_tbl_len:            255
                        subnet_timeout:         0
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           25.0 Gbps (32)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:a288:c2ff:feb4:87e6, RoCE v1
                        GID[  1]:               fe80::a288:c2ff:feb4:87e6, RoCE v2
                        GID[  2]:               0000:0000:0000:0000:0000:ffff:0a10:1d0c, RoCE v1
                        GID[  3]:               ::ffff:10.16.29.12, RoCE v2

hca_id: mlx5_bond_1
        transport:                      InfiniBand (0)
        fw_ver:                         22.36.1010
        node_guid:                      a088:c203:00b4:a562
        sys_image_guid:                 a088:c203:00b4:a562
        vendor_id:                      0x02c9
        vendor_part_id:                 4125
        hw_ver:                         0x0
        board_id:                       MT_0000000359
        phys_port_cnt:                  1
        max_mr_size:                    0xffffffffffffffff
        page_size_cap:                  0xfffffffffffff000
        max_qp:                         131072
        max_qp_wr:                      32768
        device_cap_flags:               0xed721c36
                                        BAD_PKEY_CNTR
                                        BAD_QKEY_CNTR
                                        AUTO_PATH_MIG
                                        CHANGE_PHY_PORT
                                        PORT_ACTIVE_EVENT
                                        SYS_IMAGE_GUID
                                        RC_RNR_NAK_GEN
                                        MEM_WINDOW
                                        XRC
                                        MEM_MGT_EXTENSIONS
                                        MEM_WINDOW_TYPE_2B
                                        RAW_IP_CSUM
                                        MANAGED_FLOW_STEERING
                                        Unknown flags: 0xC8400000
        max_sge:                        30
        max_sge_rd:                     30
        max_cq:                         16777216
        max_cqe:                        4194303
        max_mr:                         16777216
        max_pd:                         8388608
        max_qp_rd_atom:                 16
        max_ee_rd_atom:                 0
        max_res_rd_atom:                2097152
        max_qp_init_rd_atom:            16
        max_ee_init_rd_atom:            0
        atomic_cap:                     ATOMIC_HCA (1)
        max_ee:                         0
        max_rdd:                        0
        max_mw:                         16777216
        max_raw_ipv6_qp:                0
        max_raw_ethy_qp:                0
        max_mcast_grp:                  2097152
        max_mcast_qp_attach:            240
        max_total_mcast_qp_attach:      503316480
        max_ah:                         2147483647
        max_fmr:                        0
        max_srq:                        8388608
        max_srq_wr:                     32767
        max_srq_sge:                    31
        max_pkeys:                      128
        local_ca_ack_delay:             16
        general_odp_caps:
                                        ODP_SUPPORT
                                        ODP_SUPPORT_IMPLICIT
        rc_odp_caps:
                                        SUPPORT_SEND
                                        SUPPORT_RECV
                                        SUPPORT_WRITE
                                        SUPPORT_READ
                                        SUPPORT_SRQ
        uc_odp_caps:
                                        NO SUPPORT
        ud_odp_caps:
                                        SUPPORT_SEND
        xrc_odp_caps:
                                        SUPPORT_SEND
                                        SUPPORT_WRITE
                                        SUPPORT_READ
                                        SUPPORT_SRQ
        completion timestamp_mask:                      0x7fffffffffffffff
        hca_core_clock:                 1000000kHZ
        raw packet caps:
                                        C-VLAN stripping offload
                                        Scatter FCS offload
                                        IP csum offload
                                        Delay drop
        device_cap_flags_ex:            0x30000054ED721C36
                                        RAW_SCATTER_FCS
                                        PCI_WRITE_END_PADDING
                                        Unknown flags: 0x3000004000000000
        tso_caps:
                max_tso:                        262144
                supported_qp:
                                        SUPPORT_RAW_PACKET
        rss_caps:
                max_rwq_indirection_tables:                     524288
                max_rwq_indirection_table_size:                 2048
                rx_hash_function:                               0x1
                rx_hash_fields_mask:                            0x800000FF
                supported_qp:
                                        SUPPORT_RAW_PACKET
        max_wq_type_rq:                 8388608
        packet_pacing_caps:
                qp_rate_limit_min:      1kbps
                qp_rate_limit_max:      100000000kbps
                supported_qp:
                                        SUPPORT_RAW_PACKET
        tag matching not supported

        cq moderation caps:
                max_cq_count:   65535
                max_cq_period:  4095 us

        maximum available device memory:        131072Bytes

        num_comp_vectors:               63
                port:   1
                        state:                  PORT_ACTIVE (4)
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        sm_lid:                 0
                        port_lid:               0
                        port_lmc:               0x00
                        link_layer:             Ethernet
                        max_msg_sz:             0x40000000
                        port_cap_flags:         0x04010000
                        port_cap_flags2:        0x0000
                        max_vl_num:             invalid value (0)
                        bad_pkey_cntr:          0x0
                        qkey_viol_cntr:         0x0
                        sm_sl:                  0
                        pkey_tbl_len:           1
                        gid_tbl_len:            255
                        subnet_timeout:         0
                        init_type_reply:        0
                        active_width:           4X (2)
                        active_speed:           25.0 Gbps (32)
                        phys_state:             LINK_UP (5)
                        GID[  0]:               fe80:0000:0000:0000:a288:c2ff:feb4:a562, RoCE v1
                        GID[  1]:               fe80::a288:c2ff:feb4:a562, RoCE v2
                        GID[  2]:               0000:0000:0000:0000:0000:ffff:0a10:270c, RoCE v1
                        GID[  3]:               ::ffff:10.16.39.12, RoCE v2

[root@node12 ucx-1.18.0]#
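
The port attributes above already give the link rate that each bond device reports. The sketch below multiplies active_width by active_speed for the values shown; the regex-based parsing and the helper name are assumptions made for illustration (they are not part of any rdma-core tool), and the result says nothing about how the bond aggregates its physical ports.

```python
import re

# Port lines copied from the ibv_devinfo output above
# (identical for mlx5_bond_0 and mlx5_bond_1).
PORT_TEXT = """
                        max_mtu:                4096 (5)
                        active_mtu:             1024 (3)
                        active_width:           4X (2)
                        active_speed:           25.0 Gbps (32)
"""


def link_rate_gbps(text: str) -> float:
    """Return lane count * per-lane speed from ibv_devinfo-style port lines."""
    width = re.search(r"active_width:\s+(\d+)X", text)
    speed = re.search(r"active_speed:\s+([\d.]+) Gbps", text)
    if width is None or speed is None:
        raise ValueError("active_width/active_speed not found in input")
    return int(width.group(1)) * float(speed.group(1))


rate = link_rate_gbps(PORT_TEXT)
print(f"reported link rate: {rate:.0f} Gbps")
```

For the values in this report that is 4 x 25.0 = 100 Gbps per device, which is close to the bandwidth actually observed.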

Additional information (depending on the issue)

client

[root@node13 ucx-1.18.0]# UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.12 -t ucp_am_bw -s 1048576  -n 5000000
[1737344523.925862] [node13:3340904:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737344523.997930] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.997940] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                 |
[1737344523.997943] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.997945] [node13:3340904:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.997947] [node13:3340904:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.997948] [node13:3340904:0]   |               8247..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.997951] [node13:3340904:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.997953] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998124] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.998127] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                |
[1737344523.998129] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998132] [node13:3340904:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998134] [node13:3340904:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998136] [node13:3340904:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998139] [node13:3340904:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998141] [node13:3340904:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.998143] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998495] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.998498] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                          |
[1737344523.998500] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998504] [node13:3340904:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998506] [node13:3340904:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998509] [node13:3340904:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.998511] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998670] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.998673] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                 |
[1737344523.998675] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998678] [node13:3340904:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998682] [node13:3340904:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998684] [node13:3340904:0]   |               8239..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998686] [node13:3340904:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.998688] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998827] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.998830] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                |
[1737344523.998832] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.998835] [node13:3340904:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998839] [node13:3340904:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998841] [node13:3340904:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998843] [node13:3340904:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.998844] [node13:3340904:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.998847] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.999017] [node13:3340904:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.999020] [node13:3340904:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                          |
[1737344523.999022] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.999025] [node13:3340904:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.999028] [node13:3340904:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.999031] [node13:3340904:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.999033] [node13:3340904:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[thread 0]             10511      0.215    95.566    95.566    10464.02   10464.02       10464       10464
[thread 0]             21579      0.215    90.627    93.033    11034.18   10748.90       11034       10749
[thread 0]             32646      0.215    90.619    92.214    11035.25   10844.29       11035       10844
[thread 0]             43713      0.216    90.618    91.810    11035.37   10892.04       11035       10892
[thread 0]             54780      0.215    90.619    91.569    11035.27   10920.68       11035       10921
^C
[root@node13 ucx-1.18.0]#
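
As a reading aid for the UCX_PROTO_INFO tables above, the sketch below looks up which row applies to the 1048576-byte message used in this test. The rows are transcribed from the client log for two of the tables; the helper functions are hypothetical and are not a UCX API.

```python
import math

# (size range, protocol, configuration) rows transcribed from the client log.
TABLES = {
    "ucp_am_send*": [
        ("0..2038", "short", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("2039..8246", "copy-in", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("8247..27188", "multi-frag copy-in", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("27189..inf", "(?) rendezvous zero-copy read from remote",
         "rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1"),
    ],
    "ucp_am_send*(fast-completion)": [
        ("0..2038", "short", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("2039..8246", "copy-in", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("8247..13104", "multi-frag copy-in", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("13105..262143", "multi-frag zero-copy", "rc_mlx5/mlx5_bond_0:1/path0"),
        ("256K..inf", "(?) rendezvous zero-copy read from remote",
         "rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1"),
    ],
}


def parse_bound(token: str) -> float:
    """Parse one range bound such as '2038', '256K' or 'inf'."""
    if token == "inf":
        return math.inf
    if token.endswith("K"):
        return int(token[:-1]) * 1024  # 256K follows 262143 in the log
    return int(token)


def select_row(rows, size: int):
    """Return (protocol, configuration) for the row whose range holds size."""
    for size_range, protocol, config in rows:
        low, high = (parse_bound(t) for t in size_range.split(".."))
        if low <= size <= high:
            return protocol, config
    raise LookupError(f"no row covers size {size}")


for name, rows in TABLES.items():
    protocol, config = select_row(rows, 1048576)
    print(f"{name}: {protocol} | {config}")
```

In both tables a 1 MiB message falls into the rendezvous zero-copy read row, which the log lists as split 50% on path0 and 50% on path1, so the protocol selection itself reports two paths in use even though the measured bandwidth stays near 100 Gbps.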

server

[root@node12 ucx-1.18.0]# UCX_NET_DEVICES=mlx5_bond_0:1 UCX_IB_ROCE_LOCAL_SUBNET=y UCX_TLS=rc  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest
[1737344495.491358] [node13:3355654:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
^C
[root@node12 ucx-1.18.0]# UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest
[1737344512.076307] [node13:3357054:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.16.29.13:38322
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         am bandwidth / message rate                                                                |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 1048576                                                                                    |
| Window size:  32                                                                                         |
| AM header size: 0                                                                                        |
+----------------------------------------------------------------------------------------------------------+
[1737344523.993616] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.993626] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                 |
[1737344523.993630] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.993632] [node13:3357054:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993635] [node13:3357054:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993637] [node13:3357054:0]   |               8247..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993641] [node13:3357054:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.993643] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.993801] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.993806] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                |
[1737344523.993808] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.993811] [node13:3357054:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993815] [node13:3357054:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993817] [node13:3357054:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993820] [node13:3357054:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.993822] [node13:3357054:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.993825] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994184] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.994187] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                          |
[1737344523.994190] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994192] [node13:3357054:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994195] [node13:3357054:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994196] [node13:3357054:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.994198] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994353] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.994357] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                 |
[1737344523.994359] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994362] [node13:3357054:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994364] [node13:3357054:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994367] [node13:3357054:0]   |               8239..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994369] [node13:3357054:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.994374] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994511] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.994514] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                |
[1737344523.994517] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994520] [node13:3357054:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994523] [node13:3357054:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994526] [node13:3357054:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994528] [node13:3357054:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994530] [node13:3357054:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.994533] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994709] [node13:3357054:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737344523.994712] [node13:3357054:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                          |
[1737344523.994715] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737344523.994718] [node13:3357054:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994721] [node13:3357054:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737344523.994723] [node13:3357054:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737344523.994726] [node13:3357054:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+

@ivanallen ivanallen added the Bug label Jan 20, 2025
@ivanallen
Author

ivanallen commented Jan 20, 2025

[root@node12 ucx-1.18.0]# show_gids
DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_bond_0     1       0       fe80:0000:0000:0000:a288:c2ff:feb4:87e6                 v1      bond0
mlx5_bond_0     1       1       fe80:0000:0000:0000:a288:c2ff:feb4:87e6                 v2      bond0
mlx5_bond_0     1       2       0000:0000:0000:0000:0000:ffff:0a10:1d0c 10.16.29.12     v1      bond0
mlx5_bond_0     1       3       0000:0000:0000:0000:0000:ffff:0a10:1d0c 10.16.29.12     v2      bond0
mlx5_bond_1     1       0       fe80:0000:0000:0000:a288:c2ff:feb4:a562                 v1      bond1
mlx5_bond_1     1       1       fe80:0000:0000:0000:a288:c2ff:feb4:a562                 v2      bond1
mlx5_bond_1     1       2       0000:0000:0000:0000:0000:ffff:0a10:270c 10.16.39.12     v1      bond1
mlx5_bond_1     1       3       0000:0000:0000:0000:0000:ffff:0a10:270c 10.16.39.12     v2      bond1
n_gids_found=8

[root@node13 ucx-1.18.0]# show_gids
DEV     PORT    INDEX   GID                                     IPv4            VER     DEV
---     ----    -----   ---                                     ------------    ---     ---
mlx5_bond_0     1       0       fe80:0000:0000:0000:a288:c2ff:feb4:87d6                 v1      bond0
mlx5_bond_0     1       1       fe80:0000:0000:0000:a288:c2ff:feb4:87d6                 v2      bond0
mlx5_bond_0     1       2       0000:0000:0000:0000:0000:ffff:0a10:1d0d 10.16.29.13     v1      bond0
mlx5_bond_0     1       3       0000:0000:0000:0000:0000:ffff:0a10:1d0d 10.16.29.13     v2      bond0
mlx5_bond_1     1       0       fe80:0000:0000:0000:a288:c2ff:feb4:b04a                 v1      bond1
mlx5_bond_1     1       1       fe80:0000:0000:0000:a288:c2ff:feb4:b04a                 v2      bond1
mlx5_bond_1     1       2       0000:0000:0000:0000:0000:ffff:0a10:270d 10.16.39.13     v1      bond1
mlx5_bond_1     1       3       0000:0000:0000:0000:0000:ffff:0a10:270d 10.16.39.13     v2      bond1
n_gids_found=8
[root@node12 ucx-1.18.0]# ethtool bond0
Settings for bond0:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 200000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes
[root@node13 ucx-1.18.0]# ethtool bond0
Settings for bond0:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 200000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes
[root@node12 ucx-1.18.0]# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.14.0-162.nos.4.el8.x86_64

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
System priority: 65535
System MAC address: a0:88:c2:b4:87:e6
Active Aggregator Info:
        Aggregator ID: 15
        Number of ports: 2
        Actor Key: 29
        Partner Key: 9
        Partner Mac Address: 18:c0:09:0c:45:be

Slave Interface: ens2f0np0
MII Status: up
Speed: 100000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:88:c2:b4:87:e6
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: a0:88:c2:b4:87:e6
    port key: 29
    port priority: 255
    port number: 1
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: 18:c0:09:0c:45:be
    oper key: 9
    port priority: 32768
    port number: 17
    port state: 61

Slave Interface: ens2f1np1
MII Status: up
Speed: 100000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: a0:88:c2:b4:87:e7
Slave queue ID: 0
Aggregator ID: 15
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
details actor lacp pdu:
    system priority: 65535
    system mac address: a0:88:c2:b4:87:e6
    port key: 29
    port priority: 255
    port number: 2
    port state: 61
details partner lacp pdu:
    system priority: 32768
    system mac address: 18:c0:09:0c:45:be
    oper key: 9
    port priority: 32768
    port number: 18
    port state: 61
[root@node13 ucx-1.18.0]# ib_send_bw -d mlx5_bond_0 -F --report_gbits  10.16.29.12 -q 8 -s 1048576 --run_infinitely
---------------------------------------------------------------------------------------
                    Send BW Test
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 8            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0xc97e PSN 0x429886
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc97f PSN 0x29322f
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc980 PSN 0xefa617
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc981 PSN 0x268de3
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc982 PSN 0x1084f1
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc983 PSN 0xc702c9
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc984 PSN 0xfc13be
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xc985 PSN 0x87e4a9
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 remote address: LID 0000 QPN 0x19213 PSN 0x49994
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19214 PSN 0x27c5ee
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19215 PSN 0x9c9ba0
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19216 PSN 0x26a26f
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19217 PSN 0x289956
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19218 PSN 0xece300
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x19219 PSN 0xb8685f
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x1921a PSN 0xa3066c
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 1048576    110370           0.00               185.17             0.022074
 1048576    110369           0.00               185.17             0.022073
 1048576    110372           0.00               185.17             0.022074

@yosefe
Contributor

yosefe commented Jan 20, 2025

Can you please check whether UCX_IB_NUM_PATHS=2 helps? (In case there was a problem with bond device detection.)

@ivanallen
Author

@yosefe Hi, it doesn't seem to help.

client:

[root@node13 ucx-1.18.0]# UCX_IB_NUM_PATHS=2 UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.12 -t ucp_am_bw -s 1048576  -n 5000000
[1737355925.568852] [node13:211766:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737355925.663267] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.663277] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                 |
[1737355925.663280] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.663282] [node13:211766:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663285] [node13:211766:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663287] [node13:211766:0]   |               8247..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663289] [node13:211766:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.663291] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.663453] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.663456] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                |
[1737355925.663458] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.663461] [node13:211766:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663463] [node13:211766:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663464] [node13:211766:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663467] [node13:211766:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663470] [node13:211766:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.663473] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.663832] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.663835] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                          |
[1737355925.663836] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.663839] [node13:211766:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663840] [node13:211766:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.663842] [node13:211766:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.663844] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664001] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.664004] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                 |
[1737355925.664005] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664008] [node13:211766:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664011] [node13:211766:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664014] [node13:211766:0]   |               8239..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664018] [node13:211766:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.664020] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664164] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.664168] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                |
[1737355925.664169] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664172] [node13:211766:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664174] [node13:211766:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664177] [node13:211766:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664180] [node13:211766:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664182] [node13:211766:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.664184] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664358] [node13:211766:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.664361] [node13:211766:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                          |
[1737355925.664363] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.664367] [node13:211766:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664370] [node13:211766:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.664373] [node13:211766:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.664375] [node13:211766:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[thread 0]             10511      0.210    95.566    95.566    10464.01   10464.01       10464       10464
[thread 0]             21578      0.210    90.621    93.030    11034.93   10749.24       11035       10749

server:

[root@node12 ucx-1.18.0]# UCX_IB_NUM_PATHS=2 UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest
[1737355923.345316] [node13:229553:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.16.29.13:33556
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         am bandwidth / message rate                                                                |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 1048576                                                                                    |
| Window size:  32                                                                                         |
| AM header size: 0                                                                                        |
+----------------------------------------------------------------------------------------------------------+
[1737355925.660388] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.660398] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                 |
[1737355925.660402] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.660405] [node13:229553:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660408] [node13:229553:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660410] [node13:229553:0]   |               8247..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660412] [node13:229553:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.660417] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.660581] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.660585] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                |
[1737355925.660586] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.660589] [node13:229553:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660590] [node13:229553:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660592] [node13:229553:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660595] [node13:229553:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660598] [node13:229553:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.660600] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.660965] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.660969] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                          |
[1737355925.660970] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.660974] [node13:229553:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660977] [node13:229553:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.660980] [node13:229553:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.660983] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661140] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.661144] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                 |
[1737355925.661146] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661150] [node13:229553:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661154] [node13:229553:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661157] [node13:229553:0]   |               8239..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661159] [node13:229553:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.661162] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661304] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.661308] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                |
[1737355925.661311] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661313] [node13:229553:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661316] [node13:229553:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661318] [node13:229553:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661321] [node13:229553:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661324] [node13:229553:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.661327] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661499] [node13:229553:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737355925.661503] [node13:229553:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                          |
[1737355925.661505] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737355925.661508] [node13:229553:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661510] [node13:229553:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737355925.661514] [node13:229553:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737355925.661516] [node13:229553:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
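
The two tools report throughput in different units (ucx_perftest in MB/s, ib_send_bw in Gb/s), so a direct comparison is easier from the message-rate columns, which both tools print for the same 1048576-byte message size. The following is a minimal sketch of that conversion using the figures from the logs above; the helper name `gbits_per_sec` is made up for illustration and is not part of either tool.

```python
def gbits_per_sec(msgs_per_sec: float, msg_bytes: int) -> float:
    """Convert a message rate and message size into Gb/s (10^9 bits per second)."""
    return msgs_per_sec * msg_bytes * 8 / 1e9


MSG_BYTES = 1048576  # "-s 1048576" in both the ucx_perftest and ib_send_bw runs

# ucx_perftest: message rate (msg/s) from the first "[thread 0]" line
ucx_gbps = gbits_per_sec(10464, MSG_BYTES)

# ib_send_bw: MsgRate[Mpps] column, i.e. 0.022074 million messages per second
ib_gbps = gbits_per_sec(0.022074e6, MSG_BYTES)

print(f"ucx_perftest: {ucx_gbps:.2f} Gb/s")
print(f"ib_send_bw:   {ib_gbps:.2f} Gb/s")
print(f"ratio:        {ucx_gbps / ib_gbps:.2f}")
```

With these inputs the UCX run works out to roughly 88 Gb/s against roughly 185 Gb/s for ib_send_bw with 8 QPs, which matches the "about half" observation in the issue title.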

@yosefe
Contributor

yosefe commented Jan 20, 2025

What about UCX_IB_NUM_PATHS=8?

@ivanallen
Author

@yosefe I set it on both the server and the client. It still doesn't help.

[root@node13 ucx-1.18.0]# UCX_IB_NUM_PATHS=8 UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.12 -t ucp_am_bw -s 1048576  -n 5000000
[1737358297.407829] [node13:433275:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737358297.483086] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.483100] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                 |
[1737358297.483102] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483107] [node13:433275:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483108] [node13:433275:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483111] [node13:433275:0]   |               8247..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483113] [node13:433275:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.483116] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483277] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.483281] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                |
[1737358297.483282] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483286] [node13:433275:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483288] [node13:433275:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483292] [node13:433275:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483295] [node13:433275:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483298] [node13:433275:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.483301] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483713] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.483716] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                          |
[1737358297.483718] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483722] [node13:433275:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483724] [node13:433275:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483727] [node13:433275:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.483729] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483890] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.483893] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                 |
[1737358297.483895] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.483898] [node13:433275:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483902] [node13:433275:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483905] [node13:433275:0]   |               8239..27188 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.483907] [node13:433275:0]   |                27189..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.483909] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.484053] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.484056] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                |
[1737358297.484059] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.484062] [node13:433275:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484064] [node13:433275:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484067] [node13:433275:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484069] [node13:433275:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484072] [node13:433275:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.484074] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.484256] [node13:433275:0]   +---------------------------+-------------------------------------------------------------------------------------------------+
[1737358297.484259] [node13:433275:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                          |
[1737358297.484261] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[1737358297.484264] [node13:433275:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484267] [node13:433275:0]   |                 515..4844 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                         |
[1737358297.484269] [node13:433275:0]   |                 4845..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |
[1737358297.484272] [node13:433275:0]   +---------------------------+-------------------------------------------+-----------------------------------------------------+
[thread 0]             10511      0.212    95.561    95.561    10464.51   10464.51       10465       10465
[thread 0]             21578      0.211    90.632    93.033    11033.61   10748.86       11034       10749
[thread 0]             32645      0.212    90.619    92.215    11035.15   10844.24       11035       10844
[thread 0]             43712      0.211    90.620    91.811    11035.08   10891.93       11035       10892
[thread 0]             54779      0.212    90.619    91.570    11035.20   10920.57       11035       10921

@yosefe
Contributor

yosefe commented Jan 20, 2025

It seems the protocol is still using only 2 lanes.
Can you try UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8?
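
A quick way to check how many lanes the rendezvous protocol actually picked is to count the "N% on pathK" entries in a UCX_PROTO_INFO row. The following is a minimal sketch, assuming the row format shown in the logs in this thread; `lane_split` is a hypothetical helper, not part of UCX.

```python
import re

# Hypothetical helper (not a UCX API): matches "<N>% on path<K>" entries
# as they appear in the UCX_PROTO_INFO tables pasted in this thread.
PATH_SHARE = re.compile(r"(\d+)% on path(\d+)")


def lane_split(row):
    """Return {path index: percentage} for one UCX_PROTO_INFO table row."""
    return {int(path): int(pct) for pct, path in PATH_SHARE.findall(row)}


# Row copied from the client log above (UCX_IB_NUM_PATHS=8 only).
row = ("|                27189..inf | (?) rendezvous zero-copy read from remote "
       "| rc_mlx5/mlx5_bond_0:1 50% on path0 and 50% on path1 |")

split = lane_split(row)
print(len(split), split)  # prints: 2 {0: 50, 1: 50}
```

Rows that are cut off at the table width will be undercounted by this helper, because the last "on pathK" entry is missing from the printed text.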

@ivanallen
Author

@yosefe Still not working.

client:

[root@node13 ucx-1.18.0]# UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.12 -t ucp_am_bw -s 1048576  -n 5000000
[1737359165.518854] [node13:514396:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737359165.592885] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.592894] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                             |
[1737359165.592899] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.592902] [node13:514396:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.592905] [node13:514396:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.592907] [node13:514396:0]   |               8247..28286 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.592909] [node13:514396:0]   |                28287..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.592912] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593073] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593077] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                            |
[1737359165.593080] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593083] [node13:514396:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593088] [node13:514396:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593096] [node13:514396:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593099] [node13:514396:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593102] [node13:514396:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.593105] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593521] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593525] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                      |
[1737359165.593528] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593531] [node13:514396:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593534] [node13:514396:0]   |                 515..8246 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593537] [node13:514396:0]   |                8247..8784 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593540] [node13:514396:0]   |                 8785..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.593542] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593704] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593707] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                             |
[1737359165.593709] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593712] [node13:514396:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593716] [node13:514396:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593719] [node13:514396:0]   |               8239..28286 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593721] [node13:514396:0]   |                28287..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.593724] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593872] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593875] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                            |
[1737359165.593877] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.593880] [node13:514396:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593882] [node13:514396:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593885] [node13:514396:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593888] [node13:514396:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.593891] [node13:514396:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.593893] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.594076] [node13:514396:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.594079] [node13:514396:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                      |
[1737359165.594081] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.594083] [node13:514396:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.594085] [node13:514396:0]   |                 515..8238 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.594088] [node13:514396:0]   |                8239..8784 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.594094] [node13:514396:0]   |                 8785..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.594096] [node13:514396:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[thread 0]             10510      0.210    95.530    95.530    10467.88   10467.88       10468       10468
[thread 0]             21569      0.211    90.642    93.024    11032.43   10749.93       11032       10750
[thread 0]             32656      0.210    90.610    92.204    11036.32   10845.48       11036       10845
[thread 0]             43741      0.210    90.425    91.753    11058.87   10898.77       11059       10899
[thread 0]             54802      0.211    90.837    91.568    11008.72   10920.79       11009       10921
[thread 0]             65870      0.214    90.608    91.407    11036.50   10940.06       11037       10940
[thread 0]             76954      0.209    90.638    91.296    11032.89   10953.33       11033       10953
[thread 0]             88041      0.213    90.417    91.186    11059.83   10966.63       11060       10967
[thread 0]             99098      0.212    90.829    91.146    11009.73   10971.42       11010       10971
[thread 0]            110165      0.208    90.621    91.093    11035.01   10977.78       11035       10978
[thread 0]            121237      0.211    90.625    91.050    11034.51   10982.94       11035       10983
^C
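
For reference, the perftest figures can be converted to a link rate. This is a rough sketch, assuming that "MB/s" in the table means 2^20 bytes per second; that is inferred from the log itself, where the bandwidth column equals the message rate for 1048576-byte messages. `mb_per_s_to_gbit` is a hypothetical helper name.

```python
# Assumption: ucx_perftest "MB/s" is 2**20 bytes per second. Inferred from
# the log: ~11035 MB/s at ~11035 msg/s with -s 1048576.
BYTES_PER_MB = 1 << 20


def mb_per_s_to_gbit(mb_per_s):
    """Convert a perftest bandwidth value (MB/s) to Gbit/s (10**9 bits/s)."""
    return mb_per_s * BYTES_PER_MB * 8 / 1e9


# Average bandwidth values copied from the two client runs in this thread.
num_paths_only = mb_per_s_to_gbit(11035.20)   # UCX_IB_NUM_PATHS=8
with_rndv_lanes = mb_per_s_to_gbit(11034.51)  # plus UCX_MAX_RNDV_LANES=8

print(round(num_paths_only, 1), round(with_rndv_lanes, 1))
```

Both runs come out at roughly 92.6 Gbit/s, which is close to the 100Gbps reported at the top of the issue and well short of the 200Gbps expected from the bond.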

server:

[root@node12 ucx-1.18.0]# UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest
[1737359162.594329] [node13:532223:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.16.29.13:40746
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         am bandwidth / message rate                                                                |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 1048576                                                                                    |
| Window size:  32                                                                                         |
| AM header size: 0                                                                                        |
+----------------------------------------------------------------------------------------------------------+
[1737359165.590136] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590146] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                             |
[1737359165.590151] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590154] [node13:532223:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590157] [node13:532223:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590159] [node13:532223:0]   |               8247..28286 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590162] [node13:532223:0]   |                28287..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.590165] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590324] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590327] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                            |
[1737359165.590331] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590333] [node13:532223:0]   |                   0..2038 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590336] [node13:532223:0]   |                2039..8246 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590339] [node13:532223:0]   |               8247..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590342] [node13:532223:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590345] [node13:532223:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.590347] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590764] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590767] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                      |
[1737359165.590770] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590773] [node13:532223:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590776] [node13:532223:0]   |                 515..8246 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590779] [node13:532223:0]   |                8247..8784 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590782] [node13:532223:0]   |                 8785..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.590785] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590947] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590950] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                             |
[1737359165.590953] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.590956] [node13:532223:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590959] [node13:532223:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590962] [node13:532223:0]   |               8239..28286 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.590965] [node13:532223:0]   |                28287..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.590967] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591114] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591117] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                            |
[1737359165.591120] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591123] [node13:532223:0]   |                   0..2030 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591124] [node13:532223:0]   |                2031..8238 | copy-in                                   | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591127] [node13:532223:0]   |               8239..13104 | multi-frag copy-in                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591130] [node13:532223:0]   |             13105..262143 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591133] [node13:532223:0]   |                 256K..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.591135] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591317] [node13:532223:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591320] [node13:532223:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                      |
[1737359165.591323] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737359165.591326] [node13:532223:0]   |                    0..514 | short                                     | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591329] [node13:532223:0]   |                 515..8238 | zero-copy                                 | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591333] [node13:532223:0]   |                8239..8784 | multi-frag zero-copy                      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737359165.591336] [node13:532223:0]   |                 8785..inf | (?) rendezvous zero-copy read from remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737359165.591338] [node13:532223:0]   +---------------------------+-------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+

@yosefe
Contributor

yosefe commented Jan 20, 2025

@ivanallen you mentioned that ib_send_bw reaches 200 Gbps. Does ib_read_bw also reach 200 Gbps?

@ivanallen
Author

ivanallen commented Jan 20, 2025

@yosefe ib_read_bw doesn't do as well as ib_send_bw, but it still exceeds 100 Gbps (about 153 Gb/sec on average):

[root@node13 ucx-1.18.0]# ib_read_bw -d mlx5_bond_0 -F --report_gbits  10.16.29.12 -q 8 -s 1048576 --run_infinitely
---------------------------------------------------------------------------------------
                    RDMA_Read BW Test
 Dual-port       : OFF          Device         : mlx5_bond_0
 Number of qps   : 8            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 PCIe relax order: ON
 ibv_wr* API     : ON
 TX depth        : 128
 CQ Moderation   : 1
 Mtu             : 1024[B]
 Link type       : Ethernet
 GID index       : 3
 Outstand reads  : 16
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0000 QPN 0xca30 PSN 0x812776 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b0c7e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca31 PSN 0x13b28 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b0d7e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca32 PSN 0xfda152 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b0e7e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca33 PSN 0xabfb39 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b0f7e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca34 PSN 0xece358 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b107e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca35 PSN 0xd631da OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b117e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca36 PSN 0xded731 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b127e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 local address: LID 0000 QPN 0xca37 PSN 0x28e1d6 OUT 0x10 RKey 0x01ab33 VAddr 0x007f11b137e000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:13
 remote address: LID 0000 QPN 0x192d8 PSN 0x294c86 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9d6d4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192d9 PSN 0xa562f8 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9d7d4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192da PSN 0x6774e2 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9d8d4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192db PSN 0x8e6f89 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9d9d4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192dc PSN 0x984968 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9dad4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192dd PSN 0x4306aa OUT 0x10 RKey 0x00824e VAddr 0x007fbc9dbd4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192de PSN 0x2853c1 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9dcd4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
 remote address: LID 0000 QPN 0x192df PSN 0x1c4b26 OUT 0x10 RKey 0x00824e VAddr 0x007fbc9ddd4000
 GID: 00:00:00:00:00:00:00:00:00:00:255:255:10:16:29:12
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 1048576    92279            0.00               154.82             0.018456
 1048576    91761            0.00               153.95             0.018352
 1048576    91687            0.00               153.82             0.018337
 1048576    91435            0.00               153.40             0.018287
 1048576    91775            0.00               153.97             0.018355
 1048576    91094            0.00               152.83             0.018219
 1048576    91741            0.00               153.91             0.018348
 1048576    91301            0.00               153.18             0.018260
 1048576    91224            0.00               153.05             0.018245
 1048576    91338            0.00               153.24             0.018267
 1048576    91240            0.00               153.07             0.018248
 1048576    91298            0.00               153.17             0.018260
 1048576    91394            0.00               153.33             0.018279
 1048576    91251            0.00               153.09             0.018250
 1048576    91213            0.00               153.03             0.018243
 1048576    90891            0.00               152.49             0.018178
 1048576    91558            0.00               153.61             0.018311
 1048576    91605            0.00               153.69             0.018321
 1048576    91454            0.00               153.43             0.018291
 1048576    91284            0.00               153.15             0.018257
 1048576    90653            0.00               152.09             0.018130
 1048576    91176            0.00               152.97             0.018235
 1048576    90522            0.00               151.87             0.018104
 1048576    91336            0.00               153.24             0.018267
 1048576    90645            0.00               152.08             0.018129
 1048576    91137            0.00               152.90             0.018227
 1048576    91150            0.00               152.92             0.018230
 1048576    90811            0.00               152.35             0.018162
 1048576    91135            0.00               152.90             0.018227
 1048576    90720            0.00               152.20             0.018144
 1048576    90721            0.00               152.20             0.018144
 1048576    91280            0.00               153.14             0.018256
 1048576    90349            0.00               151.58             0.018069
 1048576    90655            0.00               152.09             0.018131
 1048576    90316            0.00               151.52             0.018063
 1048576    89951            0.00               150.91             0.017990
 1048576    90792            0.00               152.32             0.018158
 1048576    90991            0.00               152.66             0.018198

@yosefe
Contributor

yosefe commented Jan 20, 2025

Can you try with the following environment variables set?
UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_RNDV_SCHEME=put_zcopy

@ivanallen
Author

ivanallen commented Jan 20, 2025

@yosefe Still not working. With those variables set on both sides, ucx_perftest still reports only about 10,860 MB/s.

client:

[root@node12 ucx-1.18.0]# UCX_RNDV_SCHEME=put_zcopy UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.13 -t ucp_am_bw -s 1048576  -n 5000000
[1737366121.602703] [node13:1181838:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737366121.685410] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685418] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                   |
[1737366121.685422] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685425] [node13:1181838:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685427] [node13:1181838:0]   |                  1..14060 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685429] [node13:1181838:0]   |                14061..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.685432] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685601] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685605] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                  |
[1737366121.685609] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685612] [node13:1181838:0]   |                   0..2038 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685615] [node13:1181838:0]   |                2039..8246 | copy-in                                         | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685618] [node13:1181838:0]   |               8247..13104 | multi-frag copy-in                              | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685620] [node13:1181838:0]   |             13105..262143 | multi-frag zero-copy                            | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.685622] [node13:1181838:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.685624] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685993] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.685997] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                            |
[1737366121.686000] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686003] [node13:1181838:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686005] [node13:1181838:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686009] [node13:1181838:0]   |                 478..6863 | (?) rendezvous zero-copy                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686012] [node13:1181838:0]   |                 6864..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.686015] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686152] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686155] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                   |
[1737366121.686158] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686160] [node13:1181838:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686164] [node13:1181838:0]   |                  1..14060 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686167] [node13:1181838:0]   |                14061..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.686169] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686318] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686322] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                  |
[1737366121.686325] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686327] [node13:1181838:0]   |                   0..2030 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686329] [node13:1181838:0]   |                2031..8238 | copy-in                                         | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686332] [node13:1181838:0]   |               8239..13104 | multi-frag copy-in                              | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686334] [node13:1181838:0]   |             13105..262143 | multi-frag zero-copy                            | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686337] [node13:1181838:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.686339] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686488] [node13:1181838:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686491] [node13:1181838:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                            |
[1737366121.686493] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.686496] [node13:1181838:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686498] [node13:1181838:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686501] [node13:1181838:0]   |                 478..6863 | (?) rendezvous zero-copy                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.686503] [node13:1181838:0]   |                 6864..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.686505] [node13:1181838:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[thread 0]             10350     91.789    96.850    96.850    10325.26   10325.26       10325       10325
[thread 0]             21235     91.768    92.094    94.412    10858.48   10591.88       10858       10592
[thread 0]             32125     93.450    92.053    93.612    10863.36   10682.37       10863       10682
[thread 0]             43020     91.944    92.003    93.205    10869.17   10729.07       10869       10729
[thread 0]             53907     92.151    92.076    92.977    10860.55   10755.37       10861       10755
[thread 0]             64796     92.919    92.054    92.822    10863.22   10773.34       10863       10773
[thread 0]             75687     91.768    92.037    92.709    10865.22   10786.47       10865       10786
[thread 0]             86571     91.776    92.102    92.632    10857.58   10795.36       10858       10795
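
To compare the ucx_perftest rows above with the Gb/sec figures that ib_read_bw prints, the message rate can be converted to a line-rate style number. The following is a minimal sketch in Python; the constants are copied from the last `[thread 0]` row and from the `-s 1048576` argument, and the function names are illustrative only, not part of UCX. It also back-computes how many bytes one reported "MB" must contain for the bandwidth and message-rate columns to agree, rather than assuming a unit.

```python
# Figures copied from the ucx_perftest client output (last [thread 0] row).
MESSAGE_SIZE_BYTES = 1048576      # from "-s 1048576"
MSG_RATE_PER_SEC = 10858          # "message rate (msg/s)", average column
REPORTED_MB_PER_SEC = 10857.58    # "bandwidth (MB/s)", average column


def msg_rate_to_gbps(msg_rate, msg_size_bytes):
    """Payload bandwidth in decimal gigabits per second (1 Gbit = 1e9 bits)."""
    return msg_rate * msg_size_bytes * 8 / 1e9


def implied_bytes_per_mb(msg_rate, msg_size_bytes, reported_mb_per_sec):
    """Bytes one reported 'MB' must hold for the two columns to be consistent."""
    return msg_rate * msg_size_bytes / reported_mb_per_sec


gbps = msg_rate_to_gbps(MSG_RATE_PER_SEC, MESSAGE_SIZE_BYTES)
mb_size = implied_bytes_per_mb(
    MSG_RATE_PER_SEC, MESSAGE_SIZE_BYTES, REPORTED_MB_PER_SEC
)

print(f"payload bandwidth: {gbps:.2f} Gbit/s")
print(f"bytes per reported MB: {mb_size:.0f}")
```

With these inputs the payload bandwidth comes out at roughly 91 Gbit/s, and the implied "MB" is within rounding of 2^20 bytes. That is well below the roughly 153 Gb/sec that ib_read_bw reports on the same bond device, and below the 100 Gbps of a single port.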

server:

[root@node13 ucx-1.18.0]# UCX_RNDV_SCHEME=put_zcopy UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest
[1737366117.008765] [node13:1163009:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.16.29.12:49742
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         am bandwidth / message rate                                                                |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 1048576                                                                                    |
| Window size:  32                                                                                         |
| AM header size: 0                                                                                        |
+----------------------------------------------------------------------------------------------------------+
[1737366121.688332] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688341] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                   |
[1737366121.688345] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688348] [node13:1163009:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688351] [node13:1163009:0]   |                  1..14060 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688354] [node13:1163009:0]   |                14061..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.688357] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688514] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688517] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                  |
[1737366121.688520] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688523] [node13:1163009:0]   |                   0..2038 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688526] [node13:1163009:0]   |                2039..8246 | copy-in                                         | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688528] [node13:1163009:0]   |               8247..13104 | multi-frag copy-in                              | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688531] [node13:1163009:0]   |             13105..262143 | multi-frag zero-copy                            | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688533] [node13:1163009:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.688536] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688896] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688900] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                            |
[1737366121.688902] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.688905] [node13:1163009:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688907] [node13:1163009:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688911] [node13:1163009:0]   |                 478..6863 | (?) rendezvous zero-copy                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.688914] [node13:1163009:0]   |                 6864..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.688916] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689049] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689052] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                   |
[1737366121.689054] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689056] [node13:1163009:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689060] [node13:1163009:0]   |                  1..14060 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689062] [node13:1163009:0]   |                14061..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.689065] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689212] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689215] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                  |
[1737366121.689218] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689221] [node13:1163009:0]   |                   0..2030 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689223] [node13:1163009:0]   |                2031..8238 | copy-in                                         | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689226] [node13:1163009:0]   |               8239..13104 | multi-frag copy-in                              | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689228] [node13:1163009:0]   |             13105..262143 | multi-frag zero-copy                            | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689231] [node13:1163009:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.689234] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689384] [node13:1163009:0]   +---------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689387] [node13:1163009:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                            |
[1737366121.689389] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+
[1737366121.689392] [node13:1163009:0]   |                         0 | short                                           | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689395] [node13:1163009:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689398] [node13:1163009:0]   |                 478..6863 | (?) rendezvous zero-copy                        | rc_mlx5/mlx5_bond_0:1/path0                                                                                                     |
[1737366121.689401] [node13:1163009:0]   |                 6864..inf | (?) rendezvous zero-copy fenced write to remote | rc_mlx5/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% o |
[1737366121.689403] [node13:1163009:0]   +---------------------------+-------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------+

@yosefe
Contributor

yosefe commented Jan 21, 2025

@ivanallen can you please try applying the following patch to UCX and test with
UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_RNDV_SCHEME=put_zcopy

diff --git a/src/uct/ib/mlx5/dv/ib_mlx5_dv.c b/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
index 4da30d4f9a..ce2c9be2bd 100644
--- a/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
+++ b/src/uct/ib/mlx5/dv/ib_mlx5_dv.c
@@ -469,6 +469,8 @@ void uct_ib_mlx5_devx_set_qpc_port_affinity(uct_ib_mlx5_md_t *md,
     uct_ib_device_t *dev = &md->super.dev;
     uint8_t tx_port      = dev->first_port;

+    return;
+
     if (!(md->flags & UCT_IB_MLX5_MD_FLAG_LAG)) {
         return;
     }

@ivanallen
Author

@yosefe I've tried that. It doesn't seem to work. Using multiple threads also doesn't improve bandwidth.

  • apply patch (screenshot)

  • make (screenshot)

  • make install

@yosefe
Contributor

yosefe commented Jan 21, 2025

@ivanallen what about
UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_RNDV_SCHEME=put_zcopy UCX_TLS=rc_v (to disable devx)?

@ivanallen
Author

@yosefe That doesn't work either.

client

[root@node12 ucx-1.18.0]# UCX_RNDV_SCHEME=put_zcopy UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1  UCX_PROTO_ENABLE=y UCX_TLS=rc_v  UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest 10.16.29.13 -t ucp_am_bw -s 1048576  -n 5000000 -T 2
[1737449293.242806] [node13:565911:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
+--------------+--------------+------------------------------+---------------------+-----------------------+
|              |              |       overhead (usec)        |   bandwidth (MB/s)  |  message rate (msg/s) |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
|    Stage     | # iterations | 50.0%ile | average | overall |  average |  overall |  average  |  overall  |
+--------------+--------------+----------+---------+---------+----------+----------+-----------+-----------+
[1737449293.324982] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.324991] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                  |
[1737449293.324995] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.324999] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325002] [node13:565911:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325005] [node13:565911:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.325007] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325165] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325168] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                 |
[1737449293.325172] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325175] [node13:565911:0]   |                    0..115 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325178] [node13:565911:0]   |                 116..8247 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325181] [node13:565911:0]   |               8248..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325184] [node13:565911:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325187] [node13:565911:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.325189] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325552] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325555] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                           |
[1737449293.325573] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325576] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325579] [node13:565911:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325582] [node13:565911:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325585] [node13:565911:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.325588] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325726] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325729] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                  |
[1737449293.325732] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325735] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325738] [node13:565911:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325741] [node13:565911:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.325744] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325895] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325898] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                 |
[1737449293.325901] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.325904] [node13:565911:0]   |                    0..107 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325907] [node13:565911:0]   |                 108..8239 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325910] [node13:565911:0]   |               8240..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325913] [node13:565911:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.325916] [node13:565911:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.325918] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.326069] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.326072] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                           |
[1737449293.326075] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.326078] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.326081] [node13:565911:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.326084] [node13:565911:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.326087] [node13:565911:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.326089] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337107] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337114] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                  |
[1737449293.337118] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337121] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337124] [node13:565911:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337126] [node13:565911:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.337129] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337281] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337284] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                 |
[1737449293.337287] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337290] [node13:565911:0]   |                    0..115 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337292] [node13:565911:0]   |                 116..8247 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337294] [node13:565911:0]   |               8248..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337297] [node13:565911:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337300] [node13:565911:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.337302] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337657] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337661] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                           |
[1737449293.337664] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337667] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337670] [node13:565911:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337673] [node13:565911:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337676] [node13:565911:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.337678] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337817] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337821] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                  |
[1737449293.337823] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337826] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337829] [node13:565911:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337832] [node13:565911:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.337835] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337976] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337980] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                 |
[1737449293.337983] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.337986] [node13:565911:0]   |                    0..107 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337989] [node13:565911:0]   |                 108..8239 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337992] [node13:565911:0]   |               8240..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337995] [node13:565911:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.337997] [node13:565911:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.338000] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.338145] [node13:565911:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.338148] [node13:565911:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                           |
[1737449293.338151] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.338154] [node13:565911:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.338157] [node13:565911:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.338160] [node13:565911:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.338163] [node13:565911:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.338165] [node13:565911:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[thread 0]              4549    180.708   220.360   220.360     4538.03    4538.03        4538        4538
[thread 1]              4599    180.669   217.964   217.964     4587.91    4587.91        4588        4588
[thread 1]              9565    180.655   201.873   209.610     4953.60    4770.76        4954        4771
[thread 0]              9519    180.650   201.713   210.624     4957.53    4747.79        4958        4748
[thread 1]             14482    180.747   203.857   207.657     4905.39    4815.64        4905        4816
[thread 0]             14397    180.647   205.511   208.892     4865.91    4787.16        4866        4787
[thread 1]             19400    180.657   203.830   206.687     4906.04    4838.24        4906        4838
[thread 0]             19359    180.512   202.020   207.131     4950.01    4827.87        4950        4828
[thread 0]             24313    180.553   202.368   206.160     4941.48    4850.60        4941        4851
[thread 1]             24344    180.625   203.502   206.040     4913.96    4853.43        4914        4853
[thread 0]             29278    180.685   201.903   205.438     4952.87    4867.64        4953        4868
[thread 1]             29325    180.632   201.277   205.231     4968.27    4872.56        4968        4873
[thread 0]             34260    180.705   201.220   204.825     4969.69    4882.22        4970        4882
[thread 1]             34272    180.754   202.645   204.858     4934.74    4881.44        4935        4881
[thread 0]             39202    180.713   203.291   204.632     4919.05    4886.83        4919        4887
[thread 1]             39200    180.686   203.408   204.675     4916.24    4885.78        4916        4886
[thread 0]             44227    180.602   199.505   204.049     5012.39    4900.78        5012        4901
[thread 1]             44174    180.622   201.542   204.323     4961.73    4894.22        4962        4894
[thread 0]             49135    180.662   204.240   204.068     4896.20    4900.32        4896        4900
[thread 1]             49045    180.707   205.789   204.468     4859.34    4890.73        4859        4891
[thread 0]             54085    180.597   202.589   203.933     4936.10    4903.58        4936        4904
[thread 1]             54022    180.644   201.413   204.187     4964.91    4897.47        4965        4897
[thread 0]             59045    180.579   202.114   203.780     4947.70    4907.25        4948        4907
[thread 1]             58990    180.605   201.794   203.985     4955.55    4902.31        4956        4902
[thread 0]             64014    180.634   201.742   203.622     4956.83    4911.06        4957        4911
[thread 1]             63971    180.576   201.273   203.774     4968.37    4907.39        4968        4907
[thread 0]             69003    180.725   200.935   203.428     4976.74    4915.75        4977        4916
[thread 1]             68990    180.545   199.922   203.494     5001.95    4914.15        5002        4914
[thread 0]             73999    180.511   200.636   203.239     4984.15    4920.31        4984        4920
[thread 1]             73989    180.660   200.513   203.293     4987.22    4919.02        4987        4919
[thread 0]             79018    180.569   199.743   203.017     5006.43    4925.70        5006        4926
[thread 1]             78977    180.493   200.981   203.147     4975.59    4922.55        4976        4923
[thread 0]             83909    180.456   204.944   203.129     4879.37    4922.97        4879        4923
[thread 1]             83851    180.513   205.676   203.294     4862.01    4918.99        4862        4919
[thread 0]             88868    180.643   202.351   203.086     4941.92    4924.02        4942        4924
[thread 1]             88796    180.695   202.789   203.265     4931.24    4919.67        4931        4920
[thread 0]             93941    180.680   197.608   202.790     5060.54    4931.21        5061        4931
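
To compare the two-thread result against the expected 200 Gbps, the per-thread figures have to be summed. The following is a minimal sketch, not part of the original report. It assumes the sixth numeric column of each `[thread N]` line is the overall bandwidth in MB/s (the column header is not shown in this excerpt) and that UCX counts 1 MB as 2^20 bytes.

```python
import re

# Assumption: UCX reports bandwidth in MB/s with 1 MB = 2**20 bytes.
BYTES_PER_MB = 1024 * 1024

# Matches "[thread N]" followed by the numeric columns of one report line.
LINE_RE = re.compile(r"^\[thread (\d+)\]\s+(.+)$")


def latest_overall_bandwidth(log_text):
    """Return {thread_id: overall MB/s} taken from each thread's last line."""
    latest = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        fields = match.group(2).split()
        if len(fields) < 6:
            continue
        # Assumed column order: iterations, percentile latency, average
        # latency, overall latency, average MB/s, overall MB/s, ...
        latest[int(match.group(1))] = float(fields[5])
    return latest


def total_gbps(per_thread_mbps):
    """Sum per-thread MB/s and convert to Gbit/s (10**9 bits per second)."""
    total_bytes_per_s = sum(per_thread_mbps.values()) * BYTES_PER_MB
    return total_bytes_per_s * 8 / 1e9


# Last three report lines from the client output above.
SAMPLE = """\
[thread 0]             88868    180.643   202.351   203.086     4941.92    4924.02        4942        4924
[thread 1]             88796    180.695   202.789   203.265     4931.24    4919.67        4931        4920
[thread 0]             93941    180.680   197.608   202.790     5060.54    4931.21        5061        4931
"""

if __name__ == "__main__":
    per_thread = latest_overall_bandwidth(SAMPLE)
    print(per_thread, round(total_gbps(per_thread), 1))
```

Under those assumptions, the two threads in this run add up to a little over 80 Gbit/s, still well short of the 200 Gbps the bond should provide.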

server:

[root@node13 ucx-1.18.0]# UCX_RNDV_SCHEME=put_zcopy UCX_IB_NUM_PATHS=8 UCX_MAX_RNDV_LANES=8 UCX_NET_DEVICES=mlx5_bond_0:1 UCX_TLS=rc_v  UCX_PROTO_ENABLE=y UCX_PROTO_INFO=y ./install-release-mt/bin/ucx_perftest -T 2
[1737449290.414834] [node13:559607:0]        perftest.c:800  UCX  WARN  CPU affinity is not set (bound to 96 cpus). Performance may be impacted.
Waiting for connection...
Accepted connection from 10.16.29.12:54400
+----------------------------------------------------------------------------------------------------------+
| API:          protocol layer                                                                             |
| Test:         am bandwidth / message rate                                                                |
| Data layout:  (automatic)                                                                                |
| Send memory:  host                                                                                       |
| Recv memory:  host                                                                                       |
| Message size: 1048576                                                                                    |
| Window size:  32                                                                                         |
| AM header size: 0                                                                                        |
+----------------------------------------------------------------------------------------------------------+
[1737449293.327507] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.327517] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                  |
[1737449293.327521] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.327526] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327529] [node13:559607:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327532] [node13:559607:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.327534] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.327693] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.327697] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                 |
[1737449293.327699] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.327702] [node13:559607:0]   |                    0..115 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327706] [node13:559607:0]   |                 116..8247 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327707] [node13:559607:0]   |               8248..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327711] [node13:559607:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.327714] [node13:559607:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.327717] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328086] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328095] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                           |
[1737449293.328098] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328100] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328102] [node13:559607:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328106] [node13:559607:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328107] [node13:559607:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.328110] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328247] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328250] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                  |
[1737449293.328253] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328256] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328259] [node13:559607:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328262] [node13:559607:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.328265] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328412] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328415] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                 |
[1737449293.328417] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328420] [node13:559607:0]   |                    0..107 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328423] [node13:559607:0]   |                 108..8239 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328425] [node13:559607:0]   |               8240..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328428] [node13:559607:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328431] [node13:559607:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.328433] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328586] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328589] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                           |
[1737449293.328591] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.328593] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328596] [node13:559607:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328598] [node13:559607:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.328602] [node13:559607:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.328604] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.339837] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.339846] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* from host memory                                                                                                                                  |
[1737449293.339850] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.339853] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.339856] [node13:559607:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.339859] [node13:559607:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.339863] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340026] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340030] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(fast-completion) from host memory                                                                                                                 |
[1737449293.340033] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340036] [node13:559607:0]   |                    0..115 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340039] [node13:559607:0]   |                 116..8247 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340042] [node13:559607:0]   |               8248..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340045] [node13:559607:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340047] [node13:559607:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.340049] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340423] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340426] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send*(multi) from host memory                                                                                                                           |
[1737449293.340428] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340431] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340435] [node13:559607:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340438] [node13:559607:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340442] [node13:559607:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.340444] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340588] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340591] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag from host memory                                                                                                                  |
[1737449293.340594] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340596] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340598] [node13:559607:0]   |                  1..15562 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340601] [node13:559607:0]   |                15563..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.340604] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340752] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340755] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(fast-completion) from host memory                                                                                                 |
[1737449293.340757] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340760] [node13:559607:0]   |                    0..107 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340762] [node13:559607:0]   |                 108..8239 | copy-in                                         | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340765] [node13:559607:0]   |               8240..12591 | multi-frag copy-in                              | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340768] [node13:559607:0]   |             12592..262143 | multi-frag zero-copy                            | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340771] [node13:559607:0]   |                 256K..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.340773] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340923] [node13:559607:0]   +---------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340926] [node13:559607:0]   | perftest inter-node cfg#0 | active message by ucp_am_send* with reply flag(multi) from host memory                                                                                                           |
[1737449293.340928] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+
[1737449293.340931] [node13:559607:0]   |                         0 | short                                           | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340932] [node13:559607:0]   |                    1..477 | (?) rendezvous fragmented copy-in copy-out      | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340935] [node13:559607:0]   |                 478..8290 | (?) rendezvous zero-copy                        | rc_verbs/mlx5_bond_0:1/path0                                                                                                   |
[1737449293.340937] [node13:559607:0]   |                 8291..inf | (?) rendezvous zero-copy fenced write to remote | rc_verbs/mlx5_bond_0:1 13% on path0, 13% on path1, 13% on path2, 13% on path3, 13% on path4, 13% on path5, 13% on path6 and 9% |
[1737449293.340940] [node13:559607:0]   +---------------------------+-------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------+

@ivanallen
Author

@yosefe I added UCX_IB_NUM_PATHS=16 and UCX_MAX_RNDV_LANES=16, and it works!
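
For anyone scripting the same test, here is a rough sketch of how those two settings could be applied from Python. It is an illustration only: it assumes every other variable stays as in the server command shown earlier in this thread, and the helper names `build_env` and `server_command` are made up for this example.

```python
import os

# Values reported in the comment above.
TUNING = {
    "UCX_IB_NUM_PATHS": "16",
    "UCX_MAX_RNDV_LANES": "16",
}

# Assumption: the remaining variables are unchanged from the server command
# shown earlier in this thread.
BASE = {
    "UCX_RNDV_SCHEME": "put_zcopy",
    "UCX_NET_DEVICES": "mlx5_bond_0:1",
    "UCX_TLS": "rc_v",
    "UCX_PROTO_ENABLE": "y",
    "UCX_PROTO_INFO": "y",
}


def build_env(parent=None):
    """Return a copy of the parent environment with the UCX settings applied."""
    env = dict(os.environ if parent is None else parent)
    env.update(BASE)
    env.update(TUNING)
    return env


def server_command(binary="./install-release-mt/bin/ucx_perftest", threads=2):
    """Build the server-side argument list (binary path taken from the thread)."""
    return [binary, "-T", str(threads)]


if __name__ == "__main__":
    env = build_env()
    settings = " ".join(f"{key}={env[key]}" for key in {**BASE, **TUNING})
    # Print the command instead of running it, so the sketch has no side effects.
    print(settings, " ".join(server_command()))
```

To actually launch the server, the same pieces can be passed to `subprocess.run(server_command(), env=build_env())`.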

@ivanallen
Author

@yosefe Hi!

We suspect it has something to do with the configuration of the switch.

However, we also captured IB packets to check the distribution of source ports. There is a significant difference between UCX and ib_send_bw.

UCX always uses consecutive source ports:
Image

ib_send_bw uses randomized source ports:
Image
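
Since the screenshots are not reproduced here, the comparison can be sketched in code. This is a minimal example, assuming the source ports have already been extracted from a packet capture into a list of integers. The port values below are hypothetical and are not taken from the captures above.

```python
def summarize_source_ports(ports):
    """Describe how a list of captured source ports is spread out."""
    distinct = sorted(set(ports))
    if not distinct:
        return {"distinct": 0, "span": 0, "consecutive": False}
    # Width of the range covered by the observed ports, inclusive.
    span = distinct[-1] - distinct[0] + 1
    return {
        "distinct": len(distinct),
        "span": span,
        # True when the distinct ports form one unbroken run.
        "consecutive": span == len(distinct),
    }


# Hypothetical port lists, for illustration only.
consecutive_like = [49152, 49153, 49154, 49155, 49152, 49153]
randomized_like = [50211, 61034, 49877, 57340]

if __name__ == "__main__":
    print(summarize_source_ports(consecutive_like))
    print(summarize_source_ports(randomized_like))
```

A small span relative to the number of distinct ports matches the UCX pattern described above, while a large span matches the ib_send_bw pattern. Whether either pattern spreads traffic evenly across the bonded links depends on the hashing used by the NIC and the switch, which this sketch does not model.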

@changchengx
Contributor

changchengx commented Jan 22, 2025

ib_send_bw uses randomized source ports.
You can try the --flow_label option in the latest perftest (master branch, not a tagged release) to select fixed source ports and check whether it hits the same problem as UCX.

@ivanallen
Author

> ib_send_bw uses randomized source ports. You can try the --flow_label option in the latest perftest (master branch, not a tagged release) to select fixed source ports and check whether it hits the same problem as UCX.

Hi @changchengx

Thank you for your reply. We have now basically confirmed that the problem is related to our network configuration. We are preparing to reconfigure our switches to use MLAG.
