link warn : connection already connected #123

Open
zx-ai opened this issue Feb 27, 2025 · 1 comment
Comments

@zx-ai

zx-ai commented Feb 27, 2025

I am trying to set up connections over 4 NICs like this:
```cpp
std::string name  = topo_.prefill_nodes[i].local_ip + ":12345@mlx5_0";
std::string name1 = topo_.prefill_nodes[i].local_ip + ":12345@mlx5_1";
// std::string name2 = topo_.prefill_nodes[i].local_ip + ":12345@mlx5_2";
// std::string name3 = topo_.prefill_nodes[i].local_ip + ":12345@mlx5_3";
RdmaTransport* rdmaport = dynamic_cast<RdmaTransport*>(xport_);
// local RDMA contexts; each one actively connects to the peer NICs above
const auto& context = rdmaport->get_context_list();
for (const auto& temp : context)
{
    auto endpoint = temp->endpoint(name);
    endpoint->setupConnectionsByActive();
    auto endpoint1 = temp->endpoint(name1);
    endpoint1->setupConnectionsByActive();
    // auto endpoint2 = temp->endpoint(name2);
    // endpoint2->setupConnectionsByActive();
    // auto endpoint3 = temp->endpoint(name3);
    // endpoint3->setupConnectionsByActive();
}
```
When each of the two machines uses two NICs to establish connections, why is no re-establishment needed during data transmission? When each machine uses four NICs, the warning `connection already connected` appears during connection setup, and endpoint connections are still being created during data transmission. What causes this? Is it possible to establish all endpoint connections before transmitting data?

@alogfans
Collaborator

An endpoint can be established either actively or passively. For example, when mlx5_0 starts connecting to mlx5_2, mlx5_0 creates a connection to mlx5_2 (active side), and mlx5_2 also creates a connection to mlx5_0 (passive side). If mlx5_2 later starts connecting to mlx5_0 on its own, the result is a duplicated connection.
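The active/passive behaviour described above can be illustrated with a small toy model. This is a sketch under stated assumptions: `NicModel`, `FabricModel`, `connectActive`, `nicPaths`, `runScenario`, and the IPs `10.0.0.1`/`10.0.0.2` are all hypothetical and are not part of the transfer engine's API. The model shows that when both hosts run an "actively connect every local NIC to every peer NIC" loop, every pair gets set up a second time from the other side. It also shows one possible way to avoid this: a deterministic tie-break in which only the side with the lexicographically smaller NIC path initiates each pair. That rule is an assumption for illustration and is not confirmed as the project's fix. The model also does not try to explain why the 2-NIC setup in the question behaved differently.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model; none of these names belong to the real transfer engine.
struct NicModel {
    std::map<std::string, int> endpoints;  // peer NIC path -> connection id
};

struct FabricModel {
    std::map<std::string, NicModel> nics;  // NIC path -> NIC state
    int connections = 0;                   // distinct connections created
    int warnings = 0;                      // repeated setups of an existing pair

    // Active setup: `local` creates an endpoint for `peer`, and `peer`
    // passively creates the matching endpoint for `local`.
    void connectActive(const std::string& local, const std::string& peer) {
        NicModel& l = nics[local];
        if (l.endpoints.count(peer) != 0) {
            // The pair was already set up (possibly passively): this is where
            // the "already connected" warning or a duplicate connection arises.
            ++warnings;
            return;
        }
        const int id = connections++;
        l.endpoints[peer] = id;
        nics[peer].endpoints[local] = id;  // passive side of the same connection
    }
};

// Builds "ip:12345@mlx5_<k>" paths, mirroring the naming used in the issue.
std::vector<std::string> nicPaths(const std::string& ip, int count) {
    std::vector<std::string> paths;
    for (int k = 0; k < count; ++k) {
        paths.push_back(ip + ":12345@mlx5_" + std::to_string(k));
    }
    return paths;
}

// Each host actively connects all of its NICs to all of the peer's NICs.
// With tieBreak, only the lexicographically smaller path initiates a pair.
FabricModel runScenario(int nicsPerHost, bool tieBreak) {
    const auto a = nicPaths("10.0.0.1", nicsPerHost);
    const auto b = nicPaths("10.0.0.2", nicsPerHost);
    FabricModel fabric;
    auto connectAll = [&](const std::vector<std::string>& locals,
                          const std::vector<std::string>& peers) {
        for (const auto& local : locals) {
            for (const auto& peer : peers) {
                if (!tieBreak || local < peer) {
                    fabric.connectActive(local, peer);
                }
            }
        }
    };
    connectAll(a, b);  // host A's setup loop
    connectAll(b, a);  // host B's setup loop
    return fabric;
}
```

With 4 NICs per host, `runScenario(4, false)` creates 16 connections and then records 16 repeated setups from host B's loop. `runScenario(4, true)` creates the same 16 connections with no repeats, because the two paths of a pair always differ, so exactly one side initiates it. In the real code, the analogous idea would be to call `setupConnectionsByActive()` for a given NIC pair from only one of the two machines. Whether that is sufficient in the actual engine is an assumption.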
