chore: provide a way to manage RTT component networks when hosts "disappear" (on top of #15) #16

Open. Wants to merge 6 commits into base: corba_multi_dispatcher


doudou (Member) commented Feb 18, 2025

On top of #15

Whenever a remote host "disappears", many dataflow-related operations become blocking as well (they have to wait until a timeout), because they call the remote side to disconnect.

This makes systems highly unstable and misbehaving for a while, until all these calls clear. It also prevents a system management layer from doing the cleanup with full knowledge of what is happening, and leads to situations where some half-channels are left dangling (for instance, a task gets OldData on a port because its part of the connection is still there).

This PR adds a new API without changing the current behaviour. The API allows managing "half channels", that is, the part of a channel that lives within the process, without contacting the remote side.
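To make the "half channel" idea concrete, here is a minimal C++ sketch. It is not RTT's API: every name in it (RemoteProxy, HalfChannel, LocalPort, disconnect, disconnectLocal) is hypothetical and only models the difference between a full disconnection, which must call the remote side, and dropping the in-process half alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the remote end of a connection. In a real
// distributed system, calling it may block until a timeout if the host is gone.
struct RemoteProxy {
    bool reachable = true;
    int disconnect_calls = 0;

    // Returns false to model a remote call that failed or timed out.
    bool requestDisconnect() {
        ++disconnect_calls;
        return reachable;
    }
};

// Hypothetical in-process half of a channel.
struct HalfChannel {
    std::string id;
    std::shared_ptr<RemoteProxy> remote;
};

class LocalPort {
public:
    void addHalfChannel(HalfChannel channel) {
        assert(channel.remote && "a half channel needs a remote proxy");
        channels_.push_back(std::move(channel));
    }

    // Full disconnection: contacts the remote side, then drops the local half.
    bool disconnect(const std::string& id) {
        auto it = find(id);
        if (it == channels_.end()) return false;
        bool remote_ok = it->remote->requestDisconnect();
        channels_.erase(it);
        return remote_ok;
    }

    // Local-only disconnection: drops the in-process half and never
    // contacts the remote side, so it cannot block on a vanished host.
    bool disconnectLocal(const std::string& id) {
        auto it = find(id);
        if (it == channels_.end()) return false;
        channels_.erase(it);
        return true;
    }

    std::size_t connectionCount() const { return channels_.size(); }

private:
    std::vector<HalfChannel>::iterator find(const std::string& id) {
        return std::find_if(channels_.begin(), channels_.end(),
                            [&id](const HalfChannel& c) { return c.id == id; });
    }

    std::vector<HalfChannel> channels_;
};
```

In this model, a system manager that knows a host is gone would call disconnectLocal on the affected ports, while the ordinary path keeps using the full disconnect.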

…connection

The issue with having a connection endpoint without the connection being registered is
that it crashes on disconnect: the endpoint calls the port, and the port then
cannot find the connection.
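A minimal sketch of this failure mode and one way to rule it out, assuming the port keeps a table of its connections that the endpoint looks up on disconnect. The names (EndpointModel, PortModel, createEndpoint, removeConnection) are hypothetical and do not reflect how the PR itself fixes the problem; the sketch simply registers the connection before handing out the endpoint.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

class PortModel;

// Hypothetical endpoint that calls back into its port on disconnect,
// like the endpoint described in the commit message.
class EndpointModel {
public:
    EndpointModel(PortModel& port, int connection_id)
        : port_(port), connection_id_(connection_id) {}
    void disconnect();

private:
    PortModel& port_;
    int connection_id_;
};

class PortModel {
public:
    // Registers the connection first, then creates the endpoint, so an
    // endpoint never exists without a matching entry in the port.
    std::unique_ptr<EndpointModel> createEndpoint(int connection_id,
                                                  const std::string& description) {
        connections_[connection_id] = description;
        return std::make_unique<EndpointModel>(*this, connection_id);
    }

    void removeConnection(int connection_id) {
        auto it = connections_.find(connection_id);
        // An endpoint whose connection was never registered would get here
        // with it == end(); erasing end() is undefined behaviour, which is
        // how this model represents the crash.
        assert(it != connections_.end());
        connections_.erase(it);
    }

    std::size_t connectionCount() const { return connections_.size(); }

private:
    std::map<int, std::string> connections_;
};

void EndpointModel::disconnect() { port_.removeConnection(connection_id_); }
```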
…update the policy

Updating the policy is needed to pass some information back in the out-of-band (OOB)
transport case (namely, a name that tells the other side what to do to connect,
for instance the MQ name for the MQ transport). It turns out that only the output
half does so; the other half takes the policy as input only.
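The asymmetry described above can be sketched as two function signatures: the output half takes the policy by non-const reference and may fill in a name, while the input half only reads it. This is a hypothetical model, not RTT's ConnPolicy; the struct, its channel_name field, the function names and the naming scheme are all assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical, trimmed-down connection policy. Only the field this
// example needs is modelled; RTT's real policy type differs.
struct PolicyModel {
    std::string channel_name;  // filled in by the output half for OOB transports
};

// Output half: takes the policy by non-const reference and may update it,
// so the caller can forward the result (e.g. a queue name) to the other side.
bool createOutputHalf(PolicyModel& policy, const std::string& port_name) {
    assert(!port_name.empty());
    if (policy.channel_name.empty()) {
        // Hypothetical naming scheme, for illustration only.
        policy.channel_name = "/" + port_name + "_mq";
    }
    return true;
}

// Input half: only reads the policy, so it takes it by const reference.
bool createInputHalf(const PolicyModel& policy) {
    // The input half needs the name produced by the output half to connect.
    return !policy.channel_name.empty();
}
```

Using a non-const reference only on the output side makes it visible at the call site which half is allowed to modify the policy.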

Ideally, we would also have cleaned up what information is or is not being
passed to the other calls (the connect calls, for instance, really don't need
much policy information), but that is for another PR.

The current RTT behaviour is to have destructors explicitly disconnect channels.
That is all well and good, but at destruction time things are ... disorderly. This
allows assuming that a system manager will handle the cleanup when possible.
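One way to picture "let a system manager handle the cleanup" is a destructor that skips its explicit disconnection when a setting says cleanup is managed externally. This is a hedged sketch only: CleanupSettings, managed_by_system and ManagedPort are hypothetical names, and the counter stands in for calls that, in a real system, might reach an unreachable remote host.

```cpp
#include <cassert>

// Hypothetical switch. When set, destructors leave channel cleanup to an
// external system manager instead of disconnecting channels themselves.
struct CleanupSettings {
    bool managed_by_system = false;
};

class ManagedPort {
public:
    ManagedPort(const CleanupSettings& settings, int& disconnect_calls)
        : settings_(settings), disconnect_calls_(disconnect_calls) {}

    ManagedPort(const ManagedPort&) = delete;
    ManagedPort& operator=(const ManagedPort&) = delete;

    ~ManagedPort() {
        // Default behaviour: explicitly disconnect every channel.
        // With managed_by_system set, do nothing and let the manager clean up.
        if (!settings_.managed_by_system) disconnectAll();
    }

private:
    // Stand-in for disconnecting all channels (possibly via remote calls).
    void disconnectAll() { ++disconnect_calls_; }

    const CleanupSettings& settings_;
    int& disconnect_calls_;
};
```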
doudou changed the title from chore: provide a way to manage RTT component networks when hosts "disappear" to chore: provide a way to manage RTT component networks when hosts "disappear" (on top of #15) on Feb 18, 2025

pierrewillenbrockdfki left a comment

Looks good to me. I was questioning the need for the remote_side_lock, but it turns out that the remote_side variable itself needs to be protected against concurrent access (independently of the reference counter and the referenced object).
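The point about remote_side can be illustrated with std::shared_ptr: its reference count is updated atomically, but one thread assigning to a shared_ptr variable while another copies that same variable is still a data race. A separate mutex around the variable avoids that. In this sketch only the names remote_side and remote_side_lock come from the discussion; the class and accessor names are hypothetical. Where C++20 is available, std::atomic<std::shared_ptr<T>> is an alternative.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

struct RemoteSide {
    int id;
};

// Hypothetical holder for the remote_side variable discussed above.
class ChannelElementModel {
public:
    std::shared_ptr<RemoteSide> getRemoteSide() const {
        std::lock_guard<std::mutex> guard(remote_side_lock);
        return remote_side;  // copy taken while holding the lock
    }

    void setRemoteSide(std::shared_ptr<RemoteSide> side) {
        std::lock_guard<std::mutex> guard(remote_side_lock);
        remote_side = std::move(side);
    }

private:
    // Protects the shared_ptr variable itself, not the RemoteSide object.
    mutable std::mutex remote_side_lock;
    std::shared_ptr<RemoteSide> remote_side;
};
```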

I'll keep this in mind when reworking the cpp rock-display connection handling.

@maltewi this might be interesting for cnd/execution?

doudou (Member Author) commented Feb 25, 2025

> I'll keep this in mind when reworking the cpp rock-display connection handling.

A rock-display-like tool won't necessarily benefit from this. The current connection handling will continue to work fine (the signalling flag is a lot more critical). A syskit-like tool, on the other hand, can definitely benefit from this in terms of robustness in distributed systems. On local systems, not so much. The migration, however, is quite a bit of work.

I'm still testing; there are some crashes.

I'd be happy to discuss it with (both of) you over a call if you'd like.
