
ensure peer_connected is called before peer_disconnected #3110

Conversation

johncantrell97
Contributor

Fixes #3108

Makes sure all message handlers' peer_connected methods are called instead of returning early on the first one to error.

As for whether or not the user has to call back into socket_disconnected after a PeerManager::read_event, I assume you mean after it returns an Err? I think the user does not have to because read_event will call disconnect_event_internal on any error before returning it to the user.

I took a look at lightning-net-tokio and it appears to be the case over there as well. It does:

if let Disconnect::PeerDisconnected = disconnect_type {
    peer_manager.as_ref().socket_disconnected(&our_descriptor);
    peer_manager.as_ref().process_events();
}

It only calls socket_disconnected if the disconnection type is one the user detected. If read_event returns an Err, it breaks with a disconnection type of Disconnect::CloseConnection and does not call back into socket_disconnected.

Matt seems to think you do have to so I'm probably misunderstanding the original question. Happy to dig into it a bit more with some clarification if I misunderstood.

("Route Handler", self.message_handler.route_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection)),
("Channel Handler", self.message_handler.chan_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection)),
("Onion Handler", self.message_handler.onion_message_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection)),
];
Contributor Author

If you don't like this attempt to DRY up the handling, I'm fine with just having separate results that I check one by one with their own log messages.
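The "array of named results" approach above can be sketched as a small self-contained example. The handler names mirror the diff, but the types here are toy stand-ins rather than LDK's actual message handlers: the point is that every handler's result already exists before any error is acted on, so no handler is skipped by an early return.

```rust
// Hypothetical sketch: collect each handler's peer_connected result,
// then report failures afterwards instead of short-circuiting on the
// first Err.
fn failed_handlers<'a>(results: &'a [(&'a str, Result<(), ()>)]) -> Vec<&'a str> {
    results
        .iter()
        .filter(|(_, res)| res.is_err())
        .map(|(name, _)| *name)
        .collect()
}

fn main() {
    // Every handler was already invoked when this array was built; an
    // error in one does not prevent the others from running.
    let results = [
        ("Route Handler", Ok(())),
        ("Channel Handler", Err(())),
        ("Onion Handler", Ok(())),
    ];
    for name in failed_handlers(&results) {
        println!("{} decided we couldn't communicate with the peer", name);
    }
}
```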

@codecov-commenter

codecov-commenter commented Jun 7, 2024

Codecov Report

Attention: Patch coverage is 92.56757% with 11 lines in your changes missing coverage. Please review.

Project coverage is 90.78%. Comparing base (9789152) to head (922c31f).
Report is 1274 commits behind head on main.

Files with missing lines Patch % Lines
lightning/src/ln/peer_handler.rs 92.56% 10 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3110      +/-   ##
==========================================
+ Coverage   89.84%   90.78%   +0.93%     
==========================================
  Files         119      119              
  Lines       97561   103463    +5902     
  Branches    97561   103463    +5902     
==========================================
+ Hits        87655    93925    +6270     
+ Misses       7331     7032     -299     
+ Partials     2575     2506      -69     


Collaborator

@TheBlueMatt left a comment

Thanks! I think we also need to move the peer_lock.their_features call up - we only call peer_disconnected if that line has been hit (Peer::handshake_complete checks for it) so we want to always hit that immediately before we call peer_connecteds.

@johncantrell97 force-pushed the 2024-06-robust-peer-disconnected-events branch from 4a1cade to db3b148 on June 7, 2024 at 16:02
@johncantrell97
Contributor Author

Thanks! I think we also need to move the peer_lock.their_features call up - we only call peer_disconnected if that line has been hit (Peer::handshake_complete checks for it) so we want to always hit that immediately before we call peer_connecteds.

Whoops, fixed it.

It can't be before the calls to peer_connected because they pass a reference to msg, but as long as we do it before returning it should be okay.
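The ordering constraint is a plain borrow-checker issue, sketched below with toy stand-ins (the `Init` struct and `handle_init` here are illustrative, not LDK's actual types): `msg.features` can only be moved into the peer's state once no handler still borrows `msg`, so the assignment has to come after the peer_connected calls but before returning.

```rust
// Toy stand-in for the init message; LDK's real Init carries InitFeatures.
struct Init {
    features: Vec<u8>,
}

// Handlers only borrow the message.
fn peer_connected(msg: &Init) -> Result<(), ()> {
    let _ = &msg.features;
    Ok(())
}

// `msg.features` is moved out only after all borrowing calls are done,
// and before the function returns, mirroring the ordering in the fix.
fn handle_init(msg: Init) -> Option<Vec<u8>> {
    let _ = peer_connected(&msg);
    Some(msg.features)
}

fn main() {
    let features = handle_init(Init { features: vec![1, 2] });
    assert_eq!(features, Some(vec![1, 2]));
}
```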

@johncantrell97 force-pushed the 2024-06-robust-peer-disconnected-events branch from db3b148 to 3a8f3b2 on June 7, 2024 at 16:18
TheBlueMatt previously approved these changes Jun 7, 2024
Collaborator

@TheBlueMatt left a comment

Would be nice to get a test (which should be pretty easy), but either way LGTM.

@johncantrell97
Contributor Author

Would be nice to get a test (which should be pretty easy), but either way LGTM.

Doesn't look like there's an easy way to test it with the existing test message handlers. Should I create new ones that can error on peer_connected and track whether connected/disconnected have been called, or add that functionality to the existing test handlers?

I've used something like mockall for this in the past, but without it I'd have to add counters/flags that get updated and a way to check them.

Is this what you had in mind for being able to test it?
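The counters/flags idea can be sketched without mockall. Everything below is hypothetical (the `PeerEvents` trait and `TrackingHandler` are stand-ins, not LDK's message-handler traits); it just shows a handler that counts its connected/disconnected calls so a test can assert that peer_disconnected still fires after peer_connected errors.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-in for the relevant slice of a message handler's interface.
trait PeerEvents {
    fn peer_connected(&self) -> Result<(), ()>;
    fn peer_disconnected(&self);
}

// Test handler that records how many times each event fired and can be
// configured to reject the connection.
struct TrackingHandler {
    connected_calls: AtomicUsize,
    disconnected_calls: AtomicUsize,
    fail_connection: bool,
}

impl PeerEvents for TrackingHandler {
    fn peer_connected(&self) -> Result<(), ()> {
        self.connected_calls.fetch_add(1, Ordering::SeqCst);
        if self.fail_connection { Err(()) } else { Ok(()) }
    }
    fn peer_disconnected(&self) {
        self.disconnected_calls.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let handler = TrackingHandler {
        connected_calls: AtomicUsize::new(0),
        disconnected_calls: AtomicUsize::new(0),
        fail_connection: true,
    };
    // Simulate the flow under test: peer_connected errors, and the
    // disconnect notification must still follow.
    assert!(handler.peer_connected().is_err());
    handler.peer_disconnected();
    assert_eq!(handler.connected_calls.load(Ordering::SeqCst), 1);
    assert_eq!(handler.disconnected_calls.load(Ordering::SeqCst), 1);
}
```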

@TheBlueMatt
Collaborator

Yea, I was figuring you'd just create a trivial CustomMessageHandler that asserts that connected/disconnecteds all come in order and then errors on connection.

@johncantrell97
Contributor Author

johncantrell97 commented Jun 7, 2024

Yea, I was figuring you'd just create a trivial CustomMessageHandler that asserts that connected/disconnecteds all come in order and then errors on connection.

Hm, using a CustomMessageHandler doesn't really test the fix here since it goes last. One of the issues was the early return causing the later handlers to not get peer_connected called at all. We'd have to use multiple new handlers to check that the one after an error is returned still gets peer_connected called on it (and disconnected).

I guess at least it would catch the fix for ensuring disconnect is called.

@johncantrell97
Contributor Author

@TheBlueMatt

Added a test that passes, but it duplicates a ton of code to handle all of the setup with the new message handlers :|

Not sure if this is okay; looking for feedback on the test, and on how to do it better if it's not.

@johncantrell97 force-pushed the 2024-06-robust-peer-disconnected-events branch 2 times, most recently from 2c4c40a to 7a29c39 on June 7, 2024 at 23:00
@johncantrell97 force-pushed the 2024-06-robust-peer-disconnected-events branch from 7a29c39 to 922c31f on June 8, 2024 at 01:22
if let Err(()) = self.message_handler.custom_message_handler.peer_connected(&their_node_id, &msg, peer_lock.inbound_connection) {
log_debug!(logger, "Custom Message Handler decided we couldn't communicate with peer {}", log_pubkey!(their_node_id));

peer_lock.their_features = Some(msg.features);
Contributor

I'm a bit confused by this: wouldn't that lead to us falsely assuming the handshake succeeded even though one of our handlers rejected it? And there is a window between us dropping the lock and handling the disconnect event where we would deal with the peer in a 'normal' manner, e.g., accepting further messages and potentially rebroadcasting, etc.?

(cc @TheBlueMatt as he requested this change)

Contributor Author

Hm, if that's true then it seems like we'll need to separate "handshake_completed" from "triggered peer_connected" with a new flag on the peer that we can use to decide whether or not to trigger peer_disconnected in do_disconnect and disconnect_event_internal?

Collaborator

Yea, kinda, we'll end up forwarding broadcasts, as you point out, which is maybe not ideal, but we shouldn't process any further messages - we're currently in a read processing call, and we require read processing calls for any given peer to be serial, so presumably when we return an error the read-processing pipeline for this peer will stall and we won't get any more reads. We could make that explicit in the docs, however.

Contributor

Mhh, rather than introducing this race-y behavior in the first place, couldn't we just introduce a new handshake_aborted flag and check it as an alternative to !peer.handshake_complete in disconnect_event_internal?

Collaborator

That's fine too.
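The handshake_aborted idea can be sketched as follows. This is a minimal illustration, not LDK's actual Peer struct: the field names mirror the discussion, and `their_features` is reduced to a plain Option so the example is self-contained.

```rust
// Hypothetical peer state: handshake completion is inferred from
// their_features, and a separate flag records that a handler rejected
// the connection in peer_connected.
struct Peer {
    their_features: Option<u64>, // stand-in for Option<InitFeatures>
    handshake_aborted: bool,
}

impl Peer {
    fn handshake_complete(&self) -> bool {
        self.their_features.is_some()
    }

    // peer_disconnected should only fire if peer_connected actually
    // took effect: handshake completed AND it was not aborted.
    fn should_notify_disconnect(&self) -> bool {
        self.handshake_complete() && !self.handshake_aborted
    }
}

fn main() {
    let mut peer = Peer { their_features: Some(1), handshake_aborted: false };
    assert!(peer.should_notify_disconnect());

    // A handler errored in peer_connected: mark the handshake aborted
    // so disconnect handling skips the peer_disconnected notification.
    peer.handshake_aborted = true;
    assert!(!peer.should_notify_disconnect());
}
```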

use crate::ln::{msgs, wire};
use crate::ln::msgs::{Init, LightningError, SocketAddress};
use crate::util::test_utils;


Contributor

nit: Drop superfluous whitespace.

@@ -2779,6 +2781,76 @@ mod tests {
}
}

struct TestPeerTrackingMessageHandler {
Contributor

Hmm, I believe the alternative would be to add TestCustomMessageHandler and TestOnionMessageHandler to test_utils and use them as part of the default test setup?

@tnull
Contributor

tnull commented Jul 17, 2024

@johncantrell97 Any interest in finishing this PR?

@TheBlueMatt
Collaborator

Superseded by #3580.

Development

Successfully merging this pull request may close these issues.

peer_disconnected event processing isn't robust if a peer_connected Errs
4 participants