fix(dcutr): handle empty holepunch_candidates #5583
This pull request has merge conflicts. Could you please resolve them @stormshield-frb? 🙏
Thank you for upstreaming this fix @stormshield-frb!
I am not sure if buffering a pending stream is the best solution; see my comment below about affected RTT measurements.
I was wondering, did you also consider the alternative of:
- adding the `Command::NewExternalAddrCandidate` that this PR introduces
- in the `Behavior`: retrying in case of an `Event::OutboundConnectFailed` if the error is `NoAddresses`

I would think that by the second or third attempt the remote is likely to have completed the identify exchange and updated their external address list.
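A rough sketch of that retry alternative is shown here. The types are hypothetical stand-ins, not the real `libp2p-dcutr` API, and the retry budget of 3 is an assumption based only on the "second or third attempt" remark above.

```rust
/// Hypothetical stand-in for the error carried by a failed outbound connect.
#[derive(Debug, Clone, PartialEq, Eq)]
enum ConnectError {
    NoAddresses,
    Other(String),
}

/// What the behaviour decides to do after a failed attempt.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Retry { attempt: u8 },
    GiveUp,
}

/// Assumed retry budget; not taken from the PR.
const MAX_ATTEMPTS: u8 = 3;

/// Retry only when the failure was caused by an empty candidate list
/// and the retry budget is not exhausted.
fn on_outbound_connect_failed(attempts_so_far: u8, error: &ConnectError) -> Decision {
    match error {
        ConnectError::NoAddresses if attempts_so_far < MAX_ATTEMPTS => Decision::Retry {
            attempt: attempts_so_far + 1,
        },
        _ => Decision::GiveUp,
    }
}
```

Any other error, or a `NoAddresses` error after the budget is used up, ends the hole-punch attempt for that connection.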
/// All relayed connections.
relayed_connections: HashMap<PeerId, HashSet<ConnectionId>>,
This confused me a bit when reviewing because it gives the impression that the logic previously applied to `direct_connections` is now applied to `relayed_connections`.
However, if I understand it correctly, it's actually that:
- `direct_connections` was already unused prior to this PR, and can be removed independently
- `relayed_connections` is required by this PR

If so, would you mind splitting the removal of `direct_connections` out of this PR? Either you do it in a follow-up PR yourself, or I can do the follow-up PR.
self.inner.push(address.clone(), ());
Some(address)
- self.inner.push(address.clone(), ());
- Some(address)
+ match self.inner.push(address.clone(), ()) {
+     Some((addr, ())) if addr == address => None,
+     _ => Some(address)
+ }
Only return `Some` if the address wasn't already in the cache?
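To illustrate the intent of this suggestion, here is a minimal self-contained sketch. `Candidates` is a hypothetical stand-in backed by a `VecDeque`, not the cache type used in the PR; it only shows the "report an address once" behaviour and assumes a capacity of at least 1.

```rust
use std::collections::VecDeque;

/// Hypothetical bounded candidate list; the most recently used address is at the back.
struct Candidates {
    inner: VecDeque<String>,
    capacity: usize, // assumed to be >= 1
}

impl Candidates {
    fn new(capacity: usize) -> Self {
        Self { inner: VecDeque::new(), capacity }
    }

    /// Records `address` and returns it only if it was not already present.
    fn add(&mut self, address: String) -> Option<String> {
        if let Some(pos) = self.inner.iter().position(|a| *a == address) {
            // Already known: refresh its position, but report nothing new.
            self.inner.remove(pos);
            self.inner.push_back(address);
            return None;
        }
        if self.inner.len() == self.capacity {
            // Evict the least recently used entry to make room.
            self.inner.pop_front();
        }
        self.inner.push_back(address.clone());
        Some(address)
    }
}
```

With this shape, the caller can forward a candidate to the handlers only when `add` returns `Some`, so a repeated address does not trigger a redundant notification.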
if self
    .inbound_stream
    .try_push(inbound::handshake(
        stream,
        self.holepunch_candidates.clone(),
    ))
    .is_err()
{
    tracing::warn!(
        "New inbound connect stream while still upgrading previous one. Replacing previous with new.",
    );
}
self.attempts += 1;
}
future::Either::Left(stream) => self.set_stream(StreamType::Inbound, stream),
Won't buffering the inbound stream affect the RTT that the remote measures?
Since we do the buffering after the protocol negotiation succeeded, I think the remote will already have sent a `Connect` message and started measuring the RTT.
So a subsequent hole-punching attempt would be likely to fail because the timing is off?
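The timing concern can be made concrete with a small calculation. This is only a sketch under stated assumptions: one-way latency is symmetric, the measuring peer dials half the measured RTT after sending `Sync` while the other peer dials as soon as `Sync` arrives, and the buffering delays the reply to `Connect`. The function name `dial_skew` is hypothetical.

```rust
use std::time::Duration;

/// Estimated gap between the two peers' dial times when the reply to
/// `Connect` is held back by `buffering_delay`.
fn dial_skew(true_rtt: Duration, buffering_delay: Duration) -> Duration {
    // The RTT is measured from sending `Connect` to receiving the reply,
    // so time the remote spends buffering is included in the measurement.
    let measured_rtt = true_rtt + buffering_delay;
    // After sending `Sync`, the measuring peer waits half the measured RTT.
    let measuring_peer_dials_at = measured_rtt / 2;
    // The other peer dials once `Sync` arrives, i.e. after one one-way trip.
    let other_peer_dials_at = true_rtt / 2;
    measuring_peer_dials_at - other_peer_dials_at
}
```

Under these assumptions the two dials drift apart by half the buffering delay, so a long wait for candidates would desynchronize the simultaneous dial.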
Description
A few months ago, we were occasionally seeing odd failures with `DCUTr`. After some investigation, the cause turned out to be a race condition: sometimes `identify` is a little slow, and the `DCUTr` handler is created before an `identify` event is received. On its own, this is not necessarily a problem. But if this race happens when the holepunch candidates list of `DCUTr` is empty, then `DCUTr` will always fail for this connection. Indeed, when a new relayed connection is established, `DCUTr` creates a `Handler` for this connection, which is responsible for performing the hole punch. However, the candidates used are the ones known at `Handler` instantiation, so any later `NewExternalAddrCandidate` updates are not forwarded to the `Handler`s.
This PR upstreams the fix we made several months ago; we have not encountered any particular problem with it since.
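The race and the shape of the fix can be sketched as follows. This is a simplified, hypothetical model (addresses as `String`, a stripped-down `Handler` and `Command`), not the actual `libp2p-dcutr` types.

```rust
/// Hypothetical, simplified handler; not the real `libp2p-dcutr` type.
struct Handler {
    holepunch_candidates: Vec<String>,
}

/// Hypothetical command sent from the behaviour to an existing handler.
enum Command {
    NewExternalAddrCandidate(String),
}

impl Handler {
    /// The candidates are a snapshot of what the behaviour knows at construction time.
    fn new(known_candidates: &[String]) -> Self {
        Self { holepunch_candidates: known_candidates.to_vec() }
    }

    /// Without a path like this, a handler created before `identify` completes
    /// keeps an empty list for the lifetime of the connection.
    fn on_command(&mut self, command: Command) {
        match command {
            Command::NewExternalAddrCandidate(addr) => {
                if !self.holepunch_candidates.contains(&addr) {
                    self.holepunch_candidates.push(addr);
                }
            }
        }
    }

    /// A handshake is only worth starting once at least one candidate is known.
    fn can_start_handshake(&self) -> bool {
        !self.holepunch_candidates.is_empty()
    }
}
```

A handler built with an empty list cannot start a useful handshake until a `NewExternalAddrCandidate` command reaches it, which is the update path missing before this fix.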
Timetable of the problem
OutboundError(NoAddress)
InboundError(UnexpectedEof)
Notes & open questions
- I have put some `TODO`s about the potential merging of `self.attempts += 1`. When `inbound`, `self.attempts` is incremented when starting a handshake; however, when `outbound`, `self.attempts` is incremented at the "new outbound substream" request. Before, I don't think it was a problem, but now that we do not necessarily trigger a handshake if there are no hole-punch candidates, I think we might want to increment `self.attempts` only when actually starting the handshake. What do you think?
- The log emitted when starting a new handshake says that, if the corresponding stream (`inbound_stream` or `outbound_stream`) was not empty, then we replace the handshake. There is a `warn`-level log stating `New inbound/outbound connect stream while still upgrading previous one. Replacing previous with new`. However, reading the code of the `FuturesSet::try_push` method and then the `FuturesMap::try_push` method (which is used inside), the pushed future never replaces an old one when capacity is reached; it just returns an error. So what do you think should be done? Should we actually replace the old with the new, as the log says? Or should we not replace the old with the new and update the log to say that the new one was dropped?

Change checklist