Add server side ICE consent checks to detect silent WebRTC disconnection #1332

Merged
merged 24 commits into v3 from server-side-ice-consent-checks
Feb 20, 2024

Conversation

ibc
Member

@ibc ibc commented Feb 15, 2024

Details

  • Implements the ICE Lite server side of RFC 7675 "(STUN) Usage for Consent Freshness".
  • This PR provides mediasoup with the ability to detect downed networks or silent/abrupt WebRTC disconnections on the client side.
  • mediasoup now monitors received ICE consent requests from WebRTC clients and moves to 'disconnected' state (and notifies the app as usual) if no ICE consent request has been received in the last 30 seconds, as per spec (this can be changed with an iceConsentTimeout setting in WebRtcTransportOptions).

TODO

  • Document iceConsentTimeout on the website and extend the description of the "icestatechange" event with state "disconnected" (also document that "icestatechange" with state "disconnected" happens after "dtlsstatechange" with state "closed" if the client closes its PeerConnection).

- Fixes #1127

### Details

- This PR provides the app with the ability to detect downed networks or silent/abrupt disconnections on the client side.
- Make mediasoup send ICE consent requests every ~5 seconds once ICE is established.
- Move to 'disconnected' state (and notify the app as usual) if no ICE response is received in 30 seconds, as per spec (this may change with a setting, see TODO below).
- This feature requires providing the mediasoup `WebRtcTransport` with the remote ICE username fragment and ICE password, which the latest mediasoup-client will do once released.

### TODO

- [ ] Add an `iceConsentTimeout` parameter to `WebRtcTransportOptions` to override the default value of 30 seconds (0 means "disable the mechanism"); a usage sketch follows after this list.
- [ ] Rust.
- [ ] React to 403 ICE error responses by also moving to 'disconnected' state.
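
For illustration, a minimal Node-side usage sketch assuming the option ships as described in this PR (the listen IP and the close-on-disconnect policy are placeholders, not part of the PR):

```ts
import * as mediasoup from 'mediasoup';

async function createTransport(
  router: mediasoup.types.Router
): Promise<mediasoup.types.WebRtcTransport> {
  const transport = await router.createWebRtcTransport({
    listenIps: [{ ip: '127.0.0.1' }],
    enableUdp: true,
    enableTcp: true,
    // Seconds without a received ICE consent request before the transport
    // moves to 'disconnected'; 0 disables the mechanism (default: 30).
    iceConsentTimeout: 30,
  });

  transport.on('icestatechange', (iceState) => {
    if (iceState === 'disconnected') {
      // Downed network or silent/abrupt client-side disconnection.
      transport.close();
    }
  });

  return transport;
}
```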
@ibc ibc requested a review from jmillan February 15, 2024 18:40
- Add iceConsentTimeout in WebRtcTransportOptions (default to 30).
- Limit max entries in the queue to 20.
Member

@jmillan jmillan left a comment


Overall it looks good to me.

The only concern I have is that the application will receive a 'disconnected' event, but it has no means of knowing the reason why, and that's IMO important.

I think a 'reason' argument should be added to the notification from C++ to Node/Rust, so in this case it would be something like ConsentCheckExpired, with corresponding reasons in other cases.

That could be done in a different PR though.

worker/src/RTC/IceServer.cpp (review thread resolved)
worker/src/RTC/IceServer.cpp (review thread resolved)
worker/include/RTC/IceServer.hpp (review thread resolved)
@ibc
Member Author

ibc commented Feb 16, 2024

The only concern I have is that the application will receive a 'disconnected' event, but it has no means of knowing the reason why, and that's IMO important.

I think a 'reason' argument should be added to the notification from C++ to Node/Rust, so in this case it would be something like ConsentCheckExpired, with corresponding reasons in other cases.

reason argument where? This is what we have now:

this.safeEmit('icestatechange', iceState);

You mean adding a reason argument to that event in case iceState === 'disconnected'?
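
For illustration, a hypothetical sketch of how such a 'reason' could surface to the app (the type name and values below are invented; this PR did not end up adding the argument):

```ts
import { EventEmitter } from 'node:events';

// Hypothetical reason values (illustrative only).
type IceDisconnectReason = 'consentCheckExpired' | 'tcpAbruptlyClosed';

// The transport would re-emit the reason received from the C++ worker,
// e.g. this.safeEmit('icestatechange', iceState, reason), and the app
// could then distinguish the cases:
function watchIceState(transport: EventEmitter): void {
  transport.on(
    'icestatechange',
    (iceState: string, reason?: IceDisconnectReason) => {
      if (iceState === 'disconnected') {
        console.warn('ICE disconnected (reason: %s)', reason ?? 'unknown');
      }
    }
  );
}
```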

@jmillan
Member

jmillan commented Feb 16, 2024

You mean adding a reason argument to that event in case iceState === 'disconnected'?

Yes, for example.

The problem is not knowing the reason for the 'disconnected' state, now that it can be due to this consent check timeout.

@ibc
Member Author

ibc commented Feb 16, 2024

The problem is not knowing the reason for the 'disconnected' state, now that it can be due to this consent check timeout.

NOTE: if the WebRTC client disconnects its side by closing the PeerConnection or by closing the browser tab, the DTLS close alert arrives at mediasoup first, so "dtlsstatechange" (state "closed") happens BEFORE. If that happens, the app should already close the transport. Basically, the "icestatechange" event with state "disconnected" ONLY happens if the consent timeout fires or if the client is connected via TCP and the TCP connection is abruptly closed, which in both cases means that the network is down.

Taking this into account I think that we don't need any "reason" argument.
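
A sketch of the app-side handling that follows from this ordering, using the standard mediasoup transport events (closing the transport on 'disconnected' is just an example policy):

```ts
import { types as mediasoupTypes } from 'mediasoup';

function handleTransportEvents(
  transport: mediasoupTypes.WebRtcTransport
): void {
  // Arrives first when the client cleanly closes its PeerConnection or tab,
  // since the DTLS close alert reaches mediasoup before any ICE timeout.
  transport.on('dtlsstatechange', (dtlsState) => {
    if (dtlsState === 'closed') {
      transport.close();
    }
  });

  // With this PR, 'disconnected' means consent timeout or an abrupt TCP
  // close, i.e. the network is effectively down.
  transport.on('icestatechange', (iceState) => {
    if (iceState === 'disconnected') {
      transport.close();
    }
  });
}
```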

@ibc
Member Author

ibc commented Feb 16, 2024

Also note that, before this feature, "icestatechange" with state "disconnected" could ONLY happen if the client was connected via TCP and the connection was abruptly closed without a DTLS close alert being sent first. So I strongly think we don't need any "reason" argument.

@jmillan
Member

jmillan commented Feb 16, 2024

Ok, we'll see if there is a need when troubleshooting. Not a blocker anyhow.

@ibc ibc requested a review from nazar-pc February 16, 2024 17:05
@ibc ibc marked this pull request as ready for review February 16, 2024 17:05
@ibc
Member Author

ibc commented Feb 16, 2024

This is done, Rust included.

@ibc
Member Author

ibc commented Feb 18, 2024

Hell, the demo crashed:

2024-02-18T13:55:03.784Z mediasoup-demo-server:WARN:Room WebRtcTransport "icestatechange" event [iceState:disconnected], closing peer

2024-02-18T13:55:03.793Z mediasoup-demo-server:WARN:Room WebRtcTransport "icestatechange" event [iceState:disconnected], closing peer

2024-02-18T13:55:04.824Z mediasoup:ERROR:Channel Producer Channel error: Error: read ECONNRESET
2024-02-18T13:55:04.825Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:1580729, code:null, signal:SIGSEGV]
2024-02-18T13:55:04.876Z mediasoup-demo-server:ERROR mediasoup Worker died, exiting  in 2 seconds... [pid:1580729]
2024-02-18T13:55:04.877Z mediasoup-demo-server:WARN:Room _createConsumer() | failed:InvalidStateError: Channel closed, pending request aborted [method:CONSUMER_RESUME, 

@nazar-pc
Collaborator

Is it just me, or is everyone else also fed up with memory-related issues in C++?

@ibc
Member Author

ibc commented Feb 19, 2024

Ok, I got a core dump in the demo server:

Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x0000558176a1618b in RTC::StunPacket::CheckAuthentication(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#1  0x0000558176975be1 in RTC::IceServer::ProcessStunPacket(RTC::StunPacket*, RTC::TransportTuple*) ()
#2  0x0000558176a43909 in RTC::WebRtcServer::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
#3  0x0000558176d47515 in uv.udp_recvmmsg ()
#4  0x0000558176d47f6b in uv.udp_io ()
#5  0x0000558176d4b469 in uv.io_poll ()
#6  0x0000558176d3eb2f in uv_run ()
#7  0x00005581768f6a69 in DepLibUV::RunLoop() ()
#8  0x0000558176900f82 in Worker::Worker(Channel::ChannelSocket*) ()
#9  0x00005581768f5b04 in mediasoup_worker_run ()
#10 0x00005581768f4c4c in main ()

@ibc
Member Author

ibc commented Feb 19, 2024

It's crashing in the StunPacket::CheckAuthentication() method.

I made changes in that method, of course: we used to only verify that the first part of the USERNAME attribute in the received STUN packet matches our server side ICE username fragment plus a colon. But now we also have (or may have) the remote username fragment, so we match the received USERNAME attribute against localICEusernameFrag:remoteICEusernameFrag as per spec. I've verified that it works fine whether or not the remote provides the mediasoup WebRtcTransport with its remote ICE usernameFragment/pwd. So no idea what is crashing.

NOTE that in this PR we are also calling packet->CheckAuthentication() on STUN responses received from the client. So we don't know whether the crash happens when receiving a STUN request or a STUN response.
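
For reference, the matching logic described above, sketched in TypeScript (the actual implementation is C++ in StunPacket::CheckAuthentication(); the function below is illustrative):

```ts
// Verify the USERNAME attribute of a received STUN request. Per spec it
// must be "localUsernameFragment:remoteUsernameFragment"; if the remote
// ICE parameters were never provided to the transport, only the local
// part plus the colon separator can be verified.
function checkStunUsername(
  receivedUsername: string,
  localUsernameFragment: string,
  remoteUsernameFragment?: string
): boolean {
  if (remoteUsernameFragment !== undefined) {
    return (
      receivedUsername === `${localUsernameFragment}:${remoteUsernameFragment}`
    );
  }

  return receivedUsername.startsWith(`${localUsernameFragment}:`);
}
```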

@ibc
Member Author

ibc commented Feb 19, 2024

I'm refactoring the C++ code a bit to get better error traces.

@ibc
Member Author

ibc commented Feb 19, 2024

Wow, another crash in the demo and this is NOT related to STUN...

Program terminated with signal SIGSEGV, Segmentation fault.

#0  0x0000563fbc0e96f6 in RTC::WebRtcTransport::OnRtcpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
[Current thread is 1 (Thread 0x7fb43c53e440 (LWP 1643202))]
(gdb) bt
#0  0x0000563fbc0e96f6 in RTC::WebRtcTransport::OnRtcpDataReceived(RTC::TransportTuple*, unsigned char const*, unsigned long) ()
#1  0x0000563fbc0dbc9f in RTC::WebRtcServer::OnUdpSocketPacketReceived(RTC::UdpSocket*, unsigned char const*, unsigned long, sockaddr const*) ()
#2  0x0000563fbc3df515 in uv.udp_recvmmsg ()
#3  0x0000563fbc3dff6b in uv.udp_io ()
#4  0x0000563fbc3e3469 in uv.io_poll ()
#5  0x0000563fbc3d6b2f in uv_run ()
#6  0x0000563fbbf8ea69 in DepLibUV::RunLoop() ()
#7  0x0000563fbbf98f82 in Worker::Worker(Channel::ChannelSocket*) ()
#8  0x0000563fbbf8db04 in mediasoup_worker_run ()
#9  0x0000563fbbf8cc4c in main ()

However, the ICE disconnection happened before:

2024-02-19T11:36:27.220Z mediasoup-demo-server:WARN:Room WebRtcTransport "icestatechange" event [iceState:disconnected], closing peer
2024-02-19T11:36:35.394Z mediasoup-demo-server:WARN:Room WebRtcTransport "icestatechange" event [iceState:disconnected], closing peer
2024-02-19T11:36:35.485Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:1643202, code:null, signal:SIGSEGV]

@ibc
Member Author

ibc commented Feb 19, 2024

Ok, I've removed the sending of ICE requests from mediasoup; it now monitors received ICE consent requests instead.

CC @ggarber @fippo
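
A TypeScript sketch of that monitoring logic, assuming a simple restartable timer (the real implementation lives in the C++ IceServer; class and method names here are illustrative):

```ts
// Restartable timer that fires if no ICE consent request arrives in time.
class ConsentMonitor {
  private timer?: ReturnType<typeof setTimeout>;

  constructor(
    private readonly timeoutSeconds: number, // iceConsentTimeout (default 30)
    private readonly onExpired: () => void // move to 'disconnected', notify app
  ) {}

  // Called once ICE gets connected.
  start(): void {
    this.restart();
  }

  // Called for every authenticated ICE consent (STUN Binding) request
  // received from the client.
  onConsentRequestReceived(): void {
    this.restart();
  }

  stop(): void {
    if (this.timer) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }

  private restart(): void {
    this.stop();
    this.timer = setTimeout(this.onExpired, this.timeoutSeconds * 1000);
  }
}
```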

@ibc
Member Author

ibc commented Feb 19, 2024

Wow, another crash in the demo and this is NOT related to STUN... (stack trace and ICE disconnection logs quoted from the previous comment)

And this has been fixed. The problem was that I was deallocating a tuple before notifying the listener, which then called methods on the deallocated instance.

@ibc
Member Author

ibc commented Feb 19, 2024

The new stuff in FuzzerStunPacket.cpp is also passing the fuzzer.

@ibc
Member Author

ibc commented Feb 20, 2024

I see a core dump in the demo, but I fail to correlate it with this PR:

Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: No such file or directory.
[Current thread is 1 (Thread 0x7f0774b3e440 (LWP 1695953))]
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
#1  0x00007f0774bcad9f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  0x00007f0774b7bf32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f0774b66472 in __GI_abort () at ./stdlib/abort.c:79
#4  0x000055b064e4bd6e in RTC::RTCP::FeedbackRtpTransportPacket::AddPacket(unsigned short, unsigned long, unsigned long) ()
#5  0x000055b064dd9ade in RTC::TransportCongestionControlServer::FillAndSendTransportCcFeedback() ()
#6  0x000055b0650dcf3e in uv.run_timers ()
#7  0x000055b0650e0c88 in uv_run ()
#8  0x000055b064c9a849 in DepLibUV::RunLoop() ()
#9  0x000055b064ca4d62 in Worker::Worker(Channel::ChannelSocket*) ()
#10 0x000055b064c998e4 in mediasoup_worker_run ()
#11 0x000055b064c98a2c in main ()

Hell, the problem is in FeedbackRtpTransport::AddPacket(): an assertion is failing. I'll open a separate issue for this:

2024-02-20T07:08:16.181Z mediasoup:ERROR:Worker (stderr) (ABORT) RTC::RTCP::FeedbackRtpTransport::AddPacket() | failed assertion `baseSet': base not set
2024-02-20T07:08:16.569Z mediasoup:ERROR:Channel Producer Channel error: Error: read ECONNRESET
2024-02-20T07:08:16.574Z mediasoup-demo-server:WARN:Room _createDataConsumer() | failed:Error: request timeout at Timeout._onTimeout (/var/www/v3demo.mediasoup.org/node_modules/protoo-server/lib/Peer.js:156:14) at listOnTimeout (node:internal/timers:569:17) at process.processTimers (node:internal/timers:512:7)
2024-02-20T07:08:16.784Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:1695953, code:null, signal:SIGABRT]
2024-02-20T07:08:16.823Z mediasoup-demo-server:ERROR mediasoup Worker died, exiting  in 2 seconds... [pid:1695953]

@ggarber
Contributor

ggarber commented Feb 20, 2024

And this has been fixed. The problem was that I was deallocating a tuple before notifying the listener, which then called methods on the deallocated instance.

Consider not destroying/deallocating anything. Just notify the application about "disconnected" state and let the application decide what to do.

Similarly, on the browser side, the ICE "disconnected" state does not destroy anything and can recover if the user decides to keep the connection open after that event.

I think it would simplify the logic and avoid some of the issues you are debugging here. Feel free to ignore it.

@ibc
Member Author

ibc commented Feb 20, 2024

Consider not destroying/deallocating anything. Just notify the application about "disconnected" state and let the application decide what to do.

Similarly, on the browser side, the ICE "disconnected" state does not destroy anything and can recover if the user decides to keep the connection open after that event.

Perhaps I was not clear, but I didn't mean that. We are not deallocating the WebRtcTransport when ICE is disconnected, not even when the remote sends us a DTLS close alert. We just notify the app about those events.

The bug I mean was an internal C++ thing fixed in this commit, which has nothing to do with communication with Node or Rust: 7ee1753
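
The general fix pattern, transliterated to TypeScript for illustration (the real fix is C++; all names below are invented):

```ts
interface TransportTuple {
  id: number;
}

interface IceServerListener {
  // Must be invoked while the tuple is still valid.
  onIceServerTupleRemoved(tuple: TransportTuple): void;
}

class IceServerSketch {
  private tuples: TransportTuple[] = [];

  constructor(private readonly listener: IceServerListener) {}

  removeTuple(tuple: TransportTuple): void {
    // Fix: notify the listener BEFORE dropping the tuple, so it never
    // operates on an already removed/deallocated instance.
    this.listener.onIceServerTupleRemoved(tuple);
    this.tuples = this.tuples.filter((t) => t !== tuple);
  }
}
```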

this->oldUsernameFragment.clear();
this->oldPassword.clear();
}
if (IsConsentCheckSupported() && IsConsentCheckRunning())
Member


This would suffice:

Suggested change
if (IsConsentCheckSupported() && IsConsentCheckRunning())
if (IsConsentCheckRunning())

Member Author


Yes, I know. But I'd prefer to be super explicit because a lot of the logic in the IceServer file is very sensitive and should probably be refactored in the future.

Member Author


The rationale is that I want to prevent the consent check from being started when it's not supported. It cannot happen, because the StartConsentCheck() method asserts IsConsentCheckSupported(), but anyway.

Comment on lines +878 to +879
MS_ASSERT(IsConsentCheckSupported(), "ICE consent check not supported");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");
Member


This would suffice:

Suggested change
MS_ASSERT(IsConsentCheckSupported(), "ICE consent check not supported");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");

Member Author


same

Comment on lines +889 to +890
MS_ASSERT(IsConsentCheckSupported(), "ICE consent check not supported");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");
Member


This would suffice.

In general, if the check is running, we can assume it is supported. It's enough to check for support only when it starts running.

Suggested change
MS_ASSERT(IsConsentCheckSupported(), "ICE consent check not supported");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");
MS_ASSERT(IsConsentCheckRunning(), "ICE consent check not running");

Member Author


same

@ibc ibc merged commit 32e83e3 into v3 Feb 20, 2024
22 checks passed
@ibc ibc deleted the server-side-ice-consent-checks branch February 20, 2024 17:48
Successfully merging this pull request may close these issues.

Support ICE Consent Freshness (RFC7675)