Worker Crash SIGABRT #1176
Comments
I've also seen it with this error.
@GEverding, there is basically no description of the issue. Please elaborate on the scenario in which the problem happens, whether it always reproduces, etc.
My apologies, I was dealing with another issue. I'm seeing this in normal use of mediasoup in production. It's a fairly rare issue. My guess is it happens when a client disconnects halfway through setup, because I see this more on startup of a new instance. I know you really need core dumps for this and I'm working on that.
The error refers to the UnixSocket between Node and the C++ worker. It's not related to client connections.
Do you want a core dump to debug this, or do you need something else? I did see this on v20 but I'm seeing it more on v21-nightly (there's a Node SSL crash in v20–v18).
Core dump, please. If you figure out reliable steps to reproduce, that would be great, but I think that will be hard.
Still working on the core dump, but I saw this in the log and it might be the issue:
2023-10-16T19:42:22.374Z mediasoup:ERROR:Worker (stderr) (ABORT) RTC::IceServer::HandleTuple() | failed assertion `this->tuples.empty()': state is 'disconnected' but there are 1 tuples
Interesting. I will check tomorrow. If you can get the core dump it would be better, but maybe I can figure out the issue without it. Thanks.
Is this enough of a core dump? I have not been able to repro it in testing.
Yes, it's enough. Thanks. Will finally work on it tomorrow. BTW, do you have some special scenario, such as clients connecting to mediasoup through TCP instead of UDP? Are you using a separate WebRtcServer to make all WebRtcTransports listen on the same port(s), or not?
Yes, we have some clients using TCP because of network restrictions. We run UDP and TCP on different ports. We have 1 worker and it gets a WebRtcServer. Reading over my code, there could be a race condition where 2 WebRtcServers get created at startup. In that situation only 1 would be used for creating all the transports.
Do you literally mean that you are creating WebRtcServers in different Workers and assigning them the same listening ports? This is not possible.
We have 1 worker with 1 WebRtcServer with 4 ports: 40001+40002 for internal UDP+TCP, 40003+40004 for publicly announced IPs. Nothing is shared. I found this out yesterday: we were running a lot (95%) of traffic through TCP because of a config error. Since fixing the config error we see fewer errors.
So the thing is that…
So the bug is here, in `void IceServer::HandleTuple()`:

```cpp
void IceServer::HandleTuple(
  RTC::TransportTuple* tuple, bool hasUseCandidate, bool hasNomination, uint32_t nomination)
{
  MS_TRACE();

  switch (this->state)
  {
    case IceState::NEW:
    {
      // There should be no tuples.
      MS_ASSERT(
        this->tuples.empty(), "state is 'new' but there are %zu tuples", this->tuples.size());

      // There shouldn't be a selected tuple.
      MS_ASSERT(!this->selectedTuple, "state is 'new' but there is selected tuple");

      if (!hasUseCandidate && !hasNomination)
      {
        MS_DEBUG_TAG(
          ice,
          "transition from state 'new' to 'connected' [hasUseCandidate:%s, hasNomination:%s, nomination:%" PRIu32 "]",
          hasUseCandidate ? "true" : "false",
          hasNomination ? "true" : "false",
          nomination);

        // Store the tuple.
        auto* storedTuple = AddTuple(tuple);

        // Mark it as selected tuple.
        SetSelectedTuple(storedTuple);

        // Update state.
        this->state = IceState::CONNECTED;

        // Notify the listener.
        this->listener->OnIceServerConnected(this);
      }
      else
      {
        // Store the tuple.
        auto* storedTuple = AddTuple(tuple);

        if ((hasNomination && nomination > this->remoteNomination) || !hasNomination)
        {
          MS_DEBUG_TAG(
            ice,
            "transition from state 'new' to 'completed' [hasUseCandidate:%s, hasNomination:%s, nomination:%" PRIu32 "]",
            hasUseCandidate ? "true" : "false",
            hasNomination ? "true" : "false",
            nomination);

          // Mark it as selected tuple.
          SetSelectedTuple(storedTuple);

          // Update state.
          this->state = IceState::COMPLETED;

          // Update nomination.
          if (hasNomination && nomination > this->remoteNomination)
            this->remoteNomination = nomination;

          // Notify the listener.
          this->listener->OnIceServerCompleted(this);
        }
      }

      break;
    }
    // ...
```

Notice that there are cases in which we store the tuple but we do NOT change the state (nor set a selected tuple).
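To make that concrete, below is a minimal standalone model of just the 'new'-state branch above; the names (`IceServerModel`, `HandleTupleNew`) are hypothetical, not the real mediasoup classes. Fed a contrived binding request whose nomination does not beat `remoteNomination`, it ends up with a stored tuple, no selected tuple and the state still 'new', which is exactly the situation the `tuples.empty()` assertions in other branches of `HandleTuple()` do not expect (compare the logged "state is 'disconnected' but there are 1 tuples" abort).

```cpp
// Standalone sketch: a simplified model of the 'new'-state branch shown above.
// All names here are hypothetical stand-ins for illustration only.
#include <cstdint>
#include <cstdio>
#include <vector>

enum class IceState { NEW, CONNECTED, COMPLETED, DISCONNECTED };

struct IceServerModel
{
  IceState state{ IceState::NEW };
  std::vector<int> tuples;          // stand-in for the stored RTC::TransportTuple list
  bool hasSelectedTuple{ false };
  uint32_t remoteNomination{ 0u };

  // Mirrors only the IceState::NEW case of HandleTuple().
  void HandleTupleNew(bool hasUseCandidate, bool hasNomination, uint32_t nomination)
  {
    if (!hasUseCandidate && !hasNomination)
    {
      tuples.push_back(1);
      hasSelectedTuple = true;
      state = IceState::CONNECTED;
    }
    else
    {
      // The tuple is stored unconditionally...
      tuples.push_back(1);

      // ...but state and selected tuple only change if the nomination "wins".
      if ((hasNomination && nomination > remoteNomination) || !hasNomination)
      {
        hasSelectedTuple = true;
        state = IceState::COMPLETED;

        if (hasNomination && nomination > remoteNomination)
          remoteNomination = nomination;
      }
    }
  }
};

int main()
{
  IceServerModel ice;

  // Contrived input: nomination 0 does not beat remoteNomination 0, so the
  // inner condition is false and no state transition happens.
  ice.HandleTupleNew(/*hasUseCandidate*/ true, /*hasNomination*/ true, /*nomination*/ 0u);

  // Invariant broken: one tuple stored, nothing selected, state still 'new'.
  std::printf("tuples: %zu, selected: %s, still 'new': %s\n",
              ice.tuples.size(),
              ice.hasSelectedTuple ? "yes" : "no",
              ice.state == IceState::NEW ? "yes" : "no");

  // Any later code path that asserts `tuples.empty()` for states other than
  // 'connected'/'completed' would now abort, matching the logged crash.
  return 0;
}
```

The model deliberately ignores everything else (the other switch cases, tuple removal, timers); it only illustrates why "a tuple exists" no longer implies "connected or completed" once nominations are in play.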
So basically after this PR #756 we made it possible for clients to use ICE nomination. Hi @penguinol, we have problems here.
Fixes #1176

Details
- The problem was in PR #756, which added support for ICE nomination.
- Before that PR, we always assumed that if there is a tuple then we are in 'connected' or 'completed' state, so there is a selected ICE tuple.
- That's no longer true when the client uses ICE nomination.
PR done here: #1182
@GEverding do you know which devices connect to your server? They obviously implement ICE renomination, which is the reason for the crash (see the PR), and AFAIK Chrome and libwebrtc don't support it. Well, they do, but I'm not aware that it's enabled by default.
We have Chrome (Electron), iOS, and Android all connecting to our servers.
Well, we only use UDP, so we have never hit this crash. I'll check the PR.
@GEverding it should be fixed in #1182, but I'm extremely busy working on N things at the same time. Is it possible for you to try the branch in that PR in your setup with TCP enabled and see if it works as expected? I've done all the tests I could, but I've not been able to reproduce the bug using the v3 branch.
Yeah, I can build it and put it out in production. I'll likely know by tomorrow if I don't see any crashes.
Thanks a lot. I'm 95% sure it will work fine.
@GEverding I'm merging PR #1182 since there was an obvious bug in the code and that PR fixes the vulnerability. 3.12.16 released. But still, I'd appreciate it if you could validate whether it fixes your scenario. Thanks a lot.
Hey, it's been out for 48 hours now and it's looking good 👍
Great!
Bug Report
IMPORTANT: We primarily use GitHub as an issue tracker. Just open an issue here if you have encountered a bug in mediasoup.
If you have questions or doubts about mediasoup or need support, please use the mediasoup Discourse Group instead:
https://mediasoup.discourse.group
If you got a crash in mediasoup, please try to provide a core dump in the issue report:
https://mediasoup.org/support/#crashes-in-mediasoup-get-a-core-dump
I don't have core dumps yet.
Your environment
Issue description
2023-10-15T20:33:22.685Z mediasoup:ERROR:Channel Producer Channel error: Error: write EPIPE
2023-10-15T20:33:22.731Z mediasoup:ERROR:PayloadChannel Producer PayloadChannel error: Error: write EPIPE
2023-10-15T20:33:22.751Z mediasoup:ERROR:Worker worker process died unexpectedly [pid:19, code:null, signal:SIGABRT]