
Worker: Fix possible double free when ICE consent check fails #1393

Merged

Conversation

@ibc (Member) commented May 8, 2024

Fixes #1392

@ibc ibc requested a review from jmillan May 8, 2024 15:45
@ibc (Member Author) commented May 8, 2024

@miroslavpejic85, I've not been able to reproduce the crash even by forcing things that IMHO should trigger the crash. Anyway, I think that this PR is gonna fix it.

Is it possible that you try this branch and confirm whether it solves the crash or not?

@ibc (Member Author) commented May 8, 2024

This flow is super complex but it's what it is:

When calling webRtcServer.close() having active TCP connections:

  1. ~WebRtcServer iterates all its WebRtcTransports and calls `webRtcTransport->ListenServerClosed()`.

  2. Transport::ListenServerClosed() calls this->listener->OnTransportListenServerClosed().

  3. So Router deletes transport.

  4. ~WebRtcTransport deletes IceServer.

  5. ~IceServer iterates tuples and calls this->listener->OnIceServerTupleRemoved(tuple).

  6. So WebRtcTransport calls tuple->CloseTcpConnection().

  7. TransportTuple::CloseTcpConnection() calls tcpConnection->TriggerClose().

  8. So TcpConnectionHandle calls this->listener->OnTcpConnectionClosed().

  9. So TcpServerHandle calls UserOnTcpConnectionClosed() which calls this->listener->OnRtcTcpConnectionClosed() [A], and then deletes the TcpConnectionHandle.

The problem is [A], since it ends up calling iceServer->RemoveTuple(), which calls listener->OnIceServerTupleRemoved() again with the same tuple.

@miroslavpejic85

Is it possible that you try this branch and confirm whether it solves the crash or not?

Sure. I need to rebuild the worker on this branch and give it a try, right?

cd worker; MEDIASOUP_BUILDTYPE='Debug' make

@ibc (Member Author) commented May 8, 2024

Yes. Or just add a git URL pointing to this branch as the mediasoup dependency in the package.json of your server app and run "npm install mediasoup".

@miroslavpejic85

When calling webRtcServer.close() having active TCP connections:

Note: The crash occurs even though we are not using WebRtcServer; we rely on the default WebRtcTransportOption for the transports.

@ibc (Member Author) commented May 8, 2024

The fix in this PR is not related to WebRtcServer, so it may be valid (assuming it doesn't crash anymore for you once you try the branch).

@miroslavpejic85

Great, then I'll try the fix and keep you updated.

@snnz (Contributor) commented May 8, 2024

5. ~IceServer iterates tuples and calls this->listener->OnIceServerTupleRemoved(tuple).

I think this can happen wherever OnIceServerTupleRemoved is called with a tuple containing a TCP connection that is not closed yet.
In IceServer::AddTuple, when the tuples size exceeds MaxTuples:

this->listener->OnIceServerTupleRemoved(this, removedTuple);
this->tuples.erase(std::next(it).base());

What will the iterator point to? This will either crash or destroy some extra tuple.

In IceServer::RemoveTuple it will stop short of recursively calling itself, because at that moment the connection is already closed, but nevertheless it'll dive into webRtcTransportListener->OnWebRtcTransportTransportTupleRemoved and tuple->CloseTcpConnection, which is rather unnecessary.
It's better to block re-entry into IceServer::RemoveTuple from any method that calls OnIceServerTupleRemoved, including IceServer::RemoveTuple itself.

@ibc (Member Author) commented May 8, 2024

  5. ~IceServer iterates tuples and calls this->listener->OnIceServerTupleRemoved(tuple).

I think this can happen wherever OnIceServerTupleRemoved is called with a tuple containing a TCP connection that is not closed yet. In IceServer::AddTuple, when the tuples size exceeds MaxTuples:

this->listener->OnIceServerTupleRemoved(this, removedTuple);
this->tuples.erase(std::next(it).base());

What will the iterator point to? This will either crash or destroy some extra tuple.

It's indeed crashing, but I'm not sure why. This was done and tested long ago, and I'm not sure what has changed. I've just set static constexpr size_t MaxTuples{ 1 }; in IceServer.cpp, connected a client with TCP and triggered an ICE restart (mediasoup demo):

  mediasoup:Channel request() [method:TRANSPORT_RESTART_ICE] +7s
  mediasoup:Channel request succeeded [method:TRANSPORT_RESTART_ICE, id:57] +0ms
  mediasoup:WARN:Channel [pid:35928] RTC::IceServer::AddTuple() | too too many tuples, removing the oldest non selected one +0ms
  mediasoup:ERROR:Worker (stderr) (ABORT) RTC::IceServer::AddTuple() | failed assertion `removedTuple': couldn't find any tuple to be removed +0ms

However I think it crashes because the code assumes that MaxTuples is > 1.

What will the iterator point to? This will either crash or destroy some extra tuple.

Why do you think it will crash? The comment above it clearly indicates what it does:

			// Notify the listener.
			this->listener->OnIceServerTupleRemoved(this, removedTuple);

			// Remove it from the list of tuples.
			// NOTE: Do it after notifying the listener since the listener may need to
			// use/read the tuple being removed so we cannot free it yet.
			// NOTE: This trick is needed since it is a reverse_iterator and
			// erase() requires a iterator, const_iterator or bidirectional_iterator.
			this->tuples.erase(std::next(it).base());

In IceServer::RemoveTuple it will stop short of recursively calling itself, because at that moment the connection is already closed, but nevertheless it'll dive into webRtcTransportListener->OnWebRtcTransportTransportTupleRemoved and tuple->CloseTcpConnection, which is rather unnecessary. It's better to block re-entry into IceServer::RemoveTuple from any method that calls OnIceServerTupleRemoved, including IceServer::RemoveTuple itself.

I also expected this to happen, but for whatever reason it doesn't (or I couldn't reproduce it). But I agree that it makes sense to block re-entry into RemoveTuple() while invoking the listener->OnIceServerTupleRemoved() callback. I'll do it soon.

@ibc (Member Author) commented May 8, 2024

@snnz I've done it in the last commit, can you check?

@snnz (Contributor) commented May 8, 2024

Why do you think it will crash?

Because this->listener->OnIceServerTupleRemoved(this, removedTuple) will, through all that chain, call this->RemoveTuple(removedTuple), which will find and erase removedTuple.

@snnz I've done it in last commit, can you check?

Everything is ok now, I believe this will fix the problem.

@ibc (Member Author) commented May 8, 2024

In IceServer::RemoveTuple it will stop short of recursively calling itself, because at that moment the connection is already closed

While this PR fixes this, I still don't understand why (before this PR) we didn't get an infinite loop. You say that it will stop recursively calling itself because the connection is closed, but we do not check if the connection is closed anywhere (if closed, it should be deleted or in the process of being deleted). Well, now that I think about it, I just removed that check in a recent related PR. Anyway, I think everything is ok now.

@snnz (Contributor) commented May 8, 2024

but we do not check if the connection is closed anywhere

But there is a check, in TcpConnectionHandle::TriggerClose:

void TcpConnectionHandle::TriggerClose()
{
	MS_TRACE();

	if (this->closed)
	{
		return;
	}

	InternalClose();

	this->listener->OnTcpConnectionClosed(this);
}

InternalClose() sets the closed flag, so the next TriggerClose() call won't proceed and call OnTcpConnectionClosed() again.

@ibc (Member Author) commented May 8, 2024

True.

@ibc (Member Author) commented May 9, 2024

@miroslavpejic85 any ETA for testing this branch? After the discussion above it's clear that these changes make sense and I'd like to release ASAP, but having your OK would be great.

@miroslavpejic85

@ibc Everything seems to be running smoothly now, without any worker crashes since yesterday. You can go ahead and release it. Thank you!

@ibc (Member Author) commented May 9, 2024

@ibc Everything seems to be running smoothly now, without any worker crashes since yesterday. You can go ahead and release it. Thank you!

Thanks a lot!

@ibc ibc merged commit fab4891 into v3 May 9, 2024
35 checks passed
@ibc ibc deleted the fix-ice-server-possible-double-free-when-ice-consent-check-fails branch May 9, 2024 15:20
@miroslavpejic85

Thanks a lot!

You're welcome! And thank you for addressing the issue promptly ;)

Successfully merging this pull request may close these issues:

Mediasoup worker died, exiting in 2 seconds...