The following comments from connector_framed_source_notify.pony show that the stream registry service can leak stream ID registrations. A stream ID is leaked when Wallaroo mistakenly believes that the ID is in active use by a Wallaroo worker when in fact no worker is using it. Once an ID is leaked, it becomes impossible to resume sending messages to Wallaroo with that ID.
// This is a reply from a query that we'd sent in a prior TCP
// connection, or else the TCP connection is closed now. If the
// connection has been closed, any state about this query would
// have already been purged from any local state ... which makes
// it difficult to recover from the situation we're in here.
// After all, that stream ID may already be registered & in active
// use on some other worker right now.
//
// TODO: The one hammer that we have in our toolbox is a complete
// rollback to the prior state: we can force the next checkpoint
// to rollback. That would cause the entire cluster to rollback,
// and each worker would tell all active ConnectorSource sessions
// to RESTART and close. Then the entire stream registry starts
// from a clean slate. However, Wallaroo sources cannot abort
// a checkpoint, so we cannot use this method. Either, we need
// to allow sources to abort a checkpoint, or else we need
// another way to address the problem of leaked stream ids.
Here's a variation of a stream ID leak that can be addressed by rollback, namely, a rollback triggered by a worker crash:
// If the global stream registry sends a success=true reply but
// this worker were to crash immediately afterward and drop that
// reply, then we might have a "leak" of the stream id,
// permanently stuck in active state. Also, we don't have
// Erlang's process link and monitor mechanisms to help repair
// such "leaked" stream id registrations. Fortunately, because
// this worker crashed, when this worker restarts, it will cause a
// global rollback and thus, as noted above, restart the stream
// registry from a clean slate.