-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
internal/types/acceptor: added an option configuration to fold a multi-batch by key #1115
base: master
Are you sure you want to change the base?
Conversation
…i-batch by key Previously, there was a bug resulting from performing a combination of inserts and deletes could lead to data inconsistency because of how deletes and inserts are handled in the AcceptMultiBatch to handle the possibility of FK updates/deletes. As a result, we do a FoldByKey for each temporal batch in the multi batch before we flush to the target, so that only the last mutation for a single table and single PK remains. This makes it so that it is safe to perform the deletes first and then the updates after. This is also extensible in the future since we can control the behavior for this folding via a configuration flag, instead of having to plumb in a new acceptor each time; this allows us to turn on and off this ability to fold. Resolves: CC-30958 Release Note: None
internal/types/acceptors.go
Outdated
// Need to refactor this into a new place so that this and msort both | ||
// import from `types`. | ||
// This is a workaround. | ||
func FoldByKey(x []Mutation) ([]Mutation, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sravotto @ZhouXing19 any thoughts on how to best resolve this issue? I was thinking of moving the acceptors into their own package since keeping it in types seems too coupled and also leads to import cycles like this.
Any alternatives?
…ly and acceptor code
internal/types/utils.go
Outdated
"github.com/pkg/errors" | ||
) | ||
|
||
// FoldByKey now lives in this package since it's crucial to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before considering this approach I did a local prototype of moving acceptor
to its own package so we avoid the import cycle. However, this ends up leading to a mass refactor across the codebase that would impact in-flight PRs. After doing more investigation, I also realized that the end result is still an import cycle because types
is required by acceptor
and vice versa.
Ended up just moving all references to point to types.FoldByKey
and this is a much simpler and safer change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented this, but the open question here is should we keep this off or on by default. If we keep this off, then we have the same behavior as before, but is exposed to this correctness issue. If we keep this on, then we may have a perf impact because we do the collapsing every time we accept a batch, which in PG and MySQL right now would do it every time commit happens.
internal/source/pglogical/conn.go
Outdated
@@ -157,6 +157,7 @@ func (c *Conn) accumulateBatch( | |||
|
|||
if err := c.acceptor.AcceptTemporalBatch(ctx, batch, &types.AcceptOptions{ | |||
TargetQuerier: tx, | |||
FoldKeys: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A TODO that we must consider when we pick this back up next week is to ensure that this FoldKeys
behavior is gated by a flag (I recommend --collapseUpdates
or --useBatchMode
) so that we can turn on or off the behavior. Here are the tradeoffs:
- If FoldKeys is set to on: then we do extra O(N) processing to do the collapsing and reassign the proper batches
- If FoldKeys is set to off: then we may run the risk of a correctness issue for the same PK being updated multiple times (since we have the bug in the FK logic)
What we have learned from previous customers is that it's helpful to have this escape hatch so they can turn off undesirable behavior as needed.
2c8a8b9
to
f9ba3a4
Compare
The whole logic LGTM, just thinking that tests are needed. |
I think mainly because this fixes a correctness issue in the case that you could have an update/insert/update in the same batch. However, not all customer's workloads are like that. In certain customer cases, this may regress their performance at scale because the extra O(N) operations. And also because we do expensive JSON decoding to do this folding. So, wanted to give an escape hatch here if they understand their workload. How open are you to just making this flag true by default so it's correct behavior. But if a customer knows for a fact their workload won't hit this, they can turn it off with the flag? |
I lean toward true as default but might want more PM involvement in the discussion too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider moving msort.go
into the types
package. I think UniqueByTimeKey
is only referenced once.
Other than that, looks good to me.
Reviewable status: 0 of 14 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, @ryanluu12345, and @ZhouXing19)
internal/types/utils.go
line 28 at r4 (raw file):
) // FoldByKey now lives in this package since it's crucial to the
Would it make sense just to move the whole msort.go (and test) under types?
f9ba3a4
to
85d3768
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also in agreement that true
should be the default. Will loop in PM review here.
@sravotto , good callout, I'll go ahead and move msort and relevant tests to types
and update all references.
Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more note here: I'll work on adding the INSERT -> DELETE
use case some time this week and will bottom out on the default behavior.
Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)
@ZhouXing19 @sravotto so I'm looking into our testing right now and see that we have In MySQL, there is I plan to add a new set of configurations that helps to reproduce the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
Reviewed 1 of 5 files at r1, 4 of 10 files at r6.
Reviewable status: 4 of 18 files reviewed, 3 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)
@sravotto @ZhouXing19 when you get a chance, wondering if you can give a review on the latest iteration? Just added in the testing and have a path forward for the flag or no flag. @rohan-joshi and I chatted and he requested that I benchmark to understand better how much timing difference doing this collapseMutation takes vs. not doing it. If the impact is trivial, then the vote goes towards not adding the flag. I'll do this sometime early next week. |
3e6743d
to
5f756e2
Compare
Previously, there was a bug resulting from performing a combination of inserts and deletes could lead to data inconsistency because of how deletes and inserts are handled in the AcceptMultiBatch to handle the possibility of FK updates/deletes. As a result, we do a FoldByKey for each temporal batch in the multi batch before we flush to the target, so that only the last mutation for a single table and single PK remains. This makes it so that it is safe to perform the deletes first and then the updates after.
This is also extensible in the future since we can control the behavior for this folding via a configuration flag, instead of having to plumb in a new acceptor each time; this allows us to turn on and off this ability to fold.
Resolves: CC-30958
Release Note: None
TODO
This change is