Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

internal/types/acceptor: added an option configuration to fold a multi-batch by key #1115

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

ryanluu12345
Copy link
Contributor

@ryanluu12345 ryanluu12345 commented Jan 17, 2025

Previously, there was a bug resulting from performing a combination of inserts and deletes could lead to data inconsistency because of how deletes and inserts are handled in the AcceptMultiBatch to handle the possibility of FK updates/deletes. As a result, we do a FoldByKey for each temporal batch in the multi batch before we flush to the target, so that only the last mutation for a single table and single PK remains. This makes it so that it is safe to perform the deletes first and then the updates after.

This is also extensible in the future since we can control the behavior for this folding via a configuration flag, instead of having to plumb in a new acceptor each time; this allows us to turn on and off this ability to fold.

Resolves: CC-30958
Release Note: None

TODO

  • Add new tests for the previously wrong behavior
  • Remove hack for FoldByKey (but right now may require refactor)

This change is Reviewable

…i-batch by key

Previously, there was a bug resulting from performing a combination of inserts
and deletes could lead to data inconsistency because of how deletes and inserts
are handled in the AcceptMultiBatch to handle the possibility of FK
updates/deletes. As a result, we do a FoldByKey for each temporal batch in the
multi batch before we flush to the target, so that only the last mutation for a
single table and single PK remains. This makes it so that it is safe to perform
the deletes first and then the updates after.

This is also extensible in the future since we can control the behavior for
this folding via a configuration flag, instead of having to plumb in a new
acceptor each time; this allows us to turn on and off this ability to fold.

Resolves: CC-30958
Release Note: None
// Need to refactor this into a new place so that this and msort both
// import from `types`.
// This is a workaround.
func FoldByKey(x []Mutation) ([]Mutation, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sravotto @ZhouXing19 any thoughts on how to best resolve this issue? I was thinking of moving the acceptors into their own package since keeping it in types seems too coupled and also leads to import cycles like this.

Any alternatives?

"github.com/pkg/errors"
)

// FoldByKey now lives in this package since it's crucial to the
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before considering this approach I did a local prototype of moving acceptor to its own package so we avoid the import cycle. However, this ends up leading to a mass refactor across the codebase that would impact in-flight PRs. After doing more investigation, I also realized that the end result is still an import cycle because types is required by acceptor and vice versa.

Ended up just moving all references to point to types.FoldByKey and this is a much simpler and safer change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented this, but the open question here is should we keep this off or on by default. If we keep this off, then we have the same behavior as before, but is exposed to this correctness issue. If we keep this on, then we may have a perf impact because we do the collapsing every time we accept a batch, which in PG and MySQL right now would do it every time commit happens.

@@ -157,6 +157,7 @@ func (c *Conn) accumulateBatch(

if err := c.acceptor.AcceptTemporalBatch(ctx, batch, &types.AcceptOptions{
TargetQuerier: tx,
FoldKeys: true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A TODO that we must consider when we pick this back up next week is to ensure that this FoldKeys behavior is gated by a flag (I recommend --collapseUpdates or --useBatchMode) so that we can turn on or off the behavior. Here are the tradeoffs:

  • If FoldKeys is set to on: then we do extra O(N) processing to do the collapsing and reassign the proper batches
  • If FoldKeys is set to off: then we may run the risk of a correctness issue for the same PK being updated multiple times (since we have the bug in the FK logic)

What we have learned from previous customers is that it's helpful to have this escape hatch so they can turn off undesirable behavior as needed.

@ryanluu12345 ryanluu12345 force-pushed the user/rluu/fold-acceptor branch 2 times, most recently from 2c8a8b9 to f9ba3a4 Compare January 19, 2025 05:16
@ZhouXing19
Copy link
Contributor

The whole logic LGTM, just thinking that tests are needed.
I'm thinking why we should leave the foldbykey as an option rather than always executing it, as I thought we should always prioritize correctness over performance.

@ryanluu12345
Copy link
Contributor Author

The whole logic LGTM, just thinking that tests are needed. I'm thinking why we should leave the foldbykey as an option rather than always executing it, as I thought we should always prioritize correctness over performance.

I think mainly because this fixes a correctness issue in the case that you could have an update/insert/update in the same batch. However, not all customer's workloads are like that. In certain customer cases, this may regress their performance at scale because the extra O(N) operations. And also because we do expensive JSON decoding to do this folding. So, wanted to give an escape hatch here if they understand their workload.

How open are you to just making this flag true by default so it's correct behavior. But if a customer knows for a fact their workload won't hit this, they can turn it off with the flag?

@ZhouXing19
Copy link
Contributor

I lean toward true as default but might want more PM involvement in the discussion too

Copy link
Contributor

@sravotto sravotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider moving msort.go into the types package. I think UniqueByTimeKey is only referenced once.
Other than that, looks good to me.

Reviewable status: 0 of 14 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, @ryanluu12345, and @ZhouXing19)


internal/types/utils.go line 28 at r4 (raw file):

)

// FoldByKey now lives in this package since it's crucial to the

Would it make sense just to move the whole msort.go (and test) under types?

@ryanluu12345 ryanluu12345 force-pushed the user/rluu/fold-acceptor branch from f9ba3a4 to 85d3768 Compare January 21, 2025 18:05
Copy link
Contributor Author

@ryanluu12345 ryanluu12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also in agreement that true should be the default. Will loop in PM review here.

@sravotto , good callout, I'll go ahead and move msort and relevant tests to types and update all references.

Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)

Copy link
Contributor Author

@ryanluu12345 ryanluu12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more note here: I'll work on adding the INSERT -> DELETE use case some time this week and will bottom out on the default behavior.

Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)

@ryanluu12345 ryanluu12345 requested a review from sravotto January 21, 2025 18:12
@ryanluu12345
Copy link
Contributor Author

ryanluu12345 commented Jan 21, 2025

One more note here: I'll work on adding the INSERT -> DELETE use case some time this week and will bottom out on the default behavior.

Reviewable status: 0 of 17 files reviewed, 4 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)

@ZhouXing19 @sravotto so I'm looking into our testing right now and see that we have internal/source/pglogical/integration_test.go which has a TestPGLogical which has a helper that does the setup for schema, create table, and the data movement.

In MySQL, there is internal/source/mylogical/integration_test.go and in CRDB webhook, there is internal/source/cdc/server/integration_test.go TestIntegration.

I plan to add a new set of configurations that helps to reproduce the INSERT -> DELETE -> INSERT logic here and then add a new test case. Thoughts?

Copy link
Contributor

@sravotto sravotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me.

Reviewed 1 of 5 files at r1, 4 of 10 files at r6.
Reviewable status: 4 of 18 files reviewed, 3 unresolved discussions (waiting on @Jeremyyang920, @noelcrl, and @ZhouXing19)

@ryanluu12345 ryanluu12345 requested a review from sravotto January 21, 2025 21:59
@ryanluu12345
Copy link
Contributor Author

@sravotto @ZhouXing19 when you get a chance, wondering if you can give a review on the latest iteration? Just added in the testing and have a path forward for the flag or no flag.

@rohan-joshi and I chatted and he requested that I benchmark to understand better how much timing difference doing this collapseMutation takes vs. not doing it. If the impact is trivial, then the vote goes towards not adding the flag. I'll do this sometime early next week.

@ryanluu12345 ryanluu12345 force-pushed the user/rluu/fold-acceptor branch from 3e6743d to 5f756e2 Compare January 21, 2025 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants