Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow explicit control of AUTO_ID_CACHE in DM incremental sync of new tables #12039

Open
michaelmdeng opened this issue Jan 29, 2025 · 6 comments · May be fixed by #12040
Open

Allow explicit control of AUTO_ID_CACHE in DM incremental sync of new tables #12039

michaelmdeng opened this issue Jan 29, 2025 · 6 comments · May be fixed by #12040
Labels
type/feature Issues about a new feature

Comments

@michaelmdeng
Copy link
Contributor

Is your feature request related to a problem?

We cannot update the AUTO_ID_CACHE setting of a table after its creation.

When we replicate new tables via incremental DM, the CREATE TABLE statements are applied with default auto-id-cache size, which is problematic if we want to control it directly, or if we want to use MySQL compatibility mode w/ AUTO_ID_CACHE=1.

Describe the feature you'd like

We'd like to have control to explicitly set it in create table DDLs that are incrementally replicated in DM. When DM syncs a new table w/ an AUTO_INCREMENT column, the syncer can modify the DDL to set an explicit auto-id-cache size based on the syncer configuration.

We propose a new auto-id-cache-size Syncer config. When set, DM will update the create statement w/ the corresponding auto-id-cache size. When unset (0 or default), the statement will be left as is.

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

@michaelmdeng michaelmdeng added the type/feature Issues about a new feature label Jan 29, 2025
@lance6716
Copy link
Contributor

ptal @alastori @OliverS929

Can you explain more details about the problem without adjust the AUTO_ID_CACHE? For example, when other SQL workloads need to INSERT some rows, it will be conflict with DM's replication?

@michaelmdeng
Copy link
Contributor Author

michaelmdeng commented Feb 3, 2025

ptal @alastori @OliverS929

Can you explain more details about the problem without adjust the AUTO_ID_CACHE? For example, when other SQL workloads need to INSERT some rows, it will be conflict with DM's replication?

Sure, we are using DM to incrementally migrate data from an upstream Aurora MySQL instance and the application client relies on monotonically increasing allocated IDs which we can achieve by specifying auto ID cache size of 1 for MySQl compatibility. So this establishes a requirement for us that all upstream auto increment tables must be created with cache size 1.

We normally backfill existing data through a bulk process (ex. Lightning) and explicitly control table creation so we can create existing tables with the correct cache size. However, once we finish bulk backfill and enter into incremental replication with DM, we may replicate newly created tables. We won't have other applications inserting rows during this period, but we do want readers of replicated data to read MySQL-like increasing IDs.

During incremental phase, DM creates new tables with the default cache size. This is prohibitive also because we cannot adjust the auto ID cache size after table creation. Instead, we want DM to create the new tables with cache size of 1 pfor MySQL compatibility (or more generically, with user configurable cache size).

@alastori
Copy link

alastori commented Feb 3, 2025

@michaelmdeng thanks for the context. I understand the main issue is related to the potential huge gaps in auto_increment fields when AUTO_ID_CACHE = 0.

@lance6716, @OliverS929 : I agree this is indeed an important problem that affects data consistency and application behavior during migrations using DM incremental replication. Let me add some additional context from my perspective:

  • The AUTO_ID_CACHE = 1 setting is critical for minimizing gaps in ID sequences maintaining similar MySQL behavior when required
  • The current limitation creates challenges in scenarios when tables are created in the source during ongoing replication and big gaps aren't expected when using auto-increment sequences
  • The proposed solution to allow explicit control via syncer config is valuable because it keeps behavior close to MySQL

@michaelmdeng note that small gaps in auto_increment fields are still possible if TiDB crashes or failover and in those edge cases it is not guaranteed that the target will have the same IDs as the source.

@michaelmdeng
Copy link
Contributor Author

note that small gaps in auto_increment fields are still possible if TiDB crashes or failover and in those edge cases it is not guaranteed that the target will have the same IDs as the source.

yep, we understand this possibility and are prepared for it

The proposed solution to allow explicit control via syncer config is valuable because it keeps behavior close to MySQL

Given the requirements/need here, is the proposed approach in the PR appropriate? any other design considerations for how incremental DM interacts w/ other modes?

@OliverS929
Copy link
Contributor

OliverS929 commented Feb 11, 2025

Sorry for the late reply. I agree with @alastori that we could indeed add an option to set parameters like AUTO_ID_CACHE when creating tables in DDLs, and it’s important to ensure MySQL compatibility in this case. In addition to the points raised above and the existing PR from @michaelmdeng, I have a few suggestions on how we could approach this issue:

Beyond the AUTO_ID_CACHE option in this issue, there are several other TableOption settings, such as AUTO_RANDOM_BASE or CHECKSUM (see docs), and new options may arise in the future that we’ll need to support in DDLs executed by DM. Instead of maintaining a growing list of options in our configuration files, we could introduce something like a TableOptionString that allows users to include all the table options they want in the CREATE TABLE DDL statement. This would address the current issue while providing flexibility for future needs and avoiding further configuration changes down the line. For implementation, rather than appending a single option and its value, we could directly append the entire user-defined string. I believe this approach would be more robust and adaptable.

Besides, I don’t think this adaptation should be limited to incremental mode. As @michaelmdeng mentioned in his reply:

We normally backfill existing data through a bulk process (e.g., Lightning) and explicitly control table creation so we can create existing tables with the correct cache size.

DM's full mode relies on a similar process, using Lightning to execute CREATE TABLE statements during the process. Users may need customized table options in scenarios beyond just incremental mode. While I understand the current PR is focused on the syncer’s code path, it’s worth considering whether this issue might also apply to other modes.

With those being said, I think this issue is quite valuable, and we should definitely prioritize finding a solution.

@OliverS929
Copy link
Contributor

Also cc @River2000i

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/feature Issues about a new feature
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants