Allow explicit control of `AUTO_ID_CACHE` in DM incremental sync of new tables #12039

michaelmdeng · 2025-01-29T16:40:57Z

Is your feature request related to a problem?

We cannot update the AUTO_ID_CACHE setting of a table after its creation.

When we replicate new tables via incremental DM, the CREATE TABLE statements are applied with default auto-id-cache size, which is problematic if we want to control it directly, or if we want to use MySQL compatibility mode w/ AUTO_ID_CACHE=1.

Describe the feature you'd like

We'd like to have control to explicitly set it in create table DDLs that are incrementally replicated in DM. When DM syncs a new table w/ an AUTO_INCREMENT column, the syncer can modify the DDL to set an explicit auto-id-cache size based on the syncer configuration.

We propose a new auto-id-cache-size Syncer config. When set, DM will update the create statement w/ the corresponding auto-id-cache size. When unset (0 or default), the statement will be left as is.

Describe alternatives you've considered

No response

Teachability, Documentation, Adoption, Migration Strategy

No response

The text was updated successfully, but these errors were encountered:

lance6716 · 2025-02-02T13:47:22Z

ptal @alastori @OliverS929

Can you explain more details about the problem without adjust the AUTO_ID_CACHE? For example, when other SQL workloads need to INSERT some rows, it will be conflict with DM's replication?

michaelmdeng · 2025-02-03T00:04:16Z

ptal @alastori @OliverS929

Can you explain more details about the problem without adjust the AUTO_ID_CACHE? For example, when other SQL workloads need to INSERT some rows, it will be conflict with DM's replication?

Sure, we are using DM to incrementally migrate data from an upstream Aurora MySQL instance and the application client relies on monotonically increasing allocated IDs which we can achieve by specifying auto ID cache size of 1 for MySQl compatibility. So this establishes a requirement for us that all upstream auto increment tables must be created with cache size 1.

We normally backfill existing data through a bulk process (ex. Lightning) and explicitly control table creation so we can create existing tables with the correct cache size. However, once we finish bulk backfill and enter into incremental replication with DM, we may replicate newly created tables. We won't have other applications inserting rows during this period, but we do want readers of replicated data to read MySQL-like increasing IDs.

During incremental phase, DM creates new tables with the default cache size. This is prohibitive also because we cannot adjust the auto ID cache size after table creation. Instead, we want DM to create the new tables with cache size of 1 pfor MySQL compatibility (or more generically, with user configurable cache size).

alastori · 2025-02-03T23:25:30Z

@michaelmdeng thanks for the context. I understand the main issue is related to the potential huge gaps in auto_increment fields when AUTO_ID_CACHE = 0.

@lance6716, @OliverS929 : I agree this is indeed an important problem that affects data consistency and application behavior during migrations using DM incremental replication. Let me add some additional context from my perspective:

The AUTO_ID_CACHE = 1 setting is critical for minimizing gaps in ID sequences maintaining similar MySQL behavior when required
The current limitation creates challenges in scenarios when tables are created in the source during ongoing replication and big gaps aren't expected when using auto-increment sequences
The proposed solution to allow explicit control via syncer config is valuable because it keeps behavior close to MySQL

@michaelmdeng note that small gaps in auto_increment fields are still possible if TiDB crashes or failover and in those edge cases it is not guaranteed that the target will have the same IDs as the source.

michaelmdeng · 2025-02-04T14:27:32Z

note that small gaps in auto_increment fields are still possible if TiDB crashes or failover and in those edge cases it is not guaranteed that the target will have the same IDs as the source.

yep, we understand this possibility and are prepared for it

The proposed solution to allow explicit control via syncer config is valuable because it keeps behavior close to MySQL

Given the requirements/need here, is the proposed approach in the PR appropriate? any other design considerations for how incremental DM interacts w/ other modes?

OliverS929 · 2025-02-11T10:28:14Z

Sorry for the late reply. I agree with @alastori that we could indeed add an option to set parameters like AUTO_ID_CACHE when creating tables in DDLs, and it’s important to ensure MySQL compatibility in this case. In addition to the points raised above and the existing PR from @michaelmdeng, I have a few suggestions on how we could approach this issue:

Beyond the AUTO_ID_CACHE option in this issue, there are several other TableOption settings, such as AUTO_RANDOM_BASE or CHECKSUM (see docs), and new options may arise in the future that we’ll need to support in DDLs executed by DM. Instead of maintaining a growing list of options in our configuration files, we could introduce something like a TableOptionString that allows users to include all the table options they want in the CREATE TABLE DDL statement. This would address the current issue while providing flexibility for future needs and avoiding further configuration changes down the line. For implementation, rather than appending a single option and its value, we could directly append the entire user-defined string. I believe this approach would be more robust and adaptable.

Besides, I don’t think this adaptation should be limited to incremental mode. As @michaelmdeng mentioned in his reply:

We normally backfill existing data through a bulk process (e.g., Lightning) and explicitly control table creation so we can create existing tables with the correct cache size.

DM's full mode relies on a similar process, using Lightning to execute CREATE TABLE statements during the process. Users may need customized table options in scenarios beyond just incremental mode. While I understand the current PR is focused on the syncer’s code path, it’s worth considering whether this issue might also apply to other modes.

With those being said, I think this issue is quite valuable, and we should definitely prioritize finding a solution.

OliverS929 · 2025-02-11T10:32:47Z

Also cc @River2000i

michaelmdeng added the type/feature Issues about a new feature label Jan 29, 2025

michaelmdeng linked a pull request Jan 29, 2025 that will close this issue

syncer(dm): option for explicit auto-id-cache handling in create table DDLs #12040

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow explicit control of `AUTO_ID_CACHE` in DM incremental sync of new tables #12039

Allow explicit control of `AUTO_ID_CACHE` in DM incremental sync of new tables #12039

michaelmdeng commented Jan 29, 2025

lance6716 commented Feb 2, 2025

michaelmdeng commented Feb 3, 2025 •

edited

Loading

alastori commented Feb 3, 2025

michaelmdeng commented Feb 4, 2025

OliverS929 commented Feb 11, 2025 •

edited

Loading

OliverS929 commented Feb 11, 2025

Allow explicit control of AUTO_ID_CACHE in DM incremental sync of new tables #12039

Allow explicit control of AUTO_ID_CACHE in DM incremental sync of new tables #12039

Comments

michaelmdeng commented Jan 29, 2025

Is your feature request related to a problem?

Describe the feature you'd like

Describe alternatives you've considered

Teachability, Documentation, Adoption, Migration Strategy

lance6716 commented Feb 2, 2025

michaelmdeng commented Feb 3, 2025 • edited Loading

alastori commented Feb 3, 2025

michaelmdeng commented Feb 4, 2025

OliverS929 commented Feb 11, 2025 • edited Loading

OliverS929 commented Feb 11, 2025

Allow explicit control of `AUTO_ID_CACHE` in DM incremental sync of new tables #12039

Allow explicit control of `AUTO_ID_CACHE` in DM incremental sync of new tables #12039

michaelmdeng commented Feb 3, 2025 •

edited

Loading

OliverS929 commented Feb 11, 2025 •

edited

Loading