DM: Mariadb master-slave switch twice, sync failed #10741

okJiang · 2024-03-08T04:55:47Z

What did you do?

"MariaDB binlog GTID format: {domain_id}_{server_id}_{seq_no}

gtid_strict_mode = off (gtid_strict_mode behavior validation: http://youdidwhatwithtsql.com/behavior-gtidstrictmode-mariadb/2089/)
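For readers following the example, the GTID notation can be split into its three fields with a few lines of Python. This is only an illustrative sketch: `Gtid` and `parse_gtid` are hypothetical names, not part of DM or go-mysql. The issue writes GTIDs with underscores, while MariaDB itself prints them with hyphens (for example `0-101-100`), so the helper accepts both.

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    """One MariaDB GTID, split into its three numeric fields."""
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse 'domain_server_seq' (this issue) or 'domain-server-seq' (MariaDB)."""
    parts = text.strip().replace("-", "_").split("_")
    if len(parts) != 3:
        raise ValueError(f"expected three fields, got {text!r}")
    domain_id, server_id, seq_no = (int(p) for p in parts)
    return Gtid(domain_id, server_id, seq_no)


print(parse_gtid("0_101_100"))
# Gtid(domain_id=0, server_id=101, seq_no=100)
```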

Assumed Example: Data Sync Chain: A -> B -> DM -> TiDB

At time t0:

A's latest binlog GTID: 0_101_100
B has replicated up to A's latest binlog GTID: 0_101_100

At time t1:

B (the slave) writes to its own binlog, producing GTIDs: 0_102_101, 0_102_102

At time t2:

A writes, producing GTID: 0_101_101

GTID sequence on B (the standby):
0_101_100
0_102_101
0_102_102
0_101_101

DM synchronization reports an error: 'less than global checkpoint position.'

At time t3 (MariaDB behavior):

Master-slave switch.
B's GTID sequence (as consumed by A):

0_101_100
0_102_101 (A consumes from B)
0_102_102 (A consumes from B)
0_101_101 (A skips)
Verification tests show that with gtid_strict_mode = off, when A and B switch roles and B is promoted to master, MariaDB pulls binlog from the new master based on its own GTID position and ignores seq_no comparisons. In the example above, A consumes the events written at t1 and skips 0_101_101, which it already has."
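The ordering problem in B's GTID sequence above can be made concrete with a small Python sketch. It is not DM's checkpoint code. It models a hypothetical consumer that keeps a single high-water mark per domain, which is one plausible reading of why 'less than global checkpoint position' was reported. The function name `out_of_order` is an assumption made for illustration.

```python
from typing import Dict, List


def out_of_order(binlog: List[str]) -> List[str]:
    """Return GTIDs whose seq_no does not exceed the highest seq_no
    already seen in the same domain (server_id is ignored)."""
    highest: Dict[int, int] = {}   # domain_id -> highest seq_no seen so far
    flagged: List[str] = []
    for text in binlog:
        domain_id, _server_id, seq_no = (int(p) for p in text.split("_"))
        if domain_id in highest and seq_no <= highest[domain_id]:
            flagged.append(text)   # goes backwards within its domain
        else:
            highest[domain_id] = seq_no
    return flagged


b_binlog = ["0_101_100", "0_102_101", "0_102_102", "0_101_101"]
print(out_of_order(b_binlog))
# ['0_101_101']
```

Under this model only 0_101_101 is flagged: it arrives after 0_102_102 in the same domain 0, so a per-domain comparison sees its seq_no going backwards even though it is the next event from server 101.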

What did you expect to see?

sync normally

What did you see instead?

'less than global checkpoint position.'

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

6.5.3

Upstream MySQL/MariaDB server version:

10.1.9

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

How did you deploy DM: tiup or manually?

(leave TiUP or manually here)

Other interesting information (system version, hardware config, etc):


current status of DM cluster (execute `query-status <task-name>` in dmctl)

No response


lance6716 · 2024-03-09T15:49:41Z

Is this a case of multiple master (writer/primary) nodes? If so, we should set up a separate domain ID for each of them.

https://mariadb.com/kb/en/gtid/#use-with-multi-source-replication-and-other-multi-primary-setups

However, if the new master results from a failover and multiple master nodes never exist simultaneously, DM may need to be configured with a gtid_strict_mode = off behaviour 🤔. I don't know whether that behaviour would cause more trouble, such as data loss.

okJiang · 2024-03-11T11:40:38Z

> Is this a case of multiple master (writer/primary) nodes? If so, we should set up a separate domain ID for each of them.
>
> https://mariadb.com/kb/en/gtid/#use-with-multi-source-replication-and-other-multi-primary-setups
>
> However, if the new master results from a failover and multiple master nodes never exist simultaneously, DM may need to be configured with a gtid_strict_mode = off behaviour 🤔. I don't know whether that behaviour would cause more trouble, such as data loss.

Yes, the best way is to set separate domain IDs.

But this is also a headache. I learned today that the users who encountered this problem were using TDSQL. Internally, TDSQL relies on MariaDB's gtid_strict_mode=off and a shared domain ID to implement master-slave switching, so it is difficult for us to push users to modify TDSQL's implementation.

I have now added a server-id mapping to MariadbGTIDSet in go-mysql (go-mysql-org/go-mysql#852), and with it my local test case (#10753) runs successfully.
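To illustrate the idea of a server-id mapping, here is a minimal Python sketch that tracks the latest seq_no per (domain_id, server_id) pair instead of per domain. It is an assumption-laden model inferred from the comment above and the PR title, not the actual go-mysql change; `PerServerGtidSet` and its methods are hypothetical names.

```python
from typing import Dict


class PerServerGtidSet:
    """Hypothetical model: latest seq_no per (domain_id, server_id)."""

    def __init__(self) -> None:
        # domain_id -> {server_id -> latest seq_no}
        self._latest: Dict[int, Dict[int, int]] = {}

    def update(self, text: str) -> bool:
        """Record a GTID; return False if it does not advance its own slot."""
        domain_id, server_id, seq_no = (int(p) for p in text.split("_"))
        servers = self._latest.setdefault(domain_id, {})
        last = servers.get(server_id)
        if last is not None and seq_no <= last:
            return False
        servers[server_id] = seq_no
        return True

    def __str__(self) -> str:
        return ",".join(
            f"{d}_{s}_{n}"
            for d, servers in sorted(self._latest.items())
            for s, n in sorted(servers.items())
        )


gtid_set = PerServerGtidSet()
b_binlog = ["0_101_100", "0_102_101", "0_102_102", "0_101_101"]
results = [gtid_set.update(g) for g in b_binlog]
print(results)   # [True, True, True, True]
print(gtid_set)  # 0_101_101,0_102_102
```

With one slot per server, 0_101_101 advances server 101's position and is no longer compared against 0_102_102. The trade-off in this sketch is that ordering across different servers in the same domain is no longer checked at all.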

A question: since it executes successfully at the source, should we also be able to synchronize it for the same domain-id and server-id 🤔? @GMHDBJD @lance6716

okJiang · 2024-03-12T07:31:04Z

> DM may need to be configured with a gtid_strict_mode = off behaviour 🤔. I don't know whether that behaviour would cause more trouble, such as data loss.

I was worried about this too, so I tried to find other ways.


okJiang added the type/bug and area/dm labels Mar 8, 2024

okJiang mentioned this issue Mar 8, 2024

test: add a case of Mariadb for dm_upstream_switch #10735

Open

This was referenced Mar 11, 2024

separate serverID of Mariadb GTID set go-mysql-org/go-mysql#852

Merged

DM/mariadb: sync the gtid executed in slave during master-slave replication #10753

Merged

okJiang added the type/enhancement label and removed the type/bug label Mar 12, 2024

ti-chi-bot bot closed this as completed in #10753 Mar 15, 2024

ti-chi-bot bot pushed a commit that referenced this issue Mar 15, 2024

DM/mariadb: sync the gtid executed in slave during master-slave repli…

720920a

…cation (#10753) close #10741

github-project-automation bot added this to Question and Bug Reports Aug 28, 2024

github-project-automation bot moved this to Done in Question and Bug Reports Aug 28, 2024
