Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM: Mariadb master-slave switch twice, sync failed #10741

Closed
okJiang opened this issue Mar 8, 2024 · 3 comments · Fixed by #10753
Closed

DM: Mariadb master-slave switch twice, sync failed #10741

okJiang opened this issue Mar 8, 2024 · 3 comments · Fixed by #10753
Labels
area/dm Issues or PRs related to DM. type/enhancement The issue or PR belongs to an enhancement.

Comments

@okJiang
Copy link
Member

okJiang commented Mar 8, 2024

What did you do?

"Mariadb binlog gtid format: {domain_id}{}{server_id}{}{seq_no}

gtid_strict_mode = off (gtid_strict_mode behavior validation: http://youdidwhatwithtsql.com/behavior-gtidstrictmode-mariadb/2089/)

Assumed Example: Data Sync Chain: A -> B -> DM -> TiDB

At time t0:

A's current latest binlog gtid: 0_101_100
B syncs A's latest binlog gtid: 0_101_100
At time t1:

B's slave writes binlog with gtid changes: 0_102_101, 0_102_102
At time t2:

A writes with gtid change: 0_101_101
At time t2:

B backs up gtid sequence:
0_101_100
0_102_101
0_102_102
0_101_101
DM synchronization reports an error: 'less than global checkpoint position.'

At time t3 (Mariadb behavior):

Master-slave switch
B's gtid (A consumption scenario):

0_101_100
0_102_101 (A consumes B)
0_102_102 (A consumes B)
0_101_101 (A Skip)
From verification tests, in mariadb gtid_strict_mode = off mode, when there is a master-slave switch between A and B, B promotes to master, and Mariadb pulls binlog from the master based on its own gtid seq, ignoring seq_no comparisons. In the above example, it consumes data from t1, skipping data already present in 0_101_101."

What did you expect to see?

sync normally

What did you see instead?

'less than global checkpoint position.'

Versions of the cluster

DM version (run dmctl -V or dm-worker -V or dm-master -V):

6.5.3

Upstream MySQL/MariaDB server version:

10.1.9

Downstream TiDB cluster version (execute SELECT tidb_version(); in a MySQL client):

(paste TiDB cluster version here)

How did you deploy DM: tiup or manually?

(leave TiUP or manually here)

Other interesting information (system version, hardware config, etc):

>
>

current status of DM cluster (execute query-status <task-name> in dmctl)

No response

@okJiang okJiang added type/bug The issue is confirmed as a bug. area/dm Issues or PRs related to DM. labels Mar 8, 2024
@lance6716
Copy link
Contributor

Is this the case of multiple master(writer/primary) node? We should setup separate domain ID for them.

https://mariadb.com/kb/en/gtid/#use-with-multi-source-replication-and-other-multi-primary-setups

However if the master node is caused by failover and multiple master nodes will not occur simultaneously, DM may need to configurate a gtid_strict_mode = off behaviour 🤔 . I don't know if that behaviour will cause more trouble like data loss.

@okJiang
Copy link
Member Author

okJiang commented Mar 11, 2024

Is this the case of multiple master(writer/primary) node? We should setup separate domain ID for them.

https://mariadb.com/kb/en/gtid/#use-with-multi-source-replication-and-other-multi-primary-setups

However if the master node is caused by failover and multiple master nodes will not occur simultaneously, DM may need to configurate a gtid_strict_mode = off behaviour 🤔 . I don't know if that behaviour will cause more trouble like data loss.

Yes, the best way is setting separate domain ID.

But this is also a headache. I learned today that users who encountered this problem were using TDSQL. In the internal implementation of TDSQL, they relied on mariadb's gtid_strict_mode=off and same domain ID to implement master-slave switching. It is difficult for us to push users to modify the implementation of TDSQL

Now I added server-id mapping in MariadbGTIDSet in go-mysql. go-mysql-org/go-mysql#852 Then I successfully executed my local test case(#10753).

A question: Since it can be executed successfully at the source, should we also be able to synchronize it for the same domain-id and server-id 🤔? @GMHDBJD @lance6716

@okJiang okJiang added type/enhancement The issue or PR belongs to an enhancement. and removed type/bug The issue is confirmed as a bug. labels Mar 12, 2024
@okJiang
Copy link
Member Author

okJiang commented Mar 12, 2024

DM may need to configurate a gtid_strict_mode = off behaviour 🤔 . I don't know if that behaviour will cause more trouble like data loss.

I was worried about this too, so I tried to find other ways

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dm Issues or PRs related to DM. type/enhancement The issue or PR belongs to an enhancement.
Projects
Development

Successfully merging a pull request may close this issue.

2 participants