Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialization trouble with BINARY(16) columns for UUID #139

Open
bobdaly opened this issue Jan 26, 2024 · 10 comments
Open

Deserialization trouble with BINARY(16) columns for UUID #139

bobdaly opened this issue Jan 26, 2024 · 10 comments

Comments

@bobdaly
Copy link

bobdaly commented Jan 26, 2024

Library version: 0.27.5
This UUID string value ab9d25e0-64e6-469f-9873-8621910ed300 came from: java.util.UUID.randomUUID().toString() (JDK version 11)

It was persisted to a MySQL 8.0 database column of type BINARY(16) using the UUID_TO_BIN() function.
Here is the relevant code relying on the binlog library:

WriteRowsEventData eventData = event.getData();
List<Serializable[]> rowData = eventData.getRows();
(iterate over rowData)
Object obj = (value from UUID/BINARY column which will come back as a byte[])

For the vast majority of UUIDs, the byte[] is of the expected length - 16.
But some, like the value included above, have a length of 15.

The following conversion will fail with only 15 bytes:
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
long high = byteBuffer.getLong();
long low = byteBuffer.getLong();
java.util.UUID uuid = new UUID(high, low);

The exception is:
java.nio.BufferUnderflowException
at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:650)
at java.base/java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:452)

If the incorrectly sized byte[] has a 16th byte added as follows:
bin2[15] = Byte.valueOf("00");
The conversion works.

I should also state that interacting with the persisted UUID using BIN_TO_UUID works as expected (i.e there is no problem from MySQL's perspective).

BTW, the CHAR_AND_BINARY_AS_BYTE_ARRAY compatibility mode is being set as well:
EventDeserializer eventDeserializer = new EventDeserializer();
eventDeserializer.setCompatibilityMode(
EventDeserializer.CompatibilityMode.CHAR_AND_BINARY_AS_BYTE_ARRAY
);
binaryLogClient.setEventDeserializer(eventDeserializer);

@leiless
Copy link

leiless commented Sep 19, 2024

Hi, @bobdaly. I've also encountered this bug when deserializing BINARY(16) to UUID.

cc @osheroff, @Naros, @jpechane.

MySQL column definition: uuid BINARY(16) NOT NULL UNIQUE KEY,

java.lang.IllegalArgumentException: UUID bytes len: 15, expected 16
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:88)
    at xxx.db.Utils.convertBytesToUUID(Utils.java:11)
    at xxx.db.ClickHouseDAO.parseUser(ClickHouseDAO.java:245)
    at xxx.db.ClickHouseDAO.insertRows(ClickHouseDAO.java:327)
    at xxx.db.ClickHouseWorker$QueueThread.insertAndAck(ClickHouseWorker.java:73)
    at xxx.db.ClickHouseWorker$QueueThread.run(ClickHouseWorker.java:62)

public static UUID convertBytesToUUID(byte[] bytes) {
    Preconditions.checkNotNull(bytes);
    Preconditions.checkArgument(bytes.length == 16, "UUID bytes len: " + bytes.length + ", expected 16");

    ByteBuffer byteBuffer = ByteBuffer.wrap(bytes);
    long high = byteBuffer.getLong();
    long low = byteBuffer.getLong();
    return new UUID(high, low);
}
EventDeserializer eventDeserializer = new EventDeserializer();
eventDeserializer.setCompatibilityMode(
        EventDeserializer.CompatibilityMode.DATE_AND_TIME_AS_LONG,
        EventDeserializer.CompatibilityMode.CHAR_AND_BINARY_AS_BYTE_ARRAY,
        EventDeserializer.CompatibilityMode.INVALID_DATE_AND_TIME_AS_MIN_VALUE
);
client.setEventDeserializer(eventDeserializer);

related: #131

@leiless
Copy link

leiless commented Sep 19, 2024

2024-09-19 15:18:21.756 ERROR xxx.cli.App: Uncaught exception on Thread-9
java.lang.IllegalArgumentException: UUID bytes len: 15: fdf085fa10f93ce087a45c5f285809, expected 16
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
        at xxx.db.Utils.convertBytesToUUID(Utils.java:23)
        at xxx.db.ClickHouseDAO.parseUser(ClickHouseDAO.java:245)
        at xxx.db.ClickHouseDAO.insertRows(ClickHouseDAO.java:327)
        at xxx.db.ClickHouseWorker$QueueThread.insertAndAck(ClickHouseWorker.java:73)
        at xxx.db.ClickHouseWorker$QueueThread.run(ClickHouseWorker.java:62)

mysql> select lower(hex(uuid)) as uuid from xxx.xxx where lower(hex(uuid)) like '%fdf085fa10f93ce087a45c5f285809%';
+----------------------------------+
| uuid                             |
+----------------------------------+
| fdf085fa10f93ce087a45c5f28580900 |
+----------------------------------+
1 row in set (0.01 sec)

I.e. the trailing 00-byte got "unread" somehow.

@Naros
Copy link
Collaborator

Naros commented Sep 19, 2024

Hi @leiless can you tell me what version of MySQL/MariaDB you're using and version of the binlog client?

@leiless
Copy link

leiless commented Sep 19, 2024

@Naros, FYI, 10.6.10-MariaDB-log.

Ok, I've figured it out. It's the behavior of MySQL replication protocol.

The MySQL server indeed sent 15 bytes to the MySQL replica, it's not a bug.

# All 0x00s case
# {'values': {'UNKNOWN_COL0': 1, 'UNKNOWN_COL1': ''}, 'none_sources': {}}
# UNKNOWN_COL1 len: 0
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0000-0000-000000000000", "-","")));

# Suffix 0x00s case
# {'values': {'UNKNOWN_COL0': 2, 'UNKNOWN_COL1': '\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 1
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01000000-0000-0000-0000-000000000000", "-","")));

# Mid 0x00s case
# {'values': {'UNKNOWN_COL0': 3, 'UNKNOWN_COL1': '\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 15
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01000000-0000-0000-0000-000000000100", "-","")));

# Prefix 0x00s case
# {'values': {'UNKNOWN_COL0': 4, 'UNKNOWN_COL1': '\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 15
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0000-0000-000000000100", "-","")));

# Prefix + suffix 0x00s case (suffix 0x00s got truncated)
# {'values': {'UNKNOWN_COL0': 5, 'UNKNOWN_COL1': '\x00\x00\x00\x00\x00\x00\x01\x01'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 8
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("00000000-0000-0101-0000-000000000000", "-","")));
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import WriteRowsEvent

mysql_settings = {'host': '127.0.0.1', 'port': 23306, 'user': 'root', 'passwd': '123456'}
stream = BinLogStreamReader(connection_settings=mysql_settings,
                            server_id=100,
                            is_mariadb=True,
                            auto_position='',
                            blocking=True,
                            )

for binlogevent in stream:
    binlogevent.dump()

    if isinstance(binlogevent, WriteRowsEvent):
        write_rows_ev: WriteRowsEvent = binlogevent
        if f'{write_rows_ev.schema}.{write_rows_ev.table}' == 'db.t':
            for row in write_rows_ev.rows:
                print(row)
                print('UNKNOWN_COL1 len: ' + str(len(row['values']['UNKNOWN_COL1'])))
                print()

stream.close()

As we can see, for BINARY(N) where N is a constant, if there are trailing 0x00s in the value, those trailing 0x00 bytes will be truncated.

@Naros
Copy link
Collaborator

Naros commented Sep 19, 2024

Thanks for the follow-up @leiless, so in that case can we close this or is there still an issue?

@leiless
Copy link

leiless commented Sep 19, 2024

As we can see, for BINARY(N) where N is a constant, if there are trailing 0x00s in the value, those trailing 0x00 bytes will be truncated.

Proof:

# {'values': {'UNKNOWN_COL0': 17, 'UNKNOWN_COL1': '\x01\x02\x03\x04\x05\x06\x07\x08\t\n'}, 'none_sources': {}}
# UNKNOWN_COL1 len: 10
INSERT INTO db.t (id, uuid) VALUES (NULL, UNHEX(REPLACE("01020304-0506-0708-090a-000000000000", "-","")));

2024-09-19 202026-a

MariaDB server send BINARY(16) data 01020304-05060708090a000000000000 with trailing 0x00s stripped.

I think it applies to MySQL also.

And for CHAR(N) where N is a constant, trailing 0x00s will also be stripped (not yet tested).

@leiless
Copy link

leiless commented Sep 19, 2024

Thanks for the follow-up @leiless, so in that case can we close this or is there still an issue?

@Naros I think we can close this issue, but we may need to update the README for this, it's quite tricky.

@leiless
Copy link

leiless commented Sep 20, 2024

Possible solution to this issue

private static byte[] fixTruncatedUuidBytes(byte[] uuidBytes) {
    Preconditions.checkNotNull(uuidBytes);
    Preconditions.checkArgument(uuidBytes.length <= 16);
    int truncatedTrailingZerosLen = 16 - uuidBytes.length;
    if (truncatedTrailingZerosLen == 0) {
        return uuidBytes;
    }
    byte[] fixedUuidBytes = new byte[16];
    // The remaining truncatedTrailingZerosLen bytes are already initialized to 0 by default
    System.arraycopy(uuidBytes, 0, fixedUuidBytes, 0, uuidBytes.length);
    return fixedUuidBytes;
}

@jpechane
Copy link

@leiless Hi, I am a bit confused now. The intent of the code you posted is to provide a hint for clients to how to propely consume the value? If yes, is this something you'd like to add to the README? Also, WRT README update would it be possible to send a PR to https://github.com/debezium/mysql-binlog-connector-java ? Thanks

@leiless
Copy link

leiless commented Sep 20, 2024

@leiless Hi, I am a bit confused now. The intent of the code you posted is to provide a hint for clients to how to propely consume the value? If yes, is this something you'd like to add to the README? Also, WRT README update would it be possible to send a PR to https://github.com/debezium/mysql-binlog-connector-java ? Thanks

The intent of the code you posted is to provide a hint for clients to how to propely consume the value?

Yes, in this case, deserialize BINARY(16) as byte[] (which trailing 0x00s is truncated and length of the byte[] is less than 16 bytes), and eventually turn it into a UUID.
So we must fill extra 0x00s to make the length 16 bytes.

I'll send a PR to https://github.com/debezium/mysql-binlog-connector-java someday later, pretty bz right now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants