Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](agg) Aggregating string types with null values may result in incorrect result #42067

Merged
merged 1 commit into from
Oct 21, 2024

Conversation

liaoxin01
Copy link
Contributor

@liaoxin01 liaoxin01 commented Oct 18, 2024

Aggregating string types with null values may result in incorrect result because using the replace_column_data function can cause incorrect offsets in the column.

A reproducible case:

CREATE TABLE `test_scan_keys_with_bool_type` (
    `col1` tinyint NOT NULL,
    `col2` int NOT NULL,
    `col3` tinyint NOT NULL,
    `col5` boolean REPLACE NOT NULL,
    `col4` datetime(2) REPLACE NOT NULL,
    `col6` double REPLACE_IF_NOT_NULL NULL,
    `col7` varchar(100) REPLACE_IF_NOT_NULL NULL
) ENGINE=OLAP
AGGREGATE KEY(`col1`, `col2`, `col3`)
DISTRIBUTED BY HASH(`col1`, `col2`, `col3`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "disable_auto_compaction" = "true"
);

insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , NULL ),
( -100 ,    0 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "hi" ),
( -100 ,    1 ,   92 ,    1 , '2024-02-16 04:37:37.00' ,   23423423.0324234 , NULL );

 insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  1 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "doris" );

MySQL [test]> select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hidor |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.057 sec)

MySQL [test]> set skip_storage_engine_merge = true; select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | doris |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.023 sec)

#33493 By supporting variant type aggregation, this issue has been resolved.So versions after 2.1 do not have this issue.

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@liaoxin01
Copy link
Contributor Author

run buildall

Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TPC-H: Total hot run time: 48910 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fe1a6821b946d3e10823399c69968faac0ac2ba3, data reload: false

------ Round 1 ----------------------------------
q1	17807	4328	4297	4297
q2	2070	152	152	152
q3	10419	1892	1908	1892
q4	10330	1271	1329	1271
q5	8389	3862	3872	3862
q6	230	124	125	124
q7	2024	1650	1608	1608
q8	9275	2712	2692	2692
q9	10134	9751	9753	9751
q10	8669	3539	3516	3516
q11	420	235	240	235
q12	468	293	302	293
q13	18362	3956	4048	3956
q14	348	323	318	318
q15	504	452	462	452
q16	548	466	463	463
q17	1133	985	1003	985
q18	7276	6722	6887	6722
q19	1709	1610	1559	1559
q20	510	324	311	311
q21	4411	4103	4053	4053
q22	503	409	398	398
Total cold run time: 115539 ms
Total hot run time: 48910 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4309	4351	4284	4284
q2	323	226	219	219
q3	4150	4152	4117	4117
q4	2735	2726	2758	2726
q5	7140	7127	7077	7077
q6	241	126	123	123
q7	3204	2847	2774	2774
q8	4375	4437	4505	4437
q9	13797	13568	13532	13532
q10	4246	4273	4288	4273
q11	753	681	694	681
q12	1036	876	839	839
q13	6960	3735	3720	3720
q14	454	430	433	430
q15	501	465	460	460
q16	625	597	581	581
q17	3851	3798	3834	3798
q18	8674	8799	8766	8766
q19	1723	1619	1670	1619
q20	2316	2095	2080	2080
q21	8383	8371	8431	8371
q22	1059	931	937	931
Total cold run time: 80855 ms
Total hot run time: 75838 ms

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 37.85% (8157/21553)
Line Coverage: 29.57% (67126/227030)
Region Coverage: 29.05% (34614/119167)
Branch Coverage: 24.99% (17863/71478)
Coverage Report: http://coverage.selectdb-in.cc/coverage/fe1a6821b946d3e10823399c69968faac0ac2ba3_fe1a6821b946d3e10823399c69968faac0ac2ba3/report/index.html

@doris-robot
Copy link

TPC-DS: Total hot run time: 211904 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fe1a6821b946d3e10823399c69968faac0ac2ba3, data reload: false

query1	932	387	409	387
query2	6531	2311	1957	1957
query3	6921	198	200	198
query4	23795	21513	21748	21513
query5	19743	6573	6570	6570
query6	288	227	237	227
query7	4173	297	302	297
query8	287	251	263	251
query9	3046	2659	2565	2565
query10	409	326	312	312
query11	15518	14676	15053	14676
query12	133	74	78	74
query13	1029	437	453	437
query14	17912	13622	13834	13622
query15	381	220	231	220
query16	6475	276	264	264
query17	1795	924	910	910
query18	890	319	316	316
query19	209	153	149	149
query20	98	97	94	94
query21	194	101	97	97
query22	5305	5082	5051	5051
query23	34295	33389	33400	33389
query24	7840	6318	6403	6318
query25	533	433	436	433
query26	1265	174	158	158
query27	2485	301	296	296
query28	6070	2225	2212	2212
query29	2831	2690	2769	2690
query30	241	170	161	161
query31	944	721	726	721
query32	71	62	56	56
query33	453	257	255	255
query34	865	486	490	486
query35	1122	943	965	943
query36	1297	1220	1170	1170
query37	174	61	65	61
query38	3103	2896	2883	2883
query39	1392	1320	1327	1320
query40	305	92	97	92
query41	39	37	37	37
query42	96	92	96	92
query43	618	601	550	550
query44	1169	724	725	724
query45	244	225	228	225
query46	1239	971	976	971
query47	1910	1772	1752	1752
query48	497	415	414	414
query49	642	372	381	372
query50	871	581	595	581
query51	4720	4689	4625	4625
query52	99	76	80	76
query53	226	185	190	185
query54	2672	2458	2508	2458
query55	92	80	89	80
query56	246	216	211	211
query57	1326	1130	1168	1130
query58	212	223	208	208
query59	3647	3558	3314	3314
query60	231	223	205	205
query61	95	91	92	91
query62	858	427	461	427
query63	201	175	174	174
query64	3504	1561	1343	1343
query65	3664	3604	3572	3572
query66	764	401	426	401
query67	16481	16306	15164	15164
query68	9342	666	648	648
query69	500	280	279	279
query70	1669	1396	1379	1379
query71	399	307	314	307
query72	6776	4745	4889	4745
query73	770	317	324	317
query74	6278	5911	5824	5824
query75	4666	3769	3622	3622
query76	4838	1168	1195	1168
query77	697	262	263	262
query78	12408	11592	11553	11553
query79	9918	670	647	647
query80	2527	409	392	392
query81	515	234	243	234
query82	1645	99	99	99
query83	152	137	128	128
query84	259	70	70	70
query85	1276	312	316	312
query86	359	296	290	290
query87	3295	3059	3024	3024
query88	4832	2320	2316	2316
query89	490	287	304	287
query90	1851	213	215	213
query91	155	132	133	132
query92	58	56	50	50
query93	7157	573	573	573
query94	940	211	204	204
query95	2137	1919	1966	1919
query96	647	325	323	323
query97	6492	6336	6343	6336
query98	239	202	199	199
query99	3094	949	839	839
Total cold run time: 324094 ms
Total hot run time: 211904 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 31.79 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fe1a6821b946d3e10823399c69968faac0ac2ba3, data reload: false

query1	0.02	0.02	0.02
query2	0.07	0.02	0.02
query3	0.26	0.05	0.05
query4	1.79	0.06	0.06
query5	0.54	0.53	0.52
query6	1.23	0.62	0.62
query7	0.02	0.01	0.01
query8	0.03	0.02	0.02
query9	0.51	0.49	0.49
query10	0.54	0.53	0.54
query11	0.12	0.08	0.08
query12	0.12	0.09	0.08
query13	0.62	0.62	0.61
query14	0.79	0.78	0.80
query15	0.78	0.75	0.75
query16	0.36	0.36	0.37
query17	0.97	1.03	1.04
query18	0.21	0.27	0.26
query19	1.93	1.77	1.87
query20	0.02	0.01	0.01
query21	15.48	0.54	0.58
query22	1.83	2.25	1.88
query23	16.97	0.99	0.94
query24	5.05	2.63	1.77
query25	0.40	0.08	0.04
query26	0.78	0.16	0.15
query27	0.04	0.05	0.04
query28	6.12	0.79	0.80
query29	12.67	2.34	2.28
query30	0.63	0.54	0.52
query31	2.80	0.40	0.37
query32	3.35	0.51	0.48
query33	3.08	3.08	3.09
query34	15.25	4.78	4.77
query35	4.87	4.86	4.81
query36	1.08	1.02	1.02
query37	0.06	0.05	0.04
query38	0.03	0.02	0.02
query39	0.02	0.01	0.02
query40	0.16	0.14	0.14
query41	0.07	0.01	0.01
query42	0.02	0.01	0.02
query43	0.03	0.01	0.01
Total cold run time: 101.72 s
Total hot run time: 31.79 s

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit fe1a6821b946d3e10823399c69968faac0ac2ba3 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       22.2 seconds inserted 10000000 Rows, about 450K ops/s

Copy link
Member

@eldenmoon eldenmoon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Oct 21, 2024
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

@xiaokang xiaokang added the usercase Important user case type label label Oct 21, 2024
@xiaokang xiaokang merged commit 8d60e57 into apache:branch-2.0 Oct 21, 2024
22 of 27 checks passed
dataroaring pushed a commit that referenced this pull request Oct 28, 2024
xiaokang pushed a commit that referenced this pull request Nov 18, 2024
…correct result (#42067)

Aggregating string types with null values may result in incorrect result
because using the replace_column_data function can cause incorrect
offsets in the column.

A reproducible case:
```
CREATE TABLE `test_scan_keys_with_bool_type` (
    `col1` tinyint NOT NULL,
    `col2` int NOT NULL,
    `col3` tinyint NOT NULL,
    `col5` boolean REPLACE NOT NULL,
    `col4` datetime(2) REPLACE NOT NULL,
    `col6` double REPLACE_IF_NOT_NULL NULL,
    `col7` varchar(100) REPLACE_IF_NOT_NULL NULL
) ENGINE=OLAP
AGGREGATE KEY(`col1`, `col2`, `col3`)
DISTRIBUTED BY HASH(`col1`, `col2`, `col3`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "disable_auto_compaction" = "true"
);

insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , NULL ),
( -100 ,    0 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "hi" ),
( -100 ,    1 ,   92 ,    1 , '2024-02-16 04:37:37.00' ,   23423423.0324234 , NULL );

 insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  1 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "doris" );

MySQL [test]> select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hidor |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.057 sec)

MySQL [test]> set skip_storage_engine_merge = true; select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | doris |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.023 sec)
```
#33493 By supporting variant type
aggregation, this issue has been resolved.So versions after 2.1 do not
have this issue.
xiaokang pushed a commit that referenced this pull request Nov 18, 2024
xiaokang pushed a commit that referenced this pull request Nov 19, 2024
…correct result (#42067)

Aggregating string types with null values may result in incorrect result
because using the replace_column_data function can cause incorrect
offsets in the column.

A reproducible case:
```
CREATE TABLE `test_scan_keys_with_bool_type` (
    `col1` tinyint NOT NULL,
    `col2` int NOT NULL,
    `col3` tinyint NOT NULL,
    `col5` boolean REPLACE NOT NULL,
    `col4` datetime(2) REPLACE NOT NULL,
    `col6` double REPLACE_IF_NOT_NULL NULL,
    `col7` varchar(100) REPLACE_IF_NOT_NULL NULL
) ENGINE=OLAP
AGGREGATE KEY(`col1`, `col2`, `col3`)
DISTRIBUTED BY HASH(`col1`, `col2`, `col3`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "disable_auto_compaction" = "true"
);

insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , NULL ),
( -100 ,    0 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "hi" ),
( -100 ,    1 ,   92 ,    1 , '2024-02-16 04:37:37.00' ,   23423423.0324234 , NULL );

 insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  1 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "doris" );

MySQL [test]> select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hidor |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.057 sec)

MySQL [test]> set skip_storage_engine_merge = true; select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | doris |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.023 sec)
```
#33493 By supporting variant type
aggregation, this issue has been resolved.So versions after 2.1 do not
have this issue.
xiaokang pushed a commit that referenced this pull request Nov 19, 2024
xiaokang pushed a commit that referenced this pull request Nov 19, 2024
…correct result (#42067)

Aggregating string types with null values may result in incorrect result
because using the replace_column_data function can cause incorrect
offsets in the column.

A reproducible case:
```
CREATE TABLE `test_scan_keys_with_bool_type` (
    `col1` tinyint NOT NULL,
    `col2` int NOT NULL,
    `col3` tinyint NOT NULL,
    `col5` boolean REPLACE NOT NULL,
    `col4` datetime(2) REPLACE NOT NULL,
    `col6` double REPLACE_IF_NOT_NULL NULL,
    `col7` varchar(100) REPLACE_IF_NOT_NULL NULL
) ENGINE=OLAP
AGGREGATE KEY(`col1`, `col2`, `col3`)
DISTRIBUTED BY HASH(`col1`, `col2`, `col3`) BUCKETS 1
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "disable_auto_compaction" = "true"
);

insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , NULL ),
( -100 ,    0 ,  -82 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "hi" ),
( -100 ,    1 ,   92 ,    1 , '2024-02-16 04:37:37.00' ,   23423423.0324234 , NULL );

 insert into test_scan_keys_with_bool_type values
( -100 ,    1 ,  1 ,    1 , '2024-02-16 04:37:37.00' , -1299962421.904282 , "doris" );

MySQL [test]> select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hidor |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.057 sec)

MySQL [test]> set skip_storage_engine_merge = true; select * from test_scan_keys_with_bool_type;
+------+------+------+------+------------------------+---------------------+-------+
| col1 | col2 | col3 | col5 | col4                   | col6                | col7  |
+------+------+------+------+------------------------+---------------------+-------+
| -100 |    0 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | hi    |
| -100 |    1 |  -82 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | NULL  |
| -100 |    1 |   92 |    1 | 2024-02-16 04:37:37.00 |    23423423.0324234 | NULL  |
| -100 |    1 |    1 |    1 | 2024-02-16 04:37:37.00 | -1299962421.9042821 | doris |
+------+------+------+------+------------------------+---------------------+-------+
4 rows in set (0.023 sec)
```
#33493 By supporting variant type
aggregation, this issue has been resolved.So versions after 2.1 do not
have this issue.
xiaokang pushed a commit that referenced this pull request Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.16-merged kind/test reviewed usercase Important user case type label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants