[fix](load) Fix potential data loss during disk migration #42296 #42387

Merged


liaoxin01
Contributor

Cherry-pick from #42296

The following sequence of operations can trigger this issue:
1. A disk migration starts.
2. A load starts using the old tablet, but blocks waiting for the migration lock in
`RowsetBuilder::prepare_txn`.
3. The migration finishes and the old tablet is replaced by the new tablet.
4. The load obtains the migration lock and commits successfully using the old tablet.
5. Publish fails on the old tablet because the old tablet has been dropped.
As a result, the loaded data is lost.

Therefore, after acquiring the migration lock, check whether the tablet has
already been shut down. If it has, the load is holding the old
tablet, and data must not be imported into it.
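
The check described above can be illustrated with a minimal, self-contained sketch. It is not the actual Doris code: `Tablet`, `TabletState`, `Status`, `finish_migration`, and this free-function `prepare_txn` are hypothetical stand-ins, and the sketch assumes that loads take the migration lock in shared mode while the migration takes it exclusively.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-ins for illustration only; not the real Doris types.
enum class TabletState { kRunning, kShutdown };

struct Tablet {
    // Assumption: loads hold this lock shared, migration holds it exclusively.
    std::shared_timed_mutex migration_lock;
    TabletState state = TabletState::kRunning;
};

struct Status {
    bool ok;
    std::string msg;
};

// Sketch of the fix: once the migration lock is held, re-check the tablet
// state before preparing the transaction on this tablet.
Status prepare_txn(Tablet& tablet) {
    std::shared_lock<std::shared_timed_mutex> lock(tablet.migration_lock);
    if (tablet.state == TabletState::kShutdown) {
        // The tablet was replaced while this load waited for the lock.
        return {false, "tablet is shut down; refusing to load into the old tablet"};
    }
    // ... the transaction would be prepared against this tablet here ...
    return {true, ""};
}

// Hypothetical end of a migration: the old tablet is marked as shut down
// while the lock is held exclusively.
void finish_migration(Tablet& old_tablet) {
    std::unique_lock<std::shared_timed_mutex> lock(old_tablet.migration_lock);
    old_tablet.state = TabletState::kShutdown;
}

// Usage example: a load that reaches prepare_txn after the migration
// finished is rejected instead of committing to the old tablet.
inline void example() {
    Tablet old_tablet;
    assert(prepare_txn(old_tablet).ok);
    finish_migration(old_tablet);
    assert(!prepare_txn(old_tablet).ok);
}
```

In this sketch the load fails early in step 4 rather than committing to a tablet that can never be published, so the caller can retry against the new tablet.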
@liaoxin01
Contributor Author

run buildall

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@doris-robot

TPC-H: Total hot run time: 48783 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 318876213002dd21f70dc2328aa361e9c5c754e8, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17628	4388	4322	4322
q2	2064	155	147	147
q3	10269	1859	1883	1859
q4	10299	1213	1285	1213
q5	8300	3862	3893	3862
q6	233	121	124	121
q7	2029	1629	1577	1577
q8	9483	2749	2702	2702
q9	11516	9798	9754	9754
q10	8630	3566	3479	3479
q11	419	247	250	247
q12	469	288	299	288
q13	18360	3951	4025	3951
q14	360	324	328	324
q15	514	460	448	448
q16	548	459	459	459
q17	1146	975	964	964
q18	7224	6796	6877	6796
q19	1692	1548	1506	1506
q20	513	308	306	306
q21	4449	4167	4080	4080
q22	494	378	401	378
Total cold run time: 116639 ms
Total hot run time: 48783 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4318	4296	4266	4266
q2	326	223	223	223
q3	4171	4113	4151	4113
q4	2736	2741	2738	2738
q5	7103	7105	7066	7066
q6	239	119	120	119
q7	3224	2875	2804	2804
q8	4369	4506	4499	4499
q9	13704	13676	13796	13676
q10	4278	4308	4275	4275
q11	793	694	744	694
q12	1018	866	869	866
q13	7390	3756	3774	3756
q14	464	437	432	432
q15	507	451	460	451
q16	627	604	600	600
q17	3859	3813	3835	3813
q18	8874	8943	9195	8943
q19	1750	1700	1676	1676
q20	2469	2233	2245	2233
q21	8789	8611	8498	8498
q22	1049	995	958	958
Total cold run time: 82057 ms
Total hot run time: 76699 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.84% (8155/21553)
Line Coverage: 29.56% (67107/227036)
Region Coverage: 29.04% (34603/119170)
Branch Coverage: 24.98% (17859/71480)
Coverage Report: http://coverage.selectdb-in.cc/coverage/318876213002dd21f70dc2328aa361e9c5c754e8_318876213002dd21f70dc2328aa361e9c5c754e8/report/index.html

@doris-robot

TPC-DS: Total hot run time: 212302 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 318876213002dd21f70dc2328aa361e9c5c754e8, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	934	397	419	397
query2	6530	2002	2073	2002
query3	6924	198	202	198
query4	23616	21635	21374	21374
query5	19750	6597	6557	6557
query6	292	217	221	217
query7	4333	298	307	298
query8	255	265	260	260
query9	3053	2636	2599	2599
query10	416	297	302	297
query11	16042	15182	15272	15182
query12	128	79	74	74
query13	1027	449	444	444
query14	17937	13310	13852	13310
query15	362	213	239	213
query16	6442	278	267	267
query17	1716	956	888	888
query18	883	328	308	308
query19	210	150	150	150
query20	100	101	99	99
query21	190	94	97	94
query22	5154	5021	5034	5021
query23	34331	33747	33448	33448
query24	6797	6377	6359	6359
query25	535	435	426	426
query26	1008	164	157	157
query27	2335	292	287	287
query28	6034	2256	2235	2235
query29	3037	2654	2672	2654
query30	240	166	166	166
query31	961	749	768	749
query32	70	64	60	60
query33	445	261	257	257
query34	861	482	500	482
query35	1129	899	961	899
query36	1284	1239	1172	1172
query37	93	62	60	60
query38	3104	2907	2890	2890
query39	1406	1323	1345	1323
query40	202	97	94	94
query41	40	38	36	36
query42	86	80	82	80
query43	786	658	613	613
query44	1204	716	725	716
query45	242	233	230	230
query46	1246	970	940	940
query47	1813	1747	1760	1747
query48	491	415	399	399
query49	628	372	384	372
query50	863	622	580	580
query51	4824	4645	4730	4645
query52	93	85	85	85
query53	219	179	187	179
query54	2658	2442	2460	2442
query55	91	78	83	78
query56	203	217	207	207
query57	1282	1212	1257	1212
query58	221	209	194	194
query59	3422	3125	3182	3125
query60	223	206	206	206
query61	99	95	95	95
query62	820	449	455	449
query63	207	178	178	178
query64	5079	1599	1432	1432
query65	3644	3596	3592	3592
query66	645	423	393	393
query67	16422	15421	15952	15421
query68	10691	646	641	641
query69	523	290	283	283
query70	1808	1270	1541	1270
query71	417	313	311	311
query72	6904	5046	4577	4577
query73	763	323	319	319
query74	6226	5826	5779	5779
query75	5373	3818	3799	3799
query76	6202	1140	1138	1138
query77	1093	264	264	264
query78	12560	11759	11776	11759
query79	6658	651	623	623
query80	1300	391	388	388
query81	478	240	237	237
query82	1656	101	96	96
query83	186	135	133	133
query84	263	70	73	70
query85	881	321	319	319
query86	333	301	321	301
query87	3192	3081	3035	3035
query88	4407	2291	2290	2290
query89	388	309	289	289
query90	1963	208	214	208
query91	159	135	126	126
query92	60	55	52	52
query93	6337	537	555	537
query94	707	210	209	209
query95	2038	2070	1944	1944
query96	645	325	329	325
query97	6477	6317	6495	6317
query98	241	207	212	207
query99	2942	846	843	843
Total cold run time: 321588 ms
Total hot run time: 212302 ms

@doris-robot

ClickBench: Total hot run time: 30.59 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 318876213002dd21f70dc2328aa361e9c5c754e8, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.02	0.02	0.03
query2	0.07	0.02	0.02
query3	0.25	0.05	0.05
query4	1.80	0.06	0.06
query5	0.54	0.52	0.52
query6	1.24	0.60	0.61
query7	0.02	0.01	0.01
query8	0.04	0.02	0.02
query9	0.53	0.49	0.48
query10	0.54	0.53	0.53
query11	0.12	0.09	0.09
query12	0.12	0.09	0.10
query13	0.63	0.61	0.62
query14	0.78	0.77	0.80
query15	0.80	0.77	0.76
query16	0.37	0.39	0.35
query17	0.99	1.02	1.03
query18	0.24	0.26	0.21
query19	1.93	1.84	1.86
query20	0.01	0.01	0.01
query21	15.49	0.56	0.55
query22	2.05	2.03	2.01
query23	17.29	0.96	1.09
query24	7.36	1.05	0.56
query25	0.37	0.07	0.05
query26	0.71	0.15	0.15
query27	0.05	0.05	0.04
query28	6.24	0.76	0.73
query29	12.72	2.28	2.14
query30	0.60	0.55	0.52
query31	2.80	0.39	0.37
query32	3.38	0.49	0.51
query33	3.09	3.06	3.10
query34	15.27	4.81	4.80
query35	4.84	4.84	4.82
query36	1.06	1.02	1.02
query37	0.06	0.05	0.05
query38	0.04	0.02	0.02
query39	0.02	0.02	0.01
query40	0.16	0.14	0.14
query41	0.07	0.02	0.01
query42	0.02	0.02	0.01
query43	0.02	0.02	0.02
Total cold run time: 104.75 s
Total hot run time: 30.59 s

@doris-robot

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 318876213002dd21f70dc2328aa361e9c5c754e8 with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          58 seconds loaded 1101869774 Bytes, about 18 MB/s
Stream load parquet:      32 seconds loaded 861443392 Bytes, about 25 MB/s
Insert into select:       21.7 seconds inserted 10000000 Rows, about 460K ops/s

@liaoxin01 liaoxin01 merged commit 5609a4c into apache:branch-2.0 Oct 24, 2024
19 of 23 checks passed
@liaoxin01 liaoxin01 deleted the pick_42296_to_origin_branch-2.0 branch October 24, 2024 08:13
Contributor

clang-tidy review says "All clean, LGTM! 👍"

2 participants