[Fix](FS)Disable FileSystem Cache in RemoteFileSystem to Prevent Accidental Closure #47859

Open
wants to merge 2 commits into base: master


CalvinKirs
Member

Purpose:

An FS object managed by RemoteFileSystem can be inadvertently shared across different parts of the application through the file system cache. When one holder closes it, every other holder is left with a closed instance, and its next operation throws an exception. This PR disables the file system cache so that RemoteFileSystem-managed FS instances are no longer closed prematurely through shared references.
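
To make the failure mode concrete, here is a minimal, dependency-free sketch. `ToyFs`, `ToyFsFactory`, and `SharedCloseDemo` are hypothetical names invented for illustration; they only mimic a per-scheme instance cache and are not the Doris or Hadoop classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a closeable file system handle (assumption, not Hadoop's FileSystem). */
class ToyFs implements AutoCloseable {
    private boolean closed = false;

    String read(String path) {
        if (closed) {
            throw new IllegalStateException("Filesystem closed");
        }
        return "data:" + path;
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Hypothetical factory that mimics a cache keyed by scheme. */
class ToyFsFactory {
    private final Map<String, ToyFs> cache = new HashMap<>();
    private final boolean cacheDisabled;

    ToyFsFactory(boolean cacheDisabled) {
        this.cacheDisabled = cacheDisabled;
    }

    ToyFs get(String scheme) {
        if (cacheDisabled) {
            // Cache off: every caller gets its own instance.
            return new ToyFs();
        }
        // Cache on: callers asking for the same scheme share one instance.
        return cache.computeIfAbsent(scheme, s -> new ToyFs());
    }
}

class SharedCloseDemo {
    /** Returns true if holder B can still read after holder A closes its handle. */
    static boolean survivesForeignClose(boolean cacheDisabled) {
        ToyFsFactory factory = new ToyFsFactory(cacheDisabled);
        ToyFs a = factory.get("s3");
        ToyFs b = factory.get("s3");
        a.close();
        try {
            b.read("/tmp/x");
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("cache on:  " + survivesForeignClose(false)); // false
        System.out.println("cache off: " + survivesForeignClose(true));  // true
    }
}
```

With the cache on, the second holder fails after the first one closes the shared handle; with the cache off, each holder owns its instance and is unaffected.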

Changes:

Disabled the cache for file system implementations (such as S3, HDFS, GCS, etc.) by setting `fs.<scheme>.impl.disable.cache=true`, so a new FileSystem instance is created on each request rather than reused from the cache. This addresses cases where a RemoteFileSystem-managed file system instance was closed unexpectedly in multi-threaded or multi-reference scenarios, which triggered exceptions on the prematurely closed FileSystem instance.
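
As a rough sketch of what the setting looks like per scheme, the helper below builds the property keys and collects them in a plain map. `DisableCacheProps` and the scheme strings `s3a`, `hdfs`, and `gs` are assumptions for illustration; the PR text does not list the exact schemes or show where the properties are applied.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; builds the per-scheme "disable cache" properties. */
class DisableCacheProps {
    /** Expands fs.<scheme>.impl.disable.cache for one scheme. */
    static String disableCacheKey(String scheme) {
        return String.format("fs.%s.impl.disable.cache", scheme);
    }

    /** Returns key -> "true" for every scheme, in the order given. */
    static Map<String, String> forSchemes(List<String> schemes) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String scheme : schemes) {
            props.put(disableCacheKey(scheme), "true");
        }
        return props;
    }

    public static void main(String[] args) {
        // Scheme names here are illustrative assumptions.
        forSchemes(Arrays.asList("s3a", "hdfs", "gs"))
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting entries would then be copied into whatever configuration object the file system is created from. Note the trade-off: with the cache off, each caller becomes responsible for closing the instance it obtained.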

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@CalvinKirs
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31604 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f3209bcc701f1e0b22afcb664d36d08dd92e290b, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17568	5127	5102	5102
q2	2046	302	173	173
q3	10391	1251	756	756
q4	10222	1015	526	526
q5	7488	2384	2371	2371
q6	195	165	131	131
q7	889	757	590	590
q8	9305	1166	1094	1094
q9	4925	4646	4849	4646
q10	6813	2287	1901	1901
q11	504	272	260	260
q12	341	355	216	216
q13	17784	3630	3074	3074
q14	232	219	216	216
q15	520	482	454	454
q16	630	622	583	583
q17	555	869	337	337
q18	6601	6178	6173	6173
q19	1758	963	549	549
q20	308	312	187	187
q21	2759	2289	1969	1969
q22	362	333	296	296
Total cold run time: 102196 ms
Total hot run time: 31604 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5105	5062	5071	5062
q2	228	325	230	230
q3	2122	2770	2407	2407
q4	1588	1858	1437	1437
q5	4324	4217	4242	4217
q6	211	171	123	123
q7	1918	1839	1701	1701
q8	2586	2574	2539	2539
q9	7243	7190	7111	7111
q10	2981	3210	2769	2769
q11	565	520	495	495
q12	695	771	614	614
q13	3428	3897	3307	3307
q14	275	295	272	272
q15	505	464	470	464
q16	641	676	629	629
q17	1138	1618	1300	1300
q18	7540	7258	7247	7247
q19	792	746	810	746
q20	1950	2031	1834	1834
q21	5437	4917	4831	4831
q22	634	564	518	518
Total cold run time: 51906 ms
Total hot run time: 49853 ms
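
The totals in these tables appear to follow a simple rule: the cold total sums the first timing column, and the hot total sums the faster of the two hot runs for each query. The sketch below reproduces that arithmetic on three Round 1 rows (q1, q2, q9); `BenchTotals` is a hypothetical helper, not part of the Doris benchmark scripts.

```java
/** Hypothetical helper that recomputes cold and hot totals from per-query timings. */
class BenchTotals {
    /** Each row is {cold, hot1, hot2} in milliseconds. */
    static long totalCold(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += r[0];
        }
        return sum;
    }

    /** Sums the faster of the two hot runs for every query. */
    static long totalHot(long[][] rows) {
        long sum = 0;
        for (long[] r : rows) {
            sum += Math.min(r[1], r[2]);
        }
        return sum;
    }

    public static void main(String[] args) {
        long[][] rows = {
            {17568, 5127, 5102}, // q1
            {2046, 302, 173},    // q2
            {4925, 4646, 4849},  // q9
        };
        System.out.println("cold: " + totalCold(rows)); // cold: 24539
        System.out.println("hot:  " + totalHot(rows));  // hot:  9921
    }
}
```

Applied to all 22 rows of Round 1, the same rule gives the reported 102196 ms cold and 31604 ms hot.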

@doris-robot

TPC-DS: Total hot run time: 189898 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f3209bcc701f1e0b22afcb664d36d08dd92e290b, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	1303	968	919	919
query2	6242	1866	1846	1846
query3	11117	4592	4322	4322
query4	57178	24731	22998	22998
query5	5176	518	486	486
query6	372	193	196	193
query7	5169	519	292	292
query8	321	242	244	242
query9	6597	2565	2589	2565
query10	392	321	250	250
query11	15320	15096	14910	14910
query12	162	112	109	109
query13	1194	527	409	409
query14	11006	6638	6399	6399
query15	211	198	178	178
query16	7072	664	492	492
query17	1113	723	574	574
query18	1583	425	325	325
query19	205	198	169	169
query20	136	136	127	127
query21	209	131	116	116
query22	4371	4442	4410	4410
query23	33876	33384	33257	33257
query24	5784	2420	2426	2420
query25	463	461	410	410
query26	705	274	155	155
query27	1888	507	338	338
query28	2794	2437	2428	2428
query29	553	559	431	431
query30	214	216	158	158
query31	866	918	824	824
query32	76	62	62	62
query33	469	346	324	324
query34	769	857	506	506
query35	816	828	778	778
query36	941	990	919	919
query37	120	103	72	72
query38	4331	4377	4276	4276
query39	1466	1455	1437	1437
query40	209	117	102	102
query41	52	52	47	47
query42	121	105	104	104
query43	500	522	478	478
query44	1322	821	814	814
query45	176	185	168	168
query46	889	1048	649	649
query47	1850	1910	1799	1799
query48	399	405	321	321
query49	702	535	426	426
query50	705	749	417	417
query51	4191	4363	4296	4296
query52	107	109	106	106
query53	244	261	191	191
query54	488	489	420	420
query55	83	83	83	83
query56	264	282	250	250
query57	1167	1188	1126	1126
query58	267	242	239	239
query59	2801	2907	2835	2835
query60	276	274	272	272
query61	125	118	114	114
query62	750	765	661	661
query63	246	193	194	193
query64	1801	1079	701	701
query65	3231	3150	3135	3135
query66	714	391	304	304
query67	15789	15542	15558	15542
query68	5486	774	512	512
query69	519	320	257	257
query70	1219	1146	1125	1125
query71	426	292	270	270
query72	6000	3622	3718	3622
query73	1152	735	345	345
query74	9007	9109	8919	8919
query75	3271	3168	2721	2721
query76	3882	1191	752	752
query77	546	372	282	282
query78	9916	10021	9300	9300
query79	2180	802	575	575
query80	592	514	449	449
query81	500	285	237	237
query82	242	124	94	94
query83	168	164	150	150
query84	283	102	73	73
query85	791	345	302	302
query86	423	326	284	284
query87	4523	4540	4356	4356
query88	3445	2178	2144	2144
query89	397	317	282	282
query90	1855	191	191	191
query91	135	141	107	107
query92	79	61	60	60
query93	1923	1009	575	575
query94	624	410	275	275
query95	345	262	256	256
query96	486	563	263	263
query97	2831	2848	2739	2739
query98	238	205	204	204
query99	1346	1371	1258	1258
Total cold run time: 296879 ms
Total hot run time: 189898 ms

@doris-robot

ClickBench: Total hot run time: 30.32 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f3209bcc701f1e0b22afcb664d36d08dd92e290b, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.10
query5	0.43	0.41	0.41
query6	1.18	0.66	0.65
query7	0.02	0.02	0.02
query8	0.03	0.04	0.03
query9	0.59	0.52	0.52
query10	0.56	0.58	0.57
query11	0.15	0.11	0.10
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	2.71	2.81	2.69
query15	0.93	0.85	0.84
query16	0.37	0.37	0.36
query17	1.03	1.01	1.04
query18	0.22	0.20	0.20
query19	1.91	1.81	1.96
query20	0.01	0.01	0.02
query21	15.38	0.89	0.55
query22	0.77	1.16	0.63
query23	14.98	1.36	0.60
query24	7.27	1.29	0.64
query25	0.51	0.11	0.23
query26	0.72	0.17	0.14
query27	0.05	0.06	0.05
query28	8.91	0.88	0.43
query29	12.68	3.94	3.23
query30	0.25	0.10	0.06
query31	2.82	0.60	0.38
query32	3.23	0.55	0.48
query33	3.00	3.01	3.02
query34	15.78	5.17	4.51
query35	4.54	4.56	4.62
query36	0.68	0.51	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.02
query40	0.16	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.93 s
Total hot run time: 30.32 s


3 participants