[feature](script) Add check_jvm_xmx for start_fe.sh #28989

Merged (1 commit) on Jan 6, 2024

SWJTU-ZhangLei
Contributor

  • When -Xmx is configured to more than 90% of total physical memory, start_fe.sh refuses to start, because the FE process would very likely be killed by the operating system.
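
A minimal sketch of the comparison described above, assuming the two byte counts have already been obtained (the CI output later in this thread shows the script reading them from `free -b` and `java -XX:+PrintFlagsFinal`). The function name `xmx_exceeds_limit` and its arguments are hypothetical, chosen so the check can be exercised without a JVM:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit status 0) when the heap limit is above
# 90% of physical memory. Both arguments are byte counts.
xmx_exceeds_limit() {
    local total_mem_byte="$1"
    local jvm_xmx_byte="$2"
    local ninety_percent_mem_mb=$((total_mem_byte * 9 / 10 / 1024 / 1024))
    local jvm_xmx_mb=$((jvm_xmx_byte / 1024 / 1024))
    [[ ${jvm_xmx_mb} -gt ${ninety_percent_mem_mb} ]]
}

# 64 GiB host with a 60 GiB heap: 61440 MB > 58982 MB, so the message is printed.
if xmx_exceeds_limit $((64 * 1024 * 1024 * 1024)) $((60 * 1024 * 1024 * 1024)); then
    echo "java opt -Xmx is more than 90% of total physical memory"
fi
```

Keeping the comparison separate from the commands that collect the numbers is only a convenience for this sketch; it is not necessarily how the merged script is organized.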

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@SWJTU-ZhangLei
Contributor Author

run buildall

Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors

'shellcheck ' returned error 1 finding the following syntactical issues:

----------

In bin/start_fe.sh line 201:
    local os_type=$(uname -s)
          ^-----^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In bin/start_fe.sh line 203:
        local total_mem_byte="$(free -b | grep Mem | awk '{print $2}')"
              ^------------^ SC2155 (warning): Declare and assign separately to avoid masking return values.


In bin/start_fe.sh line 204:
        local jvm_xmx_byte="$(${JAVA} ${final_java_opt} -XX:+PrintFlagsFinal -version 2>&1 | awk '/MaxHeapSize/ {print $4}')"
              ^----------^ SC2155 (warning): Declare and assign separately to avoid masking return values.
                                      ^---------------^ SC2086 (info): Double quote to prevent globbing and word splitting.

Did you mean: 
        local jvm_xmx_byte="$(${JAVA} "${final_java_opt}" -XX:+PrintFlagsFinal -version 2>&1 | awk '/MaxHeapSize/ {print $4}')"


In bin/start_fe.sh line 205:
        local total_mem_mb=$(($total_mem_byte / 1024 / 1024))
                              ^-------------^ SC2004 (style): $/${} is unnecessary on arithmetic variables.
                              ^-------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
        local total_mem_mb=$((${total_mem_byte} / 1024 / 1024))


In bin/start_fe.sh line 206:
        local ninety_percent_mem_mb=$(($total_mem_byte / 10 * 9 / 1024 / 1024))
                                       ^-------------^ SC2004 (style): $/${} is unnecessary on arithmetic variables.
                                       ^-------------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.
                                                       ^-- SC2017 (info): Increase precision by replacing a/b*c with a*c/b.

Did you mean: 
        local ninety_percent_mem_mb=$((${total_mem_byte} / 10 * 9 / 1024 / 1024))


In bin/start_fe.sh line 207:
        local jvm_xmx_mb=$(($jvm_xmx_byte / 1024 / 1024))
                            ^-----------^ SC2004 (style): $/${} is unnecessary on arithmetic variables.
                            ^-----------^ SC2250 (style): Prefer putting braces around variable references even when not strictly required.

Did you mean: 
        local jvm_xmx_mb=$((${jvm_xmx_byte} / 1024 / 1024))


In bin/start_fe.sh line 209:
        if [ ${jvm_xmx_mb} -gt ${ninety_percent_mem_mb} ]; then
           ^-- SC2292 (style): Prefer [[ ]] over [ ] for tests in Bash/Ksh.
             ^-----------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.
                               ^----------------------^ SC2248 (style): Prefer double quoting even when variables don't contain special characters.

Did you mean: 
        if [[ "${jvm_xmx_mb}" -gt "${ninety_percent_mem_mb}" ]]; then

For more information:
  https://www.shellcheck.net/wiki/SC2155 -- Declare and assign separately to ...
  https://www.shellcheck.net/wiki/SC2017 -- Increase precision by replacing a...
  https://www.shellcheck.net/wiki/SC2086 -- Double quote to prevent globbing ...
----------
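
To illustrate SC2155 from the report above: `local` is itself a command, so its exit status replaces that of the command substitution on the same line. A small sketch (the function names `masked` and `unmasked` are made up for this example):

```shell
#!/usr/bin/env bash
masked() {
    # 'local' succeeds, which hides the failure of 'false'.
    local value=$(false)
    echo "$?"
}

unmasked() {
    local value
    # The status of a plain assignment is that of the command substitution.
    value=$(false)
    echo "$?"
}

masked    # prints 0
unmasked  # prints 1
```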

You can address the above issues in one of three ways:
1. Manually correct the issue in the offending shell script;
2. Disable specific issues by adding the comment:
  # shellcheck disable=NNNN
above the line that contains the issue, where NNNN is the error code;
3. Add '-e NNNN' to the SHELLCHECK_OPTS setting in your .yml action file.
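
SC2017 matters because shell arithmetic is integer-only: dividing first discards the remainder before the multiplication happens. A short sketch with an arbitrary value (illustrative only, not taken from the test machine):

```shell
#!/usr/bin/env bash
total_mem_byte=1999

divide_first=$((total_mem_byte / 10 * 9))    # 199 * 9 = 1791
multiply_first=$((total_mem_byte * 9 / 10))  # 17991 / 10 = 1799

echo "divide first:   ${divide_first}"
echo "multiply first: ${multiply_first}"
```

For realistic memory sizes the difference is only a few bytes and disappears after the conversion to MB, so multiplying first is mainly the safer habit, provided the product stays within the shell's integer range.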



shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_fe.sh.orig
+++ bin/start_fe.sh
@@ -209,7 +209,7 @@
         if [ ${jvm_xmx_mb} -gt ${ninety_percent_mem_mb} ]; then
             echo "java opt -Xmx is more than 90% of total physical memory"
             echo "total_mem_mb:${total_mem_mb}MB ninety_percent_mem_mb:${ninety_percent_mem_mb}MB jvm_xmx_mb:${jvm_xmx_mb}MB"
-            exit 1;
+            exit 1
         fi
     fi
 }
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename


@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.45 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 40.1 seconds inserted 10000000 Rows, about 249K ops/s
storage size: 17187391683 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit fb648e7ccf92496f1a60ac81fff10d6d386de810, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4695	4368	4389	4368
q2	363	152	159	152
q3	1452	1241	1210	1210
q4	1128	909	887	887
q5	3156	3174	3168	3168
q6	253	131	131	131
q7	997	510	484	484
q8	2183	2236	2203	2203
q9	6689	6631	6682	6631
q10	3223	3256	3283	3256
q11	312	184	191	184
q12	366	210	209	209
q13	4534	3825	3842	3825
q14	242	220	218	218
q15	578	532	530	530
q16	438	384	381	381
q17	1017	681	594	594
q18	7080	6858	6805	6805
q19	1518	1402	1433	1402
q20	539	296	315	296
q21	3063	2674	2645	2645
q22	351	279	285	279
Total cold run time: 44177 ms
Total hot run time: 39858 ms
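
The two totals above appear to be column sums: adding the first numeric column over the 22 rows gives 44177, and adding the last column gives 39858. Assuming the columns are the cold run, two hot runs, and the best hot run (the robot's output does not label them), a sketch that recomputes the sums for the first three rows:

```shell
#!/usr/bin/env bash
# Sum the cold (field 2) and best-hot (field 5) times from tab-separated rows.
sum_times() {
    awk '{ cold += $2; hot += $5 } END { printf "cold=%d hot=%d\n", cold, hot }'
}

printf 'q1\t4695\t4368\t4389\t4368\nq2\t363\t152\t159\t152\nq3\t1452\t1241\t1210\t1210\n' | sum_times
# prints: cold=6510 hot=5730
```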

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4328	4329	4355	4329
q2	271	165	173	165
q3	3496	3505	3484	3484
q4	2399	2376	2376	2376
q5	5696	5700	5714	5700
q6	241	122	123	122
q7	2385	1854	1857	1854
q8	3538	3520	3521	3520
q9	8977	8966	9028	8966
q10	3917	3996	3996	3996
q11	485	365	372	365
q12	767	592	598	592
q13	4299	3566	3534	3534
q14	289	253	258	253
q15	566	517	520	517
q16	506	453	459	453
q17	1881	1873	1878	1873
q18	8557	8167	8216	8167
q19	1742	1767	1745	1745
q20	2250	1948	1923	1923
q21	6492	6138	6172	6138
q22	501	426	413	413
Total cold run time: 63583 ms
Total hot run time: 60485 ms

@SWJTU-ZhangLei
Contributor Author

run buildall

@SWJTU-ZhangLei
Contributor Author

run buildall

Contributor

sh-checker report

To get the full details, please check in the job output.

shellcheck errors
'shellcheck ' found no issues.

shfmt errors

'shfmt ' returned error 1 finding the following formatting issues:

----------
--- bin/start_fe.sh.orig
+++ bin/start_fe.sh
@@ -208,13 +208,13 @@
         total_mem_byte="$(free -b | grep Mem | awk '{print $2}')"
         jvm_xmx_byte="$(${JAVA} "${final_java_opt}" -XX:+PrintFlagsFinal -version 2>&1 | awk '/MaxHeapSize/ {print $4}')"
         total_mem_mb=$(("${total_mem_byte}" / 1024 / 1024))
-        ninety_percent_mem_mb=$(("${total_mem_byte}" * 9 / 10  / 1024 / 1024))
+        ninety_percent_mem_mb=$(("${total_mem_byte}" * 9 / 10 / 1024 / 1024))
         jvm_xmx_mb=$(("${jvm_xmx_byte}" / 1024 / 1024))
 
         if [[ ${jvm_xmx_mb} -gt ${ninety_percent_mem_mb} ]]; then
             echo "java opt -Xmx is more than 90% of total physical memory"
             echo "total_mem_mb:${total_mem_mb}MB ninety_percent_mem_mb:${ninety_percent_mem_mb}MB jvm_xmx_mb:${jvm_xmx_mb}MB"
-            exit 1;
+            exit 1
         fi
     fi
 }
----------

You can reformat the above files to meet shfmt's requirements by typing:

  shfmt  -w filename
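
One caveat about the quoted `"${final_java_opt}"` form in the diff above: if the variable holds several space-separated JVM options, quoting it passes them to `java` as a single argument. Whether that applies here depends on how `final_java_opt` is built, which this thread does not show. A sketch of the difference, using a hypothetical `count_args` helper and made-up option values, with a bash array as one common alternative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

java_opt_string="-Xmx8192m -XX:+UseG1GC"
java_opt_array=(-Xmx8192m -XX:+UseG1GC)

count_args "${java_opt_string}"    # prints 1: one argument containing a space
count_args "${java_opt_array[@]}"  # prints 2: each option is its own argument
```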


@SWJTU-ZhangLei
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 72619d80f7632765643de2d98d50d992a8a315ad, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4904	4728	4689	4689
q2	376	179	159	159
q3	1562	1412	1296	1296
q4	1226	1058	987	987
q5	3271	3286	3303	3286
q6	284	140	136	136
q7	1099	552	500	500
q8	2368	2413	2405	2405
q9	6937	6895	6906	6895
q10	3273	3345	3348	3345
q11	338	223	199	199
q12	394	207	219	207
q13	4552	3830	3811	3811
q14	242	213	213	213
q15	580	521	522	521
q16	449	388	376	376
q17	1062	823	604	604
q18	7145	6765	6885	6765
q19	1681	1699	1694	1694
q20	548	311	281	281
q21	3380	2904	2938	2904
q22	370	298	308	298
Total cold run time: 46041 ms
Total hot run time: 41571 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4619	4616	4605	4605
q2	300	166	171	166
q3	3608	3578	3570	3570
q4	2507	2473	2488	2473
q5	5828	5830	5840	5830
q6	276	129	128	128
q7	2399	1857	1843	1843
q8	3774	3773	3765	3765
q9	9075	9040	9009	9009
q10	3945	4061	4059	4059
q11	496	360	365	360
q12	793	640	604	604
q13	4313	3552	3575	3552
q14	292	267	258	258
q15	579	527	512	512
q16	503	464	464	464
q17	2075	2038	2028	2028
q18	8879	8308	8438	8308
q19	1950	1977	1968	1968
q20	2325	1940	1926	1926
q21	6787	6382	6391	6382
q22	556	459	487	459
Total cold run time: 65879 ms
Total hot run time: 62269 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.27 seconds
stream load tsv: 568 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 20.6 seconds inserted 10000000 Rows, about 485K ops/s
storage size: 17184182174 Bytes

@SWJTU-ZhangLei
Contributor Author

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Tpch sf100 test result on commit 3553c5b6b5d5feac4e5371a8e20e640c61713272, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4761	4488	4502	4488
q2	378	137	158	137
q3	1473	1321	1200	1200
q4	1151	992	924	924
q5	3177	3140	3168	3140
q6	260	134	132	132
q7	1022	489	495	489
q8	2241	2271	2243	2243
q9	6740	6681	6694	6681
q10	3215	3282	3309	3282
q11	333	209	204	204
q12	360	216	213	213
q13	4556	3834	3782	3782
q14	240	212	214	212
q15	569	523	527	523
q16	441	390	379	379
q17	1048	786	601	601
q18	7118	6851	6805	6805
q19	1585	1606	1598	1598
q20	588	319	319	319
q21	3233	2794	2803	2794
q22	365	299	309	299
Total cold run time: 44854 ms
Total hot run time: 40445 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4412	4409	4397	4397
q2	271	169	174	169
q3	3508	3496	3496	3496
q4	2450	2418	2414	2414
q5	5741	5741	5720	5720
q6	252	125	126	125
q7	2376	1848	1863	1848
q8	3603	3624	3610	3610
q9	8984	8949	8984	8949
q10	3927	4006	4017	4006
q11	489	358	350	350
q12	782	614	612	612
q13	4290	3588	3554	3554
q14	282	272	263	263
q15	563	524	524	524
q16	500	468	444	444
q17	1982	1951	1956	1951
q18	8756	8108	8293	8108
q19	1809	1814	1829	1814
q20	2249	1948	1934	1934
q21	6632	6241	6260	6241
q22	543	461	473	461
Total cold run time: 64401 ms
Total hot run time: 60990 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.85 seconds
stream load tsv: 564 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17183821275 Bytes

@SWJTU-ZhangLei
Contributor Author

run buildall

Contributor

github-actions bot commented Jan 3, 2024

PR approved by anyone and no changes requested.

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G', run with scripts in https://github.com/apache/doris/tree/master/tools/tpch-tools

Tpch sf100 test result on commit 80200b991282dc1551a29a3252e756121a0784b3, data reload: false

run tpch-sf100 query with default conf and session variables
q1	5474	5180	5187	5180
q2	398	177	159	159
q3	1475	1144	1164	1144
q4	1107	820	845	820
q5	3132	3147	3122	3122
q6	235	139	136	136
q7	997	575	520	520
q8	2181	2233	2239	2233
q9	6723	6710	6684	6684
q10	3187	3122	3164	3122
q11	339	229	214	214
q12	394	248	243	243
q13	4425	3668	3699	3668
q14	244	225	228	225
q15	611	549	546	546
q16	471	419	406	406
q17	1049	562	525	525
q18	7103	6892	7451	6892
q19	1663	1533	1513	1513
q20	603	342	336	336
q21	2878	2418	2458	2418
q22	400	333	344	333
Total cold run time: 45089 ms
Total hot run time: 40439 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	5160	5132	5091	5091
q2	333	245	259	245
q3	3378	3342	3319	3319
q4	2148	2082	2034	2034
q5	5950	5949	5927	5927
q6	227	131	132	131
q7	2390	1920	1932	1920
q8	3575	3692	3726	3692
q9	9103	9080	9010	9010
q10	3899	3927	3928	3927
q11	577	501	479	479
q12	812	635	640	635
q13	3950	3241	3220	3220
q14	307	288	275	275
q15	606	545	552	545
q16	559	521	513	513
q17	2034	1806	1818	1806
q18	8735	8422	8651	8422
q19	1739	1662	1683	1662
q20	2276	2015	1989	1989
q21	5752	5315	5398	5315
q22	559	495	531	495
Total cold run time: 64069 ms
Total hot run time: 60652 ms

@doris-robot

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.95 seconds
stream load tsv: 576 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.3 seconds inserted 10000000 Rows, about 353K ops/s
storage size: 17183980498 Bytes

Contributor

@dataroaring left a comment

LGTM

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) on Jan 5, 2024
Contributor

github-actions bot commented Jan 5, 2024

PR approved by at least one committer and no changes requested.

@yiguolei merged commit 8c9908c into apache:master on Jan 6, 2024
19 checks passed
shuke987 added a commit to shuke987/doris that referenced this pull request Jan 7, 2024
Gabriel39 pushed a commit that referenced this pull request Jan 7, 2024
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
HappenLee pushed a commit to HappenLee/incubator-doris that referenced this pull request Jan 12, 2024
Labels: approved, dev/3.0.0-merged, reviewed