DAOS-16620 vos: exit scrub earlier during container destruction #15957

wangshilong · 2025-02-22T13:04:22Z

In sc_wait_until_should_continue(), the function could sleep for over 60 seconds before proceeding, causing container destruction timeouts and DER_BUSY errors during retries. This fix adds 1-second interval checks for sc_cont_is_stopping() during sleep cycles.

Additionally, this change enhances error logging for the case where container destruction is already in progress.
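The idea described above, splitting one long sleep into 1-second slices and checking a stop flag between them, can be sketched roughly as follows. This is a minimal illustration and not the code in src/vos/vos_pool_scrub.c: `demo_scrub_ctx`, `demo_sleep()` and `demo_wait_interruptible()` are hypothetical names, and the sleep is simulated with a counter so the example runs instantly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scrubber context; not the real DAOS struct. */
struct demo_scrub_ctx {
	bool     stopping;      /* set once container destruction has started */
	uint64_t slept_ms;      /* total simulated sleep time so far */
	uint64_t stop_after_ms; /* simulated time at which a destroy request arrives */
};

/* Simulated sleep: advances a fake clock instead of blocking. */
static void demo_sleep(struct demo_scrub_ctx *ctx, uint64_t ms)
{
	ctx->slept_ms += ms;
	if (ctx->slept_ms >= ctx->stop_after_ms)
		ctx->stopping = true;
}

/*
 * Wait for up to total_ms in slices of at most 1000 ms, checking the stop
 * flag before each slice. Returns true if the full wait elapsed without a
 * stop request, false if the container started stopping.
 */
static bool demo_wait_interruptible(struct demo_scrub_ctx *ctx, uint64_t total_ms)
{
	while (total_ms > 0) {
		uint64_t slice = total_ms < 1000 ? total_ms : 1000;

		if (ctx->stopping)
			return false;
		demo_sleep(ctx, slice);
		total_ms -= slice;
	}
	return !ctx->stopping;
}
```

With a 60-second wait and a stop request arriving at the 3-second mark, this sketch returns after about 3 simulated seconds instead of the full 60, which is the behavior the description is aiming for.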

Test-tag: test_soak_smoke

Before requesting gatekeeper:

Two review approvals and any prior change requests have been resolved.
Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Commit messages follow the guidelines outlined here.
Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

github-actions · 2025-02-22T13:04:41Z

Ticket title is 'soak/smoke.py:SoakSmoke.test_soak_smoke - failed to destroy container TestContainer_1: DER_IO(-2001)'
Status is 'In Review'
Labels: '2.6.1_issue,2.6.1rc2,2.6.2tb2,2.6.3rc2,2.6.3rc3,2.7.101tb,ci_2.6_weekly,ci_master_weekly,weekly_test'
https://daosio.atlassian.net/browse/DAOS-16620

daosbuild1 · 2025-02-22T13:57:25Z

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15957/2/testReport/

NiuYawei · 2025-02-24T02:53:37Z

src/vos/vos_pool_scrub.c

+				break;
+			sc_sleep(ctx, 1000);
+			sleep_seconds--;
+		}


sched_req_wakeup(cont->sc_pool->spc_scrubbing_req) is called on VOS container destroy, so this is not necessary.

daosbuild1 · 2025-02-25T21:21:23Z

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15957/5/execution/node/1557/log

Nasf-Fan · 2025-02-27T08:53:22Z

src/vos/vos_pool_scrub.c

 	if (sc_mode(ctx) == DAOS_SCRUB_MODE_TIMED) {
 		struct timespec	now;
 		uint64_t	msec_between;

 		d_gettime(&now);
 		while ((msec_between = sc_get_ms_between_scrubs(ctx)) > 0) {
 			d_tm_set_gauge(ctx->sc_metrics.scm_next_csum_scrub, msec_between);
+			if (sc_cont_is_stopping(ctx))


Strictly speaking, checking for stopping before d_tm_set_gauge() or after sc_sleep() seems better. Anyway, it is not serious and can be adjusted later.

You are right. Fixed.
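One of the two placements suggested in the comment above, checking for stopping before the gauge update, might look like the sketch below. It is an assumption-laden illustration rather than the change that was actually pushed: `demo_gauge_ctx` and `demo_wait_check_first()` are hypothetical names, the metric is a plain integer field, and the sleep is simulated by consuming the remaining time.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scrubber state; not the real DAOS context. */
struct demo_gauge_ctx {
	bool     stopping;      /* container destroy has been requested */
	uint64_t remaining_ms;  /* time left until the next scrub */
	uint64_t gauge;         /* last value published to the metric */
	int      gauge_updates; /* number of times the metric was written */
};

/* Returns true if the wait completed, false if it was abandoned for a stop. */
static bool demo_wait_check_first(struct demo_gauge_ctx *ctx)
{
	while (ctx->remaining_ms > 0) {
		uint64_t slice = ctx->remaining_ms < 1000 ? ctx->remaining_ms : 1000;

		/* Check first, so a stopping container publishes no metric update. */
		if (ctx->stopping)
			return false;

		ctx->gauge = ctx->remaining_ms;
		ctx->gauge_updates++;

		/* Simulated 1-second sleep: just consume the slice. */
		ctx->remaining_ms -= slice;
	}
	return true;
}
```

Placing the check ahead of the gauge write means a container that is already stopping leaves the loop without touching the metric or sleeping at all.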

wangshilong force-pushed the shilongw/DAOS-16620 branch from f0dc6aa to ded11fc Compare February 22, 2025 13:12

wangshilong changed the title ~~DAOS-16620 vos: Exit scrub earlier during container destruction~~ DAOS-16620 vos: exit scrub earlier during container destruction Feb 22, 2025

wangshilong force-pushed the shilongw/DAOS-16620 branch from ded11fc to 38c35c5 Compare February 24, 2025 02:41

wangshilong marked this pull request as ready for review February 24, 2025 02:43

wangshilong requested review from a team as code owners February 24, 2025 02:43

wangshilong requested review from NiuYawei, Nasf-Fan, jolivier23 and ryon-jensen February 24, 2025 02:43

NiuYawei requested changes Feb 24, 2025

wangshilong added 2 commits February 24, 2025 21:03

address comments

3258ac9

Signed-off-by: Wang Shilong <[email protected]>

Merge branch 'master' of github.com:daos-stack/daos into shilongw/DAO…

e22f090

…S-16620

wangshilong requested a review from NiuYawei February 25, 2025 02:05

test tag

97b3557

Test-tag: test_soak_smoke pr Signed-off-by: Wang Shilong <[email protected]>

NiuYawei previously approved these changes Feb 27, 2025

Nasf-Fan previously approved these changes Feb 27, 2025

wangshilong added 2 commits February 27, 2025 10:41

Merge branch 'master' of github.com:daos-stack/daos into shilongw/DAO…

d090ef5

…S-16620

address comments

9631f72

Test-tag: test_soak_smoke pr Signed-off-by: Wang Shilong <[email protected]>

wangshilong dismissed stale reviews from Nasf-Fan and NiuYawei via 9631f72 February 27, 2025 15:46

wangshilong requested review from NiuYawei and Nasf-Fan February 27, 2025 15:47

Nasf-Fan approved these changes Feb 27, 2025
