DAOS-16620 vos: exit scrub earlier during container destruction #15957
base: master
Ticket title is 'soak/smoke.py:SoakSmoke.test_soak_smoke - failed to destroy container TestContainer_1: DER_IO(-2001)'
f0dc6aa to ded11fc
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15957/2/testReport/
In sc_wait_until_should_continue(), the function could sleep for over 60 seconds before proceeding, causing container destruction timeouts and DER_BUSY errors during retries. This fix adds 1-second interval checks for sc_cont_is_stopping() during sleep cycles. Additionally, enhance error logging when container destruction is already in progress.
Test-tag: test_soak_smoke pr
Signed-off-by: Wang Shilong <[email protected]>
ded11fc to 38c35c5
src/vos/vos_pool_scrub.c
			break;
		sc_sleep(ctx, 1000);
		sleep_seconds--;
	}
sched_req_wakeup(cont->sc_pool->spc_scrubbing_req) is already called on VOS container destroy, so this is not necessary.
Signed-off-by: Wang Shilong <[email protected]>
Test-tag: test_soak_smoke pr
Signed-off-by: Wang Shilong <[email protected]>
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15957/5/execution/node/1557/log
src/vos/vos_pool_scrub.c
	if (sc_mode(ctx) == DAOS_SCRUB_MODE_TIMED) {
		struct timespec	now;
		uint64_t	msec_between;

		d_gettime(&now);
		while ((msec_between = sc_get_ms_between_scrubs(ctx)) > 0) {
			d_tm_set_gauge(ctx->sc_metrics.scm_next_csum_scrub, msec_between);
			if (sc_cont_is_stopping(ctx))
Strictly, checking for stopping before d_tm_set_gauge() or after sc_sleep() seems better. Anyway, it is not serious and can be adjusted later.
You are right. Fixed.
Test-tag: test_soak_smoke pr
Signed-off-by: Wang Shilong <[email protected]>
In sc_wait_until_should_continue(), the function could sleep for over 60 seconds before proceeding, causing container destruction timeouts and DER_BUSY errors during retries. This fix adds 1-second interval checks for sc_cont_is_stopping() during sleep cycles.
Additionally, error logging is enhanced for the case where container destruction is already in progress.
Test-tag: test_soak_smoke
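For readers following the description above, here is a minimal standalone sketch of the idea: split one long wait into 1-second slices and check a stop flag before each slice. This is not the code from the patch. The struct `demo_ctx`, the `demo_*` functions, and the simulated sleep callback are all hypothetical stand-ins for the real scrub context, sc_cont_is_stopping(), and sc_sleep(), and the sleep is simulated so the example runs instantly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SLICE_MS 1000 /* check the stop flag at least once per second */

/* Hypothetical stand-in for the scrub context; NOT the real VOS structure. */
struct demo_ctx {
	uint64_t remaining_ms; /* time left until the next scrub may start */
	bool     stopping;     /* set when the container is being destroyed */
	uint64_t slept_ms;     /* total simulated sleep time, for inspection */
	void   (*sleep_fn)(struct demo_ctx *ctx, uint64_t ms);
};

/* Simulated sleep: only advances the counters, never blocks. */
static void demo_fake_sleep(struct demo_ctx *ctx, uint64_t ms)
{
	ctx->slept_ms += ms;
	ctx->remaining_ms -= ms;
}

/* Simulated sleep that requests a stop once 3 seconds have elapsed. */
static void demo_sleep_then_stop(struct demo_ctx *ctx, uint64_t ms)
{
	demo_fake_sleep(ctx, ms);
	if (ctx->slept_ms >= 3000)
		ctx->stopping = true;
}

/*
 * Wait until remaining_ms reaches zero, one slice at a time.
 * Returns true if the full wait elapsed, false if a stop request
 * was seen first. The stop check runs before each slice, so a stop
 * request is noticed within roughly one slice.
 */
static bool demo_wait_until_next_scrub(struct demo_ctx *ctx)
{
	assert(ctx != NULL && ctx->sleep_fn != NULL);

	while (ctx->remaining_ms > 0) {
		uint64_t slice;

		if (ctx->stopping)
			return false;

		slice = ctx->remaining_ms < DEMO_SLICE_MS ?
			ctx->remaining_ms : DEMO_SLICE_MS;
		ctx->sleep_fn(ctx, slice);
	}
	return true;
}
```

With a 60-second wait and a stop request arriving after 3 seconds, the sketch returns after the third slice instead of sleeping for the full minute, which is the behavior the description aims for on container destroy.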
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: