-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-14884 pool: set the pool server handle before update #13618
Conversation
Bug-tracker data: |
@phender @daltonbohning Will this CI run touch any of the affected tests with no "Features" or "Test-tag" pragmas? I vote for force landing and then trigger a master daily run. |
"any" of the affected tests, yes. There are basic
We would already get results from the next daily run, so no need to trigger another. But again, if we expect some areas to be affected then we should run those tests now. Otherwise we introduce more regressions in CI that cause more churn and more people to look through known failures, and ultimately closes landing again. |
Set the pool server handler before IV update, to make sure IV server checking accurate on the PS leader once step up finish. Required-githooks: true Signed-off-by: Di Wang <[email protected]>
7ebfd1a
to
9e8be9c
Compare
PR is not being run with any pragmas. |
Right. Which means only the default set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems okay.
* container server handle will not be copied here, otherwise | ||
* ds_pool_iv_refresh_hdl will not open the server handle container. | ||
*/ | ||
uuid_copy(svc->ps_pool->sp_srv_pool_hdl, pool_hdl_uuid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, there seems to be a data race on sp_srv_pool_hdl
if a non-system xstream calls ds_pool_iv_srv_hdl_fetch_non_sys
to read sp_srv_pool_hdl
when the pool_iv
code or this new uuid_copy
code updates sp_srv_pool_hdl
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sp_srv_pool_hdl and sp_srv_cont_hdl will only be accessed by system xstream. Hmm, ds_pool_iv_srv_hdl_fetch_non_sys is obsolete, should be removed. let me remove it if the PR needs to refresh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, that's good.
src/pool/srv_pool.c
Outdated
@@ -1810,6 +1810,11 @@ pool_svc_step_up_cb(struct ds_rsvc *rsvc) | |||
} else { | |||
uuid_generate(pool_hdl_uuid); | |||
uuid_generate(cont_hdl_uuid); | |||
/* Only copy server handle to make is_from_srv() check correctly, and | |||
* container server handle will not be copied here, otherwise | |||
* ds_pool_iv_refresh_hdl will not open the server handle container. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Nit] Does the first line mean "pool server handle" instead of "server handle"? On the last line, what is "the server handle container"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, pool server handle means pool handle for server to server communication. Sorry, it should be container server handle.
Remove obsolete is_from_srv check for space query, and remove obsolete functions. Required-githooks: true Signed-off-by: Di Wang <[email protected]>
Updated the patch to resolve Li Wei's comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Locally tested with my reproducer.
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13618/3/execution/node/1434/log |
failure due to DAOS-14969 |
To verify the fix for the failing |
Features: pool Required-githooks: true
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13618/5/execution/node/1429/log |
Set the pool server handler before IV update, to make sure IV server checking accurate on the PS leader once step up finish.
Required-githooks: true
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: