-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-14903 container: refine task process in pmap_refresh_cb (#13491) #13682
Conversation
Bug-tracker data: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should at least run the modified functional test and possibly some other features.
Features: EcodOnlineMultFail
should register completion callback before task reinit, or the complete cb possibly cannot be triggered. and a few other backports: 6d4e549 - DAOS-14788 pool: Fix some reinit usages (#13518) 91b93c8 - DAOS-13252 tests: set svcn for multiple_failure test (#13619) d30e842 - DAOS-14903 object: fix bug in peer status check (#13585) Required-githooks: true Test-tag: pr ec_multiple_failure Signed-off-by: Xuezhao Liu <[email protected]> Signed-off-by: Li Wei <[email protected]>
890a562
to
1d6011d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ftest LGTM
@@ -1974,6 +1981,7 @@ map_refresh(tse_task_t *task) | |||
|
|||
D_DEBUG(DB_MD, DF_UUID": %p: asking rank %u for version > %u\n", | |||
DP_UUID(pool->dp_pool), task, rank, version); | |||
dc_pool_put(pool); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we hold the reference during the RPC process on server side, and then ask map_refresh_cb() to release the reference? Then map_refresh_cb() do not need to get reference again. Otherwise, if it is possible that the pool may be freed before map_refresh_cb() triggered, then dc_pool_get() in map_refresh_cb() may access invalid memory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave it to @liw reply as this part is backport from one of his PR, thx.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Nasf-Fan, arg->mra_pool
is the reference that prevents the dc_pool
from being freed before map_refresh_cb
accesses it. The extra references taken in this PR are for the tricky reinit
cases, where anther thread could execute the reinitialized task and (in theory) free the arg->mra_pool
reference after the reinit
call but before the function that calls reinit
returns.
Consider the following scenario: map_refresh() hold the reference at the beginning of the function, then registers map_refresh_cb for the task. Before RPC sending, it puts the reference on the pool. At that time, map_refresh_cb is not called yet. It is possible that someone free the pool before map_refresh_cb() being triggered, right? if yes, then calling dc_pool_get(pool) in map_refresh_cb() will access invalid memory? |
as my understanding, before map_refresh() called, in dc_pool_create_map_refresh_task() it take one extra reference "dc_pool_get(pool); a->mra_pool = pool;", and that reference will be released at map_refresh_cb()'s "!reinit" case. that would aovid the case you worried? @Nasf-Fan |
Thanks for the explain. I am wondering whether map_refresh_cb() can properly distinguish "reinit" or not, for example, if someone triggered reinit, then as shown following:
At that time, dt_result is reset as zero, then whether set "reinit" flag depends on pool_tgt_query_map_out::po_rc, but related (re-sent) RPC may be not handled by server, then can be zero also? The issue is not directly related with the back port patch. But may be worth for us to consider more. |
@Nasf-Fan, the |
I see, thanks |
should register completion callback before task reinit, or the complete cb possibly cannot be triggered.
and a few other backports:
6d4e549 - DAOS-14788 pool: Fix some reinit usages (#13518)
91b93c8 - DAOS-13252 tests: set svcn for multiple_failure test (#13619)
d30e842 - DAOS-14903 object: fix bug in peer status check (#13585)
Required-githooks: true
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: