Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16902 pool: Polish map change logging #15744

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

Conversation

liw
Copy link
Contributor

@liw liw commented Jan 17, 2025

No logic changes; just adding pool UUID and pool map address to pool map change logging, and logging important messages to both stdout and log.

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

No logic changes; just adding pool UUID and pool map address to pool map
change logging, and logging important messages to both stdout and log.

Signed-off-by: Li Wei <[email protected]>
Required-githooks: true
Copy link

Ticket title is 'LRZ: rebuild of pool is hanging'
Status is 'Awaiting Verification'
Labels: 'LRZ'
https://daosio.atlassian.net/browse/DAOS-16902

@liw liw marked this pull request as ready for review January 20, 2025 00:43
@liw liw requested review from a team as code owners January 20, 2025 00:43
@liw liw requested review from liuxuezhao, kccain and knard38 January 20, 2025 00:44
@liw
Copy link
Contributor Author

liw commented Jan 20, 2025

Since the CI run has taken 3 days and is still in progress, I'm prematurely marking this PR as ready for review.

knard38
knard38 previously approved these changes Jan 20, 2025
Copy link
Contributor

@knard38 knard38 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

liuxuezhao
liuxuezhao previously approved these changes Jan 20, 2025
Copy link
Contributor

@kccain kccain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope I haven't overthought this patch with some of the suggestions that are independent potential fixes for existing log messages (as opposed to the intent of this patch to provide pool UUID / map pointer context to messages).

return -DER_BUSY;
case PO_COMP_ST_UPIN:
D_DEBUG(DB_MD, "change "DF_TARGET" to DRAIN %p\n",
DP_TARGET(target), map);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DRAIN %p\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map is now printed twice in this format string (DF_MAP and final %p that perhaps can be removed?)

And no change requested for this: at first I started to think this D_DEBUG call (and similar instances) could be removed, since the subsequent D_PRINT() and D_INFO() cover the logging needed. But DP_TARGET() logged before and after the target changes will show the status and failure sequence transitions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed %p at the end of the message.

Since I did not design this code, I'll leave the potential removal of this D_DEBUG to you and @liuxuezhao.

@@ -101,29 +106,28 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DRAIN:
case PO_COMP_ST_DOWNOUT:
/* Nothing to do, already excluded / draining */
D_INFO("Skip drain down target (rank %u idx %u)\n",
target->ta_comp.co_rank,
D_INFO(DF_MAP ": Skip drain down target (rank %u idx %u)\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no change requested (unless it is obvious) - I'm assuming DF_TARGET/DP_TARGET are somehow not appropriate to use in this particular state change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason; changed to DF_TARGET.

return -DER_BUSY;
case PO_COMP_ST_DOWN:
target->ta_comp.co_flags |= PO_COMPF_DOWN2UP;
case PO_COMP_ST_DOWNOUT:
D_DEBUG(DB_MD, "change "DF_TARGET" to UP %p\n",
DP_TARGET(target), map);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to UP %p\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

redundant printing of map pointer value in the format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the %p at the end of the message.

@@ -213,12 +217,13 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_NEW:
case PO_COMP_ST_UP:
/* Nothing to do, already UPIN */
D_INFO("Skip ADD_IN UPIN "DF_TARGET"\n",
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
D_INFO(DF_MAP ": Skip EXCLUDE_OUT " DF_TARGET "\n", DP_MAP(pool_uuid, map),

and maybe the comment above meant to say already downout

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

@@ -229,16 +234,17 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DOWNOUT:
case PO_COMP_ST_NEW:
/* Nothing to do, already UPIN */
D_INFO("Skip ADD_IN UPIN "DF_TARGET"\n",
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
D_INFO(DF_MAP ": Skip FINISH_REBUILD DF_TARGET "\n", DP_MAP(pool_uuid, map),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

@@ -250,8 +256,8 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DOWN:
case PO_COMP_ST_NEW:
/* Nothing to do, already UPIN and DOWNOUT. DOWN can not be revert. */
D_INFO("Skip ADD_IN UPIN DOWN "DF_TARGET"\n",
DP_TARGET(target));
D_INFO(DF_MAP ": Skip ADD_IN UPIN DOWN " DF_TARGET "\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
D_INFO(DF_MAP ": Skip ADD_IN UPIN DOWN " DF_TARGET "\n",
D_INFO(DF_MAP ": Skip REVERT_REBUILD UPIN DOWN " DF_TARGET "\n",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

target->ta_comp.co_status = PO_COMP_ST_NEW;
target->ta_comp.co_in_ver = 0;
++(*version);
} else {
D_DEBUG(DB_MD, "change "DF_TARGET" to DOWNOUT %p fseq %u\n",
DP_TARGET(target), map, target->ta_comp.co_fseq);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DOWNOUT fseq %u\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DOWNOUT fseq %u\n",
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DOWN or DOWNOUT fseq %u\n",

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to two messages, one for each if branch. Please check.

D_PRINT(DF_TARGET" is excluded.\n", DP_TARGET(target));
else
D_INFO(DF_TARGET" is excluded.\n", DP_TARGET(target));
D_PRINT(DF_MAP ": " DF_TARGET " is excluded.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is more difficult for me to interpret quickly, and then I suppose get the log message perfectly accurate ("excluded" not always the case?). It seems the state may have changed to NEW, DOWN, or DOWNOUT. Maybe not worth perfecting it right now (if my understanding is accurate to begin with).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "reverted". Please double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems good, though I think the change will be needed also in the D_PRINT() in addition to the D_INFO() call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry.

return -DER_NONEXIST;
}

rc = update_one_tgt(map, target, opc, &version, print_changes);
rc = update_one_tgt(pool_uuid, map, target, opc, &version, print_changes);
if (rc < 0)
return rc;

/* if the target status does not need to change */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure but maybe the comment is intended to say this?

Suggested change
/* if the target status does not need to change */
/* if the rank status does not need to change */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "rank".

liw added 2 commits January 30, 2025 13:54
Signed-off-by: Li Wei <[email protected]>
Required-githooks: true
@liw liw dismissed stale reviews from liuxuezhao and knard38 via 5f3ec8c January 30, 2025 05:48
Copy link
Contributor Author

@liw liw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kccain, thanks.

@@ -101,29 +106,28 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DRAIN:
case PO_COMP_ST_DOWNOUT:
/* Nothing to do, already excluded / draining */
D_INFO("Skip drain down target (rank %u idx %u)\n",
target->ta_comp.co_rank,
D_INFO(DF_MAP ": Skip drain down target (rank %u idx %u)\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any reason; changed to DF_TARGET.

return -DER_BUSY;
case PO_COMP_ST_UPIN:
D_DEBUG(DB_MD, "change "DF_TARGET" to DRAIN %p\n",
DP_TARGET(target), map);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DRAIN %p\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed %p at the end of the message.

Since I did not design this code, I'll leave the potential removal of this D_DEBUG to you and @liuxuezhao.

return -DER_BUSY;
case PO_COMP_ST_DOWN:
target->ta_comp.co_flags |= PO_COMPF_DOWN2UP;
case PO_COMP_ST_DOWNOUT:
D_DEBUG(DB_MD, "change "DF_TARGET" to UP %p\n",
DP_TARGET(target), map);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to UP %p\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the %p at the end of the message.

@@ -213,12 +217,13 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_NEW:
case PO_COMP_ST_UP:
/* Nothing to do, already UPIN */
D_INFO("Skip ADD_IN UPIN "DF_TARGET"\n",
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

@@ -229,16 +234,17 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DOWNOUT:
case PO_COMP_ST_NEW:
/* Nothing to do, already UPIN */
D_INFO("Skip ADD_IN UPIN "DF_TARGET"\n",
D_INFO(DF_MAP ": Skip ADD_IN UPIN " DF_TARGET "\n", DP_MAP(pool_uuid, map),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

@@ -250,8 +256,8 @@ update_one_tgt(struct pool_map *map, struct pool_target *target,
case PO_COMP_ST_DOWN:
case PO_COMP_ST_NEW:
/* Nothing to do, already UPIN and DOWNOUT. DOWN can not be revert. */
D_INFO("Skip ADD_IN UPIN DOWN "DF_TARGET"\n",
DP_TARGET(target));
D_INFO(DF_MAP ": Skip ADD_IN UPIN DOWN " DF_TARGET "\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to reflect the cases better. Please double check.

target->ta_comp.co_status = PO_COMP_ST_NEW;
target->ta_comp.co_in_ver = 0;
++(*version);
} else {
D_DEBUG(DB_MD, "change "DF_TARGET" to DOWNOUT %p fseq %u\n",
DP_TARGET(target), map, target->ta_comp.co_fseq);
D_DEBUG(DB_MD, DF_MAP ": change " DF_TARGET " to DOWNOUT fseq %u\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this to two messages, one for each if branch. Please check.

D_PRINT(DF_TARGET" is excluded.\n", DP_TARGET(target));
else
D_INFO(DF_TARGET" is excluded.\n", DP_TARGET(target));
D_PRINT(DF_MAP ": " DF_TARGET " is excluded.\n",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "reverted". Please double check.

return -DER_NONEXIST;
}

rc = update_one_tgt(map, target, opc, &version, print_changes);
rc = update_one_tgt(pool_uuid, map, target, opc, &version, print_changes);
if (rc < 0)
return rc;

/* if the target status does not need to change */
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "rank".

D_PRINT(DF_TARGET" is excluded.\n", DP_TARGET(target));
else
D_INFO(DF_TARGET" is excluded.\n", DP_TARGET(target));
D_PRINT(DF_MAP ": " DF_TARGET " is excluded.\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems good, though I think the change will be needed also in the D_PRINT() in addition to the D_INFO() call.

liw added 2 commits January 31, 2025 08:36
Signed-off-by: Li Wei <[email protected]>
Required-githooks: true
Allow-unstable-test: true
Required-githooks: true
@liw
Copy link
Contributor Author

liw commented Jan 31, 2025

Merged master to add Allow-unstable-test: true because of unrelated NLT warnings.

@daosbuild1
Copy link
Collaborator

daosbuild1 commented Jan 31, 2025

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15744/4/testReport/

[liw] Unrelated Valgrind errors in command daos: DAOS-17016

@liw liw requested review from knard38 and liuxuezhao February 3, 2025 00:11
@liw liw requested a review from a team February 4, 2025 01:26
@liw
Copy link
Contributor Author

liw commented Feb 6, 2025

@daos-stack/daos-gatekeeper, please note that the latest build has been tested with Allow-unstable-test: true, and the only failure is the NLT Valgrind one (DAOS-17016).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants