Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-16951 dtx: add invalid records discard capability #15752

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

janekmi
Copy link
Contributor

@janekmi janekmi commented Jan 21, 2025

Guide's TLDR

  • Introduce a new vos_dtx_discard_invalid() API
  • Introduce a few accessory functions validating ILOG, SVT and EVT records
  • Make the vos_dtx_discard_invalid() API available via DDB
  • New unit test binaries (dtx_ut and ddb_ut)

Guide (full)

  • Introduce a new vos_dtx_discard_invalid() API
    • src/include/daos_srv/vos.h - add to the header file
    • src/vos/vos_dtx.c
      • core of the implementation goes here
      • Note: a few types and a helper function from this file have been moved. Please see src/vos/ilog_internal.h for details.
    • unit tests
      • src/dtx/tests/dts_discard_invalid.c - here you find all the unit tests for the vos_dtx_discard_invalid() API
      • src/dtx/tests/SConscript - because of the mocking requirements a new dtx_ut unit test binary has been introduced
      • src/dtx/tests/dtx_ut.c - greatly simplified copy of the dtx_tests.c - the entry point for all unit tests compiled in the binary
      • src/dtx/tests/dts_structs.c - because of the minimal requirements these tests fitted more to the new dtx_ut binary rather than to dtx_tests
  • Introduce a few accessory functions validating ILOG, SVT and EVT records
    • ILOG:
      • src/vos/ilog.h - header
      • src/vos/ilog.c - implementation
        • Note: A few things have been moved from here. Please see src/vos/ilog_internal.h for details.
      • src/vos/tests/vts_ilog.c - unit tests
        • src/vos/ilog_internal.h - introduced to easier access ILOG internals for testing purposes.
    • SVT:
      • src/vos/vos_internal.h - header
      • src/vos/vos_tree.c - implementation
      • src/vos/tests/vts_tree.c - unit tests
    • EVT:
      • src/include/daos_srv/evtree.h - header file
      • src/vos/evtree.c - impementation
      • src/vos/tests/vts_evtree.c - unit tests
  • Make the vos_dtx_discard_invalid() API available via DDB
    • src/utils/ddb/ddb_commands.c
      • ddb_run_dtx_act_discard_invalid() implementation
      • extend dtx_modify_init() to skip DTX ID initialization if dti_all is requested
      • src/utils/ddb/tests/ddb_commands_tests.c - simple tests
    • src/utils/ddb/ddb.h:
      • add a new command type and a new entry point - ddb_run_dtx_act_discard_invalid()
      • replace struct dtx_act_commit_options and struct dtx_act_abort_options with a single struct dtx_act_options which works for both dtx_act_commit and dtx_act_abort commands as well as the newly introduced dtx_act_discard_invalid
        • src/utils/ddb/tests/ddb_cmd_options_tests.c - update tests accordingly
    • src/utils/ddb/ddb.c:
      • dtx_act_option_parse() instead of dtx_act_commit_option_parse() and dtx_act_abort_option_parse() which works for both old and new dtx_act_* commands
      • connect command parsing and the new entry point
    • dv_dtx_active_entry_discard_invalid() proxy
      • src/utils/ddb/ddb_vos.h and src/utils/ddb/ddb_vos.c
      • src/utils/ddb/tests/ddb_vos_ut.c - unit tests
      • src/utils/ddb/tests/SConscript - because of the mocking requirements a new ddb_ut unit test binary has been introduced
      • src/utils/ddb/tests/ddb_ut.c - greatly simplified copy of the ddb_test_driver.c - the entry point for all unit tests compiled in the binary
  • New unit test binaries (dtx_ut and ddb_ut)
    • utils/utest.yaml - add them to standard execution outlets
    • utils/rpms/daos.spec - add them to the RPM package
      • utils/rpms/daos.rpmlintrc - add them as yet another binaries without manuals
    • debian/daos-server-tests.install - add them to the DEB package
      • debian/changelog - report above in the changelog

Before requesting gatekeeper:

  • Two review approvals and any prior change requests have been resolved.
  • Testing is complete and all tests passed or there is a reason documented in the PR why it should be force landed and forced-landing tag is set.
  • Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
  • Commit messages follows the guidelines outlined here.
  • Any tests skipped by the ticket being addressed have been run and passed in the PR.

Gatekeeper:

  • You are the appropriate gatekeeper to be landing the patch.
  • The PR has 2 reviews by people familiar with the code, including appropriate owners.
  • Githooks were used. If not, request that user install them and check copyright dates.
  • Checkpatch issues are resolved. Pay particular attention to ones that will show up on future PRs.
  • All builds have passed. Check non-required builds for any new compiler warnings.
  • Sufficient testing is done. Check feature pragmas and test tags and that tests skipped for the ticket are run and now pass with the changes.
  • If applicable, the PR has addressed any potential version compatibility issues.
  • Check the target branch. If it is master branch, should the PR go to a feature branch? If it is a release branch, does it have merge approval in the JIRA ticket.
  • Extra checks if forced landing is requested
    • Review comments are sufficiently resolved, particularly by prior reviewers that requested changes.
    • No new NLT or valgrind warnings. Check the classic view.
    • Quick-build or Quick-functional is not used.
  • Fix the commit message upon landing. Check the standard here. Edit it to create a single commit. If necessary, ask submitter for a new summary.

Copy link

github-actions bot commented Jan 21, 2025

Ticket title is 'DDB: add invalid DTX records discard capability'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-16951

Copy link
Contributor

@Nasf-Fan Nasf-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The framework and the basic process are fine. The main concern is that for most of cases, the user does not know which DTX entry is corrupted. Means that related DTX ID is known. So for conveniency, the logic itself needs to iterate the active DTX table, and for each active DTX entry, calls ddb_run_dtx_act_discard_invalid().

On the other hand, please check the patch from Jeff #15747, he does similar repair with online mode. Currently, no decision about which one will be used finally, need more discussion.

Priority: 2
Required-githooks: true

Signed-off-by: Jan Michalski <[email protected]>
@janekmi janekmi force-pushed the jmichal/ddb-dtx-discard branch 5 times, most recently from 0ca769d to dc392b6 Compare January 22, 2025 08:53
@janekmi janekmi changed the title DAOS-16951 dtx: add invalid records discard capability (WIP) DAOS-16951 dtx: add invalid records discard capability Jan 22, 2025
@janekmi
Copy link
Contributor Author

janekmi commented Jan 22, 2025

The framework and the basic process are fine. The main concern is that for most of cases, the user does not know which DTX entry is corrupted. Means that related DTX ID is known. So for conveniency, the logic itself needs to iterate the active DTX table, and for each active DTX entry, calls ddb_run_dtx_act_discard_invalid().

On the other hand, please check the patch from Jeff #15747, he does similar repair with online mode. Currently, no decision about which one will be used finally, need more discussion.

I added an option to just set all or a specific DTX id in this case.

@janekmi janekmi marked this pull request as ready for review January 22, 2025 08:55
@janekmi janekmi requested review from a team as code owners January 22, 2025 08:55
@janekmi janekmi added CR Catastrophic Recovery Feature priority Ticket has high priority (automatically managed) release-2.6.3 Targeted for 2.6.3 labels Jan 22, 2025
@janekmi janekmi added this to the Release 2.6.3 milestone Jan 22, 2025
@janekmi janekmi force-pushed the jmichal/ddb-dtx-discard branch from dc392b6 to db92efd Compare January 22, 2025 10:09
@github-actions github-actions bot removed the priority Ticket has high priority (automatically managed) label Jan 22, 2025
@janekmi janekmi force-pushed the jmichal/ddb-dtx-discard branch from db92efd to 7b65089 Compare January 22, 2025 10:56
@Nasf-Fan
Copy link
Contributor

The basic logic for discarding dangling DTX reference is fine. But my concern is that whether all the changes are related with our main purpose of resolving existing dangling DTX reference? We are so closed to the 2.6.3 deadline, I am afraid of landing such a huge patch maybe some challenged, not sure, but if possible, please split unrelated parts into subsequent patch(es). That will be helpful.

@daos-stack daos-stack deleted a comment from daosbuild1 Jan 24, 2025
@daos-stack daos-stack deleted a comment from daosbuild1 Jan 24, 2025
@daos-stack daos-stack deleted a comment from daosbuild1 Jan 24, 2025
@daos-stack daos-stack deleted a comment from daosbuild1 Jan 24, 2025
@daos-stack daos-stack deleted a comment from daosbuild1 Jan 24, 2025
@daltonbohning daltonbohning removed this from the Release 2.6.3 milestone Jan 24, 2025
knard38
knard38 previously approved these changes Jan 24, 2025
src/control/cmd/ddb/ddb_commands.go Outdated Show resolved Hide resolved
src/vos/ilog.c Outdated Show resolved Hide resolved
src/vos/vos_dtx.c Outdated Show resolved Hide resolved
src/vos/vos_dtx.c Outdated Show resolved Hide resolved
Priority: 2
Required-githooks: true

Signed-off-by: Jan Michalski <[email protected]>
Priority: 2
Required-githooks: true

Signed-off-by: Jan Michalski <[email protected]>
jolivier23
jolivier23 previously approved these changes Jan 27, 2025
bool
vos_irec_is_valid(const struct vos_irec_df *svt, uint32_t dtx_lid)
{
if (svt == NULL) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, my assertion is that we could take one of those fields, rename it to "magic", set a magic number and then allow 0 or magic. We are essentially preventing corruption here so I think using whatever we can for validation is reasonable. @NiuYawei @Nasf-Fan have an opinion here?

@janekmi janekmi dismissed stale reviews from jolivier23 and knard38 via 24f5505 January 27, 2025 16:24
- disregard whether DAE is aborted/aborting
- discard inline records first

Priority: 2
Required-githooks: true

Signed-off-by: Jan Michalski <[email protected]>
@janekmi janekmi requested review from knard38 and jolivier23 January 27, 2025 17:49
osalyk
osalyk previously approved these changes Jan 28, 2025
Copy link
Contributor

@osalyk osalyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Priority: 2

Signed-off-by: Jan Michalski <[email protected]>
Copy link
Contributor

@Nasf-Fan Nasf-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can consider to improve the ability of detection SVT record properly in subsequent patch(es).

@janekmi janekmi requested a review from a team January 31, 2025 10:02
Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go and build changes LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CR Catastrophic Recovery Feature release-2.6.3 Targeted for 2.6.3
Development

Successfully merging this pull request may close these issues.

8 participants