DAOS-14073 dfuse: Move writeback caching from kernel to dfuse. #12729

Merged: ashleypittman merged 40 commits into master from amd/dfuse-write-cache on Apr 15, 2024

@ashleypittman ashleypittman commented Jul 28, 2023

Have dfuse implement write-back caching rather than relying on the kernel.

When the kernel does the caching, it assumes that it is the single source of
truth for both file size and mtime, so it disregards any updates from dfuse.
That makes working across multiple clients very difficult.

This change removes the kernel flag that allows it to perform writeback caching.
Instead, if write-back caching is enabled at the dfuse level, dfuse acknowledges
writes to the kernel before daos/dfs has acknowledged them on the backend. This
gives better performance in the (default) writeback case, because the kernel no
longer makes a setattr call to update the mtime after every write, and it
reinstates the semantics where the kernel handles updates from other clients
correctly.
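
As a rough illustration of the ordering described above, here is a toy model in which a write is acknowledged as soon as it is queued and the backend catches up later. This is only a sketch and not the code in this PR: `struct wb_file`, `wb_write`, `wb_flush`, `wb_backend_complete_one` and `WB_MAX_PENDING` are all hypothetical names, and the "backend" is simulated by a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in dfuse. */
#define WB_MAX_PENDING 8

struct wb_file {
	size_t pending[WB_MAX_PENDING]; /* lengths of writes not yet on the backend */
	int    npending;                /* number of valid entries in pending[] */
	size_t acked_bytes;             /* bytes already acknowledged to the kernel */
	size_t stored_bytes;            /* bytes the backend has acknowledged */
};

/* Simulate the backend completing the oldest queued write. */
static bool
wb_backend_complete_one(struct wb_file *f)
{
	if (f->npending == 0)
		return false;
	f->stored_bytes += f->pending[0];
	for (int i = 1; i < f->npending; i++)
		f->pending[i - 1] = f->pending[i];
	f->npending--;
	return true;
}

/* Drain every write that is still in flight. */
static void
wb_flush(struct wb_file *f)
{
	while (wb_backend_complete_one(f))
		;
}

/*
 * Handle one write request: queue it and acknowledge it immediately.
 * If the queue is full, wait for the oldest write first so that the
 * amount of unacknowledged backend work stays bounded.
 */
static size_t
wb_write(struct wb_file *f, size_t len)
{
	if (f->npending == WB_MAX_PENDING)
		wb_backend_complete_one(f);
	f->pending[f->npending++] = len;
	f->acked_bytes += len; /* reply sent before the backend has the data */
	return len;
}
```

In this model `acked_bytes` can run ahead of `stored_bytes` until `wb_flush()` is called, which is the window the description refers to.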

The kernel writeback cache also coalesced smaller writes into larger ones, and
dfuse no longer gets the benefit of that. This needs to be added back into dfuse
as part of future work.
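
For reference, write coalescing in its simplest form means merging a write into the previous one when it starts exactly where that one ended. The sketch below shows only that adjacency check. It is an assumption about what such future work might look like, not a design from this PR, and `struct extent` and `extent_coalesce` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not part of dfuse. */
struct extent {
	uint64_t off; /* starting file offset */
	size_t   len; /* length in bytes */
};

/*
 * Try to append 'next' to 'cur'. This succeeds only if 'next' begins
 * exactly where 'cur' ends and the combined length stays within max_len.
 * On success 'cur' is grown in place; on failure it is left untouched.
 */
static bool
extent_coalesce(struct extent *cur, const struct extent *next, size_t max_len)
{
	if (cur->off + cur->len != next->off)
		return false;
	if (next->len > max_len || cur->len > max_len - next->len)
		return false;
	cur->len += next->len;
	return true;
}
```

A real implementation would also have to merge the data buffers and decide when to stop waiting for further adjacent writes; neither is shown here.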

Signed-off-by: Ashley Pittman [email protected]

Test-tag: dfuse

Required-githooks: true

Signed-off-by: Ashley Pittman <[email protected]>

github-actions bot commented Jul 28, 2023

Bug-tracker data:
Ticket title is 'Move write-back cache feature into dfuse.'
Status is 'In Review'
Labels: 'scrubbed'
https://daosio.atlassian.net/browse/DAOS-14073

@daosbuild1 daosbuild1 left a comment

LGTM. No errors found by checkpatch.

@daosbuild1
Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12729/1/testReport/

@daosbuild1
Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12729/2/testReport/

@daosbuild1
Test stage Functional on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12729/3/testReport/

Test-tag: test_dfuse_daos_build_wb

Required-githooks: true
Signed-off-by: Ashley Pittman <[email protected]>

Required-githooks: true
Test-tag: test_dfuse_daos_build_wb

Skip-func-hw-test: true
Skip-func-test: true
Quick-Functional: true
Test-tag: dfuse
@daosbuild1
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/5/execution/node/190/log

Required-githooks: true

Test-tag: test_dfuse_daos_build_wb

Signed-off-by: Ashley Pittman <[email protected]>

@ashleypittman ashleypittman marked this pull request as ready for review September 22, 2023 14:24
@ashleypittman ashleypittman requested a review from a team as a code owner September 22, 2023 14:24

Required-githooks: true
Signed-off-by: Ashley Pittman <[email protected]>

Test-tag: dfuse

Required-githooks: true

Signed-off-by: Ashley Pittman <[email protected]>

@@ -138,6 +138,8 @@ dfuse_cb_release(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)

DFUSE_TRA_DEBUG(oh, "Closing %d", oh->doh_caching);

DFUSE_IE_WFLUSH(oh->doh_ie);

This is in open.c, but the function is actually release, which maps to close(). This line is what causes flush-on-close, which we do want to keep, and it won't affect the performance of open at all.

Features: dfuse

Required-githooks: true
Signed-off-by: Ashley Pittman <[email protected]>

@daosbuild1
Test stage Build RPM on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/30/execution/node/314/log

@daosbuild1
Test stage Build RPM on EL 8 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/30/execution/node/364/log

@daosbuild1
Test stage Build RPM on Leap 15.5 completed with status UNSTABLE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/30/execution/node/367/log

Features: dfuse
Required-githooks: true
@daosbuild1
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/31/execution/node/1173/log

@ashleypittman ashleypittman requested review from daltonbohning, johannlombardi and jolivier23 and removed request for jolivier23 April 5, 2024 19:35
@daosbuild1
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/33/execution/node/1540/log

@daosbuild1
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/34/execution/node/424/log

@ashleypittman ashleypittman requested a review from a team April 9, 2024 10:03
@daosbuild1
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-12729/35/testReport/

@daosbuild1
Test stage Functional Hardware Medium completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/36/execution/node/1405/log

@daosbuild1
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-12729/36/execution/node/1501/log

@ashleypittman
@daos-stack/daos-gatekeeper can this be landed now? The most recent run with test-tags was https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-12729/35/ and that passed with only DAOS-15616 failing.

@ashleypittman ashleypittman merged commit 4568c2d into master Apr 15, 2024
49 checks passed
@ashleypittman ashleypittman deleted the amd/dfuse-write-cache branch April 15, 2024 07:22
6 participants