
DAOS-15338 dfuse: Configure fuse to allow writes to populate cache. #13927

Open. Wants to merge 59 commits into master from amd/dfuse-read-cache.
Conversation

@ashleypittman (Contributor) commented Mar 4, 2024

Change dfuse data caching settings to allow writes to populate cache.
Track the data cache validity separately from attribute cache validity to
allow finer-grained control.

On close of a file, speculatively call getattr to fetch updated size/mtime
information and use this as a reference for future dcache invalidation.

For getattr calls after close, use the pre-loaded stat data, improving
performance by moving the getattr off the critical path (this applies only to
files which were written to; files which weren't do not invalidate the attr cache).

Use the FUSE_CAP_EXPLICIT_INVAL_DATA kernel feature to instruct the
kernel not to automatically invalidate cached data when the size/mtime changes;
this invalidation is now performed by dfuse.

Overall this is a performance improvement for stat-after-close, but the major
change is that a write/read workflow will now correctly use the kernel page
cache. Previously, writes modified mtime values, and that incorrectly
triggered the kernel to discard its cache.

Add testing of write/read and read/read I/O patterns, using the dfuse
statistics to verify that data caching is working as expected.
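As a sketch of how the FUSE_CAP_EXPLICIT_INVAL_DATA negotiation typically looks in a FUSE filesystem's init callback (illustrative only, not the dfuse patch itself; the struct and flag value below are stand-ins for libfuse's fuse_common.h definitions so the sketch compiles standalone):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in definitions; in dfuse these come from libfuse's fuse_common.h
 * (the flag value here is illustrative, not authoritative). */
#define FUSE_CAP_EXPLICIT_INVAL_DATA (1 << 25)

struct conn_info {
	uint32_t capable; /* capabilities offered by the kernel */
	uint32_t want;    /* capabilities the filesystem opts into */
};

/* In init(), opt into explicit data invalidation when the kernel offers it:
 * the kernel then stops dropping page-cache contents on size/mtime changes
 * and leaves invalidation decisions to the filesystem. */
static int
enable_explicit_inval(struct conn_info *conn)
{
	if (!(conn->capable & FUSE_CAP_EXPLICIT_INVAL_DATA))
		return 0; /* older kernel: automatic invalidation remains */
	conn->want |= FUSE_CAP_EXPLICIT_INVAL_DATA;
	return 1;
}
```

With this capability set, a filesystem invalidates stale pages itself (libfuse exposes fuse_lowlevel_notify_inval_inode for that), which is what allows dfuse to keep the page cache warm across its own writes.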

Signed-off-by: Ashley Pittman [email protected]


github-actions bot commented Mar 4, 2024

Ticket title is 'dfuse read from cache not working correctly.'
Status is 'In Review'
Labels: 'scrubbed_2.8,triaged'
https://daosio.atlassian.net/browse/DAOS-15338

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/353/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/346/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/349/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/314/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/354/log
- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/332/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/327/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/270/log

@ashleypittman force-pushed the amd/dfuse-read-cache branch from b4df7ca to ba43309 on March 5, 2024 11:20
@daosbuild1 (Collaborator) reported:

- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/3/execution/node/740/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/3/display/redirect
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/4/execution/node/653/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/4/display/redirect
- Test stage Functional on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/5/testReport/
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/5/display/redirect
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/6/display/redirect
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/7/display/redirect

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/324/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/325/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/9/display/redirect
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/751/log
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/767/log
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/747/log
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/722/log

@ashleypittman force-pushed the amd/dfuse-read-cache branch from 5014502 to a5cbf45 on March 8, 2024 08:26
@daosbuild1 (Collaborator) reported:

- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/751/log
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/767/log
- Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/14/testReport/
- Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/15/testReport/

Comment on lines +1085 to +1086
D_ASSERT(max_age != -1);
D_ASSERT(max_age >= 0);
@wiliamhuang (Contributor) commented May 1, 2024


Minor issue: if "max_age >= 0" holds, then the first assertion is always true. Do we need the first assertion?

@ashleypittman (Contributor, Author) replied:

True, we don't. I've used this check elsewhere in the code, though: for some cache types (data), -1 is a valid cache age, so the reason for having two checks here is that if one does fail we know why. Was the cache incorrectly initialised/loaded, in which case the value will be -1, or was a specific age incorrectly calculated, in which case it will be a different negative value?
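The distinction described in that reply can be sketched as follows (illustrative only; the enum, sentinel macro, and helper are hypothetical, not dfuse code):

```c
#include <assert.h>

#define CACHE_AGE_UNSET (-1) /* sentinel: cache never initialised/loaded */

enum age_check {
	AGE_OK,       /* valid non-negative age */
	AGE_UNSET,    /* the first assert (max_age != -1) would fire */
	AGE_BAD_CALC, /* the second assert (max_age >= 0) would fire */
};

/* Classify a max_age value the way the paired asserts do, so a failure
 * immediately tells you which bug you are looking at. */
static enum age_check
check_max_age(double max_age)
{
	if (max_age == (double)CACHE_AGE_UNSET)
		return AGE_UNSET;
	if (max_age < 0)
		return AGE_BAD_CALC;
	return AGE_OK;
}
```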

Contributor:

I agree. Thank you for your clarification!

@@ -1048,6 +1064,51 @@ dfuse_mcache_get_valid(struct dfuse_inode_entry *ie, double max_age, double *tim
return use;
}

/* Set a timer to mark cache entry as valid */
Contributor:

minor: "timer"-->"time"?

Features: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: test_dfuse_caching_check
Skip-fault-injection-test: true
Skip-unit-tests: true

Signed-off-by: Ashley Pittman <[email protected]>
Features: dfuse,-test_dfuse_daos_build_wt_pil4dfs

Signed-off-by: Ashley Pittman <[email protected]>
@ashleypittman ashleypittman force-pushed the amd/dfuse-read-cache branch from ee85062 to 7039849 Compare May 2, 2024 11:26
@daosbuild1 (Collaborator): Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/72/execution/node/1436/log

@knard38 (Contributor) left a comment:

Mostly LGTM, as far as I understand it.
Some points to clarify, as I am not familiar with this part of the code.

*
* Future accesses of the inode should check active, if the value is 0 then there is nothing
* to do.
* If active is positive then it should increate active, wait on the semaphore, decrease
Contributor:

Suggested change
* If active is positive then it should increate active, wait on the semaphore, decrease
* If active is positive then it should increase active, wait on the semaphore, decrease
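One possible reading of that wait protocol as a sketch (the names are mine, not from the patch, and it assumes the caller already holds the inode lock so the active count is not racy):

```c
#include <assert.h>
#include <semaphore.h>

/* Illustrative inode state: "active" counts in-flight accesses; a value of
 * 0 means there is nothing to wait for. */
struct inode_state {
	int   active;
	sem_t sem;
};

static void
wait_if_active(struct inode_state *st)
{
	if (st->active == 0)
		return;         /* no access in flight: nothing to do */
	st->active++;           /* register interest before sleeping */
	sem_wait(&st->sem);     /* woken when the in-flight access completes */
	st->active--;
}
```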


timeout = max(timeout, dfc->dfc_dentry_timeout);

dfc->dfc_dentry_inval_time = timeout + 3;
@knard38 (Contributor) commented May 3, 2024:

Why +3?
A named constant could help to understand the purpose of this addition.
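A minimal sketch of the named-constant suggestion (the constant name and the rationale in its comment are hypothetical, not taken from the patch):

```c
#include <assert.h>

/* Hypothetical name: slack added so invalidation only removes dentries
 * whose kernel-side timeout has definitely expired. */
#define DENTRY_INVAL_GRACE_SECS 3

static double
dentry_inval_time(double dentry_timeout)
{
	return dentry_timeout + DENTRY_INVAL_GRACE_SECS;
}
```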

* correct. Data timeout can be set to True/-1 so cap this duration at
* ten minutes or nothing would ever get evicted.
*/
timeout = max(dfc->dfc_attr_timeout, dfc->dfc_data_timeout);
@knard38 (Contributor) commented May 3, 2024:

I would have expected min instead of max for L895 and L899, but I am probably missing something.

@@ -164,6 +176,22 @@ df_ll_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
DFUSE_IE_STAT_ADD(inode, DS_GETATTR);
}

/* Check for stat-after-write/close. On close a stat is performed so the first getattr
* call can use the result of that.
Contributor:

Possibly naive questions:

  • Why can only the first getattr() use the result?
  • If, after the close(), an open() is done before getattr() is called, it seems it will still be able to use the cache. Could this use case lead to some inconsistencies?


sem_post(&eqt->de_sem);

return;
Contributor:

In the success case, are resources such as ev not freed?

@mchaarawi (Contributor) left a comment:

As discussed in the WG, the stat on close is a pretty significant performance hit, and a hit to the DAOS cluster itself, in the case of large deployments and widely striped files.

@jolivier23 (Contributor):

@mchaarawi having the extra asynchronous getattr on close will be a big win for many first-impression applications (such as when people try to clone and compile in a dfuse mount). In such cases, fuse will do a subsequent getattr anyway; this just caches it preemptively and, more importantly, keeps the write cache for the subsequent reads. For performance-sensitive cases, I suspect the extra asynchronous stat is somewhat negligible compared to the expected normal overhead of dfuse. It already does a million getattrs.

@jolivier23 (Contributor):

Actually, maybe we can augment dfs_write to return the max mtime for the chunks it writes? And we can approximate the size based on what we wrote and what we already know?

@mchaarawi (Contributor) commented May 31, 2024:

(quoting @jolivier23's comment above)

I'm pretty sure this does serve some use cases. I'm worried, though, about use cases that are now doing an extra stat on close, say an app that just writes a file and never accesses it again until much later. The stat in this case, for widely striped files, if there are a bunch of them, would be bad. So yes, this probably improves one use case but can be problematic in others.

@mchaarawi (Contributor):

(quoting @jolivier23's dfs_write suggestion above)

The I/O path should never be modified, IMO, to do extra metadata work like this.

@jolivier23 (Contributor) commented May 31, 2024:

(quoting the exchange above)

Along the lines of my next comment, what if we locally tracked size and mtime on the client object and allowed the application to query the values. This would be accurate in the cases I'm talking about and just get invalidated for others.

In other words the local object would keep track of locally highest offset and mtime

@mchaarawi (Contributor):

(quoting the exchange above)

The mtime we use today is based on server timestamps (the max-epoch thing), so if dfuse wants to use its own timestamps for this, there might be issues if clients and servers are not properly synchronized.
But in the single-dfuse-instance case, it might not be a big deal?
We cannot do this at the DFS level because we currently do not do any caching there, but we might do that in the future.

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/317/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/314/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/273/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/357/log
- Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/519/log

Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1 (Collaborator): Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/74/testReport/

Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1 (Collaborator): Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/75/testReport/

@daosbuild1 (Collaborator): Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/76/testReport/

Labels: None yet
7 participants