-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-15338 dfuse: Configure fuse to allow writes to populate cache. #13927
base: master
Are you sure you want to change the base?
Conversation
Ticket title is 'dfuse read from cache not working correctly.' |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/353/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/346/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/349/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/314/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/354/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/332/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/327/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/270/log |
b4df7ca
to
ba43309
Compare
Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/3/execution/node/740/log |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/3/display/redirect |
Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/4/execution/node/653/log |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/4/display/redirect |
Test stage Functional on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/5/testReport/ |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/5/display/redirect |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/6/display/redirect |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/7/display/redirect |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/324/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/325/log |
Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/9/display/redirect |
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/751/log |
Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/767/log |
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/747/log |
Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/722/log |
5014502
to
a5cbf45
Compare
Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/751/log |
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/767/log |
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/14/testReport/ |
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/15/testReport/ |
D_ASSERT(max_age != -1); | ||
D_ASSERT(max_age >= 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor issue, if "max_age >= 0", then the first assertion always is true. Do we need the first assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, we don't. I've used this check elsewhere in the code though, for some cache types (data) then -1 is a valid cache age so the reason for having two checks here is that if it does fail we know why, was the cache incorrectly initialised/loaded in which case it'll be -1 or was a specific age incorrectly calculated in which case it's be a different negative value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Thank you for your clarification!
@@ -1048,6 +1064,51 @@ dfuse_mcache_get_valid(struct dfuse_inode_entry *ie, double max_age, double *tim | |||
return use; | |||
} | |||
|
|||
/* Set a timer to mark cache entry as valid */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: "timer"-->"time"?
Features: dfuse Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: test_dfuse_caching_check Skip-fault-injection-test: true Skip-unit-tests: true Signed-off-by: Ashley Pittman <[email protected]>
Features: dfuse,-test_dfuse_daos_build_wt_pil4dfs Signed-off-by: Ashley Pittman <[email protected]>
ee85062
to
7039849
Compare
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/72/execution/node/1436/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM for what I understand.
Some points to clarify as I am not used to this part of the code.
src/client/dfuse/dfuse.h
Outdated
* | ||
* Future accesses of the inode should check active, if the value is 0 then there is nothing | ||
* to do. | ||
* If active is positive then it should increate active, wait on the semaphore, decrease |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* If active is positive then it should increate active, wait on the semaphore, decrease | |
* If active is positive then it should increase active, wait on the semaphore, decrease |
|
||
timeout = max(timeout, dfc->dfc_dentry_timeout); | ||
|
||
dfc->dfc_dentry_inval_time = timeout + 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why +3?
A named constant could help to understand the purpose of this addition.
* correct. Data timeout can be set to True/-1 so cap this duration at | ||
* ten minutes or nothing would ever get evicted. | ||
*/ | ||
timeout = max(dfc->dfc_attr_timeout, dfc->dfc_data_timeout); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would have expected min instead of max for L895 and L899, but I am probably missing something.
@@ -164,6 +176,22 @@ df_ll_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi) | |||
DFUSE_IE_STAT_ADD(inode, DS_GETATTR); | |||
} | |||
|
|||
/* Check for stat-after-write/close. On close a stat is performed so the first getattr | |||
* call can use the result of that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably dummy questions:
- why only the first
getattr()
can use the result ? - If after the
close()
, anopen()
is done beforegetattr()
is called. It seems, that it will be able to use the cache. Is this use case could lead to some inconstancies ?
|
||
sem_post(&eqt->de_sem); | ||
|
||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of success, the resources such as ev
are not freed ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as discussed in WG, the stat on close is pretty significant hit performance wise and to the DAOS cluster itself in case of large deployments and widely striped files
@mchaarawi having the extra asynchronous getattr on close will be a big win for many of the first impression applications (such as when people try to clone and compiler in a dfuse mount). In such cases, fuse will do subsequent getattr anyway, this just caches it preemptively and, more importantly, keeps the write cache for the subsequent reads. For performance sensitive cases, I suspect the extra asynchronous stat is somewhat negligible compared to the expected normal overhead of dfuse. It already does a million getattr's. |
Actually, maybe we can augment dfs_write to return the max mtime for the chunks it writes? And we can approximate the size based on what we wrote and what we already know? |
im pretty sure that this does serve some use cases. im worried though about use cases that are just now doing an extra stat on close.. say an app that just writes a file and never accesses it again till much later. the stat in this case of widely stripped files if there are a bunch of them, would be bad. so yes this probably improves 1 use case but can be problemetic in others |
the io path should never be modified IMO to do extra metadata things like this. |
Along the lines of my next comment, what if we locally tracked size and mtime on the client object and allowed the application to query the values. This would be accurate in the cases I'm talking about and just get invalidated for others. In other words the local object would keep track of locally highest offset and mtime |
the mtime we use today there is based on server time stamps (the max epoch thing). so if dfuse wants to use it's own timestamps to do this, there might be issues of client and servers are not properly synchronized. |
Test-tag: dfuse
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/317/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/314/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/273/log |
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/453/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/357/log |
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/519/log |
Test-tag: dfuse Signed-off-by: Ashley Pittman <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/74/testReport/ |
Test-tag: dfuse Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: dfuse Signed-off-by: Ashley Pittman <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/75/testReport/ |
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/76/testReport/ |
Change dfuse data caching settings to allow writes to populate cache.
Track the data cache validity separately from attribute cache validity to
allow finer grained control.
On close of a file then speculatively call getattr to get updated size/mtime
information and use this as a reference for future dcache invalidation.
For getattr calls after close then use the pre-loaded stat data leading to
increased performance by moving the getattr off the critical path (only for
files which were written to, files which aren't do not invalidate the atttr cache).
Use the FUSE_CAP_EXPLICIT_INVAL_DATA kernel feature to instruct the
kernel not to automatically invalidate data in the size/mtime has changed, this
is now performed by dfuse.
Overall this is a performance improvement for stat-after-close but the major
change is that a write/read workflow will now correctly use the kernel page
cache. Previously writes were modifying mtime values and that was incorrectly
triggering the kernel to discard it's cache unnecessarily.
Add testing of write/read and read/read I/O patterns verified by the dfuse
statistics that data caching is working as expected.
Signed-off-by: Ashley Pittman [email protected]