
DAOS-15338 dfuse: Configure fuse to allow writes to populate cache. #13927

Open. Wants to merge 59 commits into master from amd/dfuse-read-cache.
Conversation

@ashleypittman (Contributor) commented Mar 4, 2024

Change dfuse data caching settings to allow writes to populate cache.
Track the data cache validity separately from attribute cache validity to
allow finer-grained control.

On close of a file, speculatively call getattr to fetch updated size/mtime
information and use this as a reference for future dcache invalidation.

For getattr calls after close, use the pre-loaded stat data, improving
performance by moving the getattr off the critical path (this applies only to
files which were written to; files which weren't do not invalidate the attr cache).

Use the FUSE_CAP_EXPLICIT_INVAL_DATA kernel feature to instruct the
kernel not to automatically invalidate cached data when the size/mtime changes;
this invalidation is now performed by dfuse.

Overall this is a performance improvement for stat-after-close, but the major
change is that a write/read workflow will now correctly use the kernel page
cache. Previously, writes modified mtime values, and that incorrectly
triggered the kernel to discard its cache.

Add testing of write/read and read/read I/O patterns, using the dfuse
statistics to verify that data caching is working as expected.
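As a sketch of how the FUSE_CAP_EXPLICIT_INVAL_DATA negotiation typically looks in a FUSE filesystem's init callback (illustrative only, not the dfuse patch itself; the struct and flag value below are stand-ins for libfuse's fuse_common.h definitions so the sketch compiles standalone):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in definitions; in dfuse these come from libfuse's fuse_common.h
 * (the flag value here is illustrative, not authoritative). */
#define FUSE_CAP_EXPLICIT_INVAL_DATA (1 << 25)

struct conn_info {
	uint32_t capable; /* capabilities offered by the kernel */
	uint32_t want;    /* capabilities the filesystem opts into */
};

/* In init(), opt into explicit data invalidation when the kernel offers it:
 * the kernel then stops dropping page-cache contents on size/mtime changes
 * and leaves invalidation decisions to the filesystem. */
static int
enable_explicit_inval(struct conn_info *conn)
{
	if (!(conn->capable & FUSE_CAP_EXPLICIT_INVAL_DATA))
		return 0; /* older kernel: automatic invalidation remains */
	conn->want |= FUSE_CAP_EXPLICIT_INVAL_DATA;
	return 1;
}
```

With this capability set, a filesystem invalidates stale pages itself (libfuse exposes fuse_lowlevel_notify_inval_inode for that), which is what allows dfuse to keep the page cache warm across its own writes.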

Signed-off-by: Ashley Pittman [email protected]


github-actions bot commented Mar 4, 2024

Ticket title is 'dfuse read from cache not working correctly.'
Status is 'In Review'
Labels: 'scrubbed_2.8,triaged'
https://daosio.atlassian.net/browse/DAOS-15338

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/353/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/346/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/349/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/1/execution/node/314/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/354/log
- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/332/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/327/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/2/execution/node/270/log

@ashleypittman force-pushed the amd/dfuse-read-cache branch from b4df7ca to ba43309 on March 5, 2024 11:20
@daosbuild1 (Collaborator) reported:

- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/3/execution/node/740/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/3/display/redirect
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/4/execution/node/653/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/4/display/redirect
- Test stage Functional on EL 9 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/5/testReport/
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/5/display/redirect
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/6/display/redirect
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/7/display/redirect

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/324/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/8/execution/node/325/log
- Test stage Test RPMs on EL 8.6 completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-13927/9/display/redirect
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/751/log
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/10/execution/node/767/log
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/747/log
- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/11/execution/node/722/log

@ashleypittman force-pushed the amd/dfuse-read-cache branch from 5014502 to a5cbf45 on March 8, 2024 08:26
@daosbuild1 (Collaborator) reported:

- Test stage Functional on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/751/log
- Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/13/execution/node/767/log
- Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/14/testReport/
- Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/15/testReport/

Comment on lines +1085 to +1086
D_ASSERT(max_age != -1);
D_ASSERT(max_age >= 0);
@wiliamhuang (Contributor) commented May 1, 2024


Minor issue: if "max_age >= 0" holds, then the first assertion is always true. Do we need the first assertion?

@ashleypittman (Contributor, Author) replied:

True, we don't. I've used this check elsewhere in the code, though: for some cache types (data), -1 is a valid cache age, so the reason for having two checks here is that if one does fail we know why. Was the cache incorrectly initialised/loaded, in which case the value will be -1, or was a specific age incorrectly calculated, in which case it will be a different negative value?
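The distinction described in that reply can be sketched as follows (illustrative only; the enum, sentinel macro, and helper are hypothetical, not dfuse code):

```c
#include <assert.h>

#define CACHE_AGE_UNSET (-1) /* sentinel: cache never initialised/loaded */

enum age_check {
	AGE_OK,       /* valid non-negative age */
	AGE_UNSET,    /* the first assert (max_age != -1) would fire */
	AGE_BAD_CALC, /* the second assert (max_age >= 0) would fire */
};

/* Classify a max_age value the way the paired asserts do, so a failure
 * immediately tells you which bug you are looking at. */
static enum age_check
check_max_age(double max_age)
{
	if (max_age == (double)CACHE_AGE_UNSET)
		return AGE_UNSET;
	if (max_age < 0)
		return AGE_BAD_CALC;
	return AGE_OK;
}
```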

Contributor:

I agree. Thank you for your clarification!

@@ -1048,6 +1064,51 @@ dfuse_mcache_get_valid(struct dfuse_inode_entry *ie, double max_age, double *tim
return use;
}

/* Set a timer to mark cache entry as valid */
Contributor:

minor: "timer"-->"time"?

Features: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: test_dfuse_caching_check
Skip-fault-injection-test: true
Skip-unit-tests: true

Signed-off-by: Ashley Pittman <[email protected]>
Features: dfuse,-test_dfuse_daos_build_wt_pil4dfs

Signed-off-by: Ashley Pittman <[email protected]>
@ashleypittman ashleypittman force-pushed the amd/dfuse-read-cache branch from ee85062 to 7039849 Compare May 2, 2024 11:26
@daosbuild1 (Collaborator): Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/72/execution/node/1436/log

@knard38 (Contributor) left a comment:

Mostly LGTM, as far as I understand it.
Some points to clarify, as I am not familiar with this part of the code.

*
* Future accesses of the inode should check active, if the value is 0 then there is nothing
* to do.
* If active is positive then it should increate active, wait on the semaphore, decrease
Contributor:

Suggested change
* If active is positive then it should increate active, wait on the semaphore, decrease
* If active is positive then it should increase active, wait on the semaphore, decrease
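One possible reading of that wait protocol as a sketch (the names are mine, not from the patch, and it assumes the caller already holds the inode lock so the active count is not racy):

```c
#include <assert.h>
#include <semaphore.h>

/* Illustrative inode state: "active" counts in-flight accesses; a value of
 * 0 means there is nothing to wait for. */
struct inode_state {
	int   active;
	sem_t sem;
};

static void
wait_if_active(struct inode_state *st)
{
	if (st->active == 0)
		return;         /* no access in flight: nothing to do */
	st->active++;           /* register interest before sleeping */
	sem_wait(&st->sem);     /* woken when the in-flight access completes */
	st->active--;
}
```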


timeout = max(timeout, dfc->dfc_dentry_timeout);

dfc->dfc_dentry_inval_time = timeout + 3;
@knard38 (Contributor) commented May 3, 2024:

Why +3?
A named constant could help to understand the purpose of this addition.
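A minimal sketch of the named-constant suggestion (the constant name and the rationale in its comment are hypothetical, not taken from the patch):

```c
#include <assert.h>

/* Hypothetical name: slack added so invalidation only removes dentries
 * whose kernel-side timeout has definitely expired. */
#define DENTRY_INVAL_GRACE_SECS 3

static double
dentry_inval_time(double dentry_timeout)
{
	return dentry_timeout + DENTRY_INVAL_GRACE_SECS;
}
```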

* correct. Data timeout can be set to True/-1 so cap this duration at
* ten minutes or nothing would ever get evicted.
*/
timeout = max(dfc->dfc_attr_timeout, dfc->dfc_data_timeout);
@knard38 (Contributor) commented May 3, 2024:

I would have expected min instead of max for L895 and L899, but I am probably missing something.

@@ -164,6 +176,22 @@ df_ll_getattr(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
DFUSE_IE_STAT_ADD(inode, DS_GETATTR);
}

/* Check for stat-after-write/close. On close a stat is performed so the first getattr
* call can use the result of that.
Contributor:

Possibly naive questions:

  • Why can only the first getattr() use the result?
  • If, after the close(), an open() is done before getattr() is called, it seems it will still be able to use the cache. Could this use case lead to some inconsistencies?


sem_post(&eqt->de_sem);

return;
Contributor:

In the success case, are resources such as ev not freed?

@mchaarawi (Contributor) left a comment:

As discussed in the WG, the stat on close is a pretty significant performance hit, and a hit to the DAOS cluster itself, in the case of large deployments and widely striped files.

@jolivier23 (Contributor):

@mchaarawi having the extra asynchronous getattr on close will be a big win for many first-impression applications (such as when people try to clone and compile in a dfuse mount). In such cases, fuse will do a subsequent getattr anyway; this just caches it preemptively and, more importantly, keeps the write cache for the subsequent reads. For performance-sensitive cases, I suspect the extra asynchronous stat is somewhat negligible compared to the expected normal overhead of dfuse. It already does a million getattrs.

@jolivier23 (Contributor):

Actually, maybe we can augment dfs_write to return the max mtime for the chunks it writes? And we can approximate the size based on what we wrote and what we already know?

@mchaarawi (Contributor) commented May 31, 2024:

(quoting @jolivier23's comment above)

I'm pretty sure this does serve some use cases. I'm worried, though, about use cases that are now doing an extra stat on close, say an app that just writes a file and never accesses it again until much later. The stat in this case, for widely striped files, if there are a bunch of them, would be bad. So yes, this probably improves one use case but can be problematic in others.

@mchaarawi (Contributor):

(quoting @jolivier23's dfs_write suggestion above)

The I/O path should never be modified, IMO, to do extra metadata work like this.

@jolivier23 (Contributor) commented May 31, 2024:

(quoting the exchange above)

Along the lines of my next comment, what if we locally tracked size and mtime on the client object and allowed the application to query the values. This would be accurate in the cases I'm talking about and just get invalidated for others.

In other words the local object would keep track of locally highest offset and mtime

@mchaarawi (Contributor):

(quoting the exchange above)

The mtime we use today is based on server timestamps (the max-epoch thing), so if dfuse wants to use its own timestamps for this, there might be issues if clients and servers are not properly synchronized.
But in the single-dfuse-instance case, it might not be a big deal?
We cannot do this at the DFS level because we currently do not do any caching there, but we might do that in the future.

@daosbuild1 (Collaborator) reported:

- Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/317/log
- Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/314/log
- Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/273/log
- Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/357/log
- Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13927/73/execution/node/519/log

Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1 (Collaborator): Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/74/testReport/

Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
Test-tag: dfuse
Signed-off-by: Ashley Pittman <[email protected]>
@daosbuild1 (Collaborator): Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/75/testReport/

@daosbuild1 (Collaborator): Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13927/76/testReport/

Labels: None yet
7 participants