DAOS-16877 client: implement utilities for shared memory #15613
base: feature/dfs_dcache
Conversation
Features: shm
1. use tlsf as memory allocator
2. shared memory create/destroy
3. robust mutex in shared memory
4. hash table in shared memory
Required-githooks: true
Skipped-githooks: codespell
Signed-off-by: Lei Huang <[email protected]>
Ticket title is 'To implement node-wise caching with shared memory'
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/1/execution/node/1210/log
Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
…hm_mutex Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Features: shm Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Some components (e.g., the hash table record reference count) will be revised later when we add more use cases.
Currently we use the tlsf memory allocator. It will be replaced by our own allocator in the future.
src/tests/ftest/daos_test/shm.py
Outdated
job.assign_hosts(cmocka_utils.hosts)
job.assign_environment(daos_test_env)

cmocka_utils.run_cmocka_test(self, job)
Normally we run the cmocka tests, like daos_test, via DaosCoreBase.run_subtest(), which sets up additional environment variables and configures the dmg command. Do we need any of that here? Note: In its current form the run_subtest() method uses Orterun to run the daos_test command remotely.
As a requirement for adding this test we should also run it with the faults-enabled: false commit pragma to ensure that it will run when we attempt a release build.
@phender Thank you very much! I wrote shm.py and shm.yaml with dfuse.py and dfuse.yaml as templates. "shm_test" does not need a configured dmg command. Additional environment variables might be added in future tests via "daos_test_env" here.
Thank you for the tip about using "faults-enabled: false". I will use it in the future. I previously used "Features: shm" to run the new test.
src/tests/ftest/daos_test/shm.yaml
Outdated
pool:
  scm_size: 1G
container:
  type: POSIX
  control_method: daos
This isn't used by the test. A typical cmocka test would use the pool entry information, but only when the test is run via DaosCoreBase.run_subtest():
The new test "shm_test" does not need scm_size and nvme_size information.
I think what @phender means is since this test does not use DaosCoreBase.run_subtest(), the entire pool and container keys here are not used and can be removed.
... or if we use DaosCoreBase.run_subtest() we can keep it.
@daltonbohning @phender Thank you very much! I will try removing the pool and container keys locally to make sure it works. I thought they were required.
@daltonbohning @phender You are right. We can remove the pool and container keys in the yaml file as you suggested. I will update it in the next commit. Thank you!
I have not yet finished the review process, and I still have several concerns and questions regarding this PR.
src/tests/ftest/daos_test/shm.py
Outdated
@@ -0,0 +1,44 @@
"""
From my understanding this test is more of a unit test and thus should probably be run with the utils/run_utest.py python script instead of by the functional test framework.
@phender and @daltonbohning, what is your opinion on this point?
In general, yes. If the same test can be run as a unit test (low cost, quick) instead of a functional test (higher cost, slower) then we should run it as a unit test.
@knard38 @daltonbohning Thank you very much! I will look into running this test as a unit test.
@wiliamhuang , If it can help, I recently added some unit tests in the following PR:
https://github.com/daos-stack/daos/pull/14713/files#diff-294ea4ccb7880cabe2a9a4ffadd3c709916da304a39a819c82f23bfe06197a61
Yes. The current tests are simple. They could fit as unit tests. More complex tests will be added in the future. We can add shared-memory-related ftests later when we need them.
@wiliamhuang , If it can help, I recently added some unit tests in the following PR: https://github.com/daos-stack/daos/pull/14713/files#diff-294ea4ccb7880cabe2a9a4ffadd3c709916da304a39a819c82f23bfe06197a61
@knard38 Thank you very much! It's very helpful.
src/gurt/shm_alloc.c
Outdated
/* failed to open */
if (shm_ht_fd == -1) {
	if (errno == ENOENT) {
		goto create_shm;
NIT: it could improve readability to put this code in a dedicated function instead of using goto.
You are right. I will create a function for shm creation. Thank you!
src/include/gurt/shm_alloc.h
Outdated
uint64_t shm_pool_size;
/* reserved for future usage */
char     reserved[256];
};
This should be needed if we want the mmapped address space to be well aligned:
Suggested change:
-};
+} __attribute__((aligned(PAGE_SIZE)));
Thank you! The address returned by mmap() is always page-aligned.
src/gurt/shm_alloc.c
Outdated
shm_pool_size = shm_size / N_SHM_POOL;
if (shm_pool_size % 4096)
	/* make shm_pool_size 4K aligned */
	shm_pool_size += (4096 - (shm_pool_size % 4096));
To support different architectures and page size configurations, the value 4096 should probably be defined with a macro such as PAGE_SIZE:
Suggested change:
-shm_pool_size += (4096 - (shm_pool_size % 4096));
+shm_pool_size += (PAGE_SIZE - (shm_pool_size % PAGE_SIZE));
Thank you! I will update it as you suggested to make it more portable.
src/gurt/shm_alloc.c
Outdated
/* map existing shared memory */
shm_addr = mmap(FIXED_SHM_ADDR, shm_size, PROT_READ | PROT_WRITE,
		MAP_SHARED | MAP_FIXED, shm_ht_fd, 0);
Using a fixed memory location seems to be strongly discouraged by the man page.
I am not sure I understand why it is needed, or how we can be sure that it will not overlap existing mmaps.
I agree. A fixed memory location is a strong limitation. It comes from the memory allocator we use. We could eliminate this limitation later once we have our own memory allocator supporting shared memory management.
i also don't get why using MAP_FIXED and the fixed address. using MAP_FIXED can cause undefined behavior if the address is actually in use by something else, no?
maybe i don't get the requirement why you need this.
You are right.
The requirement of using the same fixed address across processes comes from the memory allocator we use. This is a quick and dirty way to let us use an existing memory allocator for now. In the future we need to implement our own memory allocator that natively supports shared memory management; then the limitation can be removed.
i am wondering if we should block on this before landing. im worried about this limitation and what issues can arise because of that.
I agree that this fixed mapping address limitation should be removed before landing to master. Maybe it is acceptable for the feature branch? That would allow us to work on caching while we implement our own memory allocator.
src/gurt/shm_alloc.c
Outdated
char daos_shm_file_name[128];

sprintf(daos_shm_file_name, "/dev/shm/%s_%d", daos_shm_name, getuid());
unlink(daos_shm_file_name);
NIT: from my understanding unlink() and shm_unlink() are equivalent here, but using shm_unlink() with the name that was passed to shm_open() seems more understandable to me: it explicitly indicates that we are removing an object created with shm_open().
You are right. I will replace unlink() with shm_unlink(). Thank you!
Fixed as suggested. Thank you!
atomic_fetch_add_relaxed(&(d_shm_head->ref_count), -1);
if (pid != pid_shm_creator)
	munmap(d_shm_head, d_shm_head->size);
From my understanding, all the processes should unmap and then close the file.
The shared mmapped file content should be kept until unlink() or shm_unlink() is called.
Thus, not unmapping and closing in the process that created the shared memory file seems useless.
I am also concerned that it could be seen as a memory leak by valgrind or other memory-checking tools.
Considering the cache in kernel space, I thought we may want to keep our cache persistent too. Otherwise, shared memory needs to be initialized again and again. Ideally, the space for caching would be freed after the content expires. We need our own shared memory allocator to dynamically expand/shrink the shared memory region. It would be a long way to get there.
Yes, I did have a small concern about whether valgrind can detect the memory leak in the shared memory usage here. I could play with valgrind to find out with a simple test.
I am not sure that keeping a dangling mapping when the process dies will change anything for the kernel cache.
From my understanding, what makes the difference is unlinking the file.
@mchaarawi do you have an opinion on this?
In one extreme case, the caching has no benefit at all if a user runs jobs serially and we destroy the cache once the application ends.
Maybe I am missing something, but if you remove the shared file then keeping a dangling pointer will not help once you have unlinked the file. On the other hand, if you do not unlink the file, the cache will still be available even if you do not keep a dangling pointer at the end of your applications. From my understanding the life of the cache is managed by the kernel and it will be removed when the file is unlinked.
However, it is perfectly possible that I am missing something obvious.
shm_destroy() is not called by regular applications. It is called only in shm_test. The file associated with shared memory will not be unlinked.
I talked to Mohamad recently. He suggested that daos_agent initialize and destroy the shared memory region. We will update this later.
not sure i quite understood the discussion. the cache needs to be persistent beyond one process lifetime and can be destroyed when the agent is killed.
but as i understand each process needs to call shm_open(), mmap(), close(), munmap() on the shared memory region, and only the agent calls shm_unlink().
the agent part can be done later.
From my understanding, the fd is now always closed at line 209.
Letting the agent call shm_unlink() is also OK for me.
If I am right, then it is OK for me.
Thank you!
To be clear, line 209, "close(shm_ht_fd);", only frees the file descriptor.
src/gurt/shm_alloc.c
Outdated
if (shm_pool_size % 4096)
	/* make shm_pool_size 4K aligned */
	shm_pool_size += (4096 - (shm_pool_size % 4096));
shm_size = shm_pool_size * N_SHM_POOL + sizeof(struct d_shm_hdr);
In fact, my remark on aligning struct d_shm_addr was to have a sizeof that is a multiple of PAGE_SIZE; otherwise shm_size might not be a multiple of PAGE_SIZE, from my understanding. Then the shared pools (following the header) might not be aligned on PAGE_SIZE (from my understanding).
However, I have not checked that the align attribute properly changes the sizeof. I will check this asap.
"shm_size" does not have be a multiple of PAGE_SIZE. mmap() does not require size to be a multiple of PAGE_SIZE. The memory allocator will use some space in the pool too. I am not sure making shm_size a multiple of PAGE_SIZE will bring noticeable benefit. Maybe some performance tests could help to clarify later.
From what I know it is indeed always better to have aligned memory for performance. Moreover, from what I understand, the size really allocated by mmap() will be the same. The only difference is that the padding will be added at the end of the allocated memory instead of between the struct d_shm_hdr and the first shm_pool.
In any case, this is not a blocker from my side.
just some quick comments.
still need to review more closely. will do that soon
	return 0;
}

rc = d_getenv_uint64_t("DAOS_SHM_SIZE", &shm_size);
we probably should have something different than an env variable to determine the size. i previously implemented a utility that grabs that from the agent. i can integrate that into the branch later and replace this.
/* the shared memory only accessible for individual user for now */
sprintf(daos_shm_name_buf, "%s_%d", daos_shm_name, getuid());
open_rw:
would it work if we just use O_CREAT without O_EXCL? so you don't need the try_open then try_create semantics here?
I used "O_CREAT | O_EXCL" to make sure only one process will initialize shared memory. Without "O_EXCL" would allow more than one concurrent processes to initialize shared memory.
src/tests/suite/SConscript
Outdated
shm_test_env = base_env.Clone()
shm_test_env.compiler_setup()
shm_test_env.AppendUnique(LIBPATH=[Dir('../../gurt')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../common')])
shm_test_env.AppendUnique(LIBPATH=[Dir('../../cart')])
shmtest = shm_test_env.d_program(File("shm_test.c"), LIBS=['gurt', 'daos_common', 'cart',
                                                           'cmocka', 'rt', 'pthread'])
denv.Install('$PREFIX/bin/', shmtest)
i don't think you need a DAOS server to run those tests, right?
so probably adding those as unit tests in gurt will be more appropriate
Right. The current tests are quite simple; a DAOS server is not needed. I will change the tests to unit tests. We will add ftests later when necessary. Thank you!
src/include/gurt/shm_alloc.h
Outdated
#ifndef __DAOS_SHM_ALLOC_H__
#define __DAOS_SHM_ALLOC_H__

#include <stdint.h>
it's probably better to create 1 header for this rather than multiple public headers that users of this module need to always include.
Thank you very much! Just fixed it. Now only necessary APIs are exposed in this header file.
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/6/testReport/
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Skipped-githooks: codespell Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Unit Test with memcheck on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/14/testReport/
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
src/include/gurt/shm_internal.h
Outdated
From what I understand of DAOS best practice, the internal header should not be located in the src/include directory, so that it is not visible to the end user. But I could be wrong on this point.
ok. I will move it to src/gurt. Thank you!
Fixed as suggested. Thank you!
src/include/gurt/shm_internal.h
Outdated
_Atomic int      ref_count;
/* global counter used for round robin picking memory allocator for large memory request */
_Atomic uint64_t large_mem_count;
/* array of pointors to memory allocators */
NIT
Suggested change:
-/* array of pointors to memory allocators */
+/* array of pointers to memory allocators */
A good catch. Thank you! Will fix it.
This is fixed. Thank you!
Instead of creating a new hash map, would it not be possible to update the current DAOS htable implementation with your new shared memory features?
I agree. At the beginning I was inclined to take the current DAOS htable implementation and modify it to fit shared memory. I decided to implement the hash table in shared memory from scratch after I realized many parts are not compatible.
Fair enough, I will thus have a more in-depth look at this file.
i did attempt to do that myself too and got the same conclusion as Lei before.
src/gurt/shm_alloc.c
Outdated
	DS_ERROR(errno, "ftruncate() failed for shm_ht_fd");
	goto err;
}
/* map the shared memory at a fixed address for now. We will remove this limit later. */
can we add some more details for this in the comment as to why FIXED is needed for the allocator.
ok. I will add more details as suggested. Thank you!
	else
		return rc;
} else {
	DS_ERROR(errno, "unexpected error shm_open()");
im curious why the need for retry in this case?
also this should be a warn not an error in this case.
I thought retry might make the code more robust.
I just went through all possible errno values returned by shm_open(). Retry should not be needed. I will change DS_ERROR to DS_WARN. Thank you!
The retry code was removed. DS_ERROR is kept to log the error of shm_open().
2. use lock when destroying hash table
3. fix misused lock when destroying hash table
4. free allocated memory when inserting a ht record fails
Allow-unstable-test: true
Signed-off-by: Lei Huang <[email protected]>
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/356/log
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/334/log
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/357/log
Test stage Build on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/453/log
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/351/log
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/17/execution/node/519/log
Allow-unstable-test: true Signed-off-by: Lei Huang <[email protected]>
Allow-unstable-test: true Signed-off-by: Lei Huang <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15613/19/testReport/
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com/job/daos-stack/job/daos/job/PR-15613/19/display/redirect
atomic_store_relaxed(&(d_shm_head->large_mem_count), 0);
d_shm_head->size          = shm_size;
d_shm_head->shm_pool_size = shm_pool_size;
d_shm_head->magic         = DSM_MAGIC;
To avoid compiler optimization and out-of-order execution issues, a memory barrier should probably be used here. Otherwise, the test at line 199 could be true without d_shm_head being properly initialized.
Suggested change:
-d_shm_head->magic = DSM_MAGIC;
+__sync_synchronize();
+d_shm_head->magic = DSM_MAGIC;
Thank you! This is fixed as you suggested.
d_shm_head->shm_pool_size = shm_pool_size;
d_shm_head->magic = DSM_MAGIC;
/* initialization is finished now. */
return 0;
Why are you not calling `munmap()` and `close()` on the shared memory here, since it will be re-opened and closed in `shm_init()`?
Thank you! The fd is now closed here. After creating the shared memory region, it returns immediately instead of calling munmap and mmap.
atomic_fetch_add_relaxed(&(d_shm_head->ref_count), -1);
if (pid != pid_shm_creator)
	munmap(d_shm_head, d_shm_head->size);
From my understanding, the mmap'ed address is now always unmapped at line 209. Letting the agent call `shm_unlink()` is also OK for me. If I am right, then it is OK for me.
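The detach path being discussed can be sketched as follows. The struct and function names here are hypothetical stand-ins for the actual code: every detaching process drops its reference, only non-creator processes unmap, and the decision to `shm_unlink()` the name is left to the agent.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

struct shm_hdr {
	atomic_int ref_count;
	size_t     size;
};

/* Drop this process's reference. Only non-creator processes unmap here,
 * so the creator's mapping stays alive until the agent decides to
 * shm_unlink() the region. */
static void shm_detach(struct shm_hdr *head, pid_t pid, pid_t pid_shm_creator)
{
	atomic_fetch_sub_explicit(&head->ref_count, 1, memory_order_relaxed);
	if (pid != pid_shm_creator)
		munmap(head, head->size);
}
```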
src/gurt/shm_dict.c
Outdated
}

/* This hash table does not exist, then create it. */
*ht_head = shm_alloc(sizeof(struct d_shm_ht_head) + (sizeof(d_shm_mutex_t) * n_lock) +
It sounds strange to me to allow multiple hash tables with the same name but with different sizes and/or numbers of locks. Moreover, it would not be compliant with the function get_shm_ht_with_name()
Now an error (EINVAL) is returned if the hash table exists but with different parameters. Thank you!
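The geometry check described in this reply can be sketched as below. The struct layout and function name are illustrative assumptions, not the actual DAOS code: an existing table is only reusable when its stored creation parameters match the requested ones.

```c
#include <errno.h>

struct ht_head {
	char name[32];
	int  n_bucket;
	int  n_lock;
};

/* A table with the requested name already exists: accept it only if the
 * requested geometry matches what was stored at creation time, otherwise
 * reject the open with EINVAL. */
static int ht_check_existing(const struct ht_head *existing,
			     int n_bucket, int n_lock)
{
	if (existing->n_bucket != n_bucket || existing->n_lock != n_lock)
		return EINVAL;
	return 0;
}
```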
src/gurt/shm_dict.c
Outdated
ht_head_loc = *ht_head;

memcpy(ht_head_loc->ht_name, name, len_name + 1);
ht_head_loc->n_bucket = n_bucket;
At least we should check that `n_bucket` is greater than or equal to `n_lock`. We could also check that `n_bucket % n_lock == 0`, as that ensures an even number of buckets per lock and also removes the need to use floating point to find the lock index.
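The integer-only computation this suggests can be sketched as below, under the assumptions that `n_bucket` is a power of two and `n_bucket % n_lock == 0`; each lock then covers exactly `n_bucket / n_lock` consecutive buckets.

```c
/* With n_bucket a power of two and n_bucket % n_lock == 0, the lock
 * index is a pure integer division over the bucket index. */
static unsigned int lock_index(unsigned int hash,
			       unsigned int n_bucket,
			       unsigned int n_lock)
{
	unsigned int idx = hash & (n_bucket - 1); /* bucket index */

	return idx / (n_bucket / n_lock);
}
```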
src/gurt/shm_dict.c
Outdated
idx = hash & (ht_head->n_bucket - 1);
idx_lock = (unsigned int)(idx * ht_head->n_lock * 1.0f / ht_head->n_bucket);
p_ht_lock = (d_shm_mutex_t *)((char *)ht_head + sizeof(struct d_shm_ht_head));
p_off_list = (long int *)((char *)p_ht_lock + sizeof(d_shm_mutex_t) * ht_head->n_lock);
This seems to have a concurrency issue with the shm_ht_destroy() function. The same applies to most of the other functions working on records, and to the get_shm_ht_with_name() function.
Thank you very much! I just made some improvements regarding the possible issue of accessing the hash table head. I am working on mitigating concurrency issues with hash table records. I will push my commit soon.
2. avoid fixed mapping address for shared memory region 3. add more hash table tests Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/20/execution/node/571/log |
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/21/execution/node/571/log |
Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-15613/22/execution/node/571/log |
Features: shm
Required-githooks: true
Skipped-githooks: codespell
Before requesting gatekeeper:
Features: (or Test-tag*) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.
Gatekeeper: