-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-14826 client: reserve low fds #13532
Conversation
Features: pil4dfs also fix a corner case in fstatat() to mimic libc. rely on dup2() to get desired fd to avoid error "failed to get the desired fd in dup2()". Signed-off-by: Lei Huang <[email protected]>
Bug-tracker data: |
Features: pil4dfs Signed-off-by: Lei Huang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't you say you had a working test for the low fd issue? It would be good to see that included alongside the code.
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -1007,6 +1010,19 @@ query_path(const char *szInput, int *is_target_path, dfs_obj_t **parent, char *i | |||
/* trying to avoid lock as much as possible */ | |||
if (!daos_inited) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this existing code it doesn't appear thread-safe in the case of two threads entering this code-path concurrently, not a regression but it would be good to add locking to this at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. We were aware of this possible issue. daos_init() uses lock already. query_path() is called in many intercepted functions. We hoped to avoid lock if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lock here isn't going to make any difference to performance and should only be needed in the contended case anyway. Without one there's potential for daos_init() to be called multiple times but daos_fini() only being called once leading to resource leaks and incorrect tidyup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case daos_init() is called more than one time, daos_init_cnt has the count. daos_fini() will be called daos_init_cnt times in destruction phase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like you've re-worked this code since yesterday but it's still not thread safe although as you daos_fini() does look like it'll be called the correct number of times.
pthread_atfork() wants to be called precisely once, and if you're using the daos_inited boolean in this manner then you need to access it atomically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Moved pthread_atfork() into the locked region. Could you please check again?
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -1007,6 +1010,19 @@ query_path(const char *szInput, int *is_target_path, dfs_obj_t **parent, char *i | |||
/* trying to avoid lock as much as possible */ | |||
if (!daos_inited) { | |||
/* daos_init() is expensive to call. We call it only when necessary. */ | |||
int low_fd_list[DAOS_MIN_FD], low_fd_count = 0, fd_kernel, idx; | |||
|
|||
fd_kernel = open("/proc/self/maps", O_RDONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with opening "/"?
Presumably this code can only get here after the hooks have been installed so perhaps it would be better to call the redirected open directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. "/" should work too.
You are right. libc_open() should work here. I will change it in next commit. Thank you!
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
if (dirfd >= FD_FILE_BASE && dirfd < FD_DIR_BASE && path[0] == 0) | ||
/* same as fstat for a file. May need further work to handle flags */ | ||
return fstat(dirfd, stat_buf); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this related to the commit message? Same applies for the rest of the code changes in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a corner case of fstatat(). dirfd is allowed to be a fd of a regular file.
Yes. There are probably a few places where we need to handle flags. I added a comment here to make sure I will not forget it in future.
Yes. The test bash script needs some code in exec interception to pass. I will put the test case over there. fd 0, 1, and 2 are preserved after exec(). We need to handle such issue in #13519. We also probably need to close all fds higher than 2. |
Features: pil4dfs also use a lock before init main_eqh in query_path(). Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
That link and some of the code in this PR is for the execv issue, but most of the code in this PR and the commit message relate to the low FD issue. Do you have a test for the low FD issue and can you include it in this PR. |
Features: pil4dfs also added a unit test to make sure daos does not use low fd. Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Just added a unit test to make sure daos does not use low fd. Thank you! |
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13532/7/testReport/ |
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
@ashleypittman @mlawsonca Could you please review? Thank you very much! |
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -5565,6 +5676,20 @@ init_myhook(void) | |||
eq_count_max = MAX_EQ; | |||
} | |||
|
|||
/* D_IL_DAOS_MIN_FD=3 or smaller value will disable blocking low fds */ | |||
rc = d_getenv_uint64_t("D_IL_DAOS_MIN_FD", &daos_min_fd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this env documented somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not in our document. Normally users do not need to set it. Default value 10 will be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok well if users don't need to set it then i do not see why it needs to be an env variable..
if it needs to, then we should document it in the IL readme, and also consider changing it's name D_ and DAOS are repetitive.
maybe call it DAOS_RESERVE_MIN_FD? other suggestion welcome
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. We can just use a hard-coded value for now. We add an env later if necessary. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I am not against making this an env variable. im just saying that an env variable needs to be documented somewhere if we need to ask users to play with it. otherwise only the dev (you) knows about it and what it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Understood. We can use hardcoded value to make our code simpler and decrease the number of env users need to deal with. Originally, I thought bash script can use arbitrary fd. An env may be helpful if a larger value is needed. However, it could be very hard for a regular user to figure out the situation. I will add a warning in dup2() if a fd pointing to "socket:[xxxx]" or "anon_inode:[eventpoll]" is closed, so it would help for debugging related issues.
DL_WARN(rc, "daos_eq_create() failed"); | ||
main_eqh = td_eqh; | ||
rc = pthread_atfork(NULL, NULL, &child_hdlr); | ||
D_MUTEX_LOCK(&lock_eqh); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't understand why we need nested locks now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I will rearrange the code to avoid nested locks. Thank you!
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -1007,23 +1075,41 @@ query_path(const char *szInput, int *is_target_path, dfs_obj_t **parent, char *i | |||
/* trying to avoid lock as much as possible */ | |||
if (!daos_inited) { | |||
/* daos_init() is expensive to call. We call it only when necessary. */ | |||
|
|||
D_MUTEX_LOCK(&lock_global); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this needed for the entire block? isn't this just needed internally for consume_low_fd()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I will move the lock inside consume_low_fd(). Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well my point was just that if you already added a lock on top (not sure if that one is needed), do you need this one too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually nevemind that comment was for the next lock not the lock_global
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -2540,6 +2626,10 @@ new_fstatat(int dirfd, const char *__restrict path, struct stat *__restrict stat | |||
return new_xstat(1, path, stat_buf); | |||
} | |||
|
|||
if (dirfd >= FD_FILE_BASE && dirfd < FD_DIR_BASE && path[0] == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure i understand what you are doing here, nor i understand this commit message:
"Also fix a corner case in fstatat() to mimic libc. Rely on libc dup2() to get desired fd to avoid error
"failed to get the desired fd in dup2()"."
what is the corner case being fixed? can you be more clear. if this corner case is fixed, can you add a test for the corner case? thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dirfd is allowed to be a fd of a regular file. I did not expect such situation. ok. I will add a unit test for this corner case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to add the corner case in dfuse unit test. Looks libioil has the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. False alarm. Used wrong flag in fstatat().
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
/* the number of low fd blocked */ | ||
static uint16_t low_fd_count; | ||
/* upper limit of low fd that will be blocked */ | ||
static uint16_t max_low_fd_blocked; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it blocked or reserved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "reserved" is a little better.
Features: pil4dfs 1. add unit test for fstatat corner cases 2. rename blocked as reserved 3. remove the env to set the minimum fd daos can use 4. atomic APIs to access daos_inited 5. apply lock_global only for consume_low_fd() 6. Move pthread_atfork() inside the locked regime 7. check fd before closing it in dup2() Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
@@ -241,6 +249,7 @@ struct sigaction old_segv; | |||
/* the flag to indicate whether initlization is finished or not */ | |||
static bool hook_enabled; | |||
static bool hook_enabled_bak; | |||
static pthread_mutex_t lock_global; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal, but if a repush is needed, better to rename that to lock_reserve_fd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I will fix it if I need a repush.
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
} | ||
path[len] = 0; | ||
if (strncmp(path, "socket:", 7) == 0 || strncmp(path, "anon_inode:[eventpoll]", 22) == 0) { | ||
D_WARN("dup2/fcntl is closing fd %d (%s) which may be used by DAOS!", fd, path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess i don't understand what a warning is supposed to do. how can we end up in this situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is mainly for debugging. Not sure whether this is always related to real issues, especially for an application uses network. Shall I use D_DEBUG() instead?
If a bash script uses larger fd (e.g., 15) directly, this piece of code will be triggered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For what it's worth the ioil code is at https://github.com/daos-stack/daos/blob/master/src/client/dfuse/il/int_posix.c#L741-L797
It just uses a hard-coded value of 10 and does no checking. It also uses dup for all but the first fd and does it all in one function rather than using globals.
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
return 0; | ||
|
||
low_fd_count = 0; | ||
low_fd_list[low_fd_count] = libc_open("/", O_RDONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd of expected O_PATH|O_DIRECTORY here but maybe it's not required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. "O_PATH|O_DIRECTORY" works too. I can use it to match the code in other places. Thank you!
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
/* check the fd to be closed. If the fd pointing to something like | ||
* 'socket:[xxxx]' or 'anon_inode:[eventpoll]', print out a warning | ||
* since DAOS network contexts may be using such fds. | ||
*/ | ||
static void | ||
check_fd_to_close(int fd) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like this and it will be expensive, particularly on something like dfuse where readlink has to do a full path walk.
The low fds issue is one we need to work around but we don't want to compromise the performance of the whole library to do so. I also don't see that if there ever was an error here that it would be restricted to the dup2 call so I think this check is expensive and not helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I can remove this check for now. Thank you!
src/client/dfuse/pil4dfs/int_dfs.c
Outdated
oldfd_directed = query_fd_forward_dest(oldfd); | ||
if ((oldfd_directed < FD_FILE_BASE) && (oldfd < FD_FILE_BASE) && (newfd < FD_FILE_BASE)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't see the relevance of this change? Is it to do with low fds and required for the new test to pass or is it addressing another issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixing a bug I recently identified. This change is needed to pass the new test of running bash script that uses low fd directly. The new test is added in the PR for exec() interception. bash opens a file on DFS then calls dup2(fake_fd, 5) to get a low fd 5, then call dup2(5, 1) to let fd 1 point to the file on DFS. After that, bash calls exec() to run "cat" to write to the file on DFS.
#! /bin/sh
exec 5>config.log {
cat <<EOF
EOF
} >&5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that test be in this PR or does this PR depend on the exec PR then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. The above is added in the exec PR. It requires the new code there.
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/18/execution/node/283/log |
Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/19/execution/node/356/log |
Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/19/execution/node/286/log |
Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/19/execution/node/371/log |
Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/19/execution/node/272/log |
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13532/18/testReport/ |
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-13532/20/testReport/ |
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Features: pil4dfs Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
I can confirm that my test now runs and passes, It looks like the remaining issues were in the dup() code from your change? There is one error I'm getting from pil4dfs during this operation which you should look at but once that's resolved then I think this PR might be complete.
There are some IO requests to dfuse during the test, I assume this is down to the getcwd handling although I also see two opens and two reads - we can deal with that elsewhere though. |
Yes. One situation in dup2() was not implemented previously. Also glibc chdir() is needed to update the current working directory before exec(), so we always call glibc chdir inside chdir() and fchdir() now.
Thank you! I will look into this issue.
chdir()/fchdir() will be passed through to dfuse. exec() also needs dfuse. |
Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/22/execution/node/1503/log |
Features: pil4dfs Priority: 2 Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
Features: pil4dfs Priority: 2 Required-githooks: true Signed-off-by: Lei Huang <[email protected]>
I just pushed a commit to introduce fd duplication reference count to handle complex situations. This issue is resolved in my local test. Could you please give it a try? Thank you! |
Much better thank you. I updated the test to perform a write between the first and second close of the duplicated fd and was seeing a segfault before this latest revision, not it's passing successfully. |
@ashleypittman Thanks a lot for your testing and update! |
@mchaarawi Could you please review the latest changes? Thank you! |
Test stage Functional Hardware Large completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-13532/24/execution/node/1638/log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my comment can be addressed in the next PR.
rc = dfs_open(dfs_mt->dfs, parent, item_name, mode | S_IFREG, oflags & (~O_APPEND), | ||
0, 0, NULL, &dfs_obj); | ||
mode_query = S_IFREG; | ||
} else if (!parent && (strncmp(item_name, "/", 2) == 0)) { | ||
rc = dfs_lookup(dfs_mt->dfs, "/", oflags, &dfs_obj, &mode_query, NULL); | ||
rc = | ||
dfs_lookup(dfs_mt->dfs, "/", oflags & (~O_APPEND), &dfs_obj, &mode_query, NULL); | ||
} else { | ||
rc = dfs_lookup_rel(dfs_mt->dfs, parent, item_name, oflags, &dfs_obj, &mode_query, | ||
NULL); | ||
rc = dfs_lookup_rel(dfs_mt->dfs, parent, item_name, oflags & (~O_APPEND), &dfs_obj, | ||
&mode_query, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments need to be added here as to how O_APPEND is being handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I will add comments in my another PR. Thank you!
The failure in hardware tests are not related to libpil4dfs. |
Features: pil4dfs
Reserve low fd to avoid daos network contexts use them.
Fix a corner case in fstatat() to mimic libc and add unit test.
Rely on libc dup2() to get desired fd to avoid error "failed to get the desired fd in dup2()".
Use atomic APIs to access daos_inited.
Move pthread_atfork() inside locked region to avoid multiple execution.
Call glibc chdir() to update process's cwd in kernel state.
Introduce fd duplication reference count to handle complex situation
Add a unit test to make sure daos does not use low fd.
Before requesting gatekeeper:
Features:
(orTest-tag*
) commit pragma was used or there is a reason documented that there are no appropriate tags for this PR.Gatekeeper: