-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
readlink and readlinkat implementation on RawPOSIX / Wasmtime / glibc #76
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have a few doubts but overall looks good to me. Also a reminder to @ChinmayShringi , once this PR is merged, we need to also add address checking for the two new syscalls we have here in current memory PR or a seperate PR, depends on whether this PR or the memory PR merges first
@@ -47,6 +47,7 @@ profil - profil i:piii __profil profil | |||
ptrace - ptrace i:iiii ptrace | |||
read - read Ci:ibU __libc_read __read read | |||
readlink - readlink i:spU __readlink readlink | |||
readlinkat - readlinkat i:spU __libc_readlinkat readlinkat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may does not matter but just wondering why you are adding this here. I think readlinkat is defined in another syscalls.list (sysdeps/unix/sysv/linux/syscalls.list
) before and I am not very sure if this syscalls.list is really responsbie for readlinkat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Do we have notes on "how to add a syscall"? Having these steps documented in the docs would be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my understanding, sysdeps/unix/syscalls.list
defines general unix syscall support, while sysdeps/unix/sysv/linux/syscalls.list
adds linux-specific extensions or overrides based on the former. In the previous glibc implementation, readlinkat
wasn’t directly exposed as a public API -- it was invoked indirectly via readlink
(as seen in the__GI_
naming beforehand and also the original glibc readlink
implementation).
For the lind-wasm implementation of readlink
and readlinkat
, I decided to expose both as public APIs. To support this, I modified sysdeps/unix/syscalls.list
. Considering whether sysdeps/unix/sysv/linux/syscalls.list
might overwrite these changes, my understanding is that the weak_alias(__libc_readlinkat, readlinkat)
binds readlinkat
to the __libc_readlinkat
implementation, which should prevent it from being overwritten. As such, I think modifications to sysdeps/unix/sysv/linux/syscalls.list
might not be strictly necessary. That said, I’m not entirely certain if modifying only sysdeps/unix/sysv/linux/syscalls.list
could achieve the same effect. Any clarification on this would be greatly appreciated.
Additionally, I noticed we don’t currently have documentation outlining the code standards for lind-wasm. It might be helpful to discuss this in today’s meeting to ensure we’re aligned moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also a quick note, I'm a bit confused by differences between wasmtest.sh
and lindtool.sh
... It seems some redundant functionality and there're some minor issue in wasmtest.sh
(missing required abs path / incorrect command operation / etc.)..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely agree. This is one of the things I'm hoping @m-hemmings will guide us on in the short term while working with the devops team.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So lindtool.sh
is actually my personal script that's basically helping me to do things more efficient. So it basically contains all the commands that I might need to use during development. As more people get involved into lind-wasm, I shared this script to them and then this script somehow get uploaded to github.
wasmtest.sh
is a new script wrote by Robin for testing purpose only (I think?). I think wasmtest.sh
is directly modified upon lindtool.sh
so there are a lot of similarity here.
let mut path = path.to_string(); | ||
let libc_buflen = buflen + LIND_ROOT.len(); | ||
let mut libc_buf = vec![0u8; libc_buflen]; | ||
if virtual_fd == libc::AT_FDCWD { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we moved to using our own defined constants instead of libc, but similar to qianxi's comment were waiting on the memory PR to be merged. So this will have to be fixed depending what ordering we merge.
let relpath = normpath(convpath(path), self); | ||
let relative_path = relpath.to_str().unwrap(); | ||
let full_path = format!("{}{}", LIND_ROOT, relative_path); | ||
let c_path = CString::new(full_path).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a question regarding this. I am curious if there is a risk of panic if full_path were to contain a non utf-8 character here. I'm not sure whether there is a layer of validation elsewhere that would prevent this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out! I hadn't considered the possibility of the path containing non utf-8 characters during the design. You're right that if full_path
contains non utf-8 characters, the line let relative_path = relpath.to_str().unwrap();
would panic. To avoid this, we should implement better error handling (ie: using return EINVAL
instead of .unwrap()
). I will open an issue to document this and address it in the following refactor.
}; | ||
} | ||
|
||
if libcret < 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this checks for a negative value, what would happen if libcret was the result of an empty link target? Would that return a 0 value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readlinkat returns the length of the symbolic link's target path. If the symbolic link's target path is empty (i.e. it points to a non-existent file or an empty path), readlinkat returns 0 without filling buf with any data.
This PR adds support for readlink and readlinkat in lind-wasm to address issues encountered during Python compilation.
Changes:
RawPOSIX:
readlink
andreadlinkat
along with necessary path resolution logic.glibc:
readlinkat
fromreadlink
if the latter was not fully implemented. Since both syscalls are now implemented separately, the conditional check has been removed.Tests:
readlink
andreadlinka
t to ensure proper functionality.