Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readlink and readlinkat implementation on RawPOSIX / Wasmtime / glibc #76

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

Yaxuan-w
Copy link
Member

@Yaxuan-w Yaxuan-w commented Jan 6, 2025

This PR adds support for readlink and readlinkat in lind-wasm to address issues encountered during Python compilation.

Changes:

RawPOSIX:

  • Added support for readlink and readlinkat along with necessary path resolution logic.

glibc:

  • Previously, glibc would call readlinkat from readlink if the latter was not fully implemented. Since both syscalls are now implemented separately, the conditional check has been removed.

Tests:

  • Added test cases for both readlink and readlinkat to ensure proper functionality.
  • Added a script to set up the testing environment, making it easier to run and validate the changes.

Copy link
Contributor

@qianxichen233 qianxichen233 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have a few doubts but overall looks good to me. Also a reminder to @ChinmayShringi , once this PR is merged, we need to also add address checking for the two new syscalls we have here in current memory PR or a seperate PR, depends on whether this PR or the memory PR merges first

@@ -47,6 +47,7 @@ profil - profil i:piii __profil profil
ptrace - ptrace i:iiii ptrace
read - read Ci:ibU __libc_read __read read
readlink - readlink i:spU __readlink readlink
readlinkat - readlinkat i:spU __libc_readlinkat readlinkat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may does not matter but just wondering why you are adding this here. I think readlinkat is defined in another syscalls.list (sysdeps/unix/sysv/linux/syscalls.list) before and I am not very sure if this syscalls.list is really responsbie for readlinkat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. Do we have notes on "how to add a syscall"? Having these steps documented in the docs would be great.

Copy link
Member Author

@Yaxuan-w Yaxuan-w Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, sysdeps/unix/syscalls.list defines general unix syscall support, while sysdeps/unix/sysv/linux/syscalls.list adds linux-specific extensions or overrides based on the former. In the previous glibc implementation, readlinkat wasn’t directly exposed as a public API -- it was invoked indirectly via readlink (as seen in the__GI_naming beforehand and also the original glibc readlink implementation).

For the lind-wasm implementation of readlink and readlinkat, I decided to expose both as public APIs. To support this, I modified sysdeps/unix/syscalls.list. Considering whether sysdeps/unix/sysv/linux/syscalls.list might overwrite these changes, my understanding is that the weak_alias(__libc_readlinkat, readlinkat) binds readlinkat to the __libc_readlinkat implementation, which should prevent it from being overwritten. As such, I think modifications to sysdeps/unix/sysv/linux/syscalls.list might not be strictly necessary. That said, I’m not entirely certain if modifying only sysdeps/unix/sysv/linux/syscalls.list could achieve the same effect. Any clarification on this would be greatly appreciated.

Additionally, I noticed we don’t currently have documentation outlining the code standards for lind-wasm. It might be helpful to discuss this in today’s meeting to ensure we’re aligned moving forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a quick note, I'm a bit confused by differences between wasmtest.sh and lindtool.sh... It seems some redundant functionality and there're some minor issue in wasmtest.sh (missing required abs path / incorrect command operation / etc.)..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely agree. This is one of the things I'm hoping @m-hemmings will guide us on in the short term while working with the devops team.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So lindtool.sh is actually my personal script that's basically helping me to do things more efficient. So it basically contains all the commands that I might need to use during development. As more people get involved into lind-wasm, I shared this script to them and then this script somehow get uploaded to github.
wasmtest.sh is a new script wrote by Robin for testing purpose only (I think?). I think wasmtest.sh is directly modified upon lindtool.sh so there are a lot of similarity here.

let mut path = path.to_string();
let libc_buflen = buflen + LIND_ROOT.len();
let mut libc_buf = vec![0u8; libc_buflen];
if virtual_fd == libc::AT_FDCWD {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we moved to using our own defined constants instead of libc, but similar to qianxi's comment were waiting on the memory PR to be merged. So this will have to be fixed depending what ordering we merge.

@rennergade rennergade requested a review from m-hemmings January 6, 2025 18:52
let relpath = normpath(convpath(path), self);
let relative_path = relpath.to_str().unwrap();
let full_path = format!("{}{}", LIND_ROOT, relative_path);
let c_path = CString::new(full_path).unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a question regarding this. I am curious if there is a risk of panic if full_path were to contain a non utf-8 character here. I'm not sure whether there is a layer of validation elsewhere that would prevent this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out! I hadn't considered the possibility of the path containing non utf-8 characters during the design. You're right that if full_path contains non utf-8 characters, the line let relative_path = relpath.to_str().unwrap(); would panic. To avoid this, we should implement better error handling (ie: using return EINVAL instead of .unwrap()). I will open an issue to document this and address it in the following refactor.

};
}

if libcret < 0 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this checks for a negative value, what would happen if libcret was the result of an empty link target? Would that return a 0 value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

readlinkat returns the length of the symbolic link's target path. If the symbolic link's target path is empty (i.e. it points to a non-existent file or an empty path), readlinkat returns 0 without filling buf with any data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants