-
Notifications
You must be signed in to change notification settings - Fork 277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Complete vld1 instructions with some corrections #1216
Conversation
r? @Amanieu (rust-highfive has picked a reviewer for you, use r? to override) |
name = vmull | ||
n-suffix | ||
name = vmull_n | ||
no-q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this change should be documented in the PR or in an extra commit/extra PR as it changes the names of public intrinsics (to their correct names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out! I added relevant documents to the description of the PR.
@@ -3568,7 +3733,7 @@ generate int16x4_t:i16:int16x4_t, int32x2_t:i32:int32x2_t | |||
|
|||
/// Vector saturating doubling multiply high with scalar | |||
name = vqdmulhq_n | |||
out-suffix | |||
no-q |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See above.
#[target_feature(enable = "neon,aes")] | ||
#[cfg_attr(test, assert_instr(ldr))] | ||
pub unsafe fn vld1q_p64(ptr: *const p64) -> poly64x2_t { | ||
transmute(u64x2::new(*ptr, *ptr.offset(1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a concern raised in #1148 that this doesn't get optimized down to a single instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, all others have been already changed to use read_unaligned()
instead which cannot be seen in the diff as it this PR is based on an older commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. I will rebase the PR and make changes accordingly.
Update stdarch submodule This is mainly to fix the critical issue of aarch64 store intrinsics overwriting additional memory, see rust-lang/stdarch#1220 Changes: * aarch64/armv7: additional vld1/vst1 intrinsics + perf fixes for existing ones * rust-lang/stdarch#1205 * rust-lang/stdarch#1207 * rust-lang/stdarch#1216 * armv7: Make FMA work with vfpv4 and optimize * rust-lang/stdarch#1219 * Non-visible changes to the testing framework * rust-lang/stdarch#1208 * rust-lang/stdarch#1211 * rust-lang/stdarch#1213 * rust-lang/stdarch#1215 * rust-lang/stdarch#1218
This PR does the following:
(1) completes
vld1
neon instructions, mainly for*_p64
s(2) replaces
crypto
feature withaes
for arm(3) adds
"core_arch/src/arm_shared/neon"
dir instdarch-verify
. Based on this, the function names ofvmull_n_s16
,vmull_n_s32
,vmull_n_u16
,vmull_n_u32
,vqdmulhq_n_s16
,vqdmulhq_n_s32
have been corrected(4) optimizes
stdarch-gen
so thatvld2*
(and other) instructions can be generated more clearly(For the convenience of reviewing, they are not generated in this PR)