Add support for aarch64 platform intrinsics #3172

Open
alex opened this issue Nov 17, 2023 · 15 comments
Labels
A-shims (Area: This affects the external function shims)
A-target (Area: concerns targets outside of what we currently support)
C-enhancement (Category: a PR with an enhancement or an issue tracking an accepted enhancement)

Comments

@alex
Member

alex commented Nov 17, 2023

Currently this produces:

   --> /Users/alex_gaynor/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/arm_shared/crypto.rs:69:5
    |
69  |     vaeseq_u8_(data, key)
    |     ^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.crypto.aese` on OS `macos`
    |

or

test inputs::encoded::tests::test_input ... error: unsupported operation: can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
    --> /Users/dmnk/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/aarch64/neon/mod.rs:2438:15
     |
2438 |     transmute(vqtbl1q(transmute(t), transmute(idx)))
     |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
     |
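
For reference, a minimal sketch (not the code that produced the reports above; the values here are just for illustration) that reproduces the second error when run under Miri on an aarch64 target:

use core::arch::aarch64::{vdupq_n_u8, vgetq_lane_u8, vqtbl1q_u8};

fn main() {
    // NEON is part of the baseline feature set of the aarch64 targets, so no
    // runtime feature detection is needed here; the intrinsics are still
    // `unsafe fn`, hence the unsafe block.
    unsafe {
        let table = vdupq_n_u8(7);
        let idx = vdupq_n_u8(0);
        // Lowers to llvm.aarch64.neon.tbl1.v16i8, the foreign function Miri
        // reports as unsupported above.
        let shuffled = vqtbl1q_u8(table, idx);
        assert_eq!(vgetq_lane_u8::<0>(shuffled), 7);
    }
}
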
@saethlin
Member

I should probably do a version of #2057 for aarch64; all my current surveying is based on x86-64-v2.

@alex
Member Author

alex commented Nov 17, 2023

If there's a straightforward script for it (and you promise it won't destroy my computer :D), I'm happy to do a run on my ARM64 laptop.

@saethlin
Member

saethlin commented Nov 17, 2023

It involves running the tests for every published crate, so I suspect you're not up for that :)

Also, Miri supports cross-interpretation, so the host doesn't matter; my big x86_64 CPU will do just fine for this.

@alex
Member Author

alex commented Nov 17, 2023

I was thinking maybe I'd just do the first 500 or 1k or something :-)

But if your setup already works for it, that sounds good!

@saethlin
Member

saethlin commented Nov 17, 2023

I hacked up https://github.com/saethlin/crater-at-home a bit to set the target to aarch64-unknown-linux-gnu, and here are the results for a thousand crates (hosted for now in my dev bucket): https://miri-bot-dev.s3.amazonaws.com/aarch64-1000.tar.xz

Missing LLVM intrinsics look like:

716 counts
(  1)      178 (24.9%, 24.9%): llvm.aarch64.neon.uminv.i32.v4i32
(  2)       62 ( 8.7%, 33.5%): llvm.aarch64.neon.uminv.i8.v16i8
(  3)       60 ( 8.4%, 41.9%): llvm.aarch64.neon.uminv.i16.v8i16
(  4)       50 ( 7.0%, 48.9%): llvm.aarch64.neon.tbl1.v16i8
(  5)       40 ( 5.6%, 54.5%): llvm.aarch64.neon.umaxp.v16i8
(  6)       26 ( 3.6%, 58.1%): llvm.aarch64.neon.ushl.v2i64
(  7)       24 ( 3.4%, 61.5%): llvm.aarch64.neon.ushl.v4i32
(  8)       18 ( 2.5%, 64.0%): llvm.aarch64.neon.uaddv.i32.v4i32
(  9)       18 ( 2.5%, 66.5%): llvm.fma.v2f64
( 10)       16 ( 2.2%, 68.7%): llvm.fma.v4f32
( 11)       14 ( 2.0%, 70.7%): llvm.aarch64.neon.sshl.v4i32
( 12)       14 ( 2.0%, 72.6%): llvm.aarch64.neon.uaddlv.i32.v16i8
( 13)       12 ( 1.7%, 74.3%): llvm.aarch64.neon.frintn.v4f32
( 14)       10 ( 1.4%, 75.7%): llvm.aarch64.neon.sshl.v8i16
( 15)        8 ( 1.1%, 76.8%): llvm.aarch64.neon.fcvtns.v4i32.v4f32
( 16)        8 ( 1.1%, 77.9%): llvm.aarch64.neon.ld1x4.v16i8.p0i8
( 17)        8 ( 1.1%, 79.1%): llvm.aarch64.neon.smin.v4i32
( 18)        8 ( 1.1%, 80.2%): llvm.aarch64.neon.smin.v8i16
( 19)        8 ( 1.1%, 81.3%): llvm.aarch64.neon.sqrdmulh.v8i16
( 20)        8 ( 1.1%, 82.4%): llvm.aarch64.neon.sshl.v2i64
( 21)        8 ( 1.1%, 83.5%): llvm.aarch64.neon.umaxv.i8.v16i8
( 22)        8 ( 1.1%, 84.6%): llvm.fptosi.sat.v4i32.v4f32
( 23)        6 ( 0.8%, 85.5%): llvm.aarch64.neon.uaddv.i32.v8i16
( 24)        4 ( 0.6%, 86.0%): llvm.aarch64.neon.abs.v16i8
( 25)        4 ( 0.6%, 86.6%): llvm.aarch64.neon.abs.v4i32
( 26)        4 ( 0.6%, 87.2%): llvm.aarch64.neon.abs.v8i16
( 27)        4 ( 0.6%, 87.7%): llvm.aarch64.neon.fmax.v2f64
( 28)        4 ( 0.6%, 88.3%): llvm.aarch64.neon.fmax.v4f32
( 29)        4 ( 0.6%, 88.8%): llvm.aarch64.neon.fmaxnm.v2f64
( 30)        4 ( 0.6%, 89.4%): llvm.aarch64.neon.fmaxnm.v4f32
( 31)        4 ( 0.6%, 89.9%): llvm.aarch64.neon.fmin.v2f64
( 32)        4 ( 0.6%, 90.5%): llvm.aarch64.neon.fmin.v4f32
( 33)        4 ( 0.6%, 91.1%): llvm.aarch64.neon.fminnm.v2f64
( 34)        4 ( 0.6%, 91.6%): llvm.aarch64.neon.fminnm.v4f32
( 35)        4 ( 0.6%, 92.2%): llvm.aarch64.neon.smax.v16i8
( 36)        4 ( 0.6%, 92.7%): llvm.aarch64.neon.smax.v8i16
( 37)        4 ( 0.6%, 93.3%): llvm.aarch64.neon.smin.v16i8
( 38)        4 ( 0.6%, 93.9%): llvm.aarch64.neon.smull.v4i16
( 39)        4 ( 0.6%, 94.4%): llvm.aarch64.neon.sqadd.v16i8
( 40)        4 ( 0.6%, 95.0%): llvm.aarch64.neon.sqadd.v8i16
( 41)        4 ( 0.6%, 95.5%): llvm.aarch64.neon.sqsub.v16i8
( 42)        4 ( 0.6%, 96.1%): llvm.aarch64.neon.sqsub.v8i16
( 43)        4 ( 0.6%, 96.6%): llvm.aarch64.neon.ushl.v8i16
( 44)        2 ( 0.3%, 96.9%): llvm.aarch64.neon.sqxtn.v4i16
( 45)        2 ( 0.3%, 97.2%): llvm.aarch64.neon.sqxtn.v8i8
( 46)        2 ( 0.3%, 97.5%): llvm.aarch64.neon.umax.v16i8
( 47)        2 ( 0.3%, 97.8%): llvm.aarch64.neon.umax.v4i32
( 48)        2 ( 0.3%, 98.0%): llvm.aarch64.neon.umax.v8i16
( 49)        2 ( 0.3%, 98.3%): llvm.aarch64.neon.umin.v16i8
( 50)        2 ( 0.3%, 98.6%): llvm.aarch64.neon.umin.v4i32
( 51)        2 ( 0.3%, 98.9%): llvm.aarch64.neon.umin.v8i16
( 52)        2 ( 0.3%, 99.2%): llvm.aarch64.neon.uqadd.v16i8
( 53)        2 ( 0.3%, 99.4%): llvm.aarch64.neon.uqadd.v8i16
( 54)        2 ( 0.3%, 99.7%): llvm.aarch64.neon.uqsub.v16i8
( 55)        2 ( 0.3%,100.0%): llvm.aarch64.neon.uqsub.v8i16

Nothing about AES. Do I need a particular target-cpu set to have AES intrinsics? Or is there a specific crate you were working on above that I can use to try out target CPUs and see which one hits this?

@alex
Member Author

alex commented Nov 17, 2023

https://github.com/ogxd/gxhash is what I was playing with when I originally ran into this.

https://github.com/RustCrypto/block-ciphers/tree/master/aes uses the same instruction, but goes via inline assembly instead of the intrinsic, for whatever reason.

@alex
Member Author

alex commented Nov 17, 2023

In any event, thanks for running these numbers!

I'm using an Apple M1, which has a set of baseline capabilities that I'm not sure are guaranteed for all aarch64 chips.

@saethlin
Member

saethlin commented Nov 17, 2023

Baseline aarch64 does not have the aes feature, but in rust-lang/rust#93889 (comment) @workingjubilee says that the M1 is target-cpu=apple-a14. I'll try that; it has the aes feature.
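
To illustrate why the default-feature survey would not reach the AES shim (a sketch, not code from any crate in this thread): the AES intrinsics are gated on the aes target feature, so code only reaches llvm.aarch64.crypto.aese when that feature is enabled at compile time (e.g. -C target-cpu=apple-a14 or -C target-feature=+aes) or selected via runtime detection:

#[cfg(target_arch = "aarch64")]
fn aes_demo() {
    use core::arch::aarch64::{uint8x16_t, vaeseq_u8, vdupq_n_u8};

    // Only sound to call when the `aes` target feature is available.
    #[target_feature(enable = "aes")]
    unsafe fn one_aese_round() -> uint8x16_t {
        // Lowers to llvm.aarch64.crypto.aese, the call from the original report.
        vaeseq_u8(vdupq_n_u8(0), vdupq_n_u8(0))
    }

    if std::arch::is_aarch64_feature_detected!("aes") {
        unsafe { one_aese_round() };
    }
}

fn main() {
    #[cfg(target_arch = "aarch64")]
    aes_demo();
}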

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims labels Apr 18, 2024
@RalfJung RalfJung changed the title Add support for "llvm.aarch64.crypto.aese" Add support for aarch64 platform intrinsics Apr 18, 2024
@RalfJung RalfJung added the A-target Area: concerns targets outside of what we currently support label Apr 18, 2024
@akern40

akern40 commented Oct 24, 2024

Sorry, I'm just getting started with Miri, but llvm.fma.v2f64 would be great to have for developing on Mac - it's commonly called from the matrixmultiply crate, which many crates in the Rust numerical computing ecosystem use (I'm a maintainer of ndarray, which uses it for our dot product implementations).

@RalfJung
Member

Until someone is interested in implementing the aarch64 intrinsics, you can run your code with a different target: --target x86_64-apple-darwin works fine on all hosts and can use the x86 intrinsics that have been implemented in Miri.

@workingjubilee
Member

This problem should just be fixed in stdarch. It links directly against LLVM IR intrinsics for aarch64 instruction generation instead of going through core::intrinsics.

https://doc.rust-lang.org/nightly/src/core/stdarch/crates/core_arch/src/arm_shared/neon/generated.rs.html#18835-18853
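
For readers who don't want to click through, the pattern in question looks roughly like the sketch below (paraphrased, not the exact generated code; standalone it needs nightly-internal features, which is fine inside core). The extern declaration's link_name is the LLVM intrinsic, i.e. exactly the "foreign function" Miri cannot interpret:

#![feature(abi_unadjusted, link_llvm_intrinsics)]
#![allow(internal_features)]

use core::arch::aarch64::float64x2_t;

// Paraphrased shape of the current generated code: the public intrinsic
// forwards to an extern declaration bound directly to an LLVM intrinsic.
#[allow(improper_ctypes)]
extern "unadjusted" {
    #[link_name = "llvm.fma.v2f64"]
    fn vfmaq_f64_(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t;
}

#[inline]
#[target_feature(enable = "neon")]
pub unsafe fn vfmaq_f64(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t {
    // Computes a + b * c; llvm.fma computes x * y + z, hence the reordering.
    vfmaq_f64_(b, c, a)
}

fn main() {}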

@RalfJung
Member

RalfJung commented Oct 24, 2024

Oh, these operations have equivalents in core::intrinsics::simd? Yeah those should be used then, as that makes life easier for all backends.

@workingjubilee
Member

Yes, at least all of these should be implementable in terms of our already-existing intrinsics:

https://github.com/rust-lang/stdarch/blob/0669fa8a6e48afde64e0ceccdf462a9cc3fb689f/crates/stdarch-gen-arm/neon.spec#L3983-L4125

@akern40 I realize the spec file is slightly inscrutable, but if you can spend the time to puzzle out getting that code to use https://doc.rust-lang.org/nightly/std/intrinsics/simd/fn.simd_fma.html, then your problem goes away, because Miri already implements simd_fma and it's the same LLVM IR.
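
To make that concrete, here's a hedged sketch (not the actual stdarch change; core::intrinsics is only usable from within the standard library or on nightly with the internal core_intrinsics feature) of what one such rewrite could look like, using vfmaq_f64, which currently forwards to llvm.fma.v2f64:

#![feature(core_intrinsics)]
#![allow(internal_features)]

use core::arch::aarch64::float64x2_t;
use core::intrinsics::simd::simd_fma;

// Sketch of the proposed shape: express the NEON intrinsic through the
// portable simd_fma intrinsic instead of an extern block whose link_name is
// "llvm.fma.v2f64". Miri (and the cranelift/gcc backends) already understand
// simd_fma.
#[inline]
#[target_feature(enable = "neon")]
pub unsafe fn vfmaq_f64(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t {
    // vfmaq_f64 computes a + b * c; simd_fma(x, y, z) computes x * y + z,
    // so the operands are reordered accordingly.
    simd_fma(b, c, a)
}

fn main() {}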

@RalfJung
Member

I opened a stdarch issue about changing their implementation of the aarch64 intrinsics, which indeed would get us a lot of Miri (and cranelift and gcc) support for free: rust-lang/stdarch#1659.

@akern40

akern40 commented Oct 24, 2024

Thank you both! When I've got a bit more time this weekend I'll try to dig into this and see how I can contribute.
