Add support for aarch64 platform intrinsics #3172

Open
alex opened this issue Nov 17, 2023 · 15 comments
Labels
A-shims (Area: This affects the external function shims)
A-target (Area: concerns targets outside of what we currently support)
C-enhancement (Category: a PR with an enhancement or an issue tracking an accepted enhancement)

Comments

@alex
Member

alex commented Nov 17, 2023

Currently this produces:

   --> /Users/alex_gaynor/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/arm_shared/crypto.rs:69:5
    |
69  |     vaeseq_u8_(data, key)
    |     ^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.crypto.aese` on OS `macos`
    |

or

test inputs::encoded::tests::test_input ... error: unsupported operation: can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
    --> /Users/dmnk/.rustup/toolchains/nightly-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/../../stdarch/crates/core_arch/src/aarch64/neon/mod.rs:2438:15
     |
2438 |     transmute(vqtbl1q(transmute(t), transmute(idx)))
     |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ can't call foreign function `llvm.aarch64.neon.tbl1.v16i8` on OS `macos`
     |
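
For reference, a minimal sketch (not the code that produced the reports above; the values here are just for illustration) that reproduces the second error when run under Miri on an aarch64 target:

use core::arch::aarch64::{vdupq_n_u8, vgetq_lane_u8, vqtbl1q_u8};

fn main() {
    // NEON is part of the baseline feature set of the aarch64 targets, so no
    // runtime feature detection is needed here; the intrinsics are still
    // `unsafe fn`, hence the unsafe block.
    unsafe {
        let table = vdupq_n_u8(7);
        let idx = vdupq_n_u8(0);
        // Lowers to llvm.aarch64.neon.tbl1.v16i8, the foreign function Miri
        // reports as unsupported above.
        let shuffled = vqtbl1q_u8(table, idx);
        assert_eq!(vgetq_lane_u8::<0>(shuffled), 7);
    }
}
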
@saethlin
Member

I should probably do a version of #2057 for aarch64; all my current surveying is based on x86-64-v2.

@alex
Member Author

alex commented Nov 17, 2023

If there's a straightforward script for it (and you promise it won't destroy my computer :D), I'm happy to do a run on my ARM64 laptop.

@saethlin
Member

saethlin commented Nov 17, 2023

It involves running the tests for every published crate, so I suspect you're not up for that :)

Also, Miri supports cross-interpretation, so the host doesn't matter; my big x86_64 CPU will do just fine for this.

@alex
Member Author

alex commented Nov 17, 2023

I was thinking maybe I'd just do the first 500 or 1k or something :-)

But if your setup already works for it, that sounds good!

@saethlin
Member

saethlin commented Nov 17, 2023

I hacked up https://github.com/saethlin/crater-at-home a bit to set the target to aarch64-unknown-linux-gnu, and here are the results for a thousand crates (hosted for now in my dev bucket): https://miri-bot-dev.s3.amazonaws.com/aarch64-1000.tar.xz

Missing LLVM intrinsics look like:

716 counts
(  1)      178 (24.9%, 24.9%): llvm.aarch64.neon.uminv.i32.v4i32
(  2)       62 ( 8.7%, 33.5%): llvm.aarch64.neon.uminv.i8.v16i8
(  3)       60 ( 8.4%, 41.9%): llvm.aarch64.neon.uminv.i16.v8i16
(  4)       50 ( 7.0%, 48.9%): llvm.aarch64.neon.tbl1.v16i8
(  5)       40 ( 5.6%, 54.5%): llvm.aarch64.neon.umaxp.v16i8
(  6)       26 ( 3.6%, 58.1%): llvm.aarch64.neon.ushl.v2i64
(  7)       24 ( 3.4%, 61.5%): llvm.aarch64.neon.ushl.v4i32
(  8)       18 ( 2.5%, 64.0%): llvm.aarch64.neon.uaddv.i32.v4i32
(  9)       18 ( 2.5%, 66.5%): llvm.fma.v2f64
( 10)       16 ( 2.2%, 68.7%): llvm.fma.v4f32
( 11)       14 ( 2.0%, 70.7%): llvm.aarch64.neon.sshl.v4i32
( 12)       14 ( 2.0%, 72.6%): llvm.aarch64.neon.uaddlv.i32.v16i8
( 13)       12 ( 1.7%, 74.3%): llvm.aarch64.neon.frintn.v4f32
( 14)       10 ( 1.4%, 75.7%): llvm.aarch64.neon.sshl.v8i16
( 15)        8 ( 1.1%, 76.8%): llvm.aarch64.neon.fcvtns.v4i32.v4f32
( 16)        8 ( 1.1%, 77.9%): llvm.aarch64.neon.ld1x4.v16i8.p0i8
( 17)        8 ( 1.1%, 79.1%): llvm.aarch64.neon.smin.v4i32
( 18)        8 ( 1.1%, 80.2%): llvm.aarch64.neon.smin.v8i16
( 19)        8 ( 1.1%, 81.3%): llvm.aarch64.neon.sqrdmulh.v8i16
( 20)        8 ( 1.1%, 82.4%): llvm.aarch64.neon.sshl.v2i64
( 21)        8 ( 1.1%, 83.5%): llvm.aarch64.neon.umaxv.i8.v16i8
( 22)        8 ( 1.1%, 84.6%): llvm.fptosi.sat.v4i32.v4f32
( 23)        6 ( 0.8%, 85.5%): llvm.aarch64.neon.uaddv.i32.v8i16
( 24)        4 ( 0.6%, 86.0%): llvm.aarch64.neon.abs.v16i8
( 25)        4 ( 0.6%, 86.6%): llvm.aarch64.neon.abs.v4i32
( 26)        4 ( 0.6%, 87.2%): llvm.aarch64.neon.abs.v8i16
( 27)        4 ( 0.6%, 87.7%): llvm.aarch64.neon.fmax.v2f64
( 28)        4 ( 0.6%, 88.3%): llvm.aarch64.neon.fmax.v4f32
( 29)        4 ( 0.6%, 88.8%): llvm.aarch64.neon.fmaxnm.v2f64
( 30)        4 ( 0.6%, 89.4%): llvm.aarch64.neon.fmaxnm.v4f32
( 31)        4 ( 0.6%, 89.9%): llvm.aarch64.neon.fmin.v2f64
( 32)        4 ( 0.6%, 90.5%): llvm.aarch64.neon.fmin.v4f32
( 33)        4 ( 0.6%, 91.1%): llvm.aarch64.neon.fminnm.v2f64
( 34)        4 ( 0.6%, 91.6%): llvm.aarch64.neon.fminnm.v4f32
( 35)        4 ( 0.6%, 92.2%): llvm.aarch64.neon.smax.v16i8
( 36)        4 ( 0.6%, 92.7%): llvm.aarch64.neon.smax.v8i16
( 37)        4 ( 0.6%, 93.3%): llvm.aarch64.neon.smin.v16i8
( 38)        4 ( 0.6%, 93.9%): llvm.aarch64.neon.smull.v4i16
( 39)        4 ( 0.6%, 94.4%): llvm.aarch64.neon.sqadd.v16i8
( 40)        4 ( 0.6%, 95.0%): llvm.aarch64.neon.sqadd.v8i16
( 41)        4 ( 0.6%, 95.5%): llvm.aarch64.neon.sqsub.v16i8
( 42)        4 ( 0.6%, 96.1%): llvm.aarch64.neon.sqsub.v8i16
( 43)        4 ( 0.6%, 96.6%): llvm.aarch64.neon.ushl.v8i16
( 44)        2 ( 0.3%, 96.9%): llvm.aarch64.neon.sqxtn.v4i16
( 45)        2 ( 0.3%, 97.2%): llvm.aarch64.neon.sqxtn.v8i8
( 46)        2 ( 0.3%, 97.5%): llvm.aarch64.neon.umax.v16i8
( 47)        2 ( 0.3%, 97.8%): llvm.aarch64.neon.umax.v4i32
( 48)        2 ( 0.3%, 98.0%): llvm.aarch64.neon.umax.v8i16
( 49)        2 ( 0.3%, 98.3%): llvm.aarch64.neon.umin.v16i8
( 50)        2 ( 0.3%, 98.6%): llvm.aarch64.neon.umin.v4i32
( 51)        2 ( 0.3%, 98.9%): llvm.aarch64.neon.umin.v8i16
( 52)        2 ( 0.3%, 99.2%): llvm.aarch64.neon.uqadd.v16i8
( 53)        2 ( 0.3%, 99.4%): llvm.aarch64.neon.uqadd.v8i16
( 54)        2 ( 0.3%, 99.7%): llvm.aarch64.neon.uqsub.v16i8
( 55)        2 ( 0.3%,100.0%): llvm.aarch64.neon.uqsub.v8i16

Nothing about AES. Do I need a particular target-cpu set to have AES intrinsics? Or is there a specific crate you were working on above that I can use to try out target CPUs and see which one hits this?

@alex
Member Author

alex commented Nov 17, 2023

https://github.com/ogxd/gxhash is what I was playing with when I originally ran into this.

https://github.com/RustCrypto/block-ciphers/tree/master/aes uses the same instruction, but goes via inline assembly instead of the intrinsic, for whatever reason.

@alex
Member Author

alex commented Nov 17, 2023

In any event, thanks for running these numbers!

I'm using an Apple M1, which has a set of baseline capabilities that I'm not sure are guaranteed for all aarch64 chips.

@saethlin
Member

saethlin commented Nov 17, 2023

Baseline aarch64 does not have the aes feature, but in rust-lang/rust#93889 (comment) @workingjubilee says that the M1 is target-cpu=apple-a14. I'll try that; it has the aes feature.
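
To illustrate why the default-feature survey would not reach the AES shim (a sketch, not code from any crate in this thread): the AES intrinsics are gated on the aes target feature, so code only reaches llvm.aarch64.crypto.aese when that feature is enabled at compile time (e.g. -C target-cpu=apple-a14 or -C target-feature=+aes) or selected via runtime detection:

#[cfg(target_arch = "aarch64")]
fn aes_demo() {
    use core::arch::aarch64::{uint8x16_t, vaeseq_u8, vdupq_n_u8};

    // Only sound to call when the `aes` target feature is available.
    #[target_feature(enable = "aes")]
    unsafe fn one_aese_round() -> uint8x16_t {
        // Lowers to llvm.aarch64.crypto.aese, the call from the original report.
        vaeseq_u8(vdupq_n_u8(0), vdupq_n_u8(0))
    }

    if std::arch::is_aarch64_feature_detected!("aes") {
        unsafe { one_aese_round() };
    }
}

fn main() {
    #[cfg(target_arch = "aarch64")]
    aes_demo();
}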

@RalfJung RalfJung added C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement A-shims Area: This affects the external function shims labels Apr 18, 2024
@RalfJung RalfJung changed the title Add support for "llvm.aarch64.crypto.aese" Add support for aarch64 platform intrinsics Apr 18, 2024
@RalfJung RalfJung added the A-target Area: concerns targets outside of what we currently support label Apr 18, 2024
@akern40

akern40 commented Oct 24, 2024

Sorry, I'm just getting started with Miri, but llvm.fma.v2f64 would be great to have for developing on Mac - it's commonly called from the matrixmultiply crate, which many crates in the Rust numerical computing ecosystem use (I'm a maintainer of ndarray, which uses it for our dot product implementations).

@RalfJung
Member

Until someone is interested in implementing the aarch64 intrinsics, you can run your code with a different target: --target x86_64-apple-darwin works fine on all hosts and can use the x86 intrinsics that have been implemented in Miri.

@workingjubilee
Member

This problem should just be fixed in stdarch. It links directly against LLVM IR intrinsics for aarch64 instruction generation instead of going through core::intrinsics.

https://doc.rust-lang.org/nightly/src/core/stdarch/crates/core_arch/src/arm_shared/neon/generated.rs.html#18835-18853
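
For readers who don't want to click through, the pattern in question looks roughly like the sketch below (paraphrased, not the exact generated code; standalone it needs nightly-internal features, which is fine inside core). The extern declaration's link_name is the LLVM intrinsic, i.e. exactly the "foreign function" Miri cannot interpret:

#![feature(abi_unadjusted, link_llvm_intrinsics)]
#![allow(internal_features)]

use core::arch::aarch64::float64x2_t;

// Paraphrased shape of the current generated code: the public intrinsic
// forwards to an extern declaration bound directly to an LLVM intrinsic.
#[allow(improper_ctypes)]
extern "unadjusted" {
    #[link_name = "llvm.fma.v2f64"]
    fn vfmaq_f64_(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t;
}

#[inline]
#[target_feature(enable = "neon")]
pub unsafe fn vfmaq_f64(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t {
    // Computes a + b * c; llvm.fma computes x * y + z, hence the reordering.
    vfmaq_f64_(b, c, a)
}

fn main() {}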

@RalfJung
Member

RalfJung commented Oct 24, 2024

Oh, these operations have equivalents in core::intrinsics::simd? Yeah those should be used then, as that makes life easier for all backends.

@workingjubilee
Member

Yes, at least all of these should be implementable in terms of our already-existing intrinsics:

https://github.com/rust-lang/stdarch/blob/0669fa8a6e48afde64e0ceccdf462a9cc3fb689f/crates/stdarch-gen-arm/neon.spec#L3983-L4125

@akern40 I realize the spec file is slightly inscrutable, but if you can spend the time to puzzle out getting that code to use https://doc.rust-lang.org/nightly/std/intrinsics/simd/fn.simd_fma.html, then your problem goes away, because Miri already implements simd_fma and it's the same LLVM IR.
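
To make that concrete, here's a hedged sketch (not the actual stdarch change; core::intrinsics is only usable from within the standard library or on nightly with the internal core_intrinsics feature) of what one such rewrite could look like, using vfmaq_f64, which currently forwards to llvm.fma.v2f64:

#![feature(core_intrinsics)]
#![allow(internal_features)]

use core::arch::aarch64::float64x2_t;
use core::intrinsics::simd::simd_fma;

// Sketch of the proposed shape: express the NEON intrinsic through the
// portable simd_fma intrinsic instead of an extern block whose link_name is
// "llvm.fma.v2f64". Miri (and the cranelift/gcc backends) already understand
// simd_fma.
#[inline]
#[target_feature(enable = "neon")]
pub unsafe fn vfmaq_f64(a: float64x2_t, b: float64x2_t, c: float64x2_t) -> float64x2_t {
    // vfmaq_f64 computes a + b * c; simd_fma(x, y, z) computes x * y + z,
    // so the operands are reordered accordingly.
    simd_fma(b, c, a)
}

fn main() {}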

@RalfJung
Member

I opened a stdarch issue about changing their implementation of the aarch64 intrinsics, which indeed would get us a lot of Miri (and cranelift and gcc) support for free: rust-lang/stdarch#1659.

@akern40

akern40 commented Oct 24, 2024

Thank you both! When I've got a bit more time this weekend I'll try to dig into this and see how I can contribute.
