Native Rotation Intrinsic Optimization #37

Open
NCGThompson wants to merge 2 commits into main

Conversation

NCGThompson
Contributor

I removed some redundancies in rol3. On generic x86_64, the rotate-left short benchmark saw a 44% time reduction, to 4.25ns, and rotate-left long saw a 34% improvement, to 4.6ns. On the same computer with target-cpu=skylake, both benchmark times were approximately halved, to 3.45ns. Hopefully native rol3 is called more than 2 * 10^13 times in the future so that my time spent was worthwhile.
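
For context, the usual way to rotate a wide integer is to rotate whole limbs first and then shift the remaining bits across limb boundaries. The sketch below illustrates that decomposition for a 256-bit value stored as four little-endian 64-bit limbs; the function name, limb count, and layout here are assumptions for illustration, not the actual rol3 signature:

```rust
/// Rotate a 256-bit value left by `n` bits (hypothetical sketch).
/// Limbs are little-endian: `x[0]` is the least significant.
fn rotate_left_256(x: [u64; 4], n: u32) -> [u64; 4] {
    let n = n % 256;
    let limb_shift = (n / 64) as usize; // whole limbs to rotate by
    let bit_shift = n % 64;             // remaining bits to rotate by

    // Step 1: rotate whole limbs.
    let mut r = [0u64; 4];
    for (i, limb) in x.iter().enumerate() {
        r[(i + limb_shift) % 4] = *limb;
    }

    // Step 2: shift bits across limb boundaries, wrapping the top bits around.
    if bit_shift == 0 {
        return r;
    }
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = (r[i] << bit_shift) | (r[(i + 3) % 4] >> (64 - bit_shift));
    }
    out
}
```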

I marked it draft because two things need to happen:

  • ror3 needs to be optimized as well. For now, I set it to simply negate b and call rol3 (see the sketch after this list). This is still an improvement over the status quo, but on generic x86_64, rotate right gets considerably worse benchmark scores than rotate left. My only hypothesis is that the negation tips it over the edge of an inlining threshold. I'll give it its own mirror of rol3.
  • The new function needs to be benchmarked on other architectures. This is particularly important because I explicitly wrote the new version with 64-bit operations in mind, so it might be a pessimization on 32-bit architectures. We may need to use 32-bit integers instead, or perhaps go in the opposite direction to 128-bit. There is also some endianness-specific code. It is pretty simple, but it should be checked for correctness to be safe. I only have access to x86_64 and 64-bit ARM, so I need someone else to test this for me.
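
Here is a minimal sketch of the negate-and-delegate approach from the first bullet, reusing the illustrative rotate_left_256 above (again, the names and 256-bit limb layout are assumptions, not the crate's actual code):

```rust
/// Rotate right by delegating to rotate-left with the negated shift amount
/// (illustrative only; the real ror3 may look different).
fn rotate_right_256(x: [u64; 4], n: u32) -> [u64; 4] {
    // For a 256-bit value, ror(x, n) == rol(x, 256 - n) == rol(x, n.wrapping_neg() % 256),
    // which holds because 256 divides 2^32.
    rotate_left_256(x, n.wrapping_neg() % 256)
}
```

The extra negation is exactly the kind of small added cost that could push the function past an inlining threshold.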

@NCGThompson
Contributor Author

On an only somewhat related note: it's worth considering getting rid of the LLVM intrinsics for shl and shr entirely. The native implementations use branching rather than cmov. That said, even after the $CARGO_ENCODED_RUSTFLAGS patch, I'm still getting worse benchmark results from the LLVM intrinsics than from the native versions for many other functions as well, so I don't want to jump to conclusions.
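
To make the branching-versus-cmov point concrete, here is one way a double-limb shift can be written so the case selection compiles to a conditional move rather than a branch. This is only a sketch of the pattern on a hypothetical two-limb 128-bit value, not the crate's actual shl:

```rust
/// Branch-free 128-bit shift-left over two 64-bit limbs (cmov-friendly sketch).
fn shl_128(lo: u64, hi: u64, n: u32) -> (u64, u64) {
    debug_assert!(n < 128);
    let s = n & 63;
    // Case n < 64: shift both limbs, carrying the top bits of `lo` into `hi`.
    // `lo >> 1 >> (63 - s)` sidesteps the invalid `lo >> 64` when s == 0.
    let lo_small = lo << s;
    let hi_small = (hi << s) | (lo >> 1 >> (63 - s));
    // Case n >= 64: the low limb empties into the high limb (s == n - 64 here).
    let hi_big = lo << s;
    // Boolean-to-mask select, which LLVM can lower without a branch.
    let mask = ((n >= 64) as u64).wrapping_neg(); // 0 or all ones
    (lo_small & !mask, (hi_small & !mask) | (hi_big & mask))
}
```

Whether this actually beats a well-predicted branch would need benchmarking, of course.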

@NCGThompson
Contributor Author

My last commit made it slower in some circumstances but faster in others. However, I think it is overall a bit better than the 64-bit rotation version.

@NCGThompson NCGThompson marked this pull request as ready for review December 10, 2023 01:33