Add ARM version of calculating mode scores #2356
# Conflicts:
#	tests/ImageSharp.Tests/Formats/WebP/LossyUtilsTests.cs
sum = AccumulateSSE16Neon(
    ref Unsafe.Add(ref aRef, y * WebpConstants.Bps),
Using pointers could let the JIT emit better code for the address calculation. The pointers can then be incremented on each iteration:
aPtr += WebpConstants.Bps;
bPtr += WebpConstants.Bps;
This would improve address calculation.
before:
lsl w0, w20, #5
sxtw x0, w0
add x19, x19, x0
after:
add x19, x19, #32
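The difference between the two addressing styles can be sketched as follows. This is a minimal illustration, not the PR's code: `RowSse16` is a hypothetical scalar stand-in for the NEON row kernel, `Bps = 32` is an assumption inferred from the `#32` in the assembly above, and the sketch steps a `ref` instead of a raw pointer so that it compiles without `unsafe`. The raw-pointer form would be `aPtr += WebpConstants.Bps;` as suggested.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class RowAddressingSketch
{
    // Assumption: mirrors WebpConstants.Bps (row stride in bytes).
    private const int Bps = 32;

    // Hypothetical scalar stand-in for the NEON kernel:
    // sum of squared differences over one row of 16 bytes.
    private static int RowSse16(ref byte a, ref byte b)
    {
        int sum = 0;
        for (int x = 0; x < 16; x++)
        {
            int d = Unsafe.Add(ref a, x) - Unsafe.Add(ref b, x);
            sum += d * d;
        }

        return sum;
    }

    private static void CheckLength(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        if (a.Length < 16 * Bps || b.Length < 16 * Bps)
        {
            throw new ArgumentException("Both buffers need at least 16 * Bps bytes.");
        }
    }

    // Indexed form: the offset y * Bps is recomputed for every row.
    public static int Sse16x16Indexed(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        CheckLength(a, b);
        ref byte aRef = ref MemoryMarshal.GetReference(a);
        ref byte bRef = ref MemoryMarshal.GetReference(b);
        int sum = 0;
        for (int y = 0; y < 16; y++)
        {
            sum += RowSse16(
                ref Unsafe.Add(ref aRef, y * Bps),
                ref Unsafe.Add(ref bRef, y * Bps));
        }

        return sum;
    }

    // Stepping form: the row address advances by a constant stride.
    public static int Sse16x16Stepping(ReadOnlySpan<byte> a, ReadOnlySpan<byte> b)
    {
        CheckLength(a, b);
        ref byte aRow = ref MemoryMarshal.GetReference(a);
        ref byte bRow = ref MemoryMarshal.GetReference(b);
        int sum = 0;
        for (int y = 0; y < 16; y++)
        {
            sum += RowSse16(ref aRow, ref bRow);
            aRow = ref Unsafe.Add(ref aRow, Bps);
            bRow = ref Unsafe.Add(ref bRow, Bps);
        }

        return sum;
    }
}
```

Both methods compute the same result; only the way the row address is derived differs. Whether the JIT produces the shorter instruction sequence for a given form should be checked in the generated assembly.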
The generated code does indeed look a bit better with pointers. I was not aware of that.
Here is a SharpLab gist: Sse16x16_NeonPointers
/// <param name="accumulator">The accumulator to reduce.</param>
/// <returns>The sum of all elements.</returns>
[MethodImpl(InliningOptions.ShortMethod)]
public static int ReduceSumArm(Vector128<uint> accumulator)
You can use Vector128.Sum() instead of this method. In general, try to use the Vector128/Vector256 API wherever possible. This improves the portability of the code and lets it benefit from improvements to the API itself.
The ReduceSum can also be refactored out.
"The ReduceSum can also be refactored out."
We cannot get rid of ReduceSum yet, because we target net6.0 and Vector128.Sum was introduced with net7.0.
I am now using Vector128.Sum for net7.0 and later: b0bfb0a
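A split like this could look roughly as follows. This is only a sketch and not the code from the commit referenced above: the class name and the scalar fallback are assumptions for illustration, while `NET7_0_OR_GREATER` is the preprocessor symbol the .NET SDK defines when targeting net7.0 or later.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class ReduceSumSketch
{
    // Sums the four 32-bit lanes of the accumulator.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int ReduceSum(Vector128<uint> accumulator)
    {
#if NET7_0_OR_GREATER
        // Cross-platform API, available from .NET 7 onwards.
        return (int)Vector128.Sum(accumulator);
#else
        if (AdvSimd.Arm64.IsSupported)
        {
            // ARM64 horizontal add across all lanes.
            return (int)AdvSimd.Arm64.AddAcross(accumulator).ToScalar();
        }

        // Assumed scalar fallback for targets without ARM64 intrinsics.
        uint sum = 0;
        for (int i = 0; i < Vector128<uint>.Count; i++)
        {
            sum += accumulator.GetElement(i);
        }

        return (int)sum;
#endif
    }
}
```

For example, `ReduceSumSketch.ReduceSum(Vector128.Create(1u, 2u, 3u, 4u))` adds the four lanes together. Once net6.0 is no longer targeted, the `#else` branch and the helper itself could be dropped in favour of calling `Vector128.Sum` directly.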
Sure, makes sense 👍
@SwapnilGaikwad Thanks for reviewing the code!
Prerequisites
Description
This PR adds an ARM version of the mode score calculation, which is used during WebP encoding. The implementation is based on libwebp/enc_neon.c
Benchmarks:
main
PR
Test image was Jpg/baseline/Calliphora.jpg from the tests/Images/Input folder.
cpu info