-
Notifications
You must be signed in to change notification settings - Fork 205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows: ARM64/NEON Support #769
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Jamie Vital <[email protected]>
@@ -299,7 +299,11 @@ volk_32f_index_max_32u_neon(uint32_t* target, const float* src0, uint32_t num_po | |||
if (maxValuesBuffer[number] > max) { | |||
index = maxIndexesBuffer[number]; | |||
max = maxValuesBuffer[number]; | |||
#ifdef _MSC_VER | |||
} else if (maxValues.n128_f32[number] == max) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit I don't quite understand that; what type is float32x4_t
on MSVC/aarch64/neon?
if(MSVC) | ||
if(CMAKE_SYSTEM_PROCESSOR STREQUAL "ARM") | ||
overrule_arch(neonv8 "Compiler doesn't support neonv8") | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
My comments are rather critical because the changes are specific to MSVC, which is notoriously non-standard compliant. This might affect the result for other compilers.
I'd really like to merge this PR but make sure, that we don't see a performance regression.
The whole assume NEON is present etc. reasoning is good. Maybe, you could add a comment next to the checks to make sure this knowledge doesn't get lost over time.
have_neonv7_result) | ||
check_c_source_compiles( | ||
"#include <volk/volk_common.h>\n int main(){__VOLK_ASM(\"sub v1.4s,v1.4s,v1.4s\");}" | ||
have_neonv8_result) | ||
|
||
if(NOT have_neonv7_result) | ||
overrule_arch(neonv7 "Compiler doesn't support neonv7") | ||
endif() | ||
|
||
if(NOT have_neonv8_result) | ||
overrule_arch(neonv8 "Compiler doesn't support neonv8") | ||
endif() | ||
else(neon_compile_result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you want to remove the non-MSVC checks here? Looking at this section makes me think, it might be a good time to unify this section. Just a thought.
#if defined(__aarch64__) | ||
#ifdef _MSC_VER | ||
#define DO_RBIT \ | ||
*out_ptr = _byteswap_ulong(*in_ptr); \ | ||
*out_ptr = ((*out_ptr & 0x55555555) << 1) | ((*out_ptr & 0xAAAAAAAA) >> 1); \ | ||
*out_ptr = ((*out_ptr & 0x33333333) << 2) | ((*out_ptr & 0xCCCCCCCC) >> 2); \ | ||
*out_ptr = ((*out_ptr & 0x0F0F0F0F) << 4) | ((*out_ptr & 0xF0F0F0F0) >> 4); \ | ||
in_ptr++; \ | ||
out_ptr++; | ||
#elif defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section indicates that we already had special treatment for some conditions. Now, it looks like we exchange them for another set of conditions. This change would warrant a comment on the specifics of different platforms. This will help anyone looking at this in the future.
const uint8x16_t idx = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; | ||
uint8x16_t idx; | ||
const uint8_t idx_data[] = { 3, 2, 1, 0, 7, 6, 5, 4, 11, 10, 9, 8, 15, 14, 13, 12 }; | ||
idx = vld1q_u8(idx_data); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why? The data type implies 16 values of the same type. Does MSVC fail to perform the correct aggregate initialization? This indirection may negatively impact optimizations.
I'd like to see this unified in one line. Would that be possible?
Would it be possible to have a godbolt/compiler explorer comparison of this for GCC/Clang vs. MSVC?
int32x4_t toint_a = { 0, 0, 0, 0 }, toint_b = { 0, 0, 0, 0 }; | ||
int32x4_t toint_a = { 0 }, toint_b = { 0 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you're at it, could you change this to give every variable their own statement?
float32x4_t in_vec; | ||
float32x4_t out_vec0 = { 0.f, 0.f, 0.f, 0.f }; | ||
float32x4_t out_vec1 = { 0.f, 0.f, 0.f, 0.f }; | ||
float32x4_t out_vec2 = { 0.f, 0.f, 0.f, 0.f }; | ||
float32x4_t out_vec3 = { 0.f, 0.f, 0.f, 0.f }; | ||
float32x4_t out_vec0 = { 0.f }; | ||
float32x4_t out_vec1 = { 0.f }; | ||
float32x4_t out_vec2 = { 0.f }; | ||
float32x4_t out_vec3 = { 0.f }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fear this affects initialization. Are we sure all values are initialized correctly for all compilers?
#ifdef _MSC_VER | ||
} else if (minValues.n128_f32[number] == min) { | ||
#else | ||
} else if (minValues[number] == min) { | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like smth very special. Could you add a comment with some context? Maybe a link that explains the necessary change?
@JVital2013 are you still working on this PR? |
Fixes NEON support on Windows ARM64. Most of the changes deal with the somewhat odd way in which Microsoft deals with the uint8x16_t, etc types.
Additionally, MSVC on ARM/ARM64 does not support any inline assembly. Therefore
__VOLK_ASM
does not work, so a few workarounds have been implemented. One of these workarounds is to assume NEON is present if you're using MSVC, and building for ARM64 instead of running a check. NEON is a requirement for Windows on ARM as shown here: https://learn.microsoft.com/en-us/cpp/build/arm64-windows-abi-conventions?view=msvc-170.Note that NEONv7 on Windows is not supported in this PR, as it would require re-wiring all the kernels into MASM (or some other format) to build on MSVC. Windows on 32-bit ARM is being deprecated anyway, so I'm thinking the neonv7 kernels can be ignored on Windows.