Compiling the aarch64 version of IM_Conv_SIMD produces the following error. Is there a fix? #58

Jony-2018 · 2024-06-19T10:37:15Z

DennisLiu1993 · 2024-06-20T07:22:06Z

Just paste this straight into GPT and it will be solved.

On Wed, Jun 19, 2024 at 6:37 PM, Jony wrote:

default.png (view on web) <https://github.com/DennisLiu1993/Fastest_Image_Pattern_Matching/assets/51848340/dec6f37c-690b-41e6-8c17-ece590b7ec0a>

Jony-2018 · 2024-06-20T08:24:16Z

Thanks for taking the time to reply. I asked GPT yesterday, and it told me to change the code block to:

		int16x8_t SrcK_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcK));
		int16x8_t SrcK_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcK));
		int16x8_t SrcC_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcC));
		int16x8_t SrcC_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcC));

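With that change it compiles, but the runtime results are wrong, and I don't yet know which values need to change.
After running, the coarse-match iMatchSize is the same as the count on Windows, but the final output is empty, whereas on Windows the output is 3 matched targets.
Do you have any ideas?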
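One way to narrow down a "compiles but gives different results" report like this is to compare the SIMD routine against a plain scalar dot product on the same buffers. The sketch below is only an illustration: `ConvScalarRef`, `ConvFn`, and `FirstMismatchLength` are hypothetical names that do not come from the repository, and the function-pointer type assumes the `(unsigned char*, unsigned char*, int)` signature that IM_Conv_SIMD has in this thread. A clean result would only show that the dot-product kernel is not the part that differs between platforms.

```cpp
#include <vector>

// Scalar reference: plain dot product of two byte buffers.
inline int ConvScalarRef(unsigned char* pKernel, unsigned char* pConv, int iLength)
{
    int Sum = 0;
    for (int Y = 0; Y < iLength; Y++)
        Sum += pKernel[Y] * pConv[Y];
    return Sum;
}

// Hypothetical alias matching the IM_Conv_SIMD signature used in this thread.
using ConvFn = int (*)(unsigned char*, unsigned char*, int);

// Returns the first length at which Candidate disagrees with the scalar
// reference, or -1 if every length from 0 to iMaxLength agrees.
inline int FirstMismatchLength(ConvFn Candidate, int iMaxLength)
{
    std::vector<unsigned char> K(iMaxLength), C(iMaxLength);
    for (int i = 0; i < iMaxLength; i++)
    {
        // Deterministic filler that includes values >= 128, so that
        // signed/unsigned widening mistakes would show up.
        K[i] = static_cast<unsigned char>(i * 37 + 11);
        C[i] = static_cast<unsigned char>(255 - (i * 13) % 256);
    }
    for (int Len = 0; Len <= iMaxLength; Len++)
    {
        if (Candidate(K.data(), C.data(), Len) != ConvScalarRef(K.data(), C.data(), Len))
            return Len;
    }
    return -1;
}
```

Calling `FirstMismatchLength(IM_Conv_SIMD, 64)` on the aarch64 build would cover both the 16-byte blocks and the scalar tail; a return value other than -1 points at the SIMD routine itself.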

DennisLiu1993 · 2024-06-20T09:29:19Z

	inline int32_t neon_hsum_epi32(int32x4_t V)
	{
		int32x2_t SumV = vadd_s32(vget_low_s32(V), vget_high_s32(V));
		SumV = vpadd_s32(SumV, SumV);
		return vget_lane_s32(SumV, 0);
	}

	inline int32_t neon_haddw_s32(int16x8_t V)
	{
		int32x4_t SumV = vpaddlq_s16(V);
		SumV = vaddq_s32(SumV, vextq_s32(SumV, SumV, 1)); // Optional: Enable for summing all 4 lanes
		return neon_hsum_epi32(SumV);
	}

	inline int IM_Conv_SIMD(unsigned char* pCharKernel, unsigned char* pCharConv, int iLength)
	{
		const int iBlockSize = 16, Block = iLength / iBlockSize;
		int32x4_t SumV = vdupq_n_s32(0);
		uint8x16_t Zero = vdupq_n_u8(0);
		for (int Y = 0; Y < Block * iBlockSize; Y += iBlockSize)
		{
			uint8x16_t SrcK = vld1q_u8(pCharKernel + Y);
			uint8x16_t SrcC = vld1q_u8(pCharConv + Y);
			int16x8_t SrcK_L = vmovl_u8(vget_low_u8(SrcK));
			int16x8_t SrcK_H = vmovl_u8(vget_high_u8(SrcK));
			int16x8_t SrcC_L = vmovl_u8(vget_low_u8(SrcC));
			int16x8_t SrcC_H = vmovl_u8(vget_high_u8(SrcC));
			int32x4_t MulLow = vmull_s16(vget_low_s16(SrcK_L), vget_low_s16(SrcC_L));
			int32x4_t MulHigh = vmull_s16(vget_high_s16(SrcK_L), vget_high_s16(SrcC_L));
			int32x4_t SumT = vaddq_s32(MulLow, MulHigh);
			MulLow = vmull_s16(vget_low_s16(SrcK_H), vget_low_s16(SrcC_H));
			MulHigh = vmull_s16(vget_high_s16(SrcK_H), vget_high_s16(SrcC_H));
			SumT = vaddq_s32(SumT, vaddq_s32(MulLow, MulHigh));
			SumV = vaddq_s32(SumV, SumT);
		}
		int32_t Sum = neon_hsum_epi32(SumV);
		for (int Y = Block * iBlockSize; Y < iLength; Y++)
		{
			Sum += pCharKernel[Y] * pCharConv[Y];
		}
		return Sum;
	}

Try this snippet?

Jony-2018 · 2024-06-20T10:21:30Z

Building on your version, I changed these lines:

		int16x8_t SrcK_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcK));
		int16x8_t SrcK_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcK));
		int16x8_t SrcC_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcC));
		int16x8_t SrcC_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcC));

It now compiles successfully and the results are correct. Thanks for your reply and help!
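For readers landing here later: `vmovl_u8` returns `uint16x8_t`, so assigning it to an `int16x8_t` needs an explicit conversion, which is what the C-style cast above supplies. A more portable spelling is the NEON intrinsic `vreinterpretq_s16_u16`. The sketch below follows the structure of the routine posted in this thread with that one substitution. `ConvDotU8` is a hypothetical name, the `const` qualifiers are an addition, and the scalar fallback for non-ARM builds is only there so the example compiles anywhere. Treat it as an illustration rather than the repository's code.

```cpp
#include <cstdint>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define CONV_HAS_NEON 1
#else
#define CONV_HAS_NEON 0
#endif

// Hypothetical name; dot product of two byte buffers of length iLength.
inline int ConvDotU8(const unsigned char* pKernel, const unsigned char* pConv, int iLength)
{
    int32_t Sum = 0;
    int Y = 0;
#if CONV_HAS_NEON
    int32x4_t SumV = vdupq_n_s32(0);
    for (; Y + 16 <= iLength; Y += 16)
    {
        uint8x16_t SrcK = vld1q_u8(pKernel + Y);
        uint8x16_t SrcC = vld1q_u8(pConv + Y);
        // Widen u8 -> u16, then reinterpret the bits as s16. Every value is
        // at most 255, so the signed view holds the same numbers.
        int16x8_t SrcK_L = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(SrcK)));
        int16x8_t SrcK_H = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(SrcK)));
        int16x8_t SrcC_L = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(SrcC)));
        int16x8_t SrcC_H = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(SrcC)));
        // Widening multiplies: s16 x s16 -> s32 per lane, four lanes at a time.
        int32x4_t MulLow = vmull_s16(vget_low_s16(SrcK_L), vget_low_s16(SrcC_L));
        int32x4_t MulHigh = vmull_s16(vget_high_s16(SrcK_L), vget_high_s16(SrcC_L));
        int32x4_t SumT = vaddq_s32(MulLow, MulHigh);
        MulLow = vmull_s16(vget_low_s16(SrcK_H), vget_low_s16(SrcC_H));
        MulHigh = vmull_s16(vget_high_s16(SrcK_H), vget_high_s16(SrcC_H));
        SumT = vaddq_s32(SumT, vaddq_s32(MulLow, MulHigh));
        SumV = vaddq_s32(SumV, SumT);
    }
    // Horizontal add of the four 32-bit lanes.
    int32x2_t Pair = vadd_s32(vget_low_s32(SumV), vget_high_s32(SumV));
    Pair = vpadd_s32(Pair, Pair);
    Sum = vget_lane_s32(Pair, 0);
#endif
    // Scalar tail (or the whole buffer when NEON is unavailable).
    for (; Y < iLength; Y++)
        Sum += pKernel[Y] * pConv[Y];
    return Sum;
}
```

The reinterpret does not change any bits, so on compilers that accept the C-style vector cast the two spellings should behave the same; the intrinsic simply avoids relying on compiler-specific vector cast rules.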
