Compiling the aarch64 version of IM_Conv_SIMD produces the following error. Is there a way to fix it? #58
Comments
Just paste this straight into GPT and it will be solved.
On Wed, Jun 19, 2024 at 6:37 PM, Jony ***@***.***> wrote:
… default.png (view on web)
<https://github.com/DennisLiu1993/Fastest_Image_Pattern_Matching/assets/51848340/dec6f37c-690b-41e6-8c17-ece590b7ec0a>
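Thanks for taking the time to reply. I asked GPT yesterday, and it told me to change the code block (casting each vmovl_u8 result to int16x8_t).
After that it compiles, but the results at run time are wrong, and I don't yet know which values need to change.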
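One way to narrow down a wrong result like this is to compare the SIMD routine against a plain scalar loop on the same buffers. The sketch below is only an illustration: `conv_scalar_reference` is a hypothetical helper name, not something from the repository, and it assumes IM_Conv_SIMD is meant to return the sum of element-wise products of two unsigned 8-bit buffers, which is what the scalar tail loop in the thread's code computes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the repository): scalar dot product of two
// unsigned 8-bit buffers. A SIMD port is expected to return the same value
// for the same inputs, on any platform.
inline int32_t conv_scalar_reference(const unsigned char* kernel,
                                     const unsigned char* conv,
                                     int length)
{
    assert(length >= 0);
    int32_t sum = 0;
    for (int i = 0; i < length; ++i)
    {
        // Promote to 32 bits before multiplying so the product cannot wrap.
        sum += static_cast<int32_t>(kernel[i]) * static_cast<int32_t>(conv[i]);
    }
    return sum;
}
```

Calling both functions with a length that is not a multiple of 16 exercises the full 16-byte blocks and the scalar tail at once; if the two values differ, the problem is inside the SIMD routine, and if they agree, the empty result likely comes from somewhere else in the pipeline.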
#include <arm_neon.h>
#include <stdint.h>

// Horizontal sum of the four 32-bit lanes of V.
inline int32_t neon_hsum_epi32(int32x4_t V)
{
    int32x2_t SumV = vadd_s32(vget_low_s32(V), vget_high_s32(V));
    SumV = vpadd_s32(SumV, SumV);
    return vget_lane_s32(SumV, 0);
}
// Sum of the eight 16-bit lanes of V, accumulated in 32 bits.
inline int32_t neon_haddw_s32(int16x8_t V)
{
    // Pairwise widening add gives four 32-bit partial sums;
    // neon_hsum_epi32 then adds all four lanes.
    int32x4_t SumV = vpaddlq_s16(V);
    return neon_hsum_epi32(SumV);
}
// Sum of element-wise products of two unsigned 8-bit buffers of iLength bytes.
inline int IM_Conv_SIMD(unsigned char* pCharKernel, unsigned char* pCharConv, int iLength)
{
    const int iBlockSize = 16, Block = iLength / iBlockSize;
    int32x4_t SumV = vdupq_n_s32(0);
    for (int Y = 0; Y < Block * iBlockSize; Y += iBlockSize)
    {
        uint8x16_t SrcK = vld1q_u8(pCharKernel + Y);
        uint8x16_t SrcC = vld1q_u8(pCharConv + Y);
        // vmovl_u8 returns uint16x8_t, so reinterpret explicitly as signed;
        // the values 0..255 are unchanged by the reinterpretation.
        int16x8_t SrcK_L = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(SrcK)));
        int16x8_t SrcK_H = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(SrcK)));
        int16x8_t SrcC_L = vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(SrcC)));
        int16x8_t SrcC_H = vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(SrcC)));
        // Widening 16x16 -> 32-bit multiplies, accumulated per lane.
        int32x4_t MulLow = vmull_s16(vget_low_s16(SrcK_L), vget_low_s16(SrcC_L));
        int32x4_t MulHigh = vmull_s16(vget_high_s16(SrcK_L), vget_high_s16(SrcC_L));
        int32x4_t SumT = vaddq_s32(MulLow, MulHigh);
        MulLow = vmull_s16(vget_low_s16(SrcK_H), vget_low_s16(SrcC_H));
        MulHigh = vmull_s16(vget_high_s16(SrcK_H), vget_high_s16(SrcC_H));
        SumT = vaddq_s32(SumT, vaddq_s32(MulLow, MulHigh));
        SumV = vaddq_s32(SumV, SumT);
    }
    int32_t Sum = neon_hsum_epi32(SumV);
    // Scalar tail for the remaining iLength % 16 bytes.
    for (int Y = Block * iBlockSize; Y < iLength; Y++)
    {
        Sum += pCharKernel[Y] * pCharConv[Y];
    }
    return Sum;
}
Try this snippet?
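For readers wondering why the snippet widens the bytes and uses a widening multiply, here is a small portable sketch of the value ranges involved. It models single lanes in ordinary C++ rather than calling NEON, and the helper names (`widen_u8_to_s16`, `widening_mul_s16`, `all_u8_values_preserved`) are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: one lane of "zero-extend u8 to 16 bits, then view the
// 16 bits as signed". Values 0..255 fit in int16_t, so nothing changes.
inline int16_t widen_u8_to_s16(uint8_t v)
{
    return static_cast<int16_t>(static_cast<uint16_t>(v));
}

// Hypothetical helper: one lane of a widening 16x16 -> 32-bit multiply,
// which is the per-lane behaviour of vmull_s16.
inline int32_t widening_mul_s16(int16_t a, int16_t b)
{
    return static_cast<int32_t>(a) * static_cast<int32_t>(b);
}

// Returns true when every possible byte value survives the widening unchanged.
inline bool all_u8_values_preserved()
{
    for (int v = 0; v <= 255; ++v)
    {
        if (widen_u8_to_s16(static_cast<uint8_t>(v)) != v)
            return false;
    }
    return true;
}
```

The largest product, 255 * 255 = 65025, is above INT16_MAX (32767), so a 16-bit multiply could not hold it, while a 32-bit lane holds it comfortably.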
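On Thu, Jun 20, 2024 at 4:24 PM, Jony ***@***.***> wrote:
… Thanks for taking the time to reply. I asked GPT yesterday, and it told me to
change the code block to:
int16x8_t SrcK_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcK));
int16x8_t SrcK_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcK));
int16x8_t SrcC_L = (int16x8_t)vmovl_u8(vget_low_u8(SrcC));
int16x8_t SrcC_H = (int16x8_t)vmovl_u8(vget_high_u8(SrcC));
After that it compiles, but the results at run time are wrong, and I don't yet know which values need to change.
After running, the coarse-matching iMatchSize is the same as the count on Windows, but the final output is empty, whereas on Windows the output shows 3 matched targets.
Do you have any ideas?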
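I made my changes on top of your version.
It now compiles successfully and the results are correct. Thank you for your reply and your help!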