response map fusion implementation #77
Comments
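Usually this happens because MIPP does not implement some functions for certain instruction sets. Does it run correctly if you turn it off first?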
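What is the error?
Was this run directly from the latest code? I'll find a Windows laptop and try it.
This fusion.h: I changed a few places where pointers use gauss_size so that VS would compile it. MIPP also came from here.
@DennisLiu-elogic I tried it. With a vector where gauss_size is used and SIMD turned off, it runs fine.
That's strange. Would changing int32_t* parent_buf_ptr[gauss_size] to int32_t* parent_buf_ptr[5] cause an error even with SIMD off...?
@meiqua Are there plans to update the fusion for RGB images soon?
@aemior I plan to fix this SIMD problem first, then do the RGB2GRAY fusion. RGB fusion is a bit troublesome and does not seem very necessary.
@meiqua OK. I am working on an RGB pipeline. RGB should improve accuracy when detecting different targets in natural scenes; for industrial scenes it is indeed unnecessary.
@mangosroom The VS compiler does not support variable-length arrays; the new commit switches to vector and works.
Yes, that is how I changed it too. Algorithm-level code is best written in standard C++.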
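The variable-length array fix discussed above can be sketched as follows. Arrays such as `int32_t* parent_buf_ptr[gauss_size]` with a runtime `gauss_size` are a compiler extension rather than standard C++, which is why MSVC rejects them. This is only an illustration of the idea: the helper name `make_row_pointers` and the buffer layout are hypothetical and are not taken from fusion.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from fusion.h): one row pointer per Gaussian tap.
// A std::vector sized at runtime replaces the non-standard
// `int32_t* parent_buf_ptr[gauss_size]`.
std::vector<int32_t*> make_row_pointers(std::vector<int32_t>& buf,
                                        int gauss_size, int row_width) {
    assert(gauss_size > 0 && row_width > 0);
    assert(buf.size() >= static_cast<std::size_t>(gauss_size) * row_width);

    std::vector<int32_t*> parent_buf_ptr(gauss_size);
    for (int i = 0; i < gauss_size; ++i) {
        // Row i starts i * row_width elements into the flat buffer.
        parent_buf_ptr[i] = buf.data() + static_cast<std::size_t>(i) * row_width;
    }
    return parent_buf_ptr;
}
```

The vector costs one small heap allocation, which is negligible if it is created once per tile rather than once per pixel.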
@DennisLiu-elogic SSE2 should work now. Earlier tests showed that SSE4 and AVX2 work.
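These two files were not changed; the change is in MIPP, which gained mul<int32_t>, abs<int32_t>, and cvt<int16_t,int32_t>.
I did not even notice that there is a check further down...
Filling with 0 first is not as fast as this, because it adds an extra write pass. This is not the hot path, though, so the time difference is small.
If use_simd = true but SIMD is not configured, it will indeed fail; with use_simd = false it runs fine for me. Are you using the latest code?
It does go out of bounds; a condition should be added. It still ran correctly before because the out-of-bounds value happened not to be used, and the compiler does no bounds checking.
After adding the check this is fine, but with use_simd=true and SSE2 enabled in the compiler it still reports an error, in update_simd() when dxint16.r = 0. Test image:
What is the error?
It looks like low<int16_t> is undefined, but it is actually already defined here. This should happen when use_simd=true and SSE2 is not configured; are you sure SSE2 is on? You can run mipp_test() to check.
MIPP enters the SSE branch through the macros here; I am not sure whether the VS compiler defines them.
With VS I can also only use AVX2, but my CPU does not support the AVX instruction set. How can I use MIPP in this case? MIPP itself appears to support SSE.
Is that the same problem as above: SSE is on but MIPP does not enter the SSE branch?
I searched, and that really is the case:
Try this branch and see whether it is fixed.
That is not very related. For SSE2 the __SSE__ macro should also be defined. I changed that; can you try again?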
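For context on the macro problem above: MSVC does not predefine `__SSE__` or `__SSE2__` the way GCC and Clang do. It exposes `_M_IX86_FP` on 32-bit x86 and `_M_X64` on x64, where SSE2 is always available. A minimal sketch of a compatibility shim follows, assuming it is placed before the SIMD wrapper header is included. The helper `detected_sse_level` is hypothetical, and the actual fix in the repository may look different.

```cpp
// Map MSVC's architecture macros onto the GCC-style ones that
// SIMD wrappers commonly test. Assumption: this runs before the
// wrapper header is included.
#if defined(_MSC_VER) && !defined(__clang__)
#  if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#    ifndef __SSE__
#      define __SSE__ 1
#    endif
#  endif
#  if defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#    ifndef __SSE2__
#      define __SSE2__ 1
#    endif
#  endif
#endif

// Hypothetical diagnostic: which SSE level the preprocessor sees.
// 0 = none, 1 = SSE only, 2 = SSE2 or better.
inline int detected_sse_level() {
#if defined(__SSE2__)
    return 2;
#elif defined(__SSE__)
    return 1;
#else
    return 0;
#endif
}
```

Printing the result of `detected_sse_level()` at startup is a quick way to confirm that the SSE branch is actually reachable before setting use_simd = true.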
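It works now, nice. In my tests on VS, though, using MIPP makes no noticeable difference.
On Linux it runs fast: with padding=500 and an image of more than 2 million pixels, the total time is about 80 ms (CPU: i7-8700). With VS2017, AVX2, CPU: i5-6300, the same master code, and padding=500, it takes about 280 ms.
With the fusion code: (1) 2-megapixel image, VS2017, SSE2, CPU: i3: about 100-110 ms with AVX2 either on or off. (2) 2-megapixel image, VS2015, CPU: i7-6700: about 80 ms with AVX2 either on or off. My environments are a bit mixed, but on VS using MIPP barely improves speed, while on Linux the improvement is obvious. Judging from the MIPP notes, is an upgrade to VS2019 required?
MIPP should not be much faster than the original SSE implementation; it was added so that the code can run on ARM. Being somewhat faster on Linux is plausible: first, OpenCV speed can differ between versions and build options; second, inlining may be done better, as mentioned here.
I see. The fusion code takes about 70-80 ms on a 2-megapixel image (CPU: i7, OpenCV: 3.4.6). Is that normal?
Not normal; I get 20 ms on Ubuntu 16.04 with an i7. You can change this line to false to turn MIPP off and see whether it is an inlining problem; with it off I get about 40 ms.
With the bundled image and padding=500, it also runs in 20 ms on Ubuntu 16.04 with an i7, and in about 50 ms with MIPP off.
It looks that way. Since the fusion code does not call OpenCV, the cause is probably insufficient compiler optimization.
OK, I will find a machine with VS2017 and try it later. Thanks a lot.
VS2017 does give the speedup. It seems that VS2015 does not support MIPP.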
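Timings like the ones above are easier to compare across compilers when they are measured the same way. The sketch below is one possible approach, not the method used in this thread: the helper name `best_time_ms` is hypothetical, and it assumes that one untimed warm-up call followed by best-of-N is acceptable for the workload.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: best-of-N wall-clock time in milliseconds.
// One untimed warm-up call absorbs one-off costs such as page faults
// and cold caches, which otherwise inflate the first measurement.
template <typename F>
double best_time_ms(F&& run, int repeats) {
    assert(repeats > 0);
    run();  // warm-up, not timed

    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        run();
        const auto t1 = std::chrono::steady_clock::now();
        const double ms =
            std::chrono::duration<double, std::milli>(t1 - t0).count();
        best = std::min(best, ms);
    }
    return best;
}
```

Wrapping the call under test in a lambda and running it once with use_simd = true and once with use_simd = false, in an optimized build, gives numbers that can be compared directly.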
For multiple template instances within a single image, you need to add cv_dnn_nms::NMSBoxes with a suitable overlap threshold, and then do ICP registration.
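To illustrate what the non-maximum suppression step does before ICP, here is a minimal greedy NMS sketch that does not depend on OpenCV. It is not the `cv_dnn_nms::NMSBoxes` implementation; the `Box` struct and the `iou` and `nms_boxes` functions are hypothetical names used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical axis-aligned box: top-left corner plus size.
struct Box { int x, y, w, h; };

// Intersection-over-union of two boxes, in [0, 1].
inline float iou(const Box& a, const Box& b) {
    const int x1 = std::max(a.x, b.x);
    const int y1 = std::max(a.y, b.y);
    const int x2 = std::min(a.x + a.w, b.x + b.w);
    const int y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = static_cast<float>(std::max(0, x2 - x1)) *
                        static_cast<float>(std::max(0, y2 - y1));
    const float uni = static_cast<float>(a.w) * a.h +
                      static_cast<float>(b.w) * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS: visit boxes by descending score and keep a box only if it
// overlaps no already-kept box by more than overlap_thresh.
std::vector<int> nms_boxes(const std::vector<Box>& boxes,
                           const std::vector<float>& scores,
                           float overlap_thresh) {
    assert(boxes.size() == scores.size());
    std::vector<int> order(boxes.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return scores[a] > scores[b]; });

    std::vector<int> keep;
    for (int idx : order) {
        bool suppressed = false;
        for (int k : keep) {
            if (iou(boxes[idx], boxes[k]) > overlap_thresh) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) keep.push_back(idx);
    }
    return keep;  // indices into `boxes`, best score first
}
```

Each surviving index then corresponds to one template instance, and ICP can be run per instance rather than on overlapping duplicates.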
Test image: 1200x1200
I ran the test program on the fusion_by_hand branch, with the results below: the fusion time in the first printout is very high
@wiekern |
Motivation
According to the Halide paper, fusion can greatly speed up creation of the response map. However, configuring Halide is not easy, and our response map does not need many of Halide's features. So a simple tile-based fusion implementation is preferred. This is also what OpenCV 4 is doing.
Related issues
Current works
Currently, a simple tile-based fusion pipeline is implemented, and gaussian / sobel / mag / phase / hist / spread ... are finished and tested. Refer to the fusion by hand branch for more info. The basic idea is to implement tile-based fusion only and to do Halide's compilation work by hand... Though not as fancy as Halide, this simplifies the job a lot and is easy to use too.
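The tile-based idea can be sketched with a toy two-stage pipeline. This is not the code from the fusion by hand branch: a 3x3 box sum stands in for the Gaussian, |dx| + |dy| stands in for the Sobel magnitude, and `Image`, `blur_at`, `pipeline_unfused`, and `pipeline_fused` are hypothetical names. The fused version computes stage 1 only into a small scratch buffer (the tile plus a 1-pixel halo) and never stores the full intermediate image.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

struct Image {
    int rows, cols;
    std::vector<int> data;
    Image(int r, int c)
        : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0) {}
    int& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    // Read with coordinates clamped to the border (replicate padding).
    int get(int y, int x) const {
        y = std::min(std::max(y, 0), rows - 1);
        x = std::min(std::max(x, 0), cols - 1);
        return data[static_cast<std::size_t>(y) * cols + x];
    }
};

// Stage 1 stand-in: 3x3 box sum around the (clamped) centre pixel.
inline int blur_at(const Image& src, int y, int x) {
    y = std::min(std::max(y, 0), src.rows - 1);
    x = std::min(std::max(x, 0), src.cols - 1);
    int s = 0;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) s += src.get(y + dy, x + dx);
    return s;
}

// Reference: run each stage over the whole image in turn.
Image pipeline_unfused(const Image& src) {
    Image blurred(src.rows, src.cols), out(src.rows, src.cols);
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) blurred.at(y, x) = blur_at(src, y, x);
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x)
            out.at(y, x) =
                std::abs(blurred.get(y, x + 1) - blurred.get(y, x - 1)) +
                std::abs(blurred.get(y + 1, x) - blurred.get(y - 1, x));
    return out;
}

// Fused: per tile, stage 1 goes into a scratch buffer with a 1-pixel halo,
// then stage 2 reads only from that scratch buffer.
Image pipeline_fused(const Image& src, int tile) {
    assert(tile > 0);
    Image out(src.rows, src.cols);
    const int halo = 1, side = tile + 2 * halo;
    std::vector<int> scratch(static_cast<std::size_t>(side) * side);
    for (int ty = 0; ty < src.rows; ty += tile) {
        for (int tx = 0; tx < src.cols; tx += tile) {
            for (int sy = 0; sy < side; ++sy)
                for (int sx = 0; sx < side; ++sx)
                    scratch[sy * side + sx] =
                        blur_at(src, ty - halo + sy, tx - halo + sx);
            const int y_end = std::min(ty + tile, src.rows);
            const int x_end = std::min(tx + tile, src.cols);
            for (int y = ty; y < y_end; ++y) {
                for (int x = tx; x < x_end; ++x) {
                    const int sy = y - ty + halo, sx = x - tx + halo;
                    const int gx = scratch[sy * side + sx + 1] - scratch[sy * side + sx - 1];
                    const int gy = scratch[(sy + 1) * side + sx] - scratch[(sy - 1) * side + sx];
                    out.at(y, x) = std::abs(gx) + std::abs(gy);
                }
            }
        }
    }
    return out;
}
```

The halo pixels are recomputed by neighbouring tiles, so fusion trades a little redundant arithmetic for an intermediate that stays in cache. Comparing `pipeline_fused` against `pipeline_unfused` for several tile sizes is a simple correctness check for this kind of rewrite.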
Results and TODOs
It is roughly 10x faster than using OpenCV. We will use it to create the response map in the future.
See test_fusion.cpp for more examples. Any discussion, tests, or improvements are welcome!
Update
Now all tests pass and the match function can be used as usual! The full pipeline for creating the response map is about 6x faster, and there is no need to crop images to 16n as before.
Update
RGB images are now also supported, by calling cvtColor first. After investigating many solutions, we found that using OpenCV is the cleanest way... Compared with using a gray image, cvtColor only costs ~5% more.