An attempt to implement convolution in CUDA following the XNOR-Net strategy: weights and activations are binarized to {-1, +1}, so the convolution's dot products can be computed with bitwise XNOR and popcount instead of floating-point multiply-accumulates.
- CUDA Toolkit (nvcc)
- CUDA-capable GPU
Navigate to the directory containing xnorconv.cu, then build and run:
nvcc -arch=sm_50 xnorconv.cu -std=c++11 && ./a.out
To profile the application:
nvprof ./a.out
This is a work in progress, so there are likely mistakes; I started learning CUDA a month ago. Please let me know if you find any logical errors in the code.
- Add support for variable input sizes
- Add support for 3D convolution
- Parallelize per convolution
- Add a function for general matrix multiplication (already written; PM for the code)
- Maximize shared memory usage, balancing it against per-channel parallelization
- Create a full precision verification kernel
- Add full support for custom kernel sizes
- Build a parser to take in shape arguments