An attempt to implement convolution in CUDA following the XNOR-Net strategy: weights and activations are binarized to {-1, +1}, so the convolution's dot products can be computed with bitwise XNOR and popcount instead of floating-point multiply-accumulates.
- CUDA Toolkit (nvcc)
- CUDA-capable GPU
Navigate to the directory containing xnorconv.cu, then build and run:
nvcc -arch=sm_50 xnorconv.cu -std=c++11 && ./a.out
To profile the application:
nvprof ./a.out
This is a work in progress, so there are likely mistakes; I started learning CUDA a month ago. Please let me know if you find any logical errors in the code.
- Add support for variable input sizes
- Add support for 3D convolution
- Parallelize per convolution
- Add a function for general matrix multiplication (already written; PM for the code)
- Maximize shared memory usage, balancing it against per-channel parallelization
- Create a full precision verification kernel
- Add full support for custom kernel sizes
- Build a parser to take in shape arguments