CUDA Kernels

In this repository, I will brush up on C, dive deeper into low-level programming, and optimizations to aid scientific computing. Most scientific programming I do involves large tensors. Hence, understanding the concepts required to effectively use parallel programming and efficient memory access will increase overall workflow efficiency.

Objectives

Familiarize with C:
- Syntax
- Pointers
- Structs
- Memory management
- (Basically things not present in Python, which I'm most familiar with)
Understand low-level concepts:
- Caching
- Optimizing code (e.g., row-major vs column-major matrices)
Learn CUDA C:
- Implement my own matrix multiplication kernel with different levels of optimization (e.g., tiling)
Create a perceptron using CuBLAS:
- ~~Leverage its ability to merge kernels~~
Create a convolution kernel and use it for classification with MNIST:
- Train the model with pytorch
- Deploy using the kernel ported to python
Develop kernels accessible with PyTorch
Explore higher-level CUDA abstractions:
- Numba
- Triton (used by PyTorch)
- Taichi (a DSL with python frontend)

Name		Name	Last commit message	Last commit date
Latest commit History 93 Commits
2dconvolution		2dconvolution
gemm		gemm
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
cpu_functions.c		cpu_functions.c
cpu_functions.h		cpu_functions.h
git.sh		git.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CUDA Kernels

Objectives

About

Releases

Packages

Languages

ofs416/cuda_kernels

Folders and files

Latest commit

History

Repository files navigation

CUDA Kernels

Objectives

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages