Skip to content

Fully scratched trainer for neural network classifier for MNIST hand written digit. Only C and CUDA is used.

Notifications You must be signed in to change notification settings

TsutsuiMasayoshi/CNN-from-scratch

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MNIST Trainer From Scratch on GPU

About

This is a CNN trainer for MNIST handwritten digit dataset written in C++ and CUDA from scratch.

Tools for GPU implementation

Requirements

  • nvcc
  • pgcc

Feature

image

This program supports NVIDIA Tensor Core. Tensor Core is an arithmetic circuit specialized for matrix multiplication operations.

CUDA can access Tensor Cores through WMMA API like below.

wmma::fragment<wmma::matrix_a, TILESIZE, TILESIZE, TILESIZE, __half, wmma::row_major> a_frag;
wmma::fragment<wmma::matrix_b, TILESIZE, TILESIZE, TILESIZE, __half, wmma::row_major> b_frag;
wmma::fragment<wmma::accumulator, TILESIZE, TILESIZE, TILESIZE, __half> c_frag;
wmma::fill_fragment(c_frag, __float2half(0.f));

wmma::load_matrix_sync(a_frag, &a_half[wid*ELEMS_TILE], 16);
wmma::load_matrix_sync(b_frag, &b_half[wid*ELEMS_TILE], 16);
wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);
wmma::store_matrix_sync(&c_half[wid*ELEMS_TILE], c_frag, 16, wmma::mem_row_major);

Tensor core performs a certain size of matrix multiplications. We can make large matrix multiplications by splitting each matrix into small 'tiles' and throw it into Tensor Cores.

About

Fully scratched trainer for neural network classifier for MNIST hand written digit. Only C and CUDA is used.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published