It uses a simple neural network trained with Stochastic Gradient Descent and Backpropagation.
1. Cross-entropy cost function (instead of the quadratic cost function) with L2 regularisation
2. Weights are initialized with mean zero
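The two improvements above can be sketched as follows. This is a minimal illustration, not the repository's code: the `1/sqrt(fan_in)` scaling on the mean-zero weights is an assumption (a common refinement of plain Gaussian initialization), and the regularisation strength `lmbda` is a hypothetical parameter name.

```python
import numpy as np

def cross_entropy_cost(a, y):
    # Cross-entropy cost for sigmoid outputs a and targets y.
    # nan_to_num guards against 0 * log(0) when a saturates at 0 or 1.
    return np.sum(np.nan_to_num(-y * np.log(a) - (1 - y) * np.log(1 - a)))

def l2_regularized_cost(a, y, weights, lmbda, n):
    # Total cost = cross-entropy + (lambda / 2n) * sum of squared weights,
    # where n is the training-set size.
    return cross_entropy_cost(a, y) + (lmbda / (2 * n)) * sum(
        np.sum(w ** 2) for w in weights)

def init_weights(sizes):
    # Mean-zero Gaussian weights; the 1/sqrt(fan-in) scaling is an
    # assumed refinement that keeps early activations from saturating.
    return [np.random.randn(n_out, n_in) / np.sqrt(n_in)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]
```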
The best accuracy I got was 98.87% with two convolutional layers, a fully connected layer of 640 neurons, and a learning rate of 0.1.
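That best-performing architecture might look like the PyTorch sketch below. Only "two convolutional layers, a 640-neuron fully connected layer, learning rate 0.1" comes from the text; the filter sizes, channel counts, pooling, and activation choices are all assumptions for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical reconstruction for 28x28 MNIST digits: 5x5 filters and
# 2x2 max-pooling are assumed, chosen so the flattened feature map
# feeds the 640-neuron fully connected layer mentioned in the text.
model = nn.Sequential(
    nn.Conv2d(1, 20, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 24 -> 12
    nn.Conv2d(20, 40, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 12 -> 8 -> 4
    nn.Flatten(),
    nn.Linear(40 * 4 * 4, 640), nn.ReLU(),  # fully connected layer of 640 neurons
    nn.Linear(640, 10),                     # one output per digit class
)

# SGD with the learning rate reported in the text.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
```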