# Caffe for Deep Compression
This is a simple Caffe implementation of [Deep Compression](https://arxiv.org/abs/1510.00149), including weight pruning and weight quantization.
Following the paper, compression is applied only to convolution and fully-connected layers,
so we add a `CmpConvolution` and a `CmpInnerProduct` layer.
The parameters that control the compression are listed below; a toy sketch of what they mean follows the list.
- `sparse_ratio`: the ratio of weights to prune
- `class_num`: the number of k-means clusters used for weight quantization
- `quantization_term`: whether quantization is enabled
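As a rough illustration (plain numpy, not the repo's code), the following toy sketch prunes the smallest-magnitude weights of a random matrix and then shares the survivors among k-means centroids, which is what `sparse_ratio` and `class_num` control:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64))   # stand-in for one layer's weights

sparse_ratio, class_num = 0.8, 32   # fc2-style settings from the table below

# pruning: zero out the smallest-magnitude weights
thresh = np.quantile(np.abs(w), sparse_ratio)
w[np.abs(w) < thresh] = 0.0

# quantization: 1-D k-means (Lloyd's algorithm) over the surviving weights,
# with linear centroid initialization as in the paper
nz = w[w != 0]
centroids = np.linspace(nz.min(), nz.max(), class_num)
for _ in range(20):
    assign = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
    for k in range(class_num):
        if np.any(assign == k):
            centroids[k] = nz[assign == k].mean()
w[w != 0] = centroids[assign]       # weight sharing

print('zero fraction:', np.mean(w == 0))            # roughly sparse_ratio
print('shared values:', len(np.unique(w[w != 0])))  # at most class_num
```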
For a better understanding, see `examples/mnist` and run the demo script, which automatically compresses a pretrained MNIST LeNet caffemodel:
```bash
# clone the repository and build
$ git clone https://github.com/may0324/DeepCompression-caffe.git
$ cd DeepCompression-caffe
$ make -j 32

# run the demo script, which finetunes a pretrained model
$ python examples/mnist/train_compress_lenet.py
```
The compression parameters of LeNet are set based on the paper as follows:
| layer name | sparse_ratio | class_num |
| --- | --- | --- |
| conv1 | 0.33 | 256 |
| conv2 | 0.8 | 256 |
| fc1 | 0.9 | 32 |
| fc2 | 0.8 | 32 |
In practice, the layers are much more sensitive to weight pruning than to weight quantization.
We therefore suggest pruning the weights layer by layer,
and applying weight quantization last, since it does almost no harm to accuracy.
In the demo script, we set the sparse ratio (the ratio of pruned weights) layer by layer, finetuning after each step.
Once all layers are properly pruned, weight quantization is applied to all layers simultaneously; a sketch of this schedule follows.
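In the demo this schedule is driven by `examples/mnist/train_compress_lenet.py`; a minimal pycaffe sketch of the idea (hypothetical helper and file names, illustrative iteration counts; the actual script may differ) might look like:

```python
import caffe

# per-layer sparse ratios from the table above; finetune after each stage
stages = [('conv1', 0.33), ('conv2', 0.80), ('fc1', 0.90), ('fc2', 0.80)]

def set_sparse_ratio(prototxt, layer, ratio):
    """Hypothetical helper: rewrite `prototxt` so that `layer` carries the
    given sparse_ratio, leaving quantization_term off until the last stage."""
    ...

weights = 'examples/mnist/lenet_pretrained.caffemodel'  # assumed path
for layer, ratio in stages:
    set_sparse_ratio('examples/mnist/lenet_train_test.prototxt', layer, ratio)
    solver = caffe.get_solver('examples/mnist/lenet_solver.prototxt')
    solver.net.copy_from(weights)  # resume from the previous stage
    solver.step(1000)              # finetune; iteration count is illustrative
    weights = 'stage_snapshot.caffemodel'
    solver.net.save(weights)
# finally, turn quantization_term on for all layers and finetune once more
```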
The final accuracy of the finetuned model is about 99.06%, and you can load the result to check that the weights are indeed mostly pruned and weight-shared, as in the check below.
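For example, you can inspect the finetuned weights with pycaffe (the deploy prototxt and weight paths here are assumed; layer names follow the table above):

```python
import numpy as np
import caffe

net = caffe.Net('examples/mnist/lenet.prototxt',
                'examples/mnist/lenet_compressed.caffemodel', caffe.TEST)
for name in ('conv1', 'conv2', 'fc1', 'fc2'):
    w = net.params[name][0].data
    print('%s: %.1f%% pruned, %d shared values'
          % (name, 100.0 * np.mean(w == 0), len(np.unique(w[w != 0]))))
```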
Please refer to http://blog.csdn.net/may0324/article/details/52935869 for more details.
Enjoy!