Slimming Resnet #2
In our models, the residual branch is BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV. At the addition, all features from the identity mapping and from the last CONV in the residual branch are kept, so the main branch keeps the original widths of ResNets. Pruning happens only in layers inside the residual branch. Inside each residual branch:
If your residual branch is different from ours, you may need to modify the pruning process. The key point is that the main branch doesn't get slimmed; pruning happens only inside the residual branch. How you prune in the residual branch depends on how you order your BN and CONV layers.
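To make the point above concrete, here is a minimal sketch that plans the layer widths for one BN-RELU-CONV-BN-RELU-CONV-BN-RELU-CONV branch. It rests on several assumptions that are not stated in this thread: a simple threshold on the absolute BN scaling factors, a block whose input and output widths are equal (identity shortcut), and plain Python lists standing in for real tensors. The names `keep_indices` and `plan_branch` are hypothetical.

```python
def keep_indices(gammas, threshold):
    """Indices of channels whose |scaling factor| exceeds the threshold."""
    return [i for i, g in enumerate(gammas) if abs(g) > threshold]


def plan_branch(bn1, bn2, bn3, threshold):
    """Plan (in_channels, out_channels) for the three CONVs of one branch.

    bn1, bn2, bn3 are the scaling factors of the BN layers that precede
    conv1, conv2 and conv3 respectively.
    """
    sel1 = keep_indices(bn1, threshold)  # channels the branch reads from the main branch
    sel2 = keep_indices(bn2, threshold)
    sel3 = keep_indices(bn3, threshold)
    main_width = len(bn1)                # assumed: block input width == output width
    return {
        "select_from_main": sel1,            # selection only; the main branch is not modified
        "conv1": (len(sel1), len(sel2)),
        "conv2": (len(sel2), len(sel3)),
        "conv3": (len(sel3), main_width),    # full output width so the addition still matches
    }


plan = plan_branch(
    bn1=[0.9, 0.01, 0.5, 0.02],
    bn2=[0.3, 0.001],
    bn3=[0.002, 0.7],
    threshold=0.05,
)
print(plan)
```

Selecting channels at the first BN changes only what the branch reads, not what the main branch carries, and the last CONV keeps its full output width. With a different BN/CONV ordering, the in/out bookkeeping would have to change accordingly.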
Thanks. Do you think the sparsity will be affected if the BN layers on the main branch are not penalized by the L1 norm? If yes, how?
What I mean by "main branch" is the identity shortcut throughout the network, so there are no BN layers in the main branch. Whenever there is a BN, we can do channel pruning or selection according to its scaling parameters. Thanks!
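One way to turn "according to its scaling parameters" into per-channel decisions is sketched below. It assumes a single global threshold chosen so that a given fraction of all BN channels falls below it; that choice, and the names `global_threshold` and `channel_masks`, are illustrative assumptions rather than something specified in this thread.

```python
def global_threshold(all_gammas, prune_ratio):
    """Pick the |gamma| value below which `prune_ratio` of all channels fall.

    all_gammas is a list of per-layer lists of BN scaling factors.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    flat = sorted(abs(g) for layer in all_gammas for g in layer)
    cut = int(len(flat) * prune_ratio)   # number of channels to prune
    return flat[cut]


def channel_masks(all_gammas, threshold):
    """Per-layer boolean masks: True means the channel is kept."""
    return [[abs(g) >= threshold for g in layer] for layer in all_gammas]


gammas = [
    [0.9, 0.01, 0.5, 0.02],
    [0.3, 0.001],
    [0.002, 0.7],
]
thr = global_threshold(gammas, prune_ratio=0.5)
masks = channel_masks(gammas, thr)
print(thr, masks)
```

Each mask can then drive either real pruning (dropping filters) or a channel selection, depending on where the BN sits in the block.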
Hi @liuzhuang13, can you release the code for DenseNet slimming? Thank you.
Hi @youngfly11, thanks for your interest. DenseNet's code is a little different from VGG's. Unfortunately I am busy with other things now, so I will probably release the code when I have time next month. The way I implemented DenseNet slimming saves parameters and FLOPs but cannot bring a speedup in the current Torch package. I implemented it using a channel selection layer, which leads to slower inference than a normal network because it involves a memory copy rather than in-place selection. If you just want the same speed as a normal network, after training you can set the low scaling factors and their corresponding biases to 0, and not apply gradient updates to them. This is equivalent to actually pruning the channels. Thanks
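The "set to 0 and don't update" alternative can be sketched as follows. This is a framework-free illustration under assumptions: plain lists stand in for BN parameters, the update is plain SGD, and the names `zero_pruned` and `masked_sgd_step` are hypothetical.

```python
def zero_pruned(gamma, beta, threshold):
    """Zero the BN scale and bias of channels whose |gamma| is at or below threshold.

    Returns the new gamma, the new beta, and a keep mask (True = still trainable).
    """
    keep = [abs(g) > threshold for g in gamma]
    new_gamma = [g if k else 0.0 for g, k in zip(gamma, keep)]
    new_beta = [b if k else 0.0 for b, k in zip(beta, keep)]
    return new_gamma, new_beta, keep


def masked_sgd_step(params, grads, keep, lr):
    """SGD update that skips channels marked as pruned, so they stay at 0."""
    return [p - lr * g if k else p for p, g, k in zip(params, grads, keep)]


gamma, beta, keep = zero_pruned(gamma=[0.8, 0.01], beta=[0.1, -0.2], threshold=0.05)
gamma = masked_sgd_step(gamma, grads=[0.5, 0.3], keep=keep, lr=0.1)
beta = masked_sgd_step(beta, grads=[0.2, 0.4], keep=keep, lr=0.1)
print(gamma, beta, keep)
```

A channel with zero scale and zero bias outputs 0 from the BN, so the following layers see the same input as if the channel had been removed, while the tensor shapes and the inference speed stay those of the unpruned network.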
In case you're still interested, we've released our PyTorch implementation here: https://github.com/Eric-mingjie/network-slimming, which supports ResNet and DenseNet.
Thanks
Thanks for your wonderful work.
Hi, have you solved this problem? I also encountered this issue.
Hi, how do you handle this situation? Thanks.
Dear @liuzhuang13,
[image]
I guess we should prune some channels of the subsequent conv layer's kernels after pruning the current layer. Am I right?
So I cannot figure out how to slim a residual block using your method.
The two branches may have different channels pruned, so can we only prune the intersection of both?
The situation is almost the same in the shortcut version. How do you handle this?
Thanks