A Survey of Deep Neural Network Compression

  • Conference paper
  • First Online:
Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2020)

Abstract

Deep neural networks have achieved great success in many fields, including computer vision, natural language processing, and speech recognition. However, these networks demand substantial computation and memory, which makes them difficult to deploy on resource-constrained embedded or mobile devices. Deep neural network compression has therefore become an active research direction in recent years. This paper reviews and summarizes the current mainstream methods for compressing deep neural networks, dividing them into three categories: weight compression, local compression, and global compression. In addition, we compare and analyze the results reported by different compression methods on common benchmark datasets. Finally, we discuss how to choose among compression methods and outline future development trends in this area.
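
To make the weight-compression category concrete, below is a minimal illustrative sketch of magnitude-based weight pruning, one representative technique in that category: weights whose absolute values fall below a per-layer threshold are zeroed out, and the resulting binary masks can be reused to keep pruned connections at zero during fine-tuning. This is not the paper's implementation; the use of PyTorch and all names here (TinyMLP, prune_by_magnitude, the 90% sparsity target) are assumptions made for the example.

# Minimal sketch of magnitude-based weight pruning (illustrative only,
# not the survey authors' code). Assumes PyTorch is available.
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    """A small fully connected network used purely for demonstration."""

    def __init__(self) -> None:
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(x)))


def prune_by_magnitude(model: nn.Module, sparsity: float = 0.9) -> dict:
    """Zero out the smallest-magnitude weights in every Linear layer.

    Returns a dict of binary masks so the pruned positions can be kept
    at zero during any subsequent fine-tuning.
    """
    masks = {}
    with torch.no_grad():
        for name, module in model.named_modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                k = int(sparsity * w.numel())
                if k == 0:
                    continue
                # Per-layer threshold: the k-th smallest absolute weight.
                threshold = w.abs().flatten().kthvalue(k).values
                mask = (w.abs() > threshold).float()
                w.mul_(mask)  # remove low-magnitude connections in place
                masks[name] = mask
    return masks


if __name__ == "__main__":
    model = TinyMLP()
    masks = prune_by_magnitude(model, sparsity=0.9)
    total = sum(p.numel() for p in model.parameters() if p.dim() > 1)
    nonzero = sum(int(m.sum()) for m in masks.values())
    print(f"kept {nonzero}/{total} weights ({100 * nonzero / total:.1f}%)")

In practice, such pruning is typically followed by fine-tuning with the masks applied so that accuracy lost to the removed connections can be recovered.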

Author information

Corresponding author

Correspondence to Guopang Cao.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cao, G., Wu, F., Zhao, J. (2021). A Survey of Deep Neural Network Compression. In: Meng, H., Lei, T., Li, M., Li, K., Xiong, N., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2020. Lecture Notes on Data Engineering and Communications Technologies, vol 88. Springer, Cham. https://doi.org/10.1007/978-3-030-70665-4_157
