Abstract
Quantized neural networks (QNNs), which use low-bitwidth numbers to represent parameters and perform computations, have been proposed to reduce computational complexity, storage size, and memory usage. In QNNs, parameters and activations are uniformly quantized so that multiplications and additions can be accelerated by bitwise operations. However, the distributions of parameters in neural networks are often imbalanced, so a uniform quantization determined from extremal values may underutilize the available bitwidth. In this paper, we propose a novel quantization method that ensures the distribution of quantized values is balanced. Our method first recursively partitions the parameters by percentiles into balanced bins and then applies uniform quantization. We also introduce computationally cheaper approximations of percentiles to reduce the computation overhead this introduces. Overall, our method improves the prediction accuracy of QNNs without introducing extra computation during inference, has negligible impact on training speed, and is applicable to both convolutional and recurrent neural networks. Experiments on standard datasets including ImageNet and Penn Treebank confirm the effectiveness of our method. On ImageNet, the top-5 error rate of our 4-bit quantized GoogLeNet model is 12.7%, which is superior to the state of the art among QNNs.
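The two-step idea described in the abstract (recursive percentile partitioning into balanced bins, followed by uniform quantization) can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the choice of the median as the splitting percentile, and the mapping of bin indices to evenly spaced levels in [-1, 1] are all assumptions made for illustration. An extremal-value uniform quantizer is included only for comparison.

```python
import random


def assign_balanced_bins(values, bits):
    """Assign each value one of 2**bits bin ids by recursive median splits.

    Hypothetical helper: sorting once and halving index ranges is equivalent
    to repeatedly splitting at the 50th percentile of each sub-range.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)

    def split(lo, hi, depth, bin_id):
        if depth == 0:
            for pos in range(lo, hi):
                bins[order[pos]] = bin_id
            return
        mid = (lo + hi) // 2  # position of the median of this sub-range
        split(lo, mid, depth - 1, 2 * bin_id)
        split(mid, hi, depth - 1, 2 * bin_id + 1)

    split(0, len(values), bits, 0)
    return bins


def balanced_quantize(values, bits):
    """Map balanced bin ids to evenly spaced levels in [-1, 1] (assumed range)."""
    top = 2 ** bits - 1
    return [-1.0 + 2.0 * b / top for b in assign_balanced_bins(values, bits)]


def extremal_uniform_bins(values, bits):
    """Baseline: uniform bins spanning [min, max]; assumes min != max."""
    lo, hi = min(values), max(values)
    top = 2 ** bits - 1
    return [round((v - lo) / (hi - lo) * top) for v in values]


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1024)]  # bell-shaped, imbalanced

balanced = assign_balanced_bins(weights, bits=2)
extremal = extremal_uniform_bins(weights, bits=2)
quantized = balanced_quantize(weights, bits=2)

print([balanced.count(b) for b in range(4)])  # [256, 256, 256, 256]
print([extremal.count(b) for b in range(4)])  # unequal counts for bell-shaped data
```

In this sketch every quantization level receives the same number of weights, whereas the extremal-value baseline leaves the outer levels sparsely used. A full sort is used here for clarity; the cheaper percentile approximations mentioned in the abstract are not modeled.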
Cite this article
Zhou, SC., Wang, YZ., Wen, H. et al. Balanced Quantization: An Effective and Efficient Approach to Quantized Neural Networks. J. Comput. Sci. Technol. 32, 667–682 (2017). https://doi.org/10.1007/s11390-017-1750-y
DOI: https://doi.org/10.1007/s11390-017-1750-y