Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in training most deep learning models, yet the learning dynamics of these complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been refined into many derived forms, including SGD with momentum and Nesterov's accelerated gradient (NAG). However, the learning dynamics associated with the optimizer's parameters have seldom been studied. We propose to understand model dynamics from the perspective of control theory. We use the state transfer function to approximate the parameter dynamics of different optimizers as first- or second-order control systems, thereby explaining how these parameters theoretically affect the stability and convergence time of deep learning models, and we verify our findings through numerical experiments.
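The control-theoretic view can be made concrete with a minimal sketch (not taken from the paper): simulating SGD, SGD with momentum, and NAG on a one-dimensional quadratic loss shows how the learning rate and momentum coefficient govern stability and convergence time. The curvature value, step counts, and hyperparameter settings below are illustrative assumptions only.

```python
import numpy as np

# Quadratic loss L(w) = 0.5 * a * w^2 with gradient a * w (a is an assumed curvature).
a = 2.0

def sgd(w, lr, steps=50):
    traj = [w]
    for _ in range(steps):
        w = w - lr * a * w                  # first-order dynamics: w_{k+1} = (1 - lr*a) * w_k
        traj.append(w)
    return np.array(traj)

def momentum(w, lr, mu, steps=50):
    v, traj = 0.0, [w]
    for _ in range(steps):
        v = mu * v - lr * a * w             # velocity state makes the dynamics second order
        w = w + v
        traj.append(w)
    return np.array(traj)

def nag(w, lr, mu, steps=50):
    v, traj = 0.0, [w]
    for _ in range(steps):
        v = mu * v - lr * a * (w + mu * v)  # gradient evaluated at the look-ahead point
        w = w + v
        traj.append(w)
    return np.array(traj)

if __name__ == "__main__":
    w0 = 1.0
    # Plain SGD contracts only when |1 - lr*a| < 1, i.e. lr < 2/a for this quadratic.
    print("SGD, lr=0.4:               ", sgd(w0, 0.4)[-1])
    print("SGD, lr=1.1 (diverges):    ", sgd(w0, 1.1)[-1])
    print("Momentum, lr=0.4, mu=0.9:  ", momentum(w0, 0.4, 0.9)[-1])
    print("NAG, lr=0.4, mu=0.9:       ", nag(w0, 0.4, 0.9)[-1])
```

In this toy setting the SGD recursion is a first-order linear system whose pole is 1 - lr*a, while the momentum and NAG recursions add the velocity as a second state, matching the first- and second-order approximations discussed in the abstract.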
Acknowledgements
This work was supported by National Natural Science Foundation of China (Grant Nos. 61933013, U1736211), Strategic Priority Research Program of Chinese Academy of Sciences (Grant No. XDA22030301), Natural Science Foundation of Guangdong Province (Grant No. 2019A1515011076), and Key Project of Natural Science Foundation of Hubei Province (Grant No. 2018CFA024).
Cite this article
Wu, W., Jing, X., Du, W. et al. Learning dynamics of gradient descent optimization in deep neural networks. Sci. China Inf. Sci. 64, 150102 (2021). https://doi.org/10.1007/s11432-020-3163-0