Abstract
This paper presents a study on accelerating the Inertial Newton Algorithm (INNA) for neural network training. INNA, which was recently proposed and applied to neural network training, is an optimization method derived from a dynamical system: it combines the ideas of Newton's method and inertial (heavy-ball) methods and expresses them as a system of ordinary differential equations. This paper proposes a new training algorithm, Nesterov's Accelerated Dynamical InertiAl Newton method (NADIAN), which accelerates INNA by introducing Nesterov's accelerated gradient. Finally, the proposed method is applied to neural network training and its effectiveness is verified experimentally.
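For context, INNA (Castera et al., 2021) is obtained by explicitly discretizing the following first-order reformulation of a dynamical inertial Newton system, in which $f$ is the training loss, $\theta$ the network parameters, $\psi$ an auxiliary variable, and $\alpha > 0$, $\beta > 0$ damping hyperparameters:

$$\begin{cases} \dot{\theta}(t) + \beta\,\nabla f(\theta(t)) + \left(\alpha - \frac{1}{\beta}\right)\theta(t) + \frac{1}{\beta}\,\psi(t) = 0,\\[4pt] \dot{\psi}(t) + \left(\alpha - \frac{1}{\beta}\right)\theta(t) + \frac{1}{\beta}\,\psi(t) = 0. \end{cases}$$

An explicit Euler step of size $\gamma_k$ (with $\nabla f(\theta_k)$ replaced by a minibatch estimate in practice) gives INNA's update:

$$\begin{aligned} \theta_{k+1} &= \theta_k + \gamma_k\left[\left(\frac{1}{\beta} - \alpha\right)\theta_k - \frac{1}{\beta}\,\psi_k - \beta\,\nabla f(\theta_k)\right],\\ \psi_{k+1} &= \psi_k + \gamma_k\left[\left(\frac{1}{\beta} - \alpha\right)\theta_k - \frac{1}{\beta}\,\psi_k\right]. \end{aligned}$$

One plausible way to introduce Nesterov's accelerated gradient into this scheme, offered here only as a sketch consistent with the abstract (the momentum coefficient $\mu \in [0, 1)$ and the look-ahead point are assumptions, not necessarily the paper's exact NADIAN update), is to evaluate the gradient at an extrapolated point $\theta_k + \mu(\theta_k - \theta_{k-1})$ rather than at $\theta_k$.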
Acknowledgements
This work was supported by the Japan Society for the Promotion of Science (JSPS), KAKENHI Grants 20K11979 and 23K11267.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Mahboubi, S., Yamatomi, R., Samejima, Y., Ninomiya, H. (2024). A Study on Accelerating of Inertial Newton Algorithm for Neural Network Training. In: Nagar, A.K., Jat, D.S., Mishra, D., Joshi, A. (eds) Intelligent Sustainable Systems. WorldS4 2023. Lecture Notes in Networks and Systems, vol 803. Springer, Singapore. https://doi.org/10.1007/978-981-99-7569-3_16
DOI: https://doi.org/10.1007/978-981-99-7569-3_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7568-6
Online ISBN: 978-981-99-7569-3
eBook Packages: Intelligent Technologies and Robotics (R0)