Abstract
An optimization algorithm is essential for minimizing loss (or objective) functions in machine learning and deep learning. Optimization algorithms face several challenges, one of which is determining an appropriate learning rate. Generally, a low learning rate leads to slow convergence, whereas a large learning rate causes the loss function to fluctuate around the minimum. As a hyperparameter, the learning rate must be determined in advance of parameter training, which is time-consuming. This paper proposes a modified stochastic gradient descent (mSGD) algorithm that uses a random learning rate: at every iteration, random candidate learning rates are generated, and the one that yields the minimum value of the loss function is chosen. The proposed mSGD algorithm can reduce the time required to determine the learning rate. In fact, the k-point mSGD algorithm can be regarded as a kind of steepest descent algorithm. In an experiment on the MNIST dataset of handwritten digits, the convergence of the mSGD algorithm is shown to be much better than that of SGD and slightly better than that of the AdaGrad and Adam algorithms.
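The k-point selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the sampling distribution and range for the candidate learning rates are assumptions (log-uniform over [1e-4, 1e-1] here), and all function names and parameters are illustrative.

```python
import numpy as np

def msgd_step(params, grad_fn, loss_fn, k=5, lr_low=1e-4, lr_high=1e-1, rng=None):
    """One k-point mSGD step: sample k random candidate learning rates,
    try each, and keep the trial update with the smallest loss."""
    rng = np.random.default_rng() if rng is None else rng
    g = grad_fn(params)
    # Candidate learning rates sampled log-uniformly (an assumption;
    # the paper's sampling scheme may differ).
    lrs = np.exp(rng.uniform(np.log(lr_low), np.log(lr_high), size=k))
    best_params, best_loss = params, np.inf
    for lr in lrs:
        trial = params - lr * g
        trial_loss = loss_fn(trial)
        if trial_loss < best_loss:
            best_params, best_loss = trial, trial_loss
    return best_params, best_loss

# Demo on a simple quadratic loss f(w) = ||w||^2 (toy stand-in for a
# mini-batch loss; in practice grad_fn would use a sampled mini-batch).
loss_fn = lambda w: float(np.sum(w ** 2))
grad_fn = lambda w: 2.0 * w
w = np.array([3.0, -2.0])
for _ in range(50):
    w, loss = msgd_step(w, grad_fn, loss_fn, k=5, rng=np.random.default_rng(0))
print(loss)  # loss shrinks toward 0 over the iterations
```

Because each step evaluates the loss at k trial points and keeps the best, the per-iteration cost grows with k, which is the price paid for not tuning the learning rate in advance.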
Ethics declarations
The authors declare that there is no competing financial interest or personal relationship that could have appeared to influence the work reported in this paper.
Cite this article
Shim, DS., Shim, J. A Modified Stochastic Gradient Descent Optimization Algorithm With Random Learning Rate for Machine Learning and Deep Learning. Int. J. Control Autom. Syst. 21, 3825–3831 (2023). https://doi.org/10.1007/s12555-022-0947-1