Abstract
This work presents and discusses optimization methods for solving finite-sum minimization problems, which are pervasive in applications, including image processing. The procedures analyzed employ first-order models of the objective function and stochastic gradient approximations based on subsampling. Among the variety of methods in the literature, the focus is on selected algorithms that can be cast into two groups: algorithms using gradient estimates evaluated on samples of very small size, and algorithms combining gradient estimates with machinery from standard globally convergent optimization procedures. Neural networks and convolutional neural networks widely used for image processing tasks are considered, and an image classification problem is solved with some of the methods presented.
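As an illustrative sketch (not taken from the chapter), the subsampling idea for finite-sum minimization can be demonstrated on a simple least-squares objective: at each iteration, the full gradient over all N terms is replaced by an unbiased estimate computed on a small random sample. The problem, step size, and batch size below are assumptions chosen for the example.

```python
import numpy as np

# Minimal sketch of subsampled stochastic gradient descent on the finite sum
#   min_w (1/N) * sum_i (a_i^T w - b_i)^2,
# where each iteration estimates the gradient from a small random subsample.
rng = np.random.default_rng(0)

N, d = 1000, 5                       # number of terms, problem dimension
A = rng.standard_normal((N, d))
w_true = rng.standard_normal(d)
b = A @ w_true + 0.01 * rng.standard_normal(N)

def subsampled_gradient(w, batch_size):
    """Unbiased gradient estimate built from a random subsample of the N terms."""
    idx = rng.choice(N, size=batch_size, replace=False)
    residual = A[idx] @ w - b[idx]
    return 2.0 * A[idx].T @ residual / batch_size

w = np.zeros(d)
step = 0.01                          # fixed step size, for illustration only
for k in range(2000):
    w -= step * subsampled_gradient(w, batch_size=32)

print(np.linalg.norm(w - w_true))    # distance to the true parameters
```

Methods of the first group discussed in the chapter work with very small batch sizes throughout, while the second group couples such estimates with safeguards (line searches, trust regions, adaptive sample sizes) from classical globally convergent schemes.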
Acknowledgements
The financial support of INdAM-GNCS Projects 2019 and 2020 is gratefully acknowledged by the first and fourth authors. Thanks are due to the referee, whose comments improved the presentation of this paper.
Copyright information
© 2023 Springer Nature Switzerland AG
Cite this entry
Bellavia, S., Bianconcini, T., Krejić, N., Morini, B. (2023). Subsampled First-Order Optimization Methods with Applications in Imaging. In: Chen, K., Schönlieb, CB., Tai, XC., Younes, L. (eds) Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging. Springer, Cham. https://doi.org/10.1007/978-3-030-98661-2_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98660-5
Online ISBN: 978-3-030-98661-2
eBook Packages: Mathematics and Statistics; Reference Module Computer Science and Engineering