Subsampled First-Order Optimization Methods with Applications in Imaging

  • Reference work entry
  • In: Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging (Springer, Cham)

Abstract

This work presents and discusses optimization methods for solving finite-sum minimization problems, which are pervasive in applications including image processing. The procedures analyzed employ first-order models of the objective function and stochastic gradient approximations based on subsampling. Among the variety of methods in the literature, the focus is on selected algorithms that can be cast into two groups: algorithms using gradient estimates evaluated on samples of very small size, and algorithms combining such gradient estimates with machinery from standard globally convergent optimization procedures. Neural networks and convolutional neural networks, widely used for image processing tasks, are considered, and an image classification problem is solved with some of the methods presented.
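
As a minimal illustration of the subsampling idea in the abstract (a generic sketch under stated assumptions, not the chapter's specific algorithms), consider the finite-sum objective f(x) = (1/N) sum_{i=1}^{N} f_i(x). A subsampled first-order method replaces the full gradient with an average of component gradients over a small random sample. In the Python sketch below, the helper grad_i and all parameter values are hypothetical placeholders.

    import numpy as np

    def subsampled_gradient_method(grad_i, x0, N, sample_size=8,
                                   step_size=1e-2, max_iter=1000, seed=0):
        """Minimize f(x) = (1/N) * sum_i f_i(x) with subsampled gradients.

        grad_i(i, x) returns the gradient of the i-th component f_i at x.
        """
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            # Draw a small sample S_k of component indices.
            sample = rng.choice(N, size=sample_size, replace=False)
            # Subsampled gradient: average of the sampled component gradients.
            g = sum(grad_i(i, x) for i in sample) / sample_size
            # First-order step along the negative subsampled gradient.
            x = x - step_size * g
        return x

For instance, with least-squares components f_i(x) = (a_i^T x - b_i)^2 / 2 one would pass grad_i = lambda i, x: (A[i] @ x - b[i]) * A[i]. The second group of methods mentioned in the abstract would, in addition, adapt sample_size and the step size along the iterations, for example through a line search applied to a subsampled function estimate.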

Acknowledgements

The financial support of INdAM-GNCS Projects 2019 and 2020 is gratefully acknowledged by the first and fourth authors. Thanks are due to the referee, whose comments improved the presentation of this paper.

Author information

Corresponding author

Correspondence to Benedetta Morini.

Copyright information

© 2023 Springer Nature Switzerland AG

About this entry

Cite this entry

Bellavia, S., Bianconcini, T., Krejić, N., Morini, B. (2023). Subsampled First-Order Optimization Methods with Applications in Imaging. In: Chen, K., Schönlieb, C.B., Tai, X.C., Younes, L. (eds) Handbook of Mathematical Models and Algorithms in Computer Vision and Imaging. Springer, Cham. https://doi.org/10.1007/978-3-030-98661-2_78
