Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions

Bianchi, Pascal; Hachem, Walid; Schechtman, Sholom

doi:10.1007/s11228-022-00638-z

Published: 08 April 2022

Volume 30, pages 1117–1147, (2022)

Set-Valued and Variational Analysis

Abstract

This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential of the mean function, it has been assumed in the literature that an oracle of the Clarke subdifferential of the mean function is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization points of the algorithm. Next, in the small step size regime, it is shown that the interpolated trajectory of the algorithm converges in probability (in the compact convergence sense) towards the set of solutions of a particular differential inclusion: the subgradient flow. Finally, viewing the iterates as a Markov chain whose transition kernel is indexed by the step size, it is shown that the invariant distributions of the kernel converge weakly to the set of invariant distributions of this differential inclusion as the step size tends to zero. These results show that when the step size is small, with large probability, the iterates eventually lie in a neighborhood of the critical points of the mean function.
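As an informal illustration of the last statement, the following minimal sketch runs a constant step stochastic subgradient recursion on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the random function f(x, xi) = | |x| - 1 | + xi * x with standard Gaussian xi, the particular selection of the Clarke subdifferential, the step size, and the helper names. The mean function is then F(x) = | |x| - 1 |, whose Clarke critical points are -1, 0 and 1.

```python
import random


def sign(t):
    """Sign of t as an integer in {-1, 0, 1}."""
    return (t > 0) - (t < 0)


def clarke_selection(x, xi):
    """One element of the Clarke subdifferential of f(., xi) at x, for the
    toy function f(x, xi) = | |x| - 1 | + xi * x (an illustrative choice).
    At the kinks x = 0 and |x| = 1 the non-noise part is 0, which belongs
    to the Clarke subdifferential [-1, 1] of | |x| - 1 | there."""
    return sign(abs(x) - 1.0) * sign(x) + xi


def constant_step_sgd(x0, step, n_iter, seed=0):
    """Iterate x <- x - step * (subgradient selection) with a fixed step."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_iter):
        xi = rng.gauss(0.0, 1.0)  # zero-mean noise, so E[f(x, xi)] = | |x| - 1 |
        x = x - step * clarke_selection(x, xi)
    return x


def distance_to_critical_set(x):
    """Distance from x to the Clarke critical points {-1, 0, 1} of the mean function."""
    return min(abs(x - c) for c in (-1.0, 0.0, 1.0))


x_final = constant_step_sgd(x0=0.3, step=0.01, n_iter=5000)
print(distance_to_critical_set(x_final))
```

With a small step such as 0.01, the last iterate in this toy run is expected to sit close to one of the critical points; the sketch only illustrates the qualitative behavior and says nothing about the generality of the paper's assumptions.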

References

Aliprantis, C.D., Border, K.C.: Infinite Dimensional Analysis: A Hitchhiker’s Guide. Springer, Berlin (2006). https://doi.org/10.1007/3-540-29587-9
Aubin, J.P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 264. Springer, Berlin (1984). https://doi.org/10.1007/978-3-642-69512-4
Aubin, J.P., Frankowska, H., Lasota, A.: Poincaré’s recurrence theorem for set-valued dynamical systems. Ann. Polon. Math. 54(1), 85–91 (1991). https://doi.org/10.4064/ap-54-1-85-91
Benaïm, M., Hofbauer, J., Sorin, S.: Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1), 328–348 (2005) (electronic). https://doi.org/10.1137/S0363012904439301
Benveniste, A., Métivier, M., Priouret, P.: Adaptive Algorithms and Stochastic Approximations. Applications of Mathematics (New York), vol. 22. Springer, Berlin (1990). Translated from the French by Stephen S. Wilson. https://doi.org/10.1007/978-3-642-75894-2
Bianchi, P., Hachem, W., Salim, A.: Constant step stochastic approximations involving differential inclusions: stability, long-run convergence and applications. Stochastics 91(2), 288–320 (2019). https://doi.org/10.1080/17442508.2018.1539086
Bolte, J., Daniilidis, A., Lewis, A., Shiota, M.: Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2), 556–572 (2007)
Bolte, J., Pauwels, E.: Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. arXiv:1909.10300 (2019)
Clarke, F.H., Ledyaev, Y.S., Stern, R.J., Wolenski, P.R.: Nonsmooth Analysis and Control Theory. Graduate Texts in Mathematics, vol. 178. Springer, New York (1998)
Davis, D., Drusvyatskiy, D., Kakade, S., Lee, J.D.: Stochastic subgradient method converges on tame functions. Found. Comput. Math. 20, 119–154 (2020). https://doi.org/10.1007/s10208-018-09409-5
van den Dries, L., Miller, C.: Geometric categories and o-minimal structures. Duke Math. J. 84(2), 497–540 (1996). https://doi.org/10.1215/S0012-7094-96-08416-1
Ermoliev, Y., Norkin, V.: Stochastic generalized gradient method for solving nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal. 34(2), 196–215 (1998). https://doi.org/10.1007/BF02742069. http://pure.iiasa.ac.at/id/eprint/5415/
Ermoliev, Y.M., Norkin, V.: Solution of nonconvex nonsmooth stochastic optimization problems. Cybern. Syst. Anal. 39(5), 701–715 (2003)
Faure, M., Roth, G.: Ergodic properties of weak asymptotic pseudotrajectories for set-valued dynamical systems. Stoch. Dyn. 13(1), 1250011, 23 (2013). https://doi.org/10.1142/S0219493712500116
Folland, G.: Real Analysis: Modern Techniques and Their Applications. Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley, Hoboken (2013). https://books.google.fr/books?id=wI4fAwAAQBAJ
Has’minskiĭ, R.Z.: The averaging principle for parabolic and elliptic differential equations and Markov processes with small diffusion. Teor. Verojatnost. i Primenen. 8, 3–25 (1963)
Ioffe, A.D.: An invitation to tame optimization. SIAM J. Optim. 19(4), 1894–1917 (2009). https://doi.org/10.1137/080722059
Kakade, S., Lee, J.D.: Provably correct automatic sub-differentiation for qualified programs. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31, pp. 7125–7135. Curran Associates, Inc. (2018). http://papers.nips.cc/paper/7943-provably-correct-automatic-sub-differentiation-for-qualified-programs.pdf
Kushner, H.J., Yin, G.G.: Stochastic Approximation and Recursive Algorithms and Applications, 2nd edn. Applications of Mathematics (New York), vol. 35. Stochastic Modelling and Applied Probability. Springer, New York (2003)
Lebourg, G.: Generic differentiability of Lipschitzian functions. Transactions of the American Mathematical Society 256, 125–144 (1979). http://www.jstor.org/stable/1998104
Majewski, S., Miasojedow, B., Moulines, E.: Analysis of nonsmooth stochastic approximation: the differential inclusion approach. arXiv:1805.01916 (2018)
Meyn, S., Tweedie, R.L.: Markov Chains and Stochastic Stability, 2nd edn. Cambridge University Press, New York (2009)
Mikhalevich, V., Gupal, A., Norkin, V.: Methods of nonconvex optimization. Nauka (1987)
Norkin, V.: Generalized-differentiable functions. Cybern. Syst. Anal. 16, 10–12 (1980). https://doi.org/10.1007/BF01099354
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic Differentiation in PyTorch. In: NIPS-W (2017)
Roth, G., Sandholm, W.H.: Stochastic approximations with constant step size and differential inclusions. SIAM J. Control Optim. 51(1), 525–555 (2013). https://doi.org/10.1137/110844192
Ruszczyński, A.: Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optim. Lett. 14 (2020). https://doi.org/10.1007/s11590-020-01537-8

Acknowledgements

The authors wish to thank Jérôme Bolte and Edouard Pauwels for their inspiring remarks. This work is partially supported by the Région Ile-de-France.

Funding

The work of the third author is supported by the Région Ile-de-France.

Author information

Authors and Affiliations

LTCI, Telecom Paris, IP, Paris, France
Pascal Bianchi
LIGM, CNRS, Université Gustave Eiffel, F-77454, Marne-la-Vallée, France
Walid Hachem & Sholom Schechtman

Corresponding author

Correspondence to Sholom Schechtman.

Ethics declarations

Competing interests

Not applicable.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Availability of data and materials

Not applicable.

About this article

Cite this article

Bianchi, P., Hachem, W. & Schechtman, S. Convergence of Constant Step Stochastic Gradient Descent for Non-Smooth Non-Convex Functions. Set-Valued Var. Anal 30, 1117–1147 (2022). https://doi.org/10.1007/s11228-022-00638-z

Received: 27 January 2021
Accepted: 16 March 2022
Published: 08 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11228-022-00638-z
