Abstract
We develop new perturbation techniques for conducting convergence analysis of various first-order algorithms for a class of nonsmooth optimization problems. We use the iteration scheme of an algorithm to construct a perturbed stationary point set-valued map, and define the perturbing parameter by the difference of two consecutive iterates. We then show that the calmness condition of the induced set-valued map, together with a local version of the proper separation of stationary value condition, is sufficient to ensure the linear convergence of the algorithm. The equivalence of this calmness condition to the one for the canonically perturbed stationary point set-valued map is proved, and this equivalence allows us to derive sufficient conditions for calmness by using some recent developments in variational analysis. These sufficient conditions differ from existing results (especially the error-bound-based ones) in that they can be easily verified for many concrete application models. Our analysis focuses on the fundamental proximal gradient (PG) method, and it enables us to show that any accumulation point of the sequence generated by the PG method must be a stationary point in terms of the proximal subdifferential, instead of the limiting subdifferential. This reveals the surprising fact that the quality of the solutions found by the PG method is in general superior. Our analysis also leads to improvements of the linear convergence results for the PG method in the convex case. The new perturbation technique can be conveniently used to derive the linear convergence of a number of other first-order methods, including the well-known alternating direction method of multipliers and the primal-dual hybrid gradient method, under mild assumptions.
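To make the abstract objects concrete, the following sketch (not from the paper; a hypothetical LASSO instance with \(f(x)=\frac12\|Ax-b\|^2\) and \(g(x)=\lambda\|x\|_1\)) runs the PG iteration \(x^{k+1} = \text{Prox}_{g}^{\gamma}(x^{k} - \gamma \nabla f(x^{k}))\) and monitors the perturbing parameter \(\|x^{k+1}-x^{k}\|\), which shrinks toward zero along the iterates:

```python
import numpy as np

# Hypothetical LASSO instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
def soft_threshold(v, t):
    # prox of t*||.||_1: componentwise soft shrinkage
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_gradient(A, b, lam, gamma, iters=3000):
    """PG iteration x^{k+1} = Prox_{gamma*g}(x^k - gamma*grad f(x^k))."""
    x = np.zeros(A.shape[1])
    steps = []
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                 # grad f(x)
        x_new = soft_threshold(x - gamma * grad, gamma * lam)
        steps.append(np.linalg.norm(x_new - x))  # the perturbing parameter
        x = x_new
    return x, steps

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
L = np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
x_star, steps = prox_gradient(A, b, lam=0.5, gamma=0.9 / L)
print(steps[0], steps[-1])                       # step lengths shrink toward zero
```

The quantity `steps[k]` is exactly the perturbing parameter whose behavior, via the calmness analysis sketched in the abstract, governs the linear rate.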
References
Attouch, H., Bolte, J., Redont, P., Soubeyran, A.: Proximal alternating minimization and projection methods for nonconvex problems: an approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2), 438–457 (2010)
Attouch, H., Bolte, J., Svaiter, B.: Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137(1), 91–129 (2013)
Aubin, J.: Lipschitz behavior of solutions to convex minimization problems. Math. Oper. Res. 9(1), 87–111 (1984)
Bai, K., Ye, J. J., Zhang, J.: Directional quasi-/pseudo-normality as sufficient conditions for metric subregularity. SIAM J. Optim. 29(4), 2625–2649 (2019)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Bello-Cruz, J., Li, G., Nghia, T. T. A.: On the Q-linear convergence of forward-backward splitting method and uniqueness of optimal solution to lasso. arXiv:1806.06333 (2018)
Bolte, J., Daniilidis, A., Lewis, A.: The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4), 1205–1223 (2007)
Bolte, J, Daniilidis, A., Ley, O., Mazet, L.: Characterizations of Łojasiewicz inequalities: subgradient flows, talweg, convexity. Trans. Amer. Math. Soc. 362(6), 3319–3363 (2010)
Bolte, J., Nguyen, T. P., Peypouquet, J., Suter, B. W.: From error bounds to the complexity of first-order descent methods for convex functions. Math. Program. 165, 471–507 (2017)
Bolte, J., Sabach, S., Teboulle, M.: Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 146(1), 459–494 (2014)
Cauchy, A. L.: Méthode générale pour la résolution des systèmes d’équations simultanées. Comptes Rendus de l’Académie des Sciences 25, 46–89 (1847)
Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40, 120–145 (2011)
Clarke, F., Ledyaev, Y., Stern, R., Wolenski, P.: Nonsmooth Analysis and Control Theory, vol. 178. Springer Science & Business Media (2008)
Clarke, F., Stern, R., Wolenski, P.: Subgradient criteria for monotonicity, the Lipschitz condition, and convexity. Can. J. Math. 45, 1167–1183 (1993)
Dontchev, A., Rockafellar, R. T.: Implicit Functions and Solution Mappings. Springer Monographs in Mathematics (2009)
Drusvyatskiy, D., Ioffe, A., Lewis, A.: Nonsmooth optimization using Taylor-like models: error bounds, convergence, and termination criteria. Math. Program., 1–27 (2019)
Drusvyatskiy, D., Lewis, A.: Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3), 919–948 (2018)
Drusvyatskiy, D., Mordukhovich, B. S., Nghia, T. T. A.: Second-order growth, tilt stability, and metric regularity of the subdifferential. J. Convex Anal. 21(4), 1165–1192 (2014)
Fan, J. Q., Li, R. Z.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Stat. Assoc. 96(456), 1348–1360 (2001)
Frankel, P., Garrigos, G., Peypouquet, J.: Splitting methods with variable metric for Kurdyka-Łojasiewicz functions and general convergence rates. J. Optim. Theory Appl. 165(3), 874–900 (2015)
Gfrerer, H.: On directional metric regularity, subregularity and optimality conditions for nonsmooth mathematical programs. Set-Valued Var. Anal. 21(2), 151–176 (2013)
Gfrerer, H.: Optimality conditions for disjunctive programs based on generalized differentiation with application to mathematical programs with equilibrium constraints. SIAM J. Optim. 24(2), 898–931 (2014)
Gfrerer, H., Klatte, D.: Lipschitz and Hölder stability of optimization problems and generalized equations. Math. Program. 158(1), 35–75 (2016)
Gfrerer, H., Outrata, J.: On Lipschitzian properties of implicit multifunctions. SIAM J. Optim. 26(4), 2160–2189 (2016)
Gfrerer, H., Ye, J. J.: New constraint qualifications for mathematical programs with equilibrium constraints via variational analysis. SIAM J. Optim. 27 (2), 842–865 (2017)
Ginchev, I., Mordukhovich, B. S.: On directionally dependent subdifferentials. Comptes rendus de L’Academie Bulgare des Sciences 64, 497–508 (2011)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
He, B., Fu, X., Jiang, Z.: Proximal-point algorithm using a linear proximal term. J. Optim. Theory Appl. 141(2), 299–319 (2009)
He, B., Yuan, X.: Convergence analysis of primal-dual algorithms for a saddle-point problem: from contraction perspective. SIAM J. Imaging Sci. 5(1), 119–149 (2012)
Henrion, R., Jourani, A., Outrata, J.: On the calmness of a class of multifunctions. SIAM J. Optim. 13(2), 603–618 (2002)
Henrion, R., Outrata, J.: Calmness of constraint systems with applications. Math. Program. 104(2–3), 437–464 (2005)
Hoffman, A. J.: On approximate solutions of systems of linear inequalities. J. Res. Natl. Bur. Stand. 49(4), 263–265 (1952)
Klatte, D., Kummer, B.: Constrained minima and Lipschitzian penalties in metric spaces. SIAM J. Optim. 13(2), 619–633 (2002)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Li, G., Pong, T. -K.: Calculus of the exponent of Kurdyka-Łojasiewicz inequality and its applications to linear convergence of first-order methods. Found. Comput. Math. 18, 1199–1232 (2018)
Liu, Y., Yuan, X., Zeng, S., Zhang, J.: Partial error bound conditions and the linear convergence rate of alternating direction method of multipliers. SIAM J. Numer. Anal. 56(4), 2095–2123 (2018)
Luo, Z. -Q., Tseng, P.: Error bounds and convergence analysis of feasible descent methods: a general approach. Ann. Oper. Res. 46(1), 157–178 (1993)
Luo, Z. -Q., Tseng, P.: Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem. SIAM J. Optim. 2(1), 43–54 (1992)
Mordukhovich, B. S.: Variational Analysis and Generalized Differentiation I: Basic Theory, II: Applications. Springer Science & Business Media (2006)
Nesterov, Y.: Introductory Lectures on Convex Optimization: a Basic Course, vol. 87. Springer Science & Business Media (2013)
Noll, D.: Convergence of non-smooth descent methods using the Kurdyka–Łojasiewicz inequality. J. Optim. Theory Appl. 160(2), 553–572 (2014)
O’Donoghue, B., Candès, E.: Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15(3), 715–732 (2015)
Pan, S., Liu, Y. (2019)
Passty, G.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J Math. Anal. Appl. 72(2), 383–390 (1979)
Polyak, B. T.: Introduction to Optimization. Optimization Software Incorporation, New York (1987)
Reed, R.: Pruning algorithms - a survey. IEEE Trans. Neural Netw. 4(5), 740–747 (1993)
Robinson, S.: Stability theory for systems of inequalities. Part i: linear systems. SIAM J. Numer. Anal. 12(5), 754–769 (1975)
Robinson, S.: Strongly regular generalized equations. Math. Oper. Res. 5(1), 43–62 (1980)
Robinson, S.: Some continuity properties of polyhedral multifunctions. Math. Program. Stud. 14, 206–214 (1981)
Rockafellar, R. T., Wets, R.: Variational Analysis. Springer Science & Business Media (2009)
Schmidt, M., Roux, N., Bach, F.: Convergence rates of inexact proximal-gradient methods for convex optimization. In: Advances in Neural Information Processing Systems, vol. 24, pp 1458–1466 (2011)
Tao, S., Boley, D., Zhang, S.: Local linear convergence of ISTA and FISTA on the LASSO problem. SIAM J. Optim. 26(1), 313–336 (2016)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Tseng, P., Yun, S.: A coordinate gradient descent method for nonsmooth separable minimization. Math. Program. 117(1), 387–423 (2009)
Wen, B., Chen, X., Pong, T. -K.: Linear convergence of proximal gradient algorithm with extrapolation for a class of nonconvex nonsmooth minimization problems. SIAM J. Optim. 27(1), 124–145 (2017)
Xiao, L., Zhang, T.: A proximal-gradient homotopy method for the sparse least-squares problem. SIAM J. Optim. 23(2), 1062–1091 (2013)
Yang, W., Han, D.: Linear convergence of the alternating direction method of multipliers for a class of convex optimization problems. SIAM J. Numer. Anal. 54, 625–640 (2016)
Ye, J. J.: Constraint qualifications and necessary optimality conditions for optimization problems with variational inequality constraints. SIAM J. Optim. 10(4), 943–962 (2000)
Ye, J. J., Ye, X. Y.: Necessary optimality conditions for optimization problems with variational inequality constraints. Math. Oper. Res. 22(4), 977–997 (1997)
Ye, J. J., Yuan, X., Zeng, S., Zhang, J.: Variational analysis perspective on linear convergence of some first order methods for nonsmooth convex optimization problems. Optimization-online preprint (2018)
Ye, J. J., Zhou, J.: Verifiable sufficient conditions for the error bound property of second-order cone complementarity problems. Math. Program. 171, 361–395 (2018)
Yuan, X., Zeng, S., Zhang, J.: Discerning the linear convergence of ADMM for structured convex optimization through the lens of variational analysis. J. Mach. Learn. Res. 21, 1–75 (2020)
Zhang, C. -H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38(2), 894–942 (2010)
Zhou, Z., So, A. M. -C.: A unified approach to error bounds for structured convex optimization problems. Math. Program. 165(2), 689–728 (2017)
Acknowledgments
The authors would like to thank the anonymous referees for their helpful suggestions and comments.
The research was partially supported by NSFC Nos. 11871279 and 11971090, NSERC, the General Research Fund from the Hong Kong Research Grants Council (No. 12302318), National Science Foundation of China No. 11971220, and Guangdong Basic and Applied Basic Research Foundation No. 2019A1515011152.
Appendix A
A.1 Proof of Lemma 2
(1) Since \(x^{k+1}\) is the optimal solution of the proximal operation (1.3) with \(a = x^{k} - \gamma \nabla f(x^{k})\), we have
which can be reformulated as
Furthermore, since ∇f(x) is globally Lipschitz continuous with the Lipschitz constant L, we have
Adding the above inequality to (A.1) we obtain
As a result, if \(\gamma < \frac {1}{L}\), then (3.1) holds with \(\kappa _{1} := \frac {1}{2\gamma } - \frac {L}{2}\).
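The resulting sufficient descent property can be checked numerically. The sketch below (a hypothetical LASSO instance, not from the paper) verifies along PG iterates that \(F(x^{k+1}) \le F(x^{k}) - \kappa_{1}\|x^{k+1}-x^{k}\|^{2}\) with \(\kappa_{1} = \frac{1}{2\gamma}-\frac{L}{2}\) whenever \(\gamma < 1/L\):

```python
import numpy as np

# Hypothetical test instance: F = f + g with f(x) = 0.5*||Ax - b||^2
# (grad f Lipschitz with constant L = ||A||_2^2) and g(x) = lam*||x||_1.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 8))
b = rng.standard_normal(20)
lam = 0.3
L = np.linalg.norm(A, 2) ** 2
gamma = 0.5 / L                                 # step size gamma < 1/L
kappa1 = 1.0 / (2.0 * gamma) - L / 2.0          # kappa1 = 1/(2*gamma) - L/2 > 0

def F(x):
    return 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()

def pg_step(x):
    v = x - gamma * A.T @ (A @ x - b)
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

x = rng.standard_normal(8)
slacks = []  # F(x^k) - kappa1*||x^{k+1}-x^k||^2 - F(x^{k+1}), should be >= 0
for _ in range(50):
    x_new = pg_step(x)
    slacks.append(F(x) - kappa1 * np.linalg.norm(x_new - x) ** 2 - F(x_new))
    x = x_new
print(min(slacks))  # nonnegative up to rounding: the descent inequality holds
```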
(2) By the optimality of \(x^{k+1}\) we have that for any x,
which can be reformulated as
By the Lipschitz continuity of ∇f(x),
By the above two inequalities we obtain
from which we can obtain (3.2) with \(\kappa _{2} := \max \limits \left \{ \left (\frac {1}{\gamma } + \frac {L+1}{2} \right ), \left (\frac {L}{2} + \frac {1}{2\gamma } \right ) \right \}\).
A.2 Proof of Theorem 5
In the proof, we write \(\zeta :=F(\bar x)\) for succinctness. Recall that the proper separation of the stationary value condition holds at \(\bar {x} \in {\mathcal {X}}^{\pi }\), i.e., there exists δ > 0 such that
Without loss of generality, we assume that 𝜖 < δ/(κ + 1) throughout the proof.
Step 1. We prove that \(\bar x\) is a stationary point and
Adding the inequalities in (3.1) starting from iteration k = 0 to an arbitrary positive integer K, we obtain
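(The omitted display is the telescoped bound; a hedged reconstruction, assuming (3.1) takes the sufficient-descent form \(F(x^{k+1}) \le F(x^{k}) - \kappa_{1} \|x^{k+1}-x^{k}\|^{2}\) and that \(F\) is bounded below:

```latex
\kappa_{1} \sum_{k=0}^{K} \left\| x^{k+1} - x^{k} \right\|^{2}
  \;\le\; \sum_{k=0}^{K} \bigl( F(x^{k}) - F(x^{k+1}) \bigr)
  \;=\; F(x^{0}) - F(x^{K+1})
  \;\le\; F(x^{0}) - \inf F .
```

Letting \(K \to \infty\) gives the summability claim that follows.)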
It follows that \({\sum }_{k=0}^{\infty } \left \| x^{k+1} - x^{k} \right \|^{2} < \infty ,\) and consequently (A.3) holds. Let \(\{ x^{k_{i}} \}_{i=1}^{\infty }\) be a convergent subsequence of \(\left \{ x^{k} \right \}\) such that \(x^{k_{i}}\rightarrow \bar {x}\) as \(i\rightarrow \infty \). Then by (A.3), we have
Since
let \(i\rightarrow \infty \) in (A.5) and by the outer semicontinuity of \(\text {Prox}_{g}^{\gamma } (\cdot )\) (see [50, Theorem 1.25]) and continuity of ∇f, we have
Using the definition of the proximal operator and applying the optimality condition, we have
and so \(\bar x\in \mathcal {X}^{\pi }\).
Step 2. Given \(\hat {\epsilon } > 0\) such that \(\hat {\epsilon } < \delta /\epsilon - \kappa -1\), for each k > 0, we can find \({{\bar {x}^{k} \in \mathcal {X}^{\pi }}}\) such that
It follows from the cost-to-go estimate condition (3.2) that
with \(\hat { \kappa }_{2} = \kappa _{2}(1+\hat {\epsilon })\). We now use mathematical induction to prove that there exists kℓ > 0 such that for all j ≥ kℓ,
where the constant \(c:=\frac {2\sqrt {\hat { \kappa }_{2}(\kappa ^{2}+1)}}{\kappa _{1}}> 0\).
By (A.4) and the fact that F is continuous in its domain, there exists kℓ > 0 such that \(x^{k_{\ell }}\in \mathbb {B} \left (\bar {x},\epsilon \right )\), \(x^{k_{\ell }+1}\in \mathbb {B} \left (\bar {x},\epsilon \right )\),
which indicates \(\bar {x}^{k_{\ell }}\in {{{\mathcal {X}}^{\pi }}} \cap {\mathbb {B}}\left (\bar {x},\delta \right )\). It follows by the proper separation of the stationary value condition (A.2) that \(F\left (\bar {x}^{k_{\ell }} \right ) = \zeta \).
Before carrying out the induction for (A.7)–(A.9), we first show that, for j ≥ kℓ, if (A.7) and (A.8) hold, then
Firstly, since \(x^{j}\in \mathbb {B} \left (\bar {x}, \epsilon \right )\), \(F(\bar {x}^{j}) = \zeta \) and (A.6) holds, it follows from (3.4) that
where \(\kappa _{3} := \sqrt {\hat { \kappa }_{2} \left (\kappa ^{2} + 1 \right )}\). Similarly, since \(x^{j+1}\in \mathbb {B} \left (\bar {x}, \epsilon \right )\) and \(F(\bar {x}^{j+1}) = \zeta \), by (A.6) and condition (3.4), we have
As a result, we can obtain
Since \(c = \frac {2\kappa _{3}}{\kappa _{1}}\), we have
from which by applying \(ab\le \left (\frac {a+b}{2} \right )^{2}\) we establish (A.12).
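(For completeness, the elementary inequality invoked here is a consequence of the AM–GM inequality:

```latex
ab \le \left( \frac{a+b}{2} \right)^{2} \quad \text{for all } a, b \ge 0,
\qquad \text{since } 0 \le (a-b)^{2} = (a+b)^{2} - 4ab .
```

It is applied with \(a, b\) the two step-length factors in the preceding display.)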
Next we proceed to prove the three properties (A.7)–(A.9) by induction on j. For j = kℓ, we have
and similar to the estimate of \(\left \|\bar {x}^{k_{\ell }} - \bar {x}\right \|\), we can show
It follows by (A.2) that \(F(\bar {x}^{k_{\ell }+1}) = \zeta \), and hence by (A.6),
which is (A.8) with j = kℓ. Note that property (A.9) for j = kℓ can be obtained directly through (A.12).
Now suppose that (A.7), (A.8), and (A.9) hold for some j ≥ kℓ. We show that they also hold for j + 1. We have
where the second inequality follows from (A.9) and (A.11) and the last inequality follows from (A.10). Since \(x^{j+2}\in \mathbb {B}(\bar x, \epsilon )\), by the definition of \(\bar {x}^{j}\) and (3.4), there holds that
where the third inequality follows from (A.11). It follows from the proper separation of stationary value assumption (A.2) that \(F(\bar {x}^{j+2}) = \zeta \). Consequently by (A.6), we have
So far we have shown that (A.7) and (A.8) hold for j + 1. Moreover,
from which we obtain (A.9) for j + 1. The desired induction on j is now complete. In summary, we have now proved the properties (A.7)–(A.9).
Step 3. We prove that the whole sequence \(\{x^{k}\}\) converges to \(\bar x\) and that (3.5)–(3.6) hold.
By (A.9), for all j ≥ kℓ
which indicates that \(\left \{ x^{k} \right \}\) is a Cauchy sequence. It follows that the whole sequence converges to the stationary point \(\bar {x}\). Furthermore, for all k ≥ kℓ we have \(x^{k}\in \mathbb {B}(\bar {x},\epsilon )\). As a result, the PG-iteration-based error bound condition (3.4) holds at all the iterates \(\left \{ x^{k} \right \}_{k > k_{\ell }}\). Recall that by (3.1) and (A.14), we have
which implies that
It is easy to observe that
Thus we have
which completes the proof of (3.5).
Inspired by [6], we have the following linear convergence result for the sequence \(\{x^{k}\}\). Recall the sufficient descent property (3.1),
which indicates that there exists a constant C such that
In addition, we have that
which implies (3.6) with \(\rho _{0} = \frac {C}{1-\sqrt {\sigma }}\).
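(The final implication is the standard geometric-tail estimate; a sketch, assuming the omitted display bounds the step lengths as \(\|x^{j+1}-x^{j}\| \le C \sigma^{j/2}\), which is consistent with the constant \(\rho_{0}\) above:

```latex
\|x^{k} - \bar{x}\|
  \;\le\; \sum_{j=k}^{\infty} \left\| x^{j+1} - x^{j} \right\|
  \;\le\; C \sum_{j=k}^{\infty} \bigl( \sqrt{\sigma} \bigr)^{j}
  \;=\; \frac{C}{1-\sqrt{\sigma}} \, \sigma^{k/2} .
```

)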
Cite this article
Wang, X., Ye, J.J., Yuan, X. et al. Perturbation Techniques for Convergence Analysis of Proximal Gradient Method and Other First-Order Algorithms via Variational Analysis. Set-Valued Var. Anal 30, 39–79 (2022). https://doi.org/10.1007/s11228-020-00570-0
Keywords
- Calmness
- Error bound
- Proximal gradient method
- Linear convergence
- Variational analysis
- Alternating direction method of multipliers