1 Introduction

To obtain globally linear convergence rates of gradient-type methods for minimizing convex (not necessarily strongly convex) differentiable functions, we recently proposed restricted strong convexity (RSC) [19, 20], which is a strictly weaker concept than strong convexity. Up to now, the RSC property has proved to be a very powerful tool for analyzing descent methods, including the (in)exact gradient method, restarted nonlinear CG, BFGS, and its damped limited-memory variant L-D-BFGS [17, 20]. Almost in parallel with the work [19], the authors of [11, 18] defined the global error bound (GEB) property in the spirit of Hoffman’s celebrated result on error bounds for systems of linear inequalities [7, 10]. They showed that the GEB property also guarantees globally linear convergence of descent methods. Moreover, they identified a class of convex programs, frequently appearing in machine learning, that obey the GEB property. Very recently, the authors of [6, 9, 13] proposed the quadratic growth (QG) property under different names; it was called the second order growth property in [13], optimal strong convexity in [9], and the semi-strongly convex property in [6]. They showed that the QG property can likewise guarantee globally linear convergence of descent methods. Since each of these three notions serves as a linear convergence guarantee, it is natural to ask how they are related. With the help of Ekeland’s variational principle, we show that they are actually equivalent.

To deal with convex minimization over a closed convex set and with structured convex optimization, we propose a group of modified versions and a group of extended versions of these three notions, obtained by replacing the gradient with the gradient mapping and the proximal gradient mapping [14], respectively. Similarly, the modified and extended versions can be used to derive globally linear convergence results for a large class of descent methods in convex minimization [3, 5, 6, 13]. If the objective function of the convex program has a Lipschitz-continuous gradient, we prove that the equivalence still holds for the modified and extended versions. Based on these equivalent notions, we establish new asymptotically linear convergence results for the proximal gradient method, which complement recently developed theory.

The equivalence results in this paper provide alternative ways to check whether a given convex minimization problem satisfies the RSC property; in some cases, checking one of the equivalent properties may be much easier than checking the others. As a case study, we investigate the problem of minimizing the composition of an affine mapping with a strongly convex differentiable function over a polyhedral set, which is very popular in machine learning. We prove that this problem enjoys a strengthened property of the RSC type, and hence the modified GEB property, without any compactness assumption on the polyhedral set.

At the time of writing this paper, the authors of [4] showed the equivalence between the error bound (corresponding to our growth property) and the Kurdyka-\({\L }\)ojasiewicz inequality for convex functions having a moderately flat profile near the set of minimizers. Since the convex functions considered here are differentiable, this paper may provide results complementary to those in [4]. When our paper was under review, the authors of [5] posted on arXiv their paper concerning the equivalence between the error bound and quadratic growth properties. They defined the error bound condition via the proximal gradient mapping, which generalizes our modified error bound property and the global error bound originally proposed in [18]. However, they did not discuss the equivalence to the RSC property. Besides, our convergence results for the proximal gradient method are new and might be of interest in themselves.

The rest of the paper is organized as follows. In Sect. 2, we present the basic notation and concepts discussed in this paper. In Sect. 3, we analyze the relationships among the different notions. In Sect. 4, we discuss the convergence of the proximal gradient method implied by the equivalent notions. Sect. 5 is devoted to a case study.

2 Notation and definitions

We denote by \(d(x, \mathcal {Y})\) the distance from a point x to a nonempty closed set \(\mathcal {Y}\); that is, \(d(x, \mathcal {Y})=\inf _{y\in \mathcal {Y}}\Vert x-y\Vert \). When \(\mathcal {Y}\) is a singleton, i.e., \(\mathcal {Y}=\{y\}\), we write \(d(x,y)\) instead of \(d(x, \mathcal {Y})\) for simplicity. We will consider functions that take values in the extended real line \(\overline{\mathbb {R}}:=\mathbb {R}\bigcup \{+\infty \}\). The projection of x onto a nonempty closed convex set \(\mathcal {Y}\) is denoted by \([x]_\mathcal {Y}^+\). The spectral norm of a matrix X is given by \(\Vert X\Vert \). The terminology below follows [14]. A convex differentiable function g is gradient-Lipschitz-continuous if there exists a positive scalar L such that

$$\begin{aligned} \Vert \nabla g(x)-\nabla g(y)\Vert \le L\Vert x-y\Vert , \quad \forall x, y\in \mathbb {R}^n, \end{aligned}$$
(1)

or equivalently,

$$\begin{aligned} 0\le g(y)- g(x)-\langle \nabla g(x), y-x\rangle \le \frac{L}{2}\Vert y-x\Vert ^2, \quad \forall x, y\in \mathbb {R}^n, \end{aligned}$$
(2)

and strongly convex if there exists a positive scalar \(\mu \) such that

$$\begin{aligned} \langle \nabla g(x)-\nabla g(y), x-y\rangle \ge \mu \Vert x-y\Vert ^2, \quad \forall x, y\in \mathbb {R}^n. \end{aligned}$$
(3)
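
As a quick sanity check of (1)-(3), the following sketch (an illustrative example of ours, not part of the text) considers a quadratic \(g(x)=\frac{1}{2}x^TQx\) with Q symmetric positive definite and verifies numerically that one may take L as the largest eigenvalue of Q and \(\mu \) as the smallest.

```python
import numpy as np

# Minimal numerical check of (1)-(3) for a quadratic g(x) = 0.5 * x^T Q x,
# where Q is symmetric positive definite (an assumption of this sketch).
# For such g one may take L = lambda_max(Q) and mu = lambda_min(Q).
rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
Q = B.T @ B + np.eye(5)           # symmetric positive definite
eigs = np.linalg.eigvalsh(Q)      # eigenvalues in ascending order
mu, L = eigs[0], eigs[-1]

grad = lambda x: Q @ x            # gradient of g

for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    # gradient-Lipschitz-continuity (1)
    assert np.linalg.norm(grad(x) - grad(y)) <= L * np.linalg.norm(x - y) + 1e-9
    # strong convexity (3)
    assert (grad(x) - grad(y)) @ (x - y) >= mu * np.linalg.norm(x - y) ** 2 - 1e-9
```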

2.1 Original versions for unconstrained convex program

Definition 1

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be convex differentiable. Denote \(f^*=\min _{x\in \mathbb {R}^n} f(x)\) and \(\mathcal {X}=\arg \min _{x\in \mathbb {R}^n} f(x)\) and assume that \(\mathcal {X}\) is nonempty. Then the unconstrained convex program

$$\begin{aligned} \mathop {{{\mathrm{minimize}}}}\limits _{x\in \mathbb {R}^n} f(x) \end{aligned}$$

obeys

  1. (a)

    the restricted strongly convex property with constant \(\nu >0\) if it satisfies the restricted secant inequality

    $$\begin{aligned} \langle \nabla f(x), x-[x]_\mathcal {X}^+\rangle \ge \nu \cdot d^2(x,\mathcal {X}),\quad \forall x\in \mathbb {R}^n. \end{aligned}$$
    (4)
  2. (b)

    the global error bound property with constant \(\kappa >0\) if it satisfies the error upper bound inequality

    $$\begin{aligned} \Vert \nabla f(x)\Vert \ge \kappa \cdot d(x,\mathcal {X}),\quad \forall x\in \mathbb {R}^n. \end{aligned}$$
    (5)
  3. (c)

    the quadratic growth property with constant \(\tau >0\) if it satisfies the second order growth of the function value

    $$\begin{aligned} f(x)-f^*\ge \frac{\tau }{2}\cdot d^2(x,\mathcal {X}), \quad \forall x\in \mathbb {R}^n. \end{aligned}$$
    (6)

We use \(RSC(\nu )\), \(GEB(\kappa )\), and \(QG(\tau )\) to stand for the defined properties above respectively.

To ensure that \([x]_\mathcal {X}^+\) is well defined, we need \(\mathcal {X}\) to be closed; this is implied by the differentiability (and hence continuity) of f.
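
To make Definition 1 concrete, the following sketch (an illustrative instance of our own choosing, not taken from the text) numerically estimates the RSC, GEB, and QG ratios for \(f(x)=\frac{1}{2}\Vert Ex-c\Vert ^2\) with a rank-deficient E, a convex but not strongly convex function of the type studied in Sect. 5.

```python
import numpy as np

# Illustrative instance of Definition 1: f(x) = 0.5 * ||E x - c||^2 with a
# rank-deficient E, so f is convex but NOT strongly convex.  The solution set is
# X = {E^+ c + v : E v = 0}, hence d(x, X) = ||E^+ E (x - E^+ c)||, where E^+ is
# the Moore-Penrose pseudoinverse (standard facts assumed for this sketch).
rng = np.random.default_rng(0)
E = rng.standard_normal((3, 6))            # rank 3 < 6
c = rng.standard_normal(3)
E_pinv = np.linalg.pinv(E)
x_star = E_pinv @ c                        # one particular minimizer

f = lambda x: 0.5 * np.sum((E @ x - c) ** 2)
grad = lambda x: E.T @ (E @ x - c)
resid = lambda x: E_pinv @ E @ (x - x_star)    # x - [x]_X^+
dist = lambda x: np.linalg.norm(resid(x))      # d(x, X)

ratios = []
for _ in range(1000):
    x = x_star + 10 * rng.standard_normal(6)
    d = dist(x)
    ratios.append((grad(x) @ resid(x) / d**2,        # RSC ratio, cf. (4)
                   np.linalg.norm(grad(x)) / d,      # GEB ratio, cf. (5)
                   2 * (f(x) - f(x_star)) / d**2))   # QG ratio, cf. (6)
print(np.min(ratios, axis=0))   # all three minima stay bounded away from zero
```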

Remark 1

The RSC property first appeared in [8] as a restricted secant inequality. The authors of [19, 20] formally defined it and identified a class of non-trivial RSC functions.

The authors of [11, 18] proposed the GEB property in the spirit of Hoffman’s error bounds [7]. The local version of GEB only implies asymptotic linear convergence rates [10].

The QG property appeared in a couple of papers [6, 9, 13] with different names. The equivalence between the RSC and the QG of convex differentiable functions was shown in [17, 20].

2.2 Modified versions for constrained convex program

To deal with constrained convex programs, we first introduce the gradient mapping [14].

Definition 2

Let \( \gamma >0\) be a fixed constant and Q be a closed convex set, and let \(\bar{x}\in \mathbb {R}^n\). Denote

$$\begin{aligned} x_Q(\bar{x};\gamma )= & {} \arg \min _{x\in Q}\left[ f(\bar{x})+ \langle \nabla f(\bar{x}), x-\bar{x}\rangle +\frac{\gamma }{2}\Vert x-\bar{x}\Vert ^2\right] \\ G^f_Q(\bar{x};\gamma )= & {} \gamma (\bar{x}-x_Q(\bar{x};\gamma )). \end{aligned}$$

We call \(G^f_Q(\bar{x};\gamma )\) the gradient mapping of function f on Q.

When \(Q=\mathbb {R}^n\), we have that

$$\begin{aligned} x_Q(\bar{x};\gamma )=\bar{x}-\frac{1}{\gamma }\nabla f(\bar{x}) \quad \text {and} \quad G^f_Q(\bar{x};\gamma )=\nabla f(\bar{x}). \end{aligned}$$

The latter implies that the gradient mapping generalizes the gradient notion.
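
The following sketch evaluates the gradient mapping of Definition 2 for a box constraint Q. It relies on the standard identity \(x_Q(\bar{x};\gamma )=[\bar{x}-\frac{1}{\gamma }\nabla f(\bar{x})]_Q^+\), obtained by completing the square in the quadratic model; this identity is an assumption of the sketch, not spelled out in the text.

```python
import numpy as np

# A minimal sketch of Definition 2 for a box Q = [lo, hi]^n, using the identity
# x_Q(xbar; gamma) = [xbar - grad_f(xbar)/gamma]_Q^+ (projection of the unconstrained
# model minimizer onto Q).
def gradient_mapping(grad_f, xbar, gamma, lo=-1.0, hi=1.0):
    x_Q = np.clip(xbar - grad_f(xbar) / gamma, lo, hi)  # minimizer of the quadratic model over Q
    return gamma * (xbar - x_Q)                          # G^f_Q(xbar; gamma)

grad_f = lambda x: 2.0 * x                               # f(x) = ||x||^2
xbar = np.array([3.0, -0.2])
print(gradient_mapping(grad_f, xbar, gamma=4.0))                         # box-constrained case
print(gradient_mapping(grad_f, xbar, gamma=4.0, lo=-np.inf, hi=np.inf))  # Q = R^n: equals grad_f(xbar)
```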

Definition 3

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be a convex differentiable function and let Q be a nonempty closed convex set. Denote \(\mathcal {X}=\arg \min _{x\in Q} f(x)\) and \(f^*=\min _{x\in Q} f(x)\) and assume that \(\mathcal {X}\) is nonempty. Let \(\gamma >0\) be a fixed constant. Then the constrained convex program

$$\begin{aligned} \mathop {{{\mathrm{minimize}}}}\limits _{x\in Q} f(x) \end{aligned}$$

obeys

  1. (a)

    the modified restricted strongly convex property on Q with constant \(\nu >0\) if it satisfies the restricted secant inequality

    $$\begin{aligned} \langle G^f_Q(y;\gamma ), y-[y]_\mathcal {X}^+\rangle \ge \nu \cdot d^2(y,\mathcal {X}),\quad \forall y\in Q. \end{aligned}$$
    (7)
  2. (b)

    the modified global error bound property on Q with constant \(\kappa >0\) if it satisfies the error upper bound inequality

    $$\begin{aligned} \Vert G^f_Q(y;\gamma )\Vert \ge \kappa \cdot d(y,\mathcal {X}), \quad \forall y\in Q. \end{aligned}$$
    (8)
  3. (c)

    the modified quadratic growth property on Q with constant \(\tau >0\) if it satisfies the second order growth of the function value

    $$\begin{aligned} f(y)-f^*\ge \frac{\tau }{2}\cdot d^2(y,\mathcal {X}), \quad \forall y\in Q. \end{aligned}$$
    (9)

We use \(mRSC(\nu )\), \(mGEB(\kappa )\), and \(mQG(\tau )\) to stand for the defined properties above respectively.

Again, the closedness of \(\mathcal {X}\) is guaranteed by the differentiability of f, and hence \([y]_\mathcal {X}^+\) is well defined.

Remark 2

The author of [17] also proposed a modified RSC by considering a convex constraint set Q, but required that both \(X_f=\arg \min _{x\in \mathbb {R}^n} f(x)\) and \(Q\bigcap X_f\) be nonempty. These assumptions are rather strong, and many practical problems fail to satisfy them.

Remark 3

When \(Q=\mathbb {R}^n\), the modified versions reduce to the corresponding original versions since \(G^f_Q(\bar{x};\gamma )=\nabla f(\bar{x})\). Therefore, the modified Definition 3 can be viewed as a generalization of Definition 1.

2.3 Extended versions via proximal gradient mapping

To introduce extended versions of the previous notions for structured convex optimization, we need the concept of the proximal gradient mapping [2].

Definition 4

Let \( \gamma >0\) be a fixed constant, \(f: \mathbb {R}^n\rightarrow \mathbb {R}\) be a differentiable function, \(g:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) be a closed convex function, and \(\bar{x}\in \mathbb {R}^n\). Denote

$$\begin{aligned} p^f_g(\bar{x};\gamma )= & {} \mathop {\arg \min }\limits _{x}\left[ f(\bar{x})+ \langle \nabla f(\bar{x}), x-\bar{x}\rangle +\frac{\gamma }{2}\Vert x-\bar{x}\Vert ^2+g(x)\right] \\ G^f_g(\bar{x};\gamma )= & {} \gamma (\bar{x}-p^f_g(\bar{x};\gamma )). \end{aligned}$$

We call \(G^f_g(\bar{x};\gamma )\) the proximal gradient mapping of functions f and g.

Let Q be a closed nonempty convex set. When g is the indicator function

$$\begin{aligned} I_Q(x)=\left\{ \begin{array}{ll} 0,&{} \quad x\in Q,\\ +\infty , &{} \quad x\notin Q, \end{array} \right. \end{aligned}$$

we have that

$$\begin{aligned} p^f_g(\bar{x};\gamma )=x_Q(\bar{x};\gamma ). \end{aligned}$$

This implies that the proximal gradient mapping generalizes the gradient mapping.
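
For illustration, the following sketch computes the proximal gradient mapping of Definition 4 for \(g(x)=\lambda \Vert x\Vert _1\). It relies on the standard identity \(p^f_g(\bar{x};\gamma )=\mathrm {prox}_{g/\gamma }(\bar{x}-\frac{1}{\gamma }\nabla f(\bar{x}))\), which for the \(\ell _1\)-norm reduces to soft-thresholding; this identity is an assumption of the sketch, not spelled out in the text.

```python
import numpy as np

# A minimal sketch of Definition 4 for g(x) = lam * ||x||_1, using
# p^f_g(xbar; gamma) = prox_{g/gamma}(xbar - grad_f(xbar)/gamma), i.e.
# soft-thresholding with threshold lam/gamma.
def prox_grad_mapping(grad_f, xbar, gamma, lam):
    z = xbar - grad_f(xbar) / gamma                               # forward (gradient) step
    p = np.sign(z) * np.maximum(np.abs(z) - lam / gamma, 0.0)     # backward (prox) step
    return gamma * (xbar - p)                                     # G^f_g(xbar; gamma)

grad_f = lambda x: x - np.array([1.0, -2.0])                      # f(x) = 0.5*||x - (1, -2)||^2
print(prox_grad_mapping(grad_f, np.zeros(2), gamma=1.0, lam=0.5))
print(prox_grad_mapping(grad_f, np.zeros(2), gamma=1.0, lam=0.0))  # g = 0: reduces to grad_f(0)
```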

Definition 5

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be a convex differentiable function and \(g:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) be a closed convex function. Denote \(\mathcal {X}=\arg \min _{x} \varphi (x):= f(x)+g(x)\) and \(\varphi ^*=\min _{x} \varphi (x)\) and assume that \(\mathcal {X}\) is nonempty. Let \(\gamma >0\) be a fixed constant. Then the following convex program

$$\begin{aligned} \mathop {{{\mathrm{minimize}}}}\limits _{x} \varphi (x)=f(x)+g(x) \end{aligned}$$

obeys

  1. (a)

    the extended restricted strongly convex property with parameters \(\nu , \omega >0\) if it satisfies the restricted secant inequality

    $$\begin{aligned} \langle G^f_g(y;\gamma ), y-[y]_\mathcal {X}^+\rangle \ge \nu \cdot d^2(y,\mathcal {X}),\quad \forall y\in [\varphi \le \varphi ^*+\omega ]. \end{aligned}$$
    (10)
  2. (b)

    the extended global error bound property with parameters \(\kappa , \omega >0\) if it satisfies the error upper bound inequality

    $$\begin{aligned} \Vert G^f_g(y;\gamma )\Vert \ge \kappa \cdot d(y,\mathcal {X}), \quad \forall y\in [\varphi \le \varphi ^*+\omega ]. \end{aligned}$$
    (11)
  3. (c)

    the extended quadratic growth property with parameters \(\tau , \omega >0\) if it satisfies the second order growth of the function value

    $$\begin{aligned} \varphi (y)-\varphi ^*\ge \frac{\tau }{2}\cdot d^2(y,\mathcal {X}), \quad \forall y\in [\varphi \le \varphi ^*+\omega ]. \end{aligned}$$
    (12)

We use \(eRSC(\nu , \omega )\), \(eGEB(\kappa , \omega )\), and \(eQG(\tau , \omega )\) to stand for the defined properties above respectively.

It is easy to see that \(\varphi (x)\) is lower semicontinuous over \(\mathbb {R}^n\) and hence \(\mathcal {X}\) is closed; see e.g. Lemma 2.6.3 in [16]. This ensures that \([y]_\mathcal {X}^+\) is well defined.

Remark 4

The eQG appeared in [6] under the name of semi-strongly convex property. The authors of [21] proposed an analog of the eGEB property and exploited it by borrowing tools from set-valued analysis. The authors of [5] introduced the eGEB and proved that it is equivalent to the eQG. Our novelty here lies in the definition of eRSC.

3 Equivalence analysis

3.1 Equivalence among the original versions

In what follows, we prove that the notions defined in Definition 1 are actually equivalent. The idea of the proof is mainly inspired by the seminal paper [1] and relies heavily on the well-known Ekeland’s variational principle stated in Lemma 1.

Theorem 1

Under the setting of Definition 1, the restricted strongly convex property, the global error bound property, and the quadratic growth property are equivalent in the following sense:

$$\begin{aligned} QG(\nu )\Rightarrow RSC\left( \frac{\nu }{2}\right) \Rightarrow GEB\left( \frac{\nu }{2}\right) \Rightarrow QG\left( \frac{\nu }{4}\right) . \end{aligned}$$

Proof

The implication \(QG(\nu )\Rightarrow RSC(\frac{\nu }{2})\) has been shown in [17, 20]. The implication \(RSC(\frac{\nu }{2})\Rightarrow GEB(\frac{\nu }{2})\) is a direct consequence of the Cauchy–Schwarz inequality. It remains to prove \(GEB(\nu ) \Rightarrow QG(\frac{\nu }{2})\). Now assume that f has the GEB property with constant \(\nu >0\). It suffices to prove that for all \(0\le \alpha <\frac{1}{4}\), the following holds (letting \(\alpha \rightarrow \frac{1}{4}\) then yields \(f(z)-f^*\ge \frac{\nu }{4}\cdot d^2(z,\mathcal {X})\), i.e., the \(QG(\frac{\nu }{2})\) property):

$$\begin{aligned} f(z)-f^*\ge \alpha \nu \cdot d^2(z,\mathcal {X}), \quad \forall z\in \mathbb {R}^n. \end{aligned}$$

If this is not true, then there must exist \(z_0\in \mathbb {R}^n\) such that

$$\begin{aligned} f(z_0)<f^*+\alpha \nu \cdot d^2(z_0,\mathcal {X}). \end{aligned}$$

Clearly, \(z_0\notin \mathcal {X}\) and hence \(d(z_0, \mathcal {X})>0\) since \(\mathcal {X}\) is a nonempty closed set. Let \(\lambda =\frac{1}{2}d(z_0, \mathcal {X})\). By Ekeland’s variational principle with \(\epsilon =\alpha \nu \cdot d^2(z_0,\mathcal {X})=4\alpha \nu \lambda ^2\), there exists \(x_0\in \mathbb {R}^n\) such that \(d(x_0,z_0)\le \lambda \) and

$$\begin{aligned} f(x)\ge f(x_0)-\frac{\epsilon }{\lambda }d(x,x_0)=f(x_0)-4\alpha \nu \lambda \cdot d(x,x_0), \quad \forall x\in \mathbb {R}^n. \end{aligned}$$

Then, \(x_0\) minimizes the convex function \(f(x)+4\alpha \nu \lambda \cdot d(x,x_0)\). By the first-order optimality condition, we get

$$\begin{aligned} 0\in \nabla f(x_0)+4\alpha \nu \lambda \cdot \partial (\Vert \cdot -x_0\Vert )(x_0)=\nabla f(x_0)+4\alpha \nu \lambda \cdot \mathcal {Y}, \end{aligned}$$

where \(\mathcal {Y}=\{y\in \mathbb {R}^n: \Vert y\Vert \le 1\}\) [16]. Hence, we can find \(y_0\in \mathcal {Y}\) such that \(\nabla f(x_0)=-4\alpha \nu \lambda y_0\). Since

$$\begin{aligned} 2\lambda =d(z_0, \mathcal {X}) \le d(x_0, z_0)+d(x_0, \mathcal {X})\le \lambda +d(x_0, \mathcal {X}), \end{aligned}$$

we have \(d(x_0, \mathcal {X})\ge \lambda \). Therefore,

$$\begin{aligned} \Vert \nabla f(x_0)\Vert =4\alpha \nu \lambda \Vert y_0\Vert \le 4\alpha \nu \lambda \le 4\alpha \nu \cdot d(x_0, \mathcal {X})< \nu \cdot d(x_0, \mathcal {X}), \end{aligned}$$

which contradicts the GEB property. This completes the proof. \(\square \)

3.2 Equivalence among the modified and extended versions

In this part, we deduce the equivalence among the modified and extended versions.

Theorem 2

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be a convex differentiable function whose gradient is Lipschitz-continuous with positive scalar L and \(g:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) be a closed convex function. Let \(\gamma \ge L\) be a fixed constant. Then the extended versions are equivalent in the following sense

$$\begin{aligned} eRSC(\nu _1, \omega )\Rightarrow eGEB(\kappa _1, \omega )\Rightarrow eQG(\tau _1,\omega ) \end{aligned}$$

and

$$\begin{aligned} eQG(\tau _ 2,\omega )\Rightarrow eGEB(\kappa _2, \omega )\Rightarrow eRSC(\nu _2, \omega ), \end{aligned}$$

where \(\omega \in (0, +\infty ]\) is a fixed constant and the other parameters satisfy \(\kappa _1=\tau _1=\nu _1\) and

$$\begin{aligned} \kappa _2=\frac{\tau _2 \gamma ^2}{(2\gamma +\tau _2)(\gamma +L)}, ~~\nu _2=\frac{\kappa _2^2}{2\gamma }. \end{aligned}$$

In particular, the equivalence among the modified versions also holds in the same way by letting \(\omega =+\infty \).

Proof

The equivalence of \(eGEB(\kappa , \omega )\) and \(eQG(\tau ,\omega )\) has been shown in [5] recently; see Lemma 2. Applying the Cauchy–Schwarz inequality to the left-hand side of the eRSC inequality gives the eGEB property. It remains to prove that eGEB implies eRSC. By using Lemma 3 with \(x=[y]_\mathcal {X}^+\) and \(\bar{x}=y\), we get

$$\begin{aligned} \langle G^f_g(y;\gamma ),y-[y]_\mathcal {X}^+\rangle \ge \frac{1}{2\gamma }\Vert G^f_g(y;\gamma )\Vert ^2 +\varphi (p^f_g(y;\gamma )) -\varphi ([y]_\mathcal {X}^+). \end{aligned}$$
(13)

Noticing that \(\varphi (p^f_g(y;\gamma )) - \varphi ([y]_\mathcal {X}^+)=\varphi (p^f_g(y;\gamma )) - \varphi ^*\ge 0\) and applying the eGEB property, we obtain

$$\begin{aligned} \langle G^f_g(y;\gamma ),y-[y]_\mathcal {X}^+\rangle \ge \frac{\kappa _2^2}{2\gamma }d^2(y,\mathcal {X}), \quad \forall y\in [\varphi \le \varphi ^*+\omega ], \end{aligned}$$
(14)

which completes the proof. \(\square \)

4 Convergence analysis

In this section, we discuss the asymptotically linear convergence of the proximal gradient method, based on the equivalent notions defined above.

The proximal gradient method, also known as the forward-backward splitting method, is fundamental for the following structured optimization problem

$$\begin{aligned} \mathop {{{\mathrm{minimize}}}}\limits _{x} \varphi (x):= f(x)+g(x), \end{aligned}$$

where f is a convex function with the gradient-Lipschitz-continuous property and g is a closed convex function. It can be stated as

$$\begin{aligned} x_{k+1}=x_k-\frac{1}{\gamma } G^f_g(x_k;\gamma ), \end{aligned}$$

where the constant \(\gamma >0\) is appropriately chosen. Based on the eGEB property, the authors of [5] observed that an asymptotically Q-linear convergence in function values can be guaranteed for this method and, moreover, that if the iterates \(x_k\) have a limit point \(x^*\), then they converge asymptotically R-linearly. Motivated by the arguments in [20], we establish below an asymptotically Q-linear convergence in the distance values \(d(x_{k},\mathcal {X})\), and prove that the iterates \(x_k\) themselves converge asymptotically R-linearly to a limit point \(x^*\in \mathcal {X}\) under a compactness assumption on the minimizer set \(\mathcal {X}\). The following result is complementary to that of [5] and might be of interest in itself.
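
For concreteness, the following sketch runs the proximal gradient iteration on a hypothetical \(\ell _1\)-regularized least-squares instance (an illustrative example of ours, not one from the paper); with \(\gamma \ge L\), each iteration is a gradient step on f followed by the proximal (soft-thresholding) step on g.

```python
import numpy as np

# Proximal gradient iteration x_{k+1} = x_k - (1/gamma) * G^f_g(x_k; gamma) on a
# hypothetical instance: f(x) = 0.5*||Ax - b||^2, g(x) = lam*||x||_1.
rng = np.random.default_rng(1)
A, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
gamma = np.linalg.norm(A, 2) ** 2              # gamma >= L = ||A||^2, as required above

phi = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))

x = np.zeros(50)
for k in range(301):
    z = x - A.T @ (A @ x - b) / gamma                            # forward step on f
    x = np.sign(z) * np.maximum(np.abs(z) - lam / gamma, 0.0)    # backward (prox) step on g
    if k % 50 == 0:
        print(k, phi(x))                       # phi(x_k) decreases monotonically
```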

Theorem 3

Let \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) be a convex differentiable function whose gradient is Lipschitz-continuous with positive scalar L and let \(g:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) be a closed convex function. Denote \(\mathcal {X}=\arg \min _{x} \varphi (x):= f(x)+g(x)\) and \(\varphi ^*=\min _{x} \varphi (x)\) and assume that \(\mathcal {X}\) is nonempty. Let \(\gamma \ge L\) be a fixed constant. Suppose that \({{\mathrm{minimize}}}_{x} \varphi (x)\) satisfies the \(eQG(\tau ,\omega )\) property (equivalently, the \(eGEB(\kappa , \omega )\) property). Then the iterates \(x_k\) generated by the proximal gradient method converge asymptotically Q-linearly in the distance values \(d(x_k,\mathcal {X})\); that is, there exists an index m such that the inequality

$$\begin{aligned} d(x_{k+1},\mathcal {X})\le \sqrt{\frac{\gamma }{\gamma + \tau }}d(x_k,\mathcal {X}) \end{aligned}$$

holds for all \(k\ge m\). Moreover, if \(\mathcal {X}\) is compact, then the iterates \(x_k\) asymptotically converge R-linearly to some limit point \(x^*\in \mathcal {X}\) in the sense that

$$\begin{aligned} d^2(x_{k+m},x^*)\le C\cdot \left( 1-\frac{\kappa }{2\gamma }\right) ^k \end{aligned}$$

holds for all \(k\ge 1\), where \(m>\frac{L}{2\omega }d^2(x_0,\mathcal {X})\) and \(C=\frac{2(\varphi (x_m)-\varphi ^*)}{\gamma }\left( \sum _{i=0}^\infty (1-\frac{\kappa }{2\gamma })^{\frac{i}{2}}\right) ^2\).

The proof idea below is partially inspired by [5, 20].

Proof

Let \(m= \lceil \frac{L}{2\omega }d^2(x_0,\mathcal {X})\rceil \) where \(\lceil x \rceil \) denotes the smallest integer larger than x. By using the standard sublinear estimate [2]

$$\begin{aligned} \varphi (x_k)-\varphi ^*\le \frac{L\cdot d^2(x_0,\mathcal {X})}{2k}, \end{aligned}$$

we deduce that \(\varphi (x_k)-\varphi ^*\le \omega \) holds for all \(k\ge m\). Denote the projection point of x onto \(\mathcal {X}\) by \(x^{\prime }\). By invoking Lemma 3 with \(x=x_k^{\prime }\) and \(\bar{x}=x_k\), we have that

$$\begin{aligned} \langle G^f_g(x_k;\gamma ),x_k-x_k^{\prime }\rangle \ge \frac{1}{2\gamma }\Vert G^f_g(x_k;\gamma )\Vert ^2 +\varphi (x_{k+1}) -\varphi ^* . \end{aligned}$$
(15)

Now, together with the \(eQG(\tau ,\omega )\) property, we derive that for all \(k\ge m\)

$$\begin{aligned} d^2(x_{k+1},\mathcal {X})&= \Vert x_{k+1}-x_{k+1}^{\prime }\Vert ^2\le \Vert x_{k+1}-x_k^{\prime }\Vert ^2 \end{aligned}$$
(16a)
$$\begin{aligned}&= \Vert x_k-x_k^{\prime } -\frac{1}{\gamma }G^f_g(x_k;\gamma )\Vert ^2 \end{aligned}$$
(16b)
$$\begin{aligned}&= \Vert x_k-x_k^{\prime }\Vert ^2 -\frac{2}{\gamma }\langle G^f_g(x_k;\gamma ),x_k-x_k^{\prime }\rangle + \frac{1}{\gamma ^2}\Vert G^f_g(x_k;\gamma )\Vert ^2\end{aligned}$$
(16c)
$$\begin{aligned}&\le \Vert x_k-x_k^{\prime }\Vert ^2-\frac{2}{\gamma }(\varphi (x_{k+1}) -\varphi ^* ) \end{aligned}$$
(16d)
$$\begin{aligned}&\le d^2(x_k,\mathcal {X})-\frac{\tau }{\gamma } d^2(x_{k+1},\mathcal {X}), \end{aligned}$$
(16e)

which, after rearranging, gives \(d^2(x_{k+1},\mathcal {X})\le \frac{\gamma }{\gamma +\tau }\, d^2(x_k,\mathcal {X})\) and hence the asymptotically Q-linear convergence of \(d(x_k,\mathcal {X})\).

To prove the convergence of \(\{x_k\}\) themselves, we first consider the sequence \(\{x^{\prime }_k\}\subseteq \mathcal {X}\), which must have a subsequence, denoted by \(\{x^{\prime }_{k_i}\}\), converging to some point \(x^*\in \mathcal {X}\) due to the compactness assumption on \(\mathcal {X}\). We claim that \(x^{\prime }_k\rightarrow x^*\) as \(k\rightarrow \infty \). Otherwise, there must exist another subsequence of \(\{x^{\prime }_k\}\), denoted by \(\{x^{\prime }_{k_j}\}\), converging to a different point \(\hat{x}\in \mathcal {X}\). Notice that

$$\begin{aligned} d(x^*,\hat{x})\le d(x^*,x^{\prime }_{k_i})+d(x^{\prime }_{k_i}, x^{\prime }_{k_j})+d(x^{\prime }_{k_j},\hat{x}) \end{aligned}$$

and \(d(x^{\prime }_{k_i}, x^{\prime }_{k_j})\le d(x_{k_i}, x_{k_j}),\) where the latter follows from the nonexpansiveness of the projection operator. We get that

$$\begin{aligned} d(x^*,\hat{x})\le d(x^*,x^{\prime }_{k_i})+d(x^{\prime }_{k_j},\hat{x})+d(x_{k_i}, x_{k_j}). \end{aligned}$$
(17)

Denote \(\ell _1=\min \{k_i, k_j\}\) and \(\ell _2=\max \{k_i,k_j\}\); then

$$\begin{aligned} d(x_{k_i}, x_{k_j})\le \sum _{i=\ell _1}^{\ell _2-1}d(x_i,x_{i+1})\le \sum _{i=\ell _1}^{\infty }d(x_i,x_{i+1}). \end{aligned}$$

By invoking Lemma 3 with \(x=\bar{x}=x_i\), we have that

$$\begin{aligned} \varphi (x_i)-\varphi (x_{i+1})\ge \frac{1}{2\gamma }\Vert G^f_g(x_i;\gamma )\Vert ^2=\frac{\gamma }{2}\Vert x_i-x_{i+1}\Vert ^2. \end{aligned}$$
(18)

On the other hand, inequality (15) implies

$$\begin{aligned} \varphi (x_{i+1}) -\varphi ^*\le \Vert G^f_g(x_i;\gamma )\Vert ^2\left( \frac{\Vert x_i-x^{\prime }_i\Vert }{\Vert G^f_g(x_i;\gamma )\Vert }-\frac{1}{2\gamma }\right) . \end{aligned}$$
(19)

Applying the \(eGEB(\kappa , \omega )\) property to (19) and combining with (18), we get that

$$\begin{aligned} \varphi (x_{i+1}) -\varphi ^*\le \left( 1-\frac{\kappa }{2\gamma }\right) (\varphi (x_i) -\varphi ^*),~~i\ge m . \end{aligned}$$
(20)

Let \(\ell _1\ge m\) and \(i\ge m\); then

$$\begin{aligned} d^2(x_i,x_{i+1})&= \Vert x_i-x_{i+1}\Vert ^2 \le \frac{2}{\gamma }(\varphi (x_i)-\varphi (x_{i+1}))\le \frac{2}{\gamma }(\varphi (x_i)-\varphi ^*)\end{aligned}$$
(21a)
$$\begin{aligned}&\le \frac{2}{\gamma }\left( 1-\frac{\kappa }{2\gamma }\right) ^{i-m}(\varphi (x_m)-\varphi ^*) \end{aligned}$$
(21b)

and hence

$$\begin{aligned} d(x_{k_i},x_{k_j})&\le \sum _{i=\ell _1}^{\infty }d(x_i,x_{i+1})\le \sqrt{\frac{2(\varphi (x_m)-\varphi ^*)}{\gamma }}\sum _{i=\ell _1}^\infty \left( 1-\frac{\kappa }{2\gamma }\right) ^{\frac{i-m}{2}}\end{aligned}$$
(22a)
$$\begin{aligned}&= D\cdot \left( 1-\frac{\kappa }{2\gamma }\right) ^{\frac{\ell _1-m}{2}}\rightarrow 0, ~~as~~\ell _1\rightarrow \infty , \end{aligned}$$
(22b)

where \(D=\sqrt{\frac{2(\varphi (x_m)-\varphi ^*)}{\gamma }}\sum _{i=0}^\infty \left( 1-\frac{\kappa }{2\gamma }\right) ^{\frac{i}{2}}<\infty \). Together with the fact that \(x^{\prime }_{k_i}\rightarrow x^*\) as \(k_i\rightarrow \infty \) and \(x^{\prime }_{k_j}\rightarrow \hat{x}\) as \(k_j\rightarrow \infty \), we immediately get \(d(x^*,\hat{x})=0\) by using (17) with \(k_i\rightarrow \infty \) and \(k_j\rightarrow \infty \), which contradicts \(x^*\ne \hat{x}\). Therefore, \(x^{\prime }_k\rightarrow x^*\) as \(k\rightarrow \infty \) indeed holds. Finally, in light of the asymptotically Q-linear convergence of \(x_k\), we have that

$$\begin{aligned} d(x_k,x^*)\le d(x_k,x^{\prime }_k)+ d(x^{\prime }_k,x^*)=d(x_k, \mathcal {X})+ d(x^{\prime }_k,x^*) \rightarrow 0, ~~as~~k\rightarrow \infty , \end{aligned}$$

which implies that the iterates \(x_k\) converge to \(x^*\in \mathcal {X}\). The asymptotically R-linear convergence of \(\{x_k\}\) follows by setting \(k_i=\ell _1=k+m\) and letting \(k_j\rightarrow +\infty \) in (22). This completes the proof. \(\square \)

Remark 5

Although (22) implies that \(\{x_k\}\) is a Cauchy sequence and hence converges, it cannot ensure convergence to a point belonging to \(\mathcal {X}\) without the compactness assumption on \(\mathcal {X}\).

5 A case study: composition minimization with constraints

The authors of [21] presented a unified framework for establishing error bounds for a class of structured convex optimization problems. By the equivalence between the error bound condition and quadratic growth, the authors of [5] streamlined and illuminated the arguments in [21] and also extended their results to a wider setting. Here, we derive in a transparent way a strengthened property of the RSC type for the following constrained convex program

$$\begin{aligned} {{\mathrm{minimize}}}\,\,f(x),\quad \quad \mathrm{subject\,\,to}\,\,~x\in Q, \end{aligned}$$
(23)

where Q is a polyhedral set in the n-dimensional Euclidean space, and f is the composition of an affine mapping with a strongly convex differentiable function. We assume that f and Q are of the special form:

$$\begin{aligned} f(x)=g(Ex), ~~~Q=\{x\in \mathbb {R}^n| Ax\le b\}, \end{aligned}$$

where E is some \(m\times n\) matrix, A is some \(k\times n\) matrix, \(b\in \mathbb {R}^k\) is some vector, and g is strongly convex and gradient-Lipschitz-continuous with positive scalars \(\mu \) and L respectively.

The convex program (23) has been extensively studied for its wide applications in data networks and machine learning.
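
As an illustration of the problem class (23), the following sketch (an instance of our own choosing, not an experiment from the paper) takes \(g(t)=\frac{1}{2}\Vert t-c\Vert ^2\) and \(Q=\{x\ge 0\}\) and runs the projected gradient iteration \(x_{k+1}=x_Q(x_k;\gamma )\) of Definition 2; the printed optimality gaps decay roughly geometrically, in line with Theorem 4 combined with Theorem 2.

```python
import numpy as np

# Illustrative instance of (23): g(t) = 0.5*||t - c||^2 (strongly convex with
# mu = L = 1) and Q = {x >= 0} (a polyhedral set with A = -I, b = 0), so that
# f(x) = g(Ex) is convex but not strongly convex.  We use gamma = ||E||^2,
# which equals L*||E E^T|| here, as required by Theorem 4.
rng = np.random.default_rng(2)
E = rng.standard_normal((10, 40))              # E^T E is rank deficient
c = rng.standard_normal(10)
gamma = np.linalg.norm(E, 2) ** 2

f = lambda x: 0.5 * np.sum((E @ x - c) ** 2)
grad = lambda x: E.T @ (E @ x - c)

x = np.ones(40)
vals = []
for k in range(2000):
    x = np.maximum(x - grad(x) / gamma, 0.0)   # x_Q(x_k; gamma): project onto Q = {x >= 0}
    vals.append(f(x))
f_star = vals[-1]                              # numerical proxy for f^*
print([round(vals[k] - f_star, 6) for k in (0, 50, 100, 150, 200)])
```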

Theorem 4

Under the setting of Definition 3 and letting \(\gamma \ge L\Vert EE^T\Vert \), we have that the constrained convex program (23) obeys the strengthened property of the RSC type:

$$\begin{aligned} \langle G^f_Q(y;\gamma ), y-[y]_\mathcal {X}^+\rangle \ge \frac{1}{2\gamma }\Vert G^f_Q(y;\gamma )\Vert ^2+ C_1 \cdot d^2(y,\mathcal {X}), ~~\forall y\in Q, \end{aligned}$$
(24)

and the modified quadratic growth property:

$$\begin{aligned} f(y)\ge f^*+ C_2\cdot d^2(y,\mathcal {X}), ~~\forall y\in Q, \end{aligned}$$
(25)

where \(C_i\), \(i=1,2\), are positive constants depending on the matrices E and A and on the positive scalar \(\mu \).

Proof

By using (29) in Lemma 5 with \(x=[y]_\mathcal {X}^+, \bar{x}=y\) and noticing that \(f(x_Q(y;\gamma ))\ge f([y]_\mathcal {X}^+)=f^*\), we get

$$\begin{aligned} \langle G^f_Q(y;\gamma ), y-[y]_\mathcal {X}^+\rangle \ge \frac{1}{2\gamma }\Vert G^f_Q(y;\gamma )\Vert ^2+ \frac{\mu }{2} \Vert Ey-E[y]_\mathcal {X}^+\Vert ^2. \end{aligned}$$
(26)

Since g is strongly convex, there must exist a unique vector \(t^*\) such that \( Ex=t^*\) for all \(x\in \mathcal {X}\); see [10, 18]. Thus,

$$\begin{aligned} \mathcal {X}=\{x| Ex=t^*\}\bigcap \{x|Ax\le b\}. \end{aligned}$$

Due to the result of Hoffman’s error bound in Lemma 6, there must exist a constant \(\theta >0\) depending on matrices E and A such that for any \(y\in Q=\{x|Ax\le b\}\) it holds

$$\begin{aligned} \Vert Ey-E[y]_\mathcal {X}^+\Vert ^2=\Vert Ey-t^*\Vert ^2\ge \theta \Vert y-[y]_\mathcal {X}^+\Vert ^2=\theta \cdot d^2(y,\mathcal {X}). \end{aligned}$$

Thus, the strengthened property of the RSC type follows from this and the inequality (26).

By Lemma 7, for any \(x^*\in \mathcal {X}\) we have \(G^f_Q(x^*;\gamma )=0\) and hence \(x^*=x_Q(x^*;\gamma )\). Therefore, using (29) in Lemma 5 with \(\bar{x}=[y]_\mathcal {X}^+\) and the fact that \(\Vert Ey-E[y]_\mathcal {X}^+\Vert ^2\ge \theta \cdot d^2(y,\mathcal {X})\), we get the modified quadratic growth property. This completes the proof. \(\square \)

Remark 6

By Lemma A.8 in [18], we get

$$\begin{aligned} \Vert G^f_Q(y;\gamma )\Vert =\Vert \gamma (y-[y-\frac{1}{\gamma }\nabla f(y)]_Q^+)\Vert \le \gamma \max (1,\gamma ^{-1})\Vert y-[y-\nabla f(y)]_Q^+\Vert . \end{aligned}$$

Thus, using this and applying the Cauchy–Schwarz inequality to (24), we get

$$\begin{aligned} \gamma \max (1,\gamma ^{-1})\Vert y-[y-\nabla f(y)]_Q^+\Vert \ge C_1\cdot d(y,\mathcal {X}). \end{aligned}$$

Or, equivalently there exists a constant \(C_3>0\) such that

$$\begin{aligned} C_3 \cdot d(y,\mathcal {X})\le \Vert y-[y-\nabla f(y)]_Q^+\Vert . \end{aligned}$$

In contrast, the authors of [18] derived the following error bound

$$\begin{aligned} d(y,\mathcal {X})\le \kappa _3\Vert y-[y-\nabla f(y)]_Q^+\Vert , ~~\forall y\in Q~~\text {and} ~~f(y)-f^*\le M \end{aligned}$$

for \(f(x)=g(Ex)+q^Tx\). Although this bound enables the incorporation of the additional linear term, the constant \(\kappa _3\) depends on the positive parameter M and hence on the variable y. To avoid such dependence, the authors of [3] derived the quadratic growth property

$$\begin{aligned} f(y)\ge f^*+ \kappa _4\cdot d^2(y,\mathcal {X}), ~~\forall y\in Q \end{aligned}$$

for \(f(x)=g(Ex)+q^Tx\) by assuming that Q is compact, where \(\kappa _4>0\) is a constant. Here, we can drop the compactness assumption, at the cost of neglecting the linear term \(q^Tx\).

Remark 7

There is an alternative way, following [13], to prove the modified QG property. Indeed, by the strong convexity of g, the first-order optimality condition \(\langle \nabla f([y]_\mathcal {X}^+), y-[y]_\mathcal {X}^+\rangle \ge 0\) for \(y\in Q\), and the fact that \(\Vert Ey-E[y]_\mathcal {X}^+\Vert ^2\ge \theta \cdot d^2(y,\mathcal {X})\), we derive, for all \(y\in Q\), that

$$\begin{aligned} f(y)-f^*= & {} f(y)-f([y]_\mathcal {X}^+)=g(Ey) -g(E[y]_\mathcal {X}^+) \ge \frac{\mu }{2}\Vert Ey-E[y]_\mathcal {X}^+\Vert ^2\\\ge & {} \frac{\mu \theta }{2}\cdot d^2(y,\mathcal {X}). \end{aligned}$$

Thus, by Theorem 2 we immediately get the mRSC property

$$\begin{aligned} \langle G^f_Q(y;\gamma ), y-[y]_\mathcal {X}^+\rangle \ge C_4\cdot d^2(y,\mathcal {X}), \end{aligned}$$

where \(C_4\) is some positive constant.