Abstract
Nemirovski’s analysis (SIAM J. Optim. 15:229–251, 2005) indicates that the extragradient method has an O(1/t) convergence rate for variational inequalities with Lipschitz continuous monotone operators. For the same class of problems, a family of Fejér monotone projection and contraction methods has been developed over the last decades. Until now, however, only convergence results have been available for these projection and contraction methods, even though numerical experiments indicate that they consistently outperform the extragradient method; the reason is that they benefit from the ‘optimal’ step size in the contraction sense. In this paper, we prove the O(1/t) convergence rate under a unified conceptual framework that includes the projection and contraction methods as special cases, and thus completes the convergence theory of these methods. Preliminary numerical results demonstrate that the projection and contraction methods converge roughly twice as fast as the extragradient method.
1 Introduction
Let Ω be a closed convex subset of R^n and let F be a continuous mapping from R^n into itself. The variational inequality problem, denoted by VI(Ω,F), is to find a vector u^*∈Ω such that
$$ \bigl(u - u^*\bigr)^T F\bigl(u^*\bigr) \ge 0, \quad \forall u \in \varOmega. $$ (1.1)
Notice that VI(Ω,F) is invariant when F is multiplied by a positive scalar. It is well known ([1], p. 267) that, for any β>0,
$$ u^* \ \text{solves VI}(\varOmega,F) \quad \Longleftrightarrow \quad u^* = P_{\varOmega}\bigl[u^* - \beta F\bigl(u^*\bigr)\bigr], $$ (1.2)
where P_Ω(⋅) denotes the projection onto Ω with respect to the Euclidean norm, i.e.,
$$ P_{\varOmega}(v) = \operatorname{argmin}\bigl\{ \|v - u\| \ \big|\ u \in \varOmega \bigr\}. $$
Throughout this paper we assume that the mapping F is monotone and Lipschitz continuous, i.e.,
$$ (u - v)^T \bigl(F(u) - F(v)\bigr) \ge 0, \quad \forall u, v \in R^n, $$
and that there is a constant L>0 (which need not be known) such that
$$ \big\|F(u) - F(v)\big\| \le L \|u - v\|, \quad \forall u, v \in R^n. $$
Moreover, we assume that the solution set of VI(Ω,F), denoted by Ω^*, is nonempty. Together with the monotonicity of F, this assumption implies that Ω^* is closed and convex (see p. 158 in [3]).
Among the algorithms for monotone variational inequalities, the extragradient (EG) method proposed by Korpelevich [12] is one of the simplest and most attractive. Each iteration of the extragradient method consists of two steps. The k-th iteration begins with a given u^k∈Ω; the first step produces a vector \(\tilde{u}^{k}\) via the projection
$$ \tilde{u}^k = P_{\varOmega}\bigl[u^k - \beta_k F\bigl(u^k\bigr)\bigr], $$ (1.3a)
where β_k>0 is selected to satisfy (see [13])
$$ \beta_k \big\|F\bigl(u^k\bigr) - F\bigl(\tilde{u}^k\bigr)\big\| \le \nu \big\|u^k - \tilde{u}^k\big\|, \quad \nu \in (0,1). $$ (1.3b)
Since \(\tilde{u}^{k}\) is not accepted as the new iterate, for convenience we call it a predictor and name β_k the prediction step size. The second step (correction step) of the k-th iteration updates the new iterate u^{k+1} by
$$ u^{k+1} = P_{\varOmega}\bigl[u^k - \beta_k F\bigl(\tilde{u}^k\bigr)\bigr], $$ (1.3c)
where β_k and \(\tilde{u}^{k}\) are given in (1.3a). The sequence {u^k} generated by the extragradient method is Fejér monotone with respect to the solution set, namely,
$$ \big\|u^{k+1} - u^*\big\|^2 \le \big\|u^k - u^*\big\|^2 - \bigl(1 - \nu^2\bigr)\big\|u^k - \tilde{u}^k\big\|^2, \quad \forall u^* \in \varOmega^*. $$ (1.4)
For a proof of this contraction property, the reader may consult [3] (pp. 1115–1118 therein). Notice that, in the extragradient method, the step size of the prediction (1.3a) and that of the correction (1.3c) are equal; the two steps thus appear ‘symmetric’.
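To fix ideas, one iteration of (1.3a)–(1.3c) can be written in a few lines of Matlab. The following is only an illustrative sketch, not the code used in Sect. 6: F and proj stand for user-supplied function handles evaluating the mapping F and the projection P_Ω (both are assumptions of this illustration), and beta is assumed to satisfy (1.3b).

function u_new = eg_step(u, F, proj, beta)
% One extragradient iteration: prediction (1.3a) and correction (1.3c).
% u is a column vector; beta is assumed to satisfy (1.3b).
u_tilde = proj(u - beta*F(u));        % prediction step (1.3a)
u_new   = proj(u - beta*F(u_tilde));  % correction step (1.3c)
end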
Because of its simple iterative form, the extragradient method has recently been applied to large optimization problems in information science, such as machine learning [11, 14, 22, 23], optical networks [16, 17] and speech recognition [18]. In addition, Nemirovski [15] and Tseng [24] proved the O(1/t) convergence rate of the extragradient method. Interest in the extragradient method has thus become more active, both theoretically and practically.
Over the last decades, a class of projection and contraction (PC) methods for monotone variational inequalities [5, 6, 8, 19, 20] has been developed. As in the extragradient method, each iteration of the PC methods consists of two steps. The prediction step produces the predictor \(\tilde{u}^{k}\) via (1.3a), just as in the extragradient method (although the condition (1.3b) is not necessary). The PC methods exploit a pair of geminate directions [7, 8] offered by the predictor, namely,
$$ d\bigl(u^k,\tilde{u}^k\bigr) = \bigl(u^k - \tilde{u}^k\bigr) - \beta_k\bigl(F\bigl(u^k\bigr) - F\bigl(\tilde{u}^k\bigr)\bigr) \quad\text{and}\quad \beta_k F\bigl(\tilde{u}^k\bigr). $$ (1.5)
Both directions are ascent directions of the unknown distance function \(\frac{1}{2}\|u-u^{*}\|^{2}\) at the point u^k. Based on these directions, the goal of the correction step is to generate a new iterate that is closer to the solution set. This leads to choosing the ‘optimal’ step length
$$ \varrho_k = \frac{(u^k - \tilde{u}^k)^T d(u^k,\tilde{u}^k)}{\|d(u^k,\tilde{u}^k)\|^2} $$ (1.6)
and a relaxation factor γ∈(0,2); the second step (correction step) of the PC methods updates the new iterate u^{k+1} by
$$ u^{k+1} = u^k - \gamma \varrho_k \, d\bigl(u^k,\tilde{u}^k\bigr) $$ (1.7)
or
$$ u^{k+1} = P_{\varOmega}\bigl[u^k - \gamma \varrho_k \beta_k F\bigl(\tilde{u}^k\bigr)\bigr]. $$ (1.8)
The PC methods (without line search) make one (or two) projection(s) onto Ω at each iteration, and the distance from the iterates to the solution set decreases monotonically. According to the terminology in [2], these methods belong to the class of Fejér contraction methods. In fact, the only difference between the extragradient method and one of the PC methods is the step size used in the correction step (see (1.3c) and (1.8)). In our numerical experiments [6, 8], the PC methods always outperform the extragradient method.
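For comparison, a corresponding Matlab sketch of one PC iteration is given below; it computes the predictor (1.3a), the direction d in (1.5), the ‘optimal’ step size (1.6), and both correction forms (1.7) and (1.8). Again F, proj, beta and gamma are assumed to be supplied by the user, and the degenerate case d=0 (which signals that \(\tilde{u}^{k}\) already solves the VI) is ignored for brevity.

function [u1, u2] = pc_step(u, F, proj, beta, gamma)
% One projection-and-contraction iteration (column vectors assumed).
u_tilde = proj(u - beta*F(u));                   % predictor (1.3a)
d   = (u - u_tilde) - beta*(F(u) - F(u_tilde));  % direction (1.5)
rho = ((u - u_tilde)'*d) / norm(d)^2;            % 'optimal' step size (1.6)
u1  = u - gamma*rho*d;                           % PC Method I (1.7)
u2  = proj(u - gamma*rho*beta*F(u_tilde));       % PC Method II (1.8)
end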
Stimulated by the complexity analysis of the extragradient method, this paper establishes the O(1/t) convergence rate of the projection and contraction methods for monotone VIs. Recall that Ω^* can be characterized as (see (2.3.2) on p. 159 of [3])
$$ \varOmega^* = \bigcap_{u \in \varOmega} \bigl\{ \tilde{u} \in \varOmega : (u - \tilde{u})^T F(u) \ge 0 \bigr\}. $$
This implies that \(\tilde{u} \in \varOmega\) is an approximate solution of VI(Ω,F) with accuracy ϵ if it satisfies
$$ \sup_{u \in \mathcal{D}} \bigl(\tilde{u} - u\bigr)^T F(u) \le \epsilon. $$
In this paper, we show that, for given ϵ>0, in O(L/ϵ) iterations the projection and contraction methods can find a \(\tilde{u}\) such that
$$ \tilde{u} \in \varOmega \quad\text{and}\quad \sup_{u \in \mathcal{D}} \bigl(\tilde{u} - u\bigr)^T F(u) \le \epsilon, $$
where \(\mathcal{D} \subset \varOmega\) is a given compact (substantial) set.
As a byproduct of the complexity analysis, we also explain why taking a suitable relaxation factor γ∈(1,2) in the correction steps (1.7) and (1.8) of the PC methods achieves faster convergence.
The outline of this paper is as follows. Section 2 recalls some basic concepts of the projection and contraction methods. In Sect. 3, we investigate the geminate ascent directions of the distance function. Section 4 shows the contraction property of the PC methods. In Sect. 5, we carry out the complexity analysis, which yields the O(1/t) convergence rate and suggests using a large relaxation factor in the correction step of the PC methods. In Sect. 6, we present numerical results indicating the efficiency of the PC methods in comparison with the extragradient method. Finally, some concluding remarks are given in the last section.
Throughout the paper, the following notational conventions are used. We use u^* to denote a fixed but arbitrary point in the solution set Ω^*. A superscript such as in u^k refers to a specific vector and usually denotes an iteration index. For any real matrix M and vector v, M^T and v^T denote their transposes, respectively. The Euclidean norm is denoted by ∥⋅∥.
2 Preliminaries
In this section, we summarize the basic concepts of the projection mapping and the three fundamental inequalities used in constructing the PC methods. Throughout this paper, we assume that the projection onto Ω in the Euclidean norm has a closed form and is easy to carry out. Since
$$ P_{\varOmega}(v) = \operatorname{argmin}\Bigl\{ \tfrac{1}{2}\|u - v\|^2 \ \Big|\ u \in \varOmega \Bigr\}, $$
according to the optimality condition of this convex minimization problem, we have
$$ \bigl(v - P_{\varOmega}(v)\bigr)^T \bigl(u - P_{\varOmega}(v)\bigr) \le 0, \quad \forall v \in R^n, \ \forall u \in \varOmega. $$ (2.1)
Consequently, for any u∈Ω, it follows from (2.1) that
$$ \big\|P_{\varOmega}(v) - u\big\|^2 \le \|v - u\|^2 - \big\|v - P_{\varOmega}(v)\big\|^2, \quad \forall v \in R^n, \ \forall u \in \varOmega. $$ (2.2)
Therefore, we have
$$ \big\|P_{\varOmega}(v) - u\big\| \le \|v - u\|, \quad \forall v \in R^n, \ \forall u \in \varOmega. $$
For given u and β>0, let \(\tilde{u} = P_{\varOmega}[u- \beta F(u)]\) be given via a projection. We say that \(\tilde{u}\) is a test-vector of VI(Ω,F) because, by (1.2),
$$ u = \tilde{u} \quad \Longleftrightarrow \quad u \in \varOmega^*. $$
Since \(\tilde{u} \in \varOmega\), it follows from (1.1) that
$$ (\tilde{u} - u^*)^T \beta F\bigl(u^*\bigr) \ge 0. $$ (2.3)
Setting v=u−βF(u) and u=u^* in the inequality (2.1), we obtain
$$ (\tilde{u} - u^*)^T \bigl\{(u - \tilde{u}) - \beta F(u)\bigr\} \ge 0. $$ (2.4)
Under the assumption that F is monotone, we have
$$ (\tilde{u} - u^*)^T \beta\bigl(F(\tilde{u}) - F\bigl(u^*\bigr)\bigr) \ge 0. $$ (2.5)
The inequalities (2.3), (2.4) and (2.5) play an important role in the projection and contraction methods; they were emphasized in [5] as the three fundamental inequalities of these methods.
Definition 2.1
(Ascent direction)
For an arbitrary but fixed u^*∈Ω^*, a direction d is called an ascent direction of \(\frac{1}{2} \|u- u^{*}\|^{2}\) at u if and only if the inner product (u−u^*)^T d>0.
Definition 2.2
(Predictor in projection-type methods)
For given u k, \(\tilde{u}^{k}\) given by (1.3a) is called the predictor.
Definition 2.3
(Prediction step-size condition in EG method)
The condition (1.3b) is called the prediction step-size condition in the extragradient method.
Indeed, the predictor \(\tilde{u}^{k}\) in the projection and contraction methods [5, 6, 8, 19] is produced by (1.3a). Because the mapping F is Lipschitz continuous (even though the constant L>0 may be unknown), without loss of generality we can assume that inf_{k≥0}{β_k} ≥ β_L > 0 with β_L = O(1/L). In practical computation, we can choose a constant ν∈(0,1), make an initial guess of β, and decrease β by a constant factor, repeating the procedure whenever (1.3b) is violated.
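A minimal Matlab sketch of this backtracking procedure is given below, assuming the current trial beta and the handles F and proj are in scope; the shrink factor 0.5 is an illustrative choice, and Algorithm 1 in Sect. 6 gives the refined self-adaptive rule actually used in the experiments.

% Backtracking choice of beta when L is unknown (a sketch).
ut = proj(u - beta*F(u));
while beta*norm(F(u) - F(ut)) > nu*norm(u - ut)  % test (1.3b)
    beta = 0.5*beta;                             % decrease by a constant factor
    ut = proj(u - beta*F(u));                    % re-predict with smaller beta
end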
Associated with the direction \(d(u^{k},\tilde{u}^{k})\) defined in (1.5), there is a correlative ascent direction \(\beta_{k}F(\tilde{u}^{k})\). Notice that the projection equation (1.3a) can be written as
$$ \tilde{u}^k = P_{\varOmega}\bigl[\tilde{u}^k - \bigl(\beta_k F\bigl(\tilde{u}^k\bigr) - d\bigl(u^k,\tilde{u}^k\bigr)\bigr)\bigr], $$ (2.6)
where \(d(u^{k},\tilde{u}^{k})\) is defined in (1.5). It follows from (1.2) that \(\tilde{u}^{k}\) is a solution of VI(Ω,F) if and only if \(d(u^{k},\tilde{u}^{k})=0\). In [9, 10], \(d(u^{k},\tilde{u}^{k})\) and \(\beta_{k}F(\tilde{u}^{k})\) in (2.6) are called a pair of geminate directions and denoted by \(d_{1}(u^{k},\tilde{u}^{k})\) and \(d_{2}(u^{k},\tilde{u}^{k})\), respectively. In this paper, we restrict \(d_{2}(u^{k},\tilde{u}^{k})\) to be \(F(\tilde{u}^{k})\) multiplied by a positive scalar β_k. The projection and contraction methods considered in this paper belong to the prox-like contraction methods [9, 10]. Instead of the step-size condition (1.3b) of the extragradient method, we use the following general step-size conditions for the projection and contraction methods.
Definition 2.4
(Prediction step-size conditions in PC methods)
Let c_1, c_2>0 be given constants. For given u^k, let \(\tilde{u}^{k}\) be given by (1.3a) and \(d(u^{k},\tilde{u}^{k})\) be defined by (1.5). We say that β_k satisfies the prediction step-size conditions of the projection and contraction methods if the related direction \(d(u^{k},\tilde{u}^{k})\) satisfies
$$ \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge c_1 \big\|u^k - \tilde{u}^k\big\|^2 $$ (2.7)
and
$$ \varrho_k := \frac{(u^k - \tilde{u}^k)^T d(u^k,\tilde{u}^k)}{\|d(u^k,\tilde{u}^k)\|^2} \ge c_2. $$ (2.8)
3 The ascent directions
For an arbitrary but fixed u^*∈Ω^*, (u−u^*) is the gradient of the unknown distance function \(\frac{1}{2} \|u- u^{*}\|^{2}\) in the Euclidean norm (see Footnote 1) at the point u.
3.1 Geminate ascent directions
The forthcoming analysis is based on the general conditions (2.7) and (2.8). Note that an equivalent expression of (2.6) is
$$ \tilde{u}^k \in \varOmega, \quad \bigl(u - \tilde{u}^k\bigr)^T \bigl\{\beta_k F\bigl(\tilde{u}^k\bigr) - d\bigl(u^k,\tilde{u}^k\bigr)\bigr\} \ge 0, \quad \forall u \in \varOmega, $$ (3.1)
and from (2.8) we have
$$ \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) = \varrho_k \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2. $$ (3.2)
The following lemmas tell us that both \(d(u^{k},\tilde{u}^{k})\) (for u^k∈R^n) and \(F(\tilde{u}^{k})\) (for u^k∈Ω) are ascent directions of the function \(\frac{1}{2}\|u-u^{*}\|^{2}\) whenever u^k is not a solution point. The proofs are similar to those in [7]; for the sake of completeness, we restate them briefly.
Lemma 3.1
Let u^k, \(\tilde{u}^{k}\) and \(d(u^{k},\tilde{u}^{k})\) be given by (2.6) and let the general conditions (2.7) and (2.8) be satisfied. Then we have
$$ \bigl(u^k - u^*\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge \varrho_k \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2, \quad \forall u^* \in \varOmega^*. $$ (3.3)
Proof
Note that u^*∈Ω. Setting u=u^* in (3.1) (the equivalent expression of (2.6)), we get
$$ \bigl(\tilde{u}^k - u^*\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge \bigl(\tilde{u}^k - u^*\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr) \ge \bigl(\tilde{u}^k - u^*\bigr)^T \beta_k F\bigl(u^*\bigr) \ge 0. $$
The last two inequalities follow from the monotonicity of F and \((\tilde{u}^{k} - u^{*})^{T}F(u^{*}) \ge 0\). Therefore,
$$ \bigl(u^k - u^*\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr). $$
The assertion (3.3) then follows directly from the above inequality and (3.2). □
Lemma 3.2
Let \(u^{k}, \tilde{u}^{k}\) and \(d(u^{k},\tilde{u}^{k})\) be given by (2.6) and let the general conditions (2.7) and (2.8) be satisfied. If u^k∈Ω, then we have
$$ \bigl(u^k - u^*\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr) \ge \varrho_k \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2, \quad \forall u^* \in \varOmega^*. $$ (3.4)
Proof
Since \((\tilde{u}^{k} - u^{*})^{T}\beta_{k} F(\tilde{u}^{k}) \ge 0\), we have
$$ \bigl(u^k - u^*\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr) \ge \bigl(u^k - \tilde{u}^k\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr). $$
Because u^k∈Ω, setting u=u^k in (3.1) we get
$$ \bigl(u^k - \tilde{u}^k\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr) \ge \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr). $$
From the above two inequalities it follows that
$$ \bigl(u^k - u^*\bigr)^T \beta_k F\bigl(\tilde{u}^k\bigr) \ge \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr). $$
The assertion (3.4) then follows directly from the above inequality and (3.2). □
We emphasize that (3.3) holds for all u^k∈R^n, while (3.4) holds only for u^k∈Ω.
3.2 Ascent directions in the extragradient method
The condition (1.3b) is necessary in the prediction step of the extragradient method. Setting u=u^k, \(\tilde{u}=\tilde{u}^{k}\) and β=β_k in the fundamental inequalities (2.3), (2.4) and (2.5), and adding them, we get
$$ \bigl(\tilde{u}^k - u^*\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge 0, $$ (3.5)
where \(d(u^{k},\tilde{u}^{k})\) is defined in (1.5). It follows from (3.5) that
$$ \bigl(u^k - u^*\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr). $$
Note that, under the condition (1.3b), it follows that
$$ \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) \ge (1-\nu)\big\|u^k - \tilde{u}^k\big\|^2. $$
This means that \(d(u^{k},\tilde{u}^{k})\) is an ascent direction of the unknown distance function \(\frac{1}{2}\|u-u^{*}\|^{2}\) at the point u^k, and that the condition (2.7) is satisfied with c_1=1−ν. By using the definition of \(d(u^{k},\tilde{u}^{k})\), we have
$$ 2\bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) - \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2 = \big\|u^k - \tilde{u}^k\big\|^2 - \beta_k^2\big\|F\bigl(u^k\bigr)-F\bigl(\tilde{u}^k\bigr)\big\|^2 \ge 0. $$
From the above inequality it follows that the condition (2.8) is satisfied with \(c_{2}= \frac{1}{2}\). In other words, the step-size conditions in Definition 2.4 are satisfied whenever the condition (1.3b) holds, and the directions \(d(u^{k},\tilde{u}^{k})\) and \(\beta_{k} F(\tilde{u}^{k})\) defined in (1.5) are a pair of geminate ascent directions. That is, the step-size condition (1.3b) of the extragradient method is sufficient for the one in the projection and contraction methods.
4 Corrector and the convergence in the contraction sense
The extragradient method uses (1.3c) to update the new iterate. Based on the pair of geminate ascent directions in (2.6), namely \(d(u^{k},\tilde{u}^{k})\) and \(\beta_{k}F(\tilde{u}^{k})\), the projection and contraction methods use one of the following corrector forms to update the new iterate u^{k+1}:
$$ u_I^{k+1} = u^k - \gamma \varrho_k \, d\bigl(u^k,\tilde{u}^k\bigr) $$ (4.1)
or
$$ u_{\mathit{II}}^{k+1} = P_{\varOmega}\bigl[u^k - \gamma \varrho_k \beta_k F\bigl(\tilde{u}^k\bigr)\bigr], $$ (4.2)
where γ∈(0,2) and ϱ_k is defined in (2.8). Note that the same step length is used in (4.1) and (4.2) even though the search directions are different. Recall that \(\tilde{u}^{k}\) is obtained via a projection; by using the correction form (4.2), we have to make an additional projection onto Ω. Replacing γϱ_k in (4.2) by 1 (and assuming the step size condition (1.3b) is satisfied), (4.2) reduces to the update form of the extragradient method (see (1.3c)). For any solution point u^*∈Ω^*, we define
$$ \vartheta_I(\gamma) := \big\|u^k - u^*\big\|^2 - \big\|u_I^{k+1} - u^*\big\|^2 $$ (4.3)
and
$$ \vartheta_{\mathit{II}}(\gamma) := \big\|u^k - u^*\big\|^2 - \big\|u_{\mathit{II}}^{k+1} - u^*\big\|^2, $$ (4.4)
which measure the profit gained in the k-th iteration. The following theorem gives a lower bound on the profit functions; similar results were established in [6–8].
Theorem 4.1
For given u^k, let the general conditions (2.7) and (2.8) be satisfied. If the corrector is updated by (4.1) or (4.2), then for any u^*∈Ω^* and γ>0, we have
$$ \vartheta_I(\gamma) \ge q(\gamma) $$ (4.5)
and
$$ \vartheta_{\mathit{II}}(\gamma) \ge q(\gamma) + \big\|u_I^{k+1} - u_{\mathit{II}}^{k+1}\big\|^2, $$ (4.6)
respectively, where
$$ q(\gamma) = \gamma(2-\gamma)\varrho_k^2 \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2. $$ (4.7)
Proof
Using the definitions of ϑ_I(γ) and \(u_{I}^{k+1}\) (see (4.1)), we have
Recalling (3.3), we obtain
Substituting it into (4.8) and using the definition of q(γ), we get ϑ_I(γ)≥q(γ), and the first assertion is proved. Now we turn to the second assertion. Because
and u^*∈Ω, by setting u=u^* and \(v= u^{k} - \gamma \varrho_{k} \beta_{k} F(\tilde{u}^{k})\) in (2.2), we have
Thus,
The last inequality in (4.10) follows from \((\tilde{u}^{k}-u^{*})^{T} F(\tilde{u}^{k}) \ge 0\). Since \(u_{\mathit{II}}^{k+1}\in \varOmega\), by setting \(u=u_{\mathit{II}}^{k+1}\) in (3.1), we get
and consequently, substituting it into the right-hand side of (4.10), we obtain
For the two cross terms on the right-hand side of (4.11), we have (by using (3.2))
and
respectively. Substituting them in the right hand side of (4.11) and using \(u^{k}-\gamma \varrho_{k}d(u^{k},\tilde{u}^{k})= u_{I}^{k+1}\), we obtain
and the proof is complete. □
Note that q(γ) is a quadratic function of γ; it reaches its maximum at γ^*=1. In practice, ϱ_k is the ‘optimal’ step size in (4.1) and (4.2), and γ is a relaxation factor. Because q(γ) is a lower bound of ϑ_I(γ) (resp. ϑ_II(γ)), it is desirable to update the new iterate by (4.1) (resp. (4.2)) with γ∈[1,2); the reason is illustrated in Fig. 1.
From Theorem 4.1 and the definition of ϱ_k (see (2.8)), we obtain
$$ \big\|u^{k+1} - u^*\big\|^2 \le \big\|u^k - u^*\big\|^2 - \gamma(2-\gamma)\varrho_k^2 \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2, \quad \forall u^* \in \varOmega^*. $$ (4.13)
By using the general conditions (2.7) and (2.8), it follows from the above inequality that
$$ \big\|u^{k+1} - u^*\big\|^2 \le \big\|u^k - u^*\big\|^2 - \gamma(2-\gamma)c_1 c_2 \big\|u^k - \tilde{u}^k\big\|^2, \quad \forall u^* \in \varOmega^*. $$ (4.14)
Due to the property (4.14), we call the methods using the update forms (4.1) and (4.2) PC Method I and PC Method II, respectively. Note that the assertion (4.14) is derived from the general conditions (2.7)–(2.8). From (4.14), the convergence of the PC methods follows directly (just as the convergence of the extragradient method follows from (1.4); for details, see Theorem 2.1 in [6]).
5 Convergence rate of the PC methods
This section proves the O(1/t) convergence rate of the projection and contraction methods. Recall that the basis of the complexity proof is the characterization (see (2.3.2) on p. 159 of [3])
$$ \varOmega^* = \bigcap_{u \in \varOmega} \bigl\{ \tilde{u} \in \varOmega : (u - \tilde{u})^T F(u) \ge 0 \bigr\}. $$ (5.1)
In the sequel, for given ϵ>0 and a substantial set \(\mathcal{D} \subset \varOmega\), we focus our attention on finding a \(\tilde{u}\) such that
$$ \tilde{u} \in \varOmega \quad\text{and}\quad \sup_{u \in \mathcal{D}} \bigl(\tilde{u} - u\bigr)^T F(u) \le \epsilon. $$ (5.2)
Although PC Method I uses the update form (4.1), which does not guarantee that {u^k} belongs to Ω, the sequence \(\{\tilde{u}^{k}\}\) generated by either corrector form lies in Ω. We now prove the key inequality of PC Method I for the complexity analysis.
Lemma 5.1
For given u^k∈R^n, let \(\tilde{u}^{k}\) and \(d(u^{k},\tilde{u}^{k})\) be given by (2.6) and let the general conditions (2.7) and (2.8) be satisfied. If the new iterate u^{k+1} is updated by (4.1) with any γ>0, then we have
$$ \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) + \frac{1}{2}\big\|u - u^k\big\|^2 \ge \frac{1}{2}\big\|u - u^{k+1}\big\|^2 + \frac{1}{2}q(\gamma), \quad \forall u \in \varOmega, $$ (5.3)
where q(γ) is defined in (4.7).
Proof
Because (due to (3.1))
$$ \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) \ge \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k \, d\bigl(u^k,\tilde{u}^k\bigr), \quad \forall u \in \varOmega, $$
and (see (4.1))
$$ \gamma\varrho_k \, d\bigl(u^k,\tilde{u}^k\bigr) = u^k - u^{k+1}, $$
we need only show that
$$ \bigl(u - \tilde{u}^k\bigr)^T\bigl(u^k - u^{k+1}\bigr) + \frac{1}{2}\big\|u - u^k\big\|^2 \ge \frac{1}{2}\big\|u - u^{k+1}\big\|^2 + \frac{1}{2}q(\gamma). $$ (5.4)
For the cross term on the left-hand side of (5.4), namely \((u- \tilde{u}^{k})^{T}(u^{k} - u^{k+1})\), using an identity
we obtain
By using \(u^{k+1} = u^{k} -\gamma\varrho_{k} d(u^{k},\tilde{u}^{k})\) and (3.2), we get
Substituting it into the right-hand side of (5.5) and using the definition of q(γ), we obtain (5.4), and the lemma is proved. □
Both sequences \(\{\tilde{u}^{k}\}\) and {u^k} generated by PC Method II belong to Ω. In the following lemma we prove the same assertion for PC Method II as in Lemma 5.1.
Lemma 5.2
For given u^k∈Ω, let \(\tilde{u}^{k}\) and \(d(u^{k},\tilde{u}^{k})\) be given by (2.6) and let the general conditions (2.7) and (2.8) be satisfied. If the new iterate u^{k+1} is updated by (4.2) with any γ>0, then we have
$$ \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) + \frac{1}{2}\big\|u - u^k\big\|^2 \ge \frac{1}{2}\big\|u - u^{k+1}\big\|^2 + \frac{1}{2}q(\gamma), \quad \forall u \in \varOmega, $$ (5.6)
where q(γ) is defined in (4.7).
Proof
To investigate \((u- \tilde{u}^{k})^{T} \gamma\varrho_{k}\beta_{k} F(\tilde{u}^{k})\), we divide it into the terms
$$ \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) = \bigl(u^{k+1} - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) + \bigl(u - u^{k+1}\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr). $$ (5.7)
First, we deal with the term \((u^{k+1}- \tilde{u}^{k})^{T}\gamma\varrho_{k}\beta_{k} F(\tilde{u}^{k})\). Since u k+1∈Ω, substituting u=u k+1 in (3.1) we get
For the first cross term on the right-hand side of (5.8), using (3.2), we have
and
For the second cross term on the right-hand side of (5.8), using the Cauchy–Schwarz inequality, we get
Substituting them into the right-hand side of (5.8), we obtain
Now we turn to the term \((u- u^{k+1})^{T} \gamma\varrho_{k}\beta_{k} F(\tilde{u}^{k})\) in (5.7). Since u^{k+1} is updated by (4.2), it is the projection of \(u^{k} - \gamma \varrho_{k} \beta_{k} F(\tilde{u}^{k})\) onto Ω, and it follows from (2.1) that
and consequently
Applying the identity \(a^{T}b = \frac{1}{2} \{\|a\|^{2} - \|a-b\|^{2} +\|b\|^{2}\}\) to the right-hand side of the last inequality, we obtain
Adding (5.9) and (5.10) and using the definition of q(γ), we get (5.6) and the proof is complete. □
For the different projection and contraction methods, we have obtained the same key inequality, as shown in Lemmas 5.1 and 5.2, respectively. By setting u=u^* in (5.3) and (5.6), we get
$$ \bigl(\tilde{u}^k - u^*\bigr)^T \gamma\varrho_k\beta_k F\bigl(\tilde{u}^k\bigr) + \frac{1}{2}q(\gamma) \le \frac{1}{2}\big\|u^k - u^*\big\|^2 - \frac{1}{2}\big\|u^{k+1} - u^*\big\|^2. $$
Because \((\tilde{u}^{k}-u^{*})^{T}F(\tilde{u}^{k})\ge (\tilde{u}^{k}-u^{*})^{T}F(u^{*})\ge 0\) and \(q(\gamma)= \gamma(2 - \gamma) \varrho_{k}^{2} \| d(u^{k},\tilde{u}^{k})\|^{2}\), it follows from the last inequality that
$$ \big\|u^{k+1} - u^*\big\|^2 \le \big\|u^k - u^*\big\|^2 - \gamma(2-\gamma)\varrho_k^2 \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2. $$
This is just the form (4.13) in Sect. 4. In other words, the contraction property (4.13) of the PC methods is a consequence of Lemmas 5.1 and 5.2, respectively.
For the convergence rate proof, we allow γ∈(0,2]. In this case, we still have q(γ)≥0. By using the monotonicity of F, from (5.3) and (5.6) we get
$$ \bigl(u - \tilde{u}^k\bigr)^T \gamma\varrho_k\beta_k F(u) + \frac{1}{2}\big\|u - u^k\big\|^2 \ge \frac{1}{2}\big\|u - u^{k+1}\big\|^2, \quad \forall u \in \varOmega. $$ (5.11)
This inequality is essential for the convergence rate proofs.
Theorem 5.1
For any integer t>0, we have a \(\tilde{u}_{t}\in \varOmega\) which satisfies
$$ \bigl(\tilde{u}_t - u\bigr)^T F(u) \le \frac{\|u - u^0\|^2}{2\varUpsilon_t}, \quad \forall u \in \varOmega, $$ (5.12)
where
$$ \varUpsilon_t = \sum_{k=0}^{t} \gamma\varrho_k\beta_k \quad\text{and}\quad \tilde{u}_t = \frac{1}{\varUpsilon_t}\sum_{k=0}^{t} \gamma\varrho_k\beta_k \tilde{u}^k. $$ (5.13)
Proof
Summing the inequality (5.11) over k=0,…,t, we obtain
$$ \sum_{k=0}^{t} \gamma\varrho_k\beta_k \bigl(u - \tilde{u}^k\bigr)^T F(u) + \frac{1}{2}\big\|u - u^0\big\|^2 \ge 0, \quad \forall u \in \varOmega. $$
Using the notation of ϒ_t and \(\tilde{u}_{t}\) in the above inequality, we derive
$$ \bigl(u - \tilde{u}_t\bigr)^T F(u) + \frac{\|u - u^0\|^2}{2\varUpsilon_t} \ge 0, \quad \forall u \in \varOmega, $$
which is exactly (5.12).
Indeed, \(\tilde{u}_{t}\in \varOmega\) because it is a convex combination of \(\tilde{u}^{0}, \tilde{u}^{1},\ldots, \tilde{u}^{t}\). The proof is complete. □
If the step-size sequence {β_k} in (1.3a) is bounded away from zero, i.e.,
$$ \inf_{k \ge 0}\{\beta_k\} \ge \beta_L > 0, $$
then it follows from (2.8) and (5.13) that
$$ \varUpsilon_t \ge (t+1)\gamma c_2 \beta_L, $$
and thus the PC methods have an O(1/t) convergence rate. For any substantial set \(\mathcal{D}\subset \varOmega\), the PC methods reach the accuracy (5.2) in at most
$$ t = \Bigl\lceil \frac{D^2}{2\gamma c_2 \beta_L \epsilon} \Bigr\rceil $$
iterations, where \(\tilde{u}_{t}\) is defined in (5.13) and \(D= \sup \{ \|u-u^{0}\| \mid u\in \mathcal{ D}\}\). This convergence rate is in the ergodic sense. The statement (5.12) suggests taking a larger parameter γ∈(0,2] in the correction steps of the PC methods. To invoke the O(1/t) convergence rate, we need only check that the sequence {β_k} is bounded away from zero and that the general conditions in Definition 2.4 are satisfied.
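In implementations, \(\tilde{u}_{t}\) need not be formed from a stored history; the weighted average in (5.13) can be accumulated on the fly. The following Matlab sketch does this for PC Method I, assuming (purely for the sketch) a fixed β satisfying the conditions of Definition 2.4 and given u0, t, gamma, F and proj.

% Running accumulation of the ergodic point u_tilde_t of (5.13).
u = u0;  Upsilon = 0;  ubar = u0;
for k = 0:t
    ut  = proj(u - beta*F(u));              % predictor (1.3a)
    d   = (u - ut) - beta*(F(u) - F(ut));   % direction (1.5)
    rho = ((u - ut)'*d)/norm(d)^2;          % step size (2.8)
    u   = u - gamma*rho*d;                  % PC Method I (4.1)
    w   = gamma*rho*beta;                   % weight of the k-th predictor
    Upsilon = Upsilon + w;
    ubar = ubar + (w/Upsilon)*(ut - ubar);  % running weighted average
end
% On exit, ubar equals u_tilde_t and satisfies the bound (5.12).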
6 Numerical experiments
This section tests the efficiency of the PC methods in comparison with the extragradient method [12]. In particular, we use (1.3a), (1.3b) and (1.3c) as the recursion of the extragradient method. All codes are written in Matlab and run on a Lenovo X200 computer with a 2.53 GHz processor.
6.1 Test examples of minimizing a sum of distances
The first part of the test examples is a min-max problem that can be formulated as a monotone linear variational inequality with a skew-symmetric matrix. The problem is to find the shortest network in a given full Steiner topology (see Example 1 in [25]). The points-edges connections of the network are depicted in Fig. 2, where P={b^[1],…,b^[10]} are given points in R^2 (called regular points) and x^[1],…,x^[8] are the locations of the additional points (called Steiner points). The coordinates of the 10 regular points are given in Table 1. The mathematical form of the shortest sum of distances under the p-norm is
Mainly, we are interested in p=1, 2 and ∞. For any d∈R^2 and p=1, 2 and ∞, we have
where
Formulating the problem as a linear variational inequality
The problem (6.1) is equivalent to the following min-max problem
where z^[i], i=1,…,17, are vectors in R^2. In fact, z^[i] is the dual variable associated with edge e_i. The compact form of (6.3) is
where
and A is a block matrix of the form
Let \((x^{*}, z^{*})\in \mathcal{R}\times \mathcal{B}\) be any solution of (6.4); it follows that
Thus, (x^*,z^*) is a solution of the following variational inequality:
The compact form of (6.7) is the following linear variational inequality:
where
Note that M is skew-symmetric and thus the linear variational inequality is monotone.
Solving the LVI by the extragradient method
When the extragradient method (see (1.3a)–(1.3c)) is used, for given u^k, the condition (1.3b) should be satisfied in producing \(\tilde{u}^{k}\) by (1.3a). Note that the mapping F(u)=Mu+q in LVI(Ω,M,q) is Lipschitz continuous; for this example, one can calculate ∥M∥≈2.2089. To ensure that the condition (1.3b) is satisfied, we take β∈(0,1/∥M∥). Since 1/∥M∥≈0.4527, we use the extragradient method to solve the problem (6.8)–(6.9) with β=0.30, 0.35, 0.40 and 0.45, respectively. The starting point is u^0=0 and the iteration is stopped as soon as
The optimal networks are depicted in Figs. 3 and 4. The iteration numbers for the shortest distance problems under different norms are reported in Table 2. It seems that, for fast convergence, the constant parameter β∈(0,1/∥M∥) should be close to 1/∥M∥.
Solving the LVI by the PC methods
When the PC methods are used, the condition (1.3b) is not necessary; we need only ensure that the general conditions in Definition 2.4 are satisfied. Thus, we can take β_k≡1 in (1.3a), and the predictor \(\tilde{u}^{k}\) is given by
$$ \tilde{u}^k = P_{\varOmega}\bigl[u^k - \bigl(Mu^k + q\bigr)\bigr]. $$
Because M^T=−M, the above projection can be rewritten as
$$ \tilde{u}^k = P_{\varOmega}\bigl[\tilde{u}^k - \bigl(\bigl(M\tilde{u}^k + q\bigr) - d\bigl(u^k,\tilde{u}^k\bigr)\bigr)\bigr]. $$
This is the form (2.6) with
$$ \beta_k = 1 \quad\text{and}\quad d\bigl(u^k,\tilde{u}^k\bigr) = \bigl(I + M^T\bigr)\bigl(u^k - \tilde{u}^k\bigr). $$
For this \(d(u^{k},\tilde{u}^{k})\), we have
$$ \bigl(u^k - \tilde{u}^k\bigr)^T d\bigl(u^k,\tilde{u}^k\bigr) = \big\|u^k - \tilde{u}^k\big\|^2 $$
and
$$ \big\|d\bigl(u^k,\tilde{u}^k\bigr)\big\|^2 \le \big\|I + M^T\big\|^2 \big\|u^k - \tilde{u}^k\big\|^2. $$
Thus the general conditions in Definition 2.4 are satisfied with c_1=1 and c_2=1/∥I+M^T∥^2. We use PC Method I (see (1.7)) or PC Method II (see (1.8)) to update the new iterate. The iteration numbers and CPU times are listed in Table 3. For comparison, we also report the CPU times of the extragradient method with β=0.45 (the fastest case in Table 2) and the shortest distances of the network under the different norms.
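As an illustration, the following self-contained Matlab fragment runs PC Method I with β_k≡1 on a small monotone LVI with a randomly generated skew-symmetric M. The data here are illustrative (constructed so that a solution exists, with Ω=R^n_+), not the network problem above.

% PC Method I with beta_k = 1 on a small skew-symmetric LVI (a sketch).
n = 4;  rng(0);
C = randn(n);  M = C - C';                     % skew-symmetric: M' = -M
p = randn(n,1);
ustar = max(p,0);  q = max(-p,0) - M*ustar;    % data built so ustar solves it
F = @(u) M*u + q;
proj = @(v) max(v,0);                          % P_Omega for Omega = R^n_+
gamma = 1.9;  u = zeros(n,1);
for k = 1:500
    ut = proj(u - F(u));                       % predictor (1.3a), beta_k = 1
    if norm(u - ut) < 1e-12, break; end        % ut = u means u solves the LVI
    d = (eye(n) + M')*(u - ut);                % d(u,ut) = (I + M')(u - ut)
    rho = ((u - ut)'*d)/norm(d)^2;             % >= 1/||I + M'||^2
    u = u - gamma*rho*d;                       % PC Method I (1.7)
end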
It is observed that both PC methods converge much faster than the extragradient method.
6.2 Test examples of nonlinear complementarity problems
We take nonlinear complementarity problems
$$ u \ge 0, \qquad F(u) \ge 0, \qquad u^T F(u) = 0 $$
as the second part of the test examples. The complementarity problem is a variational inequality VI(Ω,F) with \(\varOmega=R^{n}_{+}\), the non-negative orthant of R^n. In this case, P_Ω(v)=max(v,0).
Constructing the test problems
The mapping F(u) in the tested NCP is given by
$$ F(u) = D(u) + Mu + q, $$ (6.10)
where D(u):R n→R n is the nonlinear part, M is an n×n matrix, and q∈R n is a vector.
- In D(u), the nonlinear part of F(u), the components are
$$ D_j (u)= d_j\cdot\arctan(a_j\cdot u_j), $$
where a and d are random vectors (see Footnote 2) whose elements are in (0,1).
- The matrix M in the linear part is given by M=A^TA+B, where A is an n×n matrix whose entries are randomly generated in the interval (−5,+5), and B is an n×n skew-symmetric random matrix (B^T=−B) whose entries (see Footnote 3) are in the interval (−5,+5).
It is clear that the mapping composed in this way is monotone. We construct the following three sets of test examples by choosing different vectors q in (6.10); a Matlab sketch of the construction follows the list.
1. In the first set of test examples, the elements of the vector q are generated from a uniform distribution in the interval (−500,500).
2. The second set (see Footnote 4) of test examples is similar to the first set. Instead of q∈(−500,500), the vector q is generated from a uniform distribution in the interval (−500,0).
3. The third set of test examples has a known solution \(u^{*} \in R^{n}_{+}\). Let the vector p be generated from a uniform distribution in the interval (−10,10) and
$$ u^*= \max(p,0). $$ (6.11)
By setting
$$ v= \max(-p,0) \quad \text{and} \quad q= v - \bigl(D\bigl(u^*\bigr)+Mu^*\bigr), $$
we have F(u^*)=D(u^*)+Mu^*+q=v=max(−p,0). Thus,
$$ \bigl(u^*\bigr)^TF\bigl(u^*\bigr)= \bigl(\max(p,0) \bigr)^T \bigl(\max(-p,0) \bigr)=0. $$
In this way we construct a test NCP with the known solution u^* given in (6.11).
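The promised Matlab sketch of this construction follows; the random seed and the particular way of forming the skew-symmetric part are illustrative choices, not prescriptions from the experiments.

% Generating the third set of NCP test problems (a sketch).
n = 500;  rng(1);
A  = -5 + 10*rand(n);                  % entries of A in (-5,5)
Bl = tril(-5 + 10*rand(n), -1);
B  = Bl - Bl';                         % skew-symmetric, entries in (-5,5)
M  = A'*A + B;
a = rand(n,1);  dvec = rand(n,1);      % random vectors with elements in (0,1)
D = @(u) dvec .* atan(a .* u);         % nonlinear part D_j(u) = d_j*atan(a_j*u_j)
p = -10 + 20*rand(n,1);
ustar = max(p,0);                      % known solution (6.11)
q = max(-p,0) - (D(ustar) + M*ustar);  % ensures F(u*) = max(-p,0) >= 0
F = @(u) D(u) + M*u + q;               % the tested mapping (6.10)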
Implementation details
For given u^k, we use (1.3a) to produce \(\tilde{u}^{k}\) satisfying the condition (1.3b) with ν=0.95. Note that in this case the general conditions (2.7) and (2.8) are satisfied with c_1=1−ν and c_2=1/2.
We use a modified Armijo rule to choose the parameter β_k. It should be mentioned that, in practical computation, if \(r_{k} :=\beta_{k} \|F(u^{k})-F(\tilde{u}^{k})\|/ \|u^{k}-\tilde{u}^{k}\|\) is too small, convergence becomes slow. Therefore, if r_k ≤ μ=0.4, the trial parameter β_k is enlarged for the next iteration. In the case that F is Lipschitz continuous with constant L>0, by using the above strategies for choosing β_k we still have inf{β_k} ≥ β_L = O(1/L). These ‘refined’ strategies are necessary for fast convergence. Algorithm 1 gives the implementation details.
Algorithm 1
Step 0. Set β_0=1, u^0∈Ω and k=0.
Step 1. \(\tilde{u}^{k}=P_{\varOmega}[u^{k}-\beta_{k} F(u^{k})]\),
  \(r_{k}:= \beta_{k} \|F(u^{k})-F(\tilde{u}^{k})\| / \|u^{k}-\tilde{u}^{k}\|\),
  while r_k > ν
    \(\beta_{k} := 0.7\,\beta_{k}\min\{ 1, 1/r_{k} \}\),  \(\tilde{u}^{k}=P_{\varOmega}[u^{k}-\beta_{k} F(u^{k})]\)
  end(while)
  Use one of the forms (1.3c), (4.1) or (4.2) to update u^{k+1}.
  If r_k ≤ μ then β_k := β_k⋅ν⋅0.9/r_k, end(if)
Step 2. Set β_{k+1}=β_k and k:=k+1; go to Step 1.
The iterations begin with u^0=0 and β_0=1, and stop as soon as
Since both F(u^k) and \(F(\tilde{u}^{k})\) are involved in the recursions of these methods, each iteration of the tested methods requires at least two evaluations of the mapping F. We use No. It and No. F to denote the number of iterations and the number of evaluations of the mapping F, respectively. The size of the tested problems ranges from 500 to 2000.
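A Matlab sketch of Algorithm 1 for the NCP case (so P_Ω(v)=max(v,0)) is given below, covering the three update forms. The stopping test on \(\|u^{k}-\tilde{u}^{k}\|\) and the tolerance handling are illustrative stand-ins for the stopping criterion stated above.

function [u, iter, nF] = alg1(F, u0, method, gamma, tol, maxit)
% Sketch of Algorithm 1; method is 'EG', 'PC1' or 'PC2'.
nu = 0.95; mu = 0.4; beta = 1; u = u0; nF = 0;
proj = @(v) max(v, 0);                       % P_Omega for Omega = R^n_+
for iter = 1:maxit
    Fu = F(u); nF = nF + 1;
    ut = proj(u - beta*Fu);                  % prediction (1.3a)
    if norm(u - ut) <= tol, break; end       % illustrative stopping test
    Fut = F(ut); nF = nF + 1;
    r = beta*norm(Fu - Fut)/norm(u - ut);
    while r > nu                             % enforce (1.3b)
        beta = 0.7*beta*min(1, 1/r);
        ut = proj(u - beta*Fu);
        Fut = F(ut); nF = nF + 1;
        r = beta*norm(Fu - Fut)/norm(u - ut);
    end
    if strcmp(method, 'EG')
        u = proj(u - beta*Fut);              % extragradient update (1.3c)
    else
        d = (u - ut) - beta*(Fu - Fut);      % direction (1.5)
        rho = ((u - ut)'*d)/norm(d)^2;       % step size (2.8)
        if strcmp(method, 'PC1')
            u = u - gamma*rho*d;             % PC Method I (4.1)
        else
            u = proj(u - gamma*rho*beta*Fut);% PC Method II (4.2)
        end
    end
    if r <= mu, beta = beta*nu*0.9/r; end    % enlarge beta for next iteration
end
end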
Comparison between the extragradient method and the PC method II
In the case that the condition (1.3b) is satisfied, replacing γϱ_k in (4.2) by 1 turns PC Method II into the extragradient method. According to the assertions of Theorems 4.1 and 5.1, we take the relaxation factor γ=1.9 in the PC methods. The test results for the three sets of NCPs are given in Tables 4, 5 and 6, respectively.
For the third set of test examples, when the stopping criterion is satisfied we have ∥u^k−u^*∥_∞≈2×10^{−4} for all three tested methods. PC Method II and the extragradient method use the same direction but different step sizes in the correction step. The numerical results show that PC Method II is much more efficient than the extragradient method. Even though the PC methods need to calculate the step size ϱ_k in each iteration, the computational load of this additional effort is significantly less than that of the dominating task (the evaluations of F(u^k) and \(F(\tilde{u}^{k})\)). It is observed that
The different PC methods use one of the geminate directions but the same step size in their correction forms. Between the PC methods, PC Method II needs fewer iterations than PC Method I; this evidence coincides with the assertions of Theorem 4.1 (see (4.5) and (4.6)). Thus, we suggest using PC Method II when the projection onto Ω is easy to carry out. Otherwise (when the projection is the dominating task of an iteration), we recommend PC Method I, because its update form (4.1) does not contain a projection.
7 Conclusions
In a unified framework, we proved the O(1/t) convergence rate of the projection and contraction methods for monotone variational inequalities. The convergence rate is the same as that of the extragradient method; in fact, our convergence rate analysis includes the extragradient method as a special case. The complexity analysis in this paper is based on the general conditions in Definition 2.4 and thus can be extended to a broader class of similar contraction methods. Preliminary numerical results indicate that the PC methods do outperform the extragradient method.
Notes
For convenience, we only consider the distance function in the Euclidean norm. All the results in this paper are easily extended to contraction of the distance function in the G-norm, where G is a positive definite matrix.
A similar type of (small) problem was tested in [21], where the components of the nonlinear mapping D(u) are D_j(u)=c⋅arctan(u_j).
In the paper by Harker and Pang [4], the matrix M=A^TA+B+D, where A and B are the same matrices as those used here, and D is a diagonal matrix with uniformly distributed random entries d_jj∈(0.0,0.3).
In [4], problems similar to those in the first set are called easy problems, while those of the second set are called hard problems.
References
Bertsekas, D.P., Tsitsiklis, J.N.: Parallel and Distributed Computation, Numerical Methods. Prentice-Hall, Englewood Cliffs (1989)
Blum, E., Oettli, W.: Mathematische Optimierung: Grundlagen und Verfahren. Ökonometrie und Unternehmensforschung. Springer, Berlin (1975)
Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems, Vols. I and II. Springer Series in Operations Research. Springer, New York (2003)
Harker, P.T., Pang, J.S.: A damped-Newton method for the linear complementarity problem. Lect. Appl. Math. 26, 265–284 (1990)
He, B.S.: A class of projection and contraction methods for monotone variational inequalities. Appl. Math. Optim. 35, 69–76 (1997)
He, B.S., Liao, L.-Z.: Improvements of some projection methods for monotone nonlinear variational inequalities. J. Optim. Theory Appl. 112, 111–128 (2002)
He, B.S., Xu, M.-H.: A general framework of contraction methods for monotone variational inequalities. Pac. J. Optim. 4, 195–212 (2008)
He, B.S., Yuan, X.M., Zhang, J.J.Z.: Comparison of two kinds of prediction-correction methods for monotone variational inequalities. Comput. Optim. Appl. 27, 247–267 (2004)
He, B.S., Liao, L.-Z., Wang, X.: Proximal-like contraction methods for monotone variational inequalities in a unified framework I: effective quadruplet and primary methods. Comput. Optim. Appl. 51, 649–679 (2012)
He, B.S., Liao, L.-Z., Wang, X.: Proximal-like contraction methods for monotone variational inequalities in a unified framework II: general methods and numerical experiments. Comput. Optim. Appl. 51, 681–708 (2012)
Howard, A.G.: Large margin, transformation learning. PhD Thesis, Graduate School of Arts and Science, Columbia University (2009)
Korpelevich, G.M.: The extragradient method for finding saddle points and other problems. Ekon. Mat. Metod. 12, 747–756 (1976)
Khobotov, E.N.: Modification of the extragradient method for solving variational inequalities and certain optimization problems. USSR Comput. Math. Math. Phys. 27, 120–127 (1987)
Lacoste-Julien, S.: Discriminative machine learning with structure. PhD Thesis, Computer Science, University of California, Berkeley (2009)
Nemirovski, A.: Prox-method with rate of convergence O(1/t) for variational inequality with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15, 229–251 (2005)
Pan, Y.: A game theoretical approach to constrained OSNR optimization problems in optical network. PhD Thesis, Electrical and Computer Engineering, University of Toronto (2009)
Pan, Y., Pavel, L.: Games with coupled propagated constraints in optical networks with multi-link topologies. Automatica 45, 871–880 (2009)
Sha, F.: Large margin training of acoustic models for speech recognition. PhD Thesis, Computer and Information Science, University of Pennsylvania (2007)
Solodov, M.V., Tseng, P.: Modified projection-type methods for monotone variational inequalities. SIAM J. Control Optim. 34, 1814–1830 (1996)
Sun, D.: A class of iterative methods for solving nonlinear projection equations. J. Optim. Theory Appl. 91, 123–140 (1996)
Taji, K., Fukushima, M., Ibaraki, I.: A globally convergent Newton method for solving strongly monotone variational inequalities. Math. Program. 58, 369–383 (1993)
Taskar, B., Lacoste-Julien, S., Jordan, M.I.: Structured prediction, dual extragradient and Bregman projections. J. Mach. Learn. Res. 7, 1627–1653 (2006)
Taskar, B., Lacoste-Julien, S., Jordan, M.I.: Structured prediction via extragradient method. In: Weiss, Y., Schoelkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems (NIPS), vol. 18 (2006)
Tseng, P.: On accelerated proximal gradient methods for convex-concave optimization. Manuscript, Department of Mathematics, University of Washington, Seattle (2008)
Xue, G.L., Ye, Y.Y.: An efficient algorithm for minimizing a sum of Euclidean norms with applications. SIAM J. Optim. 7, 1017–1036 (1997)
Acknowledgements
The authors thank X.-L. Fu, M. Li, M. Tao and X.-M. Yuan for the discussion and valuable suggestions.
X. Cai was supported by the MOEC fund 20110091110004. G. Gu was supported by the NSFC grants 11001124 and 91130007. B. He was supported by the NSFC grant 91130007 and the MOEC fund 20110091110004.