1 Introduction

Let \(\mathbb {R}^{n}\) be the n-dimensional Euclidean space, \(C\subset \mathbb {R}^{n}\) be a nonempty closed and convex set and \(F:\mathbb {R}^{n}\rightarrow \mathbb {R}^{n}\) be a continuous mapping. We consider the classical variational inequality problem of finding a vector x ∈ C such that

$$ \langle F(x),u-x\rangle\ge 0,~\forall u\in C, $$
(1)

where 〈⋅,⋅〉 denotes the inner product of \(\mathbb {R}^{n}\). We let

$$ S:= \{x\in C: \langle F(x), u - x \rangle \geq 0, \forall u\in C\} $$

denote the solution set of the variational inequality (1) and let

$$ S_{D}:= \{x\in C: \langle F(u), u - x \rangle \geq 0, \forall u\in C\} $$

denote the solution set of the dual variational inequality of Problem (1).

If C is nonempty, closed and convex and F is continuous or hemicontinuous on C, then the inclusion SD ⊆ S holds; this is known as the Minty lemma. The reverse inclusion S ⊆ SD holds when F is pseudomonotone on C in the sense of Karamardian [1]; i.e., for all x, y ∈ C,

$$ \langle F(x),y-x\rangle\geq 0 \Longrightarrow\langle F(y),y-x\rangle\geq 0. $$

Hence, we obtain that

$$ \begin{array}{@{}rcl@{}} S = S_{D}, \end{array} $$
(2)

whenever C is nonempty, closed and convex and F is continuous and pseudomonotone on C.

From [2, Example 4.2], we see that S = SD may fail when F is merely quasimonotone on C in the sense of Karamardian [1]; i.e., for all x, y ∈ C,

$$ \langle F(x),y-x\rangle> 0 \Longrightarrow\langle F(y),y-x\rangle\geq 0. $$

Recently, an interesting work [3] shows that if \(C\subset \mathbb {R}\) is closed and bounded, then SD ≠ ∅ if and only if F is quasimonotone; see Lemma 3.1 and Proposition 3.1 therein. Moreover, based on Theorem 3.1 and Example 3.1 in [3], we construct high-dimensional variational inequalities that are not quasimonotone yet satisfy SD ≠ ∅ whenever the dimension n ≥ 2; see Example 6.1.

The global convergence of many well-known algorithms is based on the condition S = SD; see, for example, the Goldstein-Levitin-Polyak projection algorithms [4, 5], the proximal point algorithm [6], extragradient projection algorithms [7, 8] and their variants [9, 10], infeasible projection-based algorithms [11,12,13], a self-adaptive projection-based algorithm [14] and the projected reflected gradient algorithm [15] (these algorithms require F to be at least pseudomonotone on C). The condition S = SD is also used to establish global convergence for generalized monotone or nonmonotone variational inequalities and nonmonotone equilibrium problems; see, for example, [16,17,18] and [19]. Moreover, monotonicity or pseudomonotonicity assumptions on the underlying mapping are common in variational inequality and fixed point problems; see, for example, [20,21,22,23,24]. However, if we relax pseudomonotonicity of F to quasimonotonicity, it can happen that \(S_{D} \subsetneqq S\); see, for example, [2, Example 4.2]. So, the assumption S = SD may be unsuitable for quasimonotone or nonmonotone variational inequalities.

There still exist some algorithms for quasimonotone variational inequality problems; see, for example, [25] and [26]. However, the global convergence of these algorithms needs assumptions beyond SD ≠ ∅. Recently, [27] presented an iterative algorithm for quasimonotone variational inequalities whose global convergence is established under the assumptions that SD ≠ ∅ and

$$ \begin{array}{@{}rcl@{}} \{z\in C:F(z)=0\}\backslash S_{D} \text{ is a finite set}. \end{array} $$
(3)

However, from Example 4.4 below, we see that assumption (3) may fail for quasimonotone variational inequalities.

In [2], we proposed a double projection algorithm for nonmonotone variational inequalities. Its global convergence is obtained under the assumptions that SD ≠ ∅, C is closed and convex and F is continuous on C (without any monotonicity of F). This algorithm has been generalized to solve nonmonotone set-valued variational inequalities and nonmonotone equilibrium problems; see, for example, [28,29,30] and [31]. However, the next iterate xk+1 in [2] is generated by projecting a vector onto the intersection of the feasible set C and k + 1 half-spaces. Hence, the computational cost of computing xk+1 increases as k increases. Moreover, the next iterates of the algorithms in [28,29,30] and [31] are all generated in a way similar to [2]. Very recently, [32] proposed an extragradient method for solving variational inequalities without monotonicity.

Inspired by [11,12,13, 33, 34], we present an infeasible projection algorithm (IPA) for Problem (1) with SD ≠ ∅. In IPA, xk+1 is generated by projecting xk onto only one half-space, which is selected from the first k + 1 half-spaces as the one farthest from xk. Moreover, the cost of computing xk+1 is negligible; see Remark 3.1 below. If F is in addition L-Lipschitz continuous, then, by taking suitable parameters, IPA requires only one projection onto the feasible set per iteration (see Proposition 4.1 and Remark 4.3 below). Compared with the algorithm in [27], the global convergence of IPA does not need the extra assumption (3). Moreover, if in addition an error bound condition holds and F is Lipschitz continuous, the convergence rate of IPA is Q-linear (see Theorem 5.1 below). To compare IPA with the algorithms YH of [2] and DHF of [31] (see Section 6), we construct high-dimensional variational inequalities with SD ≠ ∅ and implement these algorithms in Matlab. Numerical experiments show that IPA is much more efficient than YH and DHF in terms of CPU time. Moreover, IPA is less dependent on the initial value than YH and DHF.

The rest of this paper is organized as follows. Some notation and preliminary materials are introduced in Section 2. IPA and its convergence analysis are presented in Section 3. A simplified IPA is introduced in Section 4 for the case where F is in addition Lipschitz continuous on \(\mathbb {R}^{n}\). The convergence rate of IPA is analyzed in Section 5 under an additional error bound assumption. Finally, numerical experiments are reported in Section 6.

2 Preliminaries

In this section, we introduce some properties of the projection mapping and some materials which will be used in the subsequent convergence analysis.

Let ∥⋅∥ denote the Euclidean norm in \(\mathbb {R}^{n}\) and let dist(x, C) denote the Euclidean distance from a vector x to C; i.e.,

$$ \text{dist}(x,C):=\inf\{\|x-y\|:y\in C\}. $$

Let PC(x) denote the orthogonal projection of the vector x onto C; i.e.,

$$ P_{C}(x):= \text{arg min} \{\|y-x\|:y\in C\}. $$

Since C is closed and convex, PC(x) is well defined and unique, and ∥x − PC(x)∥ = dist(x, C). For a fixed \(x\in \mathbb {R}^{n}\) and a positive number μ, we let r(x, μ) denote the natural residual mapping of Problem (1); i.e.,

$$ \begin{array}{@{}rcl@{}} r(x,\mu):=x - P_{C}(x-\mu F(x)). \end{array} $$
(4)
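When C has simple structure, r(x, μ) is available in closed form. The following Matlab sketch assumes C is a box, as in the test problems of Section 6, so that PC is a componentwise clamp; the names projBox and residual are ours, and for a general closed convex C one would substitute an appropriate projection routine.

```matlab
% Natural residual r(x,mu) = x - P_C(x - mu*F(x)) for a box C = [lo,up]^n,
% where P_C clamps each component; lo and up may be scalars or n-vectors.
projBox  = @(y, lo, up) min(max(y, lo), up);
residual = @(x, mu, F, lo, up) x - projBox(x - mu*F(x), lo, up);
```

By Lemma 2.4 below, ∥r(x, μ)∥ = 0 certifies that x solves Problem (1).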

In the following, we recall some well-known properties about the projection mapping and the natural residual mapping.

Lemma 2.1

[36] Let \(C\subset \mathbb {R}^{n}\) be a nonempty closed convex set. Then, the following statements hold.

  1. (a)

    for any fixed \(x\in \mathbb {R}^{n}\), we have

    $$ \begin{array}{@{}rcl@{}} z = P_{C}(x)\iff z\in C\text{ and }\langle z-x,y-z\rangle\geq 0 \text{~ for all ~} y\in C. \end{array} $$
    (5)
  2. (b)

    PC(⋅) is nonexpansive; i.e.,

    $$ \begin{array}{@{}rcl@{}} \|P_{C}(x)-P_{C}(z)\|\leq\|x-z\| \text{~ for all ~} x,z\in \mathbb{R}^{n}. \end{array} $$

Lemma 2.2

[10] Let \(C\subset \mathbb {R}^{n}\) be a nonempty closed convex set and x be any fixed point in \(\mathbb {R}^{n}\). If y = PC(x), then we have

$$ \|y-z\|^{2}\leq\|x - z\|^{2}-\|y - x\|^{2} \text{~ for all ~}z\in C. $$

Lemma 2.3

[37] Let v be a fixed vector and \(H:=\{x\in \mathbb {R}^{n}:\langle u,x\rangle \leq a\}\) be a half-space. Then

$$ \begin{array}{@{}rcl@{}} P_{H}(v) = v -\max\left\{\frac{\langle u,v\rangle - a}{\|u\|^{2}}, 0\right\}u. \end{array} $$

Moreover, if in addition vH, it follows that

$$ \begin{array}{@{}rcl@{}} P_{H}(v) = v - \frac{\langle u,v\rangle - a}{\|u\|^{2}}u. \end{array} $$
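Lemma 2.3 shows that projecting onto a half-space needs no iterative subproblem. A minimal Matlab realization (the function name is ours and is reused in later sketches):

```matlab
% Projection onto the half-space H = {x : <u,x> <= a} (Lemma 2.3).
% If v already lies in H, the max below vanishes and v is returned unchanged.
function p = projHalfspace(v, u, a)
    p = v - max((u.'*v - a)/(u.'*u), 0)*u;
end
```

In particular, dist(v, H) = max{〈u, v〉 − a, 0}/∥u∥, a formula used repeatedly in Section 3.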

Lemma 2.4

[34, 38] Let C be a nonempty closed and convex subset of \(\mathbb {R}^{n}\) and let r(x, μ) be defined in (4). Then the following statements hold.

  1. (a)

x is a solution of Problem (1) if and only if ∥r(x, μ)∥ = 0 for each fixed μ > 0.

  2. (b)

If there exists a number μ > 0 such that ∥r(x, μ)∥ = 0, then x is a solution of Problem (1).

Lemma 2.5

[39] Let r(x, μ) be defined in (4). Then, for any fixed \(x\in \mathbb {R}^{n}\), the following properties hold.

  1. (a)

The function μ↦∥r(x, μ)∥ is nondecreasing on (0, ∞).

  2. (b)

The function \(\mu \mapsto \frac {\|r(x,\mu )\|}{\mu }\) is nonincreasing on (0, ∞).

The following inequality is a direct consequence of Lemma 2.5: for μ ≥ 1, Lemma 2.5(a) gives ∥r(x,1)∥ ≤ ∥r(x,μ)∥ while Lemma 2.5(b) gives ∥r(x,μ)∥ ≤ μ∥r(x,1)∥; the case 0 < μ ≤ 1 is analogous.

$$ \begin{array}{@{}rcl@{}} \min\{1,\mu\}\|r(x,1)\|\leq\|r(x,\mu)\|\leq\max\{1,\mu\}\|r(x,1)\|\text{~ for any fixed ~}\mu >0. \end{array} $$
(6)

Lemma 2.6

[10] Let \({\Omega }\subset \mathbb R^{n}\) be a closed convex set and h be a real-valued function with \(\text {dom} h = \mathbb R^{n}\). Denote \(\widetilde {\Omega }:= \{x\in {\Omega }: h(x)\le 0\}\). If \(\widetilde {\Omega }\) is nonempty and h is Lipschitz continuous on Ω with modulus θ > 0, then

$$ \text{dist}(x,\widetilde{\Omega})\ge \theta^{-1}\max\{h(x),0\} \text{~ for all ~} x\in {\Omega}. $$

Lemma 2.7

[40, 41] Let {γk} and {βk} be nonnegative real sequences satisfying \({\sum }_{k=0}^{\infty } \beta _{k} < \infty \) and γk+1 ≤ γk + βk for all k. Then, the sequence {γk} is convergent.

Before ending this section, we present the following lemma, which collects some sufficient conditions for SD ≠ ∅; see, for example, [2, Proposition 2.1] and [42, Proposition 1].

Lemma 2.8

If one of the following statements holds

  1. (a)

    F is pseudomonotone on C and S ≠ ∅;

  2. (b)

F is the gradient of f, where f is a differentiable quasiconvex function on an open set \(K\supset C\) that attains its global minimum on C;

  3. (c)

    F is quasimonotone on C, F≠ 0 and C is bounded;

  4. (d)

    F is quasimonotone on C, F≠ 0 on C and there exists a positive number r such that, for every xC with ∥x∥≥ r, there exists yC such that ∥y∥≤ r and 〈F(x), yx〉≤ 0;

  5. (e)

F is quasimonotone on C and S∖ST ≠ ∅, where ST := {x ∈ C : 〈F(x), y − x〉 = 0 for all y ∈ C};

  6. (f)

F is quasimonotone on C, int C is nonempty and there exists x ∈ S such that F(x) ≠ 0,

then SD is nonempty.

3 IPA and its convergence analysis

To solve the nonmonotone Problem (1) with SD ≠ ∅, the next iterate in [2] is generated by projecting a vector onto the intersection of the feasible set C and k + 1 half-spaces. In this section, we present a new infeasible projection-based algorithm (IPA) for Problem (1) without monotonicity. The next iterate of IPA is generated by projecting the current iterate onto only one half-space, which is selected from the first k + 1 half-spaces as the one farthest from the current iterate. We first present IPA as Algorithm 1 below. Then, we show the well-definedness and the global convergence of IPA.

Algorithm 1 (IPA). [The algorithm is displayed as an image in the original.]
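Since Algorithm 1 is displayed only as an image in the original, the following Matlab sketch reconstructs its main loop from the surrounding text (Remark 3.1, Lemmas 3.2-3.3 and Theorem 3.4): compute zk = PC(xk − αkF(xk)) with αk from the Armijo-type line search (11) (the linesearch helper is sketched after Lemma 3.2 below), form the half-space Hk = {v : hk(v) ≤ 0} with hk(v) = 〈xk − zk − αk(F(xk) − F(zk)), v − zk〉, and project xk onto the farthest of H0, …, Hk. The option names and the simple trial-step update are illustrative only; the paper's actual initial step-size rule is (30).

```matlab
function x = IPA(F, projC, x0, opts)
% Sketch of IPA (Algorithm 1), reconstructed from the text; not verbatim.
% opts fields (illustrative): eta, sigma in (0,1), tol, amin, amax, maxit.
x = x0; U = []; a = []; alpha0 = 1;
for k = 0:opts.maxit
    [alpha, z] = linesearch(F, projC, x, alpha0, opts.eta, opts.sigma);
    r = x - z;                                  % r(x^k, alpha_k)
    if norm(r) <= opts.tol, return; end         % stopping rule of Section 6
    u = r - alpha*(F(x) - F(z));                % normal vector of H_k
    U = [U; u.'];  a = [a; u.'*z];              % H_k = {v : <u,v> <= <u,z^k>}
    d = max(U*x - a, 0)./sqrt(sum(U.^2, 2));    % dist(x^k,H_i), i <= k (Lemma 2.3)
    [~, imax] = max(d);                         % farthest half-space \hat H_k
    x = projHalfspace(x, U(imax,:).', a(imax)); % x^{k+1} = P_{\hat H_k}(x^k)
    alpha0 = min(max(alpha, opts.amin), opts.amax);  % next trial step (simplified)
end
end
```

Note that the distances to all stored half-spaces are computed in a single vectorized (hence trivially parallel) pass, which is the point of Remark 3.1 below.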

Remark 3.1

The line search procedure of IPA is inspired by [11]. The initial step-size \({\alpha _{k}^{0}}\) therein is suggested as αk−1. This together with the fact η ∈ (0, 1) shows that {αk} is nonincreasing. In this paper, we relax the range of \({\alpha _{k}^{0}}\) to a closed bounded interval bounded away from zero. Thus, we can use the Barzilai-Borwein step-size to accelerate the algorithm (the Barzilai-Borwein step-size is widely used in convex and nonconvex optimization). Moreover, to avoid overly small step-sizes, we enlarge the initial step-size \({\alpha _{k}^{0}}\) by a self-adaptive strategy. So, \({\alpha _{k}^{0}}\) is computed by (30); see also our former work [43].

Choosing, among {H0, ⋯ , Hk}, the half-space farthest from xk needs k + 1 projections of xk, one onto each half-space in {H0, ⋯ , Hk}. Fortunately, these projections can be performed in parallel. Hence, together with Lemma 2.3, the computational cost of this procedure is negligible.

Now, we show the well-definedness of the line search of IPA whenever F is continuous on \(\mathbb {R}^{n}\).

Lemma 3.2

Suppose that F is continuous on \(\mathbb {R}^{n}\). Let \(\alpha \in [\alpha _{\min \limits }, \alpha _{\max \limits }]\) be a fixed number and η ∈ (0,1). Then there exists a nonnegative integer m such that

$$ \begin{array}{@{}rcl@{}} \alpha \eta^{m} \|F(x) - F(P_{C}(x - \alpha \eta^{m} F(x)))\| \leq \sigma\|x - P_{C}(x - \alpha \eta^{m} F(x))\|. \end{array} $$
(11)

Proof

If x ∈ S, then, from Lemma 2.4 (a), we have x = PC(x − αF(x)). Hence, (11) holds at m = 0.

If x ∉ S, then, from Lemma 2.4 (b), we see that ∥r(x, ν)∥ > 0 for any ν > 0. In this case, we claim that inequality (11) holds after finitely many reductions of the trial step. Suppose to the contrary that

$$ \alpha \eta^{m} \|F(x) - F(P_{C}(x - \alpha \eta^{m} F(x)))\| > \sigma\|x - P_{C}(x - \alpha \eta^{m} F(x))\|\text{~ for all ~} m. $$
(12)

Then, we consider the following two cases.

  1. Case A:

If x ∈ C, then we have PC(x) = x. This together with the continuity of both PC(⋅) and F(⋅) on \(\mathbb {R}^{n}\) and the fact η ∈ (0,1) gives

    $$ \begin{array}{@{}rcl@{}} \|F(x) - F(P_{C}(x - \alpha \eta^{m} F(x)))\|\rightarrow 0\text{~ as ~}m\rightarrow \infty. \end{array} $$
    (13)

On the other hand, in view of the fact that η ∈ (0,1) and the second statement of Lemma 2.5, for sufficiently large m, we see that

    $$ \sigma\frac{\|x - P_{C}(x - \alpha \eta^{m} F(x))\|}{\alpha \eta^{m}} = \sigma \frac{\|r(x,\alpha \eta^{m})\|}{\alpha \eta^{m}} \geq \sigma\frac{\|r(x,1)\|}{1}>0, $$
    (14)

    where the last inequality holds due to the fact xS and Lemma 2.4(b). Consequently, (13) and (14) contradict (12).

  2. Case B:

If x ∉ C, then we have σ∥x − PC(x)∥ > 0. Using this together with the continuity of F(⋅) and PC(⋅), it follows that \(\alpha \eta ^{m} \|F(x) - F(P_{C}(x - \alpha \eta ^{m} F(x)))\|\rightarrow 0\) and \(\sigma \|{x - P_{C}(x - \alpha \eta ^{m} F(x))}\|\rightarrow \sigma \|{x-P_{C}(x)}\|>0\) as \(m \rightarrow \infty \). However, this contradicts (12).

Hence, we see that (11) holds for some finite nonnegative integer m. □
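In code, the line search of Lemma 3.2 is a plain backtracking loop. The following Matlab helper, referenced in the IPA sketch above, is a minimal realization of (11); Lemma 3.2 guarantees that the loop terminates when F is continuous.

```matlab
function [alpha, z] = linesearch(F, projC, x, alpha0, eta, sigma)
% Backtracking realization of (11): try alpha = alpha0*eta^m, m = 0,1,2,...,
% until alpha*||F(x)-F(z)|| <= sigma*||x-z||, where z = P_C(x - alpha*F(x)).
alpha = alpha0;
z = projC(x - alpha*F(x));
while alpha*norm(F(x) - F(z)) > sigma*norm(x - z)
    alpha = eta*alpha;
    z = projC(x - alpha*F(x));
end
end
```

If x already solves Problem (1), then z = x, both sides of the test vanish and the loop exits at once, in accordance with the first case of the proof.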

Next, we use the following lemma to show that the half-space Hk := {v : hk(v) ≤ 0}, where hk(v) := 〈xk − zk − αk(F(xk) − F(zk)), v − zk〉 as in Algorithm 1, strictly separates the current iterate xk from SD.

Lemma 3.3

Suppose that F is continuous on \(\mathbb {R}^{n}\) and SD ≠ ∅. Let hk be defined as above and let {xk} be the infinite sequence generated by IPA. Then, for any fixed x∗ ∈ SD, we have

$$ \begin{array}{@{}rcl@{}} h_{k}(x^{\ast})\leq 0 \text{~ and ~} h_{k}(x^{k})\geq (1-\sigma)\|r(x^{k},\alpha_{k})\|^{2} > 0\text{~ for all ~}k. \end{array} $$
(15)

Moreover, we have \(S_{D}\subseteq H_{k}\) and xk ∉ Hk for all k.

Proof

Let x∗ ∈ SD. Invoking the definition of hk, we see that

$$ \begin{array}{@{}rcl@{}} h_{k}(x^{*})&=&\langle x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k})),x^{*}-z^{k}\rangle \\ &=& \langle x^{k}-z^{k}-\alpha_{k}F(x^{k}),x^{*}-z^{k}\rangle+\alpha_{k}\langle F(z^{k}),x^{*}-z^{k}\rangle\\ &\leq& \langle x^{k}-z^{k}-\alpha_{k}F(x^{k}),x^{*}-z^{k}\rangle\leq 0, \end{array} $$

where the first inequality holds from the definition of SD and the fact that zk ∈ C, and the last inequality holds from (5), the fact zk = PC(xk − αkF(xk)) and the fact x∗ ∈ C. This proves the first relation of (15).

Next, from the definition of hk, we see that

$$ \begin{array}{@{}rcl@{}} h_{k}(x^{k})& = &\langle x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k})),x^{k}-z^{k}\rangle\\ & = &\|x^{k}-z^{k}\|^{2}-\alpha_{k}\langle F(x^{k})-F(z^{k}),x^{k}-z^{k}\rangle\\ & = &\|r(x^{k},\alpha_{k})\|^{2} - \alpha_{k} \langle F(x^{k}) - F(z^{k}),r(x^{k},\alpha_{k})\rangle\\ &\geq& \|r(x^{k},\alpha_{k})\|^{2} - \sigma\|r(x^{k},\alpha_{k})\|^{2} = (1-\sigma)\|r(x^{k},\alpha_{k})\|^{2}>0, \end{array} $$

where the inequality holds from the Cauchy-Schwarz inequality together with the line search criterion (cf. (11)), and the final strict inequality holds because the sequence is infinite, so that ∥r(xk, αk)∥ > tol > 0. This proves the second relation of (15).

Hence, from (15) and the definition of Hk, we further obtain that x∗ ∈ Hk and xk ∉ Hk for all k. This completes the proof. □

Before analyzing the convergence of the sequence {xk} generated by IPA, we study the properties of the sequences {xk}, {zk} and {αk} generated by IPA.

Theorem 3.4

Suppose that F is continuous on \(\mathbb {R}^{n}\) and SD ≠ ∅. Let σ and η be given in IPA and let {xk}, {zk} and {αk} be the infinite sequences generated by IPA. Then the following statements hold.

  1. (a)

    For any fixed \(x \in \bigcap _{i=0}^{\infty }H_{i}\), the sequence {∥xkx∥} is convergent. Moreover, the sequences {xk}, {F(xk)}, {zk} and {xkzkαk(F(xk) − F(zk))} are all bounded. If F is in addition Lipschitz continuous with modulus L, one has

    $$ \begin{array}{@{}rcl@{}} \inf\limits_{k>0}{\alpha_{k}}\geq \min\{\alpha_{min}, \eta \frac{\sigma}{L}\}>0. \end{array} $$
    (16)
  2. (b)

    For any fixed \(x\in \bigcap _{i=0}^{\infty } H_{i}\), it holds that

$$ \begin{array}{@{}rcl@{}} 0<\text{dist}^{2}(x^{k}, H_{k}) \leq \text{dist}^{2}(x^{k}, \hat H_{k}) \leq \|x^{k} - x\|^{2} - \|x^{k+1} - x\|^{2} \text{ for all }k. \end{array} $$
    (17)

    Moreover, we have \(\lim \limits _{k\rightarrow \infty }\text {dist}(x^{k}, \hat H_{k})=0\).

  3. (c)

    For any cluster point \(\bar {x}\) of the sequence {xk}, it holds that \(\bar {x} \in \bigcap _{i=0}^{\infty }H_{i}\).

  4. (d)

    (Global subsequential convergence) Any cluster point of the infinite sequence {xk} is a solution of Problem (1).

Proof

  1. (a)

From the fact SD ≠ ∅ and the inclusion \(S_{D}\subset \bigcap _{i=0}^{\infty }H_{i}\) of Lemma 3.3, we have \(\bigcap _{i=0}^{\infty }H_{i}\neq \emptyset \). Moreover, together with the fact \(\bigcap _{i=0}^{\infty }H_{i} \subset \hat H_{k}\) for any k and the update \(x^{k+1} = P_{\hat H_{k}}(x^{k})\) in Algorithm 1, we see that, for any fixed \(x\in \bigcap _{i=0}^{\infty }H_{i}\),

    $$ \|x^{k+1} - x \|=\|P_{\hat H_{k}}(x^{k}) - x \| = \|P_{\hat H_{k}}(x^{k}) - P_{\hat H_{k}}(x) \| \leq \|x^{k} - x \|\text{ for any }k. $$
    (18)

Using this together with Lemma 2.7, we obtain that, for any fixed \(x\in \bigcap _{i=0}^{\infty }H_{i}\), the sequence {∥xk − x∥} is convergent. Moreover, {xk} is bounded. Hence, by the continuity of F, we see that {F(xk)} is bounded. Combining this with the fact zk = PC(xk − αkF(xk)), the bound \(0< \alpha _{k} \leq \alpha _{\max }\) and the continuity of PC(⋅), we further obtain that {zk} is bounded. Similarly, {F(zk)} and {xk − zk − αk(F(xk) − F(zk))} are bounded.

Now, we show that \( \inf \limits _{k>0}{\alpha _{k}}>0\) when F is Lipschitz continuous. Let L > 0 be the Lipschitz modulus of F. Then, the line search criterion (cf. (11)) holds whenever \(\alpha \leq \frac {\sigma }{L}\). Hence, from the definition of αk, we deduce that either \(\alpha _{k} = {\alpha _{k}^{0}}\) (if \({\alpha _{k}^{0}}\leq \frac {\sigma }{L}\)) or \(\alpha _{k} > \eta \frac {\sigma }{L}\). Using this together with the fact \(0<\alpha _{min}\leq {\alpha _{k}^{0}}\), we have

$$ \alpha_{k}\geq \min\{\alpha_{min}, \eta \frac{\sigma}{L}\}>0 \text{ \ for all }\ k. $$
  1. (b)

The first inequality holds from the fact xk ∉ Hk in Lemma 3.3, the second inequality holds from the definition of \(\hat H_{k}\) as the half-space farthest from xk, and the third inequality holds from the fact \(\bigcap _{i=0}^{\infty } H_{i} \subseteq \hat H_{k}\) for any k, Lemma 2.2 and the update \(x^{k+1} = P_{\hat H_{k}}(x^{k})\). This completes the proof of (17). Note that the right-hand side of (17) is summable. Then, we have \(\lim \limits _{k\rightarrow \infty }\text {dist}(x^{k}, \hat H_{k})=0\).

  2. (c)

    From the definition of \(\hat H_{k}\), we have

    $$ 0\leq \text{dist}(x^{k}, H_{i}) \leq \text{dist}(x^{k}, \hat H_{k})\text{ for all }i \leq k. $$

    This together with Theorem 3.4(b) implies that

    $$ \lim_{k\rightarrow\infty}\text{dist}(x^{k}, H_{i})= 0\text{ for any fixed }i. $$

    Using this together with the fact dist(⋅, Hi) is continuous on \(\mathbb {R}^{n}\) and the fact \(\bar x\) is a cluster point of {xk}, we further obtain that \(\text {dist}(\bar x, H_{i}) = 0\) for any fixed i. This completes the proof.

  3. (d)

    Let \(\bar x\) be a cluster point of the sequence {xk} and \(I\subset \mathbb {N}\) be an index set such that \(\lim _{k\rightarrow \infty ,k\in I}x^{k}= \bar x\).

We first show that \(\lim _{k\rightarrow \infty }\|r(x^{k},\alpha _{k})\|= 0\). To this end, we recall from (a) that there exists M1 > 0 such that ∥xk − zk − αk(F(xk) − F(zk))∥ ≤ M1 for all k. Using this together with the definition of hk, we see that M1 is a Lipschitz modulus of hk for all k. Moreover, from Lemma 2.6 and (15), we have

$$ \text{dist}(x^{k},H_{k})\geq M_{1}^{-1} h_{k}(x^{k})\geq M_{1}^{-1}(1-\sigma)\|r(x^{k},\alpha_{k})\|^{2} > 0. $$

This together with (b) gives

$$ \begin{array}{@{}rcl@{}} \displaystyle{\lim_{k\rightarrow\infty}}{\|r(x^{k},\alpha_{k})\|} = \lim_{k\rightarrow\infty} \|x^{k}-z^{k}\|=0. \end{array} $$
(19)

Next, we show that there exists an index set \(J\subseteq I\) such that

$$ \begin{array}{@{}rcl@{}} \displaystyle{\lim_{k\rightarrow\infty,k\in J}}{\|r(x^{k},1)\|} = 0. \end{array} $$
(20)

To this end, we denote \(\widetilde \alpha :=\inf _{i\in I}\{\alpha _{i}\}\) and consider the following two cases.

  1. Case 1:

If \(\widetilde \alpha > 0\), then \(\widetilde \alpha \leq \alpha _{i}\) for all i ∈ I. Using this together with (6), we see that

    $$ \begin{array}{@{}rcl@{}} 0\leq\|r(x^{i},1)\|\leq\frac{\|r(x^{i},\alpha_{i})\|}{\min\{\alpha_{i},1\}}\leq\frac{\|r(x^{i},\alpha_{i})\|}{\min\{\widetilde\alpha,1\}}, \forall i\in I. \end{array} $$

    Hence, together with (19), we have \(\lim _{i\rightarrow \infty ,i\in I}\|r(x^{i},1)\|=0\). By taking J = I, we see that (20) holds.

  2. Case 2:

If \(\widetilde \alpha =0\), then there exists an index set \(J\subseteq I\) such that \(\lim _{i\rightarrow \infty ,i\in J}\alpha _{i}=0\). This together with the line search rule of Algorithm 1 implies that, for sufficiently large i ∈ J,

    $$ \alpha_{i} \eta^{-1}\left\|F(x^{i}) - F\left( P_{C}\left( x^{i} - \alpha_{i} \eta^{-1}F(x^{i})\right)\right)\right\| > \sigma \|r(x^{i},\alpha_{i} \eta^{-1})\|. $$

    Hence, for sufficiently large iJ, it follows that

    $$ \begin{array}{@{}rcl@{}} \left\|F(x^{i}) - F\left( P_{C}\left( x^{i} - \alpha_{i} \eta^{-1}F(x^{i})\right)\right)\right\| &>& \frac{\sigma \|r(x^{i},\alpha_{i} \eta^{-1})\|}{\alpha_{i} \eta^{-1}}\\&\geq& \frac{\sigma \|r(x^{i},1 )\|}{1}>0, \end{array} $$
    (21)

where the second inequality holds from the second statement of Lemma 2.5 and the fact that \(\lim _{i\rightarrow \infty ,i\in J}\alpha _{i}=0\), and the third inequality holds because xi ∉ S.

    On the other hand, from the triangle inequality and the fact that PC(⋅) is nonexpansive, it holds that

    $$ \begin{array}{@{}rcl@{}} &&\|x^{i} - P_{C}\left( x^{i} - \alpha_{i} \eta^{-1}F(x^{i})\right)\|\\ &&\leq\|r(x^{i},\alpha_{i})\| + \| P_{C}\left( x^{i} - \alpha_{i} F(x^{i})\right) - P_{C}\left( x^{i} - \alpha_{i} \eta^{-1}F(x^{i})\right)\|\\ &&\leq\|r(x^{i},\alpha_{i})\| + \| x^{i} - \alpha_{i} F(x^{i}) - (x^{i} - \alpha_{i} \eta^{-1}F(x^{i}))\|\\ &&=\|r(x^{i},\alpha_{i})\| + \alpha_{i}(\eta^{-1}-1) \|F(x^{i})\|\rightarrow 0 \text{ (as }i\rightarrow \infty,i\in J), \end{array} $$
    (22)

where the limit holds from the fact \(J\subseteq I\), (19), the fact \(\lim _{i\rightarrow \infty ,i\in J}\alpha _{i}=0\) and the boundedness of {F(xi)}. Using this together with the fact \(\lim _{k\rightarrow \infty ,k\in I}x^{k}= \bar x\), the continuity of F and (22), it follows that

    $$ \lim_{i\rightarrow \infty, i\in J}\left\|F(x^{i}) - F\left( P_{C}\left( x^{i} - \alpha_{i} \eta^{-1}F(x^{i})\right)\right)\right\|= 0. $$

    This together with (21) gives (20).

Finally, by using the continuity of ∥r(⋅,1)∥ and passing to the limit in (20) along the index set J, we see that \(\|r(\bar x,1)\|=0\). This together with Lemma 2.4 completes the proof. □

Now, we are ready to show the global convergence of IPA.

Theorem 3.5

(Global sequential convergence) Suppose that F is continuous on \(\mathbb {R}^{n}\) and SD ≠ ∅. Let {xk} be the infinite sequence generated by IPA. Then {xk} is globally convergent to a solution of Problem (1).

Proof

Let x∗ be any fixed cluster point of {xk}. Then, from Theorem 3.4(c), we see that \(x^{*} \in \bigcap _{i=0}^{\infty } H_{i}\). Hence, by replacing x by x∗ in (18), we see that

$$ \|x^{k+1} - x^{*} \| \leq \|x^{k} - x^{*} \| \text{ for all }k. $$

Moreover, from Lemma 2.7, it follows that {∥xk − x∗∥} is convergent. This together with the fact that x∗ is a cluster point of {xk} gives

$$ \lim_{k\rightarrow\infty}x^{k} = x^{*}. $$

This together with Theorem 3.4(d) gives the desired conclusion. □

4 Simplified IPA when underlying mapping is Lipschitz continuous

In Section 3, we presented IPA for the nonmonotone Problem (1); its global convergence was established under the assumption that SD ≠ ∅. In this section, we aim to simplify IPA by omitting its line search procedure whenever F is L-Lipschitz continuous. We call this simplified algorithm IPAL.

We state the simplified algorithm as Algorithm 2 below.

Algorithm 2 (IPAL). [The algorithm is displayed as an image in the original.]
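As with Algorithm 1, the original displays Algorithm 2 only as an image. The following Matlab sketch reconstructs it from Proposition 4.1 and the IPA sketch in Section 3 (reusing the projHalfspace helper from Section 2): IPAL is IPA with the fixed step-size αk = λ, λ ∈ (0, 1/L), so each iteration needs exactly one projection onto C.

```matlab
function x = IPAL(F, projC, x0, lambda, tol, maxit)
% Sketch of IPAL (Algorithm 2): IPA with fixed step-size lambda in (0,1/L).
x = x0; U = []; a = [];
for k = 0:maxit
    z = projC(x - lambda*F(x));                % the only projection onto C
    r = x - z;
    if norm(r) <= tol, return; end
    u = r - lambda*(F(x) - F(z));              % normal vector of H_k
    U = [U; u.'];  a = [a; u.'*z];
    d = max(U*x - a, 0)./sqrt(sum(U.^2, 2));   % dist(x,H_i) via Lemma 2.3
    [~, imax] = max(d);
    x = projHalfspace(x, U(imax,:).', a(imax));
end
end
```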

We use the following proposition to show that IPA reduces to IPAL whenever F is L-Lipschitz continuous.

Proposition 4.1

Suppose that F is L-Lipschitz continuous on \(\mathbb {R}^{n}\). Let \({\alpha _{k}^{0}} = \lambda \) with λ ∈ (0,1/L) and σ = λL. Then, IPA reduces to IPAL.

Proof

It suffices to show that, for each k, αk = λ whenever \({\alpha _{k}^{0}} = \lambda \) with λ ∈ (0,1/L). Invoking the definition of Lipschitz continuity, we see that

$$ \lambda \left\|F(x^{k}) - F\left( P_{C}\left( x^{k} - \lambda F(x^{k})\right)\right)\right\| \leq \lambda L\|x^{k} - P_{C}(x^{k} - \lambda F(x^{k}))\|. $$

Using this together with the facts \({\alpha _{k}^{0}} = \lambda \) and σ = λL, we see that the line search criterion (cf. (11)) holds at mk = 0 for any k. Hence, we obtain that αk = λ for any k. This completes the proof. □

From Proposition 4.1 and Theorem 3.5, we have the following corollary.

Corollary 4.2

Suppose that F is L-Lipschitz continuous on \(\mathbb {R}^{n}\) and SD ≠ ∅. Let {xk} be an infinite sequence generated by IPAL. Then {xk} is globally convergent to a solution of Problem (1).

Remark 4.3

If the Lipschitz modulus of F is known, from Proposition 4.1, we see that, by taking suitable parameters, IPA requires only one projection onto the feasible set per iteration. Moreover, from Lemma 2.3, the computational cost of computing the next iterate in IPA is much lower than that of the algorithm in [2]. Compared with the algorithm in [27], the global convergence of the simplified IPA does not need the extra assumption that {z ∈ C : F(z) = 0}∖SD is a finite set. Moreover, from (23) in the following example, we see that this assumption may fail for quasimonotone variational inequality problems.

Before ending this section, we use the following example to show that the extra assumption of [27] (see assumption (3)) may fail for quasimonotone variational inequality problems.

Example 4.4

Let C = [− 1, 1] × [− 1, 1] and \(F(x) = ({x_{1}^{2}},0)^{T}\) with \(x = (x_{1},x_{2})^{T}\in \mathbb {R}^{2}\). Then, it is routine to check that F is quasimonotone on C. Moreover, we see that

$$ \begin{array}{@{}rcl@{}} S=\{(x_{1},x_{2}):x_{1}=-1 \text{ or } 0, x_{2}\in [-1,1]\} \text{ and } S_{D} = \{(-1,x_{2}): x_{2}\in [-1,1]\}. \end{array} $$

Moreover,

$$ \begin{array}{@{}rcl@{}} \{z\in C: F(z) = 0\}\backslash S_{D} = \{(0,x_{2}): x_{2}\in [-1,1]\}. \end{array} $$
(23)
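Since the set in (23) is infinite, assumption (3) fails for this example even though F is quasimonotone on C and SD ≠ ∅.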

5 Convergence rate of IPA

In this section, we analyze the convergence rate of the sequence generated by IPA. Our analysis is based on the following local error bound: there exist positive numbers c1 and c2 such that

$$ \begin{array}{@{}rcl@{}} \text{dist}(x,S_{D})\leq c_{1} \|r(x,1)\|\text{ for all }x\in \mathbb{R}^{n} \text{ with }\|r(x,1)\|\leq c_{2}, \end{array} $$
(24)

where SD is the solution set of the dual variational inequality of Problem (1). This condition differs from the error bound condition in [11] (see formula (3) therein) in that S in [11] is replaced with SD. However, from (2), we see that (24) coincides with the error bound in [11] if F is in addition pseudomonotone on C. Error bounds play a central role in convergence rate analysis; interested readers can refer to [44,45,46] and the survey paper [47] for sufficient conditions guaranteeing them.

Theorem 5.1

Suppose that F is Lipschitz continuous on \(\mathbb {R}^{n}\) with modulus L and SD ≠ ∅. Let {xk} be the infinite sequence generated by IPA. Then, the following statements hold.

  1. (a)

    There exist t1 > 0 and t2 > 0 such that

$$ \begin{array}{@{}rcl@{}} t_{1}\|r(x^{k},1)\|\leq\text{dist} (x^{k}, H_{k})\leq t_{2} \|r(x^{k},1)\|\text{ for all }k. \end{array} $$
    (25)
  2. (b)

If in addition (24) holds, the convergence of the sequence generated by IPA is Q-linear; i.e., there exists \(\tilde c \in [0,1)\) such that

$$ \begin{array}{@{}rcl@{}} \text{dist} (x^{k+1}, S_{D})\leq \tilde c\cdot \text{dist} (x^{k},S_{D})\text{ for all sufficiently large }k. \end{array} $$
    (26)

Proof

  1. (a)

From Lemma 3.3, we have xk ∉ Hk for all k. Using this together with Lemma 2.3, we see that

$$ \begin{array}{@{}rcl@{}} P_{H_{k}}(x^{k}) = x^{k}-\frac{\langle x^{k} - z^{k} - \alpha_{k}(F(x^{k})-F(z^{k})), x^{k}-z^{k}\rangle}{\|x^{k}-z^{k} - \alpha_{k}(F(x^{k}) - F(z^{k}))\|^{2}}\left(x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\right). \end{array} $$
    (27)

Moreover, it follows that

$$ \begin{array}{@{}rcl@{}} \text{dist}(x^{k},H_{k})& = &\left\|\frac{\langle x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k})),x^{k}-z^{k}\rangle}{\|x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\|^{2}}\left(x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\right)\right\|\\ & = & \frac{\left|\|x^{k}-z^{k}\|^{2} - \alpha_{k}\langle F(x^{k})-F(z^{k}),x^{k}-z^{k}\rangle\right|}{\|x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\|} \\ &\geq& \frac{\|x^{k}-z^{k}\|^{2} - \alpha_{k}\left|\langle F(x^{k})-F(z^{k}),x^{k}-z^{k}\rangle\right|}{\|x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\|}\\ &\geq& \frac{\|x^{k}-z^{k}\|^{2} - \alpha_{k}\|F(x^{k})-F(z^{k})\|\|x^{k}-z^{k}\|}{\|x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\|}\\ &\geq& \frac{\|x^{k}-z^{k}\|^{2} - \sigma\|x^{k}-z^{k}\|^{2}}{\|x^{k}-z^{k}-\alpha_{k}(F(x^{k})-F(z^{k}))\|}\\ &\geq& \frac{\|x^{k}-z^{k}\|^{2} - \sigma\|x^{k}-z^{k}\|^{2}}{\|x^{k}-z^{k}\|+ \alpha_{k}\|F(x^{k})-F(z^{k})\|}\geq \frac{\|x^{k}-z^{k}\|^{2} - \sigma\|x^{k}-z^{k}\|^{2}}{\|x^{k}-z^{k}\|+ \sigma\|x^{k}-z^{k}\|} \\ &=& \frac{1-\sigma}{1+\sigma}\|x^{k}-z^{k}\| = \frac{1-\sigma}{1+\sigma}\|r(x^{k},\alpha_{k})\| \geq \frac{1-\sigma}{1+\sigma}\min\{1,\alpha_{k}\}\|r(x^{k},1)\|\\ && \geq \frac{1-\sigma}{1+\sigma}\min\{1,\alpha_{min}, \eta \frac{\sigma}{L}\}\|r(x^{k},1)\|, \end{array} $$

where the first inequality holds since |a − b| ≥ a − |b| for real numbers a, b, the second inequality holds from the Cauchy-Schwarz inequality, the third and the fifth inequalities hold from the line search criterion (cf. (11)), the fourth inequality holds from the triangle inequality, the sixth inequality holds from (6) and the last inequality holds from (16) (here σ < 1 ensures that the numerators are positive). Let \(t_{1} = \frac {1-\sigma }{1+\sigma }\min \{1,\alpha _{min}, \eta \frac {\sigma }{L}\}\). Then, the first inequality of (25) holds.

Next, from (27), we see that

$$ \begin{array}{@{}rcl@{}} &&\text{dist}(x^{k},H_{k})\leq \frac{\|x^{k}-z^{k}\|^{2} + \sigma\|x^{k}-z^{k}\|^{2}}{\|x^{k}-z^{k}\|- \sigma\|x^{k}-z^{k}\|}= \frac{1+\sigma}{1-\sigma}\|r(x^{k},\alpha_{k})\|\\ &&\leq \frac{1+\sigma}{1-\sigma}\max\{1,\alpha_{k}\}\|r(x^{k},1)\|\leq \frac{1+\sigma}{1-\sigma}\max\{1,\alpha_{\max}\}\|r(x^{k},1)\|, \end{array} $$

where the first inequality holds from the fact αkF(xk) − F(zk)∥≤ σxkzk∥, the second inequality holds from (6) and the last inequality holds from the facts \(0<{\alpha _{k}^{0}}\leq \alpha _{\max \limits }\) and 0 < η < 1. Let \(t_{2} = \frac {1+\sigma }{1-\sigma }\max \limits \{1,\alpha _{\max \limits }\}\). Then, the second inequality of (25) holds.

  1. (b)

Note from the representation \(S_{D} = C\cap \bigcap _{y\in C}\{x: \langle F(y), y - x\rangle \geq 0\}\) that SD is closed and convex. Using this together with the assumption SD ≠ ∅, we see that there exists a unique vector \(\bar x^{k}\in S_{D}\) such that

    $$ \begin{array}{@{}rcl@{}} \bar x^{k}: = P_{S_{D}}(x^{k}) \text{ for each }k. \end{array} $$
    (28)

We first claim that the following inequality holds

$$ \begin{array}{@{}rcl@{}} \text{dist}^{2}(x^{k+1}, S_{D})\leq \text{dist}^{2}(x^{k}, S_{D}) - {t_{1}^{2}} \|r(x^{k},1)\|^{2}. \end{array} $$
(29)

To this end, recalling from Lemma 3.3 that \(S_{D}\subset \bigcap _{k=0}^{\infty } H_{k}\) and applying (17) with x = \(\bar x^{k}\), we see that

$$ \|x^{k+1}- \bar x^{k}\|^{2} \leq \|x^{k}-\bar x^{k}\|^{2} - \text{dist}^{2}(x^{k}, H_{k}). $$

Using this, we obtain that

$$ \begin{array}{@{}rcl@{}} &&\text{dist}^{2}(x^{k+1}, S_{D})\leq \text{dist}^{2}(x^{k+1}, \bar x^{k})\leq \text{dist}^{2}(x^{k}, \bar x^{k}) - \text{dist}^{2}(x^{k}, H_{k})\\ &&= \text{dist}^{2}(x^{k}, S_{D}) - \text{dist}^{2}(x^{k}, H_{k})\leq \text{dist}^{2}(x^{k}, S_{D}) - {t_{1}^{2}} \|r(x^{k},1)\|^{2}, \end{array} $$

where the first equality holds from (28) and the last inequality holds from (25).

Now, we are ready to show our main conclusion.

From Theorem 3.4(b), we see that \(\lim \limits _{k\rightarrow \infty }\text {dist}(x^{k}, \hat H_{k})=0\). Using this together with the first relation of (25), it follows that \(\lim \limits _{k\rightarrow \infty }\|r(x^{k},1)\|=0\). Hence, for the number c2 > 0 in (24), there exists an integer N > 0 such that ∥r(xk,1)∥ ≤ c2 for all k ≥ N. This together with (29) and (24) gives

$$ \begin{array}{@{}rcl@{}} \text{dist}^{2}\left( x^{k+1}, S_{D}\right)\leq \left( 1-\frac{{t_{1}^{2}}}{{c_{1}^{2}}}\right)\text{dist}^{2}\left( x^{k}, S_{D}\right)\text{ for all } k\geq N. \end{array} $$

Since \(\text {dist}^{2}\left (x^{k+1}, S_{D}\right ) \geq 0\), we must have \(1-\frac {{t_{1}^{2}}}{{c_{1}^{2}}}\geq 0\). Let \(\tilde c = \sqrt {1-\frac {{t_{1}^{2}}}{{c_{1}^{2}}}}\). Then (26) holds. □

6 Numerical experiments

In this section, we test IPA, Algorithm 2.1 in [2] (YH for short) and Algorithm 2 in [31] (DHF for short) on high-dimensional nonmonotone variational inequality problems. All codes are written in Matlab and run in Matlab 2015a on a notebook with an Intel(R) Core(TM) i7-5500U CPU (2.40 GHz) and 8 GB of RAM.

In IPA, we take \(\alpha _{\min }=10^{-10}\), \(\alpha _{\max }=10^{10}\), η = 0.5 and σ = 0.99. Inspired by the renowned Barzilai-Borwein step-size and self-adaptive step-size strategies, we take the parameter \({\alpha _{k}^{0}}\) as follows: set \({\alpha _{0}^{0}} = 1\) and, for k ≥ 1,

$$ {\alpha_{k}^{0}} = \begin{cases} P_{[10^{-10},10^{10}]}\left( \frac{\|x^{k} - x^{k-1}\|^{2}}{\langle x^{k} - x^{k-1},F(x^{k}) - F(x^{k-1})\rangle}\right)& \text{if}\ \langle x^{k} - x^{k-1},F(x^{k}) - F(x^{k-1})\rangle \!>\! 10^{-12},\\ P_{[10^{-10},10^{10}]}(1.5\alpha_{k-1}) & \text{otherwise}, \end{cases} $$
(30)

which is similar to the rule in our former work [43]. In YH, we take γ = 0.4 and σ = 0.99. In DHF, we take the parameters proposed in [31, Table 5], i.e., \(\gamma _{k} = \frac {k+0.1}{30(k+1)}\), θ = 0.98 and ρ = 0.5.
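For concreteness, here is a Matlab sketch of rule (30); the variables xk, xkm1 and alpha_prev, holding xk, xk−1 and αk−1, are assumed to be in scope, and the name clamp is ours:

```matlab
% Rule (30): safeguarded Barzilai-Borwein trial step with a self-adaptive
% enlargement 1.5*alpha_{k-1} when the curvature test fails.
clamp = @(t) min(max(t, 1e-10), 1e10);    % projection onto [1e-10, 1e10]
s = xk - xkm1;  y = F(xk) - F(xkm1);
if s.'*y > 1e-12
    alpha0 = clamp((s.'*s)/(s.'*y));      % Barzilai-Borwein step
else
    alpha0 = clamp(1.5*alpha_prev);       % self-adaptive enlargement
end
```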

Example 6.1

Let

$$ \begin{array}{@{}rcl@{}} C=[-1,1]^{n} \text{ and } F: (x_{1},\dots,x_{n}) \mapsto ({x_{1}^{2}},\cdots,{x_{n}^{2}}). \end{array} $$

Then, it is routine to check that \(S_{D} = \{(-1,\dots ,-1)\}\) and \(S = \{(x_{1},\dots ,x_{n}):x_{i} \in \{-1,0\},\forall i \}\). Moreover, from the fact that \(x\mapsto x^{2}\) is quasimonotone on [− 1,1], [3, Theorem 3.1] and [3, Example 3.1], we see that F is not quasimonotone on C whenever n ≥ 2.
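With the sketches from the previous sections, Example 6.1 can be set up in a few lines of Matlab; the dimension n = 1000 and the particular initial point are for illustration only:

```matlab
% Example 6.1: C = [-1,1]^n, F(x) = (x_1^2, ..., x_n^2).
n = 1000;                                  % illustrative dimension
F = @(x) x.^2;
projC = @(y) min(max(y, -1), 1);           % projection onto [-1,1]^n
x0 = 0.3*ones(n,1);                        % the initial point with i = 3 below
opts = struct('eta',0.5, 'sigma',0.99, 'tol',1e-6, ...
              'amin',1e-10, 'amax',1e10, 'maxit',10000);
x = IPA(F, projC, x0, opts);               % converges to a solution of (1)
```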

In Example 6.1, the termination criterion for YH and DHF is tol = 10− 5 and the termination criterion for IPA is tol = 10− 6; i.e., YH and DHF terminate whenever ∥r(x, 1)∥ ≤ 10− 5 and IPA terminates whenever ∥r(x, α)∥ ≤ 10− 6. We first generate 9 initial points x0 by setting

$$ x^{0}: = 0.1*i*ones(n,1), i\in \{1,2,3,\dots,9\}, $$

where \(ones(n,1)\in \mathbb {R}^{n\times 1}\) is the vector with all components equal to 1. Next, for a fixed dimension n, we run YH, DHF and IPA on Example 6.1 once from each of the 9 initial points. Finally, we report the number of iterations (iter), the number of projections (np), the CPU time (in seconds) and the distance from the final iterate to S (disxS) in Table 1, averaged over the 9 initial points.

Table 1 Results for Example 6.1

Example 6.2

In this example, we set

$$ \begin{array}{@{}rcl@{}} C=[0,1]^{n} \text{ and } G: (x_{1},\dots,x_{n}) \mapsto ({x_{1}^{2}} - x_{1},\cdots, {x_{n}^{2}} - x_{n}). \end{array} $$

Then, we can check that \(S_{D} = \{(1,\dots ,1)\}\) and \(S = \{(x_{1},\dots ,x_{n}):x_{i} \in \{0,1\},\forall i \}\). Let I be the identity mapping. Then, with F as in Example 6.1, we have G = F − I.
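In the same vein as the sketch for Example 6.1, only the mapping and the box change here (again a sketch):

```matlab
% Example 6.2: C = [0,1]^n, G(x) = (x_1^2 - x_1, ..., x_n^2 - x_n) = F(x) - x.
G = @(x) x.^2 - x;
projC = @(y) min(max(y, 0), 1);            % projection onto [0,1]^n
```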

In Example 6.2, the termination criterion for YH and DHF is tol = 10− 4 and the termination criterion for IPA is tol = 10− 6. We first use YH, DHF and IPA to solve Example 6.2 with n = 1 and initial points \(x^{0}\in \{0.1, 0.2, 0.3,\dots ,0.9\}\). We let ∖ denote that an algorithm fails to find an approximate solution within 2 minutes of CPU time. We report iter, np, the CPU time (in seconds) and disxS in Table 2.

Table 2 Results for Example 6.2 with n = 1

Based on Table 2, we test DHF and IPA on the high-dimensional Example 6.2 with the initial point belonging to the following set

$$ \begin{array}{@{}rcl@{}} \left\{ 0.1*i*ones(n,1): i\in \{4,5,6,\dots,9\}\right\}. \end{array} $$
(31)

We report iter, np, the CPU time (in seconds) and disxS in Table 3, averaged over the 6 initial points.

Table 3 Results for Example 6.2 with initial point choose in (31)

Remark 6.3

From Tables 1, 2 and 3, we see that YH, DHF and IPA all have the ability to find a solution of a nonmonotone VI. Obviously, IPA is much more efficient than YH and DHF in terms of CPU time. Moreover, from Table 2, we see that IPA is less dependent on the initial value than YH and DHF.