1 Introduction

Let H be a real Hilbert space and C be a nonempty, closed and convex subset of H. Given a bifunction \(f:C\times C\rightarrow \mathfrak {R}\), the equilibrium problem with respect to f and C, in the sense of Blum and Oettli [7, 31], is formulated as follows:

$$\begin{aligned} \text{ Find }~ x^*\in C~\text{ such } \text{ that }~f(x^*,y)\ge 0,\quad \forall y\in C. \end{aligned}$$
(EP)

The solution set of problem (EP) is denoted by EP(f, C). The problem is also known as the Ky Fan inequality [10], in recognition of his early contributions to this field. Mathematically, problem (EP) unifies various optimization problems, variational inequalities, fixed point problems, Nash equilibrium problems and many others, see e.g., [7, 11, 18, 19, 22]. Owing to this broad applicability, during the last 20 years problem (EP) has received great interest from many authors, who have studied the existence of solutions as well as developed iterative schemes for solving it, see for example [4, 12, 16, 17, 23, 26, 28, 30, 32, 35, 38].

One of the most popular methods for solving problem (EP) is the proximal point algorithm (PPA). The algorithm was originally introduced by Martinet [25] for solving monotone variational inequalities and later extended by Rockafellar [33] to monotone operators. Other researchers have proposed generalizations and extensions of the PPA. For example, Moudafi [26] proposed a PPA-type method for solving EPs with monotone bifunctions, and non-monotone bifunctions are considered in [21].

Let us recall the proximal point algorithm for solving problem (EP). Choose a starting point \(x_0\in H\). Given the current iterate \(x_n\), find the next iterate \(x_{n+1}\) by solving the subproblem

$$\begin{aligned} f(x_{n+1},y)+\frac{1}{\lambda }\left\langle x_{n+1}-x_n,y-x_{n+1}\right\rangle \ge 0,\quad \forall y\in C. \end{aligned}$$

By introducing the resolvent of the bifunction f (see [9, 26]), defined by

$$\begin{aligned} T_\lambda ^f(x)=\left\{ z\in C:f(z,y)+\frac{1}{\lambda }\left\langle z-x,y-z\right\rangle \ge 0,\quad \forall y\in C\right\} \end{aligned}$$

the PPA can be written in the compact form \(x_{n+1}=T_\lambda ^f(x_n)\), where \(\lambda >0\). It is known that, under suitable conditions, the sequence \(\left\{ x_n\right\} \) generated by the PPA converges weakly to a solution of problem (EP), provided that f is monotone. When f is strongly monotone, the sequence \(\left\{ x_n\right\} \) converges strongly to the unique solution of problem (EP), see, e.g., [8, Theorem 12]. Recently, Iusem and Sosa [20] studied and further analyzed a proximal point method for problem (EP) in a Hilbert space, improving upon previously known convergence results.
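To make the compact form concrete, the following minimal Python sketch runs the PPA in the optimization special case \(f(x,y)=\phi (y)-\phi (x)\) (see Corollary 4.2 below), where \(T_\lambda ^f\) reduces to the proximal mapping of \(\phi \); the particular quadratic \(\phi \), the box constraint set and all parameter values are illustrative assumptions, not part of the general theory. Note that for this optimization-type bifunction \(f(x,y)+f(y,x)=0\), so f is monotone but not strongly monotone.

```python
import numpy as np

# Minimal sketch of the PPA for the optimization-type bifunction
# f(x, y) = phi(y) - phi(x) with phi(y) = 0.5*||y - b||^2 and the box
# C = [-2, 5]^m (both illustrative assumptions).  In this case
# T_lambda^f = prox_{lambda*phi}, with closed form P_C((x + lam*b)/(1 + lam)).

def resolvent(x, b, lam, lo=-2.0, hi=5.0):
    # prox of phi over the box: clip the unconstrained minimizer
    return np.clip((x + lam * b) / (1.0 + lam), lo, hi)

def ppa(x0, b, lam=1.0, n_iters=100):
    # proximal point algorithm: x_{n+1} = T_lambda^f(x_n)
    x = x0.copy()
    for _ in range(n_iters):
        x = resolvent(x, b, lam)
    return x

rng = np.random.default_rng(0)
b = rng.uniform(-4.0, 7.0, size=10)               # some targets lie outside C
x = ppa(rng.uniform(0.0, 1.0, size=10), b)
print(np.linalg.norm(x - np.clip(b, -2.0, 5.0)))  # distance to the solution, ~0
```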

In the light of Antipin’s research [1,2,3, 12], Moudafi [27] introduced the following so-called inertial proximal point algorithm (IPPA): given \(x_{n-1}\), \(x_n\) and two parameters \(\alpha _n\in [0,1)\) and \(r_n>0\), find \(x_{n+1}\in C\) such that

$$\begin{aligned} f(x_{n+1},y)+\frac{1}{r_n}\left\langle x_{n+1}-x_n-\alpha _n(x_n-x_{n-1}),y-x_{n+1}\right\rangle \ge 0,\quad \forall y\in C, \end{aligned}$$

i.e., \(x_{n+1}=T_{r_n}^f(y_n)\) with \(y_n=x_n+\alpha _n(x_n-x_{n-1})\). Clearly, when \(\alpha _n=0\), the IPPA reduces to the original PPA in [26]. Moudafi [27] proved that the sequence \(\left\{ x_n\right\} \) generated by the IPPA converges weakly to some solution of problem (EP) under the assumption

$$\begin{aligned} \sum _{n=1}^\infty \alpha _n||x_{n}-x_{n-1}||^2<+\infty . \end{aligned}$$
(*)

Very recently, the author of [14] reconsidered the IPPA, incorporating the Krasnoselskii–Mann iteration, and proposed a weakly convergent algorithm, namely the inertial-like proximal algorithm, for solving problem (EP). The convergence of the inertial-like algorithm in [14] is established without assuming condition (*). The numerical results in [14] also suggest that the IPPA converges faster than related methods.

It can be seen that the inertial-like algorithms share a common feature: the next iterate \(x_{n+1}\) is constructed from the two previous iterates \(x_{n-1}\) and \(x_n\). This is believed to cause an “inertial effect” which can improve the convergence rate of the original algorithm. Recently, Chbani and Riahi [8] introduced another kind of inertial term, a convex combination of the two previous iterates, i.e., \(y_n=x_n-\alpha _n(x_n-x_{n-1})=(1-\alpha _n)x_n+\alpha _n x_{n-1}\). By combining the proximal point method, their inertial technique and the viscosity method, Chbani and Riahi introduced an inertial algorithm, namely Algorithm (RIPAFAN). Under suitable conditions imposed on the control parameters and the associated cost bifunction, RIPAFAN converges either weakly or strongly. Some other inertial-like algorithms for solving problem (EP) can be found, for example, in [13, 15] and the many references therein.

Motivated and inspired by the aforementioned results, in this paper we propose and study several inertial-like proximal point algorithms (IPPA) with strong convergence properties for solving EPs in a real Hilbert space. First, we consider a class of strongly monotone bifunctions and establish the (at least linear) rate of convergence of the first IPPA. This algorithm requires knowledge of the modulus of strong monotonicity of the cost bifunction. Since this constant might be unknown or difficult to approximate in practice, we also introduce a second variant, named the modified IPPA, which does not require knowledge of this constant; for this algorithm we also provide a convergence theorem and a rate of convergence. Finally, we introduce a so-called Mann-like IPPA for solving a wider class of EPs with monotone bifunctions and, in contrast to the weakly convergent Krasnoselskii–Mann IPPA in [14], we establish its strong convergence. Several numerical results are reported and compared with related methods to illustrate the convergence and advantages of the proposed algorithms.

The rest of this paper is organized as follows: in Sect. 2, we recall some definitions and preliminary results for further use. Sect. 3 presents the algorithms for a class of strongly monotone bifunctions, while Sect. 4 deals with the algorithm for monotone bifunctions. In Sect. 5, several numerical results are reported to illustrate the computational performance of the new algorithms against some existing algorithms.

2 Preliminaries

Let H be a real Hilbert space with the inner product \(\left\langle \cdot ,\cdot \right\rangle \) and the induced norm \(||\cdot ||\). Let C be a nonempty, closed and convex subset of H and \(f: C\times C\rightarrow \mathfrak {R}\) be a bifunction. Throughout this paper, we consider the following standard assumptions imposed on the bifunction f:

(A1) \(f(x,x)=0\) for all \(x\in C\);

(A2a) f is strongly monotone on C, i.e., there exists a constant \(\gamma >0\) such that

$$\begin{aligned} f(x,y)+f(y,x)\le -\gamma ||x-y||^2,\quad \forall x,y\in C; \end{aligned}$$

(A2b) f is monotone on C, i.e.,

$$\begin{aligned} f(x,y)+f(y,x)\le 0,\quad \forall x,y\in C; \end{aligned}$$

(A3) f(., y) is upper-hemicontinuous for each \(y\in C\), i.e., for all \(x,y,z\in C\),

$$\begin{aligned} \limsup _{t\rightarrow 0^+}f(tz+(1-t)x,y)\le f(x,y); \end{aligned}$$

(A4) f(x, .) is convex and lower semi-continuous for each \(x\in C\);

(A5) EP(f, C) is nonempty.

It is obvious that hypothesis (A2a) implies hypothesis (A2b). Under hypotheses (A1), (A2b), (A3) and (A4), the resolvent \(T_\lambda ^f\) of the cost bifunction f is single-valued and \(EP(f,C)=Fix(T_\lambda ^f)\) (see, for example, [9]), where \(Fix(T_\lambda ^f)\) denotes the set of fixed points of \(T_\lambda ^f\). Moreover, the solution set EP(f, C) of problem (EP) is closed and convex.

In any Hilbert space, we have the following result.

Lemma 2.1

[6, Corollary 2.14] For all \(x,y\in H\) and \(\alpha \in \mathfrak {R}\), the following equality holds,

$$\begin{aligned} ||\alpha x+(1-\alpha )y||^2=\alpha ||x||^2+(1-\alpha )||y||^2-\alpha (1-\alpha )||x-y||^2. \end{aligned}$$

The following technical lemmas are useful for the analysis of our algorithms in Sect. 4.

Lemma 2.2

[39] Let \(\left\{ a_n\right\} \) be a sequence of nonnegative real numbers. Suppose that

$$\begin{aligned} a_{n+1} \le (1-\gamma _n)a_n+\gamma _n\delta _n,\,\,\,\forall n\ge 0, \end{aligned}$$

where the sequences \(\{\gamma _n\}\) in (0, 1) and \(\{\delta _n\}\) in \(\mathfrak {R}\) satisfy the conditions: \(\lim _{n\rightarrow \infty }\gamma _n=0\), \(\sum _{n=1}^\infty \gamma _n=\infty \) and \(\limsup _{n\rightarrow \infty }\delta _n\le 0\). Then \(\lim _{n\rightarrow \infty }a_n=0\).

Lemma 2.3

[24, Remark 4.4] Let \(\left\{ \epsilon _n\right\} \) be a sequence of non-negative real numbers. Suppose that, for any integer m, there exists an integer p such that \(p\ge m\) and \(\epsilon _p\le \epsilon _{p+1}\). Let \(n_0\) be an integer such that \(\epsilon _{n_0}\le \epsilon _{n_0+1}\) and define

$$\begin{aligned} \tau (n)=\max \left\{ k \in N : n_0 \le k \le n, \ \epsilon _k \le \epsilon _{k+1} \right\} \end{aligned}$$

for all \(n\ge n_0\). Then \(0\le \epsilon _n \le \epsilon _{\tau (n)+1}\) for all \(n\ge n_0\). Furthermore, the sequence \(\left\{ \tau (n)\right\} _{n\ge n_0}\) is non-decreasing and tends to \(+\infty \) as \(n\rightarrow \infty \).

3 Inertial proximal point algorithms for strongly monotone bifunctions

In this section, we present two inertial proximal point algorithms for a class of strongly monotone bifunctions. The first, namely IPPA, is designed for the case when the modulus of strong monotonicity of the cost bifunction is known. In this case, the (at least linear) rate of convergence of the algorithm is established under some mild conditions on the control parameters. Afterwards, we introduce a modified inertial proximal point algorithm (MIPPA) for the case when this modulus is not known in advance, and a convergence rate is again established. Throughout this section, we assume hypotheses (A1), (A2a), (A3)–(A4), under which problem (EP) has a unique solution, see, e.g., [29, Proposition 1]. This unique solution is denoted by \(x^\dagger \).

3.1 IPPA with prior knowledge of the modulus of strong monotonicity

Algorithm 3.1 (IPPA) Choose \(x_0,~x_1\in H\), \(\lambda >0\) and an inertial parameter \(\theta \) with \(0\le \theta \le \frac{1}{3+4\lambda \gamma }\). Given \(x_{n-1}\) and \(x_n\), compute

$$\begin{aligned} \left\{ \begin{array}{ll} w_n=x_n+\theta (x_n-x_{n-1}),\\ x_{n+1}=T_\lambda ^f(w_n). \end{array} \right. \end{aligned}$$
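As a numerical illustration only, the following sketch runs Algorithm 3.1 on an assumed toy instance: the affine bifunction \(f(x,y)=\left\langle Px+q,y-x\right\rangle \) with P symmetric positive definite and \(C=H=\mathfrak {R}^m\). Then \(f(x,y)+f(y,x)=-(x-y)^TP(x-y)\), so (A2a) holds with \(\gamma \) equal to the smallest eigenvalue of P, and the resolvent has the closed form \(T_\lambda ^f(x)=(I+\lambda P)^{-1}(x-\lambda q)\).

```python
import numpy as np

# Sketch of Algorithm 3.1 (IPPA) on an assumed toy instance:
# f(x, y) = <Px + q, y - x> on C = R^m with P symmetric positive definite,
# so gamma = lambda_min(P) and T_lam^f(x) = (I + lam*P)^{-1}(x - lam*q).

rng = np.random.default_rng(1)
m = 50
A = rng.standard_normal((m, m))
P = A.T @ A + np.eye(m)                  # symmetric positive definite
q = rng.standard_normal(m)
gamma = np.linalg.eigvalsh(P).min()      # modulus of strong monotonicity

lam = 1.0
theta = 1.0 / (3.0 + 4.0 * lam * gamma)  # upper bound for the inertial parameter
R = np.linalg.inv(np.eye(m) + lam * P)   # resolvent matrix

x_prev = rng.uniform(0.0, 1.0, m)
x = x_prev.copy()
for n in range(100):
    w = x + theta * (x - x_prev)         # inertial extrapolation w_n
    x_prev, x = x, R @ (w - lam * q)     # x_{n+1} = T_lam^f(w_n)

x_dagger = np.linalg.solve(P, -q)        # unique solution: P x + q = 0
print(np.linalg.norm(x - x_dagger))      # ~0, with a linear rate (Theorem 3.1)
```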

The convergence analysis of the algorithm is presented next.

Theorem 3.1

Under the hypotheses (A1), (A2a), (A3)–(A4), the sequence \(\left\{ x_n\right\} \) generated by Algorithm 3.1 converges strongly (at least linearly) to the unique solution \(x^\dagger \) of problem (EP). Moreover, there exists \(M>0\) such that

$$\begin{aligned} ||x_{n+1}-x^\dagger ||^2\le \frac{M}{(1+2\gamma \lambda )^n}, \end{aligned}$$

where \(\gamma \) is the modulus of strong monotonicity of bifunction f.

Proof

It follows from the definitions of \(x_{n+1}\) and \(T_\lambda ^f\) that

$$\begin{aligned} f(x_{n+1},y)+\frac{1}{\lambda }\left\langle x_{n+1}-w_n,y-x_{n+1}\right\rangle \ge 0,\quad \forall y\in C, \end{aligned}$$

which, with \(y=x^\dagger \in C\), leads to

$$\begin{aligned} f(x_{n+1},x^\dagger )+\frac{1}{\lambda }\left\langle x_{n+1}-w_n,x^\dagger -x_{n+1}\right\rangle \ge 0. \end{aligned}$$
(1)

Since \(x^\dagger \) is the solution of problem (EP) and \(x_{n+1}\in C\), \(f(x^\dagger ,x_{n+1})\ge 0\). Thus \(f(x_{n+1},x^\dagger )\le -\gamma ||x_{n+1}-x^\dagger ||^2\) because f is strongly monotone. This together with relation (1) implies that \( -\gamma \lambda ||x_{n+1}-x^\dagger ||^2+\left\langle x_{n+1}-w_n,x^\dagger -x_{n+1}\right\rangle \ge 0 \) or

$$\begin{aligned} 2\gamma \lambda ||x_{n+1}-x^\dagger ||^2\le 2\left\langle x_{n+1}-w_n,x^\dagger -x_{n+1}\right\rangle . \end{aligned}$$
(2)

Applying the equality \(2\left\langle a,b\right\rangle =||a+b||^2-||a||^2-||b||^2\) for \(a=x_{n+1}-w_n,~b=x^\dagger -x_{n+1}\) to relation (2), we obtain

$$\begin{aligned} 2\gamma \lambda ||x_{n+1}-x^\dagger ||^2\le ||w_n-x^\dagger ||^2 - ||x_{n+1}-w_n||^2-||x^\dagger -x_{n+1}||^2. \end{aligned}$$

Thus

$$\begin{aligned} (1+2\gamma \lambda )||x_{n+1}-x^\dagger ||^2\le ||w_n-x^\dagger ||^2 - ||x_{n+1}-w_n||^2. \end{aligned}$$
(3)

Moreover, it follows from the definition of \(w_n\) and Lemma 2.1 that

$$\begin{aligned} ||w_n-x^\dagger ||^2= & {} ||x_n+\theta (x_n-x_{n-1})-x^\dagger ||^2\nonumber \\= & {} ||(1+\theta )(x_n-x^\dagger )+(-\theta )(x_{n-1}-x^\dagger )||^2\nonumber \\= & {} (1+\theta )||x_n-x^\dagger ||^2-\theta ||x_{n-1}-x^\dagger ||^2+ \theta (1+\theta )||x_n-x_{n-1}||^2\nonumber \\ \end{aligned}$$
(4)

and

$$\begin{aligned} ||x_{n+1}-w_n||^2= & {} ||x_{n+1}-x_n-\theta (x_n-x_{n-1})||^2\nonumber \\= & {} ||x_{n+1}-x_n||^2+\theta ^2||x_n-x_{n-1}||^2-2\theta \left\langle x_{n+1}-x_n,x_n-x_{n-1} \right\rangle \nonumber \\\ge & {} ||x_{n+1}-x_n||^2+\theta ^2||x_n-x_{n-1}||^2-2\theta ||x_{n+1}-x_n||\,||x_n-x_{n-1}||\nonumber \\\ge & {} ||x_{n+1}-x_n||^2+\theta ^2||x_n-x_{n-1}||^2-\theta \left[ ||x_{n+1}-x_n||^2+||x_n-x_{n-1}||^2\right] \nonumber \\= & {} (1-\theta )||x_{n+1}-x_n||^2+(\theta ^2-\theta )||x_n-x_{n-1}||^2. \end{aligned}$$
(5)

Combining relations (3)–(5), we obtain

$$\begin{aligned} (1+2\gamma \lambda )||x_{n+1}-x^\dagger ||^2\le & {} (1+\theta )||x_n-x^\dagger ||^2- \theta ||x_{n-1}-x^\dagger ||^2 \\&+2\theta ||x_n-x_{n-1}||^2-(1-\theta )||x_{n+1}-x_n||^2. \end{aligned}$$

Thus

$$\begin{aligned} (1+2\gamma \lambda )||x_{n+1}-x^\dagger ||^2- & {} \theta ||x_n-x^ \dagger ||^2+(1-\theta )||x_{n+1}-x_n||^2\nonumber \\\le & {} ||x_n-x^\dagger ||^2-\theta ||x_{n-1}-x^\dagger ||^2+ 2\theta ||x_n-x_{n-1}||^2\nonumber \\ \end{aligned}$$
(6)

or

$$\begin{aligned}&(1+2\gamma \lambda )\left[ ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+ 2\gamma \lambda }||x_n-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_{n+1}-x_n||^2\right] \nonumber \\\le & {} ||x_n-x^\dagger ||^2-\theta ||x_{n-1}-x^\dagger ||^2+2 \theta ||x_n-x_{n-1}||^2. \end{aligned}$$
(7)

It is obvious that

$$\begin{aligned} \theta \ge \frac{\theta }{1+2\gamma \lambda }. \end{aligned}$$
(8)

Since \(0\le \theta \le \frac{1}{3+4\lambda \gamma }\), we obtain \(0\le \theta (3+4\lambda \gamma )\le 1\). Thus \(2\theta (1+2\lambda \gamma )\le 1-\theta \), which implies that

$$\begin{aligned} 0\le 2\theta \le \frac{1-\theta }{1+2\lambda \gamma }. \end{aligned}$$
(9)

Combining relations (7)–(9), we obtain

$$\begin{aligned}&(1+2\gamma \lambda )\left[ ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_n-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_{n+1}-x_n||^2\right] \nonumber \\\le & {} ||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_{n-1}-x^ \dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_n-x_{n-1}||^2 \end{aligned}$$
(10)

or

$$\begin{aligned} a_{n+1}\le \frac{1}{1+2\gamma \lambda }a_n,\quad \forall n\ge 1 \end{aligned}$$
(11)

where

$$\begin{aligned} a_n=||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_{n-1}- x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_n-x_{n-1}||^2. \end{aligned}$$
(12)

It follows from relation (11), by induction, that

$$\begin{aligned} a_{n+1}\le \frac{1}{(1+2\gamma \lambda )^n}a_1. \end{aligned}$$
(13)

We have

$$\begin{aligned} ||x_n-x^\dagger ||^2\le 2||x_n-x_{n+1}||^2+2||x_{n+1}-x^\dagger ||^2. \end{aligned}$$

Thus, from the definition of \(a_{n+1}\) [see (12) with n replaced by \(n+1\)], we obtain

$$\begin{aligned} a_{n+1}= & {} ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }|| x_n-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_{n+1}-x_n||^2\nonumber \\\ge & {} ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }\left[ 2|| x_n-x_{n+1}||^2+2||x_{n+1}-x^\dagger ||^2\right] \nonumber \\&+\frac{1-\theta }{1+2\gamma \lambda }||x_{n+1}-x_n||^2\nonumber \\= & {} \left( 1-\frac{2\theta }{1+2\gamma \lambda }\right) ||x_{n+1}-x^\dagger ||^2+ \frac{1-3\theta }{1+2\gamma \lambda }||x_{n+1}-x_n||^2\nonumber \\\ge & {} \frac{1}{3}||x_{n+1}-x^\dagger ||^2, \end{aligned}$$
(14)

where the last inequality follows from the facts that \(0\le \theta \le \frac{1}{3+4\lambda \gamma }< \frac{1}{3}\) and \(\lambda \gamma >0\). This together with (13) implies that

$$\begin{aligned} ||x_{n+1}-x^\dagger ||^2\le \frac{3a_1}{(1+2\gamma \lambda )^n}=\frac{M}{(1+2\gamma \lambda )^n}, \end{aligned}$$
(15)

where \(M=3a_1\) and

$$\begin{aligned} a_1=||x_1-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }|| x_0-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_1-x_0||^2. \end{aligned}$$

Finally, it follows from relation (11) that the sequence \(\left\{ a_n\right\} \) converges Q-linearly to zero. This, together with estimate (15) and the definition of \(a_n\), implies that the sequence \(\left\{ x_n\right\} \) converges R-linearly to \(x^\dagger \). Theorem 3.1 is proved. \(\square \)

Remark 3.1

We can introduce the following relaxed version of Algorithm 3.1, in which the next iterate \(x_{n+1}\) may be easier to compute.

Algorithm (relaxed IPPA) Choose \(x_0,~x_1\in H\), \(\lambda >0\), \(\theta \in \left[ 0,\frac{1}{3+4\lambda \gamma }\right] \) and a sequence \(\left\{ \varDelta _n\right\} \subset [0,+\infty )\) satisfying condition (H2): \(\delta _n:=\varDelta _n-(1+2\gamma \lambda )\varDelta _{n+1}\ge 0\). Given \(x_{n-1}\) and \(x_n\), set \(w_n=x_n+\theta (x_n-x_{n-1})\) and find \(x_{n+1}\in C\) as a \(\delta _n\)-approximate solution of the proximal subproblem, i.e.,

$$\begin{aligned} f(x_{n+1},y)+\frac{1}{\lambda }\left\langle x_{n+1}-w_n,y-x_{n+1}\right\rangle \ge -\delta _n,\quad \forall y\in C. \end{aligned}$$
(AEP)

An example of a sequence \(\left\{ \varDelta _n\right\} \) satisfying condition (H2) is \(\varDelta _n=\frac{1}{(1+2\lambda \gamma )^{pn}}\), where \(p\ge 1\). In the case when \(\varDelta _n=0\) (and thus \(\delta _n=0\)), the relaxed algorithm above becomes Algorithm 3.1. The relaxed version allows more flexibility in numerical computation.

In this case, the conclusion of Theorem 3.1 is still true. Indeed, by arguing similarly to relation (10), we also obtain

$$\begin{aligned}&(1+2\gamma \lambda )\left[ ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+2 \gamma \lambda }||x_n-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_{n+1}- x_n||^2\right] \\\le & {} ||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_{n-1}-x^ \dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_n-x_{n-1}||^2+2\lambda \delta _n, \end{aligned}$$

which, by the definition of \(\delta _n\) in (H2), implies that

$$\begin{aligned} {\bar{a}}_{n+1}\le \frac{1}{1+2\gamma \lambda }{\bar{a}}_n, \end{aligned}$$
(16)

where \( {\bar{a}}_n=||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_{n-1}-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_n-x_{n-1}||^2+2\lambda \varDelta _n. \) Thus, by induction, we obtain

$$\begin{aligned} {\bar{a}}_{n+1}\le \frac{1}{(1+2\gamma \lambda )^n}{\bar{a}}_1. \end{aligned}$$
(17)

As in relation (14), we get

$$\begin{aligned} {\bar{a}}_{n+1}\ge \frac{1}{3}||x_{n+1}-x^\dagger ||^2+2\lambda \varDelta _{n+1}\ge \frac{1}{3}||x_{n+1}-x^\dagger ||^2. \end{aligned}$$

Thus, from relation (17), one obtains

$$\begin{aligned} ||x_{n+1}-x^\dagger ||^2\le \frac{3{\bar{a}}_1}{(1+2\gamma \lambda )^n}, \end{aligned}$$

where \({\bar{a}}_1=||x_1-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda }||x_{0}-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda }||x_1-x_{0}||^2+2\lambda \varDelta _1.\)

3.2 IPPA without prior knowledge of the modulus of strong monotonicity

In this section we present an IPPA that does not require prior knowledge of the modulus of strong monotonicity, called the MIPPA.

Algorithm 3.2 (MIPPA) Choose \(x_0,~x_1\in H\), \(\theta \) with \(0\le 3\theta <1\), a constant \(a>1\), and sequences \(\left\{ \lambda _n\right\} \subset (0,+\infty )\), \(\left\{ \varDelta _n\right\} \subset [0,+\infty )\) satisfying conditions (H3)–(H5): \(\left\{ \lambda _n\right\} \) is non-increasing, \(\lambda _n\rightarrow 0\), \(\sum _{n=1}^\infty \lambda _n=+\infty \), and \(\delta _n:=\varDelta _n-a\varDelta _{n+1}\ge 0\). Given \(x_{n-1}\) and \(x_n\), set \(w_n=x_n+\theta (x_n-x_{n-1})\) and find \(x_{n+1}\in C\) as a \(\delta _n\)-approximate solution of the proximal subproblem with stepsize \(\lambda _n\), i.e.,

$$\begin{aligned} f(x_{n+1},y)+\frac{1}{\lambda _n}\left\langle x_{n+1}-w_n,y-x_{n+1}\right\rangle \ge -\delta _n,\quad \forall y\in C. \end{aligned}$$

In the case \(\varDelta _n=0\) (and thus \(\delta _n=0\)), \(x_{n+1}\) can be written compactly as \(x_{n+1}=T_{\lambda _n}^f(w_n)\). An example of sequences \(\left\{ \lambda _n\right\} \) and \(\left\{ \varDelta _n\right\} \) satisfying conditions (H3)–(H5) is \(\lambda _n=\frac{1}{n^q}\) and \(\varDelta _n=\frac{1}{a^{pn}}\), where \(0<q\le 1\) and \(p\ge 1\). We have the following result.
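A minimal sketch of Algorithm 3.2 in the same assumed affine toy setting as the sketch after Algorithm 3.1 (so that \(T_{\lambda _n}^f\) is available in closed form), with \(\varDelta _n=0\) and \(\lambda _n=1/(n+1)\); observe that the iteration itself never uses \(\gamma \).

```python
import numpy as np

# Sketch of Algorithm 3.2 (MIPPA) on the assumed affine toy instance
# f(x, y) = <Px + q, y - x> on C = R^m, P symmetric positive definite,
# so T_lam^f(x) = (I + lam*P)^{-1}(x - lam*q).  Here Delta_n = 0 (hence
# delta_n = 0) and lam_n = 1/(n+1), which is non-increasing, tends to 0
# and has a divergent sum, as required by (H3)-(H5).

rng = np.random.default_rng(2)
m = 50
A = rng.standard_normal((m, m))
P = A.T @ A + np.eye(m)
q = rng.standard_normal(m)
x_dagger = np.linalg.solve(P, -q)       # unique solution of (EP)

theta = 0.3                             # any theta with 3*theta < 1
x_prev = rng.uniform(0.0, 1.0, m)
x = x_prev.copy()
for n in range(1, 2001):
    lam_n = 1.0 / (n + 1)
    w = x + theta * (x - x_prev)        # inertial extrapolation w_n
    z = np.linalg.solve(np.eye(m) + lam_n * P, w - lam_n * q)
    x_prev, x = x, z                    # x_{n+1} = T_{lam_n}^f(w_n)

# error bounded by O(1 / sum_k lam_k); cf. estimate (18) in Theorem 3.2
print(np.linalg.norm(x - x_dagger))
```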

Theorem 3.2

Under the hypotheses (A1), (A2a), (A3)–(A4), the sequence \(\left\{ x_n\right\} \) generated by Algorithm 3.2 converges strongly to the unique solution \(x^\dagger \) of problem (EP). Furthermore, there exist numbers \(n_0\ge 1\) and \(M>0\) such that

$$\begin{aligned} ||x_{n+1}-x^\dagger ||^2\le \frac{M}{1+2\gamma \sum _{k=n_0}^n \lambda _k}. \end{aligned}$$
(18)

Proof

Since \(\lambda _n\rightarrow 0\), \(a>1\) and \(0\le 3\theta <1\), there exists a number \(n_0\ge 1\) such that

$$\begin{aligned} a>1+2\gamma \lambda _n, ~\forall n\ge n_0 \end{aligned}$$
(19)

and \(3\theta +4\gamma \theta \lambda _{n-1}<1\). Thus, \(2\theta (1+2\gamma \lambda _{n-1}) < 1-\theta \) or

$$\begin{aligned} 2\theta \le \frac{1-\theta }{1+2\gamma \lambda _{n-1}}, ~\forall n\ge n_0. \end{aligned}$$
(20)

Repeating the proof of relation (6), we obtain

$$\begin{aligned}&(1+2\gamma \lambda _n)||x_{n+1}-x^\dagger ||^2-\theta ||x_n-x^ \dagger ||^2+(1-\theta )||x_{n+1}-x_n||^2\nonumber \\\le & {} ||x_n-x^\dagger ||^2-\theta ||x_{n-1}-x^\dagger ||^2+2 \theta ||x_n-x_{n-1}||^2+2\lambda _n\delta _n. \end{aligned}$$
(21)

It follows from relation (19) and the non-increasing property of \(\left\{ \lambda _n\right\} \) that

$$\begin{aligned} \lambda _n\delta _n=\lambda _n\varDelta _n-a\lambda _n\varDelta _{n+1}\le \lambda _n\varDelta _n-(1+2\gamma \lambda _n)\lambda _{n+1}\varDelta _{n+1},\quad \forall n\ge n_0. \end{aligned}$$
(22)

Combining this inequality with (21), and noting the fact \(\theta \ge \frac{\theta }{1+2\gamma \lambda _{n-1}}\) and relation (20), we come to the following estimate,

$$\begin{aligned}&(1+2\gamma \lambda _n)\left[ ||x_{n+1}-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda _n}||x_n-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda _n}||x_{n+1}-x_n||^2+2\lambda _{n+1}\varDelta _{n+1}\right] \\&\quad \le ||x_n-x^\dagger ||^2-\theta ||x_{n-1}-x^\dagger ||^2+2\theta ||x_n-x_{n-1}||^2+2\lambda _n\varDelta _n\\&\quad \le ||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda _{n-1}}||x_{n-1}-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda _{n-1}}||x_n-x_{n-1}||^2+2\lambda _n\varDelta _n, \end{aligned}$$

or

$$\begin{aligned} \widetilde{a}_{n+1}\le \frac{1}{1+2\gamma \lambda _n}\widetilde{a}_n,\quad \forall n\ge n_0, \end{aligned}$$

where \(\widetilde{a}_n=||x_n-x^\dagger ||^2-\frac{\theta }{1+2\gamma \lambda _{n-1}}||x_{n-1}-x^\dagger ||^2+\frac{1-\theta }{1+2\gamma \lambda _{n-1}}||x_n-x_{n-1}||^2+2\lambda _n\varDelta _n\). Thus, by induction, we obtain for all \(n\ge n_0\) that

$$\begin{aligned} \widetilde{a}_{n+1}\le \frac{\widetilde{a}_{n_0}}{\varPi _{k=n_0}^n (1+2\gamma \lambda _k) }. \end{aligned}$$

As mentioned above, we also see that

$$\begin{aligned} \widetilde{a}_{n+1}\ge \frac{1}{3}||x_{n+1}-x^\dagger ||^2+2\lambda _{n+1}\varDelta _{n+1}\ge \frac{1}{3}||x_{n+1}-x^\dagger ||^2. \end{aligned}$$

Thus

$$\begin{aligned} ||x_{n+1}-x^\dagger ||^2\le \frac{3\widetilde{a}_{n_0}}{\varPi _{k=n_0}^n (1+2\gamma \lambda _k) }\le \frac{3\widetilde{a}_{n_0}}{1+2\gamma \sum _{k=n_0}^n \lambda _k}= \frac{M}{1+2\gamma \sum _{k=n_0}^n \lambda _k} \end{aligned}$$

where \(M=3\widetilde{a}_{n_0}\). Hence, since \(\sum _{n=1}^\infty \lambda _n=+\infty \), we obtain \(||x_{n+1}-x^\dagger ||^2\rightarrow 0\), i.e., \(\left\{ x_n\right\} \) converges strongly to \(x^\dagger \), the unique solution of problem (EP). \(\square \)

Remark 3.2

Although the formulation of Algorithm 3.2 does not require prior knowledge of the modulus \(\gamma \), the proof of Theorem 3.2 shows that the convergence rate still depends on this constant.

Remark 3.3

In order to prove the strong convergence of Algorithm 3.2 (and also of the algorithm proposed in Remark 3.1), it suffices to take a sequence \(\left\{ \delta _n\right\} \subset [0,+\infty )\) such that \(\sum _{n=1}^\infty \delta _n<\infty \). However, in view of conditions (H2) and (H5), we use a sequence \(\left\{ \delta _n\right\} \) with a special structure. This is done in order to obtain the rate estimate (18); the special structure of \(\left\{ \delta _n\right\} \) is used technically in estimate (22).

4 Mann-like IPPA for monotone bifunctions

In this section, we introduce a third inertial-like proximal point algorithm with strong convergence property for solving EPs with monotone bifunctions.

Algorithm 4.3 (Mann-like IPPA, MaIPPA) Choose \(x_0,~x_1\in H\), \(\lambda >0\), \(\theta \in [0,1)\), \(\left\{ \beta _n\right\} \subset [a,b]\subset (0,1)\), \(\left\{ \gamma _n\right\} \subset (0,1)\) satisfying condition (H6): \(\gamma _n\rightarrow 0\) and \(\sum _{n=1}^\infty \gamma _n=\infty \), and \(\left\{ \epsilon _n\right\} \subset [0,+\infty )\) with \(\epsilon _n/\gamma _n\rightarrow 0\). Given \(x_{n-1}\) and \(x_n\), choose \(\theta _n\) with \(0\le \theta _n\le {\bar{\theta }}_n\), where

$$\begin{aligned} {\bar{\theta }}_n= \left\{ \begin{array}{ll} \min \left\{ \theta ,\frac{\epsilon _n}{||x_n-x_{n-1}||}\right\} &{}\text{ if }\quad x_n\ne x_{n-1},\\ \theta &{}\text{ otherwise }, \end{array} \right. \end{aligned}$$

so that, in particular,

$$\begin{aligned} \frac{\theta _n}{\gamma _n}||x_n-x_{n-1}||\rightarrow 0\quad \text{ as }~n\rightarrow \infty , \end{aligned}$$
(23)

and compute

$$\begin{aligned} \left\{ \begin{array}{ll} w_n=x_n+\theta _n(x_n-x_{n-1}),\\ x_{n+1}=(1-\beta _n-\gamma _n)w_n+\beta _n T_\lambda ^f(w_n). \end{array} \right. \end{aligned}$$

Note that the parameter \(\gamma _n\) with the properties in (H6) can be viewed as a Tikhonov regularization parameter (see [5, Theorem 7.4]); it plays an important role in establishing the strong convergence of the sequence \(\left\{ x_n\right\} \) to the minimum-norm solution of problem (EP). The convergence analysis of the algorithm is presented next.
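Before turning to the analysis, the following sketch illustrates Algorithm 4.3 on an assumed toy instance: the monotone optimization-type bifunction \(f(x,y)=\phi (y)-\phi (x)\) with a linear \(\phi (y)=\left\langle c,y\right\rangle \) on a box, so that \(T_\lambda ^f(x)=P_C(x-\lambda c)\). The parameters follow Experiment 2 in Sect. 5, and we take \(\theta _n={\bar{\theta }}_n\); where some components of c vanish, the solution set is a whole face of the box, which makes the minimum-norm selection visible.

```python
import numpy as np

# Toy run of Algorithm 4.3 (MaIPPA) for f(x, y) = phi(y) - phi(x) with
# phi(y) = <c, y> on C = [-2, 5]^m, so T_lambda^f(x) = P_C(x - lam*c).
# The Tikhonov-like term gamma_n drives the iterates toward the
# minimum-norm solution on the flat coordinates (where c_i = 0).

LO, HI = -2.0, 5.0

def resolvent(x, c, lam):
    return np.clip(x - lam * c, LO, HI)   # prox of the linear phi over the box

rng = np.random.default_rng(3)
m = 20
c = rng.standard_normal(m)
c[:5] = 0.0                               # flat coordinates: a face of solutions

lam, theta = 1.0, 0.5
x_prev = rng.uniform(0.0, 1.0, m)
x = x_prev.copy()
for n in range(1, 5001):
    beta_n = n / (2.0 * n + 1.0)          # parameters as in Experiment 2
    gamma_n = 1.0 / (n + 1.0)
    eps_n = 1.0 / (n + 1.0) ** 1.1
    d = np.linalg.norm(x - x_prev)
    theta_n = theta if d == 0 else min(theta, eps_n / d)  # theta_n = bar(theta)_n
    w = x + theta_n * (x - x_prev)                        # inertial step
    x_prev, x = x, (1 - beta_n - gamma_n) * w + beta_n * resolvent(w, c, lam)

# minimum-norm solution: -2 where c_i > 0, 5 where c_i < 0, 0 where c_i = 0
x_dagger = np.where(c > 0, LO, np.where(c < 0, HI, 0.0))
print(np.linalg.norm(x - x_dagger))   # decreases slowly, since gamma_n = 1/(n+1)
```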

Theorem 4.1

Under the hypotheses (A1), (A2b), (A3)–(A5), the sequence \(\left\{ x_n\right\} \) generated by Algorithm 4.3 converges strongly to the minimum-norm solution \(x^\dagger \) of problem (EP).

Proof

We divide the proof into the following four steps.

Claim 1 The following estimate holds for all \(n\ge 0\) and \(x^*\in EP(f,C)\),

$$\begin{aligned}&||x_{n+1}-x^*||^2\le (1-\gamma _n)||w_n-x^*||^2-2\gamma _n\beta _n \left\langle w_n-T_\lambda ^f (w_n),x_{n+1}-x^*\right\rangle \\&\quad +\,2\gamma _n \left\langle -x^*,x_{n+1}-x^*\right\rangle . \end{aligned}$$

Indeed, arguing as for relation (3), but now with the monotone bifunction f (i.e., with \(\gamma =0\)), we obtain

$$\begin{aligned} ||T_\lambda ^f(w_n)-x^*||^2\le ||w_n-x^*||^2 - ||T_\lambda ^f(w_n)-w_n||^2,\quad \forall x^*\in EP(f,C). \end{aligned}$$
(24)

Hence

$$\begin{aligned} ||T_\lambda ^f(w_n)-x^*||^2\le ||w_n-x^*||^2. \end{aligned}$$
(25)

From the definition of \(x_{n+1}\) and a straightforward computation, the iterate \(x_{n+1}\) can be rewritten as

$$\begin{aligned} x_{n+1}=(1-\gamma _n)\left[ (1-\beta _n)w_n+\beta _n T_\lambda ^f (w_n)\right] -\gamma _n\beta _n (w_n-T_\lambda ^f (w_n)). \end{aligned}$$

Thus, applying the inequality \(||x+y||^2\le ||x||^2+2 \left\langle y,x+y\right\rangle \) and relation (25), we get

$$\begin{aligned} ||x_{n+1}-x^*||^2=&||(1-\gamma _n)\left[ (1-\beta _n)w_n+\beta _n T_\lambda ^f (w_n)-x^*\right] \\&\quad -\gamma _n\beta _n (w_n-T_\lambda ^f (w_n))-\gamma _n x^*||^2 \\ \le&(1-\gamma _n)^2||(1-\beta _n)w_n+\beta _n T_\lambda ^f (w_n)-x^*||^2 \\&\quad -2\gamma _n\beta _n \left\langle w_n-T_\lambda ^f (w_n),x_{n+1}-x^*\right\rangle \\&-2\gamma _n \left\langle x^*,x_{n+1}-x^*\right\rangle \\ \le&(1-\gamma _n)^2\left[ (1-\beta _n)||w_n-x^*||^2+\beta _n ||T_\lambda ^f (w_n)-x^*||^2\right] \\&-2\gamma _n\beta _n \left\langle w_n-T_\lambda ^f (w_n),x_{n+1}-x^*\right\rangle +2\gamma _n \left\langle -x^*,x_{n+1}-x^*\right\rangle \\ \le&(1-\gamma _n)||w_n-x^*||^2-2\gamma _n\beta _n \left\langle w_n-T_\lambda ^f (w_n),x_{n+1}-x^*\right\rangle \\&\quad +2\gamma _n \left\langle -x^*,x_{n+1}-x^*\right\rangle , \end{aligned}$$

in which the second inequality follows from the convexity of \(||.||^2\) and the last inequality comes from relation (25) and the fact \((1-\gamma _n)^2\le 1-\gamma _n\). Thus, Claim 1 is proved.

Claim 2 The sequences \(\left\{ x_n\right\} \) and \(\left\{ w_n\right\} \) are bounded.

Indeed, since \(\left\{ \beta _n \right\} \subset [a,b] \subset (0,1)\) and \(\gamma _n \rightarrow 0\), there exists \(n_0 \ge 1\) such that \(1-\beta _n-\gamma _n \in (0,1)\) for all \(n \ge n_0\). Without loss of generality, throughout the proof of Theorem 4.1, we can assume that \(1-\beta _n-\gamma _n \in (0,1)\) for all n. From the definitions of \(x_{n+1}\), \(w_n\) and relation (25), we have

$$\begin{aligned} ||x_{n+1}-x^*||=&||(1-\beta _n-\gamma _n)(w_n-x^*)+\beta _n(T_\lambda ^f(w_n)-x^*)-\gamma _n x^*||\nonumber \\ \le&(1-\beta _n-\gamma _n)||w_n-x^*||+\beta _n ||T_\lambda ^f(w_n)-x^*||+\gamma _n||x^*||\nonumber \\ \le&(1-\beta _n-\gamma _n)||w_n-x^*||+\beta _n ||w_n-x^*||+\gamma _n||x^*||\nonumber \\ =&(1-\gamma _n)||w_n-x^*||+\gamma _n||x^*||\nonumber \\ =&(1-\gamma _n)||(x_n-x^*)+\theta _n(x_n-x_{n-1})||+\gamma _n||x^*||\nonumber \\ \le&(1-\gamma _n)||x_n-x^*||+\theta _n(1-\gamma _n)||x_n-x_{n-1}||+\gamma _n||x^*||\nonumber \\ =&(1-\gamma _n)||x_n-x^*||+\gamma _n\left[ (1-\gamma _n)\frac{\theta _n}{\gamma _n}||x_n-x_{n-1}||+||x^*||\right] . \end{aligned}$$
(26)

It follows from conditions (H6) and (23) that \((1-\gamma _n)\frac{\theta _n}{\gamma _n}||x_n-x_{n-1}||\rightarrow 0\) as \(n\rightarrow \infty \). Thus, this sequence is bounded. Set \(K{=}||x^*||+\sup \left\{ (1{-}\gamma _n)\frac{\theta _n}{\gamma _n}||x_n{-}x_{n-1}||:n\ge 1\right\} \). By relation (26), we obtain

$$\begin{aligned} ||x_{n+1}-x^*||\le (1-\gamma _n)||x_n-x^*||+\gamma _n K,\quad \forall n\ge 1, \end{aligned}$$

which, by induction, implies that \(||x_{n+1}-x^*||\le \max \left\{ ||x_1-x^*||,K\right\} \) for all \(n\ge 1\). Thus, the sequence \(\left\{ x_n\right\} \) is bounded. The boundedness of \(\left\{ w_n\right\} \) follows from its definition. This completes the proof of Claim 2.

Claim 3 We have the following estimate for all \(n\ge 0\) and \(x^*\in EP(f,C)\),

$$\begin{aligned} ||x_{n+1}-x^*||^2\le&||x_n-x^*||^2+\theta _n (1-\gamma _n) \left( ||x_n-x^*||^2-||x_{n-1}-x^*||^2\right) \\&+2\theta _n(1-\gamma _n)||x_n-x_{n-1}||^2-\beta _n ||T_\lambda ^f(w_n)-w_n||^2+\gamma _n||x^*||^2. \end{aligned}$$

Indeed, it follows from the definition of \(w_n\) that

$$\begin{aligned} ||w_n-x^*||^2=\,&||(1+\theta _n)(x_n-x^*)-\theta _n(x_{n-1}-x^*)||^2\nonumber \\ =\,&(1+\theta _n)||x_n-x^*||^2-\theta _n||x_{n-1}-x^*||^2+\theta _n(1+\theta _n)||x_n-x_{n-1}||^2\nonumber \\ \le&||x_n-x^*||^2+\theta _n\left( ||x_n-x^*||^2-||x_{n-1}-x^*||^2\right) +2\theta _n||x_n-x_{n-1}||^2. \end{aligned}$$
(27)

Thus, from the definition of \(x_{n+1}\), the convexity of \(||\cdot ||^2\) and relations (24) and (27), we have

$$\begin{aligned} ||x_{n+1}-x^*||^2= & {} ||(1-\beta _n-\gamma _n)(w_n-x^*)+\beta _n(T_\lambda ^f(w_n)-x^*)-\gamma _n x^*||^2\\= & {} ||(1-\beta _n-\gamma _n)(w_n-x^*)+\beta _n(T_\lambda ^f(w_n)-x^*)+\gamma _n (-x^*)||^2\\\le & {} (1-\beta _n-\gamma _n)||w_n-x^*||^2+\beta _n ||T_\lambda ^f(w_n)-x^*||^2+\gamma _n||-x^*||^2\\\le & {} (1-\beta _n-\gamma _n)||w_n-x^*||^2 + \beta _n \left( ||w_n-x^*||^2-||T_\lambda ^f(w_n)-w_n||^2\right) \\&+\gamma _n||x^*||^2\\= & {} (1-\gamma _n)||w_n-x^*||^2-\beta _n ||T_\lambda ^f(w_n)-w_n||^2+\gamma _n||x^*||^2\\\le & {} (1-\gamma _n)||x_n-x^*||^2+\theta _n (1-\gamma _n) \left( ||x_n-x^*||^2-||x_{n-1}-x^*||^2\right) \\&+2\theta _n(1-\gamma _n)||x_n-x_{n-1}||^2-\beta _n ||T_\lambda ^f(w_n)-w_n||^2+\gamma _n||x^*||^2\\\le & {} ||x_n-x^*||^2+\theta _n (1-\gamma _n) \left( ||x_n-x^*||^2-||x_{n-1}-x^*||^2\right) \\&+2\theta _n(1-\gamma _n)||x_n-x_{n-1}||^2-\beta _n ||T_\lambda ^f(w_n)-w_n||^2+\gamma _n||x^*||^2, \end{aligned}$$

which completes the proof of Claim 3.

Claim 4 The sequence \(\left\{ x_n\right\} \) converges strongly to the minimum-norm solution \(x^\dagger \) of problem (EP).

Indeed, applying Claim 3 with \(x^*=x^\dagger \) and setting \(\varGamma _n=||x_n-x^\dagger ||^2\), we obtain

$$\begin{aligned} \beta _n ||T_\lambda ^f(w_n)-w_n||^2&\le \varGamma _n - \varGamma _{n+1}+\theta _n (1-\gamma _n) \left( \varGamma _n-\varGamma _{n-1}\right) \nonumber \\&\quad +2\theta _n(1-\gamma _n)||x_n-x_{n-1}||^2+\gamma _n||x^\dagger ||^2. \end{aligned}$$
(28)

Now, we consider two cases.

Case 1 There exists a number \(n_0\ge 1\) such that \(\varGamma _{n+1}\le \varGamma _n\) for all \(n\ge n_0\). In this case, the limit of the sequence \(\left\{ \varGamma _n\right\} \) exists. Moreover, from conditions (H6) and (23), we obtain \(\theta _n||x_n-x_{n-1}||\rightarrow 0\) and \(\gamma _n\rightarrow 0\) as \(n\rightarrow \infty \). Thus, it follows from (28) and \(\beta _n\ge a>0\) that

$$\begin{aligned} \lim _{n\rightarrow \infty }||T_\lambda ^f(w_n)-w_n||^2=0. \end{aligned}$$
(29)

Let \(\omega _w(x_n)\) denote the set of weak cluster points of the sequence \(\left\{ x_n\right\} \). It follows from the definition of \(\left\{ w_n\right\} \) that \(\omega _w(x_n)=\omega _w(w_n)\). Moreover, it follows from (29) that \(\omega _w(w_n)\subset EP(f,C)\) (see, for example, the proof of this conclusion in [14, Theorem 3.1]). Now, using Claim 1 with \(x^*=x^\dagger \), we obtain

$$\begin{aligned}&\varGamma _{n+1}\le (1-\gamma _n)||w_n-x^\dagger ||^2\nonumber \\&\quad +\gamma _n\left[ 2\beta _n \left\langle w_n-T_\lambda ^f (w_n),x^\dagger -x_{n+1}\right\rangle +2\left\langle -x^\dagger ,x_{n+1}-x^\dagger \right\rangle \right] . \end{aligned}$$
(30)

It follows from the definition of \(w_n\) that

$$\begin{aligned} ||w_n-x^\dagger ||^2\le & {} \left( ||x_n-x^\dagger ||+\theta _n||x_n-x_{n-1}||\right) ^2\nonumber \\= & {} ||x_n-x^\dagger ||^2+\theta _n^2||x_n-x_{n-1}||^2+2\theta _n||x_n-x^\dagger ||||x_n-x_{n-1}||\nonumber \\\le & {} ||x_n-x^\dagger ||^2+\theta _n||x_n-x_{n-1}||^2+2\theta _n||x_n-x^\dagger ||||x_n-x_{n-1}||\nonumber \\\le & {} \varGamma _n+3M\theta _n||x_n-x_{n-1}||, \end{aligned}$$
(31)

where \(M=\sup \left\{ ||x_n-x_{n-1}||,||x_n-x^\dagger ||:n\ge 1\right\} \), which is finite by Claim 2. Thus, from relations (30) and (31), we get

$$\begin{aligned} \varGamma _{n+1}\le (1-\gamma _n)\varGamma _n+\gamma _n \delta _n, \end{aligned}$$
(32)

where

$$\begin{aligned} \delta _n= & {} 3M(1-\gamma _n)\frac{\theta _n}{\gamma _n}||x_n-x_{n-1}||+2\beta _n \left\langle w_n-T_\lambda ^f (w_n),x^\dagger -x_{n+1}\right\rangle \nonumber \\&+2\left\langle -x^\dagger ,x_{n+1}-x^\dagger \right\rangle . \end{aligned}$$
(33)

Note that \(\frac{\theta _n}{\gamma _n}||x_n-x_{n-1}||\rightarrow 0\) as \(n\rightarrow \infty \). Thus, from relations (29) and (33), we see that

$$\begin{aligned} \limsup _{n}\delta _n=2\limsup _{n}\left\langle -x^\dagger ,x_{n+1}-x^\dagger \right\rangle =2\sup _{p\in \omega _w(x_n)}\left\langle -x^\dagger ,p-x^\dagger \right\rangle \le 0, \end{aligned}$$
(34)

in which the last inequality follows from the facts that \(x^\dagger =P_{EP(f,C)}(0)\) and \(\omega _w(x_n)\subset EP(f,C)\). Thus, Lemma 2.2, together with relations (32), (34) and condition (H6), ensures that \(\varGamma _n\rightarrow 0\), i.e., \(x_n\rightarrow x^\dagger \) as \(n\rightarrow \infty \).

Case 2 There exists a subsequence \(\left\{ \varGamma _{n_i}\right\} \) of \(\left\{ \varGamma _{n}\right\} \) such that \(\varGamma _{n_i}\le \varGamma _{n_i+1}\) for all \(i\ge 0\). In that case, it follows from Lemma 2.3 that

$$\begin{aligned} \varGamma _{\tau (n)}\le \varGamma _{\tau (n)+1},~\varGamma _n\le \varGamma _{\tau (n)+1},\quad \forall n\ge n_0, \end{aligned}$$
(35)

where \(\tau (n)=\max \left\{ k\in N:n_0\le k \le n,\ \varGamma _k\le \varGamma _{k+1}\right\} \). Moreover, the sequence \(\left\{ \tau (n)\right\} \) is non-decreasing and \(\tau (n)\rightarrow +\infty \) as \(n\rightarrow \infty \). It follows from relations (28) and (35) that

$$\begin{aligned} \beta _{\tau (n)} ||T_\lambda ^f(w_{\tau (n)})-w_{\tau (n)}||^2&\le \theta _{\tau (n)} (1-\gamma _{\tau (n)}) \left( \varGamma _{\tau (n)}-\varGamma _{\tau (n)-1}\right) \nonumber \\&+2\theta _{\tau (n)}(1-\gamma _{\tau (n)})||x_{\tau (n)}-x_{\tau (n)-1}||^2+\gamma _{\tau (n)}||x^\dagger ||^2. \end{aligned}$$
(36)

From the definition of \(\varGamma _n\), we have

$$\begin{aligned}&\varGamma _{\tau (n)}-\varGamma _{{\tau (n)}-1}=||x_{\tau (n)}-x^\dagger ||^2-||x_{{\tau (n)}-1}-x^\dagger ||^2 \\= & {} \left( ||x_{\tau (n)}-x^\dagger ||-||x_{{\tau (n)}-1}-x^\dagger ||\right) \left( ||x_{\tau (n)}-x^\dagger ||+||x_{{\tau (n)}-1}-x^\dagger ||\right) \\\le & {} ||x_{\tau (n)}-x_{{\tau (n)}-1}|| \left( ||x_{\tau (n)}-x^\dagger ||+||x_{{\tau (n)}-1}-x^\dagger ||\right) . \end{aligned}$$

Thus, from (36), we obtain

$$\begin{aligned} \beta _{\tau (n)} ||T_\lambda ^f(w_{\tau (n)})-w_{\tau (n)}||^2\le & {} \theta _{\tau (n)} (1-\gamma _{\tau (n)})||x_{\tau (n)}-x_{{\tau (n)}-1}|| \left( ||x_{\tau (n)}-x^\dagger ||+||x_{{\tau (n)}-1}-x^\dagger ||\right) \nonumber \\&+2\theta _{\tau (n)}(1-\gamma _{\tau (n)})||x_{\tau (n)}-x_{\tau (n)-1}||^2+\gamma _{\tau (n)}||x^\dagger ||^2. \end{aligned}$$
(37)

Passing to the limit in (37) and noting that \(\theta _{\tau (n)}||x_{\tau (n)}-x_{{\tau (n)}-1}|| \rightarrow 0\) and \(\beta _{\tau (n)}\ge a>0\), we obtain

$$\begin{aligned} \lim _{n\rightarrow \infty }||T_\lambda ^f(w_{\tau (n)})-w_{\tau (n)}||^2=0. \end{aligned}$$

Thus, as in Case 1, we obtain \(\omega _w(x_{\tau (n)})=\omega _w(w_{\tau (n)})\subset EP(f,C)\) and

$$\begin{aligned} \varGamma _{\tau (n)+1}\le (1-\gamma _{\tau (n)})\varGamma _{\tau (n)}+\gamma _{\tau (n)} \delta _{\tau (n)}, \end{aligned}$$
(38)

where

$$\begin{aligned} \delta _{\tau (n)}=\,&3M(1-\gamma _{\tau (n)})\frac{\theta _{\tau (n)}}{\gamma _{\tau (n)}}||x_{\tau (n)}-x_{\tau (n)-1}||\nonumber \\&+2\beta _{\tau (n)} \left\langle w_{\tau (n)}-T_\lambda ^f (w_{\tau (n)}),x^\dagger -x_{\tau (n)+1}\right\rangle \nonumber \\&+2\left\langle -x^\dagger ,x_{\tau (n)+1}-x^\dagger \right\rangle . \end{aligned}$$
(39)

From (38), we obtain \(\gamma _{\tau (n)}\varGamma _{\tau (n)}\le \varGamma _{\tau (n)}-\varGamma _{\tau (n)+1}+\gamma _{\tau (n)} \delta _{\tau (n)}\), which from (35), implies that \(\gamma _{\tau (n)}\varGamma _{\tau (n)}\le \gamma _{\tau (n)} \delta _{\tau (n)}\). Thus

$$\begin{aligned} \varGamma _{\tau (n)}\le \delta _{\tau (n)}, \end{aligned}$$
(40)

because \(\gamma _{\tau (n)}>0\). By arguing similarly to Case 1, we also obtain \(\limsup _{n} \delta _{\tau (n)}\le 0\). Thus, from (40), we get \(\limsup _{n} \varGamma _{\tau (n)}\le 0\), which together with relation (38), implies that

$$\begin{aligned} \limsup _{n} \varGamma _{\tau (n)+1}\le 0. \end{aligned}$$

Hence \(\varGamma _{\tau (n)+1}\rightarrow 0\). Consequently, \(\varGamma _n\rightarrow 0\) by relation (35), i.e., \(x_n\rightarrow x^\dagger \) as \(n\rightarrow \infty \). This completes the proof. \(\square \)

Observe that when \(\theta =0\), we have \(\theta _n={\bar{\theta }}_n=0\), and we obtain the following result directly from Theorem 4.1.

Corollary 4.1

Suppose that the assumptions (A1), (A2b), (A3)–(A5) hold. Choose two sequences \(\left\{ \beta _n\right\} \) and \(\left\{ \gamma _n\right\} \) as in Algorithm 4.3. Then, the sequence \(\left\{ x_n\right\} \) generated by

$$\begin{aligned} x_{n+1}=(1-\beta _n-\gamma _n)x_n+\beta _nT_\lambda ^f(x_n),~x_0\in H, \end{aligned}$$

converges strongly to the minimum-norm solution \(x^\dagger \) of problem (EP).

Now, we consider the constrained optimization problem

$$\begin{aligned} \min \left\{ \phi (y): y \in C\right\} , \end{aligned}$$
(OP)

where \(\phi : C \rightarrow \mathfrak {R}\) is a proper, convex, lower semicontinuous function. We assume that the solution set \(\arg \min _C \phi \) of problem (OP) is nonempty. For each \(x,~y \in C\), set \(f(x,y)=\phi (y)-\phi (x)\). In this case, the resolvent \(T_\lambda ^f\) is the proximal mapping \(\mathrm{prox}_{\lambda \phi }\) of \(\phi \) with parameter \(\lambda \), i.e.,

$$\begin{aligned} T_\lambda ^f(x)=\mathrm{prox}_{\lambda \phi }(x)=\arg \min \left\{ \lambda \phi (y)+\frac{1}{2}||y-x||^2:y\in C\right\} . \end{aligned}$$

Thus, we immediately obtain the following corollary from Theorem 4.1.

Corollary 4.2

Suppose that \(\phi : C \rightarrow \mathfrak {R}\) is a proper convex, lower semi-continuous function, and that the solution set \(\arg \min _C \phi \) of problem (OP) is nonempty. Choose \(x_0,~x_1\in H\) and the parameters \(\lambda >0\), \(\theta \in [0,1)\), \(\left\{ \gamma _n\right\} \subset (0,+\infty )\), \(\left\{ \epsilon _n\right\} \subset [0,+\infty )\), \(\left\{ \beta _n\right\} \subset [a,1]\subset (0,1]\), \(\left\{ \theta _n\right\} \) as in Algorithm 4.3. Then, the sequence \(\left\{ x_n\right\} \) generated by

$$\begin{aligned} \left\{ \begin{array}{ll} w_n=x_n+\theta _n(x_n-x_{n-1}),\\ x_{n+1}=(1-\beta _n-\gamma _n)w_n+\beta _n \mathrm{prox}_{\lambda \phi }(w_n) \end{array} \right. \end{aligned}$$

converges strongly to the minimum-norm solution \(x^\dagger \) of problem (OP).
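When \(\mathrm{prox}_{\lambda \phi }\) has no closed form, it can be evaluated numerically, mirroring the use of the Matlab Optimization Toolbox in Sect. 5. The following is a hedged sketch for an assumed smooth convex \(\phi \) on a box; the resulting routine can be plugged into the iteration of Corollary 4.2.

```python
import numpy as np
from scipy.optimize import minimize

# Numerical evaluation of prox_{lam*phi}(x) over a box when no closed form
# is available.  The smooth convex phi below is purely illustrative.

def prox_numeric(x, phi, lam, lo=-2.0, hi=5.0):
    obj = lambda y: lam * phi(y) + 0.5 * np.sum((y - x) ** 2)
    res = minimize(obj, np.clip(x, lo, hi),
                   bounds=[(lo, hi)] * len(x), method="L-BFGS-B")
    return res.x

phi = lambda y: np.sum(np.log(1.0 + np.exp(y)))   # convex and smooth (softplus)
print(prox_numeric(np.array([3.0, -4.0, 0.5]), phi, lam=1.0))
```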

5 Numerical illustrations

In this section, we present numerical experiments and comparisons with related algorithms, illustrating the behavior of our proposed algorithms. In the case that the solution of problem (EP) is unknown, we use the sequence

$$\begin{aligned} D_n=||x_n-\mathrm{prox}_{\lambda f(x_n,.)}(x_n)||^2, ~n=0,1,2,\ldots \end{aligned}$$

as a measure of the computational performance of the algorithms. Recall that \(\mathrm{prox}_{\lambda f}\) denotes the proximal mapping of a function f with parameter \(\lambda >0\), i.e.,

$$\begin{aligned} \mathrm{prox}_{\lambda f}(u)=\arg \min \left\{ \lambda f(y)+\frac{1}{2}||y-u||^2 : y\in C\right\} . \end{aligned}$$

Clearly, \(D_n=0\) if and only if \(x_n\) is a solution of problem (EP), so the convergence of \(D_n\) to 0 indicates that \(x_n\) approaches a solution of the problem. Otherwise, when the solution \(x^*\) of the problem is known, we use the sequence

$$\begin{aligned} E_n=||x_n-x^*||^2, ~n=0,1,2,\ldots . \end{aligned}$$
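Both measures are straightforward to evaluate once a routine for the proximal mapping is available; a small sketch follows, where the routine `prox_f` is problem-specific and assumed given.

```python
import numpy as np

# Performance measures used in this section.  `prox_f(x, lam)` is assumed
# to return prox_{lam * f(x, .)}(x), i.e. the minimizer over C of
# lam * f(x, y) + 0.5 * ||y - x||^2 as a function of y.

def D(x, prox_f, lam=1.0):
    r = x - prox_f(x, lam)
    return float(r @ r)          # D_n = 0 iff x solves problem (EP)

def E(x, x_star):
    r = x - x_star
    return float(r @ r)          # squared distance to a known solution
```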

All the programs are written in Matlab 7.0 with the Optimization Toolbox and run on a desktop PC with an Intel(R) Core(TM) i5-3210M CPU at 2.50 GHz and 2.00 GB of RAM.

Example 1 In this example, we work in \(\mathfrak {R}^m\) (\(m=100,~150\)) with the bifunction

$$\begin{aligned} f(x,y)=\left\langle Px+Qy+q,y-x\right\rangle , \end{aligned}$$

where \(q\in \mathfrak {R}^m\) and P,  Q are two \(m\times m\) matrices such that Q is symmetric positive semi-definite and \(Q-P\) is symmetric negative semi-definite. All the entries of Q, P, and q are in \((-2,2)\) and the feasible set is

$$\begin{aligned} C=\left\{ x\in \mathfrak {R}^m:-2\le x_i \le 5,~i=1,2,\ldots ,m\right\} . \end{aligned}$$

Experiment 1 Here the data is randomly generated such that \(Q-P\) is negative definite. We have

$$\begin{aligned} f(x,y)+f(y,x)=(x-y)^T(Q-P)(x-y)\le -\gamma ||x-y||^2,\quad \forall x,~y\in \mathfrak {R}^m, \end{aligned}$$

where \(\gamma \) is the smallest eigenvalue of \(P-Q\). In this case, the bifunction f is strongly monotone with constant \(\gamma \). The aim of this experiment is to describe the numerical behavior of the relaxed version of Algorithm 3.1 (shortly, IPPA) and of Algorithm 3.2 (MIPPA). We choose \(\lambda =1\) for IPPA and \(\lambda _n=1/(n+1)\) for MIPPA. The starting points are \(x_0=x_1\), with entries generated randomly in the interval [0, 1]. To compute an approximation of the resolvent \(T_\lambda ^f\) of the bifunction f, we use the regularization algorithm [30, Algorithm A1] (an inner loop at each iteration). This means that we find \(x_{n+1}\) in problem (AEP) in Remark 3.1 as a \(\delta _n\)-solution at each iteration (where \(\delta _n=\varDelta _n-a\varDelta _{n+1},~ a=2, \varDelta _n=\frac{1}{a^n}\)). These computations are performed easily thanks to the Lipschitz-type condition on f (see [30, Lemma 5.1] for more details).
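For reproducibility, one possible way to generate data with the required structure is sketched below; this particular construction is illustrative and, in particular, does not enforce the \((-2,2)\) entry range exactly.

```python
import numpy as np

# Illustrative generator for Experiment 1: Q symmetric positive
# semi-definite and P - Q symmetric positive definite, so that
# f(x, y) = <Px + Qy + q, y - x> is strongly monotone on R^m.

rng = np.random.default_rng(5)
m = 100
B = rng.uniform(-1.0, 1.0, (m, m))
Q = B @ B.T / m                          # symmetric positive semi-definite
S = rng.uniform(-1.0, 1.0, (m, m))
P = Q + S @ S.T / m + 0.1 * np.eye(m)    # makes P - Q symmetric positive definite
q = rng.uniform(-2.0, 2.0, m)

gamma = np.linalg.eigvalsh(P - Q).min()  # modulus of strong monotonicity
assert gamma > 0
```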

The numerical results are shown in Figs. 1 and 2 for different inertial parameters. In these figures, the y-axis represents the value of \(D_n\) in log scale, while the x-axis represents the elapsed execution time in seconds. We see that IPPA performs better than MIPPA, but we should keep in mind the theoretical advantages of MIPPA explained in the previous sections.

Fig. 1 Experiment 1 in \(\mathfrak {R}^{100}\). Numbers of iterations respectively are 66, 65, 67, 75, 79, 74

Fig. 2 Experiment 1 in \(\mathfrak {R}^{150}\). Numbers of iterations respectively are 54, 53, 52, 60, 62, 59

Experiment 2 In this experiment, we consider three settings.

Case 1 Choose \(P=Q=diag(1,2,\ldots ,m)\) and \(q=0\). In this case, the solution of problem (EP) is \(x^*=(0,0,\ldots ,0)^T\).

Case 2 Choose \(P=Q\) as a full, symmetric and positive definite matrix and \(q=0\). The solution of problem (EP) is again \(x^*=(0,0,\ldots ,0)^T\).

Case 3 The entries of P, Q and q are generated randomly such that Q is symmetric positive semi-definite and \(Q-P\) is symmetric negative semi-definite. The solution of problem (EP) is unknown.

In all the cases above, the bifunction f is monotone. We compare Algorithm 4.3 (shortly, MaIPPA) with four strongly convergent algorithms: the algorithm RIPAFAN introduced in [8], the hybrid projection proximal point algorithm (HPPA1) proposed in [36, Corollary 3.1], the hybrid projection proximal point algorithm (HPPA2) suggested in [37, Theorem 3.1], and the shrinking projection proximal point algorithm (SPPA) presented in [34, Corollary 3.6].

The parameters are chosen as follows: \(\theta =0.5\), \(\beta _n=n/(2n+1)\), \(\gamma _n=1/(n+1)\), \(\epsilon _n=1/(n+1)^{1.1}\) for MaIPPA; \(\gamma _n=0.5\), \(\delta _n=\frac{1}{(n+1)^{1.1}}\), \(\beta _n=\frac{n}{2n+1}\), \(\alpha _n=1-\beta _n-\frac{1}{(n+1)^{1.1}}\), \(g=x_0\) for RIPAFAN; \(r_n=\lambda =1\) for all the algorithms. The starting point is randomly chosen as in the previous experiment.

We emphasize here that in both Cases 1 and 2, since P is symmetric, we get that \(\left\langle Px,y\right\rangle =\left\langle Py,x\right\rangle \). Thus, from the fact \(P=Q\), we obtain

$$\begin{aligned} f(x,y)= & {} \left\langle Px+Qy+q,y-x\right\rangle =\left\langle Px+Py+q,y-x\right\rangle \\= & {} \left\langle Px,y\right\rangle -\left\langle Px,x\right\rangle +\left\langle Py,y\right\rangle -\left\langle Py,x\right\rangle + \left\langle q,y\right\rangle -\left\langle q,x\right\rangle \\= & {} (y^TPy+q^Ty)-(x^TPx+q^Tx). \end{aligned}$$

Hence, in these cases problem (EP) reduces to the optimization problem over C with objective function \(g(x)=x^TPx+q^Tx\), and the resolvent \(T_\lambda ^f(x)\) of the bifunction f is the proximal mapping \(\mathrm{prox}_{\lambda g}(x)\) of the function g. Thus, \(T_\lambda ^f(x)\) can be computed effectively with the Optimization Toolbox in Matlab. In the last case, the resolvent \(T_\lambda ^f(x)\) is found as in Experiment 1. The numerical results are shown in Figs. 3, 4, 5, 6, 7 and 8 and suggest that MaIPPA (Algorithm 4.3) performs better than the other algorithms. In these figures, as in Experiment 1, the y-axis represents the value of \(D_n\) or \(E_n\) in log scale and the x-axis represents the elapsed execution time in seconds.

Fig. 3 Experiment 2 in \(\mathfrak {R}^{100}\) (Case 1). Numbers of iterations respectively are 218, 225, 100, 99, 91

Fig. 4 Experiment 2 in \(\mathfrak {R}^{150}\) (Case 1). Numbers of iterations respectively are 213, 203, 104, 88, 76

Fig. 5 Experiment 2 in \(\mathfrak {R}^{100}\) (Case 2). Numbers of iterations respectively are 185, 178, 85, 86, 85

Fig. 6 Experiment 2 in \(\mathfrak {R}^{150}\) (Case 2). Numbers of iterations respectively are 96, 99, 58, 60, 62

Fig. 7 Experiment 2 in \(\mathfrak {R}^{100}\) (Case 3). Numbers of iterations respectively are 68, 72, 60, 59, 56

Fig. 8 Experiment 2 in \(\mathfrak {R}^{150}\) (Case 3). Numbers of iterations respectively are 54, 55, 47, 48, 47

Example 2 Let \(C \subset \mathfrak {R}^m\) be a polyhedral convex set given by

$$\begin{aligned} C=\left\{ x\in \mathfrak {R}^m: Ax \le b\right\} \end{aligned}$$

where A is a matrix of size \(10 \times m\) and b is a vector in \(\mathfrak {R}^{10}\), with their entries generated randomly in \((-2,2)\). Consider the bifunction f in [32] of the form \(f(x,y)=\left\langle F(x)+Qy+q,y-x\right\rangle \), where Q is a randomly generated symmetric positive semidefinite matrix, \(q \in \mathfrak {R}^m\), and \(F: \mathfrak {R}^m \rightarrow \mathfrak {R}^m\) is a mapping. For the experiments, we suppose that F is of the form \(F(x)=M(x)+Dx+c\), where

$$\begin{aligned} \left\{ \begin{array}{ll} M(x)=(f_1(x),f_2(x),\ldots ,f_m(x))^T,\\ f_i(x)=x_{i-1}^2+x_i^2+x_{i-1}x_i+x_ix_{i+1},~i=1,2,\ldots ,m,\\ x_0=x_{m+1}=0, \end{array} \right. \end{aligned}$$

and \(c=(-1,-1,\ldots ,-1)\), D is a square matrix of order m, given by

$$\begin{aligned} d_{ij}= \left\{ \begin{array}{ll} 4&{}\text{ if }\quad i=j,\\ 1&{}\text{ if }\quad i-j=1,\\ -2&{}\text{ if }\quad i-j=-1,\\ 0&{}\text{ otherwise }. \end{array} \right. \end{aligned}$$
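The operator F can be assembled directly from these formulas; a short sketch:

```python
import numpy as np

# Construction of the operator F(x) = M(x) + Dx + c from Example 2.

def make_F(m):
    D = 4.0 * np.eye(m) + np.eye(m, k=-1) - 2.0 * np.eye(m, k=1)
    c = -np.ones(m)
    def M(x):
        xp = np.concatenate(([0.0], x, [0.0]))   # pad so that x_0 = x_{m+1} = 0
        # f_i(x) = x_{i-1}^2 + x_i^2 + x_{i-1}*x_i + x_i*x_{i+1}
        return (xp[:-2] ** 2 + xp[1:-1] ** 2
                + xp[:-2] * xp[1:-1] + xp[1:-1] * xp[2:])
    return lambda x: M(x) + D @ x + c

F = make_F(5)
print(F(np.zeros(5)))   # equals c = (-1, ..., -1)
```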

Experiment 3 In this experiment, we choose the parameters as in Experiment 2. To find an approximation of \(T_\lambda ^f\), we use the linesearch procedure presented in [32, Algorithm 2] to find a \(\delta _n\)-approximate solution (where \(\delta _n=\varDelta _n-a\varDelta _{n+1},~ a=2, \varDelta _n=\frac{1}{a^n}\)). The use of a linesearch is, in general, time-consuming. The results are illustrated in Figs. 9 and 10.

Fig. 9 Experiment 3 in \(\mathfrak {R}^{100}\). Numbers of iterations respectively are 79, 44, 54, 55, 47

Fig. 10 Experiment 3 in \(\mathfrak {R}^{150}\). Numbers of iterations respectively are 60, 31, 44, 43, 41

6 Conclusions

In this paper, we have presented several strongly convergent algorithms for solving monotone equilibrium problems in a Hilbert space. The algorithms can be viewed as combinations of the (relaxed) proximal point method, the inertial method and the Mann-like iteration. The strong convergence of the algorithms is established under several standard conditions imposed on the cost bifunction and the control parameters. The inertial technique is used with the aim of improving the convergence properties of the algorithms. Several experiments were performed to support the theoretical results. The numerical results illustrate the faster convergence of the new algorithms over some existing algorithms.