1 Preliminaries

Let \((X,d)\) be a metric space. For \(x,y\in X\), a mapping \(c:[0,l]\rightarrow X\), where \(l\ge 0\), is called a geodesic with endpoints \(x,y\), if \(c(0)=x\), \(c(l)=y\), and \(d(c(t), c(t')) = \left| t - t'\right| \) for all \(t, t'\in [0,l]\). If, for every \(x,y\in X\), a geodesic with endpoints \(x,y\) exists, then we call \((X,d)\) a geodesic metric space. Furthermore, if there exists a unique geodesic for each \(x,y\in X\), then \((X,d)\) is said to be uniquely geodesic.

Definition 1.1

A subset K of a uniquely geodesic space X is said to be convex when for any two points \(x,y \in K\), the geodesic joining x and y is contained in K.

For each \(x, y\in X\), the image of a geodesic c with endpoints \(x,y\) is called a geodesic segment joining x and y and is denoted by [x, y].

Let X be a uniquely geodesic metric space. For each \(x, y\in X\) and for each \(t\in [0, 1]\), there exists a unique point \(z\in [x, y]\) such that \(d(x, z) = td(x, y)\) and \(d(y, z) = (1 - t)d(x, y)\). We will use the notation \((1 - t)x \oplus ty\) for denoting the unique point z satisfying the above statement.

Definition 1.2

(see Dhompongsa and Panyanak 2008) A geodesic space X is called a CAT(0) space if for all \(x,y,z\in X\) and \(t \in [0, 1]\) it holds that

$$\begin{aligned} d^{2}(tx\oplus (1-t)y, z)\le td^{2}(x, z) + (1-t)d^{2}(y, z) - t(1-t)d^{2}(x, y). \end{aligned}$$

A complete CAT(0) space is called a Hadamard space.
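
As a quick illustration, in the Euclidean plane (a CAT(0) space in which \(tx\oplus (1-t)y\) is the ordinary convex combination) the inequality of Definition 1.2 holds with equality. The following minimal Python sketch, with arbitrarily chosen points, verifies this; it is only an illustration of the definition.

```python
# Minimal check of the CAT(0) inequality of Definition 1.2 in the Euclidean
# plane, where t*x (+) (1-t)*y is the usual convex combination and the
# inequality holds with equality; the points below are arbitrary.
import numpy as np

d = lambda a, b: np.linalg.norm(a - b)

rng = np.random.default_rng(1)
x, y, z = rng.standard_normal((3, 2))
t = 0.3
m = t * x + (1 - t) * y                       # the point t*x (+) (1-t)*y
lhs = d(m, z) ** 2
rhs = t * d(x, z) ** 2 + (1 - t) * d(y, z) ** 2 - t * (1 - t) * d(x, y) ** 2
print(np.isclose(lhs, rhs))                   # True: equality in Hilbert spaces
```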

We enumerate next some relevant examples of Hadamard spaces (see Bacak 2014; Bertrand and Kloeckner 2012).

  1. 1.

    The Euclidean space \({\mathbb {R}}^n\), and more generally all Hilbert spaces. The geodesics are the line segments. Moreover, it is known that a Banach space is CAT(0) if and only if it is Hilbert.

  2. 2.

    The real hyperbolic space. We equip \({\mathbb {R}}^{n+1}\) with the inner product given by \(\langle x, y\rangle =-x_0y_0+\sum _{i=1}^nx_iy_i\), for \(x=(x_0, x_1, \ldots , x_{n})\) and \(y=(y_0, y_1, \ldots , y_{n})\). We consider the hyperbolic space \(H^n:=\left\{ x=(x_0, x_1, \ldots , x_{n})\in {\mathbb {R}}^{n+1}: \langle x, x\rangle =-1, \ \ x_{0}>0\right\} \). The Riemannian metric g on the tangent spaces \(T_pH^n \subset T_p{\mathbb {R}}^{n+1}\), \(p\in H^n\), is the restriction of \(\langle \cdot ,\cdot \rangle \) (which is positive definite there), and the induced distance is \(d(x,y)=\mathrm{arccosh} (-\langle x,y\rangle )\) for all \(x,y\in H^n\). The sectional curvature of \((H^n,g)\) is \(-1\) at every point.

  3. 3.

    Other hyperbolic spaces, like \({\mathbb {C}}H^n\), \({\mathbb {H}}H^n\), \({\mathbb {O}}H^2\).

  4. 4.

    More generally, the symmetric spaces of non-compact type, like the quotient \(SL(n,{\mathbb {R}})/SO(n,{\mathbb {R}})\) endowed with the metric induced by the Killing form of \(SL(n,{\mathbb {R}})\).

  5. 5.

    More generally, all complete, simply connected Riemannian manifolds with non-positive sectional curvature; these are precisely the Hadamard manifolds.

  6. 6.

    \({\mathbb {R}}\)-trees. Recall that a metric space \((X,d)\) is an \({\mathbb {R}}\)-tree if it is uniquely geodesic and for every \(x, y, z \in X\) we have \([x,z]=[x, y] \cup [y, z]\) whenever \([x, y] \cap [y, z]=\{y\}\).

  7. 7.

    The Hilbert ball. Let H be a complex Hilbert space with inner product \(\langle \cdot ,\cdot \rangle \) and induced norm \(\Vert \cdot \Vert \), and put \({\mathbb {B}}:=\{ x\in H: \Vert x\Vert <1\}\). We equip \({\mathbb {B}}\) with the hyperbolic metric \(\rho (x,y):=\tanh ^{-1}\sqrt{1-\sigma (x,y)}\) for all \(x,y\in {\mathbb {B}}\), where \(\sigma (x,y):=(1-\Vert x\Vert ^2)(1-\Vert y\Vert ^2)/\left| 1-\langle x, y\rangle \right| ^2\). Then \(({\mathbb {B}},\rho )\) is a Hadamard space; it is infinite-dimensional and has nonzero curvature.

  8. 8.

    All products of Hadamard spaces.

  9. 9.

    The gluing of any two Hadamard spaces along isometric, convex subsets; for example, any Hadamard space with an additional geodesic ray glued at some point, or three hyperbolic half-planes glued along their limiting geodesics, etc.

Berg and Nikolaev (1998, 2008) introduced the concept of quasi-linearization as follows. Let us formally denote a pair \((a, b) \in X \times X\) as \(\overrightarrow{ab}\) and call it a vector. Then the quasi-linearization is the map \(\langle \cdot , \cdot \rangle : (X \times X)\times (X \times X)\rightarrow {\mathbb {R}}\) defined by

$$\begin{aligned} \langle \overrightarrow{ab},\overrightarrow{cd}\rangle =\frac{1}{2}\left\{ d^2(a, d) + d^2(b, c) - d^2(a, c) - d^2(b, d)\right\} , \ \ \ \ \ \ a, b, c, d \in X. \end{aligned}$$

It is easy to see that \(\langle \overrightarrow{ab},\overrightarrow{cd}\rangle = \langle \overrightarrow{cd},\overrightarrow{ab}\rangle \), \(\langle \overrightarrow{ab},\overrightarrow{cd}\rangle = -\langle \overrightarrow{ba},\overrightarrow{cd}\rangle \) and \(\langle \overrightarrow{ax},\overrightarrow{cd}\rangle + \langle \overrightarrow{xb},\overrightarrow{cd}\rangle = \langle \overrightarrow{ab},\overrightarrow{cd}\rangle \) for all \(a, b, c, d, x \in X\). We say that X satisfies the Cauchy–Schwarz inequality if \(\langle \overrightarrow{ab},\overrightarrow{cd}\rangle \le d(a, b)d(c, d)\) for all \(a, b, c, d \in X\). It is known (Corollary 3 of Berg and Nikolaev 2008) that a geodesically connected metric space is a CAT(0) space if and only if it satisfies the Cauchy–Schwarz inequality.
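
As an illustration, the quasi-linearization and the Cauchy–Schwarz inequality can be tested numerically in \({\mathbb {R}}^n\) (a CAT(0) space), where \(\langle \overrightarrow{ab},\overrightarrow{cd}\rangle \) reduces to the usual inner product \(\langle b-a,d-c\rangle \). The following sketch uses only the metric and arbitrarily chosen points.

```python
# Sketch of the quasi-linearization built only from the metric, tested in the
# Euclidean space R^3, where it reduces to the inner product <b - a, d - c>
# and the Cauchy-Schwarz inequality of Berg and Nikolaev holds.
import numpy as np

def quasi_inner(a, b, c, d, dist):
    """<ab, cd> = (1/2){d^2(a,d) + d^2(b,c) - d^2(a,c) - d^2(b,d)}."""
    return 0.5 * (dist(a, d)**2 + dist(b, c)**2 - dist(a, c)**2 - dist(b, d)**2)

dist = lambda u, v: np.linalg.norm(u - v)

rng = np.random.default_rng(0)
a, b, c, d = rng.standard_normal((4, 3))
val = quasi_inner(a, b, c, d, dist)
print(np.isclose(val, np.dot(b - a, d - c)))     # reduces to <b - a, d - c>
print(val <= dist(a, b) * dist(c, d) + 1e-12)    # Cauchy-Schwarz inequality
```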

Let \((X,d)\) be a Hadamard space and \(\{x^n\}\) be a bounded sequence in X. Take \(x\in X\). Let \(r(x,\{x^n\})=\limsup _{n\rightarrow \infty } d(x,x^n)\). The asymptotic radius of \(\{x^n\}\) is given by

$$\begin{aligned} r(\{x^n\})= \inf \{r(x,\{x^n\})| x\in X \} \end{aligned}$$

and the asymptotic center of \(\{x^n\}\) is the set \(A(\{x^n\})=\{x \in X| r(x,\{x^n\})=r(\{x^n\})\}\). It is known that in a Hadamard space, \(A(\{x^n\})\) consists of exactly one point.

Definition 1.3

(see Kirk and Panyanak 2008, p. 3690) A sequence \(\{x^n\}\) in a Hadamard space \((X,d)\) \(\Delta \)-converges to \(x\in X\) if \(A(\{x^{n_k}\})=\{x\}\), for each subsequence \(\{x^{n_k}\}\) of \(\{x^n\}\).

We denote \(\Delta \)-convergence in X by \(\overset{\Delta }{\longrightarrow }\) and the metric convergence by \(\rightarrow \).

We present next two known results related to the notion of \(\Delta \)-convergence.

Lemma 1.4

(see Kirk and Panyanak 2008, Proposition 3.6) Let X be a Hadamard space. Then, every bounded, closed and convex subset of X is \(\Delta \)-compact, i.e. every bounded sequence in it has a \(\Delta \)-convergent subsequence.

Lemma 1.5

(see Dhompongsa and Panyanak 2008) Let \((X,d)\) be a CAT(0) space. Then, for all \(x, y, z \in X\) and \(t \in [0, 1]\):

$$\begin{aligned} d(tx\oplus (1-t)y, z)\le td(x, z) + (1-t)d(y, z). \end{aligned}$$

Let \(C\subset X\) be nonempty, closed and convex. It is well known that for any \(x\in X\) there exists a unique \(u\in C\) such that

$$\begin{aligned} d(u,x)=\inf \{d(z,x): z\in C\}. \end{aligned}$$
(1.1)

We define the projection onto C, \(P_C:X\rightarrow C\), by taking as \(P_C(x)\) the unique \(u\in C\) which satisfies (1.1). We give next a characterization of the projection.

Proposition 1.6

(see Dehghan and Rooin 2013) Let C be a nonempty convex subset of a CAT(0) space X, \(x\in X\) and \(u\in C\). Then \(u=P_C(x)\) if and only if

$$\begin{aligned} \langle \overrightarrow{yu},\overrightarrow{xu}\rangle \le 0, \end{aligned}$$

for all \(y\in C\).
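
A small numerical illustration of this characterization in \({\mathbb {R}}^2\), where \(\langle \overrightarrow{yu},\overrightarrow{xu}\rangle \) reduces to \(\langle u-y,u-x\rangle \), is given below; the choice of C as the nonnegative orthant is an assumption made only for the example.

```python
# Sketch of Proposition 1.6 in R^2 with C the nonnegative orthant (illustrative
# choice): for u = P_C(x), the quasi-linearization <yu, xu>, which in R^n
# equals <u - y, u - x>, is nonpositive for every y in C.
import numpy as np

P_C = lambda v: np.maximum(v, 0.0)           # projection onto the orthant C
x = np.array([-1.0, 2.0])
u = P_C(x)                                   # u = (0, 2)

rng = np.random.default_rng(2)
ys = np.abs(rng.standard_normal((100, 2)))   # random sample of points of C
vals = [np.dot(u - y, u - x) for y in ys]    # <yu, xu> for each sample y
print(max(vals) <= 1e-12)                    # True: all values are nonpositive
```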

A function \(h:X\rightarrow (-\infty ,\infty ]\) is called:

  1. (i)

    convex iff

    $$\begin{aligned} h(t x\oplus (1-t)y)\le t h(x)+(1-t)h(y),\ \ \forall x,y\in X \ \text {and}\ t\in (0,1) \end{aligned}$$
  2. (ii)

    strictly convex iff

    $$\begin{aligned} h(t x\oplus (1-t)y)<t h(x)+(1-t)h(y),\ \ \forall x,y\in X, \ x\ne y \ \text {and}\ t\in (0,1). \end{aligned}$$

    It is easy to see that each strictly convex function has at most one minimizer on X.

Take a closed and convex set \(K\subset X\) and \(f:X\times X\rightarrow {\mathbb {R}}\) such that

  1. B1:

    \(f(x,x)=0\) for all \(x\in X\),

  2. B2:

    \(f(\cdot ,\cdot ):X\times X\rightarrow {\mathbb {R}}\) is continuous,

  3. B3:

    \(f(x,\cdot ):X\rightarrow {\mathbb {R}}\) is convex for all \(x\in X\).

The equilibrium problem EP(f, K) consists of finding \(x^*\in K\) such that \(f(x^*,y)\ge 0\) for all \(y\in K\). The set of solutions of EP(f, K) will be denoted as S(f, K).

For convergence of the extragradient method, some monotonicity assumptions on the bifunction f are needed. We define next two such properties for future reference: the bifunction f is said to be monotone if \(f(x,y)+f(y,x) \le 0\) for all \(x,y \in X\), and pseudo-monotone if for any pair \(x,y\in X\), \(f(x,y) \ge 0\) implies \(f(y,x) \le 0\).

The equilibrium problem encompasses, among its particular cases, convex optimization problems, variational inequalities (monotone or otherwise), Nash equilibrium problems, and other problems of interest in many applications.

Equilibrium problems with monotone and pseudo-monotone bifunctions have been studied extensively in Hilbert, Banach as well as in topological vector spaces by many authors (e.g. Bianchi and Schaible 1996; Chadli et al. 2000; Combettes and Hirstoaga 2005; Iusem et al. 2009; Iusem and Sosa 2010). Recently the second author and Khatibzadeh have studied optimization and equilibrium problems in Hilbert and Hadamard spaces (see Khatibzadeh and Mohebbi 2016, 2019a, b; Khatibzadeh et al. 2017). Also, some authors have studied equilibrium problems on Hadamard manifolds (see Colao et al. 2012; Noor and Noor 2012).

We make now a brief reference to existence results for equilibrium problems in Hadamard spaces. It has been proved in Theorem 2.4 of Khatibzadeh and Mohebbi (2019b) that EP(f, K) has solutions whenever X is a Hadamard space, K is convex, and f, besides B1–B3 above, satisfies the following two conditions:

  1. (i)

    f is properly quasi-monotone, meaning that for any finite subset \(L\subset X\) and any y belonging to the convex hull of L (the convex hull being understood in the sense of Definition 1.1), it holds that \(\min _{x\in L}f(x,y)\le 0\).

  2. (ii)

    f is coercive in K, meaning that there exists \(z\in K\) such that, for any sequence \(\{x^k\}\subset K\) with \(\lim _{k\rightarrow \infty }d(x^k,z)=\infty \), there exists \(u\in K\) satisfying \(f(x^k,u)\le 0\) for large enough k.

This result extends a similar one, on existence of solutions for equilibrium problems in Hilbert spaces, proved in Iusem et al. (2009).

We will deal in this paper with the extragradient (or Korpelevich’s) method for equilibrium problems in Hadamard spaces, and thus we start with an introduction to its well-known finite-dimensional formulation when applied to variational inequalities, i.e., we assume that \(X={\mathbb {R}}^n\) with the Euclidean distance and \(f(x,y)=\langle T(x),y-x\rangle \) with \(T:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\). It is easy to check that for this f, EP(f, K) reduces to the variational inequality problem VIP(T, K), consisting of finding \(x^*\in K\) such that \(\langle T(x^*),x-x^*\rangle \ge 0\) for all \(x\in K\). Several iterative methods for solving the finite-dimensional VIP(T, K) have been proposed. A basic one is the natural extension of the projected gradient method for optimization problems, substituting the operator T for the gradient, which generates a sequence \(\{x^k\}\subset {\mathbb {R}}^n\) through:

$$\begin{aligned} x^{k+1}=P_K(x^k-\alpha _kT(x^k)), \end{aligned}$$
(1.2)

where \(\alpha _k\) is a positive stepsize and \(P_K\) is the orthogonal projection onto K. This method converges under quite strong hypotheses, namely Lipschitz continuity and strong monotonicity of T, i.e.

$$\begin{aligned} \left\| T(x)- T(y)\right\| \le L \left\| x-y\right\| ~~~\forall ~x,y\in {\mathbb {R}}^n, \end{aligned}$$

and

$$\begin{aligned} \langle T(x)- T(y), x-y\rangle \ge \sigma \left\| x-y\right\| ^2~~~\forall ~x,y\in {\mathbb {R}}^n, \end{aligned}$$

where \(L>0\) and \(\sigma >0\) are the Lipschitz and strong monotonicity constants respectively. With these assumptions, the sequence generated by (1.2) converges to a solution of VIP(T, K) (provided that the problem has solutions), as long as the stepsizes \(\alpha _k\) satisfy \(\alpha _k=\alpha \in (0,2\sigma /L^2)\) for all k (see e.g., Bertsekas and Tsitsiklis 1989; Fang 1980).
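
For concreteness, the following sketch implements iteration (1.2) for an illustrative strongly monotone affine operator \(T(x)=Ax+b\) and K equal to the nonnegative orthant, so that \(P_K\) is a componentwise maximum with zero; the data, stepsize and iteration count are assumptions made only for the example.

```python
# Sketch of the projected-gradient iteration (1.2) for VIP(T, K) with K the
# nonnegative orthant of R^2 and T(x) = Ax + b, A symmetric positive definite,
# hence Lipschitz (L = largest eigenvalue) and strongly monotone (sigma =
# smallest eigenvalue). All data below are illustrative.
import numpy as np

def projected_gradient(T, P_K, x0, alpha, iters=500):
    x = x0.copy()
    for _ in range(iters):
        x = P_K(x - alpha * T(x))       # iteration (1.2) with constant stepsize
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([-1.0, -2.0])
T = lambda x: A @ x + b
P_K = lambda x: np.maximum(x, 0.0)      # orthogonal projection onto the orthant

eigs = np.linalg.eigvalsh(A)
sigma, L = eigs[0], eigs[-1]
alpha = sigma / L ** 2                  # any fixed alpha in (0, 2*sigma/L^2) works
print(projected_gradient(T, P_K, np.zeros(2), alpha))
```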

When T is just monotone, i.e.,

$$\begin{aligned} \langle T(x)- T(y), x-y\rangle \ge 0~~~\forall ~x,y\in {\mathbb {R}}^n, \end{aligned}$$

then the situation changes: the generated sequence may diverge independently of the choice of the stepsizes \(\alpha _k\). Take, e.g., \(K={\mathbb {R}}^2\) and assume that T is a rotation by an angle of \(\pi /2\). T is certainly monotone and Lipschitz continuous. The unique solution of VIP(T, K) is the origin, but (1.2) gives rise to a sequence satisfying \(\left\| x^{k+1}\right\| >\left\| x^k\right\| \) for all k. To deal with this situation, Korpelevich introduced in Korpelevich (1976) a two-step algorithm, namely:

$$\begin{aligned} y^k= & {} P_K(x^k-\alpha _kT(x^k)), \end{aligned}$$
(1.3)
$$\begin{aligned} x^{k+1}= & {} P_K(x^k-\alpha _kT(y^k)). \end{aligned}$$
(1.4)

The geometric motivation behind this procedure is the following: let \(H_k=\{x\in {\mathbb {R}}^n: \langle T(y^k), x-y^k\rangle =0\}\), with \(y^k\) as in (1.3). It is easy to check that, by monotonicity of T, \(H_k\) separates \(x^k\) from the solution set S(T, K). Thus, if \(\alpha _k\) is small enough, the point \(x^{k+1}\) defined by (1.4) is obtained by moving first from \(x^k\) in the direction of its projection onto a hyperplane separating it from the solution set (reaching the point \(x^k-\alpha _kT(y^k)\)), and then projecting the resulting point onto K, which contains S(T, K). Hence, \(x^{k+1}\) is closer than \(x^k\) to any solution. This property, called Fejér monotonicity of \(\{x^k\}\) with respect to S(T, K), is the basis of the convergence analysis. If T is Lipschitz continuous with constant L and VIP(T, K) has solutions, then the sequence generated by (1.3)–(1.4) converges to a solution of VIP(T, K) provided that \(\alpha _k=\alpha \in (0, 1/L)\) (see Korpelevich 1976).
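
The contrast between (1.2) and (1.3)–(1.4) on the rotation example above can be observed with the following sketch; the stepsize \(\alpha =1/2\) and the starting point are illustrative choices.

```python
# Sketch comparing (1.2) with the extragradient scheme (1.3)-(1.4) on the
# rotation example: K = R^2 (so P_K is the identity) and T a rotation by pi/2,
# which is monotone and Lipschitz with L = 1; the unique solution is the origin.
import numpy as np

T = lambda x: np.array([-x[1], x[0]])      # rotation by an angle of pi/2
P_K = lambda x: x                          # K = R^2

def gradient_step(x, alpha):               # iteration (1.2)
    return P_K(x - alpha * T(x))

def extragradient_step(x, alpha):          # iterations (1.3)-(1.4)
    y = P_K(x - alpha * T(x))
    return P_K(x - alpha * T(y))

x_g = x_e = np.array([1.0, 0.0])
for _ in range(100):
    x_g = gradient_step(x_g, 0.5)
    x_e = extragradient_step(x_e, 0.5)     # alpha in (0, 1/L) = (0, 1)

print(np.linalg.norm(x_g))                 # grows without bound
print(np.linalg.norm(x_e))                 # tends to 0, the unique solution
```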

In the absence of Lipschitz continuity of T, it is natural to perform a linesearch to find an appropriate stepsize, as is done in the following method:

Take \(\delta \in (0,1)\), \({\hat{\beta }}\), \({\tilde{\beta }}\) satisfying \(0<{\hat{\beta }}\le {\tilde{\beta }}\), and a sequence \(\{\beta _k\}\subseteq [{\hat{\beta }},{\tilde{\beta }}]\). The method is initialized with any \(x^0\in K\) and the iterative step is as follows:

Given \(x^k\) define

$$\begin{aligned} z^k:= x^k-\beta _k T(x^k) . \end{aligned}$$
(1.5)

If \(x^k=P_K(z^k)\) stop. Otherwise take

$$\begin{aligned} j(k):= & {} \min \left\{ \,\,j\ge 0:\left\langle T(2^{-j}P_K(z^k)+(1-2^{-j})x^k),x^k-P_K(z^k)\right\rangle \nonumber \right. \\\ge & {} \left. \frac{\delta }{\beta _k}\,\Vert x^k-P_K(z^k)\Vert ^2\,\,\right\} , \end{aligned}$$
(1.6)
$$\begin{aligned} \alpha _k:= & {} 2^{-j(k)}, \end{aligned}$$
(1.7)
$$\begin{aligned} y^k:= & {} \alpha _k P_K(z^k) + (1-\alpha _k)x^k, \end{aligned}$$
(1.8)
$$\begin{aligned} H_k:= & {} \left\{ z\in {\mathbb {R}}^n\,:\,\langle z-y^k,T(y^k)\rangle =0\right\} , \end{aligned}$$
(1.9)
$$\begin{aligned} x^{k+1}:= & {} P_K\left( P_{H_k}( x^k)\right) . \end{aligned}$$
(1.10)

Note that along the search for \(\alpha _k\) the right-hand side of (1.6) is kept constant, which is quite convenient from a computational viewpoint, and that no orthogonal projections onto K are required during the inner loop: there are only two projections onto K per iteration, namely in the computation of \(P_K(z^k)\) and of \(x^{k+1}\), exactly as in the original method (1.3)–(1.4). This method converges to a solution of VIP(T, K) under the sole assumptions of monotonicity of T and existence of solutions; see Iusem and Svaiter (1997).
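
A compact sketch of the iteration (1.5)–(1.10), specialized to an illustrative monotone affine operator on the nonnegative orthant (so that \(P_K\) is explicit and the projection onto \(H_k\) has a closed form), is given below; the operator, the parameters \(\beta _k\equiv \beta \) and \(\delta \), and the starting point are assumptions made only for the example.

```python
# Sketch of the linesearch extragradient method (1.5)-(1.10) for VIP(T, K) in
# R^n, with K the nonnegative orthant; the projection onto the hyperplane H_k
# of (1.9) is computed in closed form. Data and parameters are illustrative.
import numpy as np

def extragradient_linesearch(T, P_K, x0, beta=1.0, delta=0.5, iters=200):
    x = x0.copy()
    for _ in range(iters):
        z = x - beta * T(x)                             # (1.5)
        p = P_K(z)
        if np.allclose(x, p):
            return x                                    # x solves VIP(T, K)
        j = 0                                           # Armijo-type search (1.6)
        while np.dot(T(2.0**-j * p + (1 - 2.0**-j) * x), x - p) \
                < (delta / beta) * np.dot(x - p, x - p):
            j += 1
        alpha = 2.0 ** -j                               # (1.7)
        y = alpha * p + (1 - alpha) * x                 # (1.8)
        t = T(y)
        if np.dot(t, t) == 0.0:
            return y                                    # T(y) = 0, so y is a solution
        w = x - (np.dot(x - y, t) / np.dot(t, t)) * t   # P_{H_k}(x^k), H_k as in (1.9)
        x = P_K(w)                                      # (1.10)
    return x

A = np.array([[0.0, 1.0], [-1.0, 0.0]])                 # skew, hence monotone, operator
T = lambda v: A @ v + np.array([-1.0, 1.0])
P_K = lambda v: np.maximum(v, 0.0)
print(extragradient_linesearch(T, P_K, np.array([2.0, 2.0])))  # approaches (1, 1)
```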

The above backtracking procedure for determining the right \(\alpha _k\) is sometimes called an Armijo-type search (see Armijo 1966). It has been analyzed for VIP(T, K) in Konnov (1993a) and Iusem and Svaiter (1997). Other variants of Korpelevich’s method can be found in Khobotov (1987) and Marcotte (1991).

Extensions of Korpelevich’s method to the point-to-set setting (in which case Lipschitz continuity assumptions must be carefully reworked, see e.g. Robinson and Lu 2008), can be found in Bao and Khanh (2005), Iusem and Lucambio Pérez (2000), Konnov (1993b) and Konnov (2001). All these references deal with finite dimensional spaces.

The extragradient method for solving variational inequalities (1.5)–(1.10) has been extended to Banach spaces in Iusem and Nasri (2011). An extragradient method for nonsmooth equilibrium problems in Banach spaces has been recently studied in Iusem and Mohebbi (2018).

The extragradient method has recently been extended also to the case of variational inequalities on Hadamard manifolds (see Tang and Huang 2012). A variant of the method in Tang and Huang (2012) can be found in Tang et al. (2015), where a regularization scheme is added, consisting of the projection of the iterates onto the intersection of two halfspaces.

We comment next on the difference between these two references and our results in this paper. First, our method applies to equilibrium problems, which form a considerably more general class than variational inequalities. We remark that, as pointed out in Iusem and Mohebbi (2018), monotone equilibrium problems in Banach spaces can be reduced to variational inequalities, through the introduction of the so-called diagonal subdifferential operator \(U^f\) (see also Iusem and Sosa 2010). This reduction does not work in Hadamard spaces, where there is no clear way of defining the diagonal subdifferential operator and establishing its properties (e.g. maximal monotonicity). Hence, the results in Tang and Huang (2012) cannot be applied for solving equilibrium problems in Hadamard spaces.

Also, Hadamard spaces are substantially more general than Hadamard manifolds, which are intrinsically finite dimensional (see the examples listed after Definition 1.2).

In connection with the results in Tang et al. (2015), their above-described regularization procedure cannot be easily extended to Hadamard spaces. Our regularization procedure, through a Halpern regularization scheme (which gives rise to a strong convergence result), is fully unrelated to the scheme in Tang et al. (2015).

Other versions of extragradient methods in Hadamard manifolds, for finding zeros of monotone vector fields and solutions of variational inequalities, both rather unrelated to our approach, can be found in Ferreira et al. (2005) and Batista et al. (2019) respectively.

Since one of the main contributions of this paper is the extension of the extragradient method for equilibrium problems from Hadamard manifolds to Hadamard spaces, it is worthwhile to provide an example of an equilibrium problem in a Hadamard space which is not a Hadamard manifold.

Let X be the space \({\mathbb {R}}^n\) with the radial metric (also called French railroad metric), where the distance \(\rho :X\times X\rightarrow {\mathbb {R}}\) is defined as:

$$\begin{aligned} \rho (x,y)= {\left\{ \begin{array}{ll} \left\| x-y\right\| \quad \qquad \qquad \mathrm{if}\,\,\, x, y\,\,\, \mathrm{and\,\,\, the\,\,\, origin\,\,\, are\,\,\, collinear} \\ \left\| x\right\| +\left\| y\right\| \qquad \qquad \mathrm{otherwise}. \end{array}\right. } \end{aligned}$$
(1.11)

It is known that the metric space \((X,\rho )\) is an \({\mathbb {R}}\)-tree space (see Kirk 2007, p. 197), and \({\mathbb {R}}\)-tree spaces, as mentioned above, are Hadamard spaces (e.g. Bertrand and Kloeckner 2012). For \(x\ne 0\) and \(t\in {\mathbb {R}}_{++}\), let \({\bar{B}}(x,t)\) be the open ball with center x and radius t in the radial metric, and \(B(x,t)\) the Euclidean open ball with the same center and radius. It is easy to check that \({\bar{B}}(x,\left\| x\right\| +t)=B(0,t)\cup \left\{ sx: -t/\left\| x\right\|<s<2+t/\left\| x\right\| \right\} \), i.e. a Euclidean ball centered at the origin together with a “spike” through x. Clearly, this radial open ball is not homeomorphic to an open subset of either \({\mathbb {R}}\) or \({\mathbb {R}}^n\), and hence X is neither a Riemannian manifold nor, “a fortiori”, a Hadamard manifold. It is also easy to check that the nonnegative orthant \({\mathbb {R}}^n_+\) is a convex subset of \((X,\rho )\).
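
A direct implementation of the radial distance (1.11) may help to fix ideas; the points used below are arbitrary.

```python
# Sketch of the radial ("French railroad") metric (1.11) on R^n: the distance
# is Euclidean along a common line through the origin, and otherwise the path
# must pass through the origin.
import numpy as np

def radial_dist(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    collinear = (nx == 0.0 or ny == 0.0
                 or np.allclose(x * ny, y * nx) or np.allclose(x * ny, -y * nx))
    return np.linalg.norm(x - y) if collinear else nx + ny

print(radial_dist([1.0, 0.0], [3.0, 0.0]))   # 2.0: same ray from the origin
print(radial_dist([1.0, 0.0], [0.0, 1.0]))   # 2.0: equals ||x|| + ||y||
```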

Let now (Xd) be a Hadamard space. Take \(z_1, \ldots , z_m\in X\) and \(w_1, \ldots , w_m\in {\mathbb {R}}_{++}\) satisfying \(\sum _{i=1}^mw_i=1\). Given a convex subset \(K\subset X\), we define the K-constrained geometric median of \(z_1,\ldots ,z_m\) as:

$$\begin{aligned} \mathrm{argmin}\left\{ \sum _{i=1}^mw_id(x,z_i): x\in K\right\} \end{aligned}$$

and the K-Fréchet mean of \(z_1,\ldots ,z_m\) as:

$$\begin{aligned} \mathrm{argmin}\left\{ \sum _{i=1}^mw_id^2(x,z_i): x\in K\right\} . \end{aligned}$$

Define \(g,h: X\rightarrow {\mathbb {R}}\) as \(g(x)= \sum _{i=1}^mw_id(x,z_i)\), \(h(x)= \sum _{i=1}^mw_id^2(x,z_i)\).

It is clear that g and h are proper, convex and continuous. If we define the bifunction \(f:X\times X\rightarrow {\mathbb {R}}\) as either \(f(x,y)=g(y)-g(x)\) or \(f(x,y)=h(y)-h(x)\), where \((X,d)\) is \({\mathbb {R}}^n\) with the radial metric, and take K as the nonnegative orthant of \({\mathbb {R}}^n\), then the equilibrium problem EP(f, K) satisfies all the assumptions required for convergence of our algorithm, which thus can be used for computing constrained geometric medians and Fréchet means in \({\mathbb {R}}^n\) with the radial metric. We comment that, since EP(f, K) is indeed an optimization problem, the second step (1.6)–(1.10) of the extragradient method is superfluous (the gradient method is sufficient for convex optimization problems). Of course, examples can be given of monotone bifunctions in the radial space, satisfying the convergence assumptions of our method, and such that the second step of the method is indeed essential for convergence.
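
Sticking to this example, the following sketch builds the bifunction \(f(x,y)=h(y)-h(x)\) from the weighted Fréchet objective in the radial metric and checks numerically that \(f(x,y)+f(y,x)=0\), so that f is monotone and, in particular, pseudo-monotone; the anchor points and weights are arbitrary choices made only for the illustration.

```python
# Sketch of the bifunction f(x, y) = h(y) - h(x), with h the weighted Frechet
# objective built from the radial metric of (1.11); by construction
# f(x, y) + f(y, x) = 0, so f is monotone (hence pseudo-monotone, i.e. B4 holds).
# The anchor points z_i and the weights w_i are illustrative.
import numpy as np

def radial_dist(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    collinear = (nx == 0.0 or ny == 0.0
                 or np.allclose(x * ny, y * nx) or np.allclose(x * ny, -y * nx))
    return np.linalg.norm(x - y) if collinear else nx + ny

Z = [np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([1.0, 1.0])]
W = [0.5, 0.3, 0.2]
h = lambda x: sum(w * radial_dist(x, z) ** 2 for w, z in zip(W, Z))
f = lambda x, y: h(y) - h(x)

x, y = np.array([0.5, 0.5]), np.array([2.0, 0.0])
print(f(x, y) + f(y, x))     # 0.0: the monotonicity identity holds
```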

In Sect. 2, we will present an extragradient method for equilibrium problems in Hadamard spaces, and prove \(\Delta \)-convergence of the generated sequence to a solution of the equilibrium problem, assuming pseudomonotonicity of the bifunction. In Sect. 3, we propose a variant of the extragradient method for which the generated sequence can be shown to be strongly convergent to an equilibrium point, when the bifunction is pseudomonotone.

We will add now some additional conditions on the bifunction f, besides B1–B3 defined above, which will be needed in the convergence analysis.

  1. B4:

    f is pseudo-monotone, i.e. whenever \(f(x,y)\ge 0\) with \(x,y\in X\), it holds that \(f(y,x)\le 0\).

  2. B5:

    f is Lipschitz continuous on bounded sets, meaning that for any bounded set \(B\subset X\), there exists \(M>0\) such that \( \left| f(x,y)-f(x,z)\right| \le M d(y,z)\) and \(\left| f(x,y)-f(z,y)\right| \le M d(x,z) \) for all \(x,y,z \in B\).

  3. B6:

    \(f(\cdot ,y)\) is \(\Delta \)-upper semicontinuous for all \(y\in X\).

It is well known that a concave and upper semicontinuous function is always \(\Delta \)-upper semicontinuous.

We make some comments now on these assumptions. In connection with B4, we mention that some monotonicity-like property is needed for convergence of all variants of the extragradient method. In the case of EP, the usual assumption is monotonicity of f, meaning that \(f(x,y)+f(y,x)\le 0\) for all \(x,y\in X\). Clearly, B4 is weaker than monotonicity. Regarding B5, note that it holds when, for every bounded set B, \(f(x,\cdot )\) and \(f(\cdot , x)\) are Lipschitz continuous on B for all \(x\in B\), say with Lipschitz constant \(L_x\), and \(\sup _{x\in B}L_x<\infty \). In connection with B6, we emphasize that it holds when \(f(\cdot ,y)\) is concave for all \(y\in X\). In view of B3, our analysis covers the very important concave-convex case (e.g., the problem of finding saddle points of the Lagrangian of constrained convex optimization problems).

2 Extragradient method with linesearch and \(\Delta \)-convergence

We start with a Hadamard space X, a closed and convex set \(K\subset X\) and a bifunction \(f:X\times X\rightarrow {\mathbb {R}}\) which satisfies B1–B3. We consider the equilibrium problem EP(f, K) as defined in Sect. 1, and propose the following Extragradient Method with Linesearch (EML) for solving this problem.

Take \(\delta \in (0,1)\), \({{\hat{\beta }}}\), \({{\tilde{\beta }}}\) satisfying \(0<{\hat{\beta }}\le {\tilde{\beta }}\), and a sequence \(\{\beta _k\}\subseteq [{{\hat{\beta }}},{{\tilde{\beta }}}]\).

1. Initialization:

$$\begin{aligned} x^0 \in K. \end{aligned}$$
(2.1)

2. Iterative step: Given \(x^k\), define

$$\begin{aligned} z^k \in \mathrm{Argmin}_{y\in K}\left\{ f(x^k,y)+ \frac{1}{2\beta _k}d^2(y, x^k)\right\} . \end{aligned}$$
(2.2)

If \(x^k=z^k\) stop. Otherwise, let

$$\begin{aligned} \ell (k)=\min \left\{ \ell \ge 0: \beta _kf(y^{\ell },x^k)-\beta _kf(y^{\ell },z^k) \ge \frac{\delta }{2}d^2(z^k,x^k)\right\} , \end{aligned}$$
(2.3)

where

$$\begin{aligned} y^{\ell }=2^{-\ell }z^k\oplus (1-2^{-\ell })x^k. \end{aligned}$$
(2.4)

We take

$$\begin{aligned} \alpha _k:= & {} 2^{-\ell (k)}, \end{aligned}$$
(2.5)
$$\begin{aligned} y^k:= & {} \alpha _k z^k\oplus (1-\alpha _k)x^k, \end{aligned}$$
(2.6)
$$\begin{aligned} w^k= & {} P_{H_k}(x^k), \end{aligned}$$
(2.7)

where

$$\begin{aligned} H_k=\{y\in X: f(y^k,y)\le 0\}. \end{aligned}$$

Finally we define

$$\begin{aligned} x^{k+1}=P_K(w^k). \end{aligned}$$
(2.8)
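
To fix ideas, the following sketch instantiates EML in the Hadamard space \(X={\mathbb {R}}^n\) for the particular bifunction \(f(x,y)=\langle T(x),y-x\rangle \), in which case the subproblem (2.2) reduces to a projection and \(H_k\) is a halfspace with an explicit projection. The operator, the set K and all parameters are illustrative assumptions, not part of the statement of the method.

```python
# Sketch of Algorithm EML (2.1)-(2.8) in X = R^n for the variational-inequality
# bifunction f(x, y) = <T(x), y - x>. Then (2.2) becomes z = P_K(x - beta*T(x)),
# the condition (2.3) reads beta*<T(y), x - z> >= (delta/2)*||x - z||^2, and
# H_k = {w : <T(y^k), w - y^k> <= 0} is a halfspace. All data are illustrative.
import numpy as np

def eml(T, P_K, x0, beta=1.0, delta=0.5, iters=200):
    x = x0.copy()
    for _ in range(iters):
        z = P_K(x - beta * T(x))                          # (2.2)
        if np.allclose(x, z):
            return x                                      # x solves EP(f, K)
        ell = 0
        while True:                                       # Armijo search (2.3)-(2.4)
            y = 2.0**-ell * z + (1 - 2.0**-ell) * x       # y = 2^{-ell} z (+) (1 - 2^{-ell}) x
            if beta * np.dot(T(y), x - z) >= 0.5 * delta * np.dot(x - z, x - z):
                break
            ell += 1
        t = T(y)                                          # here y = y^k, alpha_k = 2^{-ell(k)}
        if np.dot(t, t) == 0.0:
            return y                                      # f(y, .) vanishes, so y solves EP(f, K)
        w = x - max(0.0, np.dot(t, x - y)) / np.dot(t, t) * t   # (2.7): w^k = P_{H_k}(x^k)
        x = P_K(w)                                        # (2.8)
    return x

A = np.array([[0.0, 1.0], [-1.0, 0.0]])                   # monotone (skew) operator
T = lambda v: A @ v + np.array([-1.0, 1.0])
P_K = lambda v: np.maximum(v, 0.0)                        # K = nonnegative orthant
print(eml(T, P_K, np.array([3.0, 0.5])))                  # approaches the solution (1, 1)
```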

We start the analysis of the algorithm with some elementary properties of EML.

Proposition 2.1

The sequence \(\{ z^k\}\) is well defined.

Proof

It is a consequence of Lemma 3.1.2 in Jost (1997) (see also Lemma 2.2.19 in Bacak 2014). \(\square \)

Proposition 2.2

Assume that f satisfies B3. Take \(x\in K, \beta \in {\mathbb {R}}^+\). If

$$\begin{aligned} z\in \mathrm{Argmin}_{y\in K}\left\{ f(x,y)+ \frac{1}{2\beta }d^2(y,x)\right\} \end{aligned}$$
(2.9)

then \(\frac{1}{2}\left\{ d^2(z,x)-d^2(y,x)+d^2(y,z)\right\} \le \beta [f(x,y)-f(x,z)]\) for all \(y\in K\).

Proof

Note that

$$\begin{aligned} f(x,z)+\frac{1}{2\beta }d^2(z,x) \le f(x,y)+\frac{1}{2\beta }d^2(y,x) \end{aligned}$$
(2.10)

for all \(y\in K\). Now, taking \(w=ty\oplus (1-t)z\) with \(t\in (0, 1]\) in (2.10), we have:

$$\begin{aligned} f(x,z)+\frac{1}{2\beta }d^2(z,x)&\le f(x,w)+\frac{1}{2\beta }d^2(w,x)\nonumber \\&=f(x,ty\oplus (1-t)z)+\frac{1}{2\beta }d^2(ty\oplus (1-t)z,x)\nonumber \\&\le tf(x,y)+(1-t)f(x,z)\nonumber \\&\quad +\frac{1}{2\beta }\{td^2(y,x)+(1-t)d^2(z,x)-t(1-t)d^2(y,z)\}. \end{aligned}$$
(2.11)

It follows from (2.11) that

$$\begin{aligned} \frac{1}{2\beta }\left\{ d^2(z,x)-d^2(y,x)+(1-t)d^2(y,z)\right\} \le f(x,y)-f(x,z). \end{aligned}$$
(2.12)

Taking limits in (2.12) with \(t\rightarrow 0^+\), we obtain

$$\begin{aligned} \frac{1}{2}\left\{ d^2(z,x)-d^2(y,x)+d^2(y,z)\right\} \le \beta \left[ f(x,y)-f(x,z)\right] . \end{aligned}$$
(2.13)

\(\square \)

Corollary 2.3

Assume that \(\{x^k\}\) and \(\{z^k\}\) are the sequences generated by EML. Then

$$\begin{aligned} \frac{1}{2}\left\{ d^2(z^k,x^k)-d^2(y,x^k)+d^2(y,z^k)\right\} \le \beta _k\left[ f(x^k,y)-f(x^k,z^k)\right] , \ \ \ \ \ \ \ \forall y\in K. \end{aligned}$$

Proof

Follows from Proposition 2.2 and (2.2). \(\square \)

Proposition 2.4

If Algorithm EML stops at the k-th iteration then \(x^k\) is a solution of EP(f, K).

Proof

Follows from Corollary 2.3. \(\square \)

Proposition 2.5

The following statements hold for Algorithm EML.

  1. (i)

    \(\ell (k)\) is well defined (i.e. the Armijo-type search for \(\alpha _k\) is finite), and consequently the same holds for the sequence \(\{x^k\}\).

  2. (ii)

    \(x^k\in K\) for all \(k\ge 0\).

  3. (iii)

    If the algorithm does not stop at iteration k, then \(f(y^k,x^k)>0\).

Proof

  1. (i)

    We proceed inductively, i.e. we assume that \(x^k\) is well defined, and proceed to establish that the same holds for \(x^{k+1}\). Note that \(z^k\) is well defined by Proposition 2.1. It suffices to check that \(\ell (k)\) is well defined. Assume by contradiction that

    $$\begin{aligned} \beta _k[f(y^{\ell },x^k)-f(y^{\ell },z^k)] < \frac{\delta }{2}d^2(z^k,x^k) \ \ \ \ \ \forall \ell \ge 0. \end{aligned}$$
    (2.14)

    Note that the sequence \(\{y^\ell \}\) is strongly convergent to \(x^k\). In view of B2, taking limits in (2.14) as \(\ell \rightarrow \infty \),

    $$\begin{aligned} \beta _k[f(x^k,x^k)-f(x^k,z^k)] \le \frac{\delta }{2}d^2(z^k,x^k). \end{aligned}$$
    (2.15)

    Since \(x^k\in K\) by (2.8), we apply Corollary 2.3 with \(y=x^k\) in (2.15), obtaining

    $$\begin{aligned} d^2(z^k,x^k)\le \frac{\delta }{2}d^2(z^k,x^k). \end{aligned}$$
    (2.16)

    Since \(\delta \in (0,1)\), we get a contradiction.

  2. (ii)

    It follows from (2.1) and (2.8).

  3. (iii)

    Assume that \(f(y^k,x^k)\le 0\). Note that, using B1, B3 and (2.6),

    $$\begin{aligned} 0=f(y^k,y^k)\le \alpha _kf(y^k,z^k)+(1-\alpha _k)f(y^k,x^k). \end{aligned}$$

    Hence \(f(y^k,z^k)\ge 0\). On the other hand, by (2.3)–(2.6),

    $$\begin{aligned} f(y^k,x^k)\ge f(y^k,z^k)+\frac{\delta }{2\beta _k}d^2(z^k,x^k)>f(y^k,z^k)\ge 0, \end{aligned}$$
    (2.17)

    in contradiction with the assumption. Note that the strict inequality in (2.17) is due to the fact that \(x^k\ne z^k\). \(\square \)

We continue with an intermediate lemma.

Lemma 2.6

(see Ahmadi Kakavandi and Amini 2010) Let \(h: X\rightarrow (-\infty , \infty ]\) be a proper, lower semicontinuous, convex function. Denote as clD(h) the closure of the domain of h. Then, for every \(b\in clD(h)\) and \(t\in {\mathbb {R}}\) with \(t<h(b)\), there exist a point \(a\in X\) and a real number s with \(t<s\le h(b)\) such that

$$\begin{aligned} h(x)\ge \frac{1}{s-t}\langle \overrightarrow{ab},\overrightarrow{ax}\rangle +s \end{aligned}$$

for all \(x\in X\). Moreover \(d(a,b)< h(b)-t\).

We continue the analysis of the convergence properties of EML. Recall that S(f, K) denotes the solution set of EP(f, K).

Proposition 2.7

Assume that the bifunction f satisfies B1–B5 and that \(S(f, K)\ne \emptyset \). Let \(\{x^k\}\), \(\{y^k\}\), \(\{z^k\}\) and \(\{w^k\}\) be the sequences generated by Algorithm EML. If the algorithm does not have finite termination, then

  1. (i)

    The sequence \(\{d(x^*,x^k)\}\) is nonincreasing (and hence convergent) for any \(x^*\in S(f,K)\).

  2. (ii)

    The sequence \(\{x^k\}\) is bounded, and therefore it has \(\Delta \)-cluster points.

  3. (iii)

    \(\lim _{k\rightarrow \infty } d(w^k,x^k)=0\).

  4. (iv)

    The sequence \(\{z^k\}\) is bounded.

  5. (v)

    \(\lim _{k\rightarrow \infty } f(y^k,x^k)=0\)

Proof

  1. (i)

    Take \(x^*\in S(f,K)\). By B4, \(x^*\in H_k\) for all k. By Proposition 2.5(iii), \(x^k\not \in H_k\). Also, we have \(w^k=P_{H_k}(x^k)\). Using Proposition 1.6,

    $$\begin{aligned} \langle \overrightarrow{x^*w^k},\overrightarrow{x^kw^k}\rangle \le 0, \end{aligned}$$

    that is to say,

    $$\begin{aligned} d^2(w^k,x^k)+d^2(x^*,w^k)-d^2(x^*,x^k)\le 0. \end{aligned}$$
    (2.18)

    Now, since \(x^{k+1}=P_K(w^k)\), again Proposition 1.6 implies that

    $$\begin{aligned} \langle \overrightarrow{x^*x^{k+1}},\overrightarrow{w^kx^{k+1}}\rangle \le 0 \end{aligned}$$

    or equivalently,

    $$\begin{aligned} d^2(x^{k+1},w^k)+d^2(x^*,x^{k+1})-d^2(x^*,w^k)\le 0. \end{aligned}$$
    (2.19)

    In view of (2.18) and (2.19), we have

    $$\begin{aligned} d^2(x^*,x^{k+1})\le & {} d^2(x^*,w^k)-d^2(x^{k+1},w^k)\le d^2(x^*,w^k)\nonumber \\\le & {} d^2(x^*,w^k)+d^2(w^k,x^k)\le d^2(x^*,x^k). \end{aligned}$$
    (2.20)
  2. (ii)

    In view of (i), \(\lim _{k\rightarrow \infty }d(x^*,x^k)\) exists, so that \(\{x^k\}\) is bounded.

  3. (iii)

    It follows from (i) and (2.20).

  4. (iv)

    Since \(x^k\in K\) by (2.8), we get from (2.2)

    $$\begin{aligned} f(x^k,z^k)+\frac{1}{2\beta _k}d^2(z^k, x^k)\le f(x^k,x^k)+ \frac{1}{2\beta _k}d^2(x^k, x^k)=0. \end{aligned}$$
    (2.21)

    It follows from (2.21) that

    $$\begin{aligned} d^2(z^k, x^k)\le -2\beta _kf(x^k,z^k). \end{aligned}$$
    (2.22)

    Take a negative real number t, so that \(t< f(x^k,x^k)=0\). Since \(f(x^k,\cdot )\) is convex, proper and lower semicontinuous, by Lemma 2.6 there exists \(v^k\in X\) and a real number \(t< s_k \le 0 = f(x^k,x^k)\) such that

    $$\begin{aligned} f(x^k,y)\ge \frac{1}{s_k-t}\langle \overrightarrow{v^kx^k},\overrightarrow{v^ky}\rangle +s_k \end{aligned}$$
    (2.23)

    for all \(y\in K\). Let \(B(x^k,1)\) be the closed ball of radius 1 centered at \(x^k\). Since f is bounded on bounded sets by B5, and \(\{x^k \}\) is bounded by item (ii), there exists \(M>0\) such that \(f(x^k,y)<M\) for all k and for all \(y\in B(x^k,1)\). Therefore,

    $$\begin{aligned} \sup _{y\in B(x^k,1)} \Big ( \frac{1}{s_k-t}\langle \overrightarrow{v^kx^k },\overrightarrow{v^ky }\rangle +s_k \Big ) \le \sup _{y\in B(x^k,1)}f(x^k,y)\le M. \end{aligned}$$
    (2.24)

    It follows from (2.24) that the sequences \(\left\{ \frac{1}{s_k-t}d(v^k,x^k)\right\} \) and \(\left\{ \frac{1}{s_k-t}d^2(v^k,x^k)\right\} \) are bounded. Therefore, setting \(y=z^k\) in (2.23), we have that \(-f(x^k,z^k)\le \frac{-1}{s_k-t}\langle \overrightarrow{v^kx^k },\overrightarrow{v^kz^k }\rangle -s_k\). Applying Cauchy–Schwarz inequality, we obtain

    $$\begin{aligned} -f(x^k,z^k)\le \frac{1}{s_k-t}d(v^k,x^k)d(v^k,z^k)-s_k. \end{aligned}$$
    (2.25)

    From (2.22) and (2.25), we get

    $$\begin{aligned} d^2(z^k, x^k)\le & {} \frac{2\beta _k}{s_k-t}d(v^k,x^k)d(v^k,z^k)-2\beta _ks_k\nonumber \\\le & {} \frac{2\beta _k}{s_k-t}d(v^k,x^k)\Big (d(v^k,x^k)+d(x^k,z^k)\Big )-2\beta _ks_k. \end{aligned}$$
    (2.26)

    Now, since the sequences \(\{s_k\}\), \(\{x^k\}\), \(\left\{ \frac{1}{s_k-t}d(v^k,x^k)\right\} \) and \(\left\{ \frac{1}{s_k-t}d^2(v^k,x^k)\right\} \) are bounded, boundedness of \(\{z^k\}\) follows from (2.26).

  5. (v)

    By (iii), we have \(\lim _{k\rightarrow \infty } d(w^k,x^k)=0\). Since \(\{x^k\}\), \(\{y^k\}\), and \(\{w^k\}\) are bounded, we invoke B5, obtaining some \(M>0\) such that \(\left| f(y^k,x^k)-f(y^k,w^k)\right| \le M d(x^k,w^k)\). We conclude from item (iii) that

    $$\begin{aligned} \lim _{k\rightarrow \infty }\left| f(y^k,x^k)- f(y^k,w^k)\right| =0. \end{aligned}$$
    (2.27)

    Note that if the algorithm does not stop at iteration k, then \(f(y^k,x^k)>0\), by Proposition 2.5(iii). Also, since \(w^k\in H_k\), we have \(f(y^k,w^k)\le 0\) for all k. These two inequalities, together with (2.27), easily imply that

    $$\begin{aligned} \lim _{k\rightarrow \infty } f(y^k,x^k)= \lim _{k\rightarrow \infty } f(y^k,w^k)=0. \end{aligned}$$

    \(\square \)

Definition 2.8

We say that \(\{x^k\}\) is an asymptotically solving sequence for EP(f, K) if

$$\begin{aligned} \liminf _{k\rightarrow \infty }f(x^k,y)\ge 0 \end{aligned}$$

for all \(y\in K\).

We continue with two technical results to be used for proving that EML generates an asymptotically solving sequence.

Proposition 2.9

Assume that the bifunction f satisfies B1–B5 and that \(S(f, K)\ne \emptyset \). Let \(\{x^k\}\) and \(\{z^k\}\) be the sequences generated by Algorithm EML. If \(\{x^{k_i}\}\) is a subsequence of \(\{x^k\}\) satisfying

$$\begin{aligned} \lim _{i\rightarrow \infty }d(z^{k_i},x^{k_i})=0, \end{aligned}$$
(2.28)

then \(\{x^{k_i}\}\) is an asymptotically solving sequence for EP(f, K).

Proof

Since \(\{x^k\}\) and \(\{z^k\}\) are bounded, B5 entails the existence of some \(M>0\) such that

$$\begin{aligned} \left| f(x^{k_i},z^{k_i})\right| = \left| f(x^{k_i},z^{k_i})-f(x^{k_i},x^{k_i})\right| \le M d(z^{k_i},x^{k_i}), \end{aligned}$$
(2.29)

using B1 in the equality. From (2.29), we obtain

$$\begin{aligned} \lim _{i\rightarrow \infty } f(x^{k_i},z^{k_i})=0. \end{aligned}$$
(2.30)

Take now any \(y\in K\). By Corollary 2.3, we have

$$\begin{aligned} \frac{1}{2}\left\{ d^2(z^{k_i},x^{k_i})-d^2(y,x^{k_i})+d^2(y,z^{k_i})\right\} \le \beta _{k_i}\left[ f(x^{k_i},y)-f(x^{k_i},z^{k_i})\right] , \end{aligned}$$

which implies that

$$\begin{aligned} -d(z^{k_i},x^{k_i})\{d(y,x^{k_i})+d(y,z^{k_i})\}\le 2\beta _{k_i}\left[ f(x^{k_i},y)-f(x^{k_i},z^{k_i})\right] . \end{aligned}$$
(2.31)

Now, taking liminf in (2.31), we use (2.30), together with the boundedness of \(\{x^k\}\) and \(\{z^k\}\), to obtain that \(\liminf _{i\rightarrow \infty }f(x^{k_i},y)\ge 0\) for all \(y\in K\), thus establishing that \(\{x^{k_i}\}\) is an asymptotically solving sequence. \(\square \)

Proposition 2.10

Assume that the bifunction f satisfies B1–B5 and that \(S(f, K)\ne \emptyset \). If a subsequence \(\{\alpha _{k_i}\}\) of \(\{\alpha _k\}\), as defined in (2.5), converges to 0, then \(\{x^{k_i}\}\) is an asymptotically solving sequence for EP(f, K).

Proof

For proving the result, we will use Proposition 2.9. Thus, we must show that

$$\begin{aligned} \lim _{i\rightarrow \infty }d(z^{k_i},x^{k_i})=0. \end{aligned}$$

For the sake of contradiction, and without loss of generality, let us assume that

$$\begin{aligned} \mathrm{liminf}_{i\rightarrow \infty }d(z^{k_i},x^{k_i})>\eta >0. \end{aligned}$$
(2.32)

Define

$$\begin{aligned} {\hat{y}}^i=2\alpha _{k_i}z^{k_i} \oplus (1-2\alpha _{k_i})x^{k_i}, \end{aligned}$$
(2.33)

so that

$$\begin{aligned} d({\hat{y}}^i,x^{k_i})=2\alpha _{k_i}d(z^{k_i},x^{k_i}). \end{aligned}$$
(2.34)

Note that, since \(\lim _{i\rightarrow \infty }\alpha _{k_i}=0\), \(\ell (k_i)> 1\) for large enough i. Also, in view of (2.33), we have that \({{\hat{y}}}^i=y^{\ell (k_i)-1}\) in the inner loop of the linesearch for determining \(\alpha _{k_i}\), i.e., in (2.4). Since \(\ell (k_i)\) is the first integer for which the inequality in (2.3) holds, such inequality is reversed for \(\ell (k_i)-1\), i.e., we have

$$\begin{aligned} \beta _{k_i}\left[ f({\hat{y}}^i,x^{k_i})-f({\hat{y}}^i,z^{k_i})\right] < \frac{\delta }{2}d^2(z^{k_i},x^{k_i}) \end{aligned}$$
(2.35)

for large enough i. On the other hand, since \(\lim _{i\rightarrow \infty }\alpha _{k_i}=0\) by hypothesis, and \(\{d(z^{k_i},x^{k_i})\}\) is bounded by Proposition 2.7(ii) and (iv), it follows from (2.34) that

$$\begin{aligned} \lim _{i\rightarrow \infty }d({\hat{y}}^i,x^{k_i})=0. \end{aligned}$$
(2.36)

Since \(\{x^k\}\) and \(\{z^k\}\) are bounded by Proposition 2.7 (ii) and (iv), we invoke B5, and conclude that there exist \(M>0\) such that

$$\begin{aligned} \beta _{k_i}\left[ f(x^{k_i},x^{k_i})-f({{\hat{y}}}^i,x^{k_i})\right]\le & {} {{\tilde{\beta }}} Md({{\hat{y}}}^i, x^{k_i})\nonumber \\ \beta _{k_i}\left[ f({{\hat{y}}}^i,z^{k_i})-f(x^{k_i},z^{k_i})\right]\le & {} {{\tilde{\beta }}} Md({{\hat{y}}}^i, x^{k_i}), \end{aligned}$$
(2.37)

with \({{\tilde{\beta }}}\) as in the definition of the sequence \(\{\beta _k\}\). Since \(\delta \) belongs to (0, 1), we obtain from (2.36) and (2.37) that there exists \(m\in {\mathbb {N}}\) such that

$$\begin{aligned} \beta _{k_i}\left[ f(x^{k_i},x^{k_i})-f({{\hat{y}}}^i,x^{k_i})\right]\le & {} \frac{\eta ^2(1-\delta )}{4},\nonumber \\ \beta _{k_i}\left[ f({{\hat{y}}}^i,z^{k_i})-f(x^{k_i},z^{k_i})\right]\le & {} \frac{\eta ^2(1-\delta )}{4}, \end{aligned}$$
(2.38)

for \(i\ge m\), with \(\eta \) as in (2.32) and \(\delta \) as in (2.35). Therefore, we obtain

$$\begin{aligned} \beta _{k_i}\left[ f(x^{k_i},x^{k_i})-f(x^{k_i},z^{k_i})\right]\le & {} \beta _{k_i}\left[ f({{\hat{y}}}^i,x^{k_i})-f({{\hat{y}}}^i,z^{k_i})\right] +\frac{\eta ^2(1-\delta )}{2}\nonumber \\< & {} \frac{\delta }{2}d^2(z^{k_i},x^{k_i})+\frac{(1-\delta )}{2}d^2(z^{k_i},x^{k_i})= \frac{1}{2}d^2(z^{k_i},x^{k_i}),\nonumber \\ \end{aligned}$$
(2.39)

for all \(i\ge m\), using (2.38) in the first inequality and (2.35), (2.32) in the second one. Now we combine Corollary 2.3 and (2.39) to get

$$\begin{aligned}&\frac{1}{2}\{d^2(z^{k_i},x^{k_i})-d^2(x^{k_i},x^{k_i})+d^2(x^{k_i},z^{k_i})\}\\&\quad \le \beta _{k_i}\left[ f(x^{k_i},x^{k_i})-f(x^{k_i},z^{k_i})\right] < \frac{1}{2}d^2(z^{k_i},x^{k_i}) \end{aligned}$$

for all \(i\ge m\), i.e.,

$$\begin{aligned} d^2(x^{k_i},z^{k_i})<0 \end{aligned}$$

for all \(i\ge m\), which is a contradiction. \(\square \)

Now we prove that EML generates an asymptotically solving sequence.

Proposition 2.11

Assume that f satisfies B1–B5, and that \(S(f,K)\ne \emptyset \). Then the sequence \(\{x^k\}\) generated by Algorithm EML is an asymptotically solving sequence for EP(f, K).

Proof

First assume that there exists a subsequence \(\{\alpha _{k_i}\}\) of \(\{\alpha _k\}\) which converges to 0. In this case, by Proposition 2.10, we obtain \(\liminf _{i\rightarrow \infty }f(x^{k_i},y)\ge 0\) for each \(y\in K\). Now assume that \(\{\alpha _{k_i}\}\) is any subsequence of \(\{\alpha _k\}\) bounded away from zero (say \(\alpha _{k_i}\ge {\bar{\alpha }}>0\)). It follows from (2.3) and (2.6) that

$$\begin{aligned} \beta _{k_i}\left[ f(y^{k_i},x^{k_i})-f(y^{k_i},z^{k_i})\right] \ge \frac{\delta }{2}d^2(z^{k_i},x^{k_i}). \end{aligned}$$
(2.40)

Note that, since \(\alpha _{k_i}\le 1\) by (2.5), we get, in view of B3,

$$\begin{aligned} 0=f(y^{k_i},y^{k_i})\le \alpha _{k_i}f(y^{k_i},z^{k_i})+(1-\alpha _{k_i})f(y^{k_i},x^{k_i}), \end{aligned}$$

so that

$$\begin{aligned} \frac{1-\alpha _{k_i}}{\alpha _{k_i}}f(y^{k_i},x^{k_i})\ge -f(y^{k_i},z^{k_i}). \end{aligned}$$
(2.41)

Multiplying both sides of (2.41) by \(\beta _{k_i}\) and adding (2.40), we easily get

$$\begin{aligned} \beta _{k_i}f(y^{k_i},x^{k_i}) \ge \frac{\delta \alpha _{k_i}}{2}d^2(z^{k_i},x^{k_i}). \end{aligned}$$
(2.42)

Taking limits in (2.42) with \(i\rightarrow \infty \), we obtain, in view of Proposition 2.7(v),

$$\begin{aligned} 0\ge \frac{\delta {{\bar{\alpha }}}}{2}\lim _{i\rightarrow \infty }d^2(z^{k_i},x^{k_i}). \end{aligned}$$

Hence, \(\lim _{i\rightarrow \infty }d(z^{k_i},x^{k_i})=0\). We are within the assumptions of Proposition 2.9, and thus we conclude that \(\{x^{k_i}\}\) is an asymptotically solving sequence for EP(f, K). It follows that every subsequence of \(\{x^k\}\) is an asymptotically solving sequence for EP(f, K), and hence the same holds for the whole sequence \(\{x^k\}\). \(\square \)

We need now the following theorem, which is the last intermediate result leading to our \(\Delta \)-convergence theorem.

Theorem 2.12

(see Ahmadi Kakavandi 2013) Let \((X,d)\) be a Hadamard space and \(\{x^k\}\) be a sequence in X. Take \(x\in X\). Then \(\{x^k\}\) \(\Delta \)-converges to x if and only if \(\limsup _{k\rightarrow \infty }\langle \overrightarrow{xx^{k}},\overrightarrow{xy}\rangle \le 0\) for all \(y\in X\).

We end this section with our \(\Delta \)-convergence theorem for EML.

Theorem 2.13

Assume that f satisfies B1–B6, and that \(S(f,K)\ne \emptyset \). Then the sequence \(\{x^k\}\) generated by Algorithm EML is \(\Delta \)-convergent to a solution of EP(f, K).

Proof

Proposition 2.7(ii) shows that \(\{x^k\}\) is bounded. Therefore \(\{x^k\}\) has \(\Delta \)-cluster points. Let \(v\in K\) be one of them, and \(\{x^{k_i}\}\) be a subsequence of \(\{x^k\}\)\(\Delta \)-convergent to v. By Proposition 2.11, we have that \(\liminf _{i\rightarrow \infty }f(x^{k_i},y)\ge 0\) for all \(y\in K\). Since \(f(\cdot ,y)\) is \(\Delta \)-upper semicontinuous by B6, we conclude that \(f(v,y)\ge 0\) for all \(y\in K\), i.e., \(v\in S(f,K)\). It remains to be proved that there exists only one \(\Delta \)-cluster point of \(\{x^k\}\). Let \(v'\) be another \(\Delta \)-cluster point of \(\{x^k\}\), so that there exists a subsequence \(\{x^{k_j}\}\) such that \(x^{k_j}\overset{\Delta }{\longrightarrow } v'\). We have already proved that \(v'\in S(f,K)\). Also both \(\lim _{k\rightarrow \infty } d(v,x^k)\) and \(\lim _{k\rightarrow \infty }d(v',x^k)\) exist, in view of Proposition 2.7(i). Note that

$$\begin{aligned} 2\langle \overrightarrow{x^{k_i}x^{k_j}},\overrightarrow{v'v}\rangle = d^2(x^{k_i},v)+d^2(x^{k_j},v')-d^2(x^{k_i},v')-d^2(x^{k_j},v). \end{aligned}$$
(2.43)

Letting \(i\rightarrow \infty \), and then \(j\rightarrow \infty \), we get \({\lim }_{j\rightarrow \infty } {\lim }_{i\rightarrow \infty } \langle \overrightarrow{x^{k_i}x^{k_j}},\overrightarrow{v'v}\rangle =0\). Also, we can write the left side of (2.43) as:

$$\begin{aligned} 2\langle \overrightarrow{x^{k_i}x^{k_j}},\overrightarrow{v'v}\rangle = 2\langle \overrightarrow{x^{k_i}v},\overrightarrow{v'v}\rangle + 2\langle \overrightarrow{vv'},\overrightarrow{v'v}\rangle + 2\langle \overrightarrow{v'x^{k_j}},\overrightarrow{v'v}\rangle . \end{aligned}$$
(2.44)

By taking \(\limsup \) in (2.44) and using Theorem 2.12, we conclude that \(d^2(v,v')\le 0\). Hence \(v=v'\), i.e. \(\{x^k\}\) has only one \(\Delta \)-cluster point, and so it \(\Delta \)-converges to a solution of EP(fK). \(\square \)

3 A strongly convergent variant of the EML method

In this section we perform a minor modification on the EML algorithm which ensures strong convergence of the generated sequence to a solution of EP(fK). In Hilbert spaces, this procedure, called Halpern’s regularization (see, e.g., Halpern 1967), consists of taking a convex combination of a given EML iterate with a fixed point \(u\in X\), where the weight given to u decreases to 0 with k. The strong limit of the generated sequence is the projection of u onto the solution set. The modified method will be called EMLH. We remark that, besides the theoretical interest in having strong (rather than \(\Delta \)-) convergence, the sequences generated by regularized methods which ensure strong convergence tend to exhibit a more regular behavior. For instance, examples have been presented in Gárciga Otero et al. (2001), showing that the proximal point method for convex optimization in Hilbert spaces can generate a weakly convergent sequence \(\{x^k\}\) such that \(\sum _{k=0}^{\infty }\left\| x^k-x^{k+1}\right\| =\infty \). When regularized versions of the proximal point method, which ensure strong convergence, are applied to these examples, the generated sequence \(\{x^k\}\) turns out to satisfy \(\sum _{k=0}^{\infty }\left\| x^k-x^{k+1}\right\| <\infty \).

We will assume in the sequel that X is a Hadamard space, \(K\subset X\) is closed and convex, and \(f:X\times X\rightarrow {\mathbb {R}}\) is a bifunction which satisfies B1–B3. Next we give the formal definition of Algorithm EMLH.

1. Initialization:

Fix \(u\in X\) and consider a sequence \(\{\gamma _k\}\subset (0,1)\) such that \(\lim _{k\rightarrow \infty }\gamma _k =0\) and \(\sum _{k=0}^{\infty }\gamma _k=\infty \). Define

$$\begin{aligned} x^0 \in K. \end{aligned}$$
(3.1)

2. Iterative step: Given \(x^k\), define

$$\begin{aligned} z^k \in \mathrm{Argmin}_{y\in K}\left\{ f(x^k,y)+\frac{1}{2\beta _k}d^2(y,x^k)\right\} . \end{aligned}$$
(3.2)

If \(x^k=z^k\) stop. Otherwise, let

$$\begin{aligned} \ell (k)=\min \left\{ \ell \ge 0: \beta _kf(y^{\ell },x^k)-\beta _kf(y^{\ell },z^k) \ge \frac{\delta }{2}d^2(z^k,x^k)\right\} , \end{aligned}$$
(3.3)

with

$$\begin{aligned} y^{\ell }=2^{-\ell }z^k\oplus (1-2^{-\ell })x^k. \end{aligned}$$
(3.4)

Set

$$\begin{aligned} \alpha _k:= & {} 2^{-\ell (k)}, \end{aligned}$$
(3.5)
$$\begin{aligned} y^k:= & {} \alpha _k z^k\oplus (1-\alpha _k)x^k, \end{aligned}$$
(3.6)
$$\begin{aligned} w^k= & {} P_{H_k}(x^k), \end{aligned}$$
(3.7)

where

$$\begin{aligned} H_k=\{y\in X: f(y^k,y)\le 0\}. \end{aligned}$$

Define

$$\begin{aligned} v^k= & {} \gamma _ku\oplus (1-\gamma _k)w^k, \end{aligned}$$
(3.8)
$$\begin{aligned} x^{k+1}= & {} P_K(v^k). \end{aligned}$$
(3.9)

Observe that the only difference between EML and EMLH is the introduction of \(v^k\) in (3.8). Note also that our assumptions on X ensure that \(v^k\) is well defined. We proceed now to the convergence analysis of EMLH. We mention that many of the intermediate results coincide with those for EML, in which case we will refer to their proofs without much detail.
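
Relative to the EML sketch given after (2.8), the modification amounts to replacing the final projection by the two steps (3.8)–(3.9). A hedged sketch of one EMLH iteration, again for the illustrative bifunction \(f(x,y)=\langle T(x),y-x\rangle \) in \({\mathbb {R}}^n\) and with the assumed choice \(\gamma _k=1/(k+2)\), follows.

```python
# Sketch of one iterative step of EMLH (3.2)-(3.9) in X = R^n for
# f(x, y) = <T(x), y - x>; the only change with respect to EML is the Halpern
# step (3.8), which averages the fixed anchor u with w^k, here with the
# illustrative choice gamma_k = 1/(k+2), so that gamma_k -> 0 and
# sum_k gamma_k = infinity, as required.
import numpy as np

def emlh_step(x, k, u, T, P_K, beta=1.0, delta=0.5):
    z = P_K(x - beta * T(x))                        # (3.2)
    if np.allclose(x, z):
        return x                                    # x solves EP(f, K)
    ell = 0
    while True:                                     # Armijo search (3.3)-(3.4)
        y = 2.0**-ell * z + (1 - 2.0**-ell) * x
        if beta * np.dot(T(y), x - z) >= 0.5 * delta * np.dot(x - z, x - z):
            break
        ell += 1
    t = T(y)
    if np.dot(t, t) == 0.0:
        return y                                    # T(y) = 0, so y solves EP(f, K)
    w = x - max(0.0, np.dot(t, x - y)) / np.dot(t, t) * t   # (3.7): P_{H_k}(x^k)
    gamma = 1.0 / (k + 2)
    v = gamma * u + (1 - gamma) * w                 # (3.8): Halpern step toward u
    return P_K(v)                                   # (3.9)

A = np.array([[0.0, 1.0], [-1.0, 0.0]])             # monotone (skew) operator
T = lambda s: A @ s + np.array([-1.0, 1.0])
P_K = lambda s: np.maximum(s, 0.0)                  # K = nonnegative orthant
u, x = np.zeros(2), np.array([3.0, 0.5])
for k in range(300):
    x = emlh_step(x, k, u, T, P_K)
print(x)                                            # tends to P_{S(f,K)}(u), here (1, 1)
```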

We observe first that, as in the case of EML, the sequence \(\{ z^k\}\) is well defined by Lemma 3.1.2 of Jost (1997) (see also Lemma 2.2.19 of Bacak 2014).

Proposition 3.1

Assume that \(\{x^k\}\) and \(\{z^k\}\) are the sequences generated by EMLH. Then

$$\begin{aligned} \frac{1}{2}\{d^2(z^k,x^k)-d^2(y,x^k)+d^2(y,z^k)\}\le \beta _k f(x^k,y)-\beta _k f(x^k,z^k), \ \ \ \ \ \ \ \forall y\in K. \end{aligned}$$

Proof

Follows from Proposition 2.2 and (3.2). \(\square \)

Proposition 3.2

If Algorithm EMLH stops at the k-th iteration then \(x^k\) is a solution of EP(f, K).

Proof

Follows from Proposition 3.1. \(\square \)

Proposition 3.3

The following statements hold for Algorithm EMLH.

  1. (i)

    \(\ell (k)\) is well defined (i.e. the Armijo-type search for \(\alpha _k\) is finite), and consequently the same holds for the sequence \(\{x^k\}\).

  2. (ii)

    \(x^k\in K\) for all \(k\ge 0\).

  3. (iii)

    If the algorithm does not stop at iteration k, then \(f(y^k,x^k)>0\).

Proof

Similar to the proof of Proposition 2.5. \(\square \)

Proposition 3.4

Assume that f satisfies B1–B5 and \(S(f,K)\ne \emptyset \). Let \(\{x^k\}\), \(\{y^k\}\), \(\{z^k\}\) and \(\{w^k\}\) be the sequences generated by Algorithm EMLH. If the algorithm does not have finite termination, then

  1. (i)

    The sequence \(\{x^k\}\) is bounded, and therefore it has \(\Delta \)-cluster points.

  2. (ii)

    The sequence \(\{z^k\}\) is bounded.

  3. (iii)

    If \(\{d(w^{k_n},x^{k_n})\}\) is a subsequence of \(\{d(w^k,x^k)\}\) such that \(\lim _{n\rightarrow \infty }d(w^{k_n},x^{k_n}) =0\), then \(\lim _{n\rightarrow \infty } f(y^{k_n},x^{k_n})=0\).

Proof

(i) Let \(x^*\in S(f, K)\). By Proposition 3.3 (iii), \(x^k\not \in H_k\). By (3.7), \(w^k=P_{H_k}(x^k)\). Therefore, Proposition 1.6 implies that

$$\begin{aligned} \langle \overrightarrow{x^*w^k},\overrightarrow{x^kw^k}\rangle \le 0, \end{aligned}$$

or equivalently,

$$\begin{aligned} d^2(w^k,x^k)+d^2(x^*,w^k)-d^2(x^*,x^k)\le 0. \end{aligned}$$
(3.10)

On the other hand, since \(x^{k+1}=P_K(v^k)\), we have

$$\begin{aligned} \langle \overrightarrow{x^*x^{k+1}},\overrightarrow{v^kx^{k+1}}\rangle \le 0, \end{aligned}$$

or equivalently,

$$\begin{aligned} d^2(x^{k+1},v^k)+d^2(x^*,x^{k+1})-d^2(x^*,v^k)\le 0. \end{aligned}$$
(3.11)

Using (3.8), (3.10) and (3.11), we obtain:

$$\begin{aligned} d(x^*,x^{k+1})\le & {} d(x^*,v^k) \le \gamma _k d(x^*,u)+(1-\gamma _k)d(x^*,w^k)\nonumber \\\le & {} \gamma _k d(x^*,u)+(1-\gamma _k)d(x^*,x^k) \le \max \{d(x^*,u),d(x^*,x^k)\}. \end{aligned}$$
(3.12)

Iterating (3.12), we conclude that

$$\begin{aligned} d(x^*,x^{k+1})\le \max \{d(x^*,u),d(x^*,x^0)\}, \end{aligned}$$

and hence \(\{x^k\}\) is bounded.

The proofs of items (ii) and (iii) are similar to those of Proposition 2.7(iv) and (v). \(\square \)

Proposition 3.5

Assume that the bifunction f satisfies B1–B5 and that \(S(f, K)\ne \emptyset \). Let \(\{x^k\}\) and \(\{z^k\}\) be the sequences generated by Algorithm EMLH. If \(\{x^{k_i}\}\) is a subsequence of \(\{x^k\}\) satisfying \(\lim _{i\rightarrow \infty }d(z^{k_i},x^{k_i})=0\), then \(\{x^{k_i}\}\) is an asymptotically solving sequence for EP(f, K).

Proof

Similar to the proof of Proposition 2.9. \(\square \)

Proposition 3.6

Assume that the bifunction f satisfies B1–B5 and that \(S(f, K)\ne \emptyset \). If a subsequence \(\{\alpha _{k_i}\}\) of \(\{\alpha _k\}\) as defined in (3.5) converges to 0, then \(\{x^{k_i}\}\) is an asymptotically solving sequence for EP(f, K).

Proof

Similar to the proof of Proposition 2.10. \(\square \)

To establish strong convergence of the sequence generated by EMLH, we need an intermediate result which establishes an elementary property of real sequences.

Lemma 3.7

Consider sequences \(\{s_k\}\subset [0,\infty ), \{t_k\}\subset {\mathbb {R}}\) and \(\{\gamma _k\}\subset (0,1)\) satisfying \(\sum _{k=1}^{\infty } \gamma _k = \infty \). Suppose that \(s_{k+1}\le (1-\gamma _k) s_k+\gamma _kt_k\) for all \(k \ge 1\). If \({\limsup }_{n\rightarrow \infty } t_{k_n} \le 0\) for every subsequence \(\{s_{k_n}\}\) of \(\{s_k\}\) satisfying \(\liminf _{n\rightarrow \infty } (s_{{k_n}+1}-s_{k_n})\ge 0\), then \({\lim }_{k\rightarrow \infty }s_k = 0\).

Proof

See Saejung and Yotkaew (2012). \(\square \)

We complete the paper with our strong convergence result for Algorithm EMLH.

Theorem 3.8

Assume that the bifunction f satisfies B1–B6 and that \(S(f, K)\ne \emptyset \). Then the sequence \(\{x^k\}\) generated by EMLH is strongly convergent to \(P_{S(f, K)}(u)\).

Proof

It is easy to check that under B1–B4, S(fK) is closed and convex. Let \(x^*=P_{S(f,K)}(u)\). By Proposition 3.3(iii), \(x^{k+1}\not \in H_{k+1}\). By (3.7), \(w^{k+1}=P_{H_{k+1}}(x^{k+1})\). Therefore, Proposition 1.6 implies that

$$\begin{aligned} \langle \overrightarrow{x^*w^{k+1}},\overrightarrow{x^{k+1}w^{k+1}}\rangle \le 0, \end{aligned}$$

or equivalently,

$$\begin{aligned} d^2(w^{k+1},x^{k+1})+d^2(x^*,w^{k+1})-d^2(x^*,x^{k+1})\le 0. \end{aligned}$$
(3.13)

On the other hand, since \(x^{k+1}=P_K(v^k)\), we have

$$\begin{aligned} \langle \overrightarrow{x^*x^{k+1}},\overrightarrow{v^kx^{k+1}}\rangle \le 0, \end{aligned}$$

or equivalently,

$$\begin{aligned} d^2(x^{k+1},v^k)+d^2(x^*,x^{k+1})-d^2(x^*,v^k)\le 0. \end{aligned}$$
(3.14)

Now, (3.13) and (3.14) imply that

$$\begin{aligned} d^2(x^{k+1},v^k)+d^2(w^{k+1},x^{k+1})+d^2(x^*,w^{k+1})-d^2(x^*,v^k)\le 0. \end{aligned}$$
(3.15)

We conclude from (3.15) that

$$\begin{aligned} d^2(x^*,w^{k+1})\le d^2(x^*,v^k). \end{aligned}$$
(3.16)

Taking into account (3.8), (3.13) and (3.14), we obtain

$$\begin{aligned} \nonumber d(x^*,x^{k+1})&\le d(x^*,v^k) \le \gamma _k d(x^*,u)+(1-\gamma _k)d(x^*,w^k)\\&\le \gamma _k d(x^*,u)+(1-\gamma _k)d(x^*,x^k) \le \max \{d(x^*,u),d(x^*,x^k)\}. \end{aligned}$$
(3.17)

Iterating (3.17), we get that

$$\begin{aligned} \max \{d(x^*,x^{k+1}), d(x^*,v^k)\} \le \max \{d(x^*,u),d(x^*,x^0)\}, \end{aligned}$$

and hence both \(\{x^k\}\) and \(\{v^k\}\) are bounded. It follows from (3.13) that \(\{w^k\}\) is also bounded.

On the other hand, we have

$$\begin{aligned} d^2(x^*,v^{k+1})&=d^2(x^*,\gamma _{k+1}u\oplus (1-\gamma _{k+1})w^{k+1})\nonumber \\&\le (1-\gamma _{k+1})d^2(x^*,w^{k+1})+ \gamma _{k+1}d^2(x^*,u)-\gamma _{k+1}(1-\gamma _{k+1})d^2(u,w^{k+1})\nonumber \\&\le (1-\gamma _{k+1})d^2(x^*,v^k)+ \gamma _{k+1}\left[ d^2(x^*,u)-(1-\gamma _{k+1})d^2(u,w^{k+1})\right] , \end{aligned}$$
(3.18)

using (3.16) in the second inequality. Next we will show that \(d^2(x^*,v^{k})\rightarrow 0\). In view of Lemma 3.7, it suffices to show that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \left[ d^2(x^*,u)-(1-\gamma _{k_n+1})d^2(u,w^{k_n+1})\right] \le 0 \end{aligned}$$
(3.19)

for every subsequence \(\{d^2(x^*,v^{k_n})\}\) of \(\{d^2(x^*,v^k)\}\) satisfying

$$\begin{aligned} \liminf _{n\rightarrow \infty } \left[ d^2(x^*,v^{k_n+1})-d^2(x^*,v^{k_n})\right] \ge 0. \end{aligned}$$

Consider such a subsequence. We have

$$\begin{aligned} 0&\le \liminf _{n\rightarrow \infty } \left[ d^2(x^*,v^{k_n+1})-d^2(x^*,v^{k_n})\right] \nonumber \\&=\liminf _{n\rightarrow \infty }\left[ d^2(x^*,\gamma _{k_n+1}u\oplus (1-\gamma _{k_n+1})w^{k_n+1})- d^2(x^*,v^{k_n})\right] \nonumber \\&\le \liminf _{n\rightarrow \infty } \left[ \gamma _{k_n+1}d^2(x^*,u)+(1-\gamma _{k_n+1})d^2(x^*,w^{k_n+1})- d^2(x^*,v^{k_n})\right] \nonumber \\&=\liminf _{n\rightarrow \infty }\left[ \gamma _{k_n+1}(d^2(x^*,u)-d^2(x^*,w^{k_n+1}))+ d^2(x^*,w^{k_n+1})-d^2(x^*,v^{k_n})\right] \nonumber \\&\le \limsup _{n\rightarrow \infty } \gamma _{k_n+1}\left[ d^2(x^*,u)-d^2(x^*,w^{k_n+1})\right] +\liminf _{n\rightarrow \infty } \left[ d^2(x^*,w^{k_n+1})-d^2(x^*,v^{k_n})\right] \nonumber \\&=\liminf _{n\rightarrow \infty } \left[ d^2(x^*,w^{k_n+1})-d^2(x^*,v^{k_n})\right] \nonumber \\&\le \limsup _{n\rightarrow \infty } \left[ d^2(x^*,w^{k_n+1})-d^2(x^*,v^{k_n})\right] \le 0, \end{aligned}$$
(3.20)

using the fact that \(\lim _{k\rightarrow \infty }\gamma _k=0\) in the third equality and (3.16) in the last inequality. We conclude from (3.20) that

$$\begin{aligned} \lim _{n\rightarrow \infty }\left[ d^2(x^*,w^{k_n+1})-d^2(x^*,v^{k_n})\right] =0. \end{aligned}$$
(3.21)

Combining (3.21) and (3.15), we get

$$\begin{aligned} \lim _{n\rightarrow \infty }d^2(x^{k_n+1},v^{k_n})=\lim _{n\rightarrow \infty } d^2(w^{k_n+1},x^{k_{n}+1})=0. \end{aligned}$$
(3.22)

Refining the subsequence \(\{w^{k_n+1}\}\) if needed, we may assume, without loss of generality, that it has a \(\Delta \)-limit \(p\in X\), so that

$$\begin{aligned} \limsup _{n\rightarrow \infty } \left[ d^2(x^*,u)-(1-\gamma _{k_n+1})d^2(u,w^{k_n+1})\right] \le d^2(x^*,u)-d^2(u,p). \end{aligned}$$
(3.23)

We claim now that \(p\in S(f,K)\). By (3.22), \(x^{k_n+1}\overset{\Delta }{\longrightarrow } p\). Now we consider, as we did for Algorithm EML, two cases related to the behavior of \(\{\alpha _k\}\). First assume that there exists a subsequence \(\{\alpha _{k_{n_i}+1}\}\) of \(\{\alpha _{k_n+1}\}\) which converges to 0. In this case, by Proposition 3.6, we obtain \(\liminf _{i\rightarrow \infty }f(x^{k_{n_i}+1},y)\ge 0\) for all \(y\in K\). Since \(f(\cdot ,y)\) is \(\Delta \)-upper semicontinuous by B6, we get that \(p\in S(f,K)\).

Now we take a subsequence \(\{\alpha _{k_{n_i}+1}\}\) of \(\{\alpha _{k_n+1}\}\) bounded away from zero, say greater or equal to \(\eta \) for large enough i. It follows from (3.3) and (3.6) that

$$\begin{aligned} \beta _{k_{n_i}+1}\left[ f(y^{k_{n_i}+1},x^{k_{n_i}+1}) -f(y^{k_{n_i}+1},z^{k_{n_i}+1})\right] \ge \frac{\delta }{2}d^2(z^{k_{n_i}+1},x^{k_{n_i}+1}). \end{aligned}$$
(3.24)

Note that

$$\begin{aligned} 0=f(y^{k_{n_i}+1},y^{k_{n_i}+1})\le \alpha _{k_{n_i}+1}f(y^{k_{n_i}+1},z^{k_{n_i}+1}) +(1-\alpha _{k_{n_i}+1})f(y^{k_{n_i}+1},x^{k_{n_i}+1}), \end{aligned}$$

so that

$$\begin{aligned} \frac{1-\alpha _{k_{n_i}+1}}{\alpha _{k_{n_i}+1}}f(y^{k_{n_i}+1},x^{k_{n_i}+1})\ge -f(y^{k_{n_i}+1},z^{k_{n_i}+1}). \end{aligned}$$
(3.25)

Multiplying (3.25) by \(\beta _{k_{n_i}+1}\) and adding (3.24), we easily get

$$\begin{aligned} \beta _{k_{n_i}+1}f(y^{k_{n_i}+1},x^{k_{n_i}+1}) \ge \frac{\delta \alpha _{k_{n_i}+1}}{2}d^2(z^{k_{n_i}+1},x^{k_{n_i}+1}). \end{aligned}$$
(3.26)

Since \(\lim _{i\rightarrow \infty }d(w^{k_{n_i}+1},x^{k_{n_i}+1})=0\) by (3.22), taking limits in (3.26) and using Proposition 3.4(iii) we obtain \(\lim _{i\rightarrow \infty }d(z^{k_{n_i}+1},x^{k_{n_i}+1})=0\).

Now, we invoke Proposition 3.5 to get \(\liminf _{i\rightarrow \infty }f(x^{k_{n_i}+1},y)\ge 0\) for all \(y\in K\). Again, since \(f(\cdot ,y)\) is \(\Delta \)-upper semicontinuous by B6, we conclude that \(p\in S(f,K)\). We have shown that the \(\Delta \)-limit p of \(\{x^{k_n+1}\}\) belongs to S(f, K) both when the corresponding stepsizes \(\alpha _{k_n+1}\) approach zero and when they are bounded away from zero, establishing the claim.

Note that S(f, K) is closed and convex, \(w^{k_n+1}\overset{\Delta }{\longrightarrow } p\), \(x^*=P_{S(f, K)}(u)\) and \(p\in S(f,K)\). Therefore we have \(d(x^*, u) \le d(p, u)\). Now we apply (3.23) and get

$$\begin{aligned} \limsup _{n\rightarrow \infty }\left[ d^2(x^*,u)-(1-\gamma _{k_n+1})d^2(u,w^{k_n+1})\right] \le d^2(x^*,u)-d^2(u,p)\le 0. \end{aligned}$$

Therefore (3.19) holds and thus, by Lemma 3.7, \(d^2(x^*,v^{k})\rightarrow 0\). Hence \(d^2(x^*,x^{k})\rightarrow 0\) by (3.14), i.e. \(x^k\rightarrow x^*=P_{S(f, K)}(u)\). \(\square \)