1 Introduction

Let \(K\subseteq {\mathbb {R}}^{n}\) be a nonempty, closed and convex set and \(f: K \times K \rightarrow {\mathbb {R}}\) a bifunction. Then the equilibrium problem is defined by

$$\begin{aligned} \mathrm{find ~} {\overline{x}} \in K: ~~ f({\overline{x}}, y) \ge 0, ~ \forall ~ y \in K. \end{aligned}$$
(EP)

Problem (EP) is a very general formulation that encompasses several mathematical models from continuous optimization and variational analysis, such as minimization problems, variational inequalities, minimax problems, fixed point problems and complementarity problems, among others (see the excellent seminal work [6] for more on this).

In particular, (EP) also includes two families of mixed variational inequalities in which we have a special interest, since applications from several fields of engineering and economics, among others (see [1, 14, 20, 21, 30, 33, 37]), can be cast as such. Recall that, given an operator \(T: K \rightarrow {\mathbb {R}}^{n}\) and a function \(\varphi : K \rightarrow {\mathbb {R}}\), the mixed variational inequality (MVI) problem is defined as

$$\begin{aligned} \mathrm{find ~} {\overline{x}} \in K: ~~ \langle T({\overline{x}}), y - {\overline{x}} \rangle + \varphi (y) - \varphi ({\overline{x}}) \ge 0, ~ \forall ~ y \in K, \end{aligned}$$
(MVI)

while the inverse mixed variational inequality (IMVI) problem is defined as

$$\begin{aligned} \mathrm{find ~} {\overline{x}} \in K: ~~ \langle {\overline{x}}, T(y) - T({\overline{x}}) \rangle + \varphi (y) - \varphi ({\overline{x}}) \ge 0, ~ \forall ~ y \in K. \end{aligned}$$
(IMVI)

Both problems are very general formulations, too, and have been intensively studied in the literature (see [1, 16, 20, 21, 35, 37, 50, 53] among others), since they encompass variational inequalities and inverse variational inequalities as special cases (when \(\varphi =0\), see [12, 13]), constrained optimization problems (when \(T=0\)) and even the minimization of the sum of two functions (when T is the gradient of a differentiable function defined on K in problem (MVI)). Furthermore, problem (IMVI) generalizes standard mathematical programming problems such as the extended linear-quadratic programming studied in [41] (see [20]). For all these reasons, problem (EP) became an object of study for many researchers in applied mathematics, engineering, economics and computer sciences, among others, especially when the involved bifunction f is assumed to be convex in its second argument.
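
Indeed, the correspondence with (EP) is immediate: defining

$$\begin{aligned} f_{\mathrm {MVI}}(x, y):= \langle T(x), y - x \rangle + \varphi (y) - \varphi (x) ~~~ \text {and} ~~~ f_{\mathrm {IMVI}}(x, y):= \langle x, T(y) - T(x) \rangle + \varphi (y) - \varphi (x), \end{aligned}$$

a point \({\overline{x}} \in K\) solves (MVI) (respectively (IMVI)) if and only if it solves (EP) with \(f = f_{\mathrm {MVI}}\) (respectively \(f = f_{\mathrm {IMVI}}\)).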

The convexity assumption in problems (EP), (MVI) and (IMVI) is standard and hard to drop, since even basic results may fail without this hypothesis. Indeed, if \(\varphi \) is not convex, then problem (MVI) may have no solutions even on convex and compact sets (see the example in [25, p. 127]), and the same happens with (IMVI). The reason behind this is the “conflict” between the properties of the function \(\varphi \) and those of the operator T (see [16, 25, 50]) that occurs because one is searching for a point where an inequality is satisfied for all possible values of a variable: if the function \(\varphi \) is monotonically decreasing on a certain subset of K, the operator needs to “go upwards” faster on the same subset, and vice versa. As a consequence, an assumption regarding the relation between the involved function and operator needs to be imposed in order to ensure the existence of solutions to (MVI) and (IMVI) when \(\varphi \) is not convex, even on convex and compact sets. Usually in the literature (see [16, 25, 26] and references therein), the so-called mixed variational inequality property introduced in [50, Definition 3.1] is employed for this. Furthermore, similar or even greater obstacles are expected for algorithms dealing with such problems, as indicated, for instance, by the projected gradient method for solving variational inequalities (\(\varphi =0\) in problem (MVI)) where T is assumed \(\mu \)-strongly monotone and L-Lipschitz-continuous (with \(L, \mu > 0\)), which converges when \(L < \sqrt{2 \mu }\) (see, for instance, [13, p. 1109]).

In this paper, we propose an iterative method for solving problem (EP) without the usual convexity assumption. In particular, we rely on the generalized convexity notion of strong quasiconvexity, which defines a class of functions broader than that of the strongly convex ones.

Strongly quasiconvex functions were introduced by Polyak in his famous paper [38] and were investigated by different authors who provided various results involving them in works like [29, 49]. Moreover, as anticipated by Polyak in [38], there is also a significant literature on the calculus of variations involving strongly quasiconvex functions (see [31] for instance). New studies involving strongly quasiconvex functions in the field of optimization can be found, for instance, in [11], which provides asymptotic convergence properties of the saddle-point dynamics involving bifunctions which are strongly quasiconvex in one or both variables, and in [42], where one finds results on the asymptotic behavior of quasi-autonomous gradient expansive systems governed by a strongly quasiconvex function. The recent work [32] of one of the authors of this paper provides a fresh view on this class of functions, by recalling the most important existing results and providing a first investigation on minimizing its members by means of modern iterative methods. More precisely, there it is shown that the classical proximal point algorithm converges under standard assumptions when employed for minimizing a strongly quasiconvex function, too, while in [17] the authors of this study obtained a similar conclusion for a relaxed-inertial proximal point method.

Different proximal point-type algorithms have been considered for solving equilibrium problems in the convex case, in some of which the employed bifunction is endowed with a generalized monotonicity property rather than classical monotonicity. There are only a few (very recent) works where iterative methods for solving equilibrium problems with a (strongly) quasiconvex underlying bifunction can be found, namely [27], where arguably the first proximal point method in such a framework is proposed, [51], with a subgradient method, [52], which contains a proximal subgradient algorithm, and the recent contribution [4], which extends [27] to Bregman distances.

The main contribution of this work consists in showing that the relaxed-inertial proximal point algorithm for solving equilibrium problems (RIPPA-EP from now on) introduced in [2, 3] for solving monotone inclusion problems and extended to equilibrium problems in [23, 47] in the convex case remains convergent, under slightly different, but still standard, assumptions, when the employed bifunction is strongly quasiconvex in its second variable and pseudomonotone, following thus in a certain sense our recent works [17, 23, 24, 27, 32]. Therefore, the influence of B.T. Polyak’s work on this paper is twofold, because the inertial proximal algorithms are based on his heavy ball method from [39]. Numerical experiments for solving problems (MVI) and (IMVI) where the involved function \(\varphi \) is strongly quasiconvex are presented, and the performance of the proposed algorithm is shown to be superior to that of the standard proximal point method.

The paper is structured as follows. In Sect. 2, we introduce notations and preliminary results. In Sect. 3, the proposed algorithm is presented and its convergence is investigated. Relevant special cases are discussed, and sufficient conditions that guarantee the fulfillment of the hypotheses of the convergence statements are provided, too. Section 4 contains computational results that illustrate the theoretical achievements.

2 Preliminaries and Basic Definitions

We denote by \(\langle \cdot ,\cdot \rangle \) the inner product of \({\mathbb {R}}^{n}\), while the corresponding Euclidean norm is \(\Vert \cdot \Vert \). For \(x, y, z \in {\mathbb {R}}^{n}\) and \(\beta \in {\mathbb {R}}\), one has

$$\begin{aligned}&~~~~~ \langle x - z, y - x \rangle = \frac{1}{2} \Vert z - y \Vert ^{2} - \frac{1}{2} \Vert x - z \Vert ^{2} - \frac{1}{2} \Vert y - x \Vert ^{2}, \end{aligned}$$
(2.1)
$$\begin{aligned}&\Vert \beta x + (1 - \beta ) y \Vert ^{2} = \beta \Vert x \Vert ^{2} + (1 - \beta ) \Vert y\Vert ^{2} - \beta (1 - \beta ) \Vert x - y \Vert ^{2}. \end{aligned}$$
(2.2)

Consider a function \(h: {\mathbb {R}}^{n} \rightarrow \overline{{\mathbb {R}}}:= {\mathbb {R}} \cup \{\pm \infty \}\). The effective domain of h is \({{\,\textrm{dom}\,}}\,h:= \{x \in {\mathbb {R}}^{n}: h(x) < + \infty \}\) and we call h proper when it has a nonempty domain and nowhere takes the value \(- \infty \). The standard conventions \(\sup _{\emptyset } h:= - \infty \), \(\inf _{\emptyset } h:= + \infty \), and \(0\cdot (+\infty )=+\infty \), \(0\cdot (-\infty )=0\), and \(+\infty + (-\infty )=+\infty \) are considered. The sublevel set of h at height \(\lambda \in {\mathbb {R}}\) is \(S_{\lambda } (h):= \{x \in {\mathbb {R}}^{n}: h(x) \le \lambda \}\), while its set of minimizers is \({{\,\mathrm{\arg \min }\,}}_{{\mathbb {R}}^{n}} h\). Denote by \({{\,\textrm{Id}\,}}:{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\) the identity operator on \({\mathbb {R}}^{n}\), and by \(K^{*}:= \{y \in {\mathbb {R}}^{n}: \langle y, x\rangle \ge 0, ~ \forall ~ x \in K\}\) the positive dual cone associated to a set \(K\subseteq {\mathbb {R}}^{n}\).

When \({{\,\textrm{dom}\,}}h\) is convex, we say that h is

(a):

convex, when for every \(x, y \in {{{\,\textrm{dom}\,}}}\,h\)

$$\begin{aligned} h(\lambda x + (1-\lambda )y) \le \lambda h(x) + (1 - \lambda ) h(y), \, \forall \, \lambda \in [0, 1]; \end{aligned}$$
(2.3)
(b):

semistrictly quasiconvex if, for every \(x, y \in {{{\,\textrm{dom}\,}}}\,h\) with \(h(x) \ne h(y)\),

$$\begin{aligned} h(\lambda x + (1-\lambda )y) < \max \{h(x), h(y)\}, \, \forall \, \lambda \in \, ]0, 1[; \end{aligned}$$
(2.4)
(c):

quasiconvex, when for every \(x, y \in {{{\,\textrm{dom}\,}}}\,h\)

$$\begin{aligned} h(\lambda x + (1-\lambda ) y) \le \max \{h(x), h(y)\}, \, \forall \, \lambda \in [0, 1]; \end{aligned}$$
(2.5)
(d):

strongly convex [38] (on \({{\,\textrm{dom}\,}}\,h\)) with modulus \(\gamma \in \, ]0, + \infty [\), when

$$\begin{aligned}{} & {} h(\lambda y + (1-\lambda )x) \le \lambda h(y) + (1-\lambda ) h(x) - \lambda (1 - \lambda ) \frac{\gamma }{2} \Vert x - y \Vert ^{2}, \,\\{} & {} \qquad \qquad \forall \, x, y \in {{\,\textrm{dom}\,}}h, \, \forall \, \lambda \in [0, 1]; \end{aligned}$$
(e):

strongly quasiconvex [38] (on \({{\,\textrm{dom}\,}}\,h\)) with modulus \(\gamma \in \, ]0, + \infty [\), when

$$\begin{aligned}{} & {} h(\lambda y + (1-\lambda )x) \le \max \{h(y), h(x)\} - \lambda (1 - \lambda ) \frac{\gamma }{2} \Vert x - y \Vert ^{2}, \,\\{} & {} \qquad \qquad \forall \, x, y \in {{\,\textrm{dom}\,}}h, \, \forall \, \lambda \in [0, 1]. \end{aligned}$$

When (2.3) (respectively (2.5)) is strictly fulfilled whenever \(x \ne y\) and \(\lambda \in \, ]0, 1[\), we call h strictly (quasi)convex.

The following scheme exhibits the known relations among the mentioned (generalized) convexity concepts (where “qcx” stands for quasiconvex)

$$\begin{aligned} \begin{array}{ccccccc} \mathrm{strongly ~ convex} &{} \Longrightarrow &{} \mathrm{strictly ~ convex} &{} \Longrightarrow &{} \textrm{convex} &{} \Longrightarrow &{} \textrm{qcx} \\ \Downarrow &{} \, &{} \Downarrow &{} \, &{} \Downarrow &{} \, &{} \, \\ \mathrm{strongly ~ qcx} &{} \Longrightarrow &{} \mathrm{strictly ~ qcx} &{} \Longrightarrow &{} \mathrm{semistrictly ~ qcx} &{} \, &{} \, \\ \, &{} \, &{} \Downarrow &{} \, &{} \, &{} \, &{} \, \\ \, &{} \, &{} \textrm{qcx} &{} \, &{} \, &{} \, &{} \, \end{array} \end{aligned}$$

In [29, Theorem 2] one can see that the Euclidean norm is strongly quasiconvex on any bounded convex set, but not strongly convex. Note also that a quasiconvex function can be seen as a strongly quasiconvex one with modulus \(\gamma = 0\), so “h is strongly quasiconvex with modulus \(\gamma \ge 0\)” covers both these notions.

Examples of strongly quasiconvex functions which are not convex can be found in [17, 27,28,29, 32, 52]. For instance, the function \(x \in {\mathbb {R}}^{n} \mapsto \sqrt{\Vert x \Vert }\), which has been employed in studies on information protection (see [48]) or on Gram–Schmidt orthogonalization methods (see [46]), is (see [32, Theorem 17]) strongly quasiconvex on any bounded and convex subset of \({\mathbb {R}}^{n}\) and, as we will see in Sect. 4, some classes of fractional functions (intensively studied for their economic applications, see [43, 45]) are strongly quasiconvex (see [28]), too.
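
As a quick illustration, the following sample-based probe (a sketch not taken from [32]; the box \([-1,1]^{2}\) and the sample sizes are arbitrary choices) checks the defining inequality for \(h(x) = \sqrt{\Vert x \Vert }\) on a bounded convex set and estimates the largest modulus \(\gamma \) compatible with the sampled triples.

    import numpy as np

    # Sample-based probe of the strong quasiconvexity inequality for h(x) = sqrt(||x||)
    # on the bounded convex set [-1, 1]^2 (arbitrary test set, chosen only for illustration).
    rng = np.random.default_rng(1)
    h = lambda x: np.sqrt(np.linalg.norm(x))

    gamma_est = np.inf
    for _ in range(50000):
        x, y = rng.uniform(-1.0, 1.0, size=(2, 2))
        lam = rng.uniform()
        slack = max(h(x), h(y)) - h(lam * y + (1 - lam) * x)   # slack in the defining inequality
        weight = lam * (1 - lam) / 2 * np.dot(x - y, x - y)
        if weight > 1e-12:
            gamma_est = min(gamma_est, slack / weight)          # largest modulus admissible so far
    print("empirical upper bound on the modulus gamma:", gamma_est)  # positive, consistent with [32, Theorem 17]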

A proper function \(h: {\mathbb {R}}^{n} \rightarrow \overline{{\mathbb {R}}}\) is said to be 2-supercoercive when

$$\begin{aligned} \liminf _{\Vert x \Vert \rightarrow + \infty } \frac{h(x)}{\Vert x \Vert ^{2}} > 0. \end{aligned}$$

According to [32, Theorem 1], any function that is strongly quasiconvex on a convex set \(K\subseteq {\mathbb {R}}^n\) is also 2-supercoercive. Hence, lower semicontinuous strongly quasiconvex functions have exactly one minimizer on closed and convex subsets of finite-dimensional spaces (see [32, Corollary 3]), an observation useful when investigating proximal point-type methods for optimization problems involving such functions (see [17, 27, 32]).

Take \(h: {\mathbb {R}}^{n} \rightarrow \overline{{\mathbb {R}}}\) proper and \(K\subseteq {\mathbb {R}}^{n}\) closed and convex, such that \(K \subseteq {{\,\textrm{dom}\,}}\,h\). The proximal operator of h on K of parameter \(\beta > 0\) at \(x \in {\mathbb {R}}^{n}\) is

$$\begin{aligned} {{\,\textrm{Prox}\,}}_{\beta h}: {\mathbb {R}}^{n} \rightrightarrows {\mathbb {R}}^{n}, \ {{\,\textrm{Prox}\,}}_{\beta h} (K, x) = {{\,\textrm{argmin}\,}}_{y \in K} \left\{ h(y) + \frac{1}{2 \beta } \Vert y - x \Vert ^{2} \right\} . \end{aligned}$$

When \(K = {\mathbb {R}}^{n}\), we write \({{\,\textrm{Prox}\,}}_{\beta h} ({\mathbb {R}}^{n}, \cdot )\) simply as \({{\,\textrm{Prox}\,}}_{\beta h} (\cdot )\).
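
For illustration, here is a minimal numerical sketch (with a hypothetical test function and a box-shaped K, both arbitrary choices) of computing one element of \({{\,\textrm{Prox}\,}}_{\beta h}(K, x)\) with a local solver; for nonconvex h the proximal operator is in general set-valued, so a local method returns only one candidate minimizer.

    import numpy as np
    from scipy.optimize import minimize

    def prox_element(h, x, beta, bounds):
        # One element of Prox_{beta h}(K, x) for a box-shaped K given by `bounds`;
        # only a local-solver sketch, not a global method.
        objective = lambda y: h(y) + np.dot(y - x, y - x) / (2.0 * beta)
        return minimize(objective, x0=x, bounds=bounds).x

    h = lambda y: np.sqrt(np.linalg.norm(y))     # strongly quasiconvex on bounded convex sets
    print(prox_element(h, np.array([0.8, -0.4]), beta=0.5, bounds=[(-1, 1), (-1, 1)]))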

The next result from [32] will be necessary later.

Lemma 1

(cf. [32, Proposition 6]) Let K be a closed and convex set in \({\mathbb {R}}^{n}\), \(h: {\mathbb {R}}^{n} \rightarrow \overline{{\mathbb {R}}}\) be a proper, lower semicontinuous, strongly quasiconvex function with modulus \(\gamma \ge 0\) and such that \(K \subseteq {{\,\textrm{dom}\,}}\,h\), \(\beta > 0\) and \(x \in K\). If \({\overline{x}} \in {{\,\textrm{Prox}\,}}_{\beta h} (K, x)\), then for all \(y \in K\) and all \(\lambda \in [0, 1]\) one has

$$\begin{aligned} h({\overline{x}}) - \max \{h(y), h({\overline{x}})\} \le \frac{\lambda }{\beta } \langle {\overline{x}} - x, y - {\overline{x}} \rangle + \frac{\lambda }{2} \left( \frac{\lambda }{\beta } - \gamma + \lambda \gamma \right) \Vert y - {\overline{x}} \Vert ^{2}. \end{aligned}$$

In contrast to the convex case, it has been shown in [32, (1.4)] that the sum of a quasiconvex function and half of a squared norm is not necessarily a strongly quasiconvex function. Additionally, [32, Remark 6] exhibits an example where not even the sum of a strongly quasiconvex function and half of a squared norm is strongly quasiconvex. Consequently (see also [32, Remark 19(ii)]), the proximal operator of a strongly quasiconvex function is usually set-valued, reducing to a singleton when the sum of the strongly quasiconvex function and half of a squared norm is strongly quasiconvex.

A bifunction \(f: {\mathbb {R}}^{n}\times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) is monotone on \(K \subseteq {\mathbb {R}}^{n}\), when

$$\begin{aligned} f(x, y) + f(y, x) \le 0, ~ \forall ~ x, y \in K. \end{aligned}$$

When the following weaker condition (see [19] for counterexamples to the opposite implication) is fulfilled

$$\begin{aligned} f(x, y) \ge 0 ~ \Longrightarrow ~ f(y, x) \le 0 ~ \forall ~ x, y \in K, \end{aligned}$$

we say that f is pseudomonotone on K.

The next statement was essentially proven in [3, Theorem 2.1] and will be useful for analyzing the convergence of the proposed algorithm.

Lemma 2

Let \(\{\varphi _{k}\}_{k}\), \(\{s_{k}\}_{k}\), \(\{\alpha _{k} \}_{k}\) and \(\{\delta _{k}\}_{k}\) be sequences in \([0, + \infty [\) and let \(\alpha \in {\mathbb {R}}\) be such that \(0 \le \alpha _{k} \le \alpha < 1\) for all \(k \ge 0\) and

$$\begin{aligned} \varphi _{k+1} - \varphi _{k} + s_{k+1} \le \alpha _{k} (\varphi _{k} - \varphi _{k-1}) + \delta _{k}, ~ \forall ~ k \ge 0. \end{aligned}$$
(2.6)

Then the following assertions hold

(a):

for all \(k\ge 1\), we have \(\displaystyle {\varphi _{k} + \sum _{j=1}^{k} \, s_{j} \le \varphi _{0} + \dfrac{1}{1 - \alpha } \sum _{j=0}^{k-1} \, \delta _{j}}\);

(b):

if \(\sum ^{\infty }_{k=0} \delta _{k} < + \infty \), then \(\lim _{ k \rightarrow \infty } \, \varphi _{k}\) exists, i.e., the sequence \(\{\varphi _k \}_{k}\) converges to some element in \([0, + \infty [\).

More on generalized convexity (in particular (strong) quasiconvexity) and generalized monotonicity can be found, for instance, in [10, 19, 29, 32, 38, 44, 49].

3 Relaxed-Inertial Proximal Point Type Algorithm

Let K be a nonempty set in \({\mathbb {R}}^{n}\) and \(f: K \times K \rightarrow {\mathbb {R}}\) be a bifunction. Consider the equilibrium problem

$$\begin{aligned} \mathrm{find ~} {\overline{x}} \in K: ~~ f({\overline{x}}, y) \ge 0, ~ \forall ~ y \in K, \end{aligned}$$
(EP)

and denote by S(K, f) the set of solutions of (EP).

The following assumptions on the mathematical objects involved in the equilibrium problem are necessary for investigating the convergence of the algorithm we propose for solving (EP)

(A1):

\(f(\cdot , y)\) is upper semicontinuous for all \(y \in K\);

(A2):

f is pseudomonotone on K;

(A3):

f is lower semicontinuous (jointly in both arguments);

(A4):

\(f(x, \cdot )\) is strongly quasiconvex on K with modulus \(\gamma > 0\) whenever \(x \in K\);

(A5):

f satisfies the following Lipschitz-continuity type condition: there exists \(\eta > 0\) for which

$$\begin{aligned} f(x, z) - f(x, y) - f(y, z) \le \eta \left( \Vert x - y \Vert ^{2} + \Vert y - z \Vert ^{2}\right) , ~ \forall ~ x, y, z \in K. \end{aligned}$$
(3.1)

As was noted in [27, Remark 3.1(i)], the following usual assumption when dealing with equilibrium problems is a direct consequence of (A2) and (A5). We mention it, as it will be employed in some intermediate results which do not require the satisfaction of (A5).

(A0):

\(f(x,x) = 0\) for all \(x\in K\).

Moreover, one usually augments (A1) with the following hypothesis:

\((A1')\):

\(f(x, \cdot )\) is lower semicontinuous for all \(x \in K\),

that is a consequence of (A3) and will be considered when the latter is not required.

While (A1), (A2), (A3) and (A5) are standard assumptions in the related literature, (A4) relaxes the standard convexity hypothesis usually imposed on f in its second argument in the literature.

Some comments on these hypotheses follow (see also [27, Remark 3.1]).

Remark 3

(i):

We know by [27, Proposition 3.1] that if f satisfies (Ai) with \(i=0,1,1', 2,4\), then \(S(K, f) \ne \emptyset \) and is actually a singleton.

(ii):

When \(f(x, y) = \langle T(x), y - x\rangle \), with L-Lipschitz continuous \(T: K \rightarrow {\mathbb {R}}^{n}\) (\(L >0\)), relation (3.1) is fulfilled for \(\eta = {L}/{2}\) (see [40]).
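
Indeed, for this bifunction one has, for all \(x, y, z \in K\),

$$\begin{aligned} f(x, z) - f(x, y) - f(y, z) = \langle T(x) - T(y), z - y \rangle \le L \Vert x - y \Vert \, \Vert z - y \Vert \le \frac{L}{2} \left( \Vert x - y \Vert ^{2} + \Vert z - y \Vert ^{2}\right) , \end{aligned}$$

by the Cauchy–Schwarz inequality and the Lipschitz continuity of T, which is exactly (3.1) with \(\eta = L/2\).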

The relaxed-inertial proximal point-type method we propose for solving (EP) is the following.

Algorithm 1

(RIPPA-EP for Strongly Quasiconvex EP’s) Choose \(x^{-1}, x^{0} \in K\), an inertial parameter \(\alpha \in [0, 1[\), a relaxation parameter \(\rho \ge 0\), a sequence \(\{\alpha _{k}\}_{k} \subseteq [0, \alpha ]\) and sequences \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) of positive numbers with \(\{\rho _{k}\}_{k} \subseteq [1 - \rho , 1 + \rho ]\). For every \(k \ge 0\):

Step 1 (inertial-proximal step). Compute

$$\begin{aligned} y^{k} = x^{k} + \alpha _{k} (x^{k} - x^{k-1}) \end{aligned}$$
(3.2)

and

$$\begin{aligned} z^{k} \in {{\,\textrm{Prox}\,}}_{\beta _{k} f(y^{k}, \cdot )} (K, y^{k}) = {{\,\textrm{argmin}\,}}_{x \in K} \left\{ f(y^{k}, x) + \frac{1}{2 \beta _{k}} \Vert y^{k} - x \Vert ^{2} \right\} . \end{aligned}$$
(3.3)

Step 2 (stopping criterion). If \(z^{k} = y^{k}\), stop: \(y^{k}\) solves (EP).

Step 3 (relaxation step). Compute

$$\begin{aligned} x^{k+1} = (1 - \rho _{k}) y^{k} + \rho _{k} z^{k}, \end{aligned}$$
(3.4)

set \(k \leftarrow k+1\) and go to Step 1.
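
To fix ideas, the following minimal numerical sketch mirrors relations (3.2)–(3.4); the bifunction is the hypothetical choice \(f(x, y) = h(y) - h(x)\) from Corollary 15 below with a strongly convex quadratic h, \(K = {\mathbb {R}}^{2}\) (an affine subspace, so the extrapolated points stay feasible) and constant parameters, all picked only so that the script is self-contained. In practice the subproblem (3.3) must be solved globally on K.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical test data: f(x, y) = h(y) - h(x) with h strongly convex, K = R^2.
    h = lambda x: 0.5 * np.dot(x, x) + np.sum(x)
    f = lambda x, y: h(y) - h(x)

    alpha_k, beta_k, rho_k = 0.2, 1.0, 1.1            # constant inertial / proximal / relaxation parameters
    x_prev = x = np.array([2.0, -1.0])                # x^{-1} = x^0

    for k in range(200):
        y = x + alpha_k * (x - x_prev)                                    # (3.2) inertial step
        prox_obj = lambda z: f(y, z) + np.dot(z - y, z - y) / (2 * beta_k)
        z = minimize(prox_obj, y).x                                       # (3.3) proximal step on K = R^2
        if np.linalg.norm(z - y) < 1e-9:                                  # stopping criterion (Proposition 5)
            break
        x_prev, x = x, (1 - rho_k) * y + rho_k * z                        # (3.4) relaxation step

    print("approximate solution of (EP):", x)          # expected near (-1, -1), the minimizer of h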

Next, we point out some observations regarding Algorithm 1.

Remark 4

(i):

Step 1 in Algorithm 1 is actually a proximal point step with inertial effects applied to the function \(f(y^{k}, \cdot )\) at \(y^{k}\) over the set K, for every \(k\ge 0\). Recall that inertial effects that improve the performance of proximal point algorithms were introduced by Alvarez and Attouch in [3] and are based on the heavy ball method introduced by Polyak in [39]. Even though the proximal operator is mostly considered in the literature with respect to the whole underlying space, in papers such as [7, 16, 17] it is taken on a certain set. The same algorithm has been proposed for solving (EP) in the convex case in [23] (of which we became aware only when this paper was complete, hence the possible redundancies), as a relaxed-inertial extragradient method, and in [47], via an explicit discretization of a second-order dynamical system, under different hypotheses that include the convexity of f in the second variable. Even if this is not explicitly acknowledged, in these works the proximal operator is considered with respect to the underlying set of (EP), too. Other relaxed-inertial proximal point-type algorithms for solving convex equilibrium problems were introduced in [22, 36].

(ii):

In view of relations (3.2) and (3.4), K should be assumed to be an affine subspace and not only a closed and convex set. We note that the authors in [2, 3, 15, 34] considered their algorithms on the whole space, while in [23, 47] K was taken closed and convex; however, the issue of staying feasible after the extrapolation step does not seem to have been taken into consideration there.

(iii):

If \(\alpha =0\), then Algorithm 1 reduces to the standard proximal point algorithm in [27] to which a relaxation step (that vanishes when \(\rho _k=1\) for all \(k\ge 0\)) is added.

3.1 Convergence Analysis

As a first result, we validate the stopping criterion of Algorithm 1; its proof is skipped as it is similar to that of [27, Proposition 3.2].

Proposition 5

Let K be an affine subspace in \({\mathbb {R}}^{n}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers, \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) the sequences generated by Algorithm 1 and suppose that assumptions (A0), (A1), \((A1')\) and (A5) are valid. If \(y^{k} = z^{k}\), then \(x^{k+1}=y^{k}\) is a solution of (EP), i.e., \(\{y^{k}\} = S(K, f)\).

The following result is the starting point of our analysis.

Proposition 6

Let K be an affine subspace in \({\mathbb {R}}^{n}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) be sequences of positive numbers, \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) be the sequences generated by Algorithm 1 and suppose that assumptions (Ai) with \(i =1, 1', 2,4,5\) hold. Take \(\{{\overline{x}}\} = S(K, f)\). Then for every \(k \ge 0\), at least one of the following inequalities holds

$$\begin{aligned}&\Vert x^{k+1} - {\overline{x}}\Vert ^{2} \le \, \Vert y^{k} -{\overline{x}}\Vert ^{2} - \frac{2 - \rho _{k}}{\rho _{k}} \Vert x^{k+1} - y^{k} \Vert ^{2} - \frac{\rho _{k} (\gamma \beta _{k} - 1)}{2} \Vert z^{k} - {\overline{x}} \Vert ^{2} , \end{aligned}$$
(3.5)
$$\begin{aligned}&\Vert x^{k+1} - {\overline{x}} \Vert ^{2} \le \, \Vert y^{k} -{\overline{x}}\Vert ^{2} - \frac{2 - 4 \eta \beta _{k} - \rho _{k}}{\rho _k} \Vert x^{k+1} - y^{k} \Vert ^{2}\nonumber \\&\hspace{2.3cm} - \frac{\rho _{k} (\gamma \beta _{k} - 1 - 8 \eta \beta _{k} )}{2} \Vert z^{k} - {\overline{x}} \Vert ^{2}. \end{aligned}$$
(3.6)

Proof

First, note that the relaxation step (3.4) of Algorithm 1 yields

$$\begin{aligned} x^{k+1} - {\overline{x}}= (1 - \rho _{k})(y^{k} -{\overline{x}}) + \rho _{k} (z^{k} - {\overline{x}}). \end{aligned}$$

Then using identity (2.2), we have

$$\begin{aligned} \Vert x^{k+1} - {\overline{x}} \Vert ^{2} = (1-\rho _k) \Vert y^{k} - {\overline{x}} \Vert ^{2} + \rho _{k} \Vert z^{k} - {\overline{x}} \Vert ^{2} - \rho _{k} (1 - \rho _{k}) \Vert y^{k} - z^{k} \Vert ^{2}, \end{aligned}$$

or equivalently,

$$\begin{aligned} \Vert x^{k+1} - {\overline{x}}\Vert ^2 - \Vert y^{k} - {\overline{x}} \Vert ^{2} =&\, \rho _{k} (\Vert z^{k} - {\overline{x}} \Vert ^{2} - \Vert y^{k} - {\overline{x}} \Vert ^{2}) - \rho _{k} (1 - \rho _{k}) \Vert y^{k} - z^{k} \Vert ^{2}. \end{aligned}$$

Furthermore, it follows from relation (2.1) that

$$\begin{aligned} \Vert z^{k}- {\overline{x}} \Vert ^{2} - \Vert y^{k} - {\overline{x}} \Vert ^{2} = \Vert z^{k} - y^{k} \Vert ^{2} + 2 \langle z^{k} - y^{k}, y^{k} - {\overline{x}} \rangle . \end{aligned}$$

Therefore, combining the last two relations we obtain

$$\begin{aligned} \Vert x^{k+1} - {\overline{x}}\Vert ^2 - \Vert y^{k} - {\overline{x}} \Vert ^{2} =&\,2\rho _{k} \langle z^{k}-y^k, y^k-{\overline{x}}\rangle + \rho _{k}^2 \Vert y^k-z^{k}\Vert ^2. \end{aligned}$$
(2.7)

On the other hand, for every \(k\ge 0\), step (3.3) and Lemma 1 yield

$$\begin{aligned} f(y^{k}, z^{k}) - \max \{f(y^{k}, z^{k}),&\, f(y^{k}, {\overline{x}})\} \le \frac{\lambda }{\beta _{k}} \langle z^{k} - y^{k}, {\overline{x}} - z^{k} \rangle \\&+ \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2}, \, \forall \, \lambda \in [0, 1]. \end{aligned}$$

We consider two cases.

(i):

If \(f(y^{k}, z^{k}) \ge f(y^{k}, {\overline{x}})\), then

$$\begin{aligned}&\frac{1}{\beta _{k}} \langle z^{k} - y^{k}, z^{k} - {\overline{x}} \rangle \le \frac{1}{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2}, \, \forall \, \lambda \in \, ]0, 1] \nonumber \\&\overset{\lambda = \frac{1}{2}}{\Longrightarrow } \hspace{1.0cm} \langle z^{k} - y^{k}, z^{k} - {\overline{x}} \rangle \, \le \, \frac{1 - \gamma \beta _{k}}{4} \Vert z^{k} - {\overline{x}} \Vert ^{2}, \nonumber \\&\Longrightarrow \langle z^{k} - y^{k}, y^{k} - {\overline{x}} \rangle \le - \Vert z^{k} - y^{k} \Vert ^{2}+\frac{1-\gamma \beta _{k}}{4} \Vert z^{k} - {\overline{x}} \Vert ^{2}. \end{aligned}$$
(2.8)

Thus, combining relations (2.7) and (2.8), and then using the fact \(z^{k} - y^{k} = (1/{\rho _{k}}) (x^{k+1} - y^{k})\), we obtain (3.5).

(ii):

If \(f(y^{k}, z^{k}) < f(y^{k}, {\overline{x}})\), then

$$\begin{aligned} 0&\le ~ \frac{\lambda }{\beta _{k}} \langle z^{k} - y^{k}, {\overline{x}} - z^{k} \rangle + \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2} \\&~~~~ + f(y^{k}, {\overline{x}}) - f(y^{k}, z^{k}) \\&\le \frac{\lambda }{\beta _{k}} \langle z^{k} - y^{k}, {\overline{x}} - z^{k} \rangle + \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2} \\&~~~~ + f(z^{k}, {\overline{x}}) + \eta \left( \Vert z^{k} - y^{k} \Vert ^{2} +\Vert z^{k} - {\overline{x}} \Vert ^{2}\right) , \end{aligned}$$

where the last inequality follows from assumption (A5). Furthermore, since \(\{{\overline{x}}\} = S(K, f)\) and f is pseudomonotone, \(f(z^{k}, {\overline{x}}) \le 0\), so that

$$\begin{aligned} 0&\le ~ \frac{\lambda }{\beta _{k}} \langle z^{k} - y^{k}, {\overline{x}} - z^{k} \rangle + \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2} \\&~~~~ + \eta \left( \Vert z^{k} - y^{k} \Vert ^{2} + \Vert z^{k} - {\overline{x}} \Vert ^{2}\right) , ~ \forall ~ \lambda \in [0, 1]. \end{aligned}$$

This implies

$$\begin{aligned}&\frac{\lambda }{\beta _{k}} \langle z^{k} - y^{k}, z^{k} - {\overline{x}} \rangle \le \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert z^{k} - {\overline{x}} \Vert ^{2} + \eta \left( \Vert z^{k} - y^{k} \Vert ^{2} + \Vert z^{k} - {\overline{x}} \Vert ^{2}\right) \nonumber \\&\hspace{3.0cm} = \eta \Vert z^{k} - y^{k} \Vert ^{2} + \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma +\frac{2\eta }{ \lambda } \right) \Vert z^{k} - {\overline{x}} \Vert ^{2} \nonumber \\&\overset{\lambda = \frac{1}{2}}{\Longrightarrow } ~ \langle z^{k} - y^{k}, y^{k} - {\overline{x}} \rangle \le \, \left( 2\eta \beta _{k}-1\right) \Vert z^{k} - y^{k} \Vert ^{2} \nonumber \\&\hspace{3.8cm} + \left( 2\eta \beta _{k} + \frac{1 - \gamma \beta _{k}}{4} \right) \Vert z^{k} - {\overline{x}} \Vert ^{2}. \end{aligned}$$
(2.9)

Finally, combining (2.9) with (2.7) and using \(z^{k} - y^{k} = ({1}/{\rho _{k}}) (x^{k+1} - y^{k})\), the desired inequality (3.6) holds.

Therefore, for every \(k \ge 0\), we obtain (3.5) or (3.6). \(\square \)

Remark 7

(i):

Observe that, when \(\gamma > 8 \eta \),

$$\begin{aligned} \dfrac{\rho _k (\gamma \beta _k - 1 - 8 \eta \beta _k)}{2}> 0 \, \Longleftrightarrow \, \beta _{k} > \frac{1}{\gamma - 8 \eta }. \end{aligned}$$

By this and since \(\rho _{k} > 0\) for every \(k \ge 0\), we have \(\rho _{k} (\gamma \beta _k - 1) \ge \rho _k (\gamma \beta _k - 1 - 8 \eta \beta _k) > 0\). Hence, the last terms in the right-hand side of inequalities (3.5) and (3.6) are nonnegative, i.e.,

$$\begin{aligned} \frac{\rho _k (\gamma \beta _k - 1)}{2} \Vert z^k - {\overline{x}} \Vert ^2 \ge 0 ~~ \text {and} ~~ \frac{\rho _k (\gamma \beta _k - 1 - 8 \eta \beta _k) }{2} \Vert z^k - {\overline{x}} \Vert ^2 \ge 0. \end{aligned}$$
(ii):

Since \(\{\rho _{k}\}_{k} \subseteq [1 - \rho , 1 + \rho ]\), we have

$$\begin{aligned} \frac{2 - 4 \eta \beta _k - \rho _{k}}{\rho _k} \ge \frac{2 - 4 \eta \beta _k - (1 + \rho )}{1 + \rho } = \frac{1 - 4 \eta \beta _k - \rho }{1 + \rho } > 0, \end{aligned}$$

whenever there exists \(\epsilon > 0\) such that \(\beta _k < \epsilon \le {1}/({4 \eta })\) and \(0 \le \rho \le 1 - 4 \eta \epsilon \); indeed, in that case

$$\begin{aligned} \frac{1 - 4 \eta \beta _k - \rho }{1 + \rho } > \frac{1 - 4 \eta \epsilon - \rho }{1 + \rho } \ge 0 \, \Longleftrightarrow \, 0 \le \rho \le 1 - 4 \eta \epsilon . \end{aligned}$$

As a consequence of the previous remark, and in order to facilitate the convergence analysis of our proposed algorithm, we assume that the parameter sequences \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) satisfy the following hypotheses: there exists \(\epsilon > 0\) such that

(C1):

\(\frac{1}{\gamma - 8 \eta }< \beta _{k} < \epsilon \le \frac{1}{4 \eta }\) for every \(k \ge 0\);

(C2):

\(0 < 1 - \rho \le \rho _{k} \le 1 + \rho \) with \(0 \le \rho \le 1 - 4 \eta \epsilon \).
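
To make these conditions concrete, the following small computation (with hypothetical values of \(\gamma \) and \(\eta \), chosen only for illustration) derives admissible parameter ranges; note that for the interval in (C1) to be nonempty one needs \(1/(\gamma - 8 \eta ) < 1/(4 \eta )\), i.e., \(\gamma > 12 \eta \).

    # Admissible parameter choices under (C1)-(C2) for hypothetical gamma and eta.
    gamma, eta = 20.0, 1.0                        # requires gamma > 12 * eta (see above)

    beta_lower = 1.0 / (gamma - 8.0 * eta)        # (C1): beta_k > 1/(gamma - 8*eta) = 1/12
    epsilon = 0.2                                  # any epsilon with beta_lower < epsilon <= 1/(4*eta) = 0.25
    beta_k = 0.5 * (beta_lower + epsilon)          # e.g. the midpoint of ]beta_lower, epsilon[
    rho_bound = 1.0 - 4.0 * eta * epsilon          # (C2): 0 <= rho <= 1 - 4*eta*epsilon

    print(f"(C1): beta_k in ]{beta_lower:.4f}, {epsilon}[, e.g. beta_k = {beta_k:.4f}")
    print(f"(C2): rho <= {rho_bound:.2f}, so with rho = {rho_bound:.2f} one may take rho_k in [{1 - rho_bound:.2f}, {1 + rho_bound:.2f}]")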

We immediately note the following connections of these hypotheses with the conditions required for the convergence of the standard proximal point algorithm for solving (EP) proposed in [27].

Remark 8

(i):

If we consider \(\epsilon = \frac{1}{4 \eta }\), then \(\rho = 0\), \(\rho _{k} = 1\) for all \(k\ge 0\) and (C1) becomes \(\frac{1}{\gamma - 8 \eta }< \beta _{k} < \frac{1}{4 \eta }\) for every \(k \ge 0\), which is actually [27, Assumption (A6)] for the usual proximal point algorithm for pseudomonotone equilibrium problems. Note also that the hypotheses considered in [47, Section 5] or [23, Section 3] for ensuring the convergence of Algorithm 1 in the convex case are not significantly less complicated than the ones imposed in this paper.

(ii):

Assumption (C1) (in particular, [27, Assumption (A6)]) is related to (MVI) and (IMVI) and their variants, when the involved function \(\varphi \) is nonconvex and, as we explained before, this assumption mediates the “conflict” between the operator T and the function \(\varphi \) (e.g., in Example 25), which in general could lead to equilibrium problems without solutions, even when they are defined over convex compact sets.

As a consequence of Proposition 6 and Remark 7(i), we have the following statement.

Corollary 9

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i = 1, 1', 2, 4, 5\) hold, \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers such that assumption (C1) holds and \(\{x^{k}\}_{k}\) and \(\{y^{k}\}_{k}\) the sequences generated by Algorithm 1. Take \(\{{\overline{x}}\} = S(K, f)\). Then, for every \(k \ge 0\), at least one of the following inequalities holds

$$\begin{aligned}&\Vert x^{k+1} - {\overline{x}} \Vert ^{2} \le \, \Vert y^{k} - {\overline{x}} \Vert ^{2} - \frac{2 - \rho _{k}}{\rho _{k}} \Vert x^{k+1} - y^{k} \Vert ^{2} , \end{aligned}$$
(2.10)
$$\begin{aligned}&\Vert x^{k + 1} - {\overline{x}} \Vert ^{2} \le \Vert y^{k} - {\overline{x}} \Vert ^{2} - \frac{2 -4 \eta \beta _{k} - \rho _{k}}{\rho _k} \Vert x^{k+1} - y^{k} \Vert ^{2}. \end{aligned}$$
(2.11)

The following result will be useful later.

Proposition 10

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i = 1, 1', 2, 4, 5\) hold, \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers such that assumption (C1) holds and \(\{x^{k}\}_{k}\) and \(\{y^{k}\}_{k}\) the sequences generated by Algorithm 1. Take \(\{{\overline{x}}\} = S(K, f)\) and \(\varphi _{k}:= \Vert x^{k} - {\overline{x}} \Vert ^{2}\) for every \(k \ge 0\). Then, for every \(k \ge 0\), at least one of the following inequalities holds

$$\begin{aligned}&\varphi _{k+1} - \varphi _{k} - \alpha _k(\varphi _{k} - \varphi _{k-1}) + \dfrac{2 - \rho _{k}}{\rho _{k}} \Vert x^{k+1} - y^{k} \Vert ^{2} \le (\alpha _{k}^{2} + \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2}, \end{aligned}$$
(2.12)
$$\begin{aligned}&\quad \varphi _{k+1} - \varphi _{k} - \alpha _{k} (\varphi _{k} - \varphi _{k-1}) + \frac{2 - 4 \eta \beta _{k} - \rho _{k}}{\rho _k} \Vert x^{k+1} - y^{k} \Vert ^{2} \nonumber \\&\hspace{4.5cm} \le (\alpha _{k}^{2} + \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2}. \end{aligned}$$
(2.13)

Proof

Observe that using relation (3.2), identity (2.1) and the definition of \(\varphi _{k}\), we have

$$\begin{aligned} \Vert y^{k} - {\overline{x}} \Vert ^2&= \Vert x^{k} + \alpha _{k} (x^{k} - x^{k-1}) - {\overline{x}} \Vert ^{2} \nonumber \\&= \Vert x^{k} - {\overline{x}} \Vert ^{2} + \alpha _{k}^{2} \Vert x^{k} - x^{k-1} \Vert ^{2} + 2 \alpha _{k} \langle x^{k} - {\overline{x}} , x^{k} - x^{k-1} \rangle \nonumber \\&= \varphi _{k} + \alpha _{k}^{2} \Vert x^{k} - x^{k-1} \Vert ^{2} + \alpha _{k} (\varphi _{k} + \Vert x^{k} - x^{k-1} \Vert ^{2} - \varphi _{k-1}) \nonumber \\&= \varphi _{k} + \alpha _{k} (\varphi _{k} - \varphi _{k-1}) + (\alpha _{k}^{2} + \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2}. \end{aligned}$$
(2.14)

Hence, the desired inequalities (2.12) and (2.13) follow directly by replacing (2.14) in (2.10) and (2.11), respectively. \(\square \)

Our first main result, which shows that the sequence generated by Algorithm 1 converges to a solution of problem (EP), is given below.

Theorem 11

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f be such that assumptions (Ai) with \(i = 1, 2,3,4,5\) hold, \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) be sequences of positive numbers such that assumptions (Ci) with \(i=1,2\) hold, \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) be the sequences generated by Algorithm 1. If

$$\begin{aligned} \sum _{k=0}^\infty \, \alpha _{k} \Vert x^{k} - x^{k-1} \Vert ^{2} < + \infty , \end{aligned}$$
(2.15)

then the following assertions hold

(a):

for \(\{{\overline{x}}\} = S(K, f)\), the limit \(\lim _{k \rightarrow \infty }\, \Vert x^k - {\overline{x}} \Vert \) exists and

$$\begin{aligned} \lim _{k \rightarrow + \infty } \Vert x^{k+1} - y^{k} \Vert ^{2} = \lim _{k \rightarrow + \infty } \Vert z^{k} - y^{k} \Vert ^{2} = 0; \end{aligned}$$
(2.16)
(b):

the sequences \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) converge to the same limit, which is a solution to (EP).

Proof

(a): Defining \(\delta _{k} = (\alpha _k^2 + \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2}\), \(k\ge 0\), and using Proposition 10, we observe that condition (2.6) in Lemma 2 is fulfilled with \(s_{k+1} = (({2 - \rho _k})/{\rho _{k}}) \Vert x^{k+1} - y^{k}\Vert ^{2}\) or \(s_{k+1} = (({2 -4\eta \beta _k - \rho _{k}})/{\rho _k})\Vert x^{k+1} - y^{k}\Vert ^{2}\). In addition, from \(\{\alpha _{k}\}_k\subset [0,1[\) and condition (2.15) it follows that

$$\begin{aligned} \sum _{k=0}^{+ \infty } \delta _{k} =\sum _{k=0}^{+ \infty } (\alpha _k^2 + \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2} \le 2 \sum _{k=0}^{+ \infty } \alpha _{k} \Vert x^{k} - x^{k-1} \Vert ^{2} < + \infty . \end{aligned}$$

Hence, we infer from this and Lemma 2(b) that \(\lim _{k \rightarrow + \infty } \varphi _{k}\) exists. Thus for \(\{{\overline{x}}\} = S(K, f)\), we have that \(\lim _{k \rightarrow \infty } \Vert x^k- {\overline{x}} \Vert \) exists, which, in particular, guarantees the boundedness of the sequence \(\{x^{k}\}_{k}\). Likewise, condition (2.15) and Lemma 2(a) yield \(\sum _{k=0}^{\infty } s_{k+1} < + \infty \), and so, \(s_{k+1} \rightarrow 0\) as \(k \rightarrow + \infty \). From this and (Ci), \(i=1,2\), we conclude that \(\lim _{k \rightarrow + \infty } \Vert x^{k+1} - y^{k} \Vert ^{2} = 0\), and since \(x^{k+1} - y^{k} = \rho _{k}(z^{k} - y^k)\) for all \(k\ge 0\) by relation (3.4), it follows that \(\lim _{k \rightarrow +\infty } \Vert z^{k} - y^{k} \Vert ^{2} = 0\).

(b): By part (a), the sequence \(\{x^{k}\}_{k}\) is bounded, hence it has cluster points. Let \({\widehat{x}} \in K\) be one of them. Then, there exists a subsequence \(\{x^{k_{l}}\}_{l}\) of \(\{x^{k} \}_{k}\) such that \(x^{k_{l}} \rightarrow {\widehat{x}}\) as \(l \rightarrow + \infty \). Applying (2.16) to \(\{x^{k_{l}}\}_{l}\) and using that \(\lim _{l \rightarrow + \infty } x^{k_{l}} = {\widehat{x}}\), we conclude that

$$\begin{aligned} \lim _{l \rightarrow + \infty } y^{k_{l}} =\lim _{l \rightarrow + \infty } z^{k_{l}} = {\widehat{x}}. \end{aligned}$$
(2.17)

It follows from (3.3) that for any \(k\ge 0\) one has

$$\begin{aligned}&f(y^{k}, z^{k}) + \frac{1}{2 \beta _{k}} \Vert y^{k} - z^{k} \Vert ^{2} \le f(y^{k}, x) + \frac{1}{2 \beta _{k}} \Vert y^{k} - x \Vert ^{2}, ~ \forall ~ x \in K \\&~~~~ \Longrightarrow f(y^{k}, z^{k}) \le f(y^{k}, x) + \frac{1}{2 \beta _{k}} \Vert y^{k} - x \Vert ^{2}, ~ \forall ~ x \in K. \end{aligned}$$

Take \(x=\lambda y + (1-\lambda ){\widehat{x}}\) with \(y \in K\) and \(\lambda \in [0, 1]\). Then,

$$\begin{aligned} f(y^{k}, z^{k})&\le f(y^{k}, \lambda y + (1-\lambda ){\widehat{x}}) + \frac{1}{2 \beta _{k}} \Vert \lambda (y^{k} - y) + (1-\lambda )(y^{k} - {\widehat{x}}) \Vert ^{2} \\&\le \max \{f(y^{k}, y), f(y^{k}, {\widehat{x}})\} - \frac{\lambda (1 - \lambda ) \gamma }{2} \Vert y - {\widehat{x}} \Vert ^{2} \\&\quad + \frac{\lambda }{2 \beta _{k}}\Vert y^{k} - y \Vert ^{2} + \frac{(1 - \lambda )}{2 \beta _{k}} \Vert y^{k} - {\widehat{x}} \Vert ^{2} - \frac{ \lambda (1 - \lambda )}{2 \beta _{k}} \Vert y - {\widehat{x}} \Vert ^{2} \\&= \max \{f(y^{k}, y), f(y^{k}, {\widehat{x}})\} + \frac{\lambda }{\beta _{k}} \langle {\widehat{x}} - y^{k}, y - {\widehat{x}} \rangle + \frac{1}{2 \beta _{k}} \Vert y^{k} - {\widehat{x}} \Vert ^{2} \\&\quad + \frac{\lambda }{2} \left( \frac{\lambda }{\beta _{k}} - \gamma + \lambda \gamma \right) \Vert y - {\widehat{x}} \Vert ^{2}, ~ \forall ~ y \in K, ~ \forall ~ \lambda \in [0, 1]. \end{aligned}$$

Take \({\widehat{\beta }}:= {1}/({\gamma - 8 \eta })\). Replace k by \(k_{\ell }\) in the previous inequalities. Then, taking \(\limsup _{\ell \rightarrow + \infty }\) and using (2.17) and assumption (A3), we have

$$\begin{aligned} 0&= f({\widehat{x}}, {\widehat{x}}) \le \liminf _{\ell \rightarrow + \infty } f(y^{k_{\ell }}, z^{k_{\ell }}) \le \limsup _{\ell \rightarrow + \infty } f(y^{k_{\ell }}, z^{k_{\ell }}) \\&\le \max \{f({\widehat{x}}, y), 0\} + \frac{\lambda }{2} \left( \frac{\lambda }{{\widehat{\beta }}} - \gamma + \lambda \gamma \right) \Vert y - {\widehat{x}} \Vert ^{2}, ~ \forall ~ y \in K, ~ \forall ~ \lambda \in [0, 1]. \end{aligned}$$

Since \(\gamma > 0\), we can take \(\lambda < ({{\widehat{\beta }} \gamma })/({1 + {\widehat{\beta }} \gamma })\) in such a way that \({\lambda }/{{\widehat{\beta }}} - \gamma + \lambda \gamma < 0\). Thus

$$\begin{aligned} 0&\le \max \{f({\widehat{x}}, y), 0\} = f({\widehat{x}}, y), ~ \forall ~ y \in K \backslash \{{\widehat{x}}\}. \end{aligned}$$

Therefore, \({\widehat{x}} \in S(K, f)\).

Finally, since f is strongly quasiconvex in its second argument, S(K, f) is a singleton, so, invoking Opial’s Lemma (cf. [5, Lemma 2.39]), the whole sequence \(\{x^{k}\}_{k}\) converges to \(\{{\overline{x}}\} = S(K, f)\), which completes the proof. \(\square \)

Remark 12

A sufficient condition for the fulfillment of (2.15) is to choose the inertial parameter \(\alpha _k\) as follows: for some \(\theta \in ]0,1[\),

$$\begin{aligned} \alpha _k=\left\{ \begin{array}{ll} \min \left\{ \frac{\theta ^{k}}{||x^{k}-x^{k-1}||^2},\alpha \right\} , &{} \textit{if}\quad x^{k}\ne x^{k-1}, \\ \alpha , &{} \textit{otherwise}. \end{array} \right. \end{aligned}$$

In the next section, we provide another sufficient condition that connects the relaxation and Lipschitz condition parameters.
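
In an implementation, this online rule can be realized, for instance, as follows (a sketch; \(\theta \) and \(\alpha \) are user-supplied inputs). With this choice, \(\alpha _k \Vert x^{k} - x^{k-1} \Vert ^{2} \le \theta ^{k}\) whenever \(x^{k} \ne x^{k-1}\), so the series in (2.15) is dominated by a convergent geometric series.

    import numpy as np

    def inertial_parameter(k, x_k, x_prev, theta, alpha):
        # Choice of alpha_k from Remark 12: alpha_k * ||x^k - x^{k-1}||^2 <= theta^k,
        # which guarantees condition (2.15).
        diff_sq = np.dot(x_k - x_prev, x_k - x_prev)
        if diff_sq == 0.0:
            return alpha
        return min(theta ** k / diff_sq, alpha)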

As consequences of Theorem 11, we obtain some particular cases. The first one discusses the convergence of Algorithm 1 when \(\alpha = 0\) (i.e., without the inertial step). Note that its convergence statement does not require K to be an affine subspace when \(\rho _k \in [0, 1]\).

Corollary 13

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i = 1, 2, 3, 4, 5\) hold, and \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers such that assumptions (Ci) with \(i=1,2\) hold. Then, for the sequence \(\{x^k\}_{k}\) generated as follows: choose \(x^{0} \in K\) and for \(k\ge 0\) do

$$\begin{aligned} \textit{(RPPA-EP)}\qquad \left\{ \begin{array}{@{}ll@{}} &{} z^{k} \in {{\,\textrm{argmin}\,}}\left\{ f(x^{k}, x) + \frac{1}{2 \beta _{k}} \Vert x^{k} - x \Vert ^{2}: x \in K \right\} \\ &{} x^{k+1} = (1 - \rho _{k}) x^{k} + \rho _{k} z^{k}, \end{array} \right. \end{aligned}$$
(3.18)

we have

(a):

\(\displaystyle {\sum _{k=0}^{+ \infty } \dfrac{2 - \rho _{k}}{ \rho _{k}} \Vert x^{k+1} - x^{k} \Vert ^{2} < + \infty }\) and \(\displaystyle {\sum _{k=0}^{+ \infty } \frac{2 -4 \eta \beta _k - \rho _{k}}{ \rho _{k}} \Vert x^{k+1} - x^{k} \Vert ^{2} < + \infty }\);

(b):

when \(\{{\overline{x}}\}=S(K, f)\), \(\lim _{k \rightarrow + \infty } \Vert x^{k} - {\overline{x}} \Vert \) exists, and hence \(\{x^k\}_k \) is bounded;

(c):

the sequence \(\{x^{k}\}_{k}\) generated by (3.18) converges to a solution of (EP) as \(k\rightarrow +\infty \).

Let us now assume that \(\rho =\alpha =0\), which yields \(\rho _{k}=1\) and \(\alpha _k=0\) for every \(k\ge 0\). In this case we obtain a variant of [27, Theorem 3.1].

Corollary 14

Let K be a closed and convex set in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i = 1, 2,3, 4, 5\) hold and the sequence \(\{\beta _{k}\}_{k}\) such that assumption (C1) holds. Let \(\{x^{k}\}_{k}\) be the sequence generated by the following iterative rule, for \(k\ge 0\),

$$\begin{aligned} \textit{(PPA-EP)}\quad \left\{ \begin{array}{@{}ll@{}} &{} x^{0}\in K,\\ &{} x^{k+1} \in {{\,\textrm{argmin}\,}}\left\{ f(x^{k}, x) + \dfrac{1}{2 \beta _{k}} \Vert x^{k} - x \Vert ^{2}:~x \in K\right\} .\\ \end{array} \right. \end{aligned}$$

Then \(\{x^{k}\}_{k}\) converges to \(\{{\overline{x}} \}= S(K, f)\) as \(k \rightarrow + \infty \).

Proof

Take \(\epsilon = {1}/(4 \eta )\) and use Remark 8. \(\square \)

Another consequence of Theorem 11 is [17, Theorem 3.1], which establishes the convergence of the inertial relaxed proximal point algorithm for minimizing strongly quasiconvex functions.

Corollary 15

Let K be a convex and closed set in \({\mathbb {R}}^{n}\), \(h: {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) a lower semicontinuous and strongly quasiconvex function on K with modulus \(\gamma > 0\), \(\{\alpha _{k}\}_{k}\) and \(\{\beta _{k}\}_{k}\) sequences of positive numbers bounded away from 0 and \(\{\rho _{k}\}_{k}\) a sequence of positive numbers such that assumption (C2) holds. Let us consider the problem

$$\begin{aligned} \min _{x \in K} h(x). \end{aligned}$$
(2.19)

If \(\sum _{k=0}^{\infty }\,\alpha _{k} \Vert x^{k} - x^{k-1} \Vert ^{2} < + \infty \), then the sequence \(\{x^{k}\}_{k}\), generated by Algorithm 1 applied to the bifunction \(f_{h}: K \times K \rightarrow {\mathbb {R}}\), \(f_{h} (x, y):= h(y) - h(x)\), \(x, y \in K\), converges to the unique minimizer of h on K. Moreover, the sequences \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) both converge to the same limit.

Proof

It can be immediately verified that assumptions (A2) and (A4) hold for \(f_{h}\). Furthermore, since h is lower semicontinuous, \(f_{h}\) is upper semicontinuous in its first argument and lower semicontinuous in its second argument, so assumptions (A1) and \((A1')\) hold, which easily entails the validity of (A3). Finally, by definition of \(f_{h}\), we have

$$\begin{aligned} f_{h} (x, z) - f_{h} (x, y) - f_{h} (y, z)&= h(z) - h(x) -(h(y) - h(x)) - (h(z) - h(y)) = 0, \end{aligned}$$

i.e., assumption (A5) holds with \(\eta =0\), and hence assumption (C1) is valid as well since \(\gamma >0\). Hence, all assumptions needed for our previous results are valid in this case. Therefore, it follows from Theorem 11 that \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) converge to the unique optimal solution of (2.19). \(\square \)

Remark 16

In particular, considering \(\rho = \alpha = 0\), i.e., \(\rho _{k} = 1\) and \(\alpha _{k} = 0\) for every \(k \ge 0\) in Algorithm 1, Corollary 15 collapses to [32, Theorem 15] (by taking \(\epsilon = {1}/(4 \eta )\) and using Remark 8).

3.2 Sufficient Conditions

In order to provide other sufficient hypotheses for the convergence of the sequences generated by Algorithm 1 to the solution of (EP) by ensuring the fulfillment of (2.15), in addition to assumptions (Ai) with \(i=1,2,3,4,5\) and (Ci) with \(i=1,2\), we consider the following condition

(C3):

the sequence \(\{\alpha _k\}_{k}\) is nondecreasing and, with \({\overline{\alpha }}:= {\xi }/({2 + \xi })\), it holds that

$$\begin{aligned} 0 \le \alpha _{k} \le \alpha _{k + 1} \le \alpha< {\overline{\alpha }} < 1, ~ \forall ~ k \ge 0, \end{aligned}$$
(2.20)

where

$$\begin{aligned} \xi :=\frac{1-4\eta \epsilon -\rho }{1+\rho }>0. \end{aligned}$$
(2.21)

To this end, we prove the following auxiliary result. Note that in its proof we “unify” the statements formulated in terms of “at least one of the inequalities holds”, beginning with Proposition 6.

Proposition 17

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i=1, 1', 2,4,5\) hold, \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers such that assumptions (Ci) with \(i=1,2\) hold, and \(\{x^{k}\}\) the sequence generated by Algorithm 1. Take \(\{{\overline{x}}\} = S(K, f)\) and \(\varphi _{k}:= \Vert x^{k} - {\overline{x}} \Vert ^{2}\) for every \(k \ge 0\). Then, for every \(k \ge 0\), we have

$$\begin{aligned} \varphi _{k+1} - \varphi _{k} - \alpha _{k} (\varphi _{k} - \varphi _{k-1}) - \nu _k \Vert x^{k} - x^{k-1} \Vert ^{2} \le \xi (\alpha _k-1) \Vert x^{k+1} - x^{k} \Vert ^{2},\qquad \end{aligned}$$
(2.22)

where \(\nu _k:=(1-\xi )\alpha _k^2+(1+\xi )\alpha _k\) and \(\xi \) is defined by (2.21).

Proof

To show (2.22), it is enough to bound \(\{\Vert x^{k+1} - y^k \Vert ^2\}_k\) from below in relation (2.12) (and (2.13)) of Proposition 10. For \(k \ge 0\), a direct calculation using the definition of \(y^k\) and the Cauchy–Schwarz inequality yields

$$\begin{aligned}{} & {} \Vert x^{k+1} - y^{k} \Vert ^{2} = \Vert x^{k+1} - x^{k} \Vert ^{2} + \alpha _{k}^{2} \Vert x^{k} - x^{k-1} \Vert ^{2} - 2 \alpha _{k} \langle x^{k+1} - x^{k}, x^{k} - x^{k-1} \rangle \\{} & {} \quad \ge \Vert x^{k+1} - x^{k} \Vert ^{2} + \alpha _{k}^{2} \Vert x^{k} - x^{k-1} \Vert ^{2} - 2 \alpha _{k} \Vert x^{k+1} - x^{k} \Vert \, \Vert x^{k} - x^{k-1} \Vert . \end{aligned}$$

Using the well-known inequality \(2 pq \le p^{2} + q^{2}\) with \(p = \Vert x^{k+1} - x^k \Vert \) and \(q = \Vert x^{k} - x^{k-1} \Vert \), we conclude

$$\begin{aligned} \Vert x^{k+1} - y^k \Vert ^{2} \ge (1 - \alpha _{k}) \Vert x^{k+1} - x^{k} \Vert ^{2} + (\alpha _{k}^{2} - \alpha _{k}) \Vert x^{k} - x^{k-1} \Vert ^{2}. \end{aligned}$$
(2.23)

Moreover, observe that assumptions (C1) and (C2) yield

$$\begin{aligned} \frac{2 - \rho _{k}}{\rho _{k}} \ge \frac{2 - 4 \eta \beta _{k} - \rho _{k}}{ \rho _{k}}> \frac{1 - 4 \eta \epsilon - \rho }{1 + \rho } = \xi > 0. \end{aligned}$$

Finally, replacing (2.23) in (2.12) and taking into account the latter bound,

$$\begin{aligned} \varphi _{k+1} - \varphi _{k} - \alpha _{k} (\varphi _{k} - \varphi _{k-1}) \le&\, \left[ (1-\xi )\alpha _k^2+(1+\xi )\alpha _k \right] \Vert x^{k} - x^{k-1} \Vert ^{2} \\&+ \xi (\alpha _{k} -1) \Vert x^{k+1} - x^{k} \Vert ^{2}, \end{aligned}$$

which proves relation (2.22). \(\square \)

Our second main result, which provides another sufficient condition for the convergence of Algorithm 1, is given below.

Theorem 18

Let K be an affine subspace in \({\mathbb {R}}^{n}\), f such that assumptions (Ai) with \(i = 1, 2, 3, 4, 5\) hold, \(\{\alpha _{k}\}_{k}\), \(\{\beta _{k}\}_{k}\) and \(\{\rho _{k}\}_{k}\) sequences of positive numbers such that assumptions (Ci) with \(i=1,2,3\) hold, and \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) the sequences generated by Algorithm 1. Then,

$$\begin{aligned} \sum _{k=1}^{\infty } \, \Vert x^{k} - x^{k-1} \Vert ^{2} < + \infty . \end{aligned}$$
(2.24)

As a consequence, the sequences \(\{x^{k}\}_{k}\), \(\{y^{k}\}_{k}\) and \(\{z^{k}\}_{k}\) converge to \(\{{\overline{x}}\} = S(K, f)\).

Proof

Define \(\mu _{0}:= (1 - \alpha _{0}) \varphi _{0} \ge 0\) and

$$\begin{aligned} \mu _{k} := \varphi _{k} - \alpha _{k} \varphi _{k-1} + \nu _{k} \Vert x^{k} - x^{k-1} \Vert ^{2}, ~ \forall ~ k \ge 1. \end{aligned}$$
(2.25)

Since \(\varphi _{k} \ge 0\) and \(\alpha _{k} \le \alpha _{k+1}\) for all \(k \ge 0\), it follows from (2.22) and (2.25) that

$$\begin{aligned} \mu _{k+1} - \mu _{k}&= \left( \varphi _{k+1} - \varphi _{k} - \alpha _{k} (\varphi _{k} - \varphi _{k-1}) - \nu _{k} \Vert x^{k} - x^{k-1} \Vert ^{2} \right) + \nu _{k+1} \Vert x^{k+1} - x^{k} \Vert ^{2} \nonumber \\&\le \left( \xi (\alpha _k-1)+ \nu _{k+1} \right) \Vert x^{k+1} - x^{k} \Vert ^{2} \nonumber \\&= \left( (1-\xi )\alpha _{k+1}^2 + (1 + \xi ) \alpha _{k+1} + \xi (\alpha _k - 1)\right) \Vert x^{k+1} - x^{k} \Vert ^{2} \nonumber \\&\le \left( (1-\xi )\alpha _{k+1}^2+ (1 + 2 \xi ) \alpha _{k+1}-\xi \right) \Vert x^{k+1} - x^{k} \Vert ^{2}. \end{aligned}$$
(2.26)

Now, as \(\alpha _{k} \in [0, 1[\) (hence \(\alpha _{k}^{2} \le \alpha _{k}\)) for every \(k \ge 0\) and \(\xi \in \, ]0, 1]\), using (2.20), we have

$$\begin{aligned} (1 - \xi ) \alpha _{k+1}^2 + (1 + 2 \xi ) \alpha _{k+1} - \xi \le (2 + \xi ) \alpha _{k+1} - \xi \le (2 + \xi ) \alpha - \xi < 0. \end{aligned}$$

This implies that \(\mu _{k+1} \le \mu _{k}\) for all \(k \ge 0\), consequently, the sequence \(\{\mu _k\}_{k}\) is nonincreasing and bounded above by \(\mu _{0} = (1 - \alpha _0) \varphi _0\). So, Eq. (2.25) and the monotonicity of \(\{\mu _k\}_{k}\) yield

$$\begin{aligned} \mu _0 \ge \mu _k = \varphi _{k} - \alpha _{k} \varphi _{k-1} + \nu _{k} \Vert x^{k} - x^{k-1} \Vert ^{2} \ge \varphi _{k} - \alpha _{k} \varphi _{k-1}. \end{aligned}$$

From this and (2.20), we recursively obtain

$$\begin{aligned} \varphi _{k} \le \alpha \varphi _{k-1} + \mu _{0} \le \cdots \le \alpha ^{k} \varphi _0 + \mu _{0} \sum _{j=0}^{k-1} \alpha ^{j} \le \alpha ^{k} \varphi _{0} + \frac{\mu _0}{1 - \alpha }. \end{aligned}$$

By using relation (2.26), the above inequality, and taking into account that \(\mu _{k+1} \ge -\alpha \varphi _k\), we have

$$\begin{aligned} \sum _{j=0}^{k} \, \Vert x^{j+1} - x^{j} \Vert ^{2}&\le \dfrac{1}{\xi - (2 + \xi ) \alpha } (\mu _{0} - \mu _{k+1}) \le \dfrac{1}{\xi - (2 + \xi ) \alpha }(\mu _{0} + \alpha \varphi _{k})\nonumber \\&\le \dfrac{1}{\xi - (2+\xi )\alpha } \left( \mu _{0} + \alpha \left( \alpha ^{k} \varphi _{0} + \frac{\mu _{0}}{1 - \alpha } \right) \right) \nonumber \\&= \dfrac{1}{\xi - (2+\xi )\alpha } \left( \frac{\mu _{0}}{1 - \alpha } + \alpha ^{k+1} \varphi _{0} \right) . \end{aligned}$$
(2.27)

Hence, taking \(k \rightarrow + \infty \), we obtain

$$\begin{aligned} \sum _{k=0}^{+ \infty }\,\Vert x^{k+1} - x^{k} \Vert ^{2} \le \dfrac{1}{\xi - (2 + \xi ) \alpha } \frac{\mu _{0}}{1 - \alpha } < + \infty , \end{aligned}$$

which proves (2.24). The other conclusions follow directly from (2.24) and Theorem 11. \(\square \)

Remark 19

The above result extends [34, Theorem 3.2] and [17, Theorem 4.1] from minimization problems to pseudomonotone equilibrium problems involving strongly quasiconvex functions.

As a consequence of Theorem 18, we obtain the following result, which provides an estimate for the number of iterations necessary for Algorithm 1 to reach a solution of (EP).

Corollary 20

Consider the hypotheses of Theorem 18. For \(\{{\overline{x}}\} = S(K,f)\) and any positive integer k, there exists \(i \in \{0,\ldots , k\}\) such that

$$\begin{aligned} \Vert x^{i+1} - x^{i} \Vert \le \frac{1}{\sqrt{k}}C, \end{aligned}$$
(2.28)

where \(C:=\sqrt{\dfrac{1 + \alpha - \alpha ^2}{(1 - \alpha ) \left[ \xi - (2+\xi ) \alpha \right] }} \Vert x^0 - {\overline{x}}\Vert \). Moreover, given a tolerance \(\Vert z^{k}- y^{k}\Vert <\varepsilon \), Algorithm  1 finds a solution of (EP) after performing at most

$$\begin{aligned} \dfrac{(1+\alpha )(1 + \alpha - \alpha ^2)}{(1 - \alpha ) \left[ \xi - (2+\xi ) \alpha \right] }\dfrac{\Vert x^0 - {\overline{x}}\Vert ^2}{\varepsilon ^2}, \end{aligned}$$

iterations.
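
For a rough sense of scale, a quick instantiation of this bound with hypothetical values (\(\eta = 1\), \(\epsilon = 0.2\), \(\rho = 0.1\), \(\alpha = 0.02\), \(\Vert x^{0} - {\overline{x}} \Vert = 1\) and tolerance \(\varepsilon = 10^{-3}\)):

    # Worked instance of the iteration bound in Corollary 20 (hypothetical values).
    eta, epsilon, rho = 1.0, 0.2, 0.1
    xi = (1 - 4 * eta * epsilon - rho) / (1 + rho)       # (2.21): xi ~ 0.0909
    alpha = 0.02                                          # must satisfy alpha < xi/(2 + xi) ~ 0.0435, cf. (C3)
    dist0_sq, tol = 1.0, 1e-3                             # ||x^0 - xbar||^2 and tolerance

    bound = (1 + alpha) * (1 + alpha - alpha ** 2) / ((1 - alpha) * (xi - (2 + xi) * alpha)) * dist0_sq / tol ** 2
    print(f"at most {bound:.2e} iterations")              # about 2.2e7 for these values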

Proof

Indeed, since \(\alpha \in ]0,1[\), from (2.27) we have

$$\begin{aligned} \sum _{i=0}^{k} \,||x^{i} - x^{i-1}||^2&\le \dfrac{1}{\xi - (2+\xi ) \alpha } \left( \frac{\mu _{0}}{1 - \alpha } + \alpha ^{k+1} \varphi _{0} \right) \nonumber \\&\le \dfrac{1 + \alpha - \alpha ^2}{(1 - \alpha ) \left[ \xi - (2+\xi ) \alpha \right] } \Vert x^0 - {\overline{x}}\Vert ^2. \end{aligned}$$
(2.29)

Since the left-hand side of (2.29) is a sum of at least k nonnegative terms, the smallest of them cannot exceed \(C^{2}/k\), and (2.28) follows.

Since \(\Vert x^{k+1}-y^{k}\Vert \le \Vert x^{k+1}-x^{k}\Vert + \alpha _k \Vert x^{k}-x^{k-1}\Vert \) by (3.2) and \(\rho _k\Vert z^k - y^k\Vert =\Vert x^{k+1} - y^{k}\Vert \) by (3.4), the last statement follows directly from (2.28). \(\square \)

We conclude this section by discussing the choice of the parameters involved in our analysis.

Remark 21

(i):

Recall that the idea behind the proximal point algorithm is the regularization of a possibly ill-conditioned problem \(0 \in T(x)\) (where T is a nonlinear operator, for instance, the subdifferential of a convex function) by replacing it with a sequence of well-conditioned subproblems of the form \(0 \in \beta _k T(x) + x - x^{k}\), \(k\ge 0\). For any \(k\ge 0\), the choice of \(\beta _k\) in each specific instance must be a trade-off between numerical stability and fast progress of the algorithm: if \(\beta _k\) is very small, the subproblem is very well-conditioned, but the progress of the algorithm is slow, because \(x^{k}\) remains very close to \(x^{k-1}\). On the other hand, when \(\beta _k\) is large, the algorithm advances fast (in the limit, when \(\beta _k\) approaches \(+\infty \), the regularization term becomes negligible, and we solve the problem in one iteration), but when the problem is ill-conditioned, this entails high numerical instability. Therefore, in the proximal type methods we consider for solving equilibrium problems, the choice of \(\beta _k\) must adhere to the trade-off between numerical stability and fast progress. This is ensured by (C1). On the other hand, the choice of the inertial and relaxation parameters \(\{\alpha _k\}_{k}\) and \(\{\rho _k\}_{k}\) is controlled by the input parameters \(\rho \), \(\alpha \), and the Lipschitz-type constant \(\eta \), as can be observed in (C2) and (C3).

(ii):

Note that the sequence \(\{\beta _k\}_{k}\) can be updated by using in each iteration a line-search procedure or a self-adaptive strategy. When employing these, there is no need to know the value of the Lipschitz-type condition parameter (which can be challenging to compute in many situations), but the price to pay is that an extra gradient step is needed. For further details on this approach, see, for instance, [23, 40] in the convex case and [52] in the quasiconvex setting (without inertial and relaxation effects).

(iii):

Relaxation techniques are another ingredient in the formulation of algorithms in continuous optimization because they provide more flexibility to the output of an iterative scheme (see [5]) and have the property of speeding up the algorithm in a manner similar to the effect of inertial steps. Without using inertia, over-relaxation (i.e., \(\rho _k >1\) for \(k\ge 0\)) provides a natural way of speeding up algorithms. The interplay of the inertial and relaxation parameters plays a crucial role in the convergence analysis of our proposed algorithm. The choice of the values of \(\alpha \) and \(\rho \) (given as inputs to the algorithm) is thus crucial, as they need to satisfy condition (C3).

4 Applications and Numerical Experiments

In this section, we discuss some equilibrium problems involving bifunctions that are strongly quasiconvex in the second variable and fulfill the hypotheses guaranteeing the convergence of the proposed algorithm.

4.1 Strongly Quasiconvex Ratios

Given a subset \(K \subseteq {\mathbb {R}}^{n}\), and functions \(h: \mathbb { R}^{n} \rightarrow \overline{{\mathbb {R}}}\) and \(g: {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) such that \(K\cap {{\,\textrm{dom}\,}}\,h \ne \emptyset \) and \(0\notin g(K)\), consider the fractional minimization problem given by

$$\begin{aligned} \min _{x \in K} \, \varphi (x) ~ = ~ \min _{x \in K} \frac{h(x)}{g(x)}. \end{aligned}$$
(FMP)

This problem has been intensively studied in the literature (see [10, 43, 45] among others) due to its concrete applications in several fields of the mathematical sciences as well as in economics, for instance, in productivity theory via the maximization of return/risk or profit/cost ratios and the minimization of cost/time ratios, among others.

In general, problem (FMP) is not convex. Indeed, if, for instance, h is convex and g is affine, then \(\varphi :K\rightarrow \overline{{\mathbb {R}}}\), \(\varphi =h/g\), is semistrictly quasiconvex, and the same happens when h is nonnegative and convex, and g is positive and concave, see [10, Theorem 2.3.8]. Sufficient conditions for guaranteeing that \(\varphi \) is strongly quasiconvex are given below.

Proposition 22

(cf. [28]) When h is strongly convex with modulus \(\gamma > 0\), g is positive and bounded from above by \(M > 0\), and one of the following hypotheses holds

(a) g is affine;

(b) h is nonnegative on \({{\,\textrm{dom}\,}}h\) and g is concave;

(c) h is nonpositive on \({{\,\textrm{dom}\,}}h\) and g is convex,

then \(\varphi :K\rightarrow \overline{{\mathbb {R}}}\), \(\varphi =h/g\), is strongly quasiconvex with modulus \(\gamma ^{\prime }= {\gamma }/{M} > 0\).

A consequence of this statement follows.

Corollary 23

(cf. [28]) Let \(A, B \in {\mathbb {R}}^{n\times n}\), \(a, b \in {\mathbb {R}}^{n}\) and \(\alpha , \beta \in {\mathbb {R}}\), \(h(x) = \frac{1}{2} \langle Ax, x \rangle + \langle a, x \rangle + \alpha \) and \(g(x) = \frac{1}{2} \langle Bx, x \rangle + \langle b, x \rangle + \beta \). Take \(K = \{x \in {\mathbb {R}}^{n}: ~ m \le g(x) \le M\}\), with \(0< m < M\), and \(\varphi (x) = {h(x)}/{g(x)}\), with \(x \in K\). Suppose that A is a positive definite matrix, and \(\lambda _{\min }(A)\) is its smallest eigenvalue. If any of the following conditions holds

(a) \(B = 0\) (the null matrix);

(b) h is nonnegative on K and B is negative semidefinite;

(c) h is nonpositive on K and B is positive semidefinite,

then \(\varphi \) is strongly quasiconvex with modulus \(\gamma ^{\prime } ={\lambda _{\min }(A)}/{M} > 0\).
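As a quick illustration of Corollary 23, the following MATLAB sketch checks its hypotheses for case (a) on assumed data (the matrices, vectors and bounds below are hypothetical) and evaluates the resulting modulus \(\gamma ^{\prime } = {\lambda _{\min }(A)}/{M}\):

```matlab
% Hypothetical data for case (a) of Corollary 23: B = 0, so g is affine.
A = [4 1; 1 3];   a = [1; -2];   alpha = 1;    % h(x) = 0.5*x'*A*x + a'*x + alpha
b = [2; 1];       beta = 5;                    % g(x) = b'*x + beta
m = 1;  M = 10;                                % K = {x : m <= g(x) <= M}

assert(all(eig(A) > 0), 'A must be positive definite');   % hypothesis on A
gamma_prime = min(eig(A)) / M;                             % modulus from Corollary 23
fprintf('strong quasiconvexity modulus gamma'' = %.4f\n', gamma_prime);
```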

In the following example, we present families of problems of type (IMVI) for which assumptions (Ai) with \(i=0,1,2,3,4,5\) are satisfied. This example already appears in [28], but we include it here for completeness.

Example 24

(cf. [28]) Let \(A, B \in {\mathbb {R}}^{n \times n}\) be matrices, \(a, b, c \in {\mathbb {R}}^n\) and \(\alpha , \beta \in {\mathbb {R}}\). Consider \(K:= \{x \in {\mathbb {R}}^{n}: ~ m \le \langle c, x \rangle + \alpha \le M\}\) with \(0< m < M\), and let \(F: K \rightarrow {\mathbb {R}}^n\) and \(\varphi : K \rightarrow {\mathbb {R}}\) be the operator and the function defined by

$$\begin{aligned} F(x) := \frac{1}{\langle c, x \rangle + \alpha }(Ax+a), ~~~ \varphi (x) := \frac{\frac{1}{2} \langle B x, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha }. \end{aligned}$$
(4.1)

By taking \(f: K \times K \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} f(x, y) :=&\, \left\langle x, \frac{Ay + a}{\langle c, y \rangle + \alpha } - \frac{Ax + a}{\langle c, x \rangle + \alpha } \right\rangle \nonumber \\&+ \frac{\frac{1}{2} \langle B y, y \rangle + \langle b, y \rangle + \beta }{\langle c, y \rangle + \alpha } - \frac{\frac{1}{2} \langle B x, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha }, \end{aligned}$$
(4.2)

(IMVI) reduces to an equilibrium problem (EP).

Let us check that the bifunction f defined in (4.2) satisfies assumptions (Ai) with \(i=0,1,2,3,4,5\) under mild hypotheses. Indeed, (Ai) with \(i=0,1,3\) are straightforward while (A2) follows from the monotonicity of F.

For (A4): note that f can be rewritten as

$$\begin{aligned} f(x, y) =&\, \langle x, F(y) - F(x) \rangle + \varphi (y) - \varphi (x)\\ =&\,\frac{\frac{1}{2} \langle B y, y \rangle + (x^{\top } A + b^{\top })y + a^{\top }x + \beta }{\langle c, y \rangle + \alpha } - \frac{x^{\top }(Ax + a)}{\langle c, x \rangle + \alpha } \\&- \frac{\frac{1}{2} \langle B x, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha } . \end{aligned}$$

Hence, by virtue of Corollary 23, for any \(x \in K\) the function \(f(x, \cdot )\) is strongly quasiconvex on K when the matrix B is positive definite, and its modulus of strong quasiconvexity is \(\gamma = {\lambda _{\min }(B)}/{M} > 0\).

For (A5): Since F is continuously differentiable on K, it is Lipschitz-continuous on K. Let us denote by \(L > 0\) its Lipschitz constant. Then, for every \(x, y, z \in K\),

$$\begin{aligned} f(x,y) + f(y,z) - f(x,z)&= \langle x - y, F(y) - F(z) \rangle \\&\ge - \Vert x-y\Vert \, \Vert F(y) - F(z)\Vert \\&\ge - L \,\Vert x-y\Vert \, \Vert y - z\Vert \\&\ge - \frac{L}{2} \left( \Vert x - y\Vert ^2 + \Vert y - z\Vert ^2\right) , \end{aligned}$$

where we used the Cauchy–Schwarz inequality, the Lipschitz continuity of F on K and the elementary inequality \(2ab \le a^{2} + b^{2}\), i.e., (A5) holds.

Finally, conditions (C1) and (C2) can be satisfied by taking the matrix B with its smallest eigenvalue large enough. This can be done because \(\eta \) depends on \(A, a, c, \alpha \), but not on B.
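Since the verification of (A5) above rests on the identity \(f(x,y) + f(y,z) - f(x,z) = \langle x - y, F(y) - F(z) \rangle \), a quick numerical sanity check of this identity can be carried out as in the following MATLAB sketch (the data for \(A, a, B, b, c, \alpha , \beta \) are assumed and randomly generated):

```matlab
% Sanity check (assumed data) of the identity behind (A5) for the bifunction (4.2).
n = 3;
A = eye(n) + 0.1*randn(n);   a = randn(n,1);             % data of F
B = 5*eye(n);   bvec = randn(n,1);   betac = 1;          % data of phi
c = [1; 0; 0];  alphac = 2;                              % denominator <c,x> + alpha

den = @(x) c.'*x + alphac;
F   = @(x) (A*x + a) / den(x);
phi = @(x) (0.5*(x.'*B*x) + bvec.'*x + betac) / den(x);
f   = @(x, y) x.'*(F(y) - F(x)) + phi(y) - phi(x);

x = rand(n,1);  y = rand(n,1);  z = rand(n,1);           % points with den(.) > 0
lhs = f(x, y) + f(y, z) - f(x, z);
rhs = (x - y).'*(F(y) - F(z));
fprintf('residual of the identity: %.2e\n', abs(lhs - rhs));
```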

In the following example, we present families of problems that satisfy our assumptions (Ai) with \(i=0,1,2,3,4,5\). This new example will be used in the numerical experiments below.

Example 25

Let \(B \in {\mathbb {R}}^{n \times n}\), \(b, c \in {\mathbb {R}}^n\), \(\alpha , \beta \in {\mathbb {R}}\), and \(K:= \{x \in {\mathbb {R}}^{n}: ~ m \le \langle c, x \rangle + \alpha \le M\}\) with \(0< m < M\). Let \(T: K \rightarrow K\) be an operator and \(\varphi : K \rightarrow {\mathbb {R}}\) the function defined as in (4.1). Taking \(f: K \times K \rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} f(x, y):= \left\langle T(x), y - x \right\rangle + \frac{\frac{1}{2} \langle By, y \rangle + \langle b, y \rangle + \beta }{\langle c, y \rangle + \alpha } - \frac{\frac{1}{2} \langle Bx, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha },\nonumber \\ \end{aligned}$$
(4.3)

problem (MVI) reduces to an equilibrium problem.

Let us check that the bifunction f defined in (4.3) satisfies assumptions (Ai) with \(i=0,1,2,3,4,5\) under mild hypotheses. Indeed, again the verification of assumptions (Ai) with \(i=0,1,3\) is straightforward; if T is monotone, then (A2) holds, and if T is L-Lipschitz-continuous, then (A5) follows from Remark 3(ii).

For (A4): note that f can be rewritten as

$$\begin{aligned} f(x, y)&= \langle T(x), y - x \rangle + \varphi (y) - \varphi (x) \nonumber \\&= \langle T(x), y \rangle + \frac{\frac{1}{2} \langle By, y \rangle + \langle b, y \rangle + \beta }{\langle c, y \rangle + \alpha } - \langle T(x), x \rangle \\&\qquad - \frac{\frac{1}{2} \langle Bx, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha } \nonumber \\&= \frac{y^{\top } (T(x) c^{\top } + \frac{1}{2} B) y + (b + \alpha T(x))^{\top } y + \beta }{\langle c, y \rangle + \alpha } - \langle T(x), x \rangle \\&\qquad - \frac{\frac{1}{2} \langle Bx, x \rangle + \langle b, x \rangle + \beta }{\langle c, x \rangle + \alpha }. \end{aligned}$$

Take \(x\in K\) such that \(T(x)^{\top } c\ne 0\). Then the matrix \(T(x) c^{\top }\) has rank 1 and, by Brauer’s Eigenvalue Theorem (see [8]), its only nonzero eigenvalue is \(\lambda _{\ne 0}(x) = T(x)^{\top }c\). Hence, by virtue of Proposition 22, the function \(f(x, \cdot )\) is strongly quasiconvex on K whenever the matrix \(T(x)c^{\top } + ({1}/{2}) B\) is positive definite for all \(x \in K\).

Finally, conditions (C1) and (C2) can be satisfied by taking \(b \in {{\,\textrm{int}\,}}\,K^{*}\) and the matrix B with its smallest eigenvalue large enough.
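The positive definiteness condition on \(T(x)c^{\top } + ({1}/{2})B\) can be tested numerically at given points, as in the following MATLAB sketch; the data are borrowed from Example 28 below (where \(T = (1/13) \cdot {{\,\textrm{Id}\,}}\)), while the test point is merely illustrative and Example 25 requires the condition at every \(x \in K\):

```matlab
% Checking at an illustrative point x whether the quadratic form induced by
% M(x) = T(x)*c' + 0.5*B is positive definite (equivalently, whether its
% symmetric part has only positive eigenvalues).
B = 10*[2 1; 1 2];   c = [2; 3];
T = @(x) x/13;                         % T = (1/13)*Id, as in Example 28
x = [1; 1];                            % illustrative point; Example 25 needs all x in K
Mx = T(x)*c.' + 0.5*B;
lmin = min(eig((Mx + Mx.')/2));        % smallest eigenvalue of the symmetric part
fprintf('lambda_min of sym(M(x)) = %.4f\n', lmin);
```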

4.2 Numerical Experiments

In order to stress the applicability of the algorithms proposed in this paper to equilibrium problems for which until recently there have been no reliable iterative solving methods, we present some computational results obtained in MATLAB 2019b (Win64) on a Dell Latitude 5420 laptop with Windows 11 Pro and an Intel Core i7 1165G7 CPU with 2.80 GHz and 32GB RAM. We implemented the relaxed-inertial proximal point method investigated in this work, Algorithm 1 (RIPPA-EP), its noninertial version (3.18) (RPPA-EP), and, for comparison, the “pure” proximal point method [27, Algorithm 1] (PPA-EP). The examples treated below exhibit situations where the methods proposed in this paper outperform their “pure” proximal point counterpart in terms of the number of iterations and the CPU time needed to reach the solution to (EP). As stopping criterion, we require that the norm (in the one-dimensional case, the absolute value) of the difference between \(y^{k}\) and \(z^k\), respectively between \(x^{k}\) and \(z^k\) in the noninertial case, and between two consecutive members of \(\{x^{k}\}_k\) in the “pure” proximal point case, is not larger than an a priori given error \(\varepsilon >0\), i.e., we stop when \(\Vert z^{k} - y^{k}\Vert < \varepsilon \) for (RIPPA-EP), \(\Vert z^k - x^{k}\Vert <\varepsilon \) for (RPPA-EP), and \(\Vert x^k - x^{k-1}\Vert <\varepsilon \) for (PPA-EP).
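For orientation, the following MATLAB sketch shows one plausible reading of the relaxed-inertial structure behind (RIPPA-EP): an inertial extrapolation, a regularized subproblem solved with fmincon as in Remark 29 below, the stopping rule \(\Vert z^{k} - y^{k}\Vert < \varepsilon \) described above, and a relaxation step consistent with \(\rho _k\Vert z^k - y^k\Vert =\Vert x^{k+1} - y^{k}\Vert \). It is only a sketch: the authoritative formulation is Algorithm 1 together with the proximal step (3.3), and the helper name, the argument list and the encoding of K via linear inequalities are assumptions made for illustration.

```matlab
% Schematic sketch of a relaxed-inertial proximal loop for (EP); the exact method
% is Algorithm 1 with the proximal step (3.3), this reading is only illustrative.
% f                      : bifunction handle f(x, y)
% alpha_k, beta_k, rho_k : parameter handles in k, chosen according to (C1)-(C3)
% A_in, b_in             : linear inequalities describing K (A_in*x <= b_in)
function x = rippa_ep_sketch(f, x0, xm1, alpha_k, beta_k, rho_k, A_in, b_in, tol, kmax)
    x_prev = xm1;  x = x0;
    opts = optimoptions('fmincon', 'Display', 'off');
    for k = 0:kmax
        y = x + alpha_k(k)*(x - x_prev);                          % inertial extrapolation
        subprob = @(u) beta_k(k)*f(y, u) + 0.5*norm(u - y)^2;     % regularized subproblem
        z = fmincon(subprob, y, A_in, b_in, [], [], [], [], [], opts);  % proximal step
        if norm(z - y) < tol                                      % stopping rule of Sect. 4.2
            x = z;  return;
        end
        x_prev = x;
        x = y + rho_k(k)*(z - y);                                 % relaxation step
    end
end
```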

We begin with two simpler equilibrium problems, followed by the fractional one considered in Example 25.

Example 26

Take \(p, q \in {\mathbb {N}}\), \(p, q > 1\), and the bifunction \(f: {\mathbb {R}} \times {\mathbb {R}} \rightarrow {\mathbb {R}}\),

$$\begin{aligned} f(x, y) :=&~ p \left( \max \{\sqrt{\Vert y \Vert }, (y - q)^{2} - q\} \right) \nonumber \\&- p \left( \max \{\sqrt{\Vert x \Vert }, (x-q)^{2} - q\} \right) + x(y-x). \end{aligned}$$
(4.4)

We solve (EP) with Algorithm 1 and [27, Algorithm 1]. The solution set \(S({\mathbb {R}}, f)\) of (EP) depends on the values of \(p, q > 1\).

Assumptions (A1) and (A3) trivially hold by the continuity of f, while (A5) holds when, for instance, \(\eta = 1/2\). Taking \(x, y \in K\) one has

$$\begin{aligned} f (x, y) + f(y, x) = x(y-x) + y(x-y) = -(y-x)^{2} \le 0, \end{aligned}$$

hence f is monotone on K, i.e., (A2) holds as well.

Regarding (A4), for all \(x \in K\), since the functions \(y \mapsto x(y - x)\) and \(y \mapsto p\left( \max \{\sqrt{\Vert y \Vert }, (y -q)^{2} - q\} \right) \) are increasing on \([0, + \infty [\), their sum is increasing too, so that f is quasiconvex in its second argument by, for instance, [18, Proposition 4.9]. Furthermore, f is strongly quasiconvex in its second argument as the maximum of two strongly quasiconvex functions. Indeed, for every \(q \in {\mathbb {N}}\), taking \(\delta > 0\) to be larger than the positive solution of the equation \(p (y-q)^{2} - \sqrt{\Vert y \Vert } = pq\), the function \(y \mapsto p\sqrt{\Vert y \Vert } + x(y-x)\) is strongly quasiconvex on \([0, \delta ]\) with modulus \(\gamma = {1}/({2 \sqrt{\delta ^{3}}}) > 0\) by [27, Example 4.2], while the function \(y \mapsto (y-q)^{2} -q + x(y-x)\) is strongly convex, hence also strongly quasiconvex on \([\delta , + \infty [\). Therefore, f is strongly quasiconvex on \([0, + \infty [\) and assumption (A4) holds. Also, as \(q>1\), the functions \(\sqrt{\Vert y \Vert }\) and \((y -q)^{2} - q\) take the same value twice on \([0, +\infty [\), so

$$\begin{aligned} y \mapsto p\max \{\sqrt{\Vert y \Vert }, (y - q)^{2} - q\} + x(y-x) = p((y - q)^{2} - q) + x(y-x), \end{aligned}$$

is strongly convex on \(]-\infty , 0]\), hence also strongly quasiconvex. Therefore, (A4) holds when \(K={\mathbb {R}}\). Finally, \(p, q > 1\) have to be chosen such that \(\gamma \) satisfies conditions (C1) and (C2), while \(\{\alpha _k\}_k\) needs to fulfill (C3). Note that one can multiply the last term of (4.4) by a parameter \(t > 0\) without affecting the mentioned properties of the considered problem.
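For the experiments below, the bifunction (4.4) can be encoded in MATLAB as follows (a minimal sketch; the value of p is not reported in the text, so the one used here is purely illustrative):

```matlab
% One-dimensional bifunction (4.4) of Example 26 as anonymous functions.
p = 2;                                             % illustrative value only
q = 99;                                            % value used in the experiments below
g26 = @(u) p*max(sqrt(abs(u)), (u - q)^2 - q);     % strongly quasiconvex part
f26 = @(x, y) g26(y) - g26(x) + x*(y - x);         % bifunction entering (EP)
```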

We performed several experiments in MATLAB for various constellations of the involved elements; they confirm the conclusions reached in [2, 3, 23] for the convex case, namely that a judicious choice of the parameters makes the relaxed-inertial proximal point algorithms faster and cheaper to use than their “pure” proximal point counterparts.

For instance, when \(q=99\), \(\epsilon = 1/4\), \(\eta = 1/2\), \(\rho = 2/5\), \(\varepsilon = 10^{-7}\), \(\alpha =1/29 - \varepsilon \) (as \({\overline{\alpha }}=1/29\) by (2.21)), \(\alpha _k = \alpha - 1/(10+7k)\), \(\beta _k = 1/(8k+4)\), \(k\ge 0\), and the starting points \(x^0=8297\) and \(x^{-1}=3210\), taking \(\rho _k = (1-1/k)(1-\rho ) + (1/k) (1+\rho )\), \(k\ge 0\), made Algorithm 1 (RIPPA-EP) deliver the solution \({\bar{x}}=88.5879\) to (EP) in 20 iterations that required 0.6232 s, while [27, Algorithm 1] (PPA-EP) provided \({\bar{x}}\) in 123 steps, after 1.3024 s. Moreover, when \(\rho _k = (1/k)(1-\rho ) + (1-1/k) (1+\rho )\), \(k\ge 0\), Algorithm 1 (RIPPA-EP) remained faster and cheaper than the standard proximal point method as it delivered \({\bar{x}}\) after 38 iterations, in 0.9610 s. Improving the accuracy to \(\varepsilon = 10^{-9}\), Algorithm 1 (RIPPA-EP) reached \({\bar{x}}\) in 30 iterations and 0.8728 s with the first choice of the relaxation parameters, and in 53 iterations and 1.3601 s with the second one, while [27, Algorithm 1] (PPA-EP) required 124 iterations and 1.9781 s. These results are summarized in Table 1.

Table 1 Comparison of the performance of RIPPA-EP(a) (Algorithm 1 with \(\rho _k = (1-1/k)(1-\rho ) + (1/k) (1+\rho )\)), RIPPA-EP(b) (Algorithm 1 with \(\rho _k = (1/k)(1-\rho ) + (1-1/k)(1+\rho )\)), and PPA-EP [27, Algorithm 1], with starting points \(x^0=8297\) and \(x^{-1}=3210\), until reaching the stopping criterion

Taking the starting points closer to the obtained solution, \(x^0=200\) and \(x^{-1}=-300\), the tendency observed above remained valid, as Algorithm 1 (RIPPA-EP) reached \({\bar{x}}\) in 13 iterations and 0.4052 s, while [27, Algorithm 1] (PPA-EP) required 83 iterations and 0.8381 s.

We can also report the curious fact that when we mistakenly implemented \(\beta _k = 1/(8k)+4\) (instead of \(\beta _k = 1/(8k+4)\)), \(k\ge 0\), in the last parameter constellation presented above, Algorithm 1 converged and delivered the same solution \({\bar{x}}\) to (EP), requiring 25 iterations and 0.8981 s for this. In this situation, however, (C1) is violated and thus Theorem 11 does not apply; hence, the convergence of the proposed algorithm cannot be theoretically guaranteed. Since the method nevertheless works in practice, at least in this case, this raises the legitimate (and currently open) question of alternative hypotheses under which Algorithm 1 remains convergent beyond the setting of Theorem 11.

Next, we consider the problem from (4.4) in higher dimensions. The relaxed-inertial proximal point algorithm turns out to outperform its standard proximal point counterpart in this case, too.

Example 27

Take \(p, q \in {\mathbb {N}}\), \(p, q > 1\), and the bifunction \(f: {\mathbb {R}}^n \times {\mathbb {R}}^n \rightarrow {\mathbb {R}}\),

$$\begin{aligned} f(x, y) :=&~ p \left( \max \{\sqrt{\Vert y\Vert }, (y - qe)^{2} - q\} \right) \nonumber \\&- p \left( \max \{\sqrt{\Vert x \Vert }, (x-qe)^{2} - q\} \right) + x^\top (y-x), \end{aligned}$$
(4.5)

where \(e=(1, \ldots , 1)^\top \in {\mathbb {R}}^n\). We solve (EP) with Algorithm 1 and [27, Algorithm 1]. The solution set \(S({\mathbb {R}}^{n}, f)\) of (EP) depends on the value of \(q > 1\). The fulfillment of the hypotheses can be verified in a similar manner to Example 26.

For \(n=60\), \(\varepsilon = 10^{-8}\), \(\alpha _k = \alpha - 1/(120+3k)\), \(\beta _k = 1/(28k+4)\), \(\rho _k = (1/k)(1-\rho ) + (1-1/k) (1+\rho )\), \(k \ge 0\), the other parameters as above, and randomly generated starting points, Algorithm 1 (RIPPA-EP) delivered the solution to (EP) in 84 iterations that required 4.8055 s, while [27, Algorithm 1] (PPA-EP) provided it in 137 steps, after 7.8315 s. Increasing the dimension to \(n=99\) and relaxing the admissible error to \(\varepsilon = 10^{-6}\), for \(\alpha _k = \alpha - 1/(120+7k)\), \(\rho _k = (1-1/k)(1-\rho ) + (1/k) (1+\rho )\), \(k\ge 0\), the other parameters as above, and randomly generated starting points, Algorithm 1 (RIPPA-EP) needed 71 iterations that required 5.6575 s to solve (EP), while [27, Algorithm 1] (PPA-EP) did the job in 228 steps, after 22.3490 s. The results are summarized in Table 2. The observed tendency was confirmed by testing the algorithms for higher dimensions like \(n=500\) and \(n=1000\), too.

Table 2 Comparison of the performance of RIPPA-EP(a) (Algorithm 1 with \(\rho _k = (1/k)(1-\rho ) + (1-1/k)(1+\rho )\)), RIPPA-EP(b) (Algorithm 1 with \(\rho _k =(1-1/k)(1-\rho ) + (1/k) (1+\rho )\)), and PPA-EP [27, Algorithm 1]

Next, we examine an equilibrium problem related to the one discussed in Example 25. In this scenario, the governing bifunction is strongly quasiconvex in its second variable, representing a special case of the strongly quasiconvex equilibrium problem outlined in [27, Example 4.2]. This specific case has been addressed in [52, Example 1] using a proximal subgradient algorithm.

Example 28

Consider the equilibrium problem from Example 25 with \(n=2\), \(\alpha =0.3\), \(c=(2, 3)^\top \), \(B = 10 \cdot \left( \begin{array}{cc} 2 & 1\\ 1 & 2\end{array}\right) \), \(b=(3, 8)^\top \), \(\beta =0\), \(T = ({1}/{13}) \cdot {{\,\textrm{Id}\,}}\), \(m=1\) and \(M=10\). Hence \(\gamma = 1\). Because the feasible set is convex and closed, but not an affine subspace, we solved the problem with the noninertial version (3.18) (RPPA-EP) of Algorithm 1, as the inertial extrapolation steps could lead to infeasible points. We performed experiments in MATLAB for various constellations of the involved parameters. For instance, when \(\varepsilon =10^{-6}\), \(\epsilon = 1/3\), \(\eta = {1}/{13}\), \(\rho = 0.15\), \(\beta _{k} = 1/(k+20)\), \(\rho _k = (1/k)(1-\rho ) + (1-1/k) (1+\rho )\), \(k\ge 0\), with the starting point \(x^0=(5, 5)^\top \), (3.18) (RPPA-EP) delivered the solution \((0.1948, 0.1035)^\top \) to the considered problem after 66 iterations and 0.6249 s. This performance improved when we took larger relaxation intervals, the best result being achieved for \(\rho = 0.7\), namely 41 iterations and 0.3625 s. Interestingly, for higher values of \(\rho \) (up to its maximal value of 35/39 allowed by (C2)) the performance of the algorithm resembles that of the standard proximal point method [27, Algorithm 1], which reached the solution after 78 iterations, in 0.7149 s. Lowering the acceptable error to \(\varepsilon =10^{-8}\), (3.18) (RPPA-EP) delivered an approximate solution to the equilibrium problem from Example 25 after 75 iterations and 0.6980 s when \(\rho = 0.7\), a result improved to 69 iterations and 0.6211 s for \(\rho = 0.45\), while [27, Algorithm 1] required 127 iterations and 1.4180 s. These findings are summarized in Table 3.
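For reference, the data of Example 28 can be assembled in MATLAB as follows (a sketch; encoding the feasible set K through linear inequalities, e.g. for fmincon, is only one possible implementation choice and is not prescribed by the text):

```matlab
% Data of Example 28: phi follows (4.1) and the bifunction follows (4.3).
alphac = 0.3;   c = [2; 3];   betac = 0;
B = 10*[2 1; 1 2];   b = [3; 8];
T = @(x) x/13;                                   % T = (1/13)*Id
m = 1;   M = 10;                                 % K = {x : m <= <c,x> + alpha <= M}

phi = @(x) (0.5*(x.'*B*x) + b.'*x + betac) / (c.'*x + alphac);
f28 = @(x, y) T(x).'*(y - x) + phi(y) - phi(x);

x0 = [5; 5];                                     % starting point used in the experiments
% The slab constraint defining K written as A_in*x <= b_in:
A_in = [c.'; -c.'];
b_in = [M - alphac; alphac - m];
```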

Table 3 Comparison of the performance of RPPA-EP ((3.18) with \(\rho _k = (1/k)(1-\rho ) + (1-1/k) (1+\rho )\)) and PPA-EP [27, Algorithm 1]

All these experiments confirm, again, that a suitable choice of the parameters makes the relaxed-inertial proximal point algorithms faster and cheaper to use than their “pure” proximal point counterparts.

Remark 29

When implementing all three considered algorithms, the proximal step (3.3) and its counterpart in [27, Algorithm 1] were computed by employing the MATLAB function fmincon because, as is unfortunately often the case when dealing with nonconvex functions, a closed form of the proximal operator of the involved function (even when \(K = {\mathbb {R}}^{n}\)) is not available for the moment. Depending on the involved function, such a closed form might be derived by direct calculation or possibly by adapting the method considered in [9] for determining the proximal operator of the root function.

Remark 30

As noted in [17], the optimal choices of the involved relaxation and inertial parameters derived in the convex case do not always accelerate the proximal point algorithm in the strongly quasiconvex case. It thus remains an open question to determine choices of these parameters that deliver a theoretically guaranteed acceleration in the strongly quasiconvex setting.

5 Conclusions

We show that the relaxed-inertial proximal point algorithm for solving equilibrium problems, recently introduced in the convex case, remains convergent, under standard hypotheses, when the employed bifunction is strongly quasiconvex in its second variable and pseudomonotone. Various sufficient conditions guaranteeing the fulfillment of these hypotheses are provided. Numerical experiments illustrate the theoretical achievements by showing that the proposed algorithm outperforms the standard proximal point method.

In subsequent work, we plan to provide similar results under a more general quasiconvexity assumption on the involved bifunction in its second variable. Moreover, we aim to extend to the quasiconvex framework considered in this work the modified relaxed-inertial extragradient method proposed in [23, 47] for the convex case.