1 Introduction

The equilibrium problem provides a general setting in which a large number of problems, such as scalar and vector optimization, variational inequalities, fixed points, complementarity problems, saddle points and noncooperative games, can be formulated in a unique format (see [1, 2] and references therein). A huge number of applications have been developed in different areas such as economics, environment, transportation, information technology and telecommunications: some recent papers focus on oligopolistic and spatial price markets [3–5], auction and financial markets [6–8], risk management [9], climate policies [10, 11], traffic and pricing over telecommunication networks or public roads [12–14], cloud computing [15, 16], power allocation in radio systems [17, 18], and internet advertising [19].

Several kinds of methods to solve equilibrium problems have been proposed (see, for instance, the recent survey [1]). One popular approach relies on the reformulation of the equilibrium problem as an optimization problem through appropriate gap or D-gap functions: many ad hoc descent methods for minimizing the chosen gap function have been developed (see [13, 14, 20–28]). Most of them require computing the minimum of a convex function over the (convex) feasible region of the equilibrium problem just to evaluate the gap function at a given point. Therefore, this evaluation can be computationally expensive when the feasible region is described by nonlinear (convex) constraints.

Recently, a gap function which uses a polyhedral approximation of the feasible region has been introduced in [14]. This paper introduces two descent methods for solving equilibrium problems with nonlinear constraints that exploit this new gap function. Both are based on a search direction which may be infeasible, unlike most of the known algorithms. Therefore, some penalization technique is needed: an exact penalty term is introduced, and a descent direction for the penalized gap function is available provided that the penalization parameter is small enough. It is worth remarking that the penalization parameter is updated throughout the iterations of the algorithms whenever needed.

These new methods have some better features than most of the available descent methods. At each iteration, a convex optimization problem with linear constraints is solved instead of a convex problem with nonlinear constraints, as in [13, 21, 23, 25, 26]. Moreover, the key assumption for convergence is weaker than those for the methods proposed in [21, 25, 26]. Finally, the parameters are bounded away from zero, while they might go to zero, and thus cause numerical instability, in the methods developed in [13, 14, 23].

The paper is organized as follows: Sect. 2 recalls some well-known definitions and results about gap functions for equilibrium problems, and a key lemma on the penalized gap function is proved. Section 3 provides the two new solution methods and their convergence is proved under standard assumptions. Finally, Sect. 4 contains some final remarks and comparisons between the new methods and those already available in the literature.

2 Preliminaries

We consider the following equilibrium problem:

$$\mathrm{(EP)}\quad\text{find } x^* \in C \quad\text{ such that } \quad f \bigl(x^*,y \bigr) \geq0, \quad\forall y \in C, $$

where \(C\subseteq\mathbb{R}^{n}\) is closed and convex and \(f: \mathbb{R}^{n} \times\mathbb{R}^{n} \to\mathbb{R}\) is a bifunction. Throughout the paper, the following basic assumptions are made:

  • The feasible set C is bounded and it is defined by convex inequalities, i.e.,

    $$C := \bigl\{ x \in\mathbb{R}^n: c_i(x) \leq0, i = 1, \dots, m \bigr\} , $$

    where \(c_{i} : \mathbb{R}^{n} \to\mathbb{R}\) are convex functions.

  • The \(c_{i}\)'s are twice continuously differentiable and the Slater constraint qualification holds, i.e., there is \(\hat{x} \in\mathbb{R}^{n}\) such that \(c_{i}(\hat{x}) < 0\) for all i=1,…,m.

  • The bifunction f is continuously differentiable, f(x,⋅) is convex, and f(x,x)=0 for all x∈D, where D is a bounded polyhedron containing C.

The above assumptions guarantee that (EP) admits at least one solution (see, for instance, [29]).

A function \(g:C\rightarrow\mathbb{R}\) is said to be a gap function for (EP) iff g is non-negative on C, and x^* solves (EP) if and only if x^*∈C and g(x^*)=0. Thus, gap functions allow reformulating an equilibrium problem as a global optimization problem whose optimal value is known a priori. In order to build gap functions with good smoothness properties, it is helpful to consider a continuously differentiable auxiliary bifunction \(h:\mathbb{R}^{n}\times \mathbb{R} ^{n}\rightarrow\mathbb{R}\) which satisfies the following conditions:

  • h(x,y)≥0 for all x,y∈D and h(z,z)=0 for all z∈D,

  • h(x,⋅) is strictly convex for all x∈D,

  • ∇_y h(z,z)=0 for all z∈D,

  • 〈∇_x h(x,y)+∇_y h(x,y), y−x〉≥0 for all x,y∈D.

If all the conditions on f and h which involve D actually hold on the whole \(\mathbb{R}^{n}\), then any bounded polyhedron D such that C⊆D can be considered.

Given any α>0, a well-known gap function (see, for instance, [26]) is

$$\phi_\alpha(x)= -\min_{y \in C} \bigl\{ f(x,y) + \alpha\, h(x,y) \bigr\} . $$

Several descent methods based on the minimization of the gap function ϕ α have been developed [13, 14, 21, 25, 26]. However, computing ϕ α (x) requires the solution of a convex optimization problem with nonlinear constraints, which may be computationally expensive.

Recently, the gap function ϕ α has been modified by replacing the feasible region C by its polyhedral approximation at each considered point [14], namely introducing the function

$$ \varphi_\alpha(x)= -\min_{y \in P(x)} \bigl\{ f(x,y) + \alpha\, h(x,y) \bigr\} , $$
(1)

where

$$P(x) = \bigl\{ y \in D : c_i(x) + \bigl\langle {\nabla c_i(x)} , {y-x} \bigr\rangle \leq0, i = 1, \dots, m \bigr\} . $$

The polyhedron D guarantees the boundedness of P(x): in fact, the linearization of the constraints alone may not be enough for it, even though C itself is bounded.

Since the inner optimization problem in (1) has a strictly convex objective function and a bounded feasible region, it admits a unique solution y α (x). We remark that this modification of the gap function ϕ α extends to (EP) a similar idea developed in [30] for variational inequalities.
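To make the construction concrete, the sketch below evaluates y_α(x) and φ_α(x) at one point, using the data of Example 2.1 below (unit-ball constraint, h(x,y)=∥y−x∥²/2, D=[−1,1]²). The choice of SciPy's SLSQP solver for the inner linearly constrained problem is an illustrative assumption, not part of the method:

```python
import numpy as np
from scipy.optimize import minimize

alpha = 1.0

def f(x, y):
    # bifunction of Example 2.1
    return (x[0] + x[1]) * y[0] + (x[1] - x[0]) * y[1] - x[0]**2 - x[1]**2

def h(x, y):
    # auxiliary bifunction h(x,y) = ||y - x||^2 / 2
    return 0.5 * np.dot(y - x, y - x)

def c1(x):
    # single convex constraint: the unit ball
    return x[0]**2 + x[1]**2 - 1.0

def y_alpha(x):
    """Unique minimizer of f(x,.) + alpha h(x,.) over the polyhedron P(x)."""
    grad_c1 = 2.0 * x
    cons = [{'type': 'ineq',                 # c1(x) + <grad c1(x), y - x> <= 0
             'fun': lambda y: -(c1(x) + grad_c1 @ (y - x))}]
    res = minimize(lambda y: f(x, y) + alpha * h(x, y), x0=x,
                   bounds=[(-1.0, 1.0)] * 2, constraints=cons, method='SLSQP')
    return res.x

x = np.array([0.1, 0.1])
y = y_alpha(x)
phi = -(f(x, y) + alpha * h(x, y))           # the gap-function value (1)
```

At this particular point the linearized constraint is inactive, so y_α(x) coincides with the unconstrained minimizer (−0.1, 0.1) and φ_α(x)=∥x∥²/α=0.02.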

Lemma 2.1

[14]

The following statements hold:

  1. (a)

    φ α is locally Lipschitz continuous on D;

  2. (b)

    x^* solves (EP) if and only if y_α(x^*)=x^*;

  3. (c)

    φ α is a gap function for (EP);

  4. (d)

    If x∈C does not solve (EP) and f is strictly ∇-monotone on D, i.e.

    $$ \bigl\langle {\nabla_x f(x,y) + \nabla_y f(x,y)} , {y - x} \bigr\rangle > 0, \quad\forall x,y \in D \textit{ with } x \neq y, $$
    (2)

    then y α (x)−x is a descent direction for φ α at x.

It is worth noting that it is not possible to replace P(x) by D in (1): φ α would no longer be a gap function, and also the fixed-point reformulation in the above lemma would not hold.

The above results suggest exploiting y_α(x)−x as a search direction at a given iterate x in the minimization process of the gap function φ_α. However, the direction y_α(x)−x may be infeasible because y_α(x) belongs to the approximating polyhedron P(x), but not necessarily to the feasible set C. For this reason, the following exact penalty function has been introduced in [14]:

$$\psi_{\alpha,\varepsilon,p} (x) := \varphi_\alpha(x) + \frac {1}{\varepsilon}\, \big\| c^+(x)\big\| _p, $$

where \(c^{+}(x)=(c_{1}^{+}(x),\dots,c_{m}^{+}(x))\) with \(c_{i}^{+}(x)=\max\{ 0,c_{i}(x)\} \), ε>0 and p∈[1,∞]. Given any α>0, the penalty function also turns out to be a gap function, provided that the penalization parameter ε is small enough. In fact, well-known results about penalization [31] allow us to prove the following key properties.

Lemma 2.2

Given any α>0 and any p∈[1,∞], there exists \(\bar {\varepsilon }>0\) such that

  1. (a)

    ψ_{α,ε,p}(x)≥0 for all x∈D,

  2. (b)

    x^* solves (EP) if and only if x^*∈D and ψ_{α,ε,p}(x^*)=0,

for all \(\varepsilon\in]0,\bar{\varepsilon}[\).

Proof

Consider any compact set D′ containing D in its interior, namely \(D \subset{\rm int\,} D'\). By [31, Proposition 8 and Theorems 11 and 12], there exists \(\bar{\varepsilon}>0\) such that

$$ {\rm argmin} \bigl\{ \varphi_\alpha(x) : x \in C \bigr\} = {\rm argmin} \bigl\{ \psi_{\alpha,\varepsilon,p}(x) : x \in {\rm int\,} D' \bigr\} $$
(3)

holds for any \(\varepsilon\in]0,\bar{\varepsilon}[\). Consider any global minimizer \(\hat{x}\) of φ α over C: we have both \(\hat{x}\in C\) and \(\varphi_{\alpha}(\hat{x})=0\), so that \(\psi_{\alpha,\varepsilon ,p}(\hat {x})=\varphi_{\alpha}(\hat{x})=0\). Therefore, (a) follows immediately since (3) implies that \(\hat{x}\) is also a minimizer of ψ α,ε,p on D. As a consequence, Lemma 2.1(c) and (3) imply that (b) holds as well. □

Next section introduces two new solution methods for (EP) based on the minimization of the penalized gap function ψ α,ε,p , which are both convergent under assumption (2).

Actually, a method based on the minimization of ψ α,ε,p has already been proposed in [14]. The basic idea of the algorithm is the following: given values for α and ε, the new iterate is obtained moving away from the current iterate x k along the direction y α (x k)−x k if the decrease condition

$$ \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) - \alpha \bigl[ h \bigl(x^k,y_\alpha \bigl(x^k \bigr) \bigr) + \bigl\langle \nabla_x h \bigl(x^k,y_\alpha \bigl(x^k \bigr) \bigr),y_\alpha \bigl(x^k \bigr)-x^k \bigr\rangle \bigr]\leq-\eta\, \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) $$
(4)

is satisfied for some given η∈]0,1[, along with two further technical conditions (related to ε). If this is not the case, a null step is performed, simply decreasing both α and ε simultaneously before trying again to move away from x^k with the same procedure. The decreases follow given sequences of parameters converging to zero. Thus, if the algorithm performs an infinite sequence of null steps, it generates a sequence of iterates which converges to a solution of (EP) while the parameters α and ε go to zero.

On the other hand, the assumptions required by this latter method for convergence are different from those used in this paper. In fact, the above method from [14] is based on the following concavity-type condition:

$$ f(x,y) + \bigl\langle {\nabla_x f(x,y)} , {y-x} \bigr\rangle \geq0 \quad \forall x,y \in D, $$
(5)

which is neither weaker nor stronger than condition (2) (see Examples 2.5 and 3.2 in [13]). Furthermore, the following example satisfies both conditions (2) and (5), and it provides a case in which the above method could be numerically unstable, since the parameters α and ε actually go to zero. On the contrary, this can never happen in the algorithms of this paper, since α is kept fixed and ε is updated only a finite number of times.

Example 2.1

Consider (EP) with n=2, m=1,

$$f(x,y)=(x_1+x_2)\,y_1 + (x_2-x_1)\,y_2 -x_1^2 -x_2^2 $$

and \(c_{1}(x)=x_{1}^{2}+x_{2}^{2}-1\). Therefore, the feasible region C is the unit ball and x^*=(0,0) is the unique solution of (EP). Consider \(h(x,y)=\|y-x\|_{2}^{2}/2\) and the box D=[−1,1]×[−1,1] containing the feasible region. Note that f satisfies (2) since

$$\bigl\langle \nabla_x f(x,y) + \nabla_y f(x,y), y-x \bigr\rangle =\|y-x\|_2^2. $$

Furthermore, f(⋅,y) is concave for all \(y\in\mathbb{R}^{n}\), hence

$$0 = f(y,y) \leq f(x,y) + \bigl\langle {\nabla_x f(x,y)} , {y-x} \bigr\rangle , \quad\forall x,y \in D, $$

i.e., f satisfies (5) as well.

Three cases can occur when running the algorithm devised in [14] (see [14, Theorem 5]):

  1. 1.

    It finds x^* after a finite number of iterations;

  2. 2.

    It generates a sequence which converges to x^* while the parameters α and ε are updated only a finite number of times;

  3. 3.

    It generates a sequence which converges to x^* while α and ε go to zero.

Unless the starting point of the algorithm is x^* itself, the first two cases cannot happen.

The first case requires that x^*=(0,0) belongs to the segment with x and y_α(x) as extreme points for some x, i.e., y_α(x)=γx for some γ<0. Furthermore, y_α(x) is the projection of z_α(x) onto P(x) with

$$z_{\alpha}(x)= \bigl( x_1-(x_1+x_2)/ \alpha, x_2-(x_2-x_1)/\alpha \bigr). $$

In fact, we have

$$\begin{aligned} f(x,y) + \alpha h(x,y) = & (x_1+x_2)y_1+(x_2-x_1)y_2 - x_1^2-x_2^2 \\ &{}+\alpha \bigl[(y_1-x_1)^2 +(y_2-x_2)^2 \bigr]/2 \\ = & \alpha \bigl[ \bigl(y_1-x_1+(x_1+x_2)/ \alpha \bigr)^2+ \bigl(y_2-x_2+(x_2-x_1)/ \alpha \bigr)^2 \bigr]/2 - \bigl(x_1^2+x_2^2 \bigr)/\alpha \\ = & \alpha\big\| y - z_\alpha(x) \big\| ^2/2 - \|x\|^2/\alpha, \end{aligned}$$

and therefore

$$y_{\alpha}(x) = {\rm argmin} \bigl\{ f(x,y) + \alpha h(x,y) : y \in P(x) \bigr\} = {\rm argmin} \bigl\{ \big\| y - z_\alpha(x) \big\| ^2 : y \in P(x) \bigr\} . $$

Figure 1 shows that no point γx with γ<0 can ever be the projection of z_α(x) onto P(x). Therefore, y_α(x)≠γx for all γ<0, which means that the first case cannot occur.

Fig. 1 Case 1 is not possible (Example 2.1)
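The completing-the-square computation above can be checked numerically; the constant −∥x∥²/α does not depend on y, so it is irrelevant for the argmin (α=0.7 and the sampling are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.7

def f(x, y):
    # bifunction of Example 2.1
    return (x[0] + x[1]) * y[0] + (x[1] - x[0]) * y[1] - x[0]**2 - x[1]**2

def h(x, y):
    return 0.5 * np.dot(y - x, y - x)

def z_alpha(x):
    # unconstrained minimizer of f(x,.) + alpha h(x,.)
    return np.array([x[0] - (x[0] + x[1]) / alpha,
                     x[1] - (x[1] - x[0]) / alpha])

for _ in range(100):
    x, y = rng.uniform(-1, 1, 2), rng.uniform(-1, 1, 2)
    lhs = f(x, y) + alpha * h(x, y)
    rhs = 0.5 * alpha * np.dot(y - z_alpha(x), y - z_alpha(x)) - x @ x / alpha
    assert abs(lhs - rhs) < 1e-10
```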

In the second case the algorithm generates a sequence x^k→x^* while α is eventually fixed and the decrease condition (4) holds whenever k is sufficiently large. Clearly, z_α(x^k)→x^* since z_α(x)→0 as x→0. Hence, z_α(x^k)∈C⊆P(x^k) holds for any sufficiently large k, which implies y_α(x^k)=z_α(x^k) and therefore ψ_{α,ε,p}(x^k)=φ_α(x^k)=∥x^k∥²/α. However, the decrease condition (4) reads

$$0 \leq- \frac{\eta}{\alpha} \|x^k\|^2, $$

which is not possible. Therefore, also the second case cannot occur.

As a result, only the third case can occur: the algorithm generates a sequence which converges to x^* while α and ε go to zero.

3 Descent Methods

While y_α(x)−x is a descent direction for φ_α at x, it is not necessarily so for any penalized gap function ψ_{α,ε,p}. Indeed, the following key result shows that it is a descent direction for ψ_{α,ε,p} at x as well, provided the parameter ε is small enough. The generalized directional derivative of ψ_{α,ε,p} at x along the direction d, i.e.,

$$\psi_{\alpha,\varepsilon,p}^\circ(x;d) := \limsup_{\substack{z \to x \\ t \downarrow0 }}t^{-1} \bigl[ \psi_{\alpha,\varepsilon,p}(z + t\,d) - \psi _{\alpha,\varepsilon,p}(z) \bigr], $$

provides a convenient tool to check whether d is a descent direction. In fact, if \(\psi_{\alpha,\varepsilon,p}^{\circ}(x;d)<0\), then ψ α,ε,p (x+td)<ψ α,ε,p (x) holds whenever t>0 is small enough.

Theorem 3.1

Let α>0, p∈[1,∞] and Λ_α(x) be the set of all the Lagrange multipliers associated with y_α(x). If (2) holds and x∈D does not solve (EP), then

$$\psi_{\alpha,\varepsilon,p}^\circ \bigl(x ; y_\alpha(x)-x \bigr) < 0, $$

and therefore y_α(x)−x is a descent direction for ψ_{α,ε,p} at x, provided that 1/ε≥∥λ⁺∥_q, where

$$\lambda^+_i = \left \{ \begin{array}{l@{\quad}l} \lambda_i, & \textit{if } c_i(x)>0, \\ 0, & \textit{otherwise,} \end{array} \right . $$

for any given λ∈Λ_α(x), and ∥⋅∥_q is the dual norm of ∥⋅∥_p.

Proof

Since x does not solve (EP), d:=y_α(x)−x≠0. Considering the convex function v(x):=∥c⁺(x)∥_p, the generalized derivative v^∘(x;d) coincides with the directional derivative v′(x;d). Thus, the generalized directional derivative of ψ_{α,ε,p} satisfies the following inequality:

$$\psi_{\alpha,\varepsilon,p}^\circ(x;d) \leq\varphi_{\alpha }^\circ(x;d) + \frac{1}{\varepsilon} \, v'(x;d). $$

The following chain of inequalities and equalities holds:

$$\begin{aligned} \varphi_{\alpha}^\circ(x;d) \leq&-\bigl\langle { \nabla_x f\bigl(x,y_\alpha (x)\bigr) + \alpha \nabla_x h\bigl(x,y_\alpha(x)\bigr)} , {d} \bigr\rangle \\ < & \bigl\langle {\nabla_y f\bigl(x,y_\alpha(x)\bigr) + \alpha\nabla_y h\bigl(x,y_\alpha(x)\bigr) } , {d} \bigr\rangle \\ = &- \displaystyle{\sum_{i=1}^m} \lambda_i \, \bigl\langle {\nabla c_i(x)} , {d} \bigr\rangle \\ = & \displaystyle{\sum_{i=1}^m} \lambda_i \, c_i(x) \\ \leq& \displaystyle{\sum_{i=1}^m} \lambda^+_i \, c^+_i(x) \\ = & \bigl\langle {\lambda^+} , {c^+(x)} \bigr\rangle . \end{aligned}$$

The first inequality is actually Theorem 2(b) in [14], while the second one is due to the strict ∇-monotonicity of f+αh. The subsequent equalities follow from the multipliers’ rule and the complementary slackness condition: in fact, we have

$$ \nabla_y f \bigl(x,y_\alpha(x) \bigr) + \alpha \,\nabla_y h \bigl(x,y_\alpha(x) \bigr) + \displaystyle{\sum_{i=1}^m} \lambda_i \nabla c_i(x) = 0 $$
(6)

and

$$ \lambda_i \, \bigl[c_i(x) + \bigl\langle {\nabla c_i(x)} , {y_\alpha(x)-x} \bigr\rangle \bigr] = 0, \quad i=1,\dots,m, $$
(7)

since λ is a Lagrange multiplier associated with y α (x). Finally, the last inequality and equality follow immediately from the definitions. Moreover, the proof of Lemma 4 in [14] shows that v′(x;d)≤−v(x). Thus, we have:

$$\begin{aligned} \psi_{\alpha,\varepsilon,p}^\circ(x;d) <& \bigl\langle {\lambda^+} , {c^+(x)} \bigr\rangle - \varepsilon^{-1} \, \big\| c^+(x)\big\| _p \\ \leq&\|\lambda^+\|_q \, \big\| c^+(x)\big\| _p - \varepsilon^{-1}\,\big\| c^+(x)\big\| _p \\ =& \bigl( \|\lambda^+\|_q - \varepsilon^{-1} \bigr)\, \big\| c^+(x)\big\| _p \\ \leq& 0, \end{aligned}$$

since 1/ε≥∥λ + q . □
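The last two steps of this chain rest on the pairing ⟨λ⁺, c⁺(x)⟩ ≤ ∥λ⁺∥_q ∥c⁺(x)∥_p between a norm and its dual (Hölder's inequality); a quick sanity check on random nonnegative vectors (p=1.5, hence q=3, and the dimension are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p = 1.5
q = p / (p - 1.0)                  # dual exponent: 1/p + 1/q = 1

for _ in range(1000):
    lam_plus = rng.uniform(0.0, 1.0, 5)   # stands for the clipped multipliers
    viol = rng.uniform(0.0, 1.0, 5)       # stands for the violations c^+(x)
    assert lam_plus @ viol <= np.linalg.norm(lam_plus, q) * np.linalg.norm(viol, p) + 1e-12
```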

Note that y_α(x)−x is a descent direction for ψ_{α,ε,p} at any feasible point x for any ε>0: in fact, x∈C implies λ⁺=0, and hence 1/ε≥∥λ⁺∥_q holds for all ε>0. Under assumptions on f other than strict ∇-monotonicity, this is no longer necessarily true (see [14, Theorem 4]).

The solutions of (EP) coincide with the global minima of ψ_{α,ε,p} on the set D, provided that ε is small enough (see Lemma 2.2). As a consequence of Theorem 3.1, a further result holds under the strict ∇-monotonicity of f. In fact, provided that ε is small enough, if (2) holds, then the stationary points of ψ_{α,ε,p} on D, i.e., those x^*∈D such that

$$\begin{aligned} \psi_{\alpha,\varepsilon,p}^{\circ} \bigl(x^* ; y-x^* \bigr) \geq0, \quad\forall y \in D, \end{aligned}$$

solve (EP). Furthermore, an explicit bound on ε is also available.

Corollary 3.1

Let α>0, p∈[1,∞]. If (2) holds and

$$ 1/\varepsilon\geq\sup \bigl\{ \| \lambda^+ \|_q: \lambda\in \varLambda_{\alpha}(x), x \in D \bigr\} , $$

then any stationary point of ψ α,ε,p on D solves (EP).

Proof

The thesis follows immediately from Theorem 3.1, provided that the bound is finite. Consider the Lagrangian function

$$L(y) := f(x,y) + \alpha\,h(x,y) + \sum_{i=1}^m \lambda_i \bigl[c_i(x) + \bigl\langle {\nabla c_i(x)} , {y-x} \bigr\rangle \bigr], $$

where x∈D and λ∈Λ_α(x) are fixed. The complementary slackness condition (7) implies L(y_α(x))=−φ_α(x), while the multipliers’ rule (6) states that y_α(x) is a stationary point of L. Since L is a convex function, y_α(x) minimizes L on \(\mathbb{R} ^{n}\). Considering any point \(\hat{x}\in\mathbb{R}^{n}\) which satisfies the Slater constraint qualification (that is, \(c_{i}(\hat{x}) < 0\) for all i=1,…,m), the following chain of inequalities holds:

$$\begin{aligned} f(x,\hat{x}) + \alpha\,h(x,\hat{x}) + \varphi_\alpha(x) \ge& \displaystyle{\sum_{i=1}^m \lambda_i} \bigl[-c_i(x) - \bigl\langle {\nabla c_i(x)} , {\hat{x}-x} \bigr\rangle \bigr] \\ \geq& -\displaystyle{\sum_{i=1}^m \lambda_i} \, c_i(\hat{x}) \\ \geq& \zeta\, \displaystyle{\sum_{i=1}^m} \lambda_i, \end{aligned}$$

where

$$\zeta:= \min\bigl\{ -c_1(\hat{x}),\dots,-c_m(\hat{x})\bigr\} >0.$$

The first inequality simply states

$$L(y_\alpha(x))\leq L(\hat{x}),$$

while the second one follows from the convexity of the functions c_i. Therefore, we have

$$\| \lambda\|_1 \leq \bigl[ f(x,\hat{x}) + \alpha\,h(x,\hat{x}) + \varphi_\alpha(x) \bigr]/\zeta. $$

The maximum of the right-hand side over D is finite since f, h and φ_α are continuous and D is compact. Therefore, all the multipliers λ∈Λ_α(x) are uniformly bounded over x∈D. Since ∥λ⁺∥₁≤∥λ∥₁, the same property holds also for all the corresponding vectors λ⁺: thus, the bound in the statement is finite. □
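On the data of Example 2.1 every quantity in this bound is computable: x̂=(0,0) is a Slater point with ζ=−c₁(x̂)=1, so ∥λ∥₁ ≤ f(x,x̂)+αh(x,x̂)+φ_α(x). The sketch below estimates the supremum of this expression over a grid of D=[−1,1]²; solving the inner problem with SciPy's SLSQP is an illustrative choice:

```python
import numpy as np
from scipy.optimize import minimize

alpha = 1.0
x_hat = np.zeros(2)                # Slater point: c1(x_hat) = -1 < 0
zeta = 1.0                         # zeta = min_i(-c_i(x_hat))

def f(x, y):
    return (x[0] + x[1]) * y[0] + (x[1] - x[0]) * y[1] - x[0]**2 - x[1]**2

def h(x, y):
    return 0.5 * np.dot(y - x, y - x)

def phi(x):
    # phi_alpha(x): inner problem over P(x); y = 0 always lies in P(x) here
    c1, g = x @ x - 1.0, 2.0 * x
    res = minimize(lambda y: f(x, y) + alpha * h(x, y), x0=np.zeros(2),
                   bounds=[(-1.0, 1.0)] * 2, method='SLSQP',
                   constraints=[{'type': 'ineq',
                                 'fun': lambda y: -(c1 + g @ (y - x))}])
    return -res.fun

# sup of the right-hand side estimated on a grid of D = [-1,1]^2
grid = np.linspace(-1.0, 1.0, 9)
bound = max((f(x, x_hat) + alpha * h(x, x_hat) + phi(x)) / zeta
            for x in (np.array([a, b]) for a in grid for b in grid))
```

Since the grid maximum only estimates the supremum over D from below, a safety margin on ε would be sensible in practice.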

Therefore, any local minimization method could be directly applied to solve (EP) exploiting a single penalized gap function, but the computation of the bound would be required. Actually, this is not necessary: a descent method can be devised which moves away from the current iterate x^k along the direction d^k=y_α(x^k)−x^k after updating the penalization parameter ε whenever it is too large for d^k to be a descent direction. Clearly, d^k=0, which means that x^k is a fixed point of the map y_α, guarantees that x^k solves (EP).

Algorithm 1

(0):

Choose a sequence ρ j ↓0, p∈[1,∞], α>0, β,γ∈]0,1[, x 0D. Set ε=ρ 0, j=0 and k=0.

(1):

Compute y k=argmin{f(x k,y)+αh(x k,y):yP(x k)} and λ k a corresponding Lagrange multiplier vector.

(2):

If d k:=y kx k=0, then STOP.

(3):

While 1/ε<∥(λ k)+ q do

  set ε=ρ j+1 and j=j+1.

end

(4):

Compute the smallest non-negative integer s such that

$$\psi_{\alpha,\varepsilon,p} \bigl(x^k+\gamma^s\,d^k \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) \leq- \beta \, \gamma^{2s} \, \|d^k\|_2, $$

  set t k =γ s, x k+1=x k+t k d k, k=k+1 and goto Step 1.
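To fix ideas, here is a compact Python sketch of Algorithm 1 on the data of Example 2.1. Two implementation details are illustrative assumptions, not prescribed by the algorithm: the inner problem of Step 1 is solved with SciPy's SLSQP, and the multiplier of the single linearized constraint is recovered from the stationarity condition, assuming the box constraints of D are inactive at y_α(x):

```python
import numpy as np
from scipy.optimize import minimize

# parameters (illustrative choices): alpha fixed, rho_j -> 0, p = q = 2
alpha, beta, gamma = 1.0, 0.1, 0.5
rho = [10.0 ** (-j) for j in range(20)]
eps, j = rho[0], 0

def f(x, y):
    # bifunction of Example 2.1
    return (x[0] + x[1]) * y[0] + (x[1] - x[0]) * y[1] - x[0]**2 - x[1]**2

def h(x, y):
    return 0.5 * np.dot(y - x, y - x)

def inner(x):
    """Step 1: y_alpha(x), a multiplier of the linearized constraint, phi_alpha(x)."""
    c1, g = x @ x - 1.0, 2.0 * x
    lin = lambda y: c1 + g @ (y - x)
    res = minimize(lambda y: f(x, y) + alpha * h(x, y), x0=np.zeros(2),
                   bounds=[(-1.0, 1.0)] * 2,
                   constraints=[{'type': 'ineq', 'fun': lambda y: -lin(y)}],
                   method='SLSQP')
    y = res.x
    # recover lambda from stationarity, assuming the box constraints are inactive
    grad = np.array([x[0] + x[1], x[1] - x[0]]) + alpha * (y - x)
    lam = max(0.0, -(grad @ g) / (g @ g)) if g @ g > 0 and abs(lin(y)) < 1e-7 else 0.0
    return y, lam, -res.fun

def psi(x, eps):
    # penalized gap function; with m = 1 the penalty is max(0, c1(x)) for any p
    return inner(x)[2] + max(0.0, x @ x - 1.0) / eps

xk = np.array([0.5, 0.3])
for k in range(100):
    yk, lam, _ = inner(xk)
    dk = yk - xk
    if np.linalg.norm(dk) < 1e-8:            # Step 2
        break
    while 1.0 / eps < lam:                   # Step 3 (q = 2, single constraint)
        j += 1
        eps = rho[j]
    s, psik = 0, psi(xk, eps)                # Step 4: Armijo-type line search
    while s < 50 and psi(xk + gamma**s * dk, eps) - psik > -beta * gamma**(2 * s) * np.linalg.norm(dk):
        s += 1
    xk = xk + gamma**s * dk
```

In this run the iterates approach the solution x^*=(0,0) while α stays fixed and, in fact, ε is never updated.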

Theorem 3.2

If (2) holds, then either Algorithm 1 stops at a solution of (EP) after a finite number of iterations or ε is updated at most a finite number of times and Algorithm 1 produces a bounded sequence {x k}, such that any of its cluster points solves (EP) and the sequence of the values {ψ α,ε,p (x k)} converges to 0.

Proof

Since there exists M>0 such that ∥λ∥_q≤M holds for all x∈D and all λ∈Λ_α(x) (see the proof of Corollary 3.1), the parameter ε is updated at most a finite number of times and Step 3 can never loop indefinitely.

Next, we show that the line search procedure in Step 4 is finite. By Theorem 3.1 and the choice of ε at Step 3, \(\psi _{\alpha,\varepsilon,p}^{\circ}(x^{k} ; d^{k}) < 0\) holds for all k. By contradiction, suppose there exists an iteration k such that

$$\psi_{\alpha,\varepsilon,p} \bigl(x^k+\gamma^s\,d^k \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) > - \beta\, \gamma^{2s} \, \|d^k\|_2 $$

holds for all \(s \in\mathbb{N}\). Then, taking the limit we get the contradiction:

$$\psi_{\alpha,\varepsilon,p}^\circ \bigl(x^k; d^k \bigr) \geq\limsup_{s\to\infty} \gamma^{-s} \, \bigl[ \psi_{\alpha,\varepsilon,p} \bigl(x^k+\gamma^s\,d^k \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) \bigr] \geq0. $$

If the algorithm stops at x^k after a finite number of iterations, then the stopping criterion guarantees that x^k solves (EP) because of Lemma 2.1.

Now, suppose the algorithm generates an infinite sequence {x^k}: the sequence is bounded since x^k is a convex combination of x^{k−1} and y_α(x^{k−1}), which both belong to D. Consider any cluster point x^* of the sequence. Taking an appropriate subsequence {x^ℓ}, we have x^ℓ→x^*. Without any loss of generality, we can assume that ε is constant throughout the iterations. Moreover, the continuity of the map y_α guarantees d^ℓ→d^*:=y_α(x^*)−x^*. We want to prove that d^*=0 and therefore that x^* solves (EP). By contradiction, suppose d^*≠0. Since the sequence {ψ_{α,ε,p}(x^k)} is monotone, decreasing and bounded below, it has a limit. Hence, we also have

$$\lim_{\ell\to\infty} \bigl[ \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell} \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell+1} \bigr) \bigr] = 0. $$

Moreover, the stepsize rule at Step 4 guarantees

$$\psi_{\alpha,\varepsilon,p} \bigl(x^{\ell} \bigr) - \psi_{\alpha ,\varepsilon,p} \bigl(x^{\ell+1} \bigr) \geq\beta\, t_\ell^2 \, \|d^\ell\|_2 > 0. $$

Therefore, t_ℓ→0 as ℓ→+∞ since d^*≠0. Moreover, the inequality

$$ \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell} + t_\ell\, \gamma^{-1} \, d^{\ell} \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^\ell \bigr) > - \, \beta\, \bigl(t_\ell\, \gamma^{-1} \bigr)^2 \, \|d^\ell\|_2 $$
(8)

holds for all \(\ell\in\mathbb{N}\). Since ψ α,ε,p is locally Lipschitz continuous, the mean value theorem guarantees that

$$ \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell} + t_\ell\, \gamma^{-1} \, d^{\ell} \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^\ell \bigr) = \bigl\langle {\xi^\ell} , {t_{\ell} \, \gamma ^{-1} \, d^\ell} \bigr\rangle , $$
(9)

where ξ^ℓ is a generalized gradient of ψ_{α,ε,p} at x^ℓ+θ_ℓ t_ℓ γ^{−1} d^ℓ, holds for some θ_ℓ∈]0,1[. Hence, (8) and (9) imply

$$\bigl\langle {\xi^\ell} , {\, d^\ell} \bigr\rangle > - \, \beta\, t_\ell\, \gamma^{-1} \, \| d^\ell \|_2. $$

On the other hand, we have

$$\psi_{\alpha,\varepsilon,p}^\circ \bigl( x^\ell+ \theta_\ell\, t_\ell\, \gamma^{-1} \, d^\ell; d^\ell \bigr) \geq\bigl\langle {\xi^\ell} , {\, d^\ell} \bigr\rangle , $$

and thus

$$\psi_{\alpha,\varepsilon,p}^\circ \bigl( x^\ell+ \theta_\ell\, t_\ell\, \gamma^{-1} \, d^\ell; d^\ell \bigr) > - \, \beta\, t_\ell\, \gamma^{-1} \, \|d^\ell\|_2. $$

Since x^ℓ→x^*, d^ℓ→d^*, and t_ℓ→0, we get x^ℓ+θ_ℓ t_ℓ γ^{−1} d^ℓ→x^*. Since \(\psi_{\alpha,\varepsilon,p}^{\circ}\) is upper semicontinuous as a function of (x;d) (see, for instance, [32]), taking the limit we get

$$ \psi_{\alpha,\varepsilon,p}^\circ \bigl(x^*; d^* \bigr) \geq \limsup_{\ell\to\infty} \psi_{\alpha,\varepsilon,p}^\circ \bigl(x^\ell+ \theta_\ell\, t_\ell\, \gamma^{-1} \, d^\ell; d^\ell \bigr) \geq0. $$
(10)

Eventually taking a subsequence, λ^ℓ→λ^* as ℓ→+∞ for some \(\lambda^{*}\in\mathbb{R}^{m}_{+}\), since the sequence {λ^ℓ} is bounded. Moreover, we have λ^*∈Λ_α(x^*) since the set-valued map Λ_α is closed (see [33, Lemma 2]). Therefore, 1/ε≥∥(λ^ℓ)⁺∥_q implies 1/ε≥∥(λ^*)⁺∥_q, and hence Theorem 3.1 ensures \(\psi_{\alpha,\varepsilon,p}^{\circ}(x^{*}; d^{*})<0\), in contradiction with (10).

Therefore, x^* solves (EP). By Lemma 2.2 we have ψ_{α,ε,p}(x^*)=0, and thus 0 is the limit of the whole sequence {ψ_{α,ε,p}(x^k)}. □

The algorithm does not need to check that the penalized gap function is positive at the current iterate x^k. Indeed, ψ_{α,ε,p}(x^k)<0 might occur at some iteration, and a descent step could be taken all the same. Anyhow, negative values may be met only as long as the penalization parameter ε keeps being updated; hence they are possible at most a finite number of times. In fact, after ε is fixed once and for all, the sequence {ψ_{α,ε,p}(x^k)} is monotone, decreasing and goes to 0, which could not happen with negative values.

The main difference between the above algorithm and the one in [14] concerns the parameters’ update. Parameters are updated to guarantee that d^k is a descent direction: in Algorithm 1 it is enough to check that ε is small enough, while in the algorithm of [14] two further conditions must also be met and α has to be updated, too. This difference affects the behaviour of the parameters meaningfully: in Algorithm 1, α is fixed and ε is updated at most a finite number of times, while in the other algorithm α and ε change simultaneously and may actually go to zero (see Example 2.1).

In general, a different kind of line search can be considered, too: the term ∥d∥₂ may be replaced by the value of the considered gap function (see [13]). Applying this idea in the current framework requires some additional care, as the solution strategy is based on the minimization of the penalized gap function ψ_{α,ε,p}. While a descent direction can be obtained in the same way as in Algorithm 1, the possible lack of feasibility may create further troubles. A line search based on the inequality

$$\psi_{\alpha,\varepsilon,p} \bigl(x^k+\gamma^s\,d^k \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) \leq- \beta \, \gamma^{2s} \, \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) $$

may not work. In fact, the right-hand side of the inequality has to be negative. Therefore, a further check has to be performed to guarantee that the penalized gap function is positive at the current iterate x^k: if it is not, it is enough to decrease the penalization parameter ε to get a positive value (see Lemma 2.2). Anyway, feasibility may still not be achieved: the method could provide sequences for which the values of the gap function go to zero but whose cluster points are not feasible. Adding the penalty term ∥c⁺(x)∥_p to the right-hand side of the line search inequality allows feasibility to be achieved as well.

Algorithm 2

  1. (0)

    Choose a sequence ρ j ↓0, p∈[1,∞], α,δ>0, β,γ∈]0,1[, x 0D. Set ε=ρ 0, j=0 and k=0.

  2. (1)

    Compute y k=argmin{f(x k,y)+αh(x k,y):yP(x k)} and λ k a corresponding Lagrange multiplier vector.

  3. (2)

    If d k:=y kx k=0, then STOP.

  4. (3)

    While ψ α,ε,p (x k)≤0 or 1/ε<∥(λ k)+ q do

      set ε=ρ j+1 and j=j+1.

    end

  5. (4)

    Compute the smallest non-negative integer s such that

    $$\psi_{\alpha,\varepsilon,p} \bigl(x^k+\gamma^s\,d^k \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) \leq- \beta \, \gamma^{2s} \, \bigl[ \psi_{\alpha,\varepsilon,p} \bigl(x^k \bigr) + \delta\big\| c^+ \bigl(x^k \bigr)\big\| _p \bigr], $$

      set t k =γ s, x k+1=x k+t k d k, k=k+1 and goto Step 1.
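The modified acceptance rule of Step 4 can be isolated as a small backtracking routine; the sketch below applies it to a toy penalized gap function (the quadratic ψ and all the data are illustrative only):

```python
import numpy as np

def backtrack(psi, viol, x, d, beta=0.1, gamma=0.5, delta=1.0, max_s=60):
    """Smallest s such that
    psi(x + gamma**s d) - psi(x) <= -beta*gamma**(2s)*[psi(x) + delta*viol(x)]."""
    target = psi(x) + delta * viol(x)     # positive after Step 3 of Algorithm 2
    for s in range(max_s):
        t = gamma ** s
        if psi(x + t * d) - psi(x) <= -beta * t * t * target:
            return t
    raise RuntimeError("line search failed")

# toy data: psi plays the role of the penalized gap function at a feasible point
psi = lambda x: float(x @ x)
viol = lambda x: 0.0                      # ||c^+(x)||_p vanishes on C
x = np.array([1.0, 0.5])
t = backtrack(psi, viol, x, -x)           # d = -x is a descent direction here
```

Here the full step t=1 already satisfies the acceptance rule, since ψ(0)−ψ(x)=−ψ(x)≤−β ψ(x).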

Theorem 3.3

If (2) holds, then either Algorithm 2 stops at a solution of (EP) after a finite number of iterations or ε is updated at most a finite number of times and Algorithm 2 produces a bounded sequence {x k}, such that any of its cluster points solves (EP) and the sequence of the values {ψ α,ε,p (x k)} converges to 0.

Proof

The parameter ε is updated at most a finite number of times since all the multipliers λ∈Λ_α(x) are uniformly bounded over x∈D (see the proof of Corollary 3.1). Furthermore, arguing as in the proof of Theorem 3.2, it is easy to show that the line search procedure in Step 4 is finite.

If the algorithm stops at x^k after a finite number of iterations, then the stopping criterion guarantees that x^k solves (EP) because of Lemma 2.1.

Now, suppose the algorithm generates an infinite sequence {x^k}: the sequence is bounded since x^k is a convex combination of x^{k−1} and y_α(x^{k−1}), which both belong to D. Consider any cluster point x^* of the sequence. Taking an appropriate subsequence {x^ℓ}, we have x^ℓ→x^*. Without any loss of generality, we can assume that ε is constant throughout the iterations. By contradiction, suppose that x^* does not solve (EP). If x^*∈C, then

$$\psi_{\alpha,\varepsilon,p} \bigl(x^* \bigr)+\delta\big\| c^+ \bigl(x^* \bigr)\big\| _p = \psi_{\alpha,\varepsilon,p} \bigl(x^* \bigr) = \varphi_\alpha \bigl(x^* \bigr) >0. $$

On the other hand, if x^*∈D∖C, then ∥c⁺(x^*)∥_p>0 and ψ_{α,ε,p}(x^*)≥0 by Lemma 2.2(a). Therefore, we have

$$\psi_{\alpha,\varepsilon,p} \bigl(x^* \bigr)+\delta\big\| c^+ \bigl(x^* \bigr)\big\| _p > 0 $$

also in this case. Since the sequence {ψ α,ε,p (x k)} is monotone, decreasing and bounded below, it has a limit. Hence, we also have

$$\lim_{\ell\to\infty} \bigl[ \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell} \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell+1} \bigr) \bigr] = 0. $$

Moreover, the stepsize rule at Step 4 guarantees

$$ \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell } \bigr) - \psi_{\alpha,\varepsilon,p} \bigl(x^{\ell+1} \bigr) \geq\beta\, t_\ell^2 \, \bigl[ \psi_{\alpha,\varepsilon,p} \bigl(x^\ell \bigr) + \delta\big\| c^+ \bigl(x^\ell \bigr)\big\| _p \bigr] > 0. $$
(11)

Since

$$\lim_{\ell\to\infty} \psi_{\alpha,\varepsilon,p} \bigl(x^\ell \bigr) + \delta\big\| c^+ \bigl(x^\ell \bigr)\big\| _p = \psi_{\alpha,\varepsilon,p} \bigl(x^* \bigr) + \delta\big\| c^+ \bigl(x^* \bigr)\big\| _p > 0, $$

then t_ℓ→0 as ℓ→+∞. Arguing as in the proof of Theorem 3.2, we get

$$ \psi_{\alpha,\varepsilon,p}^\circ \bigl(x^*; y_{\alpha} \bigl(x^* \bigr)-x^* \bigr) \geq0 $$
(12)

and 1/ε≥∥(λ^*)⁺∥_q for some λ^*∈Λ_α(x^*). Therefore, Theorem 3.1 guarantees

$$\psi_{\alpha,\varepsilon,p}^\circ \bigl(x^*; y_{\alpha} \bigl(x^* \bigr)-x^* \bigr)<0, $$

which contradicts (12).

Therefore, x^* solves (EP). By Lemma 2.2 we have ψ_{α,ε,p}(x^*)=0, and thus 0 is the limit of the whole sequence {ψ_{α,ε,p}(x^k)}. □

Though Algorithm 2 updates the penalization parameter ε according to a different rule from Algorithm 1, it is still updated at most a finite number of times, thus preventing the possible numerical troubles due to arbitrarily small parameters.

4 Conclusions

In this paper, two globally convergent algorithms for solving equilibrium problems with nonlinear constraints have been developed. They are both based on the minimization of a suitable penalized gap function: at each iteration they solve a convex optimization problem with linear constraints, and the computation of the generalized derivative of the penalized gap function is not needed. The rule for updating the penalization parameter and the line search are the core differences between the two algorithms.

This paper extends to equilibrium problems the ideas given in [30] for variational inequalities only. Moreover, explicit rules to update the penalization parameter ε throughout the iterations are given, while the algorithm in [30] requires the a priori knowledge of a suitable fixed ε. Furthermore, the new algorithms perform Armijo-type inexact line searches instead of the rather theoretical exact line search of [30].

The descent methods given in [13, 21, 23, 25, 26] do not perform any constraint linearization, and hence convex optimization problems with nonlinear constraints have to be solved at each iteration, while the algorithms of this paper avoid this drawback. Moreover, the convergence of the methods proposed in [21, 25, 26] requires the strong ∇-monotonicity of the equilibrium bifunction (see, for instance, [26, condition (15)]), which is a stronger assumption than the strict ∇-monotonicity condition (2) used in this paper.

While the methods proposed in [13, 14, 23] converge under assumptions which are neither stronger nor weaker than condition (2), the behaviour of the regularization and penalization parameters can be compared. In the algorithms of this paper the regularization parameter α is fixed and the penalization parameter ε is updated at most a finite number of times, and thus both are bounded away from zero. Conversely, α and ε change simultaneously and may actually go to zero in [14]; α is the unique parameter and may go to zero in [13]; and α is fixed but ε (which actually plays the role of a further regularization parameter, as no penalization is involved) always goes to zero in [23].