1 Introduction

Optimization is a mathematical technique widely used in engineering and economic applications. In many real-world problems, however, there is no single objective function to be optimized and the concept of equilibrium becomes crucial. Roughly speaking, while optimization deals with the utility function of the whole system, equilibrium takes into account the mutual interactions among users. In recent years, interest in equilibrium problems has grown considerably. The main applications concern traffic over telecommunication networks or public roads, oligopolistic and spatial price markets, financial markets, risk management, climate competition, migration problems, power allocation in radio systems, internet advertising, and cloud computing (see, e.g., Altman and Wynter 2004; Ardagna et al. 2011, 2013; Bigi et al. 2009; Bigi and Passacantando 2012; Dafermos 1980; Drouet et al. 2008; Ferris and Pang 1997; Forgó et al. 2005; Konnov 2007, 2008a, b; Liu and Nagurney 2007; Miller and Ruszczyński 2008; Mordukhovich et al. 2007; Nagurney 1993, 2010; Patriksson 1994; Pang et al. 2010; Scutari et al. 2010; Wardrop 1952; Zhao and Nagurney 2008 and references therein).

All these problems have been formulated in the literature through variational mathematical models such as complementarity problems, variational inequalities, quasi-variational inequalities, and Nash equilibrium problems, among others. Variational inequalities (VIs) are among the best-known variational models. They were introduced by Hartman and Stampacchia (1966) as a tool for studying partial differential equations in infinite dimensional spaces arising from mechanics (free-obstacle problems, friction problems, etc.). Later, their application to contact problems in mechanical structures provided a vast source of finite dimensional problems.

A finite-dimensional VI is defined as follows:

$$\begin{aligned} \text {find }x^*\in C\text { such that }\langle F(x^*),y-x^* \rangle \ge 0,\quad \text { for all }y \in C, \end{aligned}$$
(VI)

where \(F:\mathbb {R}^n\rightarrow \mathbb {R}^n\), C is a closed and convex subset of \(\mathbb {R}^n\) and \(\langle \cdot ,\cdot \rangle \) is the scalar product in \(\mathbb {R}^n\).

Several kinds of numerical methods to solve VIs have been proposed (see, e.g., Facchinei and Pang 2003; Harker and Pang 1990 and references therein). One popular approach is based on the reformulation of (VI) as an optimization problem through suitable merit functions.

A function \(p:\mathbb {R}^n \rightarrow \mathbb {R}\) is called merit function for (VI) if there exists a set \({\varOmega }\subseteq \mathbb {R}^n\) such that:

  • p is nonnegative on \({\varOmega }\),

  • \(x^*\) is a solution to (VI) if and only if \(x^* \in {\varOmega }\) and \(p(x^*)=0\).

If the set \({\varOmega }\) coincides with the feasible set C of (VI), a merit function is also known in the literature as a gap function. Hence, if (VI) has at least one solution, then it is equivalent to the optimization problem

$$\begin{aligned} \min _{x \in {\varOmega }}\ p(x). \end{aligned}$$

Therefore, merit functions are the key concept to build a bridge between VIs and optimization.

In this paper we aim at reviewing the state of the art concerning the merit function approach for VIs and two interesting generalizations of VIs: quasi-variational inequalities and abstract equilibrium problems. This is an updated version of the previous review by Pappalardo et al. (2014).

The rest of the paper is organized as follows: Sect. 2 is devoted to the preliminary concepts that will be used in the paper. Section 3 deals with both constrained and unconstrained optimization reformulations of (VI). In particular, we will describe continuity and differentiability properties of merit functions, conditions under which merit functions are convex or their stationary points solve (VI), and error bound results, i.e., how the distance between an arbitrary point x and the solution set of (VI) can be estimated in terms of the merit function value at x. Furthermore, ad-hoc descent methods for minimizing merit functions will be presented. Sections 4 and 5 are devoted to the results about merit functions for quasi-variational inequalities and abstract equilibrium problems, respectively. Examples of applications of the presented models are provided in Sects. 3, 4 and 5. Some concluding remarks and suggestions for future research are collected in Sect. 6. We hope that this paper may stimulate further interest in merit functions and serve as a basis for new results.

2 Preliminaries

In this section, we show two classic particular cases of (VI), and we recall the main definitions and preliminary results that will be used throughout the paper. We make the blanket assumptions that the feasible set C of (VI) is closed and convex and the operator F is continuous on C.

Optimality conditions As a first particular case, let us consider the problem of finding a local minimum \(x^*\) of a differentiable function \(\psi : \mathbb {R}^n \rightarrow \mathbb {R}\) over the set C. The classic first-order necessary optimality condition states that the directional derivative of \(\psi \) at \(x^*\) along any feasible direction is nonnegative, i.e.

$$\begin{aligned} \langle \nabla \psi (x^*) , y - x^* \rangle \ge 0, \qquad \forall \ y \in C. \end{aligned}$$

This condition is a particular case of (VI) where \(F(x)=\nabla \psi (x)\).

Complementarity problems Another example of (VI) is provided by a complementarity problem described as follows: given a closed convex cone \(C \subseteq \mathbb {R}^n\) and a mapping \(F:\mathbb {R}^n\rightarrow \mathbb {R}^n\), the complementarity problem asks to determine a point \(x^* \in C\) such that

$$\begin{aligned} \langle F(x^*),x^*\rangle = 0 \quad \text { and } \quad F(x^*) \in C^*, \end{aligned}$$

where \(C^*\) denotes the dual cone of C, i.e.

$$\begin{aligned} C^* := \{ d \in \mathbb {R}^n:\ \langle d , y \rangle \ge 0 \text { for all } y \in C \}. \end{aligned}$$

Solving the complementarity problem amounts to solving (VI). In fact, if \(x^*\) solves the complementarity problem, then for any \(y \in C\) we have

$$\begin{aligned} \langle F(x^*), y-x^* \rangle = \langle F(x^*), y \rangle \ge 0, \end{aligned}$$

hence \(x^*\) solves (VI); vice versa, if \(x^*\) solves (VI), then setting \(y=0\) and \(y=2x^*\) (which belong to C because C is a cone) we obtain \(\langle F(x^*),x^*\rangle = 0\) and hence \(F(x^*) \in C^*\), that is \(x^*\) is a solution to the complementarity problem. Note that if we define

$$\begin{aligned} p(x):=\langle F(x),x\rangle , \quad {\varOmega }:=\{x \in C:\ F(x) \in C^*\}, \end{aligned}$$
(1)

then \(p(x) \ge 0\) for any \(x \in {\varOmega }\) and \(x^*\) solves the complementarity problem if and only if \(x^* \in {\varOmega }\) and \(p(x^*)=0\), i.e., p is a merit function for the complementarity problem.
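
This characterization is easy to check numerically. Below is a minimal sketch for a linear complementarity problem on \(C=\mathbb {R}^2_+\) (so that \(C^*=\mathbb {R}^2_+\)); the data M, q are illustrative and chosen so that \(F(x^*)=0\) at \(x^*=(1,1)\).

```python
import numpy as np

# Illustrative linear complementarity problem: C = R^2_+, hence C* = R^2_+.
M = np.array([[2.0, 1.0], [1.0, 2.0]])
q = np.array([-3.0, -3.0])
F = lambda x: M @ x + q

def in_omega(x, tol=1e-9):
    """Membership in Omega = {x in C : F(x) in C*} for C = R^n_+."""
    return bool(np.all(x >= -tol) and np.all(F(x) >= -tol))

def p(x):
    """Merit function (1): p(x) = <F(x), x>."""
    return F(x) @ x

x_star = np.array([1.0, 1.0])   # F(x_star) = 0, so x_star solves the problem
x_bad = np.array([2.0, 1.0])    # belongs to Omega but is not a solution

print(in_omega(x_star), p(x_star))  # True 0.0
print(in_omega(x_bad), p(x_bad))    # True 5.0
```

Note that p is nonnegative only on \({\varOmega }\): outside \({\varOmega }\) (e.g. where some component of F(x) is negative) the inner product can be negative, which is why the restriction to \({\varOmega }\) matters in the definition of a merit function.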

Monotonicity definitions Monotonicity is a key assumption to establish the existence of solutions and the convergence of algorithms, and to provide error bounds for (VI). We now recall the main monotonicity properties that will be exploited in the paper. F is said to be monotone on C if

$$\begin{aligned} \langle F(x)-F(y),x-y\rangle \ge 0, \qquad \forall \ x,y\in C; \end{aligned}$$

the corresponding concept of strict monotonicity is defined analogously by requiring the strict inequality to hold for all \(x,y \in C\) with \(x \ne y\); F is said to be strongly monotone on C with modulus \(\mu \) if

$$\begin{aligned} \langle F(x)-F(y),x-y\rangle \ge \mu \Vert x-y\Vert ^2, \qquad \forall \ x,y\in C, \end{aligned}$$

for some \(\mu >0\); F is said to be pseudomonotone on C if for any \(x,y \in C\) one has

$$\begin{aligned} \langle F(y),x-y \rangle \ge 0 \quad \Longrightarrow \quad \langle F(x),x-y \rangle \ge 0. \end{aligned}$$

In the particular case where \(F(x)=\nabla \psi (x)\), monotonicity and strong monotonicity of F on C are equivalent to convexity and strong convexity of \(\psi \) on C, respectively.
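For affine maps this can be checked directly: \(F(x)=Ax+b\) is monotone on \(\mathbb {R}^n\) exactly when the symmetric part of A is positive semidefinite, and the smallest eigenvalue of the symmetric part is a strong monotonicity modulus. A small numerical sketch with illustrative data:

```python
import numpy as np

# F(x) = A x + b: monotonicity is governed by the symmetric part (A + A^T)/2;
# its smallest eigenvalue is a strong monotonicity modulus (illustrative A).
A = np.array([[2.0, -1.0], [1.0, 1.0]])
mu = np.linalg.eigvalsh((A + A.T) / 2).min()
print(mu)  # 1.0 -> F is strongly monotone with modulus 1

# Empirical check of <F(x)-F(y), x-y> >= mu * ||x-y||^2 (b cancels out).
rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert (A @ (x - y)) @ (x - y) >= mu * np.sum((x - y) ** 2) - 1e-9
```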

Existence results We now recall two basic results concerning the existence of a solution to (VI). For the sake of simplicity, we will not consider the sharpest possible assumptions. The solution set of (VI) is nonempty if either the feasible set C is bounded (Hartman and Stampacchia 1966) or the following coercivity condition holds: there exists a point \(y\in C\) such that

$$\begin{aligned} \lim _{\Vert x\Vert \rightarrow \infty , x \in C} \langle F(x),y-x \rangle = -\infty . \end{aligned}$$
(2)

In particular, condition (2) holds if F is strongly monotone on C. Moreover, the strong monotonicity of F ensures that (VI) has a unique solution.

Fixed point problem reformulation If we denote by \(\pi _C\) the Euclidean projection operator on C, i.e.,

$$\begin{aligned} \pi _C(x) := \arg \min _{y \in C} \Vert y-x\Vert , \end{aligned}$$

then it is well-known that (VI) is equivalent to finding a fixed point of the operator \(x \mapsto \pi _{C}(x-F(x))\) (see, e.g., Facchinei and Pang 2003).
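This fixed-point characterization underlies the classical projection method, which iterates \(x \mapsto \pi _C(x-\gamma F(x))\) for a suitable stepsize \(\gamma >0\). A minimal sketch on the box \(C=[0,1]^2\) with an illustrative strongly monotone map (for \(F(x)=x-a\) the solution is simply \(\pi _C(a)\)):

```python
import numpy as np

# Projection method sketch on the box C = [0,1]^2 (illustrative F and gamma).
proj = lambda z: np.clip(z, 0.0, 1.0)   # Euclidean projection onto the box
a = np.array([2.0, -1.0])
F = lambda x: x - a                      # strongly monotone; solution pi_C(a)

x = np.zeros(2)
for _ in range(100):
    x = proj(x - 0.5 * F(x))             # x <- pi_C(x - gamma F(x))

print(x)                                 # [1. 0.]
assert np.allclose(x, proj(a))           # fixed point = solution of (VI)
```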

Complementarity problem reformulation Assuming

$$\begin{aligned} C := \{ x \in \mathbb {R}^n:\ g_i(x) \le 0, \quad i=1,\dots ,m\}, \end{aligned}$$
(3)

where the functions \(g_i\) are differentiable and convex for all \(i=1,\dots ,m\), it is possible to derive the Karush–Kuhn–Tucker (KKT) conditions for (VI). In fact, \(x^*\) is a solution to (VI) if and only if it is a global minimum of the following convex optimization problem:

$$\begin{aligned} \min _{y \in C}\ \langle F(x^*), y \rangle . \end{aligned}$$

Under some constraint qualification, the following KKT conditions

$$\begin{aligned} \left\{ \begin{array}{ll} F(x^*) + \sum \limits _{i=1}^m \lambda ^*_i \nabla g_i(x^*) = 0, &{} \\ \lambda ^*_i \, g_i(x^*) = 0, &{} i=1,\dots ,m, \\ \lambda ^*_i \ge 0, \quad g_i(x^*) \le 0, &{} i=1,\dots ,m, \end{array} \right. \end{aligned}$$
(4)

are necessary and sufficient for optimality and, in turn, for the existence of solutions to (VI). It is well-known that KKT system (4) is equivalent to a complementarity problem.
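As a small numerical illustration of system (4), take the hypothetical data \(F(x)=x-a\) with the single convex constraint \(g(x)=x_1+x_2-1\le 0\); the candidate point and multiplier below satisfy the KKT conditions exactly.

```python
import numpy as np

# KKT check for an illustrative VI: F(x) = x - a, C = {x : x1 + x2 <= 1}.
a = np.array([2.0, 3.0])
F = lambda x: x - a
g = lambda x: x[0] + x[1] - 1.0          # single convex (affine) constraint
grad_g = np.array([1.0, 1.0])

x_star = np.array([0.0, 1.0])            # candidate solution (= pi_C(a))
lam = 2.0                                 # candidate multiplier

stationarity = F(x_star) + lam * grad_g   # first equation of (4)
print(stationarity, lam * g(x_star))      # [0. 0.] 0.0
assert np.allclose(stationarity, 0.0)
assert lam >= 0 and g(x_star) <= 0 and lam * g(x_star) == 0
```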

3 Merit functions for variational inequalities

In this section, we summarize several approaches in order to express (VI) as a constrained or unconstrained optimization problem by means of different merit functions.

A first merit function can be defined by exploiting the fixed point reformulation stated in Sect. 2. In fact, the function \(x \mapsto \Vert x-\pi _{C}(x-F(x)) \Vert \) is a gap function for (VI) (see Eaves 1971).

A further merit function for (VI) can be obtained by means of the complementarity reformulation (4), which in turn can be associated with the merit function (1). Further examples of merit functions associated with a complementarity problem can be found in Fischer and Jiang (2000).

3.1 Constrained optimization reformulations

This section is devoted to gap functions for (VI).

3.1.1 Auslender gap function

A first example of gap function was given in Auslender (1976), where the following function was introduced:

$$\begin{aligned} p(x) := \sup _{y\in C}\ \langle F(x),x-y \rangle . \end{aligned}$$
(5)

It is straightforward to prove that p is a gap function for (VI). Since the supremum in (5) can be infinite or not attained at a unique point, this function is, in general, neither finite nor differentiable nor convex. However, when C is bounded and F is continuously differentiable, it is finite and admits directional derivatives \(p'(x;d)\) at any point \(x \in C\) in any direction d. Moreover, if F is monotone, any stationary point \(x^*\) of p on C, i.e.

$$\begin{aligned} p'(x^*;y-x^*) \ge 0, \qquad \forall \ y\in C, \end{aligned}$$

is a solution to (VI). In the particular case of monotone affine VIs, i.e., if \(F(x)=Ax+b\) where A is a positive semidefinite matrix, p also turns out to be convex (Marcotte 1985). If F is strongly monotone with modulus \(\mu \) and \(x^*\) is the unique solution to (VI), then p provides the following error bound:

$$\begin{aligned} \Vert x-x^*\Vert \le \sqrt{p(x)/\mu }, \qquad \forall \ x \in C. \end{aligned}$$

A descent method based on the function p has been proposed in Marcotte and Dussault (1989) in the case where C is a bounded polyhedron. At each iteration, the descent direction is obtained by minimizing a linearization of a further gap function. If F is monotone, then the algorithm is globally convergent to a solution to (VI). Moreover, the convergence is quadratic if F is strongly monotone and termination is achieved in a finite number of iterations if F is affine. Under a so-called “geometric stability condition”, it is shown that p also provides an error bound for (VI).
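When C is a polyhedron, evaluating (5) reduces to a linear program, since \(p(x)=\langle F(x),x\rangle -\min _{y\in C}\langle F(x),y\rangle \). A sketch with illustrative data, where \(F(x)=x-a\) is strongly monotone with modulus \(\mu =1\), so the error bound above can be tested directly:

```python
import numpy as np
from scipy.optimize import linprog

# Auslender gap function on the polytope C = {y >= 0, y1 + y2 <= 1}.
a = np.array([2.0, 3.0])
F = lambda x: x - a                      # strongly monotone, modulus mu = 1
A_ub, b_ub = np.array([[1.0, 1.0]]), np.array([1.0])

def p(x):
    # p(x) = <F(x), x> - min_{y in C} <F(x), y>, via a linear program
    res = linprog(c=F(x), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    return F(x) @ x - res.fun

x_star = np.array([0.0, 1.0])            # solution of this VI
x = np.array([0.5, 0.25])                # an arbitrary feasible point
print(p(x_star))                         # 0.0: the gap vanishes at the solution
print(np.linalg.norm(x - x_star) <= np.sqrt(p(x)))  # error bound (mu=1): True
```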

3.1.2 Regularized gap functions

Much research effort has been directed to the study of differentiable gap functions in order to simplify the computational aspects of the problem. Important results in this direction have been obtained in Fukushima (1992), Larsson and Patriksson (1994), Wu et al. (1993), Zhu and Marcotte (1994).

First, Auchmuty (1989) proposed a scheme in order to define a general class of gap functions:

$$\begin{aligned} p^A(x) := \sup _{y\in C}\ \left[ \langle F(x), x-y \rangle + f(x) - f(y) -\langle \nabla f(x),x-y\rangle \right] , \end{aligned}$$
(6)

where \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) is convex and continuously differentiable. It was proved that if \((x^*,y^*)\) is a saddle point of the function

$$\begin{aligned} L(x,y) := \langle F(x), x-y \rangle + f(x) - f(y) -\langle \nabla f(x),x-y\rangle \end{aligned}$$

on \(C\times C\), i.e.,

$$\begin{aligned} L(x^*,y) \le L(x^*,y^*) \le L(x,y^*),\quad \forall \ (x,y)\in C\times C, \end{aligned}$$

then \(x^*\) is a solution to (VI) and \(p^A\) is a gap function. We observe that if f is strongly convex on C and F is differentiable, then the function \(p^A\) is finite and differentiable.

Later, Fukushima (1992) introduced a gap function which is a special case of (6), obtained by setting \(f(x)= \langle x,Mx\rangle /2\), where M is a symmetric and positive definite matrix. It is defined by

$$\begin{aligned} p^F(x):=\max _{y\in C}\ \left[ \langle F(x),x-y\rangle - \frac{1}{2} \, \langle x-y,M\,(x-y) \rangle \right] . \end{aligned}$$
(7)

Note that the maximum in (7) is always attained at a unique point y(x), since the objective function is strongly concave with respect to the variable y; hence \(p^F\) is always finite. If F is continuously differentiable, then so is \(p^F\) (Danskin 1966) and

$$\begin{aligned} \nabla p^F(x) = F(x) -[(\nabla F(x))^T-M](y(x)-x). \end{aligned}$$

Moreover, if \(x^*\) is a stationary point of \(p^F\) on C, i.e.

$$\begin{aligned} \langle \nabla p^F(x^*),y-x^*\rangle \ge 0, \qquad \forall \ y\in C, \end{aligned}$$

and the Jacobian matrix \(\nabla F(x^*)\) is positive definite, then \(x^*\) is a solution to (VI). In the special case of strongly monotone affine VIs, i.e., \(F(x)=Ax+b\) with A positive definite, \(p^F\) turns out to be convex (strongly convex) provided that the matrix \(A+A^T-M\) is positive semidefinite (positive definite) (see Larsson and Patriksson 1994).

A descent algorithm for minimizing \(p^F\) has been proposed in Fukushima (1992): given any starting point \(x^0 \in C\), the sequence \(\{x^k\}\) is generated by the iterations

$$\begin{aligned} x^{k+1}=x^k + t_k d^k, \end{aligned}$$
(8)

where the search direction is \(d^k = y(x^k)-x^k\) and the stepsize \(t_k\in (0,1]\) is such that

$$\begin{aligned} p^F(x^k+t_kd^k) = \min _{t \in (0,1]} p^F(x^k+td^k). \end{aligned}$$
(9)

Under the assumptions that C is bounded, F is continuously differentiable on C and \(\nabla F(x)\) is positive definite for all \(x\in C\), the sequence \(\{x^k\}\) belongs to C and converges to the unique solution to (VI). The algorithm also converges with an inexact line search rule, provided that F is strongly monotone on C and \(\nabla F\) is Lipschitz continuous on C.
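A compact sketch of iterations (8)–(9) with \(M=I\), for which the maximizer in (7) has the closed form \(y(x)=\pi _C(x-F(x))\); the exact line search (9) is approximated here on a finite grid of stepsizes, and the data (an affine strongly monotone F on the box \(C=[0,1]^2\)) are illustrative:

```python
import numpy as np

# Descent scheme (8)-(9) for the regularized gap function (7) with M = I on
# the box C = [0,1]^2; illustrative affine strongly monotone data, and the
# exact line search over (0,1] approximated on a geometric grid of stepsizes.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-2.0, -1.0])
F = lambda x: A @ x + b
proj = lambda z: np.clip(z, 0.0, 1.0)

y_of = lambda x: proj(x - F(x))          # unique maximizer in (7) for M = I

def pF(x):
    y = y_of(x)
    return F(x) @ (x - y) - 0.5 * (x - y) @ (x - y)

x = np.zeros(2)
steps = np.geomspace(1e-3, 1.0, 50)
for _ in range(300):
    d = y_of(x) - x                      # descent direction d^k
    t = min(steps, key=lambda s: pF(x + s * d))
    x = x + t * d

x_star = np.linalg.solve(A, -b)          # interior point with F(x_star) = 0
print(np.round(x, 4))                    # the iterates approach x_star
```

Since \(t_k\in (0,1]\) and \(y(x^k)\in C\), each iterate is a convex combination of points of C and the sequence stays feasible, as stated above.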

A variant of the above method which does not require the strong monotonicity of F has been proposed in Zhu and Marcotte (1993), setting the matrix \(M=\alpha I\), where I is the identity matrix and \(\alpha >0\). In fact, the monotonicity of F paired with the boundedness of C guarantees that at any point \(x \in C\) the vector \(y(x)-x\), which depends on the matrix M and hence depends on \(\alpha \), is a descent direction for \(p^F\), provided that \(\alpha \) is small enough. The method performs a line search at the current iterate \(x^k\) if \(y(x^k)-x^k\) is a descent direction for \(p^F\); otherwise the value of \(\alpha \) is decreased. The method is globally convergent if C is bounded and F is continuously differentiable and monotone on C. This algorithm has been extended to nonsmooth VIs in Panicucci et al. (2009).

Marcotte and Dussault (1987) and Taji et al. (1993) proposed a modified Newton method: at each iteration it finds the solution to the linearized (VI) at x, i.e.,

$$\begin{aligned} \text {find }z(x) \in C\text { s.t. } \langle F(x)+\nabla F(x)[z(x)-x], y-z(x) \rangle \ge 0, \qquad \forall \ y \in C. \end{aligned}$$
(10)

Under the assumption that F is strongly monotone, problem (10) admits a unique solution z(x) such that \(d=z(x)-x\) is a descent direction for the gap functions \(p^A\) and \(p^F\): the function \(p^A\) has been considered in Marcotte and Dussault (1987), while \(p^F\) in Taji et al. (1993). By employing a line search strategy, the method is shown to be quadratically convergent under suitable additional assumptions.

When the feasible set C, defined as in (3), is not a polyhedron, the evaluation of the regularized gap function \(p^F\) at a given point x could be computationally expensive. In order to overcome this drawback, the following gap function has been proposed in Taji and Fukushima (1996):

$$\begin{aligned} p^{TF}(x) := \max _{y\in T(x)} \left[ \langle F(x),x-y\rangle - \frac{1}{2}\langle x-y,M(x-y)\rangle \right] , \end{aligned}$$
(11)

where

$$\begin{aligned} T(x) := \{ y\in \mathbb {R}^n:\ g_i(x)+\langle \nabla g_i(x),y-x\rangle \le 0, \quad i=1,\ldots , m\} \end{aligned}$$
(12)

is an outer polyhedral approximation of C at x. If F is continuously differentiable, \(g_i\)’s are continuously differentiable and a constraint qualification holds, then \(p^{TF}\) is directionally differentiable. Furthermore, if \(\nabla F\) is positive definite and \(g_i\)’s are twice continuously differentiable, then any stationary point of \(p^{TF}\) on C is a solution to (VI). A successive quadratic programming algorithm based on the minimization of an exact penalty function associated with \(p^{TF}\) has been proposed in Taji and Fukushima (1996).

A generalization of the gap function introduced by Fukushima has been proposed in Wu et al. (1993), Zhu and Marcotte (1994) by replacing in (7) the regularizing term \(\langle x-y,M(x-y)\rangle /2\) with a general bifunction \(G:\mathbb {R}^n\times \mathbb {R}^n \rightarrow \mathbb {R}\) such that:

$$\begin{aligned} \begin{array}{l} G(x,y) \ge 0 \text { for all } (x,y) \in \mathbb {R}^n\times \mathbb {R}^n, \\ G\text { is continuously differentiable}, \\ G(x,\cdot )\text { is strongly convex on }C\text { for all }x\in C, \\ G(x,x)=\nabla _y G(x,x)=0\text { for all }x\in C. \end{array} \end{aligned}$$
(13)

For any \(\alpha >0\), the function

$$\begin{aligned} p_{\alpha }(x) := \max _{y\in C} \left[ \langle F(x),x-y\rangle - \alpha \, G(x,y) \right] \end{aligned}$$
(14)

turns out to be a gap function; it is continuously differentiable whenever F is, with

$$\begin{aligned} \nabla p_{\alpha }(x) = F(x) -(\nabla F(x))^T(y_\alpha (x)-x)-\alpha \nabla _x G(x,y_\alpha (x)), \end{aligned}$$
(15)

where \(y_\alpha (x)\) is the unique solution to problem (14). Note that when F is only locally Lipschitz continuous, the function \(p_{\alpha }\) is also locally Lipschitz and its Clarke generalized gradient satisfies a formula similar to (15) (see Ng and Tan 2007b). Moreover, if \(x^*\) is a stationary point of \(p_{\alpha }\) on C and \(\nabla F(x^*)\) is positive definite, then \(x^*\) solves (VI). A further important feature of \(p_{\alpha }\) is that, under the assumption that F is strongly monotone and \(\nabla _y G(x,\cdot )\) is Lipschitz continuous on C for every \(x\in C\), it provides an error bound (see Zhu and Marcotte 1994), i.e., there exists \(M>0\) such that

$$\begin{aligned} \Vert x-x^*\Vert \le M \sqrt{p_{\alpha }(x)}, \qquad \forall \ x \in C, \end{aligned}$$
(16)

where \(x^*\) is the unique solution to (VI). In particular, (16) implies the boundedness of the sublevel sets of \(p_{\alpha }\), which is of crucial importance in the convergence of the minimization algorithms.

The descent methods developed in Fukushima (1992) have been extended to a more general framework exploiting several classes of gap functions defined by (14) (see Zhu and Marcotte 1994). Given a continuous mapping \({\varGamma }:C\times C\rightarrow \mathbb {R}^n\) such that \({\varGamma }(x,\cdot )\) is strongly monotone on C for any \(x\in C\), the following auxiliary variational inequality is considered at a given point \(x \in C\): find \(y^* \in C\) such that

$$\begin{aligned} \langle F(x)+{\varGamma }(x,y^*),y-y^* \rangle \ge 0, \qquad \forall \ y \in C. \end{aligned}$$
(AVI(x))

Having denoted by w(x) the unique solution to (AVI(x)), one can prove that the mapping \(w:C\rightarrow C\) is continuous and \(x^*\) is a solution to (VI) if and only if \(x^*=w(x^*)\). In view of this result, the following iterative method is proposed. Given \(x^k \in C\), compute \(w(x^k)\): if \(w(x^k)=x^k\), then \(x^k\) is a solution to (VI); otherwise, find \(x^{k+1}\) by performing an Armijo inexact line search for the gap function \(p_{\alpha }\) along the direction \(d^k:=w(x^k)-x^k\).

Each combination of G and \({\varGamma }\) generates a different descent algorithm, which is globally convergent to the unique solution to (VI) under the assumption of continuous differentiability and strong monotonicity of F and suitable additional assumptions on G and \({\varGamma }\) (Zhu and Marcotte 1994). Note that the algorithm (8)–(9) previously described is recovered by setting \({\varGamma }(x,y)=M(y-x)\).

The analysis of the convergence properties of the descent methods based on the regularized gap functions \(p^F\) and \(p_{\alpha }\) in the case where the operator F is nondifferentiable has been considered in Ng and Tan (2007a) and Tan (2007).

3.1.3 Minty (dual) gap functions

The Minty (or dual) variational inequality was introduced in Minty (1967) and consists of finding \(x^*\in C\) such that

$$\begin{aligned} \langle F(y),y-x^* \rangle \ge 0, \qquad \forall \ y\in C. \end{aligned}$$
(MVI)

Its relevance to applications was pointed out in Giannessi (1998), Pappalardo and Passacantando (2002, 2004). In particular, (VI) and (MVI) are equivalent when F is pseudomonotone on C (Minty 1967).

In parallel with the Auslender gap function, it can be shown that

$$\begin{aligned} p^M(x):= \sup _{y\in C}\ \langle F(y),x-y\rangle \end{aligned}$$
(17)

is a gap function for (MVI) and hence it is a gap function for (VI) provided that F is pseudomonotone on C.

The most important feature of this function, known in the literature as Minty (or dual) gap function, is its convexity. However, it is in general nondifferentiable; subdifferentiability and related properties have been analysed in Marcotte and Dussault (1987), Marcotte and Zhu (1999), Nguyen and Dupuis (1984), Yamashita and Fukushima (1997), Zhang et al. (2003). Furthermore, it can be difficult to evaluate \(p^M\) since the optimization problem in (17) is generally not convex.
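For a monotone affine map, however, the inner problem in (17) is concave in y, so \(p^M\) can be evaluated reliably by standard solvers. A sketch with illustrative data on the box \(C=[0,1]^2\), for \(F(y)=y-a\) with a interior (hence the solution of (VI) is \(x^*=a\)), which also checks the convexity of \(p^M\) at a midpoint:

```python
import numpy as np
from scipy.optimize import minimize

# Minty gap function (17) on C = [0,1]^2 for F(y) = y - a (illustrative data);
# the inner maximization is concave, so a bounded quasi-Newton solver suffices.
a = np.array([0.8, 0.3])
F = lambda y: y - a

def pM(x):
    obj = lambda y: -(F(y) @ (x - y))              # maximize <F(y), x - y>
    res = minimize(obj, x0=np.full(2, 0.5), bounds=[(0, 1)] * 2,
                   method="L-BFGS-B")
    return -res.fun

print(abs(pM(a)) < 1e-5)                           # True: gap vanishes at x* = a
x1, x2 = np.array([0.1, 0.9]), np.array([1.0, 0.0])
mid = 0.5 * (x1 + x2)
print(pM(mid) <= 0.5 * (pM(x1) + pM(x2)))          # True: midpoint convexity
```

For this separable choice of F one can even check the solver against the closed form \(p^M(x)=\sum _i (x_i-a_i)^2/4\), valid whenever the inner maximizer stays in the box.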

A cutting plane method for minimizing \(p^M\) has been proposed in Nguyen and Dupuis (1984): at each iteration it solves a linear programming problem, provided that C is a polyhedron, and it converges to a solution to (VI) if F is strictly monotone. Later, this method has been combined with the Tikhonov regularization technique in order to deal with monotone VIs (Bigi and Panicucci 2010).

Following the scheme described before for (VI), it is possible to regularize the function \(p^M\) exploiting a bifunction G which satisfies conditions (13). In fact, the function

$$\begin{aligned} p^M_G(x) := \sup _{y\in C}\ [\langle F(y),x-y \rangle - G(x,y)] \end{aligned}$$
(18)

is a gap function for (MVI) (see Mastroeni 1999). Moreover, if the optimization problem in (18) has a unique solution y(x), then \(p^M_G\) is continuously differentiable and its gradient is given by

$$\begin{aligned} \nabla p^M_G(x)= F(y(x))-\nabla _x G(x,y(x)). \end{aligned}$$

In parallel with the analysis developed for (VI), a descent method for the function \(p^M_G\) has been proposed in Mastroeni (1999). Given any starting point \(x^0\in C\), any sequence \(\{x^k\}\) generated by an exact line search algorithm with descent direction given by \(y(x)-x\) converges to the unique solution to (VI), provided that C is compact, F is continuously differentiable, \(\nabla F\) is positive definite on C and

$$\begin{aligned} \nabla _x G(x,y) + \nabla _y G(x,y)=0, \qquad \forall \ x,y\in C. \end{aligned}$$

Observe that the latter condition is fulfilled, for instance, by

$$\begin{aligned} G(x,y)= \frac{1}{2}\,\langle x-y,M(x-y)\rangle . \end{aligned}$$

This algorithm has been extended in Mastroeni (2005) employing an inexact line search rule and replacing the assumption that \(\nabla F(x)\) is positive definite for \(x\in C\) with the strong monotonicity of F on C.

A different regularization of the gap function \(p^M\) has been proposed in Yamashita and Fukushima (1997), where the following function is considered:

$$\begin{aligned} p^M_{\beta }(x) := \sup _{y\in C}\ [\langle F(y),x-y \rangle +\beta \Vert x-y\Vert ^2], \end{aligned}$$
(19)

where \(\beta \) is a positive parameter. This function is convex and lower semicontinuous, like the original \(p^M\). It is continuously differentiable provided that F is so and the supremum in (19) is attained at a unique point. Moreover, if F is strongly monotone on C with modulus \(\mu \) and \(\beta \in (0,\mu ]\), then it is a gap function for (VI) and provides an error bound, i.e.,

$$\begin{aligned} \Vert x-x^*\Vert \le \sqrt{p^M_{\beta }(x)/\beta }, \qquad \forall \ x \in C, \end{aligned}$$
(20)

where \(x^*\) is the unique solution to (VI).

3.1.4 Gap functions based on conjugate duality

Given a convex function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) and a concave function \(g:\mathbb {R}^n\rightarrow \mathbb {R}\), we recall that

$$\begin{aligned} f^*(y):=\sup _{x\in \mathbb {R}^n } [\langle y,x\rangle -f(x)], \qquad g_*(y):=\inf _{x\in \mathbb {R}^n } [\langle y,x\rangle -g(x)] \end{aligned}$$

are the Fenchel conjugate, in convex and concave sense respectively, of f and g (see, e.g., Rockafellar 1970). Moreover, the Fenchel dual of the problem

$$\begin{aligned} \inf _{x\in \mathbb {R}^n} [f(x)-g(x)] \end{aligned}$$

is defined as

$$\begin{aligned} \sup _{y \in \mathbb {R}^n} [g_*(y)-f^*(y)]. \end{aligned}$$

Note that the Fenchel dual of the constrained problem

$$\begin{aligned} \inf _{x\in C} f(x) \end{aligned}$$

can be obtained defining \(g(x)=-\delta _C(x)\), where \(\delta _C\) is the indicator function of the set C, so that

$$\begin{aligned} \inf _{x\in C} f(x)=\inf _{x\in \mathbb {R}^n}[f(x)-(-\delta _C(x))], \end{aligned}$$

and the associated Fenchel dual turns out to be

$$\begin{aligned} \sup _{y\in \mathbb {R}^n} [-f^*(y)+\inf _{x\in C}\langle y,x\rangle ]. \end{aligned}$$

When the feasible set C is explicitly defined by convex constraints as in (3), the value of the Auslender gap function p at a given point x coincides (see Altangerel et al. 2007; Larsson and Patriksson 1994) with the opposite of the optimal value of the Fenchel dual of the problem

$$\begin{aligned} \inf _{y\in C}\ \langle F(x),y-x \rangle . \end{aligned}$$
(P(x))

Moreover, the opposite of the optimal value of the so-called Lagrangian dual and of the Fenchel–Lagrange dual associated with P(x) leads to a further gap function, which coincides with

$$\begin{aligned} p^L(x):= \inf _{\lambda \ge 0}\sup _{y\in \mathbb {R}^n}\ [\langle F(x),x-y\rangle -\langle \lambda ,g(y)\rangle ], \end{aligned}$$
(21)

which has been proposed in Giannessi (1995).

Similarly, considering the opposite of the optimal values of the Lagrange and of the Fenchel dual associated with the problem

$$\begin{aligned} \inf _{y\in C} \langle F(y),y-x\rangle , \end{aligned}$$

which is equivalent to the one which appears in the right-hand side of (17), the following gap functions for (MVI) are defined (Altangerel et al. 2007):

$$\begin{aligned}&p^{ML}(x) := \inf _{\lambda \ge 0}\sup _{y\in \mathbb {R}^n}\ \left[ \langle F(y),x-y \rangle -\langle \lambda ,g(y) \rangle \right] , \end{aligned}$$
(22)
$$\begin{aligned}&p^{MF}(x) := \inf _{p\in \mathbb {R}^n} \left\{ \sup _{y \in \mathbb {R}^n} [\langle F(y),x-y\rangle +\langle p,y\rangle ]+\delta ^*_C(-p)\right\} , \end{aligned}$$
(23)

where \(\delta _C^*(x)=\sup _{y\in C} \langle x,y\rangle \) is the support function to the set C.

The proposed gap functions are all convex if F is an affine monotone map.

3.2 Unconstrained optimization reformulations

In this section, we review merit functions which allow reformulating (VI) as an unconstrained optimization problem.

3.2.1 D-gap functions

The difference of two regularized gap functions

$$\begin{aligned} p_{\alpha \beta }(x) := p_\alpha (x) - p_\beta (x), \end{aligned}$$
(24)

where \(p_\alpha \) and \(p_\beta \) are defined by (14) with \(0 < \alpha < \beta \), is called D-gap function (where D stands for “difference”). This function is nonnegative on the whole space \(\mathbb {R}^n\) and \(p_{\alpha \beta } (x^*)=0\) if and only if \(x^*\) is a solution to (VI) (Yamashita et al. 1997). Therefore, solving (VI) is equivalent to finding the optimal solutions of the problem

$$\begin{aligned} \min _{x \in \mathbb {R}^n}\ p_{\alpha \beta } (x). \end{aligned}$$
(25)

When (VI) is a nonlinear complementarity problem, the D-gap function with \(\beta =1/\alpha \) and \(\alpha \in (0,1)\) coincides with the implicit Lagrangian proposed and studied in Mangasarian and Solodov (1993), Peng (1997) and Peng and Yuan (1997).

Clearly, the D-gap function inherits the differentiability properties of \(p_{\alpha }\) and \(p_{\beta }\), i.e., if F is differentiable, the function \(p_{\alpha \beta }\) is also differentiable and

$$\begin{aligned} \nabla p_{\alpha \beta } (x) = (\nabla F(x))^T [y_\beta (x)-y_\alpha (x)] + \beta \nabla _x G(x,y_\beta (x)) - \alpha \nabla _x G(x,y_\alpha (x)), \end{aligned}$$
(26)

where \(y_\alpha (x)\) and \(y_\beta (x)\) are the solutions of the optimization problem (14) with \(\alpha \) and \(\beta \) respectively. When the mapping F is locally Lipschitz continuous, the D-gap function is also locally Lipschitz and the Clarke generalized gradient of \(p_{\alpha \beta }\) satisfies a formula similar to (26) (see Ng and Tan 2007a).

The D-gap function is not convex in general and the stationary points of (25) may not be global minima. However, if \(x^*\) is a stationary point, i.e., \(\nabla p_{\alpha \beta } (x^*) = 0\), and the Jacobian matrix \(\nabla F(x^*)\) is positive definite, then \(x^*\) is a solution to (VI) (see Yamashita et al. 1997). Notice that the positive definiteness of \(\nabla F(x^*)\) cannot be replaced by the strict monotonicity assumption on F [see the counterexample in Yamashita et al. (1997)]. When the feasible set C is a box, it is sufficient to assume that \(\nabla F(x^*)\) is a P-matrix (i.e., its principal minors are all positive) to obtain the same conclusion (Kanzow and Fukushima 1998b). In the special case of strongly monotone affine VIs, the D-gap function is convex (strongly convex) provided that the parameters \(\alpha \) and \(\beta \) are chosen so that the matrix

$$\begin{aligned} A+A^T-\alpha I - \beta ^{-1}A^T\,A \end{aligned}$$
(27)

is positive semidefinite (positive definite) (Peng and Fukushima 1999).
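Condition (27) is easy to verify numerically for given A, \(\alpha \) and \(\beta \); a small sketch with an illustrative positive definite matrix:

```python
import numpy as np

# Checking condition (27) for the convexity of the D-gap function of an
# affine VI (illustrative strongly monotone A; recall 0 < alpha < beta).
A = np.array([[2.0, 0.0], [0.0, 3.0]])

def min_eig_27(alpha, beta):
    M = A + A.T - alpha * np.eye(2) - (A.T @ A) / beta
    return np.linalg.eigvalsh(M).min()

print(min_eig_27(0.5, 10.0) > 0)  # True: p_ab is strongly convex here
print(min_eig_27(0.5, 1.0) > 0)   # False: (27) fails when beta is too small
```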

Since D-gap functions reformulate (VI) as an unconstrained problem, the boundedness of their level sets is an important issue for developing minimization algorithms for problem (25). The level sets of the D-gap function, denoted by

$$\begin{aligned} L(c) := \{ x \in \mathbb {R}^n:\ p_{\alpha \beta }(x) \le c \}, \end{aligned}$$

are bounded for all \(c \ge 0\) if either C is bounded (Kanzow and Fukushima 1998b) or F is strongly monotone (Peng and Fukushima 1999; Qu et al. 2003). Recently, it has been proved in Li and Ng (2009) that the strong monotonicity assumption on F can be replaced by a coercivity condition stronger than (2).

If F is strongly monotone on \(\mathbb {R}^n\) and either F is Lipschitz continuous on \(\mathbb {R}^n\) or C is bounded, then \(\sqrt{p_{\alpha \beta }}\) provides an error bound for (VI), i.e., there exists a constant \(M>0\) such that

$$\begin{aligned} \Vert x-x^* \Vert \le M\,\sqrt{p_{\alpha \beta }(x)}, \qquad \forall \ x \in \mathbb {R}^n, \end{aligned}$$

(see Yamashita et al. 1997). Notice that when C is unbounded, the strong monotonicity on F, without the Lipschitz continuity assumption, is not sufficient to guarantee the same result [see the counterexample in Huang and Ng (2005)]. However, when F is strongly monotone, (VI) has a unique solution and hence it is possible to reformulate the problem by replacing the set C by its intersection with a sphere large enough to contain the solution. In the special case where C is a box, the strong monotonicity of F can be replaced by the assumption that F is a uniform P-function (Kanzow and Fukushima 1998b). Recently, new global error bounds have been proposed in Li et al. (2010).

On the other hand, the strong monotonicity on F only guarantees a local error bound on the level sets of the D-gap function (Qu et al. 2003), that is for any \(c \ge 0\) there exists \(M>0\) such that

$$\begin{aligned} \Vert x-x^* \Vert \le M\,\sqrt{p_{\alpha \beta }(x)}, \qquad \forall \ x \in L(c). \end{aligned}$$

Recently, this result has been extended to locally \(\xi \)-monotone and coercive mappings (Li et al. 2010) and to general nonmonotone mappings (Li and Ng 2009).

There are several solution methods for VIs based on the minimization of D-gap functions. A descent method with Armijo-type line search has been proposed in Yamashita et al. (1997): at each iteration it exploits the search direction \(d=r(x)+\rho \,s(x)\), where \(r(x)=y_\alpha (x)-y_\beta (x)\), \(s(x)=\alpha \nabla _x G(x,y_\alpha (x))-\beta \nabla _x G(x,y_\beta (x))\) and \(\rho >0\) is a sufficiently small constant. This method converges to the solution to (VI) if F is strongly monotone on \(\mathbb {R}^n\) and either F is Lipschitz continuous on \(\mathbb {R}^n\) or C is bounded. Another descent method was developed in Solodov and Tseng (2000) for solving monotone VIs with bounded feasible set. It is similar to the method proposed in Zhu and Marcotte (1993) based on Fukushima’s regularized gap function: at each iteration it uses \(d=y_\alpha (x)-y_\beta (x)\) as search direction along with a suitable update of the parameters \(\alpha \) and \(\beta \). A descent method for solving nonmonotone VIs, which is based on the minimization of the function \(\sqrt{p_{\alpha \beta }}\), has been presented recently in Li and Ng (2009).
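A minimal sketch of such a descent scheme, on an invented box-constrained affine VI with \(G(x,y)=\Vert x-y\Vert ^2/2\), \(\alpha =1\), \(\beta =2\); the steepest-descent fallback inside the loop is a practical safeguard added here, not part of the original method of Yamashita et al.:

```python
import numpy as np

# Invented test problem: F(x) = A x + b on the box C = [0, 1]^2;
# F is strongly monotone and the unique solution is x* = (0.6, 0.2).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([-2.0, -1.0])
F = lambda x: A @ x + b
lo, hi = np.zeros(2), np.ones(2)

def y_reg(x, a):
    return np.clip(x - F(x) / a, lo, hi)   # maximizer of the regularized gap

def p_reg(x, a):
    y = y_reg(x, a)
    return F(x) @ (x - y) - 0.5 * a * (x - y) @ (x - y)

def p_dgap(x):
    return p_reg(x, 1.0) - p_reg(x, 2.0)

def grad_dgap(x):
    ya, yb = y_reg(x, 1.0), y_reg(x, 2.0)
    return A.T @ (yb - ya) + 2.0 * (x - yb) - 1.0 * (x - ya)

def armijo(x, d, sigma=1e-4):
    """Backtracking line search on the D-gap function."""
    t = 1.0
    while p_dgap(x + t * d) > p_dgap(x) - sigma * t * (d @ d) and t > 1e-10:
        t *= 0.5
    return t

x, rho = np.zeros(2), 0.1
for _ in range(5000):
    if p_dgap(x) < 1e-14:
        break
    ya, yb = y_reg(x, 1.0), y_reg(x, 2.0)
    d = (ya - yb) + rho * (1.0 * (x - ya) - 2.0 * (x - yb))  # d = r(x) + rho*s(x)
    t = armijo(x, d)
    if t <= 1e-10:                 # safeguard: fall back to steepest descent
        d = -grad_dgap(x)
        t = armijo(x, d)
    x = x + t * d
```

On this toy problem the iterates settle at \(x^*=(0.6,0.2)\), where the D-gap value is (numerically) zero.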

A hybrid Newton method has been proposed in Peng and Fukushima (1999): at each iteration, it finds the solution z(x) to the linearized VI (10) at x and it uses the direction \(d=z(x)-x\) whenever it provides a sufficient decrease in the D-gap function \(p_{\alpha \beta }\); otherwise the direction \(d=-\nabla p_{\alpha \beta }(x)\) is used. Then, an inexact line search is performed to get the next iterate. The generated sequence converges superlinearly to the unique solution \(x^*\) to (VI) if F is continuously differentiable and strongly monotone on \(\mathbb {R}^n\). Furthermore, the convergence is quadratic if \(\nabla F\) is Lipschitz continuous around \(x^*\). A variant of this method has been proposed in Peng et al. (1999) for box constrained VIs.

In Kanzow and Fukushima (1998b) a nonsmooth Gauss–Newton type method for solving box constrained VIs has been presented. At each iteration, it solves a linear system of equations involving the generalized Hessian of the D-gap function and uses the resulting vector as search direction if a descent condition is satisfied; otherwise the direction \(d=-\nabla p_{\alpha \beta }(x)\) is used. Then an inexact line search is performed. The algorithm is globally and superlinearly convergent under suitable assumptions. A similar Gauss–Newton strategy has also been adopted in a trust region method for minimizing the D-gap function (Sun et al. 1997).

Another Newton type method for the solution of box constrained VIs is based on the reformulation of (VI) as a system of nonsmooth and nonlinear equations involving the natural residual. This method, based on the minimization of the D-gap function, is globally and superlinearly convergent (Kanzow and Fukushima 1998a).

3.2.2 Merit functions via the Moreau–Yosida regularization

Another approach to get unconstrained optimization reformulations of (VI) is based on the Moreau–Yosida regularization of some gap functions (Yamashita and Fukushima 1997).

The function

$$\begin{aligned} p^{MY}_{\alpha \lambda }(x) := \inf _{z \in C} \left\{ \sup _{y \in C} \left[ \langle F(z),z-y \rangle - \alpha \,\Vert y-z\Vert ^2 \right] + \lambda \,\Vert z-x\Vert ^2 \right\} , \end{aligned}$$
(28)

with \(\alpha \ge 0\) and \(\lambda > 0\), is derived from the Moreau–Yosida regularization of the regularized gap function \(p^F\), with \(M=\alpha \,I\). It is nonnegative on the whole space \(\mathbb {R}^n\) and \(p^{MY}_{\alpha \lambda }(x^*)=0\) if and only if \(x^*\) solves (VI).

Notice that this merit function may not be easy to evaluate in practice unless (VI) has a certain special structure [e.g., F is affine and C is a polyhedron (see Yamashita and Fukushima 1997)]. However, it enjoys some nice theoretical properties that other merit functions do not have. For instance, if the infimum in (28) is uniquely attained in \(z_{\alpha \lambda }(x)\) for each \(x \in \mathbb {R}^n\), then \(p^{MY}_{\alpha \lambda }\) is differentiable on \(\mathbb {R}^n\) and

$$\begin{aligned} \nabla p^{MY}_{\alpha \lambda } (x) = 2\lambda \,[x-z_{\alpha \lambda }(x)], \end{aligned}$$

even if F is not differentiable.

In general, the function \(p^{MY}_{\alpha \lambda }\) is not convex. However, if the gap function p is convex, then \(p^{MY}_{0\lambda }\) is differentiable and convex on \(\mathbb {R}^n\) for any \(\lambda >0\); while if the regularized gap function \(p^F\), with \(M=\alpha \,I\), is convex then \(p^{MY}_{\alpha \lambda }\) is differentiable and convex on \(\mathbb {R}^n\) for any \(\lambda >0\).

The function \(p^{MY}_{\alpha \lambda }\) also provides a global error bound under the strong monotonicity of F (without the Lipschitz continuity assumption needed for the D-gap functions). In fact, if F is strongly monotone on C with modulus \(\mu \), \(\alpha \in [0,\mu )\) and \(\lambda >0\), then

$$\begin{aligned} \frac{1}{2}\,\min \{\mu -\alpha ,\lambda \} \Vert x-x^*\Vert ^2 \le p^{MY}_{\alpha \lambda }(x) \le \lambda \,\Vert x-x^*\Vert ^2, \qquad \forall \ x \in \mathbb {R}^n, \end{aligned}$$

i.e., the growth rate of \(p^{MY}_{\alpha \lambda }\) is in the order of the squared distance from the unique solution \(x^*\) to (VI).
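In one dimension, (28) can be evaluated by brute force, which makes these properties easy to observe. A sketch with invented data (\(C=[0,2]\), \(F(x)=2x-1\), so \(\mu =2\) and \(x^*=0.5\)); the inner supremum is a concave quadratic in y and is solved in closed form:

```python
# Invented 1-D test problem: C = [0, 2], F(x) = 2x - 1, strongly monotone
# with modulus mu = 2 and unique solution x* = 0.5.
ALPHA, LAM = 1.0, 1.0   # alpha in [0, mu), lambda > 0

def F(x):
    return 2.0 * x - 1.0

def inner_sup(z):
    """Inner supremum of (28): sup over y in [0, 2] of F(z)(z-y) - ALPHA*(y-z)^2,
    maximized in closed form over u = y - z and clipped to C."""
    u = max(-z, min(2.0 - z, -F(z) / (2.0 * ALPHA)))
    return -F(z) * u - ALPHA * u * u

def p_my(x, grid=20001):
    """Moreau-Yosida merit function (28): brute-force minimization over z in C."""
    return min(inner_sup(z) + LAM * (z - x) ** 2
               for z in (2.0 * i / (grid - 1) for i in range(grid)))
```

At the solution the merit value is zero, and at any other point it is squeezed between the two quadratic bounds above (here \(\mu -\alpha =\lambda =1\)).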

When F satisfies suitable monotonicity assumptions, further merit functions for (VI) can be obtained by the Moreau–Yosida regularization of Minty gap functions. The function

$$\begin{aligned} p^{M}_{\beta \lambda }(x) := \inf _{z \in C} \left\{ \sup _{y \in C} \left[ \langle F(y),z-y \rangle + \beta \,\Vert y-z\Vert ^2 \right] + \lambda \,\Vert z-x\Vert ^2 \right\} , \end{aligned}$$
(29)

with \(\beta \ge 0\) and \(\lambda > 0\), is the Moreau–Yosida regularization of the Minty gap function \(p^M\) (when \(\beta =0\)) and the regularized Minty gap function \(p^M_\beta \) (when \(\beta >0\)). If F is pseudomonotone on C, then \(p^{M}_{0\lambda }\) turns out to be a merit function for (VI) for any \(\lambda >0\), because it is nonnegative on \(\mathbb {R}^n\) and \(p^{M}_{0\lambda }(x^*)=0\) if and only if \(x^*\) solves (VI). Furthermore, if F is strongly monotone with modulus \(\mu \), then the same happens for \(p^{M}_{\beta \lambda }\) provided that \(\beta \in [0,\mu ]\).

Note that \(p^{M}_{\beta \lambda }\) may also be hard to evaluate in practice, but it enjoys some nice theoretical properties. In fact, it is differentiable and convex on \(\mathbb {R}^n\) for any \(\beta \ge 0\) and \(\lambda >0\), without any additional assumption on F. Moreover, if F is strongly monotone on C with modulus \(\mu \), \(\beta \in (0,\mu ]\) and \(\lambda >0\), then the quadratic growth rate of \(p^{M}_{\beta \lambda }\) is ensured, i.e.,

$$\begin{aligned} \frac{1}{2}\,\min \{\beta ,\lambda \} \Vert x-x^*\Vert ^2 \le p^{M}_{\beta \lambda }(x) \le \lambda \,\Vert x-x^*\Vert ^2, \qquad \forall \ x \in \mathbb {R}^n, \end{aligned}$$

where \(x^*\) is the unique solution to (VI).

3.3 An application to traffic network equilibrium problems

A traffic network consists of a set of nodes \(\mathcal {N}\), a set of arcs \(\mathcal {A} \subseteq \mathcal {N} \times \mathcal {N}\) and a set of origin/destination pairs \(\mathcal {W} \subseteq \mathcal {N} \times \mathcal {N}\). For each O/D pair w, a traffic demand \(d_{w}\) has to be distributed among the paths connecting w. We denote by \(\mathcal {P}_{w}\) the set of all paths connecting w, by \(x_p\) the flow on path p and by \(x=(x_p)_{p \in \mathcal {P}_{w}, w \in \mathcal {W}}\) the vector of all path flows. The set of feasible path flows is given by

$$\begin{aligned} X = \left\{ x \ge 0{:} \quad \sum _{p \in \mathcal {P}_w} x_p = d_{w}, \qquad \forall \ w\in \mathcal {W} \right\} . \end{aligned}$$

The flow \(f_a\) on each arc a is the sum of all flows on paths to which the arc belongs, hence the arc flow vector \(f=(f_a)_{a \in \mathcal {A}}\) can be written as \(f=\Delta \,x\), where \(\Delta \) is the arc-path incidence matrix:

$$\begin{aligned} \Delta _{a,p}= {\left\{ \begin{array}{ll} 1 &{} \text {if }a\in p,\\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

For each arc a, there is a nonnegative cost function \(t_a(f)\), which represents the travel time associated with arc a and depends on the arc flow vector f. The corresponding path cost function is assumed to be additive, i.e., the travel time \(T_p(x)\) on path p is the sum of the travel times of the arcs belonging to p:

$$\begin{aligned} T_p(x)=\sum _{a \in p} t_a (\Delta \,x). \end{aligned}$$

According to the Wardrop equilibrium principle (Wardrop 1952), a path flow \(x^* \in X\) is called a network equilibrium if it is positive only on minimum cost paths, i.e., the following implication

$$\begin{aligned} x^*_p > 0 \quad \Longrightarrow \quad T_p(x^*) = \min _{q \in \mathcal {P}_{w}} T_q(x^*) \end{aligned}$$

holds for any O/D pair \(w \in \mathcal {W}\) and path \(p \in \mathcal {P}_{w}\).

It is well-known (Dafermos 1980) that the problem of finding network equilibria is equivalent to solving the following variational inequality:

$$\begin{aligned} \text {find }x^* \in X\text { such that } \langle T(x^*), y-x^* \rangle \ge 0, \quad \text { for all } y \in X. \end{aligned}$$
(30)

The next example shows how the merit function approach for VIs can be applied to network equilibria.

Example 1

Consider the network in Fig. 1 with two O/D pairs: \(w_1=(1,4)\) with demand \(d_1=4\) and \(w_2=(1,5)\) with demand \(d_2=6\). Each O/D pair is connected by two paths: \(\mathcal {P}_{w_1}=\{ (1,2),(2,4);\ (1,3),(3,4) \}\) and \(\mathcal {P}_{w_2}=\{ (1,2),(2,5);\ (1,3),(3,5) \}\). We denote the flows on these paths by \(x_1,\dots ,x_4\), respectively. Hence the set of feasible path flows is given by

$$\begin{aligned} X = \{ x \in \mathbb {R}^4_+:\ x_1 + x_2 = 4, \quad x_3 + x_4 = 6 \}. \end{aligned}$$
Fig. 1 Traffic network in Example 1

Assume that the arc cost functions are defined as follows:

$$\begin{aligned} \left\{ \begin{array}{l} t_{12} := f_{12} + 1 = x_1 + x_3 + 1, \\ t_{13} := 3\,f_{13} + 2 = 3\,(x_2+x_4) + 2, \\ t_{24} := 2\,f_{24} + f_{34} + 1 = 2\,x_1 + x_2 + 1, \\ t_{25} := 2\,f_{25} + f_{35} + 3 = 2\,x_3 + x_4 + 3, \\ t_{34} := f_{34} + 2 = x_2 + 2, \\ t_{35} := 4\,f_{35} + 1 = 4\,x_4 + 1, \end{array} \right. \end{aligned}$$

thus the corresponding path costs are

$$\begin{aligned} \left\{ \begin{array}{l} T_1 = t_{12} + t_{24} = 3\,x_1 + x_2 + x_3 + 2, \\ T_2 = t_{13} + t_{34} = 4\,x_2 + 3\,x_4 + 4, \\ T_3 = t_{12} + t_{25} = x_1 + 3\,x_3 + x_4 + 4, \\ T_4 = t_{13} + t_{35} = 3\,x_2 + 7\,x_4 + 3, \end{array} \right. \end{aligned}$$

i.e., the operator of VI (30) is \(T(x)=A\,x+b\), with

$$\begin{aligned} A = \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 3 &{} 1 &{} 1 &{} 0 \\ 0 &{} 4 &{} 0 &{} 3 \\ 1 &{} 0 &{} 3 &{} 1 \\ 0 &{} 3 &{} 0 &{} 7 \end{array} \right) , \qquad b = \left( \begin{array}{c} 2 \\ 4 \\ 4 \\ 3 \end{array} \right) . \end{aligned}$$

Note that the matrix A is positive definite, thus the mapping T is strongly monotone and there exists a unique solution of VI (30), i.e., a unique network equilibrium.
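The data of this example can be assembled mechanically from the arc-path incidence matrix: writing the arc costs as \(t(f)=Bf+c\), additivity gives \(T(x)=\Delta ^T(B\,\Delta \,x+c)\), so \(A=\Delta ^T B\,\Delta \) and \(b=\Delta ^T c\). A sketch reproducing the matrices above:

```python
import numpy as np

# Arc order: (1,2), (1,3), (2,4), (2,5), (3,4), (3,5); path order: x1..x4.
Delta = np.array([[1, 0, 1, 0],   # arc (1,2) lies on paths 1 and 3
                  [0, 1, 0, 1],   # arc (1,3) lies on paths 2 and 4
                  [1, 0, 0, 0],   # arc (2,4) lies on path 1
                  [0, 0, 1, 0],   # arc (2,5) lies on path 3
                  [0, 1, 0, 0],   # arc (3,4) lies on path 2
                  [0, 0, 0, 1]])  # arc (3,5) lies on path 4

# Affine arc costs t(f) = B f + c, rows in the arc order above.
B = np.array([[1, 0, 0, 0, 0, 0],
              [0, 3, 0, 0, 0, 0],
              [0, 0, 2, 0, 1, 0],
              [0, 0, 0, 2, 0, 1],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 0, 0, 4]], dtype=float)
c = np.array([1, 2, 1, 3, 2, 1], dtype=float)

# Additive path costs: T(x) = Delta^T t(Delta x) = (Delta^T B Delta) x + Delta^T c.
A = Delta.T @ B @ Delta
b = Delta.T @ c
```

The computed A and b coincide with the displayed ones, and the symmetric part of A has positive eigenvalues, confirming positive definiteness.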

We now consider the Fukushima’s regularized gap function (7) with \(M=I\), i.e.,

$$\begin{aligned} p^F(x) = \max _{y \in X} \left[ \langle T(x), x-y \rangle - \frac{1}{2} \Vert x-y\Vert ^2 \right] . \end{aligned}$$

This function is continuously differentiable and strongly convex since the matrix \(A+A^T-I\) is positive definite. In Fig. 2 we show the graph of \(p^F\) defined on the 2-dimensional space \((x_1,x_3)\), with \(x_1 \in [0,4]\) and \(x_3 \in [0,6]\) (the demand constraints allow us to express the variables \(x_2\) and \(x_4\) as functions of \(x_1\) and \(x_3\), respectively).

The descent algorithm (8)–(9) applied to \(p^F\) can be exploited to compute the network equilibrium. Table 1 reports the first four iterations of the algorithm (implemented in MATLAB) starting from the feasible flow (4, 0, 6, 0).

Fig. 2 The regularized gap function \(p^F\) with \(M=I\) in Example 1

Table 1 Numerical results of the descent algorithm (8)–(9) applied to Example 1

Note that the path costs corresponding to the equilibrium solution

$$\begin{aligned} x^*=(2.6316,\ 1.3684,\ 4.0526,\ 1.9474) \end{aligned}$$

are

$$\begin{aligned} T(x^*)=(15.3158,\ 15.3158,\ 20.7368,\ 20.7368), \end{aligned}$$

i.e., the two paths connecting each O/D pair have the same cost. Furthermore, the Lagrange multipliers \(\lambda ^*\) associated with \(x^*\) in the KKT conditions (4) coincide with the equilibrium costs, i.e., \(\lambda ^*=(15.3158,\ 20.7368)\).
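The reported equilibrium can also be recovered directly: guessing (and then verifying) that both paths of each O/D pair carry positive flow, Wardrop's principle forces \(T_1=T_2\) and \(T_3=T_4\), which together with the demand constraints gives a \(2\times 2\) linear system in \((x_1,x_3)\). A sketch:

```python
import numpy as np

A = np.array([[3., 1., 1., 0.], [0., 4., 0., 3.],
              [1., 0., 3., 1.], [0., 3., 0., 7.]])
b = np.array([2., 4., 4., 3.])
T = lambda x: A @ x + b

# Substituting x2 = 4 - x1 and x4 = 6 - x3 into T1 - T2 = 0 and T3 - T4 = 0
# yields the linear system 6*x1 + 4*x3 = 32, 4*x1 + 9*x3 = 47.
x1, x3 = np.linalg.solve(np.array([[6., 4.], [4., 9.]]),
                         np.array([32., 47.]))
x_star = np.array([x1, 4.0 - x1, x3, 6.0 - x3])
costs = T(x_star)
```

The solution is \(x_1=50/19\), \(x_3=77/19\), matching the four-digit values above; all flows are positive, which validates the initial guess.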

Figure 3 shows the graph of the D-gap function \(p_{\alpha \beta }\), with \(\alpha =1\), \(\beta =2\) and \(G(x,y)=\Vert x-y\Vert ^2/2\), defined on the space \((x_1,x_3)\). Note that this function is nonnegative everywhere (even at infeasible points) and its global minimum is \(x^*\).

Fig. 3 The D-gap function \(p_{\alpha \beta }\) with \(\alpha =1\), \(\beta =2\) and \(G(x,y)=\Vert x-y\Vert ^2/2\) in Example 1

4 Merit functions for quasi-variational inequalities

In this section we consider the merit function approach for quasi-variational inequalities (QVIs), i.e., VIs in which the feasible region depends on the variable x. Given a vector-valued mapping \(F:\mathbb {R}^n \rightarrow \mathbb {R}^n\) and a set-valued mapping \(C:\mathbb {R}^n \rightrightarrows \mathbb {R}^n\), such that C(x) is a closed and convex set for any \(x \in \mathbb {R}^n\), the QVI is defined as follows:

$$\begin{aligned} \text {find }x^* \in C(x^*)\text { such that }\langle F(x^*),y-x^* \rangle \ge 0,\quad \text { for all }y \in C(x^*). \end{aligned}$$
(QVI)

When all the sets C(x) coincide with the same set C, (QVI) collapses to (VI). The set of fixed points of the mapping C, i.e.

$$\begin{aligned} X := \{ x \in \mathbb {R}^n:\ x \in C(x) \}, \end{aligned}$$

is the feasible region of (QVI). In the following we suppose that the sets C(x) are defined by constraints, i.e.,

$$\begin{aligned} C(x) := \{ y \in \mathbb {R}^n:\ g_i(x,y) \le 0, \quad i=1,\dots ,m \}, \end{aligned}$$

where the functions \(g_i:\mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {R}\) are assumed to be continuous and \(g_i(x,\cdot )\) convex for any fixed \(x \in \mathbb {R}^n\). Furthermore, in order to guarantee the convexity of the set X, we assume that the functions \(x \mapsto g_i(x,x)\) are convex for all \(i=1,\dots ,m\).

QVIs were introduced in Bensoussan et al. (1973) and Bensoussan and Lions (1973) and subsequently exploited to model several finite and infinite-dimensional problems (see Baiocchi and Capelo 1984; Chan and Pang 1982; Facchinei et al. 2014, 2013 and references therein).

Some merit functions have been proposed in the literature extending to QVIs ideas similar to those developed for VIs. As for VIs, the reformulation of (QVI) as a fixed point problem leads to a merit function. In fact, it follows from the definition that x solves (QVI) if and only if \(x=\pi _{C(x)}(x-F(x))\), thus \(\Vert x - \pi _{C(x)}(x-F(x))\Vert \) is a merit function for (QVI). Another approach is based on reformulating (QVI), under suitable constraint qualifications, as a complementarity problem via the following KKT conditions:

$$\begin{aligned} \left\{ \begin{array}{ll} F(x) + \sum \limits _{i=1}^m \lambda _i \nabla _y g_i(x,x) = 0, &{} \\ \lambda _i \, g_i(x,x) = 0, &{} i=1,\dots ,m,\\ \lambda _i \ge 0, \qquad g_i(x,x) \le 0, &{} i=1,\dots ,m. \end{array} \right. \end{aligned}$$

Recently, a solution method based on these conditions has been proposed in Facchinei et al. (2014).

A straightforward extension of the gap function (5) to QVIs is defined as follows (Giannessi 1995):

$$\begin{aligned} \mathsf {p}(x) := \sup _{y \in C(x)} \langle F(x), x-y \rangle . \end{aligned}$$
(31)

This function is nonnegative on the set X and \(x^*\) solves (QVI) if and only if \(x^* \in X\) and \(\mathsf {p}(x^*)=0\). However, the gap function \(\mathsf {p}\) is nondifferentiable and it may occur that \(\mathsf {p}(x)=+\infty \) at some points of X.

The regularized gap function (7) has been extended to QVIs in Fukushima (2007) and is defined by

$$\begin{aligned} \mathsf {p}_\alpha (x) := \max _{y \in C(x)} \left[ \langle F(x), x-y \rangle - \frac{\alpha }{2}\, \Vert x-y\Vert ^2 \right] . \end{aligned}$$
(32)

This function is a gap function: it is finite and the maximum in (32) is attained at a unique point \(y_\alpha (x)\), provided that the set C(x) is nonempty. Actually, it is possible to define \(\mathsf {p}_\alpha \) by replacing the regularization term \(\alpha \Vert x-y\Vert ^2/2\) and the set C(x) with more general expressions satisfying suitable conditions (Fukushima 2007; Taji 2008).

In contrast to the VI case, this function may be nondifferentiable even if F is differentiable [see the examples in Harms et al. (2014b)]. If F and the \(g_i\) are continuously differentiable and a constraint qualification holds, then \(\mathsf {p}_\alpha \) is directionally differentiable everywhere and its directional derivative at x along direction d is given by

$$\begin{aligned} \mathsf {p}'_\alpha (x;d) = \min _{\lambda \in {\varLambda }_\alpha (x)} \langle F(x) -[(\nabla F(x))^T-\alpha \,I][y_\alpha (x)-x] - \sum _{i=1}^m \lambda _i \nabla _x g_i(x,y_\alpha (x)), d \rangle , \end{aligned}$$

where \({\varLambda }_\alpha (x)\) is the set of Lagrange multipliers associated with \(y_\alpha (x)\), i.e.,

$$\begin{aligned} \begin{array}{rl} {\varLambda }_\alpha (x)= \{ \lambda \in \mathbb {R}^m_+: &{} F(x) + \alpha [y_\alpha (x)-x] + \sum _{i=1}^m \lambda _i \nabla _y g_i(x,y_\alpha (x))=0, \\ &{} \lambda _i\,g_i(x,y_\alpha (x)) = 0, \qquad i=1,\dots ,m \}, \end{array} \end{aligned}$$

(see Fukushima 2007). Furthermore, \(\mathsf {p}_\alpha \) turns out to be continuously differentiable in the special case of QVI with ‘moving sets’, i.e., when \(C(x)=Q+c(x)\), where Q is a closed and convex set and \(c:\mathbb {R}^n\rightarrow \mathbb {R}^n\), provided that mappings F and c are continuously differentiable (Dietrich 2001). Recently, this latter result has been extended to QVIs with generalized moving sets (Harms et al. 2014b).

Similarly to VIs, the regularized gap function \(\mathsf {p}_\alpha \) is nonconvex in general. In Taji (2008) it is proved that, whenever \(\mathsf {p}_\alpha \) is directionally differentiable, a stationary point \(x^*\) of \(\mathsf {p}_\alpha \) on X, i.e.,

$$\begin{aligned} \mathsf {p}'_\alpha (x^*;y-x^*) \ge 0, \qquad \forall \ y \in X, \end{aligned}$$

is a solution to (QVI) provided that the matrix \(\nabla F(x^*)\) is positive definite and

$$\begin{aligned} \lambda _i \, \langle \nabla _x g_i(x^*,y_\alpha (x^*)), y_\alpha (x^*)-x^* \rangle \ge 0, \quad \text {for all }i=1,\dots ,m\text { and } \lambda \in {\varLambda }_{\alpha }(x^*). \end{aligned}$$

Notice that in Taji (2008) the key assumption \(y_\alpha (x^*) \in X\) is not explicitly stated, but it is exploited in the proof and must therefore be regarded as an additional hypothesis.

An unconstrained minimization reformulation of (QVI) can be obtained via the D-gap functions, i.e., the difference of two regularized gap functions. In fact, given \(0 < \alpha < \beta \), the function

$$\begin{aligned} \mathsf {p}_{\alpha \beta }(x) := \mathsf {p}_{\alpha }(x) - \mathsf {p}_{\beta }(x) \end{aligned}$$

is nonnegative on \(\mathbb {R}^n\) and \(\mathsf {p}_{\alpha \beta }(x^*)=0\) if and only if \(x^*\) solves (QVI). The directional differentiability of \(\mathsf {p}_{\alpha \beta }\) directly follows from that of \(\mathsf {p}_{\alpha }\) and \(\mathsf {p}_{\beta }\). Recently, extending an idea of Dietrich (see Dietrich 1999), another unconstrained optimization reformulation has been obtained in Harms et al. (2014a) by making use of Toland’s and Singer’s duality theory.

The functions \(\sqrt{\mathsf {p}_\alpha }\) and \(\sqrt{\mathsf {p}_{\alpha \beta }}\) provide error bound results for (QVI) provided that F is strongly monotone and Lipschitz continuous on \(\mathbb {R}^n\) and an additional technical assumption on the Euclidean projection on the sets C(x) is fulfilled (Gupta and Mehra 2012). Another error bound result based on the function \(\sqrt{\mathsf {p}_\alpha }\) has been proved in Aussel et al. (2011).

4.1 Application to generalized Nash equilibrium problems

Let us consider a noncooperative game with N players, in which each player i controls a set of variables \(x_i \in \mathbb {R}^{n_i}\). The vector of all players' strategies is denoted by \(x=(x_1,\dots ,x_N) \in \mathbb {R}^n\), with \(n=n_1+\dots +n_N\); the vector x is also denoted by \(x=(x_i,x_{-i})\), where \(x_{-i}\) denotes the strategy vector of all the players other than player i. Each player i has a cost function \(\theta _i : \mathbb {R}^n \rightarrow \mathbb {R}\), which possibly depends on all players' strategies x, and a feasible set \(X_i(x_{-i}) \subseteq \mathbb {R}^{n_i}\), possibly depending on the rival players' strategies \(x_{-i}\).

A generalized Nash equilibrium (GNE) of the game is a vector

$$\begin{aligned} x^*=(x^*_1,\dots ,x^*_N) \in X_1(x^*_{-1}) \times \dots \times X_N(x^*_{-N}) \end{aligned}$$

such that, for any \(i=1,\dots ,N\), \(x^*_i\) is an optimal solution of the following optimization problem:

$$\begin{aligned} \min _{x_i}\ \theta _i(x_i,x^*_{-i}) \qquad \text {subject to} \qquad x_i \in X_i(x^*_{-i}). \end{aligned}$$

In other words, \(x^*\) is a GNE if no player can improve its own cost function by unilaterally changing its strategy.

It is well-known (see, e.g., Facchinei and Kanzow 2010) that under the following assumptions:

  • \(\theta _i\) is continuously differentiable for any \(i=1,\dots ,N\),

  • \(\theta _i(\cdot ,x_{-i})\) is convex for any \(x_{-i}\) and \(i=1,\dots ,N\),

  • the feasible sets \(X_i(x_{-i})\) are closed and convex for all \(x \in \mathbb {R}^n\) and \(i=1,\dots ,N\),

the problem of finding GNE is equivalent to solving the QVI with operator

$$\begin{aligned} F(x) = ( \nabla _{x_1} \theta _1(x), \dots , \nabla _{x_N} \theta _N(x) ) \end{aligned}$$

and set-valued mapping

$$\begin{aligned} C(x) = X_1(x_{-1}) \times \dots \times X_N(x_{-N}). \end{aligned}$$

Example 2

(Cavazzuti et al. 2002) Consider a two-person noncooperative game, in which player i selects the coordinate \(x_i \in \mathbb {R}\) subject to an individual constraint \(x_i \le 0\) and a shared constraint \(x_1+x_2 \le -1\). The aim of player i is to minimize the (squared) distance between \((x_1,x_2)\) and his favourite goal \(P_i \in \mathbb {R}^2\), with \(P_1=(1,0)\) and \(P_2=(0,1)\). Thus the optimization problems of the two players are defined as follows:

$$\begin{aligned} \text {Player 1: } \left\{ \begin{array}{l} \min \limits _{x_1}\ (x_1-1)^2 + x_2^2 \\ x_1 \le 0 \\ x_1+x_2 \le -1 \end{array} \right. \qquad \text {Player 2: } \left\{ \begin{array}{l} \min \limits _{x_2}\ x_1^2 + (x_2-1)^2 \\ x_2 \le 0 \\ x_1+x_2 \le -1 \end{array} \right. \end{aligned}$$

The set of GNE of the game coincides with the solution set of the QVI given by \(F(x)=(2\,x_1-2, 2\,x_2-2)\) and

$$\begin{aligned} C(x) = (-\infty ,\min \{0,-1-x_2\}] \times (-\infty ,\min \{0,-1-x_1\}]. \end{aligned}$$

The feasible region of the QVI, i.e., the set of fixed points of the set-valued mapping C, is

$$\begin{aligned} X = \{ x \in \mathbb {R}^2:\ x_1 \le 0,\ x_2 \le 0,\ x_1+x_2 \le -1 \}. \end{aligned}$$

It is easy to check that the solution set of the QVI is the segment connecting \((-1,0)\) and \((0,-1)\).
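For this example, the projection onto C(x) is a componentwise clipping from above, so the fixed-point residual \(\Vert x - \pi _{C(x)}(x-F(x))\Vert \) recalled at the beginning of this section can be evaluated in closed form. A sketch checking that it vanishes exactly on the solution segment:

```python
import math

def F(x):
    return (2.0 * x[0] - 2.0, 2.0 * x[1] - 2.0)

def proj_C(x, v):
    """Euclidean projection of v onto C(x): componentwise clipping from above."""
    return (min(v[0], 0.0, -1.0 - x[1]),
            min(v[1], 0.0, -1.0 - x[0]))

def residual(x):
    """Fixed-point residual merit function ||x - proj_{C(x)}(x - F(x))||,
    which is zero exactly at the solutions of the QVI."""
    f = F(x)
    p = proj_C(x, (x[0] - f[0], x[1] - f[1]))
    return math.hypot(x[0] - p[0], x[1] - p[1])
```

Points of the segment, parametrized as \((-t, t-1)\) with \(t\in [0,1]\), give a zero residual, while a feasible non-solution such as \((-1,-1)\) gives a positive one.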

The value of the gap function (31) can be explicitly computed:

$$\begin{aligned} \begin{array}{rl} \mathsf {p}(x) &{} = \sup \limits _{y \in C(x)} \langle F(x),x-y \rangle \\ &{} = \sup \limits _{y \in C(x)} \left[ 2\,(x_1-1)\,(x_1-y_1) + 2\,(x_2-1)\,(x_2-y_2) \right] \\ &{} = 2\,x_1\,(x_1-1) + 2\,x_2\,(x_2-1) + \sup \limits _{y_1 \le \min \{0,-1-x_2\}} 2\,(1-x_1)\,y_1 \\ &{} \qquad + \sup \limits _{y_2 \le \min \{0,-1-x_1\}} 2\,(1-x_2)\,y_2 \\ &{} = {\left\{ \begin{array}{ll} 2\,x_1\,(x_1-1) + 2\,x_2\,(x_2-1) + &{} \\ \quad + 2\,(1-x_1)\,\min \{0,-1-x_2\} + &{} \\ \quad + 2\,(1-x_2)\,\min \{0,-1-x_1\}, &{} \text {if }x_1 \le 1\text { and }x_2 \le 1,\\ +\infty , &{} \text {otherwise.} \end{array}\right. } \end{array} \end{aligned}$$

This function is equal to zero on the solution set, but it is neither finite everywhere on \(\mathbb {R}^2\) nor differentiable on the half-lines \(\{-1\}\times (-\infty ,1]\) and \((-\infty ,1] \times \{-1\}\).
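The closed form above is easy to check numerically. A sketch verifying that it vanishes on the solution segment and is positive (or infinite) elsewhere:

```python
def gap(x):
    """Closed-form gap function (31) for Example 2; finite iff x1 <= 1 and x2 <= 1."""
    x1, x2 = x
    if x1 > 1.0 or x2 > 1.0:
        return float("inf")
    return (2.0 * x1 * (x1 - 1.0) + 2.0 * x2 * (x2 - 1.0)
            + 2.0 * (1.0 - x1) * min(0.0, -1.0 - x2)
            + 2.0 * (1.0 - x2) * min(0.0, -1.0 - x1))
```

On the segment \((-t, t-1)\), \(t\in [0,1]\), the two min-terms reduce to \(2(1-x_1)x_1 + 2(1-x_2)x_2\) and the expression cancels to zero.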

Figures 4 and 5 show the graphs of the regularized gap function \(\mathsf {p}_\alpha \), with \(\alpha =5\), and the D-gap function \(\mathsf {p}_{\alpha \beta }\), with \(\alpha =5\) and \(\beta =10\), respectively. Note that both functions are finite on \(\mathbb {R}^2\) and equal to zero in the solution set; \(\mathsf {p}_\alpha \) is negative in points not belonging to X, while \(\mathsf {p}_{\alpha \beta }\) is nonnegative on the whole space \(\mathbb {R}^2\).

Fig. 4 The regularized gap function \(\mathsf {p}_\alpha \) with \(\alpha =5\) in Example 2

Fig. 5 The D-gap function \(\mathsf {p}_{\alpha \beta }\) with \(\alpha =5\) and \(\beta =10\) in Example 2

5 Merit functions for abstract equilibrium problems

The abstract equilibrium problem is a general mathematical model which includes optimization, multi-objective optimization, variational inequalities, fixed point and complementarity problems, Nash equilibria in noncooperative games and inverse optimization as special cases (see Bigi et al. 2013; Blum and Oettli 1994). It is defined as follows:

$$\begin{aligned} \text {find }x^* \in C\text { such that }f(x^*,y) \ge 0, \quad \text { for all }y \in C, \end{aligned}$$
(EP)

where C is a closed and convex subset of \(\mathbb {R}^n\) and \(f:\mathbb {R}^n\times \mathbb {R}^n\rightarrow \mathbb {R}\) is a bifunction such that \(f(x,\cdot )\) is convex and satisfies \(f(x,x)=0\) for all \(x \in C\). Setting \(f(x,y)=\langle F(x),y-x \rangle \) we obtain (VI).

In the last decade, several merit functions for (EP) have been introduced in the literature. These functions often extend to (EP) those originally conceived for VIs. For instance, a direct extension of the gap function (5) from VIs to (EP) is defined as follows (Mastroeni 2003):

$$\begin{aligned} \mathbf {p}(x) := \sup _{y\in C} \left[ -f(x,y) \right] . \end{aligned}$$

This function is nonnegative on C and \(x^*\) solves (EP) if and only if \(x^* \in C\) and \(\mathbf {p}(x^*)=0\). However, \(\mathbf {p}\) has the same disadvantages as function (5), i.e., in general it is neither finite, nor differentiable, nor convex. For these reasons, the regularized gap function has been proposed (Mastroeni 2003):

$$\begin{aligned} \mathbf {p}_\alpha (x) := \max _{y\in C} \left[ -f(x,y) - \frac{\alpha }{2} \Vert x-y\Vert ^2 \right] . \end{aligned}$$
(33)

It allows (EP) to be reformulated as the problem of minimizing \(\mathbf {p}_\alpha \) on C; it is continuously differentiable whenever the bifunction f is, and

$$\begin{aligned} \nabla \mathbf {p}_\alpha (x) = -\nabla _x f(x,y_\alpha (x)) - \alpha [x-y_\alpha (x)], \end{aligned}$$

where \(y_\alpha (x)\) is the unique maximizer of problem (33). Note that the regularization term \(\Vert y-x\Vert ^2\) can be replaced by a more general bifunction G satisfying condition (13) (see Mastroeni 2003). Similarly to VIs, the regularized gap function is nonconvex in general. However, if \(x^*\) is a stationary point of \(\mathbf {p}_\alpha \) on C, i.e.,

$$\begin{aligned} \langle \nabla \mathbf {p}_\alpha (x^*), y-x^* \rangle \ge 0, \qquad \forall \ y \in C, \end{aligned}$$

and f is strictly \(\nabla \)-monotone on C, i.e.,

$$\begin{aligned} \langle \nabla _x f(x,y) + \nabla _y f(x,y), y-x \rangle > 0, \qquad \forall \ x,y \in C \text { with } x \ne y, \end{aligned}$$

then \(x^*\) is a solution to (EP). The strict \(\nabla \)-monotonicity of f plays a role similar to that of the positive definiteness of \(\nabla F\) for VIs. In fact, it guarantees, in addition to the above “stationarity” property, that \(y_\alpha (x)-x\) is a descent direction for \(\mathbf {p}_\alpha \) at any non-stationary point x. Solution methods based on the minimization of \(\mathbf {p}_\alpha \) along this direction have been developed in Chadli et al. (2004) and Mastroeni (2003). An inexact version of these methods has been proposed in Di Lorenzo et al. (2014).
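A minimal sketch on an invented toy EP (\(C=[-1,1]\), \(f(x,y)=y^2-x^2\), so \(f(x,x)=0\), \(f(x,\cdot )\) is convex and the unique solution is \(x^*=0\)), illustrating the gap function (33), the gradient formula and the descent direction \(y_\alpha (x)-x\):

```python
# Invented toy EP: C = [-1, 1], f(x, y) = y^2 - x^2; unique solution x* = 0.
ALPHA = 1.0

def y_alpha(x):
    """Unique maximizer in (33): argmax over C of -(y^2 - x^2) - (ALPHA/2)(y-x)^2,
    i.e., ALPHA*x/(2 + ALPHA) clipped to [-1, 1]."""
    return max(-1.0, min(1.0, ALPHA * x / (2.0 + ALPHA)))

def p_alpha(x):
    """Regularized gap function (33)."""
    y = y_alpha(x)
    return -(y * y - x * x) - 0.5 * ALPHA * (y - x) ** 2

def grad_p_alpha(x):
    """Gradient formula: -grad_x f(x, y_alpha(x)) - ALPHA*(x - y_alpha(x)),
    with grad_x f(x, y) = -2x here."""
    return 2.0 * x - ALPHA * (x - y_alpha(x))
```

A finite-difference quotient matches the gradient formula, and \(y_\alpha (x)-x\) has the opposite sign of the gradient away from \(x^*\), i.e., it is a descent direction.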

A descent method which does not require the strict \(\nabla \)-monotonicity of f has been introduced in Bigi et al. (2009). It is similar to that developed in Zhu and Marcotte (1993) for VIs: at any iteration it performs a line search if \(y_\alpha (x)-x\) is a descent direction for \(\mathbf {p}_\alpha \) at x, otherwise the value of \(\alpha \) is reduced. Convergence is guaranteed provided that C is bounded and f is c-monotone on C, i.e.,

$$\begin{aligned} f(x,y) + \langle \nabla _x f(x,y), y-x \rangle \ge 0, \qquad \forall \ x,y \in C. \end{aligned}$$
(34)

The latter condition is neither stronger nor weaker than strict \(\nabla \)-monotonicity and it is satisfied if \(f(\cdot ,y)\) is concave for all \(y \in C\) (see Bigi et al. 2009; Bigi and Passacantando 2015b).

Similarly to VIs, function \(\sqrt{\mathbf {p}_\alpha }\) provides error bound results under suitable monotonicity assumptions on f (see Chadli et al. 2004; Konnov and Pinyagina 2003b; Mastroeni 2003).

Since the evaluation of the regularized gap function \(\mathbf {p}_\alpha \) could be computationally expensive if C is defined as in (3) by nonlinear constraints, a variant of function \(\mathbf {p}_\alpha \) can be exploited as in the case of VIs. In Bigi and Passacantando (2012) the following function has been introduced:

$$\begin{aligned} \mathbf {p^{BP}_\alpha }(x) := \max _{y\in T(x)} \left[ -f(x,y) - \frac{\alpha }{2} \Vert x-y\Vert ^2 \right] , \end{aligned}$$

where T(x) is the outer polyhedral approximation of C at x defined as in (12). This function turns out to be a locally Lipschitz gap function for (EP). Furthermore, if the functions \(g_i\) are continuously differentiable and a constraint qualification holds, then \(\mathbf {p^{BP}_\alpha }\) is directionally differentiable. Solution methods for (EP) exploiting this merit function have been proposed in Bigi and Passacantando (2012, 2015a).

D-gap functions have been extended from VIs to (EP) as well. Indeed, the difference of two regularized gap functions

$$\begin{aligned} \mathbf {p}_{\alpha \beta }(x) := \mathbf {p}_{\alpha }(x) - \mathbf {p}_{\beta }(x), \end{aligned}$$
(35)

with \(0<\alpha <\beta \), is nonnegative on \(\mathbb {R}^n\) and \(\mathbf {p}_{\alpha \beta }(x^*)=0\) if and only if \(x^*\) solves (EP). Thus, the global minima of \(\mathbf {p}_{\alpha \beta }\) on \(\mathbb {R}^n\) coincide with the solutions to (EP) (see Konnov and Pinyagina 2003a; Zhang and Han 2009). The D-gap function inherits the differentiability properties of \(\mathbf {p}_{\alpha }\) and \(\mathbf {p}_{\beta }\) but in general is not convex. Stationary points of \(\mathbf {p}_{\alpha \beta }\) coincide with the solutions to (EP) if the mappings \(\nabla _x f(x,\cdot )+\nabla _y f(x,\cdot )\) are strictly monotone on \(\mathbb {R}^n\) for any \(x \in \mathbb {R}^n\) (Zhang and Han 2009). Similarly to VIs, function \(\sqrt{\mathbf {p}_{\alpha \beta }}\) provides error bound results under suitable monotonicity assumptions on f (Cherugondi 2013; Zhang and Wu 2009).
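The two defining properties of the D-gap function (nonnegativity on the whole space and vanishing exactly at solutions) can be checked numerically on a small example. The sketch below again assumes, purely for illustration, a VI-type bifunction \(f(x,y)=\langle F(x),y-x\rangle \) over a box, with a hypothetical affine F whose associated solution lies inside the box.

```python
import numpy as np

def F(x):
    # Hypothetical affine operator; the associated VI solution is (0.6, -0.8).
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    return A @ x + np.array([-1.0, 1.0])

lo, hi = -2.0, 2.0  # box C

def p(x, alpha):
    # Regularized gap function; y_alpha(x) is a projection for this f.
    y = np.clip(x - F(x) / alpha, lo, hi)
    return -F(x) @ (y - x) - 0.5 * alpha * np.linalg.norm(y - x) ** 2

def d_gap(x, alpha=0.5, beta=2.0):
    # D-gap function (35); requires 0 < alpha < beta.
    return p(x, alpha) - p(x, beta)
```

Since the objective inside the max defining \(\mathbf {p}_{\alpha }\) dominates, term by term, the one defining \(\mathbf {p}_{\beta }\) when \(\alpha <\beta \), the difference is nonnegative everywhere on \(\mathbb {R}^n\), not only on C.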

Several solution methods for (EP) are based on D-gap functions. Descent methods exploiting the direction \(d=r(x)+\rho s(x)\), where \(r(x)=y_\alpha (x)-y_\beta (x)\), \(s(x)=\alpha [x-y_\alpha (x)]-\beta [x-y_\beta (x)]\) and \(\rho >0\) is small enough, have been introduced in Cherugondi (2013) and Konnov and Pinyagina (2003a). A descent method, similar to that proposed in Solodov and Tseng (2000) for VIs, is based on the direction \(d=y_\alpha (x)-y_\beta (x)\) together with suitable updates of the parameters \(\alpha \) and \(\beta \) (Bigi and Passacantando 2015). Another descent method relies on the direction \(d=y_\alpha (x)-x\), the same one exploited by the solution methods based on \(\mathbf {p}_\alpha \) (Zhang and Wu 2009).

The regularized Minty gap function (19) has been extended to (EP) in Quoc and Muu (2012) and it has been used to develop an iterative method for solving strongly monotone equilibrium problems, while gap functions based on conjugate duality have been extended to (EP) in Altangerel et al. (2006).

5.1 Application to a class of Nash–Cournot equilibrium problems

We now describe a problem of production competition over a network between several firms which produce the same commodity. We consider a modification of the oligopolistic model originally proposed in Marcotte (1987). Given a transportation network \((\mathcal {N},\mathcal {A})\), where \(\mathcal {N}\) is the set of nodes and \(\mathcal {A}\) the set of arcs, the firms and the markets are located at some subsets of nodes I and J, respectively. Each firm \(i \in I\) chooses the quantity \(x^i_j\) to supply to each market \(j \in J\) and the quantities \(v^i_a\) to be sent on each arc \(a \in \mathcal {A}\). These variables are subject to flow-conservation constraints, i.e., for any \(i \in I\) and \(k \in \mathcal {N}\) we have

$$\begin{aligned} (E\,v^i)_k= \left\{ \begin{array}{ll} -\displaystyle {\sum _{j \in J}}x^i_j &{} \hbox { if } k=i, \\ 0 &{} \hbox { if } k \notin J \hbox { and } k \ne i, \\ x^i_k &{} \hbox { if } k \in J, \\ \end{array} \right. \end{aligned}$$
(36)

where E is the node-arc incidence matrix of the network and \(v^i=(v^i_a)_{a \in \mathcal {A}}\). Moreover, \(q^i\) denotes the maximum quantity that firm i may produce, i.e.,

$$\begin{aligned} \sum _{j \in J} x^i_j \le q^i. \end{aligned}$$
(37)

The goal of the firm i is to maximize its profit given by

$$\begin{aligned} \sum _{j \in J} x^i_j \, p_j \left( \sum _{\ell \in I} x^{\ell }_j \right) - \sum _{a \in \mathcal {A}} s_a \, v_a^i - \pi _i \left( \sum _{j \in J} x^i_j \right) , \end{aligned}$$
(38)

where \(p_j:\mathbb {R}_+ \rightarrow \mathbb {R}_+\) is the inverse demand function for market j, that is \(p_j(z)\) denotes the unitary price at which the market j requires a total quantity z, \(s_a\) is the unitary transportation cost on arc a and \(\pi _i:\mathbb {R}_+ \rightarrow \mathbb {R}_+\) is the production cost function of firm i. Note that the first term of (38) depends on the quantities \(x^{\ell }_j\) chosen by all the firms \(\ell \in I\).

We say that an equilibrium state is reached when the flows and the quantities produced by the firms are such that no firm can increase its profit by changing its own choices while the other firms keep theirs unchanged. This definition coincides with the concept of Nash equilibrium in a noncooperative game where the firms are the players and the profits (38) are their payoff functions. Setting \(x=(x^i_j)_{i \in I, j \in J}\), \(v=(v^i)_{i \in I}\) and analogously y and w, the Nash equilibria of this game are the solutions of the abstract equilibrium problem (EP), where the bifunction f is the Nikaidô–Isoda function associated with the game (Nikaidô and Isoda 1955), that is:

$$\begin{aligned} f((x,v),(y,w))= & {} \sum \limits _{i\in I} \left[ \sum \limits _{j \in J} x^i_{j} \, p_j\left( \sum \limits _{\ell \in I} x^\ell _{j}\right) - \sum \limits _{j \in J} y^i_{j} \, p_j\left( y^i_{j}+\sum \limits _{\ell \in I, \ell \ne i} x^\ell _{j}\right) \right. \\&\left. + \sum \limits _{a \in \mathcal {A}} s_a \, (w_a^i-v_a^i) + \pi _i\left( \sum \limits _{j \in J} y^i_{j}\right) -\pi _i\left( \sum \limits _{j \in J}x^i_{j}\right) \right] \\ \end{aligned}$$

and the feasible set C is defined by constraints (36) and (37).
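The feasible set of each firm is polyhedral and easy to assemble in practice. The following sketch builds the node-arc incidence matrix and checks the flow-conservation constraints (36) and the capacity bound (37) for a small hypothetical network (not the one of Fig. 6): four nodes, one firm at node 0 and markets at nodes 2 and 3; the arc list and the capacity value are illustrative assumptions.

```python
import numpy as np

arcs = [(0, 1), (1, 2), (1, 3), (0, 2)]  # hypothetical arc list (tail, head)
n_nodes, firm, markets = 4, 0, [2, 3]

# Node-arc incidence matrix E: E[k, a] = -1 if arc a leaves node k,
# +1 if it enters node k, 0 otherwise, so (E v)_k is the net inflow at k.
E = np.zeros((n_nodes, len(arcs)))
for a, (tail, head) in enumerate(arcs):
    E[tail, a] = -1.0
    E[head, a] = 1.0

def conservation_residual(x, v):
    # x[j] = quantity supplied to the j-th market, v[a] = flow on arc a.
    # Returns (E v)_k minus the right-hand side prescribed by (36);
    # a feasible pair (x, v) makes every entry zero.
    rhs = np.zeros(n_nodes)
    rhs[firm] = -x.sum()
    for j, k in enumerate(markets):
        rhs[k] = x[j]
    return E @ v - rhs

def capacity_ok(x, q):
    # Production bound (37): total supplied quantity at most q.
    return x.sum() <= q
```

For instance, supplying 2 units to node 2 (via arc (0, 2)) and 3 units to node 3 (via arcs (0, 1) and (1, 3)) satisfies (36) with zero residual.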

Fig. 6 Transportation network in Example 3

Example 3

Let us consider the transportation network in Fig. 6, where \(I=\{1,2\}\), \(J=\{6,7\}\) and the number associated with each arc a denotes the unitary transportation cost \(s_a\).

We assume that the production bounds are \(q^1=70\) and \(q^2=40\); that both markets have the same inverse demand function:

$$\begin{aligned} p_j(z) = p(z) := \rho ^{1/\tau }(z+\sigma )^{-1/\tau }, \qquad j \in J, \end{aligned}$$

with \(\rho =5000\), \(\tau =1.1\) and \(\sigma =0.01\) (see, e.g., Murphy et al. 1982), and that the production cost functions have the form

$$\begin{aligned} \pi _i(z) := \gamma _i \, z + (1+\delta _i)^{-1} K_i^{-\delta _i} z^{1+\delta _i}, \qquad i \in I, \end{aligned}$$

where parameters \(\gamma _i\), \(\delta _i\) and \(K_i\) are reported in Table 2.
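The two parametric families of functions above are straightforward to implement. In the sketch below, \(\rho \), \(\tau \) and \(\sigma \) are the values given in the text, while the sample values of \(\gamma \), \(\delta \) and K used in the usage note are placeholders (Table 2 is not reproduced here).

```python
import numpy as np

rho, tau, sigma = 5000.0, 1.1, 0.01  # inverse demand parameters from the text

def p(z):
    # Inverse demand: unit price at which a market absorbs the quantity z.
    return rho ** (1.0 / tau) * (z + sigma) ** (-1.0 / tau)

def pi(z, gamma, delta, K):
    # Production cost of a firm with parameters (gamma, delta, K).
    return gamma * z + (1.0 + delta) ** (-1) * K ** (-delta) * z ** (1.0 + delta)
```

With \(\tau >1\) the revenue \(z\,p(z)\) is concave in z, which is the property used below to verify condition (34); for hypothetical parameters such as \(\gamma =5\), \(\delta =1\), \(K=10\), the cost \(\pi \) is convex and increasing.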

The functions p and \(\pi _i\) are convex and differentiable; moreover, the function \(z \mapsto z\,p(z)\) is concave, so the bifunction \(f(\cdot ,(y,w))\) is concave for any \((y,w)\). Therefore, condition (34) is fulfilled and the convergence of the modified descent algorithm proposed in Bigi et al. (2009) is guaranteed. We implemented this algorithm in MATLAB, exploiting the built-in function fmincon from the Optimization Toolbox to evaluate the regularized gap function \(\mathbf {p}_\alpha \) and to compute the search direction \(y_\alpha (x)-x\). Table 3 reports the quantities supplied by each firm to each market in the first 10 iterations of the algorithm, starting from zero total production.

Table 2 Parameters of cost functions \(\pi _i\) in Example 3
Table 3 Numerical results of the modified descent algorithm proposed in Bigi et al. (2009) applied to Example 3

6 Concluding remarks

Merit functions have been introduced for a number of variational mathematical models. In this paper, we focused on three of the most important ones: variational inequalities, quasi-variational inequalities and abstract equilibrium problems. Among other relevant models we recall set-valued variational inequalities, vector variational inequalities and generalized Nash equilibrium problems.

The merit function approach has been extensively developed for VIs in the last two decades, while it is still at a rather early stage for more general problems. We believe that this is partially due to the complexity of these problems, but above all to the fact that many of their real-world applications have arisen only recently. Therefore, there are still many challenging open problems regarding merit functions which are worth investigating.

Merit functions for QVIs need to be further investigated: for instance, the general conditions under which the stationary points of the regularized gap function are solutions to (QVI) should be analyzed in more depth for different classes of problems, according to the set-valued mapping defining the feasible region. Furthermore, to the best of our knowledge, no ad hoc descent method based on merit functions has been developed so far.

Regarding abstract equilibrium problems, the convergence of descent methods is usually based on differentiability assumptions. We think that some effort should be devoted to developing algorithms for nonsmooth problems, which include nonsmooth Nash equilibrium problems as special cases. Moreover, it would be interesting to extend the Moreau–Yosida regularization to merit functions for abstract equilibrium problems. Finally, we believe that new merit functions could be developed without assuming the convexity of \(f(x,\cdot )\): this might make it possible to extend the merit function approach to nonconvex Nash equilibrium problems.