1 Introduction

Given a nonempty set \(K \subseteq {\mathbb {R}}^{n}\) and an equilibrium bifunction f on K, i.e., a bifunction \(f{{:}}\,{\mathbb {R}}^{n} \times {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) with \(f(x,x)=0\) for all \(x\in K\), the equilibrium problem is defined by

$$\begin{aligned} {\text {find}} \, x \in K \, {\text {such}\,\text {that}} \,\, f(x,y)\ge 0, \quad \forall \, y \in K. \end{aligned}$$
(EP)

As was noted in [13], equilibrium problems encompass several problems found in fixed point theory, continuous optimization and nonlinear analysis, e.g. minimization problems, linear complementarity problems, variational inequalities (VIs from now on) and vector optimization problems, among others.

On the other hand, and mainly motivated by real-life problems, quasi-variational inequalities (QVIs from now on [16]) have been introduced and studied extensively in recent years. Recall that, given a point-to-set operator \(K{{:}}\,{\mathbb {R}}^{n}\rightrightarrows {\mathbb {R}}^{n}\) and a point-to-point operator \(F{{:}}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\), the QVI problem consists of

$$\begin{aligned} {\text {find}} \, x\in K(x) \, {\text {such}\,{\text {that}}} \,\, \langle F(x), y - x \rangle \ge 0, \quad \forall \, y \in K(x). \end{aligned}$$
(QVI)

We say that a point \(x \in {\mathbb {R}}^{n}\) is feasible if \(x \in K(x)\). If \(K(x) = K\) for all x, with K a fixed set, then (QVI) reduces to the usual variational inequality problem, which is also a particular case of the equilibrium problem (EP) with \(f(x,y) = \langle F(x), y - x \rangle\).
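To make the reduction concrete, the following sketch checks numerically that the bifunction \(f(x,y)=\langle F(x), y-x\rangle\) is an equilibrium bifunction and that the VI solution also solves the associated (EP). The one-dimensional instance with \(F(x)=x\), \(K=[1,2]\) and solution \(x^{*}=1\) is an assumed illustration, not taken from the text.

```python
# VI as a particular EP: f(x, y) = <F(x), y - x> satisfies f(x, x) = 0,
# and x* solves the VI on K iff it solves the associated (EP).
# Assumed 1-D instance for this sketch: F(x) = x, K = [1, 2], solution x* = 1.

def F(x):
    return x

def f(x, y):
    return F(x) * (y - x)

x_star = 1.0
assert f(x_star, x_star) == 0.0               # equilibrium bifunction property

grid = [1.0 + 0.01 * k for k in range(101)]   # sample of K = [1, 2]
assert all(f(x_star, y) >= 0.0 for y in grid) # f(x*, y) >= 0 on K
print("x* = 1 solves the EP reformulation of the VI")
```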

In order to unify both approaches, the quasi-equilibrium problem (QEP from now on) has been introduced and studied. Here the problem is defined by a point-to-set operator K and an equilibrium bifunction f, where the QEP consists of

$$\begin{aligned} {\text {find}} \, x \in K(x) \, {\text {such}}\,{\text {that}} \,\, f(x,y) \ge 0, \quad \forall \, y \in K(x). \end{aligned}$$
(QEP)

Therefore, QEPs encompass both EPs and QVIs simultaneously and, by extension, minimization problems, linear complementarity problems, generalized Nash equilibrium problems (GNEPs from now on [22, 25]), and many other problems arising in economics, management and mechanics (see [9, 29, 39]). Moreover, the tools used for providing existence results and optimality conditions range from convex analysis and operator theory to generalized convexity, generalized monotonicity, fixed point theory and variational analysis; that is, such problems provide a rich area for applying theoretical results and new developments from nonlinear applied analysis (see [8, 9, 15, 21] for instance).

With respect to algorithms for solving QEPs, several developments have been made in the past 10 years. We mention here different approaches for QEPs, such as extragradient methods (see [44, 45]) and the gap function approach (see [10]). The Augmented Lagrangian method, which is also the main topic of this paper, has been developed in [43] for the usual minimization problem, and in [33] for variational inequality problems. Different variants of the Augmented Lagrangian method for QVIs may be found in [36,37,38, 41], extending the method from VIs to QVIs.

In this paper, we propose an Augmented Lagrangian algorithm for QEPs. The main feature of our algorithm is its global convergence properties under weak constraint qualifications. To this end, after a study of optimality conditions and constraint qualifications for QEPs, we adapt the so-called sequential optimality conditions from nonlinear programming to QEPs (see [1]). Furthermore, it turns out that the generalization of an Approximate-KKT (AKKT) point for QEPs, which is a natural sequential optimality condition in optimization, is not necessarily satisfied at the solutions of a general QEP. Hence, special classes of QEPs need to be studied.

This analysis is an extension of the one presented in [14] for GNEPs. The existence of (approximate) Lagrange multipliers depends on the problem formulation; hence, since the interpretation of a GNEP as a QEP depends on a reformulation, the analysis conducted in this paper is indeed necessary. Moreover, we present some numerical experiments illustrating the practical behavior of the algorithm, including examples where the AKKT condition is not satisfied. The obtained results show an interesting behavior, in which the generated sequence tends to minimize the norm of the KKT residual, even when it is not possible to make the KKT residual arbitrarily small. This type of situation has not been previously reported in the literature.

In our algorithm, we divide the constraint set K(x) into two parts and penalize only one of these parts within our (partial) Augmented Lagrangian approach. Hence, we consider a whole class of methods which are quite flexible and can take into account the special structure of the underlying QEP in a favourable way. Since Augmented Lagrangian methods are not expected to find feasible points without strong assumptions, we steer the method toward feasible points by introducing a secondary QEP as a measure of infeasibility. Hence, our global convergence theory is split into a result concerning feasibility and another one concerning optimality, as motivated by similar results in optimization (see e.g., [12]). Finally, we provide special classes of QEPs for which the resulting EP-subproblems are easy to solve under a monotonicity assumption on the Lagrangian bifunction or for which the AKKT conditions are necessarily satisfied at their solutions. The analysis of the special classes that we present here is much more comprehensive than the one presented in [14], contemplating other cases where the AKKT conditions hold and also where the subproblems are monotone, which has not been studied previously.

The paper is organized as follows. In Sect. 2, we set up notation, basic definitions and preliminaries on constraint qualifications and generalized monotonicity. In Sect. 3, we deal with QEP-tailored constraint qualifications (CQ-QEP) and we introduce the concept of Approximate Karush–Kuhn–Tucker (AKKT) condition for QEPs (AKKT-QEP). We present classes of QEPs for which AKKT-QEP is satisfied at a solution. In Sect. 4, we present our Augmented Lagrangian method. We provide a compact global convergence analysis considering both feasibility and optimality of a limit point generated by our algorithm. In Sect. 5, we deal with properties of the feasibility of QEPs and consider some special classes of QEPs via the study of monotonicity properties of its associated Lagrangian, where an example for mixed variational inequalities is also provided. Finally, in Sect. 6, some illustrative numerical results are presented.

2 Preliminaries

Given \(a \in {\mathbb {R}}\), we define \(a_{+} := \max \lbrace 0, a \rbrace\). Similarly, for a real vector x, we write \(x_{+}\) for the vector where the plus-operator is applied to each component. A vector-valued function \(\psi {{:}}\,{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{m}\) is called convex if all component functions are convex. Finally, for a continuously differentiable bifunction \(g{{:}}\,{\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\), we denote the partial (with respect to the second argument y) transposed Jacobian by \(\nabla _{y}g(x,y)\). Hence, for the ith component, \(\nabla _{y} g_{i} (x, y)\) is the gradient, viewed as a column vector. A collection \(a_1, \dots , a_m\) of vectors is called positively linearly dependent (p.l.d. from now on) if \(\sum _{i=1}^m t_i a_i = 0\) for some \(t_1 \ge 0, \dots , t_m \ge 0\) not all zero. Otherwise the collection is called positively linearly independent (p.l.i. from now on).
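The distinction between p.l.d. and p.l.i. can be illustrated with a hand-picked pair of collections in \({\mathbb {R}}^{2}\). The sketch below verifies a p.l.d. certificate for one collection and scans a coarse grid of nonnegative coefficients for the other; this is an illustration only, not a general decision procedure.

```python
# Positive linear (in)dependence, illustrated in R^2 (assumed toy data).
# p.l.d.: some nonnegative, not-all-zero coefficients sum the vectors to 0.

def combo(ts, vecs):
    # nonnegative combination sum_i t_i * a_i, componentwise
    n = len(vecs[0])
    return [sum(t * v[j] for t, v in zip(ts, vecs)) for j in range(n)]

# {(1, 0), (-1, 0)} is p.l.d.: t = (1, 1) is a valid certificate.
assert combo([1.0, 1.0], [(1.0, 0.0), (-1.0, 0.0)]) == [0.0, 0.0]

# {(1, 0), (0, 1)} is p.l.i.: t1*(1,0) + t2*(0,1) = (t1, t2) = 0 forces
# t1 = t2 = 0, so no certificate exists.  A coarse scan over t >= 0,
# t != 0 finds no combination reaching 0 (illustration, not a proof):
eps = 1e-12
pli = all(
    any(abs(c) > eps for c in combo([t1 / 10, t2 / 10], [(1.0, 0.0), (0.0, 1.0)]))
    for t1 in range(11) for t2 in range(11) if (t1, t2) != (0, 0)
)
assert pli
print("certificates verified")
```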

Consider a nonlinear programming problem with inequality constraints (for simplicity),

$$\begin{aligned} \min _{x} u(x) \,\, s.t. \, c_{i} (x) \le 0, \quad \forall \, i \in \{1, \dots , m\}, \end{aligned}$$
(1)

where \(u{{:}}\,{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) and \(c_{i}{{:}}\, {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) for \(i=1,\dots ,m\) are assumed to be continuously differentiable. Let X denote the feasible set of problem (1) and \(A(\bar{x})=\{i\mid c_i(\bar{x})=0\}\) the index set of active constraints at a point \(\bar{x}\in X\).

Definition 1

Let \(\bar{x}\in X\) be a feasible point. We say that \(\bar{x}\) satisfies the:

  1. (a)

    Linear Independence Constraint Qualification (LICQ) if the gradient vectors \(\nabla c_{i}(\bar{x})\) for \(i\in A(\bar{x})\) are linearly independent.

  2. (b)

    Mangasarian–Fromovitz Constraint Qualification (MFCQ) if the gradients \(\nabla c_{i}(\bar{x})\) for \(i\in A(\bar{x})\) are p.l.i.

  3. (c)

    Constant Positive Linear Dependence (CPLD) constraint qualification if for any subset \(I\subseteq A(\bar{x})\) such that the gradient vectors \(\nabla c_{i} (\bar{x})\) for \(i \in I\) are p.l.d., they remain p.l.d. for all x in a neighborhood of \(\bar{x}\).

  4. (d)

    Cone Continuity Property (CCP) if the set-valued mapping \(C{{:}}\,{\mathbb {R}}^{n} \rightrightarrows {\mathbb {R}}^{n}\) is outer semicontinuous at \(\bar{x}\), i.e., \(\limsup _{x\rightarrow \bar{x}} C(x) \subseteq C(\bar{x})\), where

    $$\begin{aligned}&C(x) := \left\{ w \in {\mathbb {R}}^{n}{{:}}\, w = \sum _{i \in A(\bar{x})} \lambda _{i} \nabla c_{i} (x), \, \lambda _{i} \ge 0 \right\} , \, {\mathrm{and}} \\&\limsup _{x\rightarrow \bar{x}} C(x) := \left\{ w\in {\mathbb {R}}^{n}{{:}}\, \exists \, x^{k} \rightarrow \bar{x}, \, \exists \, w^{k} \rightarrow w, w^{k} \in C(x^{k}) \right\} . \end{aligned}$$

It is known that CCP is the weakest of the conditions presented (among others), while still being a constraint qualification, implying, e.g., Abadie’s CQ (see [6]). That is, when CCP holds at a local minimizer, the KKT conditions are satisfied. On the other hand, sequential optimality conditions for constrained optimization are necessarily satisfied by local minimizers, independently of the fulfillment of constraint qualifications. These conditions are used for developing stopping criteria for several important methods, such as the Augmented Lagrangian method, and for proving global convergence results to a KKT point under a weak constraint qualification (CCP, for instance). The most popular of these sequential optimality conditions is the Approximate-KKT (AKKT) condition [1, 42], defined below:

Definition 2

(AKKT) We say that \(\bar{x} \in X\) satisfies AKKT if there exist sequences \(\{x^{k}\} \subset {\mathbb {R}}^{n}\) and \(\{\lambda ^{k}\} \subset {\mathbb {R}}^{m}_{+}\) such that \(\lim _{k \rightarrow \infty } x^{k} = \bar{x}\),

$$\begin{aligned} \lim _{k\rightarrow \infty } \Vert \nabla u(x^{k}) + \sum ^{m}_{i=1} \lambda ^{k}_{i} \nabla c_{i} (x^{k})\Vert&= 0, \, \mathrm{and} \\ \lim _{k \rightarrow \infty } \left( \min \{-c_{i} (x^{k}), \lambda ^{k}_{i}\} \right)&= 0, \quad \forall \, i \in \{1, 2, \ldots , m\}. \end{aligned}$$

Sequences \(\{x^k\}\) and \(\{\lambda ^k\}\) are called primal AKKT and dual AKKT sequences, respectively.
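As a concrete illustration of Definition 2, consider the classical instance \(\min x\) subject to \(x^{2}\le 0\) (an assumed example, not from the text): the unique feasible point \(\bar{x}=0\) is the minimizer, the KKT conditions fail there, yet the sequences \(x^{k}=-1/k\) and \(\lambda ^{k}=k/2\) satisfy both AKKT limits, with an unbounded dual sequence.

```python
# AKKT at x_bar = 0 for: min u(x) = x  s.t.  c(x) = x^2 <= 0.
# KKT fails at 0 (the gradient equation 1 + 2*lam*0 = 0 has no solution),
# but x^k = -1/k, lam^k = k/2 satisfy both AKKT limits of Definition 2.

def grad_u(x):
    return 1.0

def c(x):
    return x * x

def grad_c(x):
    return 2.0 * x

for k in (10, 100, 1000, 10000):
    xk, lk = -1.0 / k, k / 2.0
    stationarity = abs(grad_u(xk) + lk * grad_c(xk))  # = |1 - 1| = 0
    complementarity = abs(min(-c(xk), lk))            # = 1/k^2 -> 0
    assert stationarity < 1e-9
    assert complementarity <= 1.0 / k**2 + 1e-12

print("AKKT holds at 0 with an unbounded dual sequence")
```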

The following theorem states that AKKT is a true necessary optimality condition independently of the validity of any constraint qualification (see [1, 12]).

Theorem 1

Let \(\bar{x}\) be a local solution of problem (1). Then \(\bar{x}\) satisfies AKKT.

When an AKKT point is such that the corresponding dual sequence is bounded, it is clear that the point is a true KKT point. However, even in the unbounded case, one may prove that the KKT conditions hold under different assumptions. The weakest of such assumptions, independently of the objective function, is CCP. Theorem 1 is also relevant without assuming constraint qualifications, as it shows that it is possible to find a point arbitrarily close to a local solution of problem (1) that satisfies the KKT conditions up to a given tolerance \(\epsilon >0\). This result suggests the use of perturbed KKT conditions as a stopping criterion for numerical algorithms.

In our analysis, we consider the (QEP) with a continuously differentiable bifunction f, together with the multifunction K defined as

$$\begin{aligned} K(x)=\lbrace y\in {\mathbb {R}}^{n}{{:}}\,g(x,y)\le 0 \rbrace , \end{aligned}$$
(2)

where \(g{{:}}\,{\mathbb {R}}^{n}\times {\mathbb {R}}^n\rightarrow {\mathbb {R}}^{m}\) is continuously differentiable and denotes the parameterized constraints.

Note that equality constraints can also be included, but to keep the notation simple, we consider only inequality constraints. If g depends only on y, by abuse of notation, we replace g(x, y) by g(y). Thus, \(K(x) = \{ y \in {\mathbb {R}}^{n}{{:}}\, g(y) \le 0\}=K\) for all \(x\in {\mathbb {R}}^{n}\), and (QEP) reduces to (EP).

Let \(x^{*}\) be a solution of the QEP with K given as in Eq. (2). Then \(x^{*} \in K(x^{*})\) and \(f(x^{*},y)\ge 0\) for all \(y \in K(x^{*})\), or equivalently,

$$\begin{aligned} f(x^{*},y)\ge 0, \quad \forall \, y{{:}}\, g(x^{*}, y) \le 0. \end{aligned}$$

As \(f(x^{*},x^{*})=0\), it follows that \(x^{*}\) is a solution of the problem

$$\begin{aligned} \min _{y}f(x^*,y) \, {\mathrm{s.t.}} \, g_{i}(x^{*},y)\le 0, \quad \forall \, i \in \{1, \dots , m\}. \end{aligned}$$
(3)

Assuming that a suitable constraint qualification holds at the solution \(x^{*}\) with respect to the set \(K(x^{*})\subseteq {\mathbb {R}}^{n}\), it follows that there exists some Lagrange multiplier \(\lambda ^{*}\in {\mathbb {R}}^{m}_+\) such that \((x^{*},\lambda ^{*})\) satisfies the following KKT conditions:

$$\begin{aligned}&\nabla _{y} f(x^{*}, x^{*}) + \sum ^{m}_{i=1} \lambda ^{*}_{i} \nabla _{y} g_{i} (x^{*}, x^{*})= 0, \\&\lambda ^{*}_{i} \ge 0, \, g_{i}(x^{*},x^{*})\le 0, \, \lambda ^{*}_{i} g_{i} (x^{*}, x^{*}) = 0, \quad \forall \, i \in \{1, \dots , m\}. \end{aligned}$$

This motivates the following definition of the KKT system for a QEP:

Definition 3

(KKT-QEP) Consider the (QEP) with K given by (2). Then the system

$$\begin{aligned}&\nabla _{y} f(x, x) + \sum ^{m}_{i=1} \lambda _{i} \nabla _{y} g_{i} (x, x) = 0, \\&\lambda _{i} \ge 0, \, g_{i} (x, x) \le 0, \, \lambda _{i} g_{i} (x, x) = 0, \quad \forall \, i \in \{1, \dots , m\}, \end{aligned}$$

is called the KKT conditions of the underlying (QEP). Every \((x,\lambda )\) satisfying these conditions is called a KKT-QEP pair.

A QEP is said to be convex if \(f(x,\cdot )\) and \(g(x,\cdot )\) are convex for each x, a usual assumption for QEPs which we do not make here. For convex QEPs, the KKT-QEP conditions are sufficient for optimality.

Our aim is to compute a KKT-QEP point by solving a related sequence of KKT-QEP systems from (simpler) quasi-equilibrium subproblems. In fact, in our analysis we allow for inexact solutions of the underlying subproblems. The following definition, motivated by the similar concept for optimization suggested by Definition 2, introduces our notion of an \(\epsilon\)-stationary point of this QEP.

Definition 4

Consider the (QEP) with K defined by (2), and let \(\epsilon \ge 0\). We call \((x,\lambda )\), with \(\lambda \ge 0\), an \(\epsilon\)-inexact KKT-QEP pair of the (QEP) if the following inequalities hold:

$$\begin{aligned}&\Vert \nabla _{y} f(x, x) + \sum ^{m}_{i=1} \lambda _{i} \nabla _{y} g_{i} (x, x) \Vert \le \epsilon , \end{aligned}$$
(4)
$$\begin{aligned} |\min \{ -g_{i} (x, x), \lambda _{i}\}|\le \epsilon , \quad \forall \, i \in \{1, \dots , m\}. \end{aligned}$$
(5)

Note that for \(\epsilon =0\) an \(\epsilon\)-inexact KKT-QEP pair is a standard KKT-QEP pair. A limit \(\bar{x}\) of \(\epsilon\)-inexact KKT-QEP points \(\{x_\epsilon \}_{\epsilon \rightarrow 0^+}\) (with suitable multipliers \(\{\lambda _\epsilon \}_{\epsilon \rightarrow 0^+}\) that need not converge) will be called an AKKT-QEP point (see Definition 6 below).
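Conditions (4)–(5) translate directly into a stopping test. A minimal sketch (gradients supplied as callables; the one-dimensional instance below, with \(f(x,y)=x_1-y_1\) and \(g(x,y)=y_1-1\), is an assumed toy example) computes the largest violation of (4)–(5) at a candidate pair:

```python
import math

def kkt_qep_residual(grad_y_f, grad_y_g, g, x, lam):
    """Largest violation of (4)-(5) at the pair (x, lam).
    grad_y_f(x): list, gradient of f(x, .) at y = x;
    grad_y_g(x): list of lists, gradients of g_i(x, .) at y = x;
    g(x): list of constraint values g_i(x, x)."""
    n = len(x)
    gy, Gy, gv = grad_y_f(x), grad_y_g(x), g(x)
    station = [gy[j] + sum(l * Gi[j] for l, Gi in zip(lam, Gy)) for j in range(n)]
    r1 = math.sqrt(sum(s * s for s in station))          # violation of (4)
    r2 = max(abs(min(-gi, li)) for gi, li in zip(gv, lam))  # violation of (5)
    return max(r1, r2)

# Assumed toy QEP on R: f(x, y) = x1 - y1, g(x, y) = y1 - 1, so
# K(x) = {y : y1 <= 1}.  Then grad_y f = (-1,), grad_y g = (1,),
# and x = (1,), lam = (1,) is an exact KKT-QEP pair: residual 0.
res = kkt_qep_residual(lambda x: [-1.0],
                       lambda x: [[1.0]],
                       lambda x: [x[0] - 1.0],
                       [1.0], [1.0])
assert res == 0.0
print("exact KKT-QEP pair detected")
```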

We finish this section with the following monotonicity notions which will be relevant in our forthcoming analysis.

Definition 5

Let S be a nonempty subset of \({\mathbb {R}}^{n}\). Then an equilibrium bifunction \(f{{:}}\,S \times S\rightarrow {\mathbb {R}}\) is said to be

  1. (a)

    strongly monotone on S, if there exists a constant \(\gamma >0\) such that

    $$\begin{aligned} f(x,y)+f(y,x)\le -\gamma \Vert x-y\Vert ^{2}, \quad \forall \, x,y\in S; \end{aligned}$$
    (6)
  2. (b)

monotone on S, if

    $$\begin{aligned} f(x, y) + f(y, x) \le 0, \quad \forall \, x, y \in S, \end{aligned}$$
    (7)

    while f is strictly monotone on S, if the previous inequality is strict whenever \(y \ne x\);

  3. (c)

    pseudomonotone on S, if for every \(x, y \in S\),

    $$\begin{aligned} f(x, y) \ge 0 \, \Longrightarrow \, f(y, x) \le 0; \end{aligned}$$
    (8)
  4. (d)

    \(\nabla _{xy}\)-monotone on S, if the mapping \(\nabla _{x}f(x,\cdot )\) is monotone on S for all \(x \in S\), that is,

    $$\begin{aligned} \langle \nabla _{x} f(x, y) - \nabla _{x} f(x, z), y - z \rangle \ge 0, \quad \forall \, x, y, z \in S, \end{aligned}$$

    while f is strictly \(\nabla _{xy}\)-monotone on S, if the previous inequality is strict whenever \(y \ne z\).

Clearly, every strongly monotone bifunction is strictly monotone and every monotone bifunction is pseudomonotone. If f is (strictly) \(\nabla _{xy}\)-monotone on S, then f is (strictly) monotone on S by [11, Theorem 3.1].
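For the VI bifunction \(f(x,y)=\langle F(x), y-x\rangle\) with \(F(x)=Ax\) and A positive semidefinite (an assumed instance), the identity \(f(x,y)+f(y,x)=-(y-x)^{\top }A(y-x)\le 0\) gives monotonicity in the sense of (b); the sketch below checks this on random samples:

```python
import random

# Assumed instance: F(x) = A x with A symmetric positive definite,
# so f(x, y) = <F(x), y - x> is monotone in the sense of (7):
# f(x, y) + f(y, x) = -(y - x)^T A (y - x) <= 0.
A = [[2.0, 1.0], [1.0, 2.0]]

def matvec(M, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x, y):
    return dot(matvec(A, x), [yi - xi for yi, xi in zip(y, x)])

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(2)]
    y = [random.uniform(-5, 5) for _ in range(2)]
    assert f(x, y) + f(y, x) <= 1e-9   # monotonicity (7), up to rounding

print("monotonicity (7) verified on random samples")
```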

For a further study on generalized monotonicity we refer to [11, 28].

3 Approximate-Karush–Kuhn–Tucker condition and constraint qualifications for QEPs

In many problems, it is natural to stop the execution of an algorithm when a stationarity measure is approximately satisfied. In this section we will show that, for QEPs, this procedure may exclude solutions a priori. This is in contrast with what is known in nonlinear programming.

We begin by extending the concept of AKKT for QEPs (AKKT-QEP), followed by the study of some important cases of QEPs where this condition is necessarily satisfied at a solution, whereas we show that this does not happen in general. Then we will see the relationship that exists between AKKT-QEP and constraint qualifications for QEPs, together with the Augmented Lagrangian method that will be presented in the next section. This analysis yields a global convergence proof to a KKT-QEP point under a weak constraint qualification.

Note that one may include equality constraints explicitly in the definition, without separating them into two inequalities, and the computations carry over similarly to the case of optimization.

Definition 6

(AKKT-QEP) Consider the (QEP) with K defined by (2). We say that a feasible \(\bar{x} \in {{\mathbb {R}}}^n\) satisfies AKKT-QEP if there exist two sequences \(\{ x^{k}\} \subset {\mathbb {R}}^{n}\) and \(\{ \lambda ^{k} \} \subset {\mathbb {R}}^{m}_{+}\) such that \(x^{k} \rightarrow \overline{x}\),

$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert \nabla _{y} f(x^{k}, x^{k}) + \sum ^{m}_{i=1} \lambda ^{k}_{i} \nabla _{y} g_{i} (x^{k}, x^{k}) \Vert = 0, \, {\mathrm{and}} \end{aligned}$$
(9)
$$\begin{aligned} \lim _{k \rightarrow \infty } \left( \min \{-g_{i} (x^{k}, x^{k}), \lambda ^{k}_{i} \} \right) = 0, \quad \forall \, i = 1, \dots , m. \end{aligned}$$
(10)

Sequences \(\{x^k\}\) and \(\{\lambda ^k\}\) are called (primal) AKKT-QEP and dual AKKT-QEP, respectively.

Example 1

AKKT-QEP is not necessarily satisfied at a solution. Indeed, consider \(f, g{{:}}\,{\mathbb {R}} \times {\mathbb {R}} \rightarrow {\mathbb {R}}\) given by \(f(x,y) = -x+y\), \(g(x,y) = \frac{1}{2} (x-y)^{2}\), and \(K(x) = \{y \in {\mathbb {R}}{{:}}\, \, g(x, y) \le 0\} = \{x\}\). Clearly, the solution set of (QEP) is the whole real line. Take any solution \(x^{*} \in {\mathbb {R}}\). If \(x^{*}\) were an AKKT-QEP point, then we could find sequences \(\{x^k\}\subset {\mathbb {R}}\) and \(\{\lambda ^k\} \subset {\mathbb {R}}_+\) such that \(|1 + \lambda ^{k}(x^{k}-x^{k}) |\rightarrow 0,\) which is impossible since \(\lambda ^{k}(x^{k}-x^{k}) = 0\) for all k. Hence \(x^{*}\) is not an AKKT-QEP point.
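The failure in Example 1 can be verified mechanically: \(\nabla _{y} f\equiv 1\) while \(\nabla _{y} g(x,y)=y-x\) vanishes at \(y=x\), so the stationarity residual in (9) equals 1 for every x and every multiplier, as the following sketch confirms:

```python
# Example 1: f(x, y) = -x + y, g(x, y) = (x - y)^2 / 2, K(x) = {x}.
# grad_y f(x, y) = 1 and grad_y g(x, y) = y - x, which is 0 at y = x,
# so the AKKT-QEP stationarity residual |1 + lam * 0| = 1 for all x, lam.

def stationarity_residual(x, lam):
    grad_y_f = 1.0
    grad_y_g = x - x           # grad_y g evaluated at y = x: always 0
    return abs(grad_y_f + lam * grad_y_g)

for x in (-10.0, 0.0, 3.14):
    for lam in (0.0, 1.0, 1e6, 1e12):
        assert stationarity_residual(x, lam) == 1.0

print("residual is identically 1: AKKT-QEP cannot hold")
```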

The previous example is similar to [14, Example 5.3]. Note that the fact that the AKKT condition is not necessary for GNEPs does not directly imply the result for QEPs, since a reformulation of the GNEP is required to solve it as a QEP. On the other hand, writing the KKT conditions of both problems, it is not difficult to see that the KKT residual in one problem tends to zero if and only if it goes to zero in the other as well. So the reformulation of the GNEP presented in [14, Example 5.3] would also illustrate the failure of AKKT for QEPs. However, Example 1 is simpler, with a single variable and constraint, and the sign of the multipliers does not play a crucial role, as it does in the examples in [14]. This indicates that the AKKT-QEP condition would not hold even if the set K(x) were described by the equality constraint \(g(x, y) = (x-y)^{2} = 0\).

In [24, 36, 38], some important classes of QVIs were analyzed in the study of the Augmented Lagrangian method and of a method based on a potential reduction approach for solving the KKT system of a QVI. Let us show that for some of these classes, generalized to QEPs, as well as for some other classes, AKKT-QEP necessarily holds at a solution.

Theorem 2

Consider (QEP) where the constraints have the structure

$$\begin{aligned} g_i(x,y)=g^{1}_{i}(x)g^{2}_{i}(y)+g^{3}_{i}(x), \quad \forall \, i \in \{1, \dots , m\}, \end{aligned}$$
(11)

with continuously differentiable functions \(g^{1}_{i}, g^{2}_{i}, g^{3}_{i}{{:}}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) and \(x,y\in {\mathbb {R}}^n\). If \(\bar{x}\) is a solution, then the AKKT-QEP condition holds at \(\bar{x}\).

Proof

We have that \(\bar{x}\) is a solution of the following optimization problem:

$$\begin{aligned} \min _{y} f(\bar{x},y) \,\, \mathrm{s.t.} \, g^{1}_{i}(\bar{x}) g^{2}_{i} (y) + g^{3}_{i} (\bar{x})\le 0, \quad \forall \, i \in \{1, \dots , m\}. \end{aligned}$$

Denoting \(A(\bar{x})=\{i{{:}}\,g_i(\bar{x},\bar{x})=0\}\), by Theorem 1, there exist sequences \(\{ x^{k}\} \subset {\mathbb {R}}^{n}\) and \(\{\lambda ^{k}\} \subset {\mathbb {R}}^{\vert A(\bar{x}) \vert }_{+}\) such that \(x^{k} \rightarrow \bar{x}\) and

$$\begin{aligned} \Vert \nabla _{y} f(\bar{x},x^{k}) + \sum _{i\in A(\bar{x})} \lambda ^{k}_{i} g^{1}_{i} (\bar{x}) \nabla g^{2}_{i}(x^{k}) \Vert \rightarrow 0, \end{aligned}$$
(12)

where the multipliers \(\lambda _i^k\) for \(i\not \in A(\bar{x})\), which tend to zero, were equivalently replaced by the null sequence. Without loss of generality, one could also redefine, if necessary, \(\lambda ^{k}_{i}=0\) if \(g^{1}_{i}(\bar{x})=0\), and (12) would still hold. Let us define for k large enough and all \(i\in A(\bar{x})\):

$$\begin{aligned} \bar{\lambda }^{k}_i:= \left\{ \begin{array}{ll} 0, &{}\quad {\text {if}} \, g^{1}_{i}(x^{k})=0,\\ \lambda ^{k}_{i} \frac{g^{1}_{i}(\bar{x})}{g^{1}_{i}(x^{k})}, &{}\quad {\text {otherwise. }} \end{array} \right. \end{aligned}$$

Note that \(x^{k}\rightarrow \bar{x}\) and, for k large enough, \(\bar{\lambda }^{k}_i\) has the same sign as \(\lambda ^{k}_{i}\). Moreover, since \(\nabla _{y} g_{i} (x, y) = g^{1}_{i} (x) \nabla g^{2}_{i} (y)\), we have

$$\begin{aligned} \bar{\lambda }^{k}_i\nabla _{y}g_{i}(x^k,x^{k})=\lambda ^{k}_{i} g^{1}_{i}(\bar{x}) \nabla g^{2}_{i}(x^{k}). \end{aligned}$$

Therefore, by (12) and the triangle inequality,

$$\begin{aligned}&\Vert \nabla _{y}f(x^k,x^{k}) + \sum _{i\in A(\bar{x})} \bar{\lambda }^{k}_{i}\nabla _{y}g_{i}(x^k,x^{k})\Vert \\&\quad \le \Vert \nabla _{y}f(x^k,x^{k})-\nabla _{y}f(\bar{x},x^{k})\Vert + \Vert \nabla _{y}f(\bar{x},x^{k}) +\sum _{i\in A(\bar{x})} \lambda ^{k}_{i} g^{1}_{i}(\bar{x}) \nabla g^{2}_{i}(x^{k}) \Vert \rightarrow 0, \end{aligned}$$

and so \(\bar{x}\) is an AKKT-QEP point. \(\square\)
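The multiplier rescaling in the proof can be checked numerically on an assumed one-dimensional instance of the structure (11), here with \(g^{1}(x)=x\), \(g^{2}(y)=y^{2}\) and \(g^{3}(x)=-x\):

```python
# Multiplier rescaling from the proof of Theorem 2, on an assumed
# 1-D instance g(x, y) = g1(x) * g2(y) + g3(x) with
# g1(x) = x, g2(y) = y**2, g3(x) = -x, so grad_y g(x, y) = g1(x) * 2*y.

def g1(x): return x
def grad_g2(y): return 2.0 * y

x_bar = 2.0
for k in (1, 10, 100):
    xk = x_bar + 1.0 / k                   # x^k -> x_bar
    lam_k = float(k)                       # some nonnegative multiplier
    lam_bar = lam_k * g1(x_bar) / g1(xk)   # rescaled multiplier
    lhs = lam_bar * g1(xk) * grad_g2(xk)   # lam_bar * grad_y g(x^k, x^k)
    rhs = lam_k * g1(x_bar) * grad_g2(xk)  # lam_k * g1(x_bar) * grad g2(x^k)
    assert abs(lhs - rhs) < 1e-9           # the identity used in the proof

print("rescaling identity verified")
```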

Note that, setting \(g_i^1(x)=1\) and \(g_i^3(x)=0\) for all i, we obtain the classical EP. Moreover, Theorem 2 also includes QEPs with linear constraints with a variable right-hand side (see [24, 38]), that is,

$$\begin{aligned} g(x,y)=Ay-b(x), \end{aligned}$$
(13)

where \(A\in {\mathbb {R}}^{m\times n}\) and \(b{{:}}\,{\mathbb {R}}^{ n}\rightarrow {\mathbb {R}}^{m}\) is a given continuously differentiable function. A particularly important class of problems of type (13) are the QEPs with box constraints, that is,

$$\begin{aligned} g(x,y)=\begin{pmatrix} b_l(x)-y \\ y-b_u(x) \end{pmatrix}, \end{aligned}$$
(14)

where \(b_l,b_u{{:}}\,{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{n}\) are given mappings which describe lower and upper bounds on the variable y, which may depend on x.

In fact, one may show the validity of AKKT-QEP for more general constraints than (13), which are not covered by Theorem 2. For this, it is enough to have a constraint qualification holding for the set \(K(\bar{x})\) with \(\bar{x}\) fixed; then \(\bar{x}\) satisfies KKT-QEP and thus AKKT-QEP with constant sequences. For example, consider a problem where the constraints are of the form

$$\begin{aligned} g(x,y)=M(x)y-b(x), \end{aligned}$$
(15)

where \(M{{:}}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m \times n}\) and \(b{{:}}\,{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{m}\) are continuously differentiable functions. This class includes QEPs with bilinear constraints (see [24, 36, 38]), that is,

$$\begin{aligned} g(x, y) = \begin{pmatrix} x^{\top }Q_{1}y-b_{1} \\ \vdots \\ x^{\top } Q_{m} y - b_{m} \end{pmatrix}, \end{aligned}$$
(16)

where each \(Q_{i}\in {\mathbb {R}}^{n\times n}\), \(i = 1,\ldots ,m\), is a symmetric matrix and \(b_{i}\in {\mathbb {R}}\) are given real numbers. To see this, set M(x) as the matrix whose ith row is \(x^{\top }Q_i\).

For a fixed \(\bar{x}\), the constraints \(g(\bar{x},y) \le 0\) are linear in y, and hence they satisfy a constraint qualification. So every solution \(\bar{x}\) of (3) is a KKT point associated with some Lagrange multiplier \(\bar{\lambda }\). Taking \(x^k=\bar{x}\) and \(\lambda ^k=\bar{\lambda }\) for all k, we have the desired result.
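The identification of the bilinear constraints (16) with the form (15) is easy to verify numerically: with M(x) the matrix whose ith row is \(x^{\top }Q_{i}\), one has \(M(x)y-b=g(x,y)\) componentwise. A small sketch with assumed data:

```python
import random

# Bilinear constraints (16): g_i(x, y) = x^T Q_i y - b_i.
# Setting row i of M(x) to x^T Q_i gives the form (15): g(x, y) = M(x) y - b.
Q = [[[1.0, 0.5], [0.5, 2.0]],   # Q_1 (symmetric, assumed data)
     [[0.0, 1.0], [1.0, 0.0]]]   # Q_2
b = [1.0, -3.0]

def quad(x, Qi, y):
    # x^T Q_i y
    return sum(x[r] * Qi[r][c] * y[c] for r in range(2) for c in range(2))

def M(x):
    # row i of M(x) is x^T Q_i
    return [[sum(x[r] * Qi[r][c] for r in range(2)) for c in range(2)] for Qi in Q]

random.seed(1)
x = [random.uniform(-2, 2) for _ in range(2)]
y = [random.uniform(-2, 2) for _ in range(2)]
Mx = M(x)
for i in range(2):
    lhs = sum(Mx[i][c] * y[c] for c in range(2)) - b[i]  # (M(x) y - b)_i
    rhs = quad(x, Q[i], y) - b[i]                        # g_i(x, y) from (16)
    assert abs(lhs - rhs) < 1e-9

print("bilinear constraints match the M(x) y - b form")
```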

Next, let us consider (QEP) with binary constraints [24, 38], that is, each continuously differentiable constraint \(g_{i}(x,y)\) depends on a single pair \((x_{j},y_{j})\) for some \(j = j(i) \in \{1, \ldots , n\}\). Then

$$\begin{aligned} K(x) = \left\{ y \in {\mathbb {R}}^{n}{{:}}\,g_{i}(x_{j(i)}, y_{j(i)}) \le 0, \quad \forall \, i \in \{1, \dots , m\} \right\} . \end{aligned}$$
(17)

In this class of problems, each constraint thus couples x and y only through a single coordinate pair. In this case, let us see that AKKT-QEP is necessary at a solution.

Theorem 3

Consider problem (QEP) with K(x) as in (17). Let \(\bar{x}\) be a solution. Then \(\bar{x}\) is an AKKT-QEP point.

Proof

Since \(\bar{x}\) is a solution of the optimization problem

$$\begin{aligned} \min _{y} f(\bar{x},y) \,\, \mathrm{s.t.} \, g_{i} (\bar{x}_{j(i)}, y_{j(i)}) \le 0, \quad \forall \, i \in \{1, \dots , m\}, \end{aligned}$$

by Theorem 1, there exist \(\lbrace x^{k} \rbrace \subset {\mathbb {R}}^{n}\) and \(\lbrace \lambda ^{k} \rbrace \subset {\mathbb {R}}^{\vert A(\bar{x}) \vert }_{+}\) such that \(x^{k}\rightarrow \bar{x}\) and

$$\begin{aligned} \Vert \nabla _{y}f(\bar{x},x^{k}) +\sum _{i\in A(\bar{x})} \lambda ^{k}_{i}\nabla _{y}g_{i}(\bar{x},x^{k}) \Vert \rightarrow 0, \end{aligned}$$
(18)

where \(\nabla _{y}g_{i}(x,y)= \left( 0, \ldots , 0, \partial _{y_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}), 0, \ldots , 0\right) ^{\top }.\) Once again, note that we can redefine \(\lambda ^{k}_{i}=0\) if \(\nabla _{y}g_{i}(\bar{x},x^k)=0\). Now, define

$$\begin{aligned} \bar{\lambda }^{k}_i := \left\{ \begin{array}{ll} 0, &{}\quad {\text {if}} \, \partial _{y_{j(i)}} g_{i} (x^{k}_{j(i)}, x^{k}_{j(i)})=0,\\ \lambda ^{k}_{i} \frac{\partial _{y_{j(i)}} g_{i} (\bar{x}_{j(i)}, x^{k}_{j(i)})}{\partial _{y_{j(i)}} g_{i} (x^{k}_{j(i)}, x^{k}_{j(i)})}, &{}\quad {\text {otherwise.}} \\ \end{array} \right. \end{aligned}$$

Note that \(x^{k}\rightarrow \bar{x}\) and that \(\bar{\lambda }^{k}_i\) has the same sign as \(\lambda ^{k}_{i}\) for k large enough. Moreover,

$$\begin{aligned} \bar{\lambda }^{k}_i\nabla _{y}g_{i}(x^k,x^{k})= \lambda ^{k}_{i}\nabla _{y}g_{i}(\bar{x},x^{k}). \end{aligned}$$

Therefore, by (18), the continuity of \(\nabla _y f\) and the triangle inequality,

$$\begin{aligned}&\Vert \nabla _{y}f(x^k,x^{k}) + \sum _{i\in A(\bar{x})} \bar{\lambda }^{k}_{i} \nabla _{y} g_{i} (x^k,x^{k}) \Vert \\ & \quad \le \Vert \nabla _{y} f(x^k, x^{k}) - \nabla _{y} f(\bar{x}, x^{k})\Vert + \Vert \nabla _{y} f(\bar{x}, x^{k}) + \sum _{i\in A(\bar{x})} \lambda ^{k}_{i} \nabla _{y} g_{i} (\bar{x}, x^{k}) \Vert \rightarrow 0, \end{aligned}$$

so \(\bar{x}\) is an AKKT-QEP point. \(\square\)

Remark 1

Interpreting the example in [40] as an equilibrium problem in a natural way, we see that AKKT-QEP is not sufficient for optimality, even in the case of a convex problem.

In nonlinear optimization, a constraint qualification is needed for ensuring that a solution satisfies the KKT conditions. The same holds true for a solution of a QEP to satisfy KKT-QEP. Since AKKT is a necessary optimality condition, any property of the feasible set that guarantees that an AKKT point is a KKT point is actually a constraint qualification. A constraint qualification with this property has been called a strict constraint qualification in [5].

On the other hand, algorithms for nonlinear optimization usually generate sequences whose limit points satisfy AKKT. From [2,3,4,5,6], it is well known that a separate analysis of the sequences generated by the algorithm, together with the (strict) constraint qualification needed for a limit point to satisfy KKT, yields global convergence results to a KKT point under weak constraint qualifications.

In the context of QEPs, the fact that AKKT-QEP is not necessarily satisfied at a solution has some drawbacks for algorithms that generate AKKT-QEP sequences (see [14] for a discussion around this issue in the context of GNEPs). Moreover, for an algorithm that generates AKKT-QEP sequences, conditions for ensuring that AKKT-QEP points are KKT-QEP are an important issue. These conditions are weaker than the usual MFCQ for QEPs. Therefore, this analysis provides new global convergence results for QEPs under weaker assumptions.

We list some relevant conditions below:

Definition 7

Consider a continuously differentiable constraint bifunction \(g{{:}}\,{\mathbb {R}}^{n}\times {\mathbb {R}}^n \rightarrow {\mathbb {R}}^{m}\) and a feasible point \(\bar{x}\in {{\mathbb {R}}}^n\). We say that:

  1. (a)

    LICQ-QEP holds at \(\bar{x}\) if \(\{\nabla _{y} g_{i} (\bar{x}, \bar{x}){{:}}\, i \in A(\bar{x})\}\) is linearly independent.

  2. (b)

    MFCQ-QEP holds at \(\bar{x}\) if \(\{\nabla _{y} g_{i} (\bar{x}, \bar{x}){{:}}\, i \in A(\bar{x})\}\) is p.l.i.

  3. (c)

WCPLD-QEP holds at \(\bar{x}\) if there exists a neighborhood \(U \subseteq {\mathbb {R}}^{n}\) of \(\bar{x}\) such that, if \(I \subseteq A(\bar{x})\) is such that \(\left\{ \nabla _{y}g_{i}(\bar{x},\bar{x} )\right\} _{i\in I}\) is p.l.d., then \(\left\{ \nabla _{y}g_{i}(x,x)\right\} _{i\in I}\) is p.l.d. for all \(x\in U\).

  4. (d)

    WCCP-QEP holds at \(\bar{x}\) if the set-valued mapping \(C{{:}}\,{\mathbb {R}}^{n}\rightrightarrows {\mathbb {R}}^{n}\) is outer semicontinuous at \(\bar{x}\), that is, \(\limsup _{x\rightarrow \bar{x}}C(x)\subseteq C(\bar{x})\), where

    $$\begin{aligned} C(x)&= \left\{ w \in {\mathbb {R}}^{n}{{:}}\, w = \sum _{i \in A(\bar{x})} \lambda _{i} \nabla _{y} g_{i} (x, x), \, \lambda _{i} \ge 0 \right\} , {\mathrm{\, and \,}}\\&\limsup _{x \rightarrow \bar{x}} C(x) = \left\{ w \in {\mathbb {R}}^{n}{{:}}\, \, \exists \, x^{k} \rightarrow \bar{x}, \, \exists \, w^{k} \rightarrow w, \, w^{k} \in C(x^{k}) \right\} . \end{aligned}$$

When \(\bar{x}\in {\mathbb {R}}^n\) is not necessarily feasible, we say that

  1. (e)

    EMFCQ-QEP (Extended-MFCQ-QEP) holds at \(\bar{x}\) if \(\{ \nabla _{y} g_{i} (\bar{x}, \bar{x}){{:}}\, i \in A_E(\bar{x})\}\) is p.l.i., where \(A_E(\bar{x})=\{ i{{:}}\, g_{i} (\bar{x},\bar{x}) \ge 0 \}\).

To show that LICQ-QEP, MFCQ-QEP and EMFCQ-QEP are CQs for QEPs, it is enough to show that each property implies the corresponding optimization CQ for problem (3) at \(y=\bar{x}\). However, Example 1 shows that WCPLD and WCCP are not CQs for QEPs, since the KKT conditions do not hold at any solution of the problem.
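Positive linear (in)dependence can be checked computationally: under the standard definition, a family of gradients is p.l.d. exactly when some nonnegative, not-all-zero combination of them vanishes, which, after normalizing the multipliers to sum to one, becomes a linear programming feasibility problem. A minimal sketch in Python, assuming SciPy is available; the helper `positively_linearly_dependent` is illustrative and not part of the paper:

```python
import numpy as np
from scipy.optimize import linprog

def positively_linearly_dependent(vectors, tol=1e-9):
    """Check whether {v_1,...,v_k} is p.l.d., i.e. some nonnegative,
    not-all-zero combination sums to zero.  Normalizing the multipliers
    to sum to one turns this into an LP feasibility problem."""
    V = np.column_stack(vectors)          # n x k matrix of gradients
    k = V.shape[1]
    A_eq = np.vstack([V, np.ones((1, k))])
    b_eq = np.append(np.zeros(V.shape[0]), 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.status == 0                # LP feasible  <=>  p.l.d.

# {e1, e2} is p.l.i., while {e1, -e1} is p.l.d.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(positively_linearly_dependent([e1, e2]))    # False
print(positively_linearly_dependent([e1, -e1]))   # True
```

In practice, the input vectors would be the active-constraint gradients \(\nabla _{y}g_{i}(\bar{x},\bar{x})\), \(i \in A(\bar{x})\), so MFCQ-QEP corresponds to the function returning `False`.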

In order to obtain a CQ, we proceed as in [14], i.e., we require validity on a neighborhood of \((\bar{x},\bar{x})\) in \({\mathbb {R}}^{n} \times {\mathbb {R}}^{n}\), and not only at points of the form \((x,x)\). That is, we arrive at the following constraint qualifications.

Definition 8

Consider a continuously differentiable constraint bifunction \(g{{:}}\,{\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\) and a feasible point \(\bar{x}\in {{\mathbb {R}}}^n\). We say that:

  1. (a)

    CPLD-QEP holds at \(\bar{x}\) if there exists a neighborhood \(U \subseteq {\mathbb {R}}^{n} \times {\mathbb {R}}^{n}\) of \((\bar{x},\bar{x})\) such that, whenever \(I \subseteq A(\bar{x})\) is such that \(\left\{ \nabla _{y} g_{i} (\bar{x}, \bar{x})\right\} _{i \in I}\) is p.l.d., the set \(\left\{ \nabla _{y}g_{i}(x,y)\right\} _{i\in I}\) is p.l.d. for all \((x,y)\in U\).

  2. (b)

    CCP-QEP holds at \(\bar{x}\) if the set-valued mapping \(\bar{C}{{:}}\, {\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightrightarrows {\mathbb {R}}^{n}\) is outer semicontinuous at \((\bar{x},\bar{x} )\), that is, \(\limsup _{(x, y) \rightarrow (\bar{x},\bar{x})} \bar{C} (x, y) \subseteq \bar{C} (\bar{x}, \bar{x})\), where

    $$\begin{aligned} \bar{C} (x, y)&= \left\{ w \in {\mathbb {R}}^{n}{{:}}\, w = \sum _{i \in A(\bar{x})} \lambda _{i} \nabla _{y} g_{i} (x, y), \, \lambda _{i} \ge 0 \right\} , \mathrm{\, and \,} \\&\quad \limsup _{(x, y) \rightarrow (\bar{x}, \bar{x})} \bar{C} (x, y) = \left\{ w \in {\mathbb {R}}^{n}{{:}}\, \, \exists \, (x^{k}, y^k) \rightarrow (\bar{x}, \bar{x}), \, \exists \, w^{k} \rightarrow w, w^{k} \in \bar{C} (x^{k}, y^k) \right\} . \end{aligned}$$

    Clearly CPLD-QEP (CCP-QEP) implies both WCPLD-QEP (WCCP-QEP) and the traditional CPLD (CCP) in the context of optimization for the constraints \(g(\bar{x},y)\le 0\), which means that CPLD-QEP and CCP-QEP are CQs for QEPs. Using the same arguments presented in [14], we have the following strict implications:

    $$\begin{aligned} {\text {LICQ-QEP}} \, \Longrightarrow \, {\text {MFCQ-QEP}} \, \Longrightarrow \, {\text {CPLD-QEP}} \, \Longrightarrow \, {\text {CCP-QEP}}. \end{aligned}$$

In the next theorem we show how to pass from an AKKT-QEP point to a KKT-QEP point: the WCCP-QEP condition is the weakest property that ensures this for every bifunction f.

Theorem 4

The WCCP-QEP condition is equivalent to the fact that, for any bifunction f, AKKT-QEP implies KKT-QEP.

Proof

Let \(\bar{x}\in {{\mathbb {R}}}^n\) be a feasible point satisfying WCCP-QEP and let f be a bifunction such that AKKT-QEP holds at \(\bar{x}\). Then there are sequences \(\lbrace x^{k} \rbrace \subset {\mathbb {R}}^{n}\) with \(x^{k }\rightarrow \bar{x}\) and \(\lbrace \lambda ^{k} \rbrace \subset {\mathbb {R}}^{|A(\bar{x})|}_{+}\) such that

$$\begin{aligned} \Vert \nabla _{y} f(x^{k}, x^{k}) + \sum _{i \in A(\bar{x})} \lambda ^{k}_{i} \nabla _{y} g_{i} (x^{k}, x^{k}) \Vert \rightarrow 0. \end{aligned}$$

Let \(w^{k}=\sum _{i\in A(\bar{x})}\lambda ^{k}_{i}\nabla _{y} g_{i}(x^{k},x^{k}) \in C(x^{k})\). Then \(w^{k}\rightarrow -\nabla _{y} f(\bar{x},\bar{x})\), i.e., \(-\nabla _{y} f(\bar{x},\bar{x}) \in \limsup _{x\rightarrow \bar{x}} C(x)\). From the WCCP-QEP condition, it follows that \(-\nabla _{y} f(\bar{x},\bar{x})\in C(\bar{x})\), so \(\bar{x}\) is a KKT-QEP point.

Conversely, assume that AKKT-QEP implies KKT-QEP for any bifunction. Let \(w \in \limsup _{x\rightarrow \bar{x}} C(x)\). Then there exist sequences \(x^{k} \rightarrow \bar{x}\) and \(w^{k} \rightarrow w\) such that \(w^{k}\in C(x^{k})\). Define the bifunction \(f (x, y) := \langle x - y, w \rangle\) with \(\nabla _{y} f (x, y) = - w \in {\mathbb {R}}^{n}\). As \(w^{k} \in C(x^{k})\), there exists \(\{ \lambda ^{k}\} \subset {\mathbb {R}}^{|A(\bar{x})|}_{+}\) such that

$$\begin{aligned} w^{k}=\sum _{i\in A(\bar{x})} \lambda ^{k}_{i}\nabla _{y}g_{i}(x^{k},x^{k}). \end{aligned}$$

Since \(\nabla _{y} f (x^{k}, x^{k}) = - w\) and \(w^{k} \rightarrow w\), we have

$$\begin{aligned} \nabla _{y} f (x^{k}, x^{k}) + \sum _{i \in A(\bar{x})} \lambda ^{k}_{i} \nabla _{y} g_{i} (x^{k}, x^{k}) \rightarrow 0. \end{aligned}$$

So, \(\bar{x}\) satisfies AKKT-QEP, i.e., KKT-QEP holds. Hence, \(-\nabla _{y} f (\bar{x}, \bar{x}) = w \in C(\bar{x})\) and WCCP-QEP holds. \(\square\)

Note that it is possible to define a weaker notion of AKKT-QEP involving two sequences \(\{x^k\}\) and \(\{y^k\}\) converging to \(\bar{x}\). This would trivially be a necessary optimality condition since one may take the constant sequence \(y^k:=\bar{x}\). However, as far as we know, algorithms for QEPs always generate sequences of the form \(y^k=x^k\), hence the usefulness of this definition is not clear. On the other hand, it may suggest a way to define new algorithms with different, possibly better, convergence properties; in some of the numerical tests we show one possibility for exploiting this fact. With this type of sequence we can obtain an analogous version of Theorem 4 by replacing WCCP-QEP with CCP-QEP and AKKT-QEP with this weaker version. The proof follows in the same way by replacing the sequence \(x^k\) by \((x^k,y^k)\) and \(C(x^k)\) by \(\bar{C}(x^k,y^k)\).

4 An Augmented Lagrangian method

In this section we propose an Augmented Lagrangian method for QEPs. From now on we will consider (QEP) where K is defined as follows:

$$\begin{aligned} K(x) = \{ y \in {\mathbb {R}}^{n}{{:}}\, g(x, y) \le 0, \, h(x, y) \le 0\}, \end{aligned}$$
(19)

where \(g{{:}}\,{\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\) and \(h{{:}}\,{\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{l}\) are continuously differentiable. In order to solve the QEP, we follow the approach from [36] where the authors compute a solution of a QVI by solving a sequence of suitable QVIs.

Similarly to the minimization problem, we separate the set of constraints (19) in two parts. The part described by g, with the difficult constraints, will be penalized, while the part described by h will define the constraints of the subproblems at each iteration. Therefore, our analysis includes the case when all the constraints are penalized, where the subproblems are unconstrained, and also when the subproblems are EPs.

Formally, at each iteration of the algorithm, the new mapping that defines the constraints of the subproblems will be defined as \(K_h{{:}}\,{\mathbb {R}}^{n} \rightrightarrows {\mathbb {R}}^{n}\) with

$$\begin{aligned} K_{h} (x) := \left\{ y \in {\mathbb {R}}^{n}{{:}}\, h(x,y) \le 0 \right\} . \end{aligned}$$
(20)

Given \(u\in {\mathbb {R}}^m\) and \(\rho >0\), we define the Augmented Lagrangian bifunction, with respect to the constraints bifunction g, as

$$\begin{aligned} L(x,y;u,\rho )& {} = f(x,y)+\dfrac{\rho }{2}\sum ^{m}_{i=1} \left[ \max \left \lbrace 0,g_{i}(x,y)+ \dfrac{u_{i}}{\rho }\right\rbrace \right] ^{2}\\&-\,\dfrac{\rho }{2}\sum ^{m}_{i=1} \left[ \max \left \lbrace 0,g_{i}(x,x) +\dfrac{u_{i}}{\rho } \right \rbrace \right] ^{2}, \end{aligned}$$

where \(\rho\) is a suitable penalty parameter and \(u_{i}\) denotes a safeguarded estimate for the Lagrange multipliers \(\lambda _{i}\) associated with \(g_{i}\). The Augmented Lagrangian bifunction, together with the mapping \(K_{h}\), define in each iteration of the algorithm a new QEP denoted by QEP\((u,\rho )\).
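As a quick illustration of the formula above, the following Python sketch evaluates the Augmented Lagrangian bifunction for generic callables f and g; the toy bifunctions at the bottom are invented for the example:

```python
import numpy as np

def augmented_lagrangian(f, g, x, y, u, rho):
    """Augmented Lagrangian bifunction L(x, y; u, rho): penalizes the
    g-constraints at y and subtracts the same penalty at y = x, so
    that L(x, x; u, rho) = 0 whenever f(x, x) = 0."""
    pen = lambda z: np.sum(np.maximum(0.0, g(x, z) + u / rho) ** 2)
    return f(x, y) + 0.5 * rho * (pen(y) - pen(x))

# Toy data (illustrative only): f(x, y) = <x, y - x>, g(x, y) = y - 1.
f = lambda x, y: float(np.dot(x, y - x))
g = lambda x, y: y - 1.0
x = np.array([2.0]); y = np.array([3.0])
u = np.array([0.5]); rho = 10.0
print(augmented_lagrangian(f, g, x, x, u, rho))  # 0.0
```

Note that `augmented_lagrangian` vanishes on the diagonal \(y=x\), so it is again an equilibrium bifunction, as required for the subproblems QEP\((u,\rho )\).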

Remark 2

If \(g_{i} (x, y) = c_{i} (y)\) for all \(i = 1, \ldots , m\), then the Augmented Lagrangian (27) reduces to the Augmented Lagrangian for EPs (see [33, Equation (2.4)]) by taking \(\gamma = \frac{1}{\rho }\). Furthermore, if \(f(x, y) = u(y) - u(x)\) and \(g_{i} (x, y) = c_{i} (y)\) for all \(i = 1, \ldots , m\), then the Augmented Lagrangian (27) reduces to the usual Augmented Lagrangian for the minimization problem (1) (see [43] for instance).

At each iteration, our algorithm computes an \(\epsilon _k\)-inexact KKT-QEP point of QEP\((u,\rho )\), with tolerances \(\epsilon _k \rightarrow 0^+\) and with u and \(\rho\) updated at each iteration, in order to find a KKT-QEP point of (QEP) under a weak constraint qualification. Note that the algorithm reduces to the one in [12] in the case of optimization and to the ones in [14, 37] in the case of generalized Nash equilibrium problems. On the other hand, the subproblems can be solved differently by exploiting their specific structure.

The precise statement of our Augmented Lagrangian method is given in Algorithm 1.

[Algorithm 1: Augmented Lagrangian method for QEPs]
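Based on the description in the surrounding text, the outer loop of the method can be sketched as follows. This is a reconstruction, not the formal statement: the subproblem solver `solve_subproblem`, the tolerance schedule, and the constants `gamma` and `tau` are placeholders for the quantities specified in Algorithm 1 (Step 2 returns an \(\epsilon _k\)-inexact KKT-QEP point, Step 3 sets \(\lambda ^{k}_{i}=\max \{0,u^{k}_{i}+\rho ^{k}g_{i}(x^{k},x^{k})\}\), and Step 4 increases the penalty when \(\Vert \max \{g(x^{k},x^{k}),-\lambda ^{k}\}\Vert\) does not decrease sufficiently):

```python
import numpy as np

def augmented_lagrangian_method(solve_subproblem, g, x0, u0,
                                rho0=1.0, gamma=10.0, tau=0.5,
                                u_max=1e6, max_iter=50):
    """Sketch of the outer loop; constants and the subproblem solver
    are placeholders.  solve_subproblem(u, rho, eps) must return an
    eps-inexact KKT-QEP point x of QEP(u, rho) together with the
    h-multipliers mu (Step 2)."""
    x, u, rho = x0, u0, rho0
    V_old = np.inf
    for k in range(max_iter):
        eps = 1.0 / (k + 1)                       # tolerances eps_k -> 0
        x, mu = solve_subproblem(u, rho, eps)     # Step 2 (mu unused here)
        lam = np.maximum(0.0, u + rho * g(x, x))  # Step 3: g-multipliers
        V = np.linalg.norm(np.maximum(g(x, x), -lam))
        if V > tau * V_old:                       # Step 4: increase the
            rho *= gamma                          # penalty on poor progress
        V_old = V
        u = np.minimum(lam, u_max)                # safeguarded multipliers
    return x

# Toy run (illustrative): the "solver" just returns a fixed point.
g_toy = lambda x, y: y - 1.0
solver = lambda u, rho, eps: (np.array([0.5]), np.zeros(0))
x_star = augmented_lagrangian_method(solver, g_toy,
                                     x0=np.array([0.0]),
                                     u0=np.array([0.0]), max_iter=5)
print(x_star)   # [0.5]
```

The toy run only exercises the bookkeeping of the loop; in a real instance `solve_subproblem` would solve QEP\((u,\rho )\) over \(K_h\) to the tolerance `eps`.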

A natural choice of the sequence \(\{u^k\}\) is \(u^{k+1} = \min \{ \lambda ^{k},u^{max}\}\). Recall that, from Definition 4, the pair \((x^{k},\mu ^{k})\) computed in Step 2 must be such that:

$$\begin{aligned} \Vert \nabla _{y}L(x^{k},x^{k},u^{k},\rho ^{k} ) + \nabla _{y}h(x^{k},x^{k})\mu ^{k}\Vert \le \epsilon _{k}, \end{aligned}$$
(22)
$$\begin{aligned} \Vert \min \left\{ -h(x^{k}, x^{k}), \mu ^{k} \right\} \Vert \le {\epsilon }_{k}. \end{aligned}$$
(23)

Similarly to [36], our main result with respect to feasibility can be obtained by requiring only that the expression in (22) be bounded, not necessarily converging to zero. We adopt the current presentation for clarity of exposition.

We proceed by considering the convergence properties of Algorithm 1. The analysis of the algorithm is divided into the study of feasibility and optimality. Regarding the former, note that (23) already implies that every limit point of \(\lbrace x^{k}\rbrace\) satisfies the h-constraints.

For the discussion of feasibility with respect to the g-constraints, we introduce an auxiliary QEP, which consists of finding \(x\in K_{h}(x)\) such that:

$$\begin{aligned} \varPsi (x, y) \ge 0, \quad \forall \, y \in K_{h} (x), \end{aligned}$$
(24)

where

$$\begin{aligned} \varPsi (x,y)= \dfrac{\Vert g_{+}(x,y) \Vert ^{2} -\Vert g_{+}(x,x) \Vert ^{2}}{2}. \end{aligned}$$

Note that its associated KKT-QEP system is given by:

$$\begin{aligned} \nabla _{y} g(x, x) g_{+} (x, x) + \sum ^{l}_{j=1} \mu _{j} \nabla _{y} h_{j} (x, x)& {} = 0, \\ \mu _{j} \ge 0, \, h_{j} (x, x) \le 0, \, \mu _{j} h_{j} (x, x) = 0, \quad \forall \, j& {} = 1,\dots ,l. \end{aligned}$$

Clearly, a solution \(\bar{x}\) of the QEP associated with \((\varPsi , K_h)\), denoted by QEP\((\varPsi ,K_h)\), is such that \(\Vert g_+(\bar{x}, y)\Vert\) attains its global minimum over \(y \in K_h(\bar{x})\) at \(y=\bar{x}\). Hence, if the feasible region of (QEP) is nonempty, \(\bar{x}\) is feasible for (QEP). Since we can only prove feasibility under strong assumptions (see Theorem 6), the following result shows that any limit point of a sequence generated by Algorithm 1 at least tends to be feasible, in the sense that it satisfies AKKT-QEP for QEP\((\varPsi , K_h)\).
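The bifunction \(\varPsi\) is straightforward to evaluate; the sketch below, with an invented scalar constraint, illustrates that \(\varPsi (x,y)\) is negative exactly when y violates the g-constraints less than x does:

```python
import numpy as np

def psi(g, x, y):
    """Feasibility bifunction (24): half the difference of the squared
    g-constraint violations at y and at x."""
    gp = lambda z: np.maximum(0.0, g(x, z))   # componentwise positive part
    return 0.5 * (np.sum(gp(y) ** 2) - np.sum(gp(x) ** 2))

# Toy constraint (illustrative): g(x, y) = y - 1 componentwise.
g = lambda x, y: y - 1.0
x = np.array([2.0])
print(psi(g, x, x))                 # 0.0 by construction
print(psi(g, x, np.array([0.0])))   # -0.5: y = 0 is "more feasible"
```

A point at which \(\varPsi (x,\cdot )\) cannot be decreased over \(K_h(x)\) is thus a point of minimal g-infeasibility, which is the sense in which Theorem 5 below measures approximate feasibility.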

Theorem 5

Let \(\lbrace x^{k}\rbrace\) be a sequence generated by Algorithm 1. Any limit point of \(\lbrace x^{k} \rbrace\) satisfies the AKKT-QEP condition for QEP\((\varPsi ,K_{h})\).

Proof

Let us assume that \(x^k\rightarrow x^*\) along a subsequence indexed by K. Since h is continuous and (23) holds, we have \(h(x^{*},x^{*})\le 0\) and hence \(x^{*}\) is feasible for QEP\((\varPsi ,K_{h})\).

If the sequence \(\{ \rho ^{k} \}\) is bounded, then \(\lim _{k \rightarrow \infty } \Vert \max \{g(x^{k}, x^{k}), - \lambda ^{k}\}\Vert = 0\) by Step 4, thus \(g(x^{*}, x^{*}) \le 0\), i.e., \(x^{*}\) is feasible for (QEP). This clearly gives an AKKT-QEP sequence with zero dual sequence for QEP\((\varPsi ,K_{h})\).

Let us consider now that \(\{\rho ^{k}\}\) is unbounded. It follows from (22) that \(\Vert \delta ^k\Vert \le \epsilon _k\), where

$$\begin{aligned} \delta ^{k}& {} = \nabla _{y} f(x^{k}, x^{k}) + \sum ^{m}_{i = 1} \max \left\{ 0, u^{k}_{i} \right. \\&\left. \quad +\,\rho ^{k} g_{i} (x^{k}, x^{k})\right\} \nabla _{y} g_{i} (x^{k}, x^{k}) + \sum ^{l}_{j = 1} \mu ^{k}_{j} \nabla _{y} h_{j} (x^{k}, x^{k}). \end{aligned}$$

Dividing by \(\rho ^{k}\), we obtain

$$\begin{aligned} \dfrac{\delta ^{k}}{\rho ^{k}} = \dfrac{\nabla _{y} f(x^{k}, x^{k})}{\rho ^{k}} + \sum ^{m}_{i = 1} \max \left\{ 0, \dfrac{u^{k}_{i}}{\rho ^{k}} + g_{i} (x^{k}, x^{k})\right\} \nabla _{y} g_{i} (x^{k}, x^{k}) + \sum ^{l}_{j = 1} \dfrac{\mu ^{k}_{j}}{\rho ^{k}}\nabla _{y} h_{j} (x^{k}, x^{k}). \end{aligned}$$

Since \(\{\epsilon _k\}\) is bounded, \(\frac{\delta ^{k}}{\rho ^{k}}\rightarrow 0\). Furthermore, due to the boundedness of \(\{ u^{k}\}\), if \(g_{i} (x^{*}, x^{*}) < 0\), then \(\max \{0, \frac{u^{k}_{i}}{\rho ^{k}} + g_{i} (x^{k}, x^{k})\} = 0\) for k large enough. Therefore,

$$\begin{aligned} \lim _{k \in K} \Vert \sum _{i{{:}}\, \, g_{i} (x^{*}, x^{*}) \ge 0} g_{i} (x^{k}, x^{k}) \nabla _{y} g_{i} (x^{k}, x^{k}) + \sum ^{l}_{j = 1} \dfrac{\mu ^{k}_{j}}{\rho ^{k}} \nabla _{y} h_{j} (x^{k}, x^{k}) \Vert = 0. \end{aligned}$$
(25)

From (23) and the fact that \(\nabla _{y}\varPsi (x^{k},x^{k})=\sum _{i{{:}}\, g_{i} (x^{*}, x^{*}) \ge 0} g_{i} (x^{k}, x^{k}) \nabla _{y} g_{i} (x^{k}, x^{k})\), we have that \(x^{*}\) satisfies the AKKT-QEP condition for QEP\((\varPsi , K_{h})\). \(\square\)

Corollary 1

Under the assumptions of Theorem 5, if \(x^{*}\) fulfills WCCP-QEP with respect to the h-constraints describing \(K_h\), then \(x^{*}\) is a KKT-QEP point of QEP\((\varPsi ,K_h)\).

Proof

It is a consequence of Theorems 4 and 5. \(\square\)

Let us now state some particular cases of QEPs, or additional conditions, that ensure that a limit point of Algorithm 1, that is, an AKKT-QEP point for QEP\((\varPsi ,K_h)\), is indeed feasible for QEP\((f,K)\). Note that, from the proof of Theorem 5, this is the case when \(\{\rho ^k\}\) is bounded. The proofs are omitted since they are minor adaptations of those in [38]. The first uses the traditional argument that, under EMFCQ, a certain null linear combination of the constraint gradients can only occur with null coefficients.

Theorem 6

Let \(x^*\) be a limit point of a sequence generated by Algorithm 1, and suppose that \(x^*\) satisfies EMFCQ-QEP regarding the constraints defined by g and h. Then \(x^*\) is feasible for (QEP).

In the next results, the main argument is that, under certain conditions, KKT points are indeed solutions to the problem.

Consider problem (QEP) with \(K(x)=\{c(x)+ S(x)w{{:}}\,w\in Q_{1}\cap Q_{2}\}\), where \(S(x) \in {\mathbb {R}}^{n \times n}\) is nonsingular for all x, \(Q_{i} := \{ x \in {\mathbb {R}}^{n}{{:}}\,q^{i} (x) \le 0 \}\) for \(i = 1, 2\), and \(q^{1}{{:}}\, {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\), \(q^{2}{{:}}\, {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{l}\) are convex. Here K(x) has the form of (19) with

$$\begin{aligned} g(x,y)=q^{1}( S^{-1}(x)[y-c(x)])\quad {\text {and}} \quad h(x,y)= q^{2}(S^{-1}(x)[y-c(x)]). \end{aligned}$$
(26)

Then we have the following result.

Theorem 7

Let \(x^*\) be a limit point of a sequence generated by Algorithm 1 applied to a QEP of the form (26) with \(Q_{1}\cap Q_{2}\ne \emptyset\). If \(x^*\) satisfies WCCP-QEP regarding the h-constraints, then \(x^*\) is feasible for (QEP).

The following results are proved in a similar way.

Theorem 8

Consider a QEP with bilinear constraints, where g is given by (16), each \(Q_{i}\in {\mathbb {R}}^{n\times n}\) is symmetric positive semidefinite for \(i = 1,\ldots ,m\), and \(h(x,y)=(h_{1}(y),\ldots ,h_{l}(y))^{T}\) has convex components. Let \(x^*\) be a limit point of a sequence generated by Algorithm 1 and suppose that \(x^*\) satisfies WCCP-QEP regarding the h-constraints. Then \(x^*\) is feasible.

Theorem 9

Consider a QEP with linear constraints with variable right-hand side given by (13), and suppose that \({\text {rank}}(A)=m\). Let \(x^*\) be a limit point of a sequence generated by Algorithm 1 and suppose that \(x^*\) satisfies WCCP-QEP regarding the h-constraints. Then \(x^*\) is feasible.

Theorem 10

Consider a QEP with box constraints given by (14), and suppose that \(b_l(x^*)\le b_u(x^*)\). Let \(x^*\) be a limit point of a sequence generated by Algorithm 1 and suppose that \(x^*\) satisfies WCCP-QEP regarding the h-constraints. Then \(x^*\) is feasible.

Let us now discuss the optimality properties of the limit points of sequences generated by the Algorithm. The following result says that when the algorithm generates a sequence that has a feasible accumulation point, this is an AKKT-QEP point for the original problem (QEP).

Theorem 11

Assume that the sequence \(\lbrace x^{k}\rbrace\) generated by Algorithm 1 has a feasible limit point \(x^{*}\). Then \(x^{*}\) satisfies the AKKT-QEP condition for (QEP).

Proof

Let \(\lim _{k\in K}x^{k}=x^{*}\). By Steps 2 and 3, we have that

$$\begin{aligned} \lim _{k\in K} \Vert \nabla _{y}f(x^{k},x^{k})+\sum ^{m}_{i=1}\lambda _i^k\nabla _{y}g_{i}(x^{k},x^{k}) + \sum ^{l}_{j=1}\mu ^{k}_{j}\nabla _{y}h_{j}(x^{k},x^{k})\Vert&=0, \\ \text {and} \, \lim _{k\in K}\Vert \min \left\{ -h(x^{k}, x^{k}), \mu ^{k} \right\} \Vert&=0. \end{aligned}$$

It remains to prove that \(\lim _{k\in K}\min \lbrace -g_{i}(x^{k},x^{k}),\lambda ^{k}_{i} \rbrace =0\) for all \(i=1,\dots ,m\). If \(g_i(x^*,x^*)=0\), this follows from the continuity of \(g_{i}\), since \(\lambda ^{k}_{i}\ge 0\). Otherwise, we have \(g_{i}(x^{k},x^{k})< c<0\) for some constant c and all k large enough. If \(\lbrace \rho ^{k} \rbrace\) is bounded, Step 4 of the algorithm implies that \(\lambda ^{k}_{i} \rightarrow 0\); if it is unbounded, the same conclusion follows from the updating scheme of Step 3 and the boundedness of \(\{u^k\}\). This concludes the proof. \(\square\)

Corollary 2

Under the assumptions of Theorem 11, if \(x^{*}\) fulfills WCCP-QEP with respect to the constraints g and h describing K, then \(x^{*}\) is a KKT-QEP point of (QEP).

Proof

It is a consequence of Theorems 4 and 11. \(\square\)

The fact that the usual assumptions (LICQ, MFCQ, CPLD, CCP, etc.) used in global convergence theorems of nonlinear programming algorithms are constraint qualifications is related to the fact that the algorithm does not discard solutions a priori. If a property P is not a CQ, then there would be a problem with a solution satisfying P but not the KKT conditions; thus, if a theorem says that under P a limit point of a sequence generated by an algorithm satisfies KKT, such a solution would never be found. As we discussed earlier, several algorithms generate AKKT sequences which, being a genuine necessary optimality condition in the context of nonlinear programming, also do not rule out solutions a priori.

In the case of QEPs, the situation is different. The fact that AKKT-QEP is not an optimality condition already implies that algorithms discard solutions that do not satisfy it. Therefore, to study the impact of using an assumption that is not a CQ in such an algorithm, we must turn our attention to solutions that are AKKT-QEP. Among these, all that satisfy WCCP-QEP are KKT-QEP, and therefore no additional solution would be discarded by an algorithm that generates AKKT-QEP sequences. In this way, there is no reason to worry that WCCP-QEP is not a constraint qualification. Thus, since WCCP-QEP is weaker than CCP-QEP, Corollary 2 is stronger than the analogous result under CCP-QEP, which is indeed a constraint qualification. Note, however, that this is not a property of the QEP itself, but a deficiency of the Augmented Lagrangian algorithm (and many others) that are only able to generate AKKT-QEP sequences. For instance, in Example 1, no solution satisfies AKKT-QEP, hence the algorithm cannot find a solution, despite WCCP-QEP being fulfilled. If an algorithm were developed in such a way that solutions are not discarded, WCCP-QEP would have to be replaced by a constraint qualification in order to arrive at a meaningful global convergence result. This issue differs from its counterpart for GNEPs, since CCP-GNEP is a CQ [14].

5 Solution of EP-subproblems

In this section, we provide sufficient conditions for ensuring the solvability of the EP-subproblems from Step 2 of Algorithm 1. To that end, we consider the case when:

$$\begin{aligned} K(x) = \left\{ y \in {\mathbb {R}}^{n}{{:}}\,\, g(x,y)\le 0, h(y) \le 0\right\} . \end{aligned}$$

Our study is motivated by the well-known existence results for EPs for monotone bifunctions (see [13, 28, 30] and references therein). For that reason, we provide necessary and sufficient conditions for ensuring the monotonicity of the Lagrangian

$$\begin{aligned} L(x, y) = f(x, y) + \dfrac{\rho }{2} \sum ^{m}_{i=1} \left( \max \left\{ 0, g_{i} (x,y) + \frac{u_{i}}{\rho }\right\}\right)^{2} - \dfrac{\rho }{2} \sum ^{m}_{i=1} \left(\max \left\{ 0, g_{i} (x, x) + \frac{u_{i}}{\rho } \right\}\right)^{2}. \end{aligned}$$
(27)

Take \(\alpha (x, y) := \sum ^{m}_{i=1} ( \max \{ 0, g_{i} (x, y) + \frac{u_{i} }{\rho }\})^{2}\) for simplicity. Then, by definition, L is monotone on \(K_h\) if and only if \(L(x, y) + L(y, x) \le 0\) for all \(x, y \in K_h\), or equivalently,

$$\begin{aligned} f(x, y) + f(y, x) \le \frac{\rho }{2} (\alpha (x, x) + \alpha (y, y) - \alpha (x, y) - \alpha (y, x)), \quad \forall \, x, y \in K_h. \end{aligned}$$
(28)

As a consequence, we have:

Proposition 1

The Lagrangian L is monotone if and only if Eq. (28) holds. In particular, if \(g_{i} (x, x) = 0\) for all \(x \in K_h\) and all \(i = 1, \ldots , m\), and

$$\begin{aligned} f(x, y) + f(y, x) \le - \dfrac{\rho }{2} \alpha (x, y) - \dfrac{\rho }{2} \alpha (y, x) + \rho \sum ^{m}_{i=1} \left( \max \left\{ 0, \frac{u_{i}}{\rho }\right\} \right) ^{2}, \quad \forall \, x, y \in K_h, \end{aligned}$$

then L is monotone.
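Condition (28) can also be probed numerically by sampling pairs of points; such a check can refute monotonicity but never prove it. A sketch with invented monotone toy data:

```python
import numpy as np

def check_condition_28(f, alpha, rho, sample_points, tol=1e-10):
    """Sample-based probe of condition (28): returns False as soon as
    f(x,y) + f(y,x) exceeds the alpha-combination for some sampled pair.
    A True answer is only evidence, not a proof, of monotonicity."""
    for x in sample_points:
        for y in sample_points:
            lhs = f(x, y) + f(y, x)
            rhs = 0.5 * rho * (alpha(x, x) + alpha(y, y)
                               - alpha(x, y) - alpha(y, x))
            if lhs > rhs + tol:
                return False
    return True

# Monotone toy data: f(x, y) = <Ax, y - x> with A = I, alpha == 0.
f = lambda x, y: float(np.dot(x, y - x))
alpha = lambda x, y: 0.0
pts = [np.array([t]) for t in np.linspace(-2.0, 2.0, 9)]
print(check_condition_28(f, alpha, rho=1.0, sample_points=pts))  # True
```

Here `check_condition_28`, the toy f, and the trivial `alpha` are all illustrative; in the setting of Proposition 1, `alpha` would be the penalty term defined above.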

Since the sum of monotone bifunctions is monotone, when f is monotone the monotonicity of L is ensured by the monotonicity of

$$\begin{aligned} \phi (x, y) := \dfrac{\rho }{2} \alpha (x, y) - \dfrac{\rho }{2} \alpha (x, x) \end{aligned}$$
(29)

Therefore, the following result follows easily:

Proposition 2

The bifunction \(\phi\) is monotone if and only if

$$\begin{aligned} \alpha (x, y) + \alpha (y, x) \le \alpha (x, x) + \alpha (y, y), \quad \forall \, x, y \in K_h. \end{aligned}$$
(30)

In particular, if \(g_{i} (x, x) = 0\) for all \(x \in K_h\) and all \(i = 1, \ldots , m\), and

$$\begin{aligned} \alpha (x, y) + \alpha (y, x) \le \rho \sum ^{m}_{i=1} \left( \max \left\{ 0, \dfrac{u_{i}}{\rho } \right\} \right) ^{2}, \quad \forall \, x, y \in K_h, \end{aligned}$$
(31)

then \(\phi\) is monotone.

In order to provide more concrete sufficient conditions, we consider the following assumptions:

Assumption 1

  1. (a)

    \(h{{:}}\, {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{l}\) is a function for which \(K_h\) is a convex set.

  2. (b)

    \(f{{:}}\,{\mathbb {R}}^{n}\times {\mathbb {R}}^{n}\rightarrow {\mathbb {R}}\) is continuously differentiable in its second argument.

  3. (c)

    \(g{{:}}\, {\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\) is twice continuously differentiable on \({\mathbb {R}}^{n} \times {\mathbb {R}}^{n}\).

If h is \({\mathbb {R}}^{l}_{+}\)-convex, i.e., each \(h_i\) is convex, then \(K_h\) is convex. The reverse statement does not hold, i.e., there are classes of vector-valued functions for which \(K_h\) is convex without the \({\mathbb {R}}^{l}_{+}\)-convexity assumption on h, for instance, the class of \(*\)-quasiconvex functions (see [35, Definition 2.2]).

If \(\phi\) is \(\nabla _{xy}\)-monotone, then \(\phi\) is monotone (by [11, Theorem 3.1(f)]), i.e., a sufficient condition for \(\phi\) to be monotone is that

$$\begin{aligned} \psi (y) = \nabla _{x} \phi (x, y)&= \sum ^{m}_{i=1} \max \{ 0, \rho g_{i} (x, y) + u_{i}\} \nabla _{x} g_{i} (x, y) \\&\quad -\,\sum ^{m}_{i = 1} \max \{ 0, \rho g_{i} (x, x) + u_{i} \} (J_{x} g_{i} (x, x))^{\top }, \end{aligned}$$

be monotone on \(K_h\), i.e., \(\langle \psi (x) - \psi (y), x - y\rangle \ge 0\) for all \(x, y \in K_h\).

Clearly, \(\psi {{:}}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\) is locally Lipschitz but not necessarily continuously differentiable. By [34, Proposition 2.3(a)], we know that \(\psi\) is monotone on an open set D if and only if all generalized Jacobians (in the sense of Clarke [18]) in \(\partial \psi (y)\) are positive semidefinite for all \(y \in D\).

We estimate the generalized Jacobian of \(\psi\) below.

Proposition 3

Suppose that Assumption 1 holds. Then the generalized Jacobian of \(\psi\) at \(y \in {\mathbb {R}}^{n}\) satisfies \(\partial \psi (y) \subseteq M(y)\) with

$$\begin{aligned} M(y) = \left\{ \sum ^{m}_{i=1} \max \lbrace 0, \rho g_{i}(x,y)+u_{i}\rbrace \nabla ^{2}_{yx} g_{i} (x,y)+\rho \sum ^{m}_{i=1} s_{i}\nabla _{x}g_{i}(x,y) \left[ \nabla _{y} g_{i} (x,y) \right] ^{\top }\right\} , \end{aligned}$$

where \(s_{i}=1\) if \(\rho g_{i}(x,y)+u_{i}>0\), \(s_i = 0\) if \(\rho g_{i}(x,y) + u_{i}<0\), and \(s_i \in [0,1]\) if \(\rho g_{i}(x,y)+u_{i}=0\).

Proof

\(\psi\) is nonsmooth only at its max-terms, each of which is the composition of a smooth mapping with a convex function and hence regular in the sense of Clarke [18]; applying the chain rule for generalized Jacobians term by term yields the stated estimate. \(\square\)

If the elements of M(y) are positive semidefinite, then \(\psi\) is monotone, and hence the monotonicity of L holds whenever f is monotone. A sufficient condition for the elements of M(y) to be positive semidefinite is given below.
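Positive semidefiniteness of the (generally nonsymmetric) elements of M(y) means \(v^{\top }Mv\ge 0\) for all v, which holds if and only if the symmetric part of M has no negative eigenvalues. A small illustrative helper:

```python
import numpy as np

def is_positive_semidefinite(M, tol=1e-10):
    """v^T M v >= 0 for all v iff the symmetric part (M + M^T)/2
    has no negative eigenvalues."""
    sym = 0.5 * (M + M.T)
    return bool(np.min(np.linalg.eigvalsh(sym)) >= -tol)

# A rank-one term with equal factors, a a^T, is always psd,
# mimicking grad_x g (grad_y g)^T when the two gradients coincide.
a = np.array([1.0, -2.0])
print(is_positive_semidefinite(np.outer(a, a)))            # True
print(is_positive_semidefinite(np.array([[0.0, 1.0],
                                         [-1.0, -1.0]])))  # False
```

In practice one would apply `is_positive_semidefinite` to the matrices in the representation of M(y) from Proposition 3; the helper itself is not part of the paper.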

Proposition 4

Suppose that Assumption 1 holds. If the matrices

$$\begin{aligned} \nabla ^{2}_{xy} g_{i} (x, y), \qquad \quad \forall \, i{{:}}\, u_{i}+\rho g_{i} (x, y)&> 0, \\ \nabla _{x} g_{i} (x, y) \left[ \nabla _{y} g_{i} (x,y)\right] ^{\top }, \quad \forall \, i{{:}}\, u_{i} + \rho g_{i} (x, y)&\ge 0, \end{aligned}$$

are positive semidefinite, then all elements in M(y) are positive semidefinite.

Proof

By the representation of M(y) and our assumptions, each element of M(y) is a nonnegative combination of positive semidefinite matrices and is therefore positive semidefinite. \(\square\)

5.1 Example 1: The moving set case

An interesting special case of problem (QEP) is the moving set case [10, 24, 36, 38], in which \(K(x) = c(x) + Q\) for some vector-valued function \(c{{:}}\,{\mathbb {R}}^{n}\rightarrow {\mathbb {R}}^{n}\) and a closed convex set \(Q \subseteq {\mathbb {R}}^{n}\). Usually, Q is given by

$$\begin{aligned} Q := \left\{ x \in {\mathbb {R}}^{n}{{:}}\, \, q(x) \le 0\right\} , \end{aligned}$$
(32)

where \(q{{:}}\, {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{m}\) is a function such that Q is closed and convex. As we noted in the previous subsection, q need not be convex.

If each component \(q_i\) is convex, then we have the following sufficient condition for ensuring monotonicity.

Proposition 5

If \(c{{:}}\, {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}^{n}\), with \(c(x) = (c_{1} (x_{1}), \ldots , c_{n}(x_{n}))^{\top }\), is such that \(c^{\prime }_{i}(x_{i}) < 0\) for all \(i = 1, \ldots , n\), then the elements of M(y) are positive semidefinite.

Proof

Since \(g_{i}(x,y)=q_{i}(y-c(x))\), we have \(\nabla _{x} g_{i} (x, y) = -Jc(x)^{\top } \nabla q_{i} (y - c(x))\) and \(\nabla _{y} g_{i} (x, y) = \nabla q_{i} (y - c(x))\). Thus

$$\begin{aligned} \nabla _{x} g_{i} (x, y) (\nabla _{y} g_{i} (x,y))^{\top } = - J c(x)^{\top } \nabla q_{i} (y-c(x))\nabla q_{i} (y - c(x))^{\top }. \end{aligned}$$
(33)

By assumption, \(S=-Jc(x)\) is a positive definite and diagonal matrix, so \(S=DD\) with D a positive definite diagonal matrix. Then

$$\begin{aligned} v^{\top } \nabla _{x} g_{i} (x,y) (\nabla _{y} g_{i} (x, y))^{\top }v&= v^{\top } DD \nabla q_{i} (y-c(x))\nabla q_{i}(y-c(x))^{\top }v \\&= v^{\top }DD\nabla q_{i}(y-c(x))\nabla q_{i}(y-c(x))^{\top }D^{-1}Dv \\&= w^{\top }D\nabla q_{i}(y-c(x))\nabla q_{i}(y-c(x))^{\top }D^{-1}w, \end{aligned}$$

where \(w=Dv\) for all \(v\in {\mathbb {R}}^{n}\). Since \(q_i\) is convex, \(\nabla q_{i}(y-c(x)) \nabla q_{i}(y-c(x))^{\top }\) is positive semidefinite, i.e., \(D\nabla q_{i}(y-c(x)) \nabla q_{i}(y-c(x))^{\top }D^{-1}\) is positive semidefinite.

On the other hand, a direct calculation shows that

$$\begin{aligned} \nabla ^{2}_{xy}g_{i}(x,y)=-Jc(x)\nabla ^{2}q_{i}(y-c(x)). \end{aligned}$$

Since \(q_{i}\) is convex, its Hessian is symmetric and positive semidefinite. Hence, \(\nabla ^{2}_{xy}g_{i}(x,y)\) is positive semidefinite and the result follows from Proposition 4. \(\square\)

5.2 Example 2: The binary constraints case

We consider problem (QEP) with g given as in equation (17), that is,

$$\begin{aligned} K(x) = \left\{ y \in {\mathbb {R}}^{n}{{:}}\, \, g_{i} (x_{j(i)}, y_{j(i)}) \le 0, \quad \forall \, i \in \{1, \ldots , m\} \right\} . \end{aligned}$$

A sufficient condition for ensuring the monotonicity of the corresponding subproblems is given below.

Proposition 6

If, for all \(i=1,\ldots ,m\), we have

$$\begin{aligned} \nabla _{x_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}) \nabla _{y_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}) \ge 0, \quad \nabla ^{2}_{y_{j(i)} x_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}) \ge 0, \end{aligned}$$
(34)

then all elements in M(y) are positive semidefinite.

Proof

Clearly, \(\nabla _{y} g_{i}(x, y) = \left( 0, \ldots , 0, \nabla _{y_{j(i)}} g_{i}(x_{j(i)}, y_{j(i)}), 0, \ldots ,0 \right) ^{\top }\), i.e., only position j(i) can be nonzero. Then the matrix \(\nabla ^{2}_{xy}g_{i}(x,y)\) has the entry \(\nabla ^{2}_{y_{j(i)} x_{j(i)}}g_{i}(x_{j(i)},y_{j(i)})\) at the diagonal position (j(i), j(i)) and zeros elsewhere; therefore, it is positive semidefinite by assumption (34).

On the other hand, \(\nabla _{x} g_{i} (x, y) = (0,\ldots ,0,\nabla _{x_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}),0,\ldots ,0)\). Then, \(\nabla _{x}g_{i}(x,y)(\nabla _{y} g_{i} (x,y))^{\top }\) is a diagonal matrix with value \(\nabla _{x_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)}) \nabla _{y_{j(i)}} g_{i} (x_{j(i)}, y_{j(i)})\) at position (j(i), j(i)), and zero elsewhere.

Therefore, the result follows from assumptions (34) and Proposition 4. \(\square\)

A special case of constraints with variable right-hand side which are also binary constraints is given by

$$\begin{aligned} K(x) = \{ y \in {\mathbb {R}}^{n}{{:}}\, \, g_{i} (x, y) = c_{i} (x_{j(i)}) + d_{i} (y_{j(i)}) \le 0, \quad \forall \, i \in \{1, \ldots , m\} \}, \end{aligned}$$
(35)

where \(c_{i}, d_{i}\) are twice continuously differentiable functions for each i. Here

$$\begin{aligned} \nabla _{x_{j(i)}} g_{i} (x, y) = c^{\prime }_{i} (x_{j(i)}), \quad \nabla _{y_{j(i)}} g_{i} (x, y) = d^{\prime }_{i} (y_{j(i)}), \quad {\text {and}} \quad \nabla ^{2}_{y_{j(i)} x_{j(i)}} g_{i} (x,y)=0. \end{aligned}$$

The following result follows easily from the previous proposition.

Corollary 3

If \(c^{\prime }_{i} (x_{j(i)}) d^{\prime }_{i} (y_{j(i)}) \ge 0\) for all \(i = 1, \ldots , m\), then all elements in M(y) are positive semidefinite.

Another example is the class of problems with box constraints (with variable right-hand side). Recall that

$$\begin{aligned} K(x) = \left\{ y \in {\mathbb {R}}^{n}{{:}}\, y_{i} - \alpha _{i} x_{i} - \gamma _{i} \le 0, \quad \forall \, i \in \{1, \ldots , n \}\right\} . \end{aligned}$$
(36)

Clearly, if \(\alpha _{i} \le 0\) for all i, then all elements in M(y) are positive semidefinite.
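For the box-constraint case this is easy to verify directly: \(g_{i}(x,y)=y_{i}-\alpha _{i}x_{i}-\gamma _{i}\) gives \(\nabla _{x}g_{i}=-\alpha _{i}e_{i}\) and \(\nabla _{y}g_{i}=e_{i}\), so the rank-one products in Proposition 4 are \(-\alpha _{i}e_{i}e_{i}^{\top }\) and the mixed second derivatives vanish. A sketch of this computation:

```python
import numpy as np

def box_constraint_matrices(alpha):
    """For g_i(x, y) = y_i - alpha_i * x_i - gamma_i we have
    grad_x g_i = -alpha_i e_i and grad_y g_i = e_i, so the rank-one
    products of Proposition 4 are the diagonal matrices
    -alpha_i e_i e_i^T (and the mixed second derivatives vanish)."""
    n = len(alpha)
    I = np.eye(n)
    return [-alpha[i] * np.outer(I[i], I[i]) for i in range(n)]

alpha = np.array([-1.0, 0.0, -3.0])   # alpha_i <= 0
mats = box_constraint_matrices(alpha)
print(all(np.min(np.linalg.eigvalsh(0.5 * (M + M.T))) >= -1e-12
          for M in mats))             # True: all psd
```

The helper `box_constraint_matrices` is illustrative; with some \(\alpha _{i}>0\) the corresponding matrix has the negative eigenvalue \(-\alpha _{i}\), so the sufficient condition fails, in agreement with the statement above.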

5.3 Example 3: Mixed variational inequalities

In this subsection, we provide an example of an interesting and common variational inequality whose associated Augmented Lagrangian need not be a monotone or pseudomonotone bifunction, but for which there is nevertheless a positive answer for finding the solution of the related EP-subproblem.

Let \(A \in {\mathbb {R}}^{n \times n}\) be a symmetric matrix and \(a \in {\mathbb {R}}^{n}\). We consider the following variational inequality problem:

$$\begin{aligned} \mathrm{find}\,\overline{x} \in K_{h}{{:}}\, \,\, \langle A\overline{x} + a, y - \overline{x} \rangle \ge 0, \quad \forall \, y \in K_{h}, \end{aligned}$$
(37)

where \(h:=(h_1, h_2, \ldots , h_m)\) is such that \(K_h\) is a convex set and for each \(i = 1, 2, \ldots , m\), the function \(h_{i} {{:}}\,{\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) is continuous.

Given \(u \in {\mathbb {R}}^{m}\) and \(\rho > 0\), the Augmented Lagrangian associated with (37) is given by:

$$\begin{aligned} L^{A}_{u, \rho } (x, y) = \langle Ax + a, y-x\rangle + \dfrac{\rho }{2} \sum ^{m}_{i=1} \left( \left( \max \left\{ 0, h_{i} (y) + \dfrac{u_{i}}{\rho } \right\} \right) ^{2} - \left( \max \left\{ 0, h_{i} (x) + \dfrac{u_{i}}{\rho } \right\} \right) ^{2}\right) . \end{aligned}$$

Note that solving Step 2 of Algorithm 1 is equivalent to finding a solution of the following mixed variational inequality on \({\mathbb {R}}^{n}\):

$$\begin{aligned} {\mathrm{find}}\,\overline{x} \in {\mathbb {R}}^{n}{{:}}\, \langle A\overline{x} + a, y - \overline{x}\rangle + g(y) - g(\overline{x}) \ge 0, \quad \forall \, y \in {\mathbb {R}}^{n}, \end{aligned}$$
(38)

where \(g(\cdot ) := \frac{\rho }{2} \sum ^{m}_{i=1} (\max \{ 0, h_{i} (\cdot ) + \frac{u_{i}}{\rho } \})^{2}\). This class of problems is of special interest due to its applications in economics, mechanics and electronics (see [26] for instance).

Define \(f^{g}_{A}{{:}}\, {\mathbb {R}}^{n} \times {\mathbb {R}}^{n} \rightarrow {\mathbb {R}}\) by

$$\begin{aligned} f^{g}_{A} (x, y) := \langle A x + a, y - x \rangle + g(y) - g(x). \end{aligned}$$
(39)

Clearly, \(\overline{x} \in {\mathbb {R}}^{n}\) is a solution of problem (38) if and only if it is a solution of the following equilibrium problem:

$$\begin{aligned} {\mathrm{find}}\,\overline{x} \in {\mathbb {R}}^{n}{{:}}\, \,\, f^{g}_{A} (\overline{x}, y) \ge 0, \quad \forall \, y \in {\mathbb {R}}^{n}. \end{aligned}$$
(40)

Note that \(f^{g}_{A} (x, x) = 0\) for all \(x \in {\mathbb {R}}^{n}\), and that if g is continuous, then \(f^{g}_{A} (x, \cdot )\) and \(f^{g}_{A} (\cdot , y)\) are continuous for all \(x, y \in {\mathbb {R}}^{n}\). If A is positive semidefinite, then \(f^{g}_{A}\) is monotone without any further assumption on g.

Proposition 7

[32, Proposition 3.3] If A is positive semidefinite, then \(f^{g}_{A}\) is monotone.

If A is not positive semidefinite, then \(f^{g}_{A}\) may fail to be monotone (see [32, Example 3.3]). However, in this case we can still use Algorithm 1 whenever \(f^{g}_{A}\) is pseudomonotone (see [20, 31, 32] for existence results, even in the nonconvex case). Necessary and sufficient conditions ensuring the pseudomonotonicity of bifunctions defined by matrices may be found in [27]. Finally, an algorithm for a class of pseudomonotone equilibrium problems may be found in [7].
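Proposition 7 can also be checked numerically: the terms in g cancel in the sum \(f^{g}_{A}(x,y)+f^{g}_{A}(y,x) = -\langle A(x-y), x-y\rangle \le 0\) whenever A is positive semidefinite, regardless of g. The following is a small sketch of ours (randomly generated data; affine \(h_i\) are chosen purely for simplicity):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, rho = 4, 3, 2.0
B = rng.standard_normal((n, n))
A = B.T @ B                          # positive semidefinite by construction
a = rng.standard_normal(n)
u = rng.uniform(0.0, 1.0, m)
H = rng.standard_normal((m, n))      # h_i(x) = (Hx)_i, affine for simplicity

def g(x):
    # g(x) = (rho/2) * sum_i max{0, h_i(x) + u_i/rho}^2
    return 0.5 * rho * np.sum(np.maximum(0.0, H @ x + u / rho) ** 2)

def f(x, y):
    # f^g_A(x, y) = <Ax + a, y - x> + g(y) - g(x)
    return (A @ x + a) @ (y - x) + g(y) - g(x)

# monotonicity check: f(x, y) + f(y, x) <= 0 for all x, y
for _ in range(100):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    assert f(x, y) + f(y, x) <= 1e-9
```

Replacing A by an indefinite matrix makes the final assertion fail for some pairs, in line with [32, Example 3.3].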

6 Numerical experiments

To illustrate the behavior of the method, we conducted some numerical experiments. The algorithm was implemented in Octave 5.1.0 with parameters \(\tau =0.5\), \(\gamma =10\), \(u^{max}=10^8\), \(\epsilon _k=\epsilon =10^{-4}\), and the initial penalty parameter was chosen as \(\rho ^1=1\). To solve the subproblems, we applied Octave's built-in function lsqnonlin to their KKT systems. Following the Augmented Lagrangian approach for optimization in [12], we keep box constraints of the form \(l \le x \le u\) explicitly in the subproblems' constraints. The initial estimates for the box constraints' Lagrange multipliers are set to zero. We start by considering the following problems:

  • Problem 1: \(\min x\) s.t. \(x^2 \le 0\);

  • Problem 2: \(\min x^2\) s.t. \(x+10 \le 0\);

  • Problem 3: \(\min (1-x_1)^2+100 (x_2-x_1^2)^2\) s.t. \((x_1-1)^3-x_2+1 \le 0\), \(x_1+x_2-2 \le 0\), \(-1.5 \le x_1 \le 1.5\), \(-0.5 \le x_2 \le 2.5\);

  • Problem 4: \(\min _{x_1} -x_1\) s.t. \(x_3-x_1-x_2 \le 0\), \(x_1+x_2-1\le 0\), \(x_1 \ge 0\); \(\min _{x_2} (x_2-0.5)^2\) s.t. \(x_3-x_1-x_2 \le 0\), \(x_1+x_2-1\le 0\), \(x_2 \ge 0\); \(\min _{x_3} (x_3-1.5x_1)^2\) s.t. \(0 \le x_3 \le 2\);

  • Problem 5: \(\min _{x_1,x_2} ((x_1-1)^2+(x_2-1)^2)/2\) s.t. \(x_1+x_2+x_3-3 \le 0\), \(x_1\ge 0.1, x_2 \ge 0.1\); \(\min _{x_3} (x_1 x_2 x_3^2)/2\) s.t. \(-x_1^2 x_3^2+0.5\le 0\), \(-x_2^2 x_3^2+0.5 \le 0\), \(x_3 \ge 0.1\);

  • Problem 6: The QEP defined by \(f(x,y)=\langle Px+Qy+q,y-x\rangle\) where

    $$\begin{aligned} q = \left[ \begin{array}{c} 1 \\ -2 \\ -1 \\ 2 \\ -1 \end{array} \right] , \quad P = \left[ \begin{array}{ccccc} 3.1 &{} 2 &{} 0 &{} 0 &{} 0 \\ 2 &{} 3.6 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 3.5 &{} 2 &{} 0 \\ 0 &{} 0 &{} 2 &{} 3.3 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 3 \end{array} \right] \quad {\text {and}} \quad Q = \left[ \begin{array}{ccccc} 1.6 &{} 1 &{} 0 &{} 0 &{} 0 \\ 1 &{} 1.6 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1.5 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 &{} 1.5 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 2 \end{array} \right] , \end{aligned}$$

    and K(x) is described by the constraints \(g_i(x,y)=1-y_i-\sum _{1\le j \le 5, j \ne i} x_j \le 0\), \(i \in \{1,\ldots ,5\}\).
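As a quick sanity check of the data in Problem 6, note that \(f(x,y)+f(y,x) = -\langle (P-Q)(x-y), x-y\rangle\) (the linear term q cancels), so this f is monotone precisely when the symmetric part of \(P-Q\) is positive semidefinite. A short verification of ours in Python:

```python
import numpy as np

q = np.array([1.0, -2.0, -1.0, 2.0, -1.0])
P = np.array([[3.1, 2.0, 0.0, 0.0, 0.0],
              [2.0, 3.6, 0.0, 0.0, 0.0],
              [0.0, 0.0, 3.5, 2.0, 0.0],
              [0.0, 0.0, 2.0, 3.3, 0.0],
              [0.0, 0.0, 0.0, 0.0, 3.0]])
Q = np.array([[1.6, 1.0, 0.0, 0.0, 0.0],
              [1.0, 1.6, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.5, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.5, 0.0],
              [0.0, 0.0, 0.0, 0.0, 2.0]])

def f(x, y):
    # f(x, y) = <Px + Qy + q, y - x>
    return (P @ x + Q @ y + q) @ (y - x)

def g(x, y):
    # g_i(x, y) = 1 - y_i - sum_{j != i} x_j describes K(x)
    return 1.0 - y - (np.sum(x) - x)

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
assert abs(f(x, x)) < 1e-12        # f is an equilibrium bifunction
# symmetric part of P - Q is positive definite, hence f is monotone
assert np.all(np.linalg.eigvalsh((P - Q + (P - Q).T) / 2) > 0)
```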

Problems 1–3 are optimization problems, while Problems 4 and 5 are GNEPs from [19] and [23], respectively. All problems are rewritten as QEPs so that our algorithm can be applied. Problem 6 is a genuine QEP from [17]. In all examples, a solution is found which satisfies the AKKT-QEP condition. The results are shown in Table 1, where the first column is the problem number, the second and third columns are respectively the primal and dual starting points, while the fourth and fifth columns are respectively the primal and dual solutions found. The sixth column is the number of Augmented Lagrangian iterations, while the last column is the Euclidean norm of the KKT residue at the primal-dual solution found.

Table 1 Performance on problems that satisfy AKKT-QEP at a solution

Note that Problem 1 is the only one where the KKT condition does not hold at a solution. This is indicated by the final Lagrange multiplier estimate \(\lambda ^*=84.47\), which is expected to diverge to infinity. In order to highlight this issue, we ran this problem again with \(\epsilon =10^{-8}\) and obtained \(x^*={-5.91 \times 10^{-5}}\), \(\lambda ^*={8447.28}\), with the corresponding norm of the KKT residue equal to \({3.50 \times 10^{-9}}\), in 26 iterations.
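The divergence is easy to see directly: stationarity for Problem 1 reads \(1 + 2\lambda x = 0\), so the exact multiplier is \(\lambda = -1/(2x)\), which blows up as \(x \to 0^-\); note that \(-1/(2 \cdot (-5.91\times 10^{-5})) \approx 8460\), consistent with the estimate above. A minimal check:

```python
# Problem 1: min x  s.t.  x^2 <= 0.  Stationarity reads 1 + 2*lam*x = 0,
# so the multiplier satisfying it exactly is lam = -1/(2x) for x < 0.
def multiplier(x):
    return -1.0 / (2.0 * x)

xs = [-(10.0 ** (-k)) for k in range(1, 6)]      # x -> 0 from the left
lams = [multiplier(x) for x in xs]

# the stationarity residual vanishes along the curve, yet lam diverges
assert all(abs(1.0 + 2.0 * l * x) < 1e-12 for x, l in zip(xs, lams))
assert all(b > a for a, b in zip(lams, lams[1:]))
```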

We note that for optimization problems and GNEPs, the sequences generated by our method coincide, as expected, with the sequences generated by our implementations of the corresponding Augmented Lagrangian methods for optimization [12] and GNEPs [14]. To keep the comparison coherent, we also minimized the KKT residue when solving the subproblems in all formulations. That is, using the same strategy for the subproblems, our method successfully generalizes the Augmented Lagrangian for these classes of problems.
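Although our implementation is in Octave, the outer loop just described can be sketched as follows. This is a simplified sketch of ours, not a transcription of the actual code: `solve_subproblem` abstracts the lsqnonlin-based subproblem solver, and the safeguarded multiplier and penalty updates follow our reading of the classical scheme in [12], with the parameters \(\tau\), \(\gamma\), \(u^{max}\), \(\epsilon\) and \(\rho^1\) listed above.

```python
import numpy as np

def augmented_lagrangian(solve_subproblem, g, x0, u0,
                         tau=0.5, gamma=10.0, u_max=1e8,
                         rho=1.0, eps=1e-4, max_iter=100):
    """Sketch of the outer Augmented Lagrangian loop (our simplification).

    solve_subproblem(x, u, rho) stands for the lsqnonlin-based KKT solver
    of the penalized subproblem; g(x) returns the constraint values.
    """
    x, u, v_old = x0, u0, np.inf
    for _ in range(max_iter):
        x = solve_subproblem(x, u, rho)
        # infeasibility/complementarity measure of the current iterate
        v = np.linalg.norm(np.maximum(g(x), -u / rho))
        if v <= eps:
            break
        # safeguarded first-order multiplier update, clipped to [0, u_max]
        u = np.clip(u + rho * g(x), 0.0, u_max)
        # increase the penalty when feasibility progress is insufficient
        if v > tau * v_old:
            rho *= gamma
        v_old = v
    return x, u, rho

# Problem 2 (min x^2 s.t. x + 10 <= 0): the subproblem minimizer has the
# closed form x = -(10*rho + u) / (2 + rho), from its stationarity condition.
x, u, rho = augmented_lagrangian(
    lambda x, u, r: np.array([-(10.0 * r + u[0]) / (2.0 + r)]),
    lambda x: x + 10.0,
    np.array([0.0]), np.array([0.0]))
```

On Problem 2 this sketch drives the iterates to the solution \(x^* = -10\) with multiplier estimate approaching \(\lambda^* = 20\), matching the behavior of our implementation.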

Now, let us turn our attention to the following examples, where no solution satisfies AKKT-QEP:

  • Problem 7: The QEP defined by Example 1, that is \(f(x,y)=-x+y\) and \(g(x,y)= \frac{1}{2}(x-y)^2\);

  • Problem 8: \(\min _{x_i} x_i\) s.t. \(\frac{1}{2}(x_1-x_2)^2 \le 0\), \(i \in \{1,2\}\);

  • Problem 9: \(\min _{x_1} x_1\) s.t. \(x_1 x_2+x_1^4/4 \le 0\); \(\min _{x_2} -x_2\) s.t. \(x_2 x_1^3+x_2^2/2 \le 0\).

In order to run the algorithm for these problems, one must allow accepting a subproblem solution even when an \(\epsilon _k\)-KKT-QEP point is not found by the subproblem solver (as these points do not exist near a solution for sufficiently small \(\epsilon _k\)). In that case, the subproblem solver is stopped when lack of progress is detected in two consecutive iterations. The same criterion is used for stopping the Augmented Lagrangian iterations. In Table 2 we present the numerical results when using this strategy.

Table 2 Performance when AKKT-QEP does not hold at a solution

For Problem 7, since every \(x\in {\mathbb {R}}\) is a solution and the KKT residue is the same for all \(\lambda \ge 0\), the algorithm stops after one or two iterations, depending on whether or not \(\lambda ^0\) is nonnegative. A more interesting formulation of a very similar problem is given by the GNEP described in Problem 8, whose solutions are of the form \((x,x)\in {\mathbb {R}}^2\) [14]. Once again, we observed that the generated sequence accumulates near a solution in a reasonable number of iterations, even though our theory does not cover these problems.

Note that, for Problem 8, the gradient of the Lagrangian is given by:

$$\begin{aligned} \nabla _{y} f(x, x) + \sum _{i\in \{1,2\}} \lambda _{i} \nabla _{y} g_{i} (x, x)=\left( \begin{array}{c} 1+\lambda _1(x_1-x_2)\\ 1-\lambda _2(x_1-x_2) \end{array}\right) , \end{aligned}$$
(41)

and since \(\lambda _1\) and \(\lambda _2\) are nonnegative, one of the two coordinates in (41), and so the norm of the KKT residue, is larger than or equal to 1 for any pair \((x, \lambda )\). On the other hand, given \(a \in {{\mathbb {R}}}\), taking

$$\begin{aligned} x^k = (a+1/k,a) \, {\text { and }} \, \lambda ^k=(0,k),\,\forall k, \end{aligned}$$

we have that \(\{x^k\}\) converges to the solution \((a,a)\) and the norm of the KKT residue evaluated at \((x^k,\lambda ^k)\) converges to 1, the infimum of this residual. That is, although near a solution there is no sequence along which the KKT residual converges to zero, there is a sequence that converges to the infimum of the KKT residual. Note that this has nothing to do with the exact norm of the KKT residual at the solution, which, for a point such that \(x_1=x_2\), is equal to \(\sqrt{2}\) for any \(\lambda \ge 0\).
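This behavior can be verified numerically. In the sketch below (ours), the KKT residue is assumed to aggregate the gradient in (41) with the feasibility and complementarity violations; under this composition the value decreases to 1 along the indicated sequence:

```python
import math

# KKT residue for Problem 8 at a pair (x, lam); the exact composition
# (gradient, feasibility, complementarity) is our assumption
def kkt_residual(x1, x2, lam1, lam2):
    grad = (1.0 + lam1 * (x1 - x2), 1.0 - lam2 * (x1 - x2))  # Eq. (41)
    g = 0.5 * (x1 - x2) ** 2                # shared constraint value
    feas = max(0.0, g)
    comp = (lam1 * g, lam2 * g)
    return math.sqrt(grad[0] ** 2 + grad[1] ** 2
                     + feas ** 2 + comp[0] ** 2 + comp[1] ** 2)

# along x^k = (a + 1/k, a) and lam^k = (0, k), the second gradient
# coordinate vanishes and the residue decreases to its infimum 1
a = 1.0
res = [kkt_residual(a + 1.0 / k, a, 0.0, float(k)) for k in (10, 100, 1000)]
```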

We explore this phenomenon again in Problem 9 from [14], where a similar situation occurs. However, in this case, the feasible set is not a singleton whenever the other player's decision is not null. Fixing \(x_2\), the best response of player 1 is \(x_1^*=\min \{\sqrt[3]{-4x_2},0\}\), while, fixing \(x_1\), the best response of player 2 is \(x_2^*=\max \{-2x_1^3,0\}\). This means that the origin is the unique solution of the GNEP, although the feasible set is much larger.

In this problem, we have that

$$\begin{aligned} \nabla _{y} f(x, x) + \sum _{i\in \{1,2\}} \lambda _{i} \nabla _{y} g_{i} (x, x)=\left( \begin{array}{r} 1+\lambda _1(x_1^3+x_2)\\ -1+\lambda _2(x_1^3+x_2) \end{array}\right) , \end{aligned}$$

and once again we have that the norm of the KKT residue is at least 1 for any pair \((x, \lambda )\). Moreover, one can obtain points arbitrarily close to (0, 0) with the norm of the KKT residue as close to 1 as desired (taking, for instance, \(x^k=((2k)^{-1/3},(2k)^{-1})\) and \(\lambda ^k=(0,k)\) for all k). In our implementation, although we could not find the solution, for \(x^0=(-1,0)\) the algorithm converged to a feasible point, with the norm of the KKT residue converging to 1, the infimum of the KKT residue. However, an initial point with this behavior was hard to find. Typically, the generated sequence had a KKT residue converging to \(\sqrt{2}\), which is the actual value of the KKT residue at any feasible point. The fact that the KKT residue is the same at all feasible points may explain the attraction of the method to an arbitrary feasible point when minimizing the KKT residue.
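The sequence suggested above can be checked directly: along \(x^k=((2k)^{-1/3},(2k)^{-1})\) with \(\lambda^k=(0,k)\) one has \(x_1^3+x_2 = 1/k\), so the second coordinate of the gradient vanishes and its norm tends to 1 while \(x^k \to (0,0)\). A small verification of ours:

```python
import math

# gradient of the Lagrangian for Problem 9 at y = x (see the display above)
def grad_norm(x1, x2, lam1, lam2):
    s = x1 ** 3 + x2
    return math.hypot(1.0 + lam1 * s, -1.0 + lam2 * s)

# along x^k = ((2k)^{-1/3}, (2k)^{-1}) and lam^k = (0, k): s = 1/k, so
# the second coordinate vanishes and the norm tends to 1 as x^k -> (0, 0)
vals = [grad_norm((2 * k) ** (-1 / 3), 1.0 / (2 * k), 0.0, float(k))
        for k in (10, 1000, 100000)]
```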

While experimenting with these problems, we noted that even though a solution \(\bar{x}\) does not satisfy AKKT-QEP in these examples, that is, no sequences \(x^k\rightarrow \bar{x}\) and \(\{\lambda ^k\}\subset {\mathbb {R}}^m_+\) exist such that the limits in the definition of AKKT-QEP are equal to zero, there are sequences along which these limits are as small as possible.

It may be the case that such sequences always exist around a solution in general for any QEP, but unfortunately we were not able to prove or disprove this fact. Moreover, it would be interesting to develop algorithms that generate such types of sequences. It seems that in these cases the norm used to declare a subproblem solution would influence the result obtained by the main algorithm. An analysis of these points would also be interesting.

Finally, inspired by the convergence results with AKKT-QEP sequences in the weak sense (see the comment after the proof of Theorem 4), we implemented the algorithm allowing sequences with \(x^k\ne y^k\). For this, when solving the subproblems, we minimize the KKT residue with x and y uncoupled, imposing the additional constraint \(x = y\). In this way, one may consider slightly different sequences, thus obtaining an AKKT-QEP sequence in the weak sense.

For Problems 1–6, the results obtained were the same as in the original implementation. In Problem 7, the new strategy reduced the KKT residue of the QEP, measured with the additional constraint \(x=y\), to 0.05, against 1 in the original implementation. This was achieved after 3 iterations, but from there on there was no further progress and the algorithm did not reach the desired precision. For Problem 8, the reduction of the KKT residue was much larger, from \(1.79 \times 10^7\) to 1.42; however, once again the algorithm did not reach the desired precision, with no further progress after 4 iterations. For Problem 9, after 13 iterations we found a solution with the same KKT residue as in the previous implementation, which satisfies the \(x = y\) constraint exactly. We believe that further studies of this strategy may be important in the future.

7 Conclusion

In this paper, we described an Augmented Lagrangian method for QEPs and proved that it tends to find feasible limit points, in the sense that an Approximate-KKT-QEP point is found for an auxiliary feasibility QEP. When a limit point is feasible, an Approximate-KKT-QEP point for the original problem is found. We also discussed in some detail the notion of an approximate stationary point in the context of QEPs, showing that, differently from the case of nonlinear programming, the KKT-QEP residual cannot always be made arbitrarily small near a solution of a general QEP. Nonetheless, we were able to prove that feasible limit points of the sequence generated by the Augmented Lagrangian method are true KKT-QEP points under a new weak condition that we call the Weak Cone Continuity Property (WCCP), which, surprisingly, is not even a constraint qualification.

The difficulties underlying the treatment of non-convex problems are somewhat subsumed in the assumption that the Augmented Lagrangian subproblems can be solved, at least approximately. Hence, we also provided a detailed discussion of several classes of problems for which these subproblems can be properly solved, in the sense that they yield monotone or pseudomonotone equilibrium problems.

On the other hand, when the subproblems cannot be solved to arbitrary precision in terms of the KKT residue, we observed that solutions can be approximated by sequences along which the KKT residue tends to its infimum. This may lead to the definition of a new necessary optimality condition, as well as to the development of an algorithm associated with it, which will be the subject of our future research.

Another question raised in this work, which should be investigated in the future, is the possibility of exploring the optimality conditions of a QEP using uncoupled variables x and y together with the additional constraint \(x = y\). This would increase the number of variables in the problem but would circumvent the fact that AKKT-QEP is not an optimality condition. Other formulations that ensure \(x = y\) could also be used; perhaps a constraint that slowly brings x and y to the same value would be advantageous in some cases. We believe that not only the Augmented Lagrangian could be extended to use this strategy, but also that a wide range of methods based on the KKT-QEP conditions could benefit from this technique. More robust numerical tests considering this approach would also be welcome.