1 Introduction

The multi-leader–multi-follower game (MLMFG) is a noncooperative game with a bilevel structure: two or more leaders determine their strategies first, and two or more followers then make their choices in a strategic setting. This framework can be viewed as an extension of bilevel optimization, the Stackelberg model, and single-leader–multi-follower games. The inherent complexity of this class arises from the fact that each leader’s (upper-level) optimization problem is constrained by the set of followers’ (lower-level) Nash equilibria, which is computationally challenging to evaluate.

With the growing interest in noncooperative game theory, the MLMFG has also been studied in economics and computer science. It has often been used to analyze deregulated markets, such as wholesale electricity markets consisting of several energy companies (leaders) and an independent system operator (follower) [11, 19, 24]; this setting is formulated as a multi-leader–single-follower game (MLSFG). More recently, researchers in computing and telecommunications have formulated edge computing models as MLMFGs to optimize the allocation of computational resources [5, 17, 28]. In edge computing, the leaders are edge servers with medium-scale computational resources, and the followers are terminals such as smartphones, security cameras, or robot arms in a factory. For more recent advances and applications of the MLMFG, please refer to the surveys by Aussel and Svensson [2] and Hu and Fukushima [16].

Theoretical studies on the MLMFG have taken two main directions. The first approach replaces each follower’s problem with its necessary optimality conditions, the Karush–Kuhn–Tucker (KKT) conditions, and incorporates them into the constraints of each leader’s optimization problem; that is, each leader solves a mathematical program with equilibrium constraints (MPEC) [21]. The resulting problem is referred to as an equilibrium problem with equilibrium constraints (EPEC) [23, 27] and is solved with an MPEC solver, e.g., NLPEC [9]. This approach to the MLMFG has been studied extensively over the years [12, 19, 27]. The EPEC formulation yields so-called shared constraints and shared variables, corresponding to the KKT conditions and the followers’ strategies, respectively. However, the shared variables can complicate the interpretation within the MLMFG context: each leader may change them unilaterally, even though they are essentially the followers’ strategies.

The second approach addresses this issue by considering the best response of the followers’ noncooperative game for given leaders’ strategies and integrating it into each leader’s optimization problem [10, 13, 15]; we call this technique the best response approach. The followers’ strategies no longer appear explicitly; that is, the resulting problem is a standard Nash equilibrium problem among the leaders. In general, each leader’s objective function remains complicated, nonsmooth, and nonconvex, but this approach allows us to adopt standard techniques for Nash equilibrium problems by smoothing the followers’ response.

The smoothing of response functions has been a particular focus in the MPEC literature over the decades. To the best of our knowledge, Facchinei et al. [7] first considered a smoothing method for the response function in an MPEC, where the response function is characterized by the lower-level equilibrium constraint parametrized by the upper-level variable. They showed convergence to a Clarke stationary point of the MPEC as the smoothing parameter converges to 0. Chen and Fukushima [4] then proposed the same method for the MPEC whose lower-level equilibrium constraint is a P-matrix linear complementarity constraint, and they showed convergence to a Bouligand stationary point of the MPEC. Hu and Fukushima [14] extended this approach to the EPEC, confirming convergence to a Bouligand stationary point, which satisfies the Bouligand stationarity conditions of each leader’s MPEC. To date, however, no studies have established such convergence for MPECs and EPECs with nonlinear complementarity constraints at the lower level.

To the best of our knowledge, the best response approach to the MLMFG has only been studied for quadratic games, in which the followers’ response can be written in closed form, by Hu and Fukushima [13, 15] and Herty et al. [10]. They demonstrated the existence of a leader–follower Nash equilibrium, at which no player at either level has an incentive to change their strategy, for the quadratic game; each player solves a convex quadratic programming problem. Hu and Fukushima [13] considered the quadratic game in which a single follower solves an equality-constrained convex quadratic program. In [15], they then considered the same class of MLMFG under uncertainty and demonstrated the existence and uniqueness of a robust Nash equilibrium. Herty et al. [10] extended the class to the case in which the follower solves a convex quadratic program with linear inequality constraints, and they proposed a smoothing technique for the follower’s response function.

However, many real-world applications cannot be formulated as quadratic games, and in such cases the followers’ response can no longer be obtained explicitly. For example, in microeconomics the utility function is often characterized by a logarithm or an exponential function. Beyond economics, for optimal resource allocation in edge computing, Lyu et al. [22] used a log-utility function in the followers’ optimization problems. Moreover, in blockchain-based cloud/edge computing networks, the followers’ utility functions take a fractional form, e.g., Xiong et al. [28].

Currently, studies on the existence of equilibria in the MLMFG via the best response approach are very limited because demonstrating the existence of a leader–follower Nash equilibrium essentially requires each leader’s problem to be convex. However, verifying the convexity of the problem usually requires the response function to be written explicitly, which is very difficult except in the special cases introduced above [10, 13, 15].

In this paper, we propose the best response approach for a more general class of MLMFGs with a smoothing method based on Facchinei et al. [7]. This technique allows the MLMFG to be solved via a single-level (differentiable) variational inequality associated with the leaders’ Nash game. Meanwhile, a (global) leader–follower Nash equilibrium may not exist in this class, and even if it exists, finding it is NP-hard in general since each leader’s objective function is not convex. Hence, we concentrate on a weaker equilibrium concept, the stationary Nash equilibrium, as a first-order condition for a local leader–follower Nash equilibrium. We then demonstrate that the solution of the smoothed, approximated variational inequality converges to a stationary Nash equilibrium of the MLMFG as the smoothing parameter tends to 0. We also report the results of numerical experiments and illustrate the behavior of the proposed method.

In summary, our contributions bridge the following gaps in existing studies:

  1. Unlike [10, 13], which focus on cases where the follower(s)’ response can be calculated explicitly, we extend their framework to cases where the explicit form of the follower(s)’ response is not analytically obtainable;

  2. In contrast to [13], which considers only equality constraints in the followers’ optimization problems, our paper considers cases involving nonlinear inequality constraints;

  3. This paper may be regarded as the first to demonstrate convergence to a Bouligand stationary point in a response-based smoothing method for EPECs with nonlinear complementarity constraints associated with the MLMFG.

The remainder of this paper is organized as follows: Sect. 2 outlines mathematical concepts and basic noncooperative game theory used in our approach. Section 3 describes the MLMFG and its solution concepts. Section 4 introduces a smoothing method for the MLMFG and then analyzes its convergence to stationary Nash equilibrium. Section 5 reports on numerical experiments that illustrate the effectiveness of our proposed method with a toy example. Finally, Sect. 6 offers some concluding remarks.

2 Preliminaries

This section provides some fundamental concepts from convex analysis and Nash equilibrium problems. Throughout this paper we use the following notation: for a differentiable function \(F:I\!\!R^n\rightarrow I\!\!R^m\), \(\nabla F(x):=[\nabla F_1(x),\dots ,\nabla F_m(x)]\) is the transposed Jacobian matrix of F at \(x \in I\!\!R^n\); we simply call \(\nabla F(x)\) the Jacobian matrix of F(x). For vectors \(a\in I\!\!R^n\) and \(b\in I\!\!R^n\), \(a\perp b\) denotes \(a^\top b=0\).

Definition 2.1

([6, Definition 2.6.1]) The (transposed) generalized Jacobian of \(F:I\!\!R^n\rightarrow I\!\!R^m\) at x, denoted by \(\partial F(x)\), is the convex hull of all \(n\times m\) matrices W obtained as the limit of a sequence of the form \(\nabla F(x^k)\), where \(x^k\rightarrow x\) and \(x^k\in \mathcal {D}_F\). Here, \(\mathcal {D}_F\subset I\!\!R^n\) is the set of points at which F is differentiable.

Symbolically, one has

$$\begin{aligned} \partial F(x)=\textrm{conv}\left\{ \lim _{x^k\rightarrow x}\nabla F(x^k)\ \Bigg |\ x^k\in \mathcal {D}_F\right\} , \end{aligned}$$

where \(\textrm{conv}\) denotes the convex hull of a set.

For a real-valued function \(\psi :I\!\!R^n\rightarrow I\!\!R\), a directional derivative \(\psi '(x;d)\) of \(\psi \) at \(x\in I\!\!R^n\) in the direction \(d\in I\!\!R^n\) is defined to be

$$\begin{aligned} \psi '(x;d):=\lim _{\tau \downarrow 0}\frac{\psi (x+\tau d)-\psi (x)}{\tau }, \end{aligned}$$
(1)

when the limit exists. The Clarke generalized directional derivative \(\psi ^\circ (x;d)\) of the function \(\psi \) at x in the direction \(d\in I\!\!R^n\) is defined as

$$\begin{aligned} \psi ^\circ (x;d):=\underset{y\rightarrow x,\tau \downarrow 0}{\lim \sup }\frac{\psi (y+\tau d)-\psi (y)}{\tau }. \end{aligned}$$

The regularity of a real-valued function is defined as follows.

Definition 2.2

([6, Definition 2.3.4]) A function \(\psi :I\!\!R^n\rightarrow I\!\!R\) is regular at \(x\in I\!\!R^n\) if, for every \(d\in I\!\!R^n\), the directional derivative \(\psi '(x;d)\) exists and satisfies

$$\begin{aligned} \psi '(x;d)=\psi ^\circ (x;d). \end{aligned}$$

Moreover, a vector-valued function \(\varPsi :I\!\!R^n\rightarrow I\!\!R^m\) is regular if each of its component functions \(\psi _i\), \(i=1,\dots ,m\), is regular.
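
For example, the convex function \(\psi (x)=|x|\) is regular at \(x=0\): for every \(d\in I\!\!R\), \(\psi '(0;d)=|d|=\psi ^\circ (0;d)\). In contrast, \(\psi (x)=-|x|\) is not regular at 0, since \(\psi '(0;1)=-1\) while \(\psi ^\circ (0;1)=1\).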

The tangent cone \({{\mathcal {T}}}_X(x)\) of \(X\subset I\!\!R^n\) at x is defined by

$$\begin{aligned} {\mathcal {T}}_X(x):=\left\{ d\in I\!\!R^n\ \Bigg |\ d=\lim _{\nu \rightarrow \infty }\alpha _\nu (x^\nu -x),\lim _{\nu \rightarrow \infty }x^\nu =x,x^\nu \in X,\alpha _\nu \ge 0,\nu =1,2,\dots \right\} , \end{aligned}$$

and the normal cone \({{\mathcal {N}}}_X(x)\) of \(X\subset I\!\!R^n\) at x is defined as the set of points v for which there exist sequences \(\{x^k\}\subset X\) and \(\{v^k\}\) with

$$\begin{aligned} x^k\rightarrow x,\ v^k\rightarrow v,\ v^k\in {{\mathcal {T}}}_X(x^k)^\circ \ \ \forall k, \end{aligned}$$

where \({{\mathcal {T}}}_X(x)^\circ :=\{y\in I\!\!R^n\mid \langle y,z\rangle \le 0\ \ \forall z\in {{\mathcal {T}}}_X(x)\}\) denotes the polar cone of \({{\mathcal {T}}}_X(x)\).

Definition 2.3

([3, Definition 4.6.3]) The set \(X\subset I\!\!R^n\) is regular at \(x\in X\) if

$$\begin{aligned} {{\mathcal {N}}}_X(x)={{\mathcal {T}}}_{X}(x)^\circ . \end{aligned}$$

Furthermore, X is (simply called) regular if X is regular at all \(x\in X\).

The following lemma is a sufficient condition for the regularity of a set.

Lemma 2.1

([3, Proposition 4.6.3]) If \(X\subset I\!\!R^n\) is convex, then X is regular, and the normal cone of X at \(x\in X\) (which equals \({{\mathcal {T}}}_X(x)^\circ \) under regularity) is given by

$$\begin{aligned} {{\mathcal {N}}}_X(x)=\{d\in I\!\!R^n\mid \langle d,z-x\rangle \le 0\quad \forall z\in X\}. \end{aligned}$$
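
For instance, for \(X=[0,1]\subset I\!\!R\), this formula gives \({{\mathcal {N}}}_X(0)=(-\infty ,0]\), \({{\mathcal {N}}}_X(1)=[0,\infty )\), and \({{\mathcal {N}}}_X(x)=\{0\}\) for \(x\in (0,1)\).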

Next we consider an N-player Nash equilibrium problem (NEP). The player labeled \(\nu \in \{1,\dots ,N\}\) has a strategy vector \(x^\nu \in I\!\!R^{n_\nu }\) and a strategy set \(X^\nu \subset I\!\!R^{n_\nu }\). Player \(\nu \) solves the following optimization problem:

$$\begin{aligned} \min _{x^\nu \in I\!\!R^{n_\nu }}\theta ^\nu (x^\nu ,x^{-\nu }) \qquad \text {s.t. } x^\nu \in X^\nu , \end{aligned}$$
(2)

where \(x^{-\nu }:=(x^1,\dots ,x^{\nu -1},x^{\nu +1},\dots ,x^N)\in I\!\!R^{n-n_\nu }\) denotes the tuple of all players’ strategies except player \(\nu \)’s. Let \(n:=n_1+\dots +n_N\), and let the function \(\theta ^\nu :I\!\!R^n\rightarrow I\!\!R\) be continuously differentiable.

Definition 2.4

(Nash equilibrium) A tuple of strategies \(x^*:=(x^{*,1},\dots ,\) \(x^{*,N})\in X:=X^1\times \dots \times X^N\) is called a Nash equilibrium if for each \(\nu \),

$$\begin{aligned} x^{*,\nu }\in \arg \min _{x^\nu \in X^\nu }\theta ^\nu (x^\nu ,x^{*,-\nu }). \end{aligned}$$

In other words, a Nash equilibrium is a tuple of strategies at which no player can reduce their cost unilaterally. However, a Nash equilibrium does not always exist, and it may be difficult to find even when it does.

The NEP may be characterized by a variational inequality (VI). Let \(\theta ^\nu \) be differentiable, and define \(X:=X^1\times \dots \times X^N\) and

$$\begin{aligned} F(x):=\left[ \begin{array}{c}\nabla _{x^1}\theta ^1(x^1,x^{-1})\\ \vdots \\ \nabla _{x^N}\theta ^N(x^N,x^{-N}) \end{array}\right] . \end{aligned}$$
(3)

Proposition 2.1

([8, Proposition 1.4.2]) Assume that \(\theta ^\nu (\cdot ,x^{-\nu })\) is convex for any \(x^{-\nu }\in I\!\!R^{n-n_\nu }\), and \(X^\nu \subset I\!\!R^{n_\nu }\) is nonempty, closed, and convex. Then, a tuple of strategies \(x^*\) is a Nash equilibrium if and only if \(x^*\) is a solution to the following VI:

$$\begin{aligned} \langle F(x^*), x-x^*\rangle \ge 0\quad \forall x\in X. \end{aligned}$$
(4)

Notice that even if each player’s optimization problem (2) is convex, the existence of a Nash equilibrium is not guaranteed. In other words, a solution to variational inequality (4) may not exist in general. The following proposition ensures the existence of a Nash equilibrium of an N-player NEP.

Proposition 2.2

(Aubin [1]) Suppose that the assumptions of Proposition 2.1 hold. Assume that \(X^\nu \subset I\!\!R^{n_\nu }\) is compact for all \(\nu \in \{1,\dots ,N\}\). Then, the Nash equilibrium of the NEP in which player \(\nu \) solves (2) exists.

The uniqueness of the Nash equilibrium is stated below.

Proposition 2.3

Suppose that the assumptions of Proposition 2.2 hold. Assume that the mapping \(F:I\!\!R^n\rightarrow I\!\!R^n\) defined in (3) is strictly monotone on \(X\subset I\!\!R^n\), i.e.,

$$\begin{aligned} \langle F(x)-F(x'),x-x'\rangle >0\quad \forall x,x'\in X \text { such that } x\ne x'. \end{aligned}$$

Then, the solution to VI (4) is unique, and it is a Nash equilibrium.

Proof

[8, Theorem 2.3.3] ensures that the strictly monotone VI has at most one solution. By the existence result from Proposition 2.2, the Nash equilibrium uniquely exists.\(\square \)
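
To illustrate Propositions 2.1–2.3, the following minimal sketch (our own toy instance, not taken from the references; the game data, step size, and tolerance are illustrative assumptions) solves a two-player NEP through VI (4) by a projected-gradient fixed-point iteration.

```python
# Two-player NEP solved through VI (4) by projection (illustrative sketch).
# Player nu minimizes theta_nu(x) = 0.5*x_nu^2 + c_nu*x_nu*x_{-nu} - b_nu*x_nu
# over X_nu = [0, 1]; each objective is convex in the player's own variable.
import numpy as np

b = np.array([1.0, 0.5])   # assumed data
c = np.array([0.3, 0.2])   # assumed data

def F(x):
    """Concatenated partial gradients, cf. (3)."""
    return np.array([x[0] + c[0]*x[1] - b[0],
                     x[1] + c[1]*x[0] - b[1]])

def project(x):
    return np.clip(x, 0.0, 1.0)   # Euclidean projection onto X = [0,1]^2

x = np.zeros(2)
for _ in range(1000):             # fixed-point iteration for VI (4)
    x_new = project(x - 0.5*F(x))
    if np.linalg.norm(x_new - x) < 1e-12:
        break
    x = x_new
print("approximate Nash equilibrium:", np.round(x, 6))
```

Here \(\nabla F\) is the constant matrix with rows (1, 0.3) and (0.2, 1), whose symmetric part is positive definite; hence F is strictly monotone on X, and by Propositions 2.2 and 2.3 the computed point is the unique Nash equilibrium.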

3 The Multi-Leader–Multi-Follower Games

Consider an MLMFG of N leaders and M followers. Let \(X^\nu \subset I\!\!R^{n_\nu }\) and \(\theta ^\nu :I\!\!R^{n+m}\rightarrow I\!\!R\) be the strategy set and cost function of leader \(\nu \in \{1,\dots ,N\}\), respectively, where \(m:=m_1+\dots +m_M\). Let \(Y^\omega (x)\subset I\!\!R^{m_\omega }\) and \(\gamma ^\omega :I\!\!R^{n+m}\rightarrow I\!\!R\) be the strategy set and cost function of follower \(\omega \in \{1,\dots ,M\}\), respectively.

For fixed followers’ strategies \(y\in I\!\!R^m\), which will be determined later, leader \(\nu \) solves the following optimization problem:

$$\begin{aligned} \min _{x^\nu \in I\!\!R^{n_\nu }}\theta ^\nu (x^\nu ,x^{-\nu },y)\qquad \text {s.t. }x^\nu \in X^\nu . \end{aligned}$$
(5)

After all leaders simultaneously determine their strategies \(x\in X:=X^1\times \dots \times X^N\), follower \(\omega \) solves the following optimization problem:

$$\begin{aligned} \min _{y^\omega \in I\!\!R^{m_\omega }}\gamma ^\omega (x,y^\omega ,y^{-\omega })\qquad \text {s.t. } y^\omega \in Y^\omega (x). \end{aligned}$$
(6)

We could also consider the case in which the constraint set \(Y^\omega (x)\) of follower \(\omega \)’s problem depends on \(y^{-\omega }\) as well, i.e., \(Y^\omega (x,y^{-\omega })\); such a game is referred to as a generalized Nash equilibrium problem (GNEP). Finding an equilibrium of a GNEP, however, is challenging even in a single-level Nash game and is beyond the scope of this paper. Let \({{\mathcal {S}}}(x)\) denote the set of Nash equilibria of the followers’ Nash game. The equilibrium concept of the MLMFG is defined as follows [16].

Definition 3.1

(Leader–follower Nash equilibrium) A tuple of leaders’ and followers’ strategies \((x^*,y^*)=\) \((x^{*,1},\dots ,x^{*,N},y^{*,1},\dots ,y^{*,M})\in X\times {{\mathcal {S}}}(x^*)\) is referred to as a pessimistic leader–follower (LF) Nash equilibrium if the following conditions simultaneously hold:

$$\begin{aligned} x^{*,\nu }\in \underset{x^\nu \in X^\nu }{\arg \min }\max _{y\in {{\mathcal {S}}}(x^\nu ,x^{*,-\nu })}\theta ^\nu (x^\nu ,x^{*,-\nu },y)\quad \forall \nu \in \{1,\dots ,N\}. \end{aligned}$$
(7)

A tuple of strategies \((x^*,y^*)=(x^{*,1},\dots ,x^{*,N},y^{*,1},\dots ,y^{*,M})\in X\times {{\mathcal {S}}}(x^*)\) is referred to as an optimistic leader–follower (LF) Nash equilibrium if the following conditions simultaneously hold:

$$\begin{aligned} x^{*,\nu }\in \underset{x^\nu \in X^\nu }{\arg \min }\min _{y\in {{\mathcal {S}}}(x^\nu ,x^{*,-\nu })}\theta ^\nu (x^\nu ,x^{*,-\nu },y)\quad \forall \nu \in \{1,\dots ,N\}. \end{aligned}$$
(8)

If \({{\mathcal {S}}}(x)\) is a singleton for every x, i.e., there exists a unique followers’ Nash equilibrium for every given tuple of leaders’ strategies, the two equilibrium concepts coincide; hence we simply call the equilibrium point a leader–follower (LF) Nash equilibrium.

Unfortunately, the pessimistic LF Nash equilibrium may not exist even if \(\theta ^\nu \) is continuous and \(X^\nu \) is compact since

$$\begin{aligned} \varphi (x^\nu ,x^{-\nu })=\max _{y\in {{\mathcal {S}}}(x^\nu ,x^{-\nu })}\theta ^\nu (x^\nu ,x^{-\nu },y) \end{aligned}$$

is not necessarily lower semicontinuous with respect to \(x^\nu \), which implies that a minimizer of \(\varphi (x^\nu ,x^{-\nu })\) may fail to exist. In this paper, we assume that \({{\mathcal {S}}}(x)\) is a singleton for every \(x\in X\) to avoid such a complicated situation; that is, y is uniquely determined by x. Then, to emphasize that y is a function of x, we rewrite \({{\mathcal {S}}}(x)\) as y(x) and call it a response function. A sufficient condition for the uniqueness of the followers’ Nash equilibrium will be given later.

Plugging y(x) into each leader’s problem (5), the MLMFG comprising (5)–(6) can be reformulated as the following single-level Nash equilibrium problem among the leaders: leader \(\nu \in \{1,\dots ,N\}\) solves

$$\begin{aligned} \min _{x^\nu \in I\!\!R^{n_\nu }}\quad \varTheta ^\nu (x^\nu ,x^{-\nu }):=\theta ^\nu (x^\nu ,x^{-\nu },y(x^\nu ,x^{-\nu }))\qquad \text {s.t.}\quad x^\nu \in X^\nu . \end{aligned}$$
(9)

We call (9) a reduced problem of (5), and we denote the single-level game in which leader \(\nu \) solves (9) by \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\). By the definition of the response function y(x), the following statement immediately holds.

Proposition 3.1

Let \(x^*\in X\) be a Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\). Then, \((x^*,y(x^*))\) is an LF Nash equilibrium of the MLMFG.

Proposition 3.1 indicates that, under the uniqueness of the followers’ Nash equilibrium y(x), it suffices to consider \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) instead of the MLMFG (5)–(6). Using this reduction to a single-level NEP, the existence of an LF Nash equilibrium can be stated as follows.

Proposition 3.2

Assume the following conditions:

  • For any tuple of leaders’ strategies \(x\in X\), there exists a unique lower-level response y(x);

  • For any \(\nu \), the leaders’ objectives \(\theta ^\nu \) and the best response function y are continuous;

  • For any \(\nu \), the strategy set \(X^\nu \) is nonempty, convex and compact;

  • For any \(\nu \), the composition function \(\varTheta ^\nu (x^\nu ,x^{-\nu })=\theta ^\nu (x^\nu ,x^{-\nu },y(x^\nu ,x^{-\nu }))\) is convex with respect to \(x^\nu \) for any fixed \(x^{-\nu }\).

Then, an LF Nash equilibrium of the MLMFG exists.

Proof

The assertion is immediately shown by Proposition 2.2.\(\square \)

Regrettably, verifying the convexity of \(\varTheta ^\nu \) is intrinsically hard since the lower-level response y(x) cannot in general be written explicitly; in some special cases, however, it is possible, e.g., see Hu and Fukushima [15], Sherali [26], and Herty et al. [10].
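
To make this difficulty concrete, consider the following minimal single-leader–single-follower sketch (our illustration, not from the references): the follower’s problem \(\min _y \frac{1}{2}(y-x)^2\) subject to \(0\le y\le 1\) has the unique closed-form response \(y(x)=\min \{\max \{x,0\},1\}\), which is piecewise linear and nonsmooth at \(x=0\) and \(x=1\), so the reduced objective \(\varTheta \) is nonsmooth (and here also nonconvex) even though \(\theta \) itself is smooth.

```python
# Toy reduced problem (9) (illustrative): the follower's response is the
# clip function, so Theta(x) = theta(x, y(x)) is a nonsmooth composition.
import numpy as np

def y_response(x):
    # follower: min_y 0.5*(y - x)^2  s.t.  0 <= y <= 1
    return np.clip(x, 0.0, 1.0)

def Theta(x):
    # leader's cost theta(x, y) = (x - 0.3)^2 + (y - 0.7)^2 (assumed data)
    return (x - 0.3)**2 + (y_response(x) - 0.7)**2

for x in [-0.5, 0.0, 0.5, 1.0, 1.5]:
    print(f"x = {x:5.2f}   Theta(x) = {Theta(x):.4f}")
```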

These facts imply that the existence of a Nash equilibrium is not guaranteed and that, even if one exists, finding it is NP-hard in general. Hence, we concentrate on a weaker concept of Nash equilibrium, stated below. The following concept is derived from Bouligand stationarity for a mathematical program with equilibrium constraints (MPEC) [21, Lemma 4.2.5] and an equilibrium problem with equilibrium constraints (EPEC) [14]; we extend the concept to \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

Definition 3.2

(Bouligand stationary Nash equilibrium) A tuple of leaders’ strategies \(x^*\in X\) is called a Bouligand (B-) stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) if, for every \(\nu \in \{1,\dots ,N\}\), \(x^{*,\nu }\in I\!\!R^{n_\nu }\) satisfies

$$\begin{aligned}&(\varTheta ^\nu )'(x^{*,\nu },x^{*,-\nu };d^\nu )=\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))^\top d^\nu \nonumber \\&\quad +y'_{x^\nu }(x^{*,\nu },x^{*,-\nu };d^\nu )^\top \nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\ge 0\quad \forall d^\nu \in {{\mathcal {T}}}_{X^\nu }(x^{*,\nu }), \end{aligned}$$
(10)

where \(y'_{x^\nu }(x^\nu ,x^{-\nu };d^\nu )\in I\!\!R^m\) is a partial directional derivative of y with respect to \(x^\nu \) along the direction \(d^\nu \in I\!\!R^{n_\nu }\) in the sense of (1).

We also define a weaker concept of stationary Nash equilibrium, derived from Clarke stationarity in nonsmooth analysis [3].

Definition 3.3

(Clarke stationary Nash equilibrium) A tuple of leaders’ strategies \(x^*\in X\) is called a Clarke (C-) stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) if, for every \(\nu \in \{1,\dots ,N\}\), \(x^{*,\nu }\in I\!\!R^{n_\nu }\) satisfies:

$$\begin{aligned} 0\in \partial _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })+{{\mathcal {T}}}_{X^\nu }(x^{*,\nu })^\circ . \end{aligned}$$

4 Smoothing Method and Its Convergence to Stationary Nash Equilibria

Since the response function y(x) is nonsmooth, it is difficult to compute a B-/C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) numerically. To overcome this, we propose a smoothing method and show that, as the smoothing parameter decreases to 0, the sequence of stationary Nash equilibria of the smoothed NEPs converges to a B-/C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

4.1 Smoothing Method

Hereinafter, the strategy set \(Y^\omega (x)\subset I\!\!R^{m_\omega }\) of follower \(\omega \in \{1,\dots ,M\}\) is defined by

$$\begin{aligned} Y^\omega (x):=\{y^\omega \in I\!\!R^{m_\omega }\mid g^\omega (x,y^\omega )\le 0\}, \end{aligned}$$

where \(g^\omega (x,\cdot ):I\!\!R^{m_\omega }\rightarrow I\!\!R^{l_\omega }\). Let \(l:=l_1+\dots +l_M\). Note that we omit equality constraints from the model since they are not essential to the analysis.

In the following, we also assume the conditions stated below.

Assumption 4.1

For all \(\nu \in \{1,\dots ,N\}\), the following conditions hold:

  (L1) \(\theta ^\nu \) is continuously differentiable;

  (L2) The set \(X^\nu \subset I\!\!R^{n_\nu }\) is nonempty and compact.

In addition, for all \(\omega \in \{1,\dots ,M\}\), the following conditions hold:

  (F1) \(\gamma ^\omega \) and \(g^\omega \) are sufficiently smooth, and \(\gamma ^\omega (x,\cdot ,y^{-\omega })\) is convex for any given x and \(y^{-\omega }\);

  (F2) \(Y^\omega (x)\) is nonempty, convex, and compact;

  (F3) For any given \(x\in X\) and any feasible solution \(y^\omega \in Y^\omega (x)\), the linear independence constraint qualification (LICQ) holds for the inequality constraints \(g^\omega (x,y^\omega )\le 0\).

Let

$$\begin{aligned} G(x,y):=\left[ \begin{array}{c} \nabla _{y^1}\gamma ^1(x,y^1,y^{-1}) \\ \vdots \\ \nabla _{y^M}\gamma ^M(x,y^M,y^{-M}) \end{array} \right] ,\quad Y(x):=Y^1(x)\times \dots \times Y^M(x). \end{aligned}$$

Under the convexity assumption on each follower’s optimization problem (6), the condition for the Nash equilibrium in the followers’ Nash game is equivalently reformulated as the following VI by Proposition 2.1:

$$\begin{aligned} \langle G(x,y^*),y-y^*\rangle \ge 0\quad \forall y\in Y(x), \end{aligned}$$
(11)

where \(y^*\in Y(x)\) denotes the Nash equilibrium. In order to ensure the uniqueness of the Nash equilibrium of the followers’ Nash game, i.e., of the solution of VI (11), we make the following additional assumption throughout this paper.

Assumption 4.2

The Jacobian matrix of the mapping \(G(x,\cdot ):I\!\!R^m\rightarrow I\!\!R^m\) is positive definite for any fixed x.

Remark 4.1

Let us review the assumptions and problem settings used in the previous literature on the MLMFG. Hu and Fukushima [13] considered the multi-leader–single-follower quadratic game in which one follower solves a strictly convex quadratic optimization problem with linear equality constraints. In this setting, the optimality conditions for the follower are necessary and sufficient, and hence the unique response can be obtained analytically; in fact, the follower’s response is linear. Herty et al. [10] also considered a quadratic game in which the follower solves a strictly convex quadratic optimization problem with a positive diagonal matrix for the quadratic term, but the constraints consist only of componentwise lower bounds, i.e., \(y\ge l(x)\), where l(x) is a linear function of x. In this setting, the follower’s unique response is nonsmooth but can still be obtained analytically. They applied a smoothing method to the follower’s optimality conditions and thereby obtained a smooth, unique response even with the smoothing term included.

On the other hand, our setting can be seen as a generalization of theirs since we do not assume any detailed structure of \(\gamma ^\omega (x,\cdot ,\cdot )\) or Y(x). Note that the scope of both works above is to establish the existence or uniqueness of the LF Nash equilibrium of the MLMFG, whereas we do not consider the existence of an LF Nash equilibrium; the uniqueness of the Nash equilibrium of the followers’ game, however, is always ensured by Proposition 4.1.

Proposition 4.1

Suppose that (F1)–(F3) in Assumptions 4.1 and 4.2 hold. Then the Nash equilibrium of followers’ Nash game is unique.

Proof

By the convexity of \(\gamma ^\omega (x,\cdot ,y^{-\omega })\) and the compactness of \(Y^\omega (x)\) for all \(\omega \), a Nash equilibrium depending on \(x\in X\) exists by Proposition 2.2. By the convexity of \(Y^\omega (x)\) for all \(\omega \), finding a Nash equilibrium is equivalent to solving VI (11). Assumption 4.2 implies that \(G(x,\cdot )\) is strictly monotone for any fixed x. It then follows from Proposition 2.3 that the strictly monotone variational inequality has at most one solution. Therefore the Nash equilibrium of the followers’ Nash game exists uniquely.\(\square \)

Remark 4.2

If Y(x) is not compact, uniqueness still holds if the mapping \(G(x,\cdot )\) is strongly monotone: there exists \(\sigma >0\) such that

$$\begin{aligned} \langle G(x,y)-G(x,y'),y-y'\rangle \ge \sigma \Vert y-y'\Vert ^2\quad \forall y,y'\in Y(x). \end{aligned}$$

Omitting the followers’ label \(\omega \), we simplify the notation for the followers’ constraint functions as follows:

$$\begin{aligned} g(x,y)=[g_i(x,y)]_{i=1}^l:=[g^\omega (x,y^\omega )]_{\omega =1}^M. \end{aligned}$$

Since each \(g^\omega (x,y^\omega )\) is independent of \(y^{-\omega }\) and, by Assumption 4.1–(F3), satisfies the LICQ at any \(y^\omega \) with \(g^\omega (x,y^\omega )\le 0\), the LICQ also holds for the collected inequality constraints \(g(x,y)\le 0\) in VI (11). Then the KKT conditions for the VI are written as follows:

$$\begin{aligned}&G(x,y)+\nabla _y g(x,y)\lambda =0,\nonumber \\&g(x,y)+z=0,\nonumber \\&0\le \lambda \perp z\ge 0, \end{aligned}$$
(12)

where \(z\in I\!\!R^{l}\) is a slack variable for the inequality constraint \(g(x,y)\le 0\), and \(\lambda \in I\!\!R^{l}\) represents the Lagrange multiplier. If (12) is incorporated into the constraints of each leader’s optimization problem (5), the resultant problem is referred to as an EPEC. Previous works such as [12, 19] have proposed solution methods for the EPEC associated with the MLMFG.

Now consider the Fischer–Burmeister function (FB-function) \(\phi _0:I\!\!R^2\rightarrow I\!\!R\):

$$\begin{aligned} \phi _0(a,b):=\sqrt{a^2+b^2}-(a+b), \end{aligned}$$

With this function, the complementarity condition \(0\le a\perp b\ge 0\) (\(a\in I\!\!R\), \(b\in I\!\!R\)) is equivalent to \(\phi _0(a,b)=0\). Using this property, the complementarity \(0\le \lambda \perp z\ge 0\) is rewritten as

$$\begin{aligned} \varPhi _0(\lambda ,z):=\left[ \begin{array}{c} \phi _0(\lambda _1,z_1) \\ \vdots \\ \phi _0(\lambda _l,z_l) \end{array}\right] =0. \end{aligned}$$

Let

$$\begin{aligned} H_0(x,y,z,\lambda ):= \left[ \begin{array}{c} G(x,y)+\nabla _y g(x,y)\lambda \\ g(x,y)+z \\ \varPhi _0(\lambda ,z) \end{array} \right] . \end{aligned}$$

Then, KKT conditions (12) coincide with \(H_0(x,y,z,\lambda )=0\). Hereinafter, let \(w:=(y,z,\lambda )\) and \(H_0(x,w):=H_0(x,y,z,\lambda )\).

Proposition 4.2

Given \(x\in X\), let \(w^*:=(y^*,z^*,\lambda ^*)\) be a zero of the nonlinear equation \(H_0(x,w)=0\). If (F1)–(F3) in Assumptions 4.1 and 4.2 hold, then \(y^*\) is a Nash equilibrium of the followers’ Nash game, and it is uniquely determined by \(x\in X\).

Proof

Since \(w^*=(y^*,z^*,\lambda ^*)\) satisfies \(H_0(x,w^*)=0\), the tuple also satisfies KKT conditions (12) for VI (11). It follows from the convexity of Y(x) and [8, Proposition 1.3.4] that \(y^*\) solves (11), which implies \(y^*\) is a Nash equilibrium of the followers’ game by Proposition 2.1. The uniqueness is ensured from Proposition 4.1. \(\square \)

Given leaders’ strategies \(x\in X\), we denote by \(w(x):=(y(x),z(x),\lambda (x))\) a solution to \(H_0(x,w)=0\). Proposition 4.2 states that we can obtain a Nash equilibrium of the followers’ Nash game by solving \(H_0(x,w)=0\). Nevertheless, since \(H_0\) is nonsmooth at degenerate points where \(z_i=\lambda _i=0\), the response y(x) is nonsmooth. Consequently, reduced problem (9) is nonsmooth and hence numerically difficult to handle. To overcome this, we propose a smoothing approximation scheme for the equation.

Given a positive number \(\varepsilon \), the smoothing FB-function \(\phi _\varepsilon :I\!\!R^2\rightarrow I\!\!R\) is defined as

$$\begin{aligned} \phi _\varepsilon (a,b):=\sqrt{a^2+b^2+2\varepsilon ^2}-(a+b). \end{aligned}$$

It is easy to see that \(\phi _\varepsilon \) is continuously differentiable everywhere, and \(\phi _\varepsilon (a,b)\rightarrow \phi _0(a,b)\) (\(\varepsilon \rightarrow 0\)) by continuity.
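
As a quick numerical illustration (ours), \(\phi _\varepsilon (0,0)=\sqrt{2}\,\varepsilon \rightarrow 0=\phi _0(0,0)\), and for \(\varepsilon >0\) the gradient \(\nabla \phi _\varepsilon (a,b)=(a/s-1,\ b/s-1)\) with \(s:=\sqrt{a^2+b^2+2\varepsilon ^2}\) is well defined even at the degenerate point \(a=b=0\), where \(\phi _0\) is not differentiable.

```python
# Illustrative check of the smoothed FB-function phi_eps (our sketch).
import numpy as np

def phi(a, b, eps=0.0):
    return np.sqrt(a*a + b*b + 2.0*eps*eps) - (a + b)

def grad_phi(a, b, eps):
    s = np.sqrt(a*a + b*b + 2.0*eps*eps)   # s > 0 whenever eps > 0
    return np.array([a/s - 1.0, b/s - 1.0])

for eps in [1e-1, 1e-3, 1e-6]:
    print(f"eps={eps:.0e}  phi_eps(0,0)={phi(0.0, 0.0, eps):.2e}",
          " grad_phi_eps(0,0):", grad_phi(0.0, 0.0, eps))
```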

Replacing \(\phi _0\) with \(\phi _\varepsilon \) in \(H_0\), the perturbed nonlinear system is given by

$$\begin{aligned} H_\varepsilon (x,w)\equiv H_\varepsilon (x,y,z,\lambda ) =0. \end{aligned}$$

Now we delve into some properties of \(H_0\) and \(H_\varepsilon \).

Proposition 4.3

Let \(x\in X\) be fixed. For any \(\varepsilon \ge 0\), if (F1)–(F3) in Assumptions 4.1 and 4.2 hold, then the system \(H_\varepsilon (x,w)=0\) has a unique solution \(w_\varepsilon (x):=(y_\varepsilon (x),z_\varepsilon (x),\lambda _\varepsilon (x))\); moreover, for \(\varepsilon >0\), \((z_\varepsilon (x),\lambda _\varepsilon (x))\) satisfies \(z_\varepsilon (x)>0\) and \(\lambda _\varepsilon (x)>0\) with \([z_\varepsilon (x)]_i[\lambda _\varepsilon (x)]_i=\varepsilon ^2\), where \([z_\varepsilon (x)]_i\) and \([\lambda _\varepsilon (x)]_i\) denote the ith elements of the vectors \(z_\varepsilon (x)\) and \(\lambda _\varepsilon (x)\), respectively.

Proof

It suffices to show the claim when \(\varepsilon >0\) since we have proved the statement in the case where \(\varepsilon =0\) in Proposition 4.2. The solvability and uniqueness of the solution to \(H_\varepsilon (x,w)=0\) is proved by Kanzow and Jiang [18, Lemma 3.11]. The latter statement is easily verified.\(\square \)
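
For concreteness, the following sketch (our toy single-follower instance; the data, starting point, and tolerances are assumptions, not the paper’s experiments) solves \(H_\varepsilon (x,w)=0\) by Newton’s method for the follower’s problem \(\min _y \frac{1}{2}y^2-xy\) subject to \(y\le 1\), so that \(G(x,y)=y-x\) and \(g(x,y)=y-1\). The printout also verifies the relation \([z_\varepsilon (x)]_i[\lambda _\varepsilon (x)]_i=\varepsilon ^2\) from Proposition 4.3.

```python
# Newton's method on the smoothed KKT system H_eps(x, w) = 0 (our sketch)
# for the follower  min_y 0.5*y^2 - x*y  s.t.  g(x, y) = y - 1 <= 0,
# so G(x, y) = y - x and w = (y, z, lam).
import numpy as np

def H(w, x, eps):
    y, z, lam = w
    s = np.sqrt(lam**2 + z**2 + 2.0*eps**2)
    return np.array([y - x + lam,      # G(x,y) + grad_y g(x,y) * lam
                     y - 1.0 + z,      # g(x,y) + z
                     s - (lam + z)])   # smoothed FB-function Phi_eps(lam, z)

def J(w, x, eps):
    y, z, lam = w
    s = np.sqrt(lam**2 + z**2 + 2.0*eps**2)
    return np.array([[1.0, 0.0, 1.0],
                     [1.0, 1.0, 0.0],
                     [0.0, z/s - 1.0, lam/s - 1.0]])  # nonsingular, cf. Lemma 4.2

def solve_w(x, eps):
    w = np.array([0.0, 1.0, 1.0])      # starting point with z, lam > 0
    for _ in range(50):                # plain Newton iteration
        r = H(w, x, eps)
        if np.linalg.norm(r) < 1e-12:
            break
        w = w + np.linalg.solve(J(w, x, eps), -r)
    return w

for x in [0.5, 2.0]:                   # inactive vs. active constraint
    y, z, lam = solve_w(x, eps=1e-4)
    print(f"x={x}: y_eps(x)={y:.8f}, z*lam={z*lam:.3e}")   # z*lam = eps^2
```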

We next show the nonsingularity of the (generalized) Jacobian matrix of \(H_\varepsilon \) for any \(\varepsilon \ge 0\).

Lemma 4.1

Let \(L\in I\!\!R^{m\times m}\) be a (not necessarily symmetric) positive definite matrix and \(A\in I\!\!R^{m\times l}\) an arbitrary matrix. Suppose that \(\varXi \in I\!\!R^{l\times l}\) and \(H\in I\!\!R^{l\times l}\) are diagonal matrices with negative diagonal entries. Then, the matrix

$$\begin{aligned} M:= \left[ \begin{array}{ccc} L &{} A &{} O \\ O &{} I &{} \varXi \\ A^\top &{} O &{} H \end{array} \right] \in I\!\!R^{(m+2l)\times (m+2l)} \end{aligned}$$

is nonsingular.

Proof

It suffices to show that the system of equations \(Mv=0\) has only the trivial solution \(v=0\). Writing \(v=(v_1,v_2,v_3)\), we have

$$\begin{aligned}&Lv_1+Av_2=0, \end{aligned}$$
(13)
$$\begin{aligned}&v_2+\varXi v_3=0, \end{aligned}$$
(14)
$$\begin{aligned}&A^\top v_1+H v_3=0. \end{aligned}$$
(15)

It follows from (13) that \(v_1=-L^{-1}Av_2\). Since \(\varXi \) is a negative diagonal matrix, (14) gives \(v_3=-\varXi ^{-1}v_2\). Substituting these into (15) yields

$$\begin{aligned} (A^\top L^{-1}A+H\varXi ^{-1})v_2=0. \end{aligned}$$

Since \(A^\top L^{-1}A\) is positive semidefinite for any A, and \(H\varXi ^{-1}\) is a diagonal matrix with positive diagonal entries, the coefficient matrix \(A^\top L^{-1}A+H\varXi ^{-1}\) is positive definite. This implies \(v_2=0\), and then \(v_3=v_1=0\). This completes the proof.\(\square \)

The following lemmas are slight modifications of Theorem 3.5 and Lemma 3.12 in Kanzow and Jiang [18].

Lemma 4.2

Suppose that (F1)–(F3) in Assumptions 4.1 and 4.2 hold. Let \(x\in X\) and \(\varepsilon >0\), and let \(w^*=(y^*,z^*,\lambda ^*)\) be the solution to \(H_\varepsilon (x,w)=0\). Then, the Jacobian matrix \(\nabla _w H_\varepsilon \) is nonsingular.

Proof

The Jacobian of \(H_\varepsilon \) with respect to \(w=(y,z,\lambda )\) is given as follows:

$$\begin{aligned} \left[ \begin{array}{ccc} L &{} A &{} O \\ O &{} I &{} \varXi \\ A^\top &{} O &{} H \end{array} \right] , \end{aligned}$$

where

$$\begin{aligned} L&:=\nabla _y G(x,y)+\sum _{i=1}^{l}\nabla ^2_{yy}g_i(x,y)\lambda _i, \end{aligned}$$
(16)
$$\begin{aligned} A&:=\nabla _y g(x,y), \nonumber \\ \varXi&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{z_i}{\sqrt{(z_i)^2+(\lambda _i)^2+2\varepsilon ^2}}-1\right) =\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{z_i}{z_i+\lambda _i}-1\right) ,\nonumber \\ H&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{\lambda _i}{\sqrt{(z_i)^2+(\lambda _i)^2+2\varepsilon ^2}}-1\right) =\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{\lambda _i}{z_i+\lambda _i}-1\right) . \end{aligned}$$
(17)

Here we have used \(z_i>0\), \(\lambda _i>0\), and \(z_i\lambda _i=\varepsilon ^2\). It is obvious that \(\varXi \) and H have negative diagonal entries. Since \(\nabla _y G(x,\cdot )\) is positive definite and \(g(x,\cdot )\) is convex, L is positive definite. Hence, applying Lemma 4.1 yields the result.\(\square \)

For a given \(x\in I\!\!R^n\), we define the index sets below:

$$\begin{aligned} \begin{aligned} {{\mathcal {J}}}_{0+}(x)&:=\{i\mid z_i(x)=0<\lambda _i(x)\},\\ {{\mathcal {J}}}_{00}(x)&:=\{i\mid z_i(x)=0=\lambda _i(x)\},\\ {{\mathcal {J}}}_{+0}(x)&:=\{i\mid z_i(x)>0=\lambda _i(x)\}, \end{aligned} \end{aligned}$$

where \(z_i(x)\) and \(\lambda _i(x)\) denote the ith elements of z(x) and \(\lambda (x)\), respectively.

Lemma 4.3

Suppose that (F1)–(F3) in Assumptions 4.1 and 4.2 hold. For a given \(x\in X\), let \(w^*\) be a solution to \(H_0(x,w)=0\). Assume that the LICQ holds at \(w^*\). Then, the generalized Jacobian matrix \(\partial _{w}H_0(x,w^*)\) is nonsingular.

Proof

The generalized Jacobian matrix of \(H_0(x,w^*)\) with respect to w is given by

$$\begin{aligned} \begin{aligned} \partial _{w}H_0(x,w^*)={\Biggl \{} M=\left[ \begin{array}{ccc} L &{} A &{} O \\ O &{} I &{} \varXi \\ A^\top &{} O &{} H \end{array} \right] \ {\Bigg |}\ L=(16),\ A=(17),\\ \varXi =\underset{i=1,\dots ,l}{\textrm{diag}}(\xi _i-1),\ H=\underset{i=1,\dots ,l}{\textrm{diag}}(\eta _i-1) {\Biggr \}}, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \xi _i\in \left\{ \begin{array}{ll} \{0\} &{} \text {if } i\in {{\mathcal {J}}}_{0+}(x) \\ \left[ 0,1\right] &{} \text {if } i\in {{\mathcal {J}}}_{00}(x)\\ \{1\} &{} \text {if } i\in {{\mathcal {J}}}_{+0}(x) \end{array} \right. ,\quad \eta _i\in \left\{ \begin{array}{ll} \{1\} &{} \text {if }i\in {{\mathcal {J}}}_{0+}(x) \\ \left[ 0,1\right] &{} \text {if }i\in {{\mathcal {J}}}_{00}(x)\\ \{0\} &{} \text {if }i\in {{\mathcal {J}}}_{+0}(x) \end{array} \right. \end{aligned}$$

such that \(\xi _i^2+\eta _i^2\le 1\) for all \(i\in {{\mathcal {J}}}_{00}(x)\).

It suffices to show the nonsingularity of M for any \(M\in \partial _{w}H_0(x,w^*)\). We show the the nonsingularity of \(M^\top \) for convenience. Let \(v=(v_1,v_2,v_3)\), and \(M^\top v=0\) is given as follows:

$$\begin{aligned}&L^\top v_1+Av_3=0, \end{aligned}$$
(18)
$$\begin{aligned}&A^\top v_1+v_2=0, \end{aligned}$$
(19)
$$\begin{aligned}&\varXi v_2+Hv_3=0. \end{aligned}$$
(20)

For \(i\in {{\mathcal {J}}}_{0+}(x)\), \(\varXi _i=-1\) and \(H_{i}=0\); then \([v_2]_i=0\) from (20). For \(i\in {{\mathcal {J}}}_{00}(x)\), since either \(\varXi _{i}\) or \(H_{i}\) is negative, (20) gives \([v_2]_i[v_3]_i\le 0\). For \(i\in {{\mathcal {J}}}_{+0}(x)\), \(\varXi _i=0\) and \(H_i=-1\); then \([v_3]_i=0\) from (20). Summarizing these results yields \((v_2)^\top v_3\le 0\). Then premultiplying (19) by \((v_3)^\top \) leads to

$$\begin{aligned} \begin{aligned}&(v_3)^\top A^\top v_1+(v_3)^\top v_2=0\\ \Longrightarrow \;&(v_3)^\top A^\top v_1 \ge 0. \end{aligned} \end{aligned}$$

Furthermore, premultiplying (18) by \((v_1)^\top \) yields

$$\begin{aligned} (v_1)^\top L^\top v_1+(v_1)^\top Av_3=0. \end{aligned}$$

Since \(L^\top \) is positive definite and \((v_1)^\top Av_3\ge 0\), we obtain \(v_1=0\), which implies \(v_2=0\) by (19). Then (18) gives

$$\begin{aligned} Av_3=0\iff \sum _{i=1}^l[v_3]_i\nabla _y g_i(x,y)=\sum _{i\in {{\mathcal {I}}}_g(x)}[v_3]_i\nabla _y g_i(x,y)=0, \end{aligned}$$
(21)

where \({{\mathcal {I}}}_g(x):=\{i\mid g_i(x,y)=0,\ i=1,\dots ,l\}\), and the second equality holds since \([v_3]_i=0\) for \(i\in {{\mathcal {J}}}_{+0}(x)\). By the LICQ assumption and (21), \(v_3=0\). Hence we have \(v=0\), which implies that M is nonsingular. This completes the proof.\(\square \)

Summarizing the results of Lemmas 4.2 and 4.3 yields the following proposition.

Proposition 4.4

Suppose that (F1)–(F3) in Assumptions 4.1 and 4.2 hold. For every \(\varepsilon \ge 0\) and \(x\in X\), the (generalized) Jacobian of \(H_\varepsilon (x,\cdot ):I\!\!R^{m+l+l}\rightarrow I\!\!R^{m+l+l}\) is nonsingular.

In what follows, we also use the notation \(H(\varepsilon ,x,w):=H_\varepsilon (x,w)\) to emphasize that \(\varepsilon \) is one of the variables. In the same way, \(w(\varepsilon ,x)\equiv w_\varepsilon (x)\) and \( w(\varepsilon ,x)=(y(\varepsilon ,x), z(\varepsilon ,x), \lambda (\varepsilon ,x))\equiv (y_\varepsilon (x),z_\varepsilon (x),\lambda _\varepsilon (x)).\)

Lemma 4.4

Suppose that (F1)–(F3) in Assumptions 4.1 and 4.2 hold. For every \(\varepsilon \ge 0\), \(H(\varepsilon ,x,w)\), as a function of the variables \((\varepsilon ,x,w)\), is locally Lipschitz continuous and regular.

Proof

Since all components of \(H(\varepsilon ,x,w)\) other than the FB-function \(\phi _\varepsilon \) with \(\varepsilon =0\) are continuously differentiable by (F1), we only need to show the local Lipschitz continuity and regularity of \(\phi _0\). It is obvious that \(\phi _0\) is convex by its definition. It follows from [6, Proposition 2.3.6-(b)] that \(\phi _0\) is regular; moreover, \(\phi _0\) is locally Lipschitz continuous whenever \(\lambda _i\) and \(z_i\) are bounded, and this boundedness follows from Assumptions (F2) and (F3).\(\square \)

Proposition 4.5

Let \((\varepsilon ,x,w)\) be such that \(H(\varepsilon ,x,w)=0\). If (F1)–(F3) in Assumptions 4.1 and 4.2 hold, then there is a neighborhood \(U\times \varOmega \subset I\!\!R^{1+n}\) of \((\varepsilon ,x)\) and a locally Lipschitz continuous function \(w:U\times \varOmega \rightarrow I\!\!R^{m+l+l}\) such that for each \((\varepsilon ,x)\in U\times \varOmega \),

$$\begin{aligned} H(\varepsilon ,x,w(\varepsilon ,x))=0. \end{aligned}$$

Moreover, for any fixed \(\varepsilon \in U\setminus \{0\}\), \(w_\varepsilon :\varOmega \rightarrow I\!\!R^{m+l+l}\) is continuously differentiable.

Proof

Lemma 4.4, Proposition 4.4, the implicit function theorem [6, Corollary to Theorem 7.1.1], and [7, Lemma 2] ensure that w is locally Lipschitz continuous on \(\varOmega \). The latter claim is obtained from a well-known result in elementary calculus.\(\square \)

Note that the local Lipschitz continuity of \(w(\cdot ,\cdot )\) from Proposition 4.5 implies that \(w_\varepsilon (x)\rightarrow w(x^*)\), i.e.,

$$\begin{aligned} y_\varepsilon (x)\rightarrow y(x^*),\ z_\varepsilon (x)\rightarrow z(x^*),\ \lambda _\varepsilon (x)\rightarrow \lambda (x^*), \end{aligned}$$

as \(x\rightarrow x^*\) and \(\varepsilon \downarrow 0\). Now we show some properties of \(w(\cdot ,\cdot )\) and \(w_\varepsilon (\cdot )\).

By the compactness of \(X\subset I\!\!R^n\) and the continuity assumptions, the following statement follows from elementary calculus.

Proposition 4.6

Under (F1)–(F3) in Assumption 4.1, if (L2) in Assumption 4.1 also holds, then the function \(w:I\!\!R^{1+n}\rightarrow I\!\!R^{m+l+l}\) is compact-valued over X, and for any fixed \(\varepsilon \ge 0\), its partial (generalized) Jacobian \(\partial _x w_\varepsilon \) is also compact.

Lemma 4.5

For a positive sequence \(\{\varepsilon _k\}\) converging to 0, let \(\{a_{\varepsilon _k}\}\) and \(\{b_{\varepsilon _k}\}\) be sequences with \(a_{\varepsilon _k}>0\) and \(b_{\varepsilon _k}>0\) for all k, both converging to 0, and define

$$\begin{aligned} \hat{\xi }^k:=\frac{a_{\varepsilon _k}}{\sqrt{a_{\varepsilon _k}^2+b_{\varepsilon _k}^2+2\varepsilon _k^2}},\ \hat{\eta }^k:=\frac{b_{\varepsilon _k}}{\sqrt{a_{\varepsilon _k}^2+b_{\varepsilon _k}^2+2\varepsilon _k^2}}. \end{aligned}$$

Then their limits, if they exist, are denoted by \(\hat{\xi }^\circ \) and \(\hat{\eta }^\circ \), respectively, and satisfy \(\hat{\xi }^\circ ,\hat{\eta }^\circ \in [0,1]\) and \((\hat{\xi }^\circ )^2+(\hat{\eta }^\circ )^2\le 1\).

Proof

Let \(a_{\varepsilon _k}=r_k\cos \theta _k\) and \(b_{\varepsilon _k}=r_k\sin \theta _k\), where \(r_k\rightarrow 0\). Then we have

$$\begin{aligned} \hat{\xi }^k=\frac{\cos \theta _k}{\sqrt{1+2(\varepsilon _k/r_k)^2}},\ \hat{\eta }^k=\frac{\sin \theta _k}{\sqrt{1+2(\varepsilon _k/r_k)^2}}. \end{aligned}$$

Obviously, their limits \(\hat{\xi }^\circ ,\hat{\eta }^\circ \) satisfy \(\hat{\xi }^\circ ,\hat{\eta }^\circ \in [0,1]\). Furthermore,

$$\begin{aligned} (\hat{\xi }^k)^2+(\hat{\eta }^k)^2=\frac{1}{1+2(\varepsilon _k/r_k)^2}<1, \end{aligned}$$

for all \(k=1,2,\dots \). Therefore, \((\hat{\xi }^\circ )^2+(\hat{\eta }^\circ )^2\le 1\).\(\square \)
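
For instance, if \(a_{\varepsilon _k}=b_{\varepsilon _k}=\varepsilon _k\), then \(\hat{\xi }^k=\hat{\eta }^k=1/2\) for every k, so \(\hat{\xi }^\circ =\hat{\eta }^\circ =1/2\) and \((\hat{\xi }^\circ )^2+(\hat{\eta }^\circ )^2=1/2<1\).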

Proposition 4.7

Under (F1)–(F3) in Assumptions 4.1 and 4.2, for any sequence \(\{x^k\}\subset X\) converging to \(x^*\), there is a positive sequence \(\{\varepsilon _k\}\) tending to 0 such that

$$\begin{aligned} \left\{ \lim _{k\rightarrow \infty }\nabla w_{\varepsilon _k}(x^k)\right\} \subset \partial w(x^*). \end{aligned}$$
(22)

Proof

By the local Lipschitz continuity of \(w(\cdot )\), the generalized Jacobian \(\partial w(x^*)\) is given by

$$\begin{aligned} \partial w(x^*)=\textrm{conv}\left\{ \lim _{x\rightarrow x^*}\nabla w(x)\ \Bigg |\ x\in {{\mathcal {D}}}\right\} , \end{aligned}$$

where \({{\mathcal {D}}}\subset I\!\!R^n\) denotes the set of points at which w is differentiable; to be exact, the set of points at which y and \(\lambda \) are differentiable, because if \(y(\cdot )\) is differentiable at \(x^*\), then \(z(\cdot )\) is also differentiable at \(x^*\) by the definition of z. Hence, it suffices to show, instead of (22), that

$$\begin{aligned} \left\{ \lim _{k\rightarrow \infty }\nabla w_{\varepsilon _k}(x^k)\right\} \subset \textrm{conv}\left\{ \lim _{x\rightarrow x^*}\nabla w(x)\ \Bigg |\ x\in {{\mathcal {D}}}\right\} . \end{aligned}$$
(23)

For \(\varepsilon _k>0\), Proposition 4.5 leads that the gradient of \(w_{\varepsilon _k}\) at \(x^k\) is given by

$$\begin{aligned} \nabla w_{\varepsilon _k}(x^k)&=\left[ \nabla y_{\varepsilon _k}(x^k),\ \nabla z_{\varepsilon _k}(x^k),\ \nabla \lambda _{\varepsilon _k}(x^k)\right] \nonumber \\&=- \nabla _x H_{\varepsilon _k}(x^k,y_{\varepsilon _k}(x^k),z_{\varepsilon _k}(x^k),\lambda _{\varepsilon _k}(x^k))\nonumber \\&\quad \left[ \begin{array}{c} \nabla _y H_{\varepsilon _k}(x^k,y_{\varepsilon _k}(x^k),z_{\varepsilon _k}(x^k),\lambda _{\varepsilon _k}(x^k)) \\ \nabla _z H_{\varepsilon _k}(x^k,y_{\varepsilon _k}(x^k),z_{\varepsilon _k}(x^k),\lambda _{\varepsilon _k}(x^k)) \\ \nabla _\lambda H_{\varepsilon _k}(x^k,y_{\varepsilon _k}(x^k),z_{\varepsilon _k}(x^k),\lambda _{\varepsilon _k}(x^k)) \end{array} \right] ^{-1}\nonumber \\&= -\left[ {L'_{\varepsilon _k}},\ {A'_{\varepsilon _k}},\ O \right] \left[ \begin{array}{ccc} L_{\varepsilon _k} &{} A_{\varepsilon _k} &{} O \\ O &{} I &{} \varXi _{\varepsilon _k} \\ A_{\varepsilon _k}^\top &{} O &{} H_{\varepsilon _k} \end{array} \right] ^{-1}, \end{aligned}$$
(24)

where

$$\begin{aligned} L_{\varepsilon _k}&:=\nabla _y G(x^k,y_{\varepsilon _k}(x^k))+\sum _{i=1}^l[\lambda _{\varepsilon _k}(x^k)]_i\nabla _{yy}^2 g_i(x^k,y_{\varepsilon _k}(x^k)),\\ A_{\varepsilon _k}&:=\nabla _y g(x^k,y_{\varepsilon _k}(x^k)),\\ L'_{\varepsilon _k}&:=\nabla _x G(x^k,y_{\varepsilon _k}(x^k))+\sum _{i=1}^{l}[\lambda _{\varepsilon _k}(x^k)]_i\nabla ^2_{xy}g_i(x^k,y_{\varepsilon _k}(x^k)),\\ A'_{\varepsilon _k}&:=\nabla _x g(x^k,y_{\varepsilon _k}(x^k)),\\ \varXi _{\varepsilon _k}&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{[z_{\varepsilon _k}(x^k)]_i}{\sqrt{[z_{\varepsilon _k}(x^k)]_i^2+[\lambda _{\varepsilon _k}(x^k)]_i^2+2\varepsilon _k^2}}-1\right) \\&=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{[z_{\varepsilon _k}(x^k)]_i}{[z_{\varepsilon _k}(x^k)]_i+[\lambda _{\varepsilon _k}(x^k)]_i}-1\right) ,\\ H_{\varepsilon _k}&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{[\lambda _{\varepsilon _k}(x^k)]_i}{\sqrt{[z_{\varepsilon _k}(x^k)]_i^2+[\lambda _{\varepsilon _k}(x^k)]_i^2+2\varepsilon _k^2}}-1\right) \\&=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{[\lambda _{\varepsilon _k}(x^k)]_i}{[z_{\varepsilon _k}(x^k)]_i+[\lambda _{\varepsilon _k}(x^k)]_i}-1\right) . \end{aligned}$$

Here, the second equalities in \(\varXi _{\varepsilon _k}\) and \(H_{\varepsilon _k}\) hold by Proposition 4.3, since \(\varepsilon _k>0\). Let

$$\begin{aligned} \hat{\xi }^k_i:=\frac{[z_{\varepsilon _k}(x^k)]_i}{\sqrt{[z_{\varepsilon _k}(x^k)]_i^2+[\lambda _{\varepsilon _k}(x^k)]_i^2+2\varepsilon _k^2}},\ \hat{\eta }^k_i:=\frac{[\lambda _{\varepsilon _k}(x^k)]_i}{\sqrt{[z_{\varepsilon _k}(x^k)]_i^2+[\lambda _{\varepsilon _k}(x^k)]_i^2+2\varepsilon _k^2}}, \end{aligned}$$

and let \(\hat{\xi }^\circ \) and \(\hat{\eta }^\circ \) denote the limits of \(\hat{\xi }^k\) and \(\hat{\eta }^k\), respectively. Since \((z_{\varepsilon _k},\lambda _{\varepsilon _k})\) is compact-valued on X by Proposition 4.6, these limits exist for an appropriately chosen subsequence \(\{\varepsilon _k\}\); using Lemma 4.5, we have

$$\begin{aligned} \bar{\varXi }^\circ _i=\left\{ \begin{array}{ll} -1 &{} \text {if } i\in {{\mathcal {J}}}_{0+}(x^*),\\ \hat{\xi }^\circ _i-1 &{} \text {if } i\in {{\mathcal {J}}}_{00}(x^*),\\ 0 &{} \text {if } i\in {{\mathcal {J}}}_{+0}(x^*), \end{array} \right. \quad \bar{H}^\circ _i=\left\{ \begin{array}{ll} 0 &{} \text {if } i\in {{\mathcal {J}}}_{0+}(x^*),\\ \hat{\eta }^\circ _i-1 &{} \text {if } i\in {{\mathcal {J}}}_{00}(x^*),\\ -1 &{} \text {if } i\in {{\mathcal {J}}}_{+0}(x^*), \end{array} \right. \end{aligned}$$

where \(\hat{\xi }^\circ _i,\hat{\eta }^\circ _i\in [0,1]\) such that \((\hat{\xi }^\circ _i)^2+(\hat{\eta }^\circ _i)^2\le 1\). In addition, since \(G(\cdot )\) and \(g(\cdot )\) are smooth, and \((y,z,\lambda )(\cdot ,\cdot )\) is continuous, we can take the limits \(L_{\varepsilon _k}\rightarrow \bar{L}\), \(A_{\varepsilon _k}\rightarrow \bar{A}\), \(L'_{\varepsilon _k}\rightarrow \bar{L}'\), and \(A'_{\varepsilon _k}\rightarrow \bar{A}'\). Letting \(\nabla w^\circ (x^*):=(\nabla y^\circ (x^*),\nabla z^\circ (x^*),\nabla \lambda ^\circ (x^*))\) be the limit of (24), we have

$$\begin{aligned} \nabla w^\circ (x^*)^\top = \left[ \begin{array}{c} \nabla y^\circ (x^*)^\top \\ \nabla z^\circ (x^*)^\top \\ \nabla \lambda ^\circ (x^*)^\top \end{array} \right] = -\left[ \begin{array}{ccc} \bar{L}^\top &{} O &{} \bar{A} \\ \bar{A}^\top &{} I &{} O \\ O &{} \bar{\varXi }^\circ &{} \bar{H}^\circ \end{array} \right] ^{-1} \left[ \begin{array}{c} {\bar{L'}}^\top \\ {\bar{A'}}^\top \\ O \end{array}\right] . \end{aligned}$$

Now we show that

$$\begin{aligned} \nabla w^\circ (x^*) \in \textrm{conv}\left\{ \lim _{x\rightarrow x^*,x\in {{\mathcal {D}}}}\nabla w(x)\right\} \end{aligned}$$
(25)

for \(\{\varepsilon _k\}\) appropriately chosen.

We first consider the case where \(x^*\in {{\mathcal {D}}}\). Let

$$\begin{aligned} T(x):=H_0(x,w(x))\equiv 0\quad \forall x\in X, \end{aligned}$$

because w(x) is the solution to the system \(H_0(x,w)=0\) for fixed \(x\in X\). For a given \(x^k\in {{\mathcal {D}}}\), differentiating both sides of \(T(x)\equiv 0\) yields

$$\begin{aligned} \nabla w(x^k)^\top = \left[ \begin{array}{c} \nabla y(x^k)^\top \\ \nabla z(x^k)^\top \\ \nabla \lambda (x^k)^\top \end{array} \right] =-\left[ \begin{array}{ccc} {L_k}^\top &{} O &{} A_k \\ A_k^\top &{} I &{} O \\ O &{} \varXi _k &{} H_k \end{array} \right] ^{-1} \left[ \begin{array}{c} {L'_k}^\top \\ {A'_k}^\top \\ O \end{array} \right] , \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} L_k&:=\nabla _y G(x^k,y(x^k))+\sum _{i=1}^l{\lambda _i}(x^k)\nabla _{yy}^2 g_i(x^k,y(x^k)),\\ A_k&:=\nabla _y g(x^k,y(x^k)),\\ L'_k&:=\nabla _x G(x^k,y(x^k))+\sum _{i=1}^{l}{\lambda _i}(x^k)\nabla ^2_{xy}g_i(x^k,y(x^k)),\\ A'_k&:=\nabla _x g(x^k,y(x^k)),\\ \varXi _k&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{{z_i}(x^k)}{\sqrt{{z_i}(x^k)^2+{\lambda _i}(x^k)^2}}-1\right) ,\\ H_k&:=\underset{i=1,\dots ,l}{\textrm{diag}}\left( \frac{{\lambda _i}(x^k)}{\sqrt{{z_i}(x^k)^2+{\lambda _i}(x^k)^2}}-1\right) . \end{aligned} \end{aligned}$$

Note that letting

$$\begin{aligned} \xi ^k_i:=\frac{{z_i}(x^k)}{\sqrt{{z_i}(x^k)^2+{\lambda _i}(x^k)^2}},\quad \eta ^k_i:=\frac{{\lambda _i}(x^k)}{\sqrt{{z_i}(x^k)^2+{\lambda _i}(x^k)^2}}, \end{aligned}$$

where \(\xi ^k_i,\eta ^k_i\in [0,1]\) with \((\xi ^k_i)^2+(\eta ^k_i)^2=1\), yields

$$\begin{aligned} (\varXi _k)_i=\left\{ \begin{array}{ll} -1 &{} \text {if }i\in {{\mathcal {J}}}_{0+}(x^k),\\ \xi ^k_i-1 &{} \text {if }i\in {{\mathcal {J}}}_{00}(x^k),\\ 0 &{} \text {if }i\in {{\mathcal {J}}}_{+0}(x^k), \end{array} \right. \quad (H_k)_i=\left\{ \begin{array}{ll} 0 &{} \text {if }i\in {{\mathcal {J}}}_{0+}(x^k),\\ \eta ^k_i-1 &{} \text {if }i\in {{\mathcal {J}}}_{00}(x^k),\\ -1 &{} \text {if }i\in {{\mathcal {J}}}_{+0}(x^k). \end{array} \right. \end{aligned}$$

Now as \(k\rightarrow \infty \), i.e., \(x^k\rightarrow x^*\), by the continuity of the involved functions, we have

$$\begin{aligned} \nabla w(x^*)^\top = \left[ \begin{array}{c} \nabla y(x^*)^\top \\ \nabla z(x^*)^\top \\ \nabla \lambda (x^*)^\top \end{array} \right] = -\left[ \begin{array}{ccc} \bar{L}^\top &{} O &{} \bar{A} \\ \bar{A}^\top &{} I &{} O \\ O &{} \bar{\varXi } &{} \bar{H} \end{array} \right] ^{-1} \left[ \begin{array}{c} {\bar{L'}}^\top \\ {\bar{A'}}^\top \\ O \end{array}\right] , \end{aligned}$$
(26)

where

$$\begin{aligned} \bar{\varXi }_i=\left\{ \begin{array}{ll} -1 &{} \text {if }i\in {{\mathcal {J}}}_{0+}(x^*), \\ \bar{\xi }_i-1 &{} \text {if }i\in {{\mathcal {J}}}_{00}(x^*),\\ 0 &{} \text {if }i\in {{\mathcal {J}}}_{+0}(x^*), \end{array} \right. \quad \bar{H}_i=\left\{ \begin{array}{ll} 0 &{} \text {if }i\in {{\mathcal {J}}}_{0+}(x^*), \\ \bar{\eta }_i-1 &{} \text {if }i\in {{\mathcal {J}}}_{00}(x^*),\\ -1 &{} \text {if }i\in {{\mathcal {J}}}_{+0}(x^*), \end{array} \right. \end{aligned}$$
(27)

for \(\bar{\xi }_i,\bar{\eta }_i\in [0,1]\) such that \((\bar{\xi }_i)^2+(\bar{\eta }_i)^2=1\). Now taking the convex hull of the set that consists of (26) yields that

$$\begin{aligned} \begin{aligned}&\textrm{conv}\left\{ \lim _{\begin{array}{c} x\rightarrow x^*\\ x\in {{\mathcal {D}}} \end{array}}\nabla w(x)\right\} =\\&\left\{ \nabla w(x^*) \ \Bigg |\ \text {(26) and (27) hold for } \bar{\xi }_i,\bar{\eta }_i\in [0,1] \text { s.t. } \bar{\xi }_i^2+\bar{\eta }_i^2\le 1 \text { for all } i\in {{\mathcal {J}}}_{00}(x^*) \right\} . \end{aligned} \end{aligned}$$

From the observation above, we can choose \(\bar{\xi }_i\) and \(\bar{\eta }_i\) so that (25) holds.

Next, we consider the case where \(x^*\notin {{\mathcal {D}}}\). For \(x^k\in {{\mathcal {D}}}\), by the continuity of \(\nabla w_{\varepsilon }\) in \(\varepsilon \), we have

$$\begin{aligned} \lim _{\varepsilon \downarrow 0}\left( \nabla w_{\varepsilon }(x^k)-\nabla w(x^k)\right) =0. \end{aligned}$$

Hence for \(x^k\in {{\mathcal {D}}}\), there exists \(\varepsilon _k>0\) such that

$$\begin{aligned} \Vert \nabla w_{\varepsilon _k}(x^k)-\nabla w(x^k)\Vert \le \Vert x^k-x^*\Vert , \end{aligned}$$

since the left-hand side of the above inequality is sufficiently close to 0 for \(\varepsilon _k\) sufficiently small. This implies that \(\Vert \nabla w_{\varepsilon _k}(x^k)-\nabla w(x^k)\Vert \rightarrow 0\) as \(x^k\rightarrow x^*\) and \(\varepsilon _k\rightarrow 0\).

Since \(\{\nabla w_{\varepsilon _k}(x^k)\}\) has a limit, we have

$$\begin{aligned} \begin{aligned} \lim _{k\rightarrow \infty }\nabla w(x^k)&= \lim _{k\rightarrow \infty }[\nabla w_{\varepsilon _k}(x^k)- \{\nabla w_{\varepsilon _k}(x^k)-\nabla w(x^k)\}] \\&=\lim _{k\rightarrow \infty }\nabla w_{\varepsilon _k}(x^k) - \lim _{k\rightarrow \infty }\{\nabla w_{\varepsilon _k}(x^k)-\nabla w(x^k)\} \\&=\nabla w^\circ (x^*)- 0 =\nabla w^\circ (x^*). \end{aligned} \end{aligned}$$

Hence it follows that

$$\begin{aligned} \nabla w^\circ (x^*)= \lim _{k\rightarrow \infty }\nabla w(x^k)\in \textrm{conv}\left\{ \lim _{\begin{array}{c} x\rightarrow x^*\\ x\in {{\mathcal {D}}} \end{array}}\nabla w(x)\right\} . \end{aligned}$$

We have thus completed the proof.\(\square \)

Corollary 4.1

Under the same assumption as Proposition 4.7, for a positive sequence \(\{\varepsilon _k\}\) that satisfies (22), the following inclusion also holds:

$$\begin{aligned} \left\{ \lim _{k\rightarrow \infty }\nabla y_{\varepsilon _k}(x^k)\right\} \subset \partial y(x^*). \end{aligned}$$

Proof

By Proposition 4.7, we have

$$\begin{aligned} \lim _{k\rightarrow \infty }\nabla w_{\varepsilon _k}(x^k)= \nabla w^\circ (x^*), \end{aligned}$$

and [6, Proposition 2.6.2] leads to

$$\begin{aligned} \partial w(x^*)\subset \partial y(x^*)\times \partial z(x^*)\times \partial \lambda (x^*). \end{aligned}$$

It then follows that

$$\begin{aligned} \nabla w^\circ (x^*)=[\nabla y^\circ (x^*),\nabla z^\circ (x^*),\nabla \lambda ^\circ (x^*)]\in \partial y(x^*)\times \partial z(x^*)\times \partial \lambda (x^*). \end{aligned}$$

\(\square \)

By the observation from Corollary 4.1, it may be reasonable to assume that

$$\begin{aligned} \left\{ \lim _{k\rightarrow \infty }\nabla _{x^\nu }y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })\right\} \subset \partial _{x^\nu }y(x^{*,\nu },x^{*,-\nu }). \end{aligned}$$
(28)

Remark 4.3

Obviously, relation (28) holds when \(N=1\), i.e., in the single-leader–multi-follower game.

Herty et al. [10] proposed a smoothing method for a special case of the multi-leader–single-follower game in which the follower's response y(x) can be obtained explicitly, whereas we consider the case where the response cannot, in general, be written explicitly. Nevertheless, the gradient of the response function can be computed as indicated in (24). In fact, by computing the inverse of the block matrix in (24), \(\nabla y_{\varepsilon _k}(x^k)\) is given as follows:

$$\begin{aligned} \nabla y_{\varepsilon _k}(x^k)=&-L'_{\varepsilon _k}\left( I+ L_{\varepsilon _k}^{-1}[A_{\varepsilon _k}\ O] \left[ \begin{array}{cc} I &{} \varXi _{\varepsilon _k} \\ A_{\varepsilon _k}^\top L_{\varepsilon _k}^{-1}A_{\varepsilon _k} &{} H_{\varepsilon _k} \end{array}\right] ^{-1} \left[ \begin{array}{c}O \\ A_{\varepsilon _k}^\top \end{array}\right] \right) L_{\varepsilon _k}^{-1}\\&+[A'_{\varepsilon _k}\ O]\left[ \begin{array}{cc} I &{} \varXi _{\varepsilon _k} \\ A_{\varepsilon _k}^\top L_{\varepsilon _k}^{-1}A_{\varepsilon _k} &{} H_{\varepsilon _k} \end{array}\right] ^{-1} \left[ \begin{array}{c} O \\ A_{\varepsilon _k}^\top \end{array}\right] L_{\varepsilon _k}^{-1}. \end{aligned}$$
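
In practice, one need not form this inverse explicitly: since the smoothed solution \(w_\varepsilon (x)=(y_\varepsilon (x),z_\varepsilon (x),\lambda _\varepsilon (x))\) is characterized by the smoothed system \(H_\varepsilon (x,w)=0\), its Jacobian can be obtained by implicit differentiation, i.e., by solving one linear system. The following Python sketch illustrates this; the residual map `H_eps` is a hypothetical placeholder for the smoothed KKT system, and the partial derivatives are approximated by forward differences rather than assembled analytically.

```python
import numpy as np

def response_jacobian(H_eps, x, w, h=1e-7):
    """Jacobian dw/dx of the smoothed response w(x) = (y, z, lambda),
    obtained by implicit differentiation of H_eps(x, w) = 0:
        (dH/dw) (dw/dx) = -(dH/dx).
    H_eps: callable (x, w) -> residual vector of length len(w).
    Derivatives are approximated by forward differences with step h."""
    n, s = len(x), len(w)
    r0 = H_eps(x, w)
    # dH/dw, approximated column by column
    Hw = np.empty((s, s))
    for j in range(s):
        e = np.zeros(s); e[j] = h
        Hw[:, j] = (H_eps(x, w + e) - r0) / h
    # dH/dx
    Hx = np.empty((s, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        Hx[:, j] = (H_eps(x + e, w) - r0) / h
    # Hw is nonsingular for eps > 0 under the standing assumptions.
    # Returns an (s, n) matrix; its first len(y) rows form dy/dx
    # (the paper's gradient notation is the transpose of this block).
    return -np.linalg.solve(Hw, Hx)
```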

By Proposition 4.5, reduced problem (9) may be approximated by a differentiable optimization problem:

$$\begin{aligned} \min _{x^\nu \in X^\nu }\varTheta ^\nu _\varepsilon (x^\nu ,x^{-\nu }):=\theta ^\nu (x^\nu ,x^{-\nu },y_\varepsilon (x^\nu ,x^{-\nu })). \end{aligned}$$
(29)

Now let \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) be the game in which leader \(\nu \in \{1,\dots ,N\}\) solves (29). If \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) has a Nash equilibrium, i.e., there exists \(x^*\) such that

$$\begin{aligned} x^{*,\nu }\in \arg \min _{x^\nu \in X^\nu }\varTheta ^\nu _\varepsilon (x^\nu ,x^{*,-\nu }), \end{aligned}$$

then the following assertion holds.

Theorem 4.3

Let a sequence \(\{x^k\}_{k\in \mathbb {N}}\) be such that each \(x^k\) is a Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu _{\varepsilon _k}\}_{\nu =1}^N)\). If \(\{x^k\}_{k\in \mathbb {N}}\) converges to \(x^*\), then \(x^*\) is a Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\). Moreover, \((x^*,y(x^*))\) is an LF Nash equilibrium of the MLMFG.

Proof

The former assertion can be shown using [14, Theorem 4.5], and the latter follows from Proposition 3.1.\(\square \)

However, as with \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) discussed earlier, problem (29) is in general nonconvex, so a Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) may fail to exist. Hence, we introduce the following stationarity concept for the nonconvex NEP.

Definition 4.1

(Stationary Nash equilibrium) A tuple of strategies \(x^*\in X\) is referred to as a stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) if \(x^{*,\nu }\) satisfies the following condition for all \(\nu \in \{1,\dots ,N\}\):

$$\begin{aligned} (\varTheta ^\nu _\varepsilon )'(x^{*,\nu },x^{*,-\nu };d^\nu )=\langle \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^{*,\nu },x^{*,-\nu }),d^\nu \rangle \ge 0\quad \forall d^\nu \in {{\mathcal {T}}}_{X^\nu }(x^{*,\nu }), \end{aligned}$$
(30)

which is equivalent to

$$\begin{aligned} \begin{aligned}&\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y_\varepsilon (x^{*,\nu },x^{*,-\nu }))^\top d^\nu +\\&{d^\nu }^\top \nabla _{x^\nu }y_\varepsilon (x^{*,\nu },x^{*,-\nu })\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y_\varepsilon (x^{*,\nu },x^{*,-\nu }))\ge 0\quad \forall d^\nu \in {{\mathcal {T}}}_{X^\nu }(x^{*,\nu }). \end{aligned} \end{aligned}$$

Definition 4.1 implies that for all \(\nu \), \(x^{*,\nu }\in X^\nu \) is a stationary point of reduced problem (29) for fixed \(x^{*,-\nu }\). Note that since \(\varTheta ^\nu _\varepsilon \) is differentiable, Eq. (30) is equivalent to

$$\begin{aligned} 0\in \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^{*,\nu },x^{*,-\nu })+{{\mathcal {T}}}_{X^\nu }(x^{*,\nu })^\circ , \end{aligned}$$

which means that the B- and C-stationary Nash equilibria introduced in Definitions 3.2 and 3.3 are equivalent under the differentiability of \(\varTheta ^\nu _\varepsilon \).

In practice, we compute stationary Nash equilibria of \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) sequentially as \(\varepsilon >0\) decreases, thereby finding an approximate B-/C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

If \(X^\nu \) is convex for all \(\nu \), the equilibrium can be computed by solving the following variational inequality problem: Find \(x^*\in X\) such that

$$\begin{aligned} \langle F^\ell _\varepsilon (x^*), x-x^*\rangle \ge 0\qquad \forall x\in X, \end{aligned}$$
(31)

where

$$\begin{aligned} F^\ell _\varepsilon (x):= \left[ \begin{array}{c} \nabla _{x^1}\varTheta ^1_\varepsilon (x^1,x^{-1}) \\ \vdots \\ \nabla _{x^N}\varTheta ^N_\varepsilon (x^N,x^{-N}) \end{array} \right] . \end{aligned}$$
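
For illustration, if the projection \(P_X\) onto X is inexpensive (e.g., X is a box or a simple polyhedron), the VI (31) can be attacked by the classical projection iteration \(x\leftarrow P_X(x-\tau F^\ell _\varepsilon (x))\), any fixed point of which solves (31). A minimal Python sketch follows; `F_eps` and `proj_X` are user-supplied placeholders, and convergence requires monotonicity-type assumptions on \(F^\ell _\varepsilon \) and a small step size \(\tau \), which we do not verify here.

```python
import numpy as np

def solve_vi_projection(F_eps, proj_X, x0, tau=1e-2, tol=1e-8, max_iter=50_000):
    """Basic projection iteration for the VI: find x* in X with
    <F_eps(x*), x - x*> >= 0 for all x in X.
    A fixed point of x -> proj_X(x - tau * F_eps(x)) solves the VI."""
    x = proj_X(np.asarray(x0, dtype=float))
    for _ in range(max_iter):
        x_new = proj_X(x - tau * F_eps(x))
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x  # best iterate if the tolerance was not reached
```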

The following proposition guarantees that the solution to VI (31) is the stationary Nash equilibrium for \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\).

Proposition 4.8

Suppose that \(X^\nu \subset I\!\!R^{n_\nu }\) is convex for all \(\nu \). Let \(x^*\in X\) be a solution to (31). Then, \(x^*\in X\) is a stationary Nash equilibrium for \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\).

Proof

Let \((x^\nu ,x^{*,-\nu })=(x^{*,1},\dots ,x^{*,\nu -1},x^\nu ,x^{*,\nu +1},\dots ,x^{*,N})\), where \(x^\nu \in X^\nu \) is arbitrary. Then Eq. (31) reduces to

$$\begin{aligned} \langle \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^{*,\nu },x^{*,-\nu }),x^\nu -x^{*,\nu }\rangle \ge 0\quad \forall x^\nu \in X^\nu . \end{aligned}$$
(32)

By Lemma 2.1, Eq. (32) is equivalent to

$$\begin{aligned} \begin{aligned}&0\in \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^{*,\nu },x^{*,-\nu })+{{\mathcal {N}}}_{X^\nu }(x^{*,\nu })\\ \iff&0\in \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^{*,\nu },x^{*,-\nu })+{{\mathcal {T}}}_{X^\nu }(x^{*,\nu })^\circ , \end{aligned} \end{aligned}$$

which coincides with (30). The claim holds for all \(\nu \), and thus \(x^*\in X\) is a stationary Nash equilibrium for \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\).\(\square \)

Proposition 4.9

Suppose that Assumptions 4.1 and 4.2 hold, and \(X^\nu \subset I\!\!R^{n_\nu }\) is convex for all \(\nu \). For each \(\varepsilon >0\), there exists a stationary Nash equilibrium to \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\).

Proof

By the convexity of X, it suffices to show that variational inequality (31) has a solution. Since X is compact and \(F^\ell _\varepsilon \) is continuous, the solution set of (31) is nonempty and compact by [8, Corollary 2.2.5].\(\square \)

4.2 Convergence to Stationary Nash Equilibrium

In this subsection we show that a sequence of stationary Nash equilibria of \(\textrm{NEP}(X,\{\varTheta ^\nu _{\varepsilon _k}\}_{\nu =1}^N)\) converges to a B-/C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\) for an appropriately chosen sequence \(\{\varepsilon _k\}\).

Theorem 4.4

Suppose that Assumptions 4.1 and 4.2 hold and that (28) holds for all \(\nu \in \{1,\dots ,N\}\). Assume that \(X^\nu \) is regular for all \(\nu \). Let \(\{x^k\}\) be a sequence of stationary Nash equilibria of \(\textrm{NEP}(X,\{\varTheta ^\nu _{\varepsilon _k}\}_{\nu =1}^N)\), i.e., \(x^{k,\nu }\) satisfies (30) for all \(\nu \). Then every accumulation point \(x^*\) of the sequence \(\{x^k\}\) is a C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\). Moreover, assume that the reduced cost function \(\varTheta ^\nu (x^\nu ,x^{-\nu }):=\theta ^\nu (x^\nu ,x^{-\nu },y(x^\nu ,x^{-\nu }))\) is regular with respect to \(x^\nu \) at \(x^*\) for all \(\nu \). Then \(x^*\) is a B-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

Proof

Since \(x^k\) is a stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu _{\varepsilon _k}\}_{\nu =1}^N)\), under the regularity of \(X^\nu \), each \(x^{k,\nu }\in X^\nu \) satisfies

$$\begin{aligned}&0\in \nabla _{x^\nu }\varTheta ^\nu _{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })+{{\mathcal {N}}}_{X^\nu }(x^{k,\nu })\nonumber \\ \iff&0\in \nabla _{x^\nu }\theta ^\nu (x^{k,\nu },x^{k,-\nu },y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu }))+\nonumber \\&\qquad \,\nabla _{x^\nu }y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })\nabla _y\theta ^\nu (x^{k,\nu },x^{k,-\nu },y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu }))+{{\mathcal {N}}}_{X^\nu }(x^{k,\nu }). \end{aligned}$$

Since X is compact, \(\{x^k\}\) has an accumulation point \(x^*\); without loss of generality, we may assume that the whole sequence converges to \(x^*\). The continuity of \(\nabla _{x^\nu }\theta ^\nu \), \(\nabla _y\theta ^\nu \), and \(y(\cdot )\), together with Proposition 4.5, implies that

$$\begin{aligned} \begin{aligned} \lim _{k\rightarrow \infty }\nabla _{x^\nu }\theta ^\nu (x^{k,\nu },x^{k,-\nu },y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu }))=\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu })),\\ \lim _{k\rightarrow \infty }\nabla _{y}\theta ^\nu (x^{k,\nu },x^{k,-\nu },y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu }))=\nabla _{y}\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu })). \end{aligned} \end{aligned}$$

By assumption (28), there exists a matrix \(V^{*,\nu }\in I\!\!R^{n_\nu \times m}\) such that

$$\begin{aligned} V^{*,\nu }=\lim _{k\rightarrow \infty }\nabla _{x^\nu }y_{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })\in \partial _{x^\nu }y(x^{*,\nu },x^{*,-\nu }). \end{aligned}$$
(33)

Since \(x^k\rightarrow x^*\), \(-\nabla _{x^\nu }\varTheta ^\nu _{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })\in {{\mathcal {N}}}_{X^\nu }(x^{k,\nu })\), and \(\nabla _{x^\nu }\varTheta ^\nu _{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })\rightarrow \nabla _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })\) from the observation above, by [25, Proposition 6.6], we have \(-\nabla _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })\in {{\mathcal {N}}}_{X^\nu }(x^{*,\nu })\), which implies that

$$\begin{aligned}&0\in \nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\nonumber \\&\quad +V^{*,\nu }\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))+{{\mathcal {N}}}_{X^\nu }(x^{*,\nu }). \end{aligned}$$
(34)

Since \(\theta ^\nu \) is strictly differentiable with respect to y, the Jacobian chain rule [6, Theorem 2.6.6] applies and yields

$$\begin{aligned} \begin{aligned} \partial _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })=&\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\\&+\partial _{x^\nu }y(x^{*,\nu },x^{*,-\nu })\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu })). \end{aligned} \end{aligned}$$

Then (34) implies \(0\in \partial _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })+{{\mathcal {N}}}_{X^\nu }(x^{*,\nu })\). The claim simultaneously holds for all \(\nu \); thus, \(x^*\) is a C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

Now we show convergence to a B-stationary Nash equilibrium under the regularity of \(\varTheta ^\nu \). Since \(X^\nu \) is regular for all \(\nu \), X is also regular. By [3, Proposition 4.6.3], (34) is equivalent to

$$\begin{aligned} \begin{aligned}&\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))^\top d^\nu +\\&\quad {d^\nu }^\top V^{*,\nu }\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\ge 0\quad \forall d^\nu \in {{\mathcal {T}}}_{X^\nu }(x^{*,\nu }). \end{aligned} \end{aligned}$$

The regularity assumption on \(\varTheta ^\nu \) and [6, Proposition 2.1.2 (b)] lead to

$$\begin{aligned} \begin{aligned}&(\varTheta ^\nu )'(x^{*,\nu },x^{*,-\nu };d^\nu )=\max \{{\zeta ^\nu }^\top d^\nu \mid \zeta ^\nu \in \partial _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })\}\ge \\&\quad \nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))^\top d^\nu \\&\qquad +{d^\nu }^\top V^{*,\nu }\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\ge 0\quad \forall d^\nu \in {{\mathcal {T}}}_{X^\nu }(x^{*,\nu }). \end{aligned} \end{aligned}$$

The above assertion holds for every \(\nu \); hence \(x^*\) is a B-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).\(\square \)

Remark 4.4

Hori and Fukushima [12] showed convergence to a B-stationary Nash equilibrium with a squared penalty method for an EPEC associated with the MLMFG, namely in the case where the penalty parameter \(\rho _k\rightarrow \infty \). As is well known, however, the squared penalty method easily becomes ill-conditioned. They therefore proposed a refinement procedure applied after an 'approximate' B-stationary Nash equilibrium has been obtained with the penalty method. The accumulation point produced by the refinement, however, may guarantee only weak stationarity rather than B-stationarity, because the two problems are intrinsically different. Meanwhile, our method obtains a B-stationary Nash equilibrium directly.

If \(X^\nu \subset I\!\!R^{n_\nu }\) is given by

$$\begin{aligned} X^\nu :=\{x^\nu \in I\!\!R^{n_\nu }\mid u^\nu (x^\nu )\le 0,\ v^\nu (x^\nu )=0\}, \end{aligned}$$
(35)

where \(u^\nu :I\!\!R^{n_\nu }\rightarrow I\!\!R^{p_\nu }\) and \(v^\nu :I\!\!R^{n_\nu }\rightarrow I\!\!R^{q_\nu }\) are continuously differentiable, the convergence result of Theorem 4.4 can be established without the regularity assumption on \(X^\nu \), under an appropriate constraint qualification.

Definition 4.2

For each \(\nu \in \{1,\dots ,N\}\), we say that the Mangasarian–Fromovitz constraint qualification (MFCQ) holds at \(\bar{x}^\nu \in X^\nu \) if \(\nabla v^\nu _j(\bar{x}^\nu )\), \(j=1,\dots ,q_\nu \), are linearly independent, and there exists \(d^\nu \in I\!\!R^{n_\nu }\) such that \(\langle \nabla u^\nu _i(\bar{x}^\nu ),d^\nu \rangle <0\) for all \(i\in {{\mathcal {I}}}(\bar{x}^\nu ):=\{i\mid u^\nu _i(\bar{x}^\nu )=0\}\) and \(\langle \nabla v^\nu _j(\bar{x}^\nu ),d^\nu \rangle =0\) for all \(j=1,\dots ,q_\nu \).
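
Operationally, the MFCQ at a given point can be verified by a small linear program: a direction \(d^\nu \) with \(\langle \nabla u^\nu _i(\bar{x}^\nu ),d^\nu \rangle <0\) for all active i and \(\langle \nabla v^\nu _j(\bar{x}^\nu ),d^\nu \rangle =0\) exists if and only if the LP below has a positive optimal value. The following Python sketch (using `scipy.optimize.linprog`; the function name `mfcq_holds` is ours) is one way to carry out this check.

```python
import numpy as np
from scipy.optimize import linprog

def mfcq_holds(Gu, Gv, tol=1e-9):
    """Check the MFCQ of Definition 4.2 at a point x_bar.
    Gu: (k, n) array, rows grad u_i(x_bar) for the active inequalities i.
    Gv: (q, n) array, rows grad v_j(x_bar) for the equality constraints.
    The search for d with <Gu_i, d> < 0 and Gv d = 0 is posed as an LP
    in (d, t): maximize t s.t. Gu d + t e <= 0, Gv d = 0, |d_i| <= 1,
    0 <= t <= 1; MFCQ holds iff the optimal t is positive."""
    k, n = Gu.shape
    q = Gv.shape[0]
    if q > 0 and np.linalg.matrix_rank(Gv) < q:
        return False              # grad v_j must be linearly independent
    if k == 0:
        return True               # no active inequality: nothing more to check
    c = np.r_[np.zeros(n), -1.0]  # minimize -t, i.e., maximize t
    A_ub = np.hstack([Gu, np.ones((k, 1))])
    b_ub = np.zeros(k)
    A_eq = np.hstack([Gv, np.zeros((q, 1))]) if q > 0 else None
    b_eq = np.zeros(q) if q > 0 else None
    bounds = [(-1.0, 1.0)] * n + [(0.0, 1.0)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return bool(res.success) and -res.fun > tol
```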

Corollary 4.2

Suppose that Assumptions 4.1 and 4.2 hold, that (28) holds for all \(\nu \in \{1,\dots ,N\}\), and that each \(X^\nu \) is given by (35). Let \(\{x^k\}\) be a sequence of stationary Nash equilibria of \(\textrm{NEP}(X,\{\varTheta ^\nu _{\varepsilon _k}\}_{\nu =1}^N)\), and suppose that the MFCQ holds for all \(\nu \) at every accumulation point \(x^*\) of \(\{x^k\}\). Then \(x^*\) is a C-stationary Nash equilibrium for \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\). Moreover, if \(\varTheta ^\nu \) is regular with respect to \(x^\nu \) at \(x^*\) for all \(\nu \), then \(x^*\) is a B-stationary Nash equilibrium for \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

Proof

Under the MFCQ, there exist Lagrange multipliers \(\zeta ^{\le ,k,\nu }\in I\!\!R^{p_\nu }_+\) and \(\zeta ^{=,k,\nu }\in I\!\!R^{q_\nu }\) satisfying the following optimality conditions of (29):

$$\begin{aligned}&\nabla _{x^\nu }\varTheta ^\nu _{\varepsilon _k}(x^{k,\nu },x^{k,-\nu })+\nabla u^\nu (x^{k,\nu })\zeta ^{\le ,k,\nu }+\nabla v^\nu (x^{k,\nu })\zeta ^{=,k,\nu }=0, \end{aligned}$$
(36)
$$\begin{aligned}&0\le \zeta ^{\le ,k,\nu }\perp -u^\nu (x^{k,\nu })\ge 0, \end{aligned}$$
(37)
$$\begin{aligned}&v^\nu (x^{k,\nu })=0. \end{aligned}$$
(38)

By the continuity of \(\nabla _{x^\nu }\theta ^\nu \), \(\nabla _y\theta ^\nu \), and y, together with Proposition 4.5, and since the sequence \(\{(x^k,\zeta ^{\le ,k},\zeta ^{=,k})\}\) of KKT tuples is bounded under the MFCQ, we may pass to the limit \((x^*,\zeta ^{\le ,*},\zeta ^{=,*})\) in (36)–(38). Then we obtain

$$\begin{aligned}&\nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))+V^{*,\nu }\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\nonumber \\&\quad +\nabla u^\nu (x^{*,\nu })\zeta ^{\le ,*,\nu }+\nabla v^\nu (x^{*,\nu })\zeta ^{=,*,\nu }=0, \end{aligned}$$
(39)
$$\begin{aligned}&0\le \zeta ^{\le ,*,\nu }\perp -u^\nu (x^{*,\nu })\ge 0, \end{aligned}$$
(40)
$$\begin{aligned}&v^\nu (x^{*,\nu })=0, \end{aligned}$$
(41)

where \(V^{*,\nu }\) is defined in (33). It follows from the Jacobian chain rule that (39) implies

$$\begin{aligned}&0\in \nabla _{x^\nu }\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\nonumber \\&\quad +\partial _{x^\nu }y(x^{*,\nu },x^{*,-\nu })\nabla _y\theta ^\nu (x^{*,\nu },x^{*,-\nu },y(x^{*,\nu },x^{*,-\nu }))\nonumber \\&\quad +\nabla u^\nu (x^{*,\nu })\zeta ^{\le ,*,\nu }+\nabla v^\nu (x^{*,\nu })\zeta ^{=,*,\nu }. \end{aligned}$$
(42)

Under the MFCQ, KKT conditions (42), (40), and (41) coincide with \(0\in \partial _{x^\nu }\varTheta ^\nu (x^{*,\nu },x^{*,-\nu })+{{\mathcal {T}}}_{X^\nu }(x^{*,\nu })^\circ \), and thus \(x^*\) is a C-stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu \}_{\nu =1}^N)\).

The latter claim can be shown in the same manner as in the proof of Theorem 4.4.\(\square \)

Remark 4.5

It is easy to see that B- and C-stationarity are equivalent when z and \(\lambda \) satisfy strict complementarity, that is, \(z_i+\lambda _i>0\) for all \(i=1,\dots ,l\).

5 Numerical Experiments

In this section, we report the results of numerical experiments conducted to illustrate the behavior of the proposed method on a toy example. First, we introduce a two-leader–two-follower game, i.e., \(N=2\) and \(M=2\).

We consider an extension of the model of Hori and Fukushima [12] in which the nonnegativity constraint \(x^\nu \ge 0\) is added to each leader's optimization problem. Leader \(\nu \in \{1,2\}\) solves the following problem:

$$\begin{aligned} \min _{x^\nu \in I\!\!R^2}\quad&\frac{1}{2}(x^\nu )^\top H_\nu x^\nu +(x^\nu )^\top G_{\nu ,-\nu }x^{-\nu }+\sum _{\omega =1,2} (x^\nu )^\top D_{\nu ,\omega } y^\omega + (q^\nu )^\top x^\nu \nonumber \\ \text {s.t.}\quad&A_\nu x^\nu \le b^\nu ,\ x^\nu \ge 0, \end{aligned}$$
(43)

where \(H_\nu \in I\!\!R^{n_\nu \times n_\nu }\), \(G_{\nu ,-\nu }\in I\!\!R^{n_\nu \times n_{-\nu }}\), \(D_{\nu ,\omega }\in I\!\!R^{n_\nu \times m_\omega }\), \(A_\nu \in I\!\!R^{p_\nu \times n_\nu }\), and \(b^\nu \in I\!\!R^{p_\nu }\). Since the number of leaders is two, i.e., \(N=2\), the label \(-\nu \) denotes the other leader; for example, for \(\nu =1\) we have \(-\nu =2\), so \(x^{-\nu }=x^2\) and \(G_{\nu ,-\nu }=G_{1,2}\); conversely, for \(\nu =2\), \(x^{-\nu }=x^1\) and \(G_{\nu ,-\nu }=G_{2,1}\).

Follower \(\omega \in \{1,2\}\) solves the following problem:

$$\begin{aligned} \min _{y^\omega \in I\!\!R^2}\quad&\frac{1}{2}(y^\omega )^\top M_\omega y^\omega +(y^\omega )^\top Q_{\omega ,-\omega } y^{-\omega } -\sum _{\nu =1,2}(x^\nu )^\top D_{\nu ,\omega } y^\omega \nonumber \\ \text {s.t.}\quad&(c^\omega )^\top y^\omega +\sum _{\nu =1,2} (d^\nu )^\top x^\nu +a_\omega \ge 0,\ y^\omega \ge 0, \end{aligned}$$
(44)

where \(M_\omega \), \(\omega =1,2\), are symmetric positive definite. Similarly, the label \(-\omega \) denotes the other follower; for example, for \(\omega =1\), \(y^{-\omega }=y^2\) and \(Q_{\omega ,-\omega }=Q_{1,2}\), and conversely for \(\omega =2\). Although this is a quadratic game, it cannot be solved by the approaches of the previous works [13] and [10], because the response y(x) cannot be obtained explicitly owing to the non-diagonal matrices \(M_\omega \) and the coefficient vectors \(c^\omega \) of \(y^\omega \), as also indicated in Remark 4.1.

The KKT conditions of (44), \(\omega =1,2\), are given as follows:

$$\begin{aligned} \begin{aligned}&0\le \left[ \begin{array}{c} y^1 \\ y^2 \end{array}\right] \perp \left[ \begin{array}{cc} M_1 &{} Q_{1,2} \\ Q_{2,1} &{} M_2 \end{array}\right] \left[ \begin{array}{c} y^1 \\ y^2 \end{array}\right] -\left[ \begin{array}{cc} D_{1,1}^\top &{} D_{2,1}^\top \\ D_{1,2}^\top &{} D_{2,2}^\top \end{array}\right] \left[ \begin{array}{c} x^1 \\ x^2 \end{array}\right] -\left[ \begin{array}{cc} c^1 &{} 0 \\ 0 &{} c^2 \end{array}\right] \left[ \begin{array}{c} \lambda _1 \\ \lambda _2 \end{array}\right] \ge 0,\\&0\le \left[ \begin{array}{c} \lambda _1 \\ \lambda _2 \end{array}\right] \perp \left[ \begin{array}{cc} (c^1)^\top &{} 0 \\ 0 &{} (c^2)^\top \end{array}\right] \left[ \begin{array}{c}y^1 \\ y^2\end{array}\right] +\left[ \begin{array}{cc} (d^1)^\top &{} (d^2)^\top \\ (d^1)^\top &{} (d^2)^\top \end{array}\right] \left[ \begin{array}{c}x^1 \\ x^2\end{array}\right] +\left[ \begin{array}{c} a_1\\ a_2 \end{array}\right] \ge 0. \end{aligned} \end{aligned}$$
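
For a fixed leader profile x, these conditions form a mixed linear complementarity problem in \(u=(y^1,y^2,\lambda _1,\lambda _2)\), namely \(0\le u\perp Mu+q(x)\ge 0\). The following Python sketch assembles M and q(x) from the instance data; the dictionary `data` is a container of our own naming, instantiated after the problem data listed below.

```python
import numpy as np
from scipy.linalg import block_diag

def follower_lcp(x, data):
    """Assemble the followers' KKT system above as an LCP
       0 <= u  perp  M u + q(x) >= 0,  u = (y^1, y^2, lambda_1, lambda_2).
    `data` holds M1, M2, Q12, Q21, the blocks D[(nu, omega)], the vectors
    c1, c2, d1, d2, and the scalars a1, a2 of the instance."""
    Myy = np.block([[data["M1"], data["Q12"]],
                    [data["Q21"], data["M2"]]])
    C = block_diag(data["c1"].reshape(-1, 1), data["c2"].reshape(-1, 1))
    Dt = np.block([[data["D"][(1, 1)].T, data["D"][(2, 1)].T],
                   [data["D"][(1, 2)].T, data["D"][(2, 2)].T]])
    dT = np.vstack([np.r_[data["d1"], data["d2"]],
                    np.r_[data["d1"], data["d2"]]])
    M = np.block([[Myy, -C],
                  [C.T, np.zeros((2, 2))]])
    q = np.r_[-Dt @ x, dT @ x + np.r_[data["a1"], data["a2"]]]
    return M, q
```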

To ensure the uniqueness of the Nash equilibrium y, we assume that

$$\begin{aligned} \left[ \begin{array}{cc} M_1 &{} Q_{1,2} \\ Q_{2,1} &{} M_2 \end{array} \right] \end{aligned}$$

is positive definite. Let \(y_\varepsilon (x)\) be a partial solution to the perturbed KKT conditions of (44) obtained by the smoothing method. Here, we define

$$\begin{aligned} A:=\left[ \begin{array}{cc}A_1 &{} \\ {} &{} A_2\end{array}\right] ,\ b:=\left[ \begin{array}{c} b^1 \\ b^2 \end{array}\right] ,\ X:=\{x\in I\!\!R^n\mid Ax\le b,\ x\ge 0 \}, \end{aligned}$$

and

$$\begin{aligned} F^\ell _\varepsilon (x):=\left[ \begin{array}{c}\nabla _{x^1}\varTheta ^1_\varepsilon (x^1,x^2) \\ \nabla _{x^2}\varTheta ^2_\varepsilon (x^1,x^2)\end{array}\right] , \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} \nabla _{x^\nu }\varTheta ^\nu _\varepsilon (x^\nu ,x^{-\nu })=&H_\nu x^\nu +G_{\nu ,-\nu }x^{-\nu }+q^\nu +\\&\sum _{\omega =1,2}\Bigl (D_{\nu ,\omega } y^\omega _\varepsilon (x^\nu ,x^{-\nu })+\nabla _{x^\nu }y^\omega _\varepsilon (x^\nu ,x^{-\nu })D_{\nu ,\omega }^\top x^\nu \Bigr ). \end{aligned} \end{aligned}$$
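
Given a solver for the smoothed follower system and the Jacobians of the smoothed response (e.g., via `response_jacobian` above), this gradient can be evaluated directly. The following sketch mirrors the displayed formula; `y_eps` and `grad_y_eps` are hypothetical outputs of such a solver, and `data` is the instance dictionary constructed below.

```python
import numpy as np

def leader_gradient(nu, x1, x2, y_eps, grad_y_eps, data):
    """Evaluate grad_{x^nu} Theta^nu_eps(x^nu, x^{-nu}) per the formula above.
    y_eps:      dict omega -> y^omega_eps(x), each a vector in R^{m_omega}.
    grad_y_eps: dict omega -> nabla_{x^nu} y^omega_eps(x), an
                (n_nu x m_omega) matrix."""
    xs = {1: x1, 2: x2}
    x_nu, x_mnu = xs[nu], xs[3 - nu]
    H = data["H"][nu]
    G = data["G"][(nu, 3 - nu)]
    g = H @ x_nu + G @ x_mnu + data["q"][nu]
    for omega in (1, 2):
        D = data["D"][(nu, omega)]
        g += D @ y_eps[omega] + grad_y_eps[omega] @ (D.T @ x_nu)
    return g
```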

Then the stationary Nash equilibrium of \(\textrm{NEP}(X,\{\varTheta ^\nu _\varepsilon \}_{\nu =1}^N)\) must satisfy the following VI: Find \(x^*_\varepsilon \in X\) such that

$$\begin{aligned} \langle F^\ell _\varepsilon (x^*_\varepsilon ),x-x^*_\varepsilon \rangle \ge 0\quad \forall x\in X, \end{aligned}$$
(45)

and its KKT conditions are written as the nonlinear complementarity problem:

$$\begin{aligned}&0\le F^\ell _\varepsilon (x)+A^\top \mu \perp x\ge 0, \end{aligned}$$
(46)
$$\begin{aligned}&\qquad \quad \,\, 0\le b-Ax \perp \mu \ge 0. \end{aligned}$$
(47)

Let \(v:=(x,\mu )\in I\!\!R^{n+p}\) and

$$\begin{aligned} \hat{F}^\varepsilon (v):=\left[ \begin{array}{c}F^\ell _\varepsilon (x)+A^\top \mu \\ b-Ax \end{array}\right] ,\ \varPsi ^\varepsilon (v):=\left[ \begin{array}{c}\phi _0(v_1,\hat{F}^\varepsilon _1(v))\\ \vdots \\ \phi _0(v_{n+p},\hat{F}^\varepsilon _{n+p}(v))\end{array}\right] . \end{aligned}$$

Then NCP (46)–(47) is reformulated as the nonsmooth equation \(\varPsi ^\varepsilon (v)=0\). We solve NCP (46)–(47) by the semismooth Newton method for NCPs proposed by De Luca et al. [20], using \(\frac{1}{2}\Vert \varPsi ^\varepsilon (v)\Vert ^2_2\) as the merit function in the line search. The stopping criterion is given by

$$\begin{aligned} \Vert \min \{v,\hat{F}^\varepsilon (v)\}\Vert _\infty <(n+p)\times 10^{-6}, \end{aligned}$$

where the \(\min \) operator denotes the componentwise minimum. The data of the numerical instance are as follows:

$$\begin{aligned} n_1= & {} n_2=2,\ p_1=p_2=2,\ m_1=m_2=2,\\ H_1= & {} \left[ \begin{array}{rr}3&{}-4\\ -4&{}2\end{array}\right] , H_2=\left[ \begin{array}{rr}4&{}-5\\ -5&{}-3\end{array}\right] , G_{1,2}=\left[ \begin{array}{rr}2&{}-1\\ 2&{}2\end{array}\right] , G_{2,1}=-G_{1,2}^\top ,\\ D_{1,1}= & {} \left[ \begin{array}{rr}1&{}2\\ 2&{}1\end{array}\right] , D_{2,1}=\left[ \begin{array}{rr}2&{}1\\ 1&{}1\end{array}\right] , D_{1,2}=\left[ \begin{array}{rr}1&{}2\\ 1&{}1\end{array}\right] , D_{2,2}=\left[ \begin{array}{rr}2&{}1\\ 1&{}2\end{array}\right] ,\\ q^1= & {} \left[ \begin{array}{c} -6\\ -6\end{array}\right] , q^2=\left[ \begin{array}{c}-6\\ -6\end{array}\right] , A_1=\left[ \begin{array}{rr}2&{}1\\ 1&{}2\end{array}\right] , A_2=\left[ \begin{array}{rr}1&{}2\\ 2&{}1\end{array}\right] , b^1=\left[ \begin{array}{c}3\\ 1\end{array}\right] , b^2=\left[ \begin{array}{c}3\\ 1\end{array}\right] ,\\ M_1= & {} \left[ \begin{array}{rr}3&{}1\\ 1&{}3\end{array}\right] , M_2=\left[ \begin{array}{rr}2&{}1\\ 1&{}3\end{array}\right] , Q_{1,2}=\left[ \begin{array}{rr}1&{}1\\ 1&{}2\end{array}\right] , Q_{2,1}=-Q_{1,2}^\top ,\\ c^1= & {} \left[ \begin{array}{c}-1\\ -1\end{array}\right] , c^2=\left[ \begin{array}{c}-1\\ -1\end{array}\right] , d^1=\left[ \begin{array}{c}1\\ 1\end{array}\right] , d^2=\left[ \begin{array}{c}1\\ 1\end{array}\right] , a_1=4, a_2=4. \end{aligned}$$
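
For convenience, these data can be transcribed into the dictionary `data` used in the sketches above, together with the aggregate constraint matrix A and vector b defining X:

```python
import numpy as np

G12 = np.array([[2., -1.], [2., 2.]])
Q12 = np.array([[1., 1.], [1., 2.]])
data = {
    "H": {1: np.array([[3., -4.], [-4., 2.]]),
          2: np.array([[4., -5.], [-5., -3.]])},
    "G": {(1, 2): G12, (2, 1): -G12.T},
    "D": {(1, 1): np.array([[1., 2.], [2., 1.]]),
          (2, 1): np.array([[2., 1.], [1., 1.]]),
          (1, 2): np.array([[1., 2.], [1., 1.]]),
          (2, 2): np.array([[2., 1.], [1., 2.]])},
    "q": {1: np.array([-6., -6.]), 2: np.array([-6., -6.])},
    "M1": np.array([[3., 1.], [1., 3.]]),
    "M2": np.array([[2., 1.], [1., 3.]]),
    "Q12": Q12, "Q21": -Q12.T,
    "c1": np.array([-1., -1.]), "c2": np.array([-1., -1.]),
    "d1": np.array([1., 1.]), "d2": np.array([1., 1.]),
    "a1": 4.0, "a2": 4.0,
}
# Aggregate leader constraints A x <= b (together with x >= 0) defining X
A = np.block([[np.array([[2., 1.], [1., 2.]]), np.zeros((2, 2))],
              [np.zeros((2, 2)), np.array([[1., 2.], [2., 1.]])]])
b = np.array([3., 1., 3., 1.])
```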

We solve NCP (46)–(47) sequentially as \(\varepsilon _k\) decreases, where \(\varepsilon _k=0.9^{k}\), \(k=0,\dots ,74\). We run the algorithm from the initial point \(x^0:=(3,3,3,3)\). Since the followers' solution \((y,z,\lambda )\) is unique for each x, no initial values are required for these variables; they are determined by \(x^0\). Figure 1a–d depicts the stationary Nash equilibria for \(\varepsilon _k\), \(k=0,\dots ,74\); the sequence \(\{x^k\}\) converges to \(x^*\). We also ran the algorithm from other initial points, and the limit point and the trajectories were similar to those shown in the figures.
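
A condensed sketch of this continuation scheme is given below, assuming \(\phi _0\) is the Fischer–Burmeister function and that `F_eps(x, eps)` evaluates \(F^\ell _\varepsilon (x)\). For brevity the generalized Jacobian of \(\varPsi ^\varepsilon \) is approximated by forward differences, whereas [20] works with an exact element of it; the line search uses the merit function \(\frac{1}{2}\Vert \varPsi ^\varepsilon (v)\Vert _2^2\) as described above.

```python
import numpy as np

def fb(a, b):
    """Fischer-Burmeister function: fb(a, b) = 0 iff a >= 0, b >= 0, ab = 0."""
    return np.sqrt(a * a + b * b) - a - b

def solve_ncp(F_eps, A, b, v0, eps, tol=1e-6, max_iter=200):
    """Damped Newton method on Psi^eps(v) = 0 for NCP (46)-(47), v = (x, mu),
    with Armijo backtracking on the merit function 0.5 * ||Psi^eps(v)||^2."""
    n = A.shape[1]
    def F_hat(v):
        x, mu = v[:n], v[n:]
        return np.r_[F_eps(x, eps) + A.T @ mu, b - A @ x]
    def Psi(v):
        return fb(v, F_hat(v))
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        # stopping criterion || min(v, F_hat(v)) ||_inf < (n + p) * tol
        if np.linalg.norm(np.minimum(v, F_hat(v)), np.inf) < len(v) * tol:
            break
        r = Psi(v)
        J = np.empty((len(v), len(v)))  # finite-difference Jacobian of Psi
        h = 1e-7
        for j in range(len(v)):
            e = np.zeros(len(v)); e[j] = h
            J[:, j] = (Psi(v + e) - r) / h
        d = np.linalg.lstsq(J, -r, rcond=None)[0]
        t, m0 = 1.0, 0.5 * float(r @ r)  # Armijo backtracking line search
        while 0.5 * float(np.sum(Psi(v + t * d) ** 2)) > (1.0 - 1e-4 * t) * m0 \
                and t > 1e-12:
            t *= 0.5
        v = v + t * d
    return v

# Continuation in the smoothing parameter, warm-starting each solve:
# v = np.r_[np.array([3., 3., 3., 3.]), np.zeros(A.shape[0])]
# for k in range(75):
#     v = solve_ncp(F_eps, A, b, v, eps=0.9 ** k)
```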

Fig. 1 Sequence of the stationary Nash equilibrium

Remark 5.1

The semismooth Newton method uses \(\nabla F^\ell _\varepsilon (x)\), which involves the Hessian matrix \(\nabla ^2 y_\varepsilon (x)\) of the response function \(y_\varepsilon (x)\); this in turn requires the Hessian of \(H_\varepsilon (x,y,z,\lambda )\) with respect to \((y,z,\lambda )\), and so on. However, the formula for \(\nabla ^2 y_\varepsilon (x)\) is too complicated to obtain in closed form, and thus we use numerical differentiation. Hence, the accuracy of the numerical results may not be guaranteed when \(\varepsilon \) is very small.

We now conclude this section. We verified the behavior of the proposed method on a toy example, and the solutions \(x^k\) and \(y_{\varepsilon _k}(x^k)\) converge to a B-stationary Nash equilibrium as \(\varepsilon _k\rightarrow 0\). Meanwhile, solving variational inequality (45) with a gradient-based method requires the Jacobian matrix of \(F^\ell _\varepsilon (x)\), which involves the Hessian matrix of the response function \(y_\varepsilon (x)\). Hence the computation of this Hessian matrix and its inverse is expensive when the dimension of the followers' problems is high. We leave this issue as a topic for future research.

6 Conclusion

This paper has presented a smoothing method for the followers' best response in the MLMFG, covering followers' optimization problems that are more general than those in the previous studies [10, 13]. We have then shown convergence to Clarke and Bouligand stationary Nash equilibria of the MLMFG as the smoothing parameter tends to zero. Finally, a numerical experiment has been conducted to examine the behavior of the proposed method, specifically the behavior of the sequence of stationary Nash equilibria as the smoothing parameter approaches zero. Note that the numerical experiment in Sect. 5 does not establish the numerical efficiency of the algorithm; an analysis of the complexity and of the convergence as the problem size increases is therefore still required. In fact, to obtain an approximate Nash equilibrium, this method eventually requires third-order derivatives of the followers' objective and constraint functions. Nevertheless, the convergence result, particularly for B-stationary points, justifies the proposed method. In the future, a more efficient algorithm than the semismooth Newton method should be considered, which would allow experiments on higher-dimensional problems.

Furthermore, we remark that this paper has not addressed a local surrogate of the LF Nash equilibrium for the MLMFG. To the best of the authors' knowledge, there is no literature on the MLMFG that deals with a local LF Nash equilibrium. Unfortunately, verifying local optimality in nonconvex optimization is NP-hard in general; hence, finding a local LF Nash equilibrium, like finding an LF Nash equilibrium itself, may be difficult in practice, although it remains theoretically important, e.g., for second-order analysis.