Abstract
A generalizing analysis is performed in order to ease the solvability of the generally complex single-leader–single-follower reverse Stackelberg game. This game is of a hierarchical nature and can therefore serve as a structure for multilevel decision-making problems, such as road pricing. In particular, a leader function of the affine type is analyzed in order to procure a systematic approach to solving the game to optimality. To this end, necessary and sufficient existence conditions for this optimal affine leader function are developed. Compared to earlier results reported in the literature, differentiability of the follower objective functional is relaxed, and locally strict convexity of the sublevel set at the desired reverse Stackelberg equilibrium is replaced with the more general property of an exposed point. Moreover, a full characterization of the set of optimal affine leader functions is derived; its use in the case of secondary optimization objectives as well as for a constrained decision space is illustrated.
1 Introduction
Hierarchical optimization approaches have proven useful in settings in which conflicting objectives appear [1, 2], or as a means of dealing with large-scale control problems [3]. Here, the hierarchical reverse Stackelberg game [4] is considered, which is also known as a Stackelberg game with incentive strategies [5] or, more recently, as the inverse Stackelberg game [6, 7]. As compared to the original Stackelberg game [2], in the reverse Stackelberg game, the leader proposes a mapping from the follower's decision space into her own decision space instead of an immediate decision variable. This structure provides the leader with a stronger influence in case of a nonunique follower response to a single leader decision.
While diverse research directions have been considered w.r.t. the reverse Stackelberg game, e.g., partial information [8], sensitivity [9], and applications like road tolling [1, 10], to the best of the authors' knowledge, no structural approach to finding optimal leader functions has yet been provided. Instead, much research is tailored to the specific case of a quadratic, (strictly) convex, and differentiable cost functional and, in a dynamic game framework, of a linear state update equation [8, 11–16], in which case affine solutions have been shown to be optimal [8]. Our aim is to expand the available solution methods by first considering problem instances in which nonconvex sublevel sets of the follower objective function as well as nondifferentiable objective functions apply.
In this article, a first step is made toward a systematic solution approach that, on the one hand, eases the solution process of the generally complex single-leader–single-follower reverse Stackelberg game and, on the other hand, relaxes as much as possible those assumptions that could restrict the application to particular problem settings. In particular, leader functions of the affine type are analyzed in order to procure a systematic approach for solving the game to optimality.
To this end, we formulate necessary and sufficient existence conditions for an optimal affine solution as initially presented in [17] in their most general form. Compared to earlier results reported in the literature, the differentiability requirement of the follower objective functional is relaxed, and locally strict convexity of the follower’s sublevel set at the desired reverse Stackelberg equilibrium is replaced by the more general property of an exposed point.
Moreover, a full characterization of the set of optimal affine leader functions is derived. The parametrized characterization of such a set facilitates further optimization, e.g., when considering the sensitivity to deviations from the optimal follower response as a secondary objective, as is illustrated in [18]. Furthermore, the characterization can be used to verify the existence of an optimal affine leader function in a constrained decision space, in which case the derivation of existence conditions is a challenging task. The computational complexity of the original game and of the proposed solution approach is considered, and illustrative examples are provided.
The remainder of this paper is structured as follows. Section 2 includes a definition of the reverse Stackelberg game along with a brief analysis of its computational complexity and with an outline of the solution approach, followed by a clarification of notation and assumptions. In Sect. 3, necessary and sufficient conditions are proven for the existence of an optimal affine leader function, considering separately the case of a scalar leader input, and the cases in which the desired equilibrium is either an interior or a boundary point of the sublevel set. The characterization of the set of optimal affine leader functions is provided in Sect. 4 and illustrated by an example, after which the use of this set for the constrained case is illustrated by an example in Sect. 5. Conclusions are presented in Sect. 6.
2 Preliminaries
2.1 Reverse Stackelberg Game
The basic single-leader–single-follower, static, deterministic reverse Stackelberg game without constraints can be defined as follows.
Let the leader and follower decision variables be denoted by \(\mathbf u _{p}\in \Omega _{p}\subseteq \mathbb {R}^{n_p}\), \(n_p\in \mathbb {N}\), \(p\in \{\mathrm{L,F}\}\), while \(\mathcal {J}_{p}:\Omega _\mathrm{L}\times \Omega _\mathrm{F}\rightarrow \mathbb {R}\) denotes, respectively, the leader’s and follower’s objective (cost) functional. We assume the leader to have complete knowledge of the follower’s objective functional and decision space.
Given that the leader player acts first by announcing the leader function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) while taking into account the follower’s response, we can write the problem as a set of composed functions similar as is done in [6]:
$$\begin{aligned} \min _{\gamma _\mathrm{L}\in \Gamma _\mathrm{L}} \mathcal {J}_\mathrm{L}\left( \gamma _\mathrm{L}\left( \mathbf {u}_\mathrm{F}^{*}\right) ,\mathbf {u}_\mathrm{F}^{*}\right) \quad \text { s.t. } \quad \mathbf {u}_\mathrm{F}^{*}\in \arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}),\mathbf {u}_\mathrm{F}\right) , \end{aligned}$$
(1)
where we assume that \(\gamma ^*_\mathrm{L}\) is constructed such that the optimal—indicated by a superscript asterisk—follower response is unique. An overview of symbols that are frequently used in this article can be found in Table 1.
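To make the composed formulation concrete, the following minimal numeric sketch (in Python, with hypothetical scalar costs and \(n_\mathrm{L}=n_\mathrm{F}=1\), chosen purely for illustration and not taken from this article) lets the leader announce an affine leader function and lets the follower minimize the resulting composed cost:

```python
import numpy as np

# Hypothetical scalar costs (illustration only), n_L = n_F = 1.
J_L = lambda uL, uF: (uL - 1.0)**2 + (uF - 2.0)**2   # leader's cost
J_F = lambda uL, uF: (uL - uF)**2 + (uF - 4.0)**2    # follower's cost

# The leader's global optimum (u_L^d, u_F^d) = (1, 2) is her desired equilibrium.
uLd, uFd = 1.0, 2.0

# Announced affine leader function gamma_L : Omega_F -> Omega_L; the slope -1
# makes the composed cost stationary at u_F^d:
#   (-1) * dJF/duL(1, 2) + dJF/duF(1, 2) = (-1)*(-2) + (-2) = 0.
gamma_L = lambda uF: uLd - 1.0 * (uF - uFd)

# Follower best-responds to the announced mapping (grid search for simplicity).
grid = np.linspace(-10.0, 10.0, 200001)
uF_star = grid[np.argmin(J_F(gamma_L(grid), grid))]

print(uF_star)                         # ~2.0: the follower is steered to u_F^d
print(J_L(gamma_L(uF_star), uF_star))  # ~0.0: the leader attains her optimum
```

Here the slope was computed by hand from the gradients of the hypothetical \(\mathcal {J}_\mathrm{F}\) at the desired equilibrium; the remainder of the article formalizes when and how such a construction is possible.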
2.1.1 Computational Complexity
Even the basic single-leader–single-follower static reverse Stackelberg problem is complex and, in general, difficult to solve analytically, due to the composed functions appearing in the optimization problem formulated in (1), the possible existence of multiple optima (hence multiple solutions), and a nonunique follower response [6, 7, 19].
Theorem 2.1
The reverse Stackelberg game (1) is at least strongly NP-hard.
Proof
The original Stackelberg game is a special case of (1): for a constant leader function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \{ {\mathbf {u}}_\mathrm{L}^\mathrm{d}\}\), with \(\mathbf {u}_\mathrm{L}^\mathrm{d}\in \Omega _\mathrm{L}\) a free variable, (1) can be written as
$$\begin{aligned} \min _{\mathbf {u}_\mathrm{L}^\mathrm{d}\in \Omega _\mathrm{L}} \mathcal {J}_\mathrm{L}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^{*}\right) \quad \text { s.t. } \quad \mathbf {u}_\mathrm{F}^{*}\in \arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}\right) , \end{aligned}$$
(2)
from which a suitable, explicit value \(\mathbf {u}_\mathrm{L}^\mathrm{d}\) for \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})\) follows.
Moreover, the Stackelberg game (2) is equivalent [20, 21] to the bilevel programming problem that can be written as
$$\begin{aligned} \min _{x}\; F(x,y) \quad \text {s.t.}\quad G(x,y)\le 0, \quad y \in \arg \min _{y'} \left\{ f(x,y') : g(x,y')\le 0 \right\} , \end{aligned}$$
for general cost functions \(F(\cdot ),f(\cdot )\) and constraint functions \(G(\cdot ),g(\cdot )\). The linear bilevel programming problem was proven to be NP-hard in [22], and later even strongly NP-hard in [23].
Hence, the strongly NP-hard bilevel optimization problem can be reduced to the reverse Stackelberg game, which is therefore at least strongly NP-hard. \(\square \)
A common, simplifying approach to the reverse Stackelberg problem is for the leader player to first determine a particular desired optimum \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that she seeks to achieve [4, 5]. A natural choice would be the leader’s global optimum \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \arg \min _{\mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L},\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}}\mathcal {J}_\mathrm{{L}}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\). Given such an equilibrium point, the remaining problem can be written as follows:
$$\begin{aligned} \text {find } \gamma _\mathrm{L}\in \Gamma _\mathrm{L} \text { such that } \arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}} \mathcal {J}_\mathrm{F}\left( \gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}),\mathbf {u}_\mathrm{F}\right) = \left\{ \mathbf {u}_\mathrm{F}^\mathrm{d}\right\} \quad \text { and } \quad \gamma _\mathrm{L}\left( \mathbf {u}_\mathrm{F}^\mathrm{d}\right) = \mathbf {u}_\mathrm{L}^\mathrm{d}, \end{aligned}$$
where \(\Gamma _\mathrm{L}\) denotes the class of leader functions \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) that is allowed in a particular game setting.
In other words, the leader should construct her function \(\gamma _\mathrm{L}\) such that it passes through the desired optimum but does not intersect any other point of the sublevel set
$$\begin{aligned} \Lambda _\mathrm{d} := \left\{ (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\in \Omega _\mathrm{L}\times \Omega _\mathrm{F} : \mathcal {J}_\mathrm{F}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\le \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} . \end{aligned}$$
(6)
Then, the optimal follower response coincides with the desired input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\).
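The condition that the leader function meets the sublevel set only at the desired equilibrium can be checked numerically by sampling. The sketch below reuses the same hypothetical scalar costs as before, purely as an illustration (grid bounds and tolerances are arbitrary choices):

```python
import numpy as np

# Hypothetical follower cost and desired equilibrium (illustration only).
J_F = lambda uL, uF: (uL - uF)**2 + (uF - 4.0)**2
uLd, uFd = 1.0, 2.0
level = J_F(uLd, uFd)  # cost level defining the sublevel set Lambda_d

gamma_L = lambda uF: uLd - 1.0 * (uF - uFd)  # candidate affine leader function

# Along the curve {(gamma_L(u_F), u_F)}, the follower cost must exceed `level`
# everywhere except at u_F^d itself; count sampled violations of this.
uF = np.linspace(-10.0, 10.0, 100001)
on_curve = J_F(gamma_L(uF), uF)
violations = np.sum((on_curve <= level) & (np.abs(uF - uFd) > 1e-6))
print(violations)  # 0: gamma_L meets the sublevel set only at (1, 2)
```

A sampled check of this kind is of course only a sanity test; the existence conditions of Sect. 3 address the question exactly.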
2.1.2 Affine Incentive Compatibility
In order to further reduce the complexity of the general reverse Stackelberg game and to create a systematic approach toward solving the general game, in this paper, we focus on the particular affine structure of the leader function, i.e., we assume \(\Gamma _\mathrm{L}\) to include only functions of the form
$$\begin{aligned} \gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}) := \mathbf {u}_\mathrm{L}^\mathrm{d} + \mathbf {B}\left( \mathbf {u}_\mathrm{F}-\mathbf {u}_\mathrm{F}^\mathrm{d}\right) , \end{aligned}$$
(7)
where \(\mathbf {B}\) denotes a linear operator mapping \(\Omega _\mathrm{F} \rightarrow \Omega _\mathrm{L}\), represented by an \(n_\mathrm{L}\times n_\mathrm{F}\) matrix in the finite-dimensional case we will consider according to assumption A.3 in Sect. 2.3 below.
The property that a particular desired leader equilibrium is feasible for an instance of the reverse Stackelberg game is known in the literature as incentive compatibility [5]. It will therefore be analyzed under what conditions an optimal affine leader function exists, i.e., under what conditions the leader is able to induce the follower to choose the desired input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\) and thus to reach her desired equilibrium.
From now on, we denote by \(\mathcal {A}_\mathrm{L}\) the set of affine relations through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) defined as sets of dimension \(n_\mathrm{F}\) in \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\) and such that, for \(\alpha _\mathrm{L} \in \mathcal {A}_\mathrm{L}\), \(\alpha _\mathrm{L} \cap \Lambda _\mathrm{d} = \left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \).
Note that the introduction of \(\mathcal {A}_\mathrm{L}\) is necessary in order to be able to work with the function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) as a set of points
$$\begin{aligned} \gamma _\mathrm{L} = \left\{ \left( \gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}),\mathbf {u}_\mathrm{F}\right) : \mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\right\} . \end{aligned}$$
(8)
Remark 2.1
Note that, in this paper, the leader function \(\gamma _\mathrm{L}\) is defined as a mapping \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) that can also be represented by the set of points (8) [24]. In the following, both the mapping and the set representation of \(\gamma _\mathrm{L}\) are adopted, depending on the context.
For \(\alpha _\mathrm{L} \in \mathcal {A}_\mathrm{L}\) with \(\alpha _\mathrm{L}(\Omega _\mathrm{L}) =\Omega _\mathrm{F}\), we can then characterize a candidate leader function by \(\gamma _\mathrm{L}:=(\alpha _\mathrm{L})^{-1}\). To indicate that a function \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) is not only affine but in addition a subset of \(X\), we define \(\mathcal {A}_\mathrm{L}^X:=\{\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L} : \alpha _\mathrm{L}\subseteq X \}\).
2.2 Notation
The analysis mostly relies on concepts from convex analysis and geometry, such as hyperplanes and strictly convex functions and sets (see, e.g., [24, 25]). We denote the convex hull of a set \(X\) by \(\mathrm{conv}(X)\). In addition, the following notation is adopted:
-
By \(f(\tilde{X})\), we denote the image of a function \(f:X\rightarrow Y\) for a subset \(\tilde{X}\subseteq X\), where the domain is denoted \(X:=\mathrm{dom}(f)\).
-
\(\Pi _X(x)\) denotes a supporting hyperplane to the set \(X\) at the point \(x\in X\).
-
As in [24], a set \(X\) is an affine subspace iff \(\forall y,z\in X, \forall \alpha \in \mathbb {R}: \alpha y + (1-\alpha )z\in X\).
-
An exposed point of a convex set \(X\) is defined as a point in its closure \(\bar{X}:= {{\bigcap }} \left\{ X+\epsilon B : \epsilon >0 \right\} \) with \(B\) the Euclidean unit ball \(B:=\{ x : {\vert } x {\vert }_2 \le 1 \}\) that intersects with a strictly supporting hyperplane to \(X\) [25] (see also Fig. 1). Similarly, a point \(\tilde{x}\) in the closure of a nonconvex set \(\tilde{X}\) is an exposed point if there exists a neighborhood of \(\tilde{x}\), \(\mathcal {N}(\tilde{x})\), such that \(\tilde{x}\) intersects with a strictly supporting hyperplane to \(\mathcal {N}(\tilde{x})\).
-
The projection of the set \(P\subseteq \mathbb {R}^{n}\) onto the space \(X=\mathbb {R}^m,m\le n\) is denoted \(\mathrm{proj}_{X}(P)\).
-
By \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\), we denote the decision space in which the leader components are taken to be zero.
-
A generalized gradient \(\partial f(x)\) of a locally Lipschitz continuous function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) at \(x\) is defined as follows:
$$\begin{aligned} \partial f(x):=\mathrm{conv} (\{\lim _{m\rightarrow \infty }\nabla f(x_m): x_m\rightarrow x, x_m \in \mathrm{dom}(f){\setminus } \Omega _f \} ), \end{aligned}$$with \(\Omega _f\) being the set of points where \(f\) is nondifferentiable [26]. By \(\mathcal {V}(X(x))\), we denote the generalized normal to the set \(X\) at the point \(x\in \bar{X}\), defined as a basis of the generalized gradient at \(x\in \bar{X}\). Thus, in case \(\Lambda _\mathrm{d}\) is smooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) =\nabla \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). The generalized normal can be defined through a finite or an infinite number of linearly independent basis vectors as clarified in Fig. 5.
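The exposed-point notion used above can be illustrated with a small sampled test in Python; the square and all tolerances below are arbitrary illustrative choices, not objects from this article:

```python
import numpy as np

# Dense sample of the (convex) square [-1, 1]^2, an illustrative set X.
g = np.linspace(-1.0, 1.0, 81)
X = np.array([(x, y) for x in g for y in g])

def exposed(p, n_normals=720):
    """Sampled test: does some hyperplane through p strictly support X,
    i.e., keep every sampled point of X other than p strictly on one side?"""
    mask = np.linalg.norm(X - p, axis=1) > 1e-9  # exclude p itself
    for a in np.linspace(0.0, 2.0 * np.pi, n_normals, endpoint=False):
        nu = np.array([np.cos(a), np.sin(a)])    # candidate normal vector
        if np.all(((X - p) @ nu)[mask] < 0.0):
            return True
    return False

print(exposed(np.array([1.0, 1.0])))  # True: a corner of the square is exposed
print(exposed(np.array([1.0, 0.0])))  # False: an edge midpoint is not exposed
```

The only supporting hyperplane at an edge midpoint contains the whole edge, so no strictly supporting hyperplane exists there, which matches the definition of an exposed point.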
2.3 Assumptions
-
\(\text {[A.1]}\) Let \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) be convex sets.
-
\(\text {[A.2]}\) Let \(\Lambda _\mathrm{d}\) be a connected set.
-
\(\text {[A.3]}\) Let \(n_\mathrm{L},n_\mathrm{F}\) be finite.
-
\(\text {[A.4]}\) Let \(\Lambda _\mathrm{d}\not = \left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \).
The first assumption is taken from the literature on Stackelberg games, e.g., [8, 27]. Assumption A.2 is a less restrictive alternative to taking \(\mathcal {J}_\mathrm{F}\), and therefore also \(\Lambda _\mathrm{d}\), to be strictly convex, as was done in [8]. Note that we do not require \(\mathcal {J}_\mathrm{F}\) to be continuous. In case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{bd}(\mathrm{conv}(\Lambda _\mathrm{d}))\), the assumption can even be omitted altogether, as the analysis below focuses on \(\mathrm{conv}(\Lambda _\mathrm{d})\) rather than on \(\Lambda _\mathrm{d}\). Assumption A.3 is accepted in many control applications [28, 29]; moreover, it is necessary in order to use the concept of a supporting hyperplane. Finally, the special case excluded by assumption A.4 represents the trivial situation in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is automatically optimal for the follower as well.
3 Existence Conditions
In the current section, basic necessary and sufficient conditions for the existence of an optimal affine leader function are proposed for the most general case, in which the sublevel set \(\Lambda _\mathrm{d}\) is allowed to be nonconvex and nonsmooth, in an unconstrained decision space. These conditions form the basis for the characterization of the set of optimal solutions provided in Sect. 4. An explicit analysis of the subcases in which the sublevel sets are convex and smooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be found in [17]. It should be noted that, when relaxing the strict convexity of the follower objective functional assumed in the original results of [8], the desired leader equilibrium is not automatically a boundary point of the convex hull of the sublevel set. Excluding this case would prevent the current theory from being generally applicable, as is illustrated in Fig. 2b: there, an example is shown of an optimal affine leader function for a desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that lies in the interior of the convex hull of \(\Lambda _\mathrm{d}\).
We first need to consider the special case of \(n_\mathrm{L}=1\); no optimal affine leader function then exists if, in addition, \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is not an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\).
Proposition 3.1
Let \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}},\Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\) and assume that \(n_\mathrm{L}=1\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:{\Omega }_\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if it holds both that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) and that
$$\begin{aligned} \mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\}. \end{aligned}$$
Proof
First note that, since \(n_\mathrm{L}=1\), a strictly supporting hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), if it exists, coincides with an affine \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\), as is shown in Lemma 6.3. Note also that a hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is strictly supporting if and only if \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) (implying that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not \in \mathrm{int}(\mathrm{conv}(\Lambda _\mathrm{d}))\)).
It remains to be shown that, in addition to \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) being exposed, it is necessary and sufficient for \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L})=\Omega _\mathrm{F}\) to hold that there exists a vector \(\varvec{\nu }\in \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) with \(\mathrm{proj}_{\Omega _\mathrm{L}}(\varvec{\nu })\not = \{\mathbf {0}\}\), i.e., the projection \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \) should not consist of the zero vector only; otherwise, no explicit description of a leader function exists. This necessity and sufficiency is proven next.
(\(\Rightarrow \)) By contraposition: Suppose that \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) = \{\mathbf {0}\}\). Then, there exists a tangent plane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with a normal vector \(\varvec{\nu }\) for which it holds that \( \mathrm{proj}_{\Omega _\mathrm{L}}(\varvec{\nu }) =\{\mathbf {0}\}\). It follows that this normal vector defining the hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is parallel to the decision space \(\Omega _\mathrm{F}\), i.e., the hyperplane is orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\). Therefore, \(\mathrm{proj}_{\Omega _\mathrm{F}}\left( \Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \not \supset \Omega _\mathrm{F}\) and \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) will not include any elements \((\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\in \Omega _\mathrm{L}\times (\Omega _\mathrm{F}{\setminus } \{\mathbf {u}_\mathrm{F}^\mathrm{d}\})\), which implies that \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L}) \varsubsetneq \Omega _\mathrm{F}\).
(\(\Leftarrow \)) If \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\}\), then there exists a normal vector \(\varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) defining a hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is not orthogonal to the decision space \(\Omega _\mathrm{L}\).
It follows that the hyperplane is not orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\), i.e., \(\mathrm{proj}_{\Omega _\mathrm{F}}\left( \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) = \Omega _\mathrm{F}\).
Hence, for all \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\), there exists \(\mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L} : (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\in \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). Thus, there exists an affine \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}:\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L})=\Omega _\mathrm{F}\).
Under the use of a leader function \(\gamma _\mathrm{L}:=\left( \alpha _\mathrm{L}^{\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }}\right) ^{-1}\), by definition of the level set (6), the minimum of \(\mathcal {J}_\mathrm{F}(\cdot )\) will be obtained at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). \(\square \)
An example of a case in which \(\Lambda _\mathrm{d}\) is nonsmooth and no affine \(\gamma _\mathrm{L}\) exists is depicted in Fig. 2a: here, \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) = \{\mathbf {0}\}\).
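For \(n_\mathrm{L}=1\) and a point at which the sublevel set is smooth, the construction behind Proposition 3.1 amounts to solving the supporting-hyperplane equation for \(\mathbf {u}_\mathrm{L}\). A hedged numeric sketch with a hypothetical smooth, convex follower cost (not an example from this article):

```python
import numpy as np

# Hypothetical smooth, convex follower cost; the desired equilibrium (1, 1)
# lies on the boundary of its own sublevel set, with normal grad J_F(1, 1).
J_F = lambda uL, uF: uL**2 + uF**2
uLd, uFd = 1.0, 1.0
nu_L, nu_F = 2.0 * uLd, 2.0 * uFd  # gradient of J_F at (1, 1)

assert nu_L != 0.0  # projection of the normal onto Omega_L is nonzero

# Solving nu_L*(uL - uLd) + nu_F*(uF - uFd) = 0 for uL yields the leader
# function lying on the supporting hyperplane:
gamma_L = lambda uF: uLd - (nu_F / nu_L) * (uF - uFd)

uF = np.linspace(-5.0, 5.0, 100001)
uF_star = uF[np.argmin(J_F(gamma_L(uF), uF))]
print(uF_star)  # ~1.0: the hyperplane-based leader function induces u_F^d
```

Note that the division by `nu_L` is exactly where the condition of a nonzero projection onto the leader's decision space enters: if the normal had no leader component, the hyperplane could not be written as a function of \(\mathbf {u}_\mathrm{F}\).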
Propositions 3.2 and 3.3 consider, respectively, the case in which the desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) (for \(n_\mathrm{L}\ge 1\)), and the case in which it lies in the interior of \(\mathrm{conv}(\Lambda _\mathrm{d})\) (for \(n_\mathrm{L}> 1\)).
Proposition 3.2
Let \(n_\mathrm{L}\ge 1\) and assume that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\). Allow \(\Lambda _\mathrm{d}\) to be nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and assume that \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}},{ \Omega }_\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\}\).
Proof
The necessity and sufficiency of the expression
$$\begin{aligned} \mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\} \end{aligned}$$
is proven in Proposition 3.1 for \(\gamma _\mathrm{L}:=\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). In case \(n_\mathrm{L}> 1\), it holds that \(\gamma _\mathrm{L} \subset \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). Thus, there exists at least one leader function on the hyperplane that satisfies the condition \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L}) = \Omega _\mathrm{F}\). \(\square \)
Finally, for \(n_\mathrm{L}>1\), requiring \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) to lie on a supporting hyperplane separating the full \((n_\mathrm{F}+n_\mathrm{L})\)-dimensional decision space into subspaces is generally too restrictive for the existence of an optimal affine leader function. This applies to, e.g., the general nonconvex case under a constrained decision space, and to the case in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int(conv}(\Lambda _\mathrm{d}))\), as depicted in Fig. 2b. Hence, instead, the tangent hyperplane concept is adopted in Proposition 3.3 below.
Proposition 3.3
Let \(n_\mathrm{L} > 1\) and assume that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( {\Lambda }_\mathrm{d}\right) \right) \). Allow \(\Lambda _\mathrm{d}\) to be nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and assume that \({\Omega _L}=\mathbb {R}^{n_\mathrm{L}},{ \Omega _\mathrm{F}}=\mathbb {R}^{n_\mathrm{F}}\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if there exists an \(n_\mathrm{F}\)-dimensional tangent, affine subspace \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) such that \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}=\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \), and such that \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \ne \{\mathbf {0}\}\).
Proof
Since \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) is of the same dimension as a tangent, affine subspace \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), there exists \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}: \alpha _\mathrm{L}\cap \Lambda _\mathrm{d}=\{\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \}\) if and only if
$$\begin{aligned} \exists \, \Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) : \Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}=\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} . \end{aligned}$$
In order for \(\alpha _\mathrm{L}\in \mathcal {A}^{\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }_\mathrm{L}\) to be a mapping \(\Omega _\mathrm{L}\rightarrow \Omega _\mathrm{F}\), it is necessary and sufficient that \(\mathrm{proj}_{\Omega _\mathrm{L}} \left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \ne \{\mathbf {0}\}\), as was proven before in Proposition 3.1. \(\square \)
Remark 3.1
Consider the case in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{bd}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\right) \right) \), but \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is not an exposed point, i.e., no supporting hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exists that intersects \(\mathrm{conv}(\Lambda _\mathrm{d})\), and thus also \(\Lambda _\mathrm{d}\), solely in the point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). However, by definition of the convex hull, there does exist a supporting hyperplane \(\tilde{\Pi }_{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), for which it thus holds that \(\tilde{\Pi }_{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}{\setminus }\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \not = \emptyset \). For this case, which is not captured by Proposition 3.2, an optimal affine \(\gamma _\mathrm{L}\) may still exist for \(n_\mathrm{L}=1\), as depicted in Fig. 3.
Remark 3.2
In the special case with \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exposed and with scalar decision variables (\(n_\mathrm{F}=1,n_\mathrm{L}=1\)), an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) leading to \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) automatically exists.
Since \(\Lambda _\mathrm{d}\) is nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), a supporting hyperplane to \(\Lambda _\mathrm{d}\) will not be a unique (tangent) hyperplane. By both the convexity of \(\Lambda _\mathrm{d}\) and by \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) being an exposed point, we know that \(\not \exists \mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L}{\setminus } \{\mathbf {u}_\mathrm{L}^\mathrm{d}\} : \, (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}^\mathrm{d}) \in \Lambda _\mathrm{d}\). Therefore, there must exist an alternative normal vector defining the hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is not orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\). For such a vector, \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), and therefore \(\alpha ^{\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }_\mathrm{L}\), will cover \(\Omega _\mathrm{F}\), i.e., \(\mathrm{dom}(\gamma _\mathrm{L})= \Omega _\mathrm{F}\).
4 Characterization of an Optimal Affine Leader Function
In this section, the full set of affine leader functions will be derived under which the leader is able to induce the follower to choose the input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\) and thereby to reach the desired solution point. In Sect. 4.1, first the case considered in the literature is summarized, after which we deal with the more general case in Sect. 4.2.
4.1 Under Differentiability Assumptions
A characterization of an optimal affine leader function (7), which reduces to the computation of an \(n_\mathrm{L}\times n_\mathrm{F}\) matrix \(\mathbf {B}\), was first derived in [8] in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable in \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).
In order to ensure that \(\mathbf {B}\) exists as defined next, it is assumed in [8] that \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) are Hilbert spaces and that \(\mathcal {J}_\mathrm{F}(\cdot )\) is Fréchet differentiable on \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). Additionally, \(\mathcal {J}_\mathrm{F}(\cdot )\) is assumed to be strictly convex on \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). Then, for \(n_\mathrm{L},n_\mathrm{F}\) finite (a similar analysis applies in the infinite-dimensional case), \(\mathbf {B}\) should satisfy
which holds under the assumption that \(\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not = 0\) (as follows from the conditions for the existence of an optimal affine leader function) and which can be verified by taking the inner product of \([\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T}\) with the expression for \(\mathbf {u}_\mathrm{L}\) according to (7). This product
corresponds exactly to the expression of a tangent hyperplane \(\Pi _{\Lambda _\mathrm{d}}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), from which it is concluded that if (9) holds, the affine function \(\gamma _\mathrm{L}\) indeed lies on the hyperplane \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).
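This verification can be sketched numerically. The following minimal check uses illustrative gradient values (assumptions for illustration, not taken from the paper) together with the least-norm \(\mathbf {B}\) solving \([\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}]^\mathrm{T}\mathbf {B}=-[\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}]^\mathrm{T}\):

```python
import numpy as np

# Illustrative gradients of J_F at (u_L^d, u_F^d); the numeric values are
# assumptions for illustration, not taken from the paper (n_L = 2, n_F = 1).
g_L = np.array([1.0, 2.0])    # grad_{u_L} J_F, nonzero as required
g_F = np.array([3.0])         # grad_{u_F} J_F
u_L_d = np.array([1.0, -1.0])
u_F_d = np.array([4.0])

# Least-norm B solving g_L^T B = -g_F^T (one member of the family of Sect. 4.2).
B = -np.outer(g_L, g_F) / (g_L @ g_L)

# Every point of gamma_L(u_F) = u_L^d + B (u_F - u_F^d) then lies on the
# tangent hyperplane g_L^T (u_L - u_L^d) + g_F^T (u_F - u_F^d) = 0.
u_F = np.array([7.0])
u_L = u_L_d + B @ (u_F - u_F_d)
residual = g_L @ (u_L - u_L_d) + g_F @ (u_F - u_F_d)
assert abs(residual) < 1e-12
```

Any other \(\mathbf {B}\) satisfying the same inner-product relation passes the same check; the least-norm choice is only one convenient representative.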
Under the condition \(\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not = 0\), the following expression is given in [8]:
Note that this is only one of many possible expressions for \(n_\mathrm{L}>1\). Moreover, in some constrained cases, this expression does not yield an optimal leader function, while an alternative optimal affine solution does exist, as will be illustrated by Example 4.1 below. A generalized characterization of the optimal affine leader function will therefore be presented in Sect. 4.2 below.
Example 4.1
Expression (13) Subject to Constraints
We now provide a situation in which the specific expression of \(\mathbf {B}\) proposed in [8] for \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) does not yield a feasible leader function in the constrained case, but in which an optimal leader function does exist.
Let
and let \((\mathbf u _{\mathrm{L},1}^\mathrm{d},\mathbf u _{\mathrm{L},2}^\mathrm{d},\mathbf u _{\mathrm{F}}^\mathrm{d})=(0.5,6,4)\).
Then, \(\nabla _\mathbf{u _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) =2 \mathbf u _\mathrm{F}-12\), \(\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) =\begin{bmatrix} 2\mathbf u _{L,1}-2 \\ 2\mathbf u _{L,2}-10\end{bmatrix}\), leading to
As can be seen in Fig. 4, a mapping \(\gamma _\mathrm{L}\) as defined through (7) with \(\mathbf {B}\) as defined in (14) does not return values for all \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\) for the constraints
Thus, a parametrization \(\mathbf {B}\) as defined by (13) does not belong to the characterization of an optimal leader function in this constrained case.
However, there do exist optimal affine mappings \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that lie on \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). As also plotted in Fig. 4, a suitable leader function lying on the tangent hyperplane defined by the relation
would be
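The gradient computations in this example can be reproduced in a few lines. The matrix in (14) is given in the original display and is not reproduced here; as an assumption for illustration, the sketch below uses the least-norm construction of [8] as a hypothetical stand-in for (13), and only checks the tangent-hyperplane property (the constraints of the example are likewise in an omitted display, so feasibility is not checked):

```python
import numpy as np

# Gradients of J_F evaluated at the desired point of the example:
# (u_L^d, u_F^d) = ((0.5, 6), 4).
u_L_d = np.array([0.5, 6.0])
u_F_d = 4.0
g_L = np.array([2 * 0.5 - 2, 2 * 6.0 - 10])  # grad_{u_L} J_F -> [-1, 2]
g_F = 2 * 4.0 - 12                           # grad_{u_F} J_F -> -4

# Hypothetical stand-in for (13)/(14): the least-norm B with g_L^T B = -g_F
# (an assumption for illustration; the paper's display is not reproduced here).
B = -g_L * g_F / (g_L @ g_L)                 # -> [-0.8, 1.6]

def gamma_L(u_F):
    """Affine leader function (7) through the desired point."""
    return u_L_d + B * (u_F - u_F_d)

# gamma_L passes through (u_L^d, u_F^d) and stays on the tangent hyperplane
# g_L^T (u_L - u_L^d) + g_F (u_F - u_F^d) = 0 for every u_F.
assert np.allclose(gamma_L(u_F_d), u_L_d)
for u_F in (0.0, 4.0, 8.0):
    u_L = gamma_L(u_F)
    assert abs(g_L @ (u_L - u_L_d) + g_F * (u_F - u_F_d)) < 1e-12
```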
4.2 The General Case
Since the previously presented result only captures the case in which \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable, and since only one particular solution is specified, we now provide a characterization of the full set of affine leader functions that are optimal in an unconstrained decision space, without requiring \(\mathcal {J}_\mathrm{F}(\cdot )\) to be differentiable. Based on such a set, one can apply further, secondary selection criteria like the minimization of sensitivity to deviations from the optimal response [18], as well as deal with constraints on the decision space, as will be shown in Sect. 5 below.
In the following, we characterize \(\gamma _\mathrm{L}\) in terms of a matrix \(\mathbf {R}=\begin{bmatrix}\mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {R}_\mathrm{F}^\mathrm{T}\end{bmatrix}^\mathrm{T}\), \(\mathbf {R} \in \mathbb {R}^{(n_\mathrm{L}+n_\mathrm{F})\times n_\mathrm{F}}, \mathbf {R}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{L}\times n_\mathrm{F}}, \mathbf {R}_\mathrm{F}\in \mathbb {R}^{n_\mathrm{F}\times n_\mathrm{F}}\), i.e.,
where \(s\in \mathbb {R}^{n_\mathrm{F}}\) represents the free parameters of the affine function. Now, for \(\mathbf {R}_\mathrm{F}\) invertible—which automatically follows from the necessary conditions, as will be proven in Lemma 4.1 below—it follows that
i.e., one arrives at the explicit form of leader function (7). The remaining problem in arriving at a full characterization of an optimal affine \(\gamma _\mathrm{L}\) is to determine the set of possible basis vectors.
Lemma 4.1
In order for a leader function \(\gamma _\mathrm{L}\) characterized by (15) to be optimal, for \(\mathbf {R}=\begin{bmatrix}\mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {R}_\mathrm{F}^\mathrm{T}\end{bmatrix}^\mathrm{T}\), the following should hold:
-
1.
\(\exists \varvec{\nu } \in \mathcal {V}(X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ):\varvec{\nu }^\mathrm{T} \mathbf {R}= \mathbf {0}^\mathrm{T}\), with \(\mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) the generalized normal to \(X\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), where \(X=\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) )\) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is exposed with respect to \(\mathrm{conv}(\Lambda _\mathrm{d})\) (Proposition 3.2 applies), or \(X=\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) in case Proposition 3.3 applies, i.e., \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \).
-
2.
The columns of \(\mathbf {R}_\mathrm{F}\) should be a basis for \(\mathrm \Omega _\mathrm{F}\), i.e., \(\mathbf {R}_\mathrm{F}\) should be of full rank \(n_\mathrm{F}\) and thus invertible.
Proof
-
1.
By definition of a tangent hyperplane \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to a set \(X\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), it holds that \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \perp \varvec{\nu }\) for some \(\varvec{\nu } \in \mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \). Since we require each optimal \(\gamma _\mathrm{L}\) characterized by (15) to lie on \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), it follows that each column of \(\mathbf {R}\) must also be orthogonal to \(\varvec{\nu }\), i.e., \(\varvec{\nu }^\mathrm{T}\mathbf {R}=\mathbf {0}^\mathrm{T}\). Note that, in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable, i.e.,
$$\begin{aligned}&\mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \\&\quad =\left\{ \varvec{\nu }=\begin{bmatrix} \varvec{\nu }_\mathrm{L}^\mathrm{T}&\varvec{\nu }_\mathrm{F}^\mathrm{T} \end{bmatrix}^\mathrm{T}: \varvec{\nu }_\mathrm{L}=\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ,\varvec{\nu }_\mathrm{F}=\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} , \end{aligned}$$this condition is equivalent to the expression of a tangent hyperplane \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \): \([\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T}(\mathbf {u}_\mathrm{L}^\mathrm{d}-\mathbf {u}_\mathrm{L}) + [\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T} (\mathbf {u}_\mathrm{F}^\mathrm{d}-\mathbf {u}_\mathrm{F})=0\).
-
2.
For an optimal affine leader function \(\gamma _\mathrm{L}(\cdot )\) characterized by (15) to satisfy \(\mathrm{dom}(\gamma _\mathrm{L})=\Omega _\mathrm{F}\), it is required that the \(n_\mathrm{F}\) columns of \(\mathbf {R}_\mathrm{F}\) are independent basis vectors spanning \(\Omega _\mathrm{F}\). Thus, \(\mathbf {R}_\mathrm{F}\) is of full rank and hence invertible. \(\square \)
In fact, we can select w.l.o.g. \(\mathbf {R}_\mathrm{F}:= \mathbf {I}_{n_\mathrm{F}} = \begin{bmatrix} \mathbf {e}_1&\dots&\mathbf {e}_{n_\mathrm{F}}\end{bmatrix}\), as shown in Lemma 4.2.
Lemma 4.2
If there exists an optimal affine \(\gamma _\mathrm{L}(\cdot )\) characterized by (15), one can select w.l.o.g. \(\mathbf {R}_\mathrm{F}=\mathbf {I}_{n_\mathrm{F}}\).
Proof
Consider
To prove that \(\mathcal {S}\equiv \tilde{\mathcal {S}}\), we will show that, for each possible 3-tuple \((s,\mathbf {I}_{n_\mathrm{F}},\mathbf {R}_\mathrm{L})\) according to (15) with \(\varvec{\nu }^\mathrm{T}\begin{bmatrix} \mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {I}_{n_\mathrm{F}} \end{bmatrix}^\mathrm{T} =0^\mathrm{T}\) that yields some \(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\), one can find an equivalent tuple \((\tilde{s},\tilde{R}_\mathrm{F},\tilde{R}_\mathrm{L})\), \(\tilde{s}\in \mathbb {R}^{n_\mathrm{F}}\), for which additionally it holds that \(\varvec{\nu }^\mathrm{T}\begin{bmatrix} \tilde{R}_\mathrm{L}^\mathrm{T}&\tilde{R}_\mathrm{F}^\mathrm{T} \end{bmatrix}^\mathrm{T} =0^\mathrm{T}\), yielding the same values \(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\).
It can be easily seen that the expression \(\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\mathbf {I}_{n_\mathrm{F}} \cdot s\) is equivalent to \(\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\tilde{R}_{\mathrm{F}} \cdot \tilde{s}\) with \(s=\tilde{R}_F \cdot \tilde{s}\): as shown in Lemma 4.1, it follows from the existence of an optimal affine \(\gamma _\mathrm{L}(\cdot )\) that \(\mathbf {R}_\mathrm{F}\) is invertible. Then, for a given \(s\), there exists a unique \(\tilde{s}\) and vice versa. From \(\mathbf {B}=\tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1}\) according to (16), and from the substitution \(\mathbf {B}=\mathbf {R}_\mathrm{L}\mathbf {I}_{n_\mathrm{F}}=\mathbf {R}_\mathrm{L}\), equivalence requires that \(\mathbf {R}_\mathrm{L}=\tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1}\). Finally, we have that
hence, \(\mathcal {S}=\mathcal {\tilde{S}}\). \(\square \)
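The equivalence in this proof can be checked numerically. A minimal sketch, assuming the implicit form (15) reads \(\mathbf {u}_\mathrm{L}=\mathbf {u}_\mathrm{L}^\mathrm{d}+\mathbf {R}_\mathrm{L}s\), \(\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\mathbf {R}_\mathrm{F}s\), with arbitrary illustrative matrices:

```python
import numpy as np

# Illustrative data (n_L = n_F = 2); assumes the implicit form (15) reads
# u_L = u_L^d + R_L s,  u_F = u_F^d + R_F s.
rng = np.random.default_rng(0)
u_L_d = np.array([1.0, 0.0])
u_F_d = np.array([2.0, -1.0])

# Arbitrary tuple with invertible R~_F.
R_F_t = np.array([[2.0, 1.0], [0.0, 3.0]])
R_L_t = rng.standard_normal((2, 2))
s_t = np.array([0.5, -2.0])

# Equivalent tuple with R_F = I, as constructed in the proof:
# R_L = R~_L R~_F^{-1} and s = R~_F s~.
R_L = R_L_t @ np.linalg.inv(R_F_t)
s = R_F_t @ s_t

# Both parametrizations yield the same decision pair (u_L, u_F).
u_L_1, u_F_1 = u_L_d + R_L @ s, u_F_d + s
u_L_2, u_F_2 = u_L_d + R_L_t @ s_t, u_F_d + R_F_t @ s_t
assert np.allclose(u_L_1, u_L_2) and np.allclose(u_F_1, u_F_2)
```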
Now, given \(\mathbf {R}_\mathrm{F}:=\mathbf {I}_{n_\mathrm{F}}\), we still need to identify the set of matrices \(\mathbf {R}_\mathrm{L}\) that satisfy \(\varvec{\nu } ^\mathrm{T} R=0^\mathrm{T}\) for some normal vector \(\varvec{\nu }\), which reduces to \(\begin{bmatrix} \varvec{\nu }_\mathrm{L}^\mathrm{T}&\varvec{\nu }_\mathrm{F}^\mathrm{T}\end{bmatrix} \begin{bmatrix}\mathbf {R}_\mathrm{L} \\ \mathbf {I}_{n_\mathrm{F}}\end{bmatrix} =0^\mathrm{T}\) or, equivalently, \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L} = -\varvec{\nu }_\mathrm{F}^\mathrm{T}\).
Due to the necessary condition \(\mathrm{proj}_{\Omega _\mathrm{L}}(\mathcal {V}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) )))\not = \{\mathbf {0}\}\) (Proposition 3.2 applies) or \(\mathrm{proj}_{\Omega _\mathrm{L}}(\mathcal {V}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\not = \{\mathbf {0}\}\) (Proposition 3.3 applies), \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\) must contain at least one nonzero entry. Hence, the expressions \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_{\mathrm{L},j} = -\varvec{\nu }_{\mathrm{F},j}\), \(j=1,\ldots ,n_\mathrm{F}\), can be solved for. Proposition 4.1 below provides a parametrized characterization of this problem that will be needed for further optimization.
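Solving these columnwise equations can be sketched as follows, for an assumed illustrative normal vector \(\varvec{\nu }\) with \(\varvec{\nu }_\mathrm{L}\ne \mathbf {0}\); here the least-norm solution per column is taken, while the full parametrization follows in Proposition 4.1:

```python
import numpy as np

# Illustrative normal vector nu = [nu_L; nu_F] with nu_L nonzero
# (assumed values; n_L = 3, n_F = 2).
nu_L = np.array([1.0, 0.0, 2.0])
nu_F = np.array([3.0, -1.0])

# Solve nu_L^T R_{L,j} = -nu_{F,j} column by column (least-norm solutions).
R_L = np.column_stack([
    np.linalg.lstsq(nu_L[None, :], np.array([-nu_F[j]]), rcond=None)[0]
    for j in range(nu_F.size)
])

# With R_F = I_{n_F}, every column of R = [R_L; I] is orthogonal to nu.
R = np.vstack([R_L, np.eye(nu_F.size)])
nu = np.concatenate([nu_L, nu_F])
assert np.allclose(nu @ R, 0.0)
```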
From the previous derivations, the following theorem directly follows.
Theorem 4.1
Let \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}, \Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\). Assume that the conditions in Proposition 3.2 or Proposition 3.3 are satisfied, and that therefore an optimal affine leader function of the form (7) exists. Then, the set
contains optimal affine solutions that can be characterized by \(\mathbf {B}:=\mathbf {R}_\mathrm{L}\mathbf {I}_{n_\mathrm{F}}\), with \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L}=-\varvec{\nu }_\mathrm{F}^\mathrm{T}\), for some \(\varvec{\nu } \in \mathcal {V}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is exposed with respect to \(\mathrm{conv}(\Lambda _\mathrm{d})\) (Proposition 3.2 applies), or for some \(\varvec{\nu } \in \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) (Proposition 3.3 applies).
For the sake of conciseness, in the remainder of this section, we will assume \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to be an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\). As a result, we consider the case in which Proposition 3.2 is satisfied rather than Proposition 3.3, i.e., we consider the generalized normal \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \). For the case in which Proposition 3.3 holds, the generalized normal should be substituted by \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) in the following.
In order to be able to optimize over the set of possible leader functions and to select a function that is optimal with respect to some criteria, Proposition 4.1 now provides a parametrized characterization of the optimal affine leader function.
Proposition 4.1
Let \(\Gamma _\mathrm{L}^* := \{ \gamma _\mathrm{L}: \Omega _\mathrm{F} \rightarrow \Omega _\mathrm{L} : \gamma _\mathrm{L}\text { satisfies }\)(5), (15)}.
-
1.
For \(\mathcal {J}_\mathrm{F}(\cdot )\) nondifferentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), the possible realizations of \(\mathbf {R}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{L}\times n_\mathrm{F}}\) can be written as
$$\begin{aligned} \mathbf {R}_\mathrm{L}\in \mathcal {R}_\mathrm{L}&:= \left\{ \begin{bmatrix} \mathbf {R}_{\mathrm{L},1}&\ldots&\mathbf {R}_{\mathrm{L},n_\mathrm{F}} \end{bmatrix} : \mathbf {R}_j := \begin{bmatrix}\mathbf {R}_{\mathrm{L},j} \\ \mathbf {e}_j\end{bmatrix} \in \mathcal {R}_j, j=1,\ldots ,n_\mathrm{F} \right\} , \end{aligned}$$with the set of possible columns of \(\mathbf R \in \mathbb {R}^{(\mathrm{n}_\mathrm{L}+\mathrm{n}_\mathrm{F})\times \mathrm{n}_\mathrm{F}}\) characterized by
$$\begin{aligned} \mathcal {R}_j&:= \left\{ \mathbf {Q}\cdot \mathbf {W}\cdot p^+_j \left| \begin{array}{l} p^{+}_j:= \left( \displaystyle \sum \limits _{i=1}^{N^\mathrm{f}_{+,j}} \alpha ^{+}_{i,j} \beta ^\mathrm{f}_{i,+,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{+,j}} \mu ^{+}_{i,j} \beta ^\mathrm{e}_{i,+,j}\right) \\ \displaystyle \sum \limits _i \alpha ^{+}_{i,j}=1,\alpha ^{+}_{i,j} \in \mathbb {R}_+,\mu ^{+}_{i,j}\in \mathbb {R}_{+} \end{array} \right. \right\} \nonumber \\&\cup \left\{ \mathbf {Q}\cdot (-\mathbf {W})\cdot p^-_j \left| \begin{array}{l} p^{-}_j:= \left( \displaystyle \sum \limits _{i=1}^{N^\mathrm{f}_{-,j}} \alpha ^{-}_{i,j} \beta ^\mathrm{f}_{i,-,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{-,j}} \mu ^{-}_{i,j} \beta ^\mathrm{e}_{i,-,j}\right) \\ \displaystyle \sum \limits _i \alpha ^{-}_{i,j}=1,\alpha ^{-}_{i,j} \in \mathbb {R}_+,\mu ^{-}_{i,j}\in \mathbb {R}_{+} \end{array} \right. \right\} , \end{aligned}$$(17)for \(j=1,\ldots ,n_\mathrm{F}\), with \(\mathbf {Q}:=\begin{bmatrix} \mathbf {I}_{n_\mathrm{L}}&\mathbf {0}_{n_\mathrm{L}\times n_\mathrm{F}} \end{bmatrix}\) and with \(\mathbf {W}=\begin{bmatrix} \mathbf {w}_1&\ldots&\mathbf {w}_m \end{bmatrix}\), where \(\{\mathbf {w}_i\}_{i=1}^m\), \(m\in \{\mathbb {N},\infty \}\) with \(\mathbf {w}_i \in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}\) is the set of generators of \(\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) such that
$$\begin{aligned}&\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\\&\quad :=\left\{ \sum \limits _{i=1}^m\beta _i \mathbf {w}_i : \beta _i \in \mathbb {R}_+ \right\} \bigcup \left\{ \sum \limits _{i=1}^m\beta _i (-\mathbf {w}_i) : \beta _i\in \mathbb {R}_+ \right\} . \end{aligned}$$Here, \(\{ \beta ^\mathrm{f}_{i,s,j}\}_{i=1}^{N_{s,j}^\mathrm{f}}\) and \(\{ \beta ^\mathrm{e}_{i,s,j}\}_{i=1}^{N_{s,j}^\mathrm{e}}\), \(s\in \{+,-\}\) are the sets of finite vertices and extreme rays, respectively, of the polyhedra \(\mathcal {P}^+_j=\{\varvec{\beta }: \mathbf {P}\mathbf {W}\varvec{\beta }=\mathbf {e}_j,\varvec{\beta }\in (\mathbb {R}_+)^m \}\) and \(\mathcal {P}^-_j=\{\varvec{\beta }: \mathbf {P}(-\mathbf {W})\varvec{\beta }=\mathbf {e}_j,\varvec{\beta }\in (\mathbb {R}_+)^m \}\).
-
2.
For \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathbf {R}_\mathrm{L}\) belongs to the affine space of the form
$$\begin{aligned} \mathcal {R}_{\mathrm{L}}:= \left\{ \mathbf {R}_\mathrm{L} : \mathbf {R}_\mathrm{L}= \mathbf {R}_{\mathrm{L}}^0 + \mathcal {B}_N \cdot \mathbf {T},\mathbf {T}\in \mathbb {R}^{\mathrm{dim}(N)\times n_\mathrm{F}} \right\} , \end{aligned}$$(18)with \(\mathbf {R}_\mathrm{L}^0\) a particular solution of \(\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \mathbf {R}_{\mathrm{L}}= -\nabla _{\mathbf {u}_\mathrm{F}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and with \(\mathcal {B}_N\) a basis of \(N:=\mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \).
Proof
-
1.
For \(\Lambda _\mathrm{d}\) nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), the associated generalized normal is by definition a convex and pointed cone, i.e., \(a_1 \varvec{\nu }_1 + a_2 \varvec{\nu }_2 \in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) for any \(a_1,a_2 \in \mathbb {R}_+\) and \(\varvec{\nu }_1,\varvec{\nu }_2\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \). In fact, \( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) is the normal cone [25], defined by the set of normal vectors to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), which is generated by \(n\in \{\mathbb {N},\infty \}\) generators:
$$\begin{aligned} \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right)&:=\left\{ \sum \limits _{i=1}^n\alpha _i \varvec{\nu }_i : \alpha _i\in \mathbb {R}_+,\varvec{\nu }_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right. \nonumber \\&\quad \left. \text {a generator of } \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} . \end{aligned}$$(19)The polar and dual cone of \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) and its orthogonal complement are denoted, respectively, by
$$\begin{aligned} \mathcal {V}^0&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } \le 0 \quad \forall \varvec{\nu } \in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} , \\ \mathcal {V}^*&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } \ge 0 \quad \forall \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} , \\ \mathcal {V}^\perp&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } = 0 \quad \forall \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} . \end{aligned}$$It now follows from condition (1) of Lemma 4.1 that the set of possible columns \(\mathbf {R}_j,j=1,\ldots ,n_\mathrm{F}\) of \(R\) can be represented by
$$\begin{aligned}&\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \nonumber \\&\quad :=\left\{ \mathbf {r}\in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : \exists \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \text { s.t. } \mathbf {r}^\mathrm{T} \cdot \varvec{\nu } = 0 \right\} \nonumber \\&\quad =\left( \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}{\setminus } \left( \mathcal {V}^0\cup \mathcal {V}^*\right) \right) \cup \mathcal {V}^\perp . \end{aligned}$$(20)This last expression is illustrated in Fig. 5 where \(\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) corresponds to the area between \(\varvec{\nu }^\mathrm{o}_1,\varvec{\nu }^\mathrm{o}_2\) and the vectors \(\varvec{\nu }_1',\varvec{\nu }_2'\) perpendicular to \(\varvec{\nu }_1,\varvec{\nu }_2\). The expression (20) follows from the fact that the (closure of the) complement of a cone is again a cone [30]; hence, the complement of the double cone \(\mathcal {V}^0\cup \mathcal {V}^*\) [31], which consists of the union of two apex-to-apex placed pointed cones, i.e., \(\mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}{\setminus }\left( \mathcal {V}^0\cup \mathcal {V}^*\right) \), embodies a cone. Finally, the null vector is recovered in order to yield a pointed cone. The set \(\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) can therefore be written as a linear combination of generators \(\mathbf {w}_i\) from the set \(\{\pm \mathbf {w}_i\}_{i=1}^m\) as in (19), now also considering the negatives \(-\mathbf {w}_i\):
$$\begin{aligned} \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right)&:= \left\{ \sum \limits _{i=1}^m\beta _i \mathbf {w}_i : \beta _i\in \mathbb {R}_+,\mathbf {w}_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right\} \nonumber \\&\cup \left\{ \sum \limits _{i=1}^m\beta _i (-\mathbf {w}_i) : \beta _i\in \mathbb {R}_+,\mathbf {w}_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right\} .\qquad \end{aligned}$$(21)Finally, since \(\mathbf {R}=\begin{bmatrix} \mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {I}_{n_\mathrm{F}}\end{bmatrix}^\mathrm{T}\) by Lemmas 4.1 and 4.2, we need to select from \(\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) those vectors \(\mathbf {r}\) such that, for \(j=1,\ldots ,n_\mathrm{F}\),
$$\begin{aligned} \mathcal {R}_j := \left\{ \mathbf {r}\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) : \mathrm{proj}_{\Omega _\mathrm{F}}(\mathbf {r}) = \mathbf {e}_j\right\} , \end{aligned}$$i.e., admissible as a \(j\)-th column of \(\mathbf {R}\) are those vectors
$$\begin{aligned}&\mathbf {r}=\mathbf {W}\cdot \varvec{\beta }\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \text { and }\\&\quad \mathbf {r}=-\mathbf {W}\cdot \varvec{\beta }\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) , \end{aligned}$$with \(\mathbf {W}=\begin{bmatrix} \mathbf {w}_1&\ldots&\mathbf {w}_m\end{bmatrix}\), \(\varvec{\beta } = \begin{bmatrix} \varvec{\beta }_1&\ldots&\varvec{\beta }_m\end{bmatrix}^\mathrm{T}\), such that, for \(\mathbf {P}:=\begin{bmatrix} \mathbf {0}_{n_\mathrm{F}\times n_\mathrm{L}}&\mathbf {I}_{n_\mathrm{F}} \end{bmatrix}\), we have
$$\begin{aligned} \mathbf {P}\cdot \mathbf {W}\cdot \varvec{\beta } =\mathbf {e}_j, \text { respectively, } \mathbf {P}\cdot \left( -\mathbf {W} \right) \cdot \varvec{\beta } =\mathbf {e}_j. \end{aligned}$$The solutions to these two equations, where it should be noted that \(\varvec{\beta }^+\in \mathbb {R}_+^{m}\), \(\varvec{\beta }^-\in \mathbb {R}_+^{m}\), can be parametrized using, e.g., the double description method [32] for the polyhedra
$$\begin{aligned}&\mathcal {P}^+_{j}=\{\varvec{\beta }^+ : \mathbf {P}\mathbf {W}\varvec{\beta }^+=\mathbf {e}_j,\varvec{\beta }^+ \ge \mathbf {0} \}\text { and } \mathcal {P}^-_{j}=\{\varvec{\beta }^- : \mathbf {P}(-\mathbf {W})\varvec{\beta }^-=\mathbf {e}_j,\varvec{\beta }^- \ge \mathbf {0} \}, \end{aligned}$$i.e.,
$$\begin{aligned} \beta ^s_{j}=\sum \limits _{i=1}^{N^\mathrm{f}_{s,j}} \alpha ^s_{i,j} \beta ^\mathrm{f}_{i,s,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{s,j}} \mu ^s_{i,j} \beta ^\mathrm{e}_{i,s,j}, s\in \{+,-\}, \end{aligned}$$with \(\sum \limits _i \alpha ^s_{i,j}=1,\alpha ^s_{i,j} \in \mathbb {R}_+,\mu ^s_{i,j}\in \mathbb {R}_+\), and with \(\left\{ \beta ^\mathrm{f}_{i,s,j} : i=1,\ldots ,N_{s,j}^\mathrm{f}\right\} \) the set of finite vertices of \(\mathcal {P}^s_{j}\), and with \(\left\{ \beta ^\mathrm{e}_{i,s,j} : i=1,\ldots ,N_{s,j}^\mathrm{e}\right\} \) the set of extreme rays of \(\mathcal {P}^s_{j}\) [33], for \(s\in \{+,-\}\).
-
2.
For \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) is uniquely defined by \(\varvec{\nu }=\nabla \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with \(\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \ne \mathbf {0}\) (by Propositions 3.2 and 3.3), and therefore \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_{\mathrm{L},j} = -\varvec{\nu }_{\mathrm{F},j}, j=1,\dots ,n_\mathrm{F}\), can be solved as a simple system of equalities. Here, for each zero element \(\varvec{\nu }_{\mathrm{L},i}\), the corresponding \(i\)-th entry of each column \(\mathbf {R}_{\mathrm{L},j}\) is free. Therefore, the possible solutions can be written as
$$\begin{aligned}&\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \mathbf {R}_{\mathrm{L},j}= -\nabla _{\mathbf {u}_\mathrm{F},j}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \nonumber \\&\quad \Rightarrow \mathbf {R}_{\mathrm{L},j}= \mathbf {R}_{\mathrm{L},j}^0 +\underbrace{\mathcal {B}\left( \mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) }_{\mathcal {B}_N}\cdot \mathbf {t},\mathbf {t}\in \mathbb {R}^{\mathrm{dim}(N)}, \end{aligned}$$(22)where \(\mathbf {R}_{\mathrm{L},j}^0\) denotes a particular solution to (22) and \(\mathcal {B}_N\cdot \mathbf {t}\), \(\mathbf {t}\in \mathbb {R}^{\mathrm{dim}(N)}\) is a homogeneous solution to (22). Note that the basis of the null space of \(\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {B}\left( \mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \), as well as a particular solution \(\mathbf {R}_{\mathrm{L},j}^0\), can be computed with a singular value decomposition (SVD) or QR decomposition (see, e.g., [34]).
\(\square \)
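For the differentiable case, the family (18) can be computed as sketched below, with assumed illustrative gradient values; the null-space basis is obtained via the SVD (here through SciPy's null_space), in line with the computational remarks of Sect. 4.2.1, and the sign convention \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L}=-\varvec{\nu }_\mathrm{F}^\mathrm{T}\) is used:

```python
import numpy as np
from scipy.linalg import null_space

# Assumed illustrative gradients at the desired point (n_L = 3, n_F = 2);
# the sign convention nu_L^T R_L = -nu_F^T is used.
g_L = np.array([1.0, -2.0, 0.0])   # grad_{u_L} J_F, nonzero as required
G_F = np.array([4.0, 2.0])         # grad_{u_F} J_F

# Particular solution of g_L^T R_L = -G_F^T (least-norm, column by column).
R_L0 = -np.outer(g_L, G_F) / (g_L @ g_L)

# Basis B_N of null(g_L^T), computed via the SVD as suggested in the text.
B_N = null_space(g_L[None, :])     # shape (n_L, dim N) = (3, 2)

# Every member R_L0 + B_N T of the affine family (18) remains a solution.
rng = np.random.default_rng(1)
T = rng.standard_normal((B_N.shape[1], G_F.size))
R_L = R_L0 + B_N @ T
assert np.allclose(g_L @ R_L, -G_F)
```

The free entries corresponding to zero components of the gradient (here the third component of g_L) are exactly those spanned by the null-space term.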
Remark 4.1
So far a static, single-stage reverse Stackelberg game has been considered. Whereas this basic case serves for developing the conditions summarized in Sect. 3 and the characterization of the present section, real-life control settings will often have a dynamic, multi-stage nature [7]. As is also done in, e.g., [8], the presented results can be applied directly to the dynamic case with open-loop information by considering the game as a series of static optimization problems. In other words, at each (discrete) time step \(k\in \mathcal {K}=\{1,2,\ldots ,K\}, K\in \mathbb {N}\), of the game, the desired values \((\mathbf {u}_\mathrm{L}^\mathrm{d}(k),\mathbf {u}_\mathrm{F}^\mathrm{d}(k))\) are computed, and the mappings \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}(k),k)\) can be computed as in the static case. However, for more involved dynamic settings with incomplete information and deviations of players from the optimal values, further research is needed.
4.2.1 Computation and Complexity
While the characterization of this section is aimed at providing a structured method to solve the reverse Stackelberg game with an affine leader function, the computational cost of testing the various conditions should be taken into account:
-
Determining a global optimum to represent the desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is, in general, a constrained nonlinear programming problem; this subproblem has to be solved in any solution approach. Alternatively, a desired equilibrium that is not directly derived from a leader objective functional, or a series of such points, may be provided a priori.
-
Determining the convex hull of \(\Lambda _\mathrm{d}\) is only required in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is both nonconvex and nondifferentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). The vertices of \(\Lambda _\mathrm{d}\) can then be determined if \(\mathcal {J}_\mathrm{F}(\cdot )\) is piecewise affine. For a polyhedron [33] with \(n\) input points, of which \(p\) lie on the hull, the convex hull in dimension \(\mathrm{dim}=n_\mathrm{L}+n_\mathrm{F}\) can be computed with a worst-case complexity of \(O(n \log p)\) for \(\mathrm{dim}\le 3\) and \(O(n \cdot f_p / p)\) for \(\mathrm{dim}\ge 4\), where \(f_p\) denotes the maximum number of facets for \(p\) vertices [35].
-
Verifying whether the projection of the generalized normal onto the leader’s decision space is nonzero relies on simple vector products.
-
Computing particular and homogeneous solutions to (22) with an SVD leads to a numerically reliable solution due to its ability to deal with rank-deficient matrices; however, no finite termination can be guaranteed for the computation of the SVD. The iterative SVD approach can however be terminated when a sufficiently precise solution is obtained, which leads to a practical overall complexity of \(\mathcal {O}(n^3)\) floating-point operations, for an \(m\times n\) matrix in case \(m\approx n\) [34]. The alternative of using the QR decomposition technique does not have such reliability properties, but it does have finite termination; the complexity of the algorithms discussed in [34] is also of the order \(\mathcal {O}(n^3)\) in case \(n\approx m\).
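The two computational steps above — finding the vertices of a convex hull and solving the underdetermined linear system in (22) via an SVD — can be sketched numerically as follows. This is a minimal illustration using NumPy/SciPy; the gradient values below are placeholders, not taken from a specific example in the text. Note that `np.linalg.lstsq` returns the minimum-norm particular solution and `scipy.linalg.null_space` computes the null-space basis via an SVD.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.spatial import ConvexHull  # quickhull algorithm, cf. [35]

def affine_solution_set(grad_uL, dJ_duFj):
    """Solve grad_uL^T R_{L,j} = dJ_F/du_{F,j} (cf. (22)):
    returns a minimum-norm particular solution and an SVD-based
    basis of the null space of the 1 x n_L row vector grad_uL^T."""
    A = np.atleast_2d(np.asarray(grad_uL, dtype=float))  # 1 x n_L
    b = np.atleast_1d(float(dJ_duFj))
    R0, *_ = np.linalg.lstsq(A, b, rcond=None)           # particular solution
    BN = null_space(A)                                   # n_L x dim(N)
    return R0, BN

# any R0 + BN @ t, t arbitrary, then satisfies the linear system
R0, BN = affine_solution_set([3.0, -1.0, 2.0], 4.0)

# for a piecewise-affine J_F, the hull vertices of candidate points
# (here: a square plus one interior point) are found with quickhull
hull = ConvexHull(np.array([[0, 0], [2, 0], [0, 2], [2, 2], [1, 1]]))
```

The SVD route inherits the numerical reliability discussed above, since `null_space` thresholds the singular values to handle rank deficiency.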
Example 4.2
(Rosenbrock Function) The nonconvex Rosenbrock function [36] is often used to benchmark the performance of optimization algorithms and is written as
$$\begin{aligned} f(x_1,x_2) = 100\left( x_2-x_1^2\right) ^2 + \left( 1-x_1\right) ^2, \end{aligned}$$
as depicted in Fig. 6 together with several level curves. If we adopt this function structure for \(\mathcal {J}_\mathrm{F}\), it can be inferred from these contour lines that several desired leader equilibria cannot be obtained under an affine leader function, i.e., those in the valleys of the upper part of the level curves that are associated with objective function values that increase with increasing values of \(\mathbf {u}_\mathrm{L}\).
In order to illustrate the approach in higher dimensions, we instead adopt the extended Rosenbrock function [37], written in general for \(n\) dimensions as
$$\begin{aligned} f_\mathrm{e}(\mathbf {x}) = \sum _{i=1}^{n-1}\left[ 100\left( x_{i+1}-x_i^2\right) ^2 + \left( 1-x_i\right) ^2\right] . \end{aligned}$$
We now take \(\Omega _\mathrm{L}=\mathbb {R}^2\), while \(\Omega _\mathrm{F}=\mathbb {R}\). The sublevel set \(\Lambda _\mathrm{d}\) for \(\mathcal {J}_\mathrm{F}:=f_\mathrm{e}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\) and with the desired equilibrium \((\mathbf {u}_{\mathrm{L},1}^\mathrm{d},\mathbf {u}_{\mathrm{L},2}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d})= (-0.2, -0.27, 0.15 ) \) is plotted in Fig. 7. Since
the unique supporting hyperplane at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), also plotted in Fig. 7, can be written as
Now, \(\gamma _\mathrm{L}(\cdot )\) characterized by (15) with \(\mathbf {R}_\mathrm{F}=1\) has associated matrices \(\mathbf {R}_\mathrm{L}\) according to (18), where \(\mathbf {R}_\mathrm{L}^0= \begin{bmatrix} (15.42/-27.20)&(-56.21/-27.20) \end{bmatrix}^\mathrm{T}\) is a particular solution of
and with the basis \(\mathcal {B}_{\mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) } = \begin{bmatrix} 0.2645&0.9644 \end{bmatrix}^\mathrm{T},\) leading to the set of optimal affine solutions characterized by
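As a numerical cross-check of this example (assuming the standard form of the extended Rosenbrock function of [37]; the correspondence between the coordinates \(x_i\) and \((\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\) is as in the example), the gradient of \(f_\mathrm{e}\) at the desired equilibrium can be evaluated directly; its entries reproduce the coefficients \(-27.20\), \(-56.21\), and \(15.42\) appearing above:

```python
import numpy as np

def f_ext(x):
    """Extended Rosenbrock function [37] in n dimensions (standard form)."""
    x = np.asarray(x, dtype=float)
    return np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2)

def grad_f_ext(x):
    """Analytic gradient of f_ext, accumulated pairwise."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    # derivative of each pair term w.r.t. its first variable x_i
    g[:-1] += -400.0 * x[:-1] * (x[1:] - x[:-1]**2) - 2.0 * (1.0 - x[:-1])
    # derivative of each pair term w.r.t. its second variable x_{i+1}
    g[1:] += 200.0 * (x[1:] - x[:-1]**2)
    return g

# gradient at the desired equilibrium of the example
g = grad_f_ext([-0.2, -0.27, 0.15])
```

A central finite-difference check confirms the analytic gradient; the leader-block of this gradient is the row vector whose null space generates the homogeneous solutions of (22).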
5 Constrained Decision Spaces
So far, the situation without constraints has been considered, and conditions have been provided under which an optimal affine leader function exists that leads to the desired reverse Stackelberg equilibrium point. These conditions are necessary but not sufficient for the existence of an optimal affine leader function in the constrained game in which \(\Omega _\mathrm{L} \subsetneq \mathbb {R}^{n_\mathrm{L}}\) or \(\Omega _\mathrm{F}\subsetneq \mathbb {R}^{n_\mathrm{F}}\). In the constrained case, an additional complexity arises: the locally defined supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \)—or the tangent hyperspace \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) in case \(n_\mathrm{L}> 1\) and \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}(\mathrm{conv}(\Lambda _\mathrm{d}))\)—should lie within the constrained decision space \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). This implies that the supporting or tangent hyperplane should contain an \(n_\mathrm{F}\)-dimensional affine subspace \(\gamma _\mathrm{L}\) satisfying (i) \(\gamma _\mathrm{L}\) covers \(\Omega _\mathrm{F}\), i.e., \(\mathrm{dom}(\gamma _\mathrm{L})= \Omega _\mathrm{F}\), and (ii) \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subseteq \Omega _\mathrm{L}\). Since the hyperplanes are derived locally, it still has to be verified whether an optimal leader function \(\gamma _\mathrm{L}\) exists in the bounded decision space such that the global conditions (i) and (ii) hold.
Hence, given the set of feasible solutions characterized in Sect. 4, which is essentially developed for the unconstrained decision space, constraints can be incorporated to verify which elements of \(\Gamma _\mathrm{L}^*\) remain valid under the constrained conditions. It should be noted that constraints on the decision spaces can obviously affect both the desired equilibrium point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and the set \(\Lambda _\mathrm{d}\), which are assumed to be given in the conditions and the initial characterization of Sects. 3 and 4. In order to use these results, applicable constraints on the decision spaces should therefore be incorporated in the computation of a desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and in the derivation of the associated sublevel set \(\Lambda _\mathrm{d}\) at the initial stage.
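Computing a desired equilibrium under decision-space constraints is itself a constrained nonlinear program. As a minimal sketch (the quadratic leader objective `J_L` and the box bounds below are hypothetical placeholders, not taken from the text), box constraints on \(\Omega _\mathrm{L}\) and \(\Omega _\mathrm{F}\) can be passed directly to a bound-constrained solver:

```python
import numpy as np
from scipy.optimize import minimize

def J_L(z):
    """Hypothetical leader objective over z = (u_L, u_F)."""
    uL, uF = z
    return (uL - 6.0)**2 + (uF - 7.0)**2

# box constraints: Omega_L = [0, 10], Omega_F = [0, 8]
bounds = [(0.0, 10.0), (0.0, 8.0)]

# bound-constrained minimization (SciPy selects L-BFGS-B here)
res = minimize(J_L, x0=np.array([1.0, 1.0]), bounds=bounds)
uLd, uFd = res.x  # desired equilibrium (u_L^d, u_F^d)
```

For this placeholder objective the unconstrained minimizer lies inside the box, so the constrained and unconstrained desired equilibria coincide; in general, active bounds shift \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and hence the sublevel set \(\Lambda _\mathrm{d}\).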
The following simple example illustrates this approach.
Example 5.1
Consider a reverse Stackelberg game with the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) =(5,5)\) and a nonsmooth, convex sublevel set \(\Lambda _\mathrm{d}\) as depicted in Fig. 8.
For \(\Omega _\mathrm{L}=\Omega _\mathrm{F}=\mathbb {R}\), \(\gamma _\mathrm{L}\) characterized by (15) with \(\mathbf {R}_\mathrm{F}=1\) has associated matrices \(\mathbf {R}_\mathrm{L}\) according to (18) that can be described by the interval \(\mathcal {R}_\mathrm{L}=(-1,1)\). This can be derived from the two generating vectors of the normal cone \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) as depicted in Fig. 5a, i.e., \(\varvec{\nu }_1=\varvec{\nu }_2'=\begin{bmatrix} 1&1 \end{bmatrix}^\mathrm{T}\) and \(\varvec{\nu }_1'=\varvec{\nu }_2=\begin{bmatrix} -1&-1 \end{bmatrix}^\mathrm{T}\). Multiplication with \(\mathbf {Q}=\begin{bmatrix} 1&0\end{bmatrix} \) yields \(\mathcal {R}_\mathrm{L}=(-1,1)\) and given \(\mathbf {B}=\mathbf {R}_\mathrm{L}\mathbf {R}_\mathrm{F}^{-1}\) according to (3.11), the set of optimal affine solutions can be characterized by
Note that in this particular case, an optimal affine mapping \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) coincides with a line \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). In Fig. 8, this interval of possible slopes of \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^{*}\) is indicated by \(\alpha \).
Now, consider the decision space imposed by the constraints \(\Omega _\mathrm{L}=[0,10]\), \(\Omega _\mathrm{F}=[0,8]\). Not all mappings \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^*\) map every \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\) into \(\Omega _\mathrm{L}\). Using the extrema \((0,10)\), \((8,0)\), and \((8,10)\), one can derive the intersection points of affine functions through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with the decision space boundaries \(\mathrm{bd}(\Omega _\mathrm{L}),\mathrm{bd}(\Omega _\mathrm{F})\). Instead of using the generating vectors of \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) as in the unconstrained case, we are now interested in a smaller cone that is generated by the vectors \(\begin{bmatrix} -3/5&1 \end{bmatrix}^\mathrm{T},\begin{bmatrix} 3/5&1 \end{bmatrix}^\mathrm{T}\), resulting in a new range \(\mathcal {R}_\mathrm{L}=[-3/5,3/5]\). The full set of possible optimal affine leader functions can thus be characterized as follows:
The new interval of possible slopes of \(\gamma _\mathrm{L}'\in \Gamma _\mathrm{L}^{*,\mathrm{con}}\) is indicated in Fig. 8 by \(\alpha '\).
Derivation of the set \(\Gamma _\mathrm{L}^{*,\mathrm{con}}\)
The possible optimal matrices \(\mathbf {B}\) of (16) computed in Sect. 4 for the cases with \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable and nondifferentiable, respectively, can be re-evaluated in the presence of constraints, where we assume the constrained decision spaces to be convex, closed, and bounded, i.e., compact. In particular, one needs to verify whether \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subseteq \Omega _\mathrm{L}\).
-
(a)
In case only \(\Omega _\mathrm{F}\) is restricted while \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}\), we have that \(\Gamma _\mathrm{L}^{*,\mathrm{con}}= \Gamma _\mathrm{L}^*\), which can be concluded from the fact that \(\Gamma _\mathrm{L}^*\) is derived only locally based on \(\Lambda _\mathrm{d}\), while \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}\); hence, \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subset \Omega _\mathrm{L}\) still holds for all \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^*\).
-
(b)
Further, the case with only \(\Omega _\mathrm{L}\) restricted while \(\Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\) is only feasible if \(\gamma _{\mathrm{L},i}(\Omega _\mathrm{F}) = \{c_i\}\in \Omega _{\mathrm{L},i}\) for some \(c_i\in \mathbb {R}\) for every index \(i\) such that \(\Omega _{\mathrm{L},i}\varsubsetneq \mathbb {R}\), where \(\Omega _{\mathrm{L},i}\) is the projection of \(\Omega _\mathrm{L}\) on the \(i\)-th coordinate and \(\gamma _{\mathrm{L},i}(\cdot )\) denotes the \(i\)-th component of the vector-valued function \(\gamma _\mathrm{L}(\cdot )\). Indeed, the realization of the leader function for each such \(i\)-th component of the leader’s decision space should be constant for any follower decision variable, as for any affine function \(\gamma _{\mathrm{L},i}(\mathbf {u}_\mathrm{F})\ne c_i\in \mathbb {R}\), \(\lim _{\mathbf {u}_\mathrm{F}\rightarrow \pm \infty } \gamma _{\mathrm{L},i}(\mathbf {u}_\mathrm{F})\) will not be finite and therefore not an element of \(\Omega _{\mathrm{L},i}\varsubsetneq \mathbb {R}\).
-
(c)
In case both decision spaces are restricted by linear constraints, \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) represent polytopes. Since the affine mapping \(\gamma _\mathrm{L}(\cdot )\) preserves convexity, its image \(\gamma _\mathrm{L}\left( \Omega _\mathrm{F}\right) \) is a polytope as well [38]. As a result, it can easily be checked whether the image of \(\gamma _\mathrm{L}\) over its domain \(\Omega _\mathrm{F}\) satisfies the linear constraints imposed by \(\Omega _\mathrm{L}\): it is sufficient to verify \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}^v)\in \Omega _\mathrm{L}\) only for the vertices \(\mathbf {u}_\mathrm{F}^v\) of \(\Omega _\mathrm{F}\).
This result can also be used to obtain the reduced characterization of the set of optimal affine leader functions in the constrained case. If we denote the linear constraints on \(\Omega _\mathrm{L}\) by
$$\begin{aligned} \mathbf {A}_\mathrm{L}\cdot \mathbf {u}_\mathrm{L}\le \mathbf {b}_\mathrm{L}, \mathbf {A}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{c}\times n_\mathrm{L}}, \mathbf {b}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{c}}, n_\mathrm{c}\in \mathbb {N}, \end{aligned}$$one should add the following set of linear inequality constraints to the characterizations in (17) or (18), respectively, depending on differentiability of \(\mathcal {J}_\mathrm{F}(\cdot )\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), where \(\{\mathbf {u}_\mathrm{F}^v\}_{v=1}^{n_\mathrm{v}}\) denotes the set of vertices associated with \(\Omega _\mathrm{F}\):
$$\begin{aligned} \left\{ A_\mathrm{L}\cdot \left( \mathbf {u}_\mathrm{L}^\mathrm{d}+\mathbf {R}_\mathrm{L}(\mathbf {u}_\mathrm{F}^{v}-\mathbf {u}_\mathrm{F}^\mathrm{d}) \right) \le b_\mathrm{L} \right\} _{v=1,\dots ,{n_\mathrm{v}}}. \end{aligned}$$(25)In the nondifferentiable case, this results in the characterization (17) in which the non-negative parameters \(\alpha _{i,j}^s,\mu _{i,j}^s\) are now constrained by (25), as well as by \(\sum \limits _i\alpha _{i,j}^s= 1\), \(j = 1,\dots ,n_\mathrm{F}\) and with \(i = 1,\dots ,N_{s,j}^\mathrm{f}\) for \(s \in \{+,-\}\). Similarly, in the differentiable case, this will result in the characterization (18), in which \(T\) now belongs to the polyhedron \(\mathbb {R}^{\mathrm{dim}(N)\times n_\mathrm{F}}\) s.t. (25).
-
(d)
Finally, in the case of nonlinear constraints, an approach similar to (c) can be adopted, based on a piecewise-affine approximation of the constraints. However, this may in general generate a large number of vertices. Further, in order to guarantee feasibility of the leader function, (conservative) inner approximations should be made with respect to the leader’s decision space and outer approximations with respect to the follower’s decision space. Alternatively, a fully numerical evaluation of the set
$$\begin{aligned} \Gamma _\mathrm{L}^*:=\{ \gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L} : \gamma _\mathrm{L} \text { according to (7) satisfying (5), (15)} \} \end{aligned}$$for the unconstrained case may be made, e.g., based on a gridding of the decision spaces.
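The vertex test of case (c), i.e., the additional constraints (25), can be sketched as follows. This is a minimal sketch in which the box data of Example 5.1 (\(\Omega _\mathrm{L}=[0,10]\), \(\Omega _\mathrm{F}=[0,8]\), desired equilibrium \((5,5)\)) are used as assumed test inputs; the function name is illustrative:

```python
import numpy as np

def gamma_feasible(B, uLd, uFd, A_L, b_L, uF_vertices, tol=1e-9):
    """Check that gamma_L(u_F) = uLd + B (u_F - uFd) maps Omega_F into the
    polytope {u_L : A_L u_L <= b_L} by testing only the vertices of
    Omega_F, which is sufficient by convexity (cf. (25))."""
    for v in uF_vertices:
        uL = uLd + B @ (np.asarray(v, dtype=float) - uFd)
        if np.any(A_L @ uL > b_L + tol):
            return False
    return True

# Omega_L = [0, 10] written as A_L u_L <= b_L; vertices of Omega_F = [0, 8]
A_L = np.array([[1.0], [-1.0]])
b_L = np.array([10.0, 0.0])
verts = [np.array([0.0]), np.array([8.0])]
```

For instance, the slope \(3/5\) from Example 5.1 passes the test (its values at the vertices are \(2\) and \(6.8\), both in \([0,10]\)), while a slope of \(2\) fails at \(\mathbf {u}_\mathrm{F}=0\).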
6 Conclusions
The single-leader–single-follower reverse Stackelberg game is considered, in which the leader faces the problem of selecting a leader function—mapping the follower’s decision space into her own decision space—that will lead to a specific desired equilibrium. Currently, many examples and applications in which this type of game is considered adopt strictly convex follower objective functionals and unconstrained decision spaces, in which case an affine leader function is automatically optimal. In order to make the reverse Stackelberg game more readily applicable as an optimization structure in multi-level control problems such as traffic tolling, a more general solution approach is needed.
In this article, we have therefore first developed necessary and sufficient existence conditions and a characterization of the set of optimal affine leader functions that can be computed in a systematic manner. Once such an initial set of locally feasible optimal affine functions has been derived for unconstrained decision spaces, it can be further reduced to include only those elements that map the full follower’s decision space into the leader’s decision space in case these spaces are constrained. Secondary optimization criteria can be incorporated similarly.
In [39], subsequent results are provided to deal with cases in which no optimal affine solutions exist. There, several methods are provided for the computation of optimal nonlinear leader functions, e.g., with piecewise-affine and smooth (piecewise) polynomial structures. Since these methods are computationally intensive or optimality cannot be guaranteed, continued research on existence conditions for optimal leader functions is needed.
In particular, to further develop a systematic solution approach for the general reverse Stackelberg game, more results on dynamic extensions are needed. Both cases in which the follower does not adopt the optimal decision values and in which the leader lacks information on the follower that is crucial to derive an optimal strategy should be considered. Further steps include analysis of the robustness of a leader function in case of uncertain conditions.
Notes
The argument \(\mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) is omitted for the sake of conciseness.
References
Staňková, K., Olsder, G., Bliemer, M.: Comparison of different toll policies in the dynamic second-best optimal toll design problem: case study on a three-link network. Eur. J. Transp. Infrastruct. Res. 9(4), 331–346 (2009)
von Stackelberg, H.: Marktform und Gleichgewicht. Julius Springer, Vienna, Austria (1934)
Scattolini, R.: Architectures for distributed and hierarchical model predictive control: a review. J. Process Control 19(5), 723–731 (2009)
Ho, Y.C., Luh, P., Muralidharan, R.: Information structure, Stackelberg games, and incentive controllability. IEEE Trans. Autom. Control 26(2), 454–460 (1981)
Ho, Y.C., Luh, P., Olsder, G.: A control-theoretic view on incentives. Automatica 18(2), 167–179 (1982)
Olsder, G.: Phenomena in inverse Stackelberg games, part 1: static problems. J. Optim. Theory Appl. 143(3), 589–600 (2009)
Olsder, G.: Phenomena in inverse Stackelberg games, part 2: dynamic problems. J. Optim. Theory Appl. 143(3), 601–618 (2009)
Zheng, Y.P., Başar, T.: Existence and derivations of optimal affine incentive schemes for Stackelberg games with partial information: A geometric approach. Int. J. Control 35(6), 997–1011 (1982)
Cansever, D., Başar, T.: A minimum sensitivity approach to incentive design problems. Large Scale Syst. 5, 233–244 (1983)
Groot, N., De Schutter, B., Hellendoorn, H.: Toward system-optimal routing in traffic networks: a reverse Stackelberg game approach. IEEE Trans. Intell. Transp. Syst. (2014). Accepted for publication
Chang, T.S., Ho, Y.C.: Incentive problems: a class of stochastic Stackelberg closed-loop dynamic games. Syst. Control Lett. 1(1), 16–21 (1981)
Ehtamo, H., Hämäläinen, R.: Incentive strategies and equilibria for dynamic games with delayed information. J. Optim. Theory Appl. 63(3), 355–369 (1989)
Martín-Herrán, G., Zaccour, G.: Credible linear-incentive equilibrium strategies in linear-quadratic differential games. In: Pourtallier, O., Gaitsgory, V., Bernhard, P. (eds.) Advances in Dynamic Games and their Applications. Annals of the International Society of Dynamic Games, vol. 10, pp. 1–31. Birkhäuser, Boston (2009)
Li, M., Cruz Jr, J., Simaan, M.: An approach to discrete-time incentive feedback Stackelberg games. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 32(4), 472–481 (2002)
Salman, M., Cruz Jr, J.: An incentive model of duopoly with government coordination. Automatica 17(6), 821–829 (1981)
Tolwinski, B.: Closed-loop Stackelberg solution to a multistage linear-quadratic game. J. Optim. Theory Appl. 34(4), 485–501 (1981)
Groot, N., De Schutter, B., Hellendoorn, H.: Existence conditions for an optimal affine leader function in the reverse Stackelberg game. In: Proceedings of the 15th IFAC Workshop on Control Applications of Optimization, Paper 16, Rimini, Italy (2012)
Groot, N., De Schutter, B., Hellendoorn, H.: A full characterization of the set of optimal affine leader functions in the reverse Stackelberg game. In: Proceedings of the 51st IEEE Conference on Decision and Control, pp. 6484–6488. Maui, HI (2012)
Başar, T.: General theory for Stackelberg games with partial state information. Large Scale Syst. 3(1), 47–56 (1982)
Colson, B., Marcotte, P., Savard, G.: An overview of bilevel optimization. Ann. Oper. Res. 153(1), 235–256 (2007)
Vicente, L., Calamai, P.: Bilevel and multilevel programming: a bibliography review. J. Glob. Optim. 5(3), 291–306 (1994)
Jeroslow, R.: The polynomial hierarchy and a simple model for competitive analysis. Math. Program. 32(4), 146–164 (1985)
Hansen, P., Jaumard, B., Savard, G.: New branch-and-bound rules for linear bilevel programming. SIAM J. Sci. Stat. Comput. 13(5), 1194–1217 (1992)
Auslender, A., Teboulle, M.: Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer Monographs in Mathematics. Springer, New York, NY (2003)
Rockafellar, R.: Convex Analysis. Princeton University Press, Princeton, NJ (1970)
Clarke, F.: Optimization and Nonsmooth Analysis. Wiley, New York, NY (1983)
Başar, T., Olsder, G.: Dynamic Noncooperative Game Theory. Classics in Applied Mathematics, 2nd edn. SIAM, Philadelphia, PA (1999)
Åström, K., Wittenmark, B.: Computer-Controlled Systems: Theory and Applications, 3rd edn. Prentice Hall, Upper Saddle River, NJ (1997)
Franklin, G., Powell, J., Emami-Naeini, A.: Feedback Control of Dynamic Systems, 6th edn. Prentice-Hall, Upper Saddle River, NJ (2010)
Hasselblatt, B., Katok, A.: A First Course in Dynamics. Cambridge University Press, Cambridge, UK (2003)
Gellert, W., Gottwald, S., Hellwich, M., Kästner, H., Künstner, H.E.: VNR Concise Encyclopedia of Mathematics, 2nd edn. Van Nostrand Reinhold, New York, NY (1989)
Motzkin, T., Raiffa, H., Thompson, G., Thrall, R.: The double description method. In: Kuhn, H., Tucker, A. (eds.) Contributions to the Theory of Games. Annals of Mathematics Studies, vol. 28, pp. 51–73. Princeton University Press, Princeton, NJ (1953)
Schrijver, A.: Theory of Linear and Integer Programming. Wiley, Chichester, UK (1986)
Golub, G., Van Loan, C.: Matrix Computations, 2nd edn. The Johns Hopkins University Press, Baltimore, MD (1989)
Barber, C., Dobkin, D., Huhdanpaa, H.: The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 22(4), 469–483 (1996)
Rosenbrock, H.: An automatic method for finding the greatest or least value of a function. Comput. J. 3(3), 175–184 (1960)
De Jong, K.: An analysis of the behavior of a class of genetic adaptive systems. PhD dissertation, University of Michigan (1975)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge, UK (2004)
Groot, N., De Schutter, B., Hellendoorn, H.: On systematic computation of optimal nonlinear solutions for the reverse Stackelberg game. IEEE Trans. Syst. Man Cybern. Syst. 44(10), 1315–1327 (2014)
Acknowledgments
Research supported by the European Union Seventh Framework Programme [FP7/2007-2013] under Grant agreement no. 257462 HYCON2 Network of Excellence, and by the European COST Action TU1102.
Additional information
Communicated by Mauro Pontani.
Appendix
This appendix presents some lemmata that form basic elements for the proofs of the propositions in Sect. 3. Lemma 6.1 follows straightforwardly from the supporting hyperplane theorem (e.g., Theorem 11.6 in [25]) and the definition of a strictly supporting hyperplane.
Lemma 6.1
Assume the set \(\Lambda _\mathrm{d}\) to be convex. Let \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}},\Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\) and let \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) be any affine function through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) such that \(\alpha _\mathrm{L} \cap \Lambda _\mathrm{d} = \left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \). Then \(\alpha _\mathrm{L}\) lies on a supporting hyperplane to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).
Lemma 6.2
Let \(\Lambda _\mathrm{d}\) be defined through (6). A supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exists at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) if and only if \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not \in \mathrm{int} (\mathrm{conv}( \Lambda _\mathrm{d}))\). Further, for an exposed point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) of \(\mathrm{conv}(\Lambda _\mathrm{d})\), \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}=\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \).
Proof
By definition of a convex hull, a supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exists if and only if there exists a supporting hyperplane \(\Pi _{\mathrm{{conv}}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to \(\mathrm{{conv}}(\Lambda _\mathrm{d})\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). Further, a supporting hyperplane to \(\mathrm{{conv}}(\Lambda _\mathrm{d})\) exists at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) if and only if \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is a boundary point of \(\mathrm{{conv}}(\Lambda _\mathrm{d})\) and thus also of \(\Lambda _\mathrm{d}\) ([25], Theorem 11.6). Clearly, an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) is such a boundary point. For the intersection of \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with \(\Lambda _\mathrm{d}\) solely to occur in the point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), it is required that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\). (Note that it is therefore sufficient for \(\mathrm{conv}(\Lambda _\mathrm{d})\) to be locally strictly convex at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).) \(\square \)
Lemma 6.3
Assume there exists a strictly supporting hyperplane
with \(\Lambda _\mathrm{d}\) according to (6). Then an affine function \(\alpha _\mathrm{L}^{\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }\in \mathcal {A}_\mathrm{L}\) coincides with \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) if and only if \(\mathbf {u}_\mathrm{L}\) is scalar (\(n_\mathrm{L}=1\)).
Proof
The dimension of a hyperplane \(\Pi _{\Lambda _\mathrm{d}}\), \((n_\mathrm{L}+n_\mathrm{F})-1\), equals the number of independent variables of an affine leader function \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\), with \(\Omega _\mathrm{F}\subseteq \mathbb {R}^{n_\mathrm{F}}\) only in case \(\mathbf {u}_\mathrm{L}\) is scalar. If there exists a strictly supporting hyperplane \(\Pi _{\mathrm{conv} (\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), it follows that this plane coincides with \(\alpha _\mathrm{L}^{\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }\). \(\square \)
Cite this article
Groot, N., De Schutter, B. & Hellendoorn, H. Optimal Affine Leader Functions in Reverse Stackelberg Games. J Optim Theory Appl 168, 348–374 (2016). https://doi.org/10.1007/s10957-014-0694-4