1 Introduction

Hierarchical optimization approaches have proven useful in settings in which conflicting objectives appear [1, 2], or as a means of dealing with large-scale control problems [3]. Here, the hierarchical reverse Stackelberg game [4] is considered, which is also known as a Stackelberg game with incentive strategies [5] or, more recently, as the inverse Stackelberg game [6, 7]. In the reverse Stackelberg game, instead of announcing an immediate decision variable as in the original Stackelberg game [2], the leader proposes a mapping from the follower’s decision space into her own decision space. This structure provides the leader with a stronger influence in case of a nonunique follower response to a single leader decision.

While diverse research directions have been considered w.r.t. the reverse Stackelberg game, e.g., partial information [8], sensitivity [9], and applications like road tolling [1, 10], to the best knowledge of the authors, no structural approach to finding optimal leader functions has yet been provided. Instead, much research is tailored to the specific case of a quadratic, (strictly) convex, and differentiable cost functional and, in a dynamic game framework, of a linear state update equation [8, 11–16], in which case affine solutions have been shown to be optimal [8]. Our aim is to expand the available solution methods by first considering problem instances in which nonconvex sublevel sets of the follower objective function as well as nondifferentiable objective functions apply.

In this article, a first step is made toward a systematic solution approach that, on the one hand, eases the solution process of the generally complex single-leader–single-follower reverse Stackelberg game and that, at the same time, relaxes as much as possible the assumptions that would restrict its application to particular problem settings. In particular, leader functions of the affine type are analyzed in order to procure a systematic approach for solving the game to optimality.

To this end, we formulate necessary and sufficient existence conditions for an optimal affine solution, as initially presented in [17], in their most general form. Compared to earlier results reported in the literature, the differentiability requirement on the follower objective functional is relaxed, and local strict convexity of the follower’s sublevel set at the desired reverse Stackelberg equilibrium is replaced by the more general property of an exposed point.

Moreover, a full characterization of the set of optimal affine leader functions is derived. The parametrized characterization of such a set facilitates further optimization, e.g., when considering the sensitivity to deviations from the optimal follower response as a secondary objective, as is illustrated in [18]. Furthermore, the characterization can be used to verify the existence of an optimal affine leader function in a constrained decision space, in which case the derivation of existence conditions is a challenging task. The computational complexity of the original game and of the proposed solution approach is considered, and illustrative examples are provided.

The remainder of this paper is structured as follows. Section 2 includes a definition of the reverse Stackelberg game along with a brief analysis of its computational complexity and with an outline of the solution approach, followed by a clarification of notation and assumptions. In Sect. 3, necessary and sufficient conditions are proven for the existence of an optimal affine leader function, considering separately the case of a scalar leader input, and the cases in which the desired equilibrium is either an interior or a boundary point of the sublevel set. The characterization of the set of optimal affine leader functions is provided in Sect. 4 and illustrated by an example, after which the use of this set for the constrained case is illustrated by an example in Sect. 5. Conclusions are presented in Sect. 6.

2 Preliminaries

2.1 Reverse Stackelberg Game

The basic single-leader–single-follower, static, deterministic reverse Stackelberg game without constraints can be defined as follows.

Let the leader and follower decision variables be denoted by \(\mathbf u _{p}\in \Omega _{p}\subseteq \mathbb {R}^{n_p}\), \(n_p\in \mathbb {N}\), \(p\in \{\mathrm{L,F}\}\), while \(\mathcal {J}_{p}:\Omega _\mathrm{L}\times \Omega _\mathrm{F}\rightarrow \mathbb {R}\) denotes, respectively, the leader’s and follower’s objective (cost) functional. We assume the leader to have complete knowledge of the follower’s objective functional and decision space.

Given that the leader player acts first by announcing the leader function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) while taking into account the follower’s response, we can write the problem as a set of composed functions, similarly to what is done in [6]:

$$\begin{aligned}&\gamma _\mathrm{L}^*(\cdot ) \in \displaystyle \arg \min _{\gamma _\mathrm{L}(\cdot )}\mathcal {J}_\mathrm{L}(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}^*(\gamma _\mathrm{L}(\cdot ))),\mathbf {u}_\mathrm{F}^*(\gamma _\mathrm{L}(\cdot ))), \nonumber \\&\mathbf {u}_\mathrm{F}^* = \displaystyle \arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}}\mathcal {J}_\mathrm{F}(\gamma _\mathrm{L}^*(\mathbf {u}_\mathrm{F}),\mathbf {u}_\mathrm{F}), \end{aligned}$$
(1)

where we assume that \(\gamma ^*_\mathrm{L}\) is constructed such that the optimal—indicated by a superscript asterisk—follower response is unique. An overview of symbols that are frequently used in this article can be found in Table 1.
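To make the composed structure of (1) concrete, the following minimal numerical sketch computes the follower's best response to an announced leader function by grid search. The quadratic objectives J_L, J_F and the two candidate leader functions are hypothetical choices for illustration only, not taken from this article; the constant announcement corresponds to the original Stackelberg game, the affine one to the reverse Stackelberg setting.

```python
import numpy as np

# Hypothetical scalar quadratic objectives, chosen purely for illustration.
def J_L(u_L, u_F):   # leader cost: prefers (u_L, u_F) near (1, 1)
    return (u_L - 1.0) ** 2 + (u_F - 1.0) ** 2

def J_F(u_L, u_F):   # follower cost, strictly convex in u_F
    return (u_L - u_F) ** 2 + u_F ** 2

def follower_response(gamma, grid):
    """Follower minimizes J_F(gamma(u_F), u_F) over a grid of u_F values."""
    costs = [J_F(gamma(u_F), u_F) for u_F in grid]
    return grid[int(np.argmin(costs))]

grid = np.linspace(-3.0, 3.0, 6001)

# Constant announcement (original Stackelberg) versus an affine announcement.
gamma_const = lambda u_F: 1.0
gamma_affine = lambda u_F: 2.0 - u_F

for gamma in (gamma_const, gamma_affine):
    u_F_star = follower_response(gamma, grid)
    print(u_F_star, J_L(gamma(u_F_star), u_F_star))
```

For these particular choices, the affine announcement steers the follower to a response that yields a strictly lower leader cost than the constant announcement does, illustrating the stronger influence the leader gains from proposing a mapping rather than a single decision.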

Table 1 List of Important Symbols

2.1.1 Computational Complexity

Even the single-leader–single-follower static reverse Stackelberg problem is complex and, in general, difficult to solve analytically, due to the composed functions appearing in the optimization problem (1) as well as the possible existence of multiple optima (hence multiple solutions) and of a nonunique follower response [6, 7, 19].

Theorem 2.1

The reverse Stackelberg game (1) is at least strongly NP-hard.

Proof

The original Stackelberg game is a special case of (1): for a constant leader function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \{ {\mathbf {u}}_\mathrm{L}^\mathrm{d}\}\), with \(\mathbf {u}_\mathrm{L}^\mathrm{d}\in \Omega _\mathrm{L}\) a free variable, (1) can be written as

$$\begin{aligned} \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \arg \min _\mathbf{u _\mathrm{L}\in \Omega _\mathrm{L},\mathbf u _\mathrm{F}\in \Omega _\mathrm{F}} \left\{ \mathcal {J}_\mathrm{L}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}): \mathbf {u}_\mathrm{F}\in \arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}} \left\{ \mathcal {J}_\mathrm{F}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}) \right\} \right\} , \end{aligned}$$
(2)

from which a suitable, explicit value \(\mathbf {u}_\mathrm{L}^\mathrm{d}\) for \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})\) follows.

Moreover, the Stackelberg game (2) is equivalent [20, 21] to the bilevel programming problem that can be written as

$$\begin{aligned} \min _{\mathbf {x}\in X} \left\{ F(\mathbf {x},\tilde{\mathbf {y}}):G(\mathbf {x},\tilde{\mathbf {y}})\le 0, \tilde{\mathbf {y}}\in \arg \min _{\mathbf {y}\in Y} \left\{ f(\mathbf {x},\mathbf {y}):g(\mathbf {x},\mathbf {y})\le 0 \right\} \right\} , \end{aligned}$$
(3)

for general cost functions \(F(\cdot ),f(\cdot )\) and constraint functions \(G(\cdot ),g(\cdot )\). The linear bilevel programming problem has been proven to be NP-hard [22] and, later, strongly NP-hard [23].

Hence, the strongly NP-hard bilevel programming problem reduces to the reverse Stackelberg game, which is therefore at least strongly NP-hard. \(\square \)

A common, simplifying approach to the reverse Stackelberg problem is for the leader player to first determine a particular desired optimum \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that she seeks to achieve [4, 5]. A natural choice would be the leader’s global optimum \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \arg \min _{\mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L},\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}}\mathcal {J}_\mathrm{{L}}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\). Given such an equilibrium point, the remaining problem can be written as follows:

$$\begin{aligned}&\text {To find: }&\!&\gamma _\mathrm{{L}}\in \Gamma _\mathrm{L}, \end{aligned}$$
(4)
$$\begin{aligned}&\text {s.t. }&\!&\arg \min _{\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}} \mathcal {J}_\mathrm{{F}}(\gamma _\mathrm{{L}}(\mathbf {u}_\mathrm{F}),\mathbf {u}_\mathrm{F}) = \mathbf {u}_\mathrm{F}^\mathrm{{d}}, \gamma _\mathrm{{L}}(\mathbf {u}_\mathrm{F}^\mathrm{{d}})=\mathbf {u}_\mathrm{L}^\mathrm{d}, \; \end{aligned}$$
(5)

where \(\Gamma _\mathrm{L}\) denotes the class of leader functions \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) that is allowed in a particular game setting.

In other words, the leader should construct her function \(\gamma _\mathrm{L}\) such that it passes through the desired optimum without intersecting any other point of the sublevel set

$$\begin{aligned} \Lambda _\mathrm{d}:=\left\{ (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\in \Omega _\mathrm{L}\times \Omega _\mathrm{F} :\mathcal {J}_\mathrm{F}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\le \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} . \end{aligned}$$
(6)

Then, the optimal follower response coincides with the desired input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\).

2.1.2 Affine Incentive Compatibility

In order to further reduce the complexity of the general reverse Stackelberg game and to enable a systematic solution approach, we focus in this paper on the particular affine structure of the leader function, i.e., we assume \(\Gamma _\mathrm{L}\) to include only functions of the form

$$\begin{aligned} \mathbf {u}_\mathrm{L}:=\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})=\mathbf {u}_\mathrm{L}^\mathrm{d} + \mathbf {B}\left( \mathbf {u}_\mathrm{F}- \mathbf {u}_\mathrm{F}^\mathrm{d}\right) , \end{aligned}$$
(7)

where \(\mathbf {B}\) denotes a linear operator mapping \(\Omega _\mathrm{F} \rightarrow \Omega _\mathrm{L}\), represented by an \(n_\mathrm{L}\times n_\mathrm{F}\) matrix in the finite-dimensional case we will consider according to assumption A.3 in Sect. 2.3 below.
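The role of the operator B in (7) can be sketched numerically. In the hypothetical scalar example below (the follower cost J_F and the desired equilibrium (1, 1) are illustrative choices, not taken from the article), only one slope B steers the follower's best response to u_F^d; geometrically, it is the slope for which the affine leader function is tangent to the circular sublevel set at the desired point.

```python
import numpy as np

# Hypothetical strictly convex follower cost; desired equilibrium (1, 1).
u_L_d, u_F_d = 1.0, 1.0
J_F = lambda u_L, u_F: u_L ** 2 + u_F ** 2

def gamma_L(u_F, B):
    """Affine leader function (7): u_L = u_L^d + B (u_F - u_F^d)."""
    return u_L_d + B * (u_F - u_F_d)

grid = np.linspace(-3.0, 3.0, 6001)

def induced_response(B):
    costs = J_F(gamma_L(grid, B), grid)
    return grid[int(np.argmin(costs))]

# Which slopes B steer the follower to u_F^d = 1?
for B in (-2.0, -1.0, 0.0, 1.0):
    print(B, induced_response(B))
```

For this circular sublevel set, the tangent line at (1, 1) has slope B = -1; every other slope makes the graph of the leader function cut through the sublevel set, so the follower finds a cheaper point elsewhere on the line.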

The property that a particular desired leader equilibrium is achievable for an instance of the reverse Stackelberg game is known in the literature as incentive compatibility [5]. We therefore analyze under what conditions an optimal affine leader function exists, i.e., under what conditions the leader is able to induce the follower to choose the desired input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\), and thus to reach her desired equilibrium.

From now on, we denote by \(\mathcal {A}_\mathrm{L}\) the set of affine relations through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) defined as sets of dimension \(n_\mathrm{F}\) in \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\) and such that, for \(\alpha _\mathrm{L} \in \mathcal {A}_\mathrm{L}\), \(\alpha _\mathrm{L} \cap \Lambda _\mathrm{d} = \left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \).

Note that the introduction of \(\mathcal {A}_\mathrm{L}\) is necessary in order to be able to work with the function \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) as a set of points

$$\begin{aligned} \left\{ (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}): \mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}, \mathbf {u}_\mathrm{L}=\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})\right\} . \end{aligned}$$
(8)

Remark 2.1

Note that, in this paper, the leader function \(\gamma _\mathrm{L}\) is defined as a mapping \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) that can also be represented by the set of points (8) [24]. In the following, both the mapping and the set representation of \(\gamma _\mathrm{L}\) are adopted, depending on the context.

For \(\alpha _\mathrm{L} \in \mathcal {A}_\mathrm{L}\) with \(\alpha _\mathrm{L}(\Omega _\mathrm{L}) =\Omega _\mathrm{F}\), we can then characterize a candidate leader function by \(\gamma _\mathrm{L}:=(\alpha _\mathrm{L})^{-1}\). To indicate that \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) is not only affine, but in addition a subset of a set \(X\), we define \(\mathcal {A}_\mathrm{L}^X:=\{\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L} : \alpha _\mathrm{L}\subseteq X \}\).

2.2 Notation

The analysis mostly relies on concepts from convex analysis and geometry, such as hyperplanes and strictly convex functions and sets (see, e.g., [24, 25]). We denote the convex hull of a set \(X\) by \(\mathrm{conv}(X)\). In addition, the following notation is adopted:

  • By \(f(\tilde{X})\), we denote the image of a function \(f:X\rightarrow Y\) for a subset \(\tilde{X}\subseteq X\), where the domain is denoted \(X:=\mathrm{dom}(f)\).

  • \(\Pi _X(x)\) denotes a supporting hyperplane to the set \(X\) at the point \(x\in X\).

  • As in [24], a set \(X\) is an affine subspace iff \(\forall y,z\in X, \forall \alpha \in \mathbb {R}: \alpha y + (1-\alpha )z\in X\).

  • An exposed point of a convex set \(X\) is a point in its closure \(\bar{X}:= {{\bigcap }} \left\{ X+\epsilon B : \epsilon >0 \right\} \), with \(B\) the Euclidean unit ball \(B:=\{ x : {\vert } x {\vert }_2 \le 1 \}\), that lies on a strictly supporting hyperplane to \(X\) [25] (see also Fig. 1). Similarly, a point \(\tilde{x}\) in the closure of a nonconvex set \(\tilde{X}\) is an exposed point if there exists a neighborhood \(\mathcal {N}(\tilde{x})\) of \(\tilde{x}\) such that \(\tilde{x}\) lies on a strictly supporting hyperplane to \(\mathcal {N}(\tilde{x})\).

    Fig. 1
    figure 1

    The points \(a\) and \(d\) of this closed convex set \(X\) are exposed; points \(b\) and \(c\) are not

  • The projection of the set \(P\subseteq \mathbb {R}^{n}\) onto the space \(X=\mathbb {R}^m,m\le n\) is denoted \(\mathrm{proj}_{X}(P)\).

  • By \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\), we denote the decision space in which the leader components are taken to be zero.

  • A generalized gradient \(\partial f(x)\) of a locally Lipschitz continuous function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) at \(x\) is defined as follows:

    $$\begin{aligned} \partial f(x):=\mathrm{conv} (\{\lim _{m\rightarrow \infty }\nabla f(x_m): x_m\rightarrow x, x_m \in \mathrm{dom}(f){\setminus } \Omega _f \} ), \end{aligned}$$

    with \(\Omega _f\) being the set of points where \(f\) is nondifferentiable [26]. By \(\mathcal {V}(X(x))\), we denote the generalized normal to the set \(X\) at the point \(x\in \bar{X}\), defined as a basis of the generalized gradient at \(x\in \bar{X}\). Thus, in case \(\Lambda _\mathrm{d}\) is smooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) =\nabla \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). The generalized normal can be defined through a finite or an infinite number of linearly independent basis vectors as clarified in Fig. 5.
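The generalized-gradient definition above can be made concrete with a small numerical sketch for the standard example f(x) = |x| (a hypothetical choice for illustration), whose only nondifferentiable point is x = 0, so that the set of nondifferentiability points is {0}. Sampling the gradient along sequences approaching 0 from either side yields the limiting gradients ±1, whose convex hull [-1, 1] is the generalized gradient at 0.

```python
# Sketch of the generalized-gradient definition for f(x) = |x|,
# which is nondifferentiable only at x = 0 (i.e., Omega_f = {0}).
f = lambda x: abs(x)

def grad(x, h=1e-7):
    # central finite difference; valid wherever f is differentiable
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Limiting gradients along sequences x_m -> 0 that avoid Omega_f:
from_right = [grad(10.0 ** (-m)) for m in range(1, 6)]    # x_m > 0
from_left = [grad(-(10.0 ** (-m))) for m in range(1, 6)]  # x_m < 0

# The generalized gradient at 0 is the convex hull of the limits, [-1, 1].
limits = {round(g) for g in from_right + from_left}
print(sorted(limits))   # the two limiting gradients, -1 and +1
```

The same recipe extends to higher dimensions: collect the limits of gradients along all sequences avoiding the nondifferentiability set and take their convex hull.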

2.3 Assumptions

  • \(\text {[A.1]}\) Let \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) be convex sets.

  • \(\text {[A.2]}\) Let \(\Lambda _\mathrm{d}\) be a connected set.

  • \(\text {[A.3]}\) Let \(n_\mathrm{L},n_\mathrm{F}\) be finite.

  • \(\text {[A.4]}\) Let \(\Lambda _\mathrm{d}\not = \left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \).

The first assumption is taken from the literature on Stackelberg games, e.g., [8, 27]. Assumption A.2 is less restrictive than taking \(\mathcal {J}_\mathrm{F}\), and therefore also \(\Lambda _\mathrm{d}\), to be strictly convex, as was done in [8]. Note that we do not require \(\mathcal {J}_\mathrm{F}\) to be continuous. In case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{bd}(\mathrm{conv}(\Lambda _\mathrm{d}))\), the assumption can even be omitted altogether, as the analysis below focuses on \(\mathrm{conv}(\Lambda _\mathrm{d})\) rather than on \(\Lambda _\mathrm{d}\). Assumption A.3 is accepted in many control applications [28, 29]; moreover, it is necessary in order to use the concept of a supporting hyperplane. Finally, the special case excluded by assumption A.4 represents the trivial situation in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is automatically optimal for the follower as well.

3 Existence Conditions

In the current section, basic necessary and sufficient conditions for the existence of an optimal affine leader function are proposed for the most general case, in which the sublevel set \(\Lambda _\mathrm{d}\) is allowed to be nonconvex and nonsmooth, in an unconstrained decision space. These conditions form the basis for a characterization of the set of optimal solutions, as provided in Sect. 4. An explicit analysis of the subcases in which convex sublevel sets apply that are smooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be found in [17]. Here, it should be noted that, when the strict convexity of the follower objective functional assumed in the original results of [8] is relaxed, the desired leader equilibrium is no longer automatically a boundary point of the convex hull of the sublevel set. Excluding this case would prevent the theory from being generally applicable, as is illustrated in Fig. 2b, which shows an example of an optimal affine leader function for a desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that lies in the interior of the convex hull of \(\Lambda _\mathrm{d}\).

Fig. 2
figure 2

a Example of a convex set \(\Lambda _\mathrm{d}\) that is nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), for which no optimal affine leader function exists. b Example of an optimal affine leader function not lying on a supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) for \(n_\mathrm{L}> 1\), \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}(\mathrm{conv}(\Lambda _\mathrm{d}))\)

We first consider the special case \(n_\mathrm{L}=1\): no optimal affine leader function exists if, in addition, \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is not an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\).

Proposition 3.1

Let \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}},\Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\) and assume that \(n_\mathrm{L}=1\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:{\Omega }_\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if it holds both that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) and that

$$\begin{aligned} \mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \ne \{\mathbf {0}\}. \end{aligned}$$

Proof

First note that, since \(n_\mathrm{L}=1\), a strictly supporting hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), if it exists, coincides with an affine \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\), as is shown in Lemma 6.3. Note also that a hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is strictly supporting if and only if \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\) (implying that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not \in \mathrm{int}(\mathrm{conv}(\Lambda _\mathrm{d}))\)).

It remains to be shown that, in addition to \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) being exposed, in order for \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L})=\Omega _\mathrm{F}\) to hold, it is necessary and sufficient that there exists a vector \(\varvec{\nu }\in \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) with \(\mathrm{proj}_{\Omega _\mathrm{L}}(\varvec{\nu })\not = \{\mathbf {0}\}\), i.e., that the projection \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \) does not consist of the zero vector only; otherwise, no explicit description of a leader function exists. This necessity and sufficiency is proven next.

(\(\Rightarrow \)) By contraposition: Suppose that \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) = \{\mathbf {0}\}\). Then, any tangent plane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) has a normal vector \(\varvec{\nu }\) for which \( \mathrm{proj}_{\Omega _\mathrm{L}}(\varvec{\nu }) =\{\mathbf {0}\}\), i.e., the normal vector defining the hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is parallel to the decision space \(\Omega _\mathrm{F}\) and the hyperplane is orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\). Therefore, \(\mathrm{proj}_{\Omega _\mathrm{F}}\left( \Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \varsubsetneq \Omega _\mathrm{F}\), so \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) does not assign a leader value to every \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\), which implies that \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L}) \varsubsetneq \Omega _\mathrm{F}\).

(\(\Leftarrow \)) If \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\}\), then there exists a normal vector \(\varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) with a nonzero component in \(\Omega _\mathrm{L}\), defining a hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is not orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\); the projection of this hyperplane onto the follower’s decision space thus covers it: \(\mathrm{proj}_{\Omega _\mathrm{F}}\left( \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) = \Omega _\mathrm{F}\).

Hence, for all \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\), there exists \(\mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L} : (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\in \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). Thus, there exists an affine \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}:\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L})=\Omega _\mathrm{F}\).

Under the leader function \(\gamma _\mathrm{L}:=\left( \alpha _\mathrm{L}^{\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }\right) ^{-1}\), by definition of the sublevel set (6), the minimum of \(\mathcal {J}_\mathrm{F}(\cdot )\) is obtained at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). \(\square \)

An example of a case in which \(\Lambda _\mathrm{d}\) is nonsmooth and no affine \(\gamma _\mathrm{L}\) exists is depicted in Fig. 2a: here, \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) = \{\mathbf {0}\}\).
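For a smooth follower cost, both the projection condition of Proposition 3.1 and the resulting construction of a leader function can be sketched numerically. In the example below (the follower cost J_F, with n_L = 1 and n_F = 2, and the desired equilibrium are hypothetical choices for illustration), the normal to the sublevel set at the desired point is the gradient of J_F there; its leader component is nonzero, so the supporting-hyperplane equation can be solved for u_L, yielding an affine leader function under which the follower's minimum lies at u_F^d.

```python
import numpy as np

# Hypothetical smooth follower cost with n_L = 1, n_F = 2;
# desired equilibrium u_d = (u_L^d, u_F^d) = (1, (1, 1)).
def J_F(u_L, u_F):
    return u_L ** 2 + u_F[0] ** 2 + u_F[1] ** 2

u_d = np.array([1.0, 1.0, 1.0])   # stacked (u_L^d, u_F^d)
nu = 2.0 * u_d                    # gradient of J_F at u_d: normal to Lambda_d

# Existence condition of Proposition 3.1: proj onto the leader space is nonzero.
assert abs(nu[0]) > 0.0

def gamma_L(u_F):
    """Affine leader function from solving nu . ((u_L, u_F) - u_d) = 0 for u_L."""
    return u_d[0] - (nu[1:] @ (u_F - u_d[1:])) / nu[0]

# Grid search: the follower's cost along gamma_L is minimized at u_F^d = (1, 1).
pts = np.linspace(-2.0, 2.0, 401)
costs = np.array([[J_F(gamma_L(np.array([a, b])), np.array([a, b]))
                   for b in pts] for a in pts])
i, j = np.unravel_index(int(np.argmin(costs)), costs.shape)
print(pts[i], pts[j])
```

If the normal instead had a zero leader component, the hyperplane equation could not be solved for u_L, matching the nonexistence direction of the proof.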

Propositions 3.2 and 3.3 consider, respectively, the case in which the desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\), for \(n_\mathrm{L}\ge 1\), and the case in which it lies in the interior of \(\mathrm{conv}(\Lambda _\mathrm{d})\), for \(n_\mathrm{L}> 1\).

Proposition 3.2

Let \(n_\mathrm{L}\ge 1\) and assume that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\). Allow \(\Lambda _\mathrm{d}\) to be nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and assume that \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}},{ \Omega }_\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\}\).

Proof

The necessity and sufficiency of the expression

$$\begin{aligned} \mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right) \not = \{\mathbf {0}\} \end{aligned}$$

is proven in Proposition 3.1, where \(\gamma _\mathrm{L}\) coincides with \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). In case \(n_\mathrm{L}> 1\), it holds that \(\gamma _\mathrm{L} \subset \Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \); thus, there exists at least one leader function on this hyperplane that satisfies \(\alpha _\mathrm{L}^{\Pi _{\Lambda _\mathrm{d}}}(\Omega _\mathrm{L}) = \Omega _\mathrm{F}\). \(\square \)

Finally, for \(n_\mathrm{L}>1\), requiring \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) to lie on a supporting hyperplane separating the full \((n_\mathrm{F}+n_\mathrm{L})\)-dimensional decision space into subspaces is generally too restrictive for the existence of an optimal affine leader function. This applies to, e.g., the general nonconvex case under a constrained decision space, and to the case in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int(conv}(\Lambda _\mathrm{d}))\), as depicted in Fig. 2b. Hence, instead, the tangent hyperplane concept is adopted in Proposition 3.3 below.

Proposition 3.3

Let \(n_\mathrm{L} > 1\) and assume that \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( {\Lambda }_\mathrm{d}\right) \right) \). Allow \(\Lambda _\mathrm{d}\) to be nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and assume that \({\Omega _L}=\mathbb {R}^{n_\mathrm{L}},{ \Omega _\mathrm{F}}=\mathbb {R}^{n_\mathrm{F}}\). Then, the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) can be reached under an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) if and only if there exists an \(n_\mathrm{F}\)-dimensional tangent, affine subspace \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) such that \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}=\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \), and such that \(\mathrm{proj}_{\Omega _\mathrm{L}}\left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \ne \{\mathbf {0}\}\).

Proof

Since \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}\) is of the same dimension as a tangent, affine subspace \(\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), there exists \(\alpha _\mathrm{L}\in \mathcal {A}_\mathrm{L}: \alpha _\mathrm{L}\cap \Lambda _\mathrm{d}=\{\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \}\) if and only if

$$\begin{aligned} \exists \Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) : \Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \Lambda _\mathrm{d}=\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} . \end{aligned}$$

In order for \(\alpha _\mathrm{L}\in \mathcal {A}^{\Pi _\mathrm{d}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }_\mathrm{L}\) to be a mapping \(\Omega _\mathrm{L}\rightarrow \Omega _\mathrm{F}\), it is necessary and sufficient that \(\mathrm{proj}_{\Omega _\mathrm{L}} \left( \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \ne \{\mathbf {0}\}\), as was proven before in Proposition 3.1. \(\square \)

Remark 3.1

Consider the case in which \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{bd}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\right) \right) \) but \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is not an exposed point, i.e., no supporting hyperplane \(\Pi _{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exists that intersects \(\mathrm{conv}(\Lambda _\mathrm{d})\), and thus also \(\Lambda _\mathrm{d}\), solely in the point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). By definition of the convex hull, however, there does exist a supporting hyperplane \(\tilde{\Pi }_{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), for which it thus holds that \(\tilde{\Pi }_{\mathrm{conv}(\Lambda _\mathrm{d})}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \cap \left( \Lambda _\mathrm{d}{\setminus }\left\{ \left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} \right) \not = \emptyset \). For this case, which is not captured in Proposition 3.2, an optimal affine \(\gamma _\mathrm{L}\) may still exist for \(n_\mathrm{L}=1\), as depicted in Fig. 3.

Fig. 3
figure 3

Example of an affine \(\gamma _\mathrm{L}\) lying on a supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is not strictly supporting

Remark 3.2

In the special case with \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) exposed and with scalar decision variables (\(n_\mathrm{F}=1,n_\mathrm{L}=1\)), an affine \(\gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) leading to \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) automatically exists.

Since \(\Lambda _\mathrm{d}\) is nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), a supporting hyperplane to \(\Lambda _\mathrm{d}\) will not be a unique (tangent) hyperplane. By both the convexity of \(\Lambda _\mathrm{d}\) and by \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) being an exposed point, we know that \(\not \exists \mathbf {u}_\mathrm{L}\in \Omega _\mathrm{L}{\setminus } \{\mathbf {u}_\mathrm{L}^\mathrm{d}\} : \, (\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}^\mathrm{d}) \in \Lambda _\mathrm{d}\). Therefore, there must exist an alternative normal vector defining the hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is not orthogonal to \(\{ 0 \}^{n_\mathrm{L}}\times \Omega _\mathrm{F}\). For such a vector, \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), and therefore \(\alpha ^{\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }_\mathrm{L}\), will cover \(\Omega _\mathrm{F}\), i.e., \(\mathrm{dom}(\gamma _\mathrm{L})= \Omega _\mathrm{F}\).

4 Characterization of an Optimal Affine Leader Function

In this section, the full set of affine leader functions will be derived under which the leader is able to induce the follower to choose the input \(\mathbf {u}_\mathrm{F}^\mathrm{d}\) and thereby to reach the desired solution point. In Sect. 4.1, first the case considered in the literature is summarized, after which we deal with the more general case in Sect. 4.2.

4.1 Under Differentiability Assumptions

A characterization of an optimal affine leader function (7), which reduces to the computation of an \(n_\mathrm{L}\times n_\mathrm{F}\) matrix \(\mathbf {B}\), was first derived in [8] in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable in \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).

In order to make sure that \(\mathbf {B}\) exists as defined next, it is assumed in [8] that \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) are Hilbert spaces and that \(\mathcal {J}_\mathrm{F}(\cdot )\) is Fréchet differentiable on \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). Additionally, \(\mathcal {J}_\mathrm{F}(\cdot )\) is assumed to be strictly convex on \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). Then, for \(n_\mathrm{L},n_\mathrm{F}\) finite (a similar analysis applies in the infinite-dimensional case), \(\mathbf {B}\) should satisfy

$$\begin{aligned} \left[ \nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T} \mathbf {B} = \left[ \nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}, \end{aligned}$$
(9)

which holds under the assumption that \(\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not = 0\) (as follows from the conditions for the existence of an optimal affine leader function) and which can be verified by taking the inner product of \([\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T}\) with the expression \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d}-\mathbf {u}_\mathrm{L}\right) +\mathbf {B}\left( \mathbf {u}_\mathrm{F}^\mathrm{d}-\mathbf {u}_\mathrm{F}\right) =\mathbf {0}\) that follows from (7). This product

$$\begin{aligned} 0&= \left[ \nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}\left[ \left( \mathbf u _\mathrm{L}^\mathrm{d}-\mathbf u _\mathrm{L}\right) +\mathbf {B}\left( \mathbf {u}_\mathrm{F}^\mathrm{d}-\mathbf u _\mathrm{F}\right) \right] \end{aligned}$$
(10)
$$\begin{aligned}&= \left[ \nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}\left( \mathbf u _\mathrm{L}^\mathrm{d}-\mathbf u _\mathrm{L}\right) +\left[ \nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}\mathbf {B}\left( \mathbf u _\mathrm{F}^\mathrm{d}-\mathbf u _\mathrm{F}\right) \end{aligned}$$
(11)
$$\begin{aligned}&= \left[ \nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}\left( \mathbf u _\mathrm{L}^\mathrm{d}-\mathbf u _\mathrm{L}\right) + \left[ \nabla _\mathbf{u _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right] ^\mathrm{T}\left( \mathbf u _\mathrm{F}^\mathrm{d}-\mathbf u _\mathrm{F}\right) \end{aligned}$$
(12)

corresponds exactly to the expression of a tangent hyperplane \(\Pi _{\Lambda _\mathrm{d}}^\mathrm{t}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), from which it is concluded that if (9) holds, the affine function \(\gamma _\mathrm{L}\) indeed lies on the hyperplane \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \).

Under the condition \(\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \not = 0\), the following expression is given in [8]:

$$\begin{aligned} \mathbf {B}=\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \nabla ^\mathrm{T}_{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) / ||\nabla ^\mathrm{T}_{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ||^2. \end{aligned}$$
(13)
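The construction (13) can be checked numerically: whenever \(\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\) is nonzero at the desired point, the resulting \(\mathbf {B}\) satisfies condition (9). A minimal sketch follows; the gradient values are illustrative placeholders, not data from the text.

```python
import numpy as np

# Hypothetical gradients of J_F at the desired point (n_L = 2, n_F = 1);
# these numbers are illustrative only, not taken from the text
grad_L = np.array([1.0, -2.0])   # nabla_{u_L} J_F, nonzero by assumption
grad_F = np.array([3.0])         # nabla_{u_F} J_F

# B per (13): an n_L x n_F matrix
B = np.outer(grad_L, grad_F) / np.dot(grad_L, grad_L)

# Condition (9): [nabla_{u_L} J_F]^T B = [nabla_{u_F} J_F]^T
print(np.allclose(grad_L @ B, grad_F))  # -> True
```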

Note that this is only one of many possible expressions for \(n_\mathrm{L}>1\). Moreover, in some constrained cases, this expression does not yield an optimal leader function, while an alternative optimal affine solution does exist, as will be illustrated by Example 4.1 below. A generalized characterization of the optimal affine leader function will therefore be presented in Sect. 4.2 below.

Example 4.1

Expression (13) Subject to Constraints

We now provide a situation in which the specific expression of \(\mathbf {B}\) proposed in [8] for \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) does not yield a feasible leader function in the constrained case, but in which an optimal leader function does exist.

Let

$$\begin{aligned} \mathcal {J}_\mathrm{F}(\mathbf u _\mathrm{L},\mathbf u _\mathrm{F})=(\mathbf u _\mathrm{F}-6)^2 + (\mathbf u _{\mathrm{L},1}-1)^2 +(\mathbf u _{\mathrm{L},2}-5)^2, \end{aligned}$$

and let \((\mathbf u _{\mathrm{L},1}^\mathrm{d},\mathbf u _{\mathrm{L},2}^\mathrm{d},\mathbf u _{\mathrm{F}}^\mathrm{d})=(0.5,6,4)\).

Then, \(\nabla _\mathbf{u _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\right) =2 \mathbf u _\mathrm{F}-12\) and \(\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\right) =\begin{bmatrix} 2\mathbf u _{\mathrm{L},1}-2 \\ 2\mathbf u _{\mathrm{L},2}-10\end{bmatrix}\), which, evaluated at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), leads to

$$\begin{aligned} \mathbf {B}&:= \frac{\nabla _\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \nabla ^\mathrm{T}_\mathbf{u _\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) }{ ||\nabla ^\mathrm{T}_\mathbf{u _\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ||^2} \nonumber \\&= \left( \begin{bmatrix} -1 \\ 2 \end{bmatrix}\cdot (-4) \right) / \left( \begin{bmatrix} 1&\quad \! 2 \end{bmatrix}\begin{bmatrix} 1 \\ 2 \end{bmatrix}\right) = \begin{bmatrix} 4/5 \\ -8/5 \end{bmatrix}. \end{aligned}$$
(14)

As can be seen in Fig. 4, a mapping \(\gamma _\mathrm{L}\) as defined through (7) with \(\mathbf {B}\) as defined in (14) does not return values within the leader's constrained decision space for all \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\) under the constraints

$$\begin{aligned} \mathbf {u}_{\mathrm{L},1}\in [0,16], \mathbf {u}_{\mathrm{L},2}\in [0,5], \mathbf {u}_\mathrm{F}\in [2,8]. \end{aligned}$$

Thus, a parametrization \(\mathbf {B}\) as defined by (13) does not yield an optimal leader function in this constrained case.
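The computation (14) and the resulting constraint violation can be reproduced numerically. In the sketch below, the explicit form \(\mathbf {u}_\mathrm{L}=\mathbf {u}_\mathrm{L}^\mathrm{d}+\mathbf {B}\left( \mathbf {u}_\mathrm{F}^\mathrm{d}-\mathbf {u}_\mathrm{F}\right) \) is assumed for the affine leader function (7); the check merely confirms that the map leaves the box constraints for some admissible \(\mathbf {u}_\mathrm{F}\).

```python
import numpy as np

# Data of Example 4.1
uL_d = np.array([0.5, 6.0])
uF_d = 4.0
grad_L = np.array([2*uL_d[0] - 2, 2*uL_d[1] - 10])   # = [-1, 2]
grad_F = 2*uF_d - 12                                  # = -4

# B per (14)
B = grad_L * grad_F / np.dot(grad_L, grad_L)          # = [4/5, -8/5]

# Affine leader function through the desired point; the sign convention
# u_L = u_L^d + B (u_F^d - u_F) for (7) is an assumption here
def gamma_L(uF):
    return uL_d + B * (uF_d - uF)

# Box constraints u_{L,1} in [0, 16], u_{L,2} in [0, 5]; u_F ranges over [2, 8]
lo = np.array([0.0, 0.0])
hi = np.array([16.0, 5.0])
violated = any(np.any(gamma_L(uF) < lo) or np.any(gamma_L(uF) > hi)
               for uF in np.linspace(2, 8, 61))
print(B.tolist(), violated)  # -> [0.8, -1.6] True
```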

Fig. 4
figure 4

Situation with a supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that is unique due to the differentiability of \(\mathcal {J}_\mathrm{F}(\cdot )\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). The bounds of the decision space are indicated by a box.

However, there do exist optimal affine mappings \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) that lie on \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). As also plotted in Fig. 4, a suitable leader function lying on the tangent hyperplane defined by the relation

$$\begin{aligned} -\mathbf {u}_{\mathrm{L},1}+1/2\cdot \mathbf {u}_{\mathrm{L},2} + 2\cdot \mathbf {u}_\mathrm{F}-9/4=0 \end{aligned}$$

would be

$$\begin{aligned} \mathbf {u}_\mathrm{L}= \widetilde{\gamma _\mathrm{L}}(\mathbf {u}_\mathrm{F})= \begin{bmatrix} 6 \\ 1/2 \end{bmatrix} + \begin{bmatrix} (-9/4 -6)/4 \\ -1/8 \end{bmatrix}(4 - \mathbf {u}_\mathrm{F}). \end{aligned}$$
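As a check, the two components of \(\widetilde{\gamma _\mathrm{L}}(\mathbf {u}_\mathrm{F})\) can be substituted into the hyperplane relation stated above; the residual vanishes identically along the function. A minimal numerical verification:

```python
import numpy as np

def gamma_tilde(uF):
    # The alternative affine leader function from the example
    return np.array([6.0, 0.5]) + np.array([(-9/4 - 6)/4, -1/8]) * (4 - uF)

# Residual of the hyperplane relation -u_{L,1} + (1/2) u_{L,2} + 2 u_F - 9/4
def residual(uF):
    uL1, uL2 = gamma_tilde(uF)
    return -uL1 + 0.5*uL2 + 2*uF - 9/4

print([residual(uF) for uF in (2.0, 4.0, 8.0)])  # -> [0.0, 0.0, 0.0]
```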

4.2 The General Case

As the previously presented result only captures the case in which \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable, and moreover specifies only one particular solution, we now provide a characterization of the full set of affine leader functions that are optimal in an unconstrained decision space, without requiring \(\mathcal {J}_\mathrm{F}(\cdot )\) to be differentiable. Based on such a set, one can apply further, secondary selection criteria, like minimizing the sensitivity to deviations from the optimal response [18], and one can deal with constraints on the decision space, as will be shown in Sect. 5 below.

In the following, we characterize \(\gamma _\mathrm{L}\) through linear combinations of the columns of a matrix \(\mathbf {R}=\begin{bmatrix}\mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {R}_\mathrm{F}^\mathrm{T}\end{bmatrix}^\mathrm{T}\), \(\mathbf {R} \in \mathbb {R}^{(n_\mathrm{L}+n_\mathrm{F})\times n_\mathrm{F}}, \mathbf {R}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{L}\times n_\mathrm{F}}, \mathbf {R}_\mathrm{F}\in \mathbb {R}^{n_\mathrm{F}\times n_\mathrm{F}}\), i.e.,

$$\begin{aligned} \gamma _\mathrm{L}: \begin{bmatrix} \mathbf {u}_\mathrm{L}\\ \mathbf {u}_\mathrm{F}\end{bmatrix} = \begin{bmatrix} \mathbf {u}_\mathrm{L}^\mathrm{d} \\ \mathbf {u}_\mathrm{F}^\mathrm{d}\end{bmatrix} + \begin{bmatrix} \mathbf {R}_\mathrm{L} \\ \mathbf {R}_\mathrm{F} \end{bmatrix}\cdot s, \end{aligned}$$
(15)

where \(s\in \mathbb {R}^{n_\mathrm{F}}\) represents the free parameters of the affine function. Now, for \(\mathbf {R}_\mathrm{F}\) invertible—which automatically follows from the necessary conditions, as will be proven in Lemma 4.1 below—it follows that

$$\begin{aligned}&\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\mathbf {R}_\mathrm{F}\cdot s \qquad \Rightarrow s=\mathbf {R}_\mathrm{F}^{-1}(\mathbf {u}_\mathrm{F}-\mathbf {u}_\mathrm{F}^\mathrm{d}), \nonumber \\&\mathbf {u}_\mathrm{L}=\mathbf {u}_\mathrm{L}^\mathrm{d}+\underbrace{\mathbf {R}_\mathrm{L}\mathbf {R}_\mathrm{F}^{-1}}_{\mathbf {B}}(\mathbf {u}_\mathrm{F}-\mathbf {u}_\mathrm{F}^\mathrm{d}), \end{aligned}$$
(16)

i.e., one arrives at the explicit form of the leader function (7). The remaining problem in arriving at a full characterization of an optimal affine \(\gamma _\mathrm{L}\) is to determine the set of possible basis vectors.
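The change of parametrization in (15)-(16) is mechanical and can be sketched as follows; all matrices and vectors below are illustrative placeholders.

```python
import numpy as np

# Placeholder data (n_L = 3, n_F = 2); all numbers are illustrative only
uL_d = np.array([1.0, 0.0, 2.0])
uF_d = np.array([0.5, -1.0])
R_L = np.array([[1.0, 0.0],
                [2.0, 1.0],
                [0.0, -1.0]])
R_F = np.array([[2.0, 1.0],
                [0.5, 1.5]])        # invertible, det = 2.5

# Implicit form (15): (u_L, u_F) = (u_L^d, u_F^d) + (R_L, R_F) s
s = np.array([0.3, -0.7])
uL = uL_d + R_L @ s
uF = uF_d + R_F @ s

# Explicit form (16): s = R_F^{-1} (u_F - u_F^d), B = R_L R_F^{-1}
B = R_L @ np.linalg.inv(R_F)
print(np.allclose(uL, uL_d + B @ (uF - uF_d)))  # -> True
```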

Lemma 4.1

In order for a leader function \(\gamma _\mathrm{L}\) characterized by (15) to be optimal, for \(\mathbf {R}=\begin{bmatrix}\mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {R}_\mathrm{F}^\mathrm{T}\end{bmatrix}^\mathrm{T}\), the following should hold:

  1.

    \(\exists \varvec{\nu } \in \mathcal {V}(X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ):\varvec{\nu }^\mathrm{T} \mathbf {R}= \mathbf {0}^\mathrm{T}\), with \(\mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) the generalized normal to \(X\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), where \(X=\mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is exposed with respect to \(\mathrm{conv}(\Lambda _\mathrm{d})\) (Proposition 3.2 applies), or \(X=\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) in case Proposition 3.3 applies, i.e., \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \).

  2.

    The columns of \(\mathbf {R}_\mathrm{F}\) should be a basis for \(\Omega _\mathrm{F}\), i.e., \(\mathbf {R}_\mathrm{F}\) should be of full rank \(n_\mathrm{F}\) and thus invertible.

Proof

  1.

    By definition of a tangent hyperplane \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to a set \(X\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), it holds that \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \perp \varvec{\nu }\) for some \(\varvec{\nu } \in \mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \). Since we require each optimal \(\gamma _\mathrm{L}\) characterized by (15) to lie on \(\Pi _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), each column of \(\mathbf {R}\) must also be orthogonal to \(\varvec{\nu }\), i.e., \(\varvec{\nu }^\mathrm{T}\mathbf {R}=\mathbf {0}^\mathrm{T}\). Note that, in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is differentiable, i.e.,

    $$\begin{aligned}&\mathcal {V}\left( X\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \\&\quad =\left\{ \varvec{\nu }=\begin{bmatrix} \varvec{\nu }_\mathrm{L}^\mathrm{T}&\varvec{\nu }_\mathrm{F}^\mathrm{T} \end{bmatrix}^\mathrm{T}: \varvec{\nu }_\mathrm{L}=\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ,\varvec{\nu }_\mathrm{F}=\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right\} , \end{aligned}$$

    this condition is equivalent to the expression of a tangent hyperplane \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \): \([\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T}(\mathbf {u}_\mathrm{L}^\mathrm{d}-\mathbf {u}_\mathrm{L}) + [\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ]^\mathrm{T} (\mathbf {u}_\mathrm{F}^\mathrm{d}-\mathbf {u}_\mathrm{F})=0\).

  2.

    For an optimal affine leader function \(\gamma _\mathrm{L}(\cdot )\) characterized by (15) to satisfy \(\mathrm{dom}(\gamma _\mathrm{L})=\Omega _\mathrm{F}\), it is required that the \(n_\mathrm{F}\) columns of \(\mathbf {R}_\mathrm{F}\) are independent basis vectors spanning \(\Omega _\mathrm{F}\). Thus, \(\mathbf {R}_\mathrm{F}\) is of full rank and hence invertible. \(\square \)

In fact, we can select w.l.o.g. \(\mathbf {R}_\mathrm{F}:= \mathbf {I}_{n_\mathrm{F}} = \begin{bmatrix} \mathbf {e}_1&\dots&\mathbf {e}_{n_\mathrm{F}}\end{bmatrix}\), as shown in Lemma 4.2.

Lemma 4.2

If there exists an optimal affine \(\gamma _\mathrm{L}(\cdot )\) characterized by (15), one can select w.l.o.g. \(\mathbf {R}_\mathrm{F}=\mathbf {I}_{n_\mathrm{F}}\).

Proof

Consider

$$\begin{aligned} \mathcal {S}&:= \bigg \{\gamma _\mathrm{L}: \begin{bmatrix} \mathbf {u}_\mathrm{L}\\ \mathbf {u}_\mathrm{F}\end{bmatrix} = \begin{bmatrix} \mathbf {u}_\mathrm{L}^\mathrm{d} \\ \mathbf {u}_\mathrm{F}^\mathrm{d}\end{bmatrix} + \begin{bmatrix} \mathbf {R}_\mathrm{L} \\ \mathbf {I}_{n_\mathrm{F}} \end{bmatrix}\cdot s, s\in \mathbb {R}^{n_\mathrm{F}}, \text {with } \mathbf {R}_\mathrm{L},\mathbf {R}_\mathrm{F}=\mathbf {I}_{n_\mathrm{F}} \\&\qquad {\text { satisfying conditions }} (1) {\hbox { and }} (2) {\text { of Lemma }}4.1 \bigg \},\\ \tilde{\mathcal {S}}&:= \bigg \{\gamma _\mathrm{L}: \begin{bmatrix} \mathbf {u}_\mathrm{L}\\ \mathbf {u}_\mathrm{F}\end{bmatrix} = \begin{bmatrix} \mathbf {u}_\mathrm{L}^\mathrm{d} \\ \mathbf {u}_\mathrm{F}^\mathrm{d}\end{bmatrix} + \begin{bmatrix} \tilde{R}_\mathrm{L} \\ \tilde{R}_\mathrm{F} \end{bmatrix}\cdot \tilde{s}, \tilde{s}\in \mathbb {R}^{n_\mathrm{F}}, \text {with } \tilde{R}_\mathrm{L}, \tilde{R}_\mathrm{F} \\&\qquad {\text { satisfying conditions }} (1) {\hbox { and }} (2){\text { of Lemma }}4.1{\text { with R substituted by }} \tilde{R} \bigg \}. \end{aligned}$$

To prove that \(\mathcal {S}\equiv \tilde{\mathcal {S}}\), we will show that, for each possible 3-tuple \((s,\mathbf {I}_{n_\mathrm{F}},\mathbf {R}_\mathrm{L})\) according to (15) with \(\varvec{\nu }^\mathrm{T}\begin{bmatrix} \mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {I}_{n_\mathrm{F}} \end{bmatrix}^\mathrm{T} =\mathbf {0}^\mathrm{T}\) that yields some \(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\), one can find an equivalent tuple \((\tilde{s},\tilde{R}_\mathrm{F},\tilde{R}_\mathrm{L})\), \(\tilde{s}\in \mathbb {R}^{n_\mathrm{F}}\), for which additionally it holds that \(\varvec{\nu }^\mathrm{T}\begin{bmatrix} \tilde{R}_\mathrm{L}^\mathrm{T}&\tilde{R}_\mathrm{F}^\mathrm{T} \end{bmatrix}^\mathrm{T} =\mathbf {0}^\mathrm{T}\), yielding the same values \(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F}\).

It is easily seen that the expression \(\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\mathbf {I}_{n_\mathrm{F}} \cdot s\) is equivalent to \(\mathbf {u}_\mathrm{F}=\mathbf {u}_\mathrm{F}^\mathrm{d}+\tilde{R}_{\mathrm{F}} \cdot \tilde{s}\) with \(s=\tilde{R}_\mathrm{F} \cdot \tilde{s}\): as shown in Lemma 4.1, it follows from the existence of an optimal affine \(\gamma _\mathrm{L}(\cdot )\) that \(\tilde{R}_\mathrm{F}\) is invertible. Hence, for a given \(s\), there exists a unique \(\tilde{s}\) and vice versa. From \(\mathbf {B}=\tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1}\) according to (16) and from the substitution \(\mathbf {B}=\mathbf {R}_\mathrm{L}\mathbf {I}_{n_\mathrm{F}}\), equivalence requires that \(\mathbf {R}_\mathrm{L}=\tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1}\). Finally, we have that

$$\begin{aligned} \varvec{\nu }_\mathrm{L}^\mathrm{T} \mathbf {R}_\mathrm{L} + \varvec{\nu }_\mathrm{F}^\mathrm{T}=0&\qquad \Leftrightarrow \varvec{\nu }_\mathrm{L}^\mathrm{T} \tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1} + \varvec{\nu }_\mathrm{F}^\mathrm{T}=0&\quad \Leftrightarrow \varvec{\nu }_\mathrm{L}^\mathrm{T} \tilde{R}_\mathrm{L} + \varvec{\nu }_\mathrm{F}^\mathrm{T}\tilde{R}_\mathrm{F}=0; \end{aligned}$$

hence, \(\mathcal {S}=\mathcal {\tilde{S}}\). \(\square \)
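The normalization \(\mathbf {R}_\mathrm{F}:=\mathbf {I}_{n_\mathrm{F}}\) of Lemma 4.2 can be illustrated numerically: a general parametrization with invertible \(\tilde{R}_\mathrm{F}\) and its normalized counterpart with \(\mathbf {R}_\mathrm{L}=\tilde{R}_\mathrm{L}\tilde{R}_\mathrm{F}^{-1}\) satisfy the same orthogonality condition. All numbers below are illustrative placeholders.

```python
import numpy as np

# Placeholder normal vector nu = (nu_L, nu_F) with nu_L nonzero (n_L = n_F = 2)
nu_L = np.array([1.0, -1.0])
nu_F = np.array([0.5, 2.0])

# A general parametrization (R~_L, R~_F) with R~_F invertible, chosen so
# that nu_L^T R~_L = -nu_F^T R~_F (condition (1) of Lemma 4.1 for R~)
Rt_F = np.array([[2.0, 1.0],
                 [0.0, 1.0]])
Rt_L = np.outer(nu_L, -(nu_F @ Rt_F)) / np.dot(nu_L, nu_L)

# Normalized parametrization: R_F = I, R_L = R~_L R~_F^{-1}
R_L = Rt_L @ np.linalg.inv(Rt_F)

print(np.allclose(nu_L @ Rt_L + nu_F @ Rt_F, 0),
      np.allclose(nu_L @ R_L + nu_F, 0))  # -> True True
```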

Now, given \(\mathbf {R}_\mathrm{F}:=\mathbf {I}_{n_\mathrm{F}}\), we still need to identify the set of matrices \(\mathbf {R}_\mathrm{L}\) that satisfy \(\varvec{\nu }^\mathrm{T} \mathbf {R}=\mathbf {0}^\mathrm{T}\) for some normal vector \(\varvec{\nu }\), which reduces to \(\begin{bmatrix} \varvec{\nu }_\mathrm{L}^\mathrm{T}&\varvec{\nu }_\mathrm{F}^\mathrm{T}\end{bmatrix} \begin{bmatrix}\mathbf {R}_\mathrm{L} \\ \mathbf {I}_{n_\mathrm{F}}\end{bmatrix} =\mathbf {0}^\mathrm{T}\) or, equivalently, \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L} = -\varvec{\nu }_\mathrm{F}^\mathrm{T}\).

Due to the necessary condition \(\mathrm{proj}_{\Omega _\mathrm{L}}(\mathcal {V}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) )))\not = \{\mathbf {0}\}\) (Proposition 3.2 applies) or \(\mathrm{proj}_{\Omega _\mathrm{L}}(\mathcal {V}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\not = \{\mathbf {0}\}\) (Proposition 3.3 applies), \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\) must contain at least one nonzero entry. Hence, the expressions \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_{\mathrm{L},j} = -\varvec{\nu }_{\mathrm{F},j}\), \(j=1,\ldots ,n_\mathrm{F}\), can be solved for \(\mathbf {R}_{\mathrm{L},j}\). Proposition 4.1 below provides a parametrized characterization of this problem that will be needed for further optimization.
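Given a normal vector \(\varvec{\nu }\) with \(\varvec{\nu }_\mathrm{L}\not =\mathbf {0}\), one particular solution of \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L}=-\varvec{\nu }_\mathrm{F}^\mathrm{T}\) is the rank-one matrix \(-\varvec{\nu }_\mathrm{L}\varvec{\nu }_\mathrm{F}^\mathrm{T}/\Vert \varvec{\nu }_\mathrm{L}\Vert ^2\), analogous to (13) up to the sign convention. A sketch with placeholder values:

```python
import numpy as np

# Placeholder normal vector with at least one nonzero entry in nu_L
nu_L = np.array([2.0, 0.0, -1.0])   # n_L = 3
nu_F = np.array([1.0, 4.0])         # n_F = 2

# Rank-one particular solution of nu_L^T R_L = -nu_F^T, solved column by
# column as in the text: nu_L^T R_{L,j} = -nu_{F,j}, j = 1, ..., n_F
R_L = -np.outer(nu_L, nu_F) / np.dot(nu_L, nu_L)

print(np.allclose(nu_L @ R_L, -nu_F))  # -> True
```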

From the previous derivations, the following theorem automatically follows.

Theorem 4.1

Let \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}, \Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\). Assume that the conditions in Proposition 3.2 or Proposition 3.3 are satisfied, and that therefore an optimal affine leader function of the form (7) exists. Then, the set

$$\begin{aligned} \Gamma _\mathrm{L}^*:=\left\{ \gamma _\mathrm{L}: \Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L} : \gamma _\mathrm{L}\ \text {according to (7), satisfying (5), (15)} \right\} \end{aligned}$$

contains optimal affine solutions that can be characterized by \(\mathbf {B}:=\mathbf {R}_\mathrm{L}\mathbf {I}_{n_\mathrm{F}}\), with \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L}=-\varvec{\nu }_\mathrm{F}^\mathrm{T}\), for some \(\varvec{\nu } \in \mathcal {V}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is exposed with respect to \(\mathrm{conv}(\Lambda _\mathrm{d})\) (Proposition 3.2 applies), or for some \(\varvec{\nu } \in \mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) in case \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) (Proposition 3.3 applies).

For the sake of conciseness, in the remainder of this section, we will assume \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) to be an exposed point of \(\mathrm{conv}(\Lambda _\mathrm{d})\). As a result, we consider the case in which Proposition 3.2 is satisfied rather than Proposition 3.3, i.e., we consider the generalized normal \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \). For the case in which Proposition 3.3 holds, the generalized normal should be substituted by \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) in the following.

In order to be able to optimize over the set of possible leader functions and to select a function that is optimal with respect to some criteria, Proposition 4.1 now provides a parametrized characterization of the optimal affine leader function.

Proposition 4.1

Let \(\Gamma _\mathrm{L}^* := \{ \gamma _\mathrm{L}: \Omega _\mathrm{F} \rightarrow \Omega _\mathrm{L} : \gamma _\mathrm{L}\text { satisfies (5), (15)}\}\).

  1.

    For \(\mathcal {J}_\mathrm{F}(\cdot )\) nondifferentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), the possible realizations of \(\mathbf {R}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{L}\times n_\mathrm{F}}\) can be written as

    $$\begin{aligned} \mathbf {R}_\mathrm{L}\in \mathcal {R}_\mathrm{L}&:= \left\{ \begin{bmatrix} \mathbf {R}_{\mathrm{L},1}&\ldots&\mathbf {R}_{\mathrm{L},n_\mathrm{F}} \end{bmatrix} : \mathbf {R}_j := \begin{bmatrix}\mathbf {R}_{\mathrm{L},j} \\ \mathbf {e}_j\end{bmatrix} \in \mathcal {R}_j, j=1,\ldots ,n_\mathrm{F} \right\} , \end{aligned}$$

    with the set of possible columns of \(\mathbf {R} \in \mathbb {R}^{(n_\mathrm{L}+n_\mathrm{F})\times n_\mathrm{F}}\) characterized by

    $$\begin{aligned} \mathcal {R}_j&:= \left\{ \mathbf {Q}\cdot \mathbf {W}\cdot p^+_j \left| \begin{array}{l} p^{+}_j:= \left( \displaystyle \sum \limits _{i=1}^{N^\mathrm{f}_{+,j}} \alpha ^{+}_{i,j} \beta ^\mathrm{f}_{i,+,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{+,j}} \mu ^{+}_{i,j} \beta ^\mathrm{e}_{i,+,j}\right) \\ \displaystyle \sum \limits _i \alpha ^{+}_{i,j}=1,\alpha ^{+}_{i,j} \in \mathbb {R}_+,\mu ^{+}_{i,j}\in \mathbb {R}_{+} \end{array} \right. \right\} \nonumber \\&\cup \left\{ \mathbf {Q}\cdot (-\mathbf {W})\cdot p^-_j \left| \begin{array}{l} p^{-}_j:= \left( \displaystyle \sum \limits _{i=1}^{N^\mathrm{f}_{-,j}} \alpha ^{-}_{i,j} \beta ^\mathrm{f}_{i,-,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{-,j}} \mu ^{-}_{i,j} \beta ^\mathrm{e}_{i,-,j}\right) \\ \displaystyle \sum \limits _i \alpha ^{-}_{i,j}=1,\alpha ^{-}_{i,j} \in \mathbb {R}_+,\mu ^{-}_{i,j}\in \mathbb {R}_{+} \end{array} \right. \right\} , \end{aligned}$$
    (17)

    for \(j=1,\ldots ,n_\mathrm{F}\), with \(\mathbf {Q}:=\begin{bmatrix} \mathbf {I}_{n_\mathrm{L}}&\mathbf {0}_{n_\mathrm{L}\times n_\mathrm{F}} \end{bmatrix}\) and with \(\mathbf {W}=\begin{bmatrix} \mathbf {w}_1&\ldots&\mathbf {w}_m \end{bmatrix}\), where \(\{\mathbf {w}_i\}_{i=1}^m\), with \(\mathbf {w}_i \in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}\) and \(m\in \mathbb {N}\cup \{\infty \}\), is the set of generators of \(\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) such that

    $$\begin{aligned}&\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\\&\quad :=\left\{ \sum \limits _{i=1}^m\beta _i \mathbf {w}_i : \beta _i \in \mathbb {R}_+ \right\} \bigcup \left\{ \sum \limits _{i=1}^m\beta _i (-\mathbf {w}_i) : \beta _i\in \mathbb {R}_+ \right\} . \end{aligned}$$

    Here, \(\{ \beta ^\mathrm{f}_{i,s,j}\}_{i=1}^{N_{s,j}^\mathrm{f}}\) and \(\{ \beta ^\mathrm{e}_{i,s,j}\}_{i=1}^{N_{s,j}^\mathrm{e}}\), \(s\in \{+,-\}\) are the sets of finite vertices and extreme rays, respectively, of the polyhedra \(\mathcal {P}^+_j=\{\varvec{\beta }: \mathbf {P}\mathbf {W}\varvec{\beta }=\mathbf {e}_j,\varvec{\beta }\in (\mathbb {R}_+)^m \}\) and \(\mathcal {P}^-_j=\{\varvec{\beta }: \mathbf {P}(-\mathbf {W})\varvec{\beta }=\mathbf {e}_j,\varvec{\beta }\in (\mathbb {R}_+)^m \}\).

  2.

    For \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathbf {R}_\mathrm{L}\) belongs to the affine space of the form

    $$\begin{aligned} \mathcal {R}_{\mathrm{L}}:= \left\{ \mathbf {R}_\mathrm{L} : \mathbf {R}_\mathrm{L}= \mathbf {R}_{\mathrm{L}}^0 + \mathcal {B}_N \cdot \mathbf {T},\mathbf {T}\in \mathbb {R}^{\mathrm{dim}(N)\times n_\mathrm{F}} \right\} , \end{aligned}$$
    (18)

    with \(\mathbf {R}_\mathrm{L}^0\) a particular solution of \(\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \mathbf {R}_{\mathrm{L}}= -\nabla _{\mathbf {u}_\mathrm{F}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and with \(\mathcal {B}_N\) a basis of \(N:=\mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \).
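For the differentiable case, the affine family (18) is straightforward to construct numerically: take any particular solution of the underdetermined linear system and add an arbitrary combination of null-space directions of \(\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\). The sketch below uses hypothetical gradient values and the sign convention \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_\mathrm{L}=-\varvec{\nu }_\mathrm{F}^\mathrm{T}\) of Lemma 4.1.

```python
import numpy as np

# Hypothetical gradients of J_F at the desired point (n_L = 3, n_F = 2)
grad_L = np.array([1.0, 2.0, -1.0])   # nonzero, as required
grad_F = np.array([0.5, -1.0])

# Particular solution R_L^0 of grad_L^T R_L = -grad_F^T
R_L0 = np.linalg.lstsq(grad_L.reshape(1, -1), -grad_F.reshape(1, -1),
                       rcond=None)[0]

# Basis B_N of N = null(grad_L^T), taken from the SVD of the 1 x n_L row
_, _, Vt = np.linalg.svd(grad_L.reshape(1, -1))
B_N = Vt[1:].T                        # n_L x (n_L - 1)

# Every member of the affine family (18) solves the same system
T = np.array([[1.0, -2.0],
              [0.5, 3.0]])            # arbitrary dim(N) x n_F matrix
R_L = R_L0 + B_N @ T
print(np.allclose(grad_L @ R_L, -grad_F))  # -> True
```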

Proof

  1.

    For \(\Lambda _\mathrm{d}\) nonsmooth at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), the associated generalized normal is by definition a convex and pointed cone, i.e., for any \(a_1,a_2 \in \mathbb {R}_+\) and \(\varvec{\nu }_1,\varvec{\nu }_2\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \), it holds that \(a_1 \varvec{\nu }_1 + a_2 \varvec{\nu }_2 \in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \). In fact, \( \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) is the normal cone [25], defined by the set of normal vectors to \(\Lambda _\mathrm{d}\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), which is generated by \(n\in \mathbb {N}\cup \{\infty \}\) generators:

    $$\begin{aligned} \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right)&:=\left\{ \sum \limits _{i=1}^n\alpha _i \varvec{\nu }_i : \alpha _i\in \mathbb {R}_+,\varvec{\nu }_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right. \nonumber \\&\quad \left. \text {a generator of } \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} . \end{aligned}$$
    (19)

    The polar and dual cone of \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) and its orthogonal complement are denoted, respectively, by

    $$\begin{aligned} \mathcal {V}^0&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } \le 0 \quad \forall \varvec{\nu } \in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} , \\ \mathcal {V}^*&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } \ge 0 \quad \forall \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} , \\ \mathcal {V}^\perp&:= \left\{ r \in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : r^\mathrm{T} \cdot \varvec{\nu } = 0 \quad \forall \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \right\} . \end{aligned}$$
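    When the normal cone is generated by finitely many vectors, membership in \(\mathcal {V}^0\), \(\mathcal {V}^*\), and \(\mathcal {V}^\perp \) reduces to sign checks of the inner products with the generators; a sketch with placeholder generators \(\mathbf {w}_i\) stored as columns of a matrix:

```python
import numpy as np

# Columns of W generate the cone V = { W b : b >= 0 } (placeholder generators)
W = np.array([[1.0, 0.0],
              [0.0, 1.0]])

def classify(r, W, tol=1e-12):
    d = r @ W                                  # inner products with generators
    in_polar = bool(np.all(d <= tol))          # r in V^0:    r^T nu <= 0 on V
    in_dual = bool(np.all(d >= -tol))          # r in V^*:    r^T nu >= 0 on V
    in_perp = bool(np.all(np.abs(d) <= tol))   # r in V^perp: r^T nu  = 0 on V
    return in_polar, in_dual, in_perp

print(classify(np.array([-1.0, -1.0]), W))  # -> (True, False, False)
print(classify(np.array([1.0, 2.0]), W))    # -> (False, True, False)
print(classify(np.array([0.0, 0.0]), W))    # -> (True, True, True)
```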

    It now follows from condition (1) of Lemma 4.1 that the set of possible columns \(\mathbf {R}_j,j=1,\ldots ,n_\mathrm{F}\), of \(\mathbf {R}\) can be represented by

    $$\begin{aligned}&\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \nonumber \\&\quad :=\left\{ \mathbf {r}\in \mathbb R^{n_\mathrm{L}+n_\mathrm{F}} : \exists \varvec{\nu }\in \mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \text { s.t. } \mathbf {r}^\mathrm{T} \cdot \varvec{\nu } = 0 \right\} \nonumber \\&\quad =\left( \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}{\setminus } \left( \mathcal {V}^0\cup \mathcal {V}^*\right) \right) \cup \mathcal {V}^\perp . \end{aligned}$$
    (20)

    This last expression is illustrated in Fig. 5, where \(\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) corresponds to the area between \(\varvec{\nu }^\mathrm{o}_1,\varvec{\nu }^\mathrm{o}_2\) and the vectors \(\varvec{\nu }_1',\varvec{\nu }_2'\) perpendicular to \(\varvec{\nu }_1,\varvec{\nu }_2\). The expression (20) follows from the fact that the (closure of the) complement of a cone is again a cone [30]; hence, the complement of the double cone \(\mathcal {V}^0\cup \mathcal {V}^*\) [31], which consists of the union of two apex-to-apex placed pointed cones, i.e., \(\mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}{\setminus }\left( \mathcal {V}^0\cup \mathcal {V}^*\right) \), is itself a cone. Finally, the null vector is recovered through the union with \(\mathcal {V}^\perp \) in order to yield a pointed cone. The set \(\bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) can therefore be written in terms of conic combinations of generators \(\mathbf {w}_i\) from the set \(\{\pm \mathbf {w}_i\}_{i=1}^m\) as in (19), now also considering the negatives \(-\mathbf {w}_i\):

    $$\begin{aligned} \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right)&:= \left\{ \sum \limits _{i=1}^m\beta _i \mathbf {w}_i : \beta _i\in \mathbb {R}_+,\mathbf {w}_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right\} \nonumber \\&\cup \left\{ \sum \limits _{i=1}^m\beta _i (-\mathbf {w}_i) : \beta _i\in \mathbb {R}_+,\mathbf {w}_i\in \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}} \right\} .\qquad \end{aligned}$$
    (21)

    Finally, since \(\mathbf {R}=\begin{bmatrix} \mathbf {R}_\mathrm{L}^\mathrm{T}&\mathbf {I}_{n_\mathrm{F}}\end{bmatrix}^\mathrm{T}\) by Lemmas 4.1 and 4.2, we need to select from \(\bar{\mathcal {V}}(\mathrm{conv}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) ))\) those vectors \(\mathbf {r}\) such that, for \(j=1,\ldots ,n_\mathrm{F}\),

    $$\begin{aligned} \mathcal {R}_j := \left\{ \mathbf {r}\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) : \mathrm{proj}_{\Omega _\mathrm{F}}(\mathbf {r}) = \mathbf {e}_j\right\} , \end{aligned}$$

    i.e., approved as the \(j\)-th column of \(\mathbf {R}\) are those vectors

    $$\begin{aligned}&\mathbf {r}=\mathbf {W}\cdot \varvec{\beta }\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \text { or }\\&\quad \mathbf {r}=-\mathbf {W}\cdot \varvec{\beta }\in \bar{\mathcal {V}}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) , \end{aligned}$$

    with \(\mathbf {W}=\begin{bmatrix} \mathbf {w}_1&\ldots&\mathbf {w}_m\end{bmatrix}\), \(\varvec{\beta } = \begin{bmatrix} \beta _1&\ldots&\beta _m\end{bmatrix}^\mathrm{T}\), such that, for \(\mathbf {P}:=\begin{bmatrix} \mathbf {0}_{n_\mathrm{F}\times n_\mathrm{L}}&\mathbf {I}_{n_\mathrm{F}} \end{bmatrix}\), we have

    $$\begin{aligned} \mathbf {P}\cdot \mathbf {W}\cdot \varvec{\beta } =\mathbf {e}_j, \text { respectively, } \mathbf {P}\cdot \left( -\mathbf {W} \right) \cdot \varvec{\beta } =\mathbf {e}_j. \end{aligned}$$

    The solutions to these two equations, with the respective parameter vectors denoted by \(\varvec{\beta }^+\in \mathbb {R}_+^{m}\) and \(\varvec{\beta }^-\in \mathbb {R}_+^{m}\), can be parametrized using, e.g., the double description method [32] for the polyhedra

    $$\begin{aligned}&\mathcal {P}^+_{j}=\{\varvec{\beta }^+ : \mathbf {P}\mathbf {W}\varvec{\beta }^+=\mathbf {e}_j,\varvec{\beta }^+ \ge \mathbf {0} \}\text { and } \mathcal {P}^-_{j}=\{\varvec{\beta }^- : \mathbf {P}(-\mathbf {W})\varvec{\beta }^-=\mathbf {e}_j,\varvec{\beta }^- \ge \mathbf {0} \}, \end{aligned}$$

    i.e.,

    $$\begin{aligned} \beta ^s_{j}=\sum \limits _{i=1}^{N^\mathrm{f}_{s,j}} \alpha ^s_{i,j} \beta ^\mathrm{f}_{i,s,j} + \sum \limits _{i=1}^{N^\mathrm{e}_{s,j}} \mu ^s_{i,j} \beta ^\mathrm{e}_{i,s,j}, s\in \{+,-\}, \end{aligned}$$

    with \(\sum \limits _i \alpha ^s_{i,j}=1,\alpha ^s_{i,j} \in \mathbb {R}_+,\mu ^s_{i,j}\in \mathbb {R}_+\), and with \(\left\{ \beta ^\mathrm{f}_{i,s,j} : i=1,\ldots ,N_{s,j}^\mathrm{f}\right\} \) the set of finite vertices of \(\mathcal {P}^s_{j}\), and with \(\left\{ \beta ^\mathrm{e}_{i,s,j} : i=1,\ldots ,N_{s,j}^\mathrm{e}\right\} \) the set of extreme rays of \(\mathcal {P}^s_{j}\) [33], for \(s\in \{+,-\}\).

  2.

    For \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {V}\left( \mathrm{conv}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \) is uniquely defined by \(\varvec{\nu }=\nabla \mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with \(\nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \ne \mathbf {0}\) (by Propositions 3.2 and 3.3), and therefore \(\varvec{\nu }_\mathrm{L}^\mathrm{T}\mathbf {R}_{\mathrm{L},j} = -\varvec{\nu }_{\mathrm{F},j}, j=1,\dots ,n_\mathrm{F}\) can be solved as a simple system of equalities. Here, for each zero element \(\varvec{\nu }_{\mathrm{L},i}\), the corresponding entry \(\mathbf {R}_{\mathrm{L},i,j}\) of \(\mathbf {R}_\mathrm{L}\) is free. Therefore, the possible solutions can be written as

    $$\begin{aligned}&\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \mathbf {R}_{\mathrm{L},j}= -\nabla _{\mathbf {u}_\mathrm{F},j}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \nonumber \\&\quad \Rightarrow \mathbf {R}_{\mathrm{L},j}= \mathbf {R}_{\mathrm{L},j}^0 +\underbrace{\mathcal {B}\left( \mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) }_{\mathcal {B}_N}\cdot \mathbf {t},\mathbf {t}\in \mathbb {R}^{\mathrm{dim}(N)}, \end{aligned}$$
    (22)

    where \(\mathbf {R}_{\mathrm{L},j}^0\) denotes a particular solution to (22) and \(\mathcal {B}_N\cdot \mathbf {t}\), \(\mathbf {t}\in \mathbb {R}^{\mathrm{dim}(N)}\) is a homogeneous solution to (22). Note that the basis of the null space of \(\nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), \(\mathcal {B}\left( \mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \right) \), as well as a particular solution \(\mathbf {R}_{\mathrm{L},j}^0\), can be computed with a singular value decomposition (SVD) or QR decomposition (see, e.g., [34]).

\(\square \)
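To make the last computation step of the proof concrete, the following minimal sketch (using NumPy) computes a particular plus homogeneous solution of a single linear equality, as used in the differentiable case; the row vector `a` and right-hand side `c` are hypothetical stand-ins for the gradient data in (22).

```python
# Minimal numerical sketch of the particular-plus-homogeneous solution in the
# differentiable case of the proof. The row vector a and scalar c below are
# hypothetical stand-ins for the gradient data in (22).
import numpy as np

a = np.array([[2.0, -1.0, 0.5]])   # plays the role of grad_{u_L}^T J_F (1 x n_L)
c = np.array([3.0])                # right-hand side of the single equality

# Particular (minimum-norm) solution of a @ x = c via least squares.
x0, *_ = np.linalg.lstsq(a, c, rcond=None)

# Basis B_N of null(a) from the SVD: right singular vectors beyond the rank.
_, s, Vt = np.linalg.svd(a)
rank = int(np.sum(s > 1e-12))
B_N = Vt[rank:].T                  # here a 3 x 2 matrix

# Every solution of a @ x = c is of the form x0 + B_N @ t.
```

Any choice of \(\mathbf {t}\) then yields another solution, which is exactly the parametrization used in (22).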

Fig. 5

The a finitely and b infinitely generated normal cone \(\mathcal {V}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) )\) and the associated cone \(\bar{\mathcal {V}}(\Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) )=\left( \mathbb {R}^{n_\mathrm{L}+n_\mathrm{F}}{\setminus }\left( \mathcal {V}^0\cup \mathcal {V}^*\right) \right) \cup \mathcal {V}^\perp \)

Remark 4.1

So far, a static, single-stage reverse Stackelberg game has been considered. Whereas this basic case serves for developing the conditions summarized in Sect. 3 and the characterization of the present section, real-life control settings will often have a dynamic, multi-stage nature [7]. As is also done in, e.g., [8], the presented results can simply be applied to the dynamic case with open-loop information by considering the game as a series of static optimization problems. In other words, at each (discrete) time step \(k\in \mathcal {K}=\{1,2,\ldots ,K\}, K\in \mathbb {N}\), of the game, the desired values \((\mathbf {u}_\mathrm{L}^\mathrm{d}(k),\mathbf {u}_\mathrm{F}^\mathrm{d}(k))\) are computed, and the mappings \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}(k),k)\) can be derived as in the static case. However, for more involved dynamic settings with incomplete information and deviations of players from the optimal values, further research is needed.
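As a sketch of this remark, the open-loop procedure can be organized as a plain loop over the \(K\) static problems; the function `desired_equilibrium` and the per-stage slopes below are hypothetical placeholders for the static computations of Sects. 3 and 4.

```python
# Hedged sketch of the open-loop, multi-stage application of the static results:
# the game is treated as K independent static problems, one per time step k.
# desired_equilibrium and the slopes are hypothetical placeholders for the
# static computations of Sects. 3 and 4 (scalar decision spaces assumed).

def desired_equilibrium(k):
    # placeholder for the stage-wise desired values (u_L^d(k), u_F^d(k))
    return float(k), 2.0 * k

def affine_leader_function(u_L_d, u_F_d, B):
    # static-case construction: gamma_L(u_F, k) = u_L^d(k) + B(k) (u_F - u_F^d(k))
    return lambda u_F: u_L_d + B * (u_F - u_F_d)

def open_loop_strategies(K, slopes):
    """One affine leader function per time step k = 1, ..., K."""
    strategies = {}
    for k in range(1, K + 1):
        u_L_d, u_F_d = desired_equilibrium(k)
        strategies[k] = affine_leader_function(u_L_d, u_F_d, slopes[k - 1])
    return strategies
```

Each `strategies[k]` is an affine leader function through the stage-\(k\) desired equilibrium, exactly as in the static construction.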

Fig. 6

The Rosenbrock function and several level curves

4.2.1 Computation and Complexity

While the characterization of this section aims to provide a structured method for solving the reverse Stackelberg game with an affine leader function, the computational cost of testing the several conditions should be taken into account:

  • Determining a global optimum to represent the desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) is, in general, a constrained nonlinear programming problem; this subproblem has to be solved in any solution approach. Alternatively, a desired equilibrium that is not directly derived from a leader objective functional, or a series of such points, may be provided a priori.

  • Determining the convex hull of \(\Lambda _\mathrm{d}\) is only required in case \(\mathcal {J}_\mathrm{F}(\cdot )\) is both nonconvex and nondifferentiable at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). The vertices of \(\Lambda _\mathrm{d}\) can then be determined if \(\mathcal {J}_\mathrm{F}(\cdot )\) is of the particular piecewise-affine type. For a polyhedron [33] with \(n\) vertices as input points, computing the convex hull in dimension \(\mathrm{dim}=n_\mathrm{L}+n_\mathrm{F}\) can be done with a worst-case complexity of \(O(n \log p)\) for \(\mathrm{dim}\le 3\) and \(O(n \cdot f_p / p)\) for \(\mathrm{dim}\ge 4\), where \(p\) points are actually on the hull and \(f_p\) denotes the maximum number of facets for \(p\) vertices [35].

  • Verifying whether the projection of the generalized normal onto the leader’s decision space is nonzero relies on simple vector products.

  • Computing particular and homogeneous solutions to (22) with an SVD leads to a numerically reliable solution due to its ability to deal with rank-deficient matrices; however, no finite termination can be guaranteed for the computation of the SVD. The iterative SVD procedure can still be terminated when a sufficiently precise solution is obtained, which leads to a practical overall complexity of \(\mathcal {O}(n^3)\) floating-point operations for an \(m\times n\) matrix in case \(m\approx n\) [34]. The alternative QR decomposition does not have such reliability properties, but it does terminate finitely; the complexity of the algorithms discussed in [34] is also of the order \(\mathcal {O}(n^3)\) in case \(n\approx m\).
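For the planar case, the convex hull step of the second item can be illustrated with Andrew's monotone chain, a generic \(O(n \log n)\) textbook routine (the sort dominates); this is a sketch for \(\mathrm{dim}=2\) and not the specific output-sensitive algorithm of [35].

```python
# Illustrative planar (dim = 2) convex hull via Andrew's monotone chain,
# an O(n log n) textbook routine (the sort dominates the cost).

def convex_hull(points):
    """Return the hull vertices of a set of 2-D points in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):  # z-component of (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for seq, out in ((pts, lower), (reversed(pts), upper)):
        for p in seq:
            # drop points that make a non-left turn
            while len(out) >= 2 and cross(out[-2], out[-1], p) <= 0:
                out.pop()
            out.append(p)
    return lower[:-1] + upper[:-1]
```

For instance, for the corners of a square plus interior points, only the four corners are returned.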

Example 4.2

(Rosenbrock Function) The nonconvex Rosenbrock function [36] is often used to show the performance of optimization algorithms and is written as

$$\begin{aligned} f(x_1,x_2) = \left( 1-x_1\right) ^2 + 100\left( x_2-x_1^2\right) ^2, \end{aligned}$$
(23)

as depicted in Fig. 6 together with several level curves. If we adopt this function structure for \(\mathcal {J}_\mathrm{F}\), it can be inferred from these contour lines that several desired leader equilibria cannot be achieved under an affine leader function, namely those in the valleys in the upper part of the level curves, which are associated with increasing objective function values for increasing values of \(\mathbf {u}_\mathrm{L}\).

In order to illustrate the approach in higher dimensions, however, we adopt the extended Rosenbrock function [37], written in general form for \(n\) dimensions as

$$\begin{aligned} f_\mathrm{e}(x_1,\dots ,x_n) = \sum \limits _{i=1}^{n-1} \left[ (1-x_i)^2 + 100\left( x_{i+1}-x_i^2\right) ^2\right] . \end{aligned}$$
(24)

We now take \(\Omega _\mathrm{L}=\mathbb {R}^2\), while \(\Omega _\mathrm{F}=\mathbb {R}\). The sublevel set \(\Lambda _\mathrm{d}\) for \(\mathcal {J}_\mathrm{F}:=f_\mathrm{e}(\mathbf {u}_\mathrm{L},\mathbf {u}_\mathrm{F})\) and with the desired equilibrium \((\mathbf {u}_{\mathrm{L},1}^\mathrm{d},\mathbf {u}_{\mathrm{L},2}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d})= (-0.27, 0.15, -0.2 ) \) is plotted in Fig. 7. Since

$$\begin{aligned} \nabla _{\mathbf {u}_\mathrm{L}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right)&=\nabla _{\mathbf {u}_\mathrm{L}}\Big [ \left( 1-\mathbf {u}_\mathrm{F}\right) ^2 + \left( 1-\mathbf {u}_\mathrm{L,1}\right) ^2 + \ldots \\&\qquad 100\left( \mathbf {u}_\mathrm{L,1}-{\mathbf {u}_\mathrm{F}}^2\right) ^2 + 100\left( \mathbf {u}_\mathrm{L,2}-{\mathbf {u}_\mathrm{L,1}}^2\right) ^2\Big ]\,\Big |_{\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) } \\&= \begin{bmatrix} \left( -2+2\mathbf {u}_{\mathrm{L},1}+200\mathbf {u}_{\mathrm{L},1}-200\mathbf {u}_\mathrm{F}^2 -400\mathbf {u}_{\mathrm{L},2}\mathbf {u}_{\mathrm{L},1}+400\mathbf {u}_{\mathrm{L},1}^3\right) \\ \left( 200\mathbf {u}_{\mathrm{L},2}-200\mathbf {u}_{\mathrm{L},1}^2\right) \end{bmatrix}\,\Big |_{\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) } \\&= \begin{bmatrix} -56.21&15.42 \end{bmatrix}^\mathrm{T} \ne \begin{bmatrix} 0&0 \end{bmatrix}^\mathrm{T}, \\ \nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right)&= -2+2\mathbf {u}_\mathrm{F}^\mathrm{d}-400 \mathbf {u}_{\mathrm{L},1}^\mathrm{d} \mathbf {u}_{\mathrm{F}}^\mathrm{d}+400{\mathbf {u}_\mathrm{F}^\mathrm{d}}^3=-27.20, \end{aligned}$$

the unique supporting hyperplane at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), also plotted in Fig. 7, can be written as

$$\begin{aligned} \Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) =\begin{bmatrix} -56.21&15.42 \end{bmatrix} \left( \begin{bmatrix} -0.27 \\ 0.15 \end{bmatrix} - \begin{bmatrix} \mathbf {u}_{\mathrm{L},1}\\ \mathbf {u}_{\mathrm{L},2}\end{bmatrix} \right) - 27.20 (-0.2-\mathbf {u}_\mathrm{F}) =0. \end{aligned}$$
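The gradient values above can be verified numerically; the following sketch evaluates the analytic gradient of the extended Rosenbrock function (24) at the desired equilibrium, with the coordinate ordering \((x_1,x_2,x_3)=(\mathbf {u}_\mathrm{F},\mathbf {u}_{\mathrm{L},1},\mathbf {u}_{\mathrm{L},2})\) implied by the expansion above.

```python
# Numerical check of the gradient values above, using the analytic gradient of
# the extended Rosenbrock function (24) and the coordinate ordering
# (x_1, x_2, x_3) = (u_F, u_{L,1}, u_{L,2}) implied by the expansion.

def grad_rosenbrock_ext(x):
    """Analytic gradient of f_e(x) = sum_i (1 - x_i)^2 + 100 (x_{i+1} - x_i^2)^2."""
    n = len(x)
    g = [0.0] * n
    for i in range(n - 1):
        g[i] += -2.0 * (1.0 - x[i]) - 400.0 * x[i] * (x[i + 1] - x[i] ** 2)
        g[i + 1] += 200.0 * (x[i + 1] - x[i] ** 2)
    return g

# Evaluate at the desired equilibrium (u_F^d, u_{L,1}^d, u_{L,2}^d) = (-0.2, -0.27, 0.15).
g = grad_rosenbrock_ext([-0.2, -0.27, 0.15])
```

Rounded to two decimals, the three components reproduce \(-27.20\), \(-56.21\), and \(15.42\), matching the displayed gradients.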
Fig. 7

Sublevel set \(\Lambda _\mathrm{d}\) and a part of the strictly supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) for the extended Rosenbrock function

Now, \(\gamma _\mathrm{L}(\cdot )\) characterized by (15) with \(\mathbf {R}_\mathrm{F}=1\) has associated matrices \(\mathbf {R}_\mathrm{L}\) according to (18), where \(\mathbf {R}_\mathrm{L}^0= \begin{bmatrix} (15.42/-27.20)&(-56.21/-27.20) \end{bmatrix}^\mathrm{T}\) is a particular solution of

$$\begin{aligned} \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \mathbf {R}_{\mathrm{L}}= -\nabla _{\mathbf {u}_\mathrm{F}}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) , \end{aligned}$$

and with the basis \(\mathcal {B}_{\mathrm{null}\left( \nabla _{\mathbf {u}_\mathrm{L}}^\mathrm{T}\mathcal {J}_\mathrm{F}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) } = \begin{bmatrix} 0.2645&0.9644 \end{bmatrix}^\mathrm{T},\) leading to the set of optimal affine solutions characterized by

$$\begin{aligned} \mathcal {R}_{\mathrm{L}}:= \left\{ \mathbf {R}_\mathrm{L} : \mathbf {R}_\mathrm{L}= \begin{bmatrix} (15.42/-27.20) \\ (-56.21/-27.20) \end{bmatrix} + \begin{bmatrix} 0.2645 \\ 0.9644 \end{bmatrix} \cdot \mathbf {T},\mathbf {T} \in \mathbb {R} \right\} . \end{aligned}$$

5 Constrained Decision Spaces

So far, the situation without constraints has been considered, and conditions have been provided under which an optimal affine leader function exists that leads to the desired reverse Stackelberg equilibrium point. These conditions are necessary but not sufficient for the existence of an optimal affine leader function in the constrained game, in which \(\Omega _\mathrm{L} \subsetneq \mathbb {R}^{n_\mathrm{L}}\) or \(\Omega _\mathrm{F}\subsetneq \mathbb {R}^{n_\mathrm{F}}\). In the constrained case, the additional complexity arises that the locally defined supporting hyperplane \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), or the tangent hyperspace \(\Pi ^\mathrm{t}_{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) for the case with \(n_\mathrm{L}> 1\) and \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \in \mathrm{int}(\mathrm{conv}(\Lambda _\mathrm{d}))\), should lie within the constrained decision space \(\Omega _\mathrm{L}\times \Omega _\mathrm{F}\). This implies that the supporting or tangent hyperplane should contain an \(n_\mathrm{F}\)-dimensional affine subspace \(\gamma _\mathrm{L}\) satisfying two conditions: (i) \(\gamma _\mathrm{L}\) covers \(\Omega _\mathrm{F}\), i.e., \(\mathrm{dom}(\gamma _\mathrm{L})= \Omega _\mathrm{F}\), and (ii) \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subseteq \Omega _\mathrm{L}\). However, since the hyperplanes are derived locally, it still has to be verified whether an optimal leader function \(\gamma _\mathrm{L}\) exists in the bounded decision space such that the global conditions (i) and (ii) hold.

Hence, given the set of feasible solutions characterized in Sect. 4, which is essentially developed for the unconstrained decision space, constraints can be incorporated to verify which elements of \(\Gamma _\mathrm{L}^*\) remain valid under the constrained conditions. Here, it should be noted that constraints on the decision spaces can obviously affect the desired equilibrium point \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) as well as the set \(\Lambda _\mathrm{d}\), both of which are assumed to be given in the conditions and the initial characterization of Sects. 3 and 4. In order to use these results, applicable constraints on the decision spaces should therefore naturally be incorporated in the computation of a desired leader equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) and in the derivation of the associated sublevel set \(\Lambda _\mathrm{d}\) at the initial stage.

The following simple example illustrates this approach.

Example 5.1

Consider a reverse Stackelberg game with the desired equilibrium \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) =(5,5)\) and a nonsmooth, convex sublevel set \(\Lambda _\mathrm{d}\) as depicted in Fig. 8.

For \(\Omega _\mathrm{L}=\Omega _\mathrm{F}=\mathbb {R}\), \(\gamma _\mathrm{L}\) characterized by (15) with \(\mathbf {R}_\mathrm{F}=1\) has associated matrices \(\mathbf {R}_\mathrm{L}\) according to (18) that can be described by the interval \(\mathcal {R}_\mathrm{L}=(-1,1)\). This can be derived from the two generating vectors of the normal cone \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) as depicted in Fig. 5a, i.e., \(\varvec{\nu }_1=\varvec{\nu }_2'=\begin{bmatrix} 1&1 \end{bmatrix}^\mathrm{T}\) and \(\varvec{\nu }_1'=\varvec{\nu }_2=\begin{bmatrix} -1&1 \end{bmatrix}^\mathrm{T}\). Multiplication with \(\mathbf {Q}=\begin{bmatrix} 1&0\end{bmatrix} \) yields \(\mathcal {R}_\mathrm{L}=(-1,1)\) and, given \(\mathbf {B}=\mathbf {R}_\mathrm{L}\mathbf {R}_\mathrm{F}^{-1}\) according to (16), the set of optimal affine solutions can be characterized by

$$\begin{aligned} \Gamma _\mathrm{L}^{*}= \left\{ \mathbf {u}_\mathrm{L}:=\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})= 5 + \mathbf {B}\cdot (\mathbf {u}_\mathrm{F}-5) : \mathbf {B}\in (-1,1), \mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F} \right\} . \end{aligned}$$

Note that in this particular case, an optimal affine mapping \(\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L}\) through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) coincides with a line \(\Pi _{\Lambda _\mathrm{d}}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \). In Fig. 8, this interval of possible slopes of \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^{*}\) is indicated by \(\alpha \).

Fig. 8

Example of a set \(\Gamma ^*_\mathrm{L}\) that is reduced under the consideration of constraints on \(\Omega _\mathrm{L},\Omega _\mathrm{F}\). Here, \(\alpha \) indicates the range of possible values of \(\mathbf {R}_\mathrm{L}\). Note that the dashed functions \(\gamma _\mathrm{L}^1,\gamma _\mathrm{L}^2\) have an infinitesimal gap with \(\mathrm{bd}(\Lambda _\mathrm{d})\) for values of \(\mathbf {u}_\mathrm{L}\) below or above \(\mathbf {u}_\mathrm{F}^\mathrm{d}\), respectively

Now, consider the decision space imposed by the constraints \(\Omega _\mathrm{L}=[0,10]\), \(\Omega _\mathrm{F}=[0,8]\). Not all mappings \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^*\) return values in \(\Omega _\mathrm{L}\) for all \(\mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F}\). Using the extreme points \((0,10)\), \((8,0)\), and \((8,10)\), one can derive the intersection points of affine functions through \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \) with the decision space boundaries \(\mathrm{bd}(\Omega _\mathrm{L}),\mathrm{bd}(\Omega _\mathrm{F})\). Instead of using the generating vectors of \(\mathcal {V}\left( \Lambda _\mathrm{d}\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \right) \) as in the unconstrained case, we are now interested in a smaller cone that is generated by the vectors \(\begin{bmatrix} -3/5&1 \end{bmatrix}^\mathrm{T},\begin{bmatrix} 3/5&1 \end{bmatrix}^\mathrm{T}\), resulting in a new range \(\mathcal {R}_\mathrm{L}=[-3/5,3/5]\). The full set of possible optimal affine leader functions can thus be characterized as follows:

$$\begin{aligned} \Gamma _\mathrm{L}^{*,\mathrm{con}}= \left\{ \mathbf {u}_\mathrm{L}:=\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F})= 5 + \mathbf {B}\cdot (\mathbf {u}_\mathrm{F}-5) : \mathbf {B}\in [-3/5, 3/5], \mathbf {u}_\mathrm{F}\in \Omega _\mathrm{F} \right\} . \end{aligned}$$

The new interval of possible slopes of \(\gamma _\mathrm{L}'\in \Gamma _\mathrm{L}^{*,\mathrm{con}}\) is indicated in Fig. 8 by \(\alpha '\).

Derivation of the set \(\Gamma _\mathrm{L}^{*,\mathrm{con}}\)

The possible optimal matrices \(\mathbf {B}\) of (16) computed in Sect. 4 for the cases with \(\mathcal {J}_\mathrm{F}(\cdot )\) differentiable and nondifferentiable, respectively, can be re-evaluated under the presence of constraints, where we assume the constrained decision spaces to be convex as well as closed and bounded, i.e., compact. In particular, one needs to verify whether \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subseteq \Omega _\mathrm{L}\).

  (a)

    In case only \(\Omega _\mathrm{F}\) is restricted while \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}\), we have that \(\Gamma _\mathrm{L}^{*,\mathrm{con}}= \Gamma _\mathrm{L}^*\), which can be concluded from the fact that \(\Gamma _\mathrm{L}^*\) is derived only locally based on \(\Lambda _\mathrm{d}\), while \(\Omega _\mathrm{L}=\mathbb {R}^{n_\mathrm{L}}\); hence, \(\gamma _\mathrm{L}(\Omega _\mathrm{F})\subset \Omega _\mathrm{L}\) still holds for all \(\gamma _\mathrm{L}\in \Gamma _\mathrm{L}^*\).

  (b)

    Further, the case with only \(\Omega _\mathrm{L}\) restricted while \(\Omega _\mathrm{F}=\mathbb {R}^{n_\mathrm{F}}\) is only feasible if \(\gamma _{\mathrm{L},i}(\Omega _\mathrm{F}) = \{c_i\}\subseteq \Omega _{\mathrm{L},i}\) for some \(c_i\in \mathbb {R}\) for every index \(i\) such that \(\Omega _{\mathrm{L},i}\varsubsetneq \mathbb {R}\), where \(\Omega _{\mathrm{L},i}\) is the projection of \(\Omega _\mathrm{L}\) onto the \(i\)-th coordinate and \(\gamma _{\mathrm{L},i}(\cdot )\) denotes the \(i\)-th component of the vector-valued function \(\gamma _\mathrm{L}(\cdot )\). Indeed, the realization of the leader function for each such \(i\)-th component of the leader's decision space should be constant in the follower decision variable, since for any nonconstant affine function \(\gamma _{\mathrm{L},i}(\cdot )\), \(\lim _{\mathbf {u}_\mathrm{F}\rightarrow \pm \infty } \gamma _{\mathrm{L},i}(\mathbf {u}_\mathrm{F})\) is not finite and therefore not contained in the compact set \(\Omega _{\mathrm{L},i}\varsubsetneq \mathbb {R}\).

  (c)

    In case both decision spaces are restricted by linear constraints, \(\Omega _\mathrm{L},\Omega _\mathrm{F}\) represent polytopes. Since the affine mapping \(\gamma _\mathrm{L}(\cdot )\) preserves convexity, its image \(\gamma _\mathrm{L}\left( \Omega _\mathrm{F}\right) \) is a polytope as well [38]. As a result, it can easily be checked whether the image of \(\gamma _\mathrm{L}\) over its domain \(\Omega _\mathrm{F}\) satisfies the linear constraints imposed by \(\Omega _\mathrm{L}\): it is sufficient to verify \(\gamma _\mathrm{L}(\mathbf {u}_\mathrm{F}^v)\in \Omega _\mathrm{L}\) only for the vertices \(\mathbf {u}_\mathrm{F}^v\) of \(\Omega _\mathrm{F}\).

    This result can also be used to obtain the reduced characterization of the set of optimal affine leader functions in the constrained case. If we denote the linear constraints on \(\Omega _\mathrm{L}\) by

    $$\begin{aligned} \mathbf {A}_\mathrm{L}\cdot \mathbf {u}_\mathrm{L}\le \mathbf {b}_\mathrm{L}, \mathbf {A}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{c}\times n_\mathrm{L}}, \mathbf {b}_\mathrm{L}\in \mathbb {R}^{n_\mathrm{c}}, n_\mathrm{c}\in \mathbb {N}, \end{aligned}$$

    one should add the following set of linear inequality constraints to the characterizations in (17) or (18), respectively, depending on differentiability of \(\mathcal {J}_\mathrm{F}(\cdot )\) at \(\left( \mathbf {u}_\mathrm{L}^\mathrm{d},\mathbf {u}_\mathrm{F}^\mathrm{d}\right) \), where \(\{\mathbf {u}_\mathrm{F}^v\}_{v=1}^{n_\mathrm{v}}\) denotes the set of vertices associated with \(\Omega _\mathrm{F}\):

    $$\begin{aligned} \left\{ \mathbf {A}_\mathrm{L}\cdot \left( \mathbf {u}_\mathrm{L}^\mathrm{d}+\mathbf {R}_\mathrm{L}(\mathbf {u}_\mathrm{F}^{v}-\mathbf {u}_\mathrm{F}^\mathrm{d}) \right) \le \mathbf {b}_\mathrm{L} \right\} _{v=1,\dots ,{n_\mathrm{v}}}. \end{aligned}$$
    (25)

    In the nondifferentiable case, this results in the characterization (17), in which the nonnegative parameters \(\alpha _{i,j}^s,\mu _{i,j}^s\) are now constrained by (25), as well as by \(\sum \limits _i\alpha _{i,j}^s= 1\), \(j = 1,\dots ,n_\mathrm{F}\), \(i = 1,\dots ,N_{s,j}^\mathrm{f}\), for \(s \in \{+,-\}\). Similarly, in the differentiable case, this results in the characterization (18), in which \(\mathbf {T}\) now belongs to the polyhedron \(\left\{ \mathbf {T}\in \mathbb {R}^{\mathrm{dim}(N)\times n_\mathrm{F}} : \text {(25) holds} \right\} \).

  (d)

    Finally, in the case of nonlinear constraints, a similar approach to that of case (c) can be adopted, based on a piecewise-affine approximation of the constraints. However, this may in general generate a large number of vertices. Further, in order to guarantee feasibility of the leader function, (conservative) inner approximations should be made with respect to the leader's decision space and outer approximations with respect to the follower's decision space. Alternatively, a fully numerical evaluation of the set

    $$\begin{aligned} \Gamma _\mathrm{L}^*:=\{ \gamma _\mathrm{L}:\Omega _\mathrm{F}\rightarrow \Omega _\mathrm{L} : \gamma _\mathrm{L} \text { according to (7) satisfying (5), (15)} \} \end{aligned}$$

    for the unconstrained case may be made, e.g., based on a gridding of the decision spaces.
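The vertex test of case (c) can be sketched as follows, reusing the numbers of Example 5.1 (\(\Omega _\mathrm{L}=[0,10]\), \(\Omega _\mathrm{F}=[0,8]\), desired equilibrium \((5,5)\)); the function names are illustrative only.

```python
# Sketch of the vertex test from case (c), with the numbers of Example 5.1
# (Omega_L = [0, 10], Omega_F = [0, 8], desired equilibrium (5, 5)): for an
# affine gamma_L, the image of the polytope Omega_F lies in Omega_L iff the
# images of the vertices of Omega_F do.

U_L_D, U_F_D = 5.0, 5.0            # desired equilibrium (u_L^d, u_F^d)

def gamma_L(u_F, B):
    # scalar affine leader function through the desired equilibrium
    return U_L_D + B * (u_F - U_F_D)

def feasible(B, omega_F_vertices=(0.0, 8.0), omega_L=(0.0, 10.0)):
    """Check gamma_L(Omega_F) subseteq Omega_L via the vertices only."""
    lo, hi = omega_L
    return all(lo <= gamma_L(v, B) <= hi for v in omega_F_vertices)
```

For instance, the boundary slope \(\mathbf {B}=3/5\) of Example 5.1 maps the vertices \(0\) and \(8\) of \(\Omega _\mathrm{F}\) to \(2\) and \(6.8\), both inside \(\Omega _\mathrm{L}\), whereas a steep slope such as \(2.5\) leaves \(\Omega _\mathrm{L}\).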

6 Conclusions

The single-leader–single-follower reverse Stackelberg game has been considered, in which the leader faces the problem of selecting a leader function, mapping the follower's decision space into her own decision space, that will lead to a specific desired equilibrium. Currently, many examples and applications in which this type of game is considered adopt strictly convex follower objective functionals and unconstrained decision spaces, in which case an affine leader function is automatically optimal. In order to make the reverse Stackelberg game more readily applicable as an optimization structure in multi-level control problems such as traffic tolling, a more general solution approach needs to be developed.

In this article, we have therefore first developed necessary and sufficient existence conditions and a characterization of the set of optimal affine leader functions that can be computed in a systematic manner. After such an initial set of locally feasible optimal affine functions is derived for unconstrained decision spaces, this set can be further reduced, in case the decision spaces are constrained, to include only those elements that map the full follower decision space into the leader's decision space. Secondary optimization criteria can be incorporated similarly.

In [39], subsequent results are provided to deal with cases in which no optimal affine solutions exist. There, several methods are provided for the computation of optimal nonlinear leader functions, e.g., with piecewise-affine and smooth (piecewise) polynomial structures. Since these methods are computationally intensive or cannot guarantee optimality, continued research on existence conditions for optimal leader functions is needed.

In particular, to further develop a systematic solution approach for the general reverse Stackelberg game, more results on dynamic extensions are needed. Both cases in which the follower does not adopt the optimal decision values and in which the leader lacks information on the follower that is crucial to derive an optimal strategy should be considered. Further steps include analysis of the robustness of a leader function in case of uncertain conditions.