
1 Introduction

In this chapter we consider bilevel optimization models with uncertain parameters. Such models can be classified based on the chronology of decision and observation as well as the nature of the uncertainty involved. A bilevel stochastic program arises if the uncertain parameter is the realization of some random vector with known distribution that can only be observed once the leader has submitted their decision. In contrast, the follower decides under complete information.

If upper and lower level objectives coincide, the bilevel stochastic program collapses to a classical stochastic optimization problem with recourse (cf. [1, Chap. 2]). Relations to other mathematical programming problems are explored in the seminal work [2] that also established the existence of solutions, Lipschitzian properties and directional differentiability of a risk-neutral formulation of a bilevel stochastic nonlinear model. Moreover, gradient descent and penalization methods were investigated to tackle discretely distributed stochastic mathematical programs with equilibrium constraints (SMPECs).

Reference [3] studies an application to topology optimization problems in structural mechanics. Many other applications are motivated by network related problems that inherit a natural order of successive decision making under uncertainty. Notable examples arise in telecommunications (cf. [4]), grid-based (energy) markets (cf. [5,6,7,8]) or transportation science (cf. [9, 10]). An extensive survey on bilevel stochastic programming literature is provided in [11, Chap. 1.4].

In two-stage stochastic bilevel programming, leader and follower each take two decisions: the decision on the respective first-stage variables is made in a here-and-now fashion, i.e. without knowledge of the realization of the random parameter. In contrast, the respective second-stage decisions are made in a wait-and-see manner, i.e. after observing the parameter (cf. [12]).

This chapter is organized as follows: In Sects. 17.2.1–17.2.5, we outline structural properties, existence and optimality conditions as well as stability results for bilevel stochastic linear problems, paying special attention to the modelling of risk-aversion via coherent or convex risk measures or stochastic dominance constraints. Sections 17.2.6 and 17.2.7 are devoted to the algorithmic treatment of bilevel stochastic linear problems where the underlying distribution is finite discrete. An application of two-stage stochastic bilevel programming in the context of network pricing is discussed in Sect. 17.3. The chapter concludes with an overview of potential challenges for future research.

2 Bilevel Stochastic Linear Optimization

While the analysis in this section is confined to bilevel stochastic linear problems with random right-hand side, the concepts and underlying principles can easily be transferred to stochastic extensions of more complex bilevel programming models.

2.1 Preliminaries

We consider the optimistic formulation of a parametric bilevel linear program

$$\displaystyle \begin{aligned} \min_x \left\{c^\top x + \min_y \{q^\top y \; | \; y \in \Psi(x,z)\} \; | \; x \in X \right\}, \end{aligned} $$
(P(z))

where \(X \subseteq \mathbb {R}^n\) is a nonempty polyhedron, \(c \in \mathbb {R}^n\) and \(q \in \mathbb {R}^m\) are vectors, \(z \in \mathbb {R}^s\) is a parameter, and the lower level optimal solution set mapping \(\Psi : \mathbb {R}^n \times \mathbb {R}^s \rightrightarrows \mathbb {R}^m\) is given by

$$\displaystyle \begin{aligned}\Psi(x,z) := \underset{y}{\mbox{Argmin}} \; \{d^\top y \; | \; Ay \leq Tx + z\} \end{aligned}$$

with matrices \(A \in \mathbb {R}^{s \times m},\ T \in \mathbb {R}^{s \times n}\) and a vector \(d \in \mathbb {R}^m\). Let \(f: \mathbb {R}^n \times \mathbb {R}^s \to \mathbb {R} \cup \lbrace \pm \infty \rbrace \) denote the mapping

$$\displaystyle \begin{aligned}f(x,z) := c^\top x + \min_y \{q^\top y \; | \; y \in \Psi(x,z)\}. \end{aligned}$$
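For concrete data, the value f(x, z) can be computed by solving two linear programs in sequence: the follower's problem determines the optimal value of d^⊤y over {y | Ay ≤ Tx + z}, and the optimistic selection then minimizes q^⊤y over the follower's optimal face. The following sketch uses scipy.optimize.linprog and, purely for illustration, encodes the one-dimensional follower from the example in Sect. 17.2.2 with the constant offsets of its right-hand side absorbed into z; it is a sketch, not part of the formal development.

```python
import numpy as np
from scipy.optimize import linprog

def f_value(x, z, A, T, c, d, q):
    """Evaluate f(x,z) = c^T x + min{q^T y : y in Psi(x,z)} via two LPs."""
    b = T @ x + z
    m = A.shape[1]
    # Follower's problem: min d^T y subject to A y <= T x + z.
    low = linprog(d, A_ub=A, b_ub=b, bounds=[(None, None)] * m)
    if not low.success:
        return np.inf  # lower level infeasible or unbounded
    # Optimistic selection: min q^T y over the optimal face
    # {y : A y <= b, d^T y <= optimal value} (up to solver tolerance).
    sel = linprog(q, A_ub=np.vstack([A, d]), b_ub=np.append(b, low.fun),
                  bounds=[(None, None)] * m)
    return np.inf if not sel.success else c @ x + sel.fun

# One-dimensional follower of Sect. 17.2.2: y >= 1, y <= x+2+z1, y <= -x+8.5+z2,
# written as A y <= T x + z with the constants (-1, 2, 8.5) absorbed into z.
A = np.array([[-1.0], [1.0], [1.0]])
T = np.array([[0.0], [1.0], [-1.0]])
c = np.zeros(1); d = np.array([-1.0]); q = np.array([1.0])
z = np.array([-1.0, 2.0, 8.5])          # realization z1 = z2 = 0
print(f_value(np.array([2.0]), z, A, T, c, d, q))   # min Psi(2,(0,0)) = 4.0
```

Choosing q = (1) makes the selection step return min Ψ(x, z) itself, which matches the objective used in the example of Sect. 17.2.2.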

Lemma 17.2.1

Assume dom f ≠ ∅; then f is real-valued and Lipschitz continuous on the polyhedron \(P = \{(x,z) \in \mathbb {R}^n \times \mathbb {R}^s \; | \; \exists y \in \mathbb {R}^m: Ay \leq Tx + z\}\). △

Proof

By Eaves [13, Sect. 3], ∅ ≠ dom f ⊆ dom Ψ implies dom Ψ = P. Consequently, the linear program in the definition of f(x, z) is solvable for any (x, z) ∈ P by parametric linear programming theory (see [14, Sect. 2.3.2]). Consider any (x, z), (x′, z′) ∈ P. Without loss of generality, assume that f(x, z) ≥ f(x′, z′) and let y′∈ Ψ(x′, z′) be such that f(x′, z′) = c^⊤x′ + q^⊤y′. Following [15] we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} |f(x,z)-f(x',z')| \; &\displaystyle = \; f(x,z)- c^\top x' - q^\top y' \; \leq \; c^\top x + q^\top y - c^\top x' - q^\top y' \\ &\displaystyle \leq \; \|c\| \|x-x'\| + \|q\| \|y-y'\| \end{array} \end{aligned} $$

for any y ∈ Ψ(x, z). Let \(\mathbb {B}\) denote the Euclidean unit ball; then [16, Theorem 4.2] yields a constant 0 < Λ < ∞ such that

$$\displaystyle \begin{aligned}\Psi(x',z')\subseteq \Psi(x,z)+ \Lambda \|(x,z)-(x',z')\|\mathbb{B} \end{aligned}$$

and hence |f(x, z) − f(x′, z′)| ≤ (∥c∥ +  Λ∥q∥)∥(x, z) − (x′, z′)∥. □

Remark 17.2.2

An alternative proof of Lemma 17.2.1 is given in [17, Theorem 1]. However, the arguments above can easily be extended to lower level problems with a convex quadratic objective function and linear constraints. △

Linear programming theory (cf. [18]) provides a verifiable necessary and sufficient condition for dom f ≠ ∅:

Lemma 17.2.3

dom f ≠ ∅ holds if and only if there exists \((x,z) \in \mathbb {R}^n \times \mathbb {R}^s\) such that

  1. a.

    {y | Ay ≤ Tx + z} is nonempty,

  2. b.

    there is some \(u \in \mathbb {R}^s\) satisfying A^⊤u = d and u ≤ 0, and

  3. c.

    the function y ↦ q^⊤y is bounded from below on Ψ(x, z).

Under these conditions,

$$\displaystyle \begin{aligned}\min_{y} \lbrace q^\top y \; | \; y \in \Psi(x',z') \rbrace \end{aligned}$$

is attained for any (x′, z′) ∈ P.
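Conditions (a) and (b) of Lemma 17.2.3 are linear feasibility problems and can therefore be checked numerically; here (b) is read as feasibility of the lower-level dual, i.e. the existence of some u ≤ 0 with A^⊤u = d. A sketch with scipy.optimize.linprog on the follower's system from the example of Sect. 17.2.2 (the concrete data are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def lower_level_feasible(A, b):
    """Condition (a): is {y : A y <= b} nonempty? (zero-objective LP)."""
    m = A.shape[1]
    return linprog(np.zeros(m), A_ub=A, b_ub=b,
                   bounds=[(None, None)] * m).success

def dual_feasible(A, d):
    """Condition (b): does some u <= 0 satisfy A^T u = d?"""
    s = A.shape[0]
    return linprog(np.zeros(s), A_eq=A.T, b_eq=d,
                   bounds=[(None, 0.0)] * s).success

# Follower of Sect. 17.2.2 in the form A y <= b with b = T x + z for x = 2:
A = np.array([[-1.0], [1.0], [1.0]])    # y >= 1, y <= 4, y <= 6.5
b = np.array([-1.0, 4.0, 6.5])
d = np.array([-1.0])
print(lower_level_feasible(A, b), dual_feasible(A, d))   # True True
```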

2.2 Bilevel Stochastic Linear Programming Models

A bilevel stochastic program arises if the parameter z = Z(ω) in P(z) is the realization of a known random vector Z on some probability space \((\Omega , \mathcal {F}, \mathbb {P})\) and we assume the following chronology of decision and observation: the leader decides x, then the realization z = Z(ω) is observed, and finally the follower decides y.

Throughout the analysis, we assume the stochasticity to be purely exogenous, i.e. the distribution of Z to be independent of x.

Let \(\mu _Z := \mathbb {P} \circ Z^{-1} \in \mathcal {P}(\mathbb {R}^s)\) denote the Borel probability measure induced by Z. We shall assume dom f ≠ ∅ and that the lower level problem is feasible for any leader’s decision and any realization of the randomness, i.e.

$$\displaystyle \begin{aligned}X \subseteq P_Z := \lbrace x \in \mathbb{R}^n \; | \; (x,z) \in P \; \forall z \in \mbox{supp} \; \mu_Z \rbrace. \end{aligned}$$

In two-stage stochastic programming, a similar assumption is known as relatively complete recourse (cf. [1, Sect. 2.1.3]). In this setting, each leader’s decision x ∈ X gives rise to a random variable f(x, Z(⋅)). We thus may fix any mapping \(\mathcal {R}: \mathcal {X} \to \mathbb {R}\), where \(\mathcal {X}\) is a linear subspace of \(L^0(\Omega , \mathcal {F}, \mathbb {P})\) that contains the constants and satisfies

$$\displaystyle \begin{aligned}\lbrace f(x,Z(\cdot)) \; | \; x \in X \rbrace \subseteq \mathcal{X}, \end{aligned}$$

and consider the bilevel stochastic program

$$\displaystyle \begin{aligned} \min_x \left\{\mathcal{R}[f(x,Z(\cdot))] \; | \; x \in X \right\}. \end{aligned} $$
(17.2.1)

Under suitable moment or boundedness conditions on Z, the classical L p-spaces \(L^p(\Omega , \mathcal {F}, \mathbb {P})\) with p ∈ [1, ∞] are natural choices for the domain \(\mathcal {X}\) of \(\mathcal {R}\). We define

$$\displaystyle \begin{aligned}\mathcal{M}^p_s := \left\{ \mu \in \mathcal{P}(\mathbb{R}^s) \; | \; \int_{\mathbb{R}^s} \|z\|{}^p~\mu(dz) < \infty \right\}, \end{aligned}$$

which denotes the set of Borel probability measures on \(\mathbb {R}^s\) with finite moments of order p ∈ [1, ∞), and the set

$$\displaystyle \begin{aligned}\mathcal{M}^\infty_s := \left\{ \mu \in \mathcal{P}(\mathbb{R}^s) \; | \; \mbox{supp} \; \mu \; \text{is bounded} \right\}. \end{aligned}$$

Lemma 17.2.4

Assume dom f ≠ ∅ and \(\mu _Z \in \mathcal {M}^p_s\)for some p ∈ [1, ∞]. Then the mapping \(F: P_Z \to L^0(\Omega , \mathcal {F}, \mathbb {P})\)given by F(x) := f(x, Z(⋅)) takes values in \(L^p(\Omega , \mathcal {F}, \mathbb {P})\)and is Lipschitz continuous with respect to the L p-norm.

Proof

We first consider the case that p is finite. By (0, 0) ∈ P and Lemma 17.2.1, there exists a constant L f such that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \| F(x) \|{}_{L^p}^p &\displaystyle \leq 2^p |f(0,0)|{}^p + 2^p \int_{\mathbb{R}^s} |f(x,z) - f(0,0)|{}^p~\mu_Z(dz) \\ &\displaystyle \leq 2^p |f(0,0)|{}^p + 2^p L_f^p \|x\|{}^p + 2^p L_f^p \int_{\mathbb{R}^s} \|z\|{}^p~\mu_Z(dz) < \infty \end{array} \end{aligned} $$

holds for any x ∈ P Z. Furthermore, for any x, x′∈ P Z we have

$$\displaystyle \begin{aligned}\| F(x) - F(x') \|{}_{L^p} = \left( \int_{\mathbb{R}^s} |f(x,z) - f(x',z)|{}^p~\mu_Z(dz) \right)^{1/p} \leq L_f \|x - x' \|. \end{aligned} $$

For p = ∞, Lemma 17.2.1 implies that for any fixed x ∈ P Z, the mapping f(x, ⋅) is continuous on supp μ Z. Thus, \(\mu _Z \in \mathcal {M}^\infty _s\) yields

$$\displaystyle \begin{aligned}\|F(x)\|{}_{L^\infty} \leq \sup_{z \in \mbox{supp} \; \mu_Z} |f(x,z)| < \infty.\end{aligned} $$

Moreover, for any x, x′∈ P Z we have

$$\displaystyle \begin{aligned}\| F(x) - F(x') \|{}_{L^\infty} \leq \sup_{z \in \mbox{supp} \; \mu_Z} |f(x,z) - f(x',z)| \leq L_f \|x-x'\|.\end{aligned} $$

The mapping \(\mathcal {R}\) in (17.2.1) can be used to measure the risk associated with the random variable F(x).

Definition 17.2.5

A mapping \(\mathcal {R}: \mathcal {X} \to \mathbb {R}\) defined on some linear subspace \(\mathcal {X}\) of \(L^0(\Omega , \mathcal {F}, \mathbb {P})\) containing the constants is called a convex risk measure if the following conditions are fulfilled:

  1. a.

    (Convexity) For any \(Y_1, Y_2 \in \mathcal {X}\) and λ ∈ [0, 1] we have

    $$\displaystyle \begin{aligned}\mathcal{R}[\lambda Y_1 + (1-\lambda)Y_2] \leq \lambda \mathcal{R}[Y_1] + (1 - \lambda) \mathcal{R}[Y_2]. \end{aligned}$$
  2. b.

    (Monotonicity) \(\mathcal {R}[Y_1] \leq \mathcal {R}[Y_2]\) for all \(Y_1, Y_2 \in \mathcal {X}\) satisfying Y 1 ≤ Y 2 with respect to the \(\mathbb {P}\)-almost sure partial order.

  3. c.

    (Translation equivariance) \(\mathcal {R}[Y + t] = \mathcal {R}[Y] + t\) for all \(Y \in \mathcal {X}\) and \(t \in \mathbb {R}\).

A convex risk measure \(\mathcal {R}\) is coherent if the following holds true:

  1. d.

    (Positive homogeneity) \(\mathcal {R}[t Y] = t \cdot \mathcal {R}[Y]\) for all \(Y \in \mathcal {X}\) and t ∈ [0, ∞). △

Definition 17.2.6

A mapping \(\mathcal {R}: \mathcal {X} \to \mathbb {R}\) is called law-invariant if for all \(Y_1, Y_2 \in \mathcal {X}\) with \(\mathbb {P} \circ Y_1^{-1} = \mathbb {P} \circ Y_2^{-1}\) we have \(\mathcal {R}[Y_1] = \mathcal {R}[Y_2]\). △

Coherent risk measures have been introduced in [19], while the analysis of convex risk measures dates back to [20]. A thorough discussion of their analytical traits is provided in [21]. Below we list some risk measures that are commonly used in stochastic programming (cf. [1, Sect. 6.3.2]).

Examples

  1. (a)

    The expectation \(\mathbb {E}: L^1(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\),

    $$\displaystyle \begin{aligned}\mathbb{E}[Y] = \int_\Omega Y(\omega)~\mathbb{P}(d\omega), \end{aligned}$$

    is a law-invariant and coherent risk measure that turns (17.2.1) into the risk neutral bilevel stochastic program

    $$\displaystyle \begin{aligned}\min_x \left\{\mathbb{E}[F(x)] \; | \; x \in X \right\}. \end{aligned}$$
  2. (b)

    The expected excess of order p ∈ [1, ∞) over a predefined level \(\eta \in \mathbb {R}\) is the mapping \(\mbox{EE}_{\eta }^p: L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) given by

    $$\displaystyle \begin{aligned}\mbox{EE}_{\eta}^p[Y] := \Big( \mathbb{E}\big[\max \lbrace Y - \eta, 0 \rbrace^p \big] \Big)^{1/p}. \end{aligned}$$

    \(\mbox{EE}_{\eta }^p\) is law-invariant, convex and nondecreasing, but neither translation-equivariant nor positively homogeneous (cf. [1, Example 6.22]).

  3. (c)

    The mean upper semideviation of order p ∈ [1, ∞) is the mapping \(\mbox{SD}^p_\rho : L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) defined by

    $$\displaystyle \begin{aligned}\mbox{SD}_\rho^p[Y] := \mathbb{E}[Y] + \rho \cdot \mbox{EE}_{\mathbb{E}[Y]}^p[Y] = \mathbb{E}[Y] + \rho \cdot \Big( \mathbb{E}\big[\max \lbrace Y - \mathbb{E}[Y], 0 \rbrace^p \big] \Big)^{1/p}, \end{aligned}$$

    where ρ ∈ (0, 1] is a parameter. \(\mbox{SD}_\rho ^p\) is a law-invariant coherent risk measure (cf. [1, Example 6.20]).

  4. (d)

    The excess probability \(\mbox{EP}_\eta : L^0(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) over a prescribed target level \(\eta \in \mathbb {R}\) given by

    $$\displaystyle \begin{aligned}\mbox{EP}_\eta[Y] = \mathbb{P}[\lbrace \omega \in \Omega \; | \; Y(\omega) > \eta \rbrace], \end{aligned}$$

    is nondecreasing and law-invariant. However, it lacks convexity, translation-equivariance and positive homogeneity (cf. [22, Example 2.29]).

  5. (e)

    The Value-at-Risk \(\mbox{VaR}_\alpha : L^0(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) at level α ∈ (0, 1) defined by

    $$\displaystyle \begin{aligned}\mbox{VaR}_{\alpha}[Y] := \inf \lbrace \eta \in \mathbb{R} \; | \; \mathbb{P}[\lbrace \omega \in \Omega \; | \; Y(\omega) \leq \eta \rbrace] \geq \alpha \rbrace \end{aligned}$$

    is law-invariant, nondecreasing, translation-equivariant and positively homogeneous, but in general not convex (cf. [23]).

  6. (f)

    The Conditional Value-at-Risk \(\mbox{CVaR}_\alpha : L^1(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) at level α ∈ (0, 1) given by

    $$\displaystyle \begin{aligned}\mbox{CVaR}_{\alpha}[Y] := \inf \lbrace \eta + \frac{1}{1-\alpha}\mbox{EE}_{\eta}^1[Y] \; | \; \eta \in \mathbb{R} \rbrace \end{aligned}$$

    is a law-invariant coherent risk measure (cf. [23, Proposition 2]). The variational representation above was established in [24, Theorem 10].

  7. (g)

    The entropic risk measure \(\mbox{Entr}_{\alpha }: L^\infty (\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) defined by

    $$\displaystyle \begin{aligned}\mbox{Entr}_{\alpha}[Y] := \frac{1}{\alpha} \ln \Big( \mathbb{E} \big[\exp(\alpha Y) \big] \Big), \end{aligned}$$

    where α > 0 is a parameter, is a law-invariant convex (but not coherent) risk measure (cf. [21, Example 4.13, Example 4.34]).

  8. (h)

    The worst-case risk measure \(\mathcal {R}_{\max }: L^\infty (\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) given by

    $$\displaystyle \begin{aligned}\mathcal{R}_{\max}[Y] := \sup_{\omega \in \Omega} Y(\omega) \end{aligned}$$

    is law-invariant and coherent (cf. [21, Example 4.8]). This choice of \(\mathcal {R}\) in (17.2.1) leads to the bilevel robust problem

    $$\displaystyle \begin{aligned}\min_x \left\{\mathcal{R}_{\max}[F(x)] \; | \; x \in X \right\}. \end{aligned}$$

    Note that \(\mathcal {R}_{\max }\) only depends on the so-called uncertainty set \(Z(\Omega ) \subseteq \mathbb {R}^s\). Thus, a bilevel robust problem can be formulated without knowledge of the distribution of the uncertain parameter. In robust optimization, the uncertainty set is often assumed to be finite, polyhedral or ellipsoidal (cf. [25]).
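For a finite sample (or a finitely discrete distribution), the risk measures listed above reduce to elementary operations on the vector of outcomes. The sketch below gives empirical versions under the loss-oriented sign conventions used here; for the Conditional Value-at-Risk it exploits that the infimum in the variational representation is attained at η = VaR_α (cf. [24]). The concrete sample is purely illustrative.

```python
import numpy as np

def expectation(y):                       # (a)
    return float(np.mean(y))

def expected_excess(y, eta, p=1):         # (b): order-p excess over eta
    return float(np.mean(np.maximum(y - eta, 0.0) ** p) ** (1.0 / p))

def mean_upper_semideviation(y, rho, p=1):  # (c)
    return expectation(y) + rho * expected_excess(y, expectation(y), p)

def excess_probability(y, eta):           # (d)
    return float(np.mean(y > eta))

def value_at_risk(y, alpha):              # (e): smallest eta with P[Y <= eta] >= alpha
    ys = np.sort(y)
    return float(ys[max(int(np.ceil(alpha * len(ys))) - 1, 0)])

def cvar(y, alpha):                       # (f): variational representation,
    eta = value_at_risk(y, alpha)         #      infimum attained at eta = VaR
    return eta + expected_excess(y, eta, 1) / (1.0 - alpha)

y = np.array([1.0, 2.0, 3.0, 4.0])        # uniform four-point loss distribution
print(value_at_risk(y, 0.75), cvar(y, 0.75))   # 3.0 4.0
```

For this four-point distribution, CVaR at level 0.75 is the average of the worst quarter of the outcomes, i.e. 4.0, which the variational formula reproduces.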

Remark 17.2.7

The set of convex (coherent) risk measures on \(L^p(\Omega , \mathcal {F}, \mathbb {P})\) is a convex cone for any fixed p ∈ [1, ∞]. In particular, if \(\mathcal {R}: L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) is a convex (coherent) risk measure, then so is \(\mathbb {E} + \rho \cdot \mathcal {R}\) for any ρ ≥ 0. The mean-risk bilevel stochastic programming model

$$\displaystyle \begin{aligned}\min_x \left\{\mathbb{E}[F(x)] + \rho \cdot \mathcal{R}[F(x)] \; | \; x \in X \right\} \end{aligned}$$

seeks to minimize a weighted sum of the expected value of the outcome and a quantification of risk.△

Example

Consider the bilevel stochastic problem

$$\displaystyle \begin{aligned}\min \left\{ \mathcal{R}[\min \Psi(x,Z)] \; | \; 1 \leq x \leq 6 \right\}, \end{aligned}$$

where Ψ(x, z) := Argminy{−y | y ≥ 1, y ≤ x + 2 + z 1, y ≤−x + 8.5 + z 2} and assume that Z is uniformly distributed over the square [−0.5, 0.5]2.

As can be seen in Fig. 17.1, we have

$$\displaystyle \begin{aligned}\Psi(x,z) = \begin{cases} \lbrace x + 2 +z_1 \rbrace &\text{if} \; x \leq 3.25 + 0.5z_2 - 0.5z_1 \\ \lbrace -x + 8.5 + z_2 \rbrace &\text{else} \end{cases} \end{aligned} $$

for any x ∈ [1, 6] and z ∈ [−0.5, 0.5]2. A straightforward calculation shows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{E}[\min \Psi(x,Z)] &\displaystyle = \int_{-0.5}^{0.5} \int_{-0.5}^{0.5} x + 2 + z_1~dz_1~dz_2 \\ &\displaystyle = x+2 \end{array} \end{aligned} $$

holds for any x ∈ [1, 2.75]. Similarly, we have

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{E}[\min \Psi(x,Z)] &\displaystyle = \int_{-0.5}^{2x-6} \int_{-0.5}^{-2x + 6.5 + z_2} x + 2 + z_1~dz_1~dz_2 \\ &\displaystyle + \int_{2x-6}^{0.5} \int_{-0.5}^{0.5} x + 2 + z_1~dz_1~dz_2 \\ &\displaystyle +\int_{6-2x}^{0.5} \int_{-0.5}^{2x - 6.5 + z_1} -x + 8.5 + z_2~dz_2~dz_1 \\ &\displaystyle = -\frac{4}{3}x^3 + 11x^2 - \frac{117}{4} x + \frac{1427}{48} \end{array} \end{aligned} $$

for x ∈ [2.75, 3.25] and

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{E}[\min \Psi(x,Z)] &\displaystyle = \int_{2x-7}^{0.5} \int_{-0.5}^{-2x + 6.5 + z_2} x + 2 + z_1~dz_1~dz_2 \\ &\displaystyle +\int_{-0.5}^{7-2x} \int_{-0.5}^{2x - 6.5 + z_1} -x + 8.5 + z_2~dz_2~dz_1 \\ &\displaystyle +\int_{7-2x}^{0.5} \int_{-0.5}^{0.5} -x + 8.5 + z_2~dz_2~dz_1 \\ &\displaystyle = \frac{4}{3} x^3 - 15 x^2 + \frac{221}{4} x - \frac{989}{16} \end{array} \end{aligned} $$

for x ∈ [3.25, 3.75]. Finally, for x ∈ [3.75, 6] we calculate

$$\displaystyle \begin{aligned} \begin{array}{rcl} \mathbb{E}[\min \Psi(x,Z)] &\displaystyle = \int_{-0.5}^{0.5} \int_{-0.5}^{0.5} -x + 8.5 + z_2~dz_2~dz_1 \\ &\displaystyle = -x + 8.5. \end{array} \end{aligned} $$

Thus, \(\mathbb {E}[\min \Psi (\cdot \,,Z)]\) is piecewise polynomial, non-convex and non-differentiable. It is easy to check that x^* = 6 is a global minimizer of the risk-neutral model

$$\displaystyle \begin{aligned}\min \left\{ \mathbb{E}[\min \Psi(x,Z)] \; | \; 1 \leq x \leq 6 \right\}. \end{aligned}$$

In this particular example, x^* is also a global minimizer of the bilevel robust problem

$$\displaystyle \begin{aligned}\min \left\{ \mathcal{R}_{\max}[\min \Psi(x,Z)] \; | \; 1 \leq x \leq 6 \right\}. \end{aligned} $$
Fig. 17.1 The bold line depicts the graph of Ψ(⋅ , (0, 0)), while the dotted lines correspond to the graphs of Ψ(⋅ , (±0.5, ±0.5)) and Ψ(⋅ , (∓0.5, ±0.5))
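The piecewise expressions for \(\mathbb {E}[\min \Psi (\cdot \,,Z)]\) derived above can be cross-checked by Monte Carlo simulation, using that Ψ(x, z) is the singleton {min(x + 2 + z 1, −x + 8.5 + z 2)} for the relevant x and z. A small sketch (sample size and seed are arbitrary choices):

```python
import numpy as np

def closed_form(x):
    """E[min Psi(x,Z)] for x in [1,6], as derived above."""
    if x <= 2.75:
        return x + 2.0
    if x <= 3.25:
        return -4/3 * x**3 + 11 * x**2 - 117/4 * x + 1427/48
    if x <= 3.75:
        return 4/3 * x**3 - 15 * x**2 + 221/4 * x - 989/16
    return -x + 8.5

rng = np.random.default_rng(0)
z = rng.uniform(-0.5, 0.5, size=(200_000, 2))   # Z uniform on [-0.5, 0.5]^2
for x in (2.0, 3.0, 3.5, 5.0):
    mc = np.minimum(x + 2 + z[:, 0], -x + 8.5 + z[:, 1]).mean()
    print(x, closed_form(x), round(float(mc), 3))  # columns 2 and 3 agree
```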

2.3 Continuity and Differentiability

Continuity properties of \(\mathcal {R}\) carry over to Lipschitzian properties of \(\mathcal {Q}_{\mathcal {R}}: P_Z \to \mathbb {R}\), \(\mathcal {Q}_{\mathcal {R}}(x) := \mathcal {R}[F(x)]\).

Proposition 17.2.8

Assume dom f ≠ ∅ and \(\mu _Z \in \mathcal {M}^p_s\)for some p ∈ [1, ∞]. Then the following statements hold true for any \(\mathcal {R}: L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}:\)

  1. a.

    \(\mathcal {Q}_{\mathcal {R}}\)is locally Lipschitz continuous if \(\mathcal {R}\)is convex and continuous.

  2. b.

    \(\mathcal {Q}_{\mathcal {R}}\)is locally Lipschitz continuous if \(\mathcal {R}\)is convex and nondecreasing.

  3. c.

    \(\mathcal {Q}_{\mathcal {R}}\)is locally Lipschitz continuous if \(\mathcal {R}\)is a convex risk measure .

  4. d.

    \(\mathcal {Q}_{\mathcal {R}}\)is Lipschitz continuous if \(\mathcal {R}\)is Lipschitz continuous.

  5. e.

    \(\mathcal {Q}_{\mathcal {R}}\)is Lipschitz continuous if \(\mathcal {R}\)is a coherent risk measure.

Proof

  1. a.

    It is well-known that any real-valued convex and continuous mapping on a normed space is locally Lipschitz continuous (cf. [26]). The result is thus an immediate consequence of Lemma 17.2.4.

  2. b.

    Any real-valued, convex and nondecreasing functional on the Banach lattice \(L^p(\Omega , \mathcal {F}, \mathbb {P})\) is continuous (see e.g. [27, Theorem 4.1]).

  3. c.

    By definition, any convex risk measure is convex and nondecreasing.

  4. d.

    This is a straightforward conclusion from Lemma 17.2.4.

  5. e.

    Any coherent risk measure on \(L^p(\Omega , \mathcal {F}, \mathbb {P})\) is Lipschitz continuous by Inoue [28, Lemma 2.1].

Remark 17.2.9

Any coherent risk measure \(\mathcal {R}: L^\infty (\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) is Lipschitz continuous with constant 1 by Föllmer and Schied [21, Lemma 4.3]. Concrete Lipschitz constants for continuous coherent law-invariant risk measures \(\mathcal {R}: L^p (\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) with p ∈ [1, ∞) may be obtained from representation results (see e.g. [29]). △

Proposition 17.2.8 allows us to formulate sufficient conditions for the existence of optimal solutions to the bilevel stochastic linear program (17.2.1):

Corollary 17.2.10

Assume dom f ≠ ∅, \(\mu _Z \in \mathcal {M}^p_s\)for some p ∈ [1, ∞] and let X ⊆ P Z be nonempty and compact. Then (17.2.1) is solvable for any convex and nondecreasing mapping \(\mathcal {R}: L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\). △

Due to the lack of convexity, Proposition 17.2.8 and the subsequent corollary do not apply to the excess probability and the Value-at-Risk. However, invoking Lemma 17.2.1, the arguments used in the proof of [30, Proposition 3.3] can be adapted to the setting of bilevel stochastic linear programming:

Proposition 17.2.11

Assume dom f ≠ ∅ and fix \(\eta \in \mathbb {R}\); then \(\mathcal {Q}_{\mathit{\mbox{EP}}_\eta }\)is lower semicontinuous on P Z and continuous at any x ∈ P Z satisfying

$$\displaystyle \begin{aligned}\mu_Z[\lbrace z \in \mathbb{R}^s \; | \; f(x,z) = \eta \rbrace] = 0. \end{aligned}$$

Furthermore, let X ⊆ P Z be nonempty and compact. Then

$$\displaystyle \begin{aligned}\min_x \left\{ \mathit{\mbox{EP}}_\eta [F(x)] \; | \; x \in X \right\} \end{aligned}$$

is solvable.

\(\mathcal {Q}_{\mbox{VaR}_\alpha }\) has been analyzed in [17, Theorem 2]:

Proposition 17.2.12

Assume dom f ≠ ∅ and α ∈ (0, 1); then \(\mathcal {Q}_{\mathit{\mbox{VaR}}_\alpha }\)is continuous. Moreover, let X ⊆ P Z be nonempty and compact. Then

$$\displaystyle \begin{aligned}\min_x \left\{\mathit{\mbox{VaR}}_\alpha[F(x)] \; | \; x \in X \right\} \end{aligned}$$

is solvable.

For specific risk measures, sufficient conditions for differentiability of \(\mathcal {Q}_{\mathcal {R}}\) have been investigated in [31].

Proposition 17.2.13

Assume dom f ≠ ∅ and that \(\mu _Z \in \mathcal {M}^1_s\)is absolutely continuous with respect to the Lebesgue measure. Fix any \(\eta \in \mathbb {R}\); then \(\mathcal {Q}_{\mathbb {E}}\)and \(\mathcal {Q}_{\mathit{\mbox{EE}}_\eta ^1}\)are continuously differentiable at any x 0 ∈int P Z. Furthermore, for any ρ ∈ [0, 1), \(\mathcal {Q}_{\mathit{\mbox{SD}}_\rho ^1}\)is continuously differentiable at any x 0 ∈int P Z satisfying \(\mathcal {Q}_{\mathbb {E}}(x_0) \neq 0\). △

Remark 17.2.14

Theorems 3.7, 3.8 and 3.9 in [31] provide more involved sufficient conditions for continuous differentiability of \(\mathcal {Q}_{\mathbb {E}}\), \(\mathcal {Q}_{\mbox{EE}_\eta ^1}\) and \(\mathcal {Q}_{\mbox{SD}_\rho ^1}\) that do not require μ Z to be absolutely continuous.△

Remark 17.2.15

Note that the assumptions of Proposition 17.2.13 are not fulfilled in the example at the end of Sect. 17.2.2: The right-hand side of the restriction system is only partially random as the right-hand side of the restriction y ≥ 1 does not depend on Z. If we extend the system to

$$\displaystyle \begin{aligned}y \leq x + 2 + z_1^{\prime}, \; y \leq -x + 8.5 + z_2^{\prime}, \; y \geq 1 + z_3^{\prime}, \end{aligned}$$

the third component of the extended random vector Z′ has to take the value 0 with probability 1. Thus, \(\mathbb {P} \circ Z^{\prime -1}\) is not absolutely continuous with respect to the Lebesgue measure. △

In the presence of differentiability, necessary optimality conditions for (17.2.1) can be formulated in terms of directional derivatives (cf. [31, Corollary 3.10]).

Proposition 17.2.16

Assume dom f ≠ ∅, \(\mu _Z \in \mathcal {M}^p_s\)and X ⊆ P Z. Furthermore, let x 0 ∈ X be a local minimizer of problem (17.2.1) and assume that \(\mathcal {Q}_{\mathcal {R}}\)is differentiable at x 0. Then

$$\displaystyle \begin{aligned}\mathcal{Q}^{\prime}_{\mathcal{R}}(x_0)v \geq 0 \end{aligned}$$

holds for any feasible direction

$$\displaystyle \begin{aligned}v \in \lbrace v \in \mathbb{R}^n \; | \; \exists \epsilon_0 > 0: \; x_0 + \epsilon v \in X \; \forall \epsilon \in [0,\epsilon_0] \rbrace. \end{aligned}$$

2.4 Stability

While we have considered \(\mathcal {Q}_{\mathcal {R}}\) only as a function of the leader’s decision x so far, it also depends on the underlying probability measure μ Z. In stochastic programming, incomplete information about the true underlying distribution or the need for computational efficiency may lead to optimization models that employ an approximation of μ Z. This section deals with the behaviour of optimal values and (local) optimal solution sets of (17.2.1) under perturbations of the underlying distribution.

Taking into account that the support of the perturbed measure may differ from the original support, we shall assume dom f ≠ ∅ and

$$\displaystyle \begin{aligned}P = \mathbb{R}^n \times \mathbb{R}^s \end{aligned}$$

to ensure that the objective function of (17.2.1) remains well defined. The corresponding assumption in two-stage stochastic programming is called complete recourse (cf. [1, Sect. 2.1.3]). Sufficient conditions for dom f ≠ ∅ and \(P = \mathbb {R}^n \times \mathbb {R}^s\) are given in [17, Corollary 1] and [17, Corollary 2]. The following characterization is a direct consequence of Gordan’s Lemma (cf. [32]):

Lemma 17.2.17

\(P = \mathbb {R}^n \times \mathbb {R}^s\)holds if and only if u = 0 is the only non-negative solution to A^⊤u = 0. △
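The criterion of Lemma 17.2.17 can be checked by a single linear program: u = 0 is the only non-negative solution of A^⊤u = 0 exactly when max{𝟙^⊤u | A^⊤u = 0, 0 ≤ u ≤ 1} has optimal value 0 (the box bound rules out unboundedness along a ray). A sketch with scipy.optimize.linprog on illustrative matrices:

```python
import numpy as np
from scipy.optimize import linprog

def complete_recourse(A, tol=1e-9):
    """True iff u = 0 is the only u >= 0 with A^T u = 0 (Lemma 17.2.17),
    i.e. iff max{1^T u : A^T u = 0, 0 <= u <= 1} has value 0."""
    s = A.shape[0]
    res = linprog(-np.ones(s), A_eq=A.T, b_eq=np.zeros(A.shape[1]),
                  bounds=[(0.0, 1.0)] * s)
    return bool(res.success and -res.fun < tol)

# y <= b1, y <= b2 is feasible for every right-hand side:
print(complete_recourse(np.array([[1.0], [1.0]])))    # True
# y <= b1, -y <= b2 requires b1 + b2 >= 0, so P is a proper subset:
print(complete_recourse(np.array([[1.0], [-1.0]])))   # False
```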

Throughout this section, we shall consider the situation that \(\mathcal {R}: L^p(\Omega , \mathcal {F}, \mathbb {P}) \to \mathbb {R}\) with p ∈ [1, ∞) is law-invariant, convex and nondecreasing. Furthermore, for the sake of notational simplicity (cf. [31, Remark 4.1]), we assume that the probability space \((\Omega , \mathcal {F}, \mathbb {P})\) is atomless, i.e. for any \(A \in \mathcal {F}\) with \(\mathbb {P}[A] > 0\) there exists some \(B \in \mathcal {F}\) with \(B \subsetneq A\) and \(\mathbb {P}[A] > \mathbb {P}[B] > 0\).

Then for any x ∈ X and \(\mu \in \mathcal {M}^p_s\), we have \((\delta _x \otimes \mu ) \circ f^{-1} \in \mathcal {M}^p_1\), where \(\delta _x \in \mathcal {P}(\mathbb {R}^n)\) denotes the Dirac measure at x. The atomlessness of \((\Omega , \mathcal {F}, \mathbb {P})\) ensures that there exists some \(Y_{(x,\mu )} \in L^p(\Omega , \mathcal {F}, \mathbb {P})\) such that \(\mathbb {P} \circ Y_{(x,\mu )}^{-1} = (\delta _x \otimes \mu ) \circ f^{-1}\). Thus, we may consider the mapping \(\mathcal {Q}_{\mathcal {R}}: X \times \mathcal {M}^p_s \to \mathbb {R}\) defined by

$$\displaystyle \begin{aligned}\mathcal{Q}_{\mathcal{R}}(x,\mu) := \mathcal{R}[Y_{(x,\mu)}]. \end{aligned}$$

Note that the specific choice of Y (x,μ) does not matter due to the law-invariance of \(\mathcal {R}\).

Consider the parametric optimization problem

$$\displaystyle \begin{aligned} \min_x \lbrace \mathcal{Q}_{\mathcal{R}}(x,\mu) \; | \; x \in X \rbrace. \end{aligned} $$
(P μ )

As (Pμ ) may be non-convex, we shall pay special attention to sets of locally optimal solutions. For any open set \(V \subseteq \mathbb {R}^n\) we introduce the localized optimal value function \(\varphi _V: \mathcal {M}^p_s \to \overline {\mathbb {R}}\),

$$\displaystyle \begin{aligned}\varphi_V(\mu) := \min_x \lbrace \mathcal{Q}_{\mathcal{R}}(x,\mu) \; | \; x \in X \cap \mbox{cl} \; V \rbrace, \end{aligned}$$

as well as the localized optimal solution set mapping \(\phi _V: \mathcal {M}^p_s \rightrightarrows \mathbb {R}^n\),

$$\displaystyle \begin{aligned}\phi_V(\mu) := \underset{x}{\mbox{Argmin}} \lbrace \mathcal{Q}_{\mathcal{R}}(x,\mu) \; | \; x \in X \cap \mbox{cl} \; V \rbrace. \end{aligned}$$

It is well known that additional assumptions are needed when studying stability of local solutions.

Definition 17.2.18

Given \(\mu \in \mathcal {M}^p_s\) and an open set \(V \subseteq \mathbb {R}^n\), ϕ V(μ) is called a complete local minimizing (CLM) set of (Pμ ) w.r.t. V  if ∅ ≠ ϕ V(μ) ⊆ V . △

Remark 17.2.19

The set of global optimal solutions \(\phi _{\mathbb {R}^n}(\mu )\) and any set of isolated minimizers are CLM sets. However, sets of strict local minimizers may fail to be CLM sets (cf. [33]). △

In the following, we shall equip \(\mathcal {P}(\mathbb {R}^s)\) with the topology of weak convergence, i.e. the topology in which a sequence \(\{\mu _l\}_{l \in \mathbb {N}} \subset \mathcal {P}(\mathbb {R}^s)\) converges weakly to \(\mu \in \mathcal {P}(\mathbb {R}^s)\), written \(\mu _l \xrightarrow {w} \mu \), if and only if

$$\displaystyle \begin{aligned} \lim_{l \to \infty} \int_{\mathbb{R}^s} h(t)~\mu_l(dt) = \int_{\mathbb{R}^s} h(t)~\mu(dt) \end{aligned}$$

holds for any bounded continuous function \(h:\mathbb {R}^s \to \mathbb {R}\) (cf. [34]). The example below (cf. [22, Example 3.2]) shows that even \(\varphi _{\mathbb {R}^n}\) may fail to be weakly continuous on the entire space \(\mathcal {P}(\mathbb {R}^s)\).

Example

The problem

$$\displaystyle \begin{aligned}\min_x \left\{x + \int_{\mathbb{R}} z~\mu(dz) \; | \; 0 \leq x \leq 1 \right\} \end{aligned}$$

arises from a bilevel stochastic linear problem, where \(\mathcal {R} = \mathbb {E}\) and \(\Psi (x,z) = \{z\} \subsetneq \mathbb {R}\) holds for any (x, z). Assume that μ is the Dirac measure at 0, then the above problem can be rewritten as

$$\displaystyle \begin{aligned}\min_x \{x \; | \; 0 \leq x \leq 1\} \end{aligned}$$

and its optimal value is 0.

However, while the sequence \(\mu _l := (1 - \frac {1}{l}) \delta _{0} + \frac {1}{l} \delta _l\) converges weakly to δ 0, replacing μ with μ l yields the problem

$$\displaystyle \begin{aligned}\min_x \left\{x + 1 \; | \; 0 \leq x \leq 1 \right\}, \end{aligned}$$

whose optimal value is equal to 1 for any \(l \in \mathbb {N}\).
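The failure of weak continuity can be reproduced numerically: integrals of any bounded continuous test function under μ l converge to the corresponding integral under δ 0, while the unbounded integrand z of the objective keeps the value 1 along the whole sequence. A minimal sketch (the test function min(z, 1) is an arbitrary bounded choice):

```python
def integral(mu, h):
    """Integrate h against a finitely supported measure mu = [(point, mass)]."""
    return sum(mass * h(point) for point, mass in mu)

def mu_l(l):
    """mu_l = (1 - 1/l) * delta_0 + (1/l) * delta_l."""
    return [(0.0, 1 - 1 / l), (float(l), 1 / l)]

bounded = lambda t: min(t, 1.0)   # bounded continuous test function
integrand = lambda t: t           # the (unbounded) integrand of the objective

for l in (10, 100, 1000):
    print(integral(mu_l(l), bounded), integral(mu_l(l), integrand))
# first column tends to 0 = integral under delta_0 (weak convergence),
# second column stays at 1, so the optimal values do not converge
```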

We shall follow the approach of [22, 31] and [35] and confine the stability analysis to locally uniformly ∥⋅∥p-integrating sets.

Definition 17.2.20

A set \(\mathcal {M} \subseteq \mathcal {M}^p_s\) is said to be locally uniformly ∥⋅∥p-integrating if for any \(\mu \in \mathcal {M}\) and any 𝜖 > 0 there exists some open neighborhood \(\mathcal {N}\) of μ w.r.t. the topology of weak convergence such that

$$\displaystyle \begin{aligned}\lim_{a \to \infty} \sup_{\nu \in \mathcal{M} \cap \mathcal{N}} \int_{\mathbb{R}^s \setminus a \mathbb{B}} \|z\|{}^p~\nu(dz) \leq \epsilon. \end{aligned}$$

A detailed discussion of locally uniformly ∥⋅∥p-integrating sets and their generalizations is provided in [21, 36, 37], and [38]. The following examples demonstrate the relevance of the concept.

Examples

  1. (a)

    Fix κ, 𝜖 > 0. Then by Föllmer and Schied [21, Corollary A.47 (c)], the set

    $$\displaystyle \begin{aligned}\mathcal{M}(\kappa, \epsilon) := \left\{ \mu \in \mathcal{P}(\mathbb{R}^s) \; | \; \int_{\mathbb{R}^s} \|z\|{}^{p + \epsilon}~\mu(dz) \leq \kappa \right\} \end{aligned}$$

    of Borel probability measures with uniformly bounded moments of order p + 𝜖 is locally uniformly ∥⋅∥p-integrating.

  2. (b)

    Fix any compact set \(\Xi \subset \mathbb {R}^s\). By Föllmer and Schied [21, Corollary A.47, (b)], the set

    $$\displaystyle \begin{aligned}\{\mu \in \mathcal{P}(\mathbb{R}^s) \; | \; \mu[\Xi] = 1\} \end{aligned}$$

    of Borel probability measures whose support is contained in Ξ is locally uniformly ∥⋅∥p-integrating.
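Connecting back to the instability example: for p = 1 the sequence \(\mu _l\) above has first moments identically equal to 1, but its moments of order 1 + 𝜖 grow without bound, so it is not contained in any set \(\mathcal {M}(\kappa , \epsilon )\). A quick numerical check (plain Python; the helper `moment` is ours):

```python
def moment(l, order):
    """Moment of the given order of mu_l = (1 - 1/l) delta_0 + (1/l) delta_l."""
    return (1 - 1 / l) * 0.0 ** order + (1 / l) * float(l) ** order

# first moments stay bounded (identically 1) ...
for l in (10, 100, 1000):
    assert abs(moment(l, 1) - 1.0) < 1e-12

# ... but moments of order 1 + eps grow like l ** eps (here eps = 0.5)
assert abs(moment(100, 1.5) - 10.0) < 1e-9
assert abs(moment(10000, 1.5) - 100.0) < 1e-9
```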

The following result has been established in [31, Theorem 4.7]:

Theorem 17.2.21

Assume dom f ≠ ∅ and \(P = \mathbb {R}^n \times \mathbb {R}^s\), and let \(\mathcal {M} \subseteq \mathcal {M}^{p}_{s}\) be locally uniformly ∥⋅∥p-integrating. Then

  1. a.

    \(\mathcal {Q}_{\mathcal {R}}|{ }_{\mathbb {R}^n \times \mathcal {M}}\) is real-valued and weakly continuous.

  2. b.

    \(\varphi _{\mathbb {R}^n}|{ }_{\mathcal {M}}\) is weakly upper semicontinuous.

In addition, assume that \(\mu _0 \in \mathcal {M}\) is such that \(\phi _V(\mu _0)\) is a CLM set of \((P_{\mu _0})\) w.r.t. some open bounded set \(V \subsetneq \mathbb {R}^n\). Then the following statements hold true:

  1. c.

\(\varphi _V|{ }_{\mathcal {M}}\) is weakly continuous at \(\mu _0\).

  2. d.

\(\phi _V|{ }_{\mathcal {M}}\) is weakly upper semicontinuous at \(\mu _0\) in the sense of Berge (cf. [39]), i.e. for any open set \(\mathcal {O} \subseteq \mathbb {R}^{n}\) with \(\phi _V(\mu _0) \subseteq \mathcal {O}\) there exists a weakly open neighborhood \(\mathcal {N}\) of \(\mu _0\) such that \(\phi _V(\mu ) \subseteq \mathcal {O}\) for all \(\mu \in \mathcal {N} \cap \mathcal {M}\).

  3. e.

There exists some weakly open neighborhood \(\mathcal {U}\) of \(\mu _0\) such that \(\phi _V(\mu )\) is a CLM set of \((P_{\mu })\) w.r.t. V for any \(\mu \in \mathcal {U} \cap \mathcal {M}\). △

Proof

Fix any \(x_0 \in \mathbb {R}^n\). By Lemma 17.2.1, f is Lipschitz continuous on \(\mathbb {R}^n \times \mathbb {R}^s\). Thus, there exists a constant L > 0 such that

$$\displaystyle \begin{aligned} |f(x,z)| \leq L \|z\| + L\|x-x_0\| + |f(x_0, 0)| \end{aligned}$$

and the result follows from [35, Corollary 2.4.]. □

Remark 17.2.22

Under the assumptions of Theorem 17.2.21d., any accumulation point x of a sequence of local optimal solutions \(x_l \in \phi _V(\mu _l)\), where \(\mu _l \in \mathcal {M}\) converges weakly to \(\mu _0\), is a local optimal solution of \((P_{\mu _0})\). A detailed discussion of Berge’s notion of upper semicontinuity and related concepts is provided in [40, Chap. 5].△

As any Borel probability measure is the weak limit of a sequence of measures having finite support, Theorem 17.2.21 justifies an approach where the true underlying measure is approximated by a sequence of finite discrete ones. It is well known that approximation schemes based on discretization via empirical estimation [41, 42] or conditional expectations [43, 44] produce weakly converging sequences of discrete probability measures under mild assumptions.
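As an illustration of empirical estimation, the following sketch (Python/NumPy; the target value is the known mean of the positive part of a standard normal, and all names are ours) approximates an integral w.r.t. the true measure by the integral w.r.t. the empirical measure of n samples, which is a finite discrete measure with weights 1∕n:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_expectation(f, n):
    """Integral of f w.r.t. the empirical measure of n i.i.d. standard
    normal samples, i.e. the sample mean of f over the atoms."""
    return float(np.mean(f(rng.standard_normal(n))))

# E[max(Z, 0)] = 1/sqrt(2*pi) for a standard normal random variable
target = 1.0 / np.sqrt(2.0 * np.pi)
approx = empirical_expectation(lambda z: np.maximum(z, 0.0), 200_000)
assert abs(approx - target) < 1e-2
```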

Remark 17.2.23

All results of Sects. 17.2.1–17.2.4 can be easily extended to the pessimistic approach to bilevel stochastic linear programming, where f takes the form

$$\displaystyle \begin{aligned}f(x,z) = c^\top x - \min_y \{- q^\top y \; | \; y \in \Psi(x,z)\} \end{aligned}$$

(cf. [22, Chap. 4]).△

2.5 Stochastic Dominance Constraints

One possibility to model the minimization in

$$\displaystyle \begin{aligned}\min \lbrace f(x,Z(\cdot)) \; | \; x \in X \rbrace \end{aligned}$$

is to perform it w.r.t. some risk measure that maps f(x, Z(⋅)) into the reals, as introduced in Sect. 17.2. In this section, we shall discuss an alternative approach, where a disutility function \(g: \mathbb {R}^n \to \mathbb {R}\) is minimized over some subset of random variables of acceptable risk:

$$\displaystyle \begin{aligned} \min_x \left\{g(x) \; | \; x \in X,\; f(x, Z(\cdot)) \in \mathcal{A}\right\}, \end{aligned}$$

where \(\mathcal {A} \subseteq f(X,Z) := \{f(x, Z(\cdot )) \; | \; x \in X\}\). The following cases are of particular interest (cf. [1, pp. 90–91]):

Examples

  1. (a)

    \(\mathcal {A}\) is given by probabilistic constraints, i.e.

    $$\displaystyle \begin{aligned}\mathcal{A}_{\text{pc}} = \{h \in f(X,Z) \; | \; \mathbb{P}[h \leq \beta_j] \geq p_j \; \forall j= 1, \ldots, l\} \end{aligned}$$

for bounds \(\beta _1, \ldots , \beta _l \in \mathbb {R}\) and safety levels \(p_1, \ldots , p_l \in (0, 1)\).

  2. (b)

    \(\mathcal {A}\) is given by first-order stochastic dominance constraints, i.e.

    $$\displaystyle \begin{aligned}\mathcal{A}_{\text{fo}} = \{h \in f(X,Z) \; | \; \mathbb{P}[h \leq \beta] \geq \mathbb{P}[b \leq \beta] \; \forall \beta \in \mathbb{R}\}, \end{aligned}$$

    where \(b \in L^{0}(\Omega , \mathcal {F}, \mathbb {P})\) is a given benchmark variable. If b is discrete with a finite number of realizations, it is sufficient to impose the relation \(\mathbb {P}[h \leq \beta ] \geq \mathbb {P}[b \leq \beta ]\) for any β in a finite subset of \(\mathbb {R}\). In this case, \(\mathcal {A}\) admits a description by a finite system of probabilistic constraints.

  3. (c)

    \(\mathcal {A}\) is given by second-order stochastic dominance constraints, i.e.

    $$\displaystyle \begin{aligned}\mathcal{A}_{\text{so}} = \{h \in f(X,Z) \; | \; \mathbb{E}[\max\{h - \eta, 0\}] \leq \mathbb{E}[\max\{b - \eta, 0\}] \; \forall \eta \in \mathbb{R}\}, \end{aligned}$$

    where \(b \in L^{1}(\Omega , \mathcal {F}, \mathbb {P})\) is a given benchmark variable.

A discussion of general models involving probabilistic or stochastic dominance constraints can be found in [1, Chap. 8] and [45, Chap. 8.3].
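For finite discrete distributions, both dominance relations above reduce to finitely many inequalities: as noted in Example (b), the CDF constraints need only be tested at the atoms, and the same holds for the expected-excess constraints in (c). A sketch of both checks (Python/NumPy; the helper functions are ours):

```python
import numpy as np

def dominates_fo(h_atoms, h_probs, b_atoms, b_probs):
    """First-order dominance: P[h <= beta] >= P[b <= beta] for all beta.
    For discrete h, b it suffices to test beta at the atoms."""
    cdf = lambda a, p, beta: float(p[a <= beta].sum())
    return all(cdf(h_atoms, h_probs, beta) >= cdf(b_atoms, b_probs, beta) - 1e-12
               for beta in np.union1d(h_atoms, b_atoms))

def dominates_so(h_atoms, h_probs, b_atoms, b_probs):
    """Second-order dominance: E[max(h - eta, 0)] <= E[max(b - eta, 0)]
    for all eta; again the atoms suffice for discrete h, b, since the
    difference of the two expected excesses is piecewise linear with
    kinks only at the atoms and constant outside their range."""
    ee = lambda a, p, eta: float(p @ np.maximum(a - eta, 0.0))
    return all(ee(h_atoms, h_probs, eta) <= ee(b_atoms, b_probs, eta) + 1e-12
               for eta in np.union1d(h_atoms, b_atoms))

h = (np.array([1.0, 2.0]), np.array([0.5, 0.5]))  # outcome h (cost: smaller is better)
b = (np.array([2.0, 3.0]), np.array([0.5, 0.5]))  # benchmark b
assert dominates_fo(*h, *b) and dominates_so(*h, *b)
assert not dominates_fo(*b, *h)
```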

Let \(\nu := \mathbb {P} \circ b^{-1} \in \mathcal {P}(\mathbb {R})\) denote the distribution of the benchmark variable b. Then the feasible set under first-order stochastic dominance constraints admits the representation

$$\displaystyle \begin{aligned}\left\{ x \in X \; | \; \mu_Z\big[ \lbrace z \in \mathbb{R}^s \; | \; f(x,z) \leq \beta \rbrace \big] \geq \nu\big[ \lbrace b \in \mathbb{R} \; | \; b \leq \beta \rbrace \big] \; \forall \beta \in \mathbb{R} \right\}. \end{aligned}$$

Similarly, for second-order stochastic dominance constraints, \(\mu \in \mathcal {M}^1_s\) and \(\nu \in \mathcal {M}^1_1\), the feasible set takes the form

$$\displaystyle \begin{aligned}\left\{ x \in X \, | \int_{\mathbb{R}^s} \max \lbrace f(x,z) - \eta, 0 \rbrace~\mu_Z(dz) \leq \int_{\mathbb{R}} \max \lbrace b - \eta, 0 \rbrace~\nu(db) \; \forall \eta \in \mathbb{R} \right\}. \end{aligned}$$

In both cases, feasibility depends only on the distribution of the underlying random vector. As in Sect. 17.2.4, we consider situations where μ Z is replaced with an approximation and study the behaviour of the mappings \(\mathcal {C}_1: \mathcal {P}(\mathbb {R}^s) \rightrightarrows \mathbb {R}^n\) defined by

$$\displaystyle \begin{aligned}\mathcal{C}_1(\mu) = \left\{ x \in X \; | \; \mu \big[ \lbrace z \in \mathbb{R}^s \; | \; f(x,z) \leq \beta \rbrace \big] \geq \nu\big[ \lbrace b \in \mathbb{R} \; | \; b \leq \beta \rbrace \big] \; \forall \beta \in \mathbb{R} \right\} \end{aligned}$$

and \(\mathcal {C}_2: \mathcal {M}^1_s \rightrightarrows \mathbb {R}^n\) given by

$$\displaystyle \begin{aligned}\mathcal{C}_2(\mu) := \left\{ x \in X \, | \int_{\mathbb{R}^s} \max \lbrace f(x,z) - \eta, 0 \rbrace~\mu(dz) \leq \int_{\mathbb{R}} \max \lbrace b - \eta, 0 \rbrace~\nu(db) \, \forall \eta \in \mathbb{R} \right\}. \end{aligned}$$

Invoking Lemma 17.2.1, the following result can be obtained by adapting the proofs of [46, Proposition 2.1] and [47, Proposition 2.2]:

Proposition 17.2.24

Assume dom f ≠ ∅ and \(P = \mathbb {R}^n \times \mathbb {R}^s\). Then the following statements hold true:

  1. a.

The multifunction \(\mathcal {C}_1\) is closed w.r.t. the topology of weak convergence, i.e. for any sequences \(\lbrace \mu _{l} \rbrace _{l} \subset \mathcal {P}(\mathbb {R}^{s})\) and \(\lbrace x_{l} \rbrace _{l} \subset \mathbb {R}^{n}\) with \(\mu _l \to \mu \) weakly and \(x_{l} \rightarrow x \in \mathbb {R}^{n}\) as l →∞, and \(x_l \in \mathcal {C}_1(\mu _l)\) for all \(l \in \mathbb {N}\), it holds true that \(x \in \mathcal {C}_1(\mu )\).

  2. b.

Additionally assume that \(\nu \in \mathcal {M}^1_1\); then the multifunction \(\mathcal {C}_2\) is closed w.r.t. the topology of weak convergence.

By considering the constant sequence \(\mu _l = \mu \) for all \(l \in \mathbb {N}\), we obtain the closedness of the sets \(\mathcal {C}_1(\mu )\) and \(\mathcal {C}_2(\mu )\) under the conditions of Proposition 17.2.24. The closedness of the multifunctions \(\mathcal {C}_1\) and \(\mathcal {C}_2\) is also the key to proving the following stability result (cf. [46, Proposition 2.5]):

Theorem 17.2.25

Assume dom f ≠ ∅, \(P = \mathbb {R}^n \times \mathbb {R}^s\) and that X is nonempty and compact. Moreover, let g be lower semicontinuous. Then the following statements hold true:

  1. a.

The optimal value function \(\varphi _1: \mathcal {P}(\mathbb {R}^{s}) \to \mathbb {R} \cup \lbrace \infty \rbrace \) given by

$$\displaystyle \begin{aligned}\varphi_1(\mu) := \inf \lbrace g(x) \; | \; x \in \mathcal{C}_1(\mu)\rbrace \end{aligned}$$

is weakly lower semicontinuous on dom \(\mathcal {C}_1\).

  2. b.

Additionally assume \(\nu \in \mathcal {M}^1_1\); then the function \(\varphi _2: \mathcal {M}^1_s \to \mathbb {R} \cup \lbrace \infty \rbrace \) given by

$$\displaystyle \begin{aligned}\varphi_2(\mu) := \inf \lbrace g(x) \; | \; x \in \mathcal{C}_2(\mu)\rbrace \end{aligned}$$

is weakly lower semicontinuous on dom \(\mathcal {C}_2\). △

2.6 Finite Discrete Distributions

Throughout this section, we shall assume that the underlying random vector Z is discrete with a finite number of realizations \(Z_1, \ldots , Z_K \in \mathbb {R}^s\) and respective probabilities \(\pi _1, \ldots , \pi _K \in (0, 1]\). Let I denote the index set {1, …, K}; then \(P_Z\) takes the form

$$\displaystyle \begin{aligned}P_Z = \lbrace x \in \mathbb{R}^n \; | \; \forall k \in I \; \exists y \in \mathbb{R}^m: \; Ay \leq Tx + Z_k \rbrace. \end{aligned}$$

Suppose that \(x_0 \in X\) is such that \(\lbrace y \in \mathbb {R}^m \; | \; Ay \leq Tx_0 + Z_k \rbrace = \emptyset \) holds for some k ∈ I. Then the probability of \(f(x_0, Z(\omega )) = \infty \) is at least \(\pi _k > 0\), i.e. \(x_0\) should be considered as infeasible for problem (17.2.1). Consequently, the restriction \(x \in P_Z\) can be understood as an induced constraint. Note that \(X \cap P_Z\) is a polyhedron if X is a polyhedron.
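The induced constraint can be checked scenario by scenario via LP feasibility. A sketch using `scipy.optimize.linprog` (recent SciPy) on a hypothetical toy instance, with all data invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def in_P_Z(x, A, T, scenarios):
    """x lies in P_Z iff for every realization Z_k the lower-level
    feasible set {y : A y <= T x + Z_k} is nonempty (checked here by
    solving an LP with zero objective)."""
    for Z_k in scenarios:
        res = linprog(np.zeros(A.shape[1]), A_ub=A, b_ub=T @ x + Z_k,
                      bounds=[(None, None)] * A.shape[1])
        if res.status == 2:  # status 2 signals an infeasible scenario
            return False
    return True

# toy data: constraints y <= x + z_1 and -y <= z_2, i.e. -z_2 <= y <= x + z_1
A = np.array([[1.0], [-1.0]])
T = np.array([[1.0], [0.0]])
Z_1, Z_2 = np.array([0.5, 0.0]), np.array([-2.0, 0.0])
assert in_P_Z(np.array([1.0]), A, T, [Z_1])           # 0 <= y <= 1.5: feasible
assert not in_P_Z(np.array([1.0]), A, T, [Z_1, Z_2])  # y <= -1 and y >= 0: empty
```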

In this setting, the bilevel stochastic linear problem can be reduced to a standard bilevel program, which allows one to adapt optimality conditions and algorithms designed for the deterministic case (cf. [48]).

Proposition 17.2.26

Assume dom f ≠ ∅, \(\mathcal {R} \in \lbrace \mathbb {E}, \mathit{\mbox{EE}}_\eta ^1, \mathit{\mbox{SD}}^1_\rho , \mathit{\mbox{EP}}_{\eta }, \mathit{\mbox{VaR}}_{\alpha }, \mathit{\mbox{CVaR}}_{\alpha }, \mathcal {R}_{\max } \rbrace \) and let \(X \cap P_Z\) be a polyhedron. If \(\mathcal {R} \in \lbrace \mathit{\mbox{EP}}_{\eta }, \mathit{\mbox{VaR}}_{\alpha } \rbrace \), additionally assume that X is bounded. Then for any parameter β, there exists a constant M > 0 such that the bilevel stochastic linear problem

$$\displaystyle \begin{aligned}\min_x \left\{ \mathcal{R}[F(x)] \; | \; x \in X \right\} \end{aligned}$$

is equivalent to the standard bilevel program

$$\displaystyle \begin{aligned}\min_x \Big\{\underbrace{\ \;\inf_{\eta \in \mathbb{R}}}_{\mathit{\text{if}}\ \mathcal{R} = \mathit{\mbox{CVaR}}_{\alpha}}\ \min_w \big\{a(x, w) \; | \; w \in \Psi_{\mathcal{R}}(x)\big\}| \; x \in X\Big\},\ \mathit{\text{or}} \end{aligned}$$
$$\displaystyle \begin{aligned}\min_x \Big\{c^\top x + \inf_{\eta \in \mathbb{R}}\big\{\eta \; | \; \min_w \{a(x, w) \; | \; w \in \Psi_{\mathcal{R}}(x)\} \geq \alpha\big\}| \; x \in X\Big\}\quad \mathit{\text{if}}\ \mathcal{R}= \mathit{\mbox{VaR}}_{\alpha}, \end{aligned}$$

where \(\Psi _{\mathcal {R}}: \mathbb {R}^n \rightrightarrows \mathbb {R}^{\mathit{\mbox{dim}}\;w}\) with

$$\displaystyle \begin{aligned}\Psi_{\mathcal{R}}(x) := \underset{w}{\mathit{\mbox{Argmin}}} \; \left\{\sum_{k \in I}d^\top y_k \; | \; Ay_k \leq Tx + Z_k,\;b_k(x, w) \geq 0\ \forall k \in I\right\}. \end{aligned} $$

The specific formulations can be found in Table 17.1. △

Table 17.1 Equivalent bilevel linear programs

Proof

For \(\mathcal {R} \in \lbrace \mathbb {E}, \mbox{EE}_\eta ^1, \mbox{SD}^1_\rho , \mbox{CVaR}_\alpha \rbrace \), we refer to [31, Section 5].

For the excess probability, the first of the considered quantile-based risk measures, we have \(\mbox{EP}_{\eta }[F(x)] = \mathbb {P}\left [c^\top x + \inf _y \{q^\top y \; | \; y \in \Psi (x, Z(\cdot ))\} > \eta \right ]\). Fix \(M \in \mathbb {R}\) such that

$$\displaystyle \begin{aligned} M > \sup\big\{c^\top x + \inf_{y_k} \big\{q^\top y_k \; | \; y_k \in \Psi(x, Z_k)\big\} \; | \; x \in X,\; k \in I\big\} - \eta\end{aligned} $$

and, for y k ∈ Ψ(x, Z k), let

$$\displaystyle \begin{aligned} \theta_k := \begin{cases} 0 & \text{if}\ c^\top x + q^\top y_k - \eta \leq 0,\\ 1 & \text{otherwise.} \end{cases} \end{aligned} $$

Then the excess probability is equal to

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \sum_{k \in I}\pi_k \inf_{y_k, \theta_k} \left\{\theta_k \; | \; M\theta_k \geq c^\top x + q^\top y_k - \eta,\; y_k \in \Psi(x, Z_k),\; \theta_k \in \{0, 1\}\right\}\\ &\displaystyle \left. \begin{aligned} = \inf_{\genfrac{}{}{0pt}{}{y_1, \ldots, y_K,}{\theta_1, \ldots, \theta_K}} \left\{\sum_{k \in I}\pi_k \theta_k \; | \; \right.&\displaystyle M\theta_k \geq c^\top x + q^\top y_k - \eta,\; y_k \in \Psi(x, Z_k),\; \theta_k \in \{0, 1\}\\ &\displaystyle \forall k \in I \end{aligned} \right\}. \end{array} \end{aligned} $$

Similar to the proof for \(\mathcal {R} = \mbox{EP}_\eta \), the expression \(\mathbb {P}[f(x,Z(\cdot )) \leq \eta ]\) equals

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle \sum_{k \in I}\pi_k \inf_{y_k, \theta_k} \left\{\theta_k \; | \; M(1 - \theta_k) \geq c^\top x + q^\top y_k - \eta,\; y_k \in \Psi(x, Z_k),\; \theta_k \in \{0, 1\} \right\}\\ &\displaystyle \left. \begin{aligned} = \inf_{\genfrac{}{}{0pt}{}{y_1, \ldots, y_K,}{\theta_1, \ldots, \theta_K}} \left\{\sum_{k \in I}\pi_k \theta_k \; | \; \right.&\displaystyle M(1 - \theta_k) \geq c^\top x + q^\top y_k - \eta,\; y_k \in \Psi(x, Z_k),\; \theta_k \in \{0, 1\} \\ &\displaystyle \forall k \in I \end{aligned} \right\}, \end{array} \end{aligned}$$

where

$$\displaystyle \begin{aligned}\theta_k := \begin{cases} 1 & \text{if}\ c^\top x + q^\top y_k - \eta \leq 0,\\ 0 & \text{otherwise.} \end{cases} \end{aligned}$$

Thereby we get equality of \(\mbox{VaR}_{\alpha }[f(x, Z(\cdot ))]\) and

$$\displaystyle \begin{aligned} \inf \left\{\eta \in \mathbb{R} \; | \; \inf_{\genfrac{}{}{0pt}{}{y_1, \ldots, y_K,}{\theta_1, \ldots, \theta_K}} \left\{\sum_{k \in I}\pi_k \theta_k \; | \; (y_1, \ldots, y_K, \theta_1, \ldots, \theta_K) \in \Psi_{\mbox{VaR}_{\alpha}}(x) \right\} \geq \alpha\right\}. \end{aligned}$$

The worst-case risk measure is equal to \(\sup _{k \in I} \left \{c^\top x + \min _{y_k} \{q^\top y_k \; | \; y_k {\in } \Psi (x, Z_k)\}\right \}\) and the result follows from \(\Psi _{\mathcal {R}_{\max }}(x) = \Psi (x, Z_1) \times \ldots \times \Psi (x, Z_K)\). □
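The big-M construction in the proof can be verified on toy numbers: whenever M exceeds every possible excess, the smallest feasible \(\theta _k \in \{0,1\}\) coincides with the indicator of a positive excess, and the weighted sum recovers the excess probability. A sketch (plain Python, all data invented):

```python
def smallest_theta(excess, M):
    """min { theta in {0, 1} : M * theta >= excess }; equals the indicator
    of excess > 0 whenever M > excess."""
    for theta in (0, 1):
        if M * theta >= excess:
            return theta
    raise ValueError("M too small for this excess")

# scenario excesses c^T x + q^T y_k - eta and probabilities pi_k (invented)
excesses = [-1.0, 0.5, 2.0]
probs = [0.2, 0.3, 0.5]
M = 3.0  # any M exceeding the largest excess works

ep = sum(p * smallest_theta(e, M) for e, p in zip(excesses, probs))
assert abs(ep - 0.8) < 1e-12  # = P[excess > 0] = 0.3 + 0.5
```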

Remark 17.2.27

  1. a.

    The equivalent standard bilevel problem is linear provided that \(\mathcal {R} \in \lbrace \mathbb {E}, \mbox{EE}_\eta ^1, \mbox{SD}_{\rho }^1 \rbrace \).

  2. b.

    Analogous to [31, Remarks 5.2, 5.4], the inner minimization problems of the standard bilevel linear programs for \(\mathcal {R} \in \lbrace \mathbb {E}, \mbox{EE}_\eta ^1, \mbox{EP}_\eta , \mathcal {R}_{\max } \rbrace \) can be decomposed into K scenario problems that only differ w.r.t. the right-hand side of the constraint system. For the other models, a similar decomposition is possible after Lagrangian relaxation of the coupling constraints involving different scenarios.

  3. c.

    For \(\mathcal {R} = \mbox{CVaR}_\alpha \), every evaluation of the objective function in the standard bilevel linear program corresponds to solving a bilevel linear problem with scalar upper level variable η.

  4. d.

    Alternate models for \(\mathcal {R} = \mbox{VaR}_\alpha \) are given in [17] and [49], where the considered bilevel stochastic linear problem is reduced to a mixed-integer nonlinear program and a mathematical programming problem with equilibrium constraints, respectively. A mean-risk model with \(\mathcal {R} = \mbox{CVaR}_\alpha \) is used in [5, Sect. III].△

Similar reformulations can be obtained for the models discussed in Sect. 17.2.5 if we assume that the disutility function is linear.

Proposition 17.2.28

Assume dom f ≠ ∅ and let \(X \cap P_Z\) be a bounded polyhedron. Then for any parameter γ, the problem

$$\displaystyle \begin{aligned} \min_x \left\{g^\top x \; | \; F(x) \in \mathcal{A},\; x \in X \right\} \end{aligned}$$

is equivalent to

$$\displaystyle \begin{aligned} \min_x \Big\{g^\top x \; | \; \inf_{w_j} \left\{a(w_j) \; |\; w_j \in \Psi_{\mathcal{R}}(x)\right\} \geq \delta_j \ \forall j = 1, \ldots, l,\; x \in X\Big\}. \end{aligned}$$

The specific formulations are listed in Table 17.2, where \(\bar {a}_j := 1 - \mathbb {P}[b \leq a_j]\)and \(\tilde {a}_j := \int _{\mathbb {R}^s}\max \{b(z) - a_j, 0\}\;\mu _Z(dz)\).

Table 17.2 Equivalent programs, notation as in the Example in Sect. 17.2.5

2.7 Solution Approaches

To solve bilevel problems, it is very common to use a single-level reformulation. Often the lower level optimality condition is replaced by its Karush-Kuhn-Tucker or Fritz John conditions, and the bilevel problem is reduced to a mathematical programming problem with equilibrium constraints (cf. [5, 17], [48, Chap. 3.5.1]).

For \(\mathcal {R} \in \lbrace \mathbb {E}, \mbox{EE}_\eta ^1, \mbox{SD}^1_\rho , \mbox{EP}_\eta , \mathcal {R}_{\max } \rbrace \), the equivalent standard bilevel programs in Proposition 17.2.26 can be all restated as

$$\displaystyle \begin{aligned} \min_u \big\{g^\top u + \min_w \lbrace h^\top w \; | \; w \in \Psi(u) \rbrace \; | \; u \in U\big\}, \end{aligned} $$
(17.2.2)

where \(\Psi : \mathbb {R}^k \rightrightarrows \mathbb {R}^l\) is given by \(\Psi (u) = \mbox{Argmin}_w \lbrace t^\top w \; | \; Ww \leq Bu + b \rbrace \) for vectors \(g \in \mathbb {R}^k\), \(h, t \in \mathbb {R}^l\) and \(b \in \mathbb {R}^r\), matrices \(W \in \mathbb {R}^{r \times l}\) and \(B \in \mathbb {R}^{r \times k}\), and \(U \subseteq \mathbb {R}^k\) is a nonempty polyhedron. Using the KKT conditions of the lower level problem leads to the single-level problem

$$\displaystyle \begin{aligned} \min_{u, w, v} \left\{ g^\top u + h^\top w \; \Bigg| \; \begin{aligned} &Ww \leq Bu + b, \; W^\top v = t, \; v \leq 0, \\ &v^\top (Ww - Bu - b) = 0, \; u \in U \end{aligned} \right\}. \end{aligned} $$
(17.2.3)

More details as well as statements on the coincidence of optimal values and the existence of local and global minimizers are given in [31, Sect. 6]. If the condition \(v^\top (Ww - Bu - b) = 0\) is relaxed to \(v^\top (Ww - Bu - b) \leq \varepsilon \) (the resulting problem is denoted by P(ε)), the violation of regularity conditions like (MFCQ) and (LICQ) at every feasible point of (17.2.3) can be bypassed. A discussion of other difficulties associated with (17.2.3) is provided in [11, Chap. 3.1.2].
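On a toy instance (all data invented), the KKT system appearing in (17.2.3) can be verified numerically: the duals v returned by an LP solver for the lower level satisfy \(W^\top v = t\), v ≤ 0 and complementarity. A sketch with `scipy.optimize.linprog` (the `ineqlin.marginals` field requires a recent SciPy with the HiGHS backend):

```python
import numpy as np
from scipy.optimize import linprog

# hypothetical lower level for a fixed leader decision u:
# min_w { t^T w : W w <= B u + b },  encoding 0 <= w_1 <= u and 0 <= w_2 <= 1
t = np.array([-1.0, 0.5])
W = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
B = np.array([[1.0], [0.0], [0.0], [0.0]])
b = np.array([0.0, 0.0, 1.0, 0.0])
u = np.array([2.0])

res = linprog(t, A_ub=W, b_ub=B @ u + b, bounds=[(None, None)] * 2)
w, v = res.x, res.ineqlin.marginals  # primal minimizer and duals, v <= 0

# the KKT conditions from (17.2.3)
assert np.allclose(W.T @ v, t, atol=1e-8)   # stationarity: W^T v = t
assert np.all(v <= 1e-8)                    # sign condition: v <= 0
assert abs(v @ (W @ w - B @ u - b)) < 1e-8  # complementarity
```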

In [31, Sect. 6] it is also shown that \((\bar {u}, \bar {w})\) is a local minimizer of the optimistic formulation if \((\bar {u}, \bar {w}, \bar {v})\) is an accumulation point of a sequence \(\lbrace (u_n, w_n, v_n) \rbrace _{n \in \mathbb {N}}\) of local minimizers of problem \(P(\varepsilon _n)\) for \(\varepsilon _n \downarrow 0\).

In the risk-neutral setting, problem (17.2.2) exhibits a block-structure (cf. Remark 17.2.27 b.). Adapting the solution method for general linear complementarity problems proposed in [50], this special structure has been used in [11, Chap. 6] to construct an efficient algorithm for the global resolution of bilevel stochastic linear problems based on dual decomposition.

Remark 17.2.29

Utilizing the lower level value function, problem (17.2.2) can be reformulated as a single level quasiconcave optimization problem (cf. [48, Chap. 3.6.5]). Solution methods based on a branch-and-bound scheme have been proposed in [51] and [52]. However, without modifications, these algorithms fail to exploit the block structure arising in risk-neutral bilevel stochastic linear optimization models (cf. [11, Chap. 4.2]). △

3 Two-Stage Stochastic Bilevel Programs

In two-stage stochastic bilevel programming, both leader and follower have to make their respective first-stage decisions without knowledge of the realization of a stochastic parameter. Afterwards, the second-stage decisions are made under complete information. This leads to the following chronology of decision and observation:

Remark 17.3.1

The bilevel stochastic linear problems considered in Sect. 17.2 can be understood as special two-stage bilevel programs, where the follower’s first-stage and the leader’s second-stage decision do not influence the outcome. △

In [12], a two-stage stochastic extension of the bilevel network pricing model introduced in [53] is studied. Consider a multicommodity transportation network (N,  Θ, K), where (N,  Θ) is a directed graph and each commodity k ∈ K is to be transported from an origin O(k) ∈ N to a destination D(k) ∈ N in order to satisfy a demand \(n^k \in (0, \infty )\). The set of arcs Θ is partitioned into the subsets θ and \(\bar {\theta }\) of tariff and tariff-free arcs, respectively, and the leader maximizes the revenue raised from tariffs, knowing that user flows are assigned to cheapest paths. In [53], this situation is modeled as a bilevel program

$$\displaystyle \begin{aligned}\text{``}\max_{x}\text{''} \left\{ \sum_{k \in K} x^\top y^k \; | \; (y,\bar{y}) \in \Psi(x) \right\}, \end{aligned}$$

where the lower level is given by

$$\displaystyle \begin{aligned}\Psi(x, c, d, b) := \underset{y,\bar{y}}{\mbox{Argmin}} \left\{ \sum_{k \in K} \left[ (c + x)^\top y^k + \bar{c}^\top \bar{y}^k \right] \; \Big| \; \begin{array}{l} y, \bar{y} \geq 0, \\ Ay^k + \bar{A} \bar{y}^k = b^k \; \forall k \in K \end{array} \right\}, \end{aligned}$$

and x is the vector of tariffs controlled by the leader, \(y^k\) and \(\bar {y}^k\) are the flows of commodity k on the tariff and tariff-free arcs, respectively. Moreover, c and \(\bar {c}\) are the fixed costs on θ and \(\bar {\theta }\), respectively, \((A,\bar {A})\) denotes the node-arc incidence matrix, and the vectors \(b^k\) defined by

$$\displaystyle \begin{aligned}b^k_i := \begin{cases} n^k, &\text{if} \; i = O(k) \\ -n^k, &\text{if} \; i = D(k) \\ 0, &\text{else} \end{cases} \end{aligned}$$

are used to express nodal balance. Reference [12] extends the above model to a two-stage setting including market uncertainties: After deciding on first-stage tariffs, the situation repeats itself on the same network but with different cost and demand parameters. At the first stage, only the distribution of the second-stage parameter \(Z(\omega ) = (c_2, d_2, b_2)(\omega )\) is known, and the stages are linked by the restriction that the second-stage tariffs should not differ too widely from those set at the first stage. The linking constraint is motivated by policy regulations and competitiveness issues. In a risk-neutral setting, this results in the problem

$$\displaystyle \begin{aligned} \text{``}\max_{x_1}\text{''} \left\{ \sum_{k \in K} x_1^\top y^k_1 + \mathbb{E}[\Phi(x_1, Z(\cdot))] \; | \; (y_1,\bar{y}_1) \in \Psi(x_1,c_1,d_1,b_1) \right\}, \end{aligned} $$
(17.3.1)

where the recourse is given by

$$\displaystyle \begin{aligned}\Phi(x_1, Z(\omega)) := \text{``}\max_{x_2}\text{''} \left\{ \sum_{k \in K} x_2^\top y^k_2 \; | \; (x_1,x_2) \in \Gamma(\delta), \; (y_2,\bar{y}_2) \in \Psi(x_2,Z(\omega)) \right\} \end{aligned}$$

and the set Γ(δ) is defined as either

$$\displaystyle \begin{aligned}\Gamma(\delta) := \Gamma_A(\delta) := \lbrace (x_1,x_2) \; | \; |x_{1, \theta} - x_{2, \theta}| \leq \delta_\theta \; \forall \theta \in \Theta \rbrace \end{aligned}$$

if tariff changes are limited in absolute values or

$$\displaystyle \begin{aligned}\Gamma(\delta) := \Gamma_R(\delta) := \lbrace (x_1,x_2) \; | \; |x_{1, \theta} - x_{2, \theta}| \leq \delta_\theta |x_{1, \theta}| \; \forall \theta \in \Theta \rbrace \end{aligned}$$

if proportional limits are considered. Assuming that the underlying random vector Z is discrete with a finite number of realizations, a reformulation of (17.3.1) as a single-stage bilevel program is established in [12]. Moreover, sensitivity analysis of the optimal value function of (17.3.1) w.r.t. the parameter \(\delta \in [0, \infty )^{|\Theta |}\) (cf. [12, Propositions 4.1 and 4.2]) as well as numerical studies are conducted (cf. [12, Sects. 5, 6]).
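The two linking constraints \(\Gamma _A(\delta )\) and \(\Gamma _R(\delta )\) translate directly into componentwise checks. A minimal sketch (plain Python; the tariff vectors are hypothetical):

```python
def in_gamma_A(x1, x2, delta):
    """Absolute limits: |x1 - x2| <= delta componentwise."""
    return all(abs(a - c) <= d for a, c, d in zip(x1, x2, delta))

def in_gamma_R(x1, x2, delta):
    """Proportional limits: |x1 - x2| <= delta * |x1| componentwise."""
    return all(abs(a - c) <= d * abs(a) for a, c, d in zip(x1, x2, delta))

x1, x2 = [10.0, 4.0], [10.5, 3.8]
assert in_gamma_A(x1, x2, [1.0, 0.5])  # changes of 0.5 and 0.2 are allowed
assert in_gamma_R(x1, x2, [0.1, 0.1])  # within 10% of the first-stage tariffs
assert not in_gamma_R(x1, [12.0, 4.0], [0.1, 0.1])  # 2.0 > 0.1 * 10.0
```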

4 Challenges

We shall highlight some aspects of bilevel stochastic programming that are highly deserving of future research:

Going (Further) Beyond the Risk-Neutral Case for Nonlinear Models

The first paper on bilevel stochastic programming already outlined the basic principles as well as existence and sensitivity results for risk-neutral models (cf. [2]). Nevertheless, most of the research on bilevel stochastic nonlinear programming is still concerned with the risk-neutral case. Notable exceptions are [5] and [8], where models involving the Conditional Value-at-Risk are considered. In the former, the problem of maximizing the medium-term revenue of an electricity retailer under uncertain pool prices, demand, and competitor prices is modeled as a bilevel stochastic quadratic problem, while the latter explores links between electricity swing option pricing and bilevel stochastic optimization. However, there exists no systematic analysis of bilevel stochastic nonlinear problems in the broader framework of coherent risk measures or higher-order stochastic dominance constraints. Future research may also consider distributionally robust models (cf. [54]).

Exploiting (Quasi) Block Structures Arising in Risk-Averse Models

Under finite discrete distributions, many bilevel stochastic problems can be reformulated as standard bilevel programs. While this reformulation entails a blow-up of the dimension which is usually linear in the number of scenarios, the resulting problems often exhibit (quasi) block structures (cf. Remark 17.2.27b., [2]). For risk-neutral bilevel stochastic linear problems, [11, Chap. 6] utilizes these structures to enhance the mixed-integer programming based solution algorithm of [50], resulting in a significant speed-up. Based on the structural similarities, an analogous approach should be possible for risk-averse models after Lagrangian relaxation of the coupling constraints.

Going Beyond Exogenous Stochasticity

While the analysis in the vast majority of papers on stochastic programming is confined to the case of purely exogenous stochasticity, this assumption is known to be unrealistic in economic models where the decision maker holds market power. Therefore, models with decision-dependent distributions are of particular interest in view of stochastic Stackelberg games (cf. [55]).