1 Introduction

Most numerical solution approaches in stochastic programming require the replacement of the underlying multivariate probability distribution by a discrete probability measure with a finite number of realizations or scenarios. The most widely used approach so far is Monte Carlo sampling (see, for example, [47, Chapter 6]). Another, more classical, approach for two-stage models uses discrete probability measures leading to lower and upper bounds for the expected recourse function; they are obtained by means of moment problems (see [26, Section 4.7.2]). More recently, optimal quantization techniques (see [16, 35]) and (randomized) Quasi-Monte Carlo methods (see [3, 29, 34]) have been employed for solving two-stage stochastic programs. For a survey on scenario generation in stochastic programming see [44].

Jitka Dupačová was one of the pioneers of scenario generation and reduction. We recall her early paper [8] and the influential works [9, 10].

Here, we study a problem-based approach to scenario generation and reduction for stochastic programming models without information constraints. A general form of such models is [26, 47, 49]

$$\begin{aligned} \min \left\{ \int _{\varXi }f_{0}(x,\xi )P(d\xi ):x\in X,\,\int _{\varXi }f_{1}(x,\xi )P(d\xi ) \le 0\right\} \end{aligned}$$
(1)

where X is a closed subset of \(\mathbb {R}^m\), \(\varXi \) is a closed subset of \(\mathbb {R}^s\), and P is a Borel probability measure on \(\varXi \), abbreviated \(P\in \mathcal {P}(\varXi )\). The functions \(f_{0}\) and \(f_{1}\) from \(\mathbb {R}^m\times \varXi \) to the extended reals \(\overline{\mathbb {R}} =[-\infty , \infty ]\) are normal integrands (in the sense of [42, Chapter 14]). For example, typical integrands \(f_{0}\) in linear two-stage stochastic programming models are of the form (see [55] and [47, Chapter 2])

$$\begin{aligned} f_{0}(x,\xi )=\left\{ \begin{array}{cl} g(x)+\varPhi (q(\xi ),h(x,\xi )), &{} q(\xi )\in \mathcal {D}\\ +\infty , &{} \text{ else }\end{array}\right. \quad \text{ and }\quad f_{1}(x,\xi )\equiv 0, \end{aligned}$$
(2)

where X and \(\varXi \) are convex polyhedral, \(g(\cdot )\) is a linear function, \(\varPhi \) denotes the infimal function of the linear (second-stage) optimization problem

$$\begin{aligned} \varPhi (q,t):=\inf \{\langle q,y\rangle :Wy=t,y\in Y\} \end{aligned}$$
(3)

with an \((r,\bar{m})\) matrix W and a convex polyhedral cone \(Y\subset \mathbb {R}^{\bar{m}}\) whose polar cone is denoted by \(Y^{\star }\). Here, \(q(\cdot )\) is an affine function, \(h(\cdot ,\xi )\) is affine for fixed \(\xi \), \(h(x,\cdot )\) is affine for fixed x, and \(\mathcal {D}=\{u\in \mathbb {R}^{\bar{m}}{:}\{z\in \mathbb {R}^{r}{:}W^{\top }z-u\in Y^{\star }\} \ne \emptyset \}\) denotes the convex polyhedral dual feasibility set. Other examples of practical interest are infimal functions of linear-quadratic or second-order cone programming problems. Typical integrands \(f_{1}\) appearing in chance constrained programming are of the form \(f_{1}(x,\xi )= p-\mathbb {1}_{\mathcal {P}(x)}(\xi )\), where \(\mathbb {1}_{\mathcal {P}(x)}\) is the characteristic function of the polyhedron \(\mathcal {P}(x)=\{\xi \in \varXi {:}h(x,\xi )\le 0\}\) depending on x.

Let v(P) and S(P) denote the infimum and solution set of (1). The notation indicates that their dependence on the underlying probability distribution is of particular interest. For general continuous multivariate probability distributions P such stochastic optimization models are typically not solvable directly. Even the computation of the involved integrals requires multivariate numerical integration methods. Many approaches for solving optimization models (1) numerically are therefore based on discrete approximations of the probability measure P, i.e., on finding a discrete probability measure \(P_{n}\) in

$$\begin{aligned} \mathcal {P}_{n}(\varXi ):=\left\{ \sum _{i=1}^{n}w_{i}\delta _{\xi ^{i}}:\xi ^{i}\in \varXi , \,i=1,\ldots ,n,\,(w_{1},\ldots ,w_{n})\in \mathcal {S}_{n}\right\} \end{aligned}$$

for some \(n\in \mathbb {N}\), which approximates P in a suitable way. Here, \(\mathcal {S}_{n}\) denotes the standard simplex \(\mathcal {S}_{n}=\{w\in \mathbb {R}^{n}_{+}{:}\sum _{i=1}^{n}w_{i}=1\}\) and \(\xi ^{i}\), \(i=1,\ldots ,n\), denote the scenarios. Of course, the term 'suitable' should at least mean that the distance between the infima

$$\begin{aligned} \left| v(P)-v(P_{n})\right| \end{aligned}$$
(4)

becomes reasonably small. That this can be achieved is a consequence of stability results for stochastic programming problems, which describe the behavior of infima and solution sets when the probability distribution is perturbed. To state a version of such results we introduce the following sets of functions and of probability distributions (both defined on \(\varXi \))

$$\begin{aligned} \mathcal {F}= & {} \left\{ f_{j}(x,\,\cdot \,){:}j=0,1,\,x\in X\right\} ,\nonumber \\ \mathcal {P}_{\mathcal {F}}= & {} \left\{ Q\in \mathcal {P}(\varXi ){:}-\infty<\int _{\varXi }\inf _{x\in X} f_{j}(x,\xi )Q(d\xi ),\right. \nonumber \\&\left. \sup _{x\in X}\int _{\varXi }f_{j}(x,\xi )Q(d\xi ) <+\infty ,j=0,1\right\} \end{aligned}$$
(5)

and the following (semi-) distance on \(\mathcal {P}_{\mathcal {F}}\)

$$\begin{aligned} d_{\mathcal {F}}(P,Q)=\sup _{f\in \mathcal {F}}\left| \int _{\varXi }f(\xi )(P-Q)(d\xi )\right| \quad (P,Q\in \mathcal {P}_{\mathcal {F}}). \end{aligned}$$
(6)

The distance \(d_{\mathcal {F}}\) is based on minimal information about the underlying optimization model (1). It is nonnegative, symmetric and satisfies the triangle inequality. At first sight the set \(\mathcal {P}_{\mathcal {F}}\) seems to have a complicated structure. For typical applications, however, such as linear two-stage and chance constrained models, the set \(\mathcal {P}_{\mathcal {F}}\) or appropriate subsets allow a simple characterization, for example, as subsets of \(\mathcal {P}(\varXi )\) satisfying certain moment conditions.
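For two discrete measures and a finite subfamily of \(\mathcal {F}\), the supremum in (6) reduces to a finite maximum, which gives a lower bound on \(d_{\mathcal {F}}\). The following minimal sketch illustrates this; the measures and the three Lipschitz test functions are illustrative assumptions, not taken from the paper.

```python
# Sketch: the minimal information distance (6) for two discrete measures
# P, Q, evaluated over a finite subfamily of the function class F.
# (A finite family only lower-bounds the supremum over all of F.)

def d_F(P, Q, functions):
    """sup_{f in functions} | sum_xi f(xi) * (P({xi}) - Q({xi})) |."""
    points = set(P) | set(Q)
    return max(
        abs(sum(f(xi) * (P.get(xi, 0.0) - Q.get(xi, 0.0)) for xi in points))
        for f in functions
    )

# Two discrete measures on the real line (scenario -> probability).
P = {0.0: 0.5, 1.0: 0.5}
Q = {0.0: 0.25, 1.0: 0.25, 2.0: 0.5}

# A small family of Lipschitz test functions standing in for F.
functions = [lambda xi: xi,
             lambda xi: abs(xi - 1.0),
             lambda xi: min(xi, 1.5)]

print(d_F(P, Q, functions))  # prints 0.75 (attained by f(xi) = xi)
```

The dictionaries play the role of the weights \(w_{i}\) attached to scenarios \(\xi ^{i}\); enlarging the function family can only increase the computed value, consistent with the remark on enlarged classes below.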

To state the following result we need a specific property for set-valued mappings. A set-valued map \(M{:}\mathbb {R}\rightrightarrows \mathbb {R}^{m}\) with closed graph \(\mathrm{gph}\,M=\{(y,x){:}x\in M(y)\}\) has the Aubin property at \(\bar{y}\in \mathbb {R}\) for \(\bar{x}\in M(\bar{y})\) if there are neighborhoods V of \(\bar{y}\) and W of \(\bar{x}\), and a constant \(\kappa \in \mathbb {R}_{+}\) such that

$$\begin{aligned} M(y')\cap W\subset M(y)+\kappa |y-y'|\mathbb {B}\quad \text{ for } \text{ all } y,y'\in V, \end{aligned}$$

where \(\mathbb {B}\) is the unit ball in \(\mathbb {R}^{m}\) (see [42, Definition 9.36]). The next quantitative stability result for problem (1) is a consequence of [43, Theorems 5 and 9].

Proposition 1

We consider the optimization model (1) with infimum v(P) and solution set S(P) for \(P\in \mathcal {P}_{\mathcal {F}}\). Assume that X is compact and

  1. (i)

    the function \(x\rightarrow \int _{\varXi }f_{0}(x,\xi )P(d\xi )\) is Lipschitz continuous on X,

  2. (ii)

    the set-valued mapping \(y\rightrightarrows \left\{ x\in X: \int _{\varXi }f_{1}(x,\xi )P(d\xi )\le y\right\} \) has the Aubin property at \(\bar{y}=0\) for each \(\bar{x}\in S(P)\).

Then there exist constants \(L>0\) and \(\delta >0\) such that the estimates

$$\begin{aligned} \left| v(P)-v(Q)\right|\le & {} L\,d_{\mathcal {F}}(P,Q) \end{aligned}$$
(7)
$$\begin{aligned} \sup _{x\in S(Q)}d(x,S(P))\le & {} \varPsi _{P}(L\,d_{\mathcal {F}}(P,Q)) \end{aligned}$$
(8)

hold whenever \(Q\in \mathcal {P}_{\mathcal {F}}\) and \(d_{\mathcal {F}}(P,Q)<\delta \). The real-valued function \(\varPsi _{P}\) is given by \(\varPsi _{P}(r)=r+\psi _{P}^{-1}(2r)\) for all \(r\in \mathbb {R}_{+}\), where \(\psi _{P}\) is the growth function near the solution set S(P) and \(\psi _{P}(\tau )\) is defined for \(\tau \ge 0\) as

$$\begin{aligned} \inf _{x\in X}\left\{ \int _{\varXi }f_{0}(x,\xi )P(d\xi )-v(P): d(x,S(P))\ge \tau ,\,x\in X,\,\int _{\varXi }f_{1}(x,\xi )P(d\xi )\le 0\right\} . \end{aligned}$$

Note that in case \(f_{1}\equiv 0\) the estimates hold for \(L=1\) and any \(\delta >0\) and that \(\varPsi _{P}\) is lower semicontinuous and increasing on \(\mathbb {R}_{+}\) with \(\varPsi _{P}(0)=0\).

The estimates (7) and (8) in Proposition 1 suggest choosing discrete approximations from \(\mathcal {P}_{n}(\varXi )\) for solving (1) such that they solve the best approximation problem

$$\begin{aligned} \min _{P_{n}\in \mathcal {P}_{n}(\varXi )}d_{\mathcal {F}}(P,P_{n}) \end{aligned}$$
(9)

in order to bound (4) as tightly as possible. Determining the scenarios of a solution to (9) may be called optimal scenario generation. This choice of discrete approximations was already suggested in [43, Section 4.2], but was characterized there as a challenging task that in most cases cannot be solved in reasonable time.

It is recommended in [37, 43] to enlarge the function class \(\mathcal {F}\), if necessary, such that \(d_{\mathcal {F}}\) becomes a metric and has further desirable properties. Following this idea, however, leads to coarse estimates of the original minimal information distance and, hence, may lead to unfavorable convergence rates of the sequence

$$\begin{aligned} \left( \min _{P_{n}\in \mathcal {P}_{n}(\varXi )}d_{\mathcal {F}}(P,P_{n})\right) _{n\in \mathbb {N}} \end{aligned}$$
(10)

and to nonconvex nondifferentiable minimization problems (9) for determining the optimal scenarios.

In linear two-stage stochastic programming the class \(\mathcal {F}\) contains piecewise linear-quadratic functions defined on \(\varXi \) if condition (A1) (see Sect. 2) is satisfied. If the linear two-stage model even has random recourse, \(\mathcal {F}\) may contain more general piecewise polynomial functions (see [45]). Hence, a suitably enlarged class of functions may be chosen as the set

$$\begin{aligned} \mathcal {F}_{r}=\left\{ f:\varXi \mapsto \mathbb {R}:f(\xi )-f(\tilde{\xi })\le \max \left\{ 1,\Vert \xi \Vert , \Vert \tilde{\xi }\Vert \right\} ^{r-1}\Vert \xi -\tilde{\xi }\Vert ,\,\forall \xi ,\tilde{\xi }\in \varXi \right\} \nonumber \\ \end{aligned}$$
(11)

of all locally Lipschitzian functions on \(\varXi \) with polynomially growing normalized local Lipschitz constants. Here, \(\Vert \cdot \Vert \) denotes any norm on \(\mathbb {R}^{s}\) and \(r\ge 1\) characterizes the growth of the Lipschitz moduli. The corresponding distance

$$\begin{aligned} \zeta _{r}(P,Q)=d_{\mathcal {F}_{r}}(P,Q) \end{aligned}$$
(12)

is defined on the set \(\mathcal {P}_{r}(\varXi )\) of all probability measures on \(\varXi \) having rth order central moments and is called Fortet–Mourier metric of order r (see [36, Section 5.1]). The Fortet–Mourier metric has a dual representation as a transshipment problem (see [36, Section 5.3]). If \(\varXi \) is compact, \(\zeta _{r}\) even admits a dual representation as a transportation problem (see [38, Section 4.3]); namely, it holds that

$$\begin{aligned} \zeta _{r}(P,Q)=\inf \left\{ \int _{\varXi \times \varXi }c_{r}(\xi ,\tilde{\xi }) \eta (d\xi ,d\tilde{\xi }):\eta \circ \pi _{1}^{-1}=P,\eta \circ \pi _{2}^{-1}=Q \right\} , \end{aligned}$$
(13)

where \(\eta \) is a probability measure on \(\varXi \times \varXi \), \(\pi _{1}\) and \(\pi _{2}\) are the projections from \(\varXi \times \varXi \) to the first and second component, respectively, \(c_{r}\) is a metric on \(\mathbb {R}^{s}\) and \(c_{r}(\xi ,\tilde{\xi })\) is defined as

$$\begin{aligned} \inf \left\{ \sum _{i=1}^{n-1}\max \{1,\Vert \xi _{i}\Vert ,\Vert \xi _{i+1}\Vert \}^{r-1} \Vert \xi _{i}-\xi _{i+1}\Vert :n\in \mathbb {N},\xi _{i}\in \varXi ,\xi _{1}=\xi ,\xi _{n}=\tilde{\xi } \right\} \end{aligned}$$

for all \(\xi ,\tilde{\xi }\in \varXi \). The representation (13) implies, in particular, that the best approximation problem (9) for \(\mathcal {F}=\mathcal {F}_{r}\) is equivalent to

$$\begin{aligned} \min _{(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}}\int _{\varXi }\min _{i=1,\dots ,n} c_{r}(\xi ,\xi ^{i})P(d\xi ), \end{aligned}$$
(14)

where \(\xi ^{i}\), \(i=1,\ldots ,n\), are the scenarios of \(P_{n}\in \mathcal {P}_{n}(\varXi )\). This follows similarly as in [16, Lemma 4.2]. For \(r=1\) the probabilities \(w_{i}\) of \(\xi ^{i}\) can be computed by \(w_{i}=P(A_{i})\), \(i=1,\ldots ,n\), where the collection \(\{A_{i}:i=1,\ldots ,n\}\) is a Voronoi partition of \(\varXi \), i.e., \(A_{i}\) is Borel measurable and a subset of

$$\begin{aligned} \left\{ \xi \in \varXi :\Vert \xi -\xi ^{i}\Vert =\min _{j=1,\ldots ,n}\Vert \xi -\xi ^{j}\Vert \right\} \quad (i=1,\ldots ,n). \end{aligned}$$

Note that the objective function in (14) is continuous and inf-compact on \(\varXi ^{n}\). Hence, the minimization problem (14) is solvable, but nonconvex for \(n\ge 2\) even for \(r=1\). Furthermore, due to a classical result (see [6, Proposition 2.1]), the estimate

$$\begin{aligned} c\,n^{-\frac{1}{s}}\le \zeta _{1}(P,P_{n})\le \zeta _{r}(P,P_{n}) \end{aligned}$$

holds for each \(P_{n}\in \mathcal {P}_{n}(\varXi )\), sufficiently large n and some constant \(c>0\) if P has a density on \(\varXi \). Hence, the convergence rate (10) for \(\mathcal {F}=\mathcal {F}_{r}\) is worse than the Monte Carlo rate \(O(n^{-\frac{1}{2}})\) if the dimension s of \(\varXi \) is greater than two.
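A quick numerical check of the last statement (the values of n and s are arbitrary illustrative choices):

```python
# Compare the quantization lower-bound rate n^(-1/s) with the
# Monte Carlo rate n^(-1/2); the constant c of the estimate is omitted.
n = 10_000
monte_carlo = n ** (-0.5)
for s in (1, 2, 3, 5, 10):
    rate = n ** (-1.0 / s)
    print(f"s={s:2d}: n^(-1/s)={rate:.5f}  n^(-1/2)={monte_carlo:.5f}  "
          f"{'slower' if rate > monte_carlo else 'not slower'}")
```

For \(s\le 2\) the lower bound does not exceed the Monte Carlo rate, while for every \(s>2\) it decays strictly slower, as claimed.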

The approach to optimal scenario reduction for linear two-stage stochastic programs developed in [10] is based on Monge–Kantorovich functionals and applies to the Fortet–Mourier metric \(\zeta _{r}\) [see (12)] due to the representation (13). Starting with a discrete probability measure P based on a large number N of scenarios, it selects a smaller number n out of the original scenarios, together with new probabilities, such that the resulting discrete probability measure is the best approximation to P with respect to \(\zeta _{r}\). More precisely, let P have the scenarios \(\xi ^{i}\) with probabilities \(p_{i}\), \(i=1,\ldots ,N\). Using the dual representation (13) of \(\zeta _r\), the best approximation problem

$$\begin{aligned} \min _{Q\in \mathcal {P}_{n}(\mathrm{supp}\,P)}\zeta _r(P,Q) \end{aligned}$$

can be rewritten as the combinatorial program

$$\begin{aligned} \min \left\{ \sum _{i,j=1}^{N}p_{i}x_{ij}c_{r}(\xi ^{i},\xi ^{j})\left| \begin{array}{c} \sum \limits _{i=1}^{N}x_{ij}=1\,,j=1,\ldots ,N,\,\sum \limits _{i=1}^{N}y_{i}\le n\,,\\ x_{ij}\le y_{i}\,,\, x_{ij}\in \{0,1\}\,,\,i,j=1,\ldots ,N\,,\\ y_{i}\in \{0,1\}\,,\,i=1,\ldots ,N \end{array}\right. \right\} , \end{aligned}$$
(15)

where the binary variable \(y_i\) decides whether scenario \(\xi ^{i}\) is kept and \(x_{ij}\) assigns a scenario to a kept scenario at minimal distance \(c_r\). We note that (15) is known as the n-median problem (see [4]), which is NP-hard as shown in [27].

If J denotes a subset of \(\{1,\ldots ,N\}\) with cardinality \(|J|=n\), the best approximation problem can be decomposed into finding the optimal index set J of remaining scenarios and determining the optimal discrete probability measure given J. With \(P_{J}\) denoting any probability measure whose support consists of the scenarios \(\xi ^{j}\), \(j\in J\), the best approximation problem has a solution \(P_{J}^{\star }\) such that

$$\begin{aligned}&\zeta _r(P,P_{J}^{\star })=\min _{P_{J}}\zeta _{r}(P,P_{J}) =\sum \limits _{i\not \in J}p_i\min _{j\in J}c_r(\xi ^{i},\xi ^{j}) \end{aligned}$$
(16)
$$\begin{aligned}&\text{ with } P_{J}^{\star } \text{ given } \text{ by } \quad P_{J}^{\star }=\sum _{j\in J}\pi _{j}\delta _{\xi ^{j}}\,,\; \text{ where } \;\; \pi _j=p_j+\sum \limits _{i\in I_j} p_i\quad (\forall j\in J)\qquad \quad \end{aligned}$$
(17)

and the index sets \(I_{j}\), \(j\in J\), are defined by \(I_j:=\{i\in \{1,\ldots ,N\}{\setminus } J: j=j(i)\}\) with \(j(i)\in \arg \min \nolimits _{j\in J}c_r(\xi ^{i},\xi ^{j})\), \(\forall i\not \in J\). The formula (17) for the optimal weights is called the redistribution rule in [10, 19], where the results (16) and (17) are also proved.
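For small N the decomposition above yields a direct exact method: enumerate all index sets J with \(|J|=n\), evaluate (16), and apply the redistribution rule (17) to the minimizer. A minimal sketch on the real line with \(c_{1}(\xi ,\tilde{\xi })=|\xi -\tilde{\xi }|\); the scenario data are illustrative assumptions.

```python
from itertools import combinations

def reduce_scenarios(xi, p, n):
    """Exact scenario reduction for small N: minimize (16) over all
    index sets J with |J| = n, then redistribute probabilities via (17)."""
    N = len(xi)
    c = lambda a, b: abs(a - b)          # c_1 on the real line
    best_J, best_dist = None, float("inf")
    for J in combinations(range(N), n):
        # formula (16): mass of each deleted scenario times its
        # distance to the nearest kept scenario
        dist = sum(p[i] * min(c(xi[i], xi[j]) for j in J)
                   for i in range(N) if i not in J)
        if dist < best_dist:
            best_J, best_dist = J, dist
    # redistribution rule (17): each deleted scenario passes its
    # probability to a nearest kept scenario
    pi = {j: p[j] for j in best_J}
    for i in range(N):
        if i not in best_J:
            nearest = min(best_J, key=lambda k: c(xi[i], xi[k]))
            pi[nearest] += p[i]
    return best_J, pi, best_dist

xi = [0.0, 0.1, 1.0, 1.1, 2.0]
p  = [0.2, 0.2, 0.2, 0.2, 0.2]
J, pi, dist = reduce_scenarios(xi, p, 2)
print(J, pi, dist)
```

For large N this enumeration is infeasible, since (15) is NP-hard; this is precisely why the heuristics discussed next are used in practice.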

For a survey of theory and algorithms for n-median problems we refer the interested reader to [4]. Presently, local search heuristics [1] and a novel approximation algorithm [30] appear to be the most favorable algorithms, offering the best approximation guarantees. Simple alternatives without approximation guarantees are the forward and backward greedy heuristics developed and tested in [18, Algorithms 2.2 and 2.4] and in [19]. The scenario reduction approach described above has been extended to discrepancy distances in [20, 21]. The latter distances are of the form

$$\begin{aligned} \alpha (P,Q)=\sup _{B\in \mathcal {B}}|P(B)-Q(B)|\quad (P,Q\in \mathcal {P}(\varXi )), \end{aligned}$$
(18)

where \(\mathcal {B}\) is a suitable class of Borel subsets of \(\varXi \). Such distances are relevant for chance constrained stochastic programs if \(\mathcal {B}\) contains the relevant sets (for example, the polyhedra \(\mathcal {P}(x)\)). We recall, however, that employing probability metrics like (12) and (18) means that decisions on reducing scenarios are based on coarse estimates of the minimal information distances (6) and, thus, essentially do not depend on the specific stochastic program. Possibly due to this observation, several authors have developed heuristic approaches to scenario generation and reduction for specific applications (see, for example, [13, 32]). These developments served, in turn, as motivation for the work reported in the present paper.
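For discrete measures and a finite family \(\mathcal {B}\) of test sets, the discrepancy (18) is again a finite maximum. A minimal sketch with one-dimensional intervals standing in for \(\mathcal {B}\) (an illustrative assumption; for chance constraints \(\mathcal {B}\) would contain the sets \(\mathcal {P}(x)\)):

```python
def discrepancy(P, Q, sets):
    """sup_{B in sets} |P(B) - Q(B)| for discrete measures (dicts
    mapping scenario -> probability) and a finite family of test sets,
    each given as an indicator function."""
    def mass(measure, B):
        return sum(prob for xi, prob in measure.items() if B(xi))
    return max(abs(mass(P, B) - mass(Q, B)) for B in sets)

P = {0.0: 0.5, 1.0: 0.5}
Q = {0.0: 0.25, 1.0: 0.25, 2.0: 0.5}

# Half-infinite intervals (-inf, b] as indicators, a stand-in for B.
sets = [lambda xi, b=b: xi <= b for b in (0.0, 0.5, 1.0, 1.5, 2.0)]
print(discrepancy(P, Q, sets))  # prints 0.5 (attained at b = 1.0)
```

In one dimension, taking all intervals \((-\infty ,b]\) as \(\mathcal {B}\) yields the Kolmogorov distance between the two measures; richer classes \(\mathcal {B}\) give larger discrepancies.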

We will show in this paper that the optimal scenario generation problem (9) may have favorable solution properties if the set \(\mathcal {F}\) remains as small as possible, i.e., as chosen in (5). In Sect. 2 we demonstrate this for linear two-stage stochastic programs. First we show that (9) can be formulated as a generalized semi-infinite program (Theorem 1) which is convex in some cases (Theorem 2), enjoys stability (Theorem 3) and allows a transformation into a standard semi-infinite program in a number of cases. In Sect. 3 we revisit the problem of optimal scenario reduction for two-stage models and provide a new formulation, based on the minimal information distance (6), as a mixed-integer linear semi-infinite program. The latter decomposes into recursively solving binary and linear semi-infinite programs. Section 4 presents a mixed-integer linear semi-infinite program for optimal scenario generation in chance constrained programming. Finally, we illustrate the approach to scenario generation for the classical newsvendor problem and close with conclusions.

2 Optimal scenario generation for two-stage models by generalized semi-infinite programming

We consider a linear two-stage stochastic program (1) with the integrand (2), a probability distribution P on \(\mathbb {R}^{s}\) and with \(\varPhi \) denoting the infimal value (3) of the second-stage program. Furthermore, we impose the following conditions in addition to the general assumptions made in Sect. 1:

  1. (A0)

    X is a bounded polyhedron and \(\varXi \) is convex polyhedral.

  2. (A1)

\(h(x,\xi )\in W(Y)=\{Wy:y\in Y\}\) and \(q(\xi )\in \mathcal {D}\) hold for all \((x,\xi )\in X\times \varXi \).

  3. (A2)

    P has a second order absolute moment.

Condition (A1) combines the usual conditions of relatively complete recourse and dual feasibility, and (A2) implies that all occurring integrals are finite. Both conditions are standard for two-stage stochastic programs. In particular, (A0)–(A2) imply that the infima v(P) and \(v(P_{n})\) are attained and that the estimate

$$\begin{aligned} |v(P)-v(P_{n})|\le & {} \sup _{x\in X}\left| \int _{\varXi }f_{0}(x,\xi )P(d\xi )- \int _{\varXi }f_{0}(x,\xi )P_{n}(d\xi )\right| \\= & {} \sup _{x\in X}\left| \int _{\varXi }\varPhi (q(\xi ),h(x,\xi ))P(d\xi )- \int _{\varXi }\varPhi (q(\xi ),h(x,\xi ))P_{n}(d\xi )\right| \end{aligned}$$

holds due to Proposition 1 for every \(P_{n}\in \mathcal {P}_{n}(\varXi )\). Hence, the formulation of the optimal scenario generation problem for (1), (2) based on the minimal information distance (6) consists in solving the best uniform approximation problem

$$\begin{aligned} \min _{\begin{array}{c} (\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ {(w_{1},\ldots ,w_{n} )\in \mathcal {S}_{n}} \end{array}}\sup _{x\in X}\left| \int _{\varXi }\varPhi (q(\xi ),h(x,\xi ))P(d\xi ) -\sum _{i=1}^{n}w_{i}\varPhi (q(\xi ^{i}),h(x,\xi ^{i}))\right| . \end{aligned}$$
(19)

This means that the convex expected recourse function \(F_{P}:X\rightarrow \mathbb {R}\)

$$\begin{aligned} F_{P}(x):=\int _{\varXi }\varPhi (q(\xi ),h(x,\xi ))P(d\xi ) \end{aligned}$$
(20)

has to be approximated uniformly on X by the best convex combination of n convex polyhedral functions appearing as integrand in \(F_{P}\).

Note that the minimal class \(\mathcal {F}=\{\varPhi (q(\cdot ),h(x,\cdot )):x\in X\}\) of functions from \(\varXi \) to \(\mathbb {R}\) enjoys specific properties. All functions are finite, continuous and piecewise linear-quadratic on \(\varXi \). They are linear-quadratic on each convex polyhedral set

$$\begin{aligned} \varXi _{j}(x)=\{\xi \in \varXi :(q(\xi ),h(x,\xi ))\in \mathcal {K}_{j}\}\quad (j=1,\ldots ,\ell ), \end{aligned}$$

where the convex polyhedral cones \(\mathcal {K}_{j}\), \(j=1,\ldots ,\ell \), represent a decomposition of the domain of \(\varPhi \), which is itself a convex polyhedral cone in \(\mathbb {R}^{\bar{m}+r}\). The latter decomposition depends only on the matrix W [54]. In particular, the functions \(\varPhi (q(\cdot ),h(x,\cdot ))\) are locally Lipschitz continuous where the Lipschitz constants on the balls \(\{\xi \in \varXi :\Vert \xi \Vert \le \rho \}\) grow linearly with \(\rho \) and can be chosen uniform with respect to \(x\in X\) (see [43, Proposition 22]).

It is well-known that best uniform approximation problems may be reformulated as semi-infinite programs (SIP), i.e., as optimization problems with finitely many variables, but infinitely many constraints. We show next that (19) leads to a generalized semi-infinite program (GSIP), that is, to a SIP in which the index set of the constraints is infinite and depends on the decision. Theory and numerical methods for such models are studied in a number of publications. We mention here the monograph [50] and the tutorial [17].

Theorem 1

Assume (A0)–(A2). Then (19) is equivalent to the GSIP

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0\\ (\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle \le t+F_{P}(x)\\ F_{P}(x)\le t+\sum \limits _{i=1}^{n}w_{i}\langle q(\xi ^{i}),y_{i}\rangle \\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n})\end{array}\right. \right\} , \end{aligned}$$
(21)

where the set-valued mapping \(\mathcal {M}\) from \(\varXi ^{n}\) to \(\mathbb {R}^{m+(\bar{m}+r)n}\) is defined by

$$\begin{aligned} \mathcal {M}(\xi )= & {} \{(x,y,z)\in X\times Y^{n}\times \mathbb {R}^{rn}:Wy_{i}=h(x,\xi ^{i}),\nonumber \\&W^{\top }z_{i}-q(\xi ^{i})\in Y^{*},\,i=1,\ldots ,n\} \end{aligned}$$
(22)

for all \(\xi =(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\) and \(F_{P}:X\rightarrow \mathbb {R}\) is given by (20). If the function h is affine, the feasible set of (21) is closed.

Proof

Rewriting the best uniform approximation problem in the standard way, by introducing the auxiliary variable t, one first obtains that the semi-infinite program

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0\\ (\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\varPhi (q(\xi ^{i}),h(x,\xi ^{i}))\le t+F_{P}(x)\\ F_{P}(x)\le t+\sum \limits _{i=1}^{n}w_{i}\varPhi (q(\xi ^{i}),h(x,\xi ^{i}))\\ \forall x\in X\end{array}\right. \right\} \end{aligned}$$
(23)

is equivalent to (19). Next we exploit the duality relation

$$\begin{aligned} \varPhi (q,t)=\inf \{\langle q,y\rangle :Wy=t, y\in Y\}= \sup \{\langle t,z\rangle :W^{\top }z-q\in Y^{*}\} \end{aligned}$$

of the second-stage program for all pairs \((q,t)\in \mathcal {D}\times W(Y)\). Then the primal and dual program are both solvable. Due to (A1) the semi-infinite program (23) may be reformulated as

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0\\ (\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\sup \{\langle h(x,\xi ^{i}),z\rangle : W^{\top }z-q(\xi ^{i})\in Y^{*}\}\le t+F_{P}(x)\\ F_{P}(x)\le t+\sum \limits _{i=1}^{n}w_{i}\inf \{\langle q(\xi ^{i}),y\rangle : Wy=h(x,\xi ^{i}),\,y\in Y\}\\ \forall x\in X\end{array}\right. \right\} . \end{aligned}$$
(24)

Next we introduce 2n new variables \(y_{i}\in Y\) with \(Wy_{i}=h(x,\xi ^{i})\) and \(z_{i}\in \mathbb {R}^{r}\) with \(W^{\top }z_{i}-q(\xi ^{i})\in Y^{*}\), \(i=1,\ldots ,n\), and consider the generalized linear semi-infinite program (21). Then any \(t\ge 0\) and \((\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\) solving problem (24) satisfies the constraints of (21). On the other hand, if \(t\ge 0\) and \((\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\) attain the minimum in (21), the two inequalities

$$\begin{aligned} \sum _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle \le t+F_{P}(x)\quad \text{ and }\quad F_{P}(x)\le t+\sum _{i=1}^{n}w_{i}\langle q(\xi ^{i}),y_{i}\rangle \end{aligned}$$

are satisfied for all \((x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n})\). Hence, the inequalities

$$\begin{aligned} \sum _{i=1}^{n}w_{i}\sup \{\langle h(x,\xi ^{i}),z\rangle&:&W^{\top }z-q(\xi ^{i})\in Y^{*}\}\le t+F_{P}(x)\\ F_{P}(x)\le t+\sum _{i=1}^{n}w_{i}\inf \{\langle q(\xi ^{i}),y\rangle&:&Wy=h(x,\xi ^{i}),\,y\in Y\} \end{aligned}$$

are satisfied for all \(x\in X\). Hence, programs (24) and (21) are equivalent.

To show that the feasible set of (21) is closed, we use the fact, known from [50, Corollary 3.1.21] (see also [17, Proposition 3.4]), that lower semicontinuity of \(\mathcal {M}\) on \(\varXi ^{n}\) is sufficient. Since the graph \(\mathrm{gph}\,\mathcal {M}\) of \(\mathcal {M}\) is of the form

$$\begin{aligned} \mathrm{gph}\,\mathcal {M}= & {} \big \{(\xi ^{1},\ldots ,\xi ^{n},x,y,z)\in \varXi ^{n}\times X\times Y^{n}\times \mathbb {R}^{r n}:Wy_{i}=h(x,\xi ^{i}),\\&\;\;W^{\top }z_{i}-q(\xi ^{i})\in Y^{*},\,i=1,\ldots ,n\big \} \end{aligned}$$

and h is affine, \(\mathrm{gph}\,\mathcal {M}\) is convex polyhedral. Such set-valued mappings are even Hausdorff Lipschitz continuous on their domains (see, for example, [42, Example 9.35]) and, hence, on \(\varXi ^{n}\) due to (A1). This completes the proof. \(\square \)

In general the optimization model (21) is not convex even when the weights \(w_{i}\), \(i=1,\ldots ,n\), are fixed. However, we prove now that the model is convex if the function h is affine and either only right-hand sides or only costs are random.

Theorem 2

Assume (A0)–(A2), let the function h be affine, the weights \(w_{i}\), \(i=1,\ldots ,n\), be fixed and let either h or q be random. Then the feasible set of the GSIP (21) is closed and convex.

Proof

Let q be nonrandom. Then the feasible set M of (21) is of the form

$$\begin{aligned} M=\left\{ (t,\xi ^{1},\ldots ,\xi ^{n})\in \mathbb {R}_{+}\times \varXi ^{n}\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle - t\le F_{P}(x)\\ F_{P}(x)\le t + \sum \limits _{i=1}^{n}w_{i}\langle q,y_{i}\rangle \\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n})\end{array} \right. \right\} . \end{aligned}$$
(25)

Let \(\alpha \in [0,1]\) and \(\xi _{j}=(\xi _{j}^{1},\ldots ,\xi _{j}^{n})\in \varXi ^{n}\), \(t_{j}\in \mathbb {R}_{+}\), be such that \((t_{j},\xi _{j})\in M\), \(j=1,2\). We have to show that \(\alpha (t_{1},\xi _{1})+(1-\alpha )(t_{2},\xi _{2})\) belongs to M, too.

Let \(x\in X\) and \(z_{i}\in \{z\in \mathbb {R}^{r}:W^{\top }z-q\in Y^{\star }\}\) for \(i=1,\ldots ,n\) be chosen arbitrarily. Then we have

$$\begin{aligned}&\sum \limits _{i=1}^{n}w_{i}\langle h(x,\alpha \xi _{1}^{i}+(1-\alpha )\xi _{2}^{i}), z_{i}\rangle - \alpha t_{1}-(1-\alpha )t_{2}\\&\quad =\alpha \left( \sum \limits _{\,i=1}^{n}w_{i}\langle h(x,\xi _{1}^{i}),z_{i}\rangle -t_{1}\right) +(1-\alpha )\left( \sum \limits _{\,i=1}^{n}w_{i}\langle h(x, \xi _{2}^{i}),z_{i}\rangle -t_{2}\right) \\&\quad \le \alpha F_{P}(x)+(1-\alpha )F_{P}(x)=F_{P}(x). \end{aligned}$$

Now, let \(y_{ij}\in \{y\in Y:Wy=h(x,\xi _{j}^{i})\}\) for \(j=1,2\), \(i=1,\ldots ,n\), be chosen arbitrarily in addition. We obtain \(\alpha y_{i1}+(1-\alpha )y_{i2}\in \{y\in Y:Wy=h(x,\alpha \xi _{1}^{i}+(1-\alpha ) \xi _{2}^{i})\}\) and, hence,

$$\begin{aligned}&\alpha t_{1}+(1-\alpha )t_{2}+\sum \limits _{i=1}^{n}w_{i}\langle q,\alpha y_{i1}+ (1-\alpha )y_{i2}\rangle \\&\quad =\alpha \Big (t_{1}+\sum \limits _{i=1}^{n}w_{i}\langle q, y_{i1}\rangle \Big )+ (1-\alpha )\Big (t_{2}+\sum \limits _{i=1}^{n}w_{i}\langle q,y_{i2}\rangle \Big )\\&\quad \ge \alpha F_{P}(x)+(1-\alpha )F_{P}(x)=F_{P}(x). \end{aligned}$$

This means \(\alpha (t_{1},\xi _{1})+(1-\alpha )(t_{2},\xi _{2})\in M\) and M is convex. If q is random, but h nonrandom, the proof is similar. The closedness of M follows from Theorem 1. \(\square \)

For fixed weights and given \(n\in \mathbb {N}\) the GSIP (21) for determining the optimal scenarios \(\xi ^{i}\), \(i=1,\ldots ,n\), has dimension \(n\,s+1\) and is thus large scale in many cases. A further difficulty of (21) is that the set \(\mathcal {M}(\xi ^{1},\ldots ,\xi ^{n})\) is, in general, unbounded.

We note that \(F_{P}(x)\) can only be calculated approximately even if the probability measure P is completely known. Hence, it becomes important that the optimization model (21) behaves stably when the function \(F_{P}\) is perturbed. The following result shows that even Lipschitz stability of the optimal values can be expected if the conditions of Theorem 2 are imposed.

Theorem 3

Assume (A0)–(A2) and that the infimum \(v(F_{P})\) of (21) is positive. Let the function h be affine, let either h or q be random and the weights \(w_{i}\), \(i=1,\ldots ,n\), be fixed. Then there exist \(\kappa >0\) and \(\delta >0\) such that

$$\begin{aligned} |v(F_{P})-v(F)|\le \kappa \sup _{x\in X}|F_{P}(x)-F(x)|\,, \end{aligned}$$
(26)

for each continuous function F on X such that \(\sup _{x\in X}|F_{P}(x)-F(x)|<\delta \). Here, v(F) denotes the optimal value of (21) with \(F_{P}\) replaced by F.

Proof

As in the proof of Theorem 2 we assume without loss of generality that q is nonrandom. We consider the set-valued mapping \((t,\xi ^{1},\ldots ,\xi ^{n})\mapsto \Lambda (t,\xi ^{1},\ldots ,\xi ^{n})\) from \(\mathbb {R}_{+}\times \varXi ^{n}\) to the Banach space C(X) of real-valued continuous functions on X with the standard norm \(\Vert \cdot \Vert _{\infty }\), where

$$\begin{aligned} \Lambda (t,\xi ^{1},\ldots ,\xi ^{n})=\left\{ f\in C(X)\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle - t - F_{P}(x)\le f(x)\\ f(x)\le t + \sum \limits _{i=1}^{n}w_{i}\langle q,y_{i}\rangle -F_{P}(x)\\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n})\end{array} \right. \right\} . \end{aligned}$$

First, we show that the graph of the set-valued mapping \(\Lambda \) denoted by \(\mathrm{gph}\,\Lambda \) is convex. Let \(\alpha \in [0,1]\) and \(f_{j}\in C(X)\), \(\xi _{j}=(\xi _{j}^{1},\ldots ,\xi _{j}^{n})\in \varXi ^{n}\), \(t_{j}\in \mathbb {R}_{+}\), be such that \((t_{j},\xi _{j},f_{j})\in \mathrm{gph}\,\Lambda \), \(j=1,2\). Then we obtain as in the proof of Theorem 2

$$\begin{aligned}&\sum \limits _{i=1}^{n}w_{i}\langle h(x,\alpha \xi _{1}^{i}+(1-\alpha )\xi _{2}^{i}), z_{i}\rangle - \alpha t_{1}-(1-\alpha )t_{2}-F_{P}(x)\\&\quad =\alpha \Big (\sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi _{1}^{i}),z_{i}\rangle - t_{1}-F_{P}(x)\Big )\\&\qquad +(1-\alpha )\Big (\sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi _{2}^{i}),z_{i}\rangle -t_{2}-F_{P}(x)\Big )\\&\quad \le \alpha f_{1}(x)+(1-\alpha )f_{2}(x)\qquad \qquad \text{ and }\\&\qquad \alpha t_{1}+(1-\alpha )t_{2}+\sum \limits _{i=1}^{n}w_{i}\langle q,\alpha y_{i1} +(1-\alpha )y_{i2}\rangle -F_{P}(x)\\&\quad \ge \alpha f_{1}(x)+(1-\alpha )f_{2}(x), \end{aligned}$$

where \(x\in X\), \(z_{i}\in \{z\in \mathbb {R}^{r}:W^{\top }z-q\in Y^{\star }\}\) and \(y_{ij}\in \{y\in Y:Wy=h(x,\xi _{j}^{i})\}\) for \(j=1,2\), \(i=1,\ldots ,n\), are chosen arbitrarily. This proves that \(\mathrm{gph}\,\Lambda \) is convex. It is also closed as a subset of \(\mathbb {R}^{ns+1}\times C(X)\). Furthermore, we know that the null function \(0\in C(X)\) belongs to the range of \(\Lambda \) and that \(\Lambda ^{-1}(0)\) is just the feasible set of (21). Thus, there exists \((\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})\in \mathbb {R}_{+}\times \varXi ^{n}\) such that \(0\in \Lambda (\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})\). We know that \(\bar{t}\ge v(F_{P})>0\) by assumption. Next we choose \(\delta \) such that \(0<\delta <\bar{t}\) and conclude that the closed ball \(\mathbb {B}(0,\delta )\) in C(X) is contained in the range of \(\Lambda \). The Robinson-Ursescu theorem (see [41, Theorem 2]) on continuity properties of set-valued mappings having closed convex graphs then implies that the inverse multifunction \(\Lambda ^{-1}\) has the Aubin property at \(f=0\) for any point \((\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})\in \Lambda ^{-1}(0)\) with \(\bar{t}>0\). This means that there exist neighborhoods U of 0 and W of \((\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})\), and a constant \(\kappa \in \mathbb {R}_{+}\) such that

$$\begin{aligned} \Lambda ^{-1}(f)\cap W\subseteq \Lambda ^{-1}(\tilde{f})+\kappa \Vert f-\tilde{f}\Vert _{\infty }\mathbb {B}\end{aligned}$$
(27)

holds for all \(f,\tilde{f}\in U\), where \(\mathbb {B}\) is the unit ball in \(\mathbb {R}^{ns+1}\).

Next we choose \(f=F-F_{P}\) with \(F\in C(X)\) and \(\tilde{f}=0\).

Let \(\varepsilon >0\) and let \((\tilde{t},\tilde{\xi }^{1},\ldots ,\tilde{\xi }^{n}) \in \Lambda ^{-1}(f)\cap W\) be such that \(v(F)\le \tilde{t}\le v(F)+\varepsilon \). Then there exists an element \((\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})\in \Lambda ^{-1}(0)\) such that

$$\begin{aligned} \Vert (\bar{t},\bar{\xi }^{1},\ldots ,\bar{\xi }^{n})-(\tilde{t},\tilde{\xi }^{1}, \ldots ,\tilde{\xi }^{n})\Vert \le \kappa \Vert f\Vert _{\infty } \end{aligned}$$

holds for all \(f\in U\) due to the Aubin property (27) of \(\Lambda ^{-1}\) at 0. We note that \(\Lambda ^{-1}(F-F_{P})\) is the constraint set of (21) with \(F_{P}\) replaced by F, and obtain the estimates

$$\begin{aligned} v(F_{P})-v(F)\le \bar{t}-v(F)\le |\bar{t}-\tilde{t}|+\varepsilon \le \kappa \Vert F-F_{P}\Vert _{\infty }+\varepsilon \end{aligned}$$

hold for all \(F\in C(X)\) with \(F-F_{P}\in U\). Since the latter estimate is valid for any \(\varepsilon >0\), we obtain \(v(F_{P})-v(F)\le \kappa \Vert F-F_{P}\Vert _{\infty }\) if \(F-F_{P}\in U\). In the same way we can derive the estimate \(v(F)-v(F_{P})\le \kappa \Vert F-F_{P}\Vert _{\infty }\) if \(F-F_{P}\in U\). It remains to select \(\delta >0\) such that the open ball around 0 with radius \(\delta \) in C(X) is contained in U and to require \(\Vert F-F_{P}\Vert _{\infty }<\delta \). Finally, we note that, starting from the Aubin property of \(\Lambda ^{-1}\) at \(0\in C(X)\), the proof follows classical arguments of quantitative stability in optimization (see [28, Theorem 1]). \(\square \)

Notice that the infimum \(v(F_{P})\) of (21) is always nonnegative and \(v(F_{P})=0\) means that P has at most n scenarios. Hence, the assumption \(v(F_{P})>0\) is natural and satisfied, for example, for any n if P is a continuous probability distribution.

Theorem 3 applies, for example, if \(F_{P}\) is approximated by Monte Carlo or Quasi-Monte Carlo methods with a large sample size \(N>n\). Let

$$\begin{aligned} F_{P}(x)\approx \frac{1}{N}\sum _{j=1}^{N}\varPhi (q(\hat{\xi }^{j}),h(x,\hat{\xi }^{j})) \end{aligned}$$

be such an approximate representation of \(F_{P}(x)\) based on a sample \(\hat{\xi }^{j}\), \(j=1,\ldots ,N\). Inserting this approximation into (21) and exploiting again the duality relation then leads to the following approximate version of (21)

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0,(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n} \\ {(w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n}} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle \le t+ \frac{1}{N}\sum \limits _{j=1}^{N}\langle q(\hat{\xi }^{j}),\hat{y}_{j}\rangle \\ \frac{1}{N}\sum \limits _{j=1}^{N}\langle h(x,\hat{\xi }^{j}),\hat{z}_{j}\rangle \le t+\sum \limits _{i=1}^{n}w_{i}\langle q(\xi ^{i}),y_{i}\rangle \\ \forall (x,\hat{y},\hat{z})\in \mathcal {M}(\hat{\xi }^{1},\ldots ,\hat{\xi }^{N})\\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n}) \end{array}\right. \right\} , \end{aligned}$$
(28)

where the sample \(\hat{\xi }^{j}\), \(j=1,\ldots ,N\), is given. The latter problem may also be characterized as a scenario clustering problem: given a large scenario set \(\hat{\xi }^{j}\), \(j=1,\ldots ,N\), we are looking for a smaller scenario set \(\xi ^{i}\), \(i=1,\ldots ,n\), where each scenario \(\xi ^{i}\) corresponds to a cluster \(\{\hat{\xi }^{j}:j\in I_{i}\}\) of the original scenarios.
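The clustering view admits a simple numerical sketch. The following toy routine (a hypothetical helper with made-up one-dimensional data, not the optimization model (28) itself) assigns each original scenario to its nearest representative and accumulates the sample weights into the new probabilities \(w_{i}\):

```python
def cluster_scenarios(samples, reps):
    """Assign each original scenario to its nearest representative and
    accumulate the (equal) sample weights into cluster probabilities."""
    n = len(reps)
    weights = [0.0] * n
    clusters = [[] for _ in range(n)]
    for s in samples:
        j = min(range(n), key=lambda i: abs(s - reps[i]))
        clusters[j].append(s)
        weights[j] += 1.0 / len(samples)
    return weights, clusters

# a "large" scenario set (N = 8) reduced to n = 2 fixed representatives
samples = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95, 0.85, 0.05]
weights, clusters = cluster_scenarios(samples, reps=[0.1, 0.9])
print(weights)   # -> [0.5, 0.5]: each representative carries its cluster mass
```

In (28) the representatives themselves are decision variables; here they are fixed for illustration only.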

The specific structure of (21) and (28) as generalized semi-infinite programs is promising and allows for specific solution algorithms (see [17, 50,51,52]).

In a number of cases it is even possible to reduce the GSIP (21) to a semi-infinite program by a transformation inspired by the recent paper [48]. To describe the idea, we consider the situation in which only the costs are random, the polyhedral cone Y is given by \(Y=\mathbb {R}^{\bar{m}}_{+}\), and the transformation is defined by

$$\begin{aligned} t:\varXi \times \mathcal {U}\rightarrow \mathbb {R}^{r},\quad t(\xi ,u)=u+(W^{+})^{\top }(q(\xi )-\bar{q})\,, \end{aligned}$$
(29)

where \(\mathcal {U}=\{u\in \mathbb {R}^{r}:W^{\top }u\le \bar{q}\}\), \(\bar{q}\in \mathbb {R}^{\bar{m}}\) and the \((\bar{m},r)\) matrix \(W^{+}\) denotes the Moore-Penrose inverse of W.

Proposition 2

Assume (A0) and (A2), \(h(x)\in W(\mathbb {R}^{\bar{m}}_{+})\) for all \(x\in X\) and that \(\bar{q}\) and \(q(\xi )\) belong to the range of \(W^{\top }\) for all \(\xi \in \varXi \). Then the generalized semi-infinite program (21) is equivalent to the semi-infinite program

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0\\ (\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x),u_{i}+(W^{+})^{\top }(q(\xi ^{i})-\bar{q}) \rangle \le t+F_{P}(x)\\ F_{P}(x)\le t+\sum \limits _{i=1}^{n}w_{i}\langle q(\xi ^{i}),y_{i}\rangle \\ \forall (x,y_{1},\ldots ,y_{n},u_{1},\ldots ,u_{n})\in X\times \mathcal {Y}(x)^{n}\times \mathcal {U}^{n} \end{array}\right. \right\} , \end{aligned}$$
(30)

where \(\mathcal {Y}(x)=\{y\in \mathbb {R}_{+}^{\bar{m}}:Wy=h(x)\}\)  for each \(x\in X\). If the weights \(w_{i}\), \(i=1,\ldots ,n\), are fixed, (30) is a linear semi-infinite program.

Proof

The mapping t given by (29) has the property

$$\begin{aligned} W^{\top }t(\xi ,u)=W^{\top }u+W^{\top }(W^{\top })^{+}(q(\xi )-\bar{q})\le \bar{q} +(q(\xi )-\bar{q})=q(\xi ) \end{aligned}$$

for each pair \((\xi ,u)\in \varXi \times \mathcal {U}\), since \(q(\xi )-\bar{q}\in W^{\top }(\mathbb {R}^{r})\) and \(W^{\top }(W^{\top })^{+}\) is just the orthogonal projection onto \(W^{\top }(\mathbb {R}^{r})\). Hence, we obtain

$$\begin{aligned} t(\xi ,\mathcal {U})=\{z\in \mathbb {R}^{r}:W^{\top }z\le q(\xi )\}. \end{aligned}$$

The desired equivalence between (21) and (30) follows by setting \(z_{i}=u_{i}+(W^{+})^{\top }(q(\xi ^{i})-\bar{q})\), \(i=1,\ldots ,n\), in (21). \(\square \)
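The key identity behind the proof, namely that \(W^{\top }(W^{\top })^{+}\) projects orthogonally onto the range of \(W^{\top }\), can be checked numerically. The sketch below uses NumPy with randomly generated data satisfying the range assumptions of Proposition 2; all matrices and vectors are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up second-stage data: W is an (r x m_bar) matrix of full row rank.
r, m_bar = 2, 4
W = rng.standard_normal((r, m_bar))
Wp = np.linalg.pinv(W)            # Moore-Penrose inverse W^+, (m_bar x r)

# q_bar and q(xi) are chosen in the range of W^T, as Proposition 2 assumes.
q_bar = W.T @ rng.standard_normal(r)
q_xi = W.T @ rng.standard_normal(r)
d = q_xi - q_bar                  # lies in range(W^T) by construction

# W^T (W^T)^+ is the orthogonal projection onto range(W^T),
# hence it leaves d unchanged -- the identity used in the proof.
assert np.allclose(W.T @ Wp.T @ d, d)

# Consequently, for any u with W^T u <= q_bar, the point t(xi, u) = u + (W^+)^T d
# satisfies W^T t(xi, u) = W^T u + d <= q_bar + d = q(xi).
u = rng.standard_normal(r)
t_xi_u = u + Wp.T @ d
assert np.allclose(W.T @ t_xi_u, W.T @ u + d)
print("projection identity verified")
```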

A similar result can be derived if only right-hand sides are random and the primal polyhedral constraint set of the linear second-stage problem is given in the form \(\{y\in \mathbb {R}^{\bar{m}}:Wy\le h(x,\xi )\}\). Proposition 2 opens the possibility of using classical solution algorithms for semi-infinite programs, in particular, discretization and exchange methods (see the monographs [15, 23] and the surveys [22, 40]). In the “Appendix” we provide a short description of the discretization method due to [39].

Finally, we discuss the possible use of lower and upper bounds of \(F_{P}(x)\) for scenario generation. There is a well-developed theory for deriving lower and upper bounds of expectation functionals of convex-concave integrands. While lower bounds are due to Jensen’s classical result (e.g., see [7, Theorem 10.2.6]), upper bounds are known as Edmundson-Madansky bounds. They were further developed in the context of stochastic programming, for example, in [2, 5, 11, 12, 14, 25]. Many upper bounds are derived via generalized moment problems appearing as duals of semi-infinite programs [12, 25] (see also [26, Section 3.2.1]).
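For a one-dimensional convex integrand the two classical bounds are easy to state and verify. The following toy computation (illustrative data only, not tied to a particular model from the text) evaluates the Jensen lower bound and the Edmundson-Madansky upper bound for a polyhedral function on an interval [a, b]:

```python
def jensen_em_bounds(phi, points, probs, a, b):
    """Jensen lower bound and Edmundson-Madansky upper bound for a convex
    function phi and a discrete distribution supported in [a, b]."""
    mu = sum(p * x for p, x in zip(probs, points))
    exact = sum(p * phi(x) for p, x in zip(probs, points))
    lower = phi(mu)                                            # Jensen
    upper = ((b - mu) * phi(a) + (mu - a) * phi(b)) / (b - a)  # Edmundson-Madansky
    return lower, exact, upper

phi = lambda x: max(0.0, x - 1.0)        # polyhedral, hence convex
lo, ex, up = jensen_em_bounds(phi, [0.0, 1.0, 2.0, 3.0], [0.1, 0.4, 0.3, 0.2], 0.0, 3.0)
print(lo, ex, up)                        # lo <= ex <= up, here 0.6 <= 0.7 <= 1.066...
```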

Let \(l_{P}(x)\) and \(u_{P}(x)\) denote lower and upper bounds of \(F_{P}(x)\), respectively. Then the following optimization problem (derived from (21)) computes upper bounds of the infima to (19) or (21), respectively:

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0,(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^{n}\\ {(w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n}} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}\langle h(x,\xi ^{i}),z_{i}\rangle \le t+l_{P}(x),\\ u_{P}(x)\le t +\sum \limits _{i=1}^{n}w_{i}\langle q(\xi ^{i}),y_{i}\rangle ,\\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{n}) \end{array}\right. \right\} . \end{aligned}$$
(31)

If \(l_{P}(x)\) and \(u_{P}(x)\) are exchanged, the optimization problem (31) provides lower bounds of the infima to (21). These observations may be of interest for the numerical solution of (21) if it is nonconvex.

3 Optimal scenario reduction for two-stage models

Next we discuss the scenario reduction approach for two-stage models based on the minimal information distance (5) and the best approximation problem (9).

As in Sect. 1 let \(\xi ^{i}\), \(i=1,\ldots ,N\), be a large set of scenarios with probabilities \(p_{i}\), \(i=1,\ldots ,N\), that define a discrete probability measure P. For prescribed \(n\in \mathbb {N}\), \(n<N\), we intend to determine an index set \(J\subset \{1,\ldots ,N\}\) of cardinality \(|J|=n\) and new weights \(\bar{\pi }_{j}\), \(j\in J\), such that the probability measure

$$\begin{aligned} P_{J}^{*}=\sum _{j\in J}\bar{\pi }_{j}\delta _{\xi ^{j}} \end{aligned}$$

solves the optimal scenario reduction problem

$$\begin{aligned} \min \left\{ \sup _{x\in X}\left| \sum _{j\in J}\pi _{j}\varphi _{j}(x)- \sum _{i=1}^{N}p_{i}\varphi _{i}(x)\right| :J\subset \{1,\ldots ,N\}, |J|=n,\pi \in \mathcal {S}_{n}\right\} , \end{aligned}$$
(32)

where the functions \(\varphi _{i}(x)=\varPhi (q(\xi ^{i}),h(x,\xi ^{i}))\), \(i=1,\ldots ,N\), are convex polyhedral on X. Problem (32) represents a mixed-integer semi-infinite program. Compared with (15), (32) is based on Proposition 1 and, hence, on a (much) smaller upper bound for the difference of the optimal values. In addition, the solution of problem (32) depends on the data of the two-stage stochastic program.

Problem (32) decomposes into finding the optimal index set J of remaining scenarios and, given J, determining the optimal weights \(\pi _{j}\), \(j\in J\). The outer combinatorial optimization problem

$$\begin{aligned} \min \left\{ D(J,P):J\subset \{1,\ldots ,N\},|J|=n\right\} , \end{aligned}$$
(33)

determines the index set J and can be reformulated as a binary optimization problem similar to (15). Here, the objective function \(D(J,P)\) denotes the infimum of the inner program

$$\begin{aligned} \min _{\pi \in \mathcal {S}_{n}}\sup _{x\in X}\left| \sum _{j\in J}\pi _{j}\varphi _{j}(x)- \sum _{i=1}^{N}p_{i}\varphi _{i}(x)\right| . \end{aligned}$$
(34)

Any evaluation of the objective in (33) requires the solution of the best approximation problem (34). Hence, in contrast to problem (16) in the introduction, the infimum of problem (34) cannot be computed explicitly.

For linear two-stage stochastic programs satisfying (A0)–(A2) the optimization model (34) contains finite functions and is equivalent to the reduced linear semi-infinite program

$$\begin{aligned} \min _{t\ge 0,\pi \in \mathcal {S}_{n}}\left\{ t\left| \begin{array}{c} \sum \limits _{j\in J}\pi _{j}\varphi _{j}(x)\le t+\sum \limits _{i=1}^{N} p_{i}\varphi _{i}(x)\\ \sum \limits _{i=1}^{N}p_{i}\varphi _{i}(x)\le t+\sum \limits _{j\in J} \pi _{j}\varphi _{j}(x)\\ \forall x\in X\end{array}\right. \right\} \end{aligned}$$
(35)

or to

$$\begin{aligned} \min _{t\ge 0,\pi \in \mathcal {S}_{n}}\left\{ t\left| \begin{array}{c} \sum \limits _{j\in J}\pi _{j}\langle h(x,\xi ^{j}),z_{j}\rangle \le t+ \sum \limits _{i=1}^{N}p_{i}\langle q(\xi ^{i}),y_{i}\rangle \\ \sum \limits _{i=1}^{N}p_{i}\langle h(x,\xi ^{i}),z_{i}\rangle \le t + \sum \limits _{j\in J}\pi _{j}\langle q(\xi ^{j}),y_{j}\rangle \\ \forall (x,y,z)\in \mathcal {M}(\xi ^{1},\ldots ,\xi ^{N}) \end{array}\right. \right\} , \end{aligned}$$
(36)

where the set \(\mathcal {M}(\xi ^{1},\ldots ,\xi ^{N})\) is defined as in (22) with n replaced by N. Hence, the linear semi-infinite program (36) has a comparatively small number \(n+1\) of variables, but a (very) high-dimensional index set.

Problems (35) and (36) have the following meaning: given a convex combination of many convex polyhedral functions \(\varphi _{i}(\cdot )\) on X, one looks for the convex combination of a given subset of these functions that best approximates the former uniformly.
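As an illustration of the inner problem (34), one may discretize X and search for good weights directly. The sketch below (made-up polyhedral functions \(\varphi _{i}\), a coarse grid for X, and a crude one-dimensional simplex search for n = 2 instead of solving the linear semi-infinite program (35)) evaluates the uniform distance for a retained index set J:

```python
def sup_distance(pi, J, phi, p, grid):
    """sup over the grid of | sum_{j in J} pi_j phi_j(x) - sum_i p_i phi_i(x) |."""
    return max(abs(sum(w * phi[j](x) for w, j in zip(pi, J))
                   - sum(p_i * f(x) for p_i, f in zip(p, phi)))
               for x in grid)

def best_weights_n2(J, phi, p, grid, steps=200):
    """Crude search over the simplex for n = 2: pi = (w, 1 - w)."""
    cands = [(k / steps, 1.0 - k / steps) for k in range(steps + 1)]
    best = min(cands, key=lambda pi: sup_distance(pi, J, phi, p, grid))
    return best, sup_distance(best, J, phi, p, grid)

# made-up data: phi_i(x) = max{0, x - xi_i}, keep the scenarios with indices 0 and 3
xi = [0.0, 1.0, 2.0, 3.0]
p = [0.25, 0.25, 0.25, 0.25]
phi = [lambda x, s=s: max(0.0, x - s) for s in xi]
grid = [0.05 * k for k in range(81)]             # discretization of X = [0, 4]
pi, dist = best_weights_n2([0, 3], phi, p, grid)
print(pi, dist)                                  # chosen weights and uniform distance
```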

4 Scenario generation for chance constrained programs

We consider a chance constrained program

$$\begin{aligned} \min \{g(x):x\in X,\,P(\mathcal {P}(x))\ge p\}, \end{aligned}$$

where \(\mathcal {P}(x)=\{\xi \in \varXi :h(x,\xi )\le 0\}\) is a polyhedron depending on x, g is a linear objective, X and \(\varXi \) are polyhedral, h is a function as described in Sect. 1 and \(p\in (0,1)\) is a given probability level. Then we have \(f_{0}(x,\xi )=g(x)\) and \(f_{1}(x,\xi )=p-\mathbb {1}_{\mathcal {P}(x)}(\xi )\), and the best approximation problem (9) is of the form

$$\begin{aligned} \min \limits _{t\ge 0,\,P_{n}\in \mathcal {P}_{n}(\varXi )} \left\{ t\left| \begin{array}{c} P(\mathcal {P}(x))\le t+P_{n}(\mathcal {P}(x))\\ P_{n}(\mathcal {P}(x))\le t+ P(\mathcal {P}(x))\\ \forall x\in X\end{array}\right. \right\} \end{aligned}$$
(37)

and, thus,

$$\begin{aligned} P_{n}(\mathcal {P}(x))=\sum _{i=1}^{n}w_{i}\mathbb {1}_{\mathcal {P}(x)}(\xi ^{i})= \sum _{i=1}^{n}w_{i}\mathbb {1}_{\mathbb {R}_{-}^{r}}(h(x,\xi ^{i}))\quad (x\in X). \end{aligned}$$
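The discrete probability \(P_{n}(\mathcal {P}(x))\) is straightforward to evaluate for given scenarios and weights. A minimal sketch with a toy one-dimensional constraint function h (all data invented for illustration):

```python
def discrete_prob(x, scenarios, weights, h):
    """P_n({xi : h(x, xi) <= 0}): total weight of the scenarios inside P(x)."""
    return sum(w for w, xi in zip(weights, scenarios)
               if all(c <= 0 for c in h(x, xi)))

h = lambda x, xi: [xi - x]                        # toy one-dimensional constraint
scenarios = [1.0, 2.0, 3.0, 4.0]
weights = [0.5, 0.25, 0.125, 0.125]
print(discrete_prob(2.5, scenarios, weights, h))  # -> 0.75: scenarios 1 and 2 satisfy xi <= x
```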

It is well-known that chance constrained optimization models with discrete probability distributions are nonconvex in general (see, for example, [26, Section 2.2.2]), but can be reformulated as mixed-integer programs. We follow here the presentation in [26, Section 2.2.2] and choose a constant \(M>0\) such that

$$\begin{aligned} h(x,\xi ^{i})-M\,e\le 0\quad \forall x\in X, \end{aligned}$$
(38)

holds for each \(i=1,\ldots ,n\), where \(e=(1,\ldots ,1)^{\top }\in \mathbb {R}^{r}\). Such a constant M always exists since X is compact. This allows us to introduce binary variables \(z_{i}\in \{0,1\}\) such that \(z_{i}=0\) if \(h(x,\xi ^{i})\le 0\) for all \(x\in X\) and \(z_{i}=1\) otherwise, \(i=1,\ldots ,n\).

Then it is possible to reformulate (37) as the mixed-integer semi-infinite program

$$\begin{aligned} \min \limits _{\begin{array}{c} t\ge 0,\,(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^n\\ {(w_{1},\ldots ,w_{n})\in \mathcal {S}_{n}}\\ (z_{1},\ldots ,z_{n}) \in \{0,1\}^{n} \end{array}}\left\{ t\left| \begin{array}{c} P(\mathcal {P}(x))\le t+\sum \limits _{i=1}^{n}w_{i}(1-z_{i})\\ \sum \limits _{i=1}^{n}w_{i}(1-z_{i})\le t+ P(\mathcal {P}(x))\\ h(x,\xi ^{i})-z_{i}Me\le 0,\;i=1,\ldots ,n\\ \forall x\in X\end{array}\right. \right\} . \end{aligned}$$
(39)

If the weights \(w_{i}\), \(i=1,\ldots ,n\), are fixed, problem (39) is a mixed-integer linear semi-infinite optimization model.

Since mixed-integer linear programs containing 'big-M' type constraints are often difficult to solve, one is interested in strengthening the formulation of (39) by incorporating valid inequalities. One possibility is to introduce precedence constraints based on partial orders \(\preceq \) on the index set \(\{1,\ldots ,n\}\). Such a partial order \(\preceq \) is called strongly consistent for (39) in [46] if for all \(x\in X\)

$$\begin{aligned} i\preceq j \wedge h(x,\xi ^{j})\le 0\Rightarrow h(x,\xi ^{i})\le 0\,. \end{aligned}$$

It follows as in [46] that the constraints

$$\begin{aligned} z_{i}\le z_{j}\quad \text{ for } \text{ all } i,j\in \{1,\ldots ,n\} \text{ such } \text{ that } i\preceq j \end{aligned}$$

are valid inequalities if \(\preceq \) is a strongly consistent order for (39).

If the function h is of the special form \(h(x,\xi )=\xi -T(\xi )x\) with a linear (s, m)-matrix function \(T(\cdot )\), a strongly consistent order is \(i\preceq j \Leftrightarrow \xi ^{i}-T(\xi ^{i})x\le \xi ^{j}-T(\xi ^{j})x\), for all \(x\in X\), where \(\le \) is the component-wise inequality between elements of \(\mathbb {R}^{s}\). For the special function \(h(x,\xi )=\xi -Tx\) and fixed weights \(w_{i}\), \(i=1,\ldots ,n\), problem (39) is a mixed-integer linear semi-infinite program of the form

$$\begin{aligned} \min \limits _{\begin{array}{c} t\ge 0,\,(\xi ^{1},\ldots ,\xi ^{n})\in \varXi ^n\\ {(z_{1},\ldots ,z_{n})\in \{0,1\}^{n}} \end{array}}\left\{ t\left| \begin{array}{c} P(\mathcal {P}(x))\le t+\sum \limits _{i=1}^{n}w_{i}(1-z_{i})\\ \sum \limits _{i=1}^{n}w_{i}(1-z_{i})\le t+ P(\mathcal {P}(x))\\ \xi ^{i}-Tx-z_{i}Me\le 0,\;i=1,\ldots ,n\\ z_{i}\le z_{j} \text{ if } \xi ^{i}\le \xi ^{j},\,i,j=1,\ldots ,n\\ \forall x\in X\end{array}\right. \right\} . \end{aligned}$$
(40)

The papers [31, 53, 56] are sources for deriving further valid inequalities.
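For the special form \(h(x,\xi )=\xi -Tx\) the strongly consistent order reduces to the componentwise comparison \(\xi ^{i}\le \xi ^{j}\), so the valid inequalities \(z_{i}\le z_{j}\) in (40) can be enumerated directly. A small sketch with invented scenario data:

```python
def precedence_pairs(scenarios):
    """All pairs (i, j), i != j, with xi^i <= xi^j componentwise; each pair
    yields the valid inequality z_i <= z_j in (40)."""
    return [(i, j)
            for i, a in enumerate(scenarios)
            for j, b in enumerate(scenarios)
            if i != j and all(ak <= bk for ak, bk in zip(a, b))]

scenarios = [(1.0, 2.0), (2.0, 3.0), (0.5, 4.0), (2.0, 2.0)]
print(precedence_pairs(scenarios))   # -> [(0, 1), (0, 3), (3, 1)]
```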

5 Newsvendor with random demand: an illustration

We consider the classical newsvendor problem to illustrate the approach to scenario generation developed in Sect. 2. We recall that a newsvendor must place a daily order for a number of copies x of a newspaper. He has to pay c monetary units for each copy and sells a copy at r units, where \(0<c<r\). The daily demand \(\xi \) is a real random variable with (discrete) probability distribution \(P\in \mathcal {P}(\mathbb {N})\), \(\varXi =\mathbb {R}_{+}\), and the remaining copies \(y(\xi )=\max \{0,x-\xi \}\) have to be removed. The newsvendor wishes to choose the order x that maximizes his expected profit or, equivalently, minimizes his expected costs, i.e.,

$$\begin{aligned} \mathbb {E}[f_{0}(x,\xi )]=\int _{\mathbb {R}}f_0(x,\xi )dP(\xi )=(c-r)x+r\int _{\mathbb {R}} \max \{0,x-\xi \}P(d\xi )\quad (x\in \mathbb {R}). \end{aligned}$$

The model may be reformulated as a linear two-stage stochastic program with the optimal value function \(\varPhi (t)=\max \{0,-t\}\). Starting from

$$\begin{aligned} \varPhi (t)=\inf \{\langle q,y\rangle :Wy=t,\,y\ge 0\}= \sup \{\langle t,z\rangle :W^{\top }z\le q\} \end{aligned}$$

with \(W=(-1, 1)\), \(q=(0,r)^{\top }\), \(Y=\mathbb {R}_{+}\) and \(h(x,\xi )=\xi -x\), we obtain \(D=\{z\in \mathbb {R}:W^{\top }z\le q\}=[0,r]\) and for \(x\in X=\mathbb {R}_{+}\)

$$\begin{aligned} \mathbb {E}[f_{0}(x,\xi )]=(c-r)x+r\int _{0}^{x}(x-\xi )dG_{P}(\xi )= (c-r)x+r\int _{0}^{x}G_{P}(\xi )d\xi \,. \end{aligned}$$
(41)

The latter is obtained using integration by parts, where \(G_{P}\) denotes the distribution function \(G_{P}(x)= P(\{\xi \in \mathbb {R}:\xi \le x\})=\sum _{k\le x}p_{k}\) of P and \(p_{k}\) is the probability of demand \(k\in \mathbb {N}\). The unique solution is the \(\frac{r-c}{r}\)-quantile of P.
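The quantile rule is easy to verify numerically. The toy computation below (invented cost, price and demand data) compares the \(\frac{r-c}{r}\)-quantile with a brute-force minimization of the expected cost (41):

```python
def expected_cost(x, probs, c, r):
    """(c - r) x + r E[max(0, x - xi)] with demand xi = 1, 2, ... (cf. (41))."""
    return (c - r) * x + r * sum(p * max(0, x - k)
                                 for k, p in enumerate(probs, start=1))

def quantile(probs, level):
    """Smallest k with G_P(k) >= level for the discrete demand distribution."""
    cum = 0.0
    for k, p in enumerate(probs, start=1):
        cum += p
        if cum >= level:
            return k

c, r = 2.0, 5.0                             # purchase cost and selling price (toy values)
probs = [0.1, 0.2, 0.4, 0.2, 0.1]           # demands 1, ..., 5
x_star = quantile(probs, (r - c) / r)       # the (r - c)/r = 0.6-quantile
x_best = min(range(1, 6), key=lambda x: expected_cost(x, probs, c, r))
print(x_star, x_best)                       # -> 3 3
```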

The corresponding optimal scenario generation problem is of the form

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0,(\xi ^{1},\ldots ,\xi ^{n})\in \mathbb {R}^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}(\xi ^{i}-x)z_{i}\le t+F_{P}(x)\\ F_{P}(x)\le t+ r\sum \limits _{i=1}^{n}w_{i}y_{i}\\ \forall (x,y,z)\in \mathbb {R}_{+}\times \mathbb {R}_{+}^{n}\times \mathbb {R}^{n}:\\ y_{i}+x\ge \xi ^{i},\,0\le z_{i}\le r,\,i=1,\ldots ,n \end{array}\right. \right\} , \end{aligned}$$
(42)

where \(F_{P}\) is the convex expected recourse function

$$\begin{aligned} F_{P}(x)=r\sum _{k=1}^{\infty }p_{k}\,\max \{0,x-k\}\,. \end{aligned}$$
(43)

We note that Theorems 2 and 3 apply to (42) if the weights \(w_{i}\) are fixed.

By incorporating \(F_{P}\) from (43), (42) is equivalent to the best approximation problem

$$\begin{aligned} \min _{\begin{array}{c} (\xi ^{1},\ldots ,\xi ^{n})\in \mathbb {R}^{n}\\ (w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n} \end{array}}\sup _{x\in \mathbb {R}_{+}}\left| \sum _{k=1}^{\infty }p_{k} \max \{0,x-k\}-\sum _{i=1}^{n}w_{i}\max \{0,x-\xi ^{i}\}\right| . \end{aligned}$$
(44)

We assume that the support of P is bounded, i.e., that the series in (43) reduces to a finite sum. Let \(N\in \mathbb {N}\) be such that \(p_{k}=0\) for all \(k> N\). Then \(F_{P}\) is piecewise linear and convex on \(\mathbb {R}_{+}\) with possible kinks at any \(k\in \mathbb {N}\), \(k\le N\). The slope of \(F_{P}\) on \((k,k+1)\) is \(r\sum _{i=1}^{k}p_{i}\) and it holds that \(F_{P}(x)=r(x-\mathbb {E}[\xi ])\) for \(x\ge N\), where \(\mathbb {E}[\xi ]\) is the mean value of \(\xi \), i.e., \(\mathbb {E}[\xi ]=\sum _{k=1}^{N}p_{k}k\).

Using the transformation idea from Proposition 2 (see [48]) we are able to transform the generalized semi-infinite program (42) into a semi-infinite one. To this end we define the mapping

$$\begin{aligned} t:\varXi \times \mathcal {U}\rightarrow \mathbb {R},\quad t(\xi ,u)=u+\xi , \end{aligned}$$

where \(\mathcal {U}=\mathbb {R}_{+}=\{u\in \mathbb {R}_{+}:x+u\ge 0\}\) for each \(x\in \mathbb {R}_{+}\). Then the transformation \(y=t(\xi ,u)\) leads to \(t(\xi ,\mathcal {U})=\{y\in \mathbb {R}_{+}:x+y\ge \xi \}\) for all \((x,\xi )\in \mathbb {R}_{+}^{2}\) and the optimization model (42) is of the form

$$\begin{aligned} \min _{\begin{array}{c} t\ge 0,(\xi ^{1},\ldots ,\xi ^{n})\in \mathbb {R}^{n}\\ {(w_{1},\ldots ,w_{n}) \in \mathcal {S}_{n}} \end{array}}\left\{ t\left| \begin{array}{c} \sum \limits _{i=1}^{n}w_{i}(\xi ^{i}-x)z_{i}\le t+F_{P}(x)\\ F_{P}(x)\le t+ r\sum \limits _{i=1}^{n}w_{i}(y_{i}+\xi ^{i})\\ \forall (x,y,z)\in \mathbb {R}_{+}\times \mathbb {R}_{+}^{n}\times D^{n} \end{array}\right. \right\} , \end{aligned}$$
(45)

where u is replaced again by y. If the support of P is contained in [0, M] for some \(M\in \mathbb {N}\), we can also replace both \(\mathbb {R}_{+}\) in (45) by [0, M] and arrive at a compact index set of the semi-infinite program. Hence, a solution of (45) by a discretization method (see “Appendix”) is possible.
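Once the support of P is contained in [0, M], the objective of (44) can be evaluated on a finite grid, since the supremum of the piecewise linear difference is attained at one of the finitely many kinks. The sketch below (toy demand distribution and two hand-picked candidate scenario sets; a crude stand-in for the discretization method mentioned in the "Appendix") compares two candidates:

```python
def F_discrete(x, support, probs):
    """sum_k p_k * max{0, x - xi_k} for a discrete distribution, as in (44)."""
    return sum(p * max(0.0, x - s) for p, s in zip(probs, support))

def sup_gap(probs, scen, w, M, steps=400):
    """Objective of (44) evaluated on a grid for [0, M]; the grid contains
    all kinks, so the grid maximum equals the supremum."""
    grid = [M * k / steps for k in range(steps + 1)]
    return max(abs(F_discrete(x, range(1, len(probs) + 1), probs)
                   - F_discrete(x, scen, w)) for x in grid)

probs = [0.1, 0.2, 0.4, 0.2, 0.1]                  # demands 1..5, so M = 5
gap_a = sup_gap(probs, [2.0, 4.0], [0.5, 0.5], 5.0)
gap_b = sup_gap(probs, [1.0, 5.0], [0.5, 0.5], 5.0)
print(gap_a < gap_b)    # -> True: the first candidate pair approximates F_P better
```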

6 Conclusions

The generation of scenarios is an important issue for solving applied stochastic programming models. Presently, Monte Carlo sampling methods are the preferred approach (see [24]), but, besides Quasi-Monte Carlo and sparse grid methods, best approximation methods are also in use. The latter utilize metric distances of probability measures and suggest determining discrete measures as best approximations to the underlying probability distribution (see [33, 35]).

Existing scenario reduction methods [10, 18] are based on the same theoretical background. However, we pointed out in Sect. 1 that stability results indicate that such probability metrics only lead to coarse estimates of distances of optimal values and solutions. Decisions on scenario generation and reduction based on such estimates appear somewhat questionable and should at least be further examined. This concern is supported by slow convergence rates in terms of such probability metrics. On the other hand, a stability result like Proposition 1 suggests making use of the minimal information distance \(d_{\mathcal {F}}\) [see (6), (5)] as a basis for best approximation methods. This observation served as the guideline for the present paper. It turned out that, at least for linear two-stage models, the best approximation problem for scenario generation has favorable properties. It represents a best uniform approximation problem for the expected recourse function by a convex combination of polyhedral functions generated by scenarios. The latter can be rewritten as a generalized semi-infinite optimization model and, in many cases, transformed into a standard semi-infinite program. If either only right-hand sides or only costs are random, the optimization model is convex. In any case there exists a well-developed theory and a number of solution algorithms for such models (see [17, 22, 40, 50,51,52]). Scenario reduction problems for linear two-stage models can be decomposed into solving a combinatorial optimization problem and a linear semi-infinite program, where the first determines the remaining scenarios and the second their new probabilities.

The characterization of scenario generation with respect to the distance \(d_{\mathcal {F}}\) as best approximation problem for the expected recourse function provides a link to bounding schemes for the expected recourse (see [26, Section 3.2.1]). It reveals the close relationship of scenario generation, scenario reduction and bounding.

The aim of the present paper was to show that employing minimal information distances for scenario generation and reduction leads to interesting optimization models. Their solution should result in improved decisions for scenario generation and reduction, at least for two-stage models. As a next step we plan to confirm this by numerical experiments.