Keywords

1 Introduction

Many solution methods for stochastic programming models in finance and energy rely on approximating the underlying probability distribution P by a probability measure based on a finite number of scenarios (or atoms). In case of multi-stage models scenarios have to satisfy certain constraints due to the nonanticipativity of decisions. Such constraints lead to tree-structured scenarios. Hence, designing approximation schemes for multi-stage models requires the generation of scenarios as well as of trees.

There is a variety of methods for generating scenarios (see also the introductory overview (Römisch 2003) and Chapter 15, this volume), namely,

  • Monte Carlo methods (applied to statistical models or data) (Shapiro 2003b),

  • Quasi-Monte Carlo methods (see Lemieux (2009) for a recent exposition),

  • optimal quantization methods for probability measures (see Graf and Luschgy (2000)),

  • quadrature rules using sparse grids (e.g. Chen and Mehrotra (2008))

based on the underlying probability distribution P. In general, however, the generated scenarios will not exhibit tree structure.

Presently, there exist several approaches for generating such scenario trees. In a number of cases the tree structure is (partially) predetermined and scenarios are generated sequentially by (conditional) random sampling (Dempster 2004; Kouwenberg 2001; Shapiro 2003a) or quasi-Monte Carlo sampling (Pennanen 2009). Alternatively, scenarios are generated by using bound-based constructions (Frauendorfer 1996; Kuhn 2005), using optimization methods such that certain moments are matched (Høyland and Wallace 2001; Gülpinar et al. 2004) or such that a best approximation of P is obtained in terms of a suitable probability metric (Pflug 2001; Hochreiter and Pflug 2007; Mirkov and Pflug 2007). We also refer to Dupačová et al. (2000) and the references therein for an overview on scenario tree generation.

The approach to scenario tree generation presented in the following does not require restrictive conditions on P. It starts with a number of scenarios generated by one of the methods mentioned above. The branching structure of the tree is not predetermined, but automatically detected by the algorithms such that a good approximation of P measured in terms of the closeness of optimal values is obtained. The whole approach is based on a quantitative stability analysis of (linear) multi-stage stochastic programs (see Heitsch et al. (2006) and Heitsch and Römisch (2010)). The algorithms rely on applying scenario reduction sequentially over time and are first analyzed in Heitsch and Römisch (2009a). In this chapter we review parts of our earlier work in Sections 14.2 and 14.4, develop new convergence results in Section 14.3, and report on an application to the electricity portfolio management of a municipal power company in Section 14.5. Readers which are mainly interested in algorithms and their application may skip Sections 14.2 and 14.3.

To state the mathematical optimization model, let the periods of the time horizon be denoted \(t=1,\ldots,T\) and \(\xi=(\xi_{t})_{t=1}^{T}\) be an \({\mathbb R}^{d}\)-valued stochastic process on some probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\). The filtration \(({\mathcal{F}}_{t}(\xi))_{t=1}^{T}\) associated with ξ is defined by \({\mathcal{F}}_{t}(\xi):=\sigma(\xi^{t})\) with \(\xi^{t}=(\xi_{1},\ldots,\xi_{t})\) for \(t=1,\ldots,T\). We assume that \({\mathcal{F}}_{1}=\{\emptyset,\varOmega\}\) and \(\xi_{t}\in L_{r}(\varOmega,{\mathcal{F}},{\mathbb P})\), i.e., \({\mathbb E}(|\xi_{t}|^{r})<+\infty\) for every \(t=1,\ldots,T\) and some \(r\ge 1\). By \(P={\mathbb P}\,\xi^{-1}\) we denote the probability distribution of ξ and by P t its tth marginal probability distribution for \(t=1,\ldots,T\), i.e.,

$$P_{t}(B_{t})={\mathbb P}\,\xi^{-1}(\varXi_{1}\times\cdots\times\varXi_{t-1}\times B_{t} \times\varXi_{t+1}\times\cdots\times\varXi_{T})\quad(B_{t}\in{\mathcal{B}}(\varXi_{t})),$$

where \(\varXi_{t}\subseteq{\mathbb R}^{d}\) denotes the support of ξ t and \({\mathcal{B}}(\varXi_{t})\) the σ-field of its Borel subsets. In particular, Ξ 1 denotes the singleton \(\varXi_{1}=\{\xi_{1}\}\).

A linear multi-stage stochastic program can be written as

$$\min\left\{{\mathbb E}\left(\sum_{t=1}^{T}\langle b_{t}(\xi_{t}),x_{t}\rangle\right) \left| \begin{array}{l} x_{t}=x_{t}(\xi_{1},...,\xi_{t}),\, x_{t}\in X_{t}, \\ A_{t,0}x_{t}+A_{t,1}(\xi_{t})=h_{t}(\xi_{t}) \end{array} \; (t=1,...,T)\right.\right\},$$
((14.1))

where x t is an \({\mathbb R}^{m_{t}}\)-valued decision vector for time period t. The latter is a Borel measurable function depending (only) on \((\xi_{1},\ldots,\xi_{t})\in\times_{\tau=1}^{t}\varXi_{\tau}=\varXi^{t}\), i.e., it depends on the data observed until time t (nonanticipativity). In particular, the components of x 1 are here and now decisions since x 1 may only depend on ξ 1 which was assumed to be deterministic. The decisions are subject to constraints: each x t has to be chosen within a given polyhedral set X t . Moreover, there are dynamic constraints involving matrices \(A_{t,\tau}\), \(\tau\in\{0,1\}\), and right-hand sides h t . The matrices \(A_{t,1}(\cdot)\), the cost coefficients \(b_{t}(\cdot)\), and right-hand sides \(h_{t}(\cdot)\) may depend on ξ t in an affinely linear way. \({\mathbb E}\) denotes expectation with respect to \({\mathbb P}\), i.e., the objective corresponds to the expected total costs of a decision vector \((x_{1},\ldots,x_{T})\).

2 Stability of Multi-stage Stochastic Programs

Studying stability of the multi-stage stochastic program (14.1) consists in regarding it as an optimization problem in the infinite dimensional linear space \(\times_{t=1}^{T}L_{r'}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{m_{t}})\). This is a Banach space when endowed with the norm

$${\begin{array}{lcl} \|x\|_{r'} & := & \left(\sum\limits_{t=1}^T{\mathbb E}\left[|x_{t}|^{r'}\right]\right)^{1/r'} \quad \mathrm{for} \ r'\in[1,+\infty), \\ \|x\|_{\infty} & := & \max\limits_{t=1,...,T}\mathrm{ess}\sup|x_{t}|, \end{array} }$$

where \(|\,.\,|\) denotes some norm on the relevant Euclidean spaces and \(\mathrm{ess}\sup|x_{t}|\) denotes the essential supremum of \(|x_{t}|\), i.e., the smallest constant C such that \(|x_{t}|\le C\) holds \({\mathbb P}\)-almost surely. Analogously, ξ can be understood as an element of the Banach space \(\times_{t=1}^{T}L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{d})\) with norm \(\|\xi\|_{r}\). For the integrability numbers r and r’ it will be imposed that

$$\begin{array}{rcl} r & := & \left\{ \begin{array}{ll} \in [1,+\infty) &,\, \hbox{if only costs or only right-hand sides are random} \\ 2 &,\, \hbox{if only costs and right-hand sides are random}\\ T &,\,\hbox{if all technology matrices are random} \end{array}\right. , \\ & & \\[2mm] r' & := & \left\{ \begin{array}{ll} \frac{r}{r-1} &,\,\hbox{if only costs are random} \\ r &,\,\hbox{if only right-hand sides are random}\\ +\infty &,\,\hbox{if all technology matrices are random} \end{array}\right. , \end{array}$$
((14.2))

with regard to problem (14.1). The choice of r and the definition of \(r'\) are motivated by the knowledge of existing moments of the input process ξ, by having the stochastic program well defined (in particular, such that \(\langle b_{t}(\xi_{t}),x_{t}\rangle\) is integrable for every decision x t and \(t=1,...,T\)), and by satisfying the conditions (A2) and (A3) (see below).

Since \(r'\) depends on r and our assumptions will depend on both r and r’, we will add some comments on the choice of r and its interplay with the structure of the underlying stochastic programming model. To have the stochastic program well defined, the existence of certain moments of ξ has to be required. This fact is well known for the two-stage situation (see, e.g., Ruszczyński and Shapiro (2003), chapter 2). If either right-hand sides or costs in a multi-stage model (14.1) are random, it is sufficient to require \(r\ge 1\). The flexibility in case that the stochastic process ξ has moments of order \(r>1\) may be used to choose r’ as small as possible in order to weaken condition (A3) (see below) on the feasible set. If the linear stochastic program is fully random (i.e., costs, right-hand sides, and technology matrices are random), one needs \(r\ge T\) to have the model well defined and no flexibility w.r.t. r’ remains.

Next we introduce some notation. We set \(s:=Td\) and \(m:=\sum_{t=1}^T m_{t}\). Let

$$F(\xi,x):={\mathbb E}\left[\sum_{t=1}^T\langle b_{t}(\xi_{t}),x_{t}\rangle\right]$$

denote the objective function defined on \(L_{r}(\varOmega,\mathcal{F},{\mathbb P}; {\mathbb R}^s)\times L_{r'}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{m})\) and let

$$\mathcal{X}(\xi) := \left\{ x\in\times_{t=1}^{T} L_{r'}(\varOmega,\sigma(\xi^t),{\mathbb P};{\mathbb R}^{m_{t}}) \, | \, x_{t}\in\mathcal{X}_{t}(x_{t-1};\xi_{t}) \hbox{ a.s. }\,\, (t=1,...,T) \right\}$$

denote the set of feasible elements of (14.1) with \(x_0\equiv 0\) and

$$\mathcal{X}_{t}(x_{t-1};\xi_{t}):=\left\{x_{t}\in{\mathbb R}^{m_{t}}: x_{t}\in X_{t},A_{t,0}x_{t}+A_{t,1}(\xi_{t})x_{t-1}=h_{t}(\xi_{t})\right\}$$

denoting the tth feasibility set for every \(t=1,...,T\). This allows to rewrite the stochastic program (14.1) in the short form

$$\min\left\{F(\xi,x):x\in\mathcal{X}(\xi)\right\}.$$
((14.3))

In the following, we need the optimal value

$$v(\xi) = \inf\left\{F(\xi,x):x\in\mathcal{X}(\xi)\right\}$$

for every \(\xi\in L_{r}(\varOmega,\mathcal{F},{\mathbb P}; {\mathbb R}^s)\) and, for any \(\varepsilon\ge 0\), the ε-approximate solution set (level set)

$$S_{\varepsilon}(\xi):=\left\{x\in\mathcal{X}(\xi):F(\xi,x)\le v(\xi)+\varepsilon\right\}$$

of the stochastic program (14.1). Since, for \(\varepsilon=0\), the set \(S_{\varepsilon}(\xi)\) coincides with the set solutions to (14.3), we will also use the notation \(S(\xi):=S_{0}(\xi)\). The following conditions will be imposed on (14.3):

  • (A1) The numbers \(r,r'\) are chosen according to (14.2) and \(\xi\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\).

  • (A2) There exists a \(\delta>0\) such that for any \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) satisfying \(\|\tilde{\xi}-\xi\|_{r}\le\delta\), any \(t=2,...,T\) and any \(x_{\tau}\in L_{r'}(\varOmega,\sigma(\tilde{\xi}^{\tau}),{\mathbb P};{\mathbb R}^{m_{\tau}})\) (\(\tau=1,...,t-1\)) satisfying \(x_{\tau}\in\mathcal{X}_{\tau}(x_{\tau-1}; \tilde{\xi}_{\tau})\) a.s. (where \(x_0= 0\)), there exists \(x_{t}\in L_{r'}(\varOmega,\sigma(\tilde{\xi}^{t}),{\mathbb P};{\mathbb R}^{m_{t}})\) satisfying \(x_{t}\in\mathcal{X}_{t}(x_{t-1};\tilde{\xi}_{t})\) a.s. (relatively complete recourse locally around ξ).

  • (A3) The optimal values \(v(\tilde{\xi})\) of (14.3) with input \(\tilde{\xi}\) are finite for all \(\tilde{\xi}\) in a neighborhood of ξ and the objective function F is level-bounded locally uniformly at ξ, i.e., for some \(\varepsilon_0>0\) there exists a \(\delta>0\) and a bounded subset B of \(L_{r'}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{m})\) such that \(S_{\varepsilon_0}(\tilde{\xi})\) is contained in B for all \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) with \(\|\tilde{\xi}-\xi\|_{r}\le\delta\).

For any \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) sufficiently close to ξ in L r , condition (A2) implies the existence of some feasible \(\tilde{x} \) in \(\mathcal{X}(\tilde{\xi})\) and (14.2) implies the finiteness of the objective \(F(\tilde{\xi},\cdot)\) at any feasible \(\tilde{x} \). A sufficient condition for (A2) to hold is the complete recourse condition on every recourse matrix \(A_{t,0}\), i.e., \(A_{t,0}X_{t}={\mathbb R}^{n_{t}}\), \(t=1,...,T\). The locally uniform level-boundedness of the objective function F is quite standard in perturbation results for optimization problems (see, e.g., Rockafellar and Wets (1998), theorem 1.17). The finiteness condition on the optimal value \(v(\xi)\) is not implied by the level boundedness of F for all relevant pairs \((r,r')\). In general, conditions (A2) and (A3) get weaker for increasing r and decreasing r’, respectively.

The first stability result for multi-stage stochastic programs represents a quantitative continuity property of the optimal values. Its main observation is that multi-stage models behave stable at some stochastic input process ξ if both its probability distribution and its filtration are approximated with respect to the L r -distance and the filtration distance

$${D_{\rm f}}(\xi,\tilde{\xi})\!:=\! \sup_{\varepsilon>0}\! \inf_{x\in S_{\varepsilon}(\xi)\atop{\tilde{x}\in S_{\varepsilon}(\tilde{\xi})}} \!\!\sum_{t=2}^{T-1}\!\max\!\left\{ \big\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]\big\|_{r'}, \big\|\tilde{x}_{t}-{\mathbb E}[\tilde{x}_{t}|{\mathcal{F}}_{t}(\xi)]\big\|_{r'}\!\right\},$$
((14.4))

where \({\mathbb E}[\,\cdot\,|{\mathcal{F}}_{t}(\xi)]\) and \({\mathbb E}[\,\cdot\,|{\mathcal{F}}_{t}(\tilde{\xi})]\) (\(t=1,...,T\)) are the corresponding conditional expectations, respectively. Note that for the supremum in (14.4) only small ε’s are relevant and that the approximate solution sets are bounded for \(\varepsilon\in(0,\varepsilon_0]\) according to (A3).

The following stability result for optimal values of program (14.3) is taken from Heitsch et al. (2006), theorem 2.1.

Theorem 1

Let (A1), (A2), and (A3) be satisfied and the set X1 be nonempty and bounded. Then there exist positive constants L and δ such that the estimate

$$\left|v(\xi)-v(\tilde{\xi})\right|\le L\left(\|\xi-\tilde{\xi}\|_{r}+ {D_{\rm f}}(\xi,\tilde{\xi})\right)$$
((14.5))

holds for all random elements \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) with \(\|\tilde{\xi}-\xi\|_{r}\le\delta\).

The result states that the changes of optimal values are at most proportional to the errors in terms of L r - and filtration distance when approximating ξ. The corresponding constant L depends on \(\|\xi\|_r\) (i.e. the rth moment of ξ), but is not known in general.

The filtration distance has a simpler representation if the approximation \(\tilde{\xi}\) of ξ is ξ-adapted, i.e., if \({\mathcal{F}}_{t}(\tilde{\xi})\subseteq{\mathcal{F}}_{t}(\xi)\) holds for every \(t=1,\ldots,T\). The latter is equivalent to the existence of measurable functions \(f_{t}:\varXi^{t}\to\varXi^{t}\) such that

$$\tilde{\xi}^{t}=f_{t}(\xi^{t})\quad(t=1,\ldots,T).$$

For ξ-adapted approximations \(\tilde{\xi}\) we have

$${D_{\rm f}}(\xi,\tilde{\xi})=\sup_{\varepsilon>0}\inf_{x\in S_{\varepsilon}(\xi)} \sum_{t=2}^{T-1}\left\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]\right\|_{r'}$$
((14.6))

and if, in addition, the solution set \(S(\xi)\) is nonempty, we obtain

$${D_{\rm f}}(\xi,\tilde{\xi})=\inf_{x\in S(\xi)} \sum_{t=2}^{T-1}\left\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]\right\|_{r'}.$$
((14.7))

The latter representation allows the conclusion

$${D_{\rm f}}(\xi,\tilde{\xi})\le\sum_{t=2}^{T-1}\left\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]\right\|_{r'}$$
((14.8))

for any solution \(x=(x_{1},x_{2},\ldots,x_{T})\) of (14.3). For a given solution x of (14.3) there exist Borel measurable functions \(g_{t}:\varXi^{t}\to{\mathbb R}^{m_{t}}\) such that

$${\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]={\mathbb E}[x_{t}|\tilde{\xi}^{t}]=g_{t}(\tilde{\xi}^{t}) \quad(t=1,\ldots,T),$$

where \(g_{1}(\tilde{\xi}_{1})=g_{1}(\xi_{1})=x_{1}\) and \(g_{t}(\xi^{t})=x_{t}\). In general, further properties of the functions

$$g_{t}(y_{1},\ldots,y_{t})={\mathbb E}[x_{t}|\tilde{\xi}^{t}=(y_{1},\ldots,y_{t})]= {\mathbb E}[x_{t}|f_{t}(\xi^{t})=(y_{1},\ldots,y_{t})],$$

i.e., of the conditional expectations of x t under the condition that \(\tilde{\xi}^{t}\) equals \((y_{1},\ldots,y_{t})\), are not known. Since x t is a (measurable) function of ξ t, the function value \(g_{t}(y_{1},\ldots,y_{t})\) may be computed via the (multivariate) probability distribution P of ξ.

Unfortunately, in general, there is no solution \(x\in S(\xi)\) such that the functions x t depend continuously on ξ t for every \(t=1,\ldots,T\) (cf. the discussion in Rockafellar and Wets (1974)). Sometimes, however, the functional dependence of x t on ξ t is of a specific form as in the following situation (Garstka and Wets 1974, theorem 4.3).

Proposition 1

Assume that only right-hand sides in (14.1) are random and that \(S(\xi)\) is nonempty. Then there exists \(x=(x_{1},\ldots,x_{T})\) in \(S(\xi)\) such that \(x_{t}=\varphi_{t}(h_{1}(\xi_{1}),\ldots,h_{t}(\xi_{t}))\) and ϕ t is a continuous, piecewise affine function for every \(t=1,\ldots,T\). In particular, xt is Lipschitz continuous as a function of ξt for every \(t=1,\ldots,T\).

This motivates the following condition on the conditional distributions of ξ and on the ξ-adapted approximation \(\tilde{\xi}\) of ξ.

  • (A4) For each \(t\in\{1,\ldots,T\}\) and each pair \((\varPhi_{t},f_{t})\) of Lipschitz continuous functions \(\varPhi_{t}:\varXi^{t}\to{\mathbb R}^{m_{t}}\) and \(f_{t}:\varXi^{t}\to\varXi^{t}\), the function

    $$g_{t}(y_{1},\ldots,y_{t})={\mathbb E}[\varPhi_{t}(\xi^{t})|f_{t}(\xi^{t})=(y_{1},\ldots,y_{t})]$$
    ((14.9))

    is Lipschitz continuous on Ξ t.

Then the following result is an immediate consequence of Theorem 1 and Proposition 1 if (A4) is imposed in addition.

Corollary 1

Let (A1), (A2), (A3), and (A4) be satisfied and X1 be nonempty and bounded. Assume that only right-hand sides in (14.1) are random and that \(S(\xi)\neq\emptyset\). Then there exist positive constants \(\hat{L} \) and δ such that

$$\left|v(\xi)-v(\tilde{\xi})\right|\le\hat{L}\|\xi-\tilde{\xi}\|_{r}$$
((14.10))

whenever \(\|\xi-\tilde{\xi}\|_{r}<\delta\) for any ξ-adapted \(\tilde{\xi}\) such that \(\tilde{\xi}^{t}=f_{t}(\xi^{t})\), \(t=1,\ldots,T\), with Lipschitz continuous functions \(f_{t}:\varXi^{t}\to{\mathbb R}^{td}\).

Proof

According to Proposition 1 there exists a solution \(x=(x_{1},\ldots,x_{T})\in S(\xi)\) such that \(x_{t}=\varPhi_{t}(\xi^{t})\) is a Lipschitz continuous function of ξ t for every \(t=1,\ldots,T\). Let \(\tilde{\xi}\) be ξ-adapted such that \(\tilde{\xi}^{t}=f_{t}(\xi^{t})\), \(t=1,\ldots,T\), where the functions \(f_{t}:\varXi^{t}\to\varXi^{t}\) are Lipschitz continuous on Ξ t. According to (A4) the functions g t from Ξ t to \({\mathbb R}^{m_{t}}\), \(t=1,\ldots,T\), given by (14.9) are Lipschitz continuous. Hence, there exist constants \(K_{t}>0\) such that

$$|g_{t}(y_{1},\ldots,y_{t})-g_{t}(\tilde{y}_{1},\ldots,\tilde{y}_{t})|\!\le\! K_{t}\!\sum_{\tau=1}^{t}|y_{\tau}-\tilde{y}_{\tau}|\;\;((y_{1},\ldots,y_{t}), (\tilde{y}_{1},\ldots,\tilde{y}_{t})\in\varXi^{t})$$

and, hence,

$${D_{\rm f}}(\xi,\tilde{\xi})\le\sum_{t=2}^{T-1}\left\|g_{t}(\xi^{t})-g_{t}(\tilde{\xi}^{t})\right\|_{r} \le\sum_{t=2}^{T-1}K_{t}\sum_{\tau=1}^{t}{\mathbb E}(|\xi_{\tau}-\tilde{\xi}_{\tau}|^{r})^{\frac{1}{r}}\le K\|\xi-\tilde{\xi}\|_{r}$$

for some suitably large constant K. Together with Theorem 1 we obtain (14.10) with \(\hat{L}=L K\).

We note that our condition (A4) is similar to assumption 2.6 in Küchler (2008) and Corollary 1 reminds of Küchler (2008), theorem 3. We also note that in case of finite supports Ξ t , \(t=1,\ldots,T\), the functions g t are necessarily Lipschitz continuous with possibly large Lipschitz constants K t , \(t=1,\ldots,T\), leading to a large constant \(\hat{L} \) in Corollary 1.

Stability results of approximate solutions to (14.3) require a stronger version of the filtration distance D f, namely,

$${D_{\rm f}}^{*}(\xi,\tilde{\xi}):=\sup_{x\in{\mathcal{B}}_{\infty}}\sum_{t=2}^{T} \big\|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\tilde{\xi})]\big\|_{r'},$$
(14.11)

where \({\mathcal{B}}_{\infty}:=\{x:\varOmega\to{\mathbb R}^{m}:x \hbox{ is }{\mathcal{F}}\hbox{-measurable},|x(\omega)|\le 1,\,{\mathbb P}\hbox{-almost surely}\}\). Notice that the sum is extended by the additional summand for \(t=T\) and that the former infimum is replaced by a supremum with respect to a sufficiently large bounded set. If we require, in addition to assumption (A3), that for some \(\varepsilon_0 > 0\) there exist constants \(\delta>0\) and \(C>0\) such that \(|\tilde{x}(\omega)|\leq C\) for \({\mathbb P}\)-almost every \(\omega\in\varOmega\) and all \(\tilde{x}\in S_{\varepsilon_0}(\tilde{\xi})\) with \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) and \(\|\tilde{\xi}-\xi\|_{r}\le\delta\), we have

$${D_{\rm f}}(\xi,\tilde{\xi})\le C\,{D_{\rm f}}^{*}(\xi,\tilde{\xi}).$$
((14.11))

Sometimes it is sufficient to consider the unit ball in \(L_{r'}\) rather than the corresponding ball \(\mathcal{B}_{\infty}\) in \(L_{\infty}\) (cf. Heitsch and Römisch (2009a, 2010)). In contrast to D f the distance \({D_{\rm f}}^{*}\) is a metric as it satisfies the triangle inequality.

Next we state a second stability result that represents a calmness property of approximate solution sets (Heitsch and Römisch (2010), theorem 2.4).

Theorem 2

Let (A1), (A2), and (A3) be satisfied with \(r'\in[1,+\infty)\) and the set X1 be nonempty and bounded. Assume that \(S(\xi)\neq\emptyset\). Then there exist \(\bar{L}>0\) and \(\bar{\varepsilon}>0\) such that

$${{ d \kern -.15em l}}_{\infty}\left(S_{\varepsilon}(\xi),S_{\varepsilon}(\tilde{\xi})\right)\le \frac{\bar{L}}{\varepsilon}\left(\|\xi-\tilde{\xi}\|_{r}+D_\mathrm{f}^{*}(\xi,\tilde{\xi})\right)$$
((14.13))

holds for every \(\tilde{\xi}\in L_{r}(\varOmega,\mathcal{F},{\mathbb P};{\mathbb R}^{s})\) with \(\|\xi-\tilde{\xi}\|_{r}\le\delta\) (with \(\delta>0\) from (A3)) and \(S(\tilde{\xi})\neq\emptyset\), and for any \(\varepsilon\in(0,\bar{\varepsilon})\). Here, \({{ d \kern -.15em l}}_{\infty}\) denotes the Pompeiu–Hausdorff distance of closed bounded subsets of \(L_{r'}=L_{r'}(\varOmega.\mathcal{F},{\mathbb P};{\mathbb R}^{m})\) given by

$${{ d \kern -.15em l}}_{\infty}(B,\tilde{B})=\sup_{x\in L_{r'}}\left|d_{B}(x)-d_{\tilde{B}}(x)\right|$$

with \(d_{B}(x)\) denoting the distance of x to B, i.e., \(d_{B}(x)=\inf_{y\in B}\|x-y\|_{r'}\).

The most restrictive assumption in Theorem 2 is the existence of solutions to both problems. Notice that solutions always exist if the underlying random vector ξ has a finite number of scenarios or if \(r'\in(1,+\, \infty)\). For a more thorough discussion we refer to Heitsch and Römisch (2010), section 2. Notice that the constant \(\frac{\bar{L}}{\varepsilon}\) gets larger for decreasing ε and that, indeed, Theorem 2 does not remain true for the Pompeiu–Hausdorff distance of solution sets \(S(\xi)=S_{0}(\xi)\) and \(S(\tilde{\xi})=S_{0}(\tilde{\xi})\), respectively.

3 Tree Approximation Framework and Convergence

We present a general framework for tree approximations of multi-stage stochastic programs in case that empirical estimates of the underlying probability distribution are available and prove convergence using the stability results of Section 14.2.

First, we show that sequences \((\xi^{(n)})\) of ξ-adapted random variables converging to ξ in L r also converge to ξ in terms of the filtration distance D f. Recall that \(\xi^{(n)}\) is ξ-adapted if \({\mathcal{F}}_{t}(\xi^{(n)})\subseteq{\mathcal{F}}_{t}(\xi)\) holds for every \(t=1,\ldots,T\).

Proposition 2

Let (A1), (A2), and (A3) be satisfied, \(r'\in[1,+ \infty)\) and \(S(\xi)\) be nonempty. Let \((\xi^{(n)})\) be a sequence of ξ-adapted random variables converging to ξ in Lr such that the σ-fields \({\mathcal{F}}_{t}(\xi^{(n)})\) are nondecreasing with respect to n for every \(t=2,\ldots,T\). Then it holds

$$\lim_{n\to\infty}{D_{\rm f}}(\xi,\xi^{(n)})=\lim_{n\to\infty}\,\inf_{x\in S(\xi)} \sum_{t=2}^{T-1} \big\|x_{t}-{\mathbb E}[x_{t}|\mathcal{F}_{t}(\xi^{(n)})]\big\|_{r'} = 0.$$

Proof

By \(\hat{{\mathcal{F}}}_{t}\) we denote the smallest σ-field containing \({\mathcal{F}}_{t}(\xi^{(n)})\) for every \(n\in{\mathbb N}\), i.e.,

$$\hat{{\mathcal{F}}}_{t}:=\sigma\left(\bigcup\limits_{n\in{\mathbb N}} {\mathcal{F}}_{t}(\xi^{(n)})\right)\quad(t=1,\ldots,T).$$

As \({\mathcal{F}}_{t}(\xi^{(n)})\subseteq{\mathcal{F}}_{t}(\xi)\) holds for every \(n\in{\mathbb N}\), we conclude

$$\hat{{\mathcal{F}}}_{t}\subseteq{\mathcal{F}}_{t}(\xi)\quad(\forall t=1,\ldots,T)$$

due to the convergence of the sequence \((\xi^{(n)})\) to ξ in L r . Furthermore, the filtration distance \({D_{\rm f}}(\xi,\xi^{(n)})\) allows the representation

$${D_{\rm f}}(\xi,\xi^{(n)})=\inf_{x\in S(\xi)}\sum_{t=2}^{T-1}\big\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big\|_{r'}$$

due to (14.7). As the σ-fields \({\mathcal{F}}_{t}(\xi^{(n)})\) are nondecreasing with respect to n, we obtain for each \(x\in S(\xi)\) and each \(t=2,\ldots,T\)

$$\big\|x_{t}-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big\|_{r'}=\big\|{\mathbb E}[x_{t}|\hat{{\mathcal{F}}}_{t}]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big\|_{r'} \longrightarrow 0$$

as \(n\to\infty\) by classical convergence results of conditional expectations (e.g., Fetter (1977)). This completes the proof.

The result remains true if the assumption that the σ-fields \({\mathcal{F}}_{t}(\xi^{(n)})\) are nondecreasing is replaced by the slightly weaker condition

$$\sigma\left(\bigcup_{k=1}^{\infty}\bigcap_{n=k}^{\infty}{\mathcal{F}}_{t}(\xi^{(n)})\right)= \sigma\left(\bigcap_{k=1}^{\infty}\bigcup_{n=k}^{\infty}{\mathcal{F}}_{t}(\xi^{(n)})\right)$$

for every \(t=2,\ldots,T\) (Fetter 1977). Next we show that ξ-adapted approximations can be obtained by certain discretization techniques.

3.1 Convergence of Discretizations

Let Ξ t denote the closed subset \(\mathrm{supp}\,(\xi_{t})\) of \({\mathbb R}^{d}\) and \(\varXi^{t}=\times_{\tau=1}^{t}\varXi_{\tau}\) for every \(t=1,\ldots,T\). Now, we consider ξ to be given on the probability space \((\varXi,{\mathcal{B}}(\varXi),P)\), where \(\varXi=\varXi^{T}\) and \({\mathcal{B}}(\varXi)\) is the σ-field of Borel subsets of the sample space Ξ. Furthermore, the σ-fields \({\mathcal{F}}_{t}(\xi)\) are of the form

$${\mathcal{F}}_{t}(\xi) = \left\{\xi^{-1}\left(B_{t}\times\varXi_{t+1}\times\cdots\times\varXi_{T}\right): B_{t}\in {\mathcal{B}}(\varXi^{t})\right\}.$$
((14.14))

3.1.1 Decomposition of the Sample Space

We aim at approximating the stochastic process ξ by a certain sequence of discrete processes, i.e., by processes having a finite number of scenarios. The approach is based on finite decompositions of the sample space Ξ. Let us consider a sequence

$${\mathcal{D}}_{t}^{(n)}\,\subset\,{\mathcal{B}}(\varXi_{t}),\quad n\in {\mathbb N}.$$

It will be called a sequence of finite segmentations of Ξ if the following conditions are satisfied:

  • (C1) The elements of \({\mathcal{D}}_{t}^{(n)}\) are pairwise disjoint for all \(n\in{\mathbb N}\),

  • (C2) \({\mathcal{D}}_{t}^{(n)}\) is finite and it holds \(P_{t}\left({\cup_{D_{t}\in{\mathcal{D}}_{t}^{(n)}}D_{t}}\right) = 1\), \(n\in{\mathbb N}\).

  • (C3) For \(\delta_{t,n}:=\sup \left\{|\xi_{t}-{\tilde{\xi}}_{t}|: D_{t}\in{\mathcal{D}}_{t}^{(n)},\, \xi_{t},{\tilde{\xi}}_{t}\in D_{t},\, |\xi_{t}|,|{\tilde{\xi}}_{t}|\leq n\right\}\,\) it holds \(\lim_{n\rightarrow\infty} \delta_{t,n} = 0\) for every \(t=1,\ldots,T\).

Conditions (C1) and (C2) ensure that the sets \({\mathcal{D}}_{t}^{(n)}\) define finite partitions of the sample space Ξ t for every \(n\in{\mathbb N}\) such that P-almost every element of Ξ t can be associated with a unique set in \({\mathcal{D}}_{t}^{(n)}\). Condition (C3) says that the partition sets get arbitrarily small uniformly within increasing balls of radii n in Ξ t .

Next we define a sequence of finite segmentations in Ξ by

$${\mathcal{D}}^{(n)}:=\left\{D_1\times\cdots\times D_T : D_{t}\in {\mathcal{D}}_{t}^{(n)},t=1,\ldots,T\right\},\quad n\in{\mathbb N}.$$
((14.15))

3.1.2 Discretization of the Stochastic Process

Using the sequence \({\mathcal{D}}^{(n)}\) we will define a sequence of approximate stochastic processes \(\xi^{(n)}=(\xi_1^{(n)},\ldots,\xi_T^{(n)})\). To this end, we select nonanticipative elements

$$\hat\xi^{D_1,\ldots,D_{t},n}_{t}\in D_{t} \quad \mathrm{with} \quad |\hat\xi^{D_1,\ldots,D_{t},n}_{t}|\leq C\max\left\{1,\inf\{|y_{t}|:y_{t}\in D_{t}\}\right\}$$
((14.16))

for every \(n\in{\mathbb N}\), \(t\in\{1,\ldots,T\}\) and every set \(D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}\), where the boundedness condition in (14.16) has to be satisfied for some fixed constant \(C\ge 1\). In this way we obtain a well-defined scenario

$$\hat\xi^{(n)}_{D_1\times\cdots\times D_T} :=\left(\hat\xi^{D_1,n}_1,\ldots,\hat\xi^{D_1,\ldots,D_T,n}_T\right)$$

for every \(n\in{\mathbb N}\) and \(D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}\) and define an approximate stochastic process by

$$\xi^{(n)}(\omega) := \hat\xi^{(n)}_{D_1\times\cdots\times D_T} \ \mathrm{if} \ \omega\in\xi^{-1}(D_{1}\times\cdots\times D_{T})$$
((14.17))

and have \(\xi^{(n)}\) well-defined on Ω \({\mathbb P}\)-almost surely. The stochastic processes \(\xi^{(n)}\) are approximations of ξ in the following sense.

Proposition 3

Let \(\xi\in L_r(\varOmega,{\mathcal{F}},{\mathbb P};{\mathbb R}^s)\) and (C1), (C2), and (C3) be satisfied for each \(t=1,\ldots,T\). Then each stochastic process \(\xi^{(n)}\) defined by (14.17) is ξ-adapted and it holds

$$\lim_{n\rightarrow\infty}\|\xi - \xi^{(n)}\|_r = 0\,.$$

If, in addition, (A1), (A2), and (A3) are satisfied, \(r'\in[1,+\infty)\), \(S(\xi)\) is nonempty, and \({\mathcal{D}}_{t}^{(n+1)}\) is a refinement of \({\mathcal{D}}_{t}^{(n)}\) for each \(n\in{\mathbb N}\), one has

$$\lim_{n\rightarrow\infty}{D_{\rm f}}(\xi,\xi^{(n)})=0\,.$$

Proof

Due to the construction of \(\xi^{(n)}\), the sets

$$\left\{\xi^{-1}\left(D_1\times\cdots\times D_{t}\times \varXi_{t+1}\times\cdots\times\varXi_T \right)\,:\,D_{\tau}\in {\mathcal{D}}_{\tau}^{(n)}, \tau=1,\ldots,t\right\}$$

generate the σ-fields \({\mathcal{F}}_{t(\xi^{(n)})}\). Thus, it holds \({\mathcal{F}}_{t}(\xi^{(n)})\subseteq{\mathcal{F}}_{t}(\xi)\) according to (14.14) for every \(n\in{\mathbb N}\) and \(t=1,\ldots,T\).

Next we show the L r -convergence of \((\xi^{(n)})\). To this end, let

$$B_{\gamma_{n}}(0):= \left\{y\in\varXi:\max_{t=1,\ldots,T}|y_{t}|\le\gamma_n\right\}$$

the closed ball in Ξ around the origin with radius

$$\gamma_{n}:=n-\max_{t=1\ldots,T} \delta_{t,n}\,.$$

Then, by using (C1) and (C2) we obtain

$$\begin{array}{rcl}\big\|\xi-\xi^{(n)}\big\|_r^r &=& \int_\varOmega \big|\xi(\omega)-\xi^{(n)}(\omega)\big|^r {\mathbb P}(d\omega)\\ &=& \sum_{D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}} \, \int_{D_1\times\cdots\times D_T} \, |\xi-\hat\xi_{D_1,\ldots,D_T}^{(n)}|^r P(d\xi)\\ &\le& c_{r}\sum_{D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}} \, \int_{D_1\times\cdots\times D_T} \,\,\sum_{t=1}^T |\xi_{t}-\hat\xi_{t}^{D_1,\ldots,D_{t},n}|^r P(d\xi),\end{array}$$

where c r is a suitable (norm equivalence) constant. Splitting the integration interval with respect to \(B_{\gamma_{n}}(0)\) and its complement, using (C3) and the boundedness condition (14.16) allows to estimate

$$\begin{array}{rcl}\big\|\xi-\xi^{(n)}\big\|_r^r &\le& c_{r}\sum_{t=1}^T\,\, \delta_{t,n}^r + c_{r}\int_{\varXi \setminus B_{\gamma_{n}}(0)}\sum_{t=1}^T (|\xi_{t}|(1 + C))^r P(d\xi)\\ &\le& \hat{C} \left(\max_{t=1,\ldots,T} \delta_{t,n}^r + \int_{\varXi \setminus B_{\gamma_{n}}(0)} |\xi|^r P(d\xi)\right),\end{array}$$

where \(\hat{C}>0\) is some constant not depending on n. Because both summands of the last estimate tend to zero whenever n tends to infinity, the first part is proved. The second part is a consequence of Proposition 2.

Proposition 3 and Theorem 1 provide a convergence result for discretizations of (14.1), namely,

$$\lim_{n\rightarrow\infty}v(\xi^{(n)})=v(\xi)\,.$$
((14.18))

Of course, this is not surprising when recalling the convergence results for discretizations of multi-stage stochastic programs in Pennanen (2005), Pennanen (2009).

To determine the probabilities of the scenarios of \(\xi^{(n)}\) for some \(n\in{\mathbb N}\), one has to compute

$$P(D_1\times\cdots\times D_T)\quad\hbox{for every}\quad D_1\times\cdots\times D_T \in{\mathcal{D}}^{(n)}.$$

This might be difficult in general. However, if additional structure on ξ is available, the discretization scheme may be adapted such that the probabilities are computationally accessible. For example, let the stochastic process ξ be driven by a finite number of mutually independent \({\mathbb R}^{d_{t}}\)-valued random variables \(z_{t} \) with probability distributions P t , \(t=2,\ldots,T\), i.e.,

$$\xi_{t}=g_{t}(\xi_{1},\ldots,\xi_{t-1},z_{t}),$$

where the g t , \(t=2,\ldots,T\), denote certain measurable functions from \({\mathbb R}^{(t-1)d}\times{\mathbb R}^{d_{t}}\) to \({\mathbb R}^{d}\) (see, e.g., Kuhn (2005) and Pennanen (2009)). Then there exists a measurable function G such that \(\xi=G(z_{2},\ldots,z_{T})\). If \({\mathcal{D}}_{t}^{(n)}\) is now a partition of the support of z t in \({\mathbb R}^{d_{t}}\), \(t=2,\ldots,T\), then \(\xi^{(n)}\) may be defined by

$$\xi_{t}^{(n)}=g_{t}(\xi_{1}^{(n)},\ldots,\xi_{t-1}^{(n)},z_{t}^{(n)})\,,$$

where \(z_{t}^{(n)}\), \(t=2,\ldots,T\), has a finite number of given scenarios in every \(D_{t}\in{\mathcal{D}}_{t}^{(n)}\). The probability distribution of \(\xi^{(n)}\) is then known if \(P_{t}(D_{t})\) can be computed for all \(D_{t}\in{\mathcal{D}}_{t}^{(n)}\), \(t=2,\ldots,T\). This covers situations, where ξ is a Gaussian process or is given by certain time series models.

3.2 Convergence of Discrete and Empirical Measures

Our next result deals with convergence properties of discrete probability distributions.

Proposition 4

Let P be a probability distribution on \({\mathbb R}^{Td}\) supported by a finite set of scenarios \(\varXi = \{\xi^1,\ldots,\xi^N\}\subseteq{\mathbb R}^{Td}\) with positive probabilities \(p_i:=P(\{\xi^i\})\). Moreover, let \((P^{(n)})_{n\in{\mathbb N}}\) be a sequence of probability distributions on \({\mathbb R}^{Td}\) supported by Ξ with probabilities \(p_i^{(n)}:=P^{(n)}(\{\xi^i\})\) such that

$$\lim_{n\rightarrow\infty}p_i^{(n)}=p_i,\quad i=1,\ldots,N.$$

Then there exist random variables ξ and \((\xi^{(n)})_{n\in{\mathbb N}}\) defined on some probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\) having probability distributions P and \(P^{(n)}\), \(n\in{\mathbb N}\), respectively, such that

$$\begin{array}{rcl}\hbox{(\rm{i})} &&\|\xi-\xi^{(n)}\|_r\rightarrow 0 \quad (n\rightarrow\infty)\quad \hbox{and}\\ \hbox{(\rm{ii})} &&{D_{\rm f}}^{*}(\xi,\xi^{(n)})\!=\!\sup_{x\in{\mathcal{B}}_{\infty}}\sum\nolimits_{t=2}^{T} \big\|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big\|_{r'}\!\rightarrow\! 0 \quad(n\rightarrow\infty),\end{array}$$

for all \(1\le r<\infty\) and \(1\leq r'< \infty\), where \({\mathcal{B}}_\infty\) is given by

$${\mathcal{B}}_\infty := \{x=(x_1,\ldots,x_T)\in L_{r'}(\varOmega,{\mathcal{F}},{\mathbb P};{\mathbb R}^m): |x_{t}(\omega)|\leq 1 ,\,{\mathbb P}\hbox{-a.s.}\}$$
((14.19))

and denotes the set of all essentially bounded functions.

Proof

The sequence \((P^{(n)})_{n\in{\mathbb N}}\) converges weakly to P on \({\mathbb R}^{Td}\). Hence, there exists a probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\) and there exist \({\mathbb R}^{Td}\)-valued random variables ξ and \(\xi^{(n)}\), \(n\in{\mathbb N}\), defined on it with probability distributions P and \(P^{(n)}\), \(n\in{\mathbb N}\), respectively, such that it holds

$$\xi^{(n)}\rightarrow \xi \quad {\mathbb P}\hbox{-a.s.}$$

(see, e.g., Dudley (1989), theorem 11.7.1). Since the random variables are supported by the finite set Ξ, almost sure convergence also implies convergence in the rth mean, i.e.,

$$\lim_{n\rightarrow\infty}\|\xi-\xi^{(n)}\|_{r}=0\,.$$

It remains to show \(\lim_{n\rightarrow\infty}{D_{\rm f}}^{*}(\xi,\xi^{(n)})=0\). To this end, we introduce partitions \(\{E_{tk}\}_{k\in I_{t}}\) and \(\{E^{(n)}_{tk}\}_{k\in I_{t}}\) in Ω, which generate the σ-fields \({\mathcal{F}}_{t}(\xi)\) and \({\mathcal{F}}_{t}(\xi^{(n)})\), respectively. Let

$$\begin{array}{rcl}{E}_{tk}&:=&\{\omega\in\varOmega: (\xi_1,\ldots,\xi_{t})(\omega) = (\xi^k_1,\ldots,\xi^k_{t})\}, \quad k\in I_{t},\\ {E}_{tk}^{(n)}&:=&\{\omega\in\varOmega: (\xi_1^{(n)},\ldots,\xi_{t}^{(n)})(\omega) = (\xi^k_1,\ldots,\xi^k_{t})\}, \quad k\in I_{t},\end{array}$$

where \(I_{t}\subseteq\{1,\ldots,N\}\) denotes the index set of distinguishable scenarios at time \(t=2,\ldots,T\). We set

$$\bar{E}_{tk}^{(n)} := {E}_{tk}\cap {E}_{tk}^{(n)} \quad\hbox{and}\quad \bar\varOmega_{t}^{(n)} := \varOmega \setminus \cup_{k\in I_{t}} \bar{E}_{tk}^{(n)},$$

and observe that

$$\begin{array}{rcl}\int_{\varOmega}|\xi-\xi^{(n)}|^r{\mathbb P}(d\omega) &\geq& \int_{E_{tk}\setminus \bar{E}_{tk}^{(n)}}|\xi-\xi^{(n)}|^r {\mathbb P}(d\omega) + \int_{E_{tk}^{(n)}\setminus \bar{E}_{tk}^{(n)}}|\xi-\xi^{(n)}|^r {\mathbb P}(d\omega)\\ &\geq& C_\mathrm{min} \left({\mathbb P}(E_{tk}\setminus \bar{E}_{tk}^{(n)}) + {\mathbb P}(E_{tk}^{(n)}\setminus \bar{E}_{tk}^{(n)})\right),\end{array}$$

where \(C_\mathrm{min}:=\min\left\{|\xi^i-\xi^j|^r :i,j\in\{1,\ldots,N\},\xi^{i}\neq\xi^{j}\right\}\) denotes the minimal distance between two varying scenarios. Hence, by the L r -convergence of \(\xi^{(n)}\) we conclude for all \(k\in I_{t}\) that \({\mathbb P}(E_{tk}\setminus \bar{E}_{tk}^{(n)})\) and \({\mathbb P}(E_{tk}^{(n)}\setminus \bar{E}_{tk}^{(n)})\) tend to zero whenever n tends to infinity. Moreover, the latter implies that \({\mathbb P}(E_{tk}^{(n)})\) as well as \({\mathbb P}(\bar{E}_{tk}^{(n)})\) converge to \({\mathbb P}(E_{tk})\). Let \(p_{tk}={\mathbb P}(E_{tk})\) and \(p_{tk}^{(n)}={\mathbb P}(E_{tk}^{(n)})\). Then we have for any \(x\in{\mathcal{B}}_\infty\)

$$\begin{array}{rcl}\big\|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]\!&-&\!{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big\|_{r'}^{r'} = \int_\varOmega \big|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big|^{r'}{\mathbb P}(d\omega)\\ &=&\sum_{k\in I_{t}} \int_{\bar{E}_{tk}^{(n)}} \big|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big|^{r'}{\mathbb P}(d\omega)\\ && + \int_{\bar\varOmega_{t}^{(n)}} \big|{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi)]-{\mathbb E}[x_{t}|{\mathcal{F}}_{t}(\xi^{(n)})]\big|^{r'}{\mathbb P}(d\omega)\\\end{array}$$
$$\begin{array}{rcl}&\le& \sum_{k\in I_{t}} {\mathbb P}(\bar{E}_{tk}^{(n)})\left| \frac{\int_{{E}_{tk}}x_{t}{\mathbb P}(d\omega)}{p_{tk}}- \frac{\int_{{E}_{tk}^{(n)}}x_{t}{\mathbb P}(d\omega)}{p_{tk}^{(n)}} \right|^{r'} + 2 {\mathbb P}(\bar\varOmega_{t}^{(n)})\\ &\le& \sum_{k\in I_{t}} {\mathbb P}(\bar{E}_{tk}^{(n)})\left(\left| \frac{\int_{\bar{E}_{tk}^{(n)}}x_{t}(p_{tk}^{(n)}-p_{tk}){\mathbb P}(d\omega)}{p_{tk}^{(n)}p_{tk}}\right|\right.\\ && +\, \left.\left|\frac{\int_{{E}_{tk}\setminus\bar{E}_{tk}^{(n)}}p_{tk}^{(n)}x_{t}{\mathbb P}(d\omega)}{p_{tk}^{(n)}p_{tk}}\right| + \left|\frac{\int_{{E}_{tk}^{(n)}\setminus\bar{E}_{tk}^{(n)}}p_{tk}x_{t}{\mathbb P}(d\omega)}{p_{tk}^{(n)}p_{tk}} \right|\right)^{r'}\\ && +\, 2 {\mathbb P}(\bar\varOmega_{t}^{(n)})\\[2mm] &\leq& \sum_{k\in I_{t}} {\mathbb P}(\bar{E}_{tk}^{(n)}) \left(\frac{ {\mathbb P}(\bar{E}_{tk}^{(n)})|p_{tk}^{(n)}-p_{tk}|}{p_{tk}^{(n)}p_{tk}} +\frac{{\mathbb P}({E}_{tk}\setminus\bar{E}_{tk}^{(n)})}{p_{tk}}\right.\\ && +\, \left.\frac{{\mathbb P}({E}_{tk}^{(n)}\setminus\bar{E}_{tk}^{(n)})}{p_{tk}^{(n)}} \right)^{r'} + 2 {\mathbb P}(\bar\varOmega_{t}^{(n)})\,,\end{array}$$

where we used the almost sure boundedness \(|x_{t}|\le 1\). The latter estimate does not depend on x and due to the fact that all summands tend to zero, \({D_{\rm f}}^{*}(\xi,\xi^{(n)})\) converges to 0.

Proposition 4 will be used in the proof of Theorem 3 to compare two stochastic processes having identical scenarios, but different probabilities.

3.2.1 Empirical Distributions and Sampling

Let ξ be a Ξ-valued stochastic process defined on some probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\) with induced distribution P. Furthermore, let \((\xi^{k})_{k\in{\mathbb N}}\) be a sequence of independent and identically distributed Ξ-valued random variables on some probability space \((\varOmega^*,{\mathcal{F}}^*,{\mathbb P}^*)\) such that \(P = {\mathbb P}^{*}\,\xi^{-1}_1\). We consider the random empirical measures

$$P^{(k)}(\omega^*)(B) := \frac1k \sum_{j=1}^k \delta_{\xi^j(\omega^*)}(B), \qquad n\in{\mathbb N},\,\omega^*\in\varOmega^*,B\in{\mathcal{B}}(\varXi),$$
((14.20))

where δ z denotes the probability measure on Ξ placing unit mass at \(z\in\varXi\). Then the sequence \((P^{(k)}(\omega^*))\) converges \({\mathbb P}^*\)-almost surely to P in the sense of weak convergence (see, e.g., Dudley (1989), chapter 11.4). The portmanteau theorem (e.g., Dudley (1989), theorem 11.1.1) offers several equivalent characterizations of weak convergence, one of them is recalled in the following for later use.

Proposition 5

Let \(( P^{(k)})\) be a sequence of empirical measures according to (14.20). Then it holds

$$\lim_{k\rightarrow\infty} P^{(k)}(\omega^*)(B) = P(B), \qquad\hbox{for all }B\in{\mathcal{B}}(\varXi) \ \hbox{with} \ P(\partial B)=0,$$

for \({\mathbb P}^*\)-almost every \(\omega^*\in \varOmega^*\), where ∂ B denotes the (topological) boundary of the Borel set B in the space \({\mathbb R}^s\).

Proposition 5 allows to estimate probabilities of Borel sets in \({\mathbb R}^s\) empirically, e.g., by sampling. In particular, the probability of any set belonging to a finite segmentation \({\mathcal{D}}^{(n)}\) of Ξ (see (14.15)) can be estimated by sampling if its boundary has Lebesgue measure zero and P is absolutely continuous.

3.3 Application to Scenario Tree Construction

The following conceptual algorithm represents a general approach to constructing scenario trees for multi-stage stochastic programs.

Algorithm 1

Let ξ be the original Ξ-valued stochastic input process of the stochastic program (14.1) defined on some probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\) and let P be the probability distribution of ξ.

  • Step [1]: Determine a sequence of finite segmentations \({\mathcal{D}}^{(n)}\) in Ξ such that assumptions (C1), (C2), and (C3) are satisfied (cf. Sect. 14.3.1) and choose a reasonably large \(n\in{\mathbb N}\).

  • Step [2]: Determine the empirical measure \(P^{(k)}\) based on k independent and identically P-distributed random variables.

  • Step [3]: Compute the probabilities \(p^{(n,k)}=P^{(k)}(D_1\times\cdots\times D_T)\) for every \(D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}\) according to formula (14.20).

  • Step [4]: Choose nonanticipative scenarios whose tth components belong to Dt for any \(D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}\) according to (14.16).

  • Step [5]: Finally, define the stochastic scenario tree process \({\xi_{\rm tr}}:={\xi_{\rm tr}}^{(n,k)}\) with scenarios chosen in step 4 endowed with the empirical probabilities \(p^{(n,k)}\) from step 3.

Next we study the asymptotic behavior of the approximate scenario trees \(({\xi_{\rm tr}}^{(n,k)})_{n,k\in{\mathbb N}}\) constructed by Algorithm 1. Note that the parameters n and k measure the quality of the discretization \({\mathcal{D}}^{(n)}\) of Ξ and of the empirical probabilities, respectively.

Theorem 3

Let (A1), (A2), and (A3) be satisfied and X 1 be nonempty and bounded. Let \(1\leq r'<\infty\) and assume that for some constant \(C>0\) the estimate

$${D_{\rm f}}(\hat{\xi},\tilde{\xi})\leq C{D_{\rm f}}^{*}(\hat{\xi},\tilde{\xi})$$
((14.21))

holds for all \(\hat{\xi}\) and \(\tilde{\xi}\) in a neighborhood of ξ in Lr. Assume that the sequence \(({\xi_{\rm tr}}^{(n,k)})_{n,k\in{\mathbb N}}\) is constructed by Algorithm 1. Furthermore, assume for the sequence \(({\mathcal{D}}^{(n)})_{n\in{\mathbb N}}\) of partitions of Ξ that \({\mathcal{D}}^{(n+1)}\) is a refinement of \({\mathcal{D}}^{(n)}\) and that \(P_{t}(\partial D_{t})=0\) for all \(D_{t}\in {\mathcal{D}}_{t}^{(n)}\) (\(t=1,\ldots,T\)). Then it holds

$$\lim_{n\rightarrow\infty}\left(\lim_{k\rightarrow\infty}v({\xi_{\rm tr}}^{(n,k)})\right) \,=\, v(\xi)\quad P^{*}\hbox{-almost surely},$$
((14.22))

where \(v(\xi_{tr}^{(n,k)})\) and \(v(\xi)\) denote the optimal values of the stochastic program (14.1) with input \({\xi_{\rm tr}}^{(n,k)}\) and ξ, respectively, and \((\varOmega^*,{\mathcal{F}}^*,{\mathbb P}^*)\) is the probability space on which the random empirical measures \(P^{(k)}\), \(k\in{\mathbb N}\), are defined.

Proof

We use the constructions and notations of Section 14.3.1 and consider, for each \(n\in{\mathbb N}\), the (discrete) stochastic processes \(\xi^{(n)}\) possessing the scenarios \(\hat\xi^{(n)}_{D_1\times\cdots\times D_T}\) with probabilities \(P(D_1\times\cdots\times D_T)\) for every \(D_1\times\cdots\times D_T\in{\mathcal{D}}^{(n)}\). Due to Proposition 3 the processes \(\xi^{(n)}\) are ξ-adapted and it holds

$$\lim_{n\rightarrow\infty}\|\xi-\xi^{(n)}\|_{r}=0\quad\hbox{and} \lim_{n\rightarrow\infty}{D_{\rm f}}(\xi,\xi^{(n)})=0\,.$$

Hence, we obtain from (14.5) in Theorem 1

$$|v(\xi)-v(\xi^{(n)})|\leq \varepsilon_n$$

for some sequence \((\varepsilon_{n})\) tending to zero as \(n\rightarrow \infty\).

Let \(\hat{\varOmega}^{*}\) be the subset of \(\varOmega^{*}\) such that \(P^{*}(\hat{\varOmega}^{*})=1\) and the sequence \((P^{(k)}(\omega^{*}))\) of random empirical measures converges weakly for every \(\omega^{*}\in\hat{\varOmega}^{*}\).

Now, let \(\omega^{*}\in\hat{\varOmega}^{*}\) and let \({\xi_{\rm tr}}^{(n,k)}={\xi_{\rm tr}}^{(n,k)}(\omega^{*})\) be the process determined by Algorithm 1. We will show that for any large \(n\in{\mathbb N}\) there exists \(k(n)\in{\mathbb N}\) (depending on ω *) such that

$$|v(\xi^{(n)}) - v({\xi_{\rm tr}}^{(n,k)}(\omega^*))| \leq \frac1n$$

for all \(k\geq k(n)\). Then the triangle inequality would imply

$$|v(\xi) - v({\xi_{\rm tr}}^{(n,k)}(\omega^*))| \leq \varepsilon_n + \frac1n \qquad(k\geq k(n))$$

and, hence, the proof was complete.

To this end, let \(n\in{\mathbb N}\) be sufficiently large and fixed. We consider the processes \(\xi^{(n)}\) and \({\xi_{\rm tr}}^{(n,k)}\) \((k\in{\mathbb N})\) and observe that both processes possess identical scenarios which do not depend on k. In general, only the probabilities

$$p_{D_1\times\cdots\times D_T}^{(n)}= P(D_1\times\cdots\times D_T)\hbox{ and } p_{D_1\times\cdots\times D_T}^{(n,k)}= P^{(k)}(\omega^{*})(D_1\times\cdots\times D_T)$$

associated with scenario \(\hat\xi^{(n)}_{D_1\times\cdots\times D_T}\) are different.

It holds \(P(\partial(D_{1}\times\cdots\times D_{T}))\le\sum_{t=1}^{T} P_{t}(\partial D_{t})=0\) since for every boundary point x of \(D_{1}\times\cdots\times D_{T}\) there exists \(t\in\{1,\ldots,T\}\) such that \(x_{t}\in\partial D_{t}\). Hence, Proposition 5 implies

$$p_{D_1\times\cdots\times D_T}^{(n,k)}\rightarrow p_{D_1\times\cdots\times D_T}^{(n)} \qquad(k\rightarrow\infty),$$

and Proposition 4 yields the existence of a common probability space such that

$$\|\xi^{(n)} - {\xi_{\rm tr}}^{(n,k)}\|_r \rightarrow 0 \quad\hbox{and}\quad {D_{\rm f}}^{*}(\xi^{(n)},{\xi_{\rm tr}}^{(n,k)}) \rightarrow 0 \quad (k\rightarrow\infty).$$

Then estimate (14.21) implies that the inequality

$${D_{\rm f}}(\xi^{(n)},{\xi_{\rm tr}}^{(n,k)})\leq C {D_{\rm f}}^{*}(\xi^{(n)},{\xi_{\rm tr}}^{(n,k)})$$

holds for large n and k. By making use of Theorem 1 (applied to \(\xi^{(n)}\) instead of ξ), we obtain

$$|v(\xi^{(n)}) - v({\xi_{\rm tr}}^{(n,k)})|\leq L(\xi^{(n)})\big(\|\xi^{(n)} - {\xi_{\rm tr}}^{(n,k)}\|_r +{D_{\rm f}}^{*}(\xi^{(n)},{\xi_{\rm tr}}^{(n,k)})\big)$$

for some constant \(L(\xi^{(n)})\) and all sufficiently large \(k\in{\mathbb N}\). This implies

$$|v(\xi^{(n)}) - v({\xi_{\rm tr}}^{(n,k)})| \rightarrow 0 \qquad (k\rightarrow\infty)$$

and, in particular, the existence of \(k(n)\) such that

$$|v(\xi^{(n)}) - v({\xi_{\rm tr}}^{(n,k)})|\leq \frac1n$$

for all \(k\geq k(n)\).

We note that the limits in (14.22) cannot be interchanged. This may be interpreted such that it makes no sense to choose n very large, i.e., to choose a very fine partition of Ξ if for some reason k is not sufficiently large. In Section 14.4 we will discuss a variant of the general scenario tree construction approach provided by Algorithm 1 that is based on successive scenario reduction.

4 Scenario Tree Construction Based on Scenario Reduction

Next we discuss a method for generating scenario trees which is developed in Heitsch and Römisch (2009a) and motivated by Algorithm 1. It is based on a procedure of successive scenario reduction and bundling steps for increasing time stages applied to a sufficiently large scenario set. The latter is typically obtained by sampling from the underlying distribution. The stage-wise reduction may be viewed as a simultaneous realization of steps 1, 3, and 4 of Algorithm 1. Before describing the details, we briefly recall the ideas of optimal scenario reduction.

4.1 Optimal Scenario Reduction

The basic idea of scenario reduction consists in determining a (nearly) best approximation in terms of a suitable probability metric of the underlying discrete probability distribution by a probability measure with smaller support. The metric is associated with the stochastic programming model in a natural way such that the model behaves stable with respect to changes of the probability distribution. Such natural metrics are provided in Römisch (2003) for several classes of stochastic programs.

Originally, the concept of scenario reduction was developed in Dupačová et al. (2003) and Heitsch and Römisch (2003). More recently, it has been improved for two-stage models in Heitsch and Römisch (2007) and extended to mixed-integer and chance-constrained models in Henrion et al. (2008), Henrion et al. (2009) as well as to multi-stage models in Heitsch and Römisch (2009b). The concept does not impose special conditions on the underlying probability distribution except the existence of certain moments.

Scenario reduction aims at reducing the number of scenarios in an optimal way. If ξ is a given random vector on some probability space \((\varOmega,{\mathcal{F}},{\mathbb P})\) with finite support, i.e., represented by the scenarios \(\xi^i\) and probabilities p i, \(i=1,\ldots,N\), then one may be interested in finding a suitable index subset \(J\subset\{1,\ldots,N\}\) and a new random vector \(\hat\xi\) supported only by the scenarios \(\xi^j\), \(j\notin J\), such that \(\hat\xi\) is the best approximation to ξ. Here, we consider the norm \(\|\cdot\|_{r}\) in L r as the natural distance function.

If J is given, the best approximation to ξ can be given explicitly. To show this, let \(A_{i}=\xi^{-1}(\xi^{i})\in{\mathcal{F}}\), \(i=1,\ldots,N\). Then

$$\|\xi-\hat{\xi}\|_{r}^{r}=\sum_{i=1}^{N}\int_{A_{i}}|\xi^{i}-\xi^{j(i)}|^{r} {\mathbb P}(d\omega)=\sum_{i=1}^{N}p_{i}|\xi^{i}-\xi^{j(i)}|^{r}$$

for some mapping \(j:\{1,\ldots,N\}\to\{1,\ldots,N\}\setminus J\) and the best approximation problem reads

$$\min\left\{\sum _{i=1}^N p_i|\xi^i-\xi^{j(i)}|^r\Big|\,j:\{1,\ldots,N\}\to\{1,\ldots,N\}\setminus J\right\}.$$
((14.23))

Since the lower bound

$$\sum_{i=1}^{N}p_{i}|\xi^{i}-\xi^{j(i)}|^{r}\ge\sum_{i\in J}p_{i}\min_{j\notin J} |\xi^{i}-\xi^{j}|^{r}$$

is always valid, the minimum in (14.23) is attained for the mapping \(j:\{1,\ldots,N\}\to\{1,\ldots,N\}\setminus J\) defined by

$$j(i) \in\arg\min_{j\notin J} |\xi^i-\xi^j|\,,\qquad i\in J.$$

Hence, the best approximation \(\hat{\xi}\) is supported by the scenarios ξ j with probabilities q j, \(j\notin J\), where

$$\|\xi-\hat\xi\|_{r}^{r}=\sum_{i\in J}p_{i}\min_{j\notin J}|\xi^{i}-\xi^{j}|^{r}\,,\\[2mm]$$
((14.24))
$$q_j = p_j + \sum_{ {i\in J}\atop{j(i)=j}} p_i\,.$$
((14.25))

In other words, the redistribution rule (14.25) consists in assigning the new probability to a preserved scenario to be equal to the sum of its former probability and of all probabilities of deleted scenarios that are closest to it.

Finding the optimal index set J, say, with prescribed cardinality, such that it solves the combinatorial optimization problem

$$\min\left\{\sum_{i\in J}p_{i}\min_{j\notin J}|\xi^{i}-\xi^{j}|^{r}: J\subseteq\{1,\ldots,N\},\,|J|=N-n\right\}\quad(1\le n <N),$$

is much more complicated. The latter problem represents a metric k-median problem which is known to be NP-hard, hence (polynomial-time) approximation algorithms and heuristics become important. Simple heuristics may be derived from formula (14.24) for the approximation error. The results are two heuristic algorithms to compute nearly optimal index sets J with given cardinality \(N-n\).

Algorithm 2

(Forward selection)

[Initialization]

Set \(J:=\{1,\ldots,N\}\).

[Index Selection]

Determine an index \(l\in J\) such that

$$l\in\arg\min_{u\in J} \sum_{k\in J\setminus\{u\}} p_k \min_{j\notin J\setminus\{u\}}|\xi^k-\xi^j|^{r}$$

and set \(J:=J\setminus\{l\}\). If the cardinality of J equals \(N-n\) go to the termination step. Otherwise continue with a further index selection step.

[Termination]

Determine scenarios \(\xi^{j} \), \(j\notin J\), and apply the redistribution rule (14.25) for the final index set J.

Algorithm 3

(Backward reduction)

[Initialization]

Set \(J:=\emptyset\).

[Index Selection]

Determine an index \(u\notin J\) such that

$$u\in\arg\min_{l\notin J} \sum_{k\in J\cup\{l\}} p_k \min_{j\notin J\cup\{l\}} |\xi^k-\xi^j|^{r}$$

and set \(J:=J\cup\{u\}\). If the cardinality of J equals \(N-n\) go to the termination step. Otherwise continue with a further index selection step. [Termination]

Determine scenarios \(\xi^{j} \), \(j\notin J\), and apply the redistribution rule (14.25) for the final index set J.

Optimal scenario reduction allows two interesting interpretations. The first is that it may actually be considered as a problem of optimal quantization of probability measures (in the sense of Graf and Luschgy (2000)) when applied to a discrete probability measure (with N atoms) and using the rth order Wasserstein distance (see also Heitsch and Römisch (2009a), lemma 2.1). Second, scenario reduction leads to a canonical decomposition of the sample space \({\mathbb R}^{s}\). To illustrate this fact assume that \(\{\xi^{j}:j\notin J\}\) has been computed to reduce the original scenario set \(\{\xi^{1},\ldots,\xi^{N}\}\) contained in \({\mathbb R}^{s}\). Then the so-called Voronoi regions defined by

$$V(\xi^j) := \left\{ \xi\in{\mathbb R}^{s} : |\xi-\xi^j|<\min_{k\notin J\cup\{j\}}|\xi-\xi^k|\right\}, \quad j\notin J$$

represent disjoint subsets of \({\mathbb R}^{s}\). It is known that for strictly convex norms \(|\cdot|\) the union of the closures of \(V(\xi^{j})\) cover \({\mathbb R}^{s}\) and the boundaries \(\partial V(\xi^j)\) have Lebesgue measure \(\lambda^{s} \) zero. The latter holds for the l p -norms with \(1 <p<\infty\) and for p = 2 Voronoi regions are even convex (see Graf and Luschgy (2000), section 1). Voronoi regions are a suitable choice for the sets D t in Section 14.3.1.

Figure 14.1 shows the Voronoi decomposition of the space R 2 obtained by scenario reduction starting from \(N=1\,000\) samples from the two-dimensional standard normal distribution computed with Algorithm 2.

Fig. 14.1
figure 14_1_192804_1_En

Illustration of scenario reduction starting from 1000 sample scenarios of the two-dimensional standard normal distribution (left) reduced to 100 scenarios (middle) and further reduced to 50 scenarios (right). Displayed are the corresponding decompositions of \({\mathbb R}^{2}\) into Voronoi regions obtained by the forward selection algorithm with respect to the Euclidean norm

4.2 Scenario Tree Construction

The idea of the tree construction method is to apply the scenario reduction techniques to a set of scenarios successively for increasing and decreasing time stages, respectively. This leads to forward or backward variants of a tree generation method that aims at recovering the original information structure approximately. Next we present a detailed description of the forward variant, the backward approach may be found in Heitsch and Römisch (2009a).

In the following, let \(I:=\{1,\ldots,N\}\) be the index set of the given set of scenarios ξ i. Then the successive scenario reduction technique is applied to the time horizons \(\{1,\ldots,t\}\) with increasing time \(t\in\{1,\ldots,T\}\). It computes partitions of I of the form

$${\mathcal{C}}_{t}:=\{C_{t}^1,\ldots,C_{t}^{k_{t}}\}\,, \qquad k_{t}\in{\mathbb N},$$

successively such that

$$C_{t}^k\cap C_{t}^{k'}=\emptyset\quad\hbox{for}\quad k\neq k'\,,\quad\hbox{and} \quad\bigcup\limits_{k=1}^{k_{t}}C_{t}^k =I$$

holds for every t. The elements of a partition \({\mathcal{C}}_{t}\) are called (scenario) clusters. The following algorithm allows to generate different scenario tree processes depending on the parameter settings for the reductions in each step (Fig. 14.2).

Fig. 14.2
figure 14_2_192804_1_En

Demonstration of the forward tree construction for an example containing \(T=5\) time periods. Displayed are the stepwise changes of the scenarios tree structure starting with a fan of individual scenarios

Algorithm 4

(Forward construction)

[Initialization]

Define \(C_1=\{I\}\)

and set \(t:=2\).

[Cluster computation]

Let \(C_{t-1}=\{C_{t-1}^1,\ldots,C_{t-1}^{k_{t-1}}\}\). For every \(k\in\{1,\ldots,{k_{t-1}}\}\) apply scenario reduction to the scenario subsets \(\{\xi^{i}_{t}\}_{i\in C_{t-1}^k}\) (at time t). This yields disjoint subsets of remaining and deleted scenarios \(I_{t}^k\) and \(J_{t}^k\), respectively. Next, obtain the mappings \(j_{t}^k:J_{t}^k\rightarrow I_{t}^k\) such that

$$j_{t}^k(i)\in \arg\min_{j\in I_{t}^k} |\xi^{i}_{t}-\xi^{j}_{t}|\,,\quad i\in J_{t}^k\,,$$

according to the reduction procedure (cf. Section 14.4.1). Finally, define an overall mapping \(\alpha_{t}:I\rightarrow I\) by

$$\alpha_{t}(i)=\left\{\begin{array}{cl} j_{t}^k(i), &\,i\in J_{t}^k\,\hbox{for some} \ k=1,\ldots,k_{t-1},\\ i &\, \hbox{otherwise}.\end{array}\right.$$
((14.26))

A new partition at t is defined now by

$${C}_{t}:=\left\{\left.\alpha_{t}^{-1}(i)\,\right|\,i\in I_{t}^k,\,k=1,\ldots,k_{t-1}\right\}$$

which is in fact a refinement of the partition \(C_{t-1}\). If \(t <T\) set \(t:=t+1\) and continue with a further cluster computation step, otherwise go to the termination step.

[Termination]

According to the partition set C T and mappings (14.26) define a scenario tree process \({\xi_{\rm tr}}\) supported by the scenarios

$${\xi_{\rm tr}}^k = \left( \xi_1^*,\xi_2^{\alpha_2(i)},\ldots, \xi_{t}^{\alpha_{t}(i)},\ldots,\xi_T^{\alpha_T(i)}\right)\quad \hbox{for any}\quad i\in C_T^k\,,$$

and probabilities \(q_k:=\sum\limits_{i\in C_T^k}p_i\), for each \(k=1,\ldots,k_T\).

Both heuristic algorithms from Section 14.4.1 may be used to compute the scenario reduction within every cluster computation step. According to (14.24) the error of the cluster computation step t is

$$\mathrm{err}_{t}:=\sum_{k=1}^{k_{t-1}}\sum_{i\in J_{t}^{k}}p_{i}\min_{j\in I_{t}^{k}} |\xi_{t}^{i}-\xi_{t}^{j}|^{r}.$$

Furthermore, as shown in Heitsch (2007, proposition 6.6), the estimate

$$\|\xi-{\xi_{\rm tr}}\|_{r}\le \left(\sum_{t=2}^{T}\mathrm{err}_{t}\right)^{\frac{1}{r}}$$

holds for the total approximation error. The latter estimate allows to control the construction process by prescribing tolerances ε t for err t for every \(t=2,\ldots,T\).

5 Application to Electricity Management

The deregulation of energy markets has lead to an increased awareness of the need for profit maximization with simultaneous consideration of financial risk, adapted to individual risk aversion policies of market participants. Mathematical modeling of such optimization problems with uncertain input data results in large-scale stochastic programming models with a risk functional in the objective. When considering a medium-term planning horizon, one is faced with consecutive decisions based on consecutive observations, thus, the stochastic programs need to be multi-stage.

Fig. 14.3
figure 14_3_192804_1_En

Yearly scenario profiles of the trivariate stochastic process with components electricity demand (top), spot prices (center), and heat demand (bottom)

Next we report on some experiences with constructing scenario trees for a multi-stage stochastic optimization model that is tailored to the requirements of a typical German municipal power utility, which has to serve an electricity demand and a heat demand of customers in a city and its vicinity. The power utility owns a combined heat and power (CHP) facility that can serve the heat demand completely and the electricity demand partly. Further electricity can be obtained by purchasing volumes for each hour at the (day ahead) spot market of the European Energy Exchange (EEX) or by signing a supply contract for a medium-term horizon with a larger power producer. The latter possibility is suspected to be expensive, but relying on the spot market only is known to be extremely risky. Spot price risk, however, may be reduced by obtaining electricity futures at EEX. The optimization aims to maximize the mean overall revenue and, simultaneously, to minimize a risk functional on a basis of a hourly discretized optimization horizon of 1 year. Details of the optimization model can be found in Eichhorn et al. (2005).

Table 14.1 Dimension of simulation scenarios
Table 14.2 Results of Algorithm 4 for yearly demand-price scenario trees
Fig. 14.4
figure 14_4_192804_1_En

Yearly demand–price scenario trees obtained by Algorithm 4

Electricity demand and heat demand as well as spot and future prices are not known in advance, but statistical information is available due to historical observations (Fig. 14.3). A very heterogeneous statistical model is employed. It is adapted to historical data in a rather involved procedure. It consists of a cluster classification for the intra-day (demand and price) profiles and a three-dimensional time series model for the daily average values. The latter consists of deterministic trend functions and a trivariate ARMA model for the (stationary) residual time series; see Eichhorn et al. (2005) for further details. An arbitrary number of three-dimensional sample paths (scenarios) can easily be obtained by simulating white noise processes for the ARMA model and by adding on the trend functions and matched intra-day profiles from the clusters afterward. However, such a bunch of sample paths does not reflect the information structure in multi-stage stochastic optimization, i.e., it neglects the fact that information is revealed gradually over time. For this reason, finally, trivariate scenario tree processes have been computed by the approach of recursive forward scenario reduction (see Section 14.4). Table 14.1 displays the size of the three-dimensional (electricity demand, heat demand, and spot price) scenarios which serve as inputs for the tree construction (Algorithm 4). We performed a couple of test series for generating scenario trees. Due to the fact that electricity future products can only be traded monthly, branching was allowed only at the end of each month which leads to scenario trees of at most 12 stages. Because stochastic data enter both the objective and right-hand sides of the model Algorithm 4 is used with \(r=r'=2\) (cf. (14.2)). Moreover, different relative reduction levels \(\varepsilon_\mathrm{rel}\) have been chosen. The relative levels are given by

$$\varepsilon_\mathrm{rel}:=\frac{\varepsilon}{\varepsilon_{\max}}\qquad\hbox{and}\qquad \varepsilon_{\mathrm{rel},t} := \frac{\varepsilon_{t}}{\varepsilon_{\max}},$$

where \(\varepsilon_\mathrm{max}\) is given as the best possible L r -distance of the stochastic process represented by all scenarios and their probabilities and of one of its scenarios endowed with probability 1. The individual tolerances ε t at branching points is computed such that

$$\varepsilon_{t}^r= \frac{2\varepsilon^r}{T-1}\left(q+(1-2q)\frac{t-2}{T-2}\right), \quad t=2,\ldots,T,$$
((14.27))

where \(q\in[0,1]\) is a parameter that affects the branching structure of the constructed trees. Note that a value \(q<\frac12\) generates a sequence \(\varepsilon_{t}^r\) with linear growth while \(q>\frac12\) results in a decreasing sequence \(\varepsilon_{t}^r\), \(t=1,\ldots,T\).

Table 14.2 displays the results of our test runs with different relative reduction levels. As expected, for very small reduction levels, the reduction affects only a few scenarios. Furthermore, the number of nodes decreases considerably if the reduction level is increased. The computing times of less than 30 s already include approximately 20 s for computing distances of all scenario pairs that are needed in all calculations.

Figure 14.4 illustrates the scenario trees obtained for reduction levels of 40 and 55%, respectively. Observe that in all computations the maximal number of 12 stages is not reached even at higher reduction levels. This phenomenon could be caused by the low level of heat demand during the summer period (see Fig. 14.3) such that branching occurs less frequently.