1 Introduction

Decomposition methods are the state-of-the-art tools for solving large-scale stochastic (linear) programming problems. Benders' idea [2] of dynamically approximating value functions and feasible regions of relaxed problems is appealing and has inspired the development of various algorithms. When the problems are decomposed with respect to time periods, these methodologies utilize dynamic programming techniques. As such, they exhibit various curses of dimensionality, e.g., resulting from the state space or the scenario space, cf. Powell [48]. To overcome these curses, at least in practice, sampling-based Benders decomposition methods have been proposed by Pereira and Pinto [44].

If the stochastic process governing the uncertainty in the mathematical programming problem is stage-wise independent, then the complexity arising from the number of scenarios is present primarily in the generation of upper bounds, when applying Benders decomposition algorithms to a minimization problem. Sampling-based methods overcome this curse of dimensionality by sampling a rather small number of scenarios out of (potentially) exponentially many. This way, an estimate of the upper bound can be computed along with an estimate of the confidence interval. Furthermore, a curse resulting from the state space might be avoided as well by exploiting information gained from the sampling procedure. As stage-wise independence is an unrealistic assumption for many real-world problems, extensions to incorporate linear dependence on previous stage(s) have been developed. However, for minimization problems, the lower bounds derived from sampling-based methodologies also rely on the finiteness of the distributions of the uncertain data.

Working directly with scenario trees has the great advantage that no assumptions on the distributions of the data and their correlation (inter-stage and inter-state) have to be made. To obtain a tree of manageable size, so-called scenario reduction techniques have been developed, see Sect. 3. However, it remains a difficult task to obtain a reasonable representation of the random variables which is statistically valid and does not exhibit exponential growth in the number of stages.

Sampling-based and scenario-based approaches applied to stochastic optimization were separate entities for a long time. In this paper, we propose to unify both approaches to exploit their individual strengths and to overcome their individual weaknesses, see Sect. 3.6.

The main contributions of this paper are twofold: (1) we develop nested Benders decomposition algorithms which incorporate both scenario trees and sampling approaches for stage-wise independent and stage-wise dependent processes, and we prove the correctness of the obtained algorithms; and (2) we apply the methodology to hydro-thermal scheduling, incorporating fuel price and/or electricity demand uncertainty as well as inflow uncertainty. We demonstrate the usefulness of the methodology with a case study for Panama’s power system. Furthermore, we provide a classification of different types of uncertainties present in hydro-thermal scheduling problems.

The remainder of the paper is organized as follows. We summarize multi-stage stochastic programming formulations and fundamentals in Sect. 2, present different nested Benders decomposition algorithms in Sect. 3, and apply the developed methodology to a least-cost hydro-thermal scheduling problem in Sect. 4. We conclude with Sect. 5.

We use the following convention throughout the paper. Roman letters \(x,y,u,v,g\) and \(s\) as well as the Greek letters \(\eta \) and \(\delta \) are decision variables; other Roman letters are data. The Greek letters \(\pi \) and \(\overline{\pi }\) represent duals. We omit indices if they are clear from the context; transposition signs for vectors and matrices are consistently omitted as well. The notation is summarized in Appendix 6.

2 Multi-stage stochastic linear programming problem (MSLP)

Let the triplet \((\varOmega ,\fancyscript{A},P)\) denote a probability space with the universe of outcomes \(\varOmega \), sigma-algebra \(\fancyscript{A}\) and probability measure \(P\) along with a real-valued random vector \(\xi \). The definition of the random vector and the probability space imply that \(\xi : \varOmega \rightarrow \mathbb {R}\) and \(\{ \omega : \xi (\omega ) \le r\} \in \fancyscript{A}\) for all \(r \in \mathbb {R}\); \(\omega \in \varOmega \) denotes a possible event. We assume that the cumulative distribution function \(F_{\xi }(x)\) for the random vector \(\xi \) is known, i.e., \(F_\xi (x) = P(\xi \le x)\) which is the probability of the events \(\{ \omega : \xi (\omega ) \le x\}\). Let \(\xi _t\) be the random vector with the corresponding universe \(\varOmega _t\) for stage \(t\) with \(t = 2, \ldots , T\); \(\varOmega _t | \omega _{t-1}\) is the conditional set of outcomes for stage \(t\), given that event \(\omega _{t-1} \in \varOmega _{t-1}\) occurred. Then \(\xi = \left( \xi _2, \ldots , \xi _T \right) \).

We are interested in solving the following multi-stage stochastic linear programming problem in the general form

$$\begin{aligned} z^* := \min&c_1 x_1 + \mathbb {E}_{\xi } \left[ \min \ c_2( \omega _2) x_2 (\omega _2) + \mathbb {E}_{\xi } \left[ \cdots \right. \right. \nonumber \\&\qquad \qquad \qquad \qquad \left. \left. + \mathbb {E}_{\xi } \left[ \min \ c_T( \omega _T) x_T (\omega _T) \right] \cdots \right] \right] \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{s.t.}&W_1 x_1 = h_1\end{aligned}$$
(2)
$$\begin{aligned}&T_1(\omega _2) x_1 + W_2(\omega _2) x_2 (\omega _2) = h_2 (\omega _2) \quad \quad \forall \omega _2 \in \varOmega _2\end{aligned}$$
(3)
$$\begin{aligned}&\quad \quad \vdots \nonumber \\&T_{T-1}(\omega _T) x_{T-1}(\omega _{T-1}) + W_T(\omega _T) x_T (\omega _T) = h_T (\omega _T) \nonumber \\&\qquad \qquad \qquad \quad \forall \omega _{T-1} \in \varOmega _{T-1}, \;\; \omega _T \in \varOmega _T | \omega _{T-1}\end{aligned}$$
(4)
$$\begin{aligned}&x_1, \, x_t(\omega _t) \ge 0 \quad \quad \forall t = 2, \ldots , T, \end{aligned}$$
(5)

with vectors \(c_t \in \mathbb {R}^{n_t}\) and \(h_t \in \mathbb {R}^{m_t}\), \(m_t \times n_t\)-matrices \(W_t\), and \(m_{t+1} \times n_t\)-matrices \(T_t\); \(x_t\) is an \(n_t\)-dimensional decision variable. The data \(c_t, h_t, W_t\) and \(T_t\) are unknown for \(t = 2, \ldots , T\). Constraints (3)–(5) are to be met almost surely.

The mathematical programming problem (1)–(5) seeks an optimal decision \(x_1\) which hedges against the uncertainty related to parameters affecting the \(x_t\) variables for stages \(t = 2, \ldots , T\). The paradigm is that the assignment of the \(x_t\) variables (for stages \(t=2, \ldots , T\)) can be postponed until the uncertainty corresponding to stage \(t\) has unfolded, in contrast to decision \(x_1\). Optimality of \(x\) is obtained at any minimum of objective function (1), i.e., the sum of the cost associated with decision \(x_1\) and the expected cost resulting from optimal adjustments \(x_t\) \((t = 2, \ldots , T)\) to decision \(x_1\).

2.1 Deterministic equivalent

Let the \(T\)-stage value function be

$$\begin{aligned} Q_{T}(x_{T-1},\omega _{T}) := \min&c_{T}(\omega _T) x_{T}\\ s.t.&W_{T}(\omega _T) x_{T} = h_{T}(\omega _T) - T_{T-1}(\omega _T) x_{T-1}\\&x_{T} \ge 0. \end{aligned}$$

In the last stage, \(T\), we require the value of the decision variable \(x_{T-1}\) of the previous stage, and one realization \(\omega _T\) of the random vector \(\xi \), in order to obtain \(Q_{T}(x_{T-1},\omega _T)\).

The expected \(t+1\)-stage value function is then given by

$$\begin{aligned} \fancyscript{Q}_{t+1}(x_t) := \mathbb {E}_{\xi }\big [Q_{t+1}(x_t,\omega _{t+1})\big ]\qquad t=1, \ldots , T-1. \end{aligned}$$
(6)

For the \(t\)-stage value function \((t=2, \ldots , T-1)\), we obtain

$$\begin{aligned} Q_{t}(x_{t-1},\omega _{t}) := \min&c_{t}(\omega _t) x_{t} + \fancyscript{Q}_{t+1}(x_t)\\ s.t.&W_{t}(\omega _t) x_{t} = h_{t}(\omega _t) - T_{t-1}(\omega _t) x_{t-1}\\&x_{t} \ge 0. \end{aligned}$$

The (expected) \(t+1\)-stage value function is also called (expected) future value function, or in the context of SDDP, cost-to-go function or future cost function.

This enables us to re-write the MSLP (1)–(5) as its deterministic equivalent

$$\begin{aligned} z^* = \min&c_{1} x_{1} + \fancyscript{Q}_2(x_1) \end{aligned}$$
(7)
$$\begin{aligned} s.t.&W_{1} x_{1} = h_{1} \end{aligned}$$
(8)
$$\begin{aligned}&x_{1} \ge 0. \end{aligned}$$
(9)

Depending on the nature of function \(\fancyscript{Q}_2(x)\), problem (7)–(9) is either a convex or a non-convex minimization problem, which greatly affects the choices of readily available solution techniques and the effort required to obtain a provably optimal solution to (7)–(9).

2.2 Extensive form

In the case of a continuous random vector \(\xi \), the solution of problem (1)–(5) requires the evaluation of a (multidimensional) integral, a task that is computationally intractable except for a few special cases of limited practical importance. The numerical solution of such problems is thus usually attained through the discretization of the probability distribution of \(\xi \) and the subsequent construction of a scenario tree. The stochastic optimization problem may then be re-written as a single LP in the so-called extensive form

$$\begin{aligned} z^\mathrm{E} = \min&c_1 x_1 + \sum \nolimits _{s=1}^S p_s \sum \nolimits _{t=2}^T c_{st} x_{st}\end{aligned}$$
(10)
$$\begin{aligned} s.t.&W_1 x_1 = h_1\end{aligned}$$
(11)
$$\begin{aligned}&T_{s,t-1} x_{s,t-1} + W_{st} x_{st} = h_{st} \quad \quad \forall s = 1, \ldots , S, \;\;\; t = 2, \ldots , T \qquad \end{aligned}$$
(12)
$$\begin{aligned}&x_{st} = x_{\tilde{s}t} \quad \quad \forall s = 1, \ldots , S \ \wedge \ \tilde{s} \in \mathbb {S}_{st}, \;\;\; t = 2, \ldots , T \end{aligned}$$
(13)
$$\begin{aligned}&x_1, \, x_{st} \ge 0 \quad \quad \forall s = 1, \ldots , S, \;\;\; t = 2, \ldots , T, \end{aligned}$$
(14)

with scenarios \(s = 1, \ldots , S\) realized with probability \(p_s >0\). The set \(\mathbb {S}_{st}\) is the collection of all scenarios \(\tilde{s}\) which are identical to scenario \(s\) in stage \(t\) with \(\tilde{s}, s = 1, \ldots , S\) and \(t = 2, \ldots , T\). We use the convention \(x_{s,1} \equiv x_1\). Equations (13) are the non-anticipativity constraints for the MSLP.

If the uncertainty is represented via a scenario fan, then \(\mathbb {S}_{st} = \emptyset \) for all scenarios \(s\) and for all stages \(t = 2, \ldots , T\). The index “\(s\)” accompanying the data \(c_t,h_t,T_t,W_t\) and the decision variables \(x_t\) indicates their realization for, and dependence on, scenario \(s\).

If \(\xi \) is a discrete and finite random vector (i.e., the support of \(\xi \) is finite), then all \(S\) possible realizations may be enumerated, implying that \(z^\mathrm{E} = z^*\). In this case, problem (10)–(14) is a deterministic equivalent to (1)–(5) in extensive form.
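The scenario-grouping sets \(\mathbb {S}_{st}\) behind the non-anticipativity constraints (13) can be illustrated on a small tree. The following sketch (a hypothetical toy tree, not the paper's case study) represents each scenario as its path of realizations and groups scenarios that share a common history up to stage \(t\):

```python
# Illustrative sketch: non-anticipativity sets S_{st} for a small scenario tree.
# Each scenario is a tuple of realizations for stages 2..T; two scenarios are
# indistinguishable at stage t if their histories up to stage t coincide.

from itertools import product

K, T = 2, 4                      # K realizations per stage, T stages
# enumerate all S = K^(T-1) scenario paths (realizations labeled 0..K-1)
scenarios = list(product(range(K), repeat=T - 1))

def S_set(s, t):
    """All scenarios s~ != s with the same history as s up to stage t
    (t = 2, ..., T); the history covers the realizations of stages 2..t."""
    h = scenarios[s][: t - 1]
    return {i for i, path in enumerate(scenarios)
            if i != s and path[: t - 1] == h}

S = len(scenarios)               # 8 scenarios for K=2, T=4
# at stage 2, scenario 0 shares its history with every path starting with 0
print(S, sorted(S_set(0, 2)))    # 8 [1, 2, 3]
# at stage T, histories are full paths, so the sets are empty (a scenario fan)
print(S_set(0, T))               # set()
```

For a scenario fan, every \(\mathbb {S}_{st}\) is empty, as stated above, because no two scenarios share any history beyond the root.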

2.3 Special cases and properties

The effect of event \(\omega _t\) on each of the MSLP’s components determines a few special cases:

$$\begin{aligned} W_t(\omega _t)&\equiv W_t \quad \forall t=2,\ldots ,T&\text {fixed recourse} \end{aligned}$$
(15)
$$\begin{aligned} T_t(\omega _t)&\equiv T_t \quad \forall t=2,\ldots ,T&\text {fixed technology} \end{aligned}$$
(16)
$$\begin{aligned} h_t(\omega _t)&\equiv h_t \quad \forall t=2,\ldots ,T&\text {fixed right-hand-side (RHS)} \end{aligned}$$
(17)
$$\begin{aligned} c_t(\omega _t)&\equiv c_t \quad \forall t=2,\ldots ,T&\text {fixed objective function coefficients} \end{aligned}$$
(18)

The shape of function \(Q_t(\cdot ,\cdot )\) crucially influences the type of solution algorithms applicable to solve the MSLP of type (1)–(5). The following theorem summarizes useful properties:

Theorem 1

(Birge and Louveaux [5, Chap. 3, Theorem 2]) For a given \(\omega _t\), the value function \(Q_t(x_t, \omega _t)\) is a piece-wise linear ...

a.) ...convex function in \((h_t,T_t)\),

b.) ...concave function in \(c_t\),

c.) ...convex function in \(x_t\).

Theorem 1 implies that \(\fancyscript{Q}_{t+1}(x_t)\) is a piece-wise linear, convex function in \(x_t\). However, \(Q_t(\cdot ,\cdot )\) is a nonlinear (in general, not piece-wise linear) function in variations of the recourse matrix.
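Theorem 1a.) can be checked numerically on a toy recourse function. The sketch below (an illustrative example of our own, not taken from the paper) uses the simple-recourse value \(Q(x,h)=\min \{q\,y : y \ge h - x,\ y \ge 0\} = q\max (h-x,0)\), which is piece-wise linear and convex in the RHS \(h\):

```python
# Numerical check of midpoint convexity of Q(x, h) = q * max(h - x, 0)
# in the RHS h (Theorem 1, part a.), for a toy simple-recourse problem.

def Q(x, h, q=3.0):
    """Closed-form optimum of min{q*y : y >= h - x, y >= 0}."""
    return q * max(h - x, 0.0)

x = 2.0
for h1 in [0.0, 1.0, 2.5, 4.0]:
    for h2 in [0.5, 3.0, 5.0]:
        mid = 0.5 * (h1 + h2)
        # convexity: Q at the midpoint never exceeds the average of the values
        assert Q(x, mid) <= 0.5 * (Q(x, h1) + Q(x, h2)) + 1e-12
print("midpoint convexity in h holds on all tested pairs")
```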

3 Benders decomposition: solution approaches towards MSLPs

The transformation of the stochastic optimization problem into an extensive form may lead to a very large scale problem, because a huge number of scenarios may be required to accurately represent the cumulative distribution function of \(\xi \). Thus, many different techniques have been proposed to obtain discretizations which (1) consistently approximate the (possibly) infinite sample space and (2) lead to optimization problems of manageable size.

Any scenario tree is a finite-dimensional approximation of the MSLP (1)–(5) and may introduce inaccuracies. These approximate mathematical optimization problems may possess optimal solutions (or policies) which are “far away” from the optimal policy of the original MSLP. Scenario trees which exhibit certain consistency properties, e.g., that increasing the number of scenarios leads to asymptotic convergence of an optimal solution of the approximated problem to an optimal solution of the original problem, are studied by Olsen [40] and more recently by Pennanen [41]. Pennanen [42] presents a discretization procedure which leads to consistent approximations. His procedure is based on low-discrepancy sequences, specifically tailored to time series models for the random variables \(\xi \); the model presented in Sect. 3.4 fits into this framework. Another alternative to the sampling-based methods discussed in this paper is proposed by Mirkov and Pflug [36]. They discretize the distributions directly and work with distribution distances, i.e., conditional transportation distances, allowing them to bound the distance between the optimal solution of the approximated problem and that of the original problem, under some regularity conditions; see also Heitsch et al. [27].

Once a scenario tree has been constructed (either by discretization of the sample space or by exhausting all possible scenarios of discrete distributions), scenario reduction might be employed to obtain a tree which leads to a computationally tractable MSLP. The idea of scenario reduction is to thin out the tree by removing scenarios and updating the probabilities of the remaining scenarios. A computational framework for optimal scenario reduction techniques for problems of type (10)–(14) was developed by Dupačová et al. [17]. With their method, one can find an approximation to \(z^*\). Another strategy is to select a good representation of scenarios within a scenario generation framework, cf. Heitsch and Römisch [26]. Other approaches compute lower and upper bounds, cf. Kuhn [33].

In this section, we discuss various Benders decomposition based solution methods for solving the MSLP (10)–(14) in its extensive form; i.e., we are given a scenario tree representation of the uncertainty. We identify the assumptions, in terms of (1)–(5), needed for the methods to converge to an (optimal or approximately optimal) solution of problem (10)–(14); however, we do not address the question of how close the obtained solution is to an optimal solution of the original MSLP (1)–(5).

3.1 Assumptions

For the ease of presentation, we make the following two assumptions for the remainder of this section:

A1: MSLP (10)–(14) is bounded and feasible;

A2: for any \(x_{s,t-1}\), the sub-problems defined through equations (12)–(14) are feasible for stage \(t\) \((t=2,\ldots , T)\); e.g., the problem has relatively complete recourse.

We comment on the need for and implications of both assumptions in Sect. 3.2.

3.2 Nested Benders decomposition (NBD)

A popular exact optimization method in the context of stochastic (linear) programming is Benders decomposition, also known in the stochastic programming community as the L-Shaped Method or, when applied to the dual problem, as Dantzig–Wolfe decomposition. Pereira and Pinto [44] also called this algorithm Dual Dynamic Programming (DDP).

The basic idea of (nested) Benders decomposition is to build an outer linearization that approximates the functions \(\fancyscript{Q}_{t}(x) (t=2,\ldots ,T)\) via two types of cuts, the so-called Benders cuts: feasibility and optimality cuts, cf. Benders [2, 3]. The key concept is that the feasible region of the dual of the sub-problems for stage \(t\) is independent of any selection of the \(x_{s,t-1}\) variables. This allows for an exact representation of the expected future value function with a finite number of cuts.
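As a minimal illustration of this outer linearization, the following sketch (a toy two-stage example of our own with simple recourse, not the paper's Algorithm 3.1) runs an L-shaped loop: the recourse function \(Q(x,d)=q\max (d-x,0)\) is approximated from below by optimality cuts built from subgradients, the master problem is minimized over a grid for simplicity, and \(\underline{z}\) and \(\bar{z}\) bracket the optimum:

```python
# Toy L-shaped (Benders) loop for min c1*x + E[Q(x, d)] with
# Q(x, d) = q * max(d - x, 0): optimality cuts are supporting hyperplanes of
# the piece-wise linear, convex expected recourse function E[Q].

c1, q = 1.0, 3.0
scenarios = [(1.0, 1/3), (2.0, 1/3), (3.0, 1/3)]   # (demand d_s, prob p_s)

def EQ(x):          # expected recourse cost
    return sum(p * q * max(d - x, 0.0) for d, p in scenarios)

def subgrad(x):     # a subgradient of EQ at x
    return sum(p * (-q if d > x else 0.0) for d, p in scenarios)

cuts = [(0.0, 0.0)]                  # theta >= g*x + b; Q >= 0 gives (0, 0)
grid = [i / 100 for i in range(301)]      # master solved by grid search
lb, ub = -float("inf"), float("inf")
for _ in range(50):
    # master: min c1*x + theta  s.t.  theta >= g*x + b for all cuts
    x_hat, lb = min(((x, c1 * x + max(g * x + b for g, b in cuts))
                     for x in grid), key=lambda t: t[1])
    ub = min(ub, c1 * x_hat + EQ(x_hat))  # forward evaluation of the policy
    g = subgrad(x_hat)                    # backward pass: add optimality cut
    cuts.append((g, EQ(x_hat) - g * x_hat))
    if ub - lb < 1e-8:
        break

print(round(lb, 6), round(ub, 6))   # both bounds reach the optimum z* = 3.0
```

Because the subgradients come from the finitely many dual extreme points of the recourse problem, only finitely many distinct cuts can ever be generated, which is exactly the finiteness argument used for the nested method.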

A pseudo-code of a NBD procedure is presented in Algorithm 3.1 (assumptions of Sect. 3.1 need to hold). The set \(\mathbb {S}_t\) denotes a collection of scenarios \(s\) for stage \(t\) excluding “merged” scenarios for that stage, i.e., if \(s\) and \(\tilde{s} \in \mathbb {S}_t\), then \(\tilde{s} \notin \mathbb {S}_{st}\); set \(\mathbb {S}^\mathrm{S}_{st}\) is the collection of all “successor scenarios” of scenario \(s\) in stage \(t\), e.g., if \(\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}\) then \(\tilde{s} \notin \mathbb {S}^\mathrm{S}_{s,t+1}\). Probability \(p_{s\tilde{s}t}\) is the chance of realizing scenario \(\tilde{s}\) in stage \(t+1\), conditioned on scenario \(s\) in stage \(t\); \(\sum \nolimits _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} p_{s\tilde{s}t} = 1\) for all \(t = 2, \ldots , T\) and \(s \in \mathbb {S}_{t}\).

Algorithm 3.1 terminates in a finite number of iterations with an optimal solution with respect to any given accuracy \(\varepsilon >0\):

Theorem 2

Algorithm 3.1 computes an optimal solution for problem (10)–(14) in finitely many iterations.

The proof of convergence is based on the following two observations:

Correctness:

the optimality cuts generated are valid outer linearizations of the expected future value functions, i.e., for all \(t\), the optimality cuts at stage \(t\) underestimate the expected \(t+1\)-stage value function \(\fancyscript{Q}_{t+1}(x)\); this implies that the computed \(\underline{z}\) defines a lower bound on \(z^\mathrm{E}\).

Finiteness:

the boundedness of each sub-problem for each stage \(t\) (assumptions A1 and A2) implies that the dual has a finite number of extreme points, eventually leading to a finite number of optimality cuts. The proof needs to account for the fact that the optimality cuts computed for stages \(t < T\) are not always tight, i.e., they do not always yield a support of the expected future value function.

A formal proof is presented, for instance, by Birge and Louveaux [5, Chap. 6, Theorem 1].

Assumption A2 as stated in Sect. 3.1 implies that we do not need any feasibility cuts in the Benders Algorithm 3.1. Feasibility cuts can be included in a straightforward manner, though different strategies exist for how to proceed iteratively with the Benders algorithm once an infeasible sub-problem has been detected, cf. Gassmann [18] and Morton [38]. Assumption A1 excludes the possibility of an overall unboundedness of the problem; together with A2, this implies that all the \(t\)-stage problems \((t=1,\ldots , T)\), including the first-stage problem (also called the master problem), have a finite optimal solution, i.e., none of them is unbounded.

[Algorithm 3.1: pseudo-code figure]

Algorithm 3.1 is a multicut version of (nested) Benders decomposition, because there are \(|\mathbb {S}^\mathrm{S}_{st}|\) cuts at each stage \(t\) and scenario \(s\): one cut for each successor of \(s\). Alternatively, all \(|\mathbb {S}^\mathrm{S}_{st}|\) cuts can be aggregated into a single cut. Birge and Louveaux [5, Chap. 5.1d] discuss the advantages and disadvantages of the two approaches: disaggregated versus aggregated cuts.

Note that Algorithm 3.1 computes safe lower bounds \(\underline{z}\) and safe upper bounds \(\bar{z}\) on \(z^\mathrm{E}\).

A common misconception is that (nested) Benders decomposition requires fixed recourse (15) to converge. However, this is not the case, as both the feasible region and the recourse functions are convex for a finite number of scenarios. Nevertheless, the MSLP (1)–(5) with fixed recourse is easier to analyze with respect to stability, cf. Wets [59].

3.3 Sampling-based nested Benders decomposition for stage-wise independent random vectors

A special case is present when the random vector \(\xi \) is stage-wise independent. In this case, the expected \(t+1\)-stage value function (6) depends solely on \(x_t\); in particular, it is independent of the realization \(\omega _{t-1}\) of \(\xi \). This leads to a so-called recombining scenario tree, see Fig. 1a. Nevertheless, if there are \(K\) realizations per stage, then the corresponding scenario tree has \(S = K^{T-1}\) scenarios; in the case of Fig. 1a, there are \(S=64\) scenarios.

Fig. 1 Recombining scenario tree with \(T=4\) and \(K=4\). a Recombining tree, b \(N=5\) sample paths

The idea of sampling-based nested Benders decomposition is to exploit the stage-wise independence of \(\xi \) by

(forward simulation):

drawing \(N \ll S\) samples from the full tree to obtain new decisions \(\hat{x}_{nt}\) and a confidence interval around an estimate \(\hat{z}\) of an upper bound for \(z^\mathrm{E}\) (these samples are also called sample paths); and

(backward recursion):

computing an outer linearization for function \(\fancyscript{Q}_t(\tilde{x})\) by solving only \(N \cdot K\) linear programming problems. When working directly with the extensive form of the corresponding tree (and ignoring the stage-wise independence), one would need to solve \(K^{t-1}\) linear programming problems for stage \(t\) \((t = 2, \ldots , T)\). One backward pass over the entire time horizon \(T\) requires \(\frac{K^T - 1}{K-1}\) LP solves when ignoring the stage-wise independence, while the sampling-based method requires only \(N (T-1) K + 1\) LP solves. In other words, the cuts are “shared” among all scenarios, i.e., the optimality cut set \(\mathbb {F}_{st}\) is independent of scenario \(s\).
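The operation counts above can be verified for the tree of Fig. 1 (\(K=4\), \(T=4\)) with \(N=5\) sample paths; the numbers below follow directly from the formulas in the text:

```python
# LP solve counts for one backward pass: full tree vs. sampling-based
# (formulas from the text; K realizations per stage, T stages, N sample paths).

K, T, N = 4, 4, 5               # the setting of Fig. 1

full_tree = (K**T - 1) // (K - 1)        # 1 + K + K^2 + ... + K^(T-1)
sampling = N * (T - 1) * K + 1           # N*K LPs per stage t = 2..T, plus root
num_scenarios = K**(T - 1)               # S = K^(T-1) scenarios in total

print(num_scenarios, full_tree, sampling)   # 64 85 61
```

The saving grows quickly: already for \(T=12\) stages the full backward pass needs millions of LP solves, while the sampling-based count grows only linearly in \(T\).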

In Fig. 1b, a choice of \(N=5\) different sample paths for the recombining tree of Fig. 1a is shown.

The sampling-based nested Benders decomposition method is summarized in pseudo-code form in Algorithm 3.2. \(z_{1-\alpha /2}\) denotes the upper \(1 - \alpha /2\) critical point for a standard normal random variable.

[Algorithm 3.2: pseudo-code figure]

The next corollary states the correctness of Algorithm 3.2:

Corollary 1

The optimality cuts (39)–(40) are valid, i.e., they underestimate the expected \(t+1\)-stage value function (6).

A proof of Corollary 1 follows from Theorem 2 after observing that all \(K\) possible realizations are used to calculate the optimality cut; alternatively, cf. Pereira and Pinto [44]. Corollary 1 implies that \(\underline{z}\) is a safe lower bound on \(z^\mathrm{E}\), while \(\bar{z}\) is only an approximate upper bound.

The stopping criterion of Algorithm 3.2 uses a type of confidence interval around \(\bar{z}\) (upper bound on \(z^\mathrm{E}\)) with half-width \(z_{1-\alpha /2}\tfrac{\sigma _z}{\sqrt{N}}\). However, by the central limit theorem, a true confidence interval is only obtained under the following two assumptions:

C1: the \(z_n\) are independent and identically distributed random variables for \(n=1,\ldots ,N\), and

C2: \(N \rightarrow +\infty \).

Arguably, neither of the two assumptions holds for Algorithm 3.2. Alternative stopping criteria were proposed, e.g., by Shapiro [54] and Homem-de-Mello et al. [28].
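The stopping quantity itself is simple to compute; the sketch below uses illustrative sample values \(z_n\) and hard-codes the critical point \(z_{0.975} \approx 1.96\) for \(\alpha = 0.05\) (both are our own assumptions, not data from the paper):

```python
# Confidence-interval half-width used in the stopping criterion of
# Algorithm 3.2: z_{1-alpha/2} * sigma_z / sqrt(N) around the upper-bound
# estimate. The sample values z_n below are illustrative only.

import math
import statistics

z_n = [10.0, 12.0, 11.0, 13.0, 9.0, 11.0, 12.0, 10.0, 11.0, 11.0]
N = len(z_n)
z_crit = 1.96                        # upper 0.975 critical point, alpha = 0.05

z_bar = statistics.mean(z_n)         # estimate of the upper bound
sigma_z = statistics.stdev(z_n)      # sample standard deviation
half_width = z_crit * sigma_z / math.sqrt(N)

print(round(z_bar, 4), round(half_width, 4))   # 11.0 0.7157
```

Note that the formula presumes C1 and C2; as discussed above, with correlated or few samples the reported interval is only a heuristic stopping device, not a true confidence interval.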

Philpott and Guan [47] prove that Algorithm 3.2 almost surely converges to an optimal solution:

Theorem 3

(Philpott and Guan [47, Theorem 4]) Assuming fixed recourse (15), fixed objective function coefficients (18) and an independent sampling procedure (for step 9), Algorithm 3.2 converges with probability 1 to an optimal policy of (10)–(14) in a finite number of iterations.

A similar result is obtained by Shapiro [54] where the assumption of a finite distribution for \(\xi \) is waived. Shapiro shows that Algorithm 3.2 almost surely converges to an optimal policy of the Sample Average Approximation problem, i.e., the \(t+1\)-stage value function is computed for \(N\) random samples taken from the true distribution and the expected \(t+1\)-stage value function is given by their average values.

Algorithm 3.2 is Pereira and Pinto’s DDP algorithm applied to the stochastic optimization problem with stage-wise independent random variables, also called SDDP.

3.4 Sampling-based nested Benders decomposition for stage-wise dependent random vectors

The stage-wise dependence of the random vector \(\xi \) reveals a complication which is hidden in the notation of the \(t+1\)-stage value function as defined in (6): \(\fancyscript{Q}_{t+1}(x)\) depends on the realization(s) of the previous stage(s); i.e., the expectation in (6) is taken with respect to the random vector \(\xi _{t+1}\) conditioned on \(\omega _{t}, \omega _{t-1}, \ldots , \omega _2\).

We consider a special case, with the following assumptions:

S1: the random vector \(\xi \) has the memoryless property, i.e., \(\xi _t\) depends only on \(\omega _{t-1}\);

S2: the distribution of vector \(\xi _t|\omega _{t-1}\) is given by the function \(d_{t}(\omega _{t-1},\varpi )\) with noise term \(\varpi \) (independent of \(\omega _{t-1}\)) and constant derivative:

$$\begin{aligned} \frac{\partial d_t(\omega _{t-1},\cdot )}{\partial \omega _{t-1}} = \varphi _{t-1} \quad \quad \forall t = 2, \ldots , T, \end{aligned}$$

where \(\varphi _{t-1}\) is a square matrix and \(\omega _{1}\) is certain. Implicitly, we assume that \(d_t(\omega _{t-1},\cdot )\) is a continuous function in \(\omega _{t-1}\), i.e., \(\xi _t|\omega _{t-1}\) is a continuous random vector. As we limit ourselves to the solution of the MSLP in extensive form, we assume that we are given \(K\) possible realizations corresponding to \(\xi _t|\omega _{t-1}\) for each stage \(t\). Each realization occurs with probability \(p_k\) \((k=1, \ldots ,K)\); \(p_k\) may also depend on \(t\).

An example of a process satisfying both assumptions S1 and S2 is a linear autoregressive model of lag 1, see Sect. 4.3.1.
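Such a lag-1 linear model can be sketched as follows (illustrative numbers of our own; we take the scalar case \(d_t(\omega _{t-1},\varpi ) = \varphi \,\omega _{t-1} + \varpi \) with \(K\) discrete noise realizations per stage):

```python
# Lag-1 linear model d_t(w_{t-1}, noise) = phi * w_{t-1} + noise satisfying
# S1 (memoryless) and S2 (constant derivative phi w.r.t. w_{t-1}).
# All numbers are illustrative; K discrete noise realizations per stage.

phi = 0.5                      # the constant derivative varphi of S2 (scalar case)
noise = [-1.0, 0.0, 1.0]       # K = 3 noise realizations, each with prob 1/K
w1 = 4.0                       # omega_1 is certain

def d(w_prev, eps):
    return phi * w_prev + eps

# branch the process one stage ahead: K children per node
stage2 = [d(w1, e) for e in noise]           # [1.0, 2.0, 3.0]
stage3 = [d(w, e) for w in stage2 for e in noise]

# S2 in action: the derivative w.r.t. the previous realization equals phi,
# independent of the noise term and of the point of evaluation
h = 1e-6
deriv = (d(w1 + h, 0.0) - d(w1, 0.0)) / h
print(stage2, round(deriv, 6))               # [1.0, 2.0, 3.0] 0.5
```

Setting `phi = 0.0` recovers the stage-wise independent (lag-0) case of Sect. 3.3, mirroring the remark on \(\varphi _t = 0\) later in this section.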

Assumption S1 eases the discussion because we can limit the presentation to one state variable, see below. All of the following results derived for lag-1 models naturally generalize to general-order autoregressive models; cf. Infanger and Morton [32, Sect. 5]. Assumption S2 is more restrictive and cannot be waived easily; it boils down to the effects of the sampling methods on dual feasibility. We come back to this discussion at the end of this section.

With assumption S1, the expected \(t+1\)-stage value function is extended by a new so-called state variable \(\omega _{t}\)

$$\begin{aligned} \fancyscript{Q}_{t+1}(x_t,\omega _t) := \mathbb {E}_{\xi _{t+1}|\omega _{t}} \big [ Q_{t+1}(x_t,\omega _{t+1})\big ] \qquad t=2, \ldots , T-1. \end{aligned}$$
(41)

The linearity assumption S2 implies

Theorem 4

For fixed recourse (15), the expected \(t+1\)-stage value function \(\fancyscript{Q}_{t+1}(x_t,\omega _t)\) is a piece-wise linear ...

a.) ...convex function in \(x_t\),

b.) ...convex function in \(\omega _t\) in case of fixed objective function coefficients (18),

c.) ...concave function in \(\omega _t\) in case of fixed technology (16) and fixed RHS (17),

d.) ...convex function jointly in \(x_t\) and \(\omega _t\) in case of fixed objective function coefficients (18) and fixed technology (16).

The proof of Theorem 4 follows from the linearity of \(d_t(\omega _{t-1},\cdot )\) in \(\omega _{t-1}\) by adjusting the proof of Theorem 1 accordingly.

Property d.) of the expected \(t+1\)-stage value function \(\fancyscript{Q}_{t+1}(x_t,\omega _t)\) is particularly important for NBD algorithms, as the optimality cuts require convexity (piece-wise linearity is required for finite convergence of the classical NBD). A tailored NBD algorithm is summarized in Algorithm 3.3.

[Algorithm 3.3: pseudo-code figure]

The correctness of Algorithm 3.3 is established in the following

Corollary 2

The optimality cuts (52)–(54) are valid, i.e., they underestimate the expected \(t+1\)-stage value function (41).

Proof

The approximated \(t\)-stage value function is given by (for realizations \(h_{kt}\) of the RHS; we omit index “\(n\)” here)

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{t}(x_{t-1},\omega _{t-1}) := \min&\sum _{k=1}^K p_k \left( c_{t} x_k + \eta _k \right) \end{aligned}$$
(55)
$$\begin{aligned} s.t.&W_{t} x_k = h_{kt} - T_{t-1} x_{t-1} \quad \quad k = 1, \ldots , K\end{aligned}$$
(56)
$$\begin{aligned}&\eta _k \ge E_{ft} x_k + E_{ft}^{h}h_{kt} + e_{ft} \quad \;\; \forall f \in \mathbb {F}_{t}, \; k = 1, \ldots , K\nonumber \\ \end{aligned}$$
(57)
$$\begin{aligned}&x_k \ge 0, \quad \quad k=1,\ldots ,K \end{aligned}$$
(58)

We need to show that

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{t}(x_{t-1},\omega _{t-1}) \le \fancyscript{Q}_{t}(x_{t-1},\omega _{t-1}) \end{aligned}$$

for all feasible \(x_{t-1}\) and for all \(\omega _{t-1} \in \varOmega _{t-1}\).

The dual of (55)–(58) can be written as (we scale the dual variables by \(1/p_k\))

$$\begin{aligned} z^D_{t}(x_{t-1},\omega _{t-1}) := \max&\sum _{k=1}^K p_k \left( y_k \left( h_{kt} - T_{t-1} x_{t-1}\right) + \sum _{f \in \mathbb {F}_{t}} \overline{y}_{fk} \left( E_{ft}^{h}h_{kt} + e_{ft}\right) \right) \nonumber \\ \end{aligned}$$
(59)
$$\begin{aligned} s.t.&y_k W_{t} - \sum \nolimits _{f \in \mathbb {F}_{t}} \overline{y}_{fk} E_{ft} \le c_{t} \quad \quad k = 1, \ldots , K \end{aligned}$$
(60)
$$\begin{aligned}&\sum \nolimits _{f \in \mathbb {F}_{t}} \overline{y}_{fk} = 1 \quad \quad k = 1, \ldots , K \end{aligned}$$
(61)
$$\begin{aligned}&\overline{y}_{fk} \ge 0 \quad \quad \forall f \in \mathbb {F}_{t}, \; k = 1, \ldots , K \end{aligned}$$
(62)

By duality theory, the dual (59)–(62) has an optimal solution (recall that we assume relatively complete recourse) with \(z^D_{t}(x_{t-1},\omega _{t-1}) = \widetilde{\fancyscript{Q}}_{t}(x_{t-1},\omega _{t-1})\). As the feasible region of the primal is bounded (assumption A1), so is the feasible region of the dual. Thus, the feasible region of the dual can be characterized by finitely many extreme points. An optimum of the dual (59)–(62) is then obtained at an extreme point, given by \(\pi _{k}\) together with \(\overline{\pi }_{fk}\). Let \(\hat{x}_{t-1}\) be a particular value for \(x_{t-1}\) and let \(\hat{h}_{t-1}\) be the corresponding value of the RHS for event \(\omega _{t-1}\); then

$$\begin{aligned}&z^{D}_{t}(\hat{x}_{t-1},\hat{h}_{t-1}) \\&\quad = \sum _{k=1}^K p_k \left( \pi _{k} \left( \hat{h}_{kt} - T_{t-1} x_{t-1}\right) + \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk} \left( E_{ft}^{h}\hat{h}_{kt} + e_{ft}\right) \right) \\&\quad = \sum _{k=1}^K p_k \left( \left( \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk} E_{ft}^{h} + \pi _{k} \right) d_{t}(\hat{h}_{t-1},\varpi )_k - \pi _{k} T_{t-1}\hat{x}_{t-1} + \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk} e_{ft} \right) \\&\quad = \left( \sum _{k=1}^K p_k \left( \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk} E_{ft}^{h} + \pi _{k} \right) \right) \varphi _t \hat{h}_{t-1} - \left( \sum _{k=1}^K p_k \pi _{k} \right) T_{t-1}\hat{x}_{t-1} + c(\varpi ) \end{aligned}$$

with constant value \(c(\varpi )\), independent of \(\hat{h}_{t-1}\) and \(\hat{x}_{t-1}\); it can be calculated through

$$\begin{aligned} c(\varpi )= & {} z^{D}_{t}(\hat{x}_{t-1},\hat{h}_{t-1}) - \left( \sum _{k=1}^K p_k \left( \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk} E_{ft}^{h} + \pi _{k} \right) \right) \varphi _t \hat{h}_{t-1}\\&+ \left( \sum _{k=1}^K p_k \pi _{k} \right) T_{t-1}\hat{x}_{t-1}. \end{aligned}$$

This implies

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{t}(\hat{x}_{t-1},\hat{h}_{t-1})= & {} E_{f,t-1} \hat{x}_{t-1} + E_{f,t-1}^{h} \hat{h}_{t-1} + e_{f,t-1} \end{aligned}$$

with \(E_{f,t-1}\), \(E_{f,t-1}^{h}\) and \(e_{f,t-1}\) defined appropriately (e.g., as in (52)–(54)), for the corresponding index \(f\).

Because the feasible region of the dual (59)–(62) is affected neither by \(x_{t-1}\) nor by \(\omega _{t-1}\), \(\pi _{k}\) together with \(\overline{\pi }_{fk}\) remains an extreme point of the feasible region of the dual, though optimality might be lost. For an arbitrary \(x_{t-1}\) and \(\omega _{t-1}\), this implies that

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{t}(x_{t-1},\omega _{t-1})\ge & {} E_{f,t-1} x_{t-1} + E_{f,t-1}^{h} h_{t-1}(\omega _{t-1}) + e_{f,t-1} \quad \quad t=2, \ldots , T. \end{aligned}$$

As \(\fancyscript{Q}_{T+1}(x_{T},\omega _{T}) \equiv 0\) in the last stage \(T\), it follows that \(\fancyscript{Q}_{T}(x_{T-1},\omega _{T-1}) = \widetilde{\fancyscript{Q}}_{T}(x_{T-1},\omega _{T-1})\). The argument then follows by (backward) induction over the stages. \(\square \)

The calculated \(\underline{z}\) defines a lower bound on \(z^\mathrm{E}\) because there are \(K\) realizations of the RHS, conditioned on the stage and the previous realization. Note that \(\varphi _t = 0\) leads to a model of lag 0, which reduces Algorithm 3.3 to Algorithm 3.2.

Algorithm 3.3 is an SDDP algorithm with linear sampling.

Let us revisit requirement S2 and the challenges faced when introducing non-linear relationships and sampling procedures that affect parameters other than the RHS. Assumption S2 implies the convexity of the expected \(t+1\)-stage value function \(\fancyscript{Q}_{t+1}(x_t,\omega _t)\) for fixed recourse, fixed objective function coefficients, and fixed technology; property d.) of Theorem 4. Thus, only the randomness of the RHS parameter \(h_t\) can be treated by a (linear) sampling procedure.

There are several ways to illustrate the challenges faced when extending the approach beyond serial dependence in the RHS. The first insight is given by the correctness proof for the cuts (Corollary 2). The idea of the proof is that the dual feasible region of the approximated \(t\)-stage problem is not affected by the randomness of the RHS parameter \(h_t\). Thus, the extreme points of the dual remain unaffected and do not “see” the randomness of \(h_t\). Together with the linearity of the sampling procedure, it follows that (1) the derived optimality cuts are valid and (2) a finite number of cuts suffices. General sampling procedures for the parameters \(c_t, T_{t-1}\) and \(W_t\) may affect the feasible region of the dual; thus, special dependency structures are required which do not affect the dual feasible region.

A second way is the “add-a-state-variable” view; cf. Maceira and Damázio [34], de Matos and Finardi [11] and de Queiroz and Morton [12, Sect. 5]. Here, the idea is to apply a reformulation to obtain \(t\)-stage problems which are interstage independent, by introducing additional state variables which capture the history of the stochastic process. The relationships capturing the randomness of the parameters then explicitly enter the formulation via constraint(s); non-linear relationships might destroy the polyhedral structure. Valid (optimality) cuts are then derived by taking the dual, and the argument above applies.
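The add-a-state-variable reformulation can be sketched numerically. In the toy example below (all matrices are invented), a stage-\(t\) system \(W x_t = h_t - T x_{t-1}\) with lag-1 RHS \(h_t = \varphi h_{t-1} + d_t\) (stage-wise independent \(d_t\)) is augmented by a state variable carrying \(h\); afterwards, only the stage-wise independent noise \(d_t\) appears on the RHS:

```python
import numpy as np

# Toy stage-t system  W x_t = h_t - T x_{t-1}  with lag-1 RHS dynamics
# h_t = phi @ h_{t-1} + d_t  (d_t stage-wise independent); data is hypothetical.
W = np.array([[1.0, 2.0]])    # recourse matrix (1 constraint, 2 variables)
T = np.array([[0.5, 0.0]])    # technology matrix
phi = np.array([[0.8]])       # lag-1 matrix varphi_t

# Add a state variable y_t carrying h_t. The augmented system reads
#   W x_t - y_t            = 0   - T x_{t-1}
#         y_t - phi y_{t-1} = d_t
# i.e., the coupling to the previous state (x_{t-1}, y_{t-1}) is deterministic
# and only the stage-wise independent d_t remains random on the RHS.
W_aug = np.block([[W, -np.eye(1)],
                  [np.zeros((1, 2)), np.eye(1)]])
T_aug = np.block([[T, np.zeros((1, 1))],
                  [np.zeros((1, 2)), -phi]])

# Consistency check on one realization:
x_prev, y_prev, d = np.array([1.0, 0.0]), np.array([2.0]), np.array([0.1])
y = phi @ y_prev + d                 # h_t = 0.8 * 2 + 0.1 = 1.7
x = np.array([1.2, 0.0])             # satisfies W x = h_t - T x_prev = 1.2
state, prev = np.concatenate([x, y]), np.concatenate([x_prev, y_prev])
rhs = np.concatenate([np.zeros(1), d])
assert np.allclose(W_aug @ state, rhs - T_aug @ prev)
```

A non-linear lag model would make the second block of constraints non-linear, which is exactly how the polyhedral structure mentioned above can be destroyed.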

3.5 Combining sampling-based and scenario-based NBD methods

Consider the MSLP (10)–(14) in its extensive form. The uncertainty for scenario \(s\) (and stage \(t\)) is parameterized by \(c_{st}\), \(h_{st}\), \(T_{s,t-1}\) and \(W_{st}\), a possible realization of the random vector \(\xi \).

We consider the case when random vector \(\xi \) can be separated into two different, independent random vectors \(\xi ^T\) and \(\xi ^S\); i.e., each component of \(c_{st}\), \(h_{st}\), \(T_{s,t-1}\) and \(W_{st}\) is a realization of one of the two random vectors \(\xi ^T\) or \(\xi ^S\). The set \(\mathbb {S}_{t}\) corresponds to the scenario tree part, i.e., associated with random vectors \(\xi ^T\), and denotes the collection of all corresponding scenarios \(s\) for stage \(t\). In the following, we distinguish two cases: \(\xi ^S\) being stage-wise independent (Sect. 3.5.1) and stage-wise dependent (Sect. 3.5.2).

We briefly mention that the two algorithms proposed below can be seen as a generalization of the work by Philpott and de Matos [46], where a Markov chain was combined with SDDP, by assuming that a Markov state is the scenario tree realization and the transition matrices define the probabilities. We refer to Sect. 4.4 for a more detailed discussion.

3.5.1 Nested Benders decomposition and sampling-based nested Benders decomposition for stage-wise independent random variables

We assume random vector \(\xi ^S\) is stage-wise independent, while random vector \(\xi ^T\) has a scenario tree structure. Let there be \(K_1\) realizations associated with random vector \(\xi ^S\) for each stage (\(t>1\)); random vector \(\xi ^T\) has \(K_2\) realizations (per stage). Combining both uncertainties leads to a tree with \(S = (K_1K_2)^{T-1}\) scenarios. For instance, in Fig. 2, the second stage of the combined tree has 8 nodes and the third stage \(64 = (4 \cdot 2)^{3-1}\) nodes; for each of the 8 realizations of the second stage, there are 8 realizations for stage 3.
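The combinatorics can be checked directly; a two-line computation with the numbers from Fig. 2:

```python
K1, K2, T = 4, 2, 4                                      # numbers from Fig. 2
nodes = [(K1 * K2) ** (t - 1) for t in range(1, T + 1)]  # nodes per stage
S = (K1 * K2) ** (T - 1)                                 # total scenarios
# nodes == [1, 8, 64, 512]; S == 512, matching the 8 and 64 quoted above
```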

Fig. 2
figure 2

Combining two types of uncertainties into one algorithmic framework \((T=4)\): recombining scenario tree \((K_1=4)\) representing \(\xi ^S\) and scenario tree \((K_2=2)\) representing \(\xi ^T\). a Recombining tree: \(\xi ^S\); 4 possible sample paths are dashed in gray, b scenario tree: \(\xi ^T\)

The idea is to treat random vector \(\xi ^T\) exactly (via NBD) while we take a sampling approach for \(\xi ^S\). Both stochastic processes share the same set of stages as shown in Fig. 2. Figure 2a shows again a recombining scenario tree with 4 stages and \(K_1=4\) realizations per stage. We apply the sampling approach towards the recombining tree. A scenario tree with \(K_2=2\) realizations per stage is shown in Fig. 2b. At each stage \(t\) and for each node in the recombining tree, there are \(K_2^{t-1}\) realizations corresponding to the scenario tree.

We obtain one Benders cut for each \(s\); each sample \(n\) taken from the recombining scenario tree leads to a Benders cut as well. Note that we do not lose the ability to share cuts in the recombining tree, i.e., the \(n\) cuts associated with \(\xi ^S\) can be shared for each scenario \(s \in \mathbb {S}_{t}\) but not among the different scenarios \(s\). In other words, this algorithmic framework preserves cut sharing within each scenario of the scenario tree part. This is an important distinction to existing variants of the SDDP method, see Sect. 4.4.

The procedure is summarized in Algorithm 3.4. Scenarios \(s\), \(\tilde{s}\) and \(\check{s}\) correspond to the scenarios of random vector \(\xi ^T\) (\(\tilde{S} = K_2 ^{T-1}\)); \(\bar{s}\) are the scenarios corresponding to the random vector \(\xi ^S\) only (\(\bar{S} = K_1^{T-1})\). The realizations corresponding to the uncertain parameters on both trees are given by \(c_{n_{\bar{s}},\tilde{s},t}\), \(W_{n_{\bar{s}},\tilde{s},t}\), \(h_{n_{\bar{s}},\tilde{s},t}\), \(T_{n_{\bar{s}},\tilde{s},t-1}\).

figure d

Correctness of the Benders cuts is established by

Corollary 3

The optimality cuts (73)–(74) are valid, i.e., they underestimate the expected \(t+1\)-stage value function.

Proof

The approximated \(t\)-stage value function for scenario \(s\), stage \(t\) and realizations \(c_{kst}\), \(h_{kst}\), \(T_{k,s,t-1}\) and \(W_{kst}\) is given by (we omit index “\(n\)” here)

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{st}(x_{t-1},\omega _{t-1}) := \min&\sum _{k = 1}^{K_1} p_k c_{kst} x_k + \sum _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} p_{s\tilde{s}t} \eta _{\tilde{s}} \\ s.t.&W_{kst} x_k = h_{kst} - T_{k,s,t-1} x_{t-1} \quad \quad k = 1, \ldots , K_1\\&\eta _{\tilde{s}} \ge E_{f\tilde{s}t} x_k + e_{f\tilde{s}t} \;\;\; \forall \tilde{s} \in \mathbb {S}^\mathrm{S}_{st}, \;\ f \in \mathbb {F}_{\tilde{s}t}, \;\ k = 1, \ldots , K_1\\&x_k \ge 0, \quad \quad k=1,\ldots ,K_1 \end{aligned}$$

The objective function of the dual (the variables are scaled) can be written as

$$\begin{aligned} \max&\sum _{k=1}^{K_1} p_k \left( y_k \left( h_{kst} - T_{k,s,t-1} x_{t-1}\right) + \sum _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} \sum _{f \in \mathbb {F}_{\tilde{s}t}} \overline{y}_{fk\tilde{s}} \ e_{f\tilde{s}t} \right) \end{aligned}$$
(75)

The remainder of the proof follows the ideas of the proof of Corollary 2. \(\square \)

3.5.2 Nested Benders decomposition and sampling-based nested Benders decomposition for stage-wise dependent random variables

Here, we assume stage-wise dependence of random vector \(\xi ^S\), while \(\xi ^T\) follows a scenario tree structure. As in Sect. 3.4, we assume that \(\xi _t^S\) depends on the events of stage \(t-1\) only (assumption S1) in a linear fashion (assumption S2). This allows us to write the expected \(t+1\)-stage value function for scenario \(s\) as

$$\begin{aligned} \fancyscript{Q}_{s,t+1}(x_t,\overline{\omega }_t) := \mathbb {E}_{\xi ^S_{t+1}|\overline{\omega }_{t}} \big [ Q_{s,t+1}(x_t,\overline{\omega }_{t+1}) \big ] \qquad t=2, \ldots , T-1 \end{aligned}$$
(76)

where we exploit the independence of \(\xi ^S\) and \(\xi ^T\).

In order to preserve convexity of the expected \(t+1\)-stage value function (76), only the RHS values are allowed to be stochastic, governed by \(\xi ^S\), cf. Theorem 4 and the discussion at the end of Sect. 3.4. However, the scenario tree part can still affect all components. For ease of notation, we split the realizations corresponding to the uncertain parameters into \(c_{st}\), \(W_{st}\), \(T_{s,t-1}\) (corresponding to \(\xi ^T_t\)) and \(h_{\bar{s}t}\) (corresponding to \(\xi ^S_t\)). Further, let \(h_{nt}\) be a realization for one scenario \(\bar{s}\) among the \(\bar{S}\) scenarios of \(\xi _t^S\). The resulting Benders decomposition algorithm variant is outlined in Algorithm 3.5.

figure e

Algorithm 3.5 is correct:

Corollary 4

The optimality cuts (89)–(91) are valid, i.e., they underestimate the expected \(t+1\)-stage value function (76).

Proof

The approximated \(t\)-stage value function corresponding to (76) for scenario \(s\) (corresponding to \(\xi _t^T\)), stage \(t\) and realizations \(c_{st}\), \(T_{s,t-1}\), \(W_{st}\) for \(\xi _t^T\), and \(h_{kt}\) for \(\xi _t^S|\overline{\omega }_{t-1}\) is given by (we omit index “\(n\)” here)

$$\begin{aligned} \widetilde{\fancyscript{Q}}_{st}(x_{t-1},\overline{\omega }_{t-1}) := \min&\sum _{k = 1}^{K_1} p_k c_{st} x_k + \sum _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} p_{s\tilde{s}t} \eta _{\tilde{s}} \end{aligned}$$
(92)
$$\begin{aligned} s.t.&W_{st} x_k = h_{kt} - T_{s,t-1} x_{t-1} \quad \quad k = 1, \ldots , K_1\nonumber \\&\eta _{\tilde{s}} \ge E_{f\tilde{s}t} x_k + E_{f\tilde{s}t}^h h_{kt} + e_{f\tilde{s}t} \quad \quad \forall \tilde{s} \in \mathbb {S}^\mathrm{S}_{st}, \end{aligned}$$
(93)
$$\begin{aligned}&f \in \mathbb {F}_{\tilde{s}t}, \;\ k = 1, \ldots , K_1\end{aligned}$$
(94)
$$\begin{aligned}&x_k \ge 0, \quad \quad k=1,\ldots ,K_1 \end{aligned}$$
(95)

Let the optimal objective function value of the dual of (92)–(95) be \(z_{st}^D\) and let \(\pi _{k}\) and \(\overline{\pi }_{fk\tilde{s}}\) be an optimal dual solution (scaled; index \(s\) for the dual solution is omitted) for realizations \(\hat{x}_{t-1}\) and \(\hat{h}_{t-1}\), then

$$\begin{aligned}&z_{st}^D(\hat{x}_{t-1},\hat{h}_{t-1}) \\&\;\;= \left( \sum _{k=1}^{K_1} p_k \left( \sum _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} \sum _{f \in \mathbb {F}_{t}} \overline{\pi }_{fk\tilde{s}} E_{f\tilde{s}t}^{h} + \pi _{k} \right) \right) \varphi _t \hat{h}_{t-1}\\&\qquad - \left( \sum _{k=1}^{K_1} p_k \pi _{k} \right) T_{s,t-1}\hat{x}_{t-1} + c(\varpi ) \end{aligned}$$

with constant \(c(\varpi )\).

The proof is completed following the ideas of the proof of Corollary 2. \(\square \)

3.6 Discussion and method comparison

Combining tree and sampling approaches can exploit the individual approaches’ main strengths

tree :

representation of uncertainties of complex underlying processes

sampling :

breaking the curse-of-dimensionality resulting from the number of scenarios and from the state space

and overcome their main individual weaknesses

tree :

curse-of-dimensionality resulting from the number of scenarios

sampling :

stochasticity is limited to stage-wise independent random variables or stage-wise dependent random variables following linear models (cf. Sect. 3.4).

However, it is crucial that the scenario tree does not contain too many scenarios. This can be achieved, for instance, if the two stochastic processes represented in \(\xi ^S\) and \(\xi ^T\) live on different scales, as shown in Fig. 3. The number of realizations \(K_1\) can differ per stage, Fig. 3a. Similarly, the number of realizations \(K_2\) per stage and node can be different in the tree structure, Fig. 3b. Nevertheless, the curse-of-dimensionality is not completely broken; it is still present for the processes modeled with the scenario tree approach, though a coarse tree may suffice for a specific application.

Fig. 3
figure 3

The recombining scenario tree and the scenario tree can live on different scales; \(K_1\) and \(K_2\) may vary per stage \((t>1)\). a Recombining tree: \(\xi ^S\), b scenario tree: \(\xi ^T\)

The five different NBD algorithms are summarized and compared in Table 1. The three different assumptions for the distribution of the stochasticities are hierarchical: “linear dependence” of lag 0 leads to a stage-wise independent process, see Sect. 3.4. Thus, algorithms capable of dealing with linear dependence of the randomness of the input data can also cope with stage-wise independent processes. Similarly, linearly dependent random vectors can be represented in a tree as well. The row “# of LP’s per Iteration” contains the number of LPs to be solved in one backward and forward pass of the Benders decomposition algorithm variants. The number of state variables is given per stage and scenario, if applicable.

Table 1 Comparison of nested Benders decomposition algorithm variants

As mentioned, sampling-based approaches assume a linear model to represent the stochasticity in order to preserve the convexity of the expected future value function. Scenario tree approaches do not require this assumption. As such, the combination of both approaches can be seen as a convexification of the expected future value function. The idea can be applied to other approaches which are restricted to uncertainties present in the RHS.

We call both algorithms 3.4 and 3.5 SDDPT: SDDP & Tree.

4 Hydro-thermal scheduling

Least-cost hydro-thermal scheduling consists of determining an optimal operating policy for the use of hydro and thermal resources to minimize total expected costs while satisfying demand. We assume that we are given a hydro-thermal power system which must centrally dispatch the generation. The resulting problem remains relevant in the context of deregulated electricity markets, as it is the core of many optimization problems such as the profit maximization problem by Mo et al. [37] or the optimal expansion problem by Gorenstin et al. [22]. Furthermore, many hydro-thermal systems are still centrally dispatched, e.g., in Central and South American countries. SDDP was originally designed as a solution method for treating stochastic water inflows in the context of least-cost hydro-thermal scheduling, cf. Pereira and Pinto [44].

In the context of the hydro-thermal scheduling problem, the central issues regarding uncertainty and its effects on the decisions to be taken may vary depending on the time horizon and characteristics of the system under consideration. Predominantly hydro systems are more concerned with inflow uncertainty, since that directly affects the system’s capacity of sustained energy production. Thermal systems, on the other hand, are usually focused on guaranteeing reliability at times of peak demand, thus making unit outages an important issue.

The remainder of this section is organized as follows. We present four different classes of uncertainties (Sect. 4.1), review the literature (Sect. 4.2), apply the SDDPT algorithm to a hydro-thermal scheduling problem incorporating both fuel price/electricity demand and hydro inflow uncertainty (Sect. 4.3), and compare the new approach with existing Markov chain models (Sect. 4.4). Finally, we present an illustrative example (Sect. 4.5) and a case study for Panama (Sect. 4.6).

4.1 Classification of uncertainties

Inspired by Zimmermann [62], we classify the uncertainties for our real-world optimization problem with respect to their context. In general, uncertainties related to the hydro-thermal scheduling problem may be broadly classified into four groups, see Table 2. For each one of these groups, the way to mathematically represent these uncertainties has an immediate impact on the methodologies to efficiently solve the resulting problems.

Table 2 Classification of uncertainties in hydro-thermal scheduling

The first group includes sources of uncertainty to which a time series model may be accurately fitted and expected to provide reasonable forecasts—uncertainty in inflows is an example that lies in this category. The second group relates to random variables whose evolution in time is better represented by Markov chains. That is sometimes the case of electricity spot prices. The third group may include, for example, the availability of each generating unit. In this case, the best approach is usually to perform a probabilistic evaluation based on Monte Carlo sampling. Finally, the fourth group deals with sources of uncertainty that are more closely related to structural, political or macro-economic conditions and can only be characterized in the form of a scenario tree, particularly when one is interested in long-term projections rather than short-term forecasts. Growth in electricity demand or the evolution of fuel prices are the most prominent examples in this group, and are exactly the motivation for our work.

4.2 Literature review

Scenario tree approaches to represent stochastic water inflows have been proposed in the literature. However, in order to capture the inflow uncertainty, a large scenario tree may be required, leading to very large scale deterministic equivalent programs; cf. Shrestha et al. [56]. In contrast, sampling-based methods have received wide attention in the literature. The various curses-of-dimensionality present in stochastic dynamic programming (SDP) approaches [60] inspired the development of decomposition methods [38, 43] and SDDP approaches [44].

SDDP is now an established method which includes, for example, the operational modeling of hydro and thermal plants, hydrological uncertainty, (linearized) transmission networks [23] (e.g., Kirchhoff laws, losses, security constraints), natural gas supply, demand and transportation network [4], load variation per load level and per bus with monthly or weekly stages, and \(\hbox {CO}_2\) emission allowance constraints [52]. SDDP-type algorithms have been used in practice for more than a decade [35].

Several important algorithms have been developed which are related to the SDDP methodology: Abridged Nested Decomposition [14, 15], Cutting-Plane and Partial-Sampling [9], Generalized Dual Dynamic Programming [57], and Constructive Dual Dynamic Programming [49–51]. Several extensions of SDDP have been proposed, for instance by Diniz and dos Santos [13] and dos Santos and Diniz [16] who incorporate the information of several future stages into one stage.

The deregulation of the electricity market added another stochastic component to the hydro-thermal scheduling problem: electricity spot prices. This led to the development of the so-called Hybrid SDP/SDDP methods [19–21], where spot price forecasts are treated via Markov chains in a discrete manner (in the SDP framework) while the reservoir levels and water inflows are modeled by continuous approximations (in the SDDP framework); see also Sect. 4.4.

Recently, there has been a stream of research on the incorporation of risk aversion into the SDDP methodology, predominantly using the Conditional Value-at-Risk (CVaR) measure [53]. The possibility to formulate CVaR risk constraints in a linear programming framework is an important property for their inclusion into the SDDP algorithm. Because CVaR constrains the cost (or profit) associated with the \(N\) sampled scenarios simultaneously, special techniques have to be developed to embed the CVaR constraints into a dynamic-programming-based framework. To this end, primal [30] and dual [54] penalization techniques have been proposed. Shapiro et al. [55] present risk averse approaches for different risk measures, including CVaR, and apply them to the Brazilian power system. In this context, Philpott and de Matos [46] combine the SDDP framework with a Markov chain; see the discussion in Sect. 4.4.

Given the recent global economic crisis and huge swings in oil prices, it became evident that relying on point estimates for key variables such as demand in future time stages and fuel costs for thermal plants may result in biased and risky decisions. Pereira et al. [45] proposed the modeling of the electricity demand uncertainty as a linear autoregressive process. This is theoretically possible and amenable to the application of the SDDP algorithm since electricity demand appears on the RHS of the constraint matrix (cf. Sect. 4.3.2) and, hence, the future cost function is a convex function in electricity demand. In practice, however, a linear autoregressive model seems not to be a good predictor for electricity demand, since such models are mean-reverting and cannot capture the possibility that future electricity demand may follow structural regimes which are completely different from that of the present. As the fuel price appears in the objective function, an autoregressive process model leads to a future cost function having a saddle shape (cf. Theorem 4), destroying the necessary convexity of the problem which allows it to be solved with SDDP. Hence, a Markov chain approach seems to be the natural way and was proposed by Pereira et al. [45], leading to fuel price clusters with transition probabilities. Again, such a model is difficult to calibrate and does not seem to capture the fuel price development completely. We discuss in Sect. 4.4 how our approach differs. More complex models, such as the one by Batlle and Barquín [1], seem to capture the fuel price uncertainty more appropriately; the output of such models can be captured with a scenario tree.

In the literature, there is a wide range of publications suggesting scenario tree approaches for the stochastic load process and the stochastic fuel prices; see Nowak and Römisch [39]. There are different efficient methodologies for the generation of scenario trees; refer to Høyland and Wallace [29], Casey and Sen [6], Batlle and Barquín [1] and Zhou et al. [61]. The reduction of the size of the tree is computationally very important, as discussed by Gröwe-Kuska et al. [24] and Heitsch and Römisch [25]; see also Sect. 3.
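To convey the flavor of scenario reduction (a greatly simplified sketch, not the exact algorithms of [24, 25]), the following greedy backward-reduction routine repeatedly deletes the scenario whose probability-weighted distance to its nearest neighbor is smallest and reassigns its probability to that neighbor:

```python
import math

def reduce_scenarios(scens, probs, n_keep):
    """Greedy backward-reduction sketch: repeatedly delete the scenario l
    minimizing  p_l * min_{j != l} dist(scen_l, scen_j)  and move its
    probability mass to the nearest remaining scenario."""
    scens, probs = [list(s) for s in scens], list(probs)
    while len(scens) > n_keep:
        best = None   # (cost, index_to_delete, nearest_neighbor)
        for l, s in enumerate(scens):
            dmin, jmin = min((math.dist(s, r), j)
                             for j, r in enumerate(scens) if j != l)
            cost = probs[l] * dmin
            if best is None or cost < best[0]:
                best = (cost, l, jmin)
        _, l, j = best
        probs[j] += probs[l]          # reassign deleted probability
        del scens[l]
        del probs[l]
    return scens, probs

# Two nearly identical scenarios are merged first; probabilities stay summed to 1.
kept, p = reduce_scenarios([[0.0], [0.1], [5.0]], [1/3, 1/3, 1/3], n_keep=2)
```

The production algorithms use more refined transportation-distance arguments, but the merge-and-reassign mechanics are the same.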

4.3 SDDPT towards hydro-thermal scheduling

We are given \(I\) hydro plants (index \(i\)) and \(J\) thermal plants (index \(j\)). The electricity demand \(D_t\) [MWh] for each stage \(t = 1, \ldots , T\) can be either satisfied through electricity generated by turbined water \(u_{it} [\hbox {m}^3]\) of any hydro plant \(i\) or through thermal power generation \(g_{jt}\) [MWh]. For simplicity, we assume that the generated energy from hydro plant \(i\) is given through the linear relation \(\rho _i u_{it}\), where \(\rho _i\) [MWh/m\(^3\)] is the constant production coefficient for hydro plant \(i\). A shortage in electricity supply of \(\delta _t\) [MWh] is allowed, in principle, but leads to a high penalty cost via the coefficient \(\Upsilon \) [$/MWh]; this is practically important and also ensures feasibility of the problem (cf. assumption A1).

The thermal power generation involves the variable production cost \(C_{jt}\) in $ per MWh produced and the thermal plants’ generation is subject to lower bounds \(\underline{G}_{jt}\) [MWh] and upper bounds \(\overline{G}_{jt}\) [MWh]. The hydro plants have no variable operation cost but the hydro power generation is subject to minimal \(\underline{U}_{it} [\hbox {m}^3]\) and maximal turbined water \(\overline{U}_{it} [\hbox {m}^3]\).

Without loss of generality, we can assume that each hydro plant \(i\) has a hydro reservoir which is subject to a lower bound \(\underline{V}_{it} [\hbox {m}^3]\) and an upper bound \(\overline{V}_{it} [\hbox {m}^3]\). Given the set of immediate upstream hydro plants \({\mathbb {U}}_i\) for plant \(i\), then, for each stage, there is a water balance equation, stating that the water level \(v_{i,t+1} [\hbox {m}^3]\) at the end of stage \(t\) for reservoir \(i\) has to equal the water level \(v_{it}\) at the beginning of stage \(t\) minus the turbined water \(u_{it}\) and the spilled water \(s_{it} [\hbox {m}^3]\) plus the water inflow \(A_{it} [\hbox {m}^3]\) and the water released from the immediate upstream plants. Furthermore, there is a minimum spillage \(\underline{S}_{it} [\hbox {m}^3]\) and a maximum spillage \(\overline{S}_{it} [\hbox {m}^3]\) per time stage \(t\) and hydro reservoir \(i\).

The objective is to minimize the (expected) variable cost over the planning horizon, while meeting the electricity demand for each stage.

4.3.1 Stochastic water inflows

The uncertain inflows are typically modeled as a linear autoregressive model via a continuous Markov process, taking into consideration the correlation with the inflows of the previous stage(s). For notational convenience, we assume a lag 1 model; i.e., the inflows in stage \(t\) depend only on the previous inflows in stage \(t-1\). The inflow model for stage \(t\) and reservoir \(i\) is then given by

$$\begin{aligned} A_{it} = \varsigma _{it} \left( \phi _{1i} \cdot \frac{A_{i,t-1} - \mu _{i,t-1}}{\varsigma _{i,t-1}} + \phi _{2i} \cdot \zeta _{it} \right) + \mu _{it} \end{aligned}$$
(96)

with inflow mean \(\mu _{it} [\hbox {m}^3]\), standard deviation \(\varsigma _{it} [\hbox {m}^3]\), model parameters \(\phi _{1i}\) [-] and \(\phi _{2i} [\hbox {m}^3]\), and random variable outcomes \(\zeta _{it}\) [–] sampled from an appropriate distribution, typically a standard normal distribution. These random variables are correlated with respect to \(i\) but independent between stages \(t\). However, the inflows into each reservoir do not directly depend on the (past) inflows of the other reservoirs. Thus, using the notation of Sect. 3.4, matrix \(\varphi _t\) is a diagonal matrix with diagonal entries \(\varsigma _{it} \phi _{1i}{/}\varsigma _{i,t-1}\). In conclusion, model (96) satisfies assumptions S1 and S2.
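A single step of model (96) is easy to sketch in code; all parameter values below are hypothetical, and \(\zeta\) is drawn from a standard normal distribution as stated above:

```python
import random

def next_inflow(A_prev, mu_prev, mu, sig_prev, sig, phi1, phi2, zeta):
    """One step of the lag-1 inflow model (96) for a single reservoir i:
    A_t = sig_t * (phi1 * (A_{t-1} - mu_{t-1}) / sig_{t-1} + phi2 * zeta) + mu_t
    The effective lag-1 coefficient is sig_t * phi1 / sig_{t-1}, i.e., one
    diagonal entry of the matrix varphi_t."""
    return sig * (phi1 * (A_prev - mu_prev) / sig_prev + phi2 * zeta) + mu

# Hypothetical parameters for one reservoir; zeta ~ N(0, 1) in the model.
zeta = random.gauss(0.0, 1.0)
A = next_inflow(A_prev=120.0, mu_prev=100.0, mu=110.0,
                sig_prev=20.0, sig=25.0, phi1=0.6, phi2=1.0, zeta=zeta)
```

With \(\zeta = 0\) the step is deterministic, which makes the mean-reverting structure of the model easy to verify.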

Following the spirit of Sect. 3.4, we approximate the true, continuous inflow distributions via \(K_1\) inflow realizations \(A_{kt}\), \(k = 1, \ldots , K_1\), each having equal probability \(p_k = 1/K_1\). These inflow realizations are also called backward openings in the context of SDDP. Recall that this leads to a tree with \(K_1^{T-1}\) scenarios (representing only the inflow uncertainty). In order to “simulate” the stochastic inflow, a sample of \(N\) so-called forward inflows \(A_{nt}\), \(n = 1, \ldots , N\), is chosen from the scenario tree; i.e., \(N\) out of \(K_1^{T-1}\) scenarios are sampled.
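The forward sampling can be sketched in index form (sizes and seed below are arbitrary): each forward path picks one of the \(K_1\) backward openings per stage \(t = 2, \ldots, T\), so \(N\) scenarios are drawn out of the \(K_1^{T-1}\) possible ones.

```python
import random

random.seed(7)
K1, T, N = 4, 4, 3     # openings per stage, stages, forward samples

# paths[n][t] is the index of the opening chosen by forward sample n
# at stage t+2; each path is one of K1 ** (T - 1) = 64 possible scenarios.
paths = [[random.randrange(K1) for _ in range(T - 1)] for _ in range(N)]
```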

4.3.2 Fuel cost and electricity demand uncertainty via scenario tree

Let us assume that we want to include uncertainties into the hydro-thermal scheduling problem, which are best captured via a scenario tree. Candidates for such uncertainties are fuel price uncertainty and electricity demand uncertainty; i.e., the data \(C_{jt}\) and/or \(D_t\) are now stochastic.

We need to assume that the hydro inflow and the uncertainty treated via a tree are independent, i.e., the random vectors \(\xi ^S\) and \(\xi ^T\) are statistically independent, cf. Sect. 3.5. This is justified in the context of our hydro-thermal application, as the hydro inflows should have no influence on the fuel prices and the electricity demand (and vice versa). Without loss of generality, in this section, we discuss the uncertainty with respect to the fuel prices only.

Using the notation of Sect. 3.5.2, let \(\mathbb {S}_t\) be the set of different scenarios for the stochastic fuel price and \(C_{jst}\) be its realization with \(s \in \mathbb {S}_t\).

4.3.3 One-stage dispatch problem

For stage \(t\) and fuel cost scenario \(s\), past inflow samples \(A_{n,t-1}\), inflow scenarios \(A_{knt}\), (initial) water reservoir levels \(V_{nst}\) (\(A\) and \(V\) are vectors in the reservoirs \(i \in \mathbb {I}\)) and fuel cost \(C_{jst}\), the approximated \(t\)-stage problem (85)–(88) is the so-called one-stage dispatch problem (omitting indices \(k,s\) and \(t\) on the decision variables \(s,u,v,\delta \) and \(\eta \))

$$\begin{aligned}&z_{kst}(V_{nst},A_{n,t-1}) \mathrel {\mathop :}=\min \; \sum _{j=1}^J C_{jst} g_{j} + \Upsilon \delta + \sum _{\tilde{s} \in \mathbb {S}^\mathrm{S}_{st}} p_{s\tilde{s}t} \eta _{\tilde{s}}\end{aligned}$$
(97)
$$\begin{aligned} s.t.&\sum \nolimits _{j=1}^J g_{j} + \sum \nolimits _{i=1}^I \rho _i u_{i} = D_t - \delta \end{aligned}$$
(98)
$$\begin{aligned}&v_{i,t+1} = V_{inst} - u_{i} - s_{i} + \sum \nolimits _{h \in {\mathbb {U}}_i} \left( u_{h} + s_{h} \right) + A_{iknt} \quad \; \forall i = 1, \ldots , I\end{aligned}$$
(99)
$$\begin{aligned}&\eta _{\tilde{s}} \ge \sum \nolimits _{i=1}^I E_{fi\tilde{s}t} v_{i,t+1} + \sum \nolimits _{i=1}^I E_{fi\tilde{s}t}^h A_{iknt} + e_{f\tilde{s}t} \quad \; \forall \tilde{s} \in \mathbb {S}^\mathrm{S}_{st}, \;\; f \in \mathbb {F}_{\tilde{s}t} \end{aligned}$$
(100)
$$\begin{aligned}&\underline{G}_{jt} \le g_{j} \le \overline{G}_{jt}, \quad \; \underline{U}_{it} \le u_{i} \le \overline{U}_{it}, \nonumber \\&\underline{V}_{i,t+1} \le v_{i,t+1} \le \overline{V}_{i,t+1}, \quad \; \underline{S}_{it} \le s_{i} \le \overline{S}_{it}, \quad \; \delta \ge 0,\nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \quad \forall i = 1, \ldots , I, \,\; j=1, \ldots , J. \end{aligned}$$
(101)

Constraints (98) model the electricity demand. The water balance equations for each reservoir are given by constraints (99). The optimality cuts for the expected \(t+1\)-stage value function are stated in (100). Finally, the bounds on the decision variables are represented in (101). For notational consistency, we have used \(\eta \) rather than \(\alpha \) or \(\beta \), which are typically used in least-cost hydro-thermal scheduling and in profit maximization models for hydro-thermal scheduling, respectively.
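To illustrate the structure of (97)–(101), here is a minimal one-stage dispatch LP with one thermal plant, one hydro plant and a single optimality cut; all numbers are invented, and scipy's `linprog` merely stands in for a generic LP solver:

```python
from scipy.optimize import linprog

# Variable order: [g, u, s, v_next, delta, eta]; all data hypothetical.
C, Upsilon, rho = 20.0, 1000.0, 1.0        # thermal cost, penalty, prod. coeff.
D, V0, A = 100.0, 50.0, 30.0               # demand, initial storage, inflow
E_cut, e_cut = -5.0, 400.0                 # one cut: eta >= E_cut*v_next + e_cut

c = [C, 0.0, 0.0, 0.0, Upsilon, 1.0]       # objective (97)
A_eq = [[1.0, rho, 0.0, 0.0, 1.0, 0.0],    # demand balance (98)
        [0.0, 1.0, 1.0, 1.0, 0.0, 0.0]]    # water balance (99): v'+u+s = V0+A
b_eq = [D, V0 + A]
A_ub = [[0.0, 0.0, 0.0, E_cut, 0.0, -1.0]] # cut (100): E_cut*v' - eta <= -e_cut
b_ub = [-e_cut]
bounds = [(0, 80), (0, 60), (0, 50), (0, 100), (0, None), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
# Stored water is valued at 5 $/m^3 by the cut, thermal power costs 20 $/MWh,
# so the hydro plant turbines at its limit: g=40, u=60, v'=20, eta=300.
```

With these numbers the optimal cost is \(20 \cdot 40 + 300 = 1100\); raising the cut slope above the thermal cost would flip the dispatch toward storing water, which is exactly the trade-off the expected future cost function encodes.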

The parameters as defined in (89)–(91) for the optimality cuts (100) are obtained by

$$\begin{aligned} E_{fi\tilde{s}t}= & {} \sum \nolimits _{k = 1}^{K_1} p_{k} \pi _{i,k,n,s,t+1} \\ E_{fi\tilde{s}t}^h= & {} \sum \nolimits _{k = 1}^{K_1} p_{k} \left( \sum \nolimits _{s \in \mathbb {S}^\mathrm{S}_{\tilde{s},t+1}}\sum \nolimits _{f \in \mathbb {F}_{t+1}}\overline{\pi }_{f,i,k,s,t+1}E_{f,i,s,t+1}^{h} + \pi _{i,k,n,s,t+1}\right) \varphi _{t+1} \\ e_{f\tilde{s}t}= & {} \sum \nolimits _{k=1}^{K_1} p_k \; z_{k,s,t+1}(V_{n,s,t+1},A_{nt}) - \sum \nolimits _{i=1}^I \left( E_{f,i,\tilde{s},t+1} V_{n,s,t+1} + E_{f,i,\tilde{s},t+1}^h A_{nt} \right) \end{aligned}$$

where each “successor” \(s\) of scenario \(\tilde{s}\) in stage \(t+1\) (i.e., \(s \in \mathbb {S}^\mathrm{S}_{\tilde{s},t+1}\)) and each sample \(n\) leads to one cut \(f\); \(\pi _{ikn\tilde{s}t}\) and \(\overline{\pi }_{fikn\tilde{s}t}\) are an optimal dual solution corresponding to constraints (99) and (100), respectively. Notice that \(T_{s,t-1}\) in (97)–(101) is scenario independent and consists of the negative identity matrix for variables \(v\) and the zero matrix corresponding to the other decision variables. Therefore, decision variables \(s,u,\delta \) and \(\eta \) do not (directly) participate in the optimality cuts (100).

This enables us to apply the SDDPT Algorithm 3.5 towards our least-cost hydro-thermal scheduling problem, incorporating both inflow uncertainty and fuel cost uncertainty.

4.4 Scenario tree versus Markov chain

Pereira et al. [45] proposed a Markov chain to cope with fuel price uncertainty. This Markov chain has a state space of size \(\overline{K}\) and is time-homogeneous. Hence, the transition probability distribution does not depend on the stages and can be represented by a right stochastic matrix. This leads to \(\overline{K}\) price clusters for each stage \(t\) with stage-independent transition probabilities. The incorporation within SDDP works as follows. Each forward inflow \(n\) is assigned exactly one such price cluster (hence, \(N \ge \overline{K}\)) and the approximated \(t+1\)-stage value function in (97) is substituted by the expected value of the \(t+1\)-stage value function for each price cluster; e.g., the variables \(\eta \) get an additional index \(\kappa \), and cuts cannot be shared among the price clusters. This has the attractive property that the number of LPs to be solved remains the same as in standard SDDP. However, the main drawback of this method is that, in practice, it is very difficult to define the initial values for the price clusters and to derive meaningful transition probabilities. Furthermore, it is questionable whether fuel prices really evolve according to a time-homogeneous Markov chain.
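To illustrate the structure of such a time-homogeneous chain, the sketch below samples cluster transitions from a right stochastic matrix. The matrix entries are hypothetical; only the structure matters here, namely that each row sums to one and that the transition probabilities do not depend on the stage:

```python
import random

# A hypothetical fuel-price Markov chain with K_bar = 3 price clusters
# ("low", "mid", "high"). Row i of the right stochastic matrix P gives
# the transition probabilities out of cluster i, for every stage.
P = [
    [0.7, 0.2, 0.1],   # from "low"
    [0.3, 0.4, 0.3],   # from "mid"
    [0.1, 0.3, 0.6],   # from "high"
]

def simulate(start, stages, rng=random.random):
    """Sample one price-cluster path over the given number of stages."""
    path = [start]
    for _ in range(stages - 1):
        r, acc, nxt = rng(), 0.0, 0
        for j, pj in enumerate(P[path[-1]]):
            acc += pj
            if r < acc:
                nxt = j
                break
        path.append(nxt)
    return path
```

Estimating the entries of `P` from price data is exactly the calibration step that the text flags as difficult in practice.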

In contrast, the scenario tree approach provides a natural way to incorporate forecasts of fuel prices and/or electricity demand. Government agencies such as the US Energy Information Administration (EIA) or the International Energy Agency (IEA) regularly publish fuel price and electricity demand forecasts on a scenario basis. The World Energy Outlook by the IEA, for instance, based its 2007 forecasts on three different scenarios: a reference scenario, an alternative policy scenario and a high growth scenario. These data are readily available and can be transformed into a scenario tree straightforwardly. This practical advantage comes at the cost that the number of LPs to be solved increases with the size of the tree; cf. Sect. 3.6. Hence, one wants to use a tree that is as small as practically feasible.

As mentioned in Sect. 4.2, the hybrid SDP/SDDP method [19] also combines two sources of uncertainty: stochastic inflows and electricity spot prices. In this framework, however, electricity spot market prices are clustered and each (forward) inflow scenario is assigned to one such cluster. Transition probabilities between clusters capture the uncertainty of the prices. Thus, electricity spot market prices are treated as a Markov chain. Similar clustering techniques were adopted in the context of risk management [31] as well as fuel contracts [7, 8].

An explicit combination of the SDDP framework with a Markov chain was proposed by Philpott and de Matos [46]. The authors exploit the special structure imposed by the Markov chain to allow cut sharing within each Markov state but not between states. The approaches mentioned above share this feature, but Philpott and de Matos [46] is the first work to explicitly present this Markov chain framework with the ability to share cuts within each state.

All these Markov chain approaches are similar to the SDDPT approach in that cuts are shared within, but not between, each state or scenario tree node, respectively. Technically, they differ in three respects: (1) a Markov chain approach combined with an inflow model can capture dependencies between the inflows and the other stochastic process (e.g., spot market prices), in contrast to the model proposed in this paper, where the two underlying stochastic processes are assumed to be independent; (2) in the Markov chain approach, inflow scenarios are assigned to a state (or cluster), while in SDDPT all inflow scenarios are considered for each scenario tree node; and (3) a Markov chain typically possesses the same number of states per stage, while a scenario tree grows exponentially. In addition, treating the inflows as part of the state information of the Markov process does not allow inflow models with lag \(> 1\), as this violates the Markovian property; the SDDPT approach can be combined with an affine (inflow) model of any lag.

4.5 Illustrative example

To demonstrate the utility of the proposed decomposition algorithm, we use it to determine a cost-minimizing operations schedule for a small hydro-thermal power system. We chose to model a small system so that the extensive form of the MSLP can be solved as a monolith with CPLEX. The mathematical programming formulations and the SDDPT algorithm are implemented in GAMS 24.1.3 using CPLEX with GUSS. The computations are executed on a 64 bit machine with an Intel(R) Core(TM) i7 CPU @ 2.93 GHz, 48 GB RAM running Ubuntu 12.04.4.

The system under study comprises three thermal plants and one hydro plant. We model the system over six stages while considering uncertainty in fuel costs and in inflows. Fuel-cost uncertainty is modeled via four scenarios. The demand for electricity is fixed at 100 MWh per stage \((D_t=100, t=1,\ldots ,6)\), and whenever electricity is rationed, the producer is penalized at 1000 $/MWh (\(\Upsilon = 1000\)). Thermal plants need not produce at any minimal level \((\underline{G}_{jt}=0, t=1,\ldots ,6)\), but must not exceed their given capacities \((\overline{G}_{1t}=30, \overline{G}_{2t}=40, \overline{G}_{3t}=20\) MWh/stage, \(t=1,\ldots ,6)\). The thermal generation costs are uncertain and are represented by a scenario tree, cf. Fig. 4. Each scenario has the same probability.

Fig. 4
figure 4

Scenario tree for thermal generation cost; legend: \((C_{1st}, C_{2st}, C_{3st})\) in ($/MWh)

The hydro plant has a power production coefficient of \(1 \hbox {MWh/m}^3 (\rho _1=1)\) and turbining capacity of 10 \(\hbox {m}^{3} (\underline{U}_{1t}=0, \overline{U}_{1t}=10, t=1,\ldots ,6)\). The associated reservoir has a minimal reservoir limit of 8 \(\hbox {m}^3\) and a capacity of 25 \(\hbox {m}^3 (\underline{V}_{1t}=8, \overline{V}_{1t}=25, t=1,\ldots ,6)\). The initial reservoir level and the end of horizon reservoir level are \(10 \hbox {m}^3 (v_{11} = v_{17} \equiv 10)\). The inflows into the single reservoir are assumed to be uncertain and stage-wise independent. The following discrete values (with equal probability) are assumed: 10, 2, 15, 13, 8, 6, 11, 4, 7, 17 \(\hbox {m}^3\). The inflows are deterministic in the first stage at value 10 \(\hbox {m}^3\).

Each of the \(K_1\) inflow scenarios per stage is combined with the uncertainty in the thermal generation cost. The total number of scenarios \(S\), combining both inflow and generation cost uncertainty, is thus \(S = |\mathbb {S}_T| \cdot K_1^{T-1} = 4 \cdot K_1^5\). Figure 5 illustrates the resulting tree for \(K_1=3\) and \(S=972\) scenarios.
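The scenario count is a direct product of the two sources of uncertainty; a short sketch makes the exponential growth in the number of stages explicit:

```python
def total_scenarios(cost_scenarios, k1, stages):
    """Number of scenarios when each of the k1 stage-wise independent
    inflow realizations (stages 2..T) is combined with every thermal
    generation cost scenario: S = |S_T| * k1**(T-1)."""
    return cost_scenarios * k1 ** (stages - 1)

# The tree of Fig. 5: four cost scenarios, K_1 = 3, T = 6 stages.
print(total_scenarios(4, 3, 6))  # → 972
```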

Fig. 5
figure 5

Complete scenario tree combining both inflow uncertainty \((K_1=3)\) and thermal generation cost as in Fig. 4. The gray rectangles represent the nodes of the scenario tree of Fig. 4

Table 3 summarizes the results found from modeling the hydro-thermal system for different numbers of inflow realizations, \(K_1\), and the corresponding numbers of scenarios, \(S\).

The remaining columns show the size of the resulting MSLP (11)–(14) in extensive form (number of constraints “rows,” number of decision variables “columns,” and non-zero entries in the constraint matrix “non-zeros”), the optimal objective function value “\(z^\mathrm{E}\)”, and both the generation and total solution time (in seconds), when solved with CPLEX.

Table 3 MSLP in extensive form for illustrative example

The computational results for SDDPT are summarized in Table 4. SDDPT is stopped as soon as the computed lower bound, \(\underline{z}\), equals the optimal solution (with an absolute tolerance of \(10^{-5}\)).

Table 4 SDDPT results for the illustrative example

This is an artificial stopping criterion that illustrates the number of iterations (“iter”) the algorithm requires for different configurations and the resulting computational times (in seconds). In each case, the algorithm is executed 25 times using a different sequence of random draws to model the inflow uncertainty. The table shows the minimum (“min”), maximum (“max”) and average (“\(\emptyset \)”) values over the 25 runs. After the algorithm terminates, we solve the \(T\)-stage problem over all different inflow scenarios with the computed cuts to confirm that the optimal solution to the MSLP has been obtained; i.e., there are \(K_1^{T-1}\) scenarios to evaluate. Again, this is artificial but is used to evaluate the algorithm. In all cases, including the nine runs in which the iteration limit of 1000 is reached, our algorithm computes an optimal policy.

We observe that the performance of the SDDPT algorithm depends strongly on the number of samples, \(N\), relative to the number of scenarios, \(K_1\). While a larger \(N\) tends to require fewer iterations, for a fixed number of iterations a larger \(N\) is computationally more expensive. This leads to a trade-off between \(N\) and the computational time. For \(K_1 \ge 8\), the computational times of SDDPT are lower than those of the monolith for all configurations. Figure 6 shows the typical evolution of the lower bound for this example.

Fig. 6
figure 6

Evolution of lower bound (“\(y\)-axis”) for \(K_1 = 7\) and \(N=5\) over the number of iterations (“\(x\)-axis”). The figure shows 25 different runs, utilizing a different random sequence of the inflow samples

4.6 Case study for Panama

We present and discuss results on the effect of electricity demand uncertainty by studying the case of Panama’s power system. The consideration of electricity demand and fuel price uncertainty is especially important not only to Panama, but also to a number of countries in Central America, due to three main factors:

  • Significant share of hydro resources and existence of reservoirs The fact that a considerable part of the system’s installed capacity comes from hydro plants, together with the existence of reservoirs capable of seasonal regulation, stresses the necessity of taking into account the uncertainties related to electricity demand and fuel prices. By having a more detailed representation of the evolution of these parameters, i.e., instead of relying on single point estimates for electricity demand and fuel prices throughout the horizon, one is capable of a more accurate calculation of the opportunity cost of water, which ultimately determines the system’s operating policy.

  • High dependence on international markets Resources such as oil, coal and natural gas are not usually abundant in these countries and, consequently, they experience a severe dependence on international markets, being exposed to both availability and price issues. By factoring into the problem the possibility that there might be a stronger need for these fuels in the future or that they might be a lot more expensive then, the obtained solution may be hedged against extreme events that would otherwise compromise security of supply or lead to unbearable costs.

  • Supply adequacy issues There are countries in which the whole system is designed and dimensioned according to a pre-defined reliability criterion, which may be, for example, a maximum percentage risk of running into a situation where part of the load has to be shed. In such cases, the need for the installation of new generation capacity is assessed by means of successive simulations of the system for a given set of inflow scenarios (and usually fixed electricity demand and fuel prices): more capacity is added as long as the results indicate a risk of deficit greater than the criterion (e.g., 5 %). In such cases, a simulation of exactly the same supply configuration with increased fuel prices would lead to deficit risks greater than those previously calculated.

We study the effects of electricity demand uncertainty on the first stage decisions; the electricity demand for the first stage is known with certainty, whereas the electricity demands of all future stages have to be forecasted. For mid-term optimization models, the first stage decisions are the information of interest. In hydro-thermal scheduling, mid-term optimization problems provide information for the water reservoir management; i.e., water reservoir levels are priced via the expected value function cuts, see Wallace and Fleten [58]. We use those cuts in our studies to obtain solutions for the first stage. This allows us to study the effects of electricity demand on an annual basis for different first stage decisions.

In our computational results, we consider 7 different electricity demand scenarios. The electricity demand for January is the same for each scenario, while the electricity demands for all other months are given by the cumulative percentage change relative to the reference scenario at \(\pm 1.75\,\%, \pm 1.00\,\%\) and \(\pm 0.50\,\%\); i.e., in stage \(t, 1 < t \le T\), a per-stage change at rate \(x\) relative to the reference-scenario electricity demand \(D_t\) leads to the new electricity demand \((1+x)^{t-1}D_t\). The different electricity demand scenarios for the case of Panama are shown in Fig. 7. We observe that the resulting scenario tree is a fan with a total of 7 scenarios.
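The construction of this demand fan can be sketched as follows. Only the growth rates are taken from the text; the flat reference demand in the usage example is a placeholder, not the Panama data:

```python
# Cumulative per-stage growth rates relative to the reference scenario;
# the rate 0.0 is the reference scenario itself, giving 7 scenarios.
RATES = [-0.0175, -0.0100, -0.0050, 0.0, 0.0050, 0.0100, 0.0175]

def demand_scenarios(reference, rates=RATES):
    """reference[t] is the reference demand of stage t (0-indexed), so
    (1+x)**t here corresponds to (1+x)**(t-1) for 1-indexed stages.
    Stage 0 (January) is identical in every scenario since (1+x)**0 = 1."""
    return [[(1.0 + x) ** t * d for t, d in enumerate(reference)]
            for x in rates]
```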

Fig. 7
figure 7

Electricity demand scenarios for Panama

The time horizon of choice is 1 year with monthly stages where the first month is January and the last month considered is December. To achieve accurate computational results and to reduce noise, we use \(N = 100\) forward inflow scenarios and \(K_1 = 50\) backward openings for the SDDPT algorithm. We assume stage-wise independent inflows. We stop the algorithm after 100 iterations. The implementation and the computational environment are the same as in Sect. 4.5.

For the end of the planning horizon, we define a target of reaching the same water reservoir levels as at the beginning of the horizon, which can be somewhat justified by the seasonal pattern of the water inflows. The future water values for the last stage are set to zero, consistent with the algorithms developed in Sect. 3. The resulting end effects are expected to influence the first stage decisions (i.e., the dispatch decisions for January) and the first stage water values only marginally.

The case study presented focuses on the modeling of electricity demand uncertainty together with uncertainties in the inflow. The motivation of these examples is to demonstrate the potential and usefulness of the developed algorithms in an academic setting. Thus, the case study represents a rather small hydro system compared to the Norwegian or the Brazilian systems. In this context, we want to mention two important aspects: (1) the presented hydro-thermal scheduling models are a simplification of reality; for instance, non-linear reservoir head effects and energy transmission capacities are not included in the model. (2) Though the case study focuses on electricity demand and inflow uncertainty, the inflow uncertainty remains the main driver of the optimal policies for hydro-dominated power systems. Thus, for real systems, one might first enhance the aspects related to inflow modeling, for instance by taking weekly decisions and using better forecasting methods than linear autoregressive models of type (96), before focusing on the treatment of additional uncertainties.

In 2009, Panama’s power system consisted of 14 thermal plants, 4 plants with hydro-reservoirs and 1 run-of-the-river plant. The thermal plants’ data are given in Table 5. The generation cost per MWh ranged from $70.6 to $313.4 for all months; we assume a constant fuel price and an annual discount rate of 10 %. For the first month considered, the fixed thermal generation is 82.4 GWh with a generation cost of $10.3 million, and the thermal capacity is 426.4 GWh. Over a one-year horizon, the fixed generation cost accumulates to $115.3 million. A schematic diagram of the hydro system of Panama is shown in Fig. 8. The installed hydro-electric capacity for January was 529.4 GWh. The electricity demand for January is assumed to be known at 537.3 GWh. Further, we assume an electricity demand exhibiting seasonal effects with a difference of 2.1 % between January and December. This pattern can be seen in Fig. 7.

Fig. 8
figure 8

Hydro-electric system of Panama

Table 5 Thermal plants considered for the Panama system

Consider now Table 6. Computational results are shown for the 7 different electricity demand scenarios of Fig. 7 and for the corresponding stochastic case. Thus, the stochastic case associated with the demand uncertainty has 7 different scenarios having the same realization in the first stage corresponding to January; all 7 scenarios occur with the same probability. The second row gives the yearly electricity demand. The row “Thermal” gives the thermal generation for the first stage, while the row labeled “Hydro” indicates the hydro-electric generation in GWh for the first stage. The costs are reported both for the lower bound “Cost \(\underline{z}\)” after 100 iterations and for the approximate upper bound “Cost \(\hat{z}\)” after evaluating the solution for 10,000 random samples; i.e., we solve the hydro-thermal scheduling problem using the computed Benders cuts for 10,000 samples (out of the \(K_1^{T-1} \approx 4.9 \cdot 10^{18}\) samples from the tree corresponding to the inflow uncertainty) and report the average observed cost. The fixed generation costs are excluded. Next, the computational times in seconds are recorded, though the decomposition algorithms are implemented in a modeling language where computational speed is not the main focus. “EVS” stands for Expected Value Solution and the corresponding rows indicate the corresponding yearly generation cost. The rows “\(\varDelta \)” provide the percentage change with respect to the scenario having average (or mean) demand.

Table 6 Computational results for the electricity system of Panama with different electricity demand scenarios

We observe the typical behavior of the lower bound of the SDDP algorithm in Fig. 9a, where most of the improvement is realized in the first 15 iterations; cf. Fig. 9b. Using 100 forward inflow scenarios and 100 Benders iterations leads to 10,000 Benders cuts for the first stage for each of the demand scenarios, and 70,000 Benders cuts for the stochastic case. Nevertheless, having 50 possible realizations of the inflows per stage (except the first stage) leads to a very large scenario tree, where we expect that the computed lower bounds are not optimal (in contrast to the illustrative example in Sect. 4.5).
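The cut and scenario counts quoted above follow from simple bookkeeping; the constants below mirror the run configuration stated in the text:

```python
# Run configuration of the Panama case study.
N_FORWARD, ITERATIONS, DEMAND_SCENARIOS = 100, 100, 7
K1, STAGES = 50, 12  # 50 backward openings, monthly stages over one year

# One cut per forward scenario per iteration, per demand scenario.
cuts_per_demand_scenario = N_FORWARD * ITERATIONS                   # 10,000
cuts_stochastic_case = DEMAND_SCENARIOS * cuts_per_demand_scenario  # 70,000

# Inflow scenario tree size: 50**11, approx. 4.9e18.
inflow_scenarios = K1 ** (STAGES - 1)
```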

Fig. 9
figure 9

Evolution of lower bound using SDDP over 100 iterations, including the fixed thermal generation cost of $115.3 million. a Absolute value of lower bounds, b change in % compared to lower bound at iteration 20

The results in Table 6 reveal that the first stage decisions (i.e., the decisions corresponding to January) are very sensitive to changes in electricity demand; recall that the electricity demand of the first stage is the same in each scenario. This is explained by the very idea of hydro-thermal electricity systems, in which one wants to hedge against dry seasons where the installed thermal capacity might not be sufficient to meet the electricity demand (or very expensive thermal generation units might be needed). Relatively full hydro-reservoirs can prevent electricity shortages during those seasons. However, this comes with the “risk” that some water might have to be spilled if a wet season occurs. This explains the trends in the higher (lower) thermal electricity generation for electricity demand increases (decreases).

When the electricity demand increases, the thermal electricity generation does not increase at the same rate at which it decreases when the electricity demand decreases; the same holds true for the annual cost. The first reason is the hedging against dry seasons; i.e., an increase in future electricity demand leads to a proportionally larger increase of hydro generation than the corresponding decrease when electricity demand decreases. The second reason is that a decrease in electricity demand may allow using some hydro-electric power in the first stage to avoid production by the most expensive thermal plants. The third reason, which explains the similar electricity production in the cases of \(-\)1.00 % and \(-\)1.75 %, is that all the “cheap” thermal plants have already been used in the first stage generation.

Let us now have a look at the solutions for the “average” case and the “stochastic” case. The “average” case reports the cost observed under the (unrealistic) assumptions of perfect foresight with respect to the electricity demand scenarios and of only the 7 considered scenarios occurring, each with equal probability. Thus, the resulting annual cost of the average case represents the case of perfect information about the electricity demand. The lower bound of this cost is given by $161.855 million. The difference between the cost under perfect information and the cost of the stochastic approach amounts to approx. $912,600 (when comparing lower bounds) and approx. $312,000 (when comparing approximate upper bounds), representing what is called the expected value of perfect information. The solution for the stochastic case is obtained by incorporating the cuts computed for each of the 7 demand scenarios; the solutions for EVS are obtained in the same way.

As shown in Table 6, the thermal and hydro electric generation decisions for the first stage differ significantly for the “mean” and “stochastic” case. The reason is once again that the extreme cases of electricity demand increases may lead to electricity shortages in future stages which are penalized heavily. Hence, the higher reservoir levels in the stochastic case compared to the “average” case are a hedging against future electricity shortages.

Using the expected electricity demand’s first stage solution in any of the electricity demand scenarios leads to the so-called EVS. Note that we use this term with respect to the “two-stage uncertainty” in the electricity demand embedded in a multi-stage stochastic optimization context. Hence, this can be seen as an adaptation of this recognized terminology (cf. Birge and Louveaux [5]). The increase in annual operating cost incurred by ignoring the random variations in the electricity demand, compared to the stochastic approach, is called the value of the stochastic solution (VSS). For our data, the VSS is approx. $163,000 (when comparing lower bounds) and approx. $87,000 (when comparing approximate upper bounds).
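The two gap measures discussed above, the expected value of perfect information and the VSS, reduce to simple differences once the respective costs are available. A sketch for a minimization problem; the arguments would be the bounds reported in Table 6, and the numbers in the usage below are illustrative:

```python
def evpi(cost_stochastic, cost_perfect_info):
    """Expected value of perfect information: the amount one would pay
    to learn the demand scenario before making the first stage decision."""
    return cost_stochastic - cost_perfect_info

def vss(cost_evs, cost_stochastic):
    """Value of the stochastic solution: the extra cost of fixing the
    expected-value first stage decision instead of the stochastic one."""
    return cost_evs - cost_stochastic

# Illustrative values in $ million: stochastic 105.0, perfect info 100.0,
# expected value solution 103.5.
print(evpi(105.0, 100.0), vss(103.5, 105.0))
```

Both quantities are non-negative for a minimization problem when computed from exact values; comparing lower bounds against approximate upper bounds, as in the text, yields only estimates of them.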

5 Conclusions

The SDDP algorithm allows for a detailed representation of the system’s characteristics—in particular, it becomes possible to represent hydro plants individually—while considering uncertainty in inflow scenarios and remaining computationally tractable. For these reasons the SDDP algorithm is typically used in finding the solution to the least-cost hydro-thermal scheduling problem. The SDDP methodology relies on the approximation of the expected future value function by a set of linear inequalities (Benders cuts), which can be iteratively constructed until a convergence criterion is achieved.

In this work, we proposed an extension of the SDDP methodology which permits us to incorporate additional sources of randomness whose evolution in time is more accurately represented in the form of scenario trees, such as electricity demand and fuel prices. The proposed approach amounts to parallelizable runs of the SDDP algorithm, each corresponding to the data associated with a branch of the scenario tree of a given stage. Whenever two or more branches of the scenario tree are joined at a node, the corresponding expected future value functions are also merged, following the probability of occurrence associated with each branch. Therefore, at these nodes a single expected future value function is obtained and the algorithm continues in the same manner until it reaches the scenario tree root node.

The impact of taking into account electricity demand and fuel price uncertainty may reach beyond the operational scheduling problem and extend into supply adequacy and load supplying capability issues. The importance of taking into account electricity demand uncertainties was explored by using the real power systems of Panama as a case study.

The incorporation of fuel price uncertainty might be especially useful when maximizing profits in a deregulated energy market. Typically, risk constraints are taken into account in such models, “penalizing” a certain risk exposure. The hybrid SDP/SDDP algorithms for hydro-thermal profit maximization models can adopt our methodology in a straightforward way. We see potential for our approach to fuel price uncertainty in such an environment.