1 Introduction

This work deals with a mid-term reservoir optimization problem over a finite planning horizon. In each period, water must be released from the reservoirs to produce electricity. However, the release decisions are constrained not only by the availability of water, but also by the physical limits of the turbines and by bounds on the reservoir levels, which may be set by legal requirements. This problem is widely acknowledged to be difficult, in particular due to the uncertainty associated with the natural inflows to the reservoirs, e.g., snowmelt and snow water equivalent.

Mid-term reservoir optimization is thus inherently a multiperiod stochastic problem. As a result, it is often cast as a multiperiod stochastic program or formulated under the framework of stochastic dynamic programming. Numerous meta-heuristic approaches have also been proposed for reservoir optimization problems, e.g., Almubaidin et al. (2022); two recent systematic reviews of such methods are available in Azad et al. (2020) and Beiranvand and Ashofteh (2023).

When stochastic programming (SP) is employed to solve the problem, the random variables, e.g., natural inflows and demand for energy, are discretized via a so-called scenario tree, which easily becomes intractable if a detailed representation of the stochastic variables is needed. This issue is often dealt with through decomposition strategies, such as Benders’ decomposition (e.g., Carpentier et al. 2014; Rebennack 2016); the progressive hedging algorithm, in which the so-called non-anticipativity constraints are dualized in the objective function (e.g., Gonçalves et al. 2012; Carpentier et al. 2013; Zéphyr et al. 2014); stochastic dynamic programming (SDP) (Ruszczyński and Shapiro 2003; Shapiro et al. 2009); scenario tree reduction strategies (Dupačová et al. 2003; Bin et al. 2015); and model predictive control (e.g., Nolde et al. 2008; Uysal et al. 2018; Lin et al. 2020).

Being a sequential decision-making problem, mid-term reservoir optimization lends itself naturally to stochastic dynamic programming (SDP). Indeed, in his groundbreaking theory of dynamic programming, Bellman (1958) decomposed a multi-stage decision process stagewise in a coordinated manner. Thus, it is no surprise that DP quickly found fertile ground in reservoir optimization applications (Labadie 2004).

The solution of SP or SDP reservoir management problems broadly consists of two main steps, namely (i) the calculation of an expectation; and (ii) an optimization step, or vice-versa. In models for the mid- or long-term planning of hydroelectric production, the optimization step often has to deal with nonlinear objective functions, due to, among others, nonlinear production functions (Cerisola et al. 2012). To take advantage of the widespread availability of linear programming solvers, the combined power response curve of the turbines at a power plant can often be approximated reasonably well by a concave, piecewise linear function of the turbined water flow, even though the response curves of the individual turbines may be highly nonlinear. For instance, this strategy is used by companies such as Hydro-Quebec (Carpentier et al. 2013) and Rio Tinto (Côté and Arsenault 2019) (a 4-reservoir system) to approximate production functions, as well as in studies of the Colombian power network (Morillo et al. 2020) (a 15-reservoir system), of a “network of hydropower plants and irrigated areas in the Nile Basin” (Goor et al. 2011), and of the Brazilian electrical system (Diniz and Maceira 2008) (110 hydro plants). An immediate consequence of this approximation scheme is that, under mild assumptions on the terminal value/cost-to-go function, one can easily show that the value/cost-to-go functions are concave/convex in the reservoir levels. These ideas are also exploited in Zéphyr et al. (2015, 2017), where an approximate stochastic dynamic programming model of a multiperiod, multireservoir hydroelectric system is presented in which the Bellman value function is approximated by a piecewise linear function that is evaluated by linear programming. The piecewise linear approximation is supported by a finite grid of node points (or vertices) in the continuous state space, and the Bellman function is evaluated at these nodes. Similarly, using a finite grid for the reservoir levels, Dias et al. (2010) approximate the expected value of the Bellman value function with a piecewise linear function generated by a set of hyperplanes using a convex hull algorithm. The latter approach is applied to a power generation planning problem in Dias et al. (2013).
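To make the piecewise linear strategy concrete, the following sketch evaluates a concave combined power response curve by the convex-combination linear program that underlies the GLP of Sect. 2; the breakpoint data are hypothetical, and SciPy's linprog stands in for any LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical breakpoints of a concave combined power response curve:
# turbined flow (m^3/s) versus power output (MW).
u_k = np.array([0.0, 20.0, 45.0, 80.0])
p_k = np.array([0.0, 30.0, 55.0, 70.0])

def power(u_target):
    """Evaluate the concave piecewise linear approximation at u_target as
    max sum_k lambda_k p_k  s.t.  sum_k lambda_k u_k = u_target,
    sum_k lambda_k = 1, lambda >= 0 (linprog minimizes, hence -p_k)."""
    A_eq = np.vstack([u_k, np.ones_like(u_k)])
    res = linprog(-p_k, A_eq=A_eq, b_eq=[u_target, 1.0],
                  bounds=[(0, None)] * len(u_k), method="highs")
    return -res.fun

print(power(30.0))   # 40.0 MW: interpolation between the 20 and 45 breakpoints
```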

Resorting to SDP to solve reservoir optimization problems poses another technical challenge: in theory, an optimization problem has to be solved for each possible state value, which is impossible since the reservoir level space is continuous. Thus, the latter must be discretized or sampled. The simplest discretization strategy to approximate our continuous dynamic program consists in constructing a uniform grid, obtained as the Cartesian product of same-size, fixed-spacing grids along each dimension of the reservoir level (state) space. However, this approach is impractical, as the complexity of the problem increases exponentially with the dimension of the state space, limiting applications to three to four reservoirs. This is known in dynamic programming as the curse of dimensionality.
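To give an order of magnitude (an illustration only, with a hypothetical p = 10 levels per reservoir):

```python
# Size of a uniform Cartesian grid with p discretization levels per
# reservoir: p**n explodes with the number of reservoirs n.
p = 10
for n in (2, 4, 6, 8, 10):
    print(f"n = {n:2d}: {p**n:>14,} grid points")
```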

The above uniform discretization scheme has inspired the development of parsimonious approaches that select sub-samples of points along each dimension of the state space, and then use analytical functions based on multi-linear interpolations, polynomials, or cubic splines to approximate the Bellman function (Johnson et al. 1993). As these techniques did not prove to be a panacea against the dimensionality issue, statistical techniques have been employed to sample the state space more efficiently. Perhaps one of the oldest strategies is Latin hypercube sampling, in which each dimension of the state space is discretized into p values, and the overall sample is chosen so that each uni-dimensional value is selected exactly once. This is a special case of an orthogonal array of strength d, where \(d\le n\), n being the dimension of the state space; under this scheme, each uni-dimensional grid point is chosen exactly the same number of times in each possible \(d\)-dimensional subspace (Chen et al. 1999).
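A minimal sketch of Latin hypercube sampling in the unit hypercube, using plain NumPy (the sample size and dimension are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(m, n):
    """m points in [0, 1]^n: along every dimension, each of the m
    equal-width cells is hit exactly once."""
    cells = np.array([rng.permutation(m) for _ in range(n)]).T  # (m, n)
    return (cells + rng.random((m, n))) / m                     # jitter within cells

sample = latin_hypercube(5, 3)   # 5 points in a 3-dimensional state space
```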

Other sampling techniques resort to some form of Monte Carlo simulation to sample the state space, in contrast with the discretization strategies used in the above-mentioned schemes. For instance, in stochastic dual dynamic programming (SDDP), originally developed for reservoir optimization problems in the seminal works of Pereira and Pinto (1985, 1991), the connections between SP and SDP, e.g., Ruszczyński and Shapiro (2003), Shapiro et al. (2009), are exploited to efficiently sample the reservoir level space based on Monte Carlo simulation. Assuming the natural inflows to be temporally independent, SDDP alternates between a backward pass, to build the so-called value/cost-to-go functions, and a forward pass, to draw a sample of state space values at which the value/cost-to-go functions are approximated in the next backward pass, until a convergence criterion is met. In contrast with classical SDP, where the state space is discretized into an evenly spaced grid, SDDP iteratively samples the state space by simulating trajectories of reservoir levels through the forward passes, thus mitigating the inherent curse of dimensionality of SDP and concentrating grid points in the actual operating regions of the reservoirs.

On the other hand, quasi-randomized or quasi-Monte Carlo sampling techniques, where randomly generated points are replaced with more evenly distributed ones based on the notion of low-discrepancy sequences, are known to enjoy a faster convergence rate than randomized techniques (Cervellera et al. 2013; Cervellera and Muselli 2007). For a further account of reservoir optimization techniques, see Labadie (2004), Rani and Moreira (2010), Ahmad et al. (2014) and Dobson et al. (2019).
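For illustration, a low-discrepancy sample of reservoir-level vectors could be drawn with SciPy's quasi-Monte Carlo module (available from SciPy 1.7 on); the bounds below are hypothetical:

```python
import numpy as np
from scipy.stats import qmc

s_lo = np.array([0.0, 0.0, 10.0])       # hypothetical lower reservoir bounds
s_hi = np.array([100.0, 80.0, 120.0])   # hypothetical upper reservoir bounds

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
points01 = sampler.random(8)                 # 8 Sobol points in [0, 1]^3
points = qmc.scale(points01, s_lo, s_hi)     # map to the hyperrectangle S
```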

In Zéphyr et al. (2015), we proposed a simplicial approximate SDP approach for the mid-term optimization of reservoirs. While that work laid the foundation of the approach, in Zéphyr et al. (2017) we extended the methodology to the optimization of multireservoir systems with highly correlated natural inflows, for which the dimension of the support of the random variable reduces from n to 1. The simplicial approach was exploited to derive an analytical form for the expected value of the value function under the assumption that the natural inflows follow a truncated normal or a log-normal distribution.

The iterative scheme amounts to partitioning the reservoir level space into a finite but potentially large set of simplices in each period of the planning horizon. The value function is evaluated at the extreme points of the resulting simplices, and interpolated elsewhere. In addition, error bounds are computed for all simplices and, at each iteration, a new grid point associated with the largest error bound is added to the grid, and the simplex containing this point is divided into smaller simplices that are appended to the list of existing simplices. Thus, in each period, constructing the grid requires maintaining a complete list of simplices that spans the whole reservoir level space. Because the number of simplices increases rapidly with the grid size and with the dimension of the state space, this method becomes impractical for models with many reservoirs.

This work essentially revisits the sampling approach presented in Zéphyr et al. (2015): in each period, we avoid maintaining a list of simplices and instead randomly sample the reservoir level space to select grid points at which the value function is approximated. We resort to linear programming to identify the simplex containing a candidate grid point and to obtain a local error bound on the approximation of the Bellman function. The global error bound is then estimated using a statistical model. This revision is motivated by the computational burden of the simplicial scheme, induced by the exponential growth of the number of created simplices, which limits applications to dimensions lower than ten, based on our empirical observations.

The remainder of the paper is organized as follows. We provide a detailed description of the problem under analysis in Sect. 2. Next, we discuss a simplicial approximate stochastic dynamic programming (ASDP) scheme for the problem in Sect. 3, followed by a hybrid Monte Carlo simplicial ASDP proposal in Sect. 4. Results of extensive numerical experiments are reported in Sect. 5. The paper ends with concluding remarks in Sect. 6.

2 Reservoir optimization problem

A hydropower system often comprises power plants that may or may not be associated with reservoirs. Reservoir optimization problems are typically divided into long-, mid-, and short-term, depending on, among other factors, the length of the planning horizon (Raso and Malaterre 2017). In a mid-term problem, which is of interest to us, the time span is typically between one and five years (van Ackooij et al. 2014), divided into daily, weekly, or monthly time steps (Zéphyr et al. 2017).

In this work, we consider a mid-term reservoir optimization problem over a finite horizon of T periods. At each period t, the operator of the system wants to find the release, \({\varvec{u}}_t\), and storage, \({\varvec{s}}_t\), decisions that maximize the expected total energy production. Without loss of generality, we assume each plant to be associated with a reservoir; the random natural inflows to the reservoirs are denoted \(\tilde{{\varvec{q}}}_t\).

At each period t, water released from each reservoir \(i=1,\ldots ,n,\) is limited by the turbine capacity, \(\varvec{\overline{u}}\), to prevent physical damage. Similarly, due to legal and environmental considerations, at each time period, the level of the reservoirs must be kept between lower and upper limits, \(\varvec{\underline{s}}\), and \(\varvec{\overline{s}}\), respectively.

In addition, we assume the topology of the system to form an arborescence, i.e., a combination of reservoirs in series and in parallel. Water released upstream is absorbed by the immediate successor reservoirs in the same period and, in case of overflow, the excess water from upstream reservoirs, \({\varvec{y}}_t\), is absorbed by the immediate successors or spilled out of the system.

At each period t, the state of the system is governed by the standard mass balance equation:

$$\begin{aligned} {\varvec{s}}_{t+1}={\varvec{s}}_{t}-{\varvec{B}}{\varvec{u}}_t-{\varvec{C}} {\varvec{y}}_t+ \tilde{\varvec{q}}_t, \end{aligned}$$
(1)

where the entries \(B_{ij}\) of the square connectivity matrix \({\varvec{B}}\) are 1 for \(i=j\), \(-1\) if the water released from reservoir j is routed to reservoir i, and 0 otherwise. The elements of the square matrix \({\varvec{C}}\) similarly define the routing of the spilled water.
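As an illustration, for a hypothetical three-reservoir cascade 1 → 2 → 3 in which spills follow the same routing as releases, the matrices could be assembled as follows:

```python
import numpy as np

n = 3
B = np.eye(n)
for upstream, downstream in [(0, 1), (1, 2)]:   # releases from 1 feed 2, 2 feeds 3
    B[downstream, upstream] = -1.0
C = B.copy()                    # spilled water follows the same routing here

u = np.array([5.0, 3.0, 4.0])   # releases
print(B @ u)                    # net outflow per reservoir: [ 5. -2.  1.]
```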

As in Zéphyr et al. (2015), for each plant \(i=1,\ldots ,n\), we assume the production function \(p_{it}\) to depend nonlinearly on the release and on the storage at the beginning of the period.

A typical multi-period mid-term reservoir optimization problem reads:

$$\begin{aligned}&\max _{{{\varvec{u}}_t,\varvec{y_t}} }\mathbb {E}_{ \tilde{{\varvec{q}}}_t}\left[ \sum _{t=1}^T\sum _{i=1}^np_{it}(u_{it})+V_{T+1}(\varvec{s}_{T+1})\right] \end{aligned}$$
(2)
$$\begin{aligned}&\text {s.t., for }\, t=1,\ldots ,T: \end{aligned}$$
(3)
$$\begin{aligned}&\qquad {{\varvec{s}}_{t+1}={\varvec{s}}_{t}-{\varvec{B}}{\varvec{u}}_t-{\varvec{C}}{\varvec{y}}_t+ \tilde{ {\varvec{q}}}_t} \end{aligned}$$
(4)
$$\begin{aligned}&\qquad \varvec{\underline{s}}\le {{\varvec{s}}_{t+1}}\le \varvec{\overline{s}} \end{aligned}$$
(5)
$$\begin{aligned}&\qquad 0\le {\varvec{u}}_t \le \overline{{\varvec{u}}} \end{aligned}$$
(6)
$$\begin{aligned}&\qquad {\varvec{y}}_t \ge {\varvec{0}}, \end{aligned}$$
(7)

where \(\mathbb {E}\) is the expectation operator, and \(V_{T+1}(\varvec{s}_{T+1})\), assumed to be a concave function, captures the terminal value of the water stored in the system.

At each time period t, the operator of the system observes the level of the reservoirs and the realization \({\varvec{q}}_t\) of the random natural inflows, \(\tilde{{\varvec{q}}}_t\), and decides on the water to be released, spilled and stored so as to find the best trade-off between utilizing the available water for current production needs and leaving it for the future. Under this setting, and by Bellman’s principle of optimality, Problem (2)-(7) can be reformulated as a sequence of coordinated subproblems, moving backward in time, i.e., for \(t=T,T-1,\ldots , 1,\)

$$\begin{aligned} V_t\left( {\varvec{s}}_t,{\varvec{q}}_t\right) :=&\max _{{{\varvec{u}}_t,\varvec{y_t}}} \left\{ \sum _{i=1}^np_{it}(u_{it})+\mathcal {V}_{t+1}\left( \varvec{s}_{t+1},\tilde{{\varvec{q}}}_{t+1}\right) \right\} \end{aligned}$$
(8)
$$\begin{aligned}&\text {s.t. }\,(4){-}(7), \end{aligned}$$
(9)

where \(V_t(\cdot )\), called the value function, measures the value of the stored water from period t onward, and \(\mathcal {V}_{t+1}(\cdot ):=\mathbb {E}_{\tilde{{\varvec{q}}}_{t+1}|{\varvec{q}}_t} V_{t+1}\left( {\varvec{s}}_{t+1},\tilde{{\varvec{q}}}_{t+1}\right)\). As in Zéphyr et al. (2017), since the terminal value function is concave, we observe that if the production functions are concave, the problem is convex and the concavity of the value function \(V_t({\varvec{s}}_t,\cdot )\) propagates backwards.
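A minimal skeleton of this backward recursion is sketched below; solve_stage_problem is a hypothetical stub standing in for the stage problem (8)–(9), and the nearest-grid-point interpolation is a crude placeholder for the simplicial interpolation developed in Sect. 3.

```python
import numpy as np

def solve_stage_problem(s, q, next_value_fn):
    # Placeholder stub: zero immediate reward, all water carried over.
    return next_value_fn(s + q)

def backward_recursion(T, grid, inflow_samples, terminal_value):
    """grid: list of state tuples; inflow_samples[t]: list of inflow vectors."""
    value_fn = terminal_value                    # V_{T+1}
    for t in range(T, 0, -1):                    # t = T, T-1, ..., 1
        values = {s: np.mean([solve_stage_problem(np.array(s), q, value_fn)
                              for q in inflow_samples[t]])
                  for s in grid}                 # sample-mean expectation
        def value_fn(s, v=values):               # nearest-grid interpolation
            key = min(v, key=lambda g: np.linalg.norm(np.array(g) - s))
            return v[key]
    return value_fn                              # approximation of V_1
```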

Proposition 1

If (i) \(p_{it}(u_{it})\) is concave in \(u_{it}\), and (ii) the support of \(\tilde{{\varvec{q}}}_{t}\) is discrete and finite, then \({V}_{t}({\varvec{s}}_{t},\cdot )\) is concave in \({\varvec{s}}_{t}\).

Proof

The feasible domain of Problem (8)–(9) is a polyhedron; since \(V_{T+1}({\varvec{s}}_{T+1},\cdot )\) is concave in \(\varvec{s}_{T+1}\), it follows from the concavity of the production functions and the linearity of the expectation operator that \(V_{T}({\varvec{s}}_{T},\cdot )\) is concave in \({\varvec{s}}_T\). The concavity property then follows by backward induction on t, for \(t=T-1,\ldots ,1\). \(\square\)

Problem (8)–(9) may be nonlinear, in particular due to the nonlinearity of the production functions. Indeed, in practice, production functions are often nonconcave (i) due to head effects, i.e., the difference between upstream and downstream reservoir levels; and (ii) because the power produced by a plant varies nonlinearly with the water release and the number of turbines, whose efficiency may decrease beyond a maximum flow rate (Zéphyr et al. 2017). In industry, this issue is often dealt with by approximating production functions with their concave envelopes (e.g., Goor et al. 2011; Carpentier et al. 2013; Côté and Arsenault 2019; Morillo et al. 2020).

As in Zéphyr et al. (2017), the nonlinearity hurdle is overcome using inner generalized linear programming (GLP) on a support grid to obtain a convex approximation of the problem. For each plant i, assume that the production function is evaluated over a finite grid of reservoir releases \(\mathcal {U}_t:=\{{u}_i^k|k \in K_i\}\), constructed in a preprocessing step, where \(K_i\) is the set of indices associated with the discrete releases \({u}_i^k, \ i=1,\ldots , n\). Similarly, the expected value function \(\mathcal {V}_{t+1}(\cdot )\) is evaluated over a finite set of states \(\mathcal {G}_t:=\{{{\varvec{s}}}_{t+1}^j|j\in J_t\}\), where \(J_t\) is the set of indices associated with the discrete storage vectors \({{\varvec{s}}}_{t+1}^j\), possibly obtained by division of simplices as explained in Sect. 3. The following GLP is a linear approximation of Problem (8)–(9):

$$\begin{aligned} \hat{V}_t({\varvec{s}}_t,{\varvec{q}}_t):=&\max _{{{\varvec{u}}_t,\varvec{y_t}}, \varvec{\lambda }, \varvec{\mu }} \left\{ \sum _{i=1}^n\sum _{k\in K_i}p_{it}({u}_{i}^k)\lambda _i^k+\sum _{j\in J_t}\hat{\mathcal {V}}_{t+1}\left( {\varvec{s}}_{t+1}^j,\cdot \right) \mu ^j\right\} \end{aligned}$$
(10)
$$\begin{aligned} \text {s.t. }\,&(4){-}(7) \end{aligned}$$
(11)
$$\begin{aligned}&u_{it}-\sum _{k\in K_i}\lambda ^k_i u_i^k =0,&i=1,\ldots , n \end{aligned}$$
(12)
$$\begin{aligned}&{\varvec{s}}_{t+1}-\sum _{j\in J_t}\mu ^j{{\varvec{s}}}_{t+1}^j=0 \end{aligned}$$
(13)
$$\begin{aligned}&\sum _{k\in K_i} \lambda _i^k=1,&i=1,\ldots , n \end{aligned}$$
(14)
$$\begin{aligned}&\sum _{j\in J_t} \mu ^j = 1 \end{aligned}$$
(15)
$$\begin{aligned}&{\varvec{\lambda }, \varvec{\mu }\ge 0} \end{aligned}$$
(16)

Note that \(\varvec{\lambda }\) and \(\varvec{\mu }\) are vectors of convex combination coefficients, as expressed in Eqs. (12)–(16). Thus, for each power plant i, in each period, the release is interpolated on the discrete release values; similarly, the next-period storage level is interpolated on the storage grid.

Since the calculation of the expected value is not the focus of this work, we assume the natural inflow process to be finite and serially independent. As a result, in the numerical experiments, in each period, we will use Monte Carlo simulation to generate a finite sample of natural inflows, and the expected value of the approximate value function, \(\hat{\mathcal{V}}_{t+1}(\cdot )\), will be estimated by the sample mean of the \(\hat{V}_{t+1}({\varvec{s}}_t,\varvec{q}_t)\)’s. Similarly, at each time period, for a given state point \({\varvec{s}}_t^k\), let \(\varvec{\pi }_t^{j}\) be a vector of optimal dual prices associated with the mass-balance constraints (4), for a given observation \({\varvec{q}}_t^j, j=1,\ldots , J\). In the sequel, a subgradient vector, \({\varvec{g}}_t^k\), will be taken as the sample mean of the \(\varvec{\pi }_t^{j}\)’s.
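The following sketch illustrates, on a deliberately tiny single-reservoir stage problem with hypothetical parameters, how the dual prices of the mass-balance row can be averaged into a subgradient; the sign of the marginal is flipped because linprog minimizes.

```python
import numpy as np
from scipy.optimize import linprog

# Variables x = (u, s_next, y): maximize p*u subject to the mass balance
# u + s_next + y = s_t + q (release + storage + spill = available water).
p, u_max, s_min, s_max = 2.0, 40.0, 0.0, 100.0
s_t = 20.0
inflow_sample = [5.0, 15.0, 60.0]               # observed inflows q^j

duals = []
for q in inflow_sample:
    res = linprog(c=[-p, 0.0, 0.0],             # linprog minimizes, hence -p
                  A_eq=[[1.0, 1.0, 1.0]], b_eq=[s_t + q],
                  bounds=[(0.0, u_max), (s_min, s_max), (0.0, None)],
                  method="highs")
    duals.append(-res.eqlin.marginals[0])       # dual price of mass balance

g = np.mean(duals)    # sample-mean subgradient with respect to s_t
```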

In closing this section, observe that since (i) Problem (10)–(16) is linear with a maximized objective, and (ii) \({\varvec{s}}_t\) appears in the right-hand side of the water-balance constraint (4), the GLP is a parametric linear program, so that its optimal value function \(\hat{V}_t({\varvec{s}}_t, {\varvec{q}}_t)\) is a piecewise linear, concave function of \({\varvec{s}}_t\).

3 Simplicial approximate stochastic dynamic programming

Despite its theoretical elegance, it is well known that dynamic programming is plagued by the so-called curse of dimensionality, in the sense that the computational burden of Problem (10)–(16) increases exponentially with the dimension, n, of the reservoir level space \(S_t\), except for rare cases (e.g., unconstrained linear systems with quadratic production functions) for which analytical solutions can be derived easily. As a result, the problem cannot be solved for all possible reservoir level vectors; thus, we have to resort to some numerical procedure. To tackle the curse of dimensionality, in each time period t, we need to select a sample of discrete state vectors \(\mathcal {G}_t:=\left\{ {\varvec{s}}_t^j \in S_t, j =1, 2, \ldots , m\right\}\), \(t=T,T-1,\ldots ,1\). As discussed earlier, popular sampling techniques include Monte Carlo simulation (Chen et al. 2020; Morillo et al. 2020; Zéphyr and Anderson 2018; Morillo et al. 2017; De Matos et al. 2015), quasi-Monte Carlo simulation (Cervellera et al. 2013; Alessandri et al. 2010; Mello et al. 2011), Latin hypercube (Feng et al. 2020; Mello et al. 2011), and orthogonal arrays (Feng et al. 2017; Chen 1999).

In our context, the state space is defined by the level of the reservoirs, which is confined within the hyperrectangle \(S_t:=\{{\varvec{s}}_t \in I\!\!R^n \ | \ \underline{{\varvec{s}}}\le {\varvec{s}}_t \le \overline{{\varvec{s}}}\}\), as defined by the box constraint (5). As a result, the state space is continuous and, as mentioned above, the approximate value function (10)–(16) cannot be evaluated for all possible pairs \(({\varvec{s}}_t, {\varvec{q}}_t)\). Therefore, we have to resort to some form of discretization or sampling of the state space \(S_t\).

Under a simplicial approximate stochastic dynamic scheme, the set \(S_t\) is iteratively partitioned into smaller convex subsets, called simplices, and the approximate value function (10)–(16) is evaluated at their vertices, or extreme points.

Simplicial partitioning of convex sets is widespread in the global optimization literature (e.g., Gimbutas and Žilinskas 2018; Žilinskas and Žilinskas 2002; Paulavičius and Žilinskas 2014, 2009; Horst 1976; Tuy 1991; Bomze and Eichfelder 2013), and less popular in the field of dynamic programming (e.g., Zéphyr et al. 2017, 2015; Habets et al. 2006; Yershov and LaValle 2012; Sala and Armesto 2022). Perhaps simplicial partitioning has received so much attention in global optimization because a simplex is an n-dimensional polyhedron with “the minimal number of vertices” at which the function must be evaluated (Paulavičius and Žilinskas 2009).

In our previous work (Zéphyr et al. 2015), which we revisit here, we iteratively sampled the state space based on the curvature of the value function, locally estimated by the difference between upper and lower bounds constructed on each simplex.

We provide a detailed review of simplicial partitioning of hyperrectangles in “Appendix 1”.

3.1 Simplicial piecewise linear approximation of the value function

In any period t, assume that at some iteration of the simplicial algorithm the state space \(S_t\) has been partitioned into simplices, and that the expected value function has been evaluated at the extreme points \({\varvec{s}}_t^k \in S_t\), \(k=1,\ldots ,K\), with values \(f^k:=\hat{\mathcal {V}}_t({\varvec{s}}_t^k,\tilde{{\varvec{q}}}_t)\). (In the sequel, we drop the time index t for ease of notation.) Then, for any point \({\varvec{s}}\in S\), the expected value function can be approximated by the following linear program, which, by the concavity of the approximate value function, yields a lower bound, \(B_L({\varvec{s}})\):

$$\begin{aligned} B_L({\varvec{s}}):= \max \sum _{k=1,\ldots ,K} \lambda _k f^k \,\hbox {s.t.}\,{\varvec{s}} =\sum _{k=1,\ldots ,K} \lambda _k {\varvec{s}}^k, \ \sum _{k=1,\ldots ,K} \lambda _k=1, \hbox { and }\lambda _k\ge 0 \ \forall k. \end{aligned}$$
(17)
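A sketch of this computation with SciPy (hypothetical vertex data on the unit square); the nonzero components of the optimal \(\varvec{\lambda }\) also reveal the simplex containing the point, which is formalized next:

```python
import numpy as np
from scipy.optimize import linprog

def lower_bound(s, vertices, f):
    """LP (17): vertices is (K, n), f holds the function values there.
    Returns B_L(s) and the indices of the nonzero lambda's."""
    K = len(vertices)
    A_eq = np.vstack([vertices.T, np.ones(K)])  # s = sum lam_k s^k, sum lam = 1
    res = linprog(-f, A_eq=A_eq, b_eq=np.append(s, 1.0),
                  bounds=[(0, None)] * K, method="highs")
    return -res.fun, np.flatnonzero(res.x > 1e-9)

V = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
f = np.array([0., 0., 0., 0., 1.])              # concave: peak at the center
print(lower_bound(np.array([0.4, 0.3]), V, f))  # (0.6, indices of 3 vertices)
```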

Let \(\mathcal {B}({\varvec{s}})\) be the set of indices of the nonzero components \(\lambda _k\) in a basic optimal solution of the linear program (17); \(\mathcal {B}({\varvec{s}})\) contains at most \(n+1\) elements so that the point \({\varvec{s}}\) can be expressed as a convex combination of at most \(n+1\) vertices, and the set of all convex combinations of these vertices is a simplex. Also, if vectors of subgradients \({\varvec{g}}^k, k \in \mathcal {B}({\varvec{s}}),\) are known at the grid points \({\varvec{s}}^k\), then the expected value function is bounded above by:

$$\begin{aligned} B_U({\varvec{s}}):= \min _{k\in \mathcal {B}({\varvec{s}})} f^k + {{\varvec{g}}^k}^\top (\varvec{s}-{\varvec{s}}^k). \end{aligned}$$
(18)

Then \(B_L({\varvec{s}})\le f({\varvec{s}})\le B_U({\varvec{s}})\) so that \(B_U(\varvec{s})-B_L({\varvec{s}})\) is an upper bound on the approximation error at the point \({\varvec{s}}\) using the support vertices \({\varvec{s}}^1,\ldots ,{\varvec{s}}^K\). It is also pointed out in Zéphyr et al. (2015) that the largest error bound on the simplex with vertex set \(\mathcal {B}\) is given by the linear program:

$$\begin{aligned} \overline{E}_{\mathcal {B}}:= & {} \max _{{\varvec{s}},\phi ,\lambda _k,k\in \mathcal {B}} \phi -\sum _{k\in \mathcal {B}} \lambda _k f^k \nonumber \\{} & {} \,\hbox {s.t.}\,\quad {\varvec{s}}=\sum _{k\in \mathcal {B}} \lambda _k {\varvec{s}}^k, \ \sum _{k\in \mathcal {B}} \lambda _k=1, \ {\lambda _k\ge 0} \ \hbox { and }\phi \le f^k + {{\varvec{g}}^k}^\top ({\varvec{s}}-{\varvec{s}}^k), \ \forall k\in \mathcal {B}. \end{aligned}$$
(19)
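Substituting \({\varvec{s}}={\varvec{S}}_\mathcal {B}\varvec{\lambda }\) reduces LP (19) to the variables \((\varvec{\lambda },\phi )\); a sketch with a one-dimensional test function:

```python
import numpy as np
from scipy.optimize import linprog

def error_bound(S_B, f, G):
    """LP (19) on one simplex: S_B is the n x (n+1) vertex matrix, f the
    function values and G the (n+1) x n subgradients at the vertices.
    Returns the error bound and the division point s*."""
    K = S_B.shape[1]
    c = np.append(f, -1.0)                           # minimize f^T lam - phi
    A_ub = np.hstack([-G @ S_B, np.ones((K, 1))])    # phi <= f_k + g_k^T(S_B lam - s^k)
    b_ub = f - np.sum(G * S_B.T, axis=1)
    A_eq = np.append(np.ones(K), 0.0).reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * K + [(None, None)], method="highs")
    return -res.fun, S_B @ res.x[:K]

# f(s) = min(s, 1 - s) on the simplex [0, 1], end-point subgradients 1 and -1
print(error_bound(np.array([[0.0, 1.0]]),
                  np.array([0.0, 0.0]),
                  np.array([[1.0], [-1.0]])))        # bound 0.5 at s* = 0.5
```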

If the error bound \(\overline{E}_{\mathcal {B}}\) exceeds a certain criterion, then an optimal point \({\varvec{s}}^*_{\mathcal {B}}\) of (19) is a candidate vertex to be added to the set of vertices as \({\varvec{s}}^{K+1}:={\varvec{s}}^*_{\mathcal {B}}\). Similarly, if there exists some analytical expression for the function \(f(\varvec{s}):=\hat{\mathcal {V}}_t({\varvec{s}}_t, {\varvec{q}}_t)\), the largest actual approximation error on a simplex with vertices in \(\mathcal {B}\) can be found through the nonlinear program:

$$\begin{aligned} E_{\mathcal {B}}:= \max _{{\varvec{s}},\lambda _k,k\in \mathcal {B}} f(\varvec{s})-\sum _{k\in \mathcal {B}} \lambda _k f^k \,\hbox {s.t.}\,{\varvec{s}} =\sum _{k\in \mathcal {B}} \lambda _k {\varvec{s}}^k, \ \sum _{k\in \mathcal {B}} \lambda _k=1 \hbox { and }\ { \varvec{\lambda }\ge {\varvec{0}}}. \end{aligned}$$
(20)

In the approach of Zéphyr et al. (2015), an initial set of vertices is first chosen, for example the \(2^n\) vertices of the hyperrectangle S plus one interior point \({\varvec{s}}^{(2^n+1)}\). Next, an initial set of simplices spanning these vertices is explicitly enumerated. Then the linear program (19) is solved for every simplex in the set, and the next vertex to be added is selected as the optimal solution \({\varvec{s}}^*_{\mathcal {B}}\) for the simplex \(\mathcal {B}\) with the largest error bound \(\overline{E}_{\mathcal {B}}\). Such a point \({\varvec{s}}^*_{\mathcal {B}}\) is called a division point, and the list of simplices is correspondingly updated by deleting the simplex with vertex set \(\mathcal {B}\) from the list and adding the new simplices created by dividing \(\mathcal {B}\). Iterating this way until a termination criterion is satisfied, the method of Zéphyr et al. (2015) stops with a list of, say, K vertices \({\varvec{s}}^1,\ldots ,{\varvec{s}}^K\) at which the approximate value function and its expectation are evaluated, together with a potentially very large list of associated simplices.

The advantage of this scheme is that it provides a monotonic sequence of bounds on the approximation error. However, its Achilles’ heel is the exhaustive examination of the list of created simplices kept in memory in each time period, together with slow convergence. Depending on the size of such a list, this might be very expensive in terms of memory usage; this issue is the focus of the next subsection.

3.2 Complexity and convergence analysis

A detailed complexity analysis of general operations on simplices (not the simplicial approximation itself) is provided in Zéphyr et al. (2017). In particular, if at iteration k of the procedure we have a list of \(r^k\) active simplices, finding the simplex with the worst approximation error requires \({\mathcal {O}}(r^k)\) operations.

Now, assume we want to partition the hyperrectangle S into simplices until a desired error bound, \(\overline{E}_0\), is attained. Our goal is therefore to find a full-dimensional simplex \(\mathcal {B}\subset S\) generated by the columns of a full row rank matrix \({\varvec{S}}_\mathcal {B}\in I\!\!R^{n\times {(n+1)}}\), such that the optimal value of (19) is \(\overline{E}_\mathcal {B}\le \overline{E}_0\). Toward this end, we first decompose the hyperrectangle S into initial simplices, and for each created simplex solve (19) to find the largest error bound as well as the division point \({\varvec{s}}\). Then, the initial simplex with the largest error is divided at the corresponding division point using the radial \(\omega\)-subdivision strategy (see “Appendix 1”). We repeat the same process until the threshold \(\overline{E}_0\) is met.
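A minimal sketch of the radial \(\omega\)-subdivision step (the full strategy is described in “Appendix 1”): every vertex whose barycentric coordinate at the division point is positive yields one subsimplex.

```python
import numpy as np

def omega_subdivision(S_B, lam, tol=1e-9):
    """Divide the simplex with vertex matrix S_B (n x (n+1)) at the point
    omega = S_B @ lam by replacing, in turn, each vertex with positive
    barycentric coordinate lam_k by omega."""
    omega = S_B @ lam
    children = []
    for k in np.flatnonzero(lam > tol):
        child = S_B.copy()
        child[:, k] = omega
        children.append(child)
    return children
```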

Proposition 2

Let Vol(S) be the volume of the hyperrectangle S. The number of simplices required to achieve the error bound \(\overline{E}_0\) is of the order \(\mathcal {O}\left( \frac{\text {Vol}(S)n!}{(n+1)\overline{E}_0^{n/2}}\right)\).

A proof of this proposition is provided in “Appendix 2”.

Furthermore,

Proposition 3

Assume that at each iteration of the simplicial scheme the \(\omega\)-subdivision of a simplex is used. Then the simplicial algorithm converges to the desired error bound \(\overline{E}_0\) in a finite number of steps, which is of exponential order in the dimension n.

Proof

Under the \(\omega\)-subdivision strategy, at each iteration k of the simplicial partitioning scheme, the number of simplices created by subdividing the simplex with the highest error bound, \(N^k\), satisfies \(2\le N^k\le n+1\). In addition, assume K iterations (simplex subdivisions) are performed and N simplices are created in total; then \(2K\le N \le K(n+1)\), i.e., \(K\ge \frac{N}{n+1}\ge \frac{2K}{n+1}\). It follows from (32) that K is of the order \(\mathcal {O}\left( \frac{\text {Vol}(S)n!}{\overline{E}_0^{n/2}}\right)\), which concludes the proof. \(\square\)

Let us numerically illustrate Proposition 3. First, consider hypothetical quadratic expected value functions of the form \(\mathcal {V}(\varvec{s})=-\frac{1}{2}{\varvec{s}}^\top {\varvec{A}} {\varvec{s}} + {\varvec{b}}^\top {\varvec{s}}\), where the matrices \({\varvec{A}}\) (positive semidefinite, so that \(\mathcal {V}\) is concave) and vectors \({\varvec{b}}\) are randomly generated.

Let us consider relative error bounds \(\overline{E}_0^{\prime}\), defined as the ratio of a simplex error bound to the maximal error over the initial simplices. For each state dimension and relative error threshold indicated in the results reported in Fig. 1, five replications of the simplicial decomposition algorithm are performed.

Figure 1 depicts the natural logarithm of the average total number of created simplices (\(\overline{N}\)), grid points (\(\overline{G}\)), iterations (\(\overline{K}\)), which also equals the number of simplices created in addition to the initial ones, and the CPU time (\(\overline{t}\)), for different error thresholds and state space dimensions. These results confirm that the computational burden to achieve a fixed error bound increases more than exponentially with the dimension, n, of the hyperrectangles.

Fig. 1 Graphical illustration of the simplicial approximation complexity for quadratic functions

Let us repeat the same tests on hypothetical Cobb–Douglas expected value functions of the form

$$\begin{aligned} \mathcal {V}({\varvec{s}}) = \prod _{i=1}^n s_i^{\alpha _i} \quad (\alpha _i\ge 0 \hbox { and }\sum _{i=1}^n \alpha _i\le 1 ). \end{aligned}$$
(21)

As for the quadratic functions, for each error threshold and each state space dimension, the simplicial procedure is carried out to construct grid points to approximate the functions, and five replications are performed. The same statistics are calculated as above. Sample results reported in Fig. 2 also confirm that the complexity of the simplicial scheme is exponential in the state space dimension.

Fig. 2 Graphical illustration of the simplicial approximation complexity for concave Cobb–Douglas functions

In closing,

Proposition 4

The convergence rate of the simplicial algorithm is at best linear.

Proof

Since at each iteration the simplex with maximal error bound \(\overline{E}_\mathcal {B}\) is divided, the simplicial algorithm generates a non-increasing sequence \(\{{\overline{E}_\mathcal {B}}_k\}\) such that, by Proposition 3, \(\lim \limits _{k \rightarrow \infty }{\overline{E}_\mathcal {B}}_k=0\). Indeed, at any iteration of the algorithm, assume simplex \(\mathcal {B}\subset S\), generated by the matrix \({\varvec{S}}_\mathcal {B}\) (a matrix whose columns are the extreme points of the simplex), is divided; consider any resulting subsimplex \({\mathcal {B}}^c\) with generating matrix \({{\varvec{S}}_\mathcal {B}}^c\). Matrices \({\varvec{S}}_\mathcal {B}\) and \({{\varvec{S}}_\mathcal {B}}^c\) differ by only one column: the only column of \({{\varvec{S}}_\mathcal {B}}^c\) that is not in \({\varvec{S}}_\mathcal {B}\) is the division point, \(\varvec{s}_\mathcal {B}^*\), of the parent simplex \(\mathcal {B}\), which is a convex combination of the columns of \({{\varvec{S}}_\mathcal {B}}\).

Now, given that the approximate value function (10)–(16) and its expectation are concave, we have \(\sum _{k\in \mathcal {B}} \lambda _k^*\hat{\mathcal V}({\varvec{s}}^k, \cdot ) \le \hat{{\mathcal {V}}}({\varvec{s}}_\mathcal {B}^*, \cdot )\), where \(\varvec{\lambda }^*\) is the optimal \(\varvec{\lambda }\) from Problem (19), and the \({\varvec{s}}^k\)’s are the vertices of the parent simplex \(\mathcal {B}\), i.e., the columns of matrix \({\varvec{S}_\mathcal {B}}\). Thus, we always have \(\sum _{k\in \mathcal {B}} \lambda _k^*\hat{{\mathcal {V}}}({\varvec{s}}^k, \cdot )\le \sum _{j\in \mathcal {B}^c} \lambda _j\hat{{\mathcal {V}}}({\varvec{s}}^j, \cdot )\), with \(0 \le \lambda _j \le 1\), where the \({\varvec{s}}^j\)’s (one of them being the optimal division point \({\varvec{s}}_\mathcal {B}^*\)) are the extreme points of the subsimplex \(\mathcal {B}^c\). Similarly, due to the concavity of the function, \(\hat{{\mathcal {V}}}({\varvec{s}}_\mathcal {B}^*, \cdot )\le \min _{k \in \mathcal {B}}\{f^k+{{\varvec{g}}^k}^\top ({\varvec{s}}_\mathcal {B}^* - {\varvec{s}}^k)\}\) (the extrapolation of the function at \({\varvec{s}}_\mathcal {B}^*\)). It is also clear that \(\min _{j \in \mathcal {B}^c}\{f^j + {{\varvec{g}}^j}^\top ({\varvec{s}}^c - \varvec{s}^j)\} \le \min _{k \in \mathcal {B}}\{f^k+{{\varvec{g}}^k}^\top ( {\varvec{s}}^c -{\varvec{s}}^k)\}\) for any \({\varvec{s}}^c \in \mathcal {B}^c\subset \mathcal {B}.\)

Therefore, due to the concavity of the approximate value function, we always have \(\overline{E}_{\mathcal {B}^c}\le \overline{E}_{\mathcal {B}}\), where \(\overline{E}_{\mathcal {B}^c}\) and \(\overline{E}_{\mathcal {B}}\) are the maximal error bound on the function over subsimplex \(\mathcal {B}^c\) and parent simplex \(\mathcal {B}\), respectively. As a result, the error sequence \(\{{\overline{E}_\mathcal {B}}_k\}\) is non-increasing, and \(\lim \limits _{k \rightarrow \infty }\frac{{\overline{E}_\mathcal {B}}_{k+1}}{{\overline{E}_\mathcal {B}}_k}\le 1\); and the proof is complete. \(\square\)

Figure 3 illustrates the convergence of the simplicial algorithm on the approximation of value functions for four mid-term reservoir problems. We consider a ten-period planning horizon, and the parameters of the problems are generated as described in the numerical experiments section. For each case, we generate five replications. The grid sizes are fixed at \(100n + 2^n\). The evolution of the average relative error (the ratio of the error at each iteration to that of the first iteration) for the first period is depicted in Fig. 3.

As stated in the proof of Proposition 4, we see that the sequence of approximation errors is non-increasing. For the four-dimensional problems, at the last iteration, the initial error is reduced to approximately \(20\%\), and to around \(75\%\) for the six-dimensional problems, suggesting that denser grids are needed to obtain a precision similar to that of the four-dimensional problems.

In general, the approximation error decreases relatively fast over the first few iterations, then slows down dramatically. This is because, as the active simplices (those not yet divided) become smaller, the local curvature of the function does not vary significantly; as a result, the approximation error is relatively steady on the existing simplices.

Fig. 3 Illustration of the convergence of the simplicial algorithm

An apparent disadvantage of the simplicial scheme, especially for state space dimensions greater than or equal to ten, is the extra computational burden associated with a potentially very large list of simplices, as well as the complete, uniform exploration of the whole state space, which may not be required in practical applications where more localized approximations would be adequate.

Therefore, in this paper we explore other ways of constructing grid points at which to evaluate the approximate value function and its expectation in each period, without enumerating an exhaustive list of associated simplices, in the hope of alleviating the inherent exponential complexity of the simplicial approach.

4 Hybrid simplicial approximate dynamic programming

We now examine some randomized approaches for selecting, in each period t, new grid points at which to evaluate the approximate value function (10)–(16), while avoiding the construction of a large list of active simplices. With these approaches, it is not possible to identify a division point with the largest error bound, so the approximation error must be estimated statistically, and other heuristics must be called upon for selecting a new grid point at each iteration. We first describe three such heuristics, and next we discuss the statistical estimation of the approximation error.

4.1 Randomized simplex-based sampling of the reservoir level space

Monte Carlo (MC). Instead of using a regular grid of equally spaced vertices, one simple and very crude approach is to use a sequence of pseudo-random vertices. In each period t, let \(\{{\varvec{v}}^k\}\) be a sequence of n-vectors of independent variates, uniformly distributed in \([{\varvec{0}},{\varvec{1}}]\). Again, we drop the time period index t for ease of notation. Starting with the initial set of \(2^n\) extreme points of the hyperrectangle S, the i-th component of the k-th random vertex is given by \(s^{(2^n+k)}_i = \underline{s}_i+(\overline{s}_i - \underline{s}_i) v^k_i\), for \(i=1,\ldots ,n\).
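In code, one iteration of this scheme is a one-liner (hypothetical bounds):

```python
import numpy as np

rng = np.random.default_rng(0)
s_lo = np.array([0.0, 0.0, 10.0])     # hypothetical reservoir bounds
s_hi = np.array([100.0, 80.0, 120.0])

v = rng.random(3)                     # uniform variates in [0, 1]^n
s_new = s_lo + (s_hi - s_lo) * v      # next random vertex
```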

This naïve random sequence of approximation nodes can be considered neutral with respect to the approximation error in the sense that the choice of the next vertex to enter the support set is not based on an error criterion such as the division point of a simplex with largest error bound in Eq. (19). Therefore one would expect that a numerical comparison of this naïve scheme with the previous method would show a significant difference in accuracy.

MC simplicial. This method combines the idea of using a simplicial approximation with that of using the Monte Carlo scheme, but the way it searches for a simplex to be divided differs from our previous simplicial method, because here the state space is not exhaustively partitioned into simplices, as discussed in “Appendix 3”. In period t, suppose the approximate value function has been evaluated at K points. We then generate a random point \(\hat{{\varvec{s}}}\) uniformly in S as before (\(\hat{s}_i= \underline{s}_i+(\overline{s}_i - \underline{s}_i)v_i\)), solve Eq. (17) to find the vertex set \(\mathcal {B}(\hat{{\varvec{s}}})\) of the simplex containing \(\hat{{\varvec{s}}}\), and solve Eq. (19) to obtain the division point \({\varvec{s}}^*_{\mathcal {B}}\) that has the largest error bound in that simplex. Lastly, we choose that division point as the new vertex \({\varvec{s}}^{K+1}= \varvec{s}^*_{\mathcal {B}}\). This procedure is repeated until the size of the grid reaches a desired target.

Batch MC simplicial. As in the MC simplicial method, in period t, suppose at a given iteration there are K vertices in the grid, with \(K\ge n+1\). Next, we generate a sample of m random points \(\hat{{\varvec{s}}}^1,\ldots ,\hat{{\varvec{s}}}^m\) uniformly in S. For each random point \(\hat{{\varvec{s}}}^j\) in the sample, Eq. (17) is solved to find the vertex set \(\mathcal {B}^j\) of the simplex that contains \(\hat{{\varvec{s}}}^j\), and Eq. (19) to obtain the division point \({\varvec{s}}^{*}_{\mathcal {B}^j}\) that has the largest error bound \(\overline{E}_{\mathcal {B}^j}\) in that simplex. Then the new vertex is chosen as the division point of the simplex with the largest error bound in the sample, so \(\varvec{s}^{K+1}= {\varvec{s}}^*_{\mathcal {B}^{j^*}}\) where \(j^*=\arg \max _{j=1,\ldots ,m} \overline{E}_{\mathcal {B}^j}\). This way, by evaluating a small number m of simplices, we have a good chance of choosing a candidate with a relatively large error bound, but without having to maintain a large list of simplices as in our previous papers.

By keeping one candidate out of m at each iteration, the best we can hope for is that the selected vertices belong to the top (1/m)th among the sampled candidates. But there is a probability \((1-1/m)^m\) that the selected vertex is not in the top (1/m)th, and also some probability that the sample has more than one candidate in the top (1/m)th, so that good candidates are discarded in some iterations. With \(m=3\), these probabilities are 8/27 (selecting a bad vertex because all 3 candidates fall outside the top one third, each with probability 2/3) and 7/27 (discarding a good vertex, since at least 2 candidates fall in the top one third). While this seems better than the MC and the MC simplicial methods, where all vertices are selected (good and bad), we can try to improve the selection process by putting some candidate vertices in a waiting line instead of discarding them right away.

Batch MC simplicial with queue. As in the batch MC simplicial method, but now we keep a list of at most r recently explored simplices, queued from previous iterations instead of being discarded. Initially, the queue is empty. In a typical iteration, m new candidates are sampled as in the batch MC simplicial method and combined into a pool with the (at most) r candidates from the queue. The new vertex is chosen as the division point of the simplex with the largest error bound in the pooled candidate list. The next r candidates with the largest error bounds are held in the queue, and the remaining candidates, with the smallest error bounds, are discarded.

Parameter values for this method would need to be tuned experimentally should it turn out to be a promising avenue. The computational effort is similar to the batch MC simplicial method, but it is hoped that the batch MC simplicial with queue would have a smaller approximation error than the batch MC simplicial.
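One possible realization of the queue mechanics is sketched below; explore_simplex is a random placeholder standing in for the LPs (17) and (19), and the parameter values are illustrative.

```python
import heapq
import numpy as np

rng = np.random.default_rng(0)

def explore_simplex(s_hat):
    # Placeholder: would solve (17) and (19) for the simplex containing
    # s_hat and return (error bound, division point).
    return rng.random(), s_hat

m, r = 3, 5                           # batch size and queue capacity
queue = []                            # max-heap via negated error bounds
for iteration in range(20):
    for _ in range(m):                # sample m new candidates per iteration
        err, point = explore_simplex(rng.random(3))
        heapq.heappush(queue, (-err, point.tolist()))
    neg_err, new_vertex = heapq.heappop(queue)   # largest error bound wins
    queue = heapq.nsmallest(r, queue)            # keep the r next best
    heapq.heapify(queue)
```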

The above methods attempt to replace an exhaustive list of simplices with a shorter list from which a division point with the largest error bound is chosen at each step. It is hoped that the use of a truncated candidate list will be compensated by the large number of points and simplices sampled over a large number of steps. However, in the absence of an exhaustive list of simplices, there is no uniform upper bound on the approximation error. Also, as illustrated in Fig. 4, in contrast to the simplicial scheme, there is no guarantee as to the monotonicity of the sequence of generated approximation errors. Thus, the next section discusses the statistical estimation of the error.

Fig. 4 Illustration of the convergence of the hybrid simplicial methods on the approximation of the first period value function for one of the 4-reservoir literature test problems described in Sect. 5.3

An illustrative comparison between the original and the hybrid simplicial methods is provided in “Appendix 3”.

4.2 Statistical estimation of the approximation error

Under the concavity of the expected value function \(\hat{\mathcal V}_t({\varvec{s}}_t,\cdot )\), the approximation error is the difference between the function and its piecewise linear approximation. At any point \({\varvec{s}}_t\in S_t\), the approximation error is \(\hat{\mathcal V}_t({\varvec{s}}_t,\cdot )-B_L({\varvec{s}}_t)\), where \(B_L({\varvec{s}}_t)\) is the optimal value of the linear program (17). Then Eq. (18) implies the approximation error is bounded by \(B_U({\varvec{s}}_t)-B_L({\varvec{s}}_t)\). At all points in the simplex \(\mathcal {B}\) that contains \({\varvec{s}}_t\), the approximation error is bounded by \(\overline{E}_\mathcal {B}\) of Eq. (19), while the largest error on the simplex is \(E_\mathcal {B}\) of Eq. (20). Here we are interested in the estimation of the largest actual error \(\max _{s_t\in S_t}\{\hat{{\mathcal {V}}}_t({\varvec{s}}_t,\cdot )-B_L(\varvec{s}_t)\}\) or the largest error bound \(\max _\mathcal {B}\overline{E}_\mathcal {B}\). In both cases, we will use a random sample of m points \(\hat{{\varvec{s}}}^1,\ldots ,\hat{{\varvec{s}}}^m\in S\).

Since the function \(\hat{{\mathcal {V}}}_t({\varvec{s}}_t, \cdot )\) is finite and concave everywhere on \(S_t\), by construction, the approximation error is also a well-behaved function; it is equal to zero at the support nodes and varies smoothly on the simplices. Therefore, when sampling the state space uniformly, it might be reasonable to assume that the corresponding distribution of the approximation error is also well-behaved. However, since we do not know the theoretical distribution, we first conduct an empirical investigation. To this end, we generate samples of grid points with the different randomized methods (except the pure Monte Carlo and the simplicial methods) and calculate the true approximation errors at random sample points. Examples of empirical distributions are illustrated in Fig. 5. The true empirical distribution seems to be less scattered than a uniform distribution, and perhaps to lie somewhere between a left triangular distribution with mode at the minimum value and a right triangular distribution with mode at the maximum value.

Fig. 5 Examples of empirical distributions of the approximation errors

Therefore, we propose to use, as statistical models, four simple distributions on (0, b): the right-angled triangular with mode at right \(\text{ TR }(0,b)\), the uniform \(\text{ U }(0,b)\), the right-angled triangular with mode at left \(\text{ TL }(0,b),\) and the symmetrical triangular with mode at the center \(\text{ TC }(0,b)\). For these distributions, the parameter b can be estimated with order statistics.

If a random variable X has a uniform distribution on the interval [0, b], then it is well known, see e.g., Gibbons (1974), that the maximum likelihood estimator (MLE) of the parameter b is the largest observation in the sample. So with sample size m and observed values \(x_1,\ldots ,x_m\), the MLE of b is \(x_{(m)}= \max _{i=1,\ldots ,m} x_i\). Estimators of the limit parameters of a right-angled triangular distribution on the interval [a, b], with the mode at the upper limit b, are given in Kachiashvili and Topchishvili (2016), where it is shown that the MLE of b is also \(x_{(m)}\). However, by arguing as in Lamond and Zéphyr (2021), it is easily seen that \(x_{(m)}\) is not the MLE of b for a right-angled triangular distribution with the mode at the lower limit. The true MLE is provided in “Appendix 4”. By the symmetry of the triangular distribution with mode at the center, a simple unbiased estimator of the upper limit is \(\hat{b}=x_{(1)}+x_{(m)}\), where \(x_{(1)}= \min _{i=1,\ldots ,m} x_i\).

In addition to the point estimates of parameter b, it is useful to obtain confidence intervals. For this, it is convenient to define the standardized random variable Y with distribution on the unit interval [0, 1]. For a random sample of m observations, we define the largest of them by \(y_{(m)}\), with the random variable \(Y_{(m)}\) representing its sampling distribution. Let \(F_m(y)\) be the cumulative distribution function of \(Y_{(m)}\). Then \(p=F_m(y)\) is the cumulative probability and \(y=F_m^{-1}(p)\), the quantile. Formulas for these are given in Table 1 for our first three distributions.

Table 1 Formulas for sampling distributions and quantiles

Formulas for unbiased point estimates, \(\hat{b}\), of parameter b with lower and upper limits of confidence intervals are given in Table 2 as multipliers of \(x_{(m)}\), where

$$\begin{aligned} A_m = \prod _{j=1}^m \frac{j}{j+0.5}, \end{aligned}$$
(22)

from adapting equation (6) of Kachiashvili and Topchishvili (2016).
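For concreteness, a few of these estimators can be computed as below; the sample of errors is hypothetical, the unbiased multiplier \((m+1)/m\) for the uniform case is the classical order-statistic result, and Tables 1 and 2 give the paper's full set of formulas.

```python
import numpy as np

x = np.array([3.1, 7.4, 9.6])            # hypothetical m = 3 observed errors
m = len(x)

b_mle_uniform = x.max()                  # MLE of b for U(0, b)
b_unb_uniform = (m + 1) / m * x.max()    # classical unbiased version for U(0, b)
b_unb_center = x.min() + x.max()         # unbiased for TC(0, b), by symmetry

A_m = np.prod([j / (j + 0.5) for j in range(1, m + 1)])  # multiplier of Eq. (22)
```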

Table 2 Formulas for unbiased point estimate \(\hat{b}\) of b and limits of confidence interval

A numerical example is given in Table 3. For the triangular distribution with mode at left \(\text{ TL }(0,b)\), we see that the unbiased estimate and confidence interval limits based on the order statistic \(x_{(m)}\) are quite large compared to the other two distributions. There might be an interest here in using an MLE estimate instead, which has small bias and smaller variance, as pointed out in Lamond and Zéphyr (2021), thus allowing a smaller sample size for estimating the approximation error, and therefore fewer computations. In the absence of a simple formula for the cumulative distribution of \(x_{(1)}+x_{(m)}\) for the triangular distribution with mode at center \(\text{ TC }(0,b)\), we can use Monte Carlo simulation to obtain approximate confidence intervals for b. For instance, for \(x_{(1)}+x_{(m)}=10\), the point estimate is \(\hat{b}=10\) and the 95% confidence intervals are respectively \(6.1 \le b \le 27\) for \(m=3\), and \(8.1 \le b \le 13\) for \(m=30\).

Table 3 Numerical example for point estimate \(\hat{b}\) of b and confidence interval with \(m=3\) and 30, \(\alpha =0.05\) and \(x_{(m)}=10\)

5 Numerical experiments

Three types of analysis are carried out in the numerical experiments. First, in Sect. 5.1, we appraise the sensitivity of the performance of the two Monte Carlo simplicial methods with respect to their underlying parameters. Second, in Sect. 5.2, the methods are compared on the trade-off between accuracy and computational burden on (i) the approximation of concave functions; and (ii) several simulated reservoir optimization problems. Lastly, in Sect. 5.3, we compare the methods on three reservoir optimization problems available in the literature.

5.1 Sensitivity of solution performance to parameter values: batch MC simplicial and batch MC simplicial with queue methods

Recall that in the batch MC simplicial method, in each period t, at each iteration, a sample of m random points is chosen in the state space, \(S_t\). Intuitively, this approach is approximately m times slower than the MC simplicial scheme, in which one random point is selected at each iteration. One natural question is how to determine an appropriate sample size m. Though we do not have a theoretical answer to this question, we perform numerical experiments to analyze the sensitivity of solution performance on the approximation of Cobb–Douglas type functions (in dimension n = 3, 6, and 9, respectively) with randomly generated parameters, and on the approximation of value functions for reservoir management problems (in dimension n = 3, 4, and 6, respectively).

We approximate the Cobb–Douglas functions on grids of size 100n, then interpolate the values of the functions on other (out-of-sample) grids of size 200n (solving Problem (17)) and calculate the true approximation errors. For the reservoir management problems, we approximate the value functions (in each time period) on grids of size 100n as well, then solve the first period problem for a sample of 200n \(({\varvec{s}}_1, {\varvec{q}}_1)\) state pairs. For each case (Cobb–Douglas function approximations and value function approximations), five replications are performed for values of m ranging from one to ten. The average results are reported in Fig. 6. Note that smaller values are better in the upper portion of the figure, and the opposite holds in the lower portion. The figure displays an “imperfect elbow shape”, and it seems that values of m between three and five suffice to obtain good approximation performance. The computational burden grows linearly with the parameter m; since we strive for a good trade-off between computational burden and accuracy, in the sequel, we fix m at 3.

Fig. 6 Performance of the batch MC simplicial method on different types of problems and by sample size

Similarly, the batch MC simplicial with queue method features two parameters: m (the same as in the previous method), and r, the size of the queue of previously generated random points. We perform the same experiments as above to assess the sensitivity of solution performance with respect to these parameters. We vary the values of m between one and six (based on the above observations), and the values of r between one and eight. Overall, the computational burden is linear in m and does not seem to be influenced by the length of the queue, r (Fig. 7); similarly for the performance of the solution (Fig. 8). In addition, in Fig. 8, in most of the cases, for a fixed value of r, we observe an elbow shape at \(m=3\) (except in the last picture), suggesting that \(m=3\) is a good enough sample size. Extensive numerical experiments have demonstrated that this method exhibits performance similar (both in terms of computational burden and accuracy) to that of the batch MC simplicial scheme; thus, for the sake of brevity, results for this method will not be reported in the sequel.

Fig. 7 Average CPU time in seconds of the batch MC simplicial with queue method for different types of problems and by sample size

Fig. 8 Performance of the batch MC simplicial with queue method on different types of problems and by sample size

5.2 Accuracy versus computational burden

Here, we focus on the trade-off between accuracy and computation time. First, in Sect. 5.2.1, we compare the performance of the methods on the approximation of concave Cobb–Douglas functions of the form (21) for different state dimensions n. Though the primary interest of this work is mid-term reservoir management problems, this first setting is motivated by the fact that (i) the simplex-based approximations exploit the concavity of the functions to be approximated, in contrast to the pure Monte Carlo (MC) scheme; (ii) in the reservoir management context, to handle the nonlinearity of the production functions, we approximate the latter by concave piecewise linear functions (Problem (10)–(16)); and (iii) similarly, the value functions are approximated by concave piecewise linear functions (Problem (10)–(16)). Thus, it is no easy task to isolate the sole effects of the methods, due to the multiple layers of approximation embedded in the dynamic programs.

Next, in Sect. 5.2.2, the schemes are gauged on several simulated reservoir management problems.

5.2.1 Approximation of concave functions

Grid points of size \(2^n + 100n\) are generated with each method; then the out-of-sample interpolation errors (the difference between the true and interpolated values) are calculated on randomly generated samples of size 200n. In addition, under each method and at each iteration, we record the time in seconds to build the grid (ti), the minimum (\(\text {E}_\text {min}\)), the maximum (\(\text {E}_\text {max}\)), the mean (\(\text {E}_\text {av}\)), and the standard deviation of the interpolation error (\(\text {E}_\text {std}\)). We take the simplicial method as our benchmark, and for each method, we calculate relative performance measures as the ratio of the corresponding measure to that of the simplicial method. Furthermore, in addition to the relative computation times, we also report the absolute times (in seconds). The results are depicted in Table 4.

As expected, the pure MC method is the fastest, as no additional optimization problem is solved except for the approximate dynamic Problems (10)–(16). Also notice that, as we conjectured, the batch MC simplicial scheme is about three times slower than its MC simplicial counterpart, since the former generates three sample points per iteration, compared to one for the latter. For three-dimensional functions, the average CPU time of the simplicial method is lower than that of the MC simplicial scheme; for five-dimensional problems, the computation times are comparable. For eight-dimensional problems, the relative average CPU time of the MC simplicial method is only 3% of that of the simplicial benchmark, which becomes practically intractable for ten-dimensional problems.

Accuracy-wise (average interpolation errors), except for the three-dimensional problems, on which it performs better than the pure MC scheme, the simplicial approach features the worst performance. The batch MC simplicial is the top performer in all cases, followed by its MC simplicial counterpart; however, the difference shrinks as the dimension of the functions increases, and the MC simplicial scheme remains about three times faster.

Table 4 Statistics pertaining to interpolation errors of Cobb–Douglas concave functions

Furthermore, we test the scalability of the randomized methods on the approximation of 11- to 15-dimensional Cobb–Douglas concave functions. As above, we use all the methods except the simplicial one (which is intractable for such high-dimensional problems) to generate sample points of size \(2^n + 100n\); interpolation errors are then calculated on samples of size 100n. We again perform five replications with each method and calculate the same performance statistics, which are reported in Table 5. In addition to being tractable in all cases, the hybrid methods still outperform the naïve approach (MC) in terms of the maximum and average interpolation errors; they also feature lower standard deviations of the approximation errors. The batch MC simplicial method still outperforms the MC simplicial one, but at the expense of a higher computation time.

Table 5 Statistics pertaining to interpolation errors of 11- to 15-dimensional Cobb–Douglas concave functions using the hybrid methods

Lastly, instead of a fixed grid size per period, we fix the total CPU time to build the grids for three-, five- and eight-dimensional Cobb–Douglas concave functions. As above, the constructed grids are then used to calculate interpolation errors on samples of size 100n. We report in Table 6 the relative average grid size \(\overline{K}\) and interpolation errors \(\overline{\text {E}}_\text {av}\), calculated over five replications. Being the fastest method, the MC approach generates grids, on average, between 200 and 970 times denser than the simplicial approach. Consistent with the observation from Table 4 that the simplicial method is faster than the hybrid ones on three- and five-dimensional problems, the latter schemes generate less dense grids than the simplicial method (on average). As the hybrid methods are faster on the eight-dimensional problems, they generate denser grids (on average) than the simplicial approach. The overwhelmingly larger grids generated by the MC approach allow for smaller interpolation errors compared to the three other methods. While smaller average interpolation errors are obtained with the simplicial scheme than with the hybrid ones on the three-dimensional problems, the latter outperform the simplicial approach on the five- and eight-dimensional problems.

Table 6 Statistics pertaining to interpolation errors of Cobb–Douglas concave functions for fixed CPU time

5.2.2 Simulated mid-term reservoir optimization problems

As in Zéphyr et al. (2015), for each plant \(i=1,\ldots ,n\), we assume the production function to be of the form

$$\begin{aligned} {p_{it}(u_{it}):=\beta _i\left( \left( u_{it}+ \gamma _i\right) ^{\alpha _i} - \gamma _i^{\alpha _i} \right) , \ \beta _i>0,\ \gamma _i \ge 0, \ 0\le \alpha _i\le 1} \end{aligned}$$
(23)

These production functions are linearized as in (10)–(16). Furthermore, we consider a planning horizon of length \(T=10\) and three reservoir configurations of dimension \(n=4,6,8\), respectively. The problems' parameters, including bounds on the reservoir and water release levels, are borrowed from Zéphyr et al. (2017) and shown in Table 7.
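Since each \(p_{it}\) in (23) is concave in \(u_{it}\) for \(0\le \alpha _i\le 1\), it can be overestimated by supporting (tangent) hyperplanes, whose pointwise minimum is a concave piecewise linear approximation. The sketch below illustrates this idea only; the actual linearization used is the one embedded in (10)–(16).

```python
import numpy as np

def tangent_cuts(beta, gamma, alpha, u_max, n_cuts=5):
    """Tangent-line cuts for p(u) = beta*((u + gamma)**alpha - gamma**alpha).
    Requires gamma > 0 when alpha < 1 (finite slope at u = 0)."""
    cuts = []
    for u0 in np.linspace(0.0, u_max, n_cuts):
        p0 = beta * ((u0 + gamma) ** alpha - gamma ** alpha)
        slope = beta * alpha * (u0 + gamma) ** (alpha - 1.0)   # p'(u0)
        cuts.append((slope, p0 - slope * u0))                  # cut: a*u + b
    return cuts

# the concave piecewise linear overestimate at any u:
cuts = tangent_cuts(beta=1.0, gamma=0.5, alpha=0.8, u_max=10.0)
p_hat = min(a * 2.0 + b for a, b in cuts)   # approximation of p(2.0)
```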

Table 7 Model parameters borrowed from Zéphyr et al. (2017)

For each reservoir configuration, problem instances are randomly generated based on the experimental framework depicted in Table 7. To mitigate boundary effects, the terminal value function, \(\mathcal {V}_{T+1}(s_{T+1})\), is chosen as a concave function of the form (21).

In addition, in each period of the planning horizon, we use each method to generate samples of \(2^n + 200n\) grid points to evaluate the approximate value function (10)–(16). Then, we randomly generate a sample of 1,000n initial reservoir levels and natural inflows. Next, as in Cervellera et al. (2017), the first period approximate problem is solved with each method for each state observation of the sample, and we record the minimum (\(V_{1\text {min}}\)), the maximum (\(V_{1\text {max}}\)), the average (\(V_{1\text {av}}\)), and the standard deviation (\(V_{1\text {std}}\)) of the first period value function evaluations. Five replications are performed for each case; we then calculate the average of each such statistic, as well as the average time (\(\overline{ti}\)) to build the ten value functions. As in the above comparisons, we take the simplicial scheme as the benchmark method. The results (relative measures) are reported in Table 8, along with the average absolute CPU times (in seconds).

Again, unsurprisingly, the pure MC method is the fastest. The average CPU time is roughly the same under the simplicial method and its MC simplicial variant on the four-dimensional problems; the latter scheme features a lower computational burden on the six- and eight-dimensional instances. Both hybrid methods outperform the simplicial scheme on all the other metrics on the six-dimensional problems. The performance of the methods is similar on the eight-dimensional problems, albeit at a lower computational burden for the MC variants. Indeed, the CPU time of the MC method is approximately 2% of that of the simplicial scheme, and 4% and 9% for the MC simplicial and its batch variant, respectively.

Table 8 Statistics pertaining to the first period evaluations of the value functions for three reservoir configurations \((n=4,6,8)\) taking the simplicial method as our baseline

Let us now take the MC approach as our benchmark against the two hybrid schemes. The results depicted in Table 9 show that, on average, the MC approach is between two and three times as fast as the MC simplicial scheme, and between 4.5 and 7.6 times faster than the batch MC simplicial. On the other hand, the two hybrid methods provide slightly better accuracy than the MC scheme.

Table 9 Statistics pertaining to the first period evaluations of the value functions for three reservoir configurations \((n=4,6,8)\) using the MC method as our baseline

In addition, we analyze the sensitivity of the solution accuracy of the different methods to the size of the grids. We repeat the above experiments on four- and six-dimensional reservoir problems, with parameters generated as in Table 7. In each period, for each problem, we construct grids of sizes varying between \(K_1=2^n + 20n\) and \(K_5=2^n + 100n\), in increments of 20n. As before, the first period value functions are solved for 1,000n randomly generated initial reservoir levels and inflows, and the average is then taken. For each grid size \(K_j, j=2,\ldots , 5\), Table 10 depicts the relative average value function \({\frac{\overline{V}_j}{\overline{V}_{j-1}}}\). The results show that the average evaluations of the first period value functions are relatively steady.

Table 10 Variation rate of the average first period value functions with the size of the grid for two reservoir configurations \((n=4,6)\)

We repeat the same experiments as above on four- and six-dimensional simulated reservoir problems, but now, instead of fixing the grid size per time period, we set the total CPU time to build the ten value functions, split evenly between the ten periods. We also solve the first period approximate problem (10)–(16) for 1,000n pairs of reservoir levels and natural inflows. The average relative first period value function \(\overline{V}_{1_\text {av}}\), as well as the average relative first period grid size \(\overline{K}_1\) under each method, are depicted in Table 11. As above, we perform five replications.

Again, as expected, being the fastest, the MC method generates denser grids than the other approaches, ranging from approximately two to five times the size of the grids generated by the simplicial scheme. For the four-dimensional problems, on average, the size of the grids generated by the MC simplicial scheme varies between approximately 94.24% and 102% of that of the grids generated under the simplicial method. However, similarly to our previous observations, the simplicial algorithm is slower on the six-dimensional problems; the MC simplicial approach generates grids, on average, more than twice as dense as those generated under its simplicial counterpart. In terms of accuracy, the three other methods slightly outperform the simplicial one. The MC and MC simplicial schemes exhibit similar accuracy levels, with a slight edge for the MC simplicial on the six-dimensional problems.

Table 11 Statistics pertaining to the first period evaluations of the value functions for two reservoir configurations \((n=4,6)\) using the simplicial method as our baseline and with fixed CPU time to build the value functions

5.3 Performance comparisons on three literature reservoir optimization problems

Our last comparison setting consists of three reservoir optimization problems from the literature: two four-dimensional problems and one ten-dimensional problem. The planning horizons are one year, divided into monthly time steps. These problems were designed to assess the effectiveness of reservoir optimization solution methods; for details about their characteristics, see Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016). The main difference between the two four-dimensional problems is that in one of them (hereafter Problem 1), the release decisions are less constrained and the upper bounds on the reservoirs are stationary (do not vary with time), in contrast with the second one (Problem 2).

In all three problems, the first period reservoir level \(({\varvec{s}}_1)\) is fixed, and similarly for the terminal one \(({\varvec{s}}_{13})\). Though these constraints are easily handled in a multi-period model, this is not the case in dynamic programming-like methods, since in period \(t=12\) the algorithms may pick a reservoir level that violates the terminal constraints on the reservoir levels; similarly, in any period t, the bounds may be violated. We mitigate this issue by introducing linearized penalty functions in the objective functions, and calibrate the penalty coefficients by trial and error until we obtain solutions that meet all the constraints (solving the value functions forward in time, as explained below).
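As an illustration (with hypothetical notation; the exact formulation is not reproduced here), the terminal-level requirement \({\varvec{s}}_{13}={\varvec{s}}^{\text {target}}_{13}\) can be softened with nonnegative surplus and shortfall variables penalized linearly in the last period objective:

$$\begin{aligned} \max \ p_{12}(u_{12}) - M\, \mathbf {1}^\top \left( \varvec{\delta }^+ + \varvec{\delta }^-\right) \quad \text {s.t.} \quad {\varvec{s}}_{13} + \varvec{\delta }^- - \varvec{\delta }^+ = {\varvec{s}}^{\text {target}}_{13}, \quad \varvec{\delta }^+, \varvec{\delta }^- \ge 0, \end{aligned}$$

where \(M>0\) is the penalty coefficient calibrated by trial and error.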

We build the value functions moving backward in time. Then, starting from the initial reservoir level, we solve the value functions forward in time, using the previous period's suboptimal reservoir level as the initial value. In each time period, we record the suboptimal current period objective value (the current period suboptimal production, in our context); the suboptimal value of the problem is then the sum of these suboptimal objective values.
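In pseudocode form, the forward evaluation reads as follows (a sketch; solve_stage is a hypothetical helper solving the one-period approximate problem (10)–(16) with the value function built in the backward pass):

```python
def forward_evaluation(s1, inflows, solve_stage, T=12):
    """Roll the suboptimal policy forward from the fixed initial level s1.
    solve_stage(t, s, q) -> (production, s_next) is a hypothetical helper
    wrapping the one-period approximate problem (10)-(16)."""
    s, total = s1, 0.0
    for t in range(1, T + 1):
        production, s = solve_stage(t, s, inflows[t - 1])
        total += production      # accumulate stagewise suboptimal objectives
    return total                 # suboptimal value of the problem
```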

Under each method, we first use different grid sizes to build the value functions, as illustrated in Tables 12, 13, 14, 15, 16 and 17. Under the simplicial method, each problem is solved once (one backward and one forward step), as both the problems and the simplicial method are deterministic. Under the hybrid methods, we perform five replications and report the averages (solution times and suboptimal values).

Tables 12, 14, and 16 report the optimality gaps (the difference between the known optimal values and the suboptimal ones obtained with the methods) for each grid size and each method. No results are reported for the simplicial method on the largest (ten-dimensional) problem, which proved intractable for this method (we stopped the algorithm after several hours spent in the last period recursion).

The optimality gaps decrease as the grid size increases, regardless of the method. Overall, the batch MC simplicial scheme consistently exhibits the lowest optimality gaps, followed by the MC simplicial method, though the latter is outperformed by the simplicial approach on the two four-dimensional problems for the two largest grid sizes. The pure MC method consistently features the highest optimality gaps. The associated CPU times (in seconds) are reported in Tables 13, 15, and 17, respectively.

Table 12 Optimality gap of the first four-reservoir problem (Problem 1) described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979) across the tested methods for different grid sizes
Table 13 CPU time in seconds to approximate the value functions for the first four-reservoir problem (Problem 1) reported in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979) for different grid sizes across the tested methods
Table 14 Optimality gap of the second four-reservoir problem (Problem 2) described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) across the tested methods for different grid sizes
Table 15 CPU time in seconds to approximate the value functions for the second four-reservoir problem (Problem 2) reported in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) for different grid sizes across the tested methods
Table 16 Optimality gap of the ten-reservoir problem described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) across the tested methods for different grid sizes
Table 17 CPU time in seconds to approximate the value functions for the ten-reservoir problem reported in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) for different grid sizes across the tested methods

Second, as for the simulated problems, we also compare the performance of the methods by fixing a total computation time to build the 12 value functions, split evenly between the 12 periods. In addition to the optimality gap, \(\epsilon\), we also calculate the average grid size per period, \(\overline{K}_t\). For each problem, we consider three total CPU times. The results are reported in Tables 18, 19 and 20. Since the MC approach is the fastest, it generates the highest average number of grid points per period. A detailed analysis of the results (not reported here) shows that in the backward pass, the MC method alternates between very large and relatively small numbers of grid points. This is because when a very dense grid is generated in a period t, the computational burden of the approximate problem (10)–(16) increases in period \(t-1\): the grid generated in period t is used both to approximate the value function in the previous period and to interpolate the next period reservoir levels. Thus, a small grid is generated in period \(t-1\). While this behaviour, combined with the uniform discretization of the reservoir level space (which takes no account of the curvature of the value function), seems to put the MC scheme at a disadvantage in terms of optimality gap compared to the other three methods on the two four-reservoir problems, this is not the case on the ten-dimensional one, on which it outperforms the other methods. The simplicial method systematically outperforms the two MC variants on the second four-reservoir problem, but is intractable for the largest problem, due to the exponential complexity of the initial partitioning of the hypercube into simplices.

Table 18 Optimality gap of the first four-reservoir problem (Problem 1) described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979) across the tested methods for different total CPU time budgets
Table 19 Optimality gap of the second four-reservoir problem (Problem 2) described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) across the tested methods for different total CPU time budgets
Table 20 Optimality gap of the ten-reservoir problem described in Chow and Cortes-Rivera (1974), Murray and Yakowitz (1979), Moravej and Hosseini-Moghari (2016) across the tested methods for different total CPU time budgets

We close the numerical experiments section with the following remarks on the scalability of the hybrid methods to problems with more than ten reservoirs, the largest problem size solved in this work.

On the one hand, in addition to function evaluations, the complexity of the hybrid methods depends on the sizes of the linear programs (17) and (19). The former features K decision variables and \(n + 1\) constraints, where K is the size of the grid at the current iteration; the latter features at most \(2n + 2\) decision variables and at most \(2n + 2\) constraints. At each iteration, the MC simplicial scheme solves each of the two programs once, as a single grid point is added to the grid. Under the batch MC variants, each program is called m times per iteration, m being the number of randomly generated candidate points. Also notice that the size of Program (17) varies with K, which is not the case for Program (19). Additional numerical experiments, not reported in this paper, suggest that the computational burden of the hybrid methods is proportional to the grid sizes. Thus, we believe the proposed methods are scalable to more than ten reservoirs.

On the other hand, the value functions are typically built off-line. Then, in each time period, once the pair \(({\varvec{s}}_t, {\varvec{q}}_t)\) is observed, the approximate problem (10)–(16) is solved online using the previously built value function to make operational decisions. Solving this problem is relatively fast. The value functions are updated off-line as more data (natural inflows) become available.

6 Conclusions

This work has revisited the simplicial approximate stochastic dynamic programming scheme presented in Zéphyr et al. (2015) for the suboptimal mid-term operation of multi-period, multi-reservoir systems. This iterative method relies on the exhaustive examination of a list of created simplices, whose vertices define grid points at which the value functions are evaluated in each period. The scheme is limited by the computational burden of partitioning a hypercube into simplices.

We have proposed two hybrid methods that combine random sampling strategies with the approach of Zéphyr et al. (2015) to locally estimate the approximation error. Simulation results on randomly generated problems, and on three mid-term reservoir management test problems from the literature, showed that the hybrid methods offer a good trade-off between solution time and accuracy compared to the simplicial method, in particular when the state space dimension is greater than nine. The approximation of functions of dimension up to 15 within reasonable computation time illustrates the potential scalability of the proposed randomized methods, which might be further leveraged through parallelization.