1 Introduction, motivation and context

In energy management, it is common that strategic investment decisions (storage capacities, production units) are made at a slow time-scale, whereas operational decisions (storage management, production) are made at a fast time-scale. The total number of decision stages may be huge, which leads to numerically intractable optimization problems: for instance, a two-time-scale stochastic optimization problem where fast controlled stochastic dynamics (e.g. a change every 15 or 30 min) affect a controlled long-term stochastic behavior (e.g. a change every day or every week) over several years. How can we nevertheless provide numerical solutions (policies) to such problems?
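To fix orders of magnitude, the following sketch counts the decision stages involved; the figures (30-minute operational decisions, daily strategic decisions, a 20-year horizon) are illustrative assumptions of ours, not taken from the paper:

```python
# Back-of-the-envelope count of decision stages in a two-time-scale problem.
# All figures below are illustrative assumptions.
fast_steps_per_day = 48        # one operational decision every 30 min
days_per_year = 365
years = 20

slow_steps = days_per_year * years            # strategic (daily) stages
fast_steps = slow_steps * fast_steps_per_day  # operational stages

print(slow_steps, fast_steps)  # 7300 350400
```

Hundreds of thousands of stages rule out any method whose complexity grows quickly with the number of time steps, which motivates the time-block decompositions developed below.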

1.1 Literature review

Stochastic Dynamic Programming (SDP), based on the Bellman equation (Bellman 1957), is a standard method to solve a multistage stochastic optimization problem by time decomposition. This method suffers from the so-called curses of dimensionality introduced in Bellman (1957), Bertsekas (2017), Powell (2007). In particular, the complexity of the most classical implementation of SDP (which discretizes the state space) is exponential in the number of state variables.

A major contribution to handle a large number of state variables is the well-known Stochastic Dual Dynamic Programming (SDDP) algorithm Pereira and Pinto (1991). This method is adapted to problems with linear dynamics and convex costs. Other similar methods have been developed such as Mixed Integer Dynamic Approximation Scheme (MIDAS) Philpott et al. (2016) or Stochastic Dual Dynamic Integer Programming (SDDiP) Zou et al. (2018) for nonconvex problems, in particular those displaying binary variables. The performance of these algorithms is sensitive to the number of time steps (Leclère et al. 2020; Philpott et al. 2016).

Other classical stochastic optimization methods are even more sensitive to the number of time stages. It is well known that solving a multistage stochastic optimization problem on a scenario tree has a complexity that is exponential in the number of time steps.

Problems displaying a large number of time stages, in particular problems with multiple time-scales, require the design of specific methods. A class of stochastic optimization problems dealing with two time-scales has been introduced in Kaut et al. (2014) and further formalized in Maggioni et al. (2019). It is called Multi-Horizon Stochastic Optimization Problems, and it frames problems where uncertainty is modeled using multi-horizon scenario trees, as rigorously studied in Werner et al. (2013). Several authors have studied stochastic optimization problems with interdependent strategic/operational decisions or intrastage/interstage problems (Abgottspon and Andersson 2016; Abgottspon 2015; Abgottspon and Andersson 2014; Skar et al. 2016; Pritchard et al. 2005), but most of the time the methods developed to tackle the difficulties are problem-dependent. In Kaut et al. (2014), the authors present different particular cases where the two time-scales (called operational and strategic decision problems) can be easily decomposed. In Maggioni et al. (2019), a formal definition of a Multi-Horizon Stochastic Program is given and methods to compute bounds are developed: a formal Multi-Horizon Stochastic Program is a stochastic optimization problem with linear cost and dynamics where uncertainties are modeled as multi-time-scale scenario trees.

1.2 Paper contributions and organization

In this paper, we propose a framework to formally define stochastic optimization problems naturally displaying two time-scales, that is, a slow time-scale (like days) and a fast time-scale (like half hours). The ultimate goal is to design tractable algorithms for such problems with hundreds of thousands of time steps, without requiring a stationarity/infinite-horizon assumption (contrary to Haessig et al. (2015)) and in a stochastic setting [which extends Heymann and Martinon (2018)].

The paper is organized as follows. In Sect. 2, we outline the setting of a generic two-time-scale multistage stochastic optimization problem. In Sect. 3, we show how to write Bellman equations at the slow time-scale (the resulting Dynamic Programming equation is referred to as the Bellman equation by time blocks, and is detailed in Carpentier et al. (2023, Sect. 5)). If we suppose slow time-scale stagewise independence of the noise process, the corresponding Bellman functions provide both the optimal cost and optimal policies. If not, we are nevertheless able to derive feasible policies from the Bellman functions, which is our main objective. Then, under a monotonicity-inducing assumption, we obtain a more tractable version of the Bellman equation by relaxing the problem dynamics without changing the slow time-scale Bellman functions. In Sect. 4, we devise two decomposition methods. The first one, akin to so-called price decomposition, gives a lower bound of the slow time-scale Bellman functions, whereas the second one, based on resource decomposition, gives an upper bound. This upper bound is relevant, that is, not almost surely equal to \(+\infty\), for monotone multistage stochastic optimization problems. In Sect. 5, we indicate how to obtain policies, and we discuss optimality. In Sect. 6, we present an application of the above method to a battery management problem incorporating a very large number of time steps. The appendices of the paper are available as supplementary material on the journal web site. In the supplementary material, we discuss how to take advantage of periodicity properties at the slow time-scale in Appendix A, we give some insights on the numerical complexity of the decomposition methods in Appendix B, and we prove that the monotonicity-inducing assumption holds for the battery problem in Appendix C.

2 Two-time-scale stochastic optimization problems

We present a formal definition of a two-time-scale stochastic optimization problem, that is, with a slow time-scale and a fast time-scale.

2.1 Notations for two time-scales

Given two natural numbers \(r \le s\), we use either the notation \(\llbracket r,s \rrbracket\) or the notation \(r{:}s\) for the set \(\{r,r+1,\dots ,s-1,s\}\).

To properly handle two time-scales, we adopt the following notations. For a given constant time interval \(\Delta t >0\), let \(M \in {\mathbb N}^*\) be such that \((M+1)\) is the number of fast time steps within a slow time step, e.g. for \(\Delta t = 30\) min, \(M+1 = 48\) when the slow time step corresponds to a day. A decision-maker has to make decisions on two time-scales over a given number of slow time steps \((D+1)\in {\mathbb N}^*\):

  1. one type of (say, operational) decision every fast time step \(m \in \llbracket 0,M \rrbracket\) of every slow time step \(d \in \llbracket 0,D \rrbracket\),

  2. another type of (say, strategic) decision every slow time step \(d \in \llbracket 0,D \rrbracket \cup \{D{+}1\}.\)

In our model the time flows between two slow time steps d and \(d+1\) as follows:

$$\begin{aligned} d,0 \quad \xrightarrow [\;\;\; \Delta t \;\;\;]{} \quad d,1 \quad \xrightarrow [\;\;\; \Delta t \;\;\;]{} \quad \dots \quad \xrightarrow [\;\;\; \Delta t \;\;\;]{} \quad d,M \quad \xrightarrow [\;\;\; \Delta t \;\;\;]{} \quad d+1,0 \end{aligned}$$

A variable z will have two time indexes, \(z_{d,m}\), if it changes at every fast time step m of every slow time step d. An index \((d,m)\) belongs to the set

$$\begin{aligned} {\mathbb T}= \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket \cup \{ \big ( D{+}1,0\big ) \} \; , \end{aligned}$$
(1)

which is a totally ordered set when equipped with the following lexicographical order \(\preceq\):

$$\begin{aligned} (d,m) \preceq (d',m') \iff \big ( d < d'\big ) \text { or } \big ( d= d' \text { and } m \le m'\big ) \; . \end{aligned}$$
(2)
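As a quick illustration (a sketch with a deliberately tiny horizon of our choosing), the index set \({\mathbb T}\) of (1) and the lexicographic order (2) can be materialized as follows; note that Python's built-in tuple comparison implements precisely this order:

```python
from itertools import product

D, M = 2, 3  # tiny horizon for illustration; the paper's D and M are much larger

# The index set T of (1): all pairs (d, m) plus the final index (D+1, 0).
T = sorted(product(range(D + 1), range(M + 1))) + [(D + 1, 0)]

def precedes(a, b):
    """Lexicographic order (2): (d, m) precedes (d', m')."""
    (d, m), (dp, mp) = a, b
    return d < dp or (d == dp and m <= mp)

# T is totally ordered by this relation, which coincides with Python's
# native tuple comparison.
assert all(precedes(T[i], T[i + 1]) for i in range(len(T) - 1))
assert precedes((0, M), (1, 0))   # last fast step of a day precedes the next day
assert precedes((1, 2), (1, 2))   # reflexivity
```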

We also use the following notations for describing sequences of variables and sequences of spaces. For \((d,m)\) and \((d,m') \in {\mathbb T}\), with \(m \le m'\):

  • the notation \(z_{d,m:m'}\) refers to the sequence of variables \(\big ( z_{d,m},z_{d,m+1},\dots , z_{d,m'-1},z_{d,m'}\big ),\)

  • the notation \({\mathbb Z}_{d,m:m'}\) refers to the Cartesian product \(\prod _{k=m}^{m'} {\mathbb Z}_{d,k}\) of spaces \(\left\{ {\mathbb Z}_{d,k}\right\} _{k\in \llbracket m,m' \rrbracket }\).

2.2 Two-time-scale multistage stochastic optimization setting

We consider a probability space \((\Omega ,\mathcal{F},\mathbb {P})\). Random variables are denoted using bold letters, and we denote by \(\sigma \big ( \mathbf {{Z}}\big )\) the \(\sigma\)-algebra generated by the random variable \(\mathbf {{Z}}\).

We consider an exogenous noise process \(\mathbf {{W}}=\left\{ \mathbf {{W}}_{d} \right\} _{d\in \llbracket 0,D \rrbracket }\) at the slow time-scale, as detailed below. For any \(d\in \llbracket 0,D \rrbracket\), the random variable \(\mathbf {{W}}_{d}\) consists of a sequence of random variables \(\left\{ \mathbf {{W}}_{d,m} \right\} _{m\in \llbracket 0,M \rrbracket }\) at the fast time-scale:

$$\begin{aligned} \mathbf {{W}}_d = (\mathbf {{W}}_{d,0}, \dots , \mathbf {{W}}_{d,m}, \dots ,\mathbf {{W}}_{d,M}) \; .\end{aligned}$$
(3)

Each random variable \(\mathbf {{W}}_{d,m}:\Omega \rightarrow {\mathbb W}_{d,m}\) takes values in a Borel space (Footnote 1) \({\mathbb W}_{d,m}\) (“uncertainty” space), so that \(\mathbf {{W}}_{d}:\Omega \rightarrow {\mathbb W}_{d}\) takes values in the product space \({\mathbb W}_{d} = {\mathbb W}_{d,0:M}\). For any \((d,m)\in {\mathbb T}\), we denote by \(\mathcal {F}_{d,m}\) the \(\sigma\)-field generated by all noises up to time \((d,m)\), that is,

$$\begin{aligned} \mathcal {F}_{d,m} = \sigma \big ( \mathbf {{W}}_0,\dots ,\mathbf {{W}}_{d-1},\mathbf {{W}}_{d,0},\dots ,\mathbf {{W}}_{d,m}\big ) \; .\end{aligned}$$
(4a)

We also introduce the filtration \(\mathcal{F}_{\llbracket 0,D \rrbracket }\) at the slow time-scale:

$$\begin{aligned} \mathcal {F}_{\llbracket 0,D \rrbracket }= \{\mathcal {F}_{d,M}\}_{d\in \llbracket 0,D \rrbracket } = \big ( \mathcal {F}_{0,M}, \ldots , \mathcal {F}_{D,M} \big ) \; .\end{aligned}$$
(4b)

In the same vein, we introduce a decision process \(\mathbf {{U}}=\left\{ \mathbf {{U}}_{d} \right\} _{d\in \llbracket 0,D \rrbracket }\) at the slow time-scale, where each \(\mathbf {{U}}_{d}\) consists of a sequence \(\left\{ \mathbf {{U}}_{d,m} \right\} _{m\in \llbracket 0,M \rrbracket }\) of decision variables at the fast time-scale:

$$\begin{aligned} \mathbf {{U}}_d = (\mathbf {{U}}_{d,0}, \ldots , \mathbf {{U}}_{d,m}, \ldots ,\mathbf {{U}}_{d,M}) \; .\end{aligned}$$
(5)

Each random variable \(\mathbf {{U}}_{d,m}:\Omega \rightarrow {\mathbb U}_{d,m}\) takes values in a Borel space \({\mathbb U}_{d,m}\) (“control” space), and we denote by \({\mathbb U}_{d}\) the Cartesian product \({\mathbb U}_{d,0:M}\). We finally introduce a state process \(\mathbf {{X}}=\left\{ \mathbf {{X}}_{d} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) at the slow time-scale, where each random variable \(\mathbf {{X}}_{d}:\Omega \rightarrow {\mathbb X}_{d}\) takes values in a Borel space \({\mathbb X}_{d}\) (“state” space). Note that, unlike processes \(\mathbf {{W}}\) and \(\mathbf {{U}}\), the state process \(\mathbf {{X}}\) is defined only at the slow time-scale. Thus, for any \(d\in \llbracket 0,D{+}1 \rrbracket\), the random variable \(\mathbf {{X}}_{d}\) represents the system state at time (d, 0).

We also consider Borel spaces \({\mathbb Y}_{d}\) such that, for each \(d\in \llbracket 0,D{+}1 \rrbracket\), \({\mathbb X}_{d}\) and \({\mathbb Y}_{d}\) are paired spaces when equipped with a bilinear form \(\langle \cdot ,\cdot \rangle\). In this paper, we assume that each state space \({\mathbb X}_{d}\) is the vector space \({\mathbb R}^{n_{d}}\), so that \({\mathbb Y}_{d}={\mathbb R}^{n_{d}}\), the bilinear form \(\langle \cdot ,\cdot \rangle\) being the standard scalar product.

For each \(d\in \llbracket 0,D \rrbracket\), we introduce a nonnegative Borel-measurable instantaneous cost function \(L_{d}: {\mathbb X}_{d}\times {\mathbb U}_{d}\times {\mathbb W}_{d} \rightarrow [0,+\infty ]\) and a Borel-measurable dynamics \(f_{d}: {\mathbb X}_{d}\times {\mathbb U}_{d}\times {\mathbb W}_{d} \rightarrow {\mathbb X}_{d+1}\). Note that both the instantaneous cost \(L_{d}\) and the dynamics \(f_{d}\) depend on all the fast time-scale decision and noise variables constituting the slow time step d. We also introduce a nonnegative Borel-measurable final cost function \(K: {\mathbb X}_{D{+}1} \rightarrow [0,+\infty ]\) (Footnote 2).

With all these ingredients, we write a two-time-scale stochastic optimization problem

$$\begin{aligned} V^{\textrm{e}}(x) = \inf _{\mathbf {{X}},\mathbf {{U}}} \;&{\mathbb E}\bigg [\sum _{d=0}^{D} L_d(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + K(\mathbf {{X}}_{D{+}1})\bigg ] \; ,\end{aligned}$$
(6a)
$$\begin{aligned} \text {s.t.} \quad&\mathbf {{X}}_{0} = x \; , \hspace{5.0pt}\mathbf {{X}}_{d+1} = f_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \; , \hspace{5.0pt}\forall d \in \llbracket 0,D \rrbracket \; ,\end{aligned}$$
(6b)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \mathcal {F}_{d,m} \; , \hspace{5.0pt}\forall (d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket \; .\end{aligned}$$
(6c)

The expected cost value in (6) is well defined, as all functions are nonnegative and measurable. Constraint (6c), where \(\sigma (\mathbf {{U}}_{d,m})\) is the \(\sigma\)-field generated by the random variable \(\mathbf {{U}}_{d,m}\), expresses the fact that each decision \(\mathbf {{U}}_{d,m}\) is \(\mathcal {F}_{d,m}\)-measurable, that is, nonanticipative. The function \(V^{\textrm{e}}\) is called the optimal value function of Problem (6).
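To make the information structure of Problem (6) concrete, the snippet below evaluates the expected cost of a nonanticipative policy by Monte Carlo on a toy instance; the scalar state, dynamics, costs and policy are all illustrative choices of ours. The key point is that each decision \(\mathbf{U}_{d,m}\) depends only on the state at the beginning of the slow step and on the noises \(\mathbf{W}_{d,0:m}\) observed so far, as required by (6c):

```python
import random

random.seed(0)
D, M = 3, 4  # tiny horizon for illustration

def policy(x, w_prefix):
    # Nonanticipativity (6c): the decision at (d, m) uses only the state at
    # (d, 0) and the intraday noises W_{d,0:m} observed so far.
    return max(0.0, min(1.0, x + sum(w_prefix) / len(w_prefix)))

def simulate(x0):
    """One scenario: sum of the L_d costs; the final cost K is taken as 0."""
    x, total = x0, 0.0
    for d in range(D + 1):
        w = [random.random() for _ in range(M + 1)]        # W_{d,0:M}
        u = [policy(x, w[: m + 1]) for m in range(M + 1)]  # U_{d,0:M}
        total += sum((ui - wi) ** 2 for ui, wi in zip(u, w))  # toy L_d
        x += sum(ui - wi for ui, wi in zip(u, w))             # toy f_d
    return total

estimate = sum(simulate(0.0) for _ in range(2000)) / 2000  # Monte Carlo mean
```

Any admissible policy evaluated this way yields an upper bound on \(V^{\textrm{e}}(x_0)\), which is how the policies of Sect. 5 can be assessed in practice.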

The notation \(V^{\textrm{e}}(x)\) for the optimal value of Problem (6) emphasizes the fact that the dynamics equations (6b) correspond to equality constraints (as is classical). We also introduce a relaxation of Problem (6). For this purpose, we consider the following multistage stochastic optimization problem:

$$\begin{aligned} V^{\textrm{i}}(x) = \inf _{\mathbf {{X}},\mathbf {{U}}} \;&{\mathbb E}\bigg [\sum _{d=0}^{D} L_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + K(\mathbf {{X}}_{D+1})\bigg ] \; ,\end{aligned}$$
(7a)
$$\begin{aligned} \text {s.t.} \quad&\mathbf {{X}}_{0} = x \; , \hspace{5.0pt}f_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; ,\end{aligned}$$
(7b)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \mathcal {F}_{d,m} \; , \hspace{5.0pt}\forall (d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(7c)
$$\begin{aligned}&\sigma (\mathbf {{X}}_{d+1}) \subset \mathcal {F}_{d,M} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; .\end{aligned}$$
(7d)

We have relaxed the dynamics equality constraints (6b) into inequality constraints (7b). Thus, Problem (7) is less constrained than Problem (6), so that the optimal value function \(V^{\textrm{i}}\) of Problem (7) is bounded above by the optimal value function \(V^{\textrm{e}}\) of Problem (6):

$$\begin{aligned} V^{\textrm{i}}(x) \le V^{\textrm{e}}(x) \; , \hspace{5.0pt}\forall x \in {\mathbb X}_{0} \; .\end{aligned}$$
(8)

Remark 1

We only consider as explicit constraints the dynamics constraints (6b) and the nonanticipativity constraints (6c); other constraints involving the state and the control can be incorporated in the instantaneous cost \(L_{d}\) and in the final cost \(K\) by means of indicator functions (Footnote 3), as \(L_{d}\) and \(K\) can take the value \(+\infty\).

Problem (6) seems very similar to a classical discrete time multistage stochastic optimization problem. But an important difference appears in the nonanticipativity constraints (6c), which express the fact that the components of the decision vector \(\mathbf {{U}}_d = (\mathbf {{U}}_{d,0},\dots ,\mathbf {{U}}_{d,M})\) at a given slow time step d do not all display the same measurability (information grows at every fast time step). This point of view is not covered in the literature and is one of the novelties of our approach.

3 Time block decomposition, Bellman functions and monotonicity assumptions

In Sect. 3.1, we introduce Bellman functions at the slow time scale, as a way to decompose a two-time-scale stochastic optimization problem in time blocks. In Sect. 3.2, we introduce assumptions on the data of Problem (6) which allow us to make the link between the sequence of Bellman functions associated with Problem (6) and the sequence of Bellman functions associated with Problem (7).

3.1 Time block decomposition and Bellman functions at the slow time-scale

Stochastic Dynamic Programming, based on the Bellman optimality principle, is a classical way to decompose multistage stochastic optimization problems into multiple, smaller, static optimization problems. In this paragraph, we apply the Bellman recursion by time blocks to decompose the multistage two-time-scale stochastic optimization Problem (7) into multiple smaller problems, each a stochastic optimization problem over a single slow time step.

We first introduce a sequence \(\left\{ V_{d}^{\textrm{e}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) of slow time-scale Bellman functions associated with Problem (6). These functions are defined by backward induction as follows. At time \(D{+}1\), we set \(V_{D{+}1}^{\textrm{e}} = K\), and then, for \(d\in \llbracket 0,D \rrbracket\) and for all \(x\in {\mathbb X}_{d}\), we set

$$\begin{aligned} V_{d}^{\textrm{e}}(x) = \inf _{\mathbf {{X}}_{d+1},\mathbf {{U}}_{d}} \;&{\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}(\mathbf {{X}}_{d+1})\big ] \; ,\end{aligned}$$
(9a)
$$\begin{aligned} \text {s.t.} \quad&\mathbf {{X}}_{d+1} = f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \; ,\end{aligned}$$
(9b)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma (\mathbf {{W}}_{d,0:m}) \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(9c)

the expectation in (9a) being taken with respect to the marginal probability of the random vector \(\mathbf {{W}}_{d}\). We also introduce a sequence of slow time-scale Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) associated with Problem (7). At time \(D{+}1\), we set \(V_{D{+}1}^{\textrm{i}} = K\), and then, for \(d\in \llbracket 0,D \rrbracket\) and for all \(x\in {\mathbb X}_{d}\), we set

$$\begin{aligned} V_{d}^{\textrm{i}}(x) = \inf _{\mathbf {{X}}_{d+1},\mathbf {{U}}_{d}} \;&{\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \; ,\end{aligned}$$
(10a)
$$\begin{aligned} \text {s.t.} \quad&f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1} \; ,\end{aligned}$$
(10b)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma (\mathbf {{W}}_{d,0:m}) \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(10c)
$$\begin{aligned}&\sigma (\mathbf {{X}}_{d+1}) \subset \sigma (\mathbf {{W}}_{d,0:M}) \; .\end{aligned}$$
(10d)

Problem (10) is less constrained than Problem (9) because the (dynamics) equality constraints (9b) are more restrictive than the inequality constraints (10b), and also because (9b) implies the additional constraint (10d). Since \(V_{D{+}1}^{\textrm{e}}=V_{D{+}1}^{\textrm{i}}=K\), we obtain by backward induction that the Bellman functions (10) associated with Problem (7) are lower bounds of the Bellman functions (9) associated with Problem (6):

$$\begin{aligned} V_{d}^{\textrm{i}} \le V_{d}^{\textrm{e}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$
(11)
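The backward induction (9) can be carried out explicitly on toy instances. The sketch below (all grids, costs and dynamics are illustrative assumptions of ours) discretizes the state and enumerates fast-scale feedback controls: with \(M+1=2\) fast steps, constraint (9c) means that \(U_{d,0}\) is deterministic while \(U_{d,1}\) may be any function of \(W_{d,0}\), which we enumerate as a map:

```python
from itertools import product

D, M = 2, 1
states = [0, 1, 2]      # discretized grid for X_d
controls = [0, 1]       # admissible values of U_{d,m}
noises = [0, 1]         # values of W_{d,m}, i.i.d. uniform

def f(x, u, w):         # toy slow-scale dynamics f_d (clipped to the grid)
    return max(0, min(2, x + sum(u) - sum(w)))

def L(x, u, w):         # toy instantaneous cost L_d (decreasing in x)
    return sum(u) + 0.5 * sum(w) * (2 - x)

V = {D + 1: {x: 0.0 for x in states}}   # final cost K = 0
for d in range(D, -1, -1):
    V[d] = {}
    for x in states:
        best = float("inf")
        # (9c): u0 is deterministic, u1 is a feedback map of W_{d,0}.
        for u0 in controls:
            for u1_map in product(controls, repeat=len(noises)):
                cost = 0.0
                for w in product(noises, repeat=M + 1):  # 4 equiprobable scenarios
                    u = (u0, u1_map[noises.index(w[0])])
                    cost += (L(x, u, w) + V[d + 1][f(x, u, w)]) / 4.0
                best = min(best, cost)
        V[d][x] = best
```

On this toy instance one can check that each \(V_d\) is nonincreasing in the state, in line with the monotonicity properties studied in Sect. 3.2. The exhaustive enumeration above is exactly what becomes intractable at scale, hence the bounding schemes of Sect. 4.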

3.2 Bellman functions under monotonicity-inducing assumption

We introduce assumptions on the data of Problem (6) which allow us to make the link between the sequence of Bellman functions \(\left\{ V_{d}^{\textrm{e}} \right\} _{d \in \llbracket 0,D{+}1 \rrbracket }\) associated with Problem (6) and the sequence of Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d \in \llbracket 0,D{+}1 \rrbracket }\) associated with Problem (7).

We first formulate an assumption that we call the monotonicity-inducing assumption, as it is the key ingredient to obtain both the monotonicity of the Bellman functions \(\left\{ V_{d}^{\textrm{e}} \right\} _{d \in \llbracket 0,D{+}1 \rrbracket }\) and, for \({d \in \llbracket 0,D{+}1 \rrbracket }\), the inequality \(V_{d}^{\textrm{i}}\ge V_{d}^{\textrm{e}}\), the reverse of inequality (11). It is worth noting that this assumption, which may seem an ad hoc trick for proving that inequality (11) is in fact an equality, is satisfied in the case study (developed in Sect. 6) that motivates this paper. The fact that this assumption is satisfied for the case study is shown in Appendix C.

Assumption 1

(Monotonicity-inducing) We assume that the data of Problem (9) satisfies the following properties.

  1. The final cost function \(K\) is nonincreasing on its effective domain:

    $$\begin{aligned} \forall (x,x') \in \big ( {\textrm{dom}} {K}\big )^{2} \; , \hspace{5.0pt}x' \ge x \; \Longrightarrow \;K(x') \le K(x) \; .\end{aligned}$$
    (12a)

  2. For all \(d \in \llbracket 0,D \rrbracket\), the effective domain of the Bellman function \(V^e_d\) is induced by the effective domain of the instantaneous cost function \(L_{d}\), namely

    $$\begin{aligned} {\textrm{dom}} V^e_d = \left\{ x\in {\mathbb X} \;|\; { \exists \,\mathbf {{U}} \text { satisfying (9c) s.t. } {\mathbb E}{\big[L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big]} <+ \infty} \right\}\;.\end{aligned}$$
    (12b)

  3. For all \(d\in \llbracket 0,D \rrbracket\), for any two states \(x' \ge x\) both in \({\textrm{dom}} V_d^{\textrm{e}}\) (where the Bellman function \(V_d^{\textrm{e}}\) is given by (9)) and for any (control) random variable \(\mathbf {{U}}_d\) satisfying (9c) and such that \({\mathbb E} \big [L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big ] <+\infty\), there exists a (control) random variable \(\widetilde{\mathbf {{U}}}_{d}\) satisfying (9c) such that (almost surely)

    $$\begin{aligned}&f_{d}\big ( x',\widetilde{\mathbf {{U}}}_{d},\mathbf {{W}}_{d}\big )\in {\textrm{dom}} V_{d+1}^{\textrm{e}} \text { and } f_{d}\big ( x',\widetilde{\mathbf {{U}}}_{d},\mathbf {{W}}_{d}\big ) \ge f_{d}\big ( x,\mathbf {{U}}_{d},\mathbf {{W}}_{d}\big ) \end{aligned}$$
    (12c)
    $$\begin{aligned}&L_{d}\big ( x',\widetilde{\mathbf {{U}}}_{d},\mathbf {{W}}_{d}\big ) \le L_{d}\big ( x,{\mathbf {{U}}}_{d},\mathbf {{W}}_{d}\big ) \; . \end{aligned}$$
    (12d)

Let us comment on the three items of Assumption 1. The first item, concerning the final cost function \(K\), is rather natural in stock management, as it just expresses the fact that the more stock remains, the more value it has (hence the lower the cost). Moreover, in long-term optimization problems, as is the case for the case study of this paper, the final cost is set to zero, so that the assumption is indeed satisfied.

The second item, concerning the effective domain of the Bellman functions, is akin to the relatively complete recourse assumption (Rockafellar and Wets 1976). It is a usual assumption in multistage stochastic optimization, and it is satisfied in our case study.

The third item is more involved. However, it is rather natural, as it expresses the fact that the higher the state, the higher the next state and the lower the cost. Nevertheless, to achieve a higher future state and a lower cost, one may have to change the control, hence the existence of \(\widetilde{\mathbf {{U}}}_{d}\) in Eqs. (12c) and (12d). Once again, this assumption is satisfied in our case study, and more generally in energy problems where having more stock at a given time is better both for the induced cost and for the stock at the next time step.
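As an elementary illustration of Condition 3, consider a toy battery model (our own assumption, in the spirit of the case study of Sect. 6): when the cost does not depend on the state and the dynamics is nondecreasing in the state, the unchanged control \(\widetilde{\mathbf{U}}_d = \mathbf{U}_d\) already satisfies (12c) and (12d), as the exhaustive grid check below confirms:

```python
# Toy battery model (illustrative): x = stock, u = charge, w = demand.
CAP, PRICE = 5.0, 1.0

def f(x, u, w):
    # Next stock, clipped to [0, CAP]: nondecreasing in x.
    return max(0.0, min(CAP, x + u - w))

def L(x, u, w):
    # Charging cost: independent of the state x.
    return PRICE * u

grid = [0.5 * i for i in range(11)]  # grid over [0, 5]
ok = all(
    f(xp, u, w) >= f(x, u, w) and L(xp, u, w) <= L(x, u, w)
    for u in grid for w in grid
    for x in grid for xp in grid if xp >= x
)
assert ok  # (12c) and (12d) hold here with the unchanged control
```

When the cost does depend on the state, or the dynamics saturates differently, a genuinely different \(\widetilde{\mathbf{U}}_d\) may be needed, which is why Condition 3 only requires its existence.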

Proposition 2

We suppose that monotonicity-inducing Assumption 1 holds true. Then, for all \(d\in \llbracket 0,D{+}1 \rrbracket\), the (original) Bellman function \(V_{d}^{\textrm{e}}\) given by backward induction (9) is nonincreasing on its effective domain, that is,

$$\begin{aligned} \forall d \in \llbracket 0,D{+}1 \rrbracket \; , \hspace{5.0pt}\forall (x,x') \in {\textrm{dom}} {V_{d}^{\textrm{e}}}\times {\textrm{dom}} {V_{d}^{\textrm{e}}} \; , \hspace{5.0pt}x \le x' \; \Longrightarrow \; V_{d}^{\textrm{e}}(x) \ge V_{d}^{\textrm{e}}(x') \; .\end{aligned}$$

Proof

The proof is done by backward induction. At time \(D{+}1\), the Bellman function \(V_{D{+}1}^{\textrm{e}}=K\) is nonincreasing on its effective domain by Condition 1 of Assumption 1. For \(d\in \llbracket 0,D \rrbracket\), assume that \(V_{d+1}^{\textrm{e}}\) is nonincreasing on its effective domain. Let \((x,x') \in {\textrm{dom}} {V_{d}^{\textrm{e}}}\times {\textrm{dom}} {V_{d}^{\textrm{e}}}\) be such that \(x \le x'\). For any \(\epsilon > 0\), let \(\mathbf {{U}}_{d}\) be an \(\epsilon\)-optimal solution of Problem (9) starting at state x. We have that

$$\begin{aligned} V_{d}^{\textrm{e}}(x) + \epsilon&\ge {\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}\big ( f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big )\big ] \; , \hspace{5.0pt}\end{aligned}$$
(13)

which implies \(L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) < + \infty\) and \(f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\in {\textrm{dom}} {V_{d+1}^{\textrm{e}}}\), \(\mathbb {P}\)-a.s. From Condition 3 of Assumption 1, there exists a random variable \(\widetilde{\mathbf {{U}}}_{d}\) satisfying the measurability constraint (9c) and satisfying \(\mathbb {P}\)-a.s. Eqs. (12c) and (12d). Using the induction assumption and Eq. (12c), we obtain that \(V_{d+1}^{\textrm{e}}\big ( f_{d}(x',\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d})\big ) \le V_{d+1}^{\textrm{e}}\big ( f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big )\) almost surely, which, combined with Eq. (13), implies that the random variable \(V_{d+1}^{\textrm{e}}\big ( f_{d}(x',\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d})\big )\) is integrable. Using Eq. (12d) combined with Eq. (13), we also obtain that the random variable \(L_{d}(x',\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d})\) is integrable and smaller than \(L_{d}(x,{\mathbf {{U}}}_d,\mathbf {{W}}_{d})\). We therefore have

$$\begin{aligned} V_{d}^{\textrm{e}}(x) + \epsilon&\ge {\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}\big ( f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big )\big ] \quad \quad \quad \quad (\text {by~(13)}) \\&\ge {\mathbb E}\; \big [ L_{d}(x',\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}\big ( f_{d}(x',\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d})\big )\big ] \quad \quad \quad (\text {as shown above}) \\&\ge V_{d}^{\textrm{e}}\big ( x'\big ) \; .\quad \quad \quad (\text {as }\widetilde{\mathbf {{U}}}_d \text { satisfies~(9c)}) \end{aligned}$$

This ends the proof. \(\square\)

We are now able to formulate the main proposition of this section.

Proposition 3

We suppose that monotonicity-inducing Assumption 1 holds true. Then, for any \(d\in \llbracket 0,D{+}1 \rrbracket\), the (original) Bellman function \(V_{d}^{\textrm{e}}\) in (9) coincides with the (relaxed) Bellman function \(V_{d}^{\textrm{i}}\) in (10):

$$\begin{aligned} V_{d}^{\textrm{i}} = V_{d}^{\textrm{e}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$

Proof

By Eq. (11), we have that \(V_{d}^{\textrm{i}} \le V_{d}^{\textrm{e}}\) for all \(d\in \llbracket 0,D{+}1 \rrbracket\). To obtain the reverse inequality, we proceed by backward induction. At time \(D+1\), the two functions \(V_{D{+}1}^{\textrm{e}}\) and \(V_{D{+}1}^{\textrm{i}}\) are both equal to the function \(K\). Let d be fixed in \(\llbracket 0,D \rrbracket\) and assume that \(V_{d+1}^{\textrm{i}} = V_{d+1}^{\textrm{e}}\). For any \(x \in {\textrm{dom}} {V_{d}^{\textrm{i}}}\) and for any \(\epsilon > 0\), let \((\mathbf {{X}}_{d+1},\mathbf {{U}}_{d})\) be an \(\epsilon\)-optimal solution of Problem (10). We have that \(f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1}\) by Eq. (10b) and

$$\begin{aligned} V_{d}^{\textrm{i}}(x) + \epsilon&\ge \; {\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \; ,\nonumber \\&= \; {\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}(\mathbf {{X}}_{d+1})\big ] \; ,\end{aligned}$$
(14)

by the induction assumption. From (14), we deduce that \({\mathbb E}\big [L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big ]<+\infty\) and that \(\mathbf {{X}}_{d+1}\in {\textrm{dom}} {V_{d+1}^{\textrm{e}}}\), \(\mathbb {P}\)-a.s. Using Condition 2 of Assumption 1, we obtain that \(x \in {\textrm{dom}} V^e_d\), and using Condition 3 of Assumption 1 with \(x'=x\), we obtain that there exists a random variable \(\widetilde{\mathbf {{U}}}_{d}\) satisfying the measurability constraint (9c) as well as Eqs. (12c) and (12d). Using Eq. (12c), we obtain that, \(\mathbb {P}\)-a.s.,

$$\begin{aligned} f_{d}(x,\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d}) \ge f_{d}(x,{\mathbf {{U}}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1} \text { and } f_{d}(x,\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d}) \in {\textrm{dom}} V_{d+1}^{\textrm{e}} \; .\end{aligned}$$
(15)

Now, we obtain successively

$$\begin{aligned} V_{d}^{\textrm{i}}(x) + \epsilon&\ge \; {\mathbb E}\; \big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}(\mathbf {{X}}_{d+1})\big ] \quad \quad \quad (\text {by~(14)}) \\&\ge \; {\mathbb E}\; \big [ L_{d}(x,\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{e}}(f_{d}(x,\widetilde{\mathbf {{U}}}_d,\mathbf {{W}}_{d}))\big ] \quad \quad (\text {by~(12d), (15) and Proposition~2}) \\&\ge V_{d}^{\textrm{e}}\big ( x\big ) \; .\quad \quad \quad (\text {as }\widetilde{\mathbf {{U}}}_d \text { satisfies~(9c)}) \end{aligned}$$

We thus obtain the reverse inequality \(V_{d}^{\textrm{i}}\ge V_{d}^{\textrm{e}}\), hence the result. \(\square\)

The issue is that performing the backward induction (10) requires solving a multistage stochastic optimization problem at the fast time-scale for each \(d \in \llbracket 0,D \rrbracket\) and each \(x \in {\mathbb X}_d\). In the next section, we present two methods to compute bounds of the Bellman functions \(V_{d}^{\textrm{i}}\) at the slow time-scale, which simplify the backward induction.

4 Price/resource decomposition of the dynamics in the Bellman functions

We aim at finding tractable algorithms to numerically solve the backward induction (10) and obtain the corresponding sequence of Bellman functions \(\{V_{d}^{\textrm{i}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\). Indeed, these Bellman functions are not easily obtained. The main issue is that each optimization problem (10) is a multistage stochastic optimization problem at the fast time-scale that has to be solved for every \(d \in \llbracket 0,D \rrbracket\) and every \(x \in {\mathbb X}_d\), and each such numerical solution might be hard to obtain in itself.

To tackle this issue, we propose in Sects. 4.1 and 4.2 to compute respectively lower and upper bounds of the Bellman functions at the slow time-scale. Computing the lower (resp. upper) bounds of the Bellman functions is done by Algorithm 2 (resp. Algorithm 1), based on so-called price (resp. resource) decomposition techniques [see Bertsekas (1999, Chap. 6) and Carpentier and Cohen (2017)] applied to Problem (10). These two algorithms are precisely described in Appendix A. These two bounds on the Bellman functions can then be used to design admissible two-time-scale optimization policies (see Sect. 5).

Both algorithms involve the computation of auxiliary functions that gather the fast time-scale computations, and that are numerically appealing because they make it possible to exploit some potential periodicity of two-time-scale problems, as well as parallel computation. This point is developed in Appendix A.

4.1 Lower bounds of the Bellman functions

We present lower bounds for the Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) given by Eq. (10). These bounds derive from an algorithm which appears to be connected to the one developed in Heymann and Martinon (2018), called “adaptive weights algorithm”. We extend the results of Heymann and Martinon (2018) in a stochastic setting and in a more general framework, as we are not tied to a battery management problem and as we use a more direct way to reach similar conclusions.

To obtain lower bounds of the sequence \(\{V_{d}^{\textrm{i}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions, we dualize the dynamic equations (10b) with Lagrange multipliers, and we use weak duality. The multipliers (called prices here) could be chosen in the class of nonpositive \(\mathcal{F}_{\llbracket 0,D \rrbracket }\)-adapted processes but it is enough, to get lower bounds, to stick to deterministic price processes. Following these lines, we obtain a lower bound as follows.

For each \(d \in \llbracket 0,D \rrbracket\), we define the function \(L_{d}^{\textrm{P}}: {\mathbb X}_{d}\times {\mathbb Y}_{d+1} \rightarrow {\mathbb R}\cup \left\{ \pm \infty \right\}\) byFootnote 4

$$\begin{aligned} L_{d}^{\textrm{P}}(x_{d},p_{d+1}) = \inf _{\mathbf {{U}}_{d}} \;&{\mathbb E}\; \big [L_{d}(x_{d},\mathbf {{U}}_d,\mathbf {{W}}_{d}) + \langle p_{d+1},f_{d}(x_{d},\mathbf {{U}}_d,\mathbf {{W}}_{d})\rangle \big ] \; ,\end{aligned}$$
(16a)
$$\begin{aligned} \text {s.t.} \quad&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma (\mathbf {{W}}_{d,0:m}) \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(16b)

where \(L_{d}\) and \(f_{d}\) are respectively the instantaneous cost function and the dynamics of Problem (6).

Proposition 4

Consider the sequence \(\left\{ \underline{V}_{d}^{\textrm{P}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions which is defined by \(\underline{V}_{D{+}1}^{\textrm{P}} = K\) and for all \(d\in \llbracket 0,D \rrbracket\), and for all \(x\in {\mathbb X}_d\) byFootnote 5

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}}(x) = \sup _{p_{d+1} \le 0} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) - \big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }(p_{d+1}) \bigg ) \; ,\end{aligned}$$
(17)

where \(\big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }: {\mathbb Y}_{d+1} \rightarrow {\mathbb R}\cup \left\{ \pm \infty \right\}\) is the Fenchel conjugate of \(\underline{V}_{d+1}^{\textrm{P}}\) [see Rockafellar (2015)]. Then, the Bellman functions \(\left\{ \underline{V}_{d}^{\textrm{P}} \right\} _{\llbracket 0,D{+}1 \rrbracket }\) given by Eq. (17) are lower bounds of the corresponding Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) given by Eq. (10), that is,

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}} \le V_{d}^{\textrm{i}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$
(18)

Proof

We start the proof by a preliminary interchange result. We consider a subset \(\mathcal {X}\) of the space of random variables taking values in a Borel space \({\mathbb X}\) and a measurable function \(\varphi : {\mathbb X}\rightarrow {\mathbb R}\cup \{\pm \infty \}\). We assume that \(\mathcal {X}\) contains all the constant random variables. We prove that

$$\begin{aligned} \inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] = \inf _{x\in {\mathbb X}} \varphi (x) \; .\end{aligned}$$
(19)
  • The \(\le\) inequality \(\inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] \le \inf _{x\in {\mathbb X}} \varphi (x)\) is clear as \(\mathcal {X}\) contains all the constant random variables.

  • The reverse inequality trivially holds if \(\inf _{x\in {\mathbb X}} \varphi (x) = -\infty\). Assume now that \(\inf _{x\in {\mathbb X}} \varphi (x) = \underline{\varphi } > -\infty\). Then \(\varphi (\mathbf {{X}}) \ge \underline{\varphi } \;\) \(\mathbb {P}\)-a.s. for all \(\mathbf {{X}}\in \mathcal {X}\), and hence \(\inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] \ge \underline{\varphi }\). Consider an arbitrary \(\epsilon >0\) and \(\mathbf {{X}}_{\epsilon }\in \mathcal {X}\) such that \({\mathbb E}\big [\varphi (\mathbf {{X}}_{\epsilon })\big ] \le \inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] + \epsilon\). We successively obtain \(\inf _{x\in {\mathbb X}} \varphi (x) = {\mathbb E}\big [ \inf _{x\in {\mathbb X}} \varphi (x)\big ] \le {\mathbb E}\big [\varphi (\mathbf {{X}}_{\epsilon })\big ] \le \inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] + \epsilon\). As \(\epsilon\) is arbitrary, the reverse inequality \(\inf _{\mathbf {{X}}\in \mathcal {X}} {\mathbb E}\big [\varphi (\mathbf {{X}})\big ] \ge \inf _{x\in {\mathbb X}} \varphi (x)\) follows, hence the equality in (19).

We turn now to the proof of (18), that we do by backward induction. First, we have that \(\underline{V}_{D+1}^{\textrm{P}} = K= V_{D+1}^{\textrm{i}}\). Second, consider \(d\in \llbracket 0,D \rrbracket\) and assume that \(\underline{V}_{d+1}^{\textrm{P}} \le V_{d+1}^{\textrm{i}}\). Explicitly using the Moreau lower addition ⨥ (see Footnote 5), we successively haveFootnote 6

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}}(x)&= \sup _{p_{d+1} \le 0} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) \stackrel{+}{.} \big ( - \big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }(p_{d+1})\big )\bigg ) \quad \quad \quad (\text {by~(17)}) \\&= \sup _{p_{d+1} \le 0} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) \stackrel{+}{.} \inf _{x_{d+1}} \big ( - \langle p_{d+1},x_{d+1}\rangle + \underline{V}_{d+1}^{\textrm{P}}(x_{d+1})\big )\bigg ) \quad (\text {definition of } \big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }) \\&\le \sup _{p_{d+1} \le 0} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) \stackrel{+}{.} \inf _{x_{d+1}} \big ( - \langle p_{d+1},x_{d+1}\rangle + V_{d+1}^{\textrm{i}}(x_{d+1})\big )\bigg ) \!\! \quad (\text {induction assumption}) \\&= \sup _{p_{d+1} \le 0} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) \stackrel{+}{.} \inf _{\mathbf {{X}}_{d+1}} {\mathbb E}\big [ - \langle p_{d+1},\mathbf {{X}}_{d+1}\rangle + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \bigg ) \quad (\text {interchange result~(19)}) \\&\le \sup _{p_{d+1} \le 0} \; \inf _{\mathbf {{U}}_{d}} {\mathbb E}\big [L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) +\langle p_{d+1},f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\rangle \big ] \\&\hspace{3cm} \stackrel{+}{.} \inf _{\mathbf {{X}}_{d+1}} {\mathbb E}\big [ - \langle p_{d+1},\mathbf {{X}}_{d+1}\rangle + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \end{aligned}$$

by substituting (16), and by using subadditivity of the infimum operation with respect to the Moreau lower addition ⨥

$$\begin{aligned}&= \sup _{p_{d+1} \le 0} \; \inf _{\mathbf {{U}}_{d},\mathbf {{X}}_{d+1}} {\mathbb E}\big [L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) +\langle p_{d+1},f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) -\mathbf {{X}}_{d+1} \rangle + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \\&\le \inf _{\mathbf {{U}}_{d},\mathbf {{X}}_{d+1}} \; \sup _{p_{d+1} \le 0} {\mathbb E}\big [L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) +\langle p_{d+1},f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) -\mathbf {{X}}_{d+1} \rangle + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \\&\le \inf _{\mathbf {{X}}_{d+1},\mathbf {{U}}_{d}} {\mathbb E}\big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \quad \text {s.t.} \quad f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1} \quad \quad \quad (\text {by weak duality}) \\&= V_{d}^{\textrm{i}}(x)\; .\end{aligned}$$

This ends the proof. \(\square\)
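On fully discretized grids, the backward induction (17) is a direct computation. The Python sketch below is a toy illustration, not Algorithm 2 itself: the grids, the tabulated values of \(L_{d}^{\textrm{P}}\) stored in `LP` (corresponding to frozen dynamics with zero fast-scale cost) and the final cost `K` are all illustrative assumptions. The Fenchel conjugate is evaluated by enumeration on the state grid, and the supremum in (17) is taken over a grid of nonpositive prices.

```python
import numpy as np

def fenchel_conjugate(values, x_grid, p_grid):
    # (V)^star(p) = sup_x ( <p, x> - V(x) ), computed by enumeration on the grid
    return np.array([np.max(p * x_grid - values) for p in p_grid])

def price_lower_bounds(LP, K, x_grid, p_grid, D):
    """Backward induction (17) on discretized state and price grids.

    LP[d] is an array of shape (len(x_grid), len(p_grid)) tabulating
    L_d^P(x, p); K is the final cost evaluated on x_grid.
    """
    V = [None] * (D + 2)
    V[D + 1] = K.copy()
    for d in range(D, -1, -1):
        conj = fenchel_conjugate(V[d + 1], x_grid, p_grid)
        # V_d^P(x) = sup_{p <= 0} ( L_d^P(x, p) - (V_{d+1}^P)^star(p) )
        V[d] = np.max(LP[d] - conj[None, :], axis=1)
    return V

# Toy instance: no fast-scale cost and frozen dynamics f_d(x, u, w) = x,
# so that L_d^P(x, p) = <p, x>; the final cost is convex and nonincreasing.
x_grid = np.linspace(0.0, 1.0, 11)
p_grid = np.linspace(-2.0, 0.0, 21)          # nonpositive prices only
K = (1.0 - x_grid) ** 2
D = 2
LP = [np.outer(x_grid, p_grid) for _ in range(D + 1)]
V = price_lower_bounds(LP, K, x_grid, p_grid, D)
```

On this convex toy instance the bound is tight (the computed \(\underline{V}_{0}^{\textrm{P}}\) coincides with \(K\) on the grid); in general, Proposition 4 only guarantees a lower bound.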

Remark 5

In the presentation above, we could have defined the sequence \(\left\{ \underline{V}_{d}^{\textrm{P}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) as

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}}(x) = \sup _{p_{d+1}\in {\mathbb Y}_{d+1}} \bigg ( L_{d}^{\textrm{P}}(x,p_{d+1}) - \big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }(p_{d+1}) \bigg ) \; ,\end{aligned}$$

that is, by maximizing over the whole space \({\mathbb Y}_{d+1}\) instead of the set of nonpositive prices. Then, in Proposition 4, we would have obtained the inequalities

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}} \le V_{d}^{\textrm{e}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; ,\end{aligned}$$

the Bellman functions \(V_{d}^{\textrm{e}}\) replacing the \(V_{d}^{\textrm{i}}\). We did not do so, in order to be consistent with the computation of the upper bounds in the next section, in which using the Bellman functions \(V_{d}^{\textrm{i}}\) is mandatory.

4.2 Upper bounds of the Bellman functions

We present upper bounds for the Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) given by Equation (10). They are obtained using a kind of resource decomposition scheme associated with the dynamic equations, that is, by requiring that the state at time \(d+1\) be set at a prescribed deterministic value, so that new constraints have to be added. This is made possible by the fact that we relax the almost sure target equality constraint (6b) into the inequality constraint (7b).

We define the function \(L_{d}^{\textrm{R}}: {\mathbb X}_{d}\times {\mathbb X}_{d+1} \rightarrow [0,+\infty ]\) by

$$\begin{aligned} L_{d}^{\textrm{R}}(x_{d},r_{d+1}) = \inf _{\mathbf {{U}}_{d}} \;&{\mathbb E}\; \big [L_{d}(x_{d},\mathbf {{U}}_d,\mathbf {{W}}_{d})\big ] \; ,\end{aligned}$$
(20a)
$$\begin{aligned} \text {s.t.} \quad&f_{d}(x_{d},\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge r_{d+1} \; ,\end{aligned}$$
(20b)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma (\mathbf {{W}}_{d,0:m}) \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(20c)

where \(L_{d}\) and \(f_{d}\) are respectively the instantaneous cost function and the dynamics of Problem (6). Note that the function \(L_{d}^{\textrm{R}}\) can take the value \(+\infty\) since Constraint (20b) may lead to an empty admissibility set. Having replaced the equality constraint (6b) by the inequality constraint (7b) in Problem (7) makes it possible to have the inequality constraint (20b) in the definition of the function \(L_{d}^{\textrm{R}}\). This last inequality ensures that a random variable is almost surely greater than or equal to a deterministic quantity, a much easier situation than ensuring equality between a random variable and a deterministic quantity.

Proposition 6

Consider the sequence \(\left\{ \overline{V}_{d}^{\textrm{R}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions defined inductively by \(\overline{V}_{D{+}1}^{\textrm{R}} = K\) and for all \(d\in \llbracket 0,D \rrbracket\) and for all \(x\in {\mathbb X}_d\) by

$$\begin{aligned} \overline{V}_{d}^{\textrm{R}}(x) = \inf _{r_{d+1} \in {\mathbb X}_{d+1}} \bigg ( L_{d}^{\textrm{R}}(x,r_{d+1}) + \overline{V}_{d+1}^{\textrm{R}}(r_{d+1})\bigg ) \; .\end{aligned}$$
(21)

Then, the Bellman functions \(\left\{ \overline{V}_{d}^{\textrm{R}} \right\} _{\llbracket 0,D{+}1 \rrbracket }\) given by Eq. (21) are upper bounds of the Bellman functions \(\left\{ V_{d}^{\textrm{i}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) given by Eq. (10), that is,

$$\begin{aligned} V_{d}^{\textrm{i}} \le \overline{V}_{d}^{\textrm{R}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$
(22)

Proof

The proof is done by backward induction. We first have that \(\overline{V}_{D+1}^{\textrm{R}} = K= V_{D+1}^{\textrm{i}}\). Now, consider \(d\in \llbracket 0,D \rrbracket\) and assume that \(V_{d+1}^{\textrm{i}} \le \overline{V}_{d+1}^{\textrm{R}}\). We successively have

$$\begin{aligned} V_{d}^{\textrm{i}}(x)&= \inf _{\mathbf {{X}}_{d+1},\mathbf {{U}}_{d}} {\mathbb E}\big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + V_{d+1}^{\textrm{i}}(\mathbf {{X}}_{d+1})\big ] \\&\hspace{1cm} \text {s.t.} \quad f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge \mathbf {{X}}_{d+1}, \text { and } (10c) \text {--} (10d), \quad \quad \quad (\text {by~(10)}) \\&\le \inf _{{r}_{d+1} \in {\mathbb X}_{d+1}} \inf _{\mathbf {{U}}_{d}} {\mathbb E}\big [ L_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d})\big ] + V_{d+1}^{\textrm{i}}({r}_{d+1}) \\&\hspace{1cm} \text {s.t.} \quad f_{d}(x,\mathbf {{U}}_d,\mathbf {{W}}_{d}) \ge {r}_{d+1} \text { and } (10c) \quad \quad \quad (\text {by considering only constant r.v. } \mathbf {{X}}_{d+1}) \\&= \inf _{{r}_{d+1} \in {\mathbb X}_{d+1}} \big ( L_{d}^{\textrm{R}}(x,r_{d+1}) + V_{d+1}^{\textrm{i}}({r}_{d+1})\big ) \quad \quad \quad (\text {by~(20)}) \\&\le \inf _{{r}_{d+1} \in {\mathbb X}_{d+1}} \big ( L_{d}^{\textrm{R}}(x,r_{d+1}) + \overline{V}_{d+1}^{\textrm{R}}({r}_{d+1})\big ) \quad \quad \quad (\text {induction assumption}) \\&= \overline{V}_{d}^{\textrm{R}}(x)\; .\quad \quad \quad (\text {by~(21)}) \end{aligned}$$

This ends the proof. \(\square\)
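The backward induction (21) is equally direct on a grid: each step is a minimization over a resource grid, with infeasible pairs in (20b) carrying the value \(+\infty\). The Python sketch below is again a toy stand-in for Algorithm 1: the transfer costs tabulated in `LR`, the infeasibility threshold and the final cost are illustrative assumptions.

```python
import numpy as np

def resource_upper_bounds(LR, K, D):
    """Backward induction (21) on a discretized state grid.

    LR[d] is an (n_x, n_x) array with LR[d][i, j] = L_d^R(x_i, x_j),
    where the column index j plays the role of the resource r_{d+1};
    infeasible pairs (empty admissibility set in (20b)) carry np.inf.
    K is the final cost evaluated on the grid.
    """
    V = [None] * (D + 2)
    V[D + 1] = K.copy()
    for d in range(D, -1, -1):
        # V_d^R(x) = min_r ( L_d^R(x, r) + V_{d+1}^R(r) )
        V[d] = np.min(LR[d] + V[d + 1][None, :], axis=1)
    return V

# Toy instance: moving the state from x to r costs (r - x)^+, staying is free,
# and targets more than 0.55 above the current state are declared infeasible.
x_grid = np.linspace(0.0, 1.0, 11)
K = (1.0 - x_grid) ** 2
D = 2
move = np.maximum(x_grid[None, :] - x_grid[:, None], 0.0)
move[x_grid[None, :] - x_grid[:, None] > 0.55] = np.inf
LR = [move for _ in range(D + 1)]
V = resource_upper_bounds(LR, K, D)
```

Since choosing the resource \(r = x\) is always feasible in this toy instance, each \(\overline{V}_{d}^{\textrm{R}}\) stays finite and below the final cost, as expected from an upper-bounding scheme that still allows improvement over doing nothing.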

5 Mixing time block and price/resource decomposition of the dynamics in the Bellman functions

In Sect. 5.1, we show how we can design (not necessarily optimal) policies by means of Bellman functions as obtained in Sect. 3 and Sect. 4. In Sect. 5.2, we discuss optimality.

5.1 Computation of policies

We assume that we have at disposal Bellman functions \(\{\widetilde{V}_d\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) obtained either by resource decomposition (\(\widetilde{V}_d = \overline{V}_{d}^{\textrm{R}}\)), or by price decomposition (\(\widetilde{V}_d = \underline{V}_{d}^{\textrm{P}}\)). The computation of the \(\widetilde{V}_d\)’s, that is, the computation of the \(\overline{V}_{d}^{\textrm{R}}\)’s or \(\underline{V}_{d}^{\textrm{P}}\)’s, constitutes the offline part of the optimization procedure, as described in Algorithms 1 and 2.

Then, for a given slow time step \(d\in \llbracket 0,D \rrbracket\) and a given current state \(x_d \in {\mathbb X}_d\), we can use \(\widetilde{V}_{d+1}\) as an approximation of the Bellman function \(V_{d+1}^{\textrm{e}}\) in order to state a new fast-time-scale problem starting at d for computing the decisions to apply at d. This constitutes the online part of the procedure [in this paper, we do not discuss conditions ensuring that the control below is indeed a random variable, see Bertsekas and Shreve (1996)]:

$$\begin{aligned} \mathbf {{U}}^{*}_{d} \in \mathop {\arg \min }_{\mathbf {{U}}_{d}} \;&{\mathbb E}\big [L_{d}(x_{d},\mathbf {{U}}_{d},\mathbf {{W}}_{d}) + \widetilde{V}_{d+1} \big ( f_{d}(x_{d},\mathbf {{U}}_{d},\mathbf {{W}}_{d})\big )\big ] \; ,\end{aligned}$$
(23a)
$$\begin{aligned} \text {s.t.} \quad&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma (\mathbf {{W}}_{d,0:m}) \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; .\end{aligned}$$
(23b)

This fast-time-scale optimization problem can be solved by any method that provides an online policy, as presented in Bertsekas (2005). The presence of the final cost \(\widetilde{V}_{d+1}\) ensures that the effects of decisions made at the fast time-scale are taken into account at the slow time-scale.

Nevertheless, it would be time-consuming to produce online policies by numerically solving Problem (23) at every slow time step of the horizon in simulation. We present in the next two paragraphs how to obtain two-time-scale policies based on prices or resources in less time.


Obtaining a policy using the Bellman function lower bounds (Algorithm 2) In the case where we decompose the problem using deterministic prices, we possibly solve Problem (16) for every pair \((x_d,p_{d+1}) \in {\mathbb X}_d \times {\mathbb Y}_{d+1}\) of initial state and deterministic price, and for every \(d\in \llbracket 0,D \rrbracket\). This process produces, for each \(d\in \llbracket 0,D \rrbracket\) and for each pair \((x_d,p_{d+1}) \in {\mathbb X}_d \times {\mathbb Y}_{d+1}\), an optimal policy \(\pi _{d}^{\textrm{P}}\big [x_d,p_{d+1}\big ]: {\mathbb W}_d \rightarrow {\mathbb U}_{d}\) and an optimal value \(L_{d}^{\textrm{P}}(x_d,p_{d+1})\).

At the beginning of a slow time step d in a state \(x_d \in {\mathbb X}_d\), we compute a price \(p_{d+1}\) solving the following optimization problem

$$\begin{aligned} p_{d+1} \in \mathop {\arg \max }_{p\le 0} \bigg ( L_{d}^{\textrm{P}}(x_d, p) - \big ( \underline{V}_{d+1}^{\textrm{P}}\big )^{\star }(p)\bigg ) \; ,\end{aligned}$$
(24)

where the sequence of Bellman function lower bounds \(\left\{ \underline{V}_{d}^{\textrm{P}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) is obtained by solving Eq. (17) using Algorithm 2.

Thanks to this deterministic price \(p_{d+1}\), we apply the corresponding policy \(\pi _{d}^{\textrm{P}}\big [x_d, p_{d+1}\big ]\) to simulate decisions and states drawing a scenario \(w_d\) out of the random process \(\mathbf {{W}}_d\). The next state \(x_{d+1}\) at the beginning of the slow time step \(d+1\) is \(x_{d+1}=f_{d} \big ( x_d,\pi _{d}^{\textrm{P}}\big [x_d,p_{d+1}\big ](w_d),w_d\big )\).


Obtaining a policy using the Bellman function upper bounds (Algorithm 1)

In the case where we decompose the problem using deterministic resources, we possibly solve Problem (20) for every pair \((x_d, x_{d+1}) \in {\mathbb X}_d \times {\mathbb X}_{d+1}\) of initial state and deterministic resource, and for every \(d \in \llbracket 0,D \rrbracket\). This leads, for each \(d\in \llbracket 0,D \rrbracket\) and for each pair \((x_d, x_{d+1}) \in {\mathbb X}_d \times {\mathbb X}_{d+1}\), to an optimal policy \(\pi _{d}^{\textrm{R}}\big [x_d, x_{d+1}\big ]: {\mathbb W}_d \rightarrow {\mathbb U}_{d}\) and to an optimal value \(L_{d}^{\textrm{R}}(x_d, x_{d+1})\).

At the beginning of a slow time step d in a state \(x_d \in {\mathbb X}_d\), we compute a resource (state) \(x_{d+1}\) solving the following optimization problem

$$\begin{aligned} x_{d+1} \in \mathop {\arg \min }_{x \in {\mathbb X}_{d+1}} \bigg ( L_{d}^{\textrm{R}}(x_d, x)+\overline{V}_{d+1}^{\textrm{R}}(x)\bigg ) \; ,\end{aligned}$$
(25)

where the sequence of Bellman function upper bounds \(\left\{ \overline{V}_{d}^{\textrm{R}} \right\} _{d\in \llbracket 0,D{+}1 \rrbracket }\) is obtained by solving Eq. (21) using Algorithm 1. We apply the corresponding policy \(\pi _{d}^{\textrm{R}}\big [x_d, x_{d+1}\big ]\) to simulate decisions and states by drawing a scenario \(w_d\) out of \(\mathbf {{W}}_d\). The next state at the beginning of the slow time step \(d+1\) is then \(f_d \big ( x_d,\pi _{d}^{\textrm{R}}\big [x_d, x_{d+1}\big ](w_d),w_d\big )\).
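The online procedure just described can be sketched as a simulation loop. In the toy Python fragment below, every offline object (the tables `LR` and `Vbar`, the fast-scale `policy` and the `dynamics`) is a hypothetical placeholder for the outputs of Algorithm 1; at each slow step the resource is selected by minimizing (25) on the grid, and the associated policy is applied along a sampled scenario.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy offline outputs (placeholders for Algorithm 1): a state grid,
# tabulated L_d^R values, upper bounds, and per-(x, r) fast policies.
x_grid = np.linspace(0.0, 1.0, 6)
D = 3
LR = [np.abs(x_grid[:, None] - x_grid[None, :]) for _ in range(D + 1)]
Vbar = [x_grid ** 2 for _ in range(D + 2)]  # stands in for the output of (21)

def policy(d, i_x, i_r, w):
    # pi_d^R[x, r]: toy fast-scale policy; here the "decision" simply
    # steers the state towards the chosen resource r (w is ignored).
    return x_grid[i_r] - x_grid[i_x] + 0.0 * w

def dynamics(d, x, u, w):
    # f_d: toy slow-scale dynamics, kept inside the grid range.
    return float(np.clip(x + u, x_grid[0], x_grid[-1]))

def nearest(x):
    return int(np.argmin(np.abs(x_grid - x)))

# Online part: at each slow step, solve (25) on the grid, then apply
# the corresponding policy along a sampled fast-scale scenario w.
x = x_grid[-1]
trajectory = [x]
for d in range(D + 1):
    i_x = nearest(x)
    i_r = int(np.argmin(LR[d][i_x, :] + Vbar[d + 1]))   # Eq. (25)
    w = rng.normal()                                    # scenario of W_d
    u = policy(d, i_x, i_r, w)
    x = dynamics(d, x, u, w)
    trajectory.append(x)
```

The same loop serves for the price-based policy, with (24) replacing (25) in the resource-selection step.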

5.2 Discussion on optimality

Without any specific assumption (independence, monotonicity), we have obtained by Propositions 4 and 6 that the price Bellman functions \(\underline{V}_{d}^{\textrm{P}}\) and the resource Bellman functions \(\overline{V}_{d}^{\textrm{R}}\) are respectively lower and upper bounds for the Bellman functions associated with (relaxed) Problem (7):

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}} \le V_{d}^{\textrm{i}} \le \overline{V}_{d}^{\textrm{R}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$
(26)

Then, if monotonicity-inducing Assumption 1 holds true, we have by Proposition 3 that these price Bellman functions and resource Bellman functions are also lower and upper bounds for the Bellman functions associated with (original) Problem (6):

$$\begin{aligned} \underline{V}_{d}^{\textrm{P}} \le V_{d}^{\textrm{e}} \le \overline{V}_{d}^{\textrm{R}} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D{+}1 \rrbracket \; .\end{aligned}$$
(27)

The link between the optimal value functions \(V^{\textrm{e}}\) (resp. \(V^{\textrm{i}}\)) and the Bellman functions \(V_{d}^{\textrm{e}}\) (resp. \(V_{d}^{\textrm{i}}\)) is obtained thanks to a specific independence assumption on the noise process \(\mathbf {{W}}\).

Assumption 2

(White noise assumption) The sequence of random vectors \(\left\{ \mathbf {{W}}_d\right\} _{d\in \llbracket 0,D \rrbracket }\) is white, that is, \(\left\{ \big ( \mathbf {{W}}_{d,0},\dots ,\mathbf {{W}}_{d,m},\dots ,\mathbf {{W}}_{d,M} \big ) \right\} _{d\in \llbracket 0,D \rrbracket }\) is a sequence of \(D{+}1\) independent random vectors.

Then the Bellman’s principle of optimality applies at the slow time-scale for the optimization problem (6), leading to a Stochastic Dynamic Programming equation at the slow time-scale.

Remark 7

We do not assume that each random vector \(\mathbf {{W}}_d = (\mathbf {{W}}_{d,0},\dots ,\mathbf {{W}}_{d,M})\) is itself composed of independent random variables.

Proposition 8

Under the white noise Assumption 2, the optimal value function \(V^{\textrm{e}}\) (resp. \(V^{\textrm{i}}\)), solution of (original) Problem (6) (resp. of (relaxed) Problem (7)), coincides with the Bellman function \(V_0^{\textrm{e}}\) (resp. \(V_0^{\textrm{i}}\)) at time \(d=0\) given by the Bellman Eqs. (9) (resp. (10)). More explicitly, we have that

$$\begin{aligned} V^{\textrm{e}}(x) = V_{0}^{\textrm{e}}(x) \quad \text {and} \quad V^{\textrm{i}}(x) = V_{0}^{\textrm{i}}(x) \; , \hspace{5.0pt}\forall x \in {\mathbb X}_{0} \; .\end{aligned}$$

Proof

The fact that the function \(V^{\textrm{e}}\) is equal to the function \(V_{0}^{\textrm{e}}\) is a consequence of Carpentier et al. (2023, Proposition 4.1) where the machinery for establishing a Dynamic Programming equation in a two-time-scale multistage stochastic optimization setting is developed. To establish the equality between the functions \(V^{\textrm{i}}\) and \(V_{0}^{\textrm{i}}\), we proceed as follows. First, it is easily established that Problem (7) is equivalent to Problem (28) stated below which involves a new decision process \(\mathbf {{\Delta }}=\left\{ \mathbf {{\Delta }}_{d+1} \right\} _{d\in \llbracket 0,D \rrbracket }\), each control variable \(\mathbf {{\Delta }}_{d+1}\) taking values in \({\mathbb X}_{d+1}\):

$$\begin{aligned} V^{\textrm{i}}(x) = \inf _{\mathbf {{X}},\mathbf {{U}},\mathbf {{\Delta }}} \;&{\mathbb E}\bigg [\sum _{d=0}^{D} L_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) + K(\mathbf {{X}}_{D+1})\bigg ] \; ,\end{aligned}$$
(28a)
$$\begin{aligned} \text {s.t} \quad&\mathbf {{X}}_{0} = x \; , \hspace{5.0pt}\mathbf {{X}}_{d+1} = f_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d}) - \mathbf {{\Delta }}_{d+1} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; ,\end{aligned}$$
(28b)
$$\begin{aligned}&\mathbf {{\Delta }}_{d+1} \ge 0 \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; ,\end{aligned}$$
(28c)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \mathcal {F}_{d,m} \; , \hspace{5.0pt}\forall (d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket \; ,\end{aligned}$$
(28d)
$$\begin{aligned}&\sigma (\mathbf {{\Delta }}_{d+1}) \subset \mathcal {F}_{d,M} \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; .\end{aligned}$$
(28e)

Second, Problem (28) involves standard equality constraints in the dynamics, so that the machinery developed in Carpentier et al. (2023, Proposition 4.1) applies to it. We therefore obtain a Dynamic Programming equation associated with Problem (28) involving the new decision process \(\mathbf {{\Delta }}\). This last Dynamic Programming equation reduces to the Bellman Eq. (10) when replacing the extra nonnegative decision variables by inequality constraints. \(\square\)

As an immediate consequence of Propositions 3 and 8, we obtain the following proposition which is the main result of this section.

Proposition 9

Suppose that both monotonicity-inducing Assumption 1 and white noise Assumption 2 hold true. Then, the optimal value function \(V^{\textrm{e}}\) of Problem (6) can be computed by solving Problem (7) at the slow time-scale by the Bellman backward induction (10), that is,

$$\begin{aligned} V^{\textrm{i}}(x) = V_{0}^{\textrm{i}}(x) = V_{0}^{\textrm{e}}(x) = V^{\textrm{e}}(x) \; , \hspace{5.0pt}\forall x\in {\mathbb X}_{0} \; .\end{aligned}$$

Moreover, if both monotonicity-inducing Assumption 1 and white noise Assumption 2 hold true, we have by Proposition 8 that the price Bellman function \(\underline{V}_{0}^{\textrm{P}}\) and the resource Bellman function \(\overline{V}_{0}^{\textrm{R}}\) at time \(d=0\) are respectively lower and upper bounds for the optimal value function of Problem (6):

$$\begin{aligned} \underline{V}_{0}^{\textrm{P}} \le V^{\textrm{e}} \le \overline{V}_{0}^{\textrm{R}} \; .\end{aligned}$$
(29)

Equation (29) provides an interval in which the optimal value of the original Problem (6) lies. But this interval is valid only under the time-block independence Assumption 2. This last assumption is generally not satisfied in practical cases, so that we cannot guarantee the quality of the Bellman functions obtained by the price and resource decomposition algorithms. This being so, the price Bellman functions \(\underline{V}_{d}^{\textrm{P}}\) and the resource Bellman functions \(\overline{V}_{d}^{\textrm{R}}\) always make it possible to compute admissible policies for Problem (6), as explained in Sect. 5.1.

6 Case study

In this section, we apply the previous theoretical results to a long term aging and battery renewal management problem. In Sect. 6.1, we formulate the problem. In Sect. 6.2, we simplify the intraday problems. In Sect. 6.3, we describe the data used for the numerical experiments. Finally, in Sect. 6.4, we sketch how to apply resource and price decomposition algorithms, and we compare the results given by each of these methods.

6.1 Problem formulation

We consider the following energy storage management problem over 20 years. We manage the charge and discharge of a battery at every time step \(m\) of 30 min. A decision of battery replacement is taken every day, so that the number of days considered in the problem is \(20 \times 365 = 7300\). Since the number of time steps during a day is \(24 \times 2 = 48\), the total number of time steps of the problem is \(48 \times 7300 = 350,\!400\). The state of charge of the battery has to remain between prescribed bounds at each time step. We also consider the evolution over time of the amount of remaining exchangeable energy in the battery (related to the number of cycles remaining), that is, the health of the battery. Once this variable reaches zero, the battery is considered unusable. In addition to the battery, the studied system includes a local renewable energy production unit and a local energy consumption: the net demand (consumption minus production) at each time step is an exogenous random variable affecting the system. Finally, we pay for the local system energy consumption, that is, net demand minus energy exchanged with the battery. When this quantity is negative (excess energy production), the energy surplus is assumed to be wasted. The aim of the problem is to minimize the energy bill over the whole time horizon by providing an optimal strategy for the storage charge and the battery renewal.

As we are dealing with the energy storage management problem of a battery over a very long term (20 years) involving two time scales, we adopt the notations defined in Sect. 2.1. The total number of slow time steps (days) in the time horizon is denoted by \(D+1\) (\(D= 20 \times 365 = 7300\)), and each slow time interval \([d,d+1[\) contains \(M+1\) fast time steps (half hour), hence \(M+1 = 24 \times 2 = 48\).

At the fast time-scale, the system control is the energy \(\mathbf {{U}}_{d,m}\) transferred in and out of the battery. We denote the charge of the battery by \(\mathbf {{U}}_{d,m}^+ = \max \{0,\mathbf {{U}}_{d,m}\}\), and the discharge of the battery by \(\mathbf {{U}}_{d,m}^- = \max \{0,-\mathbf {{U}}_{d,m}\}\). For every timeFootnote 7 \((d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M{+}1 \rrbracket\), the state of the battery consists of

  • the amount of energy \(\mathbf {{S}}_{d,m}\) in the battery (state of charge), whose dynamics is given by the simple storage dynamics equation

    $$\begin{aligned} \mathbf {{S}}_{d,m+1} = \mathbf {{S}}_{d,m} + \rho ^{\textrm{c}} \mathbf {{U}}_{d,m}^{+} - \rho ^{\textrm{d}} \mathbf {{U}}_{d,m}^{-} \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
    (30a)

    where \(\rho ^{\textrm{c}}\) and \(\rho ^{\textrm{d}}\) are the charge and discharge coefficients of the battery,

  • the amount of remaining exchangeable energy \(\mathbf {{H}}_{d,m}\) (health of the battery), with

    $$\begin{aligned} \mathbf {{H}}_{d,m+1} = \mathbf {{H}}_{d,m} - \mathbf {{U}}_{d,m}^+ - \mathbf {{U}}_{d,m}^- \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; ,\end{aligned}$$
    (30b)

    so that the battery health decreases with any energy exchange,

  • the capacity \(\mathbf {{C}}_{d,m}\) of the battery (assumed to be constant at the fast time-scale)

    $$\begin{aligned} \mathbf {{C}}_{d,m+1} = \mathbf {{C}}_{d,m} \; , \hspace{5.0pt}\forall m \in \llbracket 0,M \rrbracket \; .\end{aligned}$$
    (30c)

These equations at the fast time-scale are gathered as

$$\begin{aligned} (\mathbf {{S}}_{d,m+1},\mathbf {{H}}_{d,m+1},\mathbf {{C}}_{d,m+1}) = \varphi \big ( \mathbf {{S}}_{d,m},\mathbf {{H}}_{d,m},\mathbf {{C}}_{d,m},\mathbf {{U}}_{d,m}\big ) \; , \hspace{5.0pt}\forall (d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket \; .\end{aligned}$$
(30d)

At the slow time-scale, that is, for each slow time step d, there exists another control \(\mathbf {{B}}_{d}\) modeling the possible renewal of the battery at the end of the slow time step. To take it into account, we add a fictitious time step \((d,M{+}1)\) between (dM) and \((d{+}1,0)\). The dynamics of the battery for this specific time step are

$$\begin{aligned}&\mathbf {{S}}_{d+1,0} = {\left\{ \begin{array}{ll} 0 & \text { if } \mathbf {{B}}_{d} > 0 \; ,\\ \mathbf {{S}}_{d,M+1} & \text { otherwise} \; ,\end{array}\right. } \end{aligned}$$
(31a)

meaning that, when renewed, a new battery is empty,

$$\begin{aligned}&\mathbf {{H}}_{d+1,0} = {\left\{ \begin{array}{ll} \mathfrak {N}(\mathbf {{B}}_{d}) \mathbf {{B}}_{d} & \text { if } \mathbf {{B}}_{d} > 0 \; ,\\ \mathbf {{H}}_{d,M+1} & \text { otherwise} \; ,\end{array}\right. } \end{aligned}$$
(31b)

meaning that, when renewed, the health of a battery is the product of the new battery capacity \(\mathbf {{B}}_{d}\) by an integer-valued function \(\mathfrak {N}: {\mathbb R}_+ \rightarrow {\mathbb N}\) estimated at \(\mathbf {{B}}_{d}\),

$$\begin{aligned}&\mathbf {{C}}_{d+1,0} = {\left\{ \begin{array}{ll} \mathbf {{B}}_{d} & \text { if } \mathbf {{B}}_{d} > 0 \; ,\\ \mathbf {{C}}_{d,M+1} & \text { otherwise} \; ,\end{array}\right. } \end{aligned}$$
(31c)

corresponding to the renewal of the battery. These equations are gathered as:

$$\begin{aligned} (\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) = \psi \big ( \mathbf {{S}}_{d,M+1},\mathbf {{H}}_{d,M+1},\mathbf {{C}}_{d,M+1},\mathbf {{B}}_{d}\big ) \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; .\end{aligned}$$
(31d)

We assume that the initial state of the battery is known: \((\mathbf {{S}}_{0,0},\mathbf {{H}}_{0,0},\mathbf {{C}}_{0,0})= (s_{0},h_{0},c_{0})\).
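The transition maps (30d) and (31d) translate directly into code. In the Python sketch below, the parameter values are illustrative, and the function `cycles` is a hypothetical stand-in for the integer-valued function \(\mathfrak {N}\) of (31b):

```python
def cycles(capacity):
    # Hypothetical stand-in for the integer-valued function N(.) of (31b):
    # number of cycles credited to a new battery of the given capacity.
    return int(10 + 2 * capacity)

def phi(s, h, c, u, rho_c=0.95, rho_d=1.05):
    """One fast-scale step (30): state of charge, health, capacity."""
    u_plus, u_minus = max(0.0, u), max(0.0, -u)
    s_next = s + rho_c * u_plus - rho_d * u_minus      # (30a)
    h_next = h - u_plus - u_minus                      # (30b): any exchange ages the battery
    return s_next, h_next, c                           # (30c): capacity unchanged

def psi(s, h, c, b):
    """Renewal step (31) at the fictitious time (d, M+1)."""
    if b > 0:  # a new battery of capacity b is installed, empty
        return 0.0, cycles(b) * b, b
    return s, h, c  # otherwise the battery is kept as-is

# One day with a single charge/discharge pair, then a renewal decision.
state = (0.0, 100.0, 10.0)
state = phi(*state, u=2.0)    # charge 2 kWh
state = phi(*state, u=-1.0)   # discharge 1 kWh
state = psi(*state, b=0.0)    # keep the battery
```

Note that the health decreases by the full exchanged energy \(\mathbf {{U}}^{+}+\mathbf {{U}}^{-}\), whichever the direction of the exchange.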

All the control variables are subject to bound constraints

$$\begin{aligned}&\mathbf {{U}}_{d,m} \in \big [\underline{U},\overline{U}\big ] \; , \hspace{5.0pt}\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; ,\end{aligned}$$
(32a)

(with \(\underline{U}<0\) and \(\overline{U}>0\)), as well as the state variables:

$$\begin{aligned}&\mathbf {{S}}_{d,m} \in \big [0,\xi \, \mathbf {{C}}_{d,m}\big ] \; , \hspace{5.0pt}\mathbf {{H}}_{d,m} \in \big [0,\mathfrak {N}(\mathbf {{C}}_{d,m}) \, \mathbf {{C}}_{d,m}\big ] \; , \hspace{5.0pt}\mathbf {{C}}_{d,m} \in \big [0,\overline{B}\big ] \; .\end{aligned}$$
(32b)

The amount of remaining exchangeable energy \(\mathbf {{H}}_{d,m}\) has to be nonnegative for the battery to operate, and the upper bound on the state of charge \(\mathbf {{S}}_{d,m}\) is a fraction \(\xi \in [0,1]\) of the capacity \(\mathbf {{C}}_{d,m}\).

At each fast time step \((d,m)\), a local renewable energy production unit produces energy and a local demand consumes energy: we denote by \(\mathbf {{D}}_{d,m}\) the net demand (consumption minus production) and we suppose that it is an exogenous random variable. The excess energy consumption \(\big ( \mathbf {{D}}_{d,m} + \mathbf {{U}}_{d,m}^+ - \mathbf {{U}}_{d,m}^-\big )^{+}\) is paid at a given price \(\pi ^e_{d,m}\), assumed to be deterministic and known, whereas excess energy production is assumed to be wasted. The price \(\mathbf {{P}}^b_d\) of a new battery is supposed to be random, so that the operating cost \(L_{d}\) during the slow time step d is

$$\begin{aligned} \sum _{m=0}^{M} \pi ^e_{d,m} \big ( \mathbf {{D}}_{d,m} + \mathbf {{U}}_{d,m}^+ - \mathbf {{U}}_{d,m}^-\big )^+ + \mathbf {{P}}^b_d \mathbf {{B}}_d \; .\end{aligned}$$
(33)
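As a concrete reading of (33), the daily operating cost can be evaluated as below. This is an illustrative helper under our own naming (not from the paper); note that \(\mathbf {{U}}_{d,m}^+ - \mathbf {{U}}_{d,m}^- = \mathbf {{U}}_{d,m}\), so the positive part is applied to the net demand plus the signed control.

```python
def daily_operating_cost(net_demand, controls, tariffs, battery_price, b):
    """Operating cost (33) of slow time step d (illustrative helper).

    net_demand[m] = D_{d,m}; controls[m] = U_{d,m} (signed: positive means
    charging); tariffs[m] = pi^e_{d,m}; battery_price = P^b_d; b = B_d."""
    energy_cost = sum(
        tariff * max(0.0, demand + max(u, 0.0) - max(-u, 0.0))
        for demand, u, tariff in zip(net_demand, controls, tariffs)
    )
    return energy_cost + battery_price * b
```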

The value of the battery at the end of the optimization horizon is represented by a cost function K depending on the state of the battery. Then, the objective function to be minimized is the expected sum over the time span of the discounted daily costs (discount factor \(\gamma\)), plus the final cost K. We assume that the effective domain of the final cost K is \({\mathbb R}^3\) and that K is a nonincreasing function. In the numerical application, the final cost \(K\) is taken identically equal to 0.

Finally, the optimization problem under consideration is

$$\begin{aligned} \inf _{\left\{ \mathbf {{U}}_{d,0:M},\mathbf {{B}}_{d} \right\} _{d\in \llbracket 0,D \rrbracket }} \;&\mathbb {E}\Bigg [ \sum _{d=0}^D \gamma ^{d} \bigg ( \sum _{m=0}^{M} \pi ^e_{d,m} \big ( \mathbf {{D}}_{d,m} + \mathbf {{U}}_{d,m}^+ - \mathbf {{U}}_{d,m}^-\big )^+ + \mathbf {{P}}^{b}_{d} \mathbf {{B}}_d \bigg ) \nonumber \\&\quad + K(\mathbf {{S}}_{D+1,0}, \mathbf {{H}}_{D+1,0}, \mathbf {{C}}_{D+1,0})\Bigg ] \; ,\end{aligned}$$
(34a)

subject, for all \((d,m)\in \llbracket 0,D \rrbracket \times \llbracket 0,M \rrbracket\), to state dynamics

$$\begin{aligned}&(\mathbf {{S}}_{0,0},\mathbf {{H}}_{0,0},\mathbf {{C}}_{0,0})= (s_{0},h_{0},c_{0}) \; ,\end{aligned}$$
(34b)
$$\begin{aligned}&(\mathbf {{S}}_{d,m+1},\mathbf {{H}}_{d,m+1},\mathbf {{C}}_{d,m+1}) = \varphi \big ( \mathbf {{S}}_{d,m},\mathbf {{H}}_{d,m},\mathbf {{C}}_{d,m},\mathbf {{U}}_{d,m}\big ) \; ,\end{aligned}$$
(34c)
$$\begin{aligned}&(\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) = \psi \big ( \mathbf {{S}}_{d,M+1},\mathbf {{H}}_{d,M+1},\mathbf {{C}}_{d,M+1},\mathbf {{B}}_{d}\big ) \; ,\end{aligned}$$
(34d)

to bound constraints

$$\begin{aligned}&\mathbf {{S}}_{d,m} \in \big [0,\xi \, \mathbf {{C}}_{d,m}\big ] \; , \hspace{5.0pt}\mathbf {{H}}_{d,m} \in \big [0,\mathfrak {N}(\mathbf {{C}}_{d,m}) \, \mathbf {{C}}_{d,m}\big ] \; , \hspace{5.0pt}\mathbf {{C}}_{d,m} \in \big [0,\overline{B}\big ] \; ,\end{aligned}$$
(34e)
$$\begin{aligned}&\mathbf {{U}}_{d,m} \in \big [\underline{U},\overline{U}\big ] \; , \hspace{5.0pt}\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; ,\end{aligned}$$
(34f)

and to nonanticipativity constraints

$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma \big ( \mathbf {{D}}_{0,0},\dots ,\mathbf {{D}}_{d,m}, \mathbf {{P}}^{b}_{0},\dots ,\mathbf {{P}}^{b}_{d-1}\big ) \; ,\end{aligned}$$
(34g)
$$\begin{aligned}&\sigma (\mathbf {{B}}_{d}) \subset \sigma \big ( \mathbf {{D}}_{0,0},\dots ,\mathbf {{D}}_{d,M}, \mathbf {{P}}^{b}_{0},\dots ,\mathbf {{P}}^{b}_{d}\big ) \; .\end{aligned}$$
(34h)

We denote by \(\mathbf {{U}}_{d}\) the vector of decision variables to be taken during the slow time step d

$$\begin{aligned} \mathbf {{U}}_{d}=\big ( \left\{ \mathbf {{U}}_{d,m} \right\} _{m\in \llbracket 0,M \rrbracket },\mathbf {{B}}_{d}\big ) \; .\end{aligned}$$
(35a)

We also denote by \(\mathbf {{W}}_{d}\) the vector of noise variables occurring during the slow time step d

$$\begin{aligned} \mathbf {{W}}_{d}=\big ( \left\{ \mathbf {{D}}_{d,m} \right\} _{m\in \llbracket 0,M \rrbracket },\mathbf {{P}}^{b}_{d}\big ) \; ,\end{aligned}$$
(35b)

and by \(\mathbf {{X}}_{d}\) the vector of state variables at the beginning of the slow time step d

$$\begin{aligned} \mathbf {{X}}_{d}=\big ( \mathbf {{S}}_{d,0},\mathbf {{H}}_{d,0},\mathbf {{C}}_{d,0}\big ) \; .\end{aligned}$$
(35c)

Problem (34) is amenable to the form (6) given in Sect. 2.2, as explained below.

  • In the expression of \(\mathbf {{X}}_{d+1}=(\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0})\) given by (34d), replacing the state variables \((\mathbf {{S}}_{d,m},\mathbf {{H}}_{d,m},\mathbf {{C}}_{d,m})\) recursively from \(m=M+1\) down to \(m=1\) by using (34c), one obtains a slow-time-scale dynamics of the form (6b):

    $$\begin{aligned} (\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) = f_{d}(\mathbf {{S}}_{d,0},\mathbf {{H}}_{d,0},\mathbf {{C}}_{d,0}, \mathbf {{U}}_{d,0:M},\mathbf {{B}}_{d}) \; , \hspace{5.0pt}\forall d\in \llbracket 0,D \rrbracket \; .\end{aligned}$$
    (36)
  • The cost function of slow time step d in (33) is obviously a function of \(\mathbf {{U}}_{d}\) and \(\mathbf {{W}}_{d}\). The bound constraints on the control (34f) (resp. the bound constraints on the state (34e)) only depend on \(\mathbf {{U}}_{d}\) (resp. on \((\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d})\)): indeed, in the same way as we obtained Eq. (36), replacing in the right-hand side of (34c) the state variable \(\mathbf {{S}}_{d,m'}\) recursively from \(m'=m\) down to \(m'=0\) by using (34c), we obtain that the state \((\mathbf {{S}}_{d,m+1},\mathbf {{H}}_{d,m+1},\mathbf {{C}}_{d,m+1})\) is a function of \((\mathbf {{S}}_{d,0},\mathbf {{H}}_{d,0},\mathbf {{C}}_{d,0},\mathbf {{U}}_{d,0:m})\) for all \(m\in \llbracket 0,M \rrbracket\). These constraints are incorporated in the cost of slow time step d (see Remark 1), which makes it an extended real-valued function of the form \(L_{d}(\mathbf {{X}}_d,\mathbf {{U}}_d,\mathbf {{W}}_{d})\) as in (6a). The final cost K is, by definition, a function of \(\mathbf {{X}}_{D+1}\).

  • Since \(\mathbf {{B}}_{d}\) (resp. \(\mathbf {{P}}_{d}^{b}\)) represents a control (resp. a noise) at the fictitious time step between \((d,M+1)\) and \((d+1,0)\), the nonanticipativity constraints (34g) – (34h) are of the form (6c).

Thus, Problem (34) fits the framework developed in Sect. 4 for two-time-scale optimization problems. Moreover, Assumption 1 is fulfilled for Problem (34) (see Appendix C), so that Proposition 3 applies: relaxing the dynamics (36) as inequality constraints makes it possible to compute price and resource Bellman functions that are lower and upper bounds for the Bellman functions associated with Problem (34).

6.2 Simplifying the intraday problems

We turn now to the computation of the functions \(L_{d}^{\textrm{P}}\) in (16) and \(L_{d}^{\textrm{R}}\) in (20), that we call intraday functions in this case study. As explained in Sect. 4.1 (resp. Sect. 4.2), the intraday functions \(L_{d}^{\textrm{P}}\) (resp. \(L_{d}^{\textrm{R}}\)) depend on the couple \((x_{d},p_{d+1})\) (resp. \((x_{d},r_{d+1})\)), namely the 6-tuple \((s_{d},h_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1},p^{c}_{d+1})\) (resp. \((s_{d},h_{d},c_{d},s_{d+1},h_{d+1},c_{d+1})\)) in the case study under consideration. We exploit some characteristics of the problem to make approximations that alleviate the computation of these intraday functions.

6.2.1 Intraday problem associated with resource decomposition

As explained in Sect. 4.2, the aim of the resource decomposition algorithm is to compute, for all slow time steps \(d\in \llbracket 0,D{+}1 \rrbracket\), upper bounds \(\overline{V}_{d}^{\textrm{R}}\) of the Bellman functions associated with Problem (34), which can be put in the form of Problem (6). These upper bounds are obtained by solving a collection of intraday problems such as (20) for each slow time step \(d\in \llbracket 0,D \rrbracket\), and then by solving the Bellman recursion (21). The intraday problems have a priori to be solved for every 6-tuple \((s_{d},h_{d},c_{d},s_{d+1},h_{d+1},c_{d+1})\), that is, the state \((s_{d},h_{d},c_{d})\) at the beginning of the slow time step and the resource target \((s_{d+1},h_{d+1},c_{d+1})\) at the end of the slow time step. This extremely computationally demanding task is greatly simplified thanks to the following considerations.


Resource intraday function reduction


Since the capacity component \(\mathbf {{C}}_{d,m}\) of the state can only change at the end of a slow time step (see Eqs. (30c) and (31c)), it is possible to take the capacity dynamics \(\mathbf {{C}}_{d,m}\), the capacity control \(\mathbf {{B}}_{d}\) and the associated bound constraint, and the cost term \(\mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d}\), out of the intraday problem and to take them into account in the Bellman recursion. To achieve that, resource decomposition is performed by dealing with Eq. (30d) for \(m=M\), instead of Eq. (31d). We introduce the two resources \(s_{d+1}\) and \(h_{d+1}\) for the state of charge and the health of the battery.Footnote 8 Then, the intraday problem (20) becomes

$$\begin{aligned}&L_{d}^{\textrm{R}}(s_{d},h_{d},c_{d},s_{d+1},h_{d+1}) = \inf _{\mathbf {{U}}_{d,0:M}} {\mathbb E}\; \bigg [ \sum _{m=0}^{M} \pi ^e_{d,m} \max (0,\mathbf {{D}}_{d,m}+\mathbf {{U}}_{d,m}^{+}-\mathbf {{U}}_{d,m}^{-}) \bigg ] \; ,\end{aligned}$$
(37a)
$$\begin{aligned}&\text {s.t. } \; (\mathbf {{S}}_{d,0},\mathbf {{H}}_{d,0}) = (s_{d},h_{d}) \; , \end{aligned}$$
(37b)

with, for all \(m\in \llbracket 0,M \rrbracket\),

$$\begin{aligned}&\quad \quad \mathbf {{S}}_{d,m+1} = \mathbf {{S}}_{d,m} + \rho ^{\textrm{c}} \mathbf {{U}}_{d,m}^{+} - \rho ^{\textrm{d}} \mathbf {{U}}_{d,m}^{-} \; , \end{aligned}$$
(37c)
$$\begin{aligned}&\quad \quad \mathbf {{H}}_{d,m+1} = \mathbf {{H}}_{d,m} - \mathbf {{U}}_{d,m}^{+} - \mathbf {{U}}_{d,m}^{-} \; , \end{aligned}$$
(37d)
$$\begin{aligned}&\quad \quad \mathbf {{S}}_{d,M+1} \ge s_{d+1} \; , \hspace{5.0pt}\mathbf {{H}}_{d,M+1} \ge h_{d+1} \; , \end{aligned}$$
(37e)
$$\begin{aligned}&\quad \quad \mathbf {{U}}_{d,m} \in \big [\underline{U},\overline{U}\big ] \; , \end{aligned}$$
(37f)
$$\begin{aligned}&\quad \quad \mathbf {{S}}_{d,m} \in \big [0, \xi \, c_{d}\big ] \; , \hspace{5.0pt}\mathbf {{H}}_{d,m} \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ] \; , \end{aligned}$$
(37g)
$$\begin{aligned}&\quad \quad \sigma (\mathbf {{U}}_{d,m}) \subset \sigma \big ( \mathbf {{D}}_{d,0},\dots ,\mathbf {{D}}_{d,m}\big ) \; , \end{aligned}$$
(37h)

and is parameterized by the 5-tuple \((s_{d},h_{d},c_{d},s_{d+1},h_{d+1})\). The sequence \(\{\overline{V}_{d}^{\textrm{R}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions is computed by the following recursion:

$$\begin{aligned} \overline{V}_{d}^{\textrm{R}}(s_{d},h_{d},c_{d}) & = \inf _{s_{d+1},h_{d+1},\mathbf {{B}}_{d}} {\mathbb E}\Big [ \gamma \big ( L_{d}^{\textrm{R}} (s_{d},h_{d},c_{d},s_{d+1},h_{d+1}) + \mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d} \big ) \nonumber \\ & \quad + \overline{V}_{d+1}^{\textrm{R}} (\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) \Big ] \; ,\end{aligned}$$
(38a)
$$\begin{aligned} \text {s.t.} \quad&(\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) = \psi \big ( s_{d+1},h_{d+1},c_{d},\mathbf {{B}}_{d}\big ) \; , \end{aligned}$$
(38b)
$$\begin{aligned}&s_{d+1} \in \big [0,\xi \, c_{d}\big ] \; , \hspace{5.0pt}h_{d+1} \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ] \; , \end{aligned}$$
(38c)
$$\begin{aligned}&\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; , \end{aligned}$$
(38d)
$$\begin{aligned}&\sigma (\mathbf {{B}}_{d}) \subset \sigma \big ( \mathbf {{P}}^{b}_{d}\big ) \; . \end{aligned}$$
(38e)

In order to further simplify the computation of the intraday functions, we remark that, in the Bellman recursion (38), we can replace the function \(L_{d}^{\textrm{R}}\) by the function \(\tilde{L}_{d}^{\textrm{R}}\) with

$$\begin{aligned} & \tilde{L}_{d}^{\textrm{R}}\big ( s_{d},h_{d},c_{d},s_{d+1},h_{d+1}\big ) = L_{d}^{\textrm{R}}\big ( s_{d},h_{d},c_{d},s_{d+1},h_{d+1}\big ) \\ & \quad + \delta _{\big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ]}\big ( h_{d}\big ) + \delta _{\{h_{d+1} \le h_d\}}\big ( h_{d},h_{d+1}\big ) + \delta _{\big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ]}\big ( h_{d+1}\big ) \; ,\end{aligned}$$

where \(\delta _{A}\) denotes the indicator function of the set A (see Footnote 3 on page 6). Indeed, the last term \(\delta _{\big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ]}(h_{d+1})\) is obtained by moving the right-hand side of Constraint (38c) to the minimized cost \(L_{d}^{\textrm{R}}\) and the two other terms can be added as it is easily seen that \(L_{d}^{\textrm{R}}(s_{d},h_{d},c_{d},s_{d+1},h_{d+1})=+\infty\) when \(h_{d+1} > h_d\) or when \(h_d \not \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ]\). Then, it is straightforward to prove that

$$\begin{aligned} \tilde{L}_{d}^{\textrm{R}}(s_{d},h_{d},c_{d},s_{d+1},h_{d+1}) = \tilde{L}_{d}^{\textrm{R}}(s_{d},h_{d}{-}h_{d+1},c_{d},s_{d+1},0) \; .\end{aligned}$$
(39)

Indeed, as the health dynamics is linear and nonincreasing, any admissible control for Problem (37) for the ordered pair \((h_{d},h_{d+1})\), with \(h_{d+1} \le h_d\) and \((h_d, h_{d+1}) \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ]^2\), is also admissible for the ordered pair \((h_d- h_{d+1},0)\), and conversely. Moreover, the resulting cost is the same since the cost does not depend on the health variable. We thus obtain Eq. (39).


Resource intraday function approximation


As suggested in Kaut et al. (2014), we decide to neglect the state of charge target at the slow time-scale. Indeed, the operation of the battery is daily periodic and such that the battery is more or less empty at the beginning (and thus at the end) of a slow time step (day). It is thus reasonable to assume that the battery is empty at the beginning and at the end of every slow time step, which is a pessimistic but rather realistic assumption. Combined with Eq. (39), this yields a new function \(\widehat{L}_{d}^{\textrm{R}}\) approximating the original function \(L_{d}^{\textrm{R}}\), that is

$$\begin{aligned} \widehat{L}_{d}^{\textrm{R}}(h_{d}{-}h_{d+1},c_{d}) = \tilde{L}_{d}^{\textrm{R}}(0,h_{d}{-}h_{d+1},c_{d},0,0) \approx \tilde{L}_{d}^{\textrm{R}}(s_{d},h_{d},c_{d},s_{d+1},h_{d+1}) \; .\end{aligned}$$
(40)

The approximated intraday function \(\widehat{L}_{d}^{\textrm{R}}\) now only depends on two variables, which significantly reduces the time needed to compute it. Then, the sequence \(\{\overline{V}_{d}^{\textrm{R}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions in (38) is approximated by the sequence \(\{\widehat{\overline{V}}_{d}^{\textrm{R}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) given by the following recursion

$$\begin{aligned} \widehat{\overline{V}}_{d}^{\textrm{R}}(h_{d},c_{d}) = \inf _{h_{d+1},\mathbf {{B}}_{d}}&{\mathbb E}\Big [ \gamma \big ( \widehat{L}_{d}^{\textrm{R}}(h_{d}{-}h_{d+1},c_{d}) + \mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d} \big ) + \widehat{\overline{V}}_{d+1}^{\textrm{R}} \big ( \psi ^{\textrm{H,C}}\big ( h_{d+1},c_{d},\mathbf {{B}}_{d}\big )\big ) \Big ] \; ,\end{aligned}$$
(41a)
$$\begin{aligned} \text {s.t.} \quad&h_{d+1} \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ] \; ,\end{aligned}$$
(41b)
$$\begin{aligned}&\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; ,\end{aligned}$$
(41c)
$$\begin{aligned}&\sigma (\mathbf {{B}}_{d}) \subset \sigma \big ( \mathbf {{P}}^{b}_{d}\big ) \; ,\end{aligned}$$
(41d)

where the new dynamics \(\psi ^{\textrm{H,C}}\) is deduced from \(\psi\) in (31) by keeping only the last two dynamics (31b) and (31c), which do not depend on the state of charge.
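On finite grids, one step of the recursion (41) reduces to nested finite minimizations; the sketch below illustrates the structure. The grids, the cycle function `n_cycles` (standing in for \(\mathfrak {N}\)) and the intraday function passed as `L_hat` are illustrative assumptions, not the paper's implementation. Note how the measurability constraint (41d) lets the renewal decision be optimized inside the expectation over the battery price.

```python
def bellman_step_resource(V_next, L_hat, h_grid, b_grid, price_probs,
                          gamma=0.99, n_cycles=lambda cap: 1000):
    """One step of the resource Bellman recursion (41) on finite grids.

    V_next[(h, c)]: next-day upper Bellman function on a (health, capacity)
    grid; L_hat(dh, c): approximated intraday function (40); price_probs:
    {battery price: probability} for P^b_d. Grids are illustrative."""
    V = {}
    for (h_d, c_d) in V_next:
        best = float("inf")
        # health target h_{d+1} <= h_d, cf. the reduction (39)
        for h_next in (h for h in h_grid if h <= h_d):
            # B_d is measurable w.r.t. P^b_d (41d): optimize the renewal
            # decision for each realization of the battery price
            expected = sum(
                prob * min(
                    gamma * price * b
                    + (V_next[(n_cycles(b) * b, b)] if b > 0
                       else V_next[(h_next, c_d)])    # psi^{H,C} in (41a)
                    for b in b_grid
                )
                for price, prob in price_probs.items()
            )
            best = min(best, gamma * L_hat(h_d - h_next, c_d) + expected)
        V[(h_d, c_d)] = best
    return V
```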

As explained in Appendix A, we consider in this study I periodicity classes (\(I=4\), that is, one class for each season of the year), so that the computation of the resource intraday problem is done only for I different days denoted \(d_{1},\dots ,d_{I}\). The complexity of the associated resource decomposition algorithm is sketched in Appendix B.

6.2.2 Intraday problem associated with price decomposition

As detailed in Sect. 4.1, the aim of the price decomposition algorithm is to compute, for all slow time steps \(d\in \llbracket 0,D{+}1 \rrbracket\), lower bounds \(\underline{V}_{d}^{\textrm{P}}\) of the Bellman functions associated with Problem (34). These lower bounds are obtained by solving a collection of intraday problems such as (16) for each slow time step \(d\in \llbracket 0,D \rrbracket\), and then by solving the Bellman recursion (17). The intraday problems have a priori to be solved for every 6-tuple \((s_{d},h_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1},p^{c}_{d+1})\), that is, the state \((s_{d},h_{d},c_{d})\) at the beginning of the slow time step and the prices \((p^{s}_{d+1},p^{h}_{d+1},p^{c}_{d+1})\) associated with the dualization of the equality dynamics equations.


Price intraday function reduction

As in the resource intraday function reduction, it is possible to take the capacity dynamics, its associated control and bound constraints, as well as the cost term \(\mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d}\), out of the intraday problem and to take them into account in the Bellman recursion, so that the intraday problem does not depend on the price \(p^{c}_{d+1}\) associated with the capacity dynamics. To achieve that, price decomposition is not performed on Eq. (31d), but on Eq. (30d) for \(m=M\), which leads to an intraday function whose arguments are \(\big ( s_{d},h_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1}\big )\). But another reduction is possible here: summing the health dynamics (30b) over the fast time steps of day d, we derive the inequality

$$\begin{aligned} h_{d} - \mathbf {{H}}_{d,M+1} - \sum _{m=0}^{M} \big ( \mathbf {{U}}_{d,m}^{+}+\mathbf {{U}}_{d,m}^{-}\big ) \ge 0 \; .\end{aligned}$$
(42)

Following the framework of Sect. 4.1, we dualize this induced constraint by incorporating, on the one hand, the terms \(\mathbf {{U}}_{d,m}^{+}+\mathbf {{U}}_{d,m}^{-}\) in the definition of the price intraday function for \(m\in \llbracket 0,M \rrbracket\) and, on the other hand, the term \(h_{d} - \mathbf {{H}}_{d,M+1}\) in the computation of the Bellman functions. Doing so, the intraday function \(L_{d}^{\textrm{P}}\) no longer depends on the health \(h_{d}\), and is defined as

$$\begin{aligned} L_{d}^{\textrm{P}}&(s_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1}) = \inf _{\mathbf {{U}}_{d,0:M}} {\mathbb E}\; \bigg [ \sum _{m=0}^{M} \Big ( \pi ^e_{d,m} \max (0,\mathbf {{D}}_{d,m}+\mathbf {{U}}_{d,m}^{+}-\mathbf {{U}}_{d,m}^{-}) \Big . \bigg . \nonumber \\&\quad - p^{h}_{d+1}\big ( \mathbf {{U}}_{d,m}^{+}+\mathbf {{U}}_{d,m}^{-}\big ) \Big ) + p^{s}_{d+1} \mathbf {{S}}_{d,M+1} \bigg ] \; ,\end{aligned}$$
(43a)

subject to, for all \(m\in \llbracket 0,M \rrbracket\),

$$\begin{aligned}&\mathbf {{S}}_{d,0} = s_{d} \; , \hspace{5.0pt}\mathbf {{S}}_{d,m+1} = \mathbf {{S}}_{d,m} + \rho ^{\textrm{c}} \mathbf {{U}}_{d,m}^{+} - \rho ^{\textrm{d}} \mathbf {{U}}_{d,m}^{-} \; ,\end{aligned}$$
(43b)
$$\begin{aligned}&\mathbf {{U}}_{d,m} \in \big [\underline{U},\overline{U}\big ] \; , \hspace{5.0pt}\mathbf {{S}}_{d,m} \in \big [0, \xi c_{d}\big ] \; ,\end{aligned}$$
(43c)
$$\begin{aligned}&\sigma (\mathbf {{U}}_{d,m}) \subset \sigma \big ( \mathbf {{D}}_{d,0},\dots ,\mathbf {{D}}_{d,m}\big ) \; .\end{aligned}$$
(43d)

The associated sequence of Bellman functions \(\{\underline{V}_{d}^{\textrm{P}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) is computed by the following recursion:

$$\begin{aligned} & \underline{V}_{d}^{\textrm{P}}(s_{d},h_{d},c_{d}) = \sup _{(p^{s}_{d+1},p^{h}_{d+1})\le 0} \bigg ( L_{d}^{\textrm{P}}(s_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1}) + \inf _{s_{d,M+1},h_{d,M+1}} \; \inf _{\mathbf {{B}}_{d}} \; {\mathbb E}\Big [ \gamma \mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d} \nonumber \\ & \quad - p^{s}_{d+1} s_{d,M+1} + \big ( p^{h}_{d+1}\big ( h_{d}-h_{d,M+1}\big ) + \underline{V}_{d+1}^{\textrm{P}} (\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) \big ) \Big ] \bigg ) \; ,\end{aligned}$$
(44a)
$$\begin{aligned} \text {subject to} \quad&(\mathbf {{S}}_{d+1,0},\mathbf {{H}}_{d+1,0},\mathbf {{C}}_{d+1,0}) = \psi \big ( s_{d,M+1},h_{d,M+1},c_{d},\mathbf {{B}}_{d}\big ) \; ,\end{aligned}$$
(44b)
$$\begin{aligned}&\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; , \hspace{5.0pt}s_{d,M+1} \in \big [0,\xi c_{d}\big ] \; , \hspace{5.0pt}h_{d,M+1} \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ] \; ,\end{aligned}$$
(44c)
$$\begin{aligned}&\sigma (\mathbf {{B}}_{d}) \subset \sigma \big ( \mathbf {{P}}^{b}_{d}\big ) \; .\end{aligned}$$
(44d)

Price intraday function approximation

As in the resource decomposition algorithm, it is possible to consider that the state of charge of the battery has no influence at the slow time-scale. Doing so, we obtain a new function \(\widehat{L}_{d}^{\textrm{P}}\) approximating the original function \(L_{d}^{\textrm{P}}\), that is,

$$\begin{aligned} \widehat{L}_{d}^{\textrm{P}}(c_{d},p^{h}_{d+1}) = L_{d}^{\textrm{P}}(0,c_{d},0,p^{h}_{d+1}) \approx L_{d}^{\textrm{P}}(s_{d},c_{d},p^{s}_{d+1},p^{h}_{d+1}) \; .\end{aligned}$$
(45)

The approximated price intraday function \(\widehat{L}_{d}^{\textrm{P}}\) only depends on the 2-tuple \((c_{d},p^{h}_{d+1})\), which significantly reduces the time needed to compute it. Then, the sequence \(\{\underline{V}_{d}^{\textrm{P}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) of Bellman functions in (44) is approximated by the sequence \(\{\widehat{\underline{V}}_{d}^{\textrm{P}}\}_{d\in \llbracket 0,D{+}1 \rrbracket }\) given by the following recursion:

$$\begin{aligned} & \widehat{\underline{V}}_{d}^{\textrm{P}}(h_{d},c_{d}) = \sup _{p^{h}_{d+1}\le 0} \bigg ( \widehat{L}_{d}^{\textrm{P}}(c_{d},p^{h}_{d+1}) + \inf _{h_{d,M+1}} \; \inf _{\mathbf {{B}}_{d}} \; {\mathbb E}\Big [ \gamma \mathbf {{P}}_{d}^{b} \mathbf {{B}}_{d} \nonumber \\ & \qquad \qquad \qquad + p^{h}_{d+1}\big ( h_{d}-h_{d,M+1}\big ) + \widehat{\underline{V}}_{d+1}^{\textrm{P}} \big ( \psi ^{\textrm{H},\textrm{C}}\big ( h_{d,M+1},c_{d},\mathbf {{B}}_{d}\big )\big ) \Big ] \bigg ) \; ,\end{aligned}$$
(46a)
$$\begin{aligned} \text {subject to} \quad&h_{d,M+1} \in \big [0,\mathfrak {N}(c_{d}) \, c_{d}\big ] \; ,\end{aligned}$$
(46b)
$$\begin{aligned}&\mathbf {{B}}_{d} \in \big [0,\overline{B}\big ] \; ,\end{aligned}$$
(46c)
$$\begin{aligned}&\sigma (\mathbf {{B}}_{d}) \subset \sigma \big ( \mathbf {{P}}^{b}_{d}\big ) \; .\end{aligned}$$
(46d)

As explained in Appendix A, we consider in this study I periodicity classes (\(I=4\), that is, one class for each season of the year), so that the computation of the price intraday problem is done only for I different days denoted \(d_{1},\dots ,d_{I}\). The complexity of the associated price decomposition algorithm is sketched in Appendix B.

6.3 Experimental setup

The data used in the application come from case studies provided by a Schneider Electric industrial site, equipped with solar panels and a battery, and submitted to three sources of randomness—namely, solar panels production, electrical demand and prices of batteries per kWh. We present hereby the different parameters of the instance under consideration.

  • Horizon: 20 years.

  • Fast time step: 30 min.

  • Slow time step: 1 day.

  • Number of time steps: \(350,\!400\) \((= (24 \times 2) \times (20 \times 365))\).

  • Battery renewal capacity: between 0 and \(1500\) kWh with an increment of 100 kWh.

  • Periodicity class: 4 classes, one per trimester of the year.


Data to model the cost of batteries and electricity


For the prices of batteries, we obtained a yearly forecast over 20 years from Statista.Footnote 9 We added a Gaussian noise to generate synthetic random battery price scenarios. We display in Fig. 1 the scenarios we generated.

Fig. 1
figure 1

Scenarios of battery prices over a twenty-year timespan

Three scenarios are highlighted in Fig. 1; they correspond to the three scenarios we comment in the numerical results in Sect. 6.4.

For the price of electricity, we chose a “time of use” tariff defined by three rates:

  • an off-peak rate at $0.0255 between 22:00 and 7:00,

  • a shoulder rate at $0.0644 between 7:00 and 17:00,

  • a peak rate at $0.2485 between 17:00 and 22:00.

Data to model demand and production


In order to have a realistic dataset for the model described in Sect. 6.1, we use the data collected on 70 anonymized industrial sites monitored by Schneider Electric. This data set is openly available.Footnote 10 We extracted the data of the site numbered 70. For this site, we display in Fig. 2 the half-hourly distribution of the net demand (demand minus solar production) during one day.

Fig. 2
figure 2

Daily half hourly distribution of net demand (kWh)

Remark 10

(About the probabilistic independence of the data). Both battery prices and net demands correspond to realistic data that are given as scenarios, and there is a priori no independence property for these data. Of course, it is possible to compute marginal probability distributions from these scenarios: at a given time step \((d,m)\), collect all the values \(\mathbf {{d}}_{d,m}\) (the value \(\mathbf {{p}}^{b}_{d}\) if \(m=M{+}1\)) available from the scenarios and build a discrete probability distribution from these values. This procedure gives probability distributions at the half-hourly scale. This way of proceeding will be implemented to compute the resource and price intraday functions by Dynamic Programming (see Sect. 6.4.1).
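The procedure described in Remark 10 can be sketched as follows (the indexing of the scenario data structure is our own illustrative choice):

```python
from collections import Counter

def marginal_distribution(scenarios, d, m):
    """Empirical marginal distribution at fast time step (d, m), built
    from scenarios as in Remark 10.

    scenarios[k][d][m]: value of scenario k at fast step (d, m)
    (net demand, or the battery price for the fictitious last step).
    Returns a dict {value: probability}."""
    values = [scenario[d][m] for scenario in scenarios]
    counts = Counter(values)
    return {v: cnt / len(values) for v, cnt in counts.items()}
```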

6.4 Numerical experiments

The aim of the numerical experiments is to compute and evaluate policies induced by resource (resp. price) decomposition, that is, first solving an approximation of Problem (34) by computing the resource intraday functions \(\widehat{L}_{d}^{\textrm{R}}\) in (40) and the associated resource Bellman functions \(\widehat{\overline{V}}_{d}^{\textrm{R}}\) in (41) (resp. price intraday functions \(\widehat{L}_{d}^{\textrm{P}}\) in (45) and price Bellman functions \(\widehat{\underline{V}}_{d}^{\textrm{P}}\) in (46)) after all simplifications presented in Sect. 6.2.1 (resp. Sect. 6.2.2), and second evaluating the policies induced by the Bellman functions \(\widehat{\overline{V}}_{d}^{\textrm{R}}\) and  \(\widehat{\underline{V}}_{d}^{\textrm{P}}\).

6.4.1 Computation of the resource and price intraday functions

To compute the approximated resource intraday functions \(\widehat{L}_{d}^{\textrm{R}}\) given in Eq. (40) and the approximated price intraday functions \(\widehat{L}_{d}^{\textrm{P}}\) given in Eq. (45), we compute the marginal probability distributions of the noises at the fast time-scale as explained in Remark 10, and we apply the Dynamic Programming algorithm. Indeed, computing the intraday functions using Stochastic Programming would be very costly due to the large number of fast time steps inside a slow time step: for example, computing a price intraday function (45) would require forming a scenario tree over 48 time steps for every possible (discretized) value of the pair \((c_{d},p^{h}_{d+1})\) and solving the associated optimization problem, a task that would be too expensive in computation time.
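Under the fast-scale independence assumption just described, the backward dynamic programming computation of, e.g., the approximated price intraday function (45) can be sketched as follows. The state-of-charge grid, the control grid, the efficiencies and the nearest-neighbor state projection are our illustrative choices, not the paper's implementation.

```python
def price_intraday_dp(p_h, tariffs, marginals, s_grid, u_grid,
                      rho_c=0.95, rho_d=0.95):
    """Backward DP sketch for the approximated price intraday function (45),
    assuming independent noises at the fast time-scale.

    p_h: nonpositive multiplier p^h_{d+1}; tariffs[m] = pi^e_{d,m};
    marginals[m]: {net demand value: probability} at fast step m;
    s_grid: sorted discretization of [0, xi * c_d]; u_grid: control grid."""
    value = {s: 0.0 for s in s_grid}        # terminal value (p^s = 0 in (45))
    for m in reversed(range(len(tariffs))):
        new_value = {}
        for s in s_grid:
            best = float("inf")
            for u in u_grid:
                up, um = max(u, 0.0), max(-u, 0.0)
                s_next = s + rho_c * up - rho_d * um
                if not (s_grid[0] <= s_next <= s_grid[-1]):
                    continue                # state-of-charge bounds (43c)
                # nearest-neighbor projection of s_next on the grid
                s_proj = min(s_grid, key=lambda g: abs(g - s_next))
                # expected electricity cost minus the dualized aging term (43a)
                step = sum(
                    prob * tariffs[m] * max(0.0, d + up - um)
                    for d, prob in marginals[m].items()
                ) - p_h * (up + um)
                best = min(best, step + value[s_proj])
            new_value[s] = best
        value = new_value
    return value[s_grid[0]]                 # start from s_0 = 0, cf. (45)
```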

We recall that the intraday functions are not computed for all possible days in the time horizon, but only for a day representing each periodicity class. Here we split the year of the industrial site data into the four traditional trimesters, each trimester corresponding to one periodicity class. For each trimester, we model the net demand at a given half hour of the day by a discrete random variable with a support of size 10. The probability distribution of each discrete random variable is obtained by quantizing, using the k-means algorithm (Kaut 2021), the net demand realizations available in the dataset for the half hour under consideration.
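The quantization step can be sketched with a basic k-means (Lloyd) loop; this compact stand-in is ours, whereas the paper relies on Kaut (2021) for the actual quantization.

```python
import numpy as np

def quantize_net_demand(samples, k=10, iters=50, seed=0):
    """Quantize 1-D net demand realizations into a discrete random variable
    with support of size (at most) k, via a basic k-means (Lloyd) loop."""
    x = np.asarray(samples, dtype=float)
    uniq = np.unique(x)
    rng = np.random.default_rng(seed)
    # initialize the centers with k distinct sample values
    centers = rng.choice(uniq, size=min(k, uniq.size), replace=False)
    for _ in range(iters):
        # assign each sample to its nearest center, then recenter
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(centers.size):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    # support = centers, probabilities = cluster frequencies
    probs = np.bincount(labels, minlength=centers.size) / x.size
    return centers, probs
```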

In the case of resource (resp. price) decomposition, we compute the intraday functions \(\widehat{L}_{d}^{\textrm{R}}\) (resp. \(\widehat{L}_{d}^{\textrm{P}}\)) for every possible capacity \(c_d\) and every possible exchangeable energy \(h_{d}-h_{d+1}\) (resp. every possible price \(p^{h}_{d+1}\)). In this study, the possible values of the capacity \(c_{d}\) are \(\{0,100,\ldots ,1500\}\) kWh, whereas the possible values of the price \(p^{h}_{d+1}\) are \(\{0,0.025,0.05,\ldots ,0.2\}\).

We display in Fig. 3 the resource and price intraday functions for each season (trimester) of a year. Resource intraday functions depend on daily exchangeable energy and capacity, whereas price intraday functions depend on the price associated with aging and capacity.

Fig. 3
figure 3

Resource (left) and price (right) intraday functions for each trimester

6.4.2 Computation of the resource and price Bellman functions

Once all the intraday functions \(\widehat{L}_{d}^\textrm{R}\) and \(\widehat{L}_{d}^\textrm{P}\) have been obtained, the Bellman functions \(\widehat{\overline{V}}_{d}^{\textrm{R}}\) and \(\widehat{\underline{V}}_{d}^{\textrm{P}}\) are respectively computed by the Bellman recursions (41) and (46), for \(d\in \llbracket 0,D \rrbracket\). We display in Fig. 4 the resource and price Bellman functions obtained for the first day of the time horizon.

Fig. 4
figure 4

Resource and price Bellman functions at day 1

We observe that the resource and price Bellman functions present approximately the same shape and are separated by a relatively small gap. The same observation holds true for all days of the time span. The largest relative gap between these bounds is 7.90%. The relative gap at the initial state \((s_{0},h_{0},c_{0})=(0,0,0)\), that is, when the battery no longer works and has to be replaced, is around 4.84%.

We gather in Table 1 the computing times of the price and resource decomposition algorithms, namely the total CPU times and the total wall timesFootnote 11 when parallelization is on.

Table 1 Computing times of the price and resource decomposition methods

The computation is run on an Intel i7-8850H CPU @ 2.60 GHz (6 cores) with 16 GB RAM. Table 1 displays the times needed to compute the intraday functions and the Bellman functions. We observe that, whereas the price decomposition algorithm requires a significantly longer CPU time than the resource decomposition algorithm, the two decomposition algorithms require comparable wall times when parallelization is on. The main reason is that parallelization reduces the computing time of the price Bellman functions more significantly than that of the resource Bellman functions: the computation done in parallel is longer in the price case, so the time it saves is not offset by the overhead of memory sharing. The price intraday functions are also faster to compute because the price space is more coarsely discretized than the exchangeable energy space.

Finally, in Table 2, we give the values \(\widehat{\overline{V}}_{0}^{\textrm{R}}(x_{0})\) and \(\widehat{\underline{V}}_{0}^{\textrm{P}}(x_{0})\) of the resource and price Bellman functions at day \(d=0\) for the initial state \(x_{0}=(s_{0},h_{0},c_{0})=(0,0,0)\).

Table 2 Bounds obtained by resource and price decomposition

According to Sect. 4, these values are respectively an upper bound and a lower bound of the optimal value of Problem (34). Note however that the numerically computed values given in Table 2 may fail to be upper and lower bounds of the optimal cost of Problem (34) since the resource and price intraday functions are obtained (i) using approximations as explained in Sects. 6.2.1 and 6.2.2, and (ii) using the marginal probability distributions of the noises at the fast time-scale (see Remark 10), thus these intraday functions are optimal only if the noises are independent random variables at the fast time-scale.

6.4.3 Simulation of the resource and price policies

We now present several simulation results. Table 3 displays the times needed to perform a 20-year simulation over one scenario of battery prices and net demands, from which we deduce the average time needed to compute a decision at each time step.

Table 3 Computing times of simulation

Simulation using scenarios


We draw 1000 “true” scenarios of battery prices and net demands over 20 years, that is, scenarios extracted from the realistic data of the problem; no independence assumption is available for these scenarios. Then, as explained in Sect. 5.1, we simulate the charge and renewal decisions made when using the intraday functions and the Bellman functions obtained by resource and price decomposition, in order to compare the performances of both methods. All simulations start from the initial state \((s_{0},h_{0},c_{0})=(0,0,0)\). The average simulation costs are given in Table 4.
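The simulation procedure can be sketched as a one-step lookahead loop: at each slow time step, the decision minimizes the immediate cost plus the next-day Bellman value, and the state is rolled forward along the scenario. All names below (dynamics, costs, decision set, value functions) are hypothetical stand-ins, not the paper's actual model.

```python
def induced_policy(state, day, bellman, decisions, step_cost, dynamics):
    """One-step lookahead: minimize immediate cost + continuation value."""
    return min(
        decisions,
        key=lambda u: step_cost(state, u) + bellman[day + 1](dynamics(state, u)),
    )

def simulate(x0, horizon, bellman, decisions, step_cost, dynamics):
    """Roll the induced policy forward over one scenario, accumulating costs."""
    state, total = x0, 0.0
    for day in range(horizon):
        u = induced_policy(state, day, bellman, decisions, step_cost, dynamics)
        total += step_cost(state, u)
        state = dynamics(state, u)
    return total, state

# Toy instance: scalar stock, decision = amount charged, and a
# continuation value rewarding stored energy at a rate (2 per unit)
# exceeding the charging cost (1 per unit).
horizon = 5
bellman = {d: (lambda x: -2.0 * x) for d in range(horizon + 1)}
cost, final = simulate(
    x0=0.0,
    horizon=horizon,
    bellman=bellman,
    decisions=[0.0, 1.0],
    step_cost=lambda x, u: u,     # pay one per unit charged
    dynamics=lambda x, u: x + u,  # stock accumulates charges
)
# Here the induced policy charges one unit every day, since the
# continuation value gained outweighs the charging cost.
```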

Table 4 Average simulation costs using original scenarios

The comparison of the average costs shows that both decomposition methods deliver comparable performance, although the price decomposition outperforms the resource decomposition by achieving on average 1.05% of additional economic savings. This slightly superior performance of the price decomposition is observed on every simulation scenario.

We also note that the average costs are 20–25% higher than the corresponding values of the Bellman functions at the initial day for the initial state given in Table 2. This is somewhat surprising, since the values in Table 2 are lower (price) and upper (resource) bounds of the optimal value of the problem, provided that the white noise Assumption 2 is fulfilled. But, as explained at the end of Sect. 6.4.2, the intraday functions have been computed by dynamic programming, so the values obtained are optimal only if the noises are independent at the fast time-scale. The computation of the price and resource Bellman functions makes use of these intraday functions and is, moreover, optimal only if the battery prices are independent day by day. However, the simulations are made with scenarios in which the net demands and the prices are likely to be strongly correlated, hence the discrepancies.


Analysis of some scenarios


We select three scenarios (the colored scenarios in Fig. 1) among the 1000 scenarios of battery prices and net demands over 20 years used in the previous paragraph, and we analyse the behavior of the policies induced by resource and price decompositions. We recall that all simulations start from the initial state \((s_{0},h_{0},c_{0})=(0,0,0)\), that is, the battery no longer works and has to be replaced. Figure 5 displays the health (or exchangeable energy in kWh) of the batteries at the end of each day for the three scenarios, and Table 5 gives the associated simulation costs. In the first simulation, price and resource decompositions lead to significantly different renewal decisions and costs. A small battery (100 kWh, that is, 400 kWh of exchangeable energy) is purchased at day \(d=0\) in both price and resource decomposition. But at day \(d=2328\), another small battery is purchased in resource decomposition, whereas a large battery (1500 kWh, that is, 6 MWh of exchangeable energy) is purchased in price decomposition. Then, over the remaining time horizon, there is one battery renewal in price decomposition and two in resource decomposition, hence a lower cost for price decomposition. In the second simulation, resource and price decompositions produce very similar health trajectories and costs. This is even clearer in the third simulation, for which the health trajectories and the costs are almost identical. The third simulation shows a case where battery prices are high, hence only small batteries (100 kWh) are purchased.

Fig. 5
figure 5

Three simulations of the evolution of the battery health over 20 years

Table 5 Simulation results along three scenarios

Price decomposition outperforms resource decomposition on the three scenarios, but only by 1.27% on Scenario 1, even though the renewal decisions are significantly different. Our interpretation is that, in Scenario 1, once the investment is taken into account, buying a large battery is almost as rewarding as buying a small one. Moreover, it seems that resource decomposition slightly underestimates the benefit of using a large battery compared to a small one. Indeed, we present in Fig. 6 the resource and price Bellman functions at day \(d=2328\) (first battery renewal in Scenario 1), when the health of the battery is fixed to the value \(\overline{H}\) associated with a large battery renewal (1500 kWh), that is, \(\widehat{\overline{V}}_{2328}^{\textrm{R}}(\overline{H},\cdot )\) in (41) and \(\widehat{\underline{V}}_{2328}^{\textrm{P}}(\overline{H},\cdot )\) in (46). We observe that the resource Bellman function is significantly higher than the price Bellman function.

Fig. 6
figure 6

Bellman function of first renewal day of Scenario 1 for health fixed at 100 %

7 Conclusion

We have introduced the formal definition of a two-time-scale stochastic optimization problem. The motivation for two-time-scale modeling originated from a battery management problem over a long term horizon (20 years) with decisions made every 30 min (charge/discharge). We have presented two algorithmic methods that compute daily Bellman functions to solve such generic problems, which display a large number of time steps and decisions on different time-scales, when they satisfy monotonicity properties. Both methods rely on a Bellman equation applied at the slow time-scale, producing Bellman functions at this scale.

Our first method, called the resource decomposition algorithm, is a primal decomposition of the daily Bellman equation that requires computing the value of a multistage stochastic optimization problem parameterized by a stochastic resource. Monotonicity properties make it possible to relax the coupling constraint and to replace the stochastic resource by a deterministic one, yielding an upper bound for the slow time-scale Bellman functions. Instead of this simplification, we could have turned the almost sure coupling constraint into a constraint in expectation; it would be interesting to compare this with our approach.
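As a minimal illustration of this primal step, the slow time-scale Bellman update can be sketched as an optimization over deterministic resources, each candidate resource pricing out one intraday subproblem; the resulting value is an upper bound. The functions and numbers below are hypothetical stand-ins, not the paper's actual formulation.

```python
def resource_bellman_update(state, resources, intraday_upper, expected_next):
    """Upper-bound Bellman update: optimize over deterministic resources.

    For each candidate deterministic resource r, sum the (upper-bound)
    intraday value and the expected continuation cost, then minimize.
    """
    return min(
        intraday_upper(state, r) + expected_next(state, r)
        for r in resources
    )

# Toy instance: quadratic intraday cost for deviating from the
# allocated resource, plus a linear expected continuation cost.
value = resource_bellman_update(
    state=1.0,
    resources=[0.0, 0.5, 1.0],
    intraday_upper=lambda x, r: (x - r) ** 2,  # cost of meeting resource r
    expected_next=lambda x, r: r,              # expected future cost of r
)
# Candidate values are 1.0, 0.75 and 1.0, so the update returns 0.75.
```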

We address a similar difficulty in our price decomposition algorithm, which requires computing the value of a stochastic optimization problem parameterized by a stochastic price. Once again we replace the stochastic price by a deterministic one, which is equivalent to dualizing an expectation target constraint; this makes the previous enhancement proposal even more relevant. Our algorithm produces a lower bound for the slow time-scale Bellman functions that turns out to yield better results in simulation than those obtained using the resource Bellman functions (we had already observed this favorable numerical behavior in Carpentier et al. (2020)).
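Mirroring the resource sketch, the dual step can be illustrated as a Bellman update maximized over deterministic prices, each candidate price defining one dualized intraday subproblem; the resulting value is a lower bound. Again, the functions and numbers below are hypothetical stand-ins, not the paper's actual formulation.

```python
def price_bellman_update(state, prices, intraday_lower, expected_next):
    """Lower-bound Bellman update: optimize over deterministic prices.

    For each candidate deterministic price p, sum the (lower-bound)
    dual intraday value and the expected continuation cost, then
    maximize, as is usual for a dual bound.
    """
    return max(
        intraday_lower(state, p) + expected_next(state, p)
        for p in prices
    )

# Toy instance: concave dual intraday value in the price, with a
# flat expected continuation cost.
value = price_bellman_update(
    state=1.0,
    prices=[0.0, 0.5, 1.0],
    intraday_lower=lambda x, p: p * x - p ** 2,  # dual intraday value
    expected_next=lambda x, p: 0.0,              # flat continuation
)
# Candidate values are 0.0, 0.25 and 0.0, so the update returns 0.25.
```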

Finally, we have shown on a realistic numerical application that these methods make it possible to compute design and control policies for problems with a very large number of time steps. They could also be used for single-time-scale problems that exhibit monotonicity, periodicity and a large number of time steps.