1 Problem definition and motivation

We consider a hierarchical two-stage mixed-integer linear program (MILP) with integer variables in both the first and second stages. We are particularly interested in applications where the first-stage integer variables model design or purchasing decisions and the second-stage variables model operational decisions over a long time horizon (e.g., hourly operations over a decadal horizon). The goal is to take operational constraints into account when making a capital investment decision. Thus, our model is complicated by the fact that second-stage variables include both binary (on/off) decisions and continuous variables that model operational settings. Examples include the design of cogeneration units for commercial buildings subject to operational conditions [2830, 33, 34] and transmission network expansion subject to unit commitment constraints [1, 23]. Models of this class can involve hundreds of thousands or even millions of binary variables and are beyond the scope of today’s state-of-the-art solvers.

We consider a two-stage MILP with m first-stage variables and three sets of N second-stage variables. The first-stage variables are \(y \in \mathbb {Z}_+^m\) (non-negative integers of dimension m). We have three classes of second-stage variables: (1) on/off decisions, \(x \in \{0,1\}^N\); (2) operational settings \(v \in \mathbb {R}^N\) that are switched on or off by x, that is, \( L x \le v \le U x\) for finite bounds \(L \le U\); and (3) other second-stage variables \(0 \le w \in \mathbb {R}^N\). The MILP model is described by

$$\begin{aligned} \begin{array}{ll} \displaystyle \mathop {\hbox {minimize}}_{y,x,v,w} &{} a^T y + b^T x + c^T v + d^T w \\ \hbox {subject to }&{} A y + B x + C v + D w \le f \\ &{} y \in \mathbb {Z}_+^m \\ &{} x \in \{ 0,1\}^N \\ &{} v \in \mathbb {R}^N, \; L x \le v \le U x \\ &{} w \in \mathbb {R}^N, \; w \ge 0 , \end{array} \end{aligned}$$
(1.1)

where \(a \in \mathbb {R}^m\), \(b,c,d \in \mathbb {R}^N\), \(A \in \mathbb {R}^{M \times m}\), and \(B, C, D \in \mathbb {R}^{M \times N}\). We assume that \(m \ll N\) and \(m \ll M\); that is, the number of first-stage variables is much smaller than the number of second-stage binary variables and the number of coupling constraints.

In addition to a large number of binary and continuous variables in the second stage, the challenges of model (1.1) also arise from coupling constraints. We do not assume any sparsity structure in matrices B, C, and D. This is in contrast to the arrow-type sparsity structure that arises, for example, in stochastic programs [6]. Therefore, fixing the first-stage variable y does not decompose (1.1) into scenario-based subproblems. Moreover, the coupling constraints in (1.1) are not amenable to decomposition methods such as Lagrangian relaxation [14, 18]. The reason is that the number of coupling constraints, M, is of the same order as the number of variables, N. Dualizing all coupling constraints results in a large number of dual variables; hence, Lagrangian relaxation of (1.1) is unlikely to yield efficient decomposition methods [14].

Problem (1.1) arises in several applications. In particular, our work is motivated by the cogeneration problem with renewable energy for commercial buildings [28, 30, 33, 34]. In this case, the first-stage design involves investment decisions for cogeneration units such as fuel cells, solar panels, and battery storage. The second-stage problem aims at optimal on/off hourly operation that takes into account technology specifics such as the minimum and maximum power generation. Our goal is to include operational constraints in the design of cogeneration.

One of our objectives is to find the optimal first-stage solution of (1.1). However, the two-stage MILP with a large number of binary variables at the second stage is beyond the scope of state-of-the-art commercial MIP solvers. For example, a typical cogeneration model with a ten-year horizon results in 1.05 million binary variables (3650 days \(\times \) 24 h \(\times \) 12 units). On the other hand, a naive approach that solves (1.1) with a short horizon at the second stage provides first-stage designs that are suboptimal for a long horizon problem. The reason is that short horizon problems do not take into account coupling constraints over a long horizon. Moreover, the problem data of short-horizon problems are not representative of long-horizon problems, resulting in suboptimal solutions.

We develop a two-level approach that coarsens the hourly on/off variables to daily operation profiles. Since the profile representation yields a model with many fewer variables, we refer to this step as the primal (variable) coarsening. The resulting semi-coarse model still contains the same number of constraints as in (1.1). We reduce the number of constraints by partitioning them into groups that are of the same size as the profiles and summing over each group. This aggregation of constraints results in a relaxed problem whose solutions may not be feasible for the original MILP model. We include the violated constraints, re-solve the coarsened MILP model, and repeat this process until all constraints are satisfied. We refer to the aggregation of constraints as the dual (constraint) coarsening and the resulting MILP as the coarse model.

1.1 Literature review

The idea of aggregating variables and constraints to build approximate optimization models is not new. Zipkin studies the effect of variable aggregation and row aggregation for linear programs in [40, 41]. This framework is extended to stochastic linear programs in [5, 7]. For integer programs, a large amount of work focuses on aggregating constraints into one or more surrogate constraints for which one can show that the solution of the original problem and that of the surrogate problem are identical. Early work includes [2, 17, 19, 20]; see also the survey paper [31].

We note that the theory of surrogate constraints focuses mainly on integer programs with non-negative integer coefficients. The seminal work in [25] results in an exponential grow of the coefficients in the surrogate constraints. Many refinements have been developed, including simultaneous and sequential aggregation schemes, to reduce the magnitude of coefficients [31]. For mixed-integer programs with real-valued coefficients, however, we are not aware of aggregation schemes that guarantee equivalence between the original problem and the surrogate problem. In our two-level approach, we employ constraint aggregation as a relaxation technique to identify a smaller set of constraints that are active at the optimal solution. This is achieved by solving a sequence of MILPs and adding the violated constraints until all constraints in the original MILP are satisfied. Furthermore, we take advantage of the LP warm-start to reduce the number of MILP re-solves and the computational time of each MILP.

A large body of literature appeared in the 1990s on bilevel or multilevel mixed-integer programs; see [10, 22, 26, 37, 38] and survey papers [35, 36]. One of the main theoretical emphases is on decentralized decision-making from a game-theoretic point of view [36]. Optimality conditions for convex bilevel programs have been established in [35]. A heuristic-based branch-and-bound method is developed in [22, 38], and tabu search method is introduced in [37]. Recent years have seen specific application-driven algorithms ranging from infrastructure protection planning [32] to vulnerability analysis of power grids [27].

In this context, the multilevel method is specifically connected to the problem formulation. In contrast, our two-level method concentrates on algorithmic development for mixed-integer linear programs. In particular, our two-level approach builds optimization models of different resolutions from a fine-level problem to a coarse-level problem. We develop a systematic procedure that coarsens binary and continuous variables in addition to the aggregation of constraints. Our approach allows us to solve large MILPs with more than one million binary and continuous variables and coupling constraints. Extensive numerical experiments have been conducted to verify the efficiency of the developed algorithm.

Our two-level approach is reminiscent of multigrid methods in linear algebra and solution of partial differential equations. In [16], a multilevel iterative method for generic optimization problem is discussed. In [21], a recursive trust-region method for nonlinear unconstrained problems is developed. In [39], a line search multigrid approach for nonlinear programs is proposed.

1.2 Our contributions

Our contributions are fourfold. First, we develop a systematic two-level approach for two-stage MILPs with a large number of binary and continuous variables in the second stage. The coarsening of binary variables is done by introducing profiles (vectors) with binary elements. By selecting one profile from a small number of candidate profiles, we reduce the number of binary variables by orders of magnitude. In addition, we reduce the number of continuous variables by using a convex combination of profiles with real elements. We show that the semi-coarse model with coarsening in variables results in a tightening of the original MILP and therefore provides an upper bound on the optimal solution.

Second, we propose a simple method for constraint aggregation that does not require any sparsity assumptions on the coefficient matrices. By partitioning constraints into groups and taking a convex combination of constraints in each group, we obtain a relaxation of the semi-coarse model with many fewer constraints. We solve a sequence of coarse MILPs by adding any violated constraints until all constraints are satisfied. We show that the LP-relaxation of the coarse MILPs can be used to warm-start the MILP re-solves and significantly reduce the computational effort.

Third, we apply our two-level approach to the design of cogeneration in buildings. The resulting complex MILP model has a large number of coupling constraints over a long time horizon. We develop a moving-horizon method to generate valid profiles and show by construction that the generated profiles satisfy coupling constraints in time, further reducing the number of constraints in the coarsened models.

Fourth, we verify the efficiency of our algorithm using a variety of examples generated from simulation programs for commercial buildings. While our two-level approach requires solving a sequence of coarse MILPs, numerical results indicate that the first iterate provides a good approximate solution of the semi-coarse models. Through extensive numerical experiments, we demonstrate the scalability of the two-level approach on a rich set of large problems that are beyond the capacity of state-of-the-art commercial solvers.

Outline. The remainder of this paper is organized as follows. In Sect. 2, we describe the two-level approach for the two-stage MILPs. We show that our approach results in a tightening of the original problem and provides an upper bound on the optimal solution. Under the assumption of periodic problem data and no coupling between time periods, our two-level approach provides the optimal solution. In Sect. 3, we apply our approach to a complex MILP model from the cogeneration problem in buildings. We discuss how the profiles are generated and selected such that the coarsened models provide feasible solutions to (1.1). In Sect. 4, we demonstrate the effectiveness of our approach on a diverse set of test problems. We show that the two-level approach allows us to find good approximate solutions with a fraction of computational time compared with solving the full model. In Sect. 5, we summarize our contribution and discuss future extensions.

2 Two-level approach to MILP

In this section, we derive two models that are formulated in terms of the first-stage variables and \(n \ll N\) second-stage variables. We start by presenting our ideas for variable coarsening, resulting in a semi-coarse model that has \(\mathcal {O}(n)\) second-stage variables and \(\mathcal {O}(N)\) second-stage constraints. Next we show how to further coarsen this model by aggregating constraints, resulting in a coarse model with both \(\mathcal {O}(n)\) second-stage variables and constraints.

2.1 Primal coarsening

To coarsen the variables in the model, we introduce a coarsening factor \(\delta \in \mathbb {Z}_+\) and group N second-stage variables into \(n = N/\delta \) equal-sized groups of size \(\delta \) (the assumption that the groups are equal-sized is not critical):

$$\begin{aligned} {\mathbf{x}_i = \left( \begin{array}{c}x_{\delta (i-1)+1} \\ \vdots \\ x_{\delta i} \end{array}\right) \quad \hbox {for } \; i=1, \ldots , n.} \end{aligned}$$
(2.1)

We define groups for v and w analogously. These groups correspond to a partition of the second-stage variables xv, and w:

$$\begin{aligned} x = \left( \begin{array}{c}\mathbf{x}_1 \\ \vdots \\ \mathbf{x}_n \end{array}\right) , \quad v = \left( \begin{array}{c}\mathbf{v}_1 \\ \vdots \\ \mathbf{v}_n \end{array}\right) , \quad w = \left( \begin{array}{c}\mathbf{w}_1 \\ \vdots \\ \mathbf{w}_n \end{array}\right) . \end{aligned}$$
(2.2)

Next, we introduce \(\delta \)-profiles, which are fixed parameter values for a group of variables:

$$\begin{aligned} {\bar{X}}_k \in \{0,1\}^{\delta } \; \hbox { for } \; k=1,\ldots ,K, \end{aligned}$$
(2.3)

where K is a positive integer. We show in Sect. 3 how these profiles are generated and selected. Figure 1 illustrates that \(N=48\) hourly on/off variables can be represented by \(n=2\) daily profiles with length \(\delta =24\). Now for every \({\bar{X}}_k\), we collect a set of \(I_k\) operational \(\delta \)-profiles for \({\bar{V}}\),

$$\begin{aligned} {\bar{V}}_{jk} \in \mathbb {R}^{\delta }, \hbox { with } \; L {\bar{X}}_k \le {\bar{V}}_{jk} \le U {\bar{X}}_k \; \hbox { for } \; j=1,\ldots , I_k, \; k=1,\ldots ,K, \end{aligned}$$
(2.4)

and a set of J operational \(\delta \)-profiles for \({\bar{W}}\),

$$\begin{aligned} 0 \le {\bar{W}}_j \in \mathbb {R}^{\delta } \; \hbox { for } \; j=1,\ldots , J, \end{aligned}$$
(2.5)

where J is a positive integer.

Fig. 1
figure 1

The first row shows the hourly on/off variables \(x_t \in \{0,1\}\) for 48 h. The second row shows the profile representation with two daily profiles \({\bar{X}}_k \in \{0,1\}^{24}\)

Given these sets of \(\delta \)-profiles, we perform a change of variables that replaces the second-stage variables (xvw) by a reduced set of variables \(({\bar{x}},{\bar{v}},{\bar{w}})\) for the \(\delta \)-profiles, resulting in a coarsening of the second-stage variables. In particular, we set

$$\begin{aligned} {\mathbf{x}_i = \sum _{k=1}^K {\bar{x}}_{ik} {\bar{X}}_k \; , \quad \sum _{k=1}^K {\bar{x}}_{ik} \le 1 \; , \quad {\bar{x}}_{ik} \in \{0,1\} \quad \hbox { for }\; i=1,\ldots , n, \; k=1,\ldots , K.} \end{aligned}$$
(2.6)

Thus, we have that \({\bar{x}}\in \{0,1\}^{n \times K}\), and our goal is to create models where \(n K \ll N\). Similarly, we write v and w as

$$\begin{aligned} \mathbf{v}_i= & {} \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} {\bar{V}}_{jk} \, , \quad \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} \le 1 \, , \quad \sum _{j=1}^{I_k} {\bar{v}}_{ijk} = {\bar{x}}_{ik} \, , \quad 0 \le {\bar{v}}_{ijk} \le 1 \, , \quad \nonumber \\&\hbox { for } i=1,\ldots , n, \end{aligned}$$
(2.7)

and

$$\begin{aligned} {\mathbf{w}_i = \sum _{j=1}^J {\bar{w}}_{ij} {\bar{W}}_j \; , \quad \sum _{j=1}^J {\bar{w}}_{ij} \le 1 \; , \quad 0 \le {\bar{w}}_{ij} \le 1 \; , \quad \hbox { for }\; i=1,\ldots , n.} \end{aligned}$$
(2.8)

We note that we do not enforce \({\bar{w}}_{ij} \in \{0,1\}\) or \({\bar{v}}_{ijk} \in \{0,1\}\). The reason is that we wish to allow more freedom for choosing operational profiles in the second stage as long as they remain feasible. This choice also simplifies the coarsened second-stage problem and provides a valid approach if we assume that convex combinations of operational \(\delta \)-profiles \({\bar{V}}_{jk}\) and \({\bar{W}}_j\) are feasible, which is often the case in practice.

The partition of variables (xvw) implies a partition of the problem matrices B, C, and D of (1.1) as

$$\begin{aligned} B = [ B_1 : \cdots : B_{n} ], \quad C = [ C_1 : \cdots : C_{n} ], \quad D = [ D_1 : \cdots : D_{n} ], \end{aligned}$$

where \(B_i\), \(C_i\), and \(D_i \in \mathbb {R}^{M \times \delta }\) for \(i=1,\ldots ,n\). Next, we define \(\delta \)-profile matrices as

$$\begin{aligned} {\bar{X}}= & {} [ {\bar{X}}_1 : \cdots : {\bar{X}}_K ], \quad {\bar{V}}= [ {\bar{V}}_{11} : \cdots : {\bar{V}}_{1I_1} : \ldots : {\bar{V}}_{K 1} : \cdots : {\bar{V}}_{K I_K} ], \quad \nonumber \\ {\bar{W}}= & {} [ {\bar{W}}_1 : \cdots : {\bar{W}}_J], \end{aligned}$$
(2.9)

where \({\bar{X}}_k\), \({\bar{V}}_{jk}\), and \({\bar{W}}_j\) are vectors of length \(\delta \). We define aggregated (coarse) matrices \(\bar{B}\), \(\bar{C}\), and \(\bar{D}\) as

$$\begin{aligned} \bar{B} = [ B_1 {\bar{X}}: \cdots : B_{n} {\bar{X}}], \quad \bar{C} = [ C_1 {\bar{V}}: \cdots : C_{n} {\bar{V}}], \quad \bar{D} = [ D_1 {\bar{W}}: \cdots : D_{n} {\bar{W}}] , \end{aligned}$$
(2.10)

where \(\bar{B} \in \mathbb {R}^{M \times nK}\), \(\bar{C} \in \mathbb {R}^{M \times n\mathcal {I}}\), and \(\bar{D} \in \mathbb {R}^{M \times n J}\), where \(\mathcal {I} = \sum _{k=1}^K I_k\). Similarly, we aggregate the coefficient vectors in the objective function to obtain cost vectors \(\bar{b}\), \(\bar{c}\), and \(\bar{d}\)

$$\begin{aligned} \bar{b} = [ b_1^T {\bar{X}}: \cdots : b_n^T {\bar{X}}]^T, \quad \bar{c} = [ c_1^T {\bar{V}}: \cdots : c_n^T {\bar{V}}]^T, \quad \bar{d} = [ d_1^T {\bar{W}}: \cdots : d_n^T {\bar{W}}]^T, \end{aligned}$$
(2.11)

where \(\bar{b} \in \mathbb {R}^{nK}\), \(\bar{c} \in \mathbb {R}^{n \mathcal {I}}\), and \(\bar{d} \in \mathbb {R}^{nJ}\).

We obtain the following aggregated MILP, which we refer to as the semi-coarse MILP, because it has been coarsened in the primal variables only:

$$\begin{aligned} \begin{array}{ll} \displaystyle \mathop {\hbox {minimize}}_{y,{\bar{x}},{\bar{v}},{\bar{w}}} &{} a^T y + \bar{b}^T {\bar{x}}+ \bar{c}^T {\bar{v}}+ \bar{d}^T {\bar{w}}\\ \hbox {subject to }&{} A y + \bar{B} {\bar{x}}+ \bar{C} {\bar{v}}+ \bar{D} {\bar{w}}\le f \\ &{} \displaystyle \sum _{k=1}^K {\bar{x}}_{ik} \le 1 \; , \quad {\bar{x}}_{ik} \in \{0,1\} \\ &{} \displaystyle \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} \le 1 \; , \quad \sum _{j=1}^{I_k} {\bar{v}}_{ijk} = {\bar{x}}_{ik} \, , \quad 0 \le {\bar{v}}_{ijk} \le 1 \\ &{} \displaystyle \sum _{j=1}^{J} {\bar{w}}_{ij} \le 1 \; , \quad 0 \le {\bar{w}}_{ij} \le 1. \end{array} \end{aligned}$$
(2.12)

The MILP (2.12) has potentially many fewer variables than (1.1) but contains as many constraints as the original problem. Before we show how to aggregate constraints in the next section, we finish this section by summarizing some of the properties of (2.12).

Proposition 1

Let \(({\bar{x}},{\bar{v}},{\bar{w}})\) be a feasible point of the semi-coarse model (2.12). Then it follows that the corresponding fine-scale variables

$$\begin{aligned} x = \left( \begin{array}{c}\displaystyle \sum _{k=1}^K {\bar{x}}_{1k} {\bar{X}}_k \\ \vdots \\ \displaystyle \sum _{k=1}^K {\bar{x}}_{nk} {\bar{X}}_k \end{array}\right) , \quad v = \left( \begin{array}{c}\displaystyle \sum _{k,j=1}^{K,I_k} {\bar{v}}_{1jk} {\bar{V}}_{jk} \\ \vdots \\ \displaystyle \sum _{k,j=1}^{K,I_k} {\bar{v}}_{njk} {\bar{V}}_{jk} \end{array}\right) , \quad w = \left( \begin{array}{c}\displaystyle \sum _{j=1}^J {\bar{w}}_{1j} {\bar{W}}_j \\ \vdots \\ \displaystyle \sum _{j=1}^J {\bar{w}}_{nj} {\bar{W}}_j \end{array}\right) \end{aligned}$$
(2.13)

are feasible in the original MILP (1.1).

Proof

The binary constraint \(x \in \{0,1\}^N\) follows from the representation (2.6) and the definition of binary vectors \({\bar{X}}_k\). Similarly, the non-negativity of w is a direct consequence of non-negative profiles \({\bar{W}}_j\) and non-negative coefficients \({\bar{w}}_{ij}\). To show that v is feasible, we start with the definition of \({\bar{V}}_{jk}\) in (2.4):

$$\begin{aligned} L \cdot {\bar{X}}_k \le {\bar{V}}_{jk} \le U \cdot {\bar{X}}_k. \end{aligned}$$

Since \({\bar{v}}_{ijk} \ge 0\), it follows that

$$\begin{aligned} L \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} {\bar{X}}_k \le \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} {\bar{V}}_{jk} \le U \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} {\bar{X}}_k. \end{aligned}$$

Using \({\bar{x}}_{ik}=\sum _{j=1}^{I_k} {\bar{v}}_{ijk}\), we have

$$\begin{aligned} L \sum _{k=1}^K {\bar{x}}_{ik} {\bar{X}}_k \le v_i \le U \sum _{k=1}^K {\bar{x}}_{ik} {\bar{X}}_k, \quad \hbox { for } i=1,\ldots ,n. \end{aligned}$$

From the definition of \(x_i\) in (2.13), it follows that \(L x_i \le v_i \le U x_i\), for \(i=1,\ldots ,n\), and thus \(L x \le v \le U x\). The proof is complete by noting that

$$\begin{aligned} f \ge Ay + \bar{B} {\bar{x}}+ \bar{C} {\bar{v}}+ \bar{D} {\bar{w}}= Ay + Bx + Cv + Dw, \end{aligned}$$

where the equality follows from the definition of aggregated matrices (2.10) and the representation of the fine-scale variables (2.13). \(\square \)

Next, we show that the semi-coarse model provides an upper bound.

Proposition 2

The semi-coarse model (2.12) is a tightening of the original MILP (1.1), and its solution provides an upper bound on (1.1). The two problems are equivalent if all optimal profiles from the solution of (1.1) are included in (2.12).

Proof

Let \(z^\star _\mathrm{MILP}\) and \(z^\star _\mathrm{semi}\) be the optimal value of the original MILP (1.1) and the semi-coarse model (2.12), respectively. Since any feasible point of the semi-coarse model (2.12) is a feasible point of the original MILP (1.1), it follows that (2.12) is a tightening of (1.1) and thus

$$\begin{aligned} z^\star _\mathrm{MILP}\le z^\star _\mathrm{semi} \, . \end{aligned}$$

Let \((x^\star ,v^\star ,w^\star )\) be the optimal solution of MILP (1.1). We now extract optimal profiles \(({\bar{X}}^\star ,{\bar{V}}^\star ,{\bar{W}}^\star )\) corresponding to \((x^\star ,v^\star ,w^\star )\). Then it follows that there exists a solution of (2.12) such that

$$\begin{aligned} \mathbf{x}_i^\star = \sum _{k=1}^K {\bar{x}}^\star _{ik} {\bar{X}}_k^\star , \quad \mathbf{v}_i^\star = \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}^\star _{ijk} {\bar{V}}_{jk}^\star , \quad \mathbf{w}_i^\star = \sum _{j=1}^J {\bar{w}}^\star _{ij} {\bar{W}}_j^\star , \quad \hbox { for } i=1,\ldots ,n. \end{aligned}$$

Moreover, the solution \(({\bar{x}}^\star ,{\bar{v}}^\star ,{\bar{w}}^\star )\) is feasible in (2.12), and the objective value is the same as the objective value of \((x^\star ,v^\star ,w^\star )\). Because (2.12) is a tightening of (1.1), there cannot be a better solution than \(({\bar{x}}^\star ,{\bar{v}}^\star ,{\bar{w}}^\star )\) that we constructed. Hence, \(({\bar{x}}^\star ,{\bar{v}}^\star ,{\bar{w}}^\star )\) is the optimal solution, and both problems (1.1) and (2.12) are equivalent. \(\square \)

2.2 Dual coarsening

We next coarsen the constraints by partitioning them into n groups of size \(\delta \) and summing over each group. We define the rows of A as \(A_i^T\), and we define a set of aggregated constraints by taking convex combination over the rows of A within each group

$$\begin{aligned} \hat{A} :=\left( \begin{array}{c}\displaystyle \sum _{i=1}^{\delta } A_{i}^T \lambda _i \\ \displaystyle \sum _{i=1}^{\delta } A_{\delta + i}^T \lambda _{\delta +i} \\ \vdots \\ \displaystyle \sum _{i=1}^{\delta } A_{\delta (n-1) + i}^T \lambda _{\delta (n-1)+i} \end{array}\right) , \end{aligned}$$

where the non-negative weights \(\lambda \)’s satisfy

$$\begin{aligned} \lambda _{\delta (k-1)+i} \in [0,1], \quad \sum _{i=1}^\delta \lambda _{\delta (k-1)+i} = 1, \quad \hbox { for } k=1,\ldots ,n. \end{aligned}$$

We aggregate the constraint matrices \(\bar{B}\), \(\bar{C}\), and \(\bar{D}\) in a similar way:

$$\begin{aligned} \hat{\bar{B}}:= & {} \left( \begin{array}{c}\displaystyle \sum _{i=1}^{\delta } \bar{B}_{i}^T \lambda _i\\ \displaystyle \sum _{i=1}^{\delta } \bar{B}_{\delta + i}^T \lambda _{\delta +i}\\ \vdots \\ \displaystyle \sum _{i=1}^{\delta } \bar{B}_{\delta (n-1) + i}^T \lambda _{\delta (n-1)+i} \end{array}\right) , \quad \hat{\bar{C}} :=\left( \begin{array}{c}\displaystyle \sum _{i=1}^{\delta } \bar{C}_{i}^T \lambda _i\\ \displaystyle \sum _{i=1}^{\delta } \bar{C}_{\delta + i}^T \lambda _{\delta +i}\\ \vdots \\ \displaystyle \sum _{i=1}^{\delta } \bar{C}_{\delta (n-1) + i}^T \lambda _{\delta (n-1)+i} \end{array}\right) , \quad \\ \hat{\bar{D}}:= & {} \left( \begin{array}{c}\displaystyle \sum _{i=1}^{\delta } \bar{D}_{i}^T \lambda _i\\ \displaystyle \sum _{i=1}^{\delta } \bar{D}_{\delta + i}^T \lambda _{\delta +i}\\ \vdots \\ \displaystyle \sum _{i=1}^{\delta } \bar{D}_{\delta (n-1) + i}^T \lambda _{\delta (n-1)+i} \end{array}\right) . \end{aligned}$$

Note that we use “bar” notation to denote coarsening in variables and “hat” notation to denote coarsening in constraints. We coarsen the right-hand side f analogously and obtain the coarse MILP:

$$\begin{aligned} \begin{array}{ll} \displaystyle \mathop {\hbox {minimize}}_{y,{\bar{x}},{\bar{v}},{\bar{w}}} &{} a^T y + \bar{b}^T {\bar{x}}+ \bar{c}^T {\bar{v}}+ \bar{d}^T {\bar{w}}\\ \hbox {subject to }&{} \hat{A} y + \hat{\bar{B}} {\bar{x}}+ \hat{\bar{C}} {\bar{v}}+ \hat{\bar{D}} {\bar{w}}\le \hat{f} \\ &{} \displaystyle \sum _{k=1}^K {\bar{x}}_{ik} \le 1 \; , \quad {\bar{x}}_{ik} \in \{0,1\} \\ &{} \displaystyle \sum _{k=1}^K \sum _{j=1}^{I_k} {\bar{v}}_{ijk} \le 1 \; , \quad \sum _{j=1}^{I_k} {\bar{v}}_{ijk} = {\bar{x}}_{ik} \, , \quad 0 \le {\bar{v}}_{ijk} \le 1 \\ &{} \displaystyle \sum _{j=1}^{J} {\bar{w}}_{ij} \le 1 \; , \quad 0 \le {\bar{w}}_{ij} \le 1. \end{array} \end{aligned}$$
(2.14)

It follows easily that (2.14) is a relaxation of (2.12), because we have simply aggregated the constraints. We can use this fact to develop a simple algorithm that solves (2.12) by solving a sequence of tighter relaxations. The main idea is that after solving a relaxation (2.14) we can check whether all constraints in (2.12) are satisfied and add any violated constraints to (2.14). It follows easily that this algorithm is finite, because after finitely many constraints have been added to (2.14), it is equivalent to (2.12).

In practice, however, solving a sequence of MILPs (2.14) may not be efficient, because MILPs do not warm-start. Instead, we solve the LP-relaxation of the coarse model (2.14) and add violated constraints until all constraints in the LP-relaxation of the semi-coarse model (2.12) are satisfied. We use the identified constraints as the initial set of constraints for the MILP coarse model iterations. We summarize this procedure in Algorithm 1.

figure a

Proposition 3

When Algorithm 1 terminates, the solution \((y^k,{\bar{x}}^k, {\bar{v}}^k, {\bar{w}}^k)\) of the coarse model (2.14) with the set of added constraints \(\mathcal {A}^{k}_\mathrm{MILP}\) is the solution of the semi-coarse model (2.12).

Proof

Since \((y^k,{\bar{x}}^k, {\bar{v}}^k, {\bar{w}}^k)\) minimizes the objective function of the semi-coarse model (2.12) and satisfies all constraints (2.12), it is the optimal solution of (2.12).

\(\square \)

We show in Sect. 4 that our approach in Algorithm 1 is advantageous; in particular, it significantly reduces the number of MILP re-solves, as opposed to the case without LP-relaxation as warm-start.

In Appendix 1, we show that if the problem data is periodic with period \(\delta \) and if there exists no coupling between periods, then our proposed approach converges in a single iteration. This observation motivates the use of our approach on problems that are nearly periodic.

3 Application to cogeneration for buildings

In this section, we apply our two-level approach to the cogeneration problem for buildings. Our MILP model (7.1) is adapted from models for cogeneration in commercial buildings [28, 30]. In particular, we take linearized models for fuel cells and water tank storage from [28], and we penalize on/off operations using switching cost as done in [30]. While the two-level framework described in Sect. 2 applies to generic MILPs (1.1), the MILP model (7.1) for cogeneration entails several complex constraints as discussed in Sect. 3.2. As a result, additional work is required to construct appropriate semi-coarse and coarse models.

We note that our MILP model (7.1) for the cogeneration problem has the following features that are not included in existing models. First, our model has a large number of binary variables, on the order of \(\mathcal {O}(10^6)\), in the second stage. This is orders of magnitude larger than the number of binary variables in [28, 30, 33, 34]. This modeling feature allows considerably more degrees of freedom for on/off operation during the life time of new technologies. Second, our model contains three sets of coupling constraints that (i) couple first- and second-stage binary variables, (ii) couple second-stage variables for different technologies, and (iii) couple second-stage variables over a long time horizon (e.g., 10–20 years). The number of coupling constraints is on the same order of variables, namely, \(\mathcal {O}(10^6)\). This makes the problem significantly harder because it does not lend itself to decomposition techniques such as Lagrangian relaxation.

In what follows, we describe the main characteristics of the MILP model in Sect. 3.1, derive the semi-coarse and coarse models in Sects. 3.2 and 3.3, and discuss profile generation and selection in Sect. 3.4.

3.1 MILP model

The cogeneration problem consists of two components: the investment decision and the operation planning. The investment decision concerns what new technologies to purchase, while the operation planning concerns how to dispatch units over a long-term period (e.g., 10–20 years). We formulate a two-stage MILP, where the first-stage variables model investment decisions and the second-stage variables model equipment operations. Note that both the first and second stages contain integer variables. In particular, on/off operations for new technologies in second-stage are made on an hourly basis.

Following [28, 30, 33, 34], we consider five technologies: batteries, boilers, solid-oxide fuel cells (SOFCs), combined heat and power (CHP) SOFCs, and water tank storage. The selection criterion is based on the installation, operation, maintenance, fuel consumption, and carbon emission costs, while meeting electricity and heating demands. The detailed MILP model is given in (7.1).

3.2 Semi-coarse model

We begin by introducing the profiles for variables. We denote profiles by upper-case letters with the bar notation on top: \(\bar{X}_{jk} \in \mathcal {P}_o\) are the on/off profiles for technology j, \(\bar{P}_{jkl} \in \mathcal {P}_p\) are the production profiles associated with \(\bar{X}_{jk}\), \(\bar{U}_k \in \mathcal {P}_u\) are the profiles for power purchased from the utility, \(\bar{Q}_k \in \mathcal {P}_q\) are the heat generation profiles from boilers, \(\bar{B}_k, \bar{B}_k^\mathrm{IO} \in \mathcal {P}_b\) are the power storage and input/output profiles for batteries, and \(\bar{S}_k, \bar{S}_k^\mathrm{out} \in \mathcal {P}_s\) are the heat storage and output profiles for water tanks, where \(\mathcal {P}_o, \mathcal {P}_p, \mathcal {P}_u, \mathcal {P}_q, \mathcal {P}_b\), and \(\mathcal {P}_s\) are the corresponding set of profiles. We denote the hth element of a profile by \(\bar{X}_{jk}(h)\) for \(h \in \{1,\ldots ,\delta \}\). Time-dependent parameters \(D_t^P\) and \(D_t^Q\) and discount factors \(Y_t\) are concatenated into profiles \(\bar{D}_d^P,\bar{D}_d^Q\), and \(\bar{Y}_d \in \mathbb {R}^{\delta }\).

As described in Sect. 2, the variables in the original MILP model are represented by using profiles in the coarsened models. In particular, binary variables \(x_{ijt}\) are coarsened to profiles \(\bar{X}_{jk}\), and continuous variables \(p_{ijt},u_t,b_t,s_t,s_t^\mathrm{out}\), and \(q_{t}\) are coarsened to profiles \(\bar{P}_{jkl},\bar{U}_k,\bar{B}_k,\bar{S}_k\), and \(\bar{S}_k^\mathrm{out},\bar{Q}_k \ge 0\), respectively. The coefficients for profiles are denoted by lower-case letters with a bar on top; for example, \(\bar{x}_{ijdk} \in \{0,1\}\) indicates whether profile k is selected on day d for unit i of technology j. As modeled in (9.1d), no more than one on/off profile can be selected for fixed ij, and d. This, in conjunction with binary elements of \(\bar{X}_{jk}\), results in binary-valued \(x_{ijt}\). Similarly, non-negative profiles \(\bar{P}_{jkl},\bar{U}_k,\bar{B}_k,\bar{S}_k,\bar{S}_k^\mathrm{out}\), and \(\bar{Q}_k \ge 0\) and their non-negative coefficients in (9.1g), (9.1q), and (9.1r) result in non-negative \(p_{ijt},u_t,b_t,s_t,s_t^\mathrm{out}\), and \(q_{t}\).

We next turn to constraints that do not couple variables in time, namely, (7.1c), (7.1e), (7.1f), (7.1h), (7.1k), (7.1l), (7.1o), and (7.1p). The profile representation for these constraints is straightforward; one simply substitutes the summation of profiles as shown in (9.1c), (9.1e), (9.1f), (9.1h), (9.1k), (9.1l), (9.1o), and (9.1p), respectively. Note that all inequalities in (9.1) are elementwise inequalities. We do not include the production constraint (7.1d) in the semi-coarse model. Instead, we require that the production profile \(\bar{P}_{jkl}\) associated with the on/off profile \(\bar{X}_{jk}\) satisfy

$$\begin{aligned} R_j^\mathrm{min} \bar{X}_{jk} \le \bar{P}_{jkl} \le R_j^\mathrm{max} \bar{X}_{jk}. \end{aligned}$$
(3.1)

This approach is justified because (7.1d) follows by construction due to the convex combination of coefficients for production profiles in (9.1g). By an analogous argument, \(s_t^\mathrm{out} \le s_{t}\) in (7.1o) is a direct consequence of the following condition imposed on profiles:

$$\begin{aligned} \bar{S}_k^\mathrm{out} \le \bar{S}_k. \end{aligned}$$
(3.2)

We next discuss how to coarsen variables that are coupled over time periods. The boundary conditions (7.1j) and (7.1n) can be expressed as (9.1j) and (9.1n). The maximum power purchased constraint (7.1h) can be rewritten as (9.1h), where \(\mathbf{1}\) denotes the vector of all ones with length \(\delta \). We now turn to switching and storage constraints (7.1g), (7.1i), and (7.1m), whose profile representation needs additional notation. Given an on/off profile \(\bar{X}_{jk} \in \{0,1\}^\delta \), we can construct the associated switching profile

$$\begin{aligned} \bar{W}_{jk}(h) = |\bar{X}_{jk}(h+1) - \bar{X}_{jk}(h)|. \end{aligned}$$
(3.3)

Then the switching cost can be included in the objective function in (9.1). To deal with the battery storage constraint (7.1i) that couples \(b_t\) and \(b_t^\mathrm{IO}\) over the horizon T, we require that the pair of profiles \((\bar{B}_k\), \(\bar{B}_k^\mathrm{IO})\) satisfy

$$\begin{aligned} \bar{B}_k(h+1) = (1-L^P) \bar{B}_k(h) + \bar{B}_k^\mathrm{IO}(h), \quad h=1,\ldots ,\delta -1. \end{aligned}$$
(3.4)

We assign the same coefficient \(\bar{b}_{dk}\) to both sets of profiles \(\bar{B}_k\) and \(\bar{B}_k^\mathrm{IO}\) throughout (9.1). It follows that (7.1i) is satisfied except for hours between profiles, namely, \(t = d \delta \) for \(d < |\mathcal {D}|\). Therefore, we introduce constraint (9.1i) to guarantee that (7.1i) holds for \(t = d \delta \) for \(d < |\mathcal {D}|\). We will show in Sect. 3.4 how to generate profiles that satisfy (3.1), (3.2), and (3.4). The heat storage constraint (7.1m) for \(j=\mathrm{chp}\) can be expressed as

$$\begin{aligned}&\sum _{k \in \mathcal {P}_s} \bar{s}_{dk} \left\{ \big [ S_\delta - (1-L^Q)I_\delta \big ] \bar{S}_{k} + \bar{S}^\mathrm{out}_{k} \right\} \\&\quad \quad \quad + \sum _{k \in \mathcal {P}_s} \bar{s}_{(d+1)k} E_\delta \bar{S}_{k} \le (E_{j}^Q / E_{j}^P) \sum _{i,k,l} \bar{p}_{ijdkl} \bar{P}_{jkl}, ~ d < |\mathcal {D}|, \end{aligned}$$

where \(I_\delta \in \mathbb {R}^{\delta \times \delta }\) is the identity matrix, \(E_{\delta 1} \in \mathbb {R}^{\delta \times \delta }\) is a matrix with 1 in the \((\delta ,1)\) entry, and \(S_\delta \in \mathbb {R}^{\delta \times \delta }\) is a Toeplitz matrix with only nonzero elements being 1 at the first upper-subdiagonal.

3.3 Coarse model

We next turn to the coarse model. Note that constraints in the semi-coarse model (9.1) are elementwise equalities or inequalities involving daily profiles. We aggregate hourly constraints by taking a weighted sum of the elements of daily profiles. Using the hat notation on the top to denote the elementwise summation of profiles, we coarsen \(\delta \) hourly constraints into a single constraint. For example, the hourly demand constraint (9.1c) is replaced by the daily demand constraint (10.1c). Similarly, the maximum hourly power demand (9.1h) is aggregated into the maximum daily demand (10.1h). The symmetric breaking constraint (10.1f) has the interpretation that the number of times unit \(i+1\) is turned on is no greater than the number of times unit i is turned on in day d. The capacity constraints for technologies (10.1e), (10.1k), (10.1o), and (10.1p) imply that the daily power/heat output is bounded by the daily capacity of the technology units purchased at the first stage. Since (9.1i), (9.1j), and (9.1n) are scalar constraints themselves, no aggregation is applied to (10.1i), (10.1j), and (10.1n) in the coarse model. Also, since the coefficients of profiles are the same for both the semi-coarse and coarse models, (10.1d), (10.1g), and (10.1q) stay the same as (9.1d), (9.1g), and (9.1r) in the semi-coarse model.

3.4 Profile generation and selection

Recall that the on/off profiles \(\bar{X}_{jk}\) must have binary elements and the profiles \(\bar{P}_{jkl}\), \(\bar{Q}_k\), \(\bar{U}_k\), \(\bar{B}_k\), \(\bar{S}_k\), and \(\bar{S}^\mathrm{out}_k\) must have non-negative elements. In addition, we impose constraints (3.1), (3.2), and (3.4) in formulating the semi-coarse model in Sect. 3.2. In this section, we discuss how to generate valid profiles, and we provide suggestions for profile selections.

One approach to generating profiles that satisfy the above constraints is as follows. We solve a number of small instances of the original MILP (7.1), take snapshots of the second-stage solutions, and extract profiles from these snapshots. For example, consider a four-day MILP (7.1); that is, the time horizon in the second-stage problem has only four days. Solving such a four-day model, we have four snapshots of daily operation and production; in particular, we have four sets of on/off profiles for \(x_{ijt} \in \{0,1\}\) and four sets of profiles \(p_{ijt}\), \(u_t\), \(b_t\), \(s_t\), \(s_t^\mathrm{out}\), and \(q_{jt} \ge 0\). We will show in Proposition 1 that these are valid profiles for the semi-coarse model (9.1).

The remaining question is how to choose the short-horizon MILPs. Our objective is to generate a rich set of profiles that are representative of the optimal solutions for a long-horizon MILP (7.1). In what follows, we develop a moving-horizon approach. Given a long-horizon MILP, the idea is to solve MILPs over a short window, roll the window forward, and re-solve the new MILP until the window reaches the end of the horizon; see Fig. 2 for an illustration. We summarize this approach and provide additional details in Algorithm 2.

Fig. 2
figure 2

Illustration of the moving horizon method in Algorithm 2

figure b

Lemma 1

The moving-horizon method described in Algorithm 2 generates non-negative profiles \(\{ \bar{P}_{jkl}\), \(\bar{Q}_k\), \(\bar{U}_k\), \(\bar{B}_k\), \(\bar{S}_k\), \(\bar{S}^\mathrm{out}_k \}\) \(\in \mathbb {R}_+^\delta \) and binary profiles \(\{\bar{X}_{jk}, \bar{W}_{jk}\} \in \{0,1\}^\delta \). Moreover, the profile pairs \(\{\bar{X}_{jk},\bar{P}_{jkl}\}\), \(\{\bar{S}_k^\mathrm{out}, \bar{S}_k\}\), and \(\{\bar{B}_k, \bar{B}_k^\mathrm{IO}\}\) satisfy (3.1), (3.2), and (3.4), respectively.

Proof

Non-negativity of the profiles \(\{\bar{P}_{jkl}\), \(\bar{Q}_k\), \(\bar{U}_k\), \(\bar{B}_k\), \(\bar{S}_k\), \(\bar{S}^\mathrm{out}_k\}\) follows from the fact that the second-stage solutions \(\{p_{ijt}\), \(q_{t}\), \(u_t\), \(b_t\), \(s_t\), \(s_t^\mathrm{out}\}\) of the original MILP model (7.1) are non-negative. Similarly, \(\bar{X}_{jk} \in \{0,1\}^\delta \) follows from \(x_{ijt} \in \{0,1\}\); and \(\bar{W}_{jk}\), constructed from (3.3), is elementwise binary. Since the production variable \(p_{ijt}\) and the on/off variable \(x_{ijt}\) satisfy (7.1d), it follows that \(\{\bar{X}_{jk},\bar{P}_{jkl}\}\) satisfies (3.1). Since the power storage \(b_t\) and power input/output \(b_t^\mathrm{IO}\) satisfy (7.1i) and since the heat storage \(s_t\) and heat output \(s_t^\mathrm{out}\) satisfy (7.1m), we conclude that (3.2) and (3.4) follow by construction. \(\square \)

Algorithm 2 potentially generates a huge number of profiles; hence, solving (9.1) and (10.1) with all generated profiles may be prohibitive. Instead, we select on/off profiles \(\bar{X}_{jk}\) that appear most frequently in the generated profile pool. Since the aim of the two-level approach is to reduce the problem size, it is desired that the number of on/off profiles k is much smaller than the length of profiles \(\delta \) (e.g., \(k \approx \delta /10\)). For the production profiles \(\bar{P}_{jkl}\) associated with each \(\bar{X}_{jk}\), we choose production profiles that have the minimum or maximum total production \(\sum _{h=1}^\delta \bar{P}_{jkl}(h)\). These extreme profiles provide an envelope of other profiles; thus, their convex combination (9.1g) provides a good range of profiles for selection. Similarly, profiles with the minimum and maximum sum of absolute values are chosen for the battery storage \(\bar{B}_k\), the heat storage \(\bar{S}_k\), the battery input/output \(\bar{B}^\mathrm{IO}_k\), and the heat output \(\bar{S}_k^\mathrm{out}\). Further, profiles for the power purchased from the utility \(\bar{U}_k\) and the heat output from the boiler \(\bar{Q}_k\) are uniformly sampled over periods of the entire horizon.

Alternatively, we also employ a k-means clustering algorithm in order to cluster profiles. Given a prespecified k number of clusters, this algorithm assigns profiles to one of k clusters defined by the centroids [24]. Since demand and pricing data for buildings tend to differ significantly in winters and summers, we set \(k=2\) and choose one profile that has the minimum least-squares distance to the centroids. In our numerical experiments in Sect. 4, we apply a k-means algorithm to the boiler output \(\bar{Q}_k\), power purchased from the utility \(\bar{U}_k\), heat storage output \(\bar{S}_k^\mathrm{out}\), and battery power output \(\bar{B}^\mathrm{IO}_k\) profiles. The clustering approach achieves better performance in the objective function than does simple sampling heuristics for profile selection.

4 Numerical results and case studies

In this section, we illustrate the effectiveness of our two-level approach using five different building examples. We demonstrate that both the semi-coarse and the coarse models allow us to find good approximate solutions in a fraction of the time compared with the full MILP model. The two-level approach also scales to large problems that are beyond the scope of state-of-the-art commercial MILP solvers.

4.1 Generation of problem instances

We use EnergyPlus 8.4 [8, 9] to generate yearly electricity and heating demands for five types of buildings, namely, a secondary school, a supermarket, a hospital, a stand-alone retail, and a full-service restaurant, located in Chicago, Illinois. Figure 3 shows the electricity demand of the secondary school and the restaurant in a year and in November. Note that the demand data shows cyclic structure in days, weeks, and months, which is a desired problem characteristic for our two-level approach as discussed in Appendix 1. To generate multiyear demands, we take 1-year demand and perturb with a zero mean unit variance Gaussian noise whose magnitude is 2 % proportional to the magnitude of demands. We follow the pricing structure in [34]: The electricity price is \(\$0.12\) per kWh and the gas price is \(\$0.049\) per kWh. The peak demand charge is \(\$14.2\) per kW for summer months (from June to September) and \(\$11.36\) per kW for winter months (from October to May). For multiyear pricing, we follow the history data from U.S. Energy Information Administration [12] and increase the electricity price 3 % annually.

Fig. 3
figure 3

Electricity demand generated by EnergyPlus for a secondary school and a full-service restaurant located in Chicago. a Electricity demand for a secondary school in Chicago in a year. b Electricity demand for a secondary school in Chicago in November. The school is closed on Thanksgiving Day. c Electricity demand for a full-service restaurant in Chicago in a year. d Electricity demand for a full-service restaurant in Chicago in November

4.2 Numerical experiment setup

Numerical experiments are performed on a workstation with 32 GB memory and two Intel E5430 Xeon 4-core 2.66 GHz CPUs. We implement our algorithms in AMPL [15] to take advantage of AMPL’s compact modeling syntax in profile generation, storage, and management. We use CPLEX version 12.6.1.0 as the MIP solver in AMPL. We set a 3-h time limit and \(1\,\%\) relative gap as the stopping criteria for CPLEX. Our AMPL codes are made publicly available at http://www.mcs.anl.gov/~fulin/codes/DistrGen.zip.

4.3 Solutions of the full model with short second-stage horizons

Table 1 shows the problem size of the full MILP model (7.1) as a function of the number of days in the second stage. The number of binary variables, continuous variables, and constraints grows linearly with the number of days in the second-stage problem. The one-year model has \(1.05 \times 10^5\) binary variables, \(2.71 \times 10^5\) continuous variables, and \(6.11 \times 10^5\) constraints.

Table 1 Problem size of the MILP model (7.1). The number of binary variables, continuous variables, and constraints grows linearly with the number of days
Table 2 Solutions of the MILP model (7.1) for Restaurant using CPLEX

Table 2 shows the solutions of small problems for Restaurant using CPLEX. The solution time in seconds, the number of nodes, and the total number of simplex iterations for the branch-and-cut algorithm grow exponentially with the number of days in the second stage. The first-stage solutions for batteries (Bat), boilers (Boil), CHP-SOFC (CHP), Power SOFC (Pow), and water tank storage (Stor) vary with the problem size. Solving the 84-day example is beyond the capabilities of CPLEX on the designated workstation. After reaching the 3-h time limit, the relative MIP gap for the 84-day model is still greater than \(2.49\,\%\). Removing the time limit does not help because then the branch-and-bound tree generated by CPLEX will consume all allowable disk space of 100 GB on the workstation. Our experience indicates that the full MILP model (7.1) probably can not be solved over a ten-year time horizon. Similar observations can be made for other building examples. Figure 4 shows that for all five buildings, both the number of simplex iterations and the amount of solution time increase exponentially with the number of days in the second stage.

Fig. 4
figure 4

Exponential increase of the amount of time and the number of simplex iterations with the number of days in the second stage of the full MILP model (7.1). The number of simplex iterations drops for the 84-day restaurant example because CPLEX reaches the 3-h time limit

Fig. 5
figure 5

Problem size for the full MILP model (7.1), the semi-coarse model (9.1), and the coarse model (10.1)

We point out that different buildings show different levels of difficulty for the MILP model (7.1). As we see in Fig. 4, the five buildings differ by orders of magnitude in solution time and the number of simplex iterations. Similarly, the number of nodes in the branch-and-bound trees varies significantly over different buildings. For example, the 28-day model requires 35,836 nodes for Restaurant as opposed to 360 nodes for School. Additional numerical results are summarized in Table 7 in Appendix 3.

4.4 Solutions of semi-coarse model with long second-stage horizons

Figure 5 compares the problem size for the full model (7.1), the semi-coarse model (9.1), and the coarse model (10.1) over a 10-year horizon in the second stage. We see a roughly 8 times reduction in the number of binary variables and 12 times reduction in the number of continuous variables. The reason is that for each day with 24 h, we pick the three most frequent on/off profiles for binary variables and two cluster centers from the k-means clustering algorithm for continuous variables. While increasing the number of profiles improves the quality of the coarsened models (9.1) and (10.1), the resulting computational effort increases significantly. For 10-year horizon problems, we find that the two-level approach strikes a good balance between solution quality and computational time with a small number of profiles (typically 2 - 3). Note the large reduction in the number of constraints from the full model to the semi-coarse model. This is mainly because we have embedded the min/max production constraints (7.1d) and the switching constraints (7.1g) in the profile generation; see Sect. 3.2. In particular, for a 10-year model with 12 fuel cell units, there is a reduction of \(12\times 87600 \times 4 \approx 4.2 \times 10^6\) constraints (i.e., a \(67\,\%\) reduction in the number of constraints.)

Table 3 shows the solution information for the semi-coarse model over a 10-year horizon. As opposed to the exponential increase for the full model, the solution time, the number of nodes, and the number of simplex iterations for the semi-coarse model grow more slowly.

Figure 6 shows that both the number of simplex iterations and the amount of solution time are more scalable for the semi-coarse models. For example, the number of simplex iterations for 10-year semi-coarse models is on the same order as that of the 84-day full model. The growth of computational effort is considerably slower than that of the full model shown in Fig. 4. The number of simplex iterations drops for 6-, 8-, and 10-year problems for SuperMarket and 8- and 10-year problems for Retail and Restaurant because CPLEX reaches the 3-h time limit. The numerical results for the semi-coarse model are summarized in Table 8 in Appendix 3.

Table 3 Solution of the semi-coarse model (9.1) for restaurant
Fig. 6
figure 6

Solution time and number of simplex iterations for the semi-coarse model over a 10-year horizon

4.5 Solutions of the coarse model with long second-stage horizons

As shown in Fig. 5, the semi-coarse and coarse models have the same number of binary and continuous variables, and the semi-coarse model has about twice as many constraints as the coarse model. Therefore, one expects that both models can be solved with a similar amount of computational effort in terms of time and simplex iterations. Since we solve a sequence of coarse models, the total computational effort often exceeds that of solving the semi-coarse model. Because of the LP warm-start phase in Algorithm 1, however, we find that the first iterate of the coarse MILPs provides remarkably good approximate solutions of the semi-coarse model.

To illustrate this point, we show in Table 4 the solution history of the MILP phase in Algorithm 1. The MILP phase converges in three coarse MILP iterates, and the first iterate requires much more computational effort than other iterates; in particular, it accounts for \(91.5\,\%\) of total computational time. The relative gap of the objective value

$$\begin{aligned} \mu = (\mathrm{ObjVal}_\mathrm{semi} - \mathrm{ObjVal}_\mathrm{coar})/\mathrm{ObjVal}_\mathrm{semi} \end{aligned}$$
(4.1)

between the semi-coarse model and the coarse model at the first iterate is less than \(0.72\,\%\). Numerical results for the first iterate of coarse MILPs are summarized in Table 9 in Appendix 3.

Table 4 Problem size of MILP iterates for a 6-year model of Restaurant. The first iterate accounts for \(91.5\,\%\) of computational time, and the relative gap between the objective value defined in (4.1) is less than \(0.72\,\%\)

4.6 Effect of the LP warm-start

Figure 7 shows the number of constraints and the computational time during the LP warm-start phase. A large number of constraints are added in the first few (3–5) iterates, resulting in more computation time at the beginning of the algorithm. As the LP warm-start phase progresses, fewer constraints are violated, and each LP-relaxation becomes easier to solve because of the warm-start. The amount of time in the LP warm-start phase is a fraction of that with one MILP re-solve in the MILP phase.

We next compare the MILP phase of Algorithm 1 with and without the LP warm-start. Figure 8 shows the number of MILP re-solves and the computation time for both cases. With the LP warm-start, Algorithm 1 takes 5 MILP re-solves, as opposed to 15 MILP re-solves without the LP warm-start. Moreover, the MILP re-solves with LP warm-start is up to 20 times faster than MILPs without the LP warm-start. We note that the improvement of the LP warm-start phase is observed in all building examples.

Fig. 7
figure 7

Number of constraints (left panel) and the solution time (right panel) in the LP warm-start phase for the one-year retail example

Fig. 8
figure 8

Effect of LP warm-start for the one-year retail example. Algorithm 1 with LP warm-start phase (open circle) converges with fewer MILP iterations than without LP warm-start (filled circle). While more constraints are added to MILPs with LP warm-start (left panel), each MILP iterate is up to 20 times faster than MILPs without LP warm-start (right panel)

4.7 Solution quality of the semi-coarse model

The solution quality of the semi-coarse model is determined by the number of profiles for binary and continuous variables in the primal coarsening step (2.13). As discussed in Sect. 2.1, a larger number of profiles imply a less tightening semi-coarse model. Table 5 shows the problem size and the objective value of the semi-coarse model for School with the 84-day horizon. As the number of on/off profiles, K, grows, the objective value decreases monotonically. Here, the number of profiles for the continuous variables is set to be four for each on/off profile; that is, \(I_k = 4\) for \(k = 1,\ldots ,K\) in (2.4), and \(J = 4K\) in (2.5). When \(K = 20\), the relative difference between the objective value of the semi-coarse model and the full model is \( (\mathrm{ObjVal}_\mathrm{semi} - \mathrm{ObjVal}_\mathrm{full})/\mathrm{ObjVal}_\mathrm{full} = 18\,\%\). Table 10 in Appendix 3 shows a more detailed comparison for five buildings between the full model and the semi-coarse model with \(K=5\) for 28, 56, and 84 days.Footnote 1

Table 5 Solution quality and problem size of the semi-coarse model versus the number of on/off profiles K for School with 84-day horizon

We next examine the correlation between the solution quality of the semi-coarse model and the periodicity of the demand data. We perform the discrete Fourier transform of the yearly electricity demand, \(d_{\omega +1} = \sum _{t=0}^{N-1} e^{\frac{-2\pi \mathbf{j} t \omega }{N}} D_{t+1}\), where \(\mathbf{j} :=\sqrt{-1}\) and \(\omega =0,\ldots ,N-1\) with \(N = 24 \times 364 = 8736\). Since the semi-coarse model employs daily profiles with \(\delta = 24\), we are interested in those frequencies that correspond to periods with integer multiples of \(\delta \) in time. Let \(\varOmega :=\{ \omega \in \{0,\ldots ,N-1\}~|\mod (\bar{\omega },\omega ) = 0\}\), where \(\bar{\omega } = 364\) is the frequency corresponding to daily profiles with \(\delta = 24\), or specifically \(\varOmega = \{0,1,2,\ldots ,182,364\}\). We compute the spectral density at desired frequencies \(\varOmega \) divided by the total spectral density:

$$\begin{aligned} \phi :=\left( \,\, \sum _{\omega \in \varOmega } |d_\omega |^2 \right) \left( \,\, \sum _{\omega =0}^{N-1} |d_\omega |^2 \right) ^{-1}. \end{aligned}$$
(4.2)

Thus, a higher density ratio \(\phi \) implies a higher degree of data periodicity with respect to daily profiles.

Table 6 shows the spectral density ratio of buildings in a descending order. The Hospital has the highest density ratio, 0.99011, trailed by SuperMarket, Restaurant, School, and Retail with the lowest ratio, 0.96787. Table 6 also shows the objective value ratio between the full model and the semi-coarse model for 364 days. Here, we solve the full model with fixed first-stage solutions obtained from the semi-coarse model. The ranking of the density ratio correlates well with the solution quality of the semi-coarse model; in particular, a higher density ratio corresponds to a larger objective value ratio between the full model and the semi-coarse model. While we do not take into account coupling between time periods, numerical results in Table 6 suggest that periodic demand data with periods being integer multiple of \(\delta \) are a desired problem characteristic for our two-level approach.

Table 6 Correlation between spectral density ratio \(\phi \) in (4.2) of yearly demands and the solution quality of semi-coarse models
Fig. 9
figure 9

Relative difference between the objective value of the full model with and without fixing the first-stage solutions from the semi-coarse model

4.8 First-stage solutions

Since the first-stage solutions are of primary importance, we next compare them from the full MILP, the semi-coarse, and the coarse models. In Table 7, we see that the investment decisions at the first stage vary significantly with respect to the horizon length of the second-stage problem. (For the School example, the number of boilers jumps from 1 unit in the 4-day model to 8 units in the 28-day model, and the number of batteries drops from 4 units in the 28-day model to 1 unit in the 4-day model.) Thus, the first-stage solutions from a short-horizon problem are suboptimal for a long-horizon problem. This implies that one must solve long-horizon MILPs (7.1) as done for the semi-coarse and coarse models. In Table 8, the first-stage solutions are less sensitive to the horizon length at the second stage. Similarly, the first-stage solutions at the first iterate of the coarse MILPs are less sensitive to the horizon length; see Table 9.

To evaluate the solution quality, we re-solve the full model by fixing the first-stage solutions from the semi-coarse model. Figure 9 shows the relative difference of the objective value (i.e., the total cost from both the first and second stages) between the full model with and without fixing the first-stage solutions for five buildings with 28, 56, and 84 days in the second stage.Footnote 2 Note that the quality of first-stage solutions from the semi-coarse model varies with the building type. For SuperMarket and Restaurant, the relative difference of the objective value, \(\nu = (\hbox {ObjVal}_\mathrm{Fix} - \hbox {ObjVal}_\mathrm{NoFix})/\hbox {ObjVal}_\mathrm{NoFix}\), is \(\nu \le 3.8\,\%\), while \(\nu \le 2.7\,\%\) for Retail and \(\nu \le 1\,\%\) for School and Hospital. In other words, the first-stage solutions from the semi-coarse model is within \(1\,\%\) suboptimal for School and Hospital and is within \(3.8\,\%\) suboptimal for SuperMarket and Restaurant.

The difference of the solution quality between buildings can be understood by examining the number of cogeneration units and other numerical results summarized in Table 11. Note that the number of cogeneration units for Restaurant is much smaller than that for School; in particular, 3 battery units, 1 boiler unit, 1 CHP-SOFC unit, 1 Power-SOFC unit, and 1 storage unit for Restaurant, as opposed to 3 battery units, 8 boiler units, 5 CHP-SOFC units, 7 Power-SOFC units, and 8 storage units for School. The reason is that the electricity and heating demands of Restaurant, on average, are \(13\,\%\) of those of School; see Fig. 3. Therefore, a small deviation in the first-stage solution for Restaurant has a bigger impact, and hence a larger \(\nu \), than for School.

5 Conclusions

We study two-stage MILPs with more than 1 million binary variables in the second-stage problem. We develop a two-level approach by constructing semi-coarse models (coarsened with respect to variables) and coarse models (coarsened in both variables and constraints). We show that the semi-coarse model is guaranteed to provide a feasible solution of the original MILP and hence results in an upper bound on the optimal solution. We solve a sequence of coarse MILPs with aggregated constraints that converges to the same upper bound with a finite number of steps. Furthermore, we take advantage of the LP warm-start to reduce the number of MILP re-solves. Under the assumption of periodic problem data with no coupling between time periods, we show that the semi-coarse model provides the optimal solution to the original fine model. Under the additional assumption of a full-rank non-negative weight matrix in the constraint aggregation, we show that the solution of the coarse model is feasible to the semi-coarse model.

We apply our approach to the cogeneration problem for commercial buildings. We demonstrate the effectiveness of the two-level approach using building examples with simulation data. In particular, the two-level approach allows us to obtain good approximate solutions at a fraction of the time required for solving the original MILPs. Furthermore, we show that the two-level approach scales to large problems that are beyond the capacity of state-of-the-art commercial solvers.

A number of extensions are of interest in our future work. First, how should one add new profiles in the coarsened models? Currently, we select a number of prespecified profiles and fix them throughout the MILP re-solves. Since the solution quality of the coarsened models is determined by the number of profiles, it is of interest to add new profiles dynamically that are most promising in reducing the objective value. It seems that the reduced cost associated with profiles can be obtained by solving appropriate pricing problems like those in the column generation approach [4]. Further, it is also of interest to consider dynamic aggregations of constraints as new profiles are included. For set-partitioning constraints, a similar approach has been developed in the column generation framework [11]. We plan to explore this line of research.

Second, how can one obtain lower bounds on the optimal solution in the two-level framework? Since the coarse model is a relaxation of the semi-coarse model, which itself is a tightening of the full model, the coarse model is not a relaxation of the original MILP. While relaxing binary variables in the original MILP results in a lower bound, our numerical results indicate a sizable gap between this lower bound and the optimal solution. It is an open problem how to generate lower bounds using appropriate coarse MILPs.

Furthermore, advanced multilevel approaches for PDEs typically require several sweeps of fine/coarse levels in order to achieve accurate solutions [13]. It may be beneficial to iterate between solving coarse models to optimality and solving “partially” the fine model until a prespecified accuracy for the solution is achieved. We intend to extend our two-level framework in this direction.