Partially distributed outer approximation

Murray, Alexander; Faulwasser, Timm; Hagenmeyer, Veit; Villanueva, Mario E.; Houska, Boris

doi:10.1007/s10898-021-01015-0

Partially distributed outer approximation

Open access
Published: 17 April 2021

Volume 80, pages 523–550, (2021)
Cite this article

Download PDF

You have full access to this open access article

Journal of Global Optimization Aims and scope Submit manuscript

Partially distributed outer approximation

Download PDF

Alexander Murray ORCID: orcid.org/0000-0002-9436-0407¹,
Timm Faulwasser¹,
Veit Hagenmeyer¹,
Mario E. Villanueva² &
…
Boris Houska²

1151 Accesses
2 Citations
Explore all metrics

Abstract

This paper presents a novel partially distributed outer approximation algorithm, named PaDOA, for solving a class of structured mixed integer convex programming problems to global optimality. The proposed scheme uses an iterative outer approximation method for coupled mixed integer optimization problems with separable convex objective functions, affine coupling constraints, and compact domain. PaDOA proceeds by alternating between solving large-scale structured mixed-integer linear programming problems and partially decoupled mixed-integer nonlinear programming subproblems that comprise much fewer integer variables. We establish conditions under which PaDOA converges to global minimizers after a finite number of iterations and verify these properties with an application to thermostatically controlled loads and to mixed-integer regression.

Towards Multi-tree Methods for Large-Scale Global Optimization

Distributed primal outer approximation algorithm for sparse convex programming with separable structures

Article 24 December 2022

Alternative regularizations for Outer-Approximation algorithms for convex MINLP

Article 01 July 2022

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

A Mixed Integer Convex Program (MICP) is an optimization problem with convex objective and constraint functions, where the only non-convex constraint is that a subset of the optimization variables need to be integer-valued [8, 36]. MICPs arise in a plethora of application areas ranging from AC transmission expansion planning and robust power flow problems [1, 30], via thermal unit design and control [12], a variety of scheduling and layout design problems [53], design of multi-product batch plants [50], to obstacle avoidance and robotic motion planning problems [34].

Although MICPs are NP-hard in general, there exist a variety of algorithms for solving MICPs to global optimality [8]. State-of-the-art MICP solvers are based on tailored methods that exploit the fact that the integrality constraints are discrete while all other constraints are convex. Early attempts to develop tailored Branch and Bound (B&B) methods for MICP have been proposed in [24], mostly focussing on computational experiments and heuristics for selecting the branching variables and nodes. Improved versions of these early B&B methods for MICP can be found in [9, 35]. Other early methods for solving MICP include generalized Benders decomposition methods [5, 23], which are, however, less frequently used in state-of-the-art MICP solvers.^{Footnote 1}

Modern MICP implementations are often based on or related to Outer Approximation (OA), which goes back to Duran and Grossmann [16]. In contrast to B&B, OA alternates between solving Nonlinear Programs (NLP) with fixed integer values as well as Mixed Integer Linear Programs (MILP), which are constructed by linearizing the objective and constraint functions at the solutions of the NLP and which are used to update the integer variables. A notable extension of OA has been developed by Fletcher and Leyffer [20], who suggest to include curvature information in the relaxed integer program leading to a quadratic outer approximation method. A more recent approach to second order outer approximation is that of Kronqvist et al. [32]. Moreover, Kesavan and co-workers [28] have studied variants of OA for solving non-convex mixed integer problems. Another class of MICP methods are based on (extended) cutting plane methods [59] or combination of OA and branch-and-cut [49]; see also [56] for a general overview of polyhedral branch-and-cut methods. In recent years, there has been considerable progress in lift-and-project methods for MICP. An excellent overview and discussion of the state-of-the-art of such lift-and-project methods can be found in a recent article by Kilinç, Linderoth, and Luedtke [29].

Another recent trend in MICP solver development is the exploitation of separable structures by so-called extended formulations [25]. Here, the main idea is to introduce auxiliary variables in order to bound decoupled summands in additive expressions separately [58], which can lead to tighter polyhedral outer approximations. Such extended formulations have not only found their way into OA methods, as discussed in [25] and [33], but they can also be used to increase the performance of lift-and-project methods for MICP [29]. Also of note are several algorithms that exploit separability to decompose large-scale MINLP problems into MINLP subproblems based on block separability [41, 42, 45, 46]. However, extended formulations only exploit separability to construct tighter outer approximations, but neither existing OA methods nor state-of-the-art lift-and-project methods ever attempt to break a large-scale MICP into decoupled MICPs with fewer integer variables. This is in contrast to distributed continuous convex optimization methods, such as dual decomposition [19, 43], Alternating Direction Method of Multipliers (ADMM) [10, 17, 21], or Augmented Lagrangian based Alternating Direction Inexact Newton (ALADIN) methods [26], which can all be used to solve large-scale convex optimization problems to global optimality by alternating between solving small-scale convex optimization problems and sparse linear algebra operations. These methods typically require communication of the solutions of the decoupled problems between neighbours or to a central coordinator [10]. Although some researchers, [39, 55], have attempted to apply these distributed local optimization methods in a heuristic manner, these methods cannot find global minimizers of non-convex problems reliably. This is due to the fact that ADMM, ALADIN, or similar distributed convex optimization method typically rely on strong duality results for augmented Lagrangians [52, 54], which fail to hold in the presence of integrality constraints.

After reviewing extended formulations and related existing outer approximation methods in Sect. 2, the main contribution of this paper is presented in Sect. 3, which introduces a Partially Distributed Outer Approximation (PaDOA) method for MICPs with separable objective functions. In contrast to existing algorithms for structured MICP, PaDOA alternates between solving MICPs with fewer integer variables and large-scale MILPs. Section 3.5 discusses the global convergence properties of PaDOA, as summarized in Theorem 3. In this context, we additionally establish the fact that global optimality of a given feasible point of an MICP with N separable objectives and Nn optimization variables can be computationally verified by solving N partially-decoupled MICPs, each comprising at most n local integer variables, and one MILP with Nn integer variables. This result is summarized in Theorem 2, which analyzes one-step convergence conditions for PaDOA. In the sense that both MICPs as well as MILPs are NP hard in general [22, 40], this result is not in conflict with existing complexity results for mixed integer optimization problems. However, there are solvers such as CPLEX [27], Gurobi [47], and many others [13], which are specifically designed for MIQPs and MILPs. Thus, the fact that one can reduce the task of verifying global optimality of a feasible point of a separable MICP with coupled affine constraints to the task of solving one MILP of a comparable size and several smaller subproblems, is—at least from a computational perspective—an important contribution. Notice that Sects. 2 and 3 focus on the theoretical properties of PaDOA, and consequently, a high level abstract notation is used to present these theoretical, yet, at this stage, conceptual ideas. These derivations lead to a formal prove of convergence, but, in general, the performance of PaDOA is problem dependent and a partial distribution of operations can only possibly lead to practical improvements if the problems are sufficiently structured. This aspect is illustrated by applying the method two MICP benchmark case studies in Sect. 4. The developments in this paper are not of pure theoretical interest only, but actually lead to a practical framework that has potential for future investigation, as outlined in our conclusions in Sect. 5.

1.1 Problem formulation

The present paper is concerned with mixed integer optimization problems of the form

$$\begin{aligned} \begin{array}{rccl} V^\star &{}=&{} \underset{x \in X, z \in Z}{\min }&{} f(x,z) \\ &{} &{} \text {s.t.}&{} A x = b \end{array} \end{aligned}$$

(1)

with separable objective function $f(x,z) = \sum _{i=1}^N f_i(x_i,z_i)$ and separable constraint sets

$$\begin{aligned} \begin{array}{rclcrcl} X &{}=&{} X_1 \times \ldots \times X_N \quad &{}\text {with}&{} \quad X_i&{}\subseteq &{} {\mathbb {R}}^{n_i} \\ \text {and} \quad Z &{}=&{} Z_1 \times \ldots \times Z_N \quad &{}\text {with}&{} \quad Z_i &{}\subseteq &{} {\mathbb {Z}}^{m_i} \; . \end{array} \end{aligned}$$

(2)

The coupling matrix A and the vector b are assumed to be given. In this context, the following blanket assumption is used.

Assumption 1

The sets $X_1, X_2, \ldots , X_N$ are non-empty convex polytopes, the sets $Z_1, Z_2, \ldots , Z_N $ are non-empty and compact, and the functions $f_i$ are convex on the convex hull of $X_i \times Z_i$.

The goal of this paper is to develop an efficient algorithm that finds $\varepsilon $-suboptimal points of (1), which are defined as follows:

Definition 1

A feasible point $(x^\star ,z^\star ) \in X \times Z$ with $A x^\star = b$ is said to be an $\varepsilon $-suboptimal point of (1), with $\varepsilon > 0$, if

$$\begin{aligned} f(x^\star ,z^\star ) \le f(x,z) + \varepsilon \,. \end{aligned}$$

for all $(x,z) \in X \times Z$ with $A x = b$.

Remark 1

The theoretical developments in this paper assume—for simplicity of presentation—that all coupled constraints are linear equations in x. This is not restrictive in the sense that coupled convex inequalities can always be reformulated by introducing slack variables [4, 11, 38].

Remark 2

Instead of (1), one could also consider more general optimization problems of the form

$$\begin{aligned} \begin{array}{cl} \underset{x \in X, z \in Z}{\min }&{} f(x,z) \\ \text {s.t.}&{} A x + B z = b \; . \end{array} \end{aligned}$$

(3)

However, under mild regularity assumptions [44], this problem is equivalent to

$$\begin{aligned} \begin{array}{cl} \underset{x \in X, y \in \mathrm {conv}\left( Z \right) , z \in Z}{\min }&{} \sum \limits _{i=1}^N \{ f_i(x_i,z_i) + {\bar{\lambda }}_i \Vert y_i - z_i \Vert _1 \} \\ \text {s.t.}&{} A x + B y = b \; . \end{array} \end{aligned}$$

(4)

with real-valued auxiliary variables y and $L_1$-penalty parameters ${\overline{\lambda }}_i \gg 0$. Here, $\mathrm {conv}\left( Z \right) $ denotes the convex hull of Z. Thus, for all theoretical purposes, it is sufficient to analyze problems of the form (1), where only the real-valued variables are coupled.

Remark 3

Modern MICP formulations, algorithms, and software can deal with rather general convex conic constraints [36]. Such constraints are left out for simplicity of presentation. Nevertheless all results in this paper can be easily extended for general conic constraints, as long as they are separable. From a purely theoretical perspective, one might argue that this can always be achieved by adding suitable convex penalty functions to the functions $f_i$, because this paper makes no assumptions on the differentiability properties of f. However, more tailored, practical algorithms that could exploit the structures of particular conic constraints are beyond the scope of this paper.

1.2 Notation

We use the notation

$$\begin{aligned} \partial _x g(x) = \left\{ a \in {\mathbb {R}}^n \mid \forall y \in {\mathbb {R}}^{n}, \; g(y) \ge g(x) + a^\mathsf {T}( y - x ) \right\} \end{aligned}$$

to denote the set of subgradients of a convex function $g: \mathbb R^{n} \rightarrow {\mathbb {R}}$ with respect to the variable x.

2 Outer approximation

The following section reviews OA by using a slightly more abstract viewpoint that is based on modern convex analysis notation. As mentioned in the introduction, OA has been introduced in the 1980s by Duran and Grossmann [16] using different notation, but the following section presents the method in a more general setting—also including Hijazi’s variants and extentions of OA—in order to prepare the developments in Sect. 3.

2.1 Polyhedral relaxations

In order to construct polyhedral outer approximations of the epigraph of the objective function f of (1), we consider the auxiliary optimization problem

$$\begin{aligned} f^\star (z) = \min _{x,y} \; f(x,z) \quad \mathrm {s.t.} \quad \left\{ \begin{array}{ll} x = y &{} \mid \; \lambda \\ A y = b \\ y \in X \; . \end{array} \right. \end{aligned}$$

(5)

for a fixed integer parameter $z \in Z$. Here, x and y are real valued primal optimization variables and the notation “$x = y \; \mid \lambda $” is used to say that $\lambda $ denotes the dual variable associated with the constraint $x=y$.

Proposition 1

If Assumption 1 is satisfied, strong duality holds for (5), i.e., we have

$$\begin{aligned} f^\star (z) = \max _{\lambda } \; \min _{x,y} \; f(x,z) + \lambda ^\mathsf {T}(y-x) \quad \mathrm {s.t.} \quad \left\{ \begin{array}{l} A y = b \\ y \in X \end{array} \right. \end{aligned}$$

(6)

for all $z \in Z$.

Proof

See “Appendix 1”. $\square $

Let $x^\star (z),y^\star (z),\lambda ^\star (z)$ denote any primal-dual solution of (5) in dependence on z. By writing out the optimality condition of (5) with respect to x, we find that

$$\begin{aligned} \lambda ^\star (z) \in \partial _x f( x^\star (z), z ) , \end{aligned}$$

i.e., $\lambda ^\star (z)$ must be a subgradient of f at the optimal solution of (5). For the developments given below it is helpful to keep in mind that the reverse statement is not correct, i.e., a subgradient of f at $(x^\star (z), z)$ is not necessarily a dual solution of (5).

In contrast to the particular choice of the subgradient $\lambda ^\star $ of f with respect to x, the construction of a subgradient of f with respect to z is less critical for the construction of outer approximation methods.^{Footnote 2} In the following, we assume that a function

$$\begin{aligned} \mu ^\star (z) \in \partial _z f( x^\star (z), z ) , \end{aligned}$$

is given, which returns a subgradient of f with respect to z at the optimal solution of (5). Because $f(x,z) = \sum _{i=1}^N f_i(x_i,z_i)$ is separable, the ith block components, $\lambda _i^\star (z)$ and $\mu _i^\star (z)$, of the subgradients of f are subgradients of $f_i$. Thus, the inequality

$$\begin{aligned} f_i(x_i, z_i) \ge f_i^\star ( {\hat{z}} ) + \left[ \lambda _i^\star ({\hat{z}}) \right] ^\mathsf {T}( x_i - x_i^\star ({\hat{z}}) ) + \left[ \mu _i^\star ({\hat{z}}) \right] ^\mathsf {T}( z_i - {\hat{z}}_i ) \end{aligned}$$

holds for all $x_i \in X_i$, and $z_i \in Z_i$ and all ${\hat{z}} \in Z$. Here, the shorthand

$$\begin{aligned} f_i^\star ( {\hat{z}} ) = f_i(x_i^\star ( {\hat{z}}), {\hat{z}}_i) \end{aligned}$$

is used. In this context, it is important to notice that the function $f_i^\star ({\hat{z}})$ depends on the whole vector ${\hat{z}}$, not only on its ith component, ${\hat{z}}_i$, because the equality constraints in (6) introduce a non-trivial coupling.

More generally, if $\varPi \subseteq Z_i$ denotes finite set of points in Z, we associate with $\varPi $ a set of hyperplane coefficients

$$\begin{aligned} {\mathcal {H}}_i( \varPi ) = \left\{ (\alpha ,\beta ,\gamma ) \left| \begin{array}{l} z \in \varPi \\ \alpha = \lambda _i^\star (z) \\ \beta = \mu _i^\star (z) \\ \gamma = f_i^\star (z) - \alpha ^\mathsf {T}x_i^\star (z) - \beta ^\mathsf {T}z_i \end{array} \right. \right\} \; . \end{aligned}$$

(7)

Notice that this set of hyperplane coefficients defines a polyhedral outer approximation of the epigraph of $f_i$. Thus, these coefficients can be used to construct a piecewise affine lower bound on $f_i$, which is for all $(x_i,z_i) \in X_i \times Z_i$ given by

$$\begin{aligned} \varPhi _{i}(x_i,z_i,\varPi ) = \max _{(\alpha ,\beta ,\gamma ) \in H_{i}(\varPi )} \left\{ \alpha ^\mathsf {T}x_i + \beta ^\mathsf {T}z_i + \gamma \right\} \; . \end{aligned}$$

(8)

Finally, we can construct the function

$$\begin{aligned} \varPhi (x,z,\varPi ) = \sum _{i=1}^N \varPhi _{i}(x_i,z_i,\varPi )\;. \end{aligned}$$

This function is—by construction—a piecewise affine lower bound on f,

$$\begin{aligned} \forall (x,z) \in X \times Z, \qquad \varPhi (x,z,\varPi ) \le f(x,z) \; . \end{aligned}$$

(9)

Next, our particular choice of the subgradient $\lambda ^\star (z)$ of f as the dual solution of (5) enables us to establish the following tightness property of the affine lower bound $\varPhi $.

Lemma 1

Let $\varPi \subseteq Z$ be any finite set of points. If Assumption 1 holds, then

$$\begin{aligned} f^\star (z) = \min _{x \in X} \; \varPhi (x,z,\varPi ) \quad \mathrm {s.t.} \quad A x = b \; . \end{aligned}$$

(10)

holds for all $z \in \varPi $.

Proof

See “Appendix 2”. $\square $

Note that Lemma 1 can be viewed as a special case of Theorem 2.3 in [23].

Remark 4

If the set $\varPi $ consists of m points, the computational cost for constructing the lower bound (9) of f has order $\mathbf {O}(m N)$. Notice that if we would have ignored the separable structure of f, the computational cost of computing the same lower bounding function would have been of order $\mathbf {O}(m^N)$. Thus, the construction of (9) as a sum of the lower bounds of the separable objective function is much cheaper than a direct construction of lower bounds of f. This reduction in complexity has for the first time been observed and exploited by Hijazi and co-workers [25]. By now, the expoitation of separability via extended formulations can be considered as a standard that has been adopted in many modern MICP algorithms and software tools [29, 36, 37].

2.2 Outer approximation algorithm

Algorithm 1 outlines the main steps of the outer approximation algorithm. Notice that this algorithm basically coincides with the original outer approximation algorithm that has been proposed in [16]. The only notable differences of Algorithm 1 compared to traditional OA are that the MILP in Step 3 uses the extended formulation based outer approximation variant from [25]. Moreover, because we do not assume that f is differentiable, we have to use the particular choice, $\lambda ^\star (z)$, of the subgradient, which is found as the dual solution^{Footnote 3} of (5).

Notice that Step 1 of Algorithm 1 solves (1) under the additional constraint that the integer z is fixed. This implies that

$$\begin{aligned} f(x^\star ,z) \ge V^\star \end{aligned}$$

is an upper bound on the optimal objective value $V^\star $ of (1). Thus, the current upper bound U can be updated in Step 2. Moreover, the MILP (20) is by construction a relaxation of (1) which implies that the solution of Step 3, $\sum _{i=1}^N y_i^+$, is a lower bound on the optimal objective value $V^\star $. Thus, the difference, $U - \sum _{i=1}^N y_i^+$, between the current upper and lower bounds can be used as a termination criterion, which is implemented in Step 3 of Algorithm 1. The following finite termination result for outer approximation is (at least in very similar versions) well-known in the literature [16, 36].

Theorem 1

If Assumption 1 is satisfied, then Algorithm 1 terminates after a finite number of iterations.

Proof

See “Appendix 3”. $\square $

Remark 5

Algorithm 1 uses Hijazi’s extended formulation [25] for constructing the MILPs (20), which arguably exploits separability of the objective function to some extent. However, Algorithm 1 is not a fully distributed algorithm. In fact, a major disadvantage of Algorithm 1 becomes apparent, if one considers the special case that $A=0$ and $b=0$. In this case, the optimal solution of (1) could have been found with much less effort by solving the separable MICPs,

$$\begin{aligned} \min _{x_i \in X_i,z_i \in Z_i} f_i(x_i,z_i) \end{aligned}$$

which have much fewer integer variables. However, if Algorithm 1 is applied to such a problem with redundant equality constraint, this property is not detected and a large number of large-scale NLPs and large scale MILPs might have to be solved instead, until convergence is achieved. The goal of this paper is to mitigate this limitation of Algorithm 1 by proposing a partially distributed outer approximation algorithm that exploits the structure of the separable objective in a better way.

3 Partially distributed outer approximation algorithm

This section introduces a partially distributed outer approximation optimization algorithm for finding $\varepsilon $-suboptimal solutions of (1).

3.1 Partially decoupled upper bounds

The main idea of many distributed convex and local optimization methods is to solve a set of smaller-scale decoupled optimization problems in place of a single large one [5, 10, 14]. Similarly, consider partially decoupled optimization problems of the form

$$\begin{aligned} V_k(z) =&\underset{x,y,\zeta _k}{\min }&f_k(x_k,\zeta _k) + \varPsi _k(x,z) \nonumber \\&\text {s.t.}&\left\{ \begin{array}{ll} x = y &{} \mid \; \lambda \\ A y = b \\ y \in X \\ \zeta _k \in Z_k \end{array} \right. \end{aligned}$$

(13)

for $k \in \{ 1, \ldots , N \}$. Note that Problem (13) regards only the local integer variables $\zeta _k \in Z_k$ as optimization variables, while all other integers are fixed. However, concerning the real-valued variables, the whole vector $x \in X$ is kept as an optimization variable. In this context, the shorthands

$$\begin{aligned} \forall (x,z) \in X \times Z, \qquad \varPsi _k(x,z) = \underset{j \ne k}{\sum } f_j(x_j,z_j) \end{aligned}$$

are introduced in order to keep the x-dependence of the remaining summands, i.e., all objective terms whose index is not equal to k. As in the previous section, $\lambda $ denotes the dual solution that is associated with the consensus constraint “$x = y$”. Because the constraint $\zeta _k \in Z_k$ enforces integrality, strong duality of (13) does not hold in general. However, if $\zeta _k^\star (z)$ denotes an optimal solution of (13) for the integer variable and if Assumption 1 holds, we still have

$$\begin{aligned} V_k(z) = \max _{\lambda }&\underset{x,y}{\min }&\; \; f_k(x_k,\zeta _k^\star (z)) + \varPsi _k(x,z) + \lambda ^\mathsf {T}(y-x) \nonumber \\&\text {s.t.}&\; \; \left\{ \begin{array}{l} A y = b \\ y \in X \; . \end{array} \right. \end{aligned}$$

(14)

The proof of this statement is completely analogous to Proposition 1, i.e., if the linear coupling constraint $Ax =b$ has a solution in X, a maximizer of (14) exists and can be used to define a suitable subgradient. Also note that the functions $V_k$ yield upper bounds on the objective value of (1),

$$\begin{aligned} \forall z \in Z, \qquad \min _{k} \, V_k(z) \ge V^\star \end{aligned}$$

(15)

At this point, it should be mentioned that one basic assumption of the algorithmic developments in this paper is that the complexity of the mixed integer optimization problems of interest depends mostly on the number of integer variables. This is in contrast to the number of real-valued variables, which may be assumed to have a negligible influence on the overall complexity of the mixed-integer optimization problem. In other words, we assume that (13) is much easier to solve than (1) in the sense that it contains much fewer integer variables, although both problems have the same number of real-valued variables. Here, it is important to keep in mind that, although the algorithmic developments in this paper are inspired by the field of distributed optimization, the algorithm in this paper is (at least in the form in which we present and analyze it) not fully distributed. This is because solving (13) requires the evaluation of the function $\varPsi _k$, which, in turn, requires the evaluation of all functions $f_j$ with $j \ne k$.

3.2 Partially decoupled lower bounds

In this paper, we suggest to solve the decoupled MICPs (13) by lower level solvers that implement the traditional outer approximation algorithm that has been reviewed in Sect. 2.2. Notice that if Assumption 1 is satisfied, strong duality holds, i.e., these lower level solvers will return piecewise affine models

$$\begin{aligned} \varTheta _k^\star : X \times Z_k \rightarrow {\mathbb {R}} , \end{aligned}$$

which must satisfy the condition

$$\begin{aligned} V_k(z) - \epsilon _{\mathrm {L}}\; \le \min _{x \in X,\zeta \in Z_k} \varTheta _k^\star (x,\zeta ) \quad \mathrm {s.t.} \quad A x = b \end{aligned}$$

(16)

upon termination. Here, $\epsilon _{\mathrm {L}}\ge 0$ denotes the numerical tolerance of the lower level OA solvers. Notice that the optimization problem on the right hand of (16) corresponds to the last MILP relaxation that is solved by the lower level OA solver. In practice the function $\varTheta _k^\star $ can be stored by maintaining a set of hyperplane coefficients as explained in detail in the previous section.

The main idea of partially distributed outer approximation is to communicate the piecewise lower bounding functions $\varTheta _k^\star $ to a central coordinator, who constructs a piecewise affine lower bound on the function f, solves a master MILP problem, and updates z. Here, one option is use the maximum over the function $\varTheta _k^\star $ in order to obtain the lower bound

$$\begin{aligned} \forall x \in X, \; \forall z \in Z, \qquad \max _{k} \; \varTheta _k^\star (x,z_k) \; \le \; f(x,z) \; . \end{aligned}$$

(17)

However, in order to arrive at a practical implementation, it is recommendable to further refine this bound. This can be done by maintaining a collection of integers, $\varPi \subseteq Z$, such that the function

$$\begin{aligned} \varTheta (x,z) = \max \left\{ \; \varPhi (x,z,\varPi ) , \; \max _{k} \; \varTheta _k^\star (x,z_k) \; \right\} , \end{aligned}$$

(18)

can be used as a piecewise affine lower bound on f. Recall that the function $\varPhi $, which has been introduced in the previous section, exploits the separability properties of f. The integer collection $\varPi $ is then maintained by updating

$$\begin{aligned} \varPi \leftarrow \varPi \cup \{ \zeta ^\star \} , \end{aligned}$$

where $\zeta ^\star = [ \zeta _1^\star , \zeta _2^\star , \ldots , \zeta _N^\star ]$ is an integer vector, whose components are optimal solutions for the integer variables of the partially decoupled problems (13).

3.3 Partially distributed outer approximation (PaDOA)

Algorithm 2 outlines a partially distributed algorithm for solving (1). There are four main steps. In the first step, the partially decoupled MICPs of the form (13) are solved by using a traditional outer approximation method. Under the assumption that the original MICP (1) is feasible, the partially decoupled MICPs are feasible, too. Thus, the outer approximation solvers will return optimal integer solutions $\zeta _k^\star $ and associated piecewise affine lower bounds $\varTheta _k^\star $ such that (16) is satisfied. The second step of Algorithm 2 updates the associated upper bound U based on the inequality (15) as well as the piecewise affine lower bound. In practice, this step is implemented by storing the union of all supporting hyperplane coefficients that are needed to represent $\varTheta $. Finally, the third step of Algorithm 2 solves a large scale MILP problem. This MILP is constructed in analogy to the corresponding step in the traditional outer approximation algorithm. It yields a lower bound,

$$\begin{aligned} \varTheta (x^+,z^+) \le V^\star , \end{aligned}$$

on the objective value $V^\star $ of (1). Thus, the difference between the current upper and lower bounds,

$$\begin{aligned} U - \varTheta (x^+,z^+) , \end{aligned}$$

can be used as a termination criterion, which is implemented in the fourth step of Algorithm 2. If the termination is not successful, the integer variables z are updated, and the algorithm subsequently proceeds to the next iteration.

Notice that the main difference between Algorithms 1 and 2 is the introduction of partially decoupled MICP problems that can be solved separately and which contain much fewer integer variables than the original MICP (1). The theoretical results in Sect. 3.5 will elaborate further on the benefits of this alternation strategy. Moreover, in Sect. 4 a numerical case study is examined, which illustrates the practical advantages of Algorithm 2.

Remark 6

For the special case that the functions $f_i$ are convex and piecewise affine, the original optimization problem is an MILP, which could also be solved directly by using an MILP solver. If one uses Algorithm 2 instead, this algorithms solves an MILP by solving a sequence of MILPs—a strategy that may seem rather counter-intuitive on the first view. However, one interesting observation of this paper is that Algorithm 2 can even for (sufficiently structured) MILPs help to speed up the convergence process, because the complexity of the MILPs in Step 4 of Algorithm 2 depends on the complexity of the current lower bound $\varTheta $ and eventually is much easier to solve than the original MILP. This effect is illustrated by numerical experiments in Sect. 4.2.

3.4 Relation to distributed local optimization methods

The idea to “augment” the local objective functions $f_i$ with a suitable function $\varPsi _i$ is frequently used in the context of distributed local optimization algorithms. For example, in the context of dual decomposition [19, 43], one augments the separable functions $f_i$ with linear functions of the form^{Footnote 4}

$$\begin{aligned} \varPsi _i(x) = \sigma ^\mathsf {T}A x , \end{aligned}$$

where $\sigma $ is the current dual iterate. Similarly, in the context of ADMM or ALADIN, one uses augmented Lagrangians [3, 48], as in

$$\begin{aligned} \varPsi _i(x,y) = \sigma ^\mathsf {T}A x + \frac{\rho }{2} \Vert x - y \Vert ^2 \;, \end{aligned}$$

(21)

where z and $\sigma $ are the current primal and dual iterates; see [10, 17, 26]. In fact, the construction of Algorithm 2 is inspired by the distributed local nonlinear programming method ALADIN. Here, we recall that ALADIN alternates between solving small-scale decoupled NLPs that are augmented by terms of the form (21) and large scale equality constrained quadratic programming problems that update $\sigma $ and y [26]. This is in analogy to Algorithm 2, which alternates between solving decoupled MICPs (Step 1) and large-scale coupled MILPs (Step 3). However, unlike ALADIN, augmented Lagrangians are not used in Algorithm 2 as Lagrange multipliers in integer programming are not related to sensitivity and generally not applicable.

Also note that the construction of the functions $\varPsi _i$ in Algorithm 2 also has similarities with Gauss-Seidel or more general block-coordinate descent methods [57, 60] in the sense that a partial decoupling is obtained by fixing some of the integer variables while others are optimized. However, despite all these analogies and similarities of Algorithm 2 with methods from the field of local and convex optimization, we would like to highlight that all these existing distributed optimization methods are not reliably applicable to Problem (1) [55].

3.5 Convergence analysis

In this section we provide a concise overview of the convergence properties of Algorithm 2. The following theorem establishes one of the main results of this paper, namely, that Algorithm 2 converges after one iteration if the integer iterate, z, is initialized with an optimal solution of (1). This is in contrast to Algorithm 1, which does not necessarily terminate after a small number of steps—not even if it is initialized at an optimal solution.

Theorem 2

Let Assumption 1 be satisfied and let $(x^\star ,z^\star )$ be a minimizer of (1). If Algorithm 2 is initialized with $z = z^\star $ and if the termination tolerances of the lower level solvers satisfy $\epsilon _{\mathrm {L}}\le \epsilon $, then the termination criterion in Step 4 is satisfied. In other words, the algorithm terminates after one step.

Proof

See “Appendix 4”. $\square $

Notice that the statement of the above theorem is of fundamental relevance and a very favorable property of PaDOA. If we work with other global optimization methods, say branch-and-bound, an empirical observation is that such existing global optimization algorithm often find a global solution early on but then keep on iterating until the lower bound is accurate enough to prove global optimality. In contrast to this, PaDOA terminates as soon as a global minimizer is added to the collection $\varPi $. In fact, Theorem 2 implies that global optimality of a point $z^\star \in Z$ can be verified by solving the N instances of the partially decoupled MICPs and the master MILP (20). Notice that this result is not in conflict with existing results from the field of complexity theory, because the master MILP (20) remains NP-hard [22, 40].

Remark 7

The result of Theorem 2 relies heavily on the convexity of the functions $f_i$ on the convex hull of $X_i \times Z_i$, although this fact is not highlighted explicitly in the proof. This convexity assumption is first of all required implicitly by our assumption that the lower level solvers return piecewise level models $\varTheta _k$, which satisfy the termination condition (16) (this assumption is only reasonable if strong duality holds) and which need to be global lower bounds on f. These properties are in general all not satisfied if one considers more general non-convex MINLPs.

The following theorem establishes the fact that Algorithm 2 converges after a finite number of iterations under exactly the same conditions under which convergence of Algorithm 1 can be established.

Theorem 3

Let Assumption 1 be satisfied. If the termination tolerances of the lower level solvers satisfy $\epsilon _{\mathrm {L}}\le \epsilon $, then Algorithm 2 terminates after a finite number of steps (independently of the initialization).

Proof

See “Appendix 5”. $\square $

4 Implementation and case study

This section presents two benchmark problems to assess the computational performance of Algorithm 2. The first aims to provide a cost-optimal heater/cooler activation scheme such that a set of temperatures remain within pre-set limits. The objective is formulatable as an MILP, MIQP, or higher order MICP allowing for testing on a broad class of problems. Furthermore, each problem instance is scalable both vertically (in time horizon/subproblem size) and horizontally (number of regions/number of subproblems), which allows for an examination of the benefit which may come from the parallelizable Step 1 of Algorithm 2.

A second benchmark example comes in the form of a mixed-integer lasso problem. This is used to demonstrate the versatility of Algorithm 2 and verify the results of the first benchmark problem given the presence of an $L_1$-norm. Like the first benchmark problem, it is also scalable both vertically and horizontally, and the results for ten unique instances of the problem are given. The Partially Distributed Outer Approximation method is implemented in MATLAB R2017b. The optimization subproblems are solved using Gurobi [47] implemented via CasADi v1.9.0 [2]. All numerical experiments were run on a 2.9 GHz Intel Core i5-4460S CPU with 8 GB of RAM.

4.1 Thermostatically controlled loads

An important problem in the planning and operation of a heating and/or cooling system is the scheduling of so-called Thermostatically Controlled Loads (TCLs) [31]. These are devices that are used to regulate the temperature of a room/building within a certain user-defined interval known as a “deadband”. The optimal operation strategy is especially difficult to determine when a non-constant cost function is introduced for a population of heterogeneous TCLs [61]. The cost function may represent the cost of electricity or user-discomfort from noise generation. Regardless, such devices typically only have an “on” and an “off” setting and thus the resulting scheduling problem can be formulated as a binary MIP for R regions with a discretized time horizon H, consisting of hour-long time steps.

$$\begin{aligned}&\min \limits _{T(\cdot ),u(\cdot )} \sum \limits _{i=1}^R\sum \limits _{t=0}^{H-1} c(t)u(t) + \gamma (T_i(t) - T_{i,ref}(t))^2, \end{aligned}$$

(22a)

$$\begin{aligned}&\text {subject to} \forall i\in \{1,\ldots ,R\}, \nonumber \\&\underline{T}_i \le T_i(t) \le \overline{T}_i, \quad \forall t\in \{0,\ldots ,H\} \end{aligned}$$

(22b)

$$\begin{aligned}&u_i(t)\in \{0,1\}, \forall t\in \{0,\ldots ,H-1\} \end{aligned}$$

(22c)

$$\begin{aligned}&\forall t\in \{0,\ldots ,H\},\nonumber \\&T_i(t+1)=T_i(t)+b_i u_i(t)+a_i\left( \frac{T_i(t)+T_{amb}(t)+\sum _{j \in N(i)}T_j(t)}{|N(i)|+2} - T_i(t)\right) , \end{aligned}$$

(22d)

where c(t) is the vector of device costs at time t, $\gamma $ is a comfort parameter, $\underline{T}_i$ and $\overline{T}_i$ are the deadband temperature limits of device i, $a_i$ and $b_i$ are heat transfer parameters, $T_{amb}(t)$ is the ambient temperature at time t and N(i) are the number of regions neighbouring i. Equation (22d) models the thermodynamics of each room in a simplified manner, i.e., it takes an average of the current and surrounding temperatures to update the temperature of the next time step. This formulation results in $H+1$ real-valued and H binary variables per region. Figure 1 shows two possible initial configurations of (22). The ambient temperature is taken from [15] for two days in June 2017 in the Karlsruhe (Germany) area, with a reference temperature $T_{i,ref} = 20^{\deg }$ for each room. High prices of 25.67 cents/kWh are set from 2pm to 8pm (time steps 6 to 12 and 29 to 35) with low and medium prices of 2.46 cents/kWh and 4.62 cents/kWh in all other time steps. Each region is initialized at 20 degrees with $a_i=0.2$ and $b_i=-2$. As the temperature for each room is more influenced by past temperatures than neighbouring rooms, Problem (22) is partitioned spatially for Algorithm 2, with each room in its own partition.

4.2 Results for MILP

If the comfort parameter $\gamma $ is taken to be zero then Problem (22) is linear and separable but coupled in both its discrete and real-valued variables. Shown in Tables 1 and 2 are the simulation results for each configuration, respectively. The results of Algorithm 2 are compared with results obtained from the MIP solver Bonmin with the Branch and Bound (B-BB) and Outer Approximation (B-OA) algorithm settings [7] as well as the commercial MIQP solvers Gurobi and CPLEX [27]. An example solution for the 3 room case is depicted in Fig. 2.^{Footnote 5}

Table 1 Results obtained for the TCL problem with a 3-room configuration

Full size table

Table 2 Results obtained for the TCL problem with a 4-room configuration

Full size table

Table 3 Results obtained for the TCL problem with a linear room configuration

Full size table

At first glance, the results from Tables 1 and 2 may seem surprising since the 4 room case has more space to keep cool but nonetheless is able to do so at a lower cost than the 3 room case. This is due to an insulation effect that the 4 room configuration enjoys. With the activation of two coolers in the first six time steps, the room temperatures can stay within their deadbands for the entire 48 h period. In contrast, the 3 room configuration is more susceptible to the ambient temperature and requires more use of the coolers. This also seems to have increased the computational complexity of the problem and requires more time for the 3 room case to be solved than the 4 room case. It should be noted that several initializations were tested and the solution times were not significantly affected, implying that this was not the cause of the runtime differences in the two cases.

One of the advantages of using a distributed method is the ability to solve problems that would be otherwise intractable for a centralized solver. Tables 1 and 2 show results for cases containing up to 496 variables, but even larger problems may be considered. Table 3 shows results for a variety of time horizons and rooms. Here, the room configuration is instead arranged such that the rooms are in a line. While unrealistic for most buildings, this setup is realistic for the temperature control of a train or rooms next to a corridor. Mathematically, this example differs somewhat from the other two. While the other problems have a significant amount of coupling between the control variables, this is not the case for the linear room configuration. The sparsity induced by this problem structure allows Algorithm 2 to outperform both Gurobi and CPLEX (applied to the centralized problem).

4.3 Results for MIQP

If the comfort parameter $\gamma $ is larger than zero then Problem (22) becomes a convex MIQP.^{Footnote 6} As in Sect. 4.2, the results of Algorithm 2 for each room configuration are compared with those obtained from Bonmin, Gurobi, and CPLEX. The value of $\gamma $ was chosen to be one to allow for an equal weighting of comfort and cost. These results are displayed in Tables 4, 5, and 6.

Table 4 Results obtained for Problem (22) with a 3-room configuration

Full size table

Table 5 Results obtained for Problem (22) with a 4-room configuration

Full size table

Table 6 Results obtained for Problem (22) with a linear 7 room configuration

Full size table

Shown in Fig. 3 are the trajectories obtained for the three-room scenario with a temperature deviation penalization. In contrast to Fig. 2, a quadratic penalty term is used to model discomfort caused by deviations from the set temperature. Indeed, the solution with $\gamma =1$ yields a trajectory with a similar number of activations as when $\gamma =0$ but with temperature trajectories that stay much closer to the middle of the deadband.

4.4 Higher Order Convex Problems

One of the advantages of the proposed algorithm is that it is applicable to a relatively large class of problems (namely, MICPs). While Sect. 4.3 shows favourable results for both Gurobi and CPLEX, if the problem were adjusted slightly such that it were no longer an MIQP then these solvers would no longer be applicable. For example, if the objective function of Problem (22) became

$$\begin{aligned} \min \limits _{T(\cdot ),u(\cdot )} \sum \limits _{t=0}^{H-1} c(t)u(t) + \gamma (T_i(t) - T_{ref}(t))^4, \end{aligned}$$

then this would still be solvable via PaDOA, but not Gurobi or CPLEX. However, Bonmin can still be applied.^{Footnote 7} The results for a variety of such problem configurations are shown below in Table 7. Therein it can be observed that Algorithm 2 returns the same, global solution as Bonmin with the B&B sub-algorithm, and does so in less time. The runtime difference is particularly striking for the 7 room scenarios as these contain the most variables and have the greatest potential for parallelization.

Table 7 Results obtained for Problem (22), but with a 4th order objective function

Full size table

4.5 Discussion of TCL Benchmark Results

Algorithm 2 is able to converge quite quickly for the MILP instance of (22) since it is underapproximating a linear problem with linear functions and thus can prove optimality quickly and with few iterations. More supporting hyperplanes are needed in the MIQP case and thus we see a couple more iterations and a slightly longer runtime for Algorithm 2. As shown in Figs. 4 and 5, the upper and lower bound converge quite quickly for the MIQP and MINLP cases, but require several iterations to refine the solution.

Interestingly, the MIQP cases is generally obtained more quickly than the MILP case, even though one might expect the higher order problem to be more difficult to solve. The quadratic term should make the MIQP instance of Problem (22) more computationally difficult than the MILP instance, but in some cases both Gurobi and CPLEX actually require less time. It may be the case that the MILP instance of (22) is more poorly conditioned than the MIQP instance, and thus the centralized solvers require much more effort to guarantee the optimality of a given solution. This may imply that quadratic underapproximating functions and an MIQP coupling step could be more efficient than the current affine implementation. This would also allow for Algorithm 2 to be applicable to a wider class of problems.

It is also worth noting that the majority of the runtime used by Algorithm 2 for the MIQP problem instance is spent in the MILP coupling step. In contrast, the majority of the time spent solving the 4th-order problem is in the MINLP subproblems, and the MILP coupling problems are solved relatively quickly. A breakdown of the runtime per iteration for both of these cases is reported in Tables 8 and 9.

Table 8 Runtime breakdown of Algorithm 2 applied to the 2nd-order version of Problem 22 with 7 room TCL problem with 48 time steps

Full size table

Table 9 Runtime breakdown of Algorithm 2 applied to the 4th-order version of Problem 22 with 4 rooms and 8 time steps

Full size table

4.6 Mixed-Integer Lasso

As a second benchmark example we consider a mixed-integer lasso problem based on Problem (11.1) from [10]. The details of this problem are summarized below for $n+N$ features and M training samples:

$$\begin{aligned}&\min \limits _{w,v} \sum _{j=1}^M e^{(-a_j^{\top }w-b_j^{\top }z-c_ijv)} + \lambda (||w||_1 + ||z||_1), \end{aligned}$$

(23a)

$$\begin{aligned}&\text {subject to:} \nonumber \\&\underline{w}\le w,v \le \overline{w}, \end{aligned}$$

(23b)

$$\begin{aligned}&z \in \mathcal {Z} \subset \mathbb {Z}^N, \end{aligned}$$

(23c)

where $w\in \mathbb {R}^n$, $v\in \mathbb {R}$. As in [10], the features consist of the tuple $(a_j,b_j,c_j)$ where $a_j\in \mathbb {R}^n$, $b_j\in \mathbb {R}^N$, and $c_j\in \{-1,1\}$. The regularization parameter $\lambda $ is chosen as described in [10]. Assuming that $M\gg N$, Problem (23) can be decomposed into a collection of smaller problems of the form (23) with appropriate consensus constraints. Given the structure of Algorithm 2, it is logical to decompose (23) such that only one integer variable is present per subproblem:

$$\begin{aligned}&\min \limits _{w,v} \sum _{i=1}^N\sum _{j=1}^{M_i} e^{(-a_j^{\top }w_i-b_jz_i-c_jv_i)} + \lambda (||w_i||_1 + ||z_i||_1), \end{aligned}$$

(24a)

$$\begin{aligned}&\text {subject to } \forall i \in \{1,\ldots ,N\}:\nonumber \\&\underline{w}_i\le w_i,v_i \le \overline{w}_i, \end{aligned}$$

(24b)

$$\begin{aligned}&z_i \in \mathcal {Z}_i \subset \mathbb {Z}, \end{aligned}$$

(24c)

$$\begin{aligned}&w_i=w_k, z_i=z_k, \text { and } v_i =v_k, \; \forall k \in \{1,\ldots ,N\}. \end{aligned}$$

(24d)

That is, the collection of $M=\sum _{i=1}^N M_i$ training samples is decomposed into N (evenly sized) parts, all of which must be in consensus.

4.7 Numerical Results for Mixed-Integer Lasso

To test Algorithm 2 on Problem (24) the results for a variety of problem and partition sizes are presented, with features generated according to [10]. These are displayed in Table 10 with the results of Bonmin used for comparison.

Table 10 Results obtained for Problem (24) using Algorithm 2 and Bonmin

Full size table

Note that Bonmin with standard settings fails for most of the benchmark problems. Nonetheless, the time spent by Bonmin to attempt a solution is still recorded in Table 10. Overall, Algorithm 2 tends to perform much better with a highly partitioned problem (large N), where each partition contains relatively few decision variables (low n).

Much like the 4th order instance of the TCL problem, the mixed-integer lasso problem requires many iterations until PaDOA converges. Likewise, the upper and lower bounds converge in a similar manner as shown in Fig. 6. That is, the lower bound increases much more quickly than the upper bound decreases.

One interesting difference between the results for the mixed-integer lasso and TCL problems is that the runtime for the TCL problem is much more balanced between the MILP and MINLP steps, as shown in Table 11. This may be due to the chosen problem partitioning, wherein each MINLP subproblem only requires the solution to a collection of continuous variables and a single integer variable.

Table 11 Runtime breakdown of Algorithm 2 applied to Problem (24) with $n=40$, $M=100$, and $N=10$

Full size table

5 Conclusions

This paper has introduces a new method (PaDOA) for partially distributing MICP solvers to find $\epsilon $-suboptimal points of the structured MICP (1). PaDOA proceeds by alternating between solving partially decoupled MICPs that comprise m integer variables and large scale MILPs with nN variables. Finite termination conditions for PaDOA are established in Theorem 3. Moreover, we discuss the major theoretical and practical advantages of PaDOA compared to exising extended formulation based OA solvers. In particular, Theorem 2 states that PaDOA terminates after the first iteration, if it is initialized at a global minimizer—an important property that is neither shared by existing OA nor by existing branch-and-bound based methods for MICP.

In Sect. 4, first, second and fourth order mixed integer problems are used to demonstrate the practical performance of PaDOA compared to other state of the art solvers by application to a scheduling problem of thermostatically controlled loads. While the solution and runtime are competitive for each of the case studies considered, the best performance occurs for problems with sparse Hessians and coupling constraints. Furthermore, it is observed that PaDOA is able to return a solution in several cases where the centralized approach fails due to memory constraints. These results are confirmed with a second benchmark example in Sect. 4.6. It remains to be seen what the specific limitations of this method for distributing MICP solvers are, and a full investigation of PaDOA and a more mature software implementation is still subject to future work.

As shown in Sect. 4, the MIQP implementation of Problem (22) is solved more efficiently than the MILP implementation. Thus, it may be the case that quadratic underapproximating functions and an MIQP coupling step could actually be more efficient for certain problems than the current implementation with affine supporting hyperplanes. This would also have the added benefit of allowing for applicability to a larger problem class. Future work will investigate the use of other under-approximating functions and an extension of Algorithm 2 to non-convex MINLPs. Furthermore, Step 4 requires full constraint information in order to return a feasible solution. This restricts the applicability in terms of fully distributed settings and future work will focus on sidestepping this restriction.

Notes

For more details see [8].
Note that we are referring to the relative importance of z and $\lambda $ and that the choice of subgradient should in general not be neglected. For example, [18] shows that with a poor choice of subgradients OA can fail to converge for convex nonsmooth MINLP problems.
The idea to use dual solutions as subgradients for the construction of polyhedral outer approximation is not new and can—in a very similar setting—be found in [36].
In the context of convex optimization, the functions $\varPsi _i$ only depend on x, and Problem (1) is considered without integer variables z.
Only two temperature trajectories are visible due to both end-rooms having identical solutions.
Convex in the same notion of convexity in MICPs. That is, a problem where the continuous relaxation yields a convex quadratic program.
It should be noted that all results seen in this section for Algorithm 2 use Bonmin to solve the MICP subproblems.

References

Alguacil, N., Motto, A.L., Conejo, A.J.: Transmission expansion planning: a mixed-integer lp approach. IEEE Trans. Power Syst. 18(3), 1070–1077 (2003)
Article Google Scholar
Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M.: CasADi—a software framework for nonlinear optimization and optimal control. Math. Program. Comput. (2018) (in press)
Andreani, R., Birgin, E.G., Martinez, J.M., Schuverdt, M.L.: On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18, 1286–1309 (2007)
Article MathSciNet MATH Google Scholar
Arora, J.S.: Introduction to Optimum Design. Elsevier, Amsterdam (2004)
Book Google Scholar
Benders, J.F.: Partitioning procedures for solving mixed-variables programming problems. Numer. Mat. 4(1), 238–252 (1962)
Article MathSciNet MATH Google Scholar
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
MATH Google Scholar
Bonami, P., Biegler, L.T., Conn, A.R., Cornuéjols, G., Grossmann, I.E., Laird, C.D., Lee, J., Lodi, A., Margot, F., Sawaya, N., Wächter, A.: An algorithmic framework for convex mixed integer nonlinear programs. Discrete Optim. 5(2), 186–204 (2008)
Article MathSciNet MATH Google Scholar
Bonami, P., Kilinç, M., Linderoth, J.: Algorithms and software for convex mixed integer nonlinear programs. In: Mixed Integer Nonlinear Programming, vol. 154, pp. 1–39. Springer, New York (2012)
Borchers, B., Mitchell, J.E.: An improved branch and bound algorithm for mixed integer nonlinear programs. Comput. Oper. Res. 21, 359–368 (1994)
Article MathSciNet MATH Google Scholar
Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
Article MATH Google Scholar
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Book MATH Google Scholar
Carrion, M., Arroyo, J.M.: A computationally efficient mixed-integer linear formulation for the thermal unit commitment problem. IEEE Trans. Power Syst. 21(3), 1371–1378 (2006)
Article Google Scholar
Conforti, M., Cornuéjols, G., Zambelli, G.: Polyhedral approaches to mixed integer linear programming. In: 50 Years of Integer Programming 1958–2008, pp. 343–385. Springer, New York (2009)
Dantzig, G.B., Wolfe, P.: Decomposition principle for linear programs. Oper. Res. 8(1), 101–111 (1960)
Article MATH Google Scholar
Deutscher Wetterdienst. ftp://ftp-cdc.dwd.de/pub/CDC/observations_germany/climate/10_minutes/solar/historical/ (2017)
Duran, M.A., Grossmann, I.E.: An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Math. Program. 36(3), 307–339 (1986)
Article MathSciNet MATH Google Scholar
Eckstein, J., Bertsekas, D.P.: On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 55, 293–318 (1992)
Article MathSciNet MATH Google Scholar
Eronen, V., Mäkelä, M.M., Westerlund, T.: On the generalization of ECP and OA methods to nonsmooth convex MINLP problems. Optimization 63(7), 1057–1073 (2014)
Article MathSciNet MATH Google Scholar
Everett, H.: Generalized Lagrange multiplier method for solving problems of optimum allocation of resources. Oper. Res. 11(3), 399–417 (1963)
Article MathSciNet MATH Google Scholar
Fletcher, R., Leyffer, S.: Solving mixed integer nonlinear programs by outer approximation. Math. Program. 66(1–3), 327–349 (1994)
Article MathSciNet MATH Google Scholar
Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximations. Comput. Math. Appl. 2, 17–40 (1976)
Article MATH Google Scholar
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness (Series of Books in the Mathematical Sciences), 1st edn. W. H. Freeman, New York (1979)
MATH Google Scholar
Geoffrion, A.: Generalized benders decomposition. J. Optim. Theory Appl. 10, 237–260 (1972)
Article MathSciNet MATH Google Scholar
Gupta, O.K., Ravindran, A.: Branch and bound experiments in convex nonlinear integer programming. Manag. Sci. 31, 1533–1546 (1985)
Article MathSciNet MATH Google Scholar
Hijazi, H., Bonami, P., Ouorou, A.: An outer-inner approximation for separable mixed-integer nonlinear programs. INFORMS J. Comput. 26(1), 31–44 (2014)
Article MathSciNet MATH Google Scholar
Houska, B., Frasch, J., Diehl, M.: An augmented Lagrangian based algorithm for distributed Non-Convex optimization. SIAM J. Optim. 26(2), 1101–1127 (2016)
Article MathSciNet MATH Google Scholar
IBM: Using the CPLEX callable library, version 12 (2009)
Kesavan, P., Allgor, R.J., Gatzke, E.P., Barton, P.I.: Outer approximation algorithms for separable nonconvex mixed-integer nonlinear programs. Math. Program. 100(3), 517–535 (2004)
Article MathSciNet MATH Google Scholar
Kilinç, M.R., Linderoth, J., Luedtke, J.: Lift-and-project cuts for convex mixed integer nonlinear programs. Math. Program. Comput. 9(4), 499–526 (2017)
Article MathSciNet MATH Google Scholar
Kocuk, B., Dey, S.S., Sun, X.: New formulation and strong MISOCP relaxations for AC optimal transmission switching problem. IEEE Trans. Power Syst. 32(6), 4161–4170 (2017)
Article Google Scholar
Kohlhepp, P., Hagenmeyer, V.: Technical potential of buildings in Germany as flexible power-to-heat storage for smart-grid operation. Energy Technol. 5(7), 1084–1104 (2017)
Article Google Scholar
Kronqvist, J., Bernal, D.E., Grossmann, I.E.: Using regularization and second order information in outer approximation for convex MINLP. Math. Program. 180(1), 285–310 (2020)
Article MathSciNet MATH Google Scholar
Kronqvist, J., Lundell, A., Westerlund, T.: Reformulations for utilizing separability when solving convex MINLP problems. J. Global Optim. 71(3), 571–592 (2018)
Article MathSciNet MATH Google Scholar
Kuindersma, S., Deits, R., Fallon, M., Valenzuela, A., Dai, H., Permenter, F., Koolen, T., Marion, P., Tedrake, R.: Optimization-based locomotion planning, estimation, and control design for the atlas humanoid robot. Auton. Robot. 40(3), 429–455 (2016)
Article Google Scholar
Leyffer, S.: Integrating SQP and branch-and-bound for mixed integer nonlinear programming. Comput. Optim. Appl. 18, 295–309 (2001)
Article MathSciNet MATH Google Scholar
Lubin, M., Yamangil, E., Bent, R., Vielma, J.P.: Polyhedral approximation in mixed-integer convex optimization. Math. Program. (2017)
Lundell, A., Kronqvist, J., Westerlund, T.: The supporting hyperplane optimizationtoolkit—a polyhedral outer approximation based convex MINLP solver utilizing a single branchingtree approach. Optim. Online (Preprint) (2018)
Marriott, K., Stuckey, P.J.: Programming with Constraints: An Introduction. MIT Press, Cambridge (1998)
Book MATH Google Scholar
Murray, A., Faulwasser, T., Hagenmeyer, V.: Mixed-integer vs. real-valued formulations of distributed battery scheduling problems. In: 10th Symposium on Control of Power and Energy Systems (CPES 2018) (2018)
Murty, K.G., Kabadi, S.N.: Some NP-complete problems in quadratic and nonlinear programming. Math. Program. 39, 117–129 (1987)
Article MathSciNet MATH Google Scholar
Muts, P., Nowak, I., Hendrix, E.M.T.: The decomposition-based outer approximation algorithm for convex mixed-integer nonlinear programming. J. Global Optim. 75, 1–22 (2020)
MathSciNet MATH Google Scholar
Muts, P., Nowak, I., Hendrix, E.M.T.: On decomposition and multiobjective-based column and disjunctive cut generation for MINLP. Optim. Eng. 1–30 (2020)
Necoara, I., Suykens, J.A.K.: Application of a smoothing technique to decomposition in convex optimization. IEEE Trans. Autom. Control 53(11), 2674–2679 (2008)
Article MathSciNet MATH Google Scholar
Nocedal, J., Wright, S.J.: Sequential Quadratic Programming. Springer, New York (2006)
Google Scholar
Nowak, I., Breitfeld, N., Hendrix, E.M.T., Njacheun-Njanzoua, G.: Decomposition-based inner-and outer-refinement algorithms for global optimization. J. Global Optim. 72(2), 305–321 (2018)
Article MathSciNet MATH Google Scholar
Nowak, I., Muts, P., Hendrix, E.M.T.: Multi-tree decomposition methods for large-scale mixed integer nonlinear optimization. In: Large Scale Optimization in Supply Chains and Smart Manufacturing, pp. 27–58. Springer (2019)
Gurobi Optimization: Gurobi optimizer reference manual (2009)
Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In: Fletcher, R. (ed.) Optimization. Academic Press, Cambridge (1969)
Google Scholar
Quesada, I., Grossmann, I.E.: An LP/NLP based branch-and-bound algorithm for convex MINLP optimization problems. Comput. Chem. Eng. 16, 937–947 (1992)
Article Google Scholar
Ravemark, D.E., Rippin, D.W.T.: Optimal design of a multi-product batch plant. Comput. Chem. Eng. 22, 177–183 (1998)
Article Google Scholar
Rockafellar, R.T.: Convex analysis (1970)
Rückmann, J., Shapiro, A.: Augmented Lagrangians in semi-infinite programming. Math. Program. Ser. B 116, 499–512 (2009)
Article MathSciNet MATH Google Scholar
Sawaya, N.: Reformulations, relaxations and cutting planes for generalized disjunctive programming. Ph.D. thesis, Carnegie Mellon University (2006)
Shapiro, A., Sun, J.: Some properties of the augmented Lagrangian in cone constrained optimization. Math. Oper. Res. 29(3), 479–491 (2004)
Article MathSciNet MATH Google Scholar
Takapoui, R., Möhle, N., Boyd, S., Bemporad, A.: A simple effective heuristic for embedded mixed-integer quadratic programming. Int. J. Control 86, 1–11 (2016)
Google Scholar
Tawarmalani, M., Sahinidis, N.V.: A polyhedral branch-and-cut approach to global optimization. Math. Program. 103(2), 225–249 (2005)
Article MathSciNet MATH Google Scholar
Tseng, P.: Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3), 475–494 (2001)
Article MathSciNet MATH Google Scholar
Vielma, J.P., Dunning, I., Huchette, J., Lubin, M.: Extended formulations in mixed integer conic quadratic programming. Math. Program. Comput. 95, 1–50 (2016)
MATH Google Scholar
Westerlund, T., Pettersson, F.: A cutting plane method for solving convex MINLP problems. Comput. Chem. Eng. 19, 131–136 (1995)
Article Google Scholar
Wright, S.: Coordinate descent algorithms. Math. Program. Ser. B 151(1), 3–34 (2015)
Article MathSciNet MATH Google Scholar
Zhang, W., Kalsi, K., Fuller, J., Elizondo, M., Chassin, D.: Aggregate model for heterogeneous thermostatically controlled loads with demand response. In: 2012 IEEE Power and Energy Society General Meeting, pp. 1–8 (2012)

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute for Automation and Applied Computer Science, Karlsruhe Institute of Technology, Karlsruhe, Germany
Alexander Murray, Timm Faulwasser & Veit Hagenmeyer
School of Information Science and Technology, ShanghaiTech University, Pudong, China
Mario E. Villanueva & Boris Houska

Authors

Alexander Murray
View author publications
You can also search for this author in PubMed Google Scholar
Timm Faulwasser
View author publications
You can also search for this author in PubMed Google Scholar
Veit Hagenmeyer
View author publications
You can also search for this author in PubMed Google Scholar
Mario E. Villanueva
View author publications
You can also search for this author in PubMed Google Scholar
Boris Houska
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Alexander Murray.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Proofs

1.1 Proof of Proposition 1

If there is no $x \in X$ with $Ax = b$, both sides of (6) are equal to infinity and the statement of the proposition holds in the extended value sense. Thus, we may assume that the constraints in (6) are feasible. Consequently, Assumption 1 implies that (5) is a convex optimization problem with compact and non-empty feasible set. Moreover, since X is a polytope, all constraints in (5) are linear. It is well-known [6, 51] that strong duality holds under these conditions. $\square $

1.2 Proof of Lemma 1

If there is no $x \in X$ with $A x = b$, both sides of (10) are equal to infinity and the statement of the lemma holds in the extended value sense. Thus, we may assume that the equation $Ax = b$ has a solution in X. Next, because we have $z \in \varPi $, our particular construction of $\varPhi $ implies that

$$\begin{aligned} \varPhi (x,z,\varPi )\ge & {} f^\star (z) + \left[ \lambda ^\star (z) \right] ^\mathsf {T}( x - x^\star (z) ) + \underbrace{\left[ \mu ^\star (z) \right] ^\mathsf {T}( z - z )}_{=0} \end{aligned}$$

(25)

Thus, we have

$$\begin{aligned} \min _{x \in X, Ax = b} \; \varPhi (x,z,\varPi )= & {} \min _{y \in X, Ay = b} \; f^\star (z) + \left[ \lambda ^\star (z) \right] ^\mathsf {T}( y - x^\star (z) ) \\= & {} \min _{x,y} \; f(x,z) + \left[ \lambda ^\star (z) \right] ^\mathsf {T}( y - x ) \quad \mathrm {s.t.} \quad \left\{ \begin{array}{l} A y = b \\ y \in X \end{array} \right. \\= & {} \max _{\lambda } \; \min _{x,y} \; f(x,z) + \lambda ^\mathsf {T}(y-x) \quad \mathrm {s.t.} \quad \left\{ \begin{array}{l} A y = b \\ y \in X \end{array} \right. \; . \end{aligned}$$

Since Assumption 1 holds, we may substitute (6) (see Proposition 1), which yields the equation

$$\begin{aligned} \forall z \in \varPi , \qquad \min _{x \in X, Ax = b} \; \varPhi (x,z,\varPi ) = f^\star (z) \; . \end{aligned}$$

$\square $

1.3 Proof of Theorem 1

Notice that if the equation $Ax = b$ has no solution in X, this will be detected immediately by Step 2 of Algorithm 1, which causes termination. Thus, we may assume that all optimization problems are feasible. Now, the main idea of the proof is to show that the cardinality of the set $\varPi $ is strictly increasing in every iteration, if the algorithm does not terminate. For this aim, we first notice that any solution $(x^+,y^+,z^+)$ of the MILP (12) satisfies the equation

$$\begin{aligned} \varPhi (x^+,z^+,\varPi ) = \sum _{i=1}^N y_i^+ \end{aligned}$$

(26)

by construction. Moreover, because we have $A x^+ = b$, the inequality

$$\begin{aligned} \min _{x,Ax = b} \; \varPhi (x,z^+,\varPi ) \le \varPhi (x^+,z^+,\varPi ) \end{aligned}$$

(27)

holds. If we further assume that the termination criterion is not satisfied, we must have

$$\begin{aligned} \sum _{i=1}^N y_i^+ < U - \epsilon \end{aligned}$$

(28)

Thus, if we had $z^+ \in \varPi $, then the result of Lemma 1 would imply that

$$\begin{aligned} f^\star (z^+) \overset{(10)}{=} \min _{x,Ax = b} \; \varPhi (x,z^+,\varPi ) \end{aligned}$$

(29)

as well as $U \le f(z^+)$, since $z^+$ has already been added to the collection $\varPi $. By substituting all the above relations we would then find that

$$\begin{aligned} f^\star (z^+) \, \overset{(29),(27)}{\le } \, \varPhi (x^+,z^+,\varPi ) \, \overset{(26),(28)}{<} \, U - \epsilon \, \le \, f(z^+) - \epsilon , \end{aligned}$$

which is a contraction. Thus, either our assumption that the algorithm does not terminate or our assumption $z^+ \in \varPi $ must be wrong. In other words, if the algorithm does not terminate in the current step, then the cardinality of the set $\varPi $ increases by 1 in the next step, because $z^+$ is added to the collection $\varPi \subseteq Z$. But this is only possible for a finite number of steps, because the set Z contains only a finite number of points. Thus, Algorithm 1 must terminate after a finite number of iterations. $\square $

1.4 Proof of Theorem 2

Let $V^\star = \sum _{i=1}^N f_i(x_i^\star ,z_i^\star )$ denote the optimal value of (1). Because we assume that such an optimal solution exists while Assumption 1 is satisfied, the partially decoupled optimization problems are all feasible and return piecewise affine lower bounds that satisfy the termination condition (16) with $V_k(z^\star ) = V^\star $, i.e., we have

$$\begin{aligned} V^\star - \epsilon _{\mathrm {L}}\; \le \min _{x \in X,\zeta \in Z_k} \varTheta _k^\star (x,\zeta ) \quad \mathrm {s.t.} \quad A x = b \end{aligned}$$

(30)

for all $k \in \{ 1, \ldots , N \}$. Because the function $\varTheta $ is by construction an upper bound on $\varTheta _k$ (for any k), we further have

$$\begin{aligned} \min _{x \in X,\zeta \in Z_k,Ax = b} \varTheta _k^\star (x,\zeta ) \le \min _{x \in X,z \in Z,Ax = b} \varTheta (x,z) = \varTheta (x^+,z^+) , \end{aligned}$$

where $(x^+,z^+)$ denotes the solution of the master MILP (20). By substituting the above inequalities we find that

$$\begin{aligned} V^\star - \epsilon _{\mathrm {L}}\le \varTheta (x^+,z^+) \; . \end{aligned}$$

Because we assume that $\epsilon _{\mathrm {L}}\le \epsilon $, this implies that

$$\begin{aligned} U - \varTheta (x^+,z^+) = V^\star - \varTheta (x^+,z^+) \le \epsilon _{\mathrm {L}}\le \epsilon \; . \end{aligned}$$

Thus, the termination condition is satisfied and Algorithm 2 terminates after the first step. $\square $

1.5 Proof of Theorem 3

We may assume that the coupled equality constraint is feasible, as infeasibility would be detected immediately in Step 1 of Algorithm 2. Similar to the proof of Theorem 1, we need to keep track of the integer solutions of the master MILPs. For this aim, we introduce the following “artificial” additional step:

Step $3'$): After solving (20), update ${\tilde{\varPi }} = {\tilde{\varPi }} \cup \{ z^+ \}$.

If the set ${\tilde{\varPi }}$ is initialized with the empty set and if Step $3'$ is inserted in Algorithm 2 immediately after Step 3, the iterates of this algorithm remain unaffected. The main idea of the proof is now to show that the cardinality of the set ${\tilde{\varPi }}$ increases in every iteration of Algorithm 1 under the assumption that the termination criterion is not satisfied. Let us assume that the solution $z^+$ satisfies $z^+ \in {\tilde{\varPi }}$ (before ${\tilde{\varPi }}$ is updated in Step $3'$). Then we have

$$\begin{aligned} U \le V_k(z^+) \overset{(16)}{\le } \epsilon _{\mathrm {L}}+ \min _{x \in X,\zeta \in Z_k, Ax = b} \varTheta _k^\star (x,\zeta ) , \end{aligned}$$

Because $\varTheta $ is an upper bound on $\varTheta _k$, this implies that we also have

$$\begin{aligned} \min _{x \in X,\zeta \in Z_k, Ax = b} \varTheta _k^\star (x,\zeta ) \le \min _{x \in X,z \in Z, Ax = b} \varTheta ^\star (x,\zeta ) = \varPhi (x^+,z^+) , \end{aligned}$$

which yields $U - \varPhi (x^+,z^+) \le \epsilon _{\mathrm {L}}\le \epsilon $. Thus, either the termination criterion is satisfied or we have $z^+ \notin {\tilde{\varPi }}$. In the latter case, the cardinality of $\varPi $ increases by 1 in the current iteration. As this is only possible for a finite number of steps, Algorithm 2 must terminate. $\square $

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Murray, A., Faulwasser, T., Hagenmeyer, V. et al. Partially distributed outer approximation. J Glob Optim 80, 523–550 (2021). https://doi.org/10.1007/s10898-021-01015-0

Download citation

Received: 09 May 2019
Accepted: 20 March 2021
Published: 17 April 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10898-021-01015-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Partially distributed outer approximation

Abstract

Similar content being viewed by others

Towards Multi-tree Methods for Large-Scale Global Optimization

Distributed primal outer approximation algorithm for sparse convex programming with separable structures

Alternative regularizations for Outer-Approximation algorithms for convex MINLP

1 Introduction

1.1 Problem formulation

Assumption 1

Definition 1

Remark 1

Remark 2

Remark 3

1.2 Notation

2 Outer approximation

2.1 Polyhedral relaxations

Proposition 1

Proof

Lemma 1

Proof

Remark 4

2.2 Outer approximation algorithm

Theorem 1

Proof

Remark 5

3 Partially distributed outer approximation algorithm

3.1 Partially decoupled upper bounds

3.2 Partially decoupled lower bounds

3.3 Partially distributed outer approximation (PaDOA)

Remark 6

3.4 Relation to distributed local optimization methods

3.5 Convergence analysis

Theorem 2

Proof

Remark 7

Theorem 3

Proof

4 Implementation and case study

4.1 Thermostatically controlled loads

4.2 Results for MILP

4.3 Results for MIQP

4.4 Higher Order Convex Problems

4.5 Discussion of TCL Benchmark Results

4.6 Mixed-Integer Lasso

4.7 Numerical Results for Mixed-Integer Lasso

5 Conclusions

Notes

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Proofs

Proofs

1.1 Proof of Proposition 1

1.2 Proof of Lemma 1

1.3 Proof of Theorem 1

1.4 Proof of Theorem 2

1.5 Proof of Theorem 3

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation