1 Introduction and Motivation

The aim of this chapter is to provide theory and methodology for Lagrangian dual approaches to solving mixed-integer linear optimization problems, stated as mixed-binary problems. It covers Lagrangian duality theory for mixed-binary linear optimization, generalizations of classical dual subgradient algorithms, cutting-plane methods, and (fractional) approximations of primal solutions from ergodic sequences, as well as the recovery of primal integer solutions. The chapter summarizes a research stream on subgradient methods for nondifferentiable optimization, accomplished by the authors over two decades. While it is based on a series of articles by the authors and co-authors, some of the results presented here are hitherto unpublished.

A strong motive for using Lagrangian dual methods in many applications of discrete and combinatorial optimization is that such problems may often be viewed as relatively easily solvable problems (being commonly highly structured) to which a number of complicating side constraints are added. An example is the (asymmetric) traveling salesperson problem, which can be viewed as that of finding a least-cost trip assignment subject to subtour-eliminating side constraints. In a Lagrangian dual method, appropriate prices are assigned to the side constraints, which are then included in the objective function. A solution to the resulting simpler problem yields a lower bound on the optimal value of the original problem, but usually does not fulfill the relaxed constraints. The prices are iteratively improved by means of some updating rule in an attempt to find prices such that the relaxed constraints become 'optimally fulfilled', that is, such that an optimal solution to the original problem is obtained. In discrete optimization, however, such prices usually do not exist.

Lagrangian dual methods are nevertheless increasingly popular tools in discrete optimization. Among their merits are their flexibility, their applicability to many different problem structures, and their ease of implementation. They also often produce higher lower bounds on the optimal value than does a continuous (linear programming) relaxation. Lagrangian dual methods are most often used successfully in combination with other discrete optimization methodologies, such as branch-and-bound algorithms (within which they provide lower bounds), local search methods, and primal heuristics. Among the latter, a popular combination is Lagrangian heuristics, which combine Lagrangian dual schemes with manipulations of primal infeasible solutions, aiming at producing near-optimal and primal feasible solutions.

We consider a Lagrangian dual approach for solving mixed-binary linear optimization problems. We outline the properties of the Lagrangian dual problem, its relations to the primal mixed-binary problem, and its optimality conditions. Further, primal–dual optimality conditions are presented, both for a convexified primal problem and for the mixed-binary problem.

It is described how the dual problem can be solved by means of a simple conditional subgradient optimization method. For this general method, we provide a convergence result for an ergodic (averaged) sequence of solutions to the Lagrangian subproblems. The ergodic sequence is constructed in the same manner as in [44, 60] for the case of convex optimization, where it is established that the ergodic sequences in the limit produce optimal solutions to the original problem. Here, however, we establish that the sequences accumulate at solutions to a convexified version of the original mixed-binary optimization problem.

We further present a cutting-plane approach to the Lagrangian dual problem; it amounts to solving the convexified problem utilizing Dantzig–Wolfe decomposition, that is, column generation with the columns being associated with the solutions to the Lagrangian subproblems. Then we describe an approach to generate high quality initial columns, obtained from subproblem solutions in a subgradient optimization method applied—in a prediction phase—to the Lagrangian dual.

In the following section, we present a Lagrangian dual approach to find feasible and near optimal solutions to mixed-binary optimization problems utilizing

  1. a Lagrangian heuristic based on the ergodic sequences,

  2. a core problem, which is constructed based on information from the ergodic sequences, and

  3. the ergodic sequences to guide the branching in a branch-and-bound method.

The chapter is then concluded with an extensive section containing notes, references, historical remarks, and tips for further reading.

2 Mixed-Binary Linear Optimization and Its Convexified Counterpart

We consider a general mixed-binary linear optimization problem. In our presentation and the derivation of methods to follow, the feasible set is described as the intersection of two sets. One set is characterized by general, explicit linear inequality constraints, which are to be Lagrangian relaxed. The other set is implicit and may be a Cartesian product set, resulting in one or several separable subproblems in the solution procedure(s); our description is general in that each subproblem may contain solely continuous, solely binary, or mixed variables.

Our mixed-binary linear optimization problem is defined as

$$\displaystyle \begin{aligned} z^{*} := \operatorname*{\text{min}}_{{\boldsymbol x}_{\mathrm{b}}, \, {\boldsymbol x}_{\mathrm{c}}} \;\; & {\boldsymbol c}_{\mathrm{b}}^{\top} {\boldsymbol x}_{\mathrm{b}} + {\boldsymbol c}_{\mathrm{c}}^{\top} {\boldsymbol x}_{\mathrm{c}} {} \end{aligned} $$
(15.1a)
$$\displaystyle \begin{aligned} \text{subject to} \;\; & A_{\mathrm{b}} {\boldsymbol x}_{\mathrm{b}} + A_{\mathrm{c}} {\boldsymbol x}_{\mathrm{c}} \geq {\boldsymbol b}, {} \end{aligned} $$
(15.1b)
$$\displaystyle \begin{aligned} & ({\boldsymbol x}_{\mathrm{b}}^{\top}, {\boldsymbol x}_{\mathrm{c}}^{\top})^{\top} \in X, {} \end{aligned} $$
(15.1c)

where \(z^*\) denotes the optimal value, \({\boldsymbol c}_{\mathrm {b}} \in \mathbb {R}^{n_{\mathrm {b}}}\), \({\boldsymbol c}_{\mathrm {c}} \in \mathbb {R}^{n_{\mathrm {c}}}\), \(A_{\mathrm {b}} \in \mathbb {R}^{m \times n_{\mathrm {b}}}\), \(A_{\mathrm {c}} \in \mathbb {R}^{m \times n_{\mathrm {c}}}\), \({\boldsymbol b} \in \mathbb {R}^m\), and \(n_{\mathrm {b}}, n_{\mathrm {c}}, m \in \mathbb {Z}_+\). The set \(X := \big \{ ({\boldsymbol x}_{\mathrm {b}}^{\top }, {\boldsymbol x}_{\mathrm {c}}^{\top })^{\top } \in \{ 0, 1 \}^{n_{\mathrm {b}}} \times \mathbb {R}^{n_{\mathrm {c}}}_+ \,|\, D_{\mathrm {b}} {\boldsymbol x}_{\mathrm {b}} + D_{\mathrm {c}} {\boldsymbol x}_{\mathrm {c}} \geq {\boldsymbol d} \,\big \}\), where \(D_{\mathrm {b}} \in \mathbb {R}^{k \times n_{\mathrm {b}}}\), \(D_{\mathrm {c}} \in \mathbb {R}^{k \times n_{\mathrm {c}}}\), \({\boldsymbol d} \in \mathbb {R}^k\), and \(k \in \mathbb {Z}_+\), is assumed to be nonempty and bounded. By defining \({\boldsymbol x} := ({\boldsymbol x}_{\mathrm {b}}^{\top }, {\boldsymbol x}_{\mathrm {c}}^{\top })^{\top }\), \({\boldsymbol c} := ({\boldsymbol c}_{\mathrm {b}}^{\top }, {\boldsymbol c}_{\mathrm {c}}^{\top })^{\top }\), \(A := (A_{\mathrm{b}}, A_{\mathrm{c}})\), \(D := (D_{\mathrm{b}}, D_{\mathrm{c}})\), and \(n := n_{\mathrm{b}} + n_{\mathrm{c}}\), the optimization problem (15.1) can be equivalently expressed as

$$\displaystyle \begin{aligned} z^{*} = \operatorname*{\text{min}}_{{\boldsymbol x}} \;\; & {\boldsymbol c}^{\top} {\boldsymbol x} {} \end{aligned} $$
(15.2a)
$$\displaystyle \begin{aligned} \text{subject to} \;\; & A {\boldsymbol x} \geq {\boldsymbol b}, {} \end{aligned} $$
(15.2b)
$$\displaystyle \begin{aligned} & {\boldsymbol x} \in X, {} \end{aligned} $$
(15.2c)

where \(X = \big \{ {\boldsymbol x} \in \{ 0, 1 \}^{n_{\mathrm {b}}} \times \mathbb {R}^{n_{\mathrm {c}}}_+ \,|\, D {\boldsymbol x} \geq {\boldsymbol d} \,\big \}\). We generally assume that the mixed-binary linear optimization problem is feasible, that is, that \(\{ {\boldsymbol x} \in X \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \} \neq \emptyset \) holds, and denote by \(X^* := { \operatorname *{\text{argmin}}_{{\boldsymbol x} \in X}} \left \{ {\boldsymbol c}^{\top } {\boldsymbol x} \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \,\right \}\) its nonempty solution set.

By denoting an extreme point of the convex hull of the set X by \({\boldsymbol x}^q\) and letting \(\mathcal {Q}\) be an index set for all such points, the convex hull can be expressed as

$$\displaystyle \begin{aligned} X_{\operatorname{\mathrm{conv}}} := \operatorname{\mathrm{conv}} X = {\displaystyle \operatorname*{\text{conv}}_{q \in \mathcal{Q}}} \, \big\{ {\boldsymbol x}^q \,\big\}. \end{aligned} $$
(15.3)

Any extreme point to \(X_{ \operatorname {\mathrm {conv}}}\) can be expressed as \({\boldsymbol x}^q = (({\boldsymbol x}_{\mathrm {b}}^q)^{\top }, ({\boldsymbol x}_{\mathrm {c}}^q)^{\top })^{\top }\), where \({\boldsymbol x}_{\mathrm {b}}^q \in \{ 0, 1 \}^{n_{\mathrm {b}}}\) and \({\boldsymbol x}_{\mathrm {c}}^q\) is an extreme point to the nonempty polyhedral set

$$\displaystyle \begin{aligned} X_{\mathrm{c}}({\boldsymbol x}_{\mathrm{b}}^q) := \left\{ \left. {\boldsymbol x}_{\mathrm{c}} \in \mathbb{R}_+^{n_{\mathrm{c}}} \,\right|\, D_{\mathrm{c}} {\boldsymbol x}_{\mathrm{c}} \geq {\boldsymbol d} - D_{\mathrm{b}} {\boldsymbol x}_{\mathrm{b}}^q \,\right\}, \qquad q \in \mathcal{Q}. \end{aligned} $$

The linear programming (LP) relaxation of the set X is expressed as

$$\displaystyle \begin{aligned} X_{\text{LP}} & := \left\{ ({\boldsymbol x}_{\mathrm{b}}^{\top}, {\boldsymbol x}_{\mathrm{c}}^{\top})^{\top} \in [ 0, 1 ]^{n_{\mathrm{b}}} \times \mathbb{R}^{n_{\mathrm{c}}}_+ \,\left|\, D_{\mathrm{b}} {\boldsymbol x}_{\mathrm{b}} + D_{\mathrm{c}} {\boldsymbol x}_{\mathrm{c}} \geq {\boldsymbol d} \right. \,\right\} {} \end{aligned} $$
(15.4a)
$$\displaystyle \begin{aligned} & \; = \left\{ {\boldsymbol x} \in [ 0, 1 ]^{n_{\mathrm{b}}} \times \mathbb{R}^{n_{\mathrm{c}}}_+ \,\left|\, D {\boldsymbol x} \geq {\boldsymbol d} \right. \,\right\}. {} \end{aligned} $$
(15.4b)

It holds that \(X \subseteq X_{ \operatorname {\mathrm {conv}}} \subseteq X_{\text{LP}}\). Replacing the set X in (15.2c) with its convex hull \(X_{ \operatorname {\mathrm {conv}}}\) results in the linear optimization problem defined as

$$\displaystyle \begin{aligned} z^{*}_{\operatorname{\mathrm{conv}}} := \underset{{\boldsymbol x}}{\min} \left\{ \left.{\boldsymbol c}^{\top} {\boldsymbol x} \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b}; \; {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}} \,\right\} \leq z^{*}. \end{aligned} $$
(15.5)

By assumption, the set \(\{ {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}} \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \}\) is nonempty and bounded. We let \(X^*_{ \operatorname {\mathrm {conv}}}\) denote the solution set of the optimization problem (15.5). Replacing the set X in (15.2c) by \(X_{\text{LP}}\) yields

$$\displaystyle \begin{aligned} z^*_{\text{LP}} :=\underset{{\boldsymbol x}}{\min} \left\{ \left. {\boldsymbol c}^{\top} {\boldsymbol x} \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b}; \; {\boldsymbol x} \in X_{\text{LP}} \,\right\} \leq z_{\operatorname{\mathrm{conv}}}^*, \end{aligned} $$
(15.6)

and we let \(X^*_{\text{LP}}\) denote the solution set to the optimization problem (15.6).

For the case when the set \(X_{\text{LP}}\) possesses the integrality property with respect to the binary variables, that is, when all its extreme points have integer-valued components \({\boldsymbol x}_{\mathrm {b}}\), the equality \(X_{ \operatorname {\mathrm {conv}}} = X_{\text{LP}}\) holds, implying the equality \(z_{ \operatorname {\mathrm {conv}}}^* = z_{\text{LP}}^*\).

Remark 15.1

In many applications, the set X is a Cartesian product set, here denoted by \(X = \prod _{s \in \mathcal {S}} Y_s\), where \(\mathcal {S} = \{ 1, \ldots , S \}\). It is then assumed that each set \(Y_s \subset \{ 0, 1 \}^{n_{\mathrm {b},s}} \times \mathbb {R}^{n_{\mathrm {c},s}}\) is defined over binary and/or continuous variables y s, such that \(( {\boldsymbol y}_1^{\top }, \ldots , {\boldsymbol y}_S^{\top })^{\top } \equiv {\boldsymbol x}\), and such that the relations \(n_{\mathrm {b},s}, n_{\mathrm {c},s} \in \mathbb {Z}_+\), \(s \in \mathcal {S}\), \(\sum _{s \in \mathcal {S}} n_{\mathrm {b},s} = n_{\mathrm {b}}\), and \(\sum _{s \in \mathcal {S}} n_{\mathrm {c},s} = n_{\mathrm {c}}\) hold. The optimization problem (15.2) is then expressed as

$$\displaystyle \begin{aligned} z^{*} = \operatorname*{\text{min}}_{{\boldsymbol y}} \;\; & \sum_{s \in \mathcal{S}} {\boldsymbol c}_s^{\top} {\boldsymbol y}_s {} \end{aligned} $$
(15.7a)
$$\displaystyle \begin{aligned} \text{subject to} \;\; & \sum_{s \in \mathcal{S}} A_s {\boldsymbol y}_s \geq {\boldsymbol b}, {} \end{aligned} $$
(15.7b)
$$\displaystyle \begin{aligned} & {\boldsymbol y}_s \in Y_s, \quad s \in \mathcal{S}, {} \end{aligned} $$
(15.7c)

where \({\boldsymbol c}_s \in \mathbb {R}^{n_s}\), \(A_s \in \mathbb {R}^{m \times n_s}\), and \(n_s = n_{\mathrm{b},s} + n_{\mathrm{c},s}\), \(s \in \mathcal {S}\). The constraints (15.7b) are said to be coupling, since relaxing them results in a Lagrangian subproblem that separates into one minimization problem for each \(s \in \mathcal {S}\). The integrality property may be considered for each of the sets \(Y_s\), as relevant.

2.1 The Lagrangian Dual

The Lagrange function \(L : \mathbb {R}^n \times \mathbb {R}^m \mapsto \mathbb {R}\) with respect to the relaxation of the constraints (15.1b) by means of the price vector \({\boldsymbol u} \in \mathbb {R}^m_+\), also called dual variables or (Lagrangian or dual) multipliers, is defined as

$$\displaystyle \begin{aligned} L({\boldsymbol x},{\boldsymbol u}) := {\boldsymbol c}^{\top} {\boldsymbol x} + {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}). \end{aligned} $$

Its minimization over x ∈ X determines the Lagrangian dual function \(h: \mathbb {R}^m \mapsto \mathbb {R}\), as defined by

$$\displaystyle \begin{aligned} h({\boldsymbol u}) := \operatorname*{\text{min}}_{{\boldsymbol x} \in X} L({\boldsymbol x},{\boldsymbol u}) &= {\boldsymbol b}^{\top} {\boldsymbol u} + \operatorname*{\text{min}}_{{\boldsymbol x} \in X} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x} \,\Big\} \\ &= {\boldsymbol b}^{\top} {\boldsymbol u} + \operatorname*{\text{min}}_{q \in \mathcal{Q}} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x}^q \,\Big\}. \end{aligned} $$
(15.8)

The function h is formed by the pointwise minimum of \(| \mathcal {Q} | \) affine functions, and it is therefore piecewise affine and concave, hence continuous but generally not everywhere differentiable. Letting X(u) and \(\mathcal {Q}({\boldsymbol u})\) denote the optimal sets of the respective inner minimization problems in (15.8) (the so-called Lagrangian subproblem, or Lagrangian relaxed problem), the following relations hold:

$$\displaystyle \begin{aligned} X({\boldsymbol u}) &= \operatorname*{\text{argmin}}_{{\boldsymbol x} \in X} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x} \Big\} = \Big\{ {\boldsymbol x}^q \Big\}_{q \in \mathcal{Q}({\boldsymbol u})} ; \end{aligned} $$
(15.9a)
$$\displaystyle \begin{aligned} X_{\operatorname{\mathrm{conv}}}({\boldsymbol u}) := \operatorname{\mathrm{conv}} X({\boldsymbol u}) &= \operatorname*{\text{argmin}}_{{\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}}} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x} \Big\} = {\displaystyle \operatorname*{\text{conv}}_{q \in \mathcal{Q}({\boldsymbol u})}} \Big\{ {\boldsymbol x}^q \Big\} . {} \end{aligned} $$
(15.9b)

The expression for X(u) in (15.9a) can always be replaced by its convexified version \(X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u})\) in (15.9b), since for any linear objective there is an optimal extreme point to the set \(X_{ \operatorname {\mathrm {conv}}}\) that is also optimal with respect to the set X.
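As a concrete illustration of (15.8), when the number of extreme points is small the dual function can be evaluated by brute-force enumeration over \(\mathcal{Q}\). The following sketch uses a hypothetical toy instance, not taken from the text: two binary variables, no continuous part, c = (2, 3), X = {0, 1}², and a single relaxed constraint x₁ + x₂ ≥ 1, that is, A = (1, 1) and b = (1).

```python
import itertools

# Hypothetical toy instance (not from the text): two binary variables,
# c = (2, 3), no continuous part, X = {0, 1}^2, and one relaxed
# constraint x1 + x2 >= 1, i.e. A = (1, 1), b = (1).
c = [2.0, 3.0]
A = [[1.0, 1.0]]
b = [1.0]
X = [list(x) for x in itertools.product([0, 1], repeat=2)]  # the points x^q

def h(u):
    """Dual function h(u) = b^T u + min_q (c - A^T u)^T x^q, cf. (15.8)."""
    base = sum(bi * ui for bi, ui in zip(b, u))
    red = [c[j] - sum(A[i][j] * u[i] for i in range(len(u))) for j in range(len(c))]
    return base + min(sum(r * v for r, v in zip(red, x)) for x in X)

print(h([0.0]))  # 0.0: the subproblem picks x = (0, 0)
print(h([2.5]))  # 2.0: on [2, 3] the function is flat at its maximum h* = 2
print(h([4.0]))  # 1.0: beyond u = 3 the function decreases
```

For this instance h is the pointwise minimum of four affine functions of u: it rises with slope 1 up to u = 2, is flat at the optimal value h* = 2 on [2, 3], and decreases thereafter.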

By weak duality, the inequality \(h({\boldsymbol u}) \leq {\boldsymbol c}^{\top} {\boldsymbol x}\) holds whenever \({\boldsymbol u} \in \mathbb {R}^m_+\) and \({\boldsymbol x} = ({\boldsymbol x}_{\mathrm {b}}^{\top },{\boldsymbol x}_{\mathrm {c}}^{\top })^{\top }\) is feasible in (15.2) [and, consequently, in (15.1)]. In order to find the best possible underestimate of \(z^*\), the prices u should be chosen so as to maximize the Lagrangian dual function, that is, to solve the Lagrangian dual problem defined as

$$\displaystyle \begin{aligned} h^* := {\displaystyle \operatorname*{\text{max}}_{{\boldsymbol u} \in \mathbb{R}^m_+}} \; h({\boldsymbol u}). \end{aligned} $$
(15.10)

The problem (15.10) is a convex optimization problem having a concave and generally nondifferentiable objective function. By the assumption that the polyhedron \(\left \{ {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}} \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \,\right \}\) is nonempty, also the optimal set of (15.10), denoted \(U^*\), is nonempty and polyhedral. Thus, by weak duality, the inequality \(h^* \leq z^*\) holds. For most mixed-binary linear optimization problems, however, it holds that \(h^* < z^*\), that is, the duality gap \(z^* - h^*\) is nonzero.
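A minimal sketch of solving (15.10) by projected subgradient ascent, on the same hypothetical toy instance as above: at each iterate a subproblem minimizer x(u) yields the subgradient b − Ax(u), a divergent-series step is taken, and the result is projected onto \(\mathbb{R}^m_+\). The brute-force subproblem solver stands in for a structured one.

```python
import itertools

# Same hypothetical toy instance as in the dual-function sketch:
# c = (2, 3), X = {0, 1}^2, one relaxed constraint x1 + x2 >= 1.
c = [2.0, 3.0]
A = [[1.0, 1.0]]
b = [1.0]
X = [list(x) for x in itertools.product([0, 1], repeat=2)]

def solve_subproblem(u):
    """Return a minimizer x(u) of (c - A^T u)^T x over X, and h(u)."""
    red = [c[j] - sum(A[i][j] * u[i] for i in range(len(u))) for j in range(len(c))]
    x = min(X, key=lambda p: sum(r * v for r, v in zip(red, p)))
    h_u = sum(bi * ui for bi, ui in zip(b, u)) + sum(r * v for r, v in zip(red, x))
    return x, h_u

def projected_subgradient(iters=100):
    u = [0.0] * len(b)
    best = float("-inf")                      # best dual value found
    for k in range(iters):
        x, h_u = solve_subproblem(u)
        best = max(best, h_u)
        gamma = [b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
                 for i in range(len(b))]      # subgradient b - A x(u)
        step = 1.0 / (k + 1)                  # divergent-series step lengths
        u = [max(0.0, ui + step * gi) for ui, gi in zip(u, gamma)]  # keep u in R^m_+
    return best

print(projected_subgradient())  # reaches the dual optimal value h* = 2 for this instance
```

Once an iterate passes u = 2, the subproblem returns x = (1, 0), the subgradient vanishes, and the method halts at a dual optimum; in general, only convergence of the sequence of best dual values toward h* can be guaranteed.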

Using the transformations in (15.8) along with an LP dualization, the Lagrangian dual (15.10) can be reformulated according to

$$\displaystyle \begin{aligned} h^* & = {\displaystyle \operatorname*{\text{max}}_{{\boldsymbol u}}} \, \bigg\{ {\boldsymbol b}^{\top} {\boldsymbol u} + \operatorname*{\text{min}}_{q \in \mathcal{Q}} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x}^q \,\Big\} \,\Big|\, {\boldsymbol u} \in \mathbb{R}^m_+ \bigg\} \end{aligned} $$
(15.11a)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{max}}_{{\boldsymbol u}, \, v}} \, \bigg\{ {\boldsymbol b}^{\top} {\boldsymbol u} + v \,\Big|\, \big( A {\boldsymbol x}^q \big)^{\top} {\boldsymbol u} + v \leq {\boldsymbol c}^{\top} {\boldsymbol x}^q, \, q \in \mathcal{Q}; \, {\boldsymbol u} \in \mathbb{R}^m_+; \, v \in \mathbb{R} \bigg\} {} \end{aligned} $$
(15.11b)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{min}}_{{\boldsymbol \lambda}}} \, \bigg\{ \sum_{q \in \mathcal{Q}} \lambda_q {\boldsymbol c}^{\top} {\boldsymbol x}^q \,\bigg|\, \sum_{q \in \mathcal{Q}} \lambda_q A {\boldsymbol x}^q \geq {\boldsymbol b}; \; \sum_{q \in \mathcal{Q}} \lambda_q = 1; \; \lambda_q \geq 0, \, q \in \mathcal{Q} \bigg\} {} \end{aligned} $$
(15.11c)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{min}}_{{\boldsymbol x}, \, { \boldsymbol{\lambda}}}} \left\{ {\boldsymbol c}^{\top} {\boldsymbol x} \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b}; \; {\boldsymbol x} = \sum_{q \in \mathcal{Q}} \lambda_q {\boldsymbol x}^q; \; \sum_{q \in \mathcal{Q}} \lambda_q = 1; \; \lambda_q \geq 0, \, q \in \mathcal{Q} \right. \,\right\} \end{aligned} $$
(15.11d)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{min}}_{{\boldsymbol x}}} \left\{ \left. {\boldsymbol c}^{\top} {\boldsymbol x} \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}} \,\right\} = z^*_{\operatorname{\mathrm{conv}}}. {} \end{aligned} $$
(15.11e)

In summary, the above derivations of primal–dual connections imply the following weak and strong duality relations for the problems (15.1), (15.2), (15.5), (15.6), and (15.10):

$$\displaystyle \begin{aligned} z^* \geq z^*_{\operatorname{\mathrm{conv}}} = h^* \geq z_{\text{LP}}^*. \end{aligned} $$

For typical, as well as the most interesting, applications of Lagrangian dualization, the straightforward continuous relaxation of the set X does not, however, result in a set \(X_{\text{LP}}\) [defined in (15.4)] having integer extreme points. Hence, although the equality \(z^*_{ \operatorname {\mathrm {conv}}} = z_{\text{LP}}^*\) may hold, typically \(X_{ \operatorname {\mathrm {conv}}} \subset X_{\text{LP}}\) holds [\(X_{ \operatorname {\mathrm {conv}}}\) being defined in (15.3)], which in practice most often implies the strict inequality \(z^*_{ \operatorname {\mathrm {conv}}} > z_{\text{LP}}^*\).

2.2 Optimality Conditions for the Convexified Problem

In order to derive optimality conditions for the convexified optimization problem (15.5) and, eventually, also for the original optimization problem (15.1) [or (15.2)], we first define the subdifferential of the concave function h at \({\boldsymbol u} \in \mathbb {R}^m\) as

$$\displaystyle \begin{aligned} \partial h({{\boldsymbol u}}) := \left\{ {\boldsymbol \gamma} \in \mathbb{R}^m \,\left|\, h({{\boldsymbol v}}) \leq h({{\boldsymbol u}}) + {\boldsymbol \gamma}^{\top} ({{\boldsymbol v} - {\boldsymbol u}}), \, {{\boldsymbol v}} \in \mathbb{R}^m \right. \right\}, \end{aligned} $$

the elements of which are called subgradients. The following characterization of the subdifferential holds for any Lagrangian dual objective function.

Proposition 15.1 (Subdifferential of the Dual Objective Function)

For each \({\boldsymbol u} \in \mathbb {R}^m\), it holds that \(\partial h({\boldsymbol u}) = \{ {\boldsymbol b} - A {\boldsymbol x} \,|\, {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u}) \} = \operatorname {\mathrm {conv}} \{ {\boldsymbol b} - A {\boldsymbol x}^q \,|\, q \in \mathcal {Q}({\boldsymbol u}) \}\). The function h is differentiable at \({\boldsymbol u}\) if and only if \(\partial h({\boldsymbol u})\) is a singleton set, that is, if and only if \({\boldsymbol b} - A {\boldsymbol x}\) is constant on \(X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u})\), in which case \(\nabla h({\boldsymbol u}) = {\boldsymbol b} - A {\boldsymbol x}\) for any \({\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u})\).

The normal cone to the set \(\mathbb {R}^m_+\) at \({\boldsymbol u} \in \mathbb {R}^m\) is defined as

$$\displaystyle \begin{aligned} N_{\mathbb{R}^m_+}({\boldsymbol u}) := \begin{cases} \left\{ \left. {\boldsymbol \nu} \in \mathbb{R}^m \,\right|\, {\boldsymbol \nu}^{\top} ({\boldsymbol v} - {\boldsymbol u}) \leq 0, \; {\boldsymbol v} \in \mathbb{R}^m_+ \right\}, & {\boldsymbol u} \in \mathbb{R}^m_+, \\ \emptyset, & {\boldsymbol u} \notin \mathbb{R}^m_+. \end{cases} \end{aligned} $$

Letting \({\boldsymbol e}^i\) denote the i-th unit vector, \(N_{\mathbb {R}^m_+}({\boldsymbol u}) = \operatorname {\mathrm {cone}} \, \{ - {\boldsymbol e}^i \,|\, u_i = 0, \, i \in \{ 1, \ldots , m \} \}\) holds for \({\boldsymbol u} \in \mathbb {R}^m_+\). The conditional subdifferential of h at \({\boldsymbol u} \in \mathbb {R}^m_+\), the elements of which will be referred to as conditional subgradients, is in our Lagrangian dual setting then defined as

$$\displaystyle \begin{aligned} \partial^{\mathbb{R}^m_+} h({\boldsymbol u}) :&= \left\{ {\boldsymbol \gamma} \in \mathbb{R}^m \,\left|\, h({\boldsymbol v}) \leq h({\boldsymbol u}) + {\boldsymbol \gamma}^{\top} ({\boldsymbol v} - {\boldsymbol u}), \; {\boldsymbol v} \in \mathbb{R}^m_+ \right. \,\right\} \\ & = \partial h({\boldsymbol u}) - N_{\mathbb{R}^m_+}({\boldsymbol u}) \\ & = \operatorname{\mathrm{conv}} \big\{ {\boldsymbol b} - A {\boldsymbol x}^q \,|\, q \in \mathcal{Q}({\boldsymbol u}) \big\} + \operatorname{\mathrm{cone}} \big\{ {\boldsymbol e}^i \,|\, u_i = 0, \, i \in \{ 1, \ldots, m \} \big\}. \end{aligned} $$
(15.12)

Clearly, \(\partial ^{\mathbb {R}^m_+} h({\boldsymbol u}) \supseteq \partial h({\boldsymbol u})\) holds for all \({\boldsymbol u} \in \mathbb {R}^m_+\). The next proposition is immediate.

Proposition 15.2 (Properties of the Conditional Subdifferential)

The conditional subdifferential \(\partial ^{\mathbb {R}^m_+} h({\boldsymbol u})\)is nonempty, closed and convex for all \({\boldsymbol u} \in \mathbb {R}^m_+\). Further, \(\partial ^{\mathbb {R}^m_+} h({\boldsymbol u})\)is unbounded whenever \({\boldsymbol u} \in \mathrm {bd} \, \mathbb {R}^m_+\).

Proposition 15.3 (Properties of the Lagrangian Dual)

The following statements are equivalent:

  (i) the Lagrangian dual problem (15.10) has a bounded optimal solution;

  (ii) \(\,\mathbf {0} \in \bigcup _{{\boldsymbol u} \in \mathbb {R}^m_+} \partial ^{\mathbb {R}^m_+} h({\boldsymbol u})\);

  (iii) \(\left \{ \left . {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}} \,\right |\, A {\boldsymbol x} \geq {\boldsymbol b} \,\right \} \neq \emptyset \).

The optimality conditions for the Lagrangian dual (15.10) are expressed as follows.

Proposition 15.4 (Optimality Conditions for the Lagrangian Dual Problem)

A dual vector \({\boldsymbol u} \in \mathbb {R}^m_+\) is optimal, that is, \({\boldsymbol u} \in U^*\), if and only if \(\partial h({\boldsymbol u}) - N_{\mathbb {R}^m_+}({\boldsymbol u}) \ni \mathbf {0}\), or equivalently, \(\partial h({\boldsymbol u}) \cap N_{\mathbb {R}^m_+}({\boldsymbol u}) \neq \emptyset \), that is, if and only if there exists a \({\boldsymbol \gamma} \in \partial h({\boldsymbol u})\) such that \({\boldsymbol \gamma} \leq \mathbf {0}\) and \({\boldsymbol u}^{\top} {\boldsymbol \gamma} = 0\) hold.

Further, subgradients that verify the optimality of dual solutions correspond to optimal solutions to the convexified primal problem (15.11e), as expressed below.

Proposition 15.5 (Primal–Dual Optimality Conditions for the Convexified Problem)

Let \(({\boldsymbol x}, {\boldsymbol u}) \in X_{ \operatorname {\mathrm {conv}}} \times \mathbb {R}_+^m\). Then \(({\boldsymbol x}, {\boldsymbol u}) \in X^*_{ \operatorname {\mathrm {conv}}} \times U^*\) if and only if \({\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u})\) and \({\boldsymbol b} - A {\boldsymbol x} \in \partial h({\boldsymbol u}) \cap N_{\mathbb {R}^m_+}({\boldsymbol u})\) hold, that is, if and only if \({\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u})\), \(A {\boldsymbol x} \geq {\boldsymbol b}\), and \({\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) = 0\) hold.

As is well known, a primal–dual pair \((\overline {{\boldsymbol x}}, \overline {{\boldsymbol u}}) \in X_{ \operatorname {\mathrm {conv}}} \times \mathbb {R}^m_+\) satisfies the conditions of the proposition exactly when it is a saddle point of the Lagrange function L with respect to \(X_{ \operatorname {\mathrm {conv}}} \times \mathbb R^m_+\):

$$\displaystyle \begin{aligned} L(\overline{{\boldsymbol x}}, {\boldsymbol u}) \, \leq \, L(\overline{{\boldsymbol x}}, \overline{{\boldsymbol u}}) \, \leq \, L({\boldsymbol x}, \overline{{\boldsymbol u}}), \qquad {\boldsymbol u} \in \mathbb{R}^m_+, \quad {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}}. \end{aligned} $$
(15.13)
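On the hypothetical toy instance used above [c = (2, 3), X = {0, 1}², relaxed constraint x₁ + x₂ ≥ 1], the saddle-point inequalities (15.13) can be checked numerically; since L is affine in x, it suffices to test the extreme points of \(X_{\operatorname{\mathrm{conv}}}\), that is, the points of X.

```python
import itertools

# Hypothetical toy instance: c = (2, 3), X = {0, 1}^2, relaxed x1 + x2 >= 1.
c = [2.0, 3.0]
A = [[1.0, 1.0]]
b = [1.0]
X = [list(x) for x in itertools.product([0, 1], repeat=2)]

def L(x, u):
    """Lagrange function L(x, u) = c^T x + u^T (b - Ax)."""
    slack = [b[i] - sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(b))]
    return sum(ci * xi for ci, xi in zip(c, x)) + sum(ui * si for ui, si in zip(u, slack))

x_bar, u_bar = [1, 0], [2.5]   # a primal-dual saddle point for this instance
# left inequality of (15.13), sampled over a few u >= 0
assert all(L(x_bar, [u]) <= L(x_bar, u_bar) for u in [0.0, 1.0, 2.5, 10.0])
# right inequality: checking the extreme points of X_conv suffices
assert all(L(x_bar, u_bar) <= L(x, u_bar) for x in X)
print("(15.13) holds for (x_bar, u_bar)")
```

Here the duality gap is zero, so an exact saddle point exists; for typical mixed-binary instances it does not, which motivates the near saddle points of Sect. 2.3.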

The mapping \(\partial h \cap N_{\mathbb {R}^m_+}\) is constant on the dual solution set \(U^*\). Hence, irrespective of the choice of dual solution \({\boldsymbol u}^* \in U^*\), the solution set to the primal problem (15.11e) may be expressed as

$$\displaystyle \begin{aligned} X^*_{\operatorname{\mathrm{conv}}} = \left\{ {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}}({\boldsymbol u}^*) \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b}; \; \left( {{\boldsymbol u}^*} \right)^{\top} (A {\boldsymbol x} - {\boldsymbol b}) = 0 \right. \,\right\}. \end{aligned} $$
(15.14)

In the typical situation, the subproblem solution set \(X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u}^*)\) is not a singleton, the dual objective function is nonsmooth on \(U^*\), and finding a subgradient that verifies dual optimality is computationally expensive. Further, since not all points in \(X_{ \operatorname {\mathrm {conv}}}({\boldsymbol u}^*)\) are optimal in the primal problem, finding a point \({\boldsymbol x} \in X^*_{ \operatorname {\mathrm {conv}}}\) is nontrivial; this phenomenon is referred to as non-coordinability, and it is relevant both when the original problem is an LP and when it is a mixed-binary linear optimization problem.

2.3 Conditions for Optimality and Near Optimality of Mixed-Binary Linear Optimization Problems

The optimality conditions in Proposition 15.5 and the characterization in (15.14) can be constructed because of the convexity of the problem (15.5), which yields that strong duality, that is, \(h^* = z_{ \operatorname {\mathrm {conv}}}^*\), holds. For the nonconvex mixed-binary linear optimization problem (15.2), for which \(h^* \leq z^*\) (and typically \(h^* < z^*\)) holds, these conditions can be generalized through a well-defined relaxation.

Proposition 15.6 (Primal–Dual Optimality Conditions for Mixed-Binary Linear Optimization)

Let \(({\boldsymbol x}, {\boldsymbol u}) \in X \times \mathbb {R}_+^m\). Then \(({\boldsymbol x}, {\boldsymbol u}) \in X^* \times U^*\) if and only if the following system is consistent:

$$\displaystyle \begin{aligned} A {\boldsymbol x} & \geq {\boldsymbol b}, {} \end{aligned} $$
(15.15a)
$$\displaystyle \begin{aligned} {\boldsymbol c}^{\top} {\boldsymbol x} + {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) & \leq h({\boldsymbol u}) + \varepsilon, {} \end{aligned} $$
(15.15b)
$$\displaystyle \begin{aligned} - {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) & \leq \delta, {} \end{aligned} $$
(15.15c)
$$\displaystyle \begin{aligned} \varepsilon + \delta & \leq z^* - h^*, {} \end{aligned} $$
(15.15d)
$$\displaystyle \begin{aligned} \varepsilon, \delta & \geq 0. {} \end{aligned} $$
(15.15e)

The conditions stated in Proposition 15.6 characterize primal–dual optimal solutions through five properties:

  (i) primal feasibility [x ∈ X and (15.15a)];

  (ii) dual feasibility [\({\boldsymbol u} \in \mathbb {R}_+^m\)];

  (iii) Lagrangian ε-optimality [(15.15b)];

  (iv) a relaxed (δ-)complementarity [(15.15c)];

  (v) a bounded sum of nonnegative perturbations [(15.15d)–(15.15e)].

The combination of the inequalities in (15.15) with the condition \(({\boldsymbol x}, {\boldsymbol u}) \in X \times \mathbb {R}_+^m\) leads to the equality \(\varepsilon + \delta = z^* - h^*\) being fulfilled. Further, the system (15.15) is consistent for the primal–dual pair \((\overline {{\boldsymbol x}}, \overline {{\boldsymbol u}}) \in X \times \mathbb {R}_+^m\) if and only if the inequalities

$$\displaystyle \begin{aligned} L(\overline{{\boldsymbol x}}, {\boldsymbol u}) - (z^* - h^*) + \varepsilon \, \leq \, L(\overline{{\boldsymbol x}}, \overline{{\boldsymbol u}}) \, \leq \, L({\boldsymbol x}, \overline{{\boldsymbol u}}) + \varepsilon \end{aligned} $$
(15.16)

hold for all \(\boldsymbol u \in \mathbb R^m_+\) and all x ∈ X, and for some \(\varepsilon \in [0, z^* - h^*]\), meaning that \((\overline {{\boldsymbol x}}, \overline {{\boldsymbol u}})\) is a near saddle point of the Lagrange function L. For a zero duality gap, that is, when \(z^* = h^*\), the conditions (15.15) reduce to those stated in Proposition 15.5, while the near saddle point property (15.16) reduces to that of an ordinary saddle point, that is, (15.13).

Using the conditions (15.15) and an optimal Lagrangian dual solution \({\boldsymbol u}^* \in U^*\), the optimal set of the mixed-binary linear optimization problem (15.2) is given by

$$\displaystyle \begin{aligned} X^* & = \bigcup_{\substack{\delta = z^* - h^* - \varepsilon \\ \varepsilon \in [0, \, z^* - h^*]}} \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \, {\boldsymbol c}^{\top} {\boldsymbol x} - h^* - \varepsilon \leq - ({\boldsymbol u}^*)^{\top} ({\boldsymbol b} - A {\boldsymbol x}) \leq \delta \right. \right\}. {} \end{aligned} $$
(15.17)

We define, for any \({\boldsymbol u} \in \mathbb {R}_+^m\), the sets of ε-optimal [cf. (15.9a)] and δ-complementary solutions to the Lagrangian subproblem by the following expressions:

$$\displaystyle \begin{aligned} X_{\varepsilon}^{\mathrm{opt}}({\boldsymbol u}) & := \Big\{ {\boldsymbol x} \in X \,\big|\, {\boldsymbol c}^{\top} {\boldsymbol x} + {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) \leq h({\boldsymbol u}) + \varepsilon \,\Big\}, && \varepsilon \geq 0, \\ X_{\delta}^{\mathrm{comp}}({\boldsymbol u}) & := \Big\{ {\boldsymbol x} \in X \,\big|\, - {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) \leq \delta \,\Big\}, && \delta \geq 0. \end{aligned} $$

The set \(X^*\), as formulated in (15.17), can then for any \({\boldsymbol u}^* \in U^*\) be expressed as

$$\displaystyle \begin{aligned} X^* & = \bigcup_{\varepsilon \in [0, \, z^* - h^*]} \left\{ \left. {\boldsymbol x} \in X_{\varepsilon}^{\mathrm{opt}}({\boldsymbol u}^*) \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b}; \, - ({\boldsymbol u}^*)^{\top} ({\boldsymbol b} - A {\boldsymbol x}) \leq z^* - h^*- \varepsilon \right\} {} \end{aligned} $$
(15.18a)
$$\displaystyle \begin{aligned} & = \bigcup_{\varepsilon \in [0, \, z^* - h^*]} \left\{ \left. {\boldsymbol x} \in X_{\varepsilon}^{\mathrm{opt}}({\boldsymbol u}^*) \cap X_{z^* - h^* - \varepsilon}^{\mathrm{comp}}({\boldsymbol u}^*) \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b} \,\right\}, {} \end{aligned} $$
(15.18b)

where (15.18a) is derived as a generalization of (15.14).

Given an optimal dual solution, the conditions in the expression (15.14) can, in principle, be used to calculate an optimal solution to the convexified problem (15.5). In contrast, given an optimal dual solution, the conditions in the expressions (15.17) and (15.18) are in general not instrumental for finding optimal solutions to the problem (15.2), because the size of the duality gap z − h is unknown.

The characterizations (15.17) and (15.18) can be generalized to allow for non-optimal choices of \({\boldsymbol u} \in \mathbb {R}^m_+\) and to describe near optimal solutions to the problem (15.2). For this purpose, we introduce the functions \(\varepsilon , \delta : {X \times \mathbb {R}_+^m \mapsto \mathbb {R}}\), defined by

$$\displaystyle \begin{aligned} &\varepsilon({\boldsymbol x},{\boldsymbol u}) := {\boldsymbol c}^{\top} {\boldsymbol x} + {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}) - h({\boldsymbol u}), \end{aligned} $$
(15.19a)
$$\displaystyle \begin{aligned} &\qquad \qquad \qquad \text{and}\\ &\delta({\boldsymbol x},{\boldsymbol u}) := - {\boldsymbol u}^{\top} ({\boldsymbol b} - A {\boldsymbol x}).{} \end{aligned} $$
(15.19b)

It holds that ε(x, u) ≥ 0 when x ∈ X, and that δ(x, u) ≥ 0 when \({\boldsymbol u} \in \mathbb {R}^m_+\) and Ax ≥b. For any primal–dual pair \(({\boldsymbol x}, {\boldsymbol u}) \in X \times \mathbb {R}_+^m\), (15.19) can be used to characterize the ε-optimality and the δ-complementarity of a solution x appr(u) that is near optimal in the Lagrangian subproblem (15.9a) and feasible in the primal problem (15.2), that is, values of ε ≥ 0 and δ ≥ 0 such that x appr(u) is included in the sets \(X_{\varepsilon }^{\mathrm {opt}}({\boldsymbol u}) = \{ {\boldsymbol x} \in X \,|\, \varepsilon ({\boldsymbol x}, {\boldsymbol u}) \leq \varepsilon \}\) and \(X_{\delta }^{\mathrm {comp}}({\boldsymbol u}) = \{ {\boldsymbol x} \in X \,|\, \delta ({\boldsymbol x}, {\boldsymbol u}) \leq \delta \}\), respectively. Then, for \({\boldsymbol u} \in \mathbb {R}^m_+\) and β ≥ 0, the set of β-optimal solutions to the problem (15.2) can be expressed as

$$\displaystyle \begin{aligned} X_{\beta}^* & := \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; {\boldsymbol c}^{\top} {\boldsymbol x} \leq z^* + \beta \right. \,\right\} \end{aligned} $$
(15.20a)
$$\displaystyle \begin{aligned} & \; = \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; \varepsilon({\boldsymbol x},{\boldsymbol u}) + \delta({\boldsymbol x},{\boldsymbol u}) \leq z^* - h({\boldsymbol u}) + \beta \right. \,\right\} \end{aligned} $$
(15.20b)
$$\displaystyle \begin{aligned} & \; = \left\{ \left. {\boldsymbol x} \in X_{\varepsilon}^{\mathrm{opt}}({\boldsymbol u}) \cap X_{\delta}^{\mathrm{comp}}({\boldsymbol u}) \,\right| A {\boldsymbol x} \geq {\boldsymbol b} ; \, \varepsilon + \delta \leq z^* \!- h({\boldsymbol u}) + \beta ; \, \varepsilon, \delta \geq 0 \right\}. \end{aligned} $$
(15.20c)

In particular, it holds that \(X_0^* = X^*\). Further, for β = 0 and any u ∈ U*, the characterizations (15.17)–(15.18) are recovered.

Neither of the characterizations in (15.20) is practically useful for solving the problem (15.2), since the optimal value z* is unknown, but they suggest a heuristic principle for searching for near optimal solutions. Given any value of \({\boldsymbol u} \in \mathbb {R}^m_+\) [preferably being (near) optimal in the Lagrangian dual (15.10)], a point x ∈ X such that Ax ≥ b holds and such that the values of ε(x, u) and δ(x, u) are both small should be sought, that is, a primal feasible solution that is ε-optimal and δ-complementary, with small values of ε ≥ 0 and δ ≥ 0. A natural starting point for such a heuristic search is a Lagrangian subproblem solution \({\boldsymbol x}({\boldsymbol u}) \in X_0^{\mathrm {opt}}({\boldsymbol u}) = X({\boldsymbol u})\) [for which ε(x(u), u) = 0 holds], and the search should typically be restricted to the set X. The search should then strive for primal feasibility with respect to Ax ≥ b, while maintaining small values of ε(x, u) and δ(x, u). This heuristic principle has been shown to be effective for finding near optimal solutions in applications where the problem (15.2) possesses specific structures that can be exploited in a search that strives for primal feasibility.
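As an illustration of this principle, the quantities in (15.19) are inexpensive to evaluate for any candidate point. The following pure-Python sketch scores primal feasible candidates by ε(x, u) + δ(x, u), cf. (15.20b); the instance, the function names, and the screening loop are ours, not from the chapter.

```python
# Scoring candidates by epsilon-optimality (15.19a) and delta-complementarity
# (15.19b). Hypothetical instance: min x1 + 2 x2 s.t. x1 + x2 >= 1, x in {0,1}^2.

def eps_delta(x, u, c, A, b, h_u):
    """Return (epsilon(x, u), delta(x, u)) for a candidate x in X."""
    slack = [b[i] - sum(A[i][j] * x[j] for j in range(len(x)))
             for i in range(len(b))]                       # b - A x
    u_slack = sum(ui * si for ui, si in zip(u, slack))     # u^T (b - A x)
    cx = sum(cj * xj for cj, xj in zip(c, x))
    return cx + u_slack - h_u, -u_slack

c, A, b, u = [1.0, 2.0], [[1.0, 1.0]], [1.0], [1.0]
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
# h(u) = minimum of the Lagrangian function over X [the subproblem (15.9a)]
h_u = min(sum(c[j] * x[j] for j in range(2)) + u[0] * (b[0] - x[0] - x[1])
          for x in X)
# Among primal feasible points, prefer a small sum eps + delta, cf. (15.20b).
feasible = [x for x in X if x[0] + x[1] >= b[0]]
best = min(feasible, key=lambda x: sum(eps_delta(x, u, c, A, b, h_u)))
```

On this toy instance the screening recovers the primal optimum x = (1, 0), for which both ε and δ vanish at the dual point u = 1.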

Remark 15.2

In the case when the constraints (15.2b) are instead equalities, δ(x, u) = 0 holds for any primal feasible solution. The characterization (15.20b) then reduces to

$$\displaystyle \begin{aligned} X_{\beta}^* = \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} = {\boldsymbol b} ; \; \varepsilon({\boldsymbol x},{\boldsymbol u}) \leq z^* - h({\boldsymbol u}) + \beta \right. \,\right\}, \end{aligned} $$

that is, primal feasibility and ε-optimality in the Lagrangian relaxed problem. In particular, then \(X^* = \left \{ {\boldsymbol x} \in X \,\left |\, A {\boldsymbol x} = {\boldsymbol b} ; \; \varepsilon ({\boldsymbol x},{\boldsymbol u}) \leq z^* - h({\boldsymbol u}) \right . \right \}\). It follows that Ax ≠ b must hold for any x ∈ X such that ε(x, u) < z* − h(u). Hence, a solution x ∈ X is optimal in (15.2) if and only if it is the best near optimal solution in the Lagrangian subproblem that fulfills the relaxed constraints. This observation implies that (15.2) can be solved by enumerating the elements of X in order of increasing values of ε(x, u); the first one found that fulfills Ax = b is then also optimal.
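The enumeration principle just described can be sketched in a few lines of pure Python; the toy instance below (one equality constraint, a hypothetical dual point u = 0.5) and all names are ours.

```python
# Enumerate X in order of increasing epsilon(x, u); the first point fulfilling
# A x = b is optimal. Hypothetical instance:
# min x1 + 2 x2 subject to x1 + x2 = 1, x in {0,1}^2.

def lagrangian_value(x, u, c, A, b):
    """c^T x + u^T (b - A x), the Lagrangian function value at x."""
    cx = sum(cj * xj for cj, xj in zip(c, x))
    slack = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(A, b)]
    return cx + sum(ui * si for ui, si in zip(u, slack))

c, A, b, u = [1.0, 2.0], [[1.0, 1.0]], [1.0], [0.5]
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
h_u = min(lagrangian_value(x, u, c, A, b) for x in X)      # h(u)
# epsilon(x, u) equals the Lagrangian value minus h(u), cf. (15.19a)
for x in sorted(X, key=lambda x: lagrangian_value(x, u, c, A, b) - h_u):
    if x[0] + x[1] == b[0]:        # A x = b for this instance
        x_opt = x
        break
```

Here the enumeration visits (0, 0) first (infeasible) and then stops at (1, 0), which is indeed the optimum of the equality-constrained instance.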

For the case of inequality constraints in the problem (15.2), only

$$\displaystyle \begin{aligned} X^* \subseteq \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; \varepsilon({\boldsymbol x},{\boldsymbol u}) \leq z^* - h({\boldsymbol u}) \right. \right\} \end{aligned}$$

holds, and there is no guarantee that a feasible solution with a minimal value of ε(x, u) is optimal, since the corresponding value of δ(x, u) may be large [while an optimal solution may have a large value of ε(x, u) and a small value of δ(x, u)]. If, however, an upper bound \(\overline {z} \geq z^*\) is at hand, by enumerating all elements in the set \(\left \{ {\boldsymbol x} \in X \,\left |\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; \varepsilon ({\boldsymbol x},{\boldsymbol u}) \leq \overline {z} - h({\boldsymbol u}) \right . \right \}\) an optimal solution to (15.2) will be found.

3 Conditional Subgradient Optimization

This section presents a generalization of subgradient optimization for solving the Lagrangian dual problem, which utilizes conditional subgradients. We discuss a particular choice of conditional subgradients, obtained through Euclidean projections, which leads to an easily implementable modification of traditional subgradient optimization schemes. Computational experiments have shown that the resulting subgradient projection method performs better than traditional subgradient optimization; in some cases the difference is considerable. This generalization of subgradient optimization is especially advantageous for Lagrangian duals possessing many nonnegativity constraints, onto which Euclidean projections are simple. The section further presents a computationally cheap scheme for generating ergodic sequences of subproblem solutions, which are shown to converge to the optimal set of the convexified problem (15.5). This scheme is then enhanced, in terms of faster convergence to the optimal set. The section concludes with a heuristic scheme for the finite generation of feasible solutions that are ε-optimal, for an arbitrary ε > 0.

3.1 Basic Convergence in the Lagrangian Dual Problem

We consider solving the Lagrangian dual problem (15.10) by the conditional subgradient optimization method, which is given by the following. Choose a starting solution \({\boldsymbol u}^0 \in \mathbb {R}_+^m\) and compute iterates u t, t = 0, 1, …, according to

$$\displaystyle \begin{aligned} {\boldsymbol u}^{t+\frac{1}{2}} = {\boldsymbol u}^t + \alpha_t \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big), \quad {\boldsymbol u}^{t+1} = \big[ {\boldsymbol u}^{t+\frac{1}{2}} \big]_+, \end{aligned} $$
(15.21)

where x(u t) ∈ X(u t) solves the Lagrangian subproblem in (15.9) at \({\boldsymbol u}^t \in \mathbb {R}^m_+\), so that b − Ax(u t) ∈ ∂h(u t) is a subgradient to h at u t, \(\boldsymbol {\nu }({\boldsymbol u}^t) \in N_{\mathbb {R}^m_+}({\boldsymbol u}^t)\) is an element of the normal cone of \(\mathbb {R}^m_+\) at u t, α t > 0 is the step length chosen at iteration t, and \(\left [ \cdot \right ]_+\) denotes the Euclidean projection onto the nonnegative orthant \(\mathbb {R}^m_+\). Note that \({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol {\nu }({\boldsymbol u}^t) \in \partial ^{\mathbb {R}^m_+} h({\boldsymbol u}^t)\), that is, the step direction belongs to the conditional subdifferential, defined in (15.12).

If {ν(u t)} := {0}, then the method (15.21) reduces to the traditional subgradient optimization method.

The choice \(\boldsymbol {\nu }({\boldsymbol u}^t) := \operatorname *{\text{argmin}}_{{\boldsymbol v} \in N_{\mathbb {R}^m_+}({\boldsymbol u}^t)} \|\, {\boldsymbol v} - ({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t)) \,\|\), where ∥⋅∥ denotes the Euclidean norm, defines a special case of the method (15.21) called the subgradient projection method, which uses a feasible direction from every \({\boldsymbol u}^t \in \mathbb {R}^m_+\), as

$$\displaystyle \begin{aligned} b_i - {\boldsymbol A}_i {\boldsymbol x}({\boldsymbol u}^t) - \nu_i({\boldsymbol u}^t) = \left\{ \begin{array}{lll} \big [b_i - {\boldsymbol A}_i {\boldsymbol x}({\boldsymbol u}^t) \big]_+ \, , \quad & \text{ if } & u_i^t = 0, \\ \, b_i - {\boldsymbol A}_i {\boldsymbol x}({\boldsymbol u}^t) \, , & \text{ if } & u_i^t > 0, \end{array} \right. \quad i=1, \ldots, m, \end{aligned} $$
(15.22)

where \(b_i\) denotes the i-th element of the vector b, \({\boldsymbol A}_i\) the i-th row of the matrix A, and \(\nu _i(\cdot )\) the i-th element of ν(⋅).
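A minimal sketch of the iteration (15.21) with the projected direction (15.22) may look as follows; `subgradient_at` is a stand-in for a hypothetical Lagrangian subproblem oracle returning b − Ax(u), harmonic step lengths as in (15.32) are assumed, and the toy dual instance is ours.

```python
# One possible realization of the subgradient projection method (15.21)-(15.22).

def project_nonneg(u):
    """Euclidean projection [.]_+ onto the nonnegative orthant."""
    return [max(0.0, ui) for ui in u]

def direction(u, g):
    """The projected step direction b - A x(u) - nu(u) of (15.22)."""
    return [max(0.0, gi) if ui == 0.0 else gi for ui, gi in zip(u, g)]

def subgradient_projection(u0, subgradient_at, steps, a=1.0, b0=1.0):
    u = list(u0)
    for t in range(steps):
        g = subgradient_at(u)       # g = b - A x(u^t), a subgradient of h
        d = direction(u, g)
        alpha = a / (b0 + t)        # harmonic step lengths, cf. (15.32)
        u = project_nonneg([ui + alpha * di for ui, di in zip(u, d)])
    return u

# Tiny dual: h(u) = min over x in {0,1}^2 of x1 + 2 x2 + u (1 - x1 - x2),
# for which h* = 1 is attained on U* = [1, 2].
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
def subgradient_at(u):
    x = min(X, key=lambda x: x[0] + 2 * x[1] + u[0] * (1 - x[0] - x[1]))
    return [1 - x[0] - x[1]]

u = subgradient_projection([0.0], subgradient_at, steps=50)
```

On this instance the iterates enter the optimal interval U* = [1, 2] after a few steps and remain there, in line with Theorem 15.1 below.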

Due to the nondifferentiability of the Lagrangian dual objective function, subgradient based optimization methods cannot rely on a one dimensional maximization for determining the step length in each iteration. Instead, to ensure convergence of the sequence of dual iterates to optimality, the step lengths must be computed according to a (typically predefined) rule. We next present convergence results for the method (15.21) under different step length rules. The first assures that the sequence of dual iterates tends to the set of solutions to the Lagrangian dual.

Theorem 15.1 (Convergence to the Solution Set by Divergent Series Step Lengths)

Let the method (15.21) be applied to the problem (15.10), with the sequence {α t} of step lengths fulfilling the divergent series conditions

$$\displaystyle \begin{aligned} \alpha_t > 0, \qquad & t = 0, 1, 2, \ldots, {} \end{aligned} $$
(15.23a)
$$\displaystyle \begin{aligned} \alpha_t \rightarrow 0, \qquad & \mathit{\text{as}} \quad t \rightarrow \infty, {} \end{aligned} $$
(15.23b)

and

$$\displaystyle \begin{aligned} \left\{ \sum_{s=0}^{t-1} \alpha_s \,\right\} \rightarrow \infty, \qquad & \mathit{\text{as}} \quad t \rightarrow \infty. {} \end{aligned} $$
(15.23c)

If the sequence {ν(u t)} is bounded, then it holds that {h(u t)}→ h* and \(\big \{ \min _{{\boldsymbol u} \in U^*} \|\, {\boldsymbol u} - {\boldsymbol u}^t \,\| \,\big \} \rightarrow 0\) as t → ∞.

Proof

We show that the iterates will eventually belong to an arbitrarily small neighbourhood of the set of solutions to (15.10).

Let δ > 0 be arbitrary and define \(B^{\delta }: = \{ {\boldsymbol u} \in \mathbb {R}^m \,|\, \|{\boldsymbol u} \| \leq \delta \}\). Since the function h is piecewise affine and concave, the set \(\mathbb {R}^m_+\) is nonempty, closed and convex, and the set U* is nonempty and polyhedral, there exists an ε > 0 such that the level set \(U^{\varepsilon } := \{ {\boldsymbol u} \in \mathbb {R}^m_+ \,|\, h({\boldsymbol u}) \geq h^* - \varepsilon \}\) fulfills U ε ⊆ U* + B δ∕2. Further, the sequence {b − Ax(u t) −ν(u t)} is bounded and α t → 0. Hence, there exists an N δ such that α t ∥ b − Ax(u t) −ν(u t) ∥2 ≤ ε and α t ∥ b − Ax(u t) −ν(u t) ∥ ≤ δ∕2 for all t ≥ N δ.

The remainder of the proof is based on induction. First we show that there exists a t δ ≥ N δ such that \({\boldsymbol u}^{t_{\delta }} \in U^* + B^{\delta }\). Then, we establish that if the inclusion u t ∈ U* + B δ holds for some t ≥ N δ, then also u t+1 ∈ U* + B δ.

For an arbitrary u* ∈ U*, in each iteration t of the method (15.21) the relations

$$\displaystyle \begin{aligned} \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^{t+1} \,\right\|{}^2 & = \left\|\, {\boldsymbol u}^* - \big[ {\boldsymbol u}^t + \alpha_t \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big) \big]_+ \,\right\|{}^2 {} \\ & \leq \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^t - \alpha_t \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big) \,\right\|{}^2 \\ & = \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^t \,\right\|{}^2 - 2 \alpha_t \, \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^* - {\boldsymbol u}^t \big) \\ & \qquad + \alpha_t^2 \left\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \,\right\|{}^2 \end{aligned} $$
(15.24)

hold, where the inequality follows from the projection property.Footnote 6 Now suppose that, for all s ≥ N δ, the inequality

$$\displaystyle \begin{aligned} 2 \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \big)^{\top} \big( {\boldsymbol u}^* - {\boldsymbol u}^s \big) - \alpha_s \left\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \,\right\|{}^2 > \varepsilon {} \end{aligned} $$
(15.25)

holds. Then by telescoping (15.24), we obtain that for every t ≥ N δ, the inequality

$$\displaystyle \begin{aligned} \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^{t+1} \,\right\|{}^2 < \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^{N_{\delta}} \,\right\|{}^2 - \varepsilon \sum_{s=N_{\delta}}^t \alpha_s, \end{aligned} $$

holds. Then, from (15.23c) it follows that the right-hand side of this inequality tends to −∞ as t → ∞, which is clearly impossible. Therefore, the inequality

$$\displaystyle \begin{aligned} 2 \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^* - {\boldsymbol u}^t \big) - \alpha_t \left\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \,\right\|{}^2 \leq \varepsilon\end{aligned} $$
(15.26)

must hold for at least one iteration t = t δ ≥ N δ. By definition of N δ the inequality \(\alpha _{t_\delta } \|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^{t_\delta }) - \boldsymbol {\nu }({\boldsymbol u}^{t_\delta }) \,\|{ }^2 \leq \varepsilon \) holds, which together with (15.26) implies the inequality \(({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^{t_{\delta }}) - \boldsymbol {\nu }({\boldsymbol u}^{t_{\delta }}))^{\top } ({\boldsymbol u}^* - {\boldsymbol u}^{t_{\delta }}) \leq \varepsilon \). Since \({\boldsymbol u}^*, {\boldsymbol u}^{t_{\delta }} \in \mathbb {R}^m_+\), the definition (15.12) implies the inequality \(h({\boldsymbol u}^*) - h({\boldsymbol u}^{t_{\delta }}) \leq ({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^{t_{\delta }}) - \boldsymbol {\nu }({\boldsymbol u}^{t_{\delta }}))^{\top } ({\boldsymbol u}^* - {\boldsymbol u}^{t_{\delta }})\). Hence, it holds that \(h({\boldsymbol u}^{t_{\delta }}) \geq h^* - \varepsilon \), that is, \({\boldsymbol u}^{t_{\delta }} \in U^{\varepsilon } \subseteq U^* + B^{\delta / 2} \subset U^* + B^{\delta }\).

Now, suppose that u t ∈ U* + B δ for some t ≥ N δ. If (15.25) holds, then (15.24) implies the inequality ∥ u* − u t+1 ∥ < ∥ u* − u t ∥ for any u* ∈ U*. Defining the Euclidean projection of u t onto U* as \(\overline {{\boldsymbol u}}_{\text{proj}}^t := \operatorname *{\text{argmin}}_{{\boldsymbol u} \in U^*} \|{\boldsymbol u} - {\boldsymbol u}^t \|\) then yields the inequalities

$$\displaystyle \begin{aligned} \left\|\, \overline{{\boldsymbol u}}_{\text{proj}}^{t+1} - {\boldsymbol u}^{t+1} \,\right\| & \leq \left\|\, \overline{{\boldsymbol u}}_{\text{proj}}^t - {\boldsymbol u}^{t+1} \,\right\| < \left\|\, \overline{{\boldsymbol u}}_{\text{proj}}^t - {\boldsymbol u}^t \,\right\| \leq \delta, \qquad t=0,1, \ldots,\end{aligned} $$

which imply the inclusion u t+1 ∈ U* + B δ. Otherwise, (15.26) must hold and, using the same arguments as above, we obtain the inequality h(u t) ≥ h* − ε, that is, u t ∈ U ε ⊆ U* + B δ∕2. Since the relations

$$\displaystyle \begin{aligned} \left\|\, {\boldsymbol u}^{t+1} - {\boldsymbol u}^t \,\right\| & = \left\|\, \big[ {\boldsymbol u}^t + \alpha_t \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big) \big]_+ - {\boldsymbol u}^t \,\right\| \\ & \leq \left\|\, {\boldsymbol u}^t + \alpha_t \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \big) - {\boldsymbol u}^t \,\right\| \\ &= \alpha_t \left\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \,\right\| \leq \frac{\delta}{2} \end{aligned} $$

hold whenever t ≥ N δ, it follows that u t+1 ∈ U* + B δ∕2 + B δ∕2 = U* + B δ. We conclude that, in either case, whenever t ≥ N δ, u t ∈ U* + B δ implies that u t+1 ∈ U* + B δ.

By induction with respect to t ≥ t δ, it follows that u t ∈ U* + B δ for all t ≥ t δ. Since this holds for arbitrarily small values of δ > 0 and since the function h is continuous, the theorem follows. □

By additionally requiring that the sum of the squared step lengths converges, it is ensured that the sequence of dual iterates accumulates at a point in the dual solution set.

Theorem 15.2 (Convergence to a Solution by Quadratically Convergent Divergent Series Step Lengths)

Let the method (15.21) be applied to the problem (15.10), with the step lengths α t fulfilling the conditions (15.23) as well as

$$\displaystyle \begin{aligned} \left\{ \sum_{s=0}^{t-1} \alpha_s^2 \,\right\} & \rightarrow p, \quad \mathit{\text{as}} \quad t \rightarrow \infty, {} \end{aligned} $$
(15.27)

where p < ∞. If the sequence {ν(u t)} is bounded, then {u t}→ u ∞ ∈ U*.

Proof

Let u* ∈ U* be arbitrary and let t ≥ 1. Telescoping (15.24) yields the inequality

$$\displaystyle \begin{aligned} \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^t \,\right\|{}^2 & \leq \left\|\, {\boldsymbol u}^* - {\boldsymbol u}^0 \,\right\|{}^2 - 2 \sum_{s=0}^{t-1} \alpha_s \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \big)^{\top} \left( {\boldsymbol u}^* - {\boldsymbol u}^s \right) \\ & \quad + \sum_{s=0}^{t-1} \alpha_s^2 \big\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \,\big\|{}^2. \end{aligned} $$
(15.28)

Since u* ∈ U*, \({\boldsymbol u}^s \in \mathbb {R}^m_+\), and \({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol {\nu }({\boldsymbol u}^s) \in \partial ^{\mathbb {R}^m_+} h({\boldsymbol u}^s)\) for all s ≥ 0, we obtain the inequalities

$$\displaystyle \begin{aligned} h({\boldsymbol u}^s) \leq h({\boldsymbol u}^*) \leq h({\boldsymbol u}^s) + \big( {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \big)^{\top} \left( {\boldsymbol u}^* - {\boldsymbol u}^s \right), \end{aligned} $$
(15.29)

and hence that the inequality (b − Ax(u s) −ν(u s))⊤(u* − u s) ≥ 0 holds for all s ≥ 0. We define c := sup t {∥ b − Ax(u t) −ν(u t) ∥}, so that the inequality ∥ b − Ax(u s) −ν(u s) ∥ ≤ c holds for all s ≥ 0. From (15.27) and (15.23a) it follows that \(\sum _{s=0}^{t-1} \alpha _s^2 < p\) for all t ≥ 1. By inserting this in (15.28), we then conclude that ∥ u* − u t ∥2 < ∥ u* − u 0 ∥2 + pc 2 for any t ≥ 1; it follows that the sequence {u t} is bounded.

Assume now that there is no subsequence \(\mathcal {T}\) such that \(\{ ({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol {\nu }({\boldsymbol u}^t))^{\top } ({\boldsymbol u}^* - {\boldsymbol u}^t) \}_{t \in \mathcal {T}} \rightarrow 0\). Then, there exist an ε > 0 and a t ε > 1 such that the inequality (b − Ax(u s) −ν(u s))⊤(u* − u s) ≥ ε holds for all s ≥ t ε. From (15.28) and (15.23c) it then follows that {∥ u* − u t ∥2}→ −∞, which is clearly impossible. Therefore, there is a subsequence \(\mathcal {T}\) such that \(\{ ({\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol {\nu }({\boldsymbol u}^t))^{\top } ({\boldsymbol u}^* - {\boldsymbol u}^t) \}_{t \in \mathcal {T}} \rightarrow 0\). From (15.29) it then follows that \(\{ h({\boldsymbol u}^t) \}_{t \in \mathcal {T}} \rightarrow h^*\). The boundedness of the sequence {u t} implies the existence of an accumulation point of the subsequence \(\{ {\boldsymbol u}^t \}_{t \in \mathcal {T}}\), say u ∞, and by the continuity of h it then follows that u ∞ ∈ U*.

To show that u ∞ is the only accumulation point of the sequence {u t}, let δ > 0 and choose an N δ ≥ 0 such that the inequalities \(\| \, {\boldsymbol u}^{\infty } - {\boldsymbol u}^{N_{\delta }} \|{ }^2 \, \leq \delta /2\) and \(\sum _{s=N_\delta }^{\infty } \alpha _s^2 \leq \delta /(2c^2)\) hold. Consider any t > N δ. Analogously to the derivation of (15.28), and using (15.29), we then obtain that

$$\displaystyle \begin{aligned} \left\|\, {\boldsymbol u}^\infty - {\boldsymbol u}^t \,\right\|{}^2 &\leq \left\|\, {\boldsymbol u}^\infty - {\boldsymbol u}^{N_\delta} \,\right\|{}^2 + \sum_{s=N_\delta}^{t-1} \alpha_s^2 \,\big\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^s) - \boldsymbol{\nu}({\boldsymbol u}^s) \,\big\|{}^2 \\ &< \frac{\delta}{2} + \frac{\delta}{2 c^2} c^2 = \delta. \end{aligned} $$

Since this holds for arbitrarily small values of δ > 0, the theorem follows. □

In subgradient optimization, the Polyak step length rule has a well documented practical usefulness. Convergence to an optimal solution with this step length rule relies on knowledge of the optimal objective value h*. This result extends to the case of the conditional subgradient optimization method (15.21), for which the Polyak step length formula is defined by

$$\displaystyle \begin{aligned} \alpha_t := \frac{\theta_t \left( h^* - h({\boldsymbol u}^t) \right)}{\left\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \,\right\|{}^2}, \quad 0 < \epsilon_1 \leq \theta_t \leq 2 - \epsilon_2 < 2, \quad t = 0, 1, 2, \ldots \end{aligned} $$
(15.30)
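The computation in (15.30) is a one-liner; the sketch below is ours, including the guard for a zero direction [a vanishing conditional subgradient certifies dual optimality, cf. Proposition 15.4].

```python
# The Polyak step length (15.30): theta is assumed to lie in
# [eps1, 2 - eps2] with 0 < eps1 and eps2 < 2; d denotes the step
# direction b - A x(u^t) - nu(u^t).

def polyak_step(h_star, h_u, d, theta=1.0):
    norm_sq = sum(di * di for di in d)      # || b - A x(u) - nu(u) ||^2
    if norm_sq == 0.0:
        return 0.0      # zero direction: the current dual point is optimal
    return theta * (h_star - h_u) / norm_sq
```

Replacing `h_star` by an upper bound \(\overline {h}\) gives the variant (15.31) used for finite ε-optimality in Proposition 15.8.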

Proposition 15.7 (Convergence to a Solution by Polyak Step Lengths)

Let the method (15.21) be applied to the problem (15.10), with the step lengths α t fulfilling the conditions (15.30). If the sequence {ν(u t)} is bounded, then {h(u t)}→ h* and {u t}→ u ∞ ∈ U*.

Remark 15.3

With step lengths defined by (15.30), the case of subgradient projection according to (15.22) yields actual steps in (15.21) (that is, u t+1 −u t) that are longer, as compared with the case of plain subgradients, that is, when {ν t}≡{0} holds.

For an \(\varepsilon > \overline {h} - h^* \geq 0\), finite convergence to ε-optimality can be achieved by replacing h in (15.30) by an upper bound \(\overline {h} \geq h^*\) and letting θ t ≡ 1.

Proposition 15.8 (Finite ε-Optimality by Polyak Step Lengths)

Let the method (15.21) be applied to the problem (15.10), with the step lengths {α t} defined by

$$\displaystyle \begin{aligned} \alpha_t := \frac{\theta_t \big( \overline{h} - h({\boldsymbol u}^t) \big)}{\|\, {\boldsymbol b} - A {\boldsymbol x}({\boldsymbol u}^t) - \boldsymbol{\nu}({\boldsymbol u}^t) \,\|{}^2}, \qquad t = 0, 1, 2, \ldots, \end{aligned} $$
(15.31)

where θ t ≡ 1 and \(\overline {h} > h^*\). If the sequence {ν(u t)} is bounded, then for any ε > 0, there is a t ε > 0 such that \(h({\boldsymbol u}^{t_\varepsilon }) \geq 2 h^* - \overline {h} - \varepsilon \).

Our results to follow in Sect. 15.3.2—on ergodic convergence to a primal solution—rely on the divergent series conditions (15.23), (15.27) on the step lengths in the method (15.21). By ensuring that the sequence {α t} fulfills the conditions

$$\displaystyle \begin{aligned} \frac{\underline{a}}{b + t} \leq \alpha_t \leq \frac{\overline{a}}{b + t}, \qquad 0 < \underline{a} \leq \overline{a}, \quad b > 0, \quad t = 0, 1, \ldots,\end{aligned} $$
(15.32)

convergence in terms of Theorem 15.2 can, however, be shown.Footnote 7

Corollary 15.1 (Convergence to a Solution by Generalized Harmonic Series Step Lengths)

Let the method (15.21), where the sequence {α t} fulfills (15.32), be applied to the problem (15.10). If the sequence {ν(u t)} is bounded, then {u t}→ u ∞ ∈ U* holds.

3.2 Ergodic Convergence in the Primal Problem

The conditional subgradient optimization method (15.21) constructs a sequence {x(u t)} of solutions to the Lagrangian subproblem (15.9). Due to the non-coordinability of the Lagrangian subproblem (see Sect. 15.2.2) this sequence is, however, not convergent. We propose a scheme for generating an ergodic sequence of subproblem solutions, which is shown to converge to the solution set \(X_{ \operatorname {\mathrm {conv}}}^*\). The generation of the ergodic sequence is computationally cheap and its storage requires a relatively small amount of memory. The sequence is defined by convexity weights that are proportional to the step lengths α t; the latter requirement is then generalized and improved.

From Propositions 15.2 and 15.4 follow that the set \(\partial h({\boldsymbol u}^\infty ) \cap N_{\mathbb {R}^m_+}({\boldsymbol u}^\infty )\) is nonempty. The next proposition establishes that the sequence {b − Ax(u t)} of subgradients to the dual objective function converges in an ergodic sense to an element that verifies optimality of the Lagrangian dual, in terms of Proposition 15.4.

Proposition 15.9 (Ergodic Subgradients Converge to the Optimality-Verifying Set)

Apply the method (15.21), (15.23) to the problem (15.10) and define the sequence \(\big \{ \overline {{\boldsymbol g}}^t \,\big \}\) as

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \overline{{\boldsymbol g}}^t := \frac{1}{\sum_{s=0}^{t-1} \alpha_s} \sum_{s=0}^{t-1} \alpha_s \big( {\boldsymbol b} - A {\boldsymbol x} ({\boldsymbol u}^s) \big), \qquad t=1,2,\ldots \end{array} \end{aligned} $$

If the sequence {ν(u t)} is bounded, then \(\big \{ \operatorname *{\mathrm {min}}_{{ {\boldsymbol \gamma }} \in \partial h({\boldsymbol u}^\infty ) \cap N_{\mathbb {R}^m_+}({\boldsymbol u}^\infty )} \|\, {\boldsymbol \gamma } - \overline {{\boldsymbol g}}^t \,\| \,\big \} \rightarrow 0\).

The ergodic sequence \(\big \{ \overline {{\boldsymbol x}}^t \,\big \}\) is then defined as weighted averages (that is, convex combinations) of the subproblem solutions found up to iteration t of (15.21), as

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \overline{{\boldsymbol x}}^t := \frac{1}{\sum_{s=0}^{t-1} \alpha_s} \sum_{s=0}^{t-1} \alpha_s {\boldsymbol x}({\boldsymbol u}^s), \qquad t=1,2,\ldots \end{array} \end{aligned} $$
(15.33)

The convergence of the sequence \(\{ \overline {{\boldsymbol x}}^t \}\) to the set \(X^*_{ \operatorname {\mathrm {conv}}}\), as expressed in (15.14), is then established in terms of fulfilment of the optimality conditions in Proposition 15.5.

Theorem 15.3 (\(\{ \overline {{\boldsymbol x}}^t \}\) Converges to the Optimal Set of the Convexified Problem)

Let the method (15.21), (15.23) be applied to the problem (15.10), the set \(X^*_{ \operatorname {\mathrm {conv}}}\) and the sequence \(\{ \overline {{\boldsymbol x}}^t \}\) be given by (15.14) and (15.33), respectively, and suppose that the sequence {ν(u t)} is bounded. Then, \(\big \{ \operatorname *{\mathrm {min}}_{{\boldsymbol x} \in X^*_{ \operatorname {\mathrm {conv}}}} \|\, {\boldsymbol x} - \overline {{\boldsymbol x}}^t \,\| \,\big \} \rightarrow 0\).

Efficient updating of the ergodic iterates \(\overline {{\boldsymbol x}}^t\) requires only the previous ergodic iterate \(\overline {{\boldsymbol x}}^{t-1}\) and the subproblem solution x(u t−1), according to the convex combination

$$\displaystyle \begin{aligned} \overline{{\boldsymbol x}}^t := \frac{\sum_{s=0}^{t-1} \alpha_s - \alpha_{t-1}}{\sum_{s=0}^{t-1} \alpha_s}\, \overline{{\boldsymbol x}}^{t-1} + \frac{\alpha_{t-1}}{\sum_{s=0}^{t-1} \alpha_s}\, {\boldsymbol x}({\boldsymbol u}^{t-1}), \qquad t = 2, 3, \ldots \end{aligned} $$
(15.34)

with \(\overline {{\boldsymbol x}}^1 := {{\boldsymbol x}}({\boldsymbol u}^0)\).
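The incremental update (15.34) can be checked against the direct definition (15.33); the sketch below uses hypothetical step lengths and subproblem solutions of our choosing.

```python
# The incremental ergodic update (15.34), verified against the direct
# weighted average (15.33).

def ergodic_update(x_bar, x_new, alpha_sum, alpha_last):
    """One step of (15.34); alpha_sum = sum of alpha_0..alpha_{t-1},
    alpha_last = alpha_{t-1}, x_new = x(u^{t-1})."""
    w = alpha_last / alpha_sum
    return [(1.0 - w) * xb + w * xn for xb, xn in zip(x_bar, x_new)]

alphas = [1.0, 0.5, 1.0 / 3.0]                 # step lengths alpha_s
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # subproblem solutions x(u^s)

x_bar, alpha_sum = list(xs[0]), alphas[0]      # x_bar^1 := x(u^0)
for t in range(1, len(xs)):
    alpha_sum += alphas[t]
    x_bar = ergodic_update(x_bar, xs[t], alpha_sum, alphas[t])

direct = [sum(a * x[j] for a, x in zip(alphas, xs)) / sum(alphas)
          for j in range(2)]                   # the definition (15.33)
```

Both computations yield the same convex combination; only the previous iterate and the running sum of step lengths need to be stored.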

3.3 Enhanced Primal Ergodic Convergence

The convergence of the ergodic sequence of subproblem solutions according to Theorem 15.3 is, however, typically very slow. Efforts have therefore been put into enhancing the convergence speed, by exploiting more information from later subproblem solutions than from earlier ones. We next present a more general pattern for constructing the ergodic sequences; the ergodic sequence \(\{ \tilde {{\boldsymbol x}}^t \}\) is defined by

$$\displaystyle \begin{aligned} \tilde{{\boldsymbol x}}^t := \sum_{s=0}^{t-1} \mu_s^t {\boldsymbol x}({\boldsymbol u}^s); \qquad \sum_{s=0}^{t-1} \mu_s^t = 1; \qquad \mu_s^t \geq 0, \quad s=0, \ldots, t-1, \end{aligned} $$
(15.35)

where the convexity weights \(\mu _s^t\) are defined as

$$\displaystyle \begin{aligned} \mu_s^t := \gamma_s^t \alpha_s, \quad s = 0, \ldots, t-1, \quad t = 1,2, \ldots, \end{aligned} $$
(15.36a)

and the parameters \(\gamma _s^t\) fulfil the requirements (N > 0 being a constant)

$$\displaystyle \begin{aligned} \gamma_s^t & \geq \gamma_{s-1}^t, && s = 1, \ldots, t-1, \, t = 2,3, \ldots, {} \end{aligned} $$
(15.36b)
$$\displaystyle \begin{aligned} \operatorname*{\text{max}}_{s \in \{ 1, \ldots, t-1 \}} \big\{ \gamma_s^t - \gamma_{s-1}^t \big\} & \rightarrow 0, && \text{as } t \rightarrow \infty, {} \end{aligned} $$
(15.36c)
$$\displaystyle \begin{aligned} \gamma_0^t & \rightarrow 0, && \text{as } t \rightarrow \infty, {} \end{aligned} $$
(15.36d)
$$\displaystyle \begin{aligned} \gamma_{t-1}^t & \leq N, && t = 1, 2, \ldots {} \end{aligned} $$
(15.36e)

The requirement (15.36b), together with the definition (15.36a), implies the inequality \(\mu _s^t \, (\mu _{s-1}^t)^{-1} \geq \alpha _s \, (\alpha _{s-1})^{-1}\), which means that the ratio of any two consecutive convexity weights should be no less than that of the corresponding step lengths. The requirement (15.36c) implies that the difference between consecutive pairs of subsequent convexity weights tends to zero as t increases, meaning that no primal iterate may be completely neglected. The requirement (15.36d) implies, however, that the weight (that is, \(\mu _0^t\)) of the first iterate [that is, x(u 0)] tends to zero when t increases. The requirements (15.36e), (15.36d), and (15.36b) assure that, for decreasing step lengths α s, the convexity weights \(\mu _s^t\) decrease at a rate not slower than that of the step lengths. Efficient updates of the ergodic iterates \(\tilde {{\boldsymbol x}}^t\) can be made, analogously to (15.34).

Theorem 15.4 (\(\{ \tilde {{\boldsymbol x}}^t \}\) Converges to the Optimal Set of the Convexified Problem)

Apply the method (15.21), (15.23), (15.27) to the problem (15.10) and define the set \(X^*_{ \operatorname {\mathrm {conv}}}\) and the sequence \(\{ \tilde {{\boldsymbol x}}^t \}\) by (15.14) and (15.35), (15.36), respectively. If the sequence {ν(u t)} is bounded, then \(\textstyle \big \{ \min _{{\boldsymbol x} \in X^*_{ \operatorname {\mathrm {conv}}}} \|\, {\boldsymbol x} - \tilde {{\boldsymbol x}}^t \,\| \,\big \} \rightarrow 0\).

Remark 15.4

For any fixed value of s ∈{0, …, t − 1}, the requirements (15.36b)–(15.36e) imply that \(\gamma _s^t \leq \gamma _0^t + s \cdot \operatorname *{\text{max}}_{r \in \{ 1, \ldots , t-1 \}} \big \{ \gamma _r^t - \gamma _{r-1}^t \,\big \} \rightarrow 0\) as t → ∞. This yields that \(\mu _s^t = \gamma _s^t \alpha _s \rightarrow 0\) as t → ∞, since α s < ∞, s = 0, 1, …

Remark 15.5

The ergodic sequence \(\{ \overline {{\boldsymbol x}}^t \}\), defined in (15.33), is equivalent to the special case of (15.35), (15.36) defined by \(\gamma _s^t := \big ( \sum _{r=0}^{t-1} \alpha _r \big )^{-1}\) (being independent of s). For this choice of \(\{ \gamma _s^t \}\), the requirements (15.36b)–(15.36e) are fulfilled with N := (α 0)−1.

Remark 15.6

For the special case of (15.35), (15.36) given by (15.32) with \(a := \underline {a} = \overline {a}\), that is, modified harmonic series step lengths

$$\displaystyle \begin{aligned} \alpha_s := \frac{a}{b + s}, \qquad a > 0, \quad b > 0, \quad s = 0, 1, \ldots, t-1, \end{aligned} $$
(15.37)

and choosing \(\gamma _s^t := (t \alpha _s)^{-1}\), the convexity weights become \(\mu _s^t = t^{-1}\) (then, \(\tilde {{\boldsymbol x}}^t\) equals a simple average of the subproblem solutions found). For these choices of \(\{ \gamma _s^t \}\) and {α s}, the requirements (15.36b)–(15.36e) are fulfilled, with N := a −1 ⋅max {b, 1}.

By choosing the step lengths α t according to (15.37) and letting the sequence \(\{ \mu _s^t \}\) of convexity weights fulfil the requirements

$$\displaystyle \begin{aligned} \mu_s^t & \geq \mu_{s-1}^t, && s = 1, \ldots, t-1, \, t = 2,3, \ldots, {} \end{aligned} $$
(15.38a)
$$\displaystyle \begin{aligned} \operatorname*{\text{max}}_{s \in \{ 1, \ldots, t-1 \}} \left\{ \mu_s^t - \mu_{s-1}^t \right\} & \rightarrow 0, && \text{as } t \rightarrow \infty, {} \end{aligned} $$
(15.38b)
$$\displaystyle \begin{aligned} t \, \mu_{t-1}^t & \leq M, && t = 1, 2, \ldots, {} \end{aligned} $$
(15.38c)

where M > 0 is a constant, it can be shown that also the requirements (15.36b)–(15.36e) are fulfilled, with N = a −1M ⋅max{b, 1}.

The so-called s k-rule, for which the corresponding convexity weights, \(\mu _s^t\), fulfill the requirements (15.38), is defined by

$$\displaystyle \begin{aligned} \mu_s^t := \frac{(s+1)^k}{\sum_{r=0}^{t-1} (r+1)^k}, \qquad s = 0, \ldots, t-1, \quad t = 1, 2, \ldots, \quad k \geq 0. \end{aligned} $$
(15.39)

For k > 0, the s k-rule results in an ergodic sequence (15.35) in which the later iterates are assigned higher weights than the earlier ones. For larger values of k, the weights are shifted towards later iterates. Given that step lengths α t according to (15.37) are utilized in the method (15.21) applied to the Lagrangian dual (15.10), it can be shown that weights according to (15.39) fulfill the requirements (15.38), so that convergence for the resulting primal ergodic sequence \(\{ \tilde {{\boldsymbol x}}^t \}\) to the optimal set \(X_{ \operatorname {\mathrm {conv}}}^*\) can be established.
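The weights (15.39) are straightforward to compute; the following small sketch (function name ours) illustrates the s k-rule.

```python
# Convexity weights of the s^k-rule (15.39): proportional to (s+1)^k and
# normalized to sum to one; k = 0 recovers the simple average of Remark 15.6.

def sk_weights(t, k):
    powers = [(s + 1) ** k for s in range(t)]
    total = sum(powers)
    return [p / total for p in powers]

w = sk_weights(4, 2)    # (1, 4, 9, 16) / 30: later iterates weighted higher
```

The weights are nondecreasing in s for every k ≥ 0, consistent with the requirement (15.38a).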

Remark 15.7

Note that the s 0-rule yields \(\mu _s^t = t^{-1}\) [cf. Remark 15.6, where \(\gamma _s^t = (t \alpha _s)^{-1}\)].

Remark 15.8

Since the conditional subgradient optimization method (15.21) is memory-less, without loss of generality the computation of the ergodic sequences \(\{ \overline {{\boldsymbol x}}^t \}\), \(\{ \overline {{\boldsymbol g}}^t \}\), and \(\{ \tilde {{\boldsymbol x}}^t \}\) may be postponed a finite number of iterations. If the postponement is “long enough”, each \(\overline {{\boldsymbol x}}^t\) (or \(\tilde {{\boldsymbol x}}^t\)) will be a solution to the Lagrangian subproblem at the optimal dual point \({\boldsymbol u}^\infty \), as defined in Theorem 15.2.

3.4 Finite Primal Feasibility and Finite ε-Optimality

Theorem 15.3 establishes optimality in the limit for the sequence \(\{ \overline {{\boldsymbol x}}^t \}\). While dual feasibility holds for the sequence {u t}, in general neither primal feasibility (that is, \(A \overline {{\boldsymbol x}}^t \geq {\boldsymbol b}\)) nor complementarity [that is, \(({{\boldsymbol u}^t})^{\top } ({\boldsymbol b} - A \overline {{\boldsymbol x}}^t) = 0\)] will be finitely satisfied. Eventually, however, \(\overline {{\boldsymbol x}}^t\) will be both near feasible and near complementary. Whenever finite primal feasibility is required, a procedure can be applied that converts any \(\overline {{\boldsymbol x}}^t\) into a feasible solution to (15.5), for example, by computing the Euclidean projection of \(\overline {{\boldsymbol x}}^t\) onto the feasible set, as

$$\displaystyle \begin{aligned} \overline{{\boldsymbol x}}_{\text{proj}}^t := \operatorname*{\text{argmin}}_{{\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}}} \left\{ \left. \big\|\, {\boldsymbol x} - \overline{{\boldsymbol x}}^t \,\big\| \,\right|\, A {\boldsymbol x} \geq {\boldsymbol b} \,\right\}. \end{aligned} $$
(15.40)

Solving (15.40) regularly may, however, be computationally too expensive. Instead, we consider a heuristic procedure, preferably exploiting the structure of the set \(\{ {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}} \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \}\) in the search for a feasible and near optimal solution to (15.40). Let the function \(\delta : \mathbb {R}_+ \mapsto \mathbb {R}_+\) be continuous and such that δ(β) > 0 whenever β > 0 and \(\lim _{\beta \rightarrow 0^+} \delta (\beta ) = 0\). Define a heuristic projection \(\overline {{\boldsymbol x}}_{\text{heur}}^t\) of \(\overline {{\boldsymbol x}}^t \in X_{ \operatorname {\mathrm {conv}}}\) by the inclusion

$$\displaystyle \begin{aligned} \overline{{\boldsymbol x}}_{\text{heur}}^t \in \big\{ {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}} \,|\, A {\boldsymbol x} \geq {\boldsymbol b} \,\big\}, \end{aligned} $$
(15.41a)

and such that

$$\displaystyle \begin{aligned} \left\|\, \overline{{\boldsymbol x}}_{\text{heur}}^t - \overline{{\boldsymbol x}}_{\text{proj}}^t \,\right\| \leq \delta(\beta) \quad \text{ whenever } \quad \left\|\, \overline{{\boldsymbol x}}^t - \overline{{\boldsymbol x}}_{\text{proj}}^t \,\right\| & \leq \beta. \end{aligned} $$
(15.41b)
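The conditions (15.41) leave the choice of heuristic open. As a minimal sketch (our own construction, not from the chapter), assume that \(X_{\operatorname{\mathrm{conv}}}\) is the unit box and that there is a single relaxed constraint a⊤x ≥ b with a known, strictly feasible anchor point; moving the ergodic iterate toward the anchor just far enough to reach feasibility yields a heuristic projection whose displacement vanishes with the constraint violation, in the spirit of (15.41b):

```python
# Sketch of a heuristic projection (15.41): instead of the Euclidean
# projection (15.40), move the iterate toward a known feasible anchor
# point just far enough to satisfy the relaxed constraint.
# Illustrative setup: X_conv = [0,1]^n, one relaxed constraint a^T x >= b.

def heuristic_projection(x, a, b, anchor):
    """Return x + theta*(anchor - x) with the smallest theta in [0, 1]
    such that a^T x >= b holds (anchor is assumed strictly feasible)."""
    ax = sum(ai * xi for ai, xi in zip(a, x))
    if ax >= b:
        return list(x)  # already feasible: no move needed
    gain = sum(ai * (pi - xi) for ai, pi, xi in zip(a, anchor, x))
    theta = (b - ax) / gain  # in (0, 1] by strict feasibility of anchor
    return [xi + theta * (pi - xi) for xi, pi in zip(x, anchor)]

a, b = [1.0, 2.0], 2.0
anchor = [1.0, 1.0]            # a^T anchor = 3 > b: strictly feasible
x = [0.2, 0.5]                 # a^T x = 1.2 < b: infeasible iterate
x_heur = heuristic_projection(x, a, b, anchor)

assert sum(ai * xi for ai, xi in zip(a, x_heur)) >= b - 1e-12
assert all(0.0 <= xi <= 1.0 for xi in x_heur)  # stays inside X_conv
# The move is small when the violation is small [cf. condition (15.41b)].
```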

Theorem 15.5 (Convergence to Primal Optimality by Heuristic Projections)

Let the method (15.21), (15.23), (15.27) be applied to the problem (15.10). Let the set \(X^*_{ \operatorname {\mathrm {conv}}}\)and the sequences \(\{ \overline {{\boldsymbol x}}^t \}\), \(\{ \overline {{\boldsymbol x}}_{\mathrm {proj}}^t \}\), and \(\{ \overline {{\boldsymbol x}}_{\mathrm {heur}}^t \}\)be defined by (15.14), (15.33), (15.40), and (15.41a), respectively. If the sequence {ν(u t)} is bounded and the conditions (15.41b) hold, then \(\big \{ \operatorname *{\mathrm {min}}_{{\boldsymbol x} \in X^*_{ \operatorname {\mathrm {conv}}}} \|\, {\boldsymbol x} - \overline {{\boldsymbol x}}_{\mathrm {heur}}^t \,\| \,\big \} \rightarrow 0\).

We can now construct an algorithm employing heuristic projections and yielding convergence to the optimal value in both the primal and dual procedures.

Corollary 15.2 (Finite Termination at ε-Optimality)

Given the assumptions of Theorem 15.5, for every ε > 0 there is a t ε > 0 such that \({\boldsymbol c}^{\top } \overline {{\boldsymbol x}}_{\mathrm {heur}}^t - h({\boldsymbol u}^t) \leq \varepsilon \) holds for all \(t \geq t_\varepsilon \).

This projection principle is thus a way to recover primal feasibility, and eventually also optimality, in an otherwise purely dual method.

4 Dual Cutting-Planes: Dantzig–Wolfe Decomposition

Assuming that a nonempty subset \(\overline {\mathcal {Q}} \subseteq \mathcal {Q}\) is given, the Lagrangian dual function is outer approximated by the function \(\overline {h}: \mathbb {R}^m \mapsto \mathbb {R}\) as defined by

$$\displaystyle \begin{aligned} \overline{h}({\boldsymbol u}) & := {\boldsymbol b}^{\top} {\boldsymbol u} + \operatorname*{\text{min}}_{q \in \overline{\mathcal{Q}}} \, \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top}{\boldsymbol x}^q \,\Big\}. \end{aligned} $$
(15.42)

This function is formed by the point-wise minimum of \(\big |\, \overline {\mathcal {Q}} \,\big |\) affine functions. Hence, it is piecewise affine and concave, as is the Lagrangian dual function h. Clearly, \(\overline {h}({\boldsymbol u}) \geq h({\boldsymbol u})\) holds for any \({\boldsymbol u} \in \mathbb {R}^m\). Hence, the problem

$$\displaystyle \begin{aligned} \overline{h}^* := {\displaystyle \operatorname*{\text{max}}_{{\boldsymbol u} \in \mathbb{R}^m_+}} \; \overline{h}({\boldsymbol u}), \end{aligned} $$
(15.43)

is a relaxation of the Lagrangian dual problem (15.10), so that \(\overline {h}^* \geq h^*\) holds. The subdifferential of the function \(\overline {h}\) at u is given by

$$\displaystyle \begin{aligned} \partial \overline{h}({\boldsymbol u}) & := {\displaystyle \operatorname*{\text{conv}}_{q \in \overline{\mathcal{Q}}({\boldsymbol u})}} \, \big\{ {\boldsymbol b} - A {\boldsymbol x}^q \big\}, \qquad {\boldsymbol u} \in \mathbb{R}^m_+, \end{aligned} $$

where \(\overline {\mathcal {Q}}({\boldsymbol u})\) denotes the optimal index set of the inner minimization in (15.42), and its conditional subdifferential is given by

$$\displaystyle \begin{aligned} \partial^{\mathbb{R}^m_+} \overline{h}({\boldsymbol u}) = \partial \overline{h}({\boldsymbol u}) - N_{\mathbb{R}^m_+}({\boldsymbol u}), \qquad {\boldsymbol u} \in \mathbb{R}^m_+. \end{aligned} $$

The relaxed dual problem (15.43) has a bounded optimal solution if and only if

$$\displaystyle \begin{aligned} \mathbf{0} & \in \bigcup_{{\boldsymbol u} \in \mathbb{R}^m_+} \partial^{\mathbb{R}^m_+} \overline{h}({\boldsymbol u}). \end{aligned} $$
(15.44)

Remark 15.9

A sufficient condition for the relaxed Lagrangian dual problem (15.43) to have a bounded optimal solution is that some point \({\boldsymbol x}^{\overline {q}}\), \(\overline {q} \in \overline {\mathcal {Q}}\), is feasible in the original mixed-binary linear optimization problem (15.2), since this implies that

$$\displaystyle \begin{aligned} \overline{h}^* & = \operatorname*{\text{max}}_{{\boldsymbol u} \in \mathbb{R}^m_+} \; \Bigg\{ \operatorname*{\text{min}}_{q \in \overline{\mathcal{Q}}} \, \left\{ {\boldsymbol c}^{\top} {\boldsymbol x}^q + {\boldsymbol u}^{\top} \left( {\boldsymbol b} - A {\boldsymbol x}^q \right) \right\} \,\Bigg\} \\ &\leq \operatorname*{\text{max}}_{{\boldsymbol u} \in \mathbb{R}^m_+} \; \left\{ {\boldsymbol c}^{\top} {\boldsymbol x}^{\overline{q}} + {\boldsymbol u}^{\top} \left( {\boldsymbol b} - A {\boldsymbol x}^{\overline{q}} \right) \,\right\} \\ &= {\boldsymbol c}^{\top} {\boldsymbol x}^{\overline{q}} \end{aligned} $$

holds, where the second equality holds because \(A {\boldsymbol x}^{\overline {q}} \geq {\boldsymbol b}\).

Assuming that the condition (15.44) holds, we find a \(\overline {{\boldsymbol u}} \in \mathbb {R}^m_+\) such that the inclusion \(\mathbf {0} \in \partial ^{\mathbb {R}^m_+} \overline {h}(\overline {{\boldsymbol u}})\) holds, implying that \(\overline {{\boldsymbol u}}\) is optimal in (15.43). Hence, \(\overline {h}^* = \overline {h}(\overline {{\boldsymbol u}})\) holds. The value of the Lagrangian dual function at this point is

$$\displaystyle \begin{aligned} h(\overline{{\boldsymbol u}}) & = {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \operatorname*{\text{min}}_{{\boldsymbol x} \in X} \, \left\{ \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x} \,\right\} = {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \operatorname*{\text{min}}_{q \in \mathcal{Q}} \, \left\{ \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^q \,\right\}. \end{aligned} $$

From the Lagrangian dual problem (15.10) we conclude that \(h(\overline {{\boldsymbol u}}) \leq h^*\). To summarize, the relations \(h(\overline {{\boldsymbol u}}) \leq h^* \leq \overline {h}^*\) hold, that is, the values \(\overline {h}^*\) and \(h(\overline {{\boldsymbol u}})\) provide upper and lower bounds, respectively, on the optimal value \(h^*\).

If the equality \(h(\overline {{\boldsymbol u}}) = \overline {h}^*\) holds, it follows that \(h(\overline {{\boldsymbol u}}) = h^*\), implying that \(\overline {{\boldsymbol u}}\) is optimal in (15.10). In the case when \(h(\overline {{\boldsymbol u}}) < \overline {h}^*\) holds, we consider any solution to the minimization in (15.8) for \({\boldsymbol u} = \overline {{\boldsymbol u}}\), which we denote by \({\boldsymbol x}^{\overline {q}}\), where \(\overline {q} \in \mathcal {Q}(\overline {{\boldsymbol u}})\), such that \(h(\overline {{\boldsymbol u}}) ={\boldsymbol b}^{\top } \overline {{\boldsymbol u}} + \left ( {\boldsymbol c} - A^{\top } \overline {{\boldsymbol u}} \right )^{\top }{\boldsymbol x}^{\overline {q}}\) holds. Then, the relations

$$\displaystyle \begin{aligned} {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^{\overline{q}} < \overline{h}^* = {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + {\displaystyle \operatorname*{\text{min}}_{q \in \overline{\mathcal{Q}}}} \, \left\{ \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^q \,\right\} \end{aligned}$$

hold, which yields that \(\overline {q} \notin \overline {\mathcal {Q}}\). By then augmenting the set \(\overline {\mathcal {Q}}\) with the index \(\overline {q}\), an improved outer approximation of the Lagrangian dual function is obtained. By resolving the problem (15.43) and repeating, an iterative procedure for solving the Lagrangian dual problem (15.10) is obtained. Its convergence is finite, since the set \(\mathcal {Q}\) is finite and since a point \({\boldsymbol x}^{\overline {q}}\) can never be regenerated.
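The procedure can be illustrated on a tiny instance of our own construction (not from the chapter): with a single dualized constraint, the relaxed dual (15.43) is a one-dimensional piecewise-affine concave maximization, which the sketch below solves exactly by inspecting the breakpoints of the affine pieces on a capped interval. The loop adds one cut per iteration and terminates when \(h(\overline{{\boldsymbol u}}) = \overline{h}^*\):

```python
# Sketch of the dual cutting-plane procedure on a tiny instance:
# minimize c^T x over X = {0,1}^2 subject to the relaxed constraint
# x1 + x2 <= 1, written as -x1 - x2 >= -1 so that A x >= b.

c = [-3.0, -5.0]
A = [-1.0, -1.0]          # single row of A
b = -1.0
X = [(x1, x2) for x1 in (0, 1) for x2 in (0, 1)]  # explicit listing of Q
U = 10.0                  # artificial cap keeping (15.43) bounded early on

def line(xq):
    """Affine piece u -> (c - A^T u)^T x^q as (intercept, slope)."""
    return (sum(ci * xi for ci, xi in zip(c, xq)),
            -sum(ai * xi for ai, xi in zip(A, xq)))

def solve_relaxed_dual(Qbar):
    """Maximize b*u + min over Qbar of the affine pieces, u in [0, U]."""
    pieces = [line(xq) for xq in Qbar]
    cands = {0.0, U}
    for i in range(len(pieces)):
        for j in range(i + 1, len(pieces)):
            (ci, si), (cj, sj) = pieces[i], pieces[j]
            if si != sj:
                u = (cj - ci) / (si - sj)   # breakpoint of two pieces
                if 0.0 <= u <= U:
                    cands.add(u)
    def hbar(u):
        return b * u + min(ci + si * u for ci, si in pieces)
    return max(cands, key=hbar)

def subproblem(u):
    """Lagrangian subproblem: minimize (c - A^T u)^T x over X."""
    return min(X, key=lambda xq: sum((ci - ai * u) * xi
                                     for ci, ai, xi in zip(c, A, xq)))

Qbar = [(1, 1)]           # initial nonempty subset of subproblem solutions
while True:
    u_bar = solve_relaxed_dual(Qbar)
    hbar_star = b * u_bar + min(ci + si * u_bar
                                for ci, si in (line(x) for x in Qbar))
    xq = subproblem(u_bar)
    h_u = b * u_bar + line(xq)[0] + line(xq)[1] * u_bar
    if h_u >= hbar_star - 1e-9:   # h(u_bar) = hbar*: dual optimum found
        break
    Qbar.append(xq)               # add a new cut and resolve

assert abs(hbar_star - (-5.0)) < 1e-9     # h* = -5 on this instance
assert 3.0 - 1e-9 <= u_bar <= 5.0 + 1e-9  # dual optimal set is [3, 5]
```

On this instance the loop terminates after adding two cuts; the cap U plays the role of the boundedness safeguard discussed in Remark 15.9.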

The above procedure is commonly described as a cutting-plane or constraint generation procedure for the LP formulation (15.11b) of the Lagrangian dual problem. This formulation can be restated as

$$\displaystyle \begin{aligned} h^* = {\displaystyle \operatorname*{\text{max}}_{({\boldsymbol u}, v) \in \mathbb{R}^m_+ \times \mathbb{R}}} \Big\{ {\boldsymbol b}^{\top} {\boldsymbol u} + v \,\big|\, {\boldsymbol c}^{\top} {\boldsymbol x}^q - \left( A {\boldsymbol x}^q \right)^{\top} {\boldsymbol u} - v \geq 0, \; q \in \mathcal{Q} \,\Big\}. \end{aligned} $$
(15.45)

Let \((\overline {{\boldsymbol u}}, \overline {v}) \in \mathbb {R}^m_+ \times \mathbb {R}\) be optimal in the relaxed problem

$$\displaystyle \begin{aligned} \overline{h}^* = {\displaystyle \operatorname*{\text{max}}_{({\boldsymbol u}, v) \in \mathbb{R}^m_+ \times \mathbb{R}}} \Big\{ {\boldsymbol b}^{\top} {\boldsymbol u} + v \,\big|\, {\boldsymbol c}^{\top} {\boldsymbol x}^q - \left( A {\boldsymbol x}^q \right)^{\top} {\boldsymbol u} - v \geq 0, \; q \in \overline{\mathcal{Q}} \,\Big\}. \end{aligned} $$
(15.46)

Then \(\overline {h}^* = {\boldsymbol b}^{\top } \overline {{\boldsymbol u}} + \overline {v}\). The question then is whether all constraints in the problem (15.45) are satisfied at the point \(({\boldsymbol u}, v) = (\overline {{\boldsymbol u}}, \overline {v})\). This is determined by finding the most violated constraint, if any, which amounts to solving

$$\displaystyle \begin{aligned} {\displaystyle \operatorname*{\text{min}}_{q \in \mathcal{Q}}} \, \left\{{\boldsymbol c}^{\top} {\boldsymbol x}^q - \left( A {\boldsymbol x}^q \right)^{\top} \overline{{\boldsymbol u}} - \overline{v} \,\right\} & = {\displaystyle \operatorname*{\text{min}}_{q \in \mathcal{Q}}} \, \Big\{ {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^q - \overline{h}^* \,\Big\} \\ & = {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^{\overline{q}} - \overline{h}^*\\ &= \; h(\overline{{\boldsymbol u}}) - \overline{h}^*, \end{aligned} $$
(15.47)

where \({\boldsymbol x}^{\overline {q}} \in X(\overline {{\boldsymbol u}})\) solves the minimization in (15.8) at \({\boldsymbol u} = \overline {{\boldsymbol u}}\). If the minimum value \(h(\overline {{\boldsymbol u}}) - \overline {h}^*\) is negative, then a most violated constraint has been identified. Otherwise, \(h(\overline {{\boldsymbol u}}) = \overline {h}^*\) holds and \(\overline {{\boldsymbol u}} \in \mathbb {R}^m_+\) solves the dual problem (15.10).

The dual cutting-plane procedure sketched above is dually equivalent to applying Dantzig–Wolfe decomposition to the convexified problem (15.5), which by the reformulation (15.11c) is equivalent to the complete Dantzig–Wolfe master problem

$$\displaystyle \begin{aligned} {\displaystyle \operatorname*{\text{min}}_{{ \boldsymbol{\lambda}}}} \left\{ \sum_{q \in \mathcal{Q}} \left( {\boldsymbol c}^{\top} {\boldsymbol x}^q \right) \lambda_q \,\left|\, \sum_{q \in \mathcal{Q}} \left( A {\boldsymbol x}^q \right) \lambda_q \geq {\boldsymbol b}; \; \sum_{q \in \mathcal{Q}} \lambda_q = 1; \; \lambda_q \geq 0, \, q \in \mathcal{Q} \right.\right\}, \end{aligned} $$
(15.48)

while also being the LP dual of the problem (15.45). Since

$$\displaystyle \begin{aligned} X_{\operatorname{\mathrm{conv}}} = \left\{ {\boldsymbol x} = \sum_{q \in \mathcal{Q}} \lambda_q {\boldsymbol x}^q \,\left|\, \sum_{q \in \mathcal{Q}} \lambda_q = 1; \; \lambda_q \geq 0, \, q \in \mathcal{Q} \right. \,\right\} \end{aligned}$$

holds, the variables λ q in (15.48) are commonly referred to as convexity variables and the equality \(\sum _{q \in \mathcal {Q}} \lambda _q = 1\) as a convexity constraint. In Dantzig–Wolfe decomposition, this problem is solved using the linear programming technique called column generation, in which, for this application, a column corresponds to the data of a convexity variable, that is, the values of the scalar c x q, the vector Ax q, and a 1.
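Assembling such a column is a matter of a few inner products. A short Python sketch with made-up data (the vectors and the point x q are our own):

```python
# Sketch: assembling the Dantzig-Wolfe column of a point x^q, that is,
# the data (c^T x^q, (A x^q)^T, 1) of its convexity variable lambda_q.

c = [2.0, 3.0]
A = [[1.0, 0.0],
     [1.0, 2.0]]
x_q = [1.0, 1.0]

cost = sum(ci * xi for ci, xi in zip(c, x_q))                 # c^T x^q
Ax = [sum(aij * xj for aij, xj in zip(row, x_q)) for row in A]  # A x^q
column = [cost] + Ax + [1.0]   # the entry 1.0 goes to the convexity row

assert column == [5.0, 1.0, 3.0, 1.0]
```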

Using the analogous reformulation as in (15.11), we obtain

$$\displaystyle \begin{aligned} \overline{h}^* & = {\displaystyle \operatorname*{\text{max}}_{{\boldsymbol u} \in \mathbb{R}^m_+}} \Bigg\{ {\boldsymbol b}^{\top} {\boldsymbol u} + \operatorname*{\text{min}}_{q \in \overline{\mathcal{Q}}} \Big\{ \big( {\boldsymbol c} - A^{\top} {\boldsymbol u} \big)^{\top} {\boldsymbol x}^q \,\Big\} \Bigg\} \end{aligned} $$
(15.49a)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{min}}_{ \boldsymbol{\lambda}}} \left\{ \sum_{q \in \overline{\mathcal{Q}}} \big( {\boldsymbol c}^{\top} {\boldsymbol x}^q \big) \lambda_q \,\left|\, \sum_{q \in \overline{\mathcal{Q}}} \big( A {\boldsymbol x}^q \big) \lambda_q \geq {\boldsymbol b}; \; \sum_{q \in \overline{\mathcal{Q}}} \lambda_q = 1; \; \lambda_q \geq 0, \, q \in \overline{\mathcal{Q}} \right. \,\right\} {} \end{aligned} $$
(15.49b)
$$\displaystyle \begin{aligned} & = {\displaystyle \operatorname*{\text{min}}_{\boldsymbol x}} \,\Bigg\{ {\boldsymbol c}^{\top} {\boldsymbol x} \,\Big|\, A {\boldsymbol x} \geq {\boldsymbol b} ; \; {\boldsymbol x} \in {\displaystyle \operatorname*{\text{conv}}_{q \in \overline{\mathcal{Q}}}} \big\{ {\boldsymbol x}^q \,\big\} \Bigg\}. \end{aligned} $$
(15.49c)

The problem (15.49b) is the restricted Dantzig–Wolfe master problem, referring to the fact that this problem includes the variables λ q, with the corresponding columns \(\left ( {\boldsymbol c}^{\top } {\boldsymbol x}^q, (A {\boldsymbol x}^q)^{\top },1 \right )^{\top }\), only for \(q \in \overline {\mathcal {Q}} \subseteq \mathcal {Q}\), which is clearly equivalent to imposing the restrictions λ q = 0 for \(q \in \mathcal {Q} \setminus \overline {\mathcal {Q}}\) in (15.48).

Let \((\overline {{\boldsymbol u}}, \overline {v}) \in \mathbb {R}^m_+ \times \mathbb {R}\) be an optimal solution to the LP dual of the restricted master problem. Using strong duality for LP, it then holds that \(\overline {h}^* = {\boldsymbol b}^{\top } \overline {{\boldsymbol u}} + \overline {v}\). The LP reduced cost for any variable λ q, \(q \in \mathcal {Q}\), in the complete master problem can then be expressed as

$$\displaystyle \begin{aligned} \overline{c}^q := {\boldsymbol c}^{\top} {\boldsymbol x}^q - \overline{{\boldsymbol u}}^{\top} A {\boldsymbol x}^q - \overline{v} = {\boldsymbol b}^{\top} \overline{{\boldsymbol u}} + \left( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \right)^{\top} {\boldsymbol x}^q - \overline{h}^*. \end{aligned}$$

The problem of finding the variable λ q, \(q \in \mathcal {Q}\), with the most negative reduced cost thus amounts to solving

$$\displaystyle \begin{aligned} \operatorname*{\text{min}}_{q \in \mathcal{Q}} \, \left\{ \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x}^q \,\right\} = \operatorname*{\text{min}}_{{\boldsymbol x} \in X} \, \left\{ \big( {\boldsymbol c} - A^{\top} \overline{{\boldsymbol u}} \big)^{\top} {\boldsymbol x} \,\right\}, \end{aligned} $$
(15.50)

which is in this context known as a column generation problem, or Dantzig–Wolfe subproblem. If the solution to (15.50) corresponds to a negative reduced cost \(\overline {c}^{\overline {q}} < 0\), \(\overline {q} \in \mathcal {Q}\),Footnote 8 then \(\overline {\mathcal {Q}} := \overline {\mathcal {Q}} \cup \big \{ \overline {q} \,\big \}\) and the problem (15.49b) is resolved. When no more columns having negative reduced costs can be found, which clearly occurs finitely, the convexified problem (15.5) has been solved.

Note that the problem (15.50) is equivalent to that of finding the most violated cutting-plane in the dual space [cf. (15.47)], since each constraint in the cutting-plane formulation of the Lagrangian dual corresponds to a convexity variable in the master problem. Note also that relaxing the dual problem (15.45) into (15.46) is dually equivalent to restricting the complete Dantzig–Wolfe master problem (15.48) into (15.49b).
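This equivalence can be checked numerically. In the following sketch (a small instance of our own; the dual values are assumed to come from a solved restricted master over Q̄ = {(1,1), (0,0)}), the most negative reduced cost over \(\mathcal{Q}\) coincides with \(h(\overline{{\boldsymbol u}}) - \overline{h}^*\), as in (15.47):

```python
# Sketch: pricing in the Dantzig-Wolfe master on a small instance:
# c = (-3,-5), one relaxed constraint -x1 - x2 >= -1, X = {0,1}^2.
# Assumed dual data: u_bar = 4 with restricted master value hbar* = -4.

c = [-3.0, -5.0]
A = [-1.0, -1.0]
b = -1.0
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
u_bar, hbar_star = 4.0, -4.0
v_bar = hbar_star - b * u_bar          # strong duality: hbar* = b*u + v

def reduced_cost(xq):
    """c^T x^q - u_bar * (A x^q) - v_bar, cf. the expression above."""
    Ax = sum(ai * xi for ai, xi in zip(A, xq))
    return sum(ci * xi for ci, xi in zip(c, xq)) - u_bar * Ax - v_bar

best = min(X, key=reduced_cost)        # column generation problem (15.50)
h_u = b * u_bar + min(
    sum((ci - ai * u_bar) * xi for ci, ai, xi in zip(c, A, xq))
    for xq in X)                       # Lagrangian dual function h(u_bar)

# The most negative reduced cost equals h(u_bar) - hbar* [cf. (15.47)].
assert abs(reduced_cost(best) - (h_u - hbar_star)) < 1e-12
assert best == (0, 1) and reduced_cost(best) == -1.0
```

Since the reduced cost of (0, 1) is negative, this column would be added to the restricted master and the problem resolved.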

Remark 15.10

Many textbooks derive the Dantzig–Wolfe decomposition method for an LP problem with two sets of explicit affine constraints, in our notation corresponding to Ax ≥b and \({\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}\). The method can, however, be applied also when no explicit representation of the set \(X_{ \operatorname {\mathrm {conv}}}\) in terms of affine constraints is available, as long as the subproblem (15.50) can be solved by some means.

Remark 15.11

If the mixed-binary linear optimization problem (15.2) has a Cartesian product structure, as in Remark 15.1, then a separate set of convexity variables and a convexity constraint can be defined for each of the sets in the product. Letting \(\{ {\boldsymbol y}_s^q \}_{q \in \mathcal {Q}_s}\) denote the extreme points of \( \operatorname {\mathrm {conv}} Y_s\), \(s \in \mathcal {S}\), the complete Dantzig–Wolfe master problem for the problem (15.7) becomes

$$\displaystyle \begin{aligned} \operatorname*{\text{min}}_{\boldsymbol{\lambda}} \;\; & \sum_{s \in \mathcal{S}} \sum_{q \in \mathcal{Q}_s} \big( {\boldsymbol c}_s^{\top} {\boldsymbol y}_s^q \big) \lambda_{sq} {} \end{aligned} $$
(15.51a)
$$\displaystyle \begin{aligned} \text{subject to} \;\; & \sum_{s \in \mathcal{S}} \sum_{q \in \mathcal{Q}_s} \big( A_s {\boldsymbol y}_s^q \big) \lambda_{sq} \geq {\boldsymbol b}, {} \end{aligned} $$
(15.51b)
$$\displaystyle \begin{aligned} & \sum_{q \in \mathcal{Q}_s} \lambda_{sq} = 1, \qquad s \in \mathcal{S}, {} \end{aligned} $$
(15.51c)
$$\displaystyle \begin{aligned} & \lambda_{sq} \geq 0, \qquad q \in \mathcal{Q}_s, \quad s \in \mathcal{S}. {} \end{aligned} $$
(15.51d)

When applying column generation to the master problem (15.51), there will be one Dantzig–Wolfe subproblem for each \(s \in \mathcal {S}\).

Since each vector y s, \(s \in \mathcal {S}\), is here described by a separate set of convexity variables, the master problem (15.51) is based on a disaggregate representation, while the master problem (15.48) is based on an aggregate representation. Whenever the feasible set of the original problem (15.2) is described by a Cartesian product set and coupling constraints, as in (15.7), a disaggregate representation is typically favourable. This is because (1) the disaggregate and aggregate complete master problems contain \(\sum _{s \in \mathcal {S}} \left | \mathcal {Q}_s \right |\) and \(\prod _{s \in \mathcal {S}} \left | \mathcal {Q}_s \right |\) convexity variables, respectively, and (2) a disaggregate restricted master problem is a relaxation of the corresponding aggregate restricted master problem, whence the former typically provides stronger bounds on \(z^*\).
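The size difference between the two representations can be drastic even for modest instances, as a quick computation with hypothetical subproblem sizes |Q s| shows:

```python
# Sketch: number of convexity variables under the disaggregate versus
# the aggregate representation, for made-up subproblem sizes |Q_s|.
import math

sizes = [12, 8, 20, 15]          # hypothetical |Q_s| for each s
disaggregate = sum(sizes)        # one lambda per extreme point of each Y_s
aggregate = math.prod(sizes)     # one lambda per extreme point of the product

assert disaggregate == 55
assert aggregate == 28800
assert disaggregate < aggregate
```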

5 A Two-Phase Method: Subgradient Optimization and Dantzig–Wolfe Decomposition

Subgradient optimization and Dantzig–Wolfe decomposition as means for solving the Lagrangian dual problem (15.10) possess their distinct advantages and disadvantages, which are consequences of the nondifferentiability of the Lagrangian dual function and the inherent properties of these solution principles.

In subgradient optimization, the update of the dual iterate is inexpensive once the Lagrangian relaxed problem has been solved. The method lacks, however, a good termination criterion, and in practice it is therefore often run for a preset number of iterations. Further, primal feasible solutions are typically not easily obtained, neither to the mixed-binary linear optimization problem (15.2) nor to its convexified version (15.5). It is, however, quite common to use a Lagrangian heuristic—tailored to each specific application—to convert Lagrangian subproblem solutions into feasible solutions to the original mixed-binary problem.

Dantzig–Wolfe decomposition converges finitely and produces feasible solutions to the convexified problem (15.5). The latter property allows early termination when the upper and lower bounds on the optimal value of (15.5) are close enough. Each iteration of the method is, however, expensive, since the next dual iterate is found by reoptimizing the LP restricted master problem. Further, due to the inherent instability of cutting-plane approaches, the method typically shows a poor convergence behaviour, in the sense that successive dual iterates may be very far apart. This phenomenon is commonly prevented by the introduction of a stabilization mechanism, such as trust regions in the dual space.

Another way to improve a Dantzig–Wolfe decomposition scheme is to heuristically generate an initial set of columns of high quality. We here describe a specific means to generate such columns, leading to a two-phase method that benefits from the advantages of both subgradient optimization and Dantzig–Wolfe decomposition. A first prediction phase employs subgradient optimization, in which Lagrangian subproblem solutions are stored. At termination, a lower bound on the optimal value of the convexified problem as well as a number of Lagrangian subproblem solutions are at hand. A second solution phase uses these solutions to construct an initial restricted master problem, whereafter the Dantzig–Wolfe method is employed.

The prediction phase aims at setting up a high quality initial restricted master problem, such that the solution phase can attain and verify optimality in fewer column generations. If the prediction phase works perfectly, then it suffices to solve one restricted master problem and perform one column generation in order to reach and verify optimality. Hence, the solution phase can alternatively be viewed as an evaluation of the outcome of the prediction phase, which, if needed, compensates for its shortcomings. The rationale for the prediction phase is that the subgradient optimization method asymptotically finds columns that are optimal in the complete master problem, if the step lengths are appropriately chosen, as demonstrated below.

We consider solving the problem (15.10) by using the conditional subgradient method (15.21), with a sequence {ν(u t)} that is bounded and with step lengths α t fulfilling the conditions (15.23) and (15.27), such that the assumptions of Theorem 15.2 are fulfilled. This method will then in the limit find a dual solution \({\boldsymbol u}^\infty \), such that \(({\boldsymbol u}^\infty , v^\infty )\) is optimal in the LP dual (15.45) of the complete master problem (15.48).

Define the index sets

$$\displaystyle \begin{aligned} &\mathcal{T}_{q} := \Big\{ t \in \mathbb{Z}_+ \,\big|\, {\boldsymbol x}({\boldsymbol u}^t) = {\boldsymbol x}^q \,\Big\}, \qquad q \in \mathcal{Q}, \\ &\qquad \qquad \qquad \text{and} \\ &\widehat{\mathcal{Q}} := \left\{ q \in \mathcal{Q} \,\left|\, \mathcal{T}_{q} \mbox{ is infinite} \right. \,\right\} \subseteq \mathcal{Q}, \end{aligned} $$

that is, \(\widehat {\mathcal {Q}}\) contains the indices \(q \in \mathcal {Q}\) such that x q solves the Lagrangian subproblem an infinite number of times. Consider then the corresponding restricted master problem

$$\displaystyle \begin{aligned} \widehat{h}^* := {\displaystyle \operatorname*{\text{min}}_{ \boldsymbol{\lambda}}} \left\{ \sum_{q \in \widehat{\mathcal{Q}}} \left( {\boldsymbol c}^{\top} {\boldsymbol x}^q \right) \lambda_q \,\left|\, \sum_{q \in \widehat{\mathcal{Q}}} \left( A {\boldsymbol x}^q \right) \lambda_q \geq {\boldsymbol b}; \; \sum_{q \in \widehat{\mathcal{Q}}} \lambda_q = 1; \, \lambda_q \geq 0, \, q \in \widehat{\mathcal{Q}} \right. \right\}\!. \end{aligned} $$
(15.52)

Theorem 15.6 (Asymptotic Generation of Optimal Columns)

Let the method (15.21) be applied to the problem (15.10), with the step lengths \(\left \{ \alpha _t \,\right \}\)fulfilling the conditions (15.23) and (15.27), and assume that the sequence {ν(u t)} is bounded. Then \(\widehat {h}^* = h^*\)holds.

Further, let \({\boldsymbol u}^\infty \in U^*\) be the limit point for the sequence {u t} (cf. Theorem 15.2) and let \(v = v^\infty \) be optimal in (15.45) given that \({\boldsymbol u} = {\boldsymbol u}^\infty \), so that \(({\boldsymbol u}^\infty , v^\infty )\) is an optimal dual solution to the complete master problem (15.48). For each Lagrangian subproblem solutionx(u t), \(t \in \mathbb {Z}_+\), define the reduced cost \(\overline {c}_t := {\boldsymbol c}^{\top } {\boldsymbol x}({\boldsymbol u}^t) - \left ( {{\boldsymbol u}}^\infty \right )^{\top } A {\boldsymbol x}({\boldsymbol u}^t) - {v}^\infty \). Then \(\bar {c}_t = 0\) holds for every t that is sufficiently large.

Proof

Consider the ergodic sequence \(\big \{ \overline {{\boldsymbol x}}^t \,\big \}\) of subproblem solutions given by (15.33), that is, \(\overline {{\boldsymbol x}}^t = \left ( \sum _{s=0}^{t-1} \alpha _s \right )^{-1} \sum _{s=0}^{t-1} \alpha _s {\boldsymbol x}({\boldsymbol u}^s)\), t = 1, 2, … Define the convexity weights

$$\displaystyle \begin{aligned} \lambda_q^t := \frac{1}{\sum_{s=0}^{t-1} \alpha_s} \sum_{ s \in \mathcal{T}_q^t} \alpha_s, \qquad q \in {\mathcal Q}, \quad t=1,2,\ldots, \end{aligned} $$
(15.53)

where \(\mathcal {T}^t_{q} := \left \{ s \in \{0,1,\ldots ,t-1\} \,\left |\, {\boldsymbol x}({\boldsymbol u}^s) = {\boldsymbol x}^q \right . \,\right \}\), \(q \in \mathcal {Q}\), t = 1, 2, … Clearly, \(\lambda _q^t \geq 0\), \(q \in \mathcal {Q}\), and \(\sum _{q \in \mathcal {Q}} \lambda _q^t = 1\) hold for t = 1, 2, … Then, the ergodic solution \(\overline {{\boldsymbol x}}^t\) can alternatively be expressed as the convex combination

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \overline{{\boldsymbol x}}^t = \sum_{q \in \mathcal{Q}} \lambda_q^t {\boldsymbol x}^q, \qquad t=1,2,\ldots \end{array} \end{aligned} $$
(15.54)

The sequence \(\{ \boldsymbol {\lambda }^t \}_{t=1}^\infty \), where \(\boldsymbol {\lambda }^t := \big ( \lambda ^t_q \big )_{q \in \mathcal {Q}}\), is contained in the unit simplex in \(\mathbb {R}^{|\mathcal {Q}|}\), in which it thus has some accumulation point, say \(\overline {\boldsymbol {\lambda }}\). The sequence \(\{ \overline {{\boldsymbol x}}^t \}\) then has an accumulation point at \(\overline {{\boldsymbol x}} := \sum _{q \in \mathcal {Q}} \overline {\lambda }_q {\boldsymbol x}^q\). From Theorem 15.3 follows that \(\overline {{\boldsymbol x}} \in X^*_{ \operatorname {\mathrm {conv}}}\) must hold, which implies that \(\overline {\boldsymbol {\lambda }}\) is optimal in the complete master problem (15.48).

For any \(q \notin \widehat {\mathcal {Q}}\), the set \(\mathcal {T}_{q}\) is finite and the formula (15.53) together with the divergent series conditions (15.23) yield that \(\overline {\lambda }_q=0\) holds. Hence, \(\overline {\boldsymbol {\lambda }}\) is feasible in the restricted master problem (15.52). It follows that \(\widehat {h}^* = h^*\) holds.

From the optimality of (u , v ) in the LP dual (15.45) of the complete master problem (15.48) follows that

$$\displaystyle \begin{aligned} h^* = {\boldsymbol b}^{\top} {\boldsymbol u}^\infty + v^\infty. \end{aligned} $$
(15.55)

Since the Lagrangian dual function h is polyhedral, for large enough values of t, x(u t) solves the Lagrangian subproblem at u t as well as at u . Hence, it holds that b − Ax(u t) ∈ ∂h(u t) ∩ ∂h(u ), and it follows that the inequalities

$$\displaystyle \begin{aligned} &h({\boldsymbol u}^\infty) \leq h({\boldsymbol u}^t) + \big( {\boldsymbol b} - A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^\infty - {\boldsymbol u}^t \big)\\ &\qquad \qquad \qquad \qquad \text{and}\\ &h({\boldsymbol u}^t) \leq h({\boldsymbol u}^\infty) + \big( {\boldsymbol b} - A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^t - {\boldsymbol u}^\infty \big) \end{aligned} $$

hold. By combining these inequalities and simplifying we obtain the equalities

$$\displaystyle \begin{aligned} h({\boldsymbol u}^\infty) & = h({\boldsymbol u}^t) + \big( {\boldsymbol b} - A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^\infty - {\boldsymbol u}^t \big) \\ & = {\boldsymbol c}^{\top} {\boldsymbol x}({\boldsymbol u}^t) + \big( {\boldsymbol b} - A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} {\boldsymbol u}^t + \big( {\boldsymbol b} - A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} \big( {\boldsymbol u}^\infty - {\boldsymbol u}^t \big) \\ & = {\boldsymbol c}^{\top} {\boldsymbol x}({\boldsymbol u}^t) - \big( A{\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} {\boldsymbol u}^\infty + {\boldsymbol b}^{\top} {\boldsymbol u}^\infty, \end{aligned} $$

which together with the equality (15.55) and \(h^* = h({\boldsymbol u}^\infty )\) yield that

$$\displaystyle \begin{aligned} {\boldsymbol c}^{\top} {\boldsymbol x}({\boldsymbol u}^t) - \big( A {\boldsymbol x}({\boldsymbol u}^t) \big)^{\top} {\boldsymbol u}^\infty - v^\infty = 0 \end{aligned}$$

holds when t is large enough. □

According to Theorem 15.6, all columns that are needed to solve the complete master problem (15.48) are eventually found in the subgradient optimization. In case of nondegeneracy at a dual optimum, late iterations will provide exactly the optimal basic columns of the complete master problem.Footnote 9 Further, columns found in the early iterations are never indispensable. These properties justify the use of subgradient optimization for predicting optimal basic columns before applying column generation.

In practice, the two-phase method works as follows. In the prediction phase, \(\bar {t}\) subgradient optimization iterations are performed, with step lengths fulfilling the conditions (15.23) and (15.27). Lagrangian subproblem solutions are collected and columns in the restricted master problem are constructed from iteration \( \underline {t} \le \bar {t}\). Upon termination of this phase, an initial restricted master problem is available. Thereafter, Dantzig–Wolfe decomposition is used as usual. In order to benefit from the two-phase method, the Lagrangian relaxed problem must be computationally cheap in comparison with the restricted master problems, since the prediction phase involves a relatively large number of subgradient optimization iterations.
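A Python sketch of the prediction phase on a tiny instance of our own (one dualized constraint, harmonic step lengths, and a column pool started at iteration t_low) illustrates how late subgradient iterations generate precisely the optimal column, in line with Theorem 15.6:

```python
# Sketch of the prediction phase: run subgradient optimization with
# divergent-series step lengths and collect the Lagrangian subproblem
# solutions found from iteration t_low onward as candidate columns.

c = [-3.0, -5.0]
A = [-1.0, -1.0]          # one relaxed constraint: -x1 - x2 >= -1
b = -1.0
X = [(0, 0), (0, 1), (1, 0), (1, 1)]

def subproblem(u):
    """Minimize (c - A^T u)^T x over X; returns a minimizer x(u)."""
    return min(X, key=lambda xq: sum((ci - ai * u) * xi
                                     for ci, ai, xi in zip(c, A, xq)))

u = 0.0
t_bar, t_low = 200, 100   # total iterations; start collecting at t_low
columns = set()
for t in range(t_bar):
    x_u = subproblem(u)
    if t >= t_low:
        columns.add(x_u)  # store the subproblem solution as a column
    g = b - sum(ai * xi for ai, xi in zip(A, x_u))   # subgradient of h
    step = 1.0 / (1.0 + t)      # harmonic steps: fulfill (15.23)/(15.27)
    u = max(0.0, u + step * g)  # projection onto the nonnegative orthant

# Late iterations generate (only) columns that are optimal in the
# complete master problem; here, the single optimal point (0, 1).
assert columns == {(0, 1)}
assert 3.0 <= u <= 5.0    # u has settled in the dual optimal set [3, 5]
```

On this instance the early iterates generate the column (1, 1), which is discarded by the postponement, while every collected column is optimal in the complete master problem.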

Theorem 15.3 suggests an ergodic sequence computed within the subgradient optimization method (15.21) that asymptotically solves the problem (15.5), and hence also (15.48). Finitely generated ergodic solutions are, however, typically infeasible. In contrast, a Dantzig–Wolfe restricted master problem—finitely generated within the prediction phase—can be used to find feasible near optimal solutions to (15.5). The key difference is that in the former approach, the primal solution is defined by (15.54) with the convexity weights given a priori by (15.53), while in the latter approach the values of the convexity weights are optimized by the restricted master problem.

6 Recovery of Primal Integer Solutions by Means of Lagrangian Dual Methods

As an application of Theorem 15.3 we give a partial motivation for the success of Lagrangian heuristics in cases where the Lagrangian lower bound is strong, that is, when the difference z^* − h^* ≥ 0 is small. We establish that the extent to which one may expect such heuristics to generate feasible solutions of high quality is governed by the same factors as those determining the quality of lower bounds. Hence, a solution strategy that yields a high quality Lagrangian lower bound \(h^* = z_{ \operatorname {\mathrm {conv}}}^* \leq z^*\) is also likely to yield a high quality upper bound \(\overline {z} \geq z^*\).

6.1 Primal Integer Solutions From Lagrangian Heuristics

The basic idea behind Lagrangian heuristics is to use the information obtained from the Lagrangian dual problem (15.10) to construct feasible solutions to the original problem (15.2). A Lagrangian heuristic commonly works as follows. It is initiated at a solution that is feasible with respect to the non-relaxed constraints (15.2c). This solution is gradually adjusted through a finite number of steps that (1) use information from the Lagrangian dual problem, (2) retain feasibility in the non-relaxed constraints (15.2c), and (3) strive for feasibility in the relaxed constraints (15.2b). If the heuristic is successful, the final solution is feasible in (15.2). Lagrangian heuristics are, however, often not guaranteed to find feasible solutions.

To comply with the required initial feasibility in the non-relaxed constraints (15.2c), Lagrangian heuristics are commonly initiated with (near) optimal solutions to the subproblem in (15.8). Appropriate adjustments of solutions are necessarily problem specific and range from simple roundings, via elaborate manipulations of solutions, to solving a mixed-integer linear optimization problem. The information used from the Lagrangian dual problem is typically a near optimal dual solution, obtained by, for example, subgradient optimization. The adjustments made when striving for feasibility in the relaxed constraints (15.2b) are often guided by a merit function defined by original costs or Lagrangian cost, such that the final solution, if feasible, is likely also near optimal. The heuristic may also continue with a local search or meta-heuristic search in the original problem, after feasibility is reached.

We distinguish between two types of Lagrangian heuristics: conservative and radical. The latter type allows the solution finally found to be far from optimal in the subproblem. A radical heuristic can, for example, solve a restriction of the original problem (e.g., a Benders subproblem), which yields a feasible solution.

In conservative heuristics, which are the more common, the initial solution is (near) optimal in the subproblem, the adjustments made are local and such that near optimality in the subproblem is retained, and the number of adjustments is required to be rather small. Due to these limitations, such a heuristic may produce very good feasible solutions, or frequently fail to find feasible solutions, depending on the characteristics of the original problem and the details of the heuristic.

Initialize a conservative Lagrangian heuristic by a solution \(\tilde {{\boldsymbol x}}\) to the subproblem (15.9). Then, the effect of the adjustments made can be characterized by the inclusion

$$\displaystyle \begin{aligned} {\boldsymbol x}_{\mathrm{heur}}(\tilde{{\boldsymbol x}}) \in \left\{ {\boldsymbol x} \in X \,\left|\, A {\boldsymbol x} \geq {\boldsymbol b}; \text{ ``the distance } \left\|\, {\boldsymbol x} - \tilde{{\boldsymbol x}} \,\right\| \text{ is small''} \right. \right\}. \end{aligned} $$
(15.56)

To understand (at least partially) why conservative heuristics can produce very good feasible solutions, we first recall that \(\tilde {{\boldsymbol x}}\) is optimal in a relaxation of the original problem. Further, in many applications a subproblem solution obtained using a near optimal Lagrangian dual solution is near feasible in the original problem. Moreover, if \(\tilde {{\boldsymbol x}}\) is near feasible, the adjustments needed to reach feasibility are small, and a feasible solution \({\boldsymbol x}_{\mathrm {heur}}(\tilde {{\boldsymbol x}})\) will indeed be close to the input \(\tilde {{\boldsymbol x}}\). Hence, it appears likely that \({\boldsymbol x}_{\mathrm {heur}}(\tilde {{\boldsymbol x}})\) can be near optimal in the original problem. The risk of failure in a conservative heuristic is due to the adjustments being only local, which may be insufficient to repair intricate infeasibilities.

As the dual sequence {u t} approaches the set U^*, the ergodic sequence \(\{ \tilde {{\boldsymbol x}}^{\,t} \}\) will approach the set \(X_{ \operatorname {\mathrm {conv}}}^*\). If the Lagrangian lower bound h^* is strong, that is, if the duality gap z^* − h^* ≥ 0 is small, the sets \(X_{ \operatorname {\mathrm {conv}}}^*\) and X^* are expected to be close, in the sense that the value \(\min \{ \|\, {\boldsymbol x} - {\boldsymbol y} \,\| \,\big |\, {\boldsymbol x} \in X_{ \operatorname {\mathrm {conv}}}^*, {\boldsymbol y} \in X^* \}\) is (relatively) small. Since the adjustments made to the ergodic primal iterates are quite small, a heuristic based on the inclusion (15.56) can thus be expected to produce solutions which are close to the set X^*. If, however, as in classical Lagrangian heuristics, the subproblem solutions x(u t) are used as inputs to the heuristic, the above arguments cannot be used to claim that the resulting point will be close to \(X_{ \operatorname {\mathrm {conv}}}^*\) or to X^*. We thus propose to use the ergodic iterate \(\tilde {{\boldsymbol x}}^{\,t}\), in place of the subproblem solution x(u t), as input to the Lagrangian heuristic of Algorithm 15.1, which is based on (15.56), that is, it finds \({\boldsymbol x}_{\mathrm {heur}}(\tilde {{\boldsymbol x}}^{\,t})\).

Algorithm 15.1: Lagrangian heuristic for (15.2) utilizing ergodic sequences of subproblem solutions
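A minimal sketch of a conservative heuristic in the spirit of Algorithm 15.1 is given below, assuming that the non-relaxed set X imposes only binarity; the rounding and the greedy repair rule are illustrative adjustment steps, not the ones prescribed by the algorithm:

```python
def lagrangian_heuristic(x_tilde, c, A, b):
    """Sketch of a conservative heuristic: round the (ergodic) input point
    to binary values, then greedily repair any remaining infeasibility in
    the relaxed constraints A x >= b."""
    n, m = len(c), len(b)
    x = [1 if v >= 0.5 else 0 for v in x_tilde]      # initial rounding

    def slack(x):
        return [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]

    while any(s < 0 for s in slack(x)):
        s = slack(x)
        best_j, best_score = None, float("inf")
        for j in range(n):
            if x[j] == 1:
                continue
            # infeasibility removed by setting x_j := 1, over violated rows
            gain = sum(min(A[i][j], -s[i]) for i in range(m) if s[i] < 0)
            if gain > 0 and c[j] / gain < best_score:
                best_j, best_score = j, c[j] / gain
        if best_j is None:
            return None          # repair failed: no feasible point produced
        x[best_j] = 1            # cheapest repair per unit of infeasibility
    return x

# Toy covering-type instance (hypothetical data):
c = [2.0, 3.0, 4.0]
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]
x_tilde = [0.8, 0.3, 0.1]        # a fractional ergodic iterate
print(lagrangian_heuristic(x_tilde, c, A, b))    # prints: [1, 1, 0]
```

A full implementation must, in addition, keep the adjusted point feasible in the actual set X, as required by step (2) of the scheme described above.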

6.2 Approximate Solutions to the Primal Problem via Core Problems

A common solution strategy for discrete optimization problems is to use some heuristic, problem adapted technique for predicting optimal values for a (relatively large) subset of the binary variables, and to solve the (relatively small) restriction of the original problem to the nonfixed variables—the core problem—either exactly or approximately. The core problem should be constructed with the aim of making it feasible; an optimal solution to the core problem then yields a feasible and near optimal solution to the original problem. Whenever the core problem turns out to be infeasible, previously fixed variables need to be relaxed and reinserted into the core.

Let \(\mathcal {J}_0^* \subseteq \{1, \ldots , n_{\mathrm {b}} \}\) and \(\mathcal {J}_1^* \subseteq \{1, \ldots , n_{\mathrm {b}} \} \setminus \mathcal {J}_0^*\) denote the sets of indices for the variables in x b which possess the value 0 and 1, respectively, in every optimal solution to the convexified problem (15.5). Further, let \(\mathcal {J}_{\text{frac}}^* := \{1, \ldots , n_{\mathrm {b}} \} \setminus (\mathcal {J}_0^* \cup \mathcal {J}_1^*)\) denote the complementary index set, corresponding to the variables in x b which possess a fractional optimal value in at least one optimal solution to (15.5).

For each iteration t in the method (15.21), let \(\tilde {{\boldsymbol x}}^{\,t}\) denote the weighted average of the solutions to the Lagrangian subproblem (15.8) as defined in (15.35), with step lengths according to (15.23) and (15.27), and with convexity weights fulfilling the conditions (15.36) or (15.38). For each j ∈{1, …, n b} the value \(\tilde {x}_{\mathrm {b},j}^{\,t} \in [0, 1]\) can then be interpreted as the weighted relative frequency by which the variable x b,j attains the value 1 in an optimal solution to the subproblem. The following result is immediate.
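For illustration, with uniform convexity weights (a simple special case) the ergodic iterate can be updated recursively, and the relative-frequency interpretation becomes explicit:

```python
def ergodic_average(subproblem_solutions):
    """Recursive computation of an ergodic iterate with uniform convexity
    weights (a simple special case): component j equals the relative
    frequency with which x_{b,j} = 1 among the solutions seen so far."""
    x_tilde = None
    for t, x in enumerate(subproblem_solutions, start=1):
        if x_tilde is None:
            x_tilde = [float(v) for v in x]
        else:
            # convex combination of the old average and the new solution
            x_tilde = [((t - 1) * xt + v) / t for xt, v in zip(x_tilde, x)]
    return x_tilde

# Hypothetical stream of binary subproblem solutions x(u^s):
sols = [(1, 0), (1, 1), (1, 0), (1, 0)]
print(ergodic_average(sols))     # prints: [1.0, 0.25]
```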

Proposition 15.10 (On the Weighted Relative Frequency of Binary Solutions)

It holds that \(\big \{ \tilde {x}_{\mathrm {b},j}^{\,t} \,\big \} \rightarrow 0\)for all \(j \in \mathcal {J}_0^*\)and \(\big \{ \tilde {x}_{\mathrm {b},j}^{\,t} \,\big \} \rightarrow 1\)for all \(j \in \mathcal {J}_1^*\). If the sequence \(\big \{ \tilde {x}_{\mathrm {b},j}^{\,t} \,\big \}\)accumulates at a point in the open interval (0, 1), then \(j \in \mathcal {J}_{\mathrm {frac}}^*\).

Proposition 15.10 motivates the use of the weighted relative frequency of the binary subproblem solutions as an indicator for the solution to the convexified optimization problem (15.5), as well as for the solution to the original problem (15.1) [or (15.2)]. For each variable x b,j, j = 1, …, n b, we thus define the two threshold values \(\sigma _j^0, \sigma _j^1 \in (0, \frac {1}{2})\), which are used to define the two approximating sets

$$\displaystyle \begin{aligned} \mathcal{J}_0(\boldsymbol{\sigma}^0,\tilde{{\boldsymbol x}}^{\,t}) & := \Big\{ j \in \{ 1, \ldots, n_{\mathrm{b}} \} \,\big|\, \tilde{x}_{\mathrm{b},j}^{\,t} \leq \sigma_j^0 \,\Big\}\\ &\qquad \qquad \mbox{and} \\ \mathcal{J}_1(\boldsymbol{\sigma}^1,\tilde{{\boldsymbol x}}^{\,t}) & := \Big\{ j \in \{ 1, \ldots, n_{\mathrm{b}} \} \,\big|\, \tilde{x}_{\mathrm{b},j}^{\,t} \geq 1 - \sigma_j^1 \,\Big\}. \end{aligned} $$
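Given an ergodic iterate and the thresholds, the approximating sets are computed by direct comparison; the data below are hypothetical:

```python
def approximating_sets(x_tilde, sigma0, sigma1):
    """Compute the index sets J_0 and J_1 defined above: variables whose
    weighted relative frequency lies below sigma0_j are candidates for
    fixing to 0, those above 1 - sigma1_j for fixing to 1; the remaining
    indices are left free and constitute the core problem's variables."""
    J0 = {j for j, v in enumerate(x_tilde) if v <= sigma0[j]}
    J1 = {j for j, v in enumerate(x_tilde) if v >= 1.0 - sigma1[j]}
    core = set(range(len(x_tilde))) - J0 - J1
    return J0, J1, core

x_tilde = [0.02, 0.97, 0.40, 0.60]   # hypothetical ergodic iterate
sigma0 = sigma1 = [0.05] * 4         # common thresholds in (0, 1/2)
print(approximating_sets(x_tilde, sigma0, sigma1))  # prints: ({0}, {1}, {2, 3})
```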

A sequence of core problems—to be utilized in Algorithm 15.2—is then defined by

$$\displaystyle \begin{aligned} z_{\mathrm{core}}^*(\boldsymbol{\sigma}^0, \boldsymbol{\sigma}^1, \tilde{{\boldsymbol x}}^{\,t}) := \quad &\min_{{\boldsymbol x}} \quad {\boldsymbol c}^{\top} {\boldsymbol x} & \end{aligned} $$
(15.57a)
$$\displaystyle \begin{aligned} &\ \mbox{subject to} \quad A {\boldsymbol x} \geq {\boldsymbol b}, & \end{aligned} $$
(15.57b)
$$\displaystyle \begin{aligned} & \qquad \qquad \quad \ \ x_{\mathrm{b},j} = \ell, \quad j \in \mathcal{J}_\ell(\boldsymbol{\sigma}^\ell, \tilde{{\boldsymbol x}}^{\,t}), \ \ell \in \{ 0, 1 \}, & \end{aligned} $$
(15.57c)
$$\displaystyle \begin{aligned} & \qquad \qquad \quad \ \ {\boldsymbol x} \in X. & \end{aligned} $$
(15.57d)

We let \({\boldsymbol x}_{\mathrm {core}}^t\) denote a feasible (optimal or approximate) solution to the core problem (15.57); whenever such a solution exists, its value is an upper bound on the optimal value, that is, the inequalities \({\boldsymbol c}^{\top } {\boldsymbol x}_{\mathrm {core}}^t \geq z_{\mathrm {core}}^*(\boldsymbol {\sigma }^0, \boldsymbol {\sigma }^1, \tilde {{\boldsymbol x}}^{\,t}) \geq z^*\) hold. We define \(z_{\mathrm {core}}^t := \mathrm {min}_{s=0, \ldots , t} \left \{ {\boldsymbol c}^{\top } {\boldsymbol x}_{\mathrm {core}}^s \,\right \}\). Since lower bounds h(u t) ≤ h^* are given by the dual iterates u t, a termination criterion can be based on the differences

$$\displaystyle \begin{aligned} z_{\mathrm{core}}^t - \overline{h}_t \geq z^* - h^* \geq 0,\end{aligned} $$
(15.58)

where \( \overline {h}_t := \mathrm {max}_{s=0, \ldots , t} \left \{ h({\boldsymbol u}^s) \,\right \}, \quad t = 1, 2, \ldots \)
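The resulting termination test amounts to bookkeeping of the best bounds found so far; a sketch with a hypothetical stream of bound pairs and tolerance ε = 0.5:

```python
def run_with_gap_termination(bound_pairs, epsilon=0.5):
    """Termination test based on (15.58): maintain the best core-problem
    value (upper bound) and the best dual value (lower bound) seen so far,
    and stop once their difference is within the tolerance epsilon.
    `bound_pairs` yields pairs (c^T x_core^t, h(u^t)) per iteration."""
    z_core, h_bar = float("inf"), float("-inf")
    for t, (upper, lower) in enumerate(bound_pairs):
        z_core = min(z_core, upper)      # best upper bound z_core^t
        h_bar = max(h_bar, lower)        # best lower bound h-bar_t
        if z_core - h_bar <= epsilon:
            return t, z_core, h_bar
    return None, z_core, h_bar           # tolerance never reached

# Hypothetical per-iteration (upper, lower) bound pairs:
stream = [(12.0, 7.0), (11.0, 9.0), (10.5, 10.2), (10.4, 10.3)]
print(run_with_gap_termination(stream))  # prints: (2, 10.5, 10.2)
```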

Algorithm 15.2: Approximate solution of (15.2) from a sequence of core problems

6.3 Optimal Solutions via a Branch-and-Bound Framework

We next consider using ergodic sequences to obtain feasible solutions to the problem (15.2) within a branch-and-bound framework. A subgradient method (15.21) can be applied to the Lagrangian dual (15.10) of the local linear program corresponding to each branch-and-bound tree node, yielding

  1. a lower bound on z^* from a lower estimate of h^* (that is, from the value of an approximate solution to the Lagrangian dual of the node problem),

  2. an upper bound on z^* from feasible solutions to (15.2) constructed using the subproblem solutions obtained in (15.21), and

  3. a branching decision based on the (approximate) reduced costs obtained in the dual sequence {u t}.

One drawback of this principle is that it seldom provides a fractional primal solution, since the subproblem solutions are integer valued. Hence, when aiming for a breadth-first branching strategy, deciding which variable to branch on is nontrivial. We propose a procedure in which the upper bounding (2) and the branching rule (3) are replaced by measures based on the ergodic iterates \(\tilde {{\boldsymbol x}}^{\,t}\). The upper bound is obtained by applying Algorithm 15.1, which provides feasible solutions to the problem (15.2). The branching rule is based on the ergodic iterate \(\tilde {{\boldsymbol x}}^{\,t}\) obtained from the method (15.21), (15.23), (15.27), (15.35) and (15.36) [or (15.38)]. Branching can be done on the variables with values close to binary or close to \(\frac {1}{2}\). The optimization problem addressed in node n of the branch-and-bound tree is then the problem (15.5), with the additional constraints x b,j = ℓ, \(j \in \mathcal {I}_\ell ^n\), ℓ ∈{0, 1}, where the set \(\mathcal {I}^n_{\ell }\) contains the indices of the variables that have been fixed to ℓ in the parent node of node n. By defining the set \(X_{ \operatorname {\mathrm {conv}}}^n := \operatorname {\mathrm {conv}} \big \{ {\boldsymbol x} \in X \,|\, x_{\mathrm {b},j} = \ell , \, j \in \mathcal {I}_\ell ^n, \, \ell \in \{ 0, 1 \} \,\big \} \subseteq X_{ \operatorname {\mathrm {conv}}}\), this linear program can then be expressed as

$$\displaystyle \begin{aligned} z_n^* := \quad &\min_{{\boldsymbol x}} \quad {\boldsymbol c}^{\top} {\boldsymbol x} & {} \end{aligned} $$
(15.59a)
$$\displaystyle \begin{aligned} &\ \mbox{subject to} \quad A {\boldsymbol x} \geq {\boldsymbol b}, & \end{aligned} $$
(15.59b)
$$\displaystyle \begin{aligned} & \qquad \qquad \quad \ \ {\boldsymbol x} \in X_{\operatorname{\mathrm{conv}}}^n. & \end{aligned} $$
(15.59c)

The branching procedure of Algorithm 15.3 is then applied, where the dual starting point u 0 in step 1 is often chosen as the final point u τ obtained from the subgradient scheme for the parent node. The search strategy for the branch-and-bound tree can be defined as, for example, depth-first, breadth-first, or best-lower-bound-first.

Algorithm 15.3: Branching decision based on ergodic primal iterates
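The two branching rules mentioned above (values close to 1/2, or close to binary) can be sketched as follows; the node data and the rule names are illustrative:

```python
def branching_variable(x_tilde, fixed, rule="most_fractional"):
    """Branching choice based on the ergodic iterate (illustrative rules):
    among the non-fixed binary variables, pick the one with value closest
    to 1/2 ('most_fractional') or closest to 0 or 1 ('most_binary')."""
    free = [j for j in range(len(x_tilde)) if j not in fixed]
    if rule == "most_fractional":
        return min(free, key=lambda j: abs(x_tilde[j] - 0.5))
    return min(free, key=lambda j: min(x_tilde[j], 1.0 - x_tilde[j]))

x_tilde = [0.9, 0.55, 0.1]    # hypothetical ergodic iterate at a node
print(branching_variable(x_tilde, fixed={2}))                      # prints: 1
print(branching_variable(x_tilde, fixed={2}, rule="most_binary"))  # prints: 0
```

The chosen index then spawns two child nodes, one with the variable fixed to 0 and one with it fixed to 1, as in the node problem (15.59).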

7 Notes and Further Reading

In this section we collect notes regarding the various results presented in the former sections and review important background material, along with references to the literature. While mentioning a few of the classic articles in the field of Lagrangian duality—chosen for their eloquence and pioneer status rather than for representing the state of the art—we also include a selection of more recent articles that we think tie in well with the classics.

Notes on Sect. 15.2.1: The Lagrangian Dual

That the Lagrangian dual function h is piecewise affine and concave is shown in, for example, [11, Proposition 5.1.12]. That the dual optimal set U^* is nonempty and polyhedral when the feasible set of the convexified problem (15.5) is nonempty is verified in, for example, [11, Section 5]. When the feasible set of (15.5) is empty, however, the Lagrangian dual (15.10) is unbounded; conditions for the existence of an optimal Lagrangian dual set are detailed in [74].

In [31] Everett establishes a theorem for the Lagrangian function, which states that for a given vector of dual multipliers (not necessarily optimal), the minimization of the Lagrangian function over the primal variables yields a primal vector that is globally optimal for a primal problem whose available resources are identical to those utilized in the primal minimum of the Lagrangian function for the given multipliers. This result suggests that dual multipliers are near optimal when the minimum of the corresponding Lagrangian subproblem is near feasible. Brooks and Geoffrion extend in [15] the analysis in [31] to provide near optimal solutions also when the original problem may include, for example, integer variables.

In [46, 47] Held and Karp investigate approaches to the symmetric traveling salesperson problem, based on a 1-tree relaxation.Footnote 10 One seeks prices (i.e., multipliers) on the nodes (which, however, appear on the adjacent links) such that the cheapest 1-tree equals an optimal Hamiltonian cycle. Optimal prices are derived from the solution of the corresponding Lagrangian dual problem, which is to maximize the value of the 1-tree. The lower bounds obtained from the dual procedure are also utilized in a branch-and-bound procedure.

Guignard provides in [39] an extensive introduction to Lagrangean approaches for the exact or approximate solution of difficult combinatorial optimization problems. The theme is similar to ours, but the presentation is aimed at less experienced readers.

A rather little-studied form of Lagrangian relaxation—not covered by this chapter—is known as Lagrangian decomposition, or variable splitting; see [40, 53]. It can be applied to mixed-integer linear optimization problems with (at least) two sets of explicit constraints [in (15.1b), (15.2b), or (15.7b)], such that two different Lagrangian relaxations are possible. The starting point for this approach is a problem reformulation in which copies of (a subset of) the primal variables are introduced, in one of the sets of constraints, together with additional constraints that ensure consistency between the original variables and the copies. The consistency constraints are then Lagrangian relaxed. The corresponding Lagrangian dual problem can yield stronger bounds than both of the two possible straightforward Lagrangian relaxations of the original problem, provided that neither of the LP relaxations of the resulting Lagrangian subproblems possesses the integrality property.

Notes on Sect. 15.2.2: Optimality Conditions for the Convexified Problem

While some of the literature on nondifferentiable functions (naturally) draws a distinction between ‘superdifferentials of concave functions’ and ‘subdifferentials of convex functions’, we use the term subdifferential throughout, relying on the reader’s familiarity with the concept.

The subdifferential of a concave function is defined in [11, Definition 3.2.3]. Proposition 15.1 follows from [11, Theorem 6.3.7], the convexity of the set X conv, and [64, Theorem 11]. Further, the differentiability property of the dual function h relying on the singleton property of the set X conv(u) is shown in [11, Theorem 6.3.3]. For a proof of Proposition 15.4, see [11, Theorem 3.4.3]. The saddle point optimality in (15.13) is proven in [11, Theorem 6.2.4], while Proposition 15.5 follows from [11, Theorem 6.2.5]. In [17, Lemma 2] it is shown that the composite mapping \(\partial h \cap N_{\mathbb {R}^m_+}\) is constant on the solution set U^*. The definition of the subdifferential of an objective function is in [24, 25] generalized to take the feasible set into account; in our terminology it is called the conditional subdifferential. The non-coordinability phenomenon is a consequence of the non-differentiability of the Lagrangian dual function, which in turn is a consequence of the linearity of the primal problem. This phenomenon is of interest in economic systems that are described by linear models and in which constant prices are used as a tool for steering decentralized decisions towards system optimality (e.g., [27]).

Turning to the question of how to exploit the Lagrangian dual problem (15.10) as a tool for finding a (near) optimal solution to the mixed-binary linear optimization problem (15.2), the fundamental observation is that the dual problem effectively solves the convexified primal problem (15.5), that is, finds an element of the solution set \(X^*_{\mathrm {conv}}\). Worth noting, however, is that whether such an element is directly available or not depends on the solution strategy employed for the dual problem.Footnote 11 If a (near) optimal solution to the convexified problem (15.5) is at hand, a simple strategy to find a mixed-binary feasible solution is to employ rounding, that is, to systematically round the values of x b in a solution to the convexified problem to binary values, and adjust the values of x c accordingly, with the aim of reaching feasibility and near optimality in the original problem (15.2).

Simple rounding procedures are, however, in general inadequate for finding near optimal or even feasible solutions. A more general idea is randomized rounding (see [78]), which for certain applications yields approximation algorithms. A generic such technique entails the solution of a continuous relaxation of the original problem, followed by a randomization scheme that decides whether to round each variable up or down. The article [12] provides a framework for finding approximate solutions to covering problems through generic heuristics, all based on rounding (deterministic—using primal and dual information—or randomized—with nonlinear rounding functions) of an optimal solution to a LP relaxation. The roundings are used to derive several known, as well as new, results for the set covering, facility location, general covering, network design, and cut covering problems. The follow-up article [13] describes a class of structure exploiting methods to round fractional solutions, by introducing dependencies in the rounding process. The technique improves approximation bounds for several problems, including the min-k-SAT problem.
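A sketch of the basic independent-rounding variant (without the dependencies introduced in [13]); the instance data are hypothetical:

```python
import random

def randomized_rounding(x_frac, seed=None):
    """Basic independent randomized rounding: set each binary variable to 1
    with probability equal to its fractional value in the LP solution, so
    that the rounded point matches the LP solution in expectation."""
    rng = random.Random(seed)
    return [1 if rng.random() < v else 0 for v in x_frac]

# Hypothetical fractional solution to a LP relaxation:
x_frac = [0.0, 0.5, 1.0]
x = randomized_rounding(x_frac, seed=42)
assert x[0] == 0 and x[2] == 1   # integral components are always preserved
```

The rounded point need not be feasible, so in practice the scheme is repeated, or combined with a repair step, before the best feasible outcome is kept.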

The reader may note that rounding and randomized rounding strategies have in common with the core problem principle (see Sect. 15.6.2) that they identify and elaborate with a subset of the original variables, based on primal information (e.g., variable values) or dual information (e.g., Lagrangian reduced costs).

Notes on Sect. 15.2.3: Conditions for Optimality and Near Optimality of Mixed-Binary Linear Optimization Problems

The global primal–dual optimality conditions (15.15) and the equivalent near saddle point condition (15.16) were introduced in [57], in which also computational results are reported. For examples of solution methods related to the enumeration principle discussed in Remark 15.2, see [45] and [21] (constrained shortest path problems), and [20] (train timetabling).

Notes on Sect. 15.3: Conditional Subgradient Optimization

Subgradient optimization methods for minimizing non-differentiable convex functions originate in a work by Shor from 1962; [83] reviews the early history of nonsmooth optimization. For the case of unconstrained optimization, Ermol’ev established in [30] the convergence of the method using step lengths according to a divergent series. Polyak extended in [76, 77] the method to the case of constrained convex optimization and presented additional convergence results; see also [82, Section 2]. These methods have been frequently and often successfully applied, particularly in connection with Lagrangian duality; see, for example, [32, 33, 36, 48]. Worth noting is that subgradient optimization methods are closely related to relaxation methods for solving systems of linear inequalities; see [37].

The important Polyak step length rule, which has proved to be very useful in computational practice, was presented in [77]. For the case when the optimal objective value h^* is known a priori, convergence to an optimal solution by the method (15.21) using the step lengths (15.30) was established: the restrictions on the step length parameter θ t guarantee that the distance between the current iterate and the solution set decreases in every iteration. Finite convergence to a near optimal point was also established for the case when the optimal value h^* in (15.30) is replaced by an estimate \(\overline {h} \geq h^*\); this generalization, expressed in (15.31), is the most commonly used in practice.

Notes on Sect. 15.3.1: Basic Convergence in the Lagrangian Dual Problem

The conditional subgradient optimization method was presented in [58] (see also [24, 25]) and generalizes the subgradient optimization method; it includes as a special case the subgradient projection method, as given in (15.22), which has shown a better practical performance than traditional subgradient methods (see [58]). The convergence of the method (15.21) was established in [58] for the divergent series and the Polyak step length rules. The proofs of Theorems 15.1 and 15.2 are special cases of those given in [58, Theorems 2.6 and 2.7, respectively]. The special case of subgradient projection is detailed in [58, Section 3]. Proposition 15.7 can be proven analogously as [77, Theorem 1] adapted to a Lagrangian dual problem, while Proposition 15.8 follows from [77, Theorem 4]. The condition (15.32) is referred to as the almost complete relaxation strategy; see [26, Section 3.4] and [58, Corollary 2.8].

Notes on Sect. 15.3.2: Ergodic Convergence in the Primal Problem

The principle of constructing ergodic sequences of primal subproblem solutions in subgradient optimization can be traced back a long time; see [82, p. 117] and [2]. The results presented here are developed in a series of articles; see [56, 59, 60]. Proposition 15.9 and Theorem 15.3 are special cases of [60, Proposition 5 and Theorem 1, respectively]. The convergence of sequences of convex combinations in general relies on a result in [54], which is described in [44, Lemma 3].

A relative of the principle of constructing ergodic primal solutions, and also of the two-phase method presented in Sect. 15.5, is the volume algorithm of Barahona and Anbil in [8]. Referencing the classic works of Held and Karp [46, 47] and Held et al. [48] on the search for good lower bounds for computationally challenging large-scale optimization problems, Barahona and Anbil identify the drawback of not considering the convergence characteristics in the primal space and the lack of a natural stopping criterion. While at the outset admitting the lack of a complete convergence theory, the authors describe a dual scheme, which is similar to a classic conjugate subgradient method (see [93]), in tandem with a constructive primal heuristic that mimics the master problem in Dantzig–Wolfe decomposition. The volume algorithm is applied to large-scale linear optimization problems arising from continuous relaxations of set partitioning, set covering, airline crew scheduling, max-cut, and facility location problems. The conclusion from the numerical experiments is that the more favourable problem instances are those in which the variables are bounded within the interval [0, 1], the constraint coefficients lie in the set {0, 1, −1}, and the pricing problem is solvable in linear time.

Notes on Sect. 15.3.3: Enhanced Primal Ergodic Convergence

Ergodic sequences that exploit more information from later subproblem solutions (as compared to earlier ones) were first presented in 1996 by Sherali and Choi [81] for the case of LP; this construction of ergodic sequences was generalized in 2015 by Gustavsson et al. [44] to incorporate also general convex optimization, as well as refined to the so-called s k-rule. Theorem 15.4 follows from [44, Theorem 1]. The result that the requirements (15.38) on the convexity weights \(\mu _s^t\) combined with step lengths α t according to a modified harmonic series (15.37) imply the requirements (15.36b)–(15.36e) on the parameters \(\gamma _s^t\) is established in [44, Proposition 5]. Convergence analyses for the s k-rule are presented in [44], while the delayed initialization of the ergodic sequences according to Remark 15.8 is detailed in [60, Section 3, Remark 1].

Notes on Sect. 15.3.4: Finite Primal Feasibility and Finite ε-Optimality

Theorem 15.5 and the finiteness result in Corollary 15.2 are established in [60, Theorem 3 and Corollary 3, respectively]. Approximate solution of the Lagrangian subproblems yields directions being ε-subgradients; the resulting conditional ε-subgradient method is investigated and analysed in [61]. Finiteness results are also presented in [63], in which the ergodic sequences are generated within a simplicial decomposition framework for nondifferentiable convex optimization.

A characterization in the case of a possibly inconsistent primal problem (15.5) is carefully detailed in [74], for the more general case of convex optimization. Convergence is established of an ergodic sequence of subproblem solutions to a point in the primal space such that the Euclidean norm of the infeasibility in the relaxed primal constraints is minimized, while the sequence of dual iterates diverges along the feasible direction of steepest ascent for the Lagrangian dual function.

Notes on Sect. 15.4: Dual Cutting-Planes: Dantzig–Wolfe Decomposition

The article “Decomposition principle for linear programs” published in 1960 by Dantzig and Wolfe [23] is a pioneering work on the subject of large-scale optimization and among the most influential publications in the history of operations research. The Dantzig–Wolfe decomposition principle is founded on the representation theorem for polyhedra (e.g., [65, Chapter 3, Theorem 3]), which states that a point belongs to a polyhedron if and only if it can be expressed as a convex combination of the polyhedron’s extreme points plus a nonnegative linear combination of its extreme directions.

Dantzig–Wolfe decomposition is derived from the structure of the constraints of the linear program. It is assumed that the constraints can be partitioned into two sets, of which one is (relatively) computationally tractable. The other set of constraints is deemed complicating; hence these are Lagrangian relaxed—giving the method’s subproblem—and instead dealt with in a separate linear program—the restricted master problem. For practical problems Dantzig–Wolfe decomposition typically exploits a block diagonal structure of the subproblem, which can then be solved as several separate subproblems. Assuming that the subproblems always have finite optima, the optimal solutions obtained for different price vectors are—in the restricted master problem—convex combined, such that the complicating constraints are optimally utilized, with respect to the problem’s objective function.Footnote 12 Subject to the usual non-degeneracy assumptions (or means for dealing with degeneracy), the convergence is finite; this follows directly from the finite convergence of the simplex method, of which Dantzig–Wolfe decomposition is, in essence, an application.

Plenty of variations of this computational scheme are possible. For example, each subproblem can be terminated at a near optimal solution, as long as this solution is good enough to ensure some progress in the restricted master problem. Further, the solution of the restricted master problem can be truncated, as long as some progress has been made. Moreover, convergence is ensured even if only the current basis is maintained in the restricted master problem, in a revised simplex manner.

The extension of the Dantzig–Wolfe decomposition principle to the case of nonconvex [e.g., (mixed-) integer] or non-linear optimization problems is known as generalized LP. In [68], a fundamental property of generalized LP is established. It is shown that for nearly all problems of practical importance, any limit point of the sequence of dual solutions produced by the algorithm is optimal in the Lagrangian dual problem of the given, primal, problem. This result holds even if the generalized LP algorithm does not solve the primal problem, which is typically the case whenever this problem is nonconvex.

Among the first applications of the Dantzig–Wolfe decomposition principle to a mixed-integer optimization problem is the work in [29] by Dzielinski and Gomory from 1965; see also [27, Section 7.2]. They consider a problem of production and inventory planning, which can be characterized as a time-indexed, multi-product economic lot size scheduling problem with common production resources. The problem is straightforward to model as a mixed-integer optimization problem, but they instead consider an approximate linear optimization model, which is based on the work [69] by Manne from 1958 and in which each variable corresponds to a complete production schedule for a single product. The approximate model can be regarded as a Dantzig–Wolfe master problem for a convexified version of a mixed-integer model of the problem. Since the number of possible production schedules can be huge, column generation is used. The column generation problem (i.e., the Dantzig–Wolfe subproblem) finds a production schedule for a single product and can be efficiently solved by the well-known Wagner–Whitin lot-sizing algorithm (which is a dynamic programming scheme), while the restricted master problem optimally combines the available production schedules with respect to overall cost and the common production resources.

In the pioneering two-part work [3, 4] on the application of column generation in the field of vehicle routing, Appelgren describes column generation approaches to a ship scheduling problem obtained from a Swedish ship-owning company. The first article applies the Dantzig–Wolfe decomposition method to the LP relaxation of the scheduling problem, which—probably thanks to the favourable matrix structure—achieves solutions that are near-integer. In the second article, the column generation method is combined with a branch-and-bound algorithm, in which the branching is performed on one of the “essential” fractional variables and the bounds are obtained by the decomposition algorithm. This combined method was able to solve all problem instances tested, mostly with only a single branching.

The work by Appelgren is an early predecessor of what is today known as branch-and-price. (The term price refers to the column generation, i.e., the pricing problem.) In the article [10] the authors first summarize the relations between branch-and-cut and branch-and-price for (mixed-)integer optimization problems. In branch-and-cut, classes of valid inequalities, preferably facets of the convex hull of the set of feasible solutions, are left out of the LP relaxation, because they comprise far too many constraints to generate and handle efficiently, and most of them will not be binding in an optimal solution anyway. Then, if an optimal solution to an LP relaxation is infeasible (with respect to integrality restrictions), a subproblem—called the separation problem—is solved in order to identify violated inequalities in a class. If one or more violated inequalities are found, some of them are added to the linear optimization problem in order to cut off the infeasible solution, which is then re-optimized. Branching occurs only when no violated inequalities can be found. Branch-and-cut is thus a generalization of branch-and-bound with LP relaxations, which allows cutting to be applied throughout the branch-and-bound tree.

The philosophy of branch-and-price resembles that of branch-and-cut, except that the focus is on column generation instead of row generation. In fact, pricing and cutting are complementary procedures for tightening an LP relaxation. In branch-and-price, sets of columns are left out of the LP relaxation because they are far too many to be generated and handled efficiently, and most of the associated variables will anyway be zero-valued in an optimal solution. To check the optimality of an LP solution, a subproblem—the pricing problem, which is a separation problem for the dual linear program—is solved in order to identify columns to enter the basis, and the linear program is re-optimized. Branching occurs when no columns price out to enter the basis and the linear programming solution does not satisfy the integrality conditions. Branch-and-price is thus a generalization of branch-and-bound with linear programming relaxations, which allows column generation to be applied throughout the branch-and-bound tree.
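The pricing loop just described can be illustrated on a deliberately tiny toy: a master problem with a single covering constraint, min Σ_j c_j x_j subject to Σ_j a_j x_j ≥ b, x ≥ 0 (all data positive), whose restricted LP optimum and dual are available in closed form—the dual price π equals the smallest ratio c_j/a_j among the available columns. The instance and all names below are hypothetical; a real implementation would solve the restricted master with an LP solver and search the column pool implicitly via a pricing oracle:

```python
def column_generation(columns, candidates, b, tol=1e-9):
    # columns:    list of (cost c_j, coefficient a_j) in the restricted master
    # candidates: the full (here: explicitly listed) column pool that a
    #             pricing oracle would otherwise search implicitly
    while True:
        # Restricted master (one covering constraint): the dual price pi is
        # the best cost-to-coverage ratio; the optimal value is pi * b.
        pi = min(c / a for (c, a) in columns)
        value = pi * b
        # Pricing: most negative reduced cost c - pi * a over all candidates.
        rc, col = min((c - pi * a, (c, a)) for (c, a) in candidates)
        if rc >= -tol or col in columns:
            return value  # no column prices out: the LP optimum is reached
        columns.append(col)
```

Starting from the single column (c, a) = (10, 1) with b = 6 and candidate pool {(10, 1), (6, 2), (9, 3)}, one pricing round adds (9, 3), and the method stops at the LP value 18.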

While appearing contradictory at first, there are several reasons (see [10]) for considering formulations with huge numbers of variables. Not infrequently, a mixed-integer optimization formulation with many variables has a better LP relaxation (with respect to bound quality). Further, a compact formulation—one not involving a huge number of variables—of a mixed-integer optimization problem may possess structural symmetries that allow solutions which are mathematically different but indistinguishable in their real-life interpretation; this causes branch-and-bound to perform poorly, since the problem barely changes after branching. A reformulation with a huge number of variables may eliminate such symmetries. Further, column generation provides a decomposition of the problem into a master problem and one or more subproblems. This decomposition may have a natural interpretation in the problem context, thus allowing for the incorporation of additional important constraints. Finally, a formulation with a huge number of variables may be the only choice.

At first glance, it may seem that branch-and-price involves nothing more than combining well-known ideas for solving linear programs by column generation with traditional branch-and-bound. This is, however, not straightforward, as observed already many years ago by Appelgren in [3, 4]. The most fundamental difficulty is that traditional single-variable branching is no longer viable, since it leads to the regeneration of columns that are already available. Appelgren resolved this by finding a best column among those that are not already available.

Today, the common countermeasure is to use other branching techniques, which are compatible with column generation by allowing branching restrictions to be transferred to the column generation problem without leading to essential changes of its properties. Vanderbeck analyzes in [88] the challenges in combining a branching scheme with column generation. The article presents a generic branching scheme in which the pricing oracle of the root node remains of use after branching, and which does not require an extended formulation of the original problem. It then recursively partitions the subproblem solution set. Branching constraints are enforced in the pricing problem, which is solved approximately by a limited number of calls to the pricing oracle. The scheme is illustrated on the cutting stock and bin packing problems; it is the first branch-and-price algorithm capable of solving such problems to integrality without modifying the subproblem or expanding its variable space.

An early and tidy application of branch-and-price is to the generalized assignment problem [80], which is decomposed into a set partitioning master problem and knapsack column generation problems. Another tidy application of branch-and-price is given in [87], which considers a time-indexed (i.e., time discretized) formulation of a machine scheduling problem. Such formulations are known to provide strong LP bounds, but they tend to be extremely large. The authors show how to (partly) alleviate this difficulty by means of Dantzig–Wolfe decomposition, leading to a reformulation with many more variables, but far fewer constraints. The central pricing problem is solved by dynamic programming in O(nT) time, with T and n being the number of time steps and jobs, respectively. To find an integer optimum, the decomposition approach is embedded in a branch-and-bound scheme.
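In the generalized assignment decomposition of [80] mentioned above, each column generation problem is a binary knapsack problem (one per machine), solvable by the textbook dynamic program sketched below. This is a generic sketch with hypothetical names, not the implementation of the cited work; integer weights and capacity are assumed:

```python
def knapsack(values, weights, capacity):
    # best[c] = maximum value achievable with total weight at most c.
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Traverse capacities downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

In the pricing context, `values` would be the reduced profits of assigning the jobs to the machine at hand and `capacity` the machine's resource limit.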

For general surveys of column generation and branch-and-price, see [66, 92]. A recent research trend in mixed-integer optimization is to develop effective and efficient solution methods by combining decomposition approaches with heuristic or metaheuristic principles, in order to exploit their respective advantages. For a general overview of metaheuristic methods based on decomposition principles, see [79]. In a column generation context, such a combined method would extend a heuristic search beyond the columns necessary for solving the LP master problem. Classes of column generation based primal heuristics for mixed-integer linear optimization are reviewed in [51], with the aim of extracting generic classes of column generation methods for use as black-box primal heuristics across applications. One such class consists of the so-called diving heuristics, which perform depth-first searches in a branch-and-price tree, gradually obtaining integer solutions by variable fixings according to branchings that prioritize columns that are part of LP optimal solutions in the nodes. To escape from local optima, partial backtracking can be used. Examples of applications of diving heuristics are found in, for example, [35, 41].

An interesting topic for further studies is to use an enumeration principle of the type outlined in Remark 15.2 with the aim of finding favourable columns for inclusion in a restricted master problem. The goal is then to construct a restricted master problem capable of identifying a (near) optimal solution to the original problem (15.2), rather than to directly find an optimal solution through enumeration. This strategy is reasonable if the original problem has a Cartesian product structure—see Remarks 15.1 and 15.10—such that columns from different sets in the Cartesian product can be combined in order to achieve overall (near) optimality. Solution methods related to this strategy are found in [86] (a production planning problem) and [7] (a vehicle routing problem, which does not have a Cartesian product structure but a related suitable one).

Another interesting topic for further studies within the field of column generation is the little studied combination of Lagrangian decomposition (see [40, 53]) and Dantzig–Wolfe decomposition (i.e., dual cutting-planes) for solving the Lagrangian dual problem; see [72, 75, 94] for examples of this combination. Both Lagrangian decomposition and Dantzig–Wolfe decomposition can separately provide strong lower bounds, and the synergy between these two bounding principles has the potential to provide even stronger lower bounds.

Notes on Sect. 15.5: A Two-Phase Method: Subgradient Optimization and Dantzig–Wolfe Decomposition

The inherent instability of dual cutting-plane procedures is discussed in, for example, [49, Chapter 15]. The fundamental dual box step stabilization was introduced in [70]. Examples of applications of this type of stabilized dual cutting-plane method (i.e., stabilized Dantzig–Wolfe decomposition) are found in [62, 90]. More general stabilization techniques are given in [28]. The use of heuristically generated high quality starting columns in Dantzig–Wolfe decomposition is discussed in, for example, [66, Subsection 4.1.1].

The two-phase method was introduced in [91], which also reports successful computational experience from an application to large-scale multicommodity network flows. Some similar methods can be found in the literature; in contrast to the one presented, those methods are, however, not justified by theoretical results.

In [95], subgradient optimization is performed in a first phase, which—in each iteration—stores dual cuts and solves a Dantzig–Wolfe master problem; the objective value of the latter is used in the Polyak step length formula. If the same cut is found too many times, the method switches to a second phase: the Dantzig–Wolfe method. A drawback of this two-phase method is that a computationally demanding master problem is solved in each subgradient iteration of the first phase, although only the objective value of the master problem is actually used. This two-phase method is further developed in [89], which studies different criteria for switching to the second phase as well as the effect of using a bundle method. A combination of subgradient optimization and a bundle method is also studied in [55].
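The Polyak step length referred to above can be sketched as follows, for the maximization of a dual function over multipliers u ≥ 0. Here `target` stands for an upper estimate of the optimal dual value (in [95], the objective value of the Dantzig–Wolfe master problem) and `scale` for the usual relaxation parameter chosen in (0, 2); this is an illustrative sketch, not the implementation of the cited works:

```python
def polyak_step(u, g, dual_value, target, scale=1.0):
    # One projected subgradient ascent step: step length
    # s = scale * (target - dual_value) / ||g||^2, then projection onto u >= 0.
    norm_sq = sum(gi * gi for gi in g)
    if norm_sq == 0.0:
        return list(u)  # zero subgradient: u already maximizes the dual
    s = scale * (target - dual_value) / norm_sq
    return [max(0.0, ui + s * gi) for ui, gi in zip(u, g)]
```

The appeal of this rule is that the step adapts to the remaining gap: as `dual_value` approaches `target`, the steps shrink automatically, which is precisely why the quality of the target estimate (here supplied by the master problem) matters.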

In [9] a fixed number of subgradient iterations is run every few iterations of the Dantzig–Wolfe method, starting from the restricted master dual optimum. The columns found are then included in the master problem. Substantially shorter computing times are reported, as compared to the standard Dantzig–Wolfe method. This line of research is continued in [50], which employs subgradient optimization also for finding approximate dual solutions to Dantzig–Wolfe master problems.

Notes on Sect. 15.6: Recovery of Primal Integer Solutions by Means of Lagrangian Dual Methods

Besides our own stream of research, the recovery of primal solutions has been treated in, for example, [67, 81, 84].

Notes on Sect. 15.6.1: Primal Integer Solutions from Lagrangian Heuristics

A classic reference on Lagrangian relaxation and Lagrangian heuristics in integer optimization is [32] by Fisher from 1981. In [57] the characteristics of the Lagrangian heuristic principle are described more formally, and such heuristics are classified as conservative or radical, depending on their nature. The essential difference is that a conservative Lagrangian heuristic aims to keep the values of ε(x, u) and δ(x, u) [defined in (15.19)] small, while in a radical heuristic they may be large. A conservative heuristic typically starts at x(u) ∈ X(u) and makes small changes, while a radical heuristic often involves solving an auxiliary optimization problem over a subset of the original problem variables, while keeping the values of the other variables fixed. For examples of conservative heuristics for specific applications, see [18, 34, 38, 52]. Examples of radical heuristics are found in [16, 22, 73, 85]; all of these exploit auxiliary optimization problems, some of which are, however, trivially solved.

The presented Lagrangian heuristic methodology utilizing ergodic sequences is developed in [1, 43].

Notes on Sect. 15.6.2: Approximate Solutions to the Primal Problem via Core Problems

The use of core problems was introduced by Balas and Zemel [6] in 1981, who applied it to binary knapsack problems; a collection of improvements of their scheme is found in [71].

The core problem principle has also been applied to very large-scale set covering models arising in crew scheduling [19, 22]; the construction of the core problem is there based on near-optimal LP reduced costs found by Lagrangian relaxation and subgradient optimization. Further applications of core problems include capacitated facility location [5] and fixed-charge transportation [94].

Our procedure for constructing core problems using ergodic sequences of Lagrangian subproblem solutions is developed in [42, 43].

Notes on Sect. 15.6.3: Optimal Solutions via a Branch-and-Bound Framework

The use of dual subgradient methods and Lagrangian heuristics as a means for obtaining branching rules and bounds in branch-and-bound schemes is well studied (e.g., [14, 32]). The utilization of ergodic sequences to guide the branching in a branch-and-bound framework is developed in [1, 43].