1 Introduction

The relevance of any mathematical concept lies in its ability to bear fruit in diverse areas and to produce long-lasting effects. This is definitely the case for Lagrange multipliers [38], which have influenced the development of modern mathematics and are still fertile as far as mathematical programming theory and algorithms are concerned.

In this chapter we confine the discussion to the treatment of numerical optimization problems in a finite-dimensional setting, where the decision variables are vectors of \(\mathbb {R}^n\).

Looking at the cornerstones of the historical development of such an area, we observe that Lagrange multipliers have survived the crucial passages from equality- to inequality-constrained problems, as well as from continuous to discrete optimization. In addition, they have shown their potential when nonsmooth analysis has come into play [48, 49].

Originally, the rationale for introducing Lagrange multipliers was to extend to the equality-constrained case the reduction of an optimization problem to the solution of a system of nonlinear equations, based on the observation that an optimality condition must necessarily involve both the objective function and the constraints. The geometry of the constraints became more and more important as soon as inequality-constrained problems were taken into consideration, and Lagrange multipliers were crucial in the definition of the related optimality conditions.

The introduction of duality theory highlighted the role of Lagrange multipliers in the game-theoretic setting, as well as in the study of value functions associated with optimization problems. More recently, in the 1960s, it became clear [18] that a Lagrangian multiplier-based approach was promising even for optimization problems of a discrete nature, showing that in some sense it was possible to reduce the gap between continuous and integer optimization.

In this chapter we focus on Lagrangian relaxation, which was introduced in [29] for dealing with integer or mixed integer optimization problems and immediately attracted the attention of many scientists as a general-purpose tool for handling hard problems [21]. The pervasiveness of Lagrangian relaxation is well evidenced in [39].

The objective of the presentation is to demonstrate the usefulness of combining Lagrangian relaxation and heuristic algorithms to tackle hard integer programming problems. We will introduce several examples of applications in fairly diverse areas of practical optimization.

The chapter is organized as follows. Basic notions on Lagrangian relaxation are in Sect. 17.2, while solution methods for tackling the Lagrangian dual are discussed in Sect. 17.3. A basic scheme of Lagrangian heuristics, together with an example of dual ascent for the classic set covering problem, is in Sect. 17.4. A number of applications in areas such as assignment, network optimization, sensor location, logistics and machine learning are discussed in Sect. 17.5. Some conclusions are drawn in Sect. 17.6.

2 Basic Concepts

We introduce first the following general definition.

Definition 17.1 (Relaxation)

Given a minimization problem (P) in the following form:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &f({{\boldsymbol x}}) \\ \text{subject to} &{\boldsymbol x} \in X \subset \mathbb{R}^n, \end{cases} \end{aligned} $$
(17.1)

with \(f: \mathbb {R}^n \rightarrow \mathbb{R}\), any problem (P\(_R\))

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &h({{\boldsymbol x}}) \\ \text{subject to} &{\boldsymbol x} \in X_R \subset \mathbb{R}^n, \end{cases} \end{aligned} $$
(17.2)

with \(h: \mathbb {R}^n \rightarrow \mathbb {R}\), is a relaxation of (P), provided that the two following conditions hold:

$$\displaystyle \begin{aligned} &X \subset X_R; {} \end{aligned} $$
(17.3)
$$\displaystyle \begin{aligned} &h({\boldsymbol x}) \leq f({\boldsymbol x}),\quad {\boldsymbol x} \in X. {} \end{aligned} $$
(17.4)

Under the above definition, if \({\boldsymbol x}^*\) is any (global) minimum of (P\(_R\)), then \(h({\boldsymbol x}^*)\) is a lower bound on the optimal value of the problem (P). Of course it is advantageous to spend some effort to obtain such information whenever (P\(_R\)) is substantially easier to solve than (P).

There exist many different ways for constructing relaxations. For example, the usual continuous relaxation of any integer linear programming (ILP) problem satisfies the conditions (17.3) and (17.4).

We introduce Lagrangian relaxation starting from a linear program (LP) in standard form:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}} {\boldsymbol x} \\ \text{subject to} &A {\boldsymbol x} = {\boldsymbol b}, \\ & {\boldsymbol x} \geq 0, \end{cases} \end{aligned} $$
(17.5)

with \( {\boldsymbol x}, {\boldsymbol c} \in \mathbb {R}^n \), \(A \in \mathbb {R}^{m \times n}\) and \({\boldsymbol b} \in \mathbb {R}^m\). We assume that the problem is feasible and has an optimal solution, so that the dual problem is feasible as well.

For any choice of the Lagrangian multiplier vector \(\boldsymbol {\lambda } \in \mathbb {R}^m\) (or, simply, the multiplier vector), we define the Lagrangian relaxation (LR(λ))

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}}{\boldsymbol x} + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}) \\ \text{subject to} & {\boldsymbol x} \geq 0, \end{cases} \end{aligned} $$
(17.6)

which can be rewritten as

$$\displaystyle \begin{aligned} \begin{aligned} {\boldsymbol b}{{{}^T}}{\boldsymbol{\lambda}} \, \,+ \, \,&\text{min} ~\, ({\boldsymbol c} - A^T\boldsymbol{\lambda})^T{\boldsymbol x} \\ &\text{subject to} ~\, {\boldsymbol x} \geq 0. \end{aligned} \end{aligned} $$
(17.7)

Letting \(z_{LR}(\boldsymbol{\lambda})\) be the optimal value of the problem (17.7), we obtain

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda})=\begin{cases} {\boldsymbol b}{{{}^T}}{\boldsymbol{\lambda}}, \quad \text{if } \quad {\boldsymbol c} - A^T\boldsymbol{\lambda} \geq 0, \\ -\infty, \quad \text{otherwise.} \end{cases} \end{aligned} $$
(17.8)

From the definition of relaxation, \(z_{LR}(\boldsymbol{\lambda})\) is a lower bound for the LP problem (17.5), possibly the trivial one for those values of \(\boldsymbol{\lambda}\) which do not satisfy the condition \(A^T\boldsymbol{\lambda} \leq {\boldsymbol c}\). It is quite natural to look for the best lower bound, and this results in solving the problem

$$\displaystyle \begin{aligned} \displaystyle \text{maximize} \quad z_{LR}(\boldsymbol{\lambda}), \end{aligned} $$
(17.9)

which, taking into account (17.8), is exactly the dual of (17.5),

$$\displaystyle \begin{aligned} \begin{cases} \text{maximize} \quad &{\boldsymbol b}{{{}^T}}{\boldsymbol{\lambda}} \\ \text{subject to} \quad &A{{{}^T}}\boldsymbol{\lambda} \leq {\boldsymbol c}. \end{cases} \end{aligned} $$
(17.10)

The problem (17.9) will be referred to as the Lagrangian dual of the LP problem (17.5), and in fact the optimal solution \(\boldsymbol{\lambda}^*\) of the dual is optimal for (17.9) as well.

The main motivation for the introduction of Lagrangian relaxation is the treatment of ILP problems. In the sequel we will not consider the mixed integer linear programming (MILP) case, where Lagrangian relaxation theory can be developed in a completely analogous way to the pure integer case. Binary linear programming (BLP) problems can be considered as a special case of ILP. Thus we focus on the following problem:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}} {\boldsymbol x} \\ \text{subject to} &A {\boldsymbol x} = {\boldsymbol b}, \\ & B {\boldsymbol x}= {\boldsymbol d}, \\ & {\boldsymbol x} \geq 0, \,\text{integer,} \end{cases} \end{aligned} $$
(17.11)

with \( {\boldsymbol x}, {\boldsymbol c} \in \mathbb {R}^n \), \(A \in \mathbb {R}^{m \times n}\), \(B \in \mathbb {R}^{p \times n}\), \({\boldsymbol b} \in \mathbb {R}^m\) and \({\boldsymbol d} \in \mathbb {R}^p\). We assume that the problem is feasible and that the set

$$\displaystyle \begin{aligned} X= \{{\boldsymbol x} \in \mathbb{R}^n \mid B {\boldsymbol x}= {\boldsymbol d}, \,\,\, {\boldsymbol x}\geq 0, \,\, \text{integer}\} \end{aligned} $$

is finite, that is \(X = \{{\boldsymbol x}_1, {\boldsymbol x}_2, \ldots, {\boldsymbol x}_K\}\), and we denote by \(\mathcal {K} = \{1,2,\ldots , K \}\) the corresponding index set. In writing the problem (17.11) two different families of constraints are highlighted, those defined through \(A{\boldsymbol x} = {\boldsymbol b}\) being the complicating ones. By focusing exclusively on such a set of constraints, we come up with the Lagrangian relaxation defined for \(\boldsymbol {\lambda } \in \mathbb {R}^m\)

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}}{\boldsymbol x} + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}) \\ \text{subject to} & {\boldsymbol x} \in X, \end{cases} \end{aligned} $$
(17.12)

which is supposed to be easier to solve than the original problem. By letting \({\boldsymbol x}(\boldsymbol{\lambda})\) and \(z_{LR}(\boldsymbol{\lambda})\) be, respectively, an optimal solution and the optimal value of (17.12), we have

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda})= {\boldsymbol c}{{{}^T}}{\boldsymbol x}(\boldsymbol{\lambda}) + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda})) = \displaystyle \min_{k \in \mathcal{K}} {\boldsymbol c}^T{\boldsymbol x}_k + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k). \end{aligned} $$
(17.13)

Remark 17.1

The function \(z_{LR}(\boldsymbol{\lambda})\) is often referred to as the dual function. Note that, in case \({\boldsymbol x}(\boldsymbol{\lambda})\) is feasible, that is \(A{\boldsymbol x}(\boldsymbol{\lambda}) = {\boldsymbol b}\), it is also optimal for the original problem (17.11).

In order to look for the best among the lower bounds z LR(λ), we define also in this case the Lagrangian dual

$$\displaystyle \begin{aligned} z_{LD} = \underset{{ \boldsymbol{\lambda}} \in \mathbb{R}^m}\max \quad z_{LR}(\boldsymbol{\lambda}), \end{aligned} $$
(17.14)

that is, from (17.13)

$$\displaystyle \begin{aligned} z_{LD}= \underset{{ \boldsymbol{\lambda}}} {\displaystyle \max}\, \min_{k \in \mathcal{K}} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + \mathbf{ \boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k). \end{aligned} $$
(17.15)

The optimal value z LD is the best lower bound obtainable through Lagrangian relaxation.

It is worth noting that the problem (17.15), which we will discuss later in more detail, consists in the maximization of a concave piecewise affine function. It is in fact an NSO problem which can be tackled by means of any of the methods described in this book. On the other hand, by introducing the additional variable \(v \in \mathbb {R}\), it can be rewritten in the following LP form:

$$\displaystyle \begin{aligned} \begin{cases} \underset{{ \boldsymbol{\lambda}},v} {\displaystyle \text{maximize}}\quad &v\\ \text{subject to} & v \leq {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k ),\,\,\,k \in \mathcal{K}, \end{cases} \end{aligned} $$
(17.16)

whose dual, in turn, is

$$\displaystyle \begin{aligned} \begin{cases} \underset{{ \boldsymbol{\mu}}} {\displaystyle \text{minimize}}\quad &{\boldsymbol c}{{{}^T}} \displaystyle (\sum_{k \in \mathcal{K}}\mu_k{\boldsymbol x}_k)\\ \text{subject to} & A \displaystyle (\sum_{k \in \mathcal{K}}\mu_k{\boldsymbol x}_k)= {\boldsymbol b}, \\ & \displaystyle \sum_{k \in \mathcal{K}}\mu_k=1, \\ &\mu_k \geq 0, \,\,k \in \mathcal{K}, \end{cases} \end{aligned} $$
(17.17)

or, equivalently,

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}}{\boldsymbol x} \\ \text{subject to} & A {\boldsymbol x}= {\boldsymbol b}, \\ &{\boldsymbol x} \in \operatorname{\mathrm{conv}} X,\\ \end{cases} \end{aligned} $$
(17.18)

where \( X= \{{\boldsymbol x} \in \mathbb {R}^n \mid B {\boldsymbol x}= {\boldsymbol d}, \,\, {\boldsymbol x} \geq 0 \,\, \text{integer}\}.\) The Lagrangian dual is thus a partially convexified version of the ILP (17.11). From the inclusion

$$\displaystyle \begin{aligned} \operatorname{\mathrm{conv}} X \subseteq \{{\boldsymbol x} \in \mathbb{R}^n \mid B {\boldsymbol x}= {\boldsymbol d}, \,\, {\boldsymbol x} \geq 0\} = \bar{X}, \end{aligned} $$

it follows

$$\displaystyle \begin{aligned} z_{LD} \geq z_{LP}, \end{aligned} $$
(17.19)

where z LP is the optimal value of the continuous LP relaxation of (17.11):

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}} {\boldsymbol x} \\ \text{subject to} &A {\boldsymbol x} = {\boldsymbol b}, \\ & B {\boldsymbol x}= {\boldsymbol d}, \\ & {\boldsymbol x} \geq 0. \end{cases} \end{aligned} $$
(17.20)

The property (17.19) indicates that the lower bound provided by the best (not by any!) multiplier vector setting is not worse than the optimal value of the continuous relaxation. Moreover, it is important to observe that in case the so-called integrality property holds, that is in case the vertices of the polyhedron \(\bar {X}\) are integer, then \(z_{LD} = z_{LP}\).

Whenever the integrality property is satisfied (and this is a rather common case in integer programming applications), Lagrangian relaxation may appear a rather weak approach compared to classic continuous relaxation: we need to solve an NSO problem of the maxmin type just to get the same bound obtainable by solving an LP! Nonetheless, Lagrangian relaxation may be a useful tool also in this case, for the following reasons:

  • in several applications the continuous relaxation is a huge LP and it may be a good idea not to tackle it;

  • in Lagrangian relaxation the decision variables (think e.g. binary variables) keep the original physical meaning, which, instead, gets lost in continuous relaxation;

  • the possibly infeasible solutions of the Lagrangian relaxation are often more suitable for repairing heuristics than those of the continuous relaxation.

Coming back to the two formulations (17.15) and (17.17), we observe that they offer two possible schemes for solving the Lagrangian dual.

Consider first (17.15), which is a finite maxmin problem where the function to be maximized is concave. In fact it is defined as the pointwise minimum of a finite (possibly very large) number of affine functions of the variable \(\boldsymbol{\lambda}\), one for each \({\boldsymbol x}_k \in X\). Application of the classic cutting plane model consists in generating an (upper) approximation of the \(\min \) function, initially based on a relatively small number of affine pieces, which becomes more and more accurate as new affine pieces (cuts) are added. In practice, taking any subset \( \mathcal {S} \subset \mathcal {K}\) (and the corresponding points \({\boldsymbol x}_k \in X\), \(k \in \mathcal {S}\)), the cutting plane approximation \(z_{\mathcal {S}}(\boldsymbol {\lambda })\) of \(z_{LR}(\boldsymbol{\lambda})\) is defined as

$$\displaystyle \begin{aligned}z_{\mathcal{S}}(\boldsymbol{\lambda})=\min_{k \in \mathcal{S}} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k),\end{aligned}$$

and thus we come out with the restricted primal problem

$$\displaystyle \begin{aligned} z_{restr}(\mathcal{S})= \underset{{ \boldsymbol{\lambda}}}{\max} \min_{k \in \mathcal{S}} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + \mathbf{ \boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k), \end{aligned} $$
(17.21)

which, in turn, can be put in the LP form of the type (17.16):

$$\displaystyle \begin{aligned} \begin{cases} \underset{{ \boldsymbol{\lambda}},v} {\displaystyle \text{maximize}}\quad &v\\ \text{subject to} & v \leq {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k ),\,\,\,k \in \mathcal{S}. \end{cases} \end{aligned} $$
(17.22)

Assuming the problem (17.21) is not unbounded and letting \(\boldsymbol {\lambda }_{\mathcal {S}}\) be any optimal solution, we calculate now \(z_{LR}(\boldsymbol {\lambda }_{\mathcal {S}})\) (in fact we solve the Lagrangian relaxation for \(\boldsymbol {\lambda }= \boldsymbol {\lambda }_{\mathcal {S}}\)):

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda}_{\mathcal{S}})=\min_{k \in \mathcal{K}} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + \mathbf{\boldsymbol{\lambda}_{\mathcal{S}}}^T({\boldsymbol b} - A{\boldsymbol x}_k)={\boldsymbol b}^T\mathbf{\boldsymbol{\lambda}_{\mathcal{S}}}+\min_{k \in \mathcal{K}}({\boldsymbol c}-A^T\mathbf{\boldsymbol{\lambda}_{\mathcal{S}}})^T{\boldsymbol x}_k. \end{aligned} $$
(17.23)

Letting the optimal solution of the above problem be attained at some index, say \(k_{\mathcal {S}}\), and assuming \(z_{LR}(\boldsymbol {\lambda }_{\mathcal {S}})\) to be sufficiently smaller than \(z_{restr}(\mathcal {S})\), the procedure is then iterated after augmenting the set of affine pieces in (17.21) by the one associated with the newly generated point \({\boldsymbol x}_{k_{\mathcal {S}}}\).
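To make the scheme concrete, the following is a minimal sketch (not taken from the cited references) of this cutting plane iteration, under the simplifying assumption that the finite set X is small enough to be listed explicitly, so that the oracle (17.23) reduces to plain enumeration. The restricted master (17.22) is solved with scipy.optimize.linprog, and the multipliers are confined to a large box only to prevent an unbounded master at the early iterations; all names (oracle, cutting_plane, lam_box) are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def oracle(c, A, b, lam, X):
    """Lagrangian relaxation (17.23) by enumeration: value and minimizing point."""
    vals = [c @ x + lam @ (b - A @ x) for x in X]
    k = int(np.argmin(vals))
    return vals[k], X[k]

def cutting_plane(c, A, b, X, lam_box=1e3, max_iter=100, tol=1e-8):
    m = len(b)
    S = [X[0]]                                   # initial bundle of points / affine pieces
    for _ in range(max_iter):
        # restricted master (17.22): max v  s.t.  v <= c^T x_k + lam^T (b - A x_k), k in S;
        # variables are (lam, v); linprog minimizes, hence the objective -v
        A_ub = np.array([np.append(-(b - A @ xk), 1.0) for xk in S])
        b_ub = np.array([float(c @ xk) for xk in S])
        bounds = [(-lam_box, lam_box)] * m + [(None, None)]   # box on lam for safety
        res = linprog(np.append(np.zeros(m), -1.0), A_ub=A_ub, b_ub=b_ub,
                      bounds=bounds, method="highs")
        lam, z_restr = res.x[:m], -res.fun
        z_lr, x_new = oracle(c, A, b, lam, X)    # evaluate z_LR(lam_S), see (17.23)
        if z_lr >= z_restr - tol:                # no sufficiently violated cut: stop
            return lam, z_lr
        S.append(x_new)                          # add the affine piece of the new point
    return lam, z_lr
```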

Consider now the formulation of the Lagrangian dual provided by (17.17). It is a (possibly very large) LP in a form particularly suitable for column generation [53]. We observe in fact that the columns are associated with the points \({\boldsymbol x}_k\), \(k \in \mathcal {K}\), in perfect analogy with the association between affine pieces and points of X in the formulation (17.15). To apply the simplex method it is necessary to start from any subset of columns providing a basic feasible solution and then look for a column with negative reduced cost. Such an operation need not be accomplished by examining the reduced costs of all nonbasic columns; instead, it can be implemented by solving a pricing problem. In fact the constraint matrix in (17.17) has size (m + 1) × K and has the form:

$$\displaystyle \begin{aligned} \left[ \begin{array}{ccc} A{\boldsymbol x}_1& \ldots &A{\boldsymbol x}_K\\ 1&\ldots&1\\ \end{array} \right]. \end{aligned} $$
(17.24)

Letting \(\boldsymbol{\lambda}_B\) be the vector assembling the first m dual variables associated with the basis, the components of the reduced cost vector \(\hat {{\boldsymbol c}}\) are:

$$\displaystyle \begin{aligned}\hat{{\boldsymbol c}}_k = ({\boldsymbol c}-A{{{}^T}}\boldsymbol{\lambda}_B)^T{\boldsymbol x}_k - \lambda_{m+1},\,\,\,k \in \mathcal{K},\end{aligned}$$

where \(\lambda_{m+1}\) is the last dual variable and plays the role of a constant in the definition of \(\hat {{\boldsymbol c}}_k\), \(k \in \mathcal {K}\). Consequently, the pricing problem consists of solving

$$\displaystyle \begin{aligned} \min_{k \in \mathcal{K}} \, ({\boldsymbol c}-A{{{}^T}}\boldsymbol{\lambda}_B)^T{\boldsymbol x}_k, {} \end{aligned} $$
(17.25)

which is a problem formally identical to (17.23).

Remark 17.2

It is worth noting that both (17.23) and (17.25) need not be solved to optimality: for the former it is sufficient to generate a cut, that is to find any index k, \(k \notin \mathcal {S}\), such that

$$\displaystyle \begin{aligned} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + \mathbf{\boldsymbol{\lambda}_{\mathcal{S} }}{{{}^T}}({\boldsymbol b} - A{\boldsymbol x}_k) < z_{restr}(\mathcal{S}),\end{aligned}$$

while for the latter the optimization process can be stopped as soon as a column with negative reduced cost has been detected.

We observe, finally, that in the formulation (17.11) the complicating constraints \(A{\boldsymbol x} = {\boldsymbol b}\) are in equality form. Lagrangian relaxation is well defined also in case they are of the type \(A{\boldsymbol x} \geq {\boldsymbol b}\), the only difference being the need to set \(\boldsymbol{\lambda} \geq {\mathbf{0}}\). This fact implies that the Lagrangian dual is now

$$\displaystyle \begin{aligned} \underset{{ \boldsymbol{\lambda} \geq {\mathbf{0}}}} {\displaystyle \max} \quad z_{LR}(\boldsymbol{\lambda}). \end{aligned} $$
(17.26)

It is worth noting that feasibility of \({\boldsymbol x}(\boldsymbol{\lambda})\) no longer implies optimality. In fact it can be easily proved that now \({\boldsymbol x}(\boldsymbol{\lambda})\) is only \(\epsilon\)-optimal, for

$$\displaystyle \begin{aligned}\epsilon=(A{\boldsymbol x}(\boldsymbol{\lambda})-{\boldsymbol b}){{{}^T}}\boldsymbol{\lambda} \geq 0.\end{aligned}$$

3 Tackling the Lagrangian Dual

Solving the Lagrangian dual problem (17.14) requires the maximization of a concave piecewise affine function (see (17.15)). Consequently, all the available machinery for convex nondifferentiable optimization can be put into action. We have already sketched in the previous section a possible use of the cutting plane method [13, 35]. On the other hand, the specific features of the Lagrangian dual can be fruitfully exploited.

The basic distinction is between algorithms which do or do not use the differential properties of the function \(z_{LR}\). Observe that a subgradient (concavity would suggest the more appropriate term supergradient) is immediately available as soon as \(z_{LR}\) has been calculated. In fact, letting (see (17.13))

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda})= {\boldsymbol c}{{{}^T}}{\boldsymbol x}(\boldsymbol{\lambda}) + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda})) = \displaystyle \min_{k\in \mathcal{K}} {\boldsymbol c}^T{\boldsymbol x}_k + {\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}_k) \end{aligned}$$

we observe that, for any \(\boldsymbol {\lambda }' \in \mathbb {R}^m\), it is

$$\displaystyle \begin{aligned} \begin{array}{lll} z_{LR}(\boldsymbol{\lambda}')&=\displaystyle \min_{k\in \mathcal{K}} {\boldsymbol c}{{{}^T}}{\boldsymbol x}_k + \mathbf{\boldsymbol{\lambda}'}^T({\boldsymbol b} - A{\boldsymbol x}_k)\leq {\boldsymbol c}^T{\boldsymbol x}(\boldsymbol{\lambda}) + \mathbf{\boldsymbol{\lambda}'}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda})) & \\ &={\boldsymbol c}{{{}^T}}{\boldsymbol x}(\boldsymbol{\lambda}) + \mathbf{\boldsymbol{\lambda}'}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda}))+\mathbf{ \boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda}))-{\boldsymbol{\lambda}}^T({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda}))&\\ &=z_{LR}(\boldsymbol{\lambda})+ (\boldsymbol{\lambda}' - \boldsymbol{\lambda} ){{{}^T}}({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda})), \end{array} \end{aligned}$$

thus \(({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda}))\) is a subgradient of \(z_{LR}\) at \(\boldsymbol{\lambda}\), that is \({\boldsymbol g}(\boldsymbol{\lambda}) = ({\boldsymbol b} - A{\boldsymbol x}(\boldsymbol{\lambda})) \in \partial z_{LR}(\boldsymbol{\lambda})\).

As for methods that use the subgradient for maximizing z LR(λ), we mention first the classic (normalized) subgradient method [46, 51], where the h-th iteration is

$$\displaystyle \begin{aligned}\boldsymbol{\lambda}_{h+1}= \boldsymbol{\lambda}_{h}+ t_h \frac{{\boldsymbol g}(\boldsymbol{\lambda}_h)}{\|{\boldsymbol g}(\boldsymbol{\lambda}_h)\|},\end{aligned}$$

\(t_h\) being the step size along the normalized subgradient direction. We remark that monotonicity of the sequence of values \(z_{LR}(\boldsymbol{\lambda}_h)\) is not ensured, while convergence to a maximum is guaranteed under the well-known conditions on the step size, \(t_h \rightarrow 0\) and \(\sum _{h=1}^{\infty }t_h = \infty \). Very popular formulae for setting \(t_h\) are

$$\displaystyle \begin{aligned}\displaystyle t_h=\frac{C}{h},\end{aligned} $$
(17.27)

and the Polyak formula

$$\displaystyle \begin{aligned} t_h=\displaystyle \frac{\hat{z}_{LD}^{(h)}-z_{LR}(\boldsymbol{\lambda}_h)}{\|{\boldsymbol g}(\boldsymbol{\lambda}_h)\|},\end{aligned} $$
(17.28)

where C is any positive constant and \(\hat {z}_{LD}^{(h)}\) is an improving overestimate, at iteration h, of \(z_{LD}\), the optimal value of the Lagrangian dual.
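As an illustration, the following is a minimal sketch of the normalized subgradient iteration with the two step size rules above. The oracle interface (a function returning \(z_{LR}(\boldsymbol{\lambda})\) and \({\boldsymbol x}(\boldsymbol{\lambda})\), for instance by enumeration as in the cutting plane sketch) and the function name are assumptions of this sketch, not part of the cited references.

```python
import numpy as np

def subgradient_max(oracle, b, A, lam0, n_iter=200, C=1.0, z_hat=None):
    """Normalized subgradient method for the Lagrangian dual (17.14).

    oracle(lam) is assumed to return (z_LR(lam), x(lam)); the step size follows
    (17.27) when z_hat is None and the Polyak rule (17.28) otherwise.  Since the
    iteration is not monotone, the best value found so far is kept separately."""
    lam = np.asarray(lam0, dtype=float)
    best_z, best_lam = -np.inf, lam.copy()
    for h in range(1, n_iter + 1):
        z, x = oracle(lam)
        g = b - A @ x                            # subgradient of z_LR at lam
        norm_g = np.linalg.norm(g)
        if norm_g < 1e-12:                       # 0 is a subgradient: lam is optimal
            return z, lam
        if z > best_z:
            best_z, best_lam = z, lam.copy()
        t = C / h if z_hat is None else (z_hat - z) / norm_g
        lam = lam + t * g / norm_g               # move along the normalized subgradient
    return best_z, best_lam
```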

Subgradient-type methods were the first and the most widely used methods for dealing with the Lagrangian dual, despite their slow convergence, mainly because of their implementation simplicity. In more recent years, stemming from the approach introduced in [44], the so-called fast gradient methods have received considerable attention [26]. The approach consists in first smoothing the objective function and then applying gradient-type methods. Besides this stream, incremental subgradient methods, which are applicable whenever the objective function is expressed as a sum of several convex component functions, have been devised to deal with the Lagrangian dual [7, 36].

A careful analysis of the performance of subgradient methods for Lagrangian relaxation is in [25].

The cutting plane method previously summarized has been the building block for devising two families of algorithms, the bundle [34] and the analytic center [30] methods for convex minimization (or, equivalently, concave maximization). They can be considered as an evolution of the cutting plane method, aimed at overcoming some inherent weaknesses in terms of stability, speed and well-posedness. An updated discussion on bundle methods is in Chap. 3 of this book.

The above methods have been successfully employed in several applications of Lagrangian relaxation in frameworks such as energy management [5, 8, 19, 31, 40], network design [25], train timetabling [20], telecommunication networks [41], logistics [43] etc.

As for methods which do not make explicit use of the subgradient of the function \(z_{LR}\), they are iterative algorithms of the coordinate or block-coordinate search type, according to whether at each iteration just one multiplier or a (small) subset of multipliers is modified. The update is aimed at increasing \(z_{LR}\), and thus this class of methods is, in general, referred to as dual ascent [33].

The choice of the component of the current multiplier vector λ to be updated is usually driven by the properties of the optimal solution x(λ). In general one picks up a violated constraint and modifies the corresponding multiplier trying to achieve a twofold objective:

  • to increase z LR;

  • to ensure feasibility of the previously violated constraint.

In several cases it is possible to calculate exactly (sometimes by solving an auxiliary problem) the multiplier update which guarantees satisfaction of both the above objectives. Most of the time, however, it is necessary to introduce a line search capable of handling a possible null step exit, which cannot be excluded as a consequence of the nonsmoothness of the function \(z_{LR}\).

From the computational point of view, a comparison of dual ascent and subgradient methods shows that the former produce, in general, a more rapid growth of the dual function \(z_{LR}\). On the other hand, dual ascent algorithms are often affected by a premature stop at points fairly far from the maximum, whenever no coordinate direction is actually an ascent one. In such a case it is useful to accommodate re-initialization, by adopting any subgradient as a restart direction.

4 Lagrangian Heuristics

As pointed out in [39], Lagrangian relaxation is more than just a technique to calculate lower bounds. Instead, it is a general philosophy to approach problems which are difficult to tackle, because of their intrinsic complexity.

Lagrangian relaxation has been extensively used in the framework of exact methods for integer programming. We will not enter into the discussion on the best ways to embed Lagrangian relaxation into branch and bound, branch and cut and branch and price algorithms. We refer the interested reader to the surveys [24] and [32].

We will focus, instead, on the use of Lagrangian relaxation in the so-called Lagrangian heuristic framework for solving the problem (17.11). The computational scheme is fundamentally the following:

Algorithm 17.1: Lagrangian heuristic

The above scheme is just an abstract description of how a Lagrangian heuristic works and several points need to be specified.

As for the initialization, a possible choice, whenever the LP continuous relaxation (17.20) is not too hard to solve, is to set \(\boldsymbol {\lambda }^{(0)}= \boldsymbol {\lambda }^*_{LP}\), where \(\boldsymbol {\lambda }^*_{LP}\) is the dual optimal vector associated with the constraints \(A{\boldsymbol x} = {\boldsymbol b}\).

At Step 1 the Lagrangian relaxation (17.12) is solved. As previously mentioned, it is expected to be substantially easier than the original problem (17.11), and in fact in many applications it is solvable in polynomial or pseudo-polynomial time. This is not always the case, as (17.12) may still be a hard combinatorial problem. In such cases it is possible to substitute an approximate calculation of \(z_{LR}\) for the exact one, which amounts to adopting a heuristic algorithm to tackle the Lagrangian relaxation (17.12) at Step 1.

Inexact calculation of \(z_{LR}\) has a strong impact on the way in which the Lagrangian dual (17.14) is tackled at Step 3. Here all the machinery of convex optimization with inexact calculation of the objective function comes into play. For an extensive treatment of the subject see [16, 37] and Chap. 12 of this book. An example of the use of inexact calculation of \(z_{LR}\) in a Lagrangian heuristic framework is in [27].

The repairing heuristic at Step 2 is, of course, problem dependent and very often of the greedy type.

The termination test depends on the type of algorithm used for solving the Lagrangian dual. Classic termination tests based on approximate satisfaction of the condition 0 ∈ ∂z LR(λ (k)) can be used whenever bundle type algorithms are adopted. In case subgradient or dual ascent algorithms are at work, stopping tests based on variation of z LR can be adopted, together with a bound on the maximum number of iterations.

An important feature of Lagrangian heuristics is that they provide both a lower and an upper bound and, consequently, significant indications on the upper–lower bound gap are often at hand.

It is worth remarking, however, that, differently from the case where Lagrangian relaxation is used within an exact algorithm, here the main aim is to produce good quality feasible solutions rather than to obtain tight lower bounds. In fact, in several applications poor lower bounds are accompanied by fairly good upper bounds. This is probably due to the fact that, as the algorithm proceeds, many feasible solutions are explored, through a search mechanism where the multiplier update is aimed at achieving feasibility. From this point of view, adopting a not too fast NSO algorithm is not always a disadvantage!

A consequence of the above observations is that Lagrangian heuristics can be satisfactorily applied even to problems which exhibit the integrality property, that is, also in case one knows in advance that the Lagrangian dual is unable to produce anything better than the continuous LP relaxation lower bound.
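To fix ideas, the fragment below sketches one possible instantiation of the generic scheme of Algorithm 17.1; the relaxation oracle and the repair routine are problem dependent and are only assumed here, and the multiplier update is a plain subgradient step, although any of the methods of Sect. 17.3 could be plugged in. It is a sketch under these assumptions, not the chapter's own implementation.

```python
import numpy as np

def lagrangian_heuristic(c, A, b, relax_oracle, repair, lam0, n_iter=100, gap_tol=1e-4):
    """Generic Lagrangian heuristic sketch for the minimization problem (17.11).

    relax_oracle(lam) returns (z_LR(lam), x(lam)); repair(x) returns a feasible
    point or None.  Both the best lower and upper bounds are tracked."""
    lam = np.asarray(lam0, dtype=float)
    z_lb, z_ub, x_best = -np.inf, np.inf, None
    for h in range(1, n_iter + 1):
        z_lr, x_rel = relax_oracle(lam)              # Step 1: Lagrangian relaxation (17.12)
        z_lb = max(z_lb, z_lr)                       # lower bound
        x_feas = repair(x_rel)                       # Step 2: problem-dependent repair heuristic
        if x_feas is not None and c @ x_feas < z_ub:
            z_ub, x_best = float(c @ x_feas), x_feas # upper bound and incumbent
        if z_ub - z_lb <= gap_tol * max(1.0, abs(z_ub)):
            break                                    # lower-upper bound gap small enough
        g = b - A @ x_rel                            # Step 3: multiplier update (subgradient step)
        if np.linalg.norm(g) < 1e-12:
            break
        lam = lam + (1.0 / h) * g / np.linalg.norm(g)
    return z_lb, z_ub, x_best, lam
```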

4.1 An Example of Dual Ascent

The following example shows how to get, by tackling the Lagrangian dual, both a lower and an upper bound for a classic combinatorial problem. The Lagrangian dual is approached by means of a dual ascent algorithm, which works as a co-ordinate search method and modifies just one multiplier at a time.

Consider the standard set covering problem (SCP), which is known to be NP-hard. Suppose we are given a ground set \(\mathcal {I}=\{1,\ldots ,m\}\), a family of n subsets \(\mathcal {S}_j\subseteq \mathcal {I}\), \(j\in \mathcal {J}=\{1,\ldots, n\}\), and a real cost vector \({\boldsymbol c} \in \mathbb {R}^n\), \({\boldsymbol c} > {\mathbf{0}}\). The problem is to select a minimum cost cover, that is an index set \(\mathcal {J}^*\subseteq \mathcal {J}\) such that \(\displaystyle \cup _{j\in \mathcal {J}^*}\mathcal {S}_j=\mathcal {I}\), with minimal associated cost \(\sum _{j \in \mathcal {J}^*} c_j\). By defining the m × n binary incidence matrix A of the subsets \(\mathcal {S}_j\) and letting e be a vector of m ones, the SCP reads as follows

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}\quad &{\boldsymbol c}{{{}^T}} {\boldsymbol x} \\ \text{subject to} &A {\boldsymbol x} \geq {\boldsymbol e}, \\ & {\boldsymbol x} \,\,\text{binary}, \end{cases} \end{aligned} $$
(17.29)

where x is a binary decision variable vector with x j = 1 if j is taken into the cover and x j = 0 otherwise, \(j \in \mathcal {J}\).

By relaxing the covering constraints Ax ≥e by means of the multiplier vector \(\boldsymbol {\lambda } \geq {\mathbf {0}},~\boldsymbol {\lambda } \in \mathbb {R}^m\), we obtain

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda}) \,\,= \,\, &\text{min} ~\, {\boldsymbol c}{{{}^T}}{\boldsymbol x}+\boldsymbol{\lambda}^T({\boldsymbol e}-A{\boldsymbol x}) \\ &\text{subject to} ~\,\, {\boldsymbol x} \,\,\text{binary}, \end{aligned} \end{aligned} $$
(17.30)

which can be in turn rewritten as

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda}) \,=\, {\boldsymbol e}{{{}^T}}\boldsymbol{\lambda} \,\,+ \,\, &\text{min} ~\, ({\boldsymbol c}-A^T\boldsymbol{\lambda})^T{\boldsymbol x} \\ &\text{subject to} ~\,\, {\boldsymbol x} \,\,\text{binary}. \end{aligned} \end{aligned} $$
(17.31)

An optimal solution \({\boldsymbol x}(\boldsymbol{\lambda})\) to the problem (17.31) can be obtained by simple inspection of the (reduced) cost vector \({\boldsymbol c}(\boldsymbol{\lambda}) = {\boldsymbol c} - A^T\boldsymbol{\lambda}\), by setting

$$\displaystyle \begin{aligned}x_j(\boldsymbol{\lambda})= \begin{cases} 1, & \text{if} \,\,\, c_j(\boldsymbol{\lambda})=c_j-{\boldsymbol a}_j{{{}^T}}\boldsymbol{\lambda}\leq 0, \\ 0, & \text{otherwise}, \\ \end{cases} \end{aligned} $$
(17.32)

where \({\boldsymbol a}_j\) is the j-th column of the matrix A. Observe that in this case the integrality property holds, thus \(z_{LD} = z_{LP}\).

We introduce now a rule for updating the multiplier vector λ, in case the corresponding x(λ) is infeasible, so that in the new multiplier setting:

  • the number of satisfied constraints is increased;

  • the function z LR increases as well.

We proceed by updating just one component of λ. Since we assume that x(λ) is infeasible, there exists at least one row index, say h, such that:

$$\displaystyle \begin{aligned} \sum_{j\in \mathcal{J}} a_{hj}x_j(\boldsymbol{\lambda})=0, {} \end{aligned} $$
(17.33)

which implies, taking into account (17.32),

$$\displaystyle \begin{aligned}c_j(\boldsymbol{\lambda}) = c_j-{\boldsymbol a}_j{{{}^T}}\boldsymbol{\lambda}>0, \,\,\,\,\mbox{ for all ~} j \in \mathcal{J}^{(h)}=\{j~|~a_{hj}=1\}. \end{aligned}$$

Now, defining a new multiplier setting in the form \(\boldsymbol{\lambda}^+ = \boldsymbol{\lambda} + \delta {\boldsymbol e}_h\) for some δ > 0, where \({\boldsymbol e}_h\) is the h-th unit vector, the updated reduced cost vector is \({\boldsymbol c}(\boldsymbol{\lambda}^+) = {\boldsymbol c}(\boldsymbol{\lambda}) - \delta A^T {\boldsymbol e}_h\), that is

$$\displaystyle \begin{aligned}c_j(\boldsymbol{\lambda}^+)= \begin{cases} c_j(\boldsymbol{\lambda})- \delta, & \text{if} \,\,\, j \in \mathcal{J}^{(h)}, \\ c_j(\boldsymbol{\lambda}), & \text{otherwise}. \\ \end{cases} \end{aligned} $$
(17.34)

In particular, by setting \(\delta = \min _{j \in \mathcal {J}^{(h)}}c_j(\boldsymbol {\lambda })=c_{j^*}(\boldsymbol {\lambda })\), we obtain \(c_{j^*}(\boldsymbol {\lambda }^+)=0\) and thus, from (17.32), \(x_{j^*}(\boldsymbol {\lambda }^+)=1\). Summing up,

$$\displaystyle \begin{aligned} x_j(\boldsymbol{\lambda}^+)= \begin{cases} x_j(\boldsymbol{\lambda}), & \text{if} \,\,\, j \neq j^*, \\ 1, & \text{if} \,\,\, j = j^*. \\ \end{cases} \end{aligned} $$
(17.35)

With the new solution \({\boldsymbol x}(\boldsymbol{\lambda}^+)\), constraint h is no longer violated and \(z_{LR}(\boldsymbol{\lambda}^+) = z_{LR}(\boldsymbol{\lambda}) + \delta\). Summing up, dual ascent has been achieved and at least one of the previously violated constraints is satisfied, while the satisfied ones remain so. By iterating the multiplier update, we obtain, in at most m steps, both a feasible solution and a lower bound, produced by a sequence of ascent steps.

Algorithm 17.2: Dual ascent for set covering

At stopping, at any iteration index k, we obtain a feasible solution \({\boldsymbol x}(\boldsymbol{\lambda}^{(k)})\), with associated objective function value \(z_{UB}\), together with the lower bound \(z_{LR}(\boldsymbol{\lambda}^{(k)})\).
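A compact sketch of the whole procedure, following one possible reading of Algorithm 17.2 with illustrative names and numpy data structures, is the following; it assumes a feasible instance (no zero row in A).

```python
import numpy as np

def dual_ascent_set_covering(A, c):
    """Dual ascent sketch for the SCP (17.29): A is the m x n binary incidence
    matrix and c > 0 the cost vector.  One multiplier is increased per iteration
    as in (17.33)-(17.35); the routine returns a feasible cover, the final
    multipliers, the lower bound z_LR(lam) and the upper bound c^T x."""
    m, n = A.shape
    lam = np.zeros(m)
    red = np.asarray(c, dtype=float).copy()      # reduced costs c_j(lam) = c_j - a_j^T lam
    x = np.zeros(n, dtype=int)                   # x_j(lam) as in (17.32)
    covered = np.zeros(m, dtype=bool)
    while not covered.all():
        h = int(np.argmin(covered))              # any still violated row, see (17.33)
        J_h = np.flatnonzero(A[h] == 1)          # columns that would cover row h
        j_star = J_h[np.argmin(red[J_h])]
        delta = red[j_star]                      # ascent step: z_LR grows by delta
        lam[h] += delta
        red[J_h] -= delta                        # reduced cost update (17.34)
        x[j_star] = 1                            # new column enters the cover, see (17.35)
        covered |= (A[:, j_star] == 1)
    z_lb = lam.sum() + red[red <= 0].sum()       # z_LR(lam), lower bound
    z_ub = float(np.asarray(c) @ x)              # cost of the cover, upper bound
    return x, lam, z_lb, z_ub
```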

Remark 17.3

For the set covering problem the integrality property holds, thus we cannot expect from the above procedure anything better than the LP lower bound. Moreover, since it is not at all guaranteed that the optimum of the Lagrangian dual is achieved when the algorithm stops, the lower bound provided might be definitely worse than the LP one. On the other hand, the interplay between the quest for feasibility and dual function improvement is a typical aspect of the applications we are going to describe in the next section.

5 Applications

In this section we describe a number of applications of Lagrangian relaxation to integer programming problems coming from fairly diverse areas. In almost all cases a Lagrangian heuristic based on the abstract scheme sketched in Sect. 17.4 is designed. The Lagrangian dual is dealt with either via dual ascent or via subgradient algorithms. In particular the following problems will be treated:

  • generalized assignment [22];

  • spanning tree with minimum branch vertices [10];

  • directional sensors location [3];

  • cross docking scheduling [14];

  • feature selection in support vector machines [28];

  • multiple instance learning [4].

The presentation is necessarily synthetic and no numerical results are presented. The interested reader is referred to the original papers and to the references therein.

5.1 Generalized Assignment

A classic application of a dual ascent procedure based on updating one multiplier at a time is the method for solving the generalized assignment problem (GAP) described in [22]. GAP can be seen as the problem of assigning jobs to machines having a limited amount of a resource (e.g. time or space), with the objective of maximizing the value of the assignment.

The sets \(\mathcal {I}\) and \(\mathcal {J}\) of machine and job indices, respectively, are given, together with the following data:

  • a ij, the resource required by job j when processed on machine i, \(i\in \mathcal {I}\) and \(j \in \mathcal {J}\);

  • b i, resource availability of machine i, \(i \in \mathcal {I}\);

  • c ij, value of assigning job j to machine i, \(i\in \mathcal {I}\) and \(j \in \mathcal {J}\).

Defining, for \(i\in \mathcal {I}\) and \(j \in \mathcal {J}\), the decision variable x ij = 1 if job j is assigned to machine i and x ij = 0 otherwise, the GAP is then formulated:

$$\displaystyle \begin{aligned} \begin{cases} \text{maximize}\,\,\,\,\,\,\,\displaystyle \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} c_{ij} x_{ij}\\ \text{subject to} \,\,\,\,\,\, \displaystyle \sum_{i \in \mathcal{I}} x_{ij}=1,\,\,\,j \in \mathcal{J}, \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \displaystyle\sum_{j \in \mathcal{J}}a_{ij} x_{ij} \leq b_i,\,\,\,i \in \mathcal{I}, \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, x_{ij} \,\,\text{binary}, \,\,\,i\in \mathcal{I}, \,\, j \in \mathcal{J} . \end{cases} \end{aligned} $$
(17.36)

A possible relaxation is obtained by acting on the semi-assignment constraints \(\sum _{i \in \mathcal {I}} x_{ij}=1\), \(j \in \mathcal {J}\), thus obtaining, for each choice of the multipliers \(\lambda_j\), \(j \in \mathcal {J}\) (grouped into the vector \(\boldsymbol{\lambda}\)), the following upper bound \(z_{LR}(\boldsymbol{\lambda})\) (note that (17.36) is a maximization problem, hence the Lagrangian dual is a minimization problem):

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda})\,=\,\displaystyle \sum_{j \in \mathcal{J}} \lambda_j \,\,+\,\,\,&\text{max} ~\quad \displaystyle\sum_{i \in \mathcal{I}} \displaystyle\sum_{j \in \mathcal{J}} (c_{ij}- \lambda_j) x_{ij} \\ &\text{subject to} \quad \displaystyle\sum_{j \in \mathcal{J}}a_{ij} x_{ij} \leq b_i,\,\,\,i \in \mathcal{I},\\ &\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \quad x_{ij} \,\,\text{binary}, \,\,\,i\in \mathcal{I}, \,\, j \in \mathcal{J} . \end{aligned} \end{aligned} $$
(17.37)

It is easy to verify that the problem (17.37) decomposes into \(|\mathcal {I}|\) binary knapsack subproblems, that is

$$\displaystyle \begin{aligned}z_{LR}(\boldsymbol{\lambda})=\displaystyle \sum_{j \in \mathcal{J}} \lambda_j+ \displaystyle \sum_{i \in \mathcal{I}} z^{(i)}_{LR}(\boldsymbol{\lambda}),\end{aligned}$$

where

$$\displaystyle \begin{aligned} \begin{aligned} z^{(i)}_{LR}(\boldsymbol{\lambda})\,=\,\,\,&\text{max} ~\quad \displaystyle \sum_{j \in \mathcal{J}} (c_{ij}- \lambda_j) x_{ij} \\ &\text{subject to} \quad \displaystyle\sum_{j \in \mathcal{J}}a_{ij} x_{ij} \leq b_i ,\\ &\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \quad x_{ij} \,\,\text{binary}, \,\,\, j \in \mathcal{J} . \end{aligned} \end{aligned} $$
(17.38)
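For illustration, the dual function can be evaluated exactly as in the decomposition above by solving one 0/1 knapsack per machine. The sketch below uses a standard dynamic program and therefore assumes integer resource data \(a_{ij}\) and \(b_i\); the function names are, again, only illustrative.

```python
import numpy as np

def knapsack_01(values, weights, capacity):
    """Maximum-value 0/1 knapsack by dynamic programming (integer weights assumed)."""
    dp = np.zeros(capacity + 1)
    for v, w in zip(values, weights):
        w = int(w)
        if w <= capacity:
            # the right-hand side is evaluated on the old dp, so items are used at most once
            dp[w:] = np.maximum(dp[w:], dp[:capacity + 1 - w] + v)
    return dp[capacity]

def gap_dual_function(c, a, b, lam):
    """Evaluate z_LR(lam) of (17.37) via the knapsack decomposition (17.38).

    c and a are |I| x |J| arrays (values and resource requirements), b the
    machine capacities, lam the |J| multipliers of the semi-assignment constraints."""
    z = float(np.sum(lam))
    for i in range(c.shape[0]):
        profits = c[i] - lam                     # reduced profits c_ij - lam_j
        keep = profits > 0                       # unprofitable jobs never enter the knapsack
        z += knapsack_01(profits[keep], a[i][keep], int(b[i]))
    return z
```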

If for some \(\boldsymbol{\lambda}\) the solution \({\boldsymbol x}(\boldsymbol{\lambda})\) of the Lagrangian relaxation satisfies the relaxed constraints, then it is optimal for the GAP as well. In [22] a judicious selection of the initial values of the multipliers, obtained by setting \(\lambda _j= \max ^{(2)}_{i \in \mathcal {I}} c_{ij}\), where \(\max ^{(2)}\) indicates the second largest number in a set, makes \({\boldsymbol x}(\boldsymbol{\lambda})\) satisfy the condition \(\sum _{i \in \mathcal {I}} x_{ij}(\boldsymbol {\lambda })\leq 1\), \(j \in \mathcal {J}\). Thus the only possible infeasibilities of such a solution are of the type \(\sum _{i \in \mathcal {I}} x_{ij}(\boldsymbol {\lambda })=0\), for one or more jobs.

The core of the algorithm is the multiplier vector update, which is driven by this kind of infeasibility. In fact, consider any job h such that \(\sum _{i \in \mathcal {I}} x_{ih}(\boldsymbol {\lambda }) = 0\). Any decrease in the multiplier \(\lambda_h\) makes all the reduced costs \((c_{ih} - \lambda_h)\) of such an unassigned job increase. As a consequence, job h becomes more competitive in view of a possible assignment. The key point of the algorithm is the possibility of calculating exactly the minimum reduction \(\varDelta_h\) of the multiplier \(\lambda_h\) which allows, under the new setting, assignment of the previously unassigned job to at least one machine.

This calculation can be performed as follows. First, \(\varDelta_{ih}\), the minimum reduction in \(\lambda_h\) which allows assignment of job h to machine i, \(i \in \mathcal {I}\), is calculated. To this aim, the following auxiliary knapsack problem is solved:

$$\displaystyle \begin{aligned} \begin{aligned} z^{(i,h)}_{LR}(\boldsymbol{\lambda})\,\,=\,\,\,&\text{max} ~\quad \displaystyle \sum_{j \in \mathcal{J}, \, j \neq h} (c_{ij}- \lambda_j) x_{ij} \\ &\text{subject to} \quad \displaystyle\sum_{j \in \mathcal{J}, \, j \neq h}a_{ij} x_{ij} \leq b_i-a_{ih},\\ &\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \quad x_{ij} \,\,\text{binary}, \,\,\, j \in \mathcal{J}, \, j \neq h, \end{aligned} \end{aligned} $$
(17.39)

and then we have

$$\displaystyle \begin{aligned}\varDelta_{ih}=z^{(i)}_{LR}(\boldsymbol{\lambda}) -\big(c_{ih}-\lambda_h +z^{(i,h)}_{LR}(\boldsymbol{\lambda})\big) \geq 0.\end{aligned}$$

Finally we have

$$\displaystyle \begin{aligned}\varDelta_{h}= \min_{i \in \mathcal{I}}\varDelta_{ih},\end{aligned}$$

and it is easy to verify that, in case \(\varDelta_h > 0\), letting \(\boldsymbol{\lambda}^+ = \boldsymbol{\lambda} - \varDelta_h {\boldsymbol e}_h\), the function \(z_{LR}\) decreases by exactly \(\varDelta_h\).

We skip here some details of the algorithm, e.g., how to enforce the condition \(\sum _{i \in \mathcal {I}} x_{ij}(\boldsymbol {\lambda })\leq 1\), \(j \in \mathcal{J}\), throughout the execution. We wish to emphasize, instead, that at each iteration just one multiplier is updated (thus the algorithm is a co-ordinate descent one). Moreover, unlike the algorithm for set covering discussed in the previous subsection, it is not ensured that the number of satisfied constraints increases monotonically. However, termination at a feasible (and hence optimal) solution is guaranteed.

An approach inspired by this method was introduced in [11] to deal with a classic location–allocation problem known as terminal location.

5.2 Spanning Tree with Minimum Branch Vertices

In the previous subsection we presented a dual ascent procedure based on the modification of one multiplier at a time, with exact calculation of the step size along the corresponding coordinate axis. Here we discuss a dual ascent algorithm for the spanning tree with minimum branch vertices (ST-MBV) problem which still works by modifying just one multiplier at a time, but is equipped with a line search along the coordinate axis.

Another application of Lagrangian relaxation to a variant of the Steiner tree problem is in [17].

The ST-MBV problem arises in the design of optical networks, where it is necessary to guarantee connection to all nodes of a given network. Thus a spanning tree is to be found. Since at least one switch must be installed at each node of the tree whose degree is greater than two (a branch vertex), the problem is to find a spanning tree with a minimum number of branch vertices.

This problem admits several formulations. We focus on the integer programming (IP) one described in [10], where a Lagrangian heuristic is presented in detail.

We consider an undirected network G = (V, E), where V  denotes the set of n vertices and E the set of m edges. The decision variables are the following:

  • x e, e ∈ E, binary; x e = 1 if edge e is selected and x e = 0 otherwise;

  • y v, v ∈ V , binary; y v = 1 if vertex v is of the branch type (that is its degree, as vertex of the tree, is greater than two), and y v = 0 otherwise.

Then, the IP formulation of the ST-MBV is the following:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize} &\displaystyle \sum_{v \in V}y_{v}\\ \text{subject to} & \displaystyle \sum_{e \in E}x_{e}=n-1, \\ & \displaystyle \sum_{e \in E(S)}x_e \leq |S|-1, \, \, S \subseteq V, \\ &\displaystyle \sum_{e \in A(v)}x_e -2 \leq \delta_v y_v, \,\, v \; \in V, \\ & y_v \,\,\text{binary}, \;\; v \in V ~\text{and} ~ x_{e} \,\,\text{binary}, \;\; e \in E, \end{cases} \end{aligned} $$
(17.40)

where for any given subset of vertices S we denote by E(S) the set of edges having both endpoints in S. Moreover, we denote by A(v) the set of edges incident to vertex v and by \(\delta_v\) its size, i.e. \(\delta_v = |A(v)|\). The objective function to be minimized is the total number of branch vertices. The constraints

$$\displaystyle \begin{aligned}\sum_{e \in E}x_{e}=n-1~~~~ \mbox{and}~~~ \sum_{e \in E(S)}x_e \leq |S|-1,~S \subseteq V,\end{aligned} $$

ensure that a spanning tree is actually detected, while the complicating constraints are \(\sum_{e \in A(v)}x_e - 2 \leq \delta_v y_v\), \(v \in V\). They guarantee that the variable \(y_v\) is set to 1 whenever v has more than two incident edges in the selected tree.

By introducing the multipliers λ v ≥ 0, v ∈ V  (grouped, as usual, in vector λ), we obtain the following Lagrangian relaxation:

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda})\,=\,\,\, & \text{min} \quad \displaystyle \sum_{v \in V}y_{v} + \sum_{v \in V}\lambda_v \Big(\sum_{e \in A(v)}x_e -2 - \delta_v y_v\Big)\\ &\text{subject to} \,\, ~ \displaystyle \sum_{e \in E}x_{e}=n-1, \\ &\qquad \qquad \,\, ~\displaystyle \sum_{e \in E(S)}x_e \leq |S|-1, \,\, S \subseteq V, \\ & \qquad \qquad \,\, ~ y_v \,\,\text{binary}, \;\; v \in V ~\text{and}~ x_{e} \,\,\text{binary}, \;\; e \in E. \end{aligned} \end{aligned} $$
(17.41)

By simple manipulations, z LR(λ) may be rewritten as

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda})= -2\sum_{v \in V}\lambda_v + z^{(1)}_{LR}(\boldsymbol{\lambda})+z^{(2)}_{LR}(\boldsymbol{\lambda}), {} \end{aligned} $$
(17.42)

where \(z^{(1)}_{LR}(\boldsymbol {\lambda })\) and \(z^{(2)}_{LR}(\boldsymbol {\lambda })\) are defined as follows:

$$\displaystyle \begin{aligned} \begin{aligned} z^{(1)}_{LR}(\boldsymbol{\lambda})\,\,=\,\,\, & \text{min} \quad \displaystyle \sum_{v \in V}y_v (1-\delta_v \lambda_v)\\ &\text{subject to} \,\, ~ y_v \,\,\text{binary}, \;\; v \in V, \end{aligned} \end{aligned} $$
(17.43)

and

$$\displaystyle \begin{aligned} \begin{aligned} z^{(2)}_{LR}(\boldsymbol{\lambda})\,=\,\,\, & \text{min} \quad \displaystyle \sum_{v \in V}\sum_{e\in A(v)}\lambda_vx_{e}\\ &\text{subject to} \,\, ~ \displaystyle \sum_{e \in E}x_{e}=n-1, \\ &\qquad \qquad \,\, ~\displaystyle \sum_{e \in E(S)}x_e \leq |S|-1, \,\, S \subseteq V, \\ & \qquad \qquad \,\, ~ x_{e} \,\,\text{binary}, \;\; e \in E. \end{aligned} \end{aligned} $$
(17.44)

Note that z LR(λ) is rather easy to calculate. In fact the problem (17.43) is solved by inspection of the cost coefficients, by setting

$$\displaystyle \begin{aligned} \begin{cases} y_v=1, \,\, &\text{if} ~\,\,1-\delta_v\lambda_v \leq 0,\\ y_v=0, \,\, &\text{otherwise}, \end{cases} \end{aligned} $$
(17.45)

while \(z^{(2)}_{LR}(\boldsymbol {\lambda })\) is the optimal value of a minimum spanning tree problem where the weight of edge e = (u, v) is \(\lambda_u + \lambda_v\).
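A minimal sketch of this computation, assuming a connected graph and using networkx for the minimum spanning tree (the function name is illustrative), could read as follows.

```python
import networkx as nx

def mbv_dual_function(G, lam):
    """Evaluate z_LR(lam) as in (17.42) for the ST-MBV relaxation.

    G is a connected undirected networkx graph and lam a dict with lam[v] >= 0.
    The y-part (17.43) is solved by inspection, see (17.45); the x-part (17.44)
    is a minimum spanning tree under the Lagrangian edge weights lam_u + lam_v."""
    # z^(1): y_v = 1 iff 1 - delta_v * lam_v <= 0
    z1 = sum(min(0.0, 1.0 - G.degree(v) * lam[v]) for v in G.nodes)
    # z^(2): minimum spanning tree with edge weight lam_u + lam_v
    H = nx.Graph()
    H.add_weighted_edges_from((u, v, lam[u] + lam[v]) for u, v in G.edges)
    T = nx.minimum_spanning_tree(H)
    z2 = T.size(weight="weight")
    # the tree T is also a feasible point of (17.40), directly usable by the heuristic
    return -2.0 * sum(lam[v] for v in G.nodes) + z1 + z2, T
```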

As for the Lagrangian dual, it is possible to prove [10] that the optimal multiplier vector \(\boldsymbol{\lambda}^*\) satisfies the condition

$$\displaystyle \begin{aligned} \lambda^*_v \leq \frac{1}{\delta_v}, \,\,v \in V {}. \end{aligned} $$
(17.46)

On the basis of such a property it is possible to devise an ascent strategy which modifies one multiplier at a time. Suppose that \(({\boldsymbol x}(\boldsymbol{\lambda}), {\boldsymbol y}(\boldsymbol{\lambda}))\) is an optimal solution to the Lagrangian relaxation for a given \(\boldsymbol{\lambda}\); then such a strategy is at hand whenever one of the following two cases occurs for some v ∈ V :

  1. \( \sum \limits _{e\in A(v)}x_{e}(\boldsymbol {\lambda })>2\) and \(y_v(\boldsymbol{\lambda}) = 0\);

  2. \( \sum \limits _{e\in A(v)}x_{e}(\boldsymbol {\lambda })\leq 2\) and \(y_v(\boldsymbol{\lambda}) = 1\).

Consider first case 1 and observe (see the objective function of the problem (17.41)) that the v-th component \(\big(\sum_{e\in A(v)}x_e(\boldsymbol{\lambda}) - 2 - \delta_v y_v(\boldsymbol{\lambda})\big)\) of a subgradient of \(z_{LR}\) is strictly positive, thus one can expect an increase in the objective if \(\lambda_v\) is increased. Observe, in addition, that property (17.46) suggests adopting an Armijo-type line search along the co-ordinate direction \({\boldsymbol e}_v\) with \(\frac {1}{\delta _v}\) as the initial step size. On the other hand, a consequence of the nonsmoothness of \(z_{LR}\) is that the direction \({\boldsymbol e}_v\) is not necessarily an ascent one, thus the backtracking line search must accommodate possible failure (the so-called null step, to use the terminology of bundle methods).

As for case 2, the v-th subgradient component \(\big(\sum_{e\in A(v)}x_e(\boldsymbol{\lambda}) - 2 - \delta_v y_v(\boldsymbol{\lambda})\big)\) is negative and, in addition (see (17.45)), \(\lambda_v > 0\). Thus it is worth considering a possible reduction of \(\lambda_v\), that is a move along the co-ordinate direction \(-{\boldsymbol e}_v\), adopting, also in this case, an Armijo-type line search equipped with a possible null step declaration.

The use of the co-ordinate search approach has both advantages and drawbacks. As pointed out in [10], co-ordinate search is definitely faster than the standard subgradient method, but the lower bound is slightly worse due to a possible premature stop (see Sect. 17.3).

We remark that for this application a feasible solution of the problem (17.40) is available with no additional computational cost at each iteration of the dual ascent procedure. After all, the Lagrangian relaxation provides a spanning tree, thus, to implement a Lagrangian heuristic, one only needs to calculate the corresponding cost in terms of the objective function of (17.40).

5.3 Directional Sensors Location

In wireless sensor networks (WSN) [54] a sensor is a device capable of receiving (and possibly storing and forwarding) information coming from a sufficiently close area. In general, such an area is a circle of given radius, whose centre is the sensor location itself. Points inside the area are deemed covered by the sensor. In case the area, instead of being a circle, is an adjustable circular sector, the sensor is called directional [12].

The directional sensors continuous coverage problem (DSCCP) is about covering several targets, distributed in a plane, by a set of directional sensors whose locations are known. Each sensor is adjustable, being characterized by a discrete set of possible radii and aperture angles. The sensor orientation is a decision variable too. The model accommodates possible sensor switch-off. Since different power consumptions are associated with the sensor adjustments, the objective is to minimize the total power cost of coverage.

We report here the formulation given in [3] as a mixed integer nonlinear program (MINLP). Note that in most of the literature only a finite number of sensor orientations is considered (see [50, Lemma 1 and Corollary 1]). The motivation for defining, instead, the orientation as a continuous variable lies in the choice of Lagrangian relaxation as the attack strategy. It will be clear in the sequel that solving the relaxed problem is easy once the sensor orientation is assumed to be a continuous variable.

Suppose that a given set \(\mathcal {I}\) of sensors is located in a certain area and \({\boldsymbol s}_i \in \mathbb {R}^2\) is the known position of sensor i, \(i \in \mathcal {I}\). The location \({\boldsymbol t}_j\in \mathbb {R}^2\), \(j\in \mathcal {J}\), of the targets is also known, together with the sensor-target distance parameters \({\boldsymbol d}_{ij} = {\boldsymbol t}_j - {\boldsymbol s}_i\) and \(c_{ij} = \|{\boldsymbol t}_j - {\boldsymbol s}_i\|\), \(i \in \mathcal {I}\) and \(j \in \mathcal {J}\). A set of K + 1 different power levels k, k = 0, …, K, can be considered for the sensors, each of them corresponding to a pair of values of the radius \(r_k\) and of the half-aperture angle \(\alpha_k\) of the circular sector. In particular, the level k = 0 is associated with an inactive sensor (\(r_0 = 0\) and \(\alpha_0 = 0\)). We also introduce the parameter \(q_k = \cos \alpha _k\), \(q_k \in [-1, 1]\).

The area covered by the sensor i, activated at level k, is then the revolution cone defined as

$$\displaystyle \begin{aligned} \big\{{\boldsymbol x} \;|\; {\boldsymbol x} \in \mathbb{R}^2,\;\|{\boldsymbol x}-{\boldsymbol s}_i\| \leq r_k, \, \;q_k \|{\boldsymbol x}-{\boldsymbol s}_i\| \leq ({\boldsymbol x}-{\boldsymbol s}_i){{{}^T}}{\boldsymbol w}_i\big\}, {} \end{aligned} $$
(17.47)

where \({\boldsymbol w}_i\in \mathbb {R}^2\), ∥w i∥ = 1, is the orientation direction assigned to sensor i, \(i \in \mathcal {I}\) (∥.∥ is assumed to be the Euclidean norm).

As a consequence, target j is covered by the sensor i, activated at level k, with orientation direction w i if and only if the following two conditions hold:

$$\displaystyle \begin{aligned} r_k \geq c_{ij} \qquad \text{and} \qquad q_kc_{ij} \leq {\boldsymbol d}_{ij}{{{}^T}}{\boldsymbol w}_i. {} \end{aligned} $$
(17.48)

The decision variables are:

  • \({\boldsymbol w}_i\in \mathbb {R}^2\), the orientation direction assigned to sensor i, \(i \in \mathcal {I}\);

  • x ik, \(i \in \mathcal {I}\), k = 0, …, K, binary: x ik = 1 if sensor i is activated at power level k and x ik = 0 otherwise;

  • σ ij, \(i \in \mathcal {I}\), \(j \in \mathcal {J}\), binary: σ ij = 0 implies that both conditions (17.48) are satisfied;

  • u j, \(j \in \mathcal {J}\), binary: u j = 0 implies that the target j is covered by at least one sensor.

The model considers two types of costs:

  • p k, the activation cost for turning on any sensor at power level k, k = 0, …, K (with p 0 = 0);

  • H, the penalty cost associated to an uncovered target.

Finally the DSCCP can be stated as follows:

$$\displaystyle \begin{aligned} \begin{cases} \text{minimize}& \displaystyle \sum_{i \in \mathcal{I}} \sum_{k=0}^K p_k x_{ik} + H \displaystyle \sum_{j \in \mathcal{J}} u_j \\ \text{subject to} & \displaystyle \sum_{k=0}^K x_{ik}=1, \;\; i\in \mathcal{I}, \\ & \displaystyle\sum_{k=0}^K x_{ik}r_k \geq c_{ij}-M\sigma_{ij},\; i\in \mathcal{I},~~ j\in \mathcal{J},\\ & c_{ij}\displaystyle\sum_{k=0}^K q_k x_{ik} - {\boldsymbol d}_{ij}{{{}^T}}{\boldsymbol w}_i\leq M \sigma_{ij},\; i\in \mathcal{I}, \,\,j\in \mathcal{J}, \\ &u_j \geq \displaystyle\sum_{i \in \mathcal{I}} \sigma_{ij}-(|I|-1),\;\; j\in \mathcal{J}, \\ & \|{\boldsymbol w}_i\|=1, \;\;i\in \mathcal{I}, \\ & {\boldsymbol x}, \boldsymbol{\sigma}, {\boldsymbol u} \;\; \mbox{binary}~\mbox{and}~ {\boldsymbol w}_i\in \mathbb{R}^2, \,\,i \in \mathcal{I}. \end{cases} \end{aligned} $$
(17.49)

The model contains the “big M” positive input parameter, which is common in formulations of location problems. The variables \(x_{ik}\), \(\sigma_{ij}\) and \(u_j\) are grouped into the vectors x, σ and u, respectively.

The objective function is the sum of the activation costs and the penalties for possibly uncovered targets. The first set of constraints are classic semi-assignment constraints, ensuring that exactly one power level (possibly the 0-level) is assigned to each sensor. The constraints

$$\displaystyle \begin{aligned} \begin{cases} & \displaystyle\sum_{k=0}^K x_{ik}r_k \geq c_{ij}-M\sigma_{ij},\; i\in \mathcal{I},~~ j\in \mathcal{J},\\ & c_{ij}\displaystyle\sum_{k=0}^K q_k x_{ik} - {\boldsymbol d}_{ij}{{{}^T}}{\boldsymbol w}_i\leq M \sigma_{ij},\; i\in \mathcal{I}, \,\, j\in \mathcal{J}, \end{cases} \end{aligned} $$
(17.50)

are related to target coverage. They are trivially satisfied if \(\sigma_{ij} = 1\), but, whenever they are satisfied with \(\sigma_{ij} = 0\), target j is covered by sensor i (see (17.48)). The constraints

$$\displaystyle \begin{aligned} &u_j \geq \displaystyle\sum_{i \in \mathcal{I}} \sigma_{ij}-(|\mathcal{I}|-1),\;\; j\in \mathcal{J} \end{aligned} $$
(17.51)

impose that, for any target j, \(u_j = 1\) in case \(\sigma_{ij} = 1\) for all \(i \in \mathcal {I}\), that is if target j remains uncovered. At the optimum \(u_j = 1\) if and only if \(\sum _{i \in \mathcal {I}}\sigma _{ij}=|\mathcal {I}|\). Finally, the (nonconvex) constraints \(\|{\boldsymbol w}_i\| = 1\), \(i\in \mathcal {I}\), normalize the orientation direction assigned to each sensor.

By introducing the nonnegative multiplier vectors λ, θ and γ, the following Lagrangian relaxation is defined

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta}, {\boldsymbol \gamma}) \,=\,\,\, & \text{min} \quad \displaystyle \sum_{i \in \mathcal{I}} \sum_{k=0}^K p_k x_{ik} + H \sum_{j \in \mathcal{J}} u_j + \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \lambda_{ij}\Big(c_{ij}-M\sigma_{ij}-\sum_{k=0}^K x_{ik}r_k\Big)\\ & \qquad \quad + \displaystyle \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \theta_{ij}\Big(c_{ij}\sum_{k=0}^K q_k x_{ik} - {\boldsymbol d}_{ij}{{}^T}{\boldsymbol w}_i - M\sigma_{ij}\Big) + \sum_{j \in \mathcal{J}} \gamma_j\Big(\sum_{i \in \mathcal{I}} \sigma_{ij}-(|\mathcal{I}|-1)-u_j\Big)\\ &\text{subject to} \,\, ~ \displaystyle \sum_{k=0}^K x_{ik}=1, \;\; i\in \mathcal{I}, \\ & \qquad \qquad \,\, ~ \|{\boldsymbol w}_i\|=1, \;\; i\in \mathcal{I}, \\ & \qquad \qquad \,\, ~ {\boldsymbol x}, \boldsymbol{\sigma}, {\boldsymbol u} \;\; \mbox{binary}~\mbox{and}~ {\boldsymbol w}_i\in \mathbb{R}^2, \,\, i \in \mathcal{I}, \end{aligned} \end{aligned} $$
(17.52)

which, rearranging the objective function, may be rewritten as

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta}, {\boldsymbol \gamma}) \,=\, \displaystyle\sum_{i \in \mathcal{I}} \displaystyle\sum_{j \in \mathcal{J}} \lambda_{ij}c_{ij}+(1-|\mathcal{I}|)\displaystyle\sum_{j \in \mathcal{J}} \gamma_j \,+\,\,\, & \text{min} \quad \displaystyle\sum_{i \in \mathcal{I}} \sum_{k=0}^K \Big[p_k+\sum_{j \in \mathcal{J}}(\theta_{ij}c_{ij}q_k-\lambda_{ij}r_k)\Big]x_{ik} \\ & \qquad \quad + \displaystyle\sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \big[\gamma_{j}-M(\lambda_{ij}+\theta_{ij})\big]\sigma_{ij} - \sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} \theta_{ij}{\boldsymbol d}_{ij}{{}^T}{\boldsymbol w}_i + \sum_{j \in \mathcal{J}}(H-\gamma_j)u_j\\ &\text{subject to} \,\, ~ \displaystyle\sum_{k=0}^K x_{ik}=1, \;\; i\in \mathcal{I}, \\ & \qquad \qquad \,\, ~ \|{\boldsymbol w}_i\|=1, \;\; i\in \mathcal{I}, \\ & \qquad \qquad \,\, ~ {\boldsymbol x}, \boldsymbol{\sigma}, {\boldsymbol u} \;\; \mbox{binary}~\mbox{and}~ {\boldsymbol w}_i\in \mathbb{R}^2, \,\, i \in \mathcal{I}. \end{aligned} \end{aligned} $$
(17.53)

The above formulation leads to the following decomposition:

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta}, {\boldsymbol \gamma})=C(\boldsymbol{\lambda}, {\boldsymbol \gamma}) +z^{(1)}_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta})+z^{(2)}_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta}, {\boldsymbol \gamma})+z^{(3)}_{LR}({\boldsymbol \theta})+z^{(4)}_{LR}({\boldsymbol \gamma}), \end{aligned} $$
(17.54)

where

$$\displaystyle \begin{aligned} C(\boldsymbol{\lambda}, {\boldsymbol \gamma})= \displaystyle\sum_{i \in \mathcal{I}} \displaystyle\sum_{j \in \mathcal{J}} \lambda_{ij}c_{ij}+(1-|\mathcal{I}|)\displaystyle\sum_{j \in J} \gamma_j, \end{aligned} $$
(17.55)
$$\displaystyle \begin{aligned} &\begin{aligned} z^{(1)}_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta})= \,\,\,& \text{min} \quad \displaystyle\sum_{i \in \mathcal{I}} \displaystyle\sum_{k=0}^K [p_k+\displaystyle\sum_{j \in \mathcal{J}}(\theta_{ij}c_{ij}q_k-\lambda_{ij}r_k)]x_{ik}\\ &\text{subject to} \,\, ~ \displaystyle\sum_{k=0}^K x_{ik}=1, \;\;i \in \mathcal{I}, \\ &\qquad \qquad ~~~ {\boldsymbol x}\;\; \mbox{binary}, \end{aligned}{} \end{aligned} $$
(17.56)
$$\displaystyle \begin{aligned} \\ {} &\begin{aligned} z^{(2)}_{LR}(\boldsymbol{\lambda}, {\boldsymbol \theta}, {\boldsymbol \gamma})= \,\,\,& \text{min} \quad \displaystyle\sum_{i \in \mathcal{I}} \sum_{j \in \mathcal{J}} [\gamma_{j}-M(\lambda_{ij}+\theta_{ij})]\sigma_{ij}\\ &\text{subject to} \,\, ~ \boldsymbol{\sigma} \;\;\mbox{binary}, \end{aligned}{} \end{aligned} $$
(17.57)
$$\displaystyle \begin{aligned} &\begin{aligned} z^{(3)}_{LR}( {\boldsymbol \theta})= -\,\,\,& \text{max} \quad \displaystyle\sum_{i \in \mathcal{I}} \displaystyle\sum_{j \in \mathcal{J}} \theta_{ij}{\boldsymbol d}_{ij}{{{}^T}}{\boldsymbol w}_i\\ &\text{subject to} \,\, ~ \|{\boldsymbol w}_i\|=1, \;\;i \in \mathcal{I}, \\ &\qquad \qquad ~~~ {\boldsymbol w}_i\in \mathbb{R}^2, \,\, i \in \mathcal{I}, \end{aligned} \end{aligned} $$
(17.58)

and

$$\displaystyle \begin{aligned} &\begin{aligned} z^{(4)}_{LR}( {\boldsymbol \gamma})= \,\,\,& \text{min} \quad \displaystyle\sum_{j \in \mathcal{J}}(H-\gamma_j)u_j\\ &\text{subject to} \,\, ~ {\boldsymbol u} \;\, \mbox{binary}. \end{aligned} \end{aligned} $$
(17.59)

Calculation of the constant \(C(\boldsymbol{\lambda}, \boldsymbol{\gamma})\) is immediate. The solutions of problems (17.56), (17.57), and (17.59) can be obtained by simple inspection of the corresponding cost coefficients, while problem (17.58) admits the optimal solution \({\boldsymbol w}_i(\boldsymbol{\theta})\), \(i \in \mathcal {I}\), in the following closed form:

$$\displaystyle \begin{aligned} {\boldsymbol w}_i({\boldsymbol \theta})= \begin{cases}\displaystyle \frac{\sum_{j \in \mathcal{J}} \theta_{ij}{\boldsymbol d}_{ij}}{\|\sum_{j \in \mathcal{J}} \theta_{ij}{\boldsymbol d}_{ij}\|}, \,\,\, &\text{if} \,\,\,\sum_{j \in \mathcal{J}} \theta_{ij}{\boldsymbol d}_{ij} \neq {\mathbf{0}}, \\ \text{any vector} \,\,\,{\boldsymbol w} \in \mathbb{R}^2,\,\,\|{\boldsymbol w}\|=1,\,\,\,&\text{otherwise.} \end{cases} \end{aligned} $$
(17.60)
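For illustration purposes, a minimal Python sketch of these inspection rules and of the closed form (17.60) is reported below; the data layout and the names (p, r, q, c, d, lam, th, gam) are assumptions introduced here for readability and do not reproduce the implementation of [3].

```python
# A minimal sketch (not the code of [3]) of solving the relaxed subproblems by
# inspection and via the closed form (17.60). Array names and shapes are assumptions.
import numpy as np

def solve_relaxed_subproblems(p, r, q, c, d, lam, th, gam, M, H):
    """p, r, q: (K+1,) power costs, ranges and thresholds; c: (I, J) coverage
    requirements; d: (I, J, 2) sensor-to-target directions; lam, th: (I, J) and
    gam: (J,) multipliers."""
    I, J = c.shape
    # (17.56): each sensor independently picks the power level of minimum reduced cost.
    x = np.zeros((I, p.size), dtype=int)
    for i in range(I):
        reduced = p + q * (th[i] * c[i]).sum() - r * lam[i].sum()
        x[i, np.argmin(reduced)] = 1
    # (17.57): sigma_ij = 1 only when its cost coefficient is negative.
    sigma = (gam[None, :] - M * (lam + th) < 0).astype(int)
    # (17.59): u_j = 1 only when H - gamma_j is negative.
    u = (H - gam < 0).astype(int)
    # (17.58)-(17.60): closed-form orientation for each sensor.
    w = np.zeros((I, 2))
    for i in range(I):
        v = (th[i, :, None] * d[i]).sum(axis=0)          # sum_j theta_ij d_ij
        n = np.linalg.norm(v)
        w[i] = v / n if n > 0 else np.array([1.0, 0.0])  # any unit vector if the sum vanishes
    return x, sigma, u, w
```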

In [3] the Lagrangian dual is tackled by applying a coordinate search method which modifies one multiplier at a time, aiming at a possible increase of the function \(z_{LR}(\boldsymbol{\lambda}, \boldsymbol{\theta}, \boldsymbol{\gamma})\). The choice of such a multiplier is driven, as usual, by the type of infeasibility occurring at the current solution of the Lagrangian relaxation. Thus different rules are designed, according to whether a component of \(\boldsymbol{\lambda}\), \(\boldsymbol{\theta}\) or \(\boldsymbol{\gamma}\) is to be updated. For the details of such rules we refer the reader to [3].

The dual ascent procedure is particularly suitable for embedding a Lagrangian heuristic. In fact the solution of the Lagrangian relaxation can be made feasible at a quite low computational cost, hence providing an upper bound.

More specifically, whenever the solution to (17.52) is not feasible for (17.49), the set \(\bar {\mathcal {J}}\) of possibly uncovered targets is considered. For each of them at least one of the following conditions holds:

$$\displaystyle \begin{aligned} &\sum_{k=0}^K x_{ik}r_k < c_{ij}\;\; \mbox{ for all ~} i \in \mathcal{I}, {} \end{aligned} $$
(17.61)
$$\displaystyle \begin{aligned} &c_{ij}\sum_{k=0}^K q_k x_{ik}> {\boldsymbol d}_{ij}{{{}^T}}{\boldsymbol w}_i\;\;\mbox{ for all ~} i \in \mathcal{I}{} \end{aligned} $$
(17.62)

at the optimum of the Lagrangian relaxation. Thus, for each target \(j\in \bar {\mathcal {J}}\), possible coverage is sought by acting on the power levels (the variables \({\boldsymbol x}\)) assigned to those sensors which are eligible for covering it. In such a search, the sensor orientations (the variables \({\boldsymbol w}_i\)) returned by the Lagrangian relaxation are not modified.

Summing up, at each iteration of the dual ascent procedure both a lower and an upper bound are calculated. In [3] it is highlighted that, while the lower bound provided by the algorithm is extremely poor, the best feasible solution found is, in general, of good quality also for relatively large-scale problems. It is worth noting that the algorithm is equipped with a possible restart along a subgradient direction in case the Armijo line search fails along all coordinate directions.
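A generic, hedged sketch of such a coordinate dual-ascent step, with an Armijo-type backtracking acceptance test, might look as follows; the actual coordinate-selection rules and the subgradient restart of [3] are problem-specific and are not reproduced here.

```python
# Hedged, generic sketch of coordinate dual ascent with backtracking acceptance.
def coordinate_dual_ascent(z_lr, mult, n_sweeps=50, alpha0=1.0, beta=0.5, tol=1e-8):
    """z_lr: callable returning the dual function value; mult: NumPy array of
    nonnegative multipliers (updated in place on acceptance)."""
    val = z_lr(mult)
    for _ in range(n_sweeps):
        improved = False
        for k in range(mult.size):
            for sign in (1.0, -1.0):                     # try increasing, then decreasing
                step = alpha0
                while step > tol:
                    trial = mult.copy()
                    trial[k] = max(0.0, trial[k] + sign * step)   # multipliers stay >= 0
                    new_val = z_lr(trial)
                    if new_val > val + tol * step:       # sufficient increase: accept
                        mult, val, improved = trial, new_val, True
                        break
                    step *= beta                         # otherwise backtrack
        if not improved:
            break   # in [3] a restart along a subgradient direction would be tried here
    return mult, val
```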

5.4 Cross Docking Scheduling

A cross docking distribution centre (CD centre, in the following) is a modern logistic node where goods are unloaded from inbound trucks, consolidated with respect to the customer orders and then immediately loaded onto outbound trucks, eliminating in practice the traditional and costly storage and retrieval phases. Management of a CD centre is a serious challenge, since the processes of arrival and departure of goods are strongly coupled and sophisticated synchronization schemes must be fulfilled.

Intensive research efforts have been made to devise effective models and algorithms for optimizing the combined scheduling of the inbound and outbound trucks; more recently, Lagrangian relaxation has been applied in this area (see [14] and the references therein). The problem addressed consists in finding the optimal inbound and outbound sequences at a CD centre characterized by only two gates (or doors), one for the inbound and the other for the outbound trucks. The objective is to minimize the total completion time (the makespan, in scheduling theory parlance).

The rules of the game are the following:

  • only one truck at a time can be handled at a door and no interruption (preemption) is allowed;

  • the loading of an outbound truck cannot start until all goods it is expected to deliver have been unloaded;

  • all trucks require the same processing time (one slot) and are ready at time 0.

As input data we consider a set of n inbound trucks (\(\mathcal {I}=\{1,\ldots ,n\}\)) to be unloaded, together with a set of m outbound trucks (\(\mathcal {J}=\{1,\ldots ,m\}\)) to be loaded. The following sets are also given:

  • \(\mathcal {J}_i\), the set of outbound trucks receiving goods from the inbound truck i, \(i \in \mathcal {I}\);

  • \(\mathcal {I}_j\), the set of inbound trucks providing goods to the outbound truck j, \(j \in \mathcal {J}\).

The planning horizon is discretized into time-slots, each capable of accommodating the processing of one truck. Let \(\mathcal {K}=\{1,\ldots ,n\}\) and \(\mathcal {L}=\{1,\ldots ,H \}\), \(H \geq m+n\), be the time horizons for the inbound and outbound services, respectively (note that, under the assumptions made, n + m is an upper bound on the makespan).

Introducing the following binary decision variables:

  • \(x_{ik} = 1\) if the inbound truck i is assigned to the time-slot k, and \(x_{ik} = 0\) otherwise, \( i \in \mathcal {I}\), \(k \in \mathcal {K}\);

  • \(y_{jh} = 1\) if the outbound truck j is assigned to the time-slot h, and \(y_{jh} = 0\) otherwise, \( j \in \mathcal {J}\), \(h \in \mathcal {L}\);

and the integer variable \(C_{Max}\) (the makespan), the one door cross docking (ODCD) problem is modelled as follows:

$$\displaystyle \begin{aligned} \begin{aligned} Z\,\,= \,\,\, &\text{min} \quad C_{Max} \\ &\text{subject to} \,\, \displaystyle \sum_{k \in \mathcal{K}}x_{ik}=1, \quad i\in \mathcal{I}, \\ &\qquad \qquad \,\,\,~ \displaystyle\sum_{i \in \mathcal{I}}x_{ik}=1, \quad k \in \mathcal{K}, \\ &\qquad \qquad \,\,\,~\displaystyle\sum_{h \in \mathcal{L}}y_{jh}= 1, \quad j\in \mathcal{J}, \\ &\qquad \qquad \,\,\,~\displaystyle\sum_{j \in \mathcal{J}}y_{jh} \leq 1, \quad h\in \mathcal{L},\\ &\qquad \qquad \,\,\, ~C_{Max} \geq \displaystyle \sum_{h\in \mathcal{L}} hy_{jh}, \quad j \in \mathcal{J}, \\ &\qquad \qquad \,\,\,~\displaystyle \sum_{h\in \mathcal{L}} hy_{jh} \geq \displaystyle\sum_{k\in \mathcal{K}} kx_{ik}+1, \quad j\in \mathcal{J}, ~i \in \mathcal{I}_j, \\ &\qquad \qquad \,\,\,~x_{ik} \,\, \text{binary}, \,\,i \in \mathcal{I},~ k\in \mathcal{K}, \\ &\qquad \qquad \,\,\,~y_{jh} \,\, \text{binary}, \,\,j \in \mathcal{J}, ~ h\in \mathcal{L}. \end{aligned} \end{aligned} $$
(17.63)

The first four constraint sets regulate the time slot–truck assignment at both the inbound and the outbound door. The constraints \(C_{Max} \geq \sum _{h\in \mathcal {L}} hy_{jh}\), \(j \in \mathcal {J}\), define the makespan \(C_{Max}\) as the maximum truck completion time, of course on the outbound side. The constraints

$$\displaystyle \begin{aligned}\sum_{h\in \mathcal{L}} hy_{jh} \geq \displaystyle\sum_{k\in \mathcal{K}}kx_{ik}+1, \quad j\in \mathcal{J}, ~i \in \mathcal{I}_j \end{aligned}$$

ensure that the loading of each outbound truck cannot start until the unloading of all the corresponding inbound trucks has been completed. They are the complicating constraints and lead to the following Lagrangian relaxation, obtained via the multiplier vector \(\boldsymbol{\lambda} \geq \boldsymbol{0}\).

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda})\,\,\, = \,\,\,\,& \text{min} \quad C_{Max}+ \displaystyle \sum_{j \in \mathcal{J}}\sum_{i \in \mathcal{I}_{j}}\lambda_{ij}\big(\sum_{k \in \mathcal{K}}kx_{ik}-\sum_{h \in \mathcal{L}}hy_{jh}+1\big) \\ &\text{subject to} \,\, \displaystyle \sum_{k \in \mathcal{K}}x_{ik}=1, \quad i\in \mathcal{I}, \\ &\qquad \qquad \,\,\,~ \displaystyle\sum_{i \in \mathcal{I}}x_{ik}=1, \quad k \in \mathcal{K}, \\ &\qquad \qquad \,\,\,~\displaystyle\sum_{h \in \mathcal{L}}y_{jh}= 1, \quad j\in \mathcal{J}, \\ &\qquad \qquad \,\,\,~\displaystyle\sum_{j \in \mathcal{J}}y_{jh} \leq 1, \quad h\in \mathcal{L},\\ &\qquad \qquad \,\,\,~C_{Max} \geq \displaystyle \sum_{h\in \mathcal{L}} hy_{jh}, \quad j \in \mathcal{J}, \\ &\qquad \qquad \,\,\,~x_{ik} \,\, \text{binary,} \,\,i \in \mathcal{I},~ k\in \mathcal{K}, \\ &\qquad \qquad \,\,\,~y_{jh} \,\,\text{binary}, \,\, j\in \mathcal{J}, ~h\in \mathcal{L}. \end{aligned} \end{aligned} $$
(17.64)

The relaxation decomposes into two independent matching problems, related, respectively, to the inbound and outbound door. In fact, by simple manipulations based on the observation \(i \in \mathcal {I}_j \Leftrightarrow j \in \mathcal {J}_i\), and setting

$$\displaystyle \begin{aligned} s=\sum_{j \in \mathcal{J}}\sum_{i \in \mathcal{I}_{j}}\lambda_{ij}=\sum_{i \in \mathcal{I}}\sum_{j \in \mathcal{J}_{i}}\lambda_{ij}, \end{aligned} $$
(17.65)

and

$$\displaystyle \begin{aligned} \rho_i = \sum_{j \in \mathcal{J}_i} \lambda_{ij},~i \in \mathcal{I}, \,\, \quad \sigma_j= \sum_{i \in \mathcal{I}_j} \lambda_{ij}, ~j \in \mathcal{J}, {} \end{aligned} $$
(17.66)

we rewrite \(z_{LR}\) as a function of the vectors \(\boldsymbol{\rho}\) and \(\boldsymbol{\sigma}\) (grouping the \(\rho_i\)s and the \(\sigma_j\)s, respectively) as follows

$$\displaystyle \begin{aligned} z_{LR}(\boldsymbol{\rho}, \boldsymbol{\sigma})=s + \min \sum_{i \in \mathcal{I}} \sum_{k \in \mathcal{K}}k\rho_ix_{ik} + C_{Max} - \sum_{j \in \mathcal{J}} \sum_{h \in \mathcal{L}}h\sigma_jy_{jh}. \end{aligned} $$
(17.67)

Thus we define

$$\displaystyle \begin{aligned}z_{LR}(\boldsymbol{\rho}, \boldsymbol{\sigma})=s + z^{(1)}_{LR}(\boldsymbol{\rho})+z^{(2)}_{LR}(\boldsymbol{\sigma}), \end{aligned}$$

with

$$\displaystyle \begin{aligned} &\begin{aligned} z^{(1)}_{LR}(\boldsymbol{\rho})\, = \,\,\,& \text{min} \quad \displaystyle \sum_{i \in \mathcal{I}} \sum_{k \in \mathcal{K}}k\rho_ix_{ik}\\ &\text{subject to} \,\, ~ \displaystyle \sum_{k \in \mathcal{K}}x_{ik}=1, \quad i\in \mathcal{I},\\ &\qquad \qquad \,\,\, ~\displaystyle\sum_{i \in \mathcal{I}}x_{ik}=1, \quad k \in \mathcal{K} \\ &\qquad \qquad \,\,\, ~ x_{ik} \,\, \text{binary,} \,\,i \in \mathcal{I},~ k\in \mathcal{K}, \end{aligned} \end{aligned} $$
(17.68)

and

$$\displaystyle \begin{aligned} &\begin{aligned} z^{(2)}_{LR}(\boldsymbol{\sigma})\, = \,\,\,& \text{min} \quad C_{Max} - \displaystyle \sum_{j \in \mathcal{J}} \sum_{h \in \mathcal{L}}h\sigma_jy_{jh}\\ &\text{subject to} \,\, \displaystyle\sum_{h \in \mathcal{L}}y_{jh}= 1, \quad j\in \mathcal{J},\\ &\qquad \qquad \,\,\, ~\displaystyle\sum_{j \in \mathcal{J}}y_{jh} \leq 1, \quad h\in \mathcal{L},\\ &\qquad \qquad \,\,\, ~C_{Max} \geq \displaystyle \sum_{h\in \mathcal{L}}hy_{jh}, \quad j \in \mathcal{J}, \\ &\qquad \qquad \,\,\, ~ y_{jh} \,\,\text{binary}, \,\, j\in \mathcal{J}, ~h\in \mathcal{L}. \end{aligned} \end{aligned} $$
(17.69)

Problems (17.68) and (17.69) [14] are two simple single-machine scheduling problems which can be solved by appropriately sorting the vectors \(\boldsymbol{\rho}\) and \(\boldsymbol{\sigma}\), as sketched below.
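Under the assumption, consistent with (17.64), that every outbound truck occupies exactly one slot, a possible sorting-based solution of the two subproblems is the following sketch; the exact procedure adopted in [14] may differ in its details.

```python
# Hedged sketch of solving (17.68) and (17.69) by sorting (not the code of [14]).
import numpy as np

def solve_inbound(rho):
    """(17.68): by a rearrangement argument, the inbound truck with the largest
    rho_i goes to the earliest slot."""
    slot = np.empty(rho.size, dtype=int)
    slot[np.argsort(-rho)] = np.arange(1, rho.size + 1)
    return slot, float(np.dot(slot, rho))

def solve_outbound(sigma, H):
    """(17.69): for each candidate makespan T, place the largest sigma_j in the
    latest slots T, T-1, ..., T-m+1 and keep the best value of C_Max - sum_j h_j*sigma_j."""
    m = sigma.size
    s = np.sort(sigma)[::-1]                     # sigma_j in decreasing order
    best_T, best_val = m, np.inf
    for T in range(m, H + 1):
        val = T - float(np.dot(T - np.arange(m), s))
        if val < best_val:
            best_T, best_val = T, val
    return best_T, best_val
```

The two optimal values, added to the constant s as in (17.67), yield \(z_{LR}(\boldsymbol{\rho}, \boldsymbol{\sigma})\).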

The following holds for the Lagrangian dual.

Theorem 17.1

There exists an optimal solution to the Lagrangian dual problem such that

$$\displaystyle \begin{aligned} s= \sum_{i\in \mathcal{I}}\sum_{j \in \mathcal{J}_{i}}\lambda_{ij}=\sum_{i\in \mathcal{I}}\rho_i=\sum_{j \in \mathcal{J}}\sum_{i \in \mathcal{I}_{j}}\lambda_{ij}=\sum_{j \in \mathcal{J}}\sigma_j \leq 1. \end{aligned} $$
(17.70)

The above property is helpful in designing algorithms to solve the Lagrangian dual, mainly because it enables an appropriate setting of the step size when implementing a line search.

It is worth noting that the ODCD does not enjoy the integrality property (see, in particular, problem (17.69)); thus the lower bound obtained from the Lagrangian dual is potentially tighter than that provided by the LP relaxation.

In addition, any solution of the Lagrangian relaxation which is not feasible for the ODCD can be repaired, at low computational cost, by some heuristic method based on a simple forward-shifting of those outbound trucks j for which the relaxed constraints

$$\displaystyle \begin{aligned}\sum_{h \in \mathcal{L}}hy_{jh} \geq \sum_{k\in \mathcal{K}}kx_{ik}+1\end{aligned}$$

have been violated for at least one \(i \in \mathcal {I}_j\) at the optimum of (17.64).
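A minimal sketch of such a repair, which forward-shifts every outbound truck to the first free slot following the unloading of all of its inbound trucks, is reported below; the data structures are illustrative assumptions and do not reproduce the code of [14].

```python
# Hedged sketch of the forward-shifting repair (illustrative, not the code of [14]).
def repair_outbound(inbound_slot, outbound_slot, I_j, H):
    """inbound_slot[i], outbound_slot[j]: 1-based slots from the relaxed solution;
    I_j[j]: inbound trucks feeding outbound truck j."""
    occupied = set()
    for j in sorted(outbound_slot, key=outbound_slot.get):   # earliest trucks first
        earliest = max(inbound_slot[i] for i in I_j[j]) + 1  # precedence constraint
        h = max(outbound_slot[j], earliest)
        while h in occupied:                                 # forward-shift if the slot is taken
            h += 1
        if h > H:
            raise ValueError("time horizon too short to repair the schedule")
        outbound_slot[j] = h
        occupied.add(h)
    return outbound_slot, max(occupied)                      # repaired schedule and its makespan
```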

In [14] the Lagrangian dual has been tackled by resorting to a standard (projected) subgradient method, equipped, at each iteration, with a repairing procedure, thus allowing both the lower and the upper bound to be possibly updated.

The Lagrangian heuristic has provided satisfactory results on test problems characterized by a number of inbound trucks n up to 20 and a number of outbound trucks m in the range [10, 40].

5.5 Feature Selection in Support Vector Machines

Feature selection (FS) is a relevant issue in machine learning and, in particular, in pattern classification. In fact classification is a kind of diagnostic process which consists in attaching a label (that is, certifying exactly one class membership) to an individual (a sample or a pattern), on the basis of a given number of known parameters (the features). Most of the research work has been focused on binary classification, where the classes are only two. A binary classifier is then a tool able to attach the appropriate label to a sample whose class membership is unknown.

The classification models we are dealing with are of the supervised type, since the classifier is constructed on the basis of the information provided by a certain number of samples (the training set) whose class membership is known.

A fundamental paradigm to construct such a classifier is the support vector machine (SVM) [15, 45], which consists in separating the samples belonging to the training set by means of a hyperplane, either in the feature space or in a higher dimensional one, upon an appropriate kernel transformation. Once the hyperplane has been calculated, it is used to classify newly incoming patterns whose class membership is unknown.

In the standard SVM approach, the training set is formed by two given point-sets \({A} = \{ {\boldsymbol a}_i, i \in \mathcal {I} \}\) and \({ B} = \{{\boldsymbol b}_j, j \in \mathcal {J} \}\) in \(\mathbb {R}^n\) (the feature space). The problem consists in finding a hyperplane, defined by a pair \(({\boldsymbol w} \in \mathbb {R}^n, \gamma \in \mathbb {R})\), that strictly separates A and B. Thus one would require

$$\displaystyle \begin{aligned}{\boldsymbol w}{{{}^T}}{\boldsymbol a}_i+\gamma \leq-1, \,\,\,i \in \mathcal{I}{} \end{aligned} $$
(17.71)

and

$$\displaystyle \begin{aligned} {\boldsymbol w}{{{}^T}}{\boldsymbol b}_j+\gamma \geq 1, \,\,\, j \in \mathcal{J} {}.\end{aligned} $$
(17.72)

Since the necessary and sufficient condition for the existence of such a strictly separating hyperplane

$$\displaystyle \begin{aligned} \operatorname{\mathrm{conv}} \{A\} \cap \operatorname{\mathrm{conv}} \{ B\} = \emptyset \end{aligned}$$

is, in general, not known to hold in advance, we define the point classification error functions \(\xi_i({\boldsymbol w}, \gamma)\), \(i \in \mathcal {I}\), and \(\zeta_j({\boldsymbol w}, \gamma)\), \(j \in \mathcal {J}\), as follows:

$$\displaystyle \begin{aligned}\xi_i({\boldsymbol w}, \gamma)=\max \big\{0, 1+({\boldsymbol w}{{{}^T}}{\boldsymbol a}_i+\gamma) \big\}, \,\,\, i \in \mathcal{I}, \end{aligned} $$
(17.73)

and

$$\displaystyle \begin{aligned} \zeta_j({\boldsymbol w}, \gamma)=\max \big\{0, 1-({\boldsymbol w}{{{}^T}}{\boldsymbol b}_j+\gamma)\big\}, \,\,\, j \in \mathcal{J}.{} \end{aligned} $$
(17.74)

Note that \(\xi_i\) and \(\zeta_j\) are positive if and only if (17.71) and (17.72), respectively, are violated. Consequently, they can be considered as a measure of the classification error related to the points \({\boldsymbol a}_i \in A\) and \({\boldsymbol b}_j \in B\).

The convex and nonsmooth error function \(E({\boldsymbol w}, \gamma)\) is then defined as

$$\displaystyle \begin{aligned} E({\boldsymbol w}, \gamma) =& \sum_{i \in \mathcal{I}} \max \big\{0, 1+({\boldsymbol w}{{{}^T}}{\boldsymbol a}_i+\gamma)\big\} + \sum_{j \in \mathcal{J}} \max \big\{0, 1-({\boldsymbol w}^T{\boldsymbol b}_j+\gamma)\big\} \\ =&\sum_{i \in \mathcal{I}}\xi_i({\boldsymbol w}, \gamma)+\sum_{j \in \mathcal{J}}\zeta_j({\boldsymbol w}, \gamma) . \end{aligned} $$
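For concreteness, a small numerical sketch of (17.73), (17.74) and of E follows; the array names a_pts and b_pts are assumptions used here only for illustration.

```python
# Minimal sketch of evaluating the errors (17.73)-(17.74) and E(w, gamma);
# a_pts and b_pts are assumed to store one sample per row.
import numpy as np

def classification_errors(w, gamma, a_pts, b_pts):
    xi = np.maximum(0.0, 1.0 + (a_pts @ w + gamma))    # violations of (17.71)
    zeta = np.maximum(0.0, 1.0 - (b_pts @ w + gamma))  # violations of (17.72)
    return xi, zeta, xi.sum() + zeta.sum()             # E(w, gamma)
```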

Finally we arrive at the standard SVM model:

$$\displaystyle \begin{aligned} \begin{cases} \underset{{\boldsymbol w},{\gamma}, { \boldsymbol{\xi}, \boldsymbol{\zeta}}} {\text{minimize}} \quad & \|{\boldsymbol w}\|{}^2 + C (\displaystyle \sum_{i \in \mathcal{I}} \xi_i+ \displaystyle \sum_{j \in \mathcal{J}} \zeta_j) \\ \text{subject to} &{\boldsymbol a}_i^{T}{\boldsymbol w}+ \gamma \leq \xi_i -1, \quad i \in \mathcal{I}, \\ & -{\boldsymbol b}_j^{T}{\boldsymbol w}- \gamma \leq \zeta_j -1, \quad j \in \mathcal{J}, \\ &\xi_i \geq 0, \quad i \in \mathcal{I}, \\ &\zeta_j \geq 0, \quad j \in \mathcal{J}. \\ \end{cases} \end{aligned} $$
(17.75)

The objective function is the sum of the error function E, weighted by the parameter C > 0, and of the squared norm of \({\boldsymbol w}\). The latter term aims at maximizing the separation margin between the two sets A and B. In fact (see [15, Chapter 6]) \(\frac {2}{\|{\boldsymbol w}\|}\) is the distance (the separation margin) between the hyperplanes

$$\displaystyle \begin{aligned}H^-=\{{\boldsymbol x} \mid {\boldsymbol w}{{{}^T}}{\boldsymbol x}+\gamma=-1\}~~\mbox{and}~~H^+=\{{\boldsymbol x} \mid {\boldsymbol w}^T{\boldsymbol x}+\gamma=1\},\end{aligned}$$

thus minimization of \(\|{\boldsymbol w}\|\) leads to maximization of the margin. In practical applications the squared norm \(\|{\boldsymbol w}\|^2\) replaces \(\|{\boldsymbol w}\|\) in the objective function.

In the SVM framework the role of FS is to detect those features that are really relevant for classification purposes. In other words a classifier embedding an FS mechanism is expected to guarantee classification correctness (effective separation of A and B) and, also, to be parsimonious, that is to provide a vector \({\boldsymbol w}\) (the normal to the separating hyperplane) with as few non-zero components as possible.

Several different optimization-based approaches to FS are available in the literature; we cite here, among others, [9] and [47]. We focus on the treatment of the FS problem via mixed integer programming (see [6, 42]). In particular, we refer to the model described in [28], where a Lagrangian relaxation approach has been implemented.

Aiming at enforcing a feature selection mechanism, that is at reducing the number of non-zero components of \({\boldsymbol w}\), the binary feature variable vector \({\boldsymbol y} \in \mathbb {R}^n\) is embedded into the model (17.75), with \(y_k\) indicating whether or not feature k is active. The following mixed binary formulation of the SVM-FS problem is stated:

$$\displaystyle \begin{aligned} \begin{cases} \underset{{\boldsymbol w}, {\gamma},{ \boldsymbol{\xi}, \boldsymbol{\zeta}}, {\boldsymbol y}} {\text{minimize}} \quad & \|{\boldsymbol w}\|{}^2 + C (\displaystyle \sum_{i \in \mathcal{I}} \xi_i + \displaystyle \sum_{j \in \mathcal{J}} \zeta_j)+ D \displaystyle \sum_{k=1}^{n} y_k \\ \text{subject to} &{\boldsymbol a}_i^{T}{\boldsymbol w}+ \gamma \leq \xi_i -1, \quad i \in \mathcal{I}, \\ & -{\boldsymbol b}_j^{T}{\boldsymbol w}- \gamma \leq \zeta_j -1, \quad j \in \mathcal{J}, \\ &-u_ky_k \leq w_k \leq u_ky_k, \quad k=1,\ldots,n, \\[0.1cm] &-u_k \leq w_k \leq u_k, \quad \quad k=1,\ldots,n, \\[0.1cm] &\xi_i \geq 0, \quad i \in \mathcal{I}, \\ & \zeta_j \geq 0, \quad j \in \mathcal{J}, \\ & y_{k} \,\,\text{binary}, \,\,\, k=1,\ldots,n. \end{cases} \end{aligned} $$
(17.76)

The objective function of problem (17.76) is the sum of three terms. The norm \(\|{\boldsymbol w}\|\), as usual in SVM-type models, is intended to maximize the separation margin (we note in passing that in [28] the \(L_1\)-norm, instead of the more commonly used \(L_2\), is adopted). The second term is the error function \(E({\boldsymbol w}, \gamma)\) weighted by C and, finally, the third one accounts for the number of nonzero components of \({\boldsymbol w}\). Note also the presence of the positive weights C and D in the objective function and of the upper bounds \(u_k > 0\) on the modulus of each component \(w_k\) of \({\boldsymbol w}\).

In [28] the problem above has been tackled by resorting to the Lagrangian relaxation of the constraints linking the variables \({\boldsymbol w}\) and \({\boldsymbol y}\), by means of the multiplier vectors \(\boldsymbol{\lambda} \geq \boldsymbol{0}\) and \(\boldsymbol{\mu} \geq \boldsymbol{0}\). Thus we obtain

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda}, \boldsymbol{\mu})\,\, = \,\,\,& \underset{{\boldsymbol w}, {\gamma}, \boldsymbol{\xi}, \boldsymbol{\zeta}, {\boldsymbol y}} {\text{min}} \quad \|{\boldsymbol w}\|{}^2 + C \Big(\displaystyle \sum_{i \in \mathcal{I}} \xi_i + \displaystyle \sum_{j \in \mathcal{J}} \zeta_j\Big)+ D \displaystyle \sum_{k=1}^{n} y_k + \displaystyle \sum_{k=1}^{n} \lambda_k (w_k - u_k y_k) - \displaystyle \sum_{k=1}^{n} \mu_k (w_k + u_k y_k)\\ & \text{subject to} \quad {\boldsymbol a}_i^{T}{\boldsymbol w}+ \gamma \leq \xi_i -1, \quad i \in \mathcal{I}, \\ &\qquad \qquad \quad\,\, -{\boldsymbol b}_j^{T}{\boldsymbol w}- \gamma \leq \zeta_j -1, \quad j \in \mathcal{J}, \\ &\qquad \qquad \quad\,\, -u_k \leq w_k \leq u_k, \quad k=1,\ldots,n, \\ &\qquad \qquad \quad\,\, \xi_i \geq 0, \quad i \in \mathcal{I}, \qquad \zeta_j \geq 0, \quad j \in \mathcal{J}, \\ &\qquad \qquad \quad\,\, y_{k} \,\,\text{binary}, \,\,\, k=1,\ldots,n. \end{aligned} \end{aligned} $$
(17.77)

By rearranging the objective function, we get to the following decomposed formulation:

$$\displaystyle \begin{aligned}z_{LR}(\boldsymbol{\lambda}, \boldsymbol{\mu})=z_{LR}^{(1)}(\boldsymbol{\lambda}, \boldsymbol{\mu})+z_{LR}^{(2)}(\boldsymbol{\lambda}, \boldsymbol{\mu}), \end{aligned}$$

with

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}^{(1)}(\boldsymbol{\lambda}, \boldsymbol{\mu})\,\,= \,\,\,& \underset{{\boldsymbol w}, {\gamma}, \boldsymbol{\xi}, \boldsymbol{\zeta}} {\text{min}} \quad \|{\boldsymbol w}\|{}^2 + C \Big(\displaystyle \sum_{i \in \mathcal{I}} \xi_i + \displaystyle \sum_{j \in \mathcal{J}} \zeta_j\Big)+ \displaystyle \sum_{k=1}^{n} (\lambda_k - \mu_k) w_k \\ & \text{subject to} \quad {\boldsymbol a}_i^{T}{\boldsymbol w}+ \gamma \leq \xi_i -1, \quad i \in \mathcal{I}, \\ &\qquad \qquad \quad\,\, -{\boldsymbol b}_j^{T}{\boldsymbol w}- \gamma \leq \zeta_j -1, \quad j \in \mathcal{J}, \\ &\qquad \qquad \quad\,\, -u_k \leq w_k \leq u_k, \quad k=1,\ldots,n, \\ &\qquad \qquad \quad\,\, \xi_i \geq 0, \quad i \in \mathcal{I}, \qquad \zeta_j \geq 0, \quad j \in \mathcal{J}, \end{aligned} \end{aligned} $$
(17.78)

and

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}^{(2)}(\boldsymbol{\lambda}, \boldsymbol{\mu})\,\,= \,\,\,& \underset{{\boldsymbol y}} {\text{min}} \quad \displaystyle \sum_{k=1}^{n}(D-u_k(\lambda_k +\mu_k)) y_k\\ & \text{subject to} \quad y_{k} \,\,\text{binary}, \,\,\, k=1,\ldots,n. \\ \end{aligned} \end{aligned} $$
(17.79)

Note that the calculation of \(z_{LR}^{(1)}(\boldsymbol {\lambda }, \boldsymbol {\mu })\) requires the solution of an SVM-like problem, the only difference being the presence of the linear term \(\sum _{k=1}^{n} (\lambda _k - \mu _k)w_k\) in the objective function. As for the second problem, \(z_{LR}^{(2)}(\boldsymbol {\lambda }, \boldsymbol {\mu })\) can be simply calculated by sign inspection of the cost coefficients \((D - u_k(\lambda _k + \mu _k))\), k = 1, …, n.
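A tiny sketch of this sign-inspection rule for (17.79) follows; the names D, u, lam and mu are illustrative assumptions (a scalar and three length-n arrays).

```python
# Hedged sketch of computing z_LR^(2) in (17.79) by sign inspection.
import numpy as np

def solve_feature_subproblem(D, u, lam, mu):
    coeff = D - u * (lam + mu)               # cost of activating feature k
    y = (coeff < 0).astype(int)              # y_k = 1 only when the cost is negative
    return y, float(coeff[coeff < 0].sum())  # optimal y and optimal value
```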

As for the Lagrangian dual

$$\displaystyle \begin{aligned}z_{LD} = \underset{{ (\boldsymbol{\lambda}, \boldsymbol{\mu})}\geq 0} {\displaystyle \max} \quad z_{LR}(\boldsymbol{\lambda}, \boldsymbol{\mu}),\end{aligned}$$

the following proposition [28] holds:

Proposition 17.1

There exists an optimal solution to the Lagrangian dual satisfying the condition

$$\displaystyle \begin{aligned} u_k( \lambda_k+ \mu_k ) = D,~~~ k=1,\ldots,n. {} \end{aligned} $$
(17.80)

In [28] the above property is utilized to eliminate the variables \(\mu_k\), k = 1, …, n, from the Lagrangian dual, which is then tackled by resorting to the general purpose C++ bundle code developed in [23].

The solution of the Lagrangian dual is embedded into a Lagrangian heuristic algorithm. Note in fact that a feasible solution (and consequently an upper bound) for problem (17.76) can be easily obtained starting from the optimal \({\boldsymbol w}(\boldsymbol{\lambda})\) returned by the Lagrangian relaxation for any feasible choice of the multiplier vector \(\boldsymbol{\lambda}\): it suffices to set \(y_k = 1\) whenever \(|w_k(\boldsymbol{\lambda})| > 0\) and \(y_k = 0\) otherwise.
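A one-line sketch of this repairing rule is the following; the tolerance is an illustrative assumption.

```python
# Hedged sketch of the repairing step: activate exactly the features with nonzero weight.
import numpy as np

def recover_feasible_y(w, tol=1e-10):
    return (np.abs(w) > tol).astype(int)    # y_k = 1 iff |w_k(lambda)| > 0
```

Plugging \({\boldsymbol w}(\boldsymbol{\lambda})\), \(\gamma(\boldsymbol{\lambda})\) and the recovered \({\boldsymbol y}\) into the objective of (17.76) then provides the upper bound.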

Also in this application the Lagrangian heuristic approach has proved to work well. The numerical results [28] are quite satisfactory, particularly in terms of the trade-off between classification quality and the number of active features.

5.6 Multiple Instance Learning

Multiple instance learning (MIL) [1] is a classification paradigm, connected to SVM [15], which is capable of handling complex problems, mainly in medical image analysis and in text categorization. While the objective of SVM-like methods is to classify samples in a given feature space, MIL deals with the classification of sets of samples, bags in machine learning parlance. We discuss here a specific MIL problem where binary classification of bags is considered. The approach has been introduced in [2] and Lagrangian relaxation has been applied in [4], giving rise to a Lagrangian heuristic algorithm.

To provide an intuitive explanation of the problem, we describe an example drawn from image classification, where the objects to be classified are images, each of them representing a certain mix of geometric figures (e.g. triangles, squares, circles and stars) of different sizes. Each image is a bag, which is segmented according to some appropriate rule [55]; in turn, a feature vector in \(\mathbb {R}^n\) is associated to each image segment, reporting aggregate information about segment luminosity, texture, geometry etc. Thus each image is represented by a set of points (instances), each of them being a vector in the feature space representing one of its segments. In our example the problem at hand is to discriminate between images containing at least one star (positive images) and those which contain no star (negative ones). In the supervised classification framework, the class membership of the images is known, thus each of them is labelled either as positive or negative.

A classic SVM scheme, aimed at separating by means of a hyperplane the instances of positive bags from those of the negative ones, does not seem profitable in this case. In fact the similarity degree between positive and negative images is high (after all, triangles, squares and circles may definitely appear in images from both classes) and, consequently, the expected separation quality is poor.

The MIL introduces a different paradigm. It is assumed in fact that a hyperplane in the feature space correctly classifies the images if all instances of each negative bag are in the same halfspace, while at least one instance of each positive bag is on the other side.

The MINLP formulation of the problem proposed in [2] follows: Assume m positive bags are given and let \(J^+=\{\mathcal {J}^+_1,\ldots ,\mathcal {J}^+_m\}\) be the family of the index sets of the instances of each bag. Analogously let \(J^-=\{\mathcal {J}^-_1,\ldots ,\mathcal {J}^-_k\}\) be the family of the index sets for k given negative bags. We indicate by \({\boldsymbol x}_j\in \mathbb {R}^n\) the j-th instance belonging to a positive or negative bag.

In the classic instance-based SVM, we would look for a hyperplane defined by a pair \(({\boldsymbol w} \in \mathbb {R}^n, \gamma \in \mathbb {R})\) separating the instances belonging to the negative bags from those belonging to the positive ones.

Instead, according to [2], one looks for a hyperplane

$$\displaystyle \begin{aligned}H({\boldsymbol w},\gamma)=\{{\boldsymbol x} \mid {\boldsymbol w}{{{}^T}}{\boldsymbol x}+\gamma=0\},\end{aligned}$$

such that

  1. 1.

    all negative bags are contained in the set \(S^- = \{{\boldsymbol x} \mid {\boldsymbol w}{{{}^T}}{\boldsymbol x} + \gamma \leq -1\}\);

  2. 2.

    at least one instance of each positive bag belongs to the set \(S^+ = \{{\boldsymbol x} \mid {\boldsymbol w}{{{}^T}}{\boldsymbol x} + \gamma \geq 1\}\).

The following optimization model, aimed at finding such a hyperplane, is then introduced. The decision variables, apart from the pair \(({\boldsymbol w}\in \mathbb {R}^n, \gamma \in \mathbb{R})\), are the labels \(y_j \in \{-1, 1\}\) to be assigned to all instances of the positive bags. The twofold objective consists in minimizing the classification error (which is equal to zero in case a separating hyperplane is actually found) and in maximizing the separation margin, defined as the distance between the shifted hyperplanes (see Sect. 17.5.5)

$$\displaystyle \begin{aligned}H^-=\{{\boldsymbol x} \mid {\boldsymbol w}{{{}^T}}{\boldsymbol x}+\gamma=-1\}~~\mbox{ and}~~ H^+=\{{\boldsymbol x} \mid {\boldsymbol w}^T{\boldsymbol x}+\gamma=1\}.\end{aligned}$$

That is,

$$\displaystyle \begin{aligned} \begin{aligned} z^*\,= \,\,\,& \underset{{\boldsymbol w},\gamma,{\boldsymbol y}} {\text{min}} \quad \displaystyle f({\boldsymbol w},\gamma,{\boldsymbol y})\\ &\text{subject to} ~ \displaystyle\sum_{j\in \mathcal{J}_i^+}\displaystyle\frac{y_j+1}{2} \geq 1, \quad i=1,\ldots,m, \\ &\quad \quad \quad \quad \quad y_j\in\{-1,1\}, \quad j\in \mathcal{J}_i^+, \quad i=1,\ldots,m, \end{aligned} \end{aligned} $$
(17.81)

where

$$\displaystyle \begin{aligned} f({\boldsymbol w},\gamma,{\boldsymbol y})= \displaystyle \frac{1}{2}\|{\boldsymbol w}\|{}^2&+C\sum_{i=1}^k\sum_{j\in \mathcal{J}_i^-}\max\{0, 1+({\boldsymbol w}{{{}^T}} {\boldsymbol x}_j+\gamma)\} \\ &+C\sum_{i=1}^m\sum_{j\in \mathcal{J}_i^+}\max\{0, 1-y_j({\boldsymbol w}{{{}^T}} {\boldsymbol x}_j+\gamma)\}, \end{aligned} $$

with ∥⋅∥ being the Euclidean norm in \(\mathbb {R}^n\) and C > 0 the trade-off parameter. Note that constraints

$$\displaystyle \begin{aligned}\sum_{j\in \mathcal{J}_i^+}\displaystyle\frac{y_j+1}{2} \geq 1, \quad i=1,\ldots,m \end{aligned}$$

impose that, for each positive bag, at least one of its samples must be labelled as a positive one. Function f is the sum of three terms:

  1. 1.

    \(\displaystyle \frac {1}{2}\|{\boldsymbol w}\|{ }^2\). As previously mentioned, minimization of ∥w∥ leads to maximization of the margin;

  2. 2.

    \(\sum _{i=1}^k\sum _{j\in \mathcal {J}_i^-}\max \{0, 1+({\boldsymbol w}{{{ }^T}} {\boldsymbol x}_j+\gamma )\}\). This term is the total classification error relative to the negative bags;

  3. 3.

    \(\sum _{i=1}^m\sum _{j\in \mathcal {J}_i^+}\max \{0, 1-y_j({\boldsymbol w}{{{ }^T}} {\boldsymbol x}_j+\gamma )\}\). This term represents the total classification error of the instances belonging to positive bags. Notice that such an error is zero if and only if for each positive bag \(\mathcal {J}^+_i\), i = 1, …, m, there exists at least one instance \(j\in \mathcal {J}_i^+ \) such that \({\boldsymbol w}{{{}^T}}{\boldsymbol x}_j + \gamma \geq 1\). Note that, by letting the corresponding label \(y_j = 1\), feasibility with respect to the constraint \(\sum _{j\in \mathcal {J}_i^+}\frac {y_j+1}{2} \geq 1 \) is achieved and, in addition, the classification error associated to such a bag is driven to zero, provided no instance of the bag falls into the “no man's land”, that is the region where \(|{\boldsymbol w}{{{}^T}}{\boldsymbol x} + \gamma| < 1\).

Summing up, the classification error is equal to zero if and only if all negative bags are contained in the set \(S^-\), at least one instance of each positive bag belongs to the set \(S^+\), and no instance \({\boldsymbol x}_j\) of any positive bag satisfies the condition \(|{\boldsymbol w}{{{}^T}}{\boldsymbol x}_j + \gamma| < 1\).

The Lagrangian heuristic introduced in [4] is based on relaxation of the linear constraints \(\sum _{j\in \mathcal {J}_i^+}\frac {y_j+1}{2} \geq 1\). The following problem is then obtained:

$$\displaystyle \begin{aligned} \begin{aligned} z_{LR}(\boldsymbol{\lambda})\,\,= \,\,\,& \underset{{\boldsymbol w},\gamma,{\boldsymbol y}} {\text{min}} \quad \displaystyle f({\boldsymbol w},\gamma,{\boldsymbol y})+\displaystyle\sum_{i=1}^m \lambda_i \left(1-\displaystyle \sum_{j\in \mathcal{J}_i^+}\displaystyle\frac{y_j+1}{2}\right)\\ &\,\text{subject to} ~~ y_j\in\{-1,1\}, \,\,\, j\in \mathcal{J}_i^+, \,\,\,i=1,\ldots,m, \end{aligned} \end{aligned} $$
(17.82)

where λ ≥ 0 is a vector of multipliers in \(\mathbb {R}^m\).

The Lagrangian dual problem is, as usual,

$$\displaystyle \begin{aligned} z_{LD} = \underset{{ \boldsymbol{\lambda}}\geq 0} {\displaystyle \max} \quad z_{LR}(\boldsymbol{\lambda}). \end{aligned} $$
(17.83)

Of course \(z_{LR}(\boldsymbol{\lambda}) \leq z^*\) for any choice of the multipliers \(\boldsymbol{\lambda} \geq \boldsymbol{0}\), and \(z_{LD} \leq z^*\). Tackling problem (17.83) within a Lagrangian heuristic scheme requires, at each iteration, the calculation of the function \(z_{LR}(\boldsymbol{\lambda})\) by solving (17.82), a MINLP which is particularly suitable for the application of a block coordinate descent method (see [52]). In fact the algorithm adopted in [4] works by alternately fixing, at each iteration, the values of \({\boldsymbol y}\) and of the pair \(({\boldsymbol w}, \gamma)\), according to the following scheme.

Algorithm 17.3: Calculation of \(z_{LR}(\boldsymbol{\lambda})\)

We remark that the minimization problem at Step 1 is a standard SVM-like problem, while the solution of the problem at Step 2 can be easily obtained by inspection of the values \(h_j^{(l+1)}= {\boldsymbol w}^{(l+1)T}{\boldsymbol x}_j + \gamma ^{(l+1)}\).
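A hedged sketch of the alternating scheme is reported below; the use of scikit-learn's SVC as the inner SVM solver and the data layout are illustrative assumptions, not the implementation of [4].

```python
# Hedged sketch of the block coordinate descent for evaluating z_LR(lambda) in (17.82):
# Step 1 fixes y and solves a standard SVM in (w, gamma); Step 2 fixes (w, gamma)
# and sets each label by inspecting h_j = w^T x_j + gamma.
import numpy as np
from sklearn.svm import SVC

def evaluate_z_lr(X_neg, pos_bags, lam, C=1.0, max_iter=20):
    """X_neg: instances of the negative bags (one per row); pos_bags: list of
    arrays, one per positive bag; lam: array with one multiplier per positive bag."""
    X_pos = np.vstack(pos_bags)
    bag_of = np.concatenate([np.full(len(b), i) for i, b in enumerate(pos_bags)])
    y_pos = np.ones(len(X_pos))                            # start with all-positive labels
    X = np.vstack([X_neg, X_pos])
    for _ in range(max_iter):
        labels = np.concatenate([-np.ones(len(X_neg)), y_pos])
        svm = SVC(kernel="linear", C=C).fit(X, labels)     # Step 1: SVM-like problem
        w, gamma = svm.coef_.ravel(), float(svm.intercept_[0])
        h = X_pos @ w + gamma                              # Step 2: inspect h_j
        cost_pos = C * np.maximum(0.0, 1.0 - h) - lam[bag_of]
        cost_neg = C * np.maximum(0.0, 1.0 + h)
        new_y = np.where(cost_pos <= cost_neg, 1.0, -1.0)
        if np.all(new_y == -1.0):                          # keep both classes for the next fit
            new_y[np.argmin(cost_pos - cost_neg)] = 1.0
        if np.array_equal(new_y, y_pos):
            break                                          # the alternating scheme has converged
        y_pos = new_y
    return w, gamma, y_pos
```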

In [4] the update of the multiplier vector \(\boldsymbol{\lambda}\) is performed by a standard subgradient method adopting the Polyak step size (17.28). A projection mechanism taking into account the nonnegativity of \(\boldsymbol{\lambda}\) is also embedded.

To complete the bird's-eye view of the Lagrangian heuristic, we observe that any solution \(({\boldsymbol w}(\boldsymbol{\lambda}), \gamma(\boldsymbol{\lambda}), {\boldsymbol y}(\boldsymbol{\lambda}))\) of the Lagrangian relaxation which violates the relaxed constraints can be easily “repaired” to regain feasibility by means of a greedy modification of one or more label variables \(y_j\).

It is worth noting that the Lagrangian dual (17.83) enjoys the relevant property that the duality gap is equal to zero, that is \(z_{LD} = z^*\). Moreover, by solving the Lagrangian dual, one also gets an optimal solution for problem (17.81). These results are provided by the following theorem [4].

Theorem 17.2

Let \(\boldsymbol{\lambda}^*\) be any optimal solution to the Lagrangian dual (17.83) and let \(({\boldsymbol w}^*, \gamma^*, {\boldsymbol y}^*)\) be any optimal solution to the Lagrangian relaxation (17.82) for \(\boldsymbol{\lambda} = \boldsymbol{\lambda}^*\). Then \(({\boldsymbol w}^*, \gamma^*, {\boldsymbol y}^*)\) is optimal for the original problem (17.81) and \(z_{LD} = z^*\).

The implementation of the Lagrangian heuristic has provided satisfactory results on a number of benchmark test problems. In particular, the zero duality gap property has been fully confirmed by the computational experience.

6 Conclusions

We have provided some basic notions on Lagrangian relaxation, focusing on its application in designing heuristic algorithms of the so-called Lagrangian heuristic type. Although the number of applications available in the literature is practically uncountable, we are convinced that the potential for future applications is still enormous.