Introduction

Many practical optimization problems include decision variables that are integer or 0-1. These problems, called mixed-integer programming (MIP) problems, are in general difficult to solve, and there have traditionally been two classes of approaches for solving them: branch-and-bound or enumeration, and heuristic methods, either ad hoc or generic. Broadly speaking, branch-and-bound methods construct a tree, usually binary, that allows the systematic exploration of all integer or 0-1 combinations of the discrete variables. Logical considerations and/or bounds on the optimal value computed as one moves down the tree may allow the pruning of a branch and backtracking to its root because one discovers that it would lead to infeasibilities or inferior solutions. Typically, bounds are obtained by solving a simpler, relaxed, optimization problem, most of the time the continuous relaxation of the MIP problem, in which integer or 0-1 variables are allowed to take on fractional values. Heuristics, on the other hand, search for better and better feasible integer solutions but do not usually compute bounds on the optimum; therefore, even though they are increasingly sophisticated and excel at finding optimal or near-optimal solutions, they cannot guarantee the quality of the solutions found.

Lagrangian relaxation stands in some sense at the crossroads of both approaches. More powerful in terms of bound quality than the continuous relaxation, it also produces integer, though partially infeasible, solutions. These can usually serve as excellent starting points for specialized heuristics, referred to as Lagrangian heuristics. Contrary to the general heuristics mentioned above, because one also has a bound, called the Lagrangian bound, one knows whether the best solution found is good enough or requires further investigation.

There is an enormous amount of literature devoted to the theory and applications of Lagrangian relaxation, starting with the seminal papers of Held and Karp (1970, 1971) and of Geoffrion (1974), although one could trace it back to earlier sources, for instance Everett's work on multipliers (1963). Some early guides include Fisher (1981, 1985).

Some of the questions to be addressed: Why use Lagrangian relaxation for integer programming problems? How does one construct a Lagrangian relaxation? What tools are there to analyze the strength of a Lagrangian relaxation? Are there more powerful extensions than standard Lagrangian relaxation, and when should they be used? Why is it that one can sometimes solve a strong Lagrangian relaxation by solving trivial subproblems? How does one compute the Lagrangian relaxation bound? Can one take advantage of Lagrangian problem decomposition? Does the strength of the model used make a difference in terms of bounds? Can one strengthen Lagrangian relaxation bounds by cuts, either kept or dualized? How can one design a Lagrangian heuristic? Can one achieve better results by remodeling the problem prior to doing Lagrangian relaxation?

The problems considered here have some integer variables, linear objective functions and constraints, and everything described below applies to maximization as well as minimization problems via the trivial sign transformations:

$$ \mathrm{Max}\, \{ f(x) \mid x \in V \} = -\,\mathrm{Min}\, \{ -f(x) \mid x \in V \}. $$

Notation

If (P) is an optimization problem,

  • FS(P): the set of feasible solutions of problem (P)

  • OS(P): the set of optimal solutions of problem (P)

  • v(P): the optimal value of problem (P)

  • u^k, s^k, etc.: the value of u, s, etc., used at iteration k

  • x^T: the transpose of x

  • x^k: the kth extreme point of some polyhedron (see context)

  • x^{(k)}: a solution found at iteration k

  • ⊂: strict inclusion

  • Co(V): the convex hull of set V

Relaxations of Optimization Problems

Geoffrion (1974) formally defines a relaxation of a generic minimization problem as follows.

Definition 1

Problem (RPmin): \( \mathrm{Min}\, \{ g(x) \mid x \in W \} \) is a relaxation of problem (Pmin): \( \mathrm{Min}\, \{ f(x) \mid x \in V \} \) if and only if (i) the feasible set of (RPmin) contains that of (Pmin), i.e., V ⊆ W, and (ii) over the feasible set V of (Pmin), the objective function of (RPmin) dominates (is better than) that of (Pmin), i.e., ∀x ∈ V, g(x) ≤ f(x).

It clearly follows that v(RPmin) ≤ v(Pmin), in other words (RPmin) is an optimistic version of (Pmin): it has more feasible solutions than (Pmin), and for feasible solutions of (Pmin), its own objective function is at least as good as (smaller than or equal to) that of (Pmin), thus it has a smaller minimum.

Lagrangian Relaxation (LR)

In the rest of the note, (P) is assumed to be of the form \( \mathrm{Min}_x\, \{ fx \mid Ax \le b,\ Cx \le d,\ x \in X \} \), where X contains the integrality restrictions on x, i.e., \( X = \mathbb{R}^{n-p} \times \mathbb{Z}^p \) or \( X = \mathbb{R}^{n-p} \times \{0,1\}^p \). Let I(X) be the set of the p indices of x restricted to be integer (or binary). The constraints Ax ≤ b are assumed complicating, in the sense that, without them, problem (P) would be much simpler to solve. The constraints Cx ≤ d (possibly empty) will be kept, together with X, to form the Lagrangian relaxation of (P) as follows. Let λ be a nonnegative vector of weights, called Lagrangian multipliers.

Definition 2

The Lagrangian relaxation of (P) relative to the complicating constraints Ax ≤ b, with nonnegative Lagrangian multipliers λ, is the problem (LR_λ): \( \mathrm{Min}_x\, \{ fx + \lambda(Ax - b) \mid Cx \le d,\ x \in X \} \).

Notice that (LR_λ) is still an integer programming problem, so its solutions, unlike those of the continuous relaxation, are integer solutions. However, they need not be feasible for (P), as they may violate some, or all, of the complicating constraints Ax ≤ b, which are no longer enforced. In (LR_λ), the slacks of the complicating constraints Ax ≤ b have been added to the objective function with weights λ. One says that the constraints Ax ≤ b have been dualized. (LR_λ) is a relaxation of (P), since (i) FS(LR_λ) contains FS(P), and (ii) for any x feasible for (P) and any λ ≥ 0, fx + λ(Ax − b) is less than or equal to fx (i.e., not worse, since it is a minimization problem). It follows that v(LR_λ) ≤ v(P) for all λ ≥ 0, i.e., the optimal value v(LR_λ), which depends on λ, is a lower bound on the optimal value of (P).

Definition 3

The problem of finding the tightest Lagrangian lower bound on v(P), i.e., (LR): Max_{λ ≥ 0} v(LR_λ), is called the Lagrangian dual of (P) relative to the complicating constraints Ax ≤ b. v(LR) is called the Lagrangian relaxation bound, or simply the Lagrangian bound.

Let (LP) denote the linear programming relaxation of problem (P). By LP duality, the Lagrangian bound is always at least as good as the LP bound v(LP), never worse. Notice also that (LR) is a problem in the dual space of the Lagrangian multipliers, whereas (LR_λ) is a problem in x, i.e., in the primal space.

Feasible Lagrangian Solution

Let x(λ) denote an optimal solution of (LR_λ) for some λ ≥ 0; x(λ) is then called a Lagrangian solution. One may be tempted to think that a Lagrangian solution x(λ) that is feasible for the integer problem (i.e., that satisfies the dualized constraints) is also optimal for that problem. In fact this is generally not the case. What is true is that the optimal value of (P), v(P), lies in the interval between fx(λ) + λ[Ax(λ) − b] and fx(λ), where fx(λ) is the value of a feasible solution of (P), thus an upper bound on v(P), and fx(λ) + λ[Ax(λ) − b] is the optimal value of the Lagrangian problem (LR_λ), thus a lower bound on v(P). If, however, complementary slackness holds, i.e., if λ[Ax(λ) − b] is 0, then fx(λ) + λ[Ax(λ) − b] = v(P) = fx(λ), and x(λ) is an optimal solution for (P).

Theorem 1

(1) If x(λ) is an optimal solution of (LR λ ) for some λ ≥ 0, then fx(λ)+λ[Ax(λ)−b] ≤ v(P). If in addition x(λ) is feasible for (P), then fx(λ)+λ[Ax(λ)−b] ≤ v(P) ≤ fx(λ).

(2) If in addition λ[Ax(λ)−b] = 0, then x(λ) is an optimal solution of (P), and v(P) = fx(λ).
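The bracket of Theorem 1 is easy to evaluate numerically. Here is a minimal sketch in Python (the function name and the dense arrays f, A, b are illustrative assumptions, not part of the original text):

```python
import numpy as np

def lagrangian_bracket(f, A, b, lam, x_lam, tol=1e-9):
    """Bracket of Theorem 1 around v(P) for a Lagrangian solution x(lam).

    Returns (lower, upper, optimal): `lower` is always a valid lower
    bound on v(P); `upper` is f x(lam) when x(lam) satisfies the
    dualized rows Ax <= b (None otherwise); `optimal` is True when
    complementary slackness lam.(Ax(lam) - b) = 0 also holds (part 2).
    """
    slack = A @ x_lam - b
    lower = f @ x_lam + lam @ slack          # v(LR_lam) <= v(P)
    feasible = np.all(slack <= tol)
    upper = f @ x_lam if feasible else None  # v(P) <= f x(lam) if feasible
    optimal = feasible and abs(lam @ slack) <= tol
    return lower, upper, optimal
```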

Remarks

Notice first that (2) is a sufficient condition of optimality, but it is not necessary; that is, it is possible for a feasible x(λ) to be optimal for (P) even though it does not satisfy complementary slackness. If the constraints that are dualized are equality constraints and x(λ) is feasible for (P), complementary slackness holds automatically, thus x(λ) is an optimal solution of (P), with v(P) = fx(λ).

Geometric Interpretation

The following theorem, from (Geoffrion 1974), is probably what sheds most light on Lagrangian relaxation. It gives a geometric interpretation of the Lagrangian dual problem in the x- space, i.e., in the primal space, and this permits an in-depth study of the strength of specific Lagrangian relaxation schemes.

Theorem 2

The Lagrangian dual (LR) is equivalent to the primal relaxation (PR): \( \mathrm{Min}_x\, \{ fx \mid Ax \le b,\ x \in \mathrm{Co}\{ x \in X \mid Cx \le d \} \} \), in the sense that v(LR) = v(PR) (Fig. 1).

Fig. 1: Geometric interpretation of Lagrangian relaxation

This result is based on LP duality and properties of optimal solutions of linear programs. Remember though that this result may not be true if the constraint matrices are not rational.

The following important definition and results follow from this geometric interpretation.

Definition 4

One says that (LR) has the Integrality Property (IP for short) if \( \mathrm{Co}\{ x \in X \mid Cx \le d \} = \{ x \in \mathbb{R}^n \mid Cx \le d \} \).

If (LR) has the Integrality Property, then the extreme points of \( \left\{ {x \in {\mathbb{R}^{^n }}\left| {Cx \leq d} \right.} \right\} \) are in X. The unfortunate consequence of this property, as stated in the following corollaries, is that such an LR scheme cannot produce a bound stronger than the LP bound. Sometimes, however, this is useful anyway because the LP relaxation cannot be computed easily. This may be the case for instance for some problems with an exponential number of constraints that can be relaxed anyway into easy to solve subproblems. The traveling salesman problem is an instance of a problem which contains an exponential number of (subtour elimination) constraints. A judicious choice of dualized constraints leads to Lagrangian subproblems that are 1-tree problems, thus eliminating the need to explicitly write all the subtour elimination constraints (Held and Karp 1970, 1971).

Here are the two corollaries of Theorem 2 that explain the important role played by the Integrality Property.

Corollary 1

If\( {\rm{Co}}\left\{ {x \in X\left| {Cx \leq d} \right.} \right\}{} = \left\{ {x \in {\mathbb{R}^n}\left| {Cx \leq d} \right.} \right\} \), then v(LP) = v(PR) = v(LR) ≤ v(P).

In that case, the Lagrangian relaxation bound is equal to (cannot be better than) the LP bound.

Corollary 2

If\({\rm{Co}}\left\{ {x \in X\left| {Cx \leq d} \right.} \right\} \subset \left\{ {x \in\,{\mathbb{R}^n}\left| {Cx \leq d} \right.} \right\} \), then v(LP) ≤ v(PR) = v(LR) ≤ v(P), and it may happen that the Lagrangian relaxation bound is strictly better than the LP bound.

If (LR) has the Integrality Property, it cannot yield a stronger bound than the LP relaxation. It is thus important to know whether all vertices of the rational polyhedron \( \{ x \in \mathbb{R}^n \mid Cx \le d \} \) are in X.

Easy-to-Solve Lagrangian Subproblems

It may happen that Lagrangian subproblems, even though in principle hard to solve because they do not have the Integrality Property, are in fact much easier to solve through some partial decomposition; they can sometimes even be solved in polynomial time, by exploiting their special structure. It is of course important to be able to recognize such favorable situations, especially if one can avoid using Branch-and-Bound to solve them. It should be noted that these favorable cases do not in general occur naturally, but only after some constraint(s) have been dualized, due to a weakening of the original links between continuous and integer variables.

One case is due to what is sometimes called the Integer Linearization Property (or ILP for short) for mixed 0-1 problems.

Integer Linearization Property

Geoffrion (1974) and Geoffrion and McBride (1978) described and used this important property of some Lagrangian subproblems. W.l.o.g., assume that all variables are indexed by i ∈ I, and possibly by some additional indices, and that some of the 0-1 variables are called y_i. Suppose that, except for constraints containing only these 0-1 variables y_i, the Lagrangian problem (LR_λ) has the property that the value taken by a given y_i alone decides the fate of all other variables sharing the same index i; usually this means that if y_i is 0, all variables in its family are 0, and if it is 1, they are solutions of a subproblem. One may then be able to reformulate the problem in terms of the variables y_i only. Often, though not always, this property holds because the Lagrangian problem, after removal of all constraints containing only the y_i's − call it (LRP_λ), for partial problem − decomposes into one problem \( (\mathrm{LRP}_{\lambda}^i) \) for each i, i.e., for each 0-1 variable y_i. The use of this property is based on the following fact. In problem \( (\mathrm{LRP}_{\lambda}^i) \), the integer variable y_i can be viewed as a parameter, but one knows that its feasible values are only 0 and 1. There are thus only two possible values for \( v(\mathrm{LRP}_{\lambda}^i) \): the value computed for y_i = 1, say v_i, and the value for y_i = 0, that is, 0. This implies that for all feasible values of y_i, \( v(\mathrm{LRP}_{\lambda}^i) = v_i\, y_i \). Hence the name integer linearization: one replaces a piecewise linear function corresponding to 0 ≤ y_i ≤ 1 by a line through the points (0, 0) and (1, v_i) (Fig. 2).

Fig. 2: Integer linearization property

One may in such cases obtain LR bounds much tighter than the LP bounds, even though the subproblems are trivial to solve.
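As a concrete illustration of the mechanism, here is a minimal sketch (the two oracles solve_family_given_y1 and solve_y_master are hypothetical, user-supplied functions; nothing in the original text prescribes this interface):

```python
def integer_linearization(solve_family_given_y1, solve_y_master, I):
    """Sketch of the integer linearization step described above.

    For each i, solve_family_given_y1(i) is assumed to return v_i, the
    optimal value of (LRP_lambda^i) with y_i fixed at 1; the value for
    y_i = 0 is 0, so v(LRP_lambda^i) = v_i * y_i for y_i in {0,1}.
    The Lagrangian problem then reduces to a 0-1 problem in the y_i
    alone, handled here by the user-supplied solve_y_master.
    """
    v = {i: solve_family_given_y1(i) for i in I}  # one slope v_i per y_i
    return solve_y_master(v)                      # 0-1 problem in y only
```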

Constructing a Lagrangian Relaxation

There are often many ways in which a given problem can be relaxed in a Lagrangian fashion. A few standard ones are listed here, mostly to point out that often some reformulation prior to relaxation can help, and that for many complex models, intuition and some understanding of the constraint interactions may suggest ingenious and efficient relaxation schemes.

  (1) One can isolate an interesting subproblem and dualize the other constraints.

This is the most commonly used approach. It has the advantage that the Lagrangian subproblems are interesting (in the sense usually that they have a special structure that can be exploited) and there may even exist specialized algorithms for solving them efficiently.

  (2) If there are two (or more) interesting subproblems with common variables, one can split these variables first, then dualize the copy constraint.

This is called Lagrangian decomposition (LD) (Soenen 1977), variable splitting (Näsberg et al. 1985), or variable layering (Glover and Klingman 1988). One must first reformulate the problem using variable splitting; in other words, one must rename the variables in part of the constraints as if they were independent variables. Problem (P): \( \mathrm{Min}_x\, \{ fx \mid Ax \le b,\ Cx \le d,\ x \in X \} \) is clearly equivalent to problem (P′): \( \mathrm{Min}_{x,y}\, \{ fx \mid Ax \le b,\ x \in X,\ Cy \le d,\ y \in X,\ x = y \} \), in the sense that they have equal optimal values (but notice that they have different variable spaces). In addition, if x* is an optimal solution of (P), then the solution (x, y) ≡ (x*, x*) is optimal for (P′), and if (x*, y*) is an optimal solution of (P′) with x* = y*, then x* is optimal for (P). One dualizes the copy constraint x = y in (P′) with multipliers λ; this separates the problem into an x-problem and a y-problem: (LD_λ): \( \mathrm{Min}_{x,y}\, \{ fx + \lambda(y - x) \mid Ax \le b,\ x \in X,\ Cy \le d,\ y \in X \} = \mathrm{Min}_x\, \{ (f - \lambda)x \mid Ax \le b,\ x \in X \} + \mathrm{Min}_y\, \{ \lambda y \mid Cy \le d,\ y \in X \} \).

This process creates a staircase structure, and thus decomposability, in the model. Notice that here λ is not required to be nonnegative.

Remember also that when one dualizes equality constraints, a feasible Lagrangian solution is automatically optimal for the original integer programming problem. The copy constraint being an equality constraint, if both Lagrangian subproblems have the same optimal solution, that solution is optimal for the IP problem.
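A minimal sketch of one bound evaluation under this scheme, assuming user-supplied oracles solve_x_sub and solve_y_sub for the two subproblems (both names are illustrative):

```python
import numpy as np

def ld_bound(f, lam, solve_x_sub, solve_y_sub):
    """One evaluation of v(LD_lam) as in the text: dualize x = y with
    free multipliers lam and solve the two subproblems separately.

    solve_x_sub(cost) is assumed to minimize cost.x over {Ax <= b, x in X};
    solve_y_sub(cost) is assumed to minimize cost.y over {Cy <= d, y in X}.
    """
    x = solve_x_sub(f - lam)           # Min (f - lam) x
    y = solve_y_sub(lam)               # Min lam y
    bound = (f - lam) @ x + lam @ y    # v(LD_lam) <= v(P)
    # The copy constraint is an equality: if both pieces agree, x solves (P).
    return bound, x, y, np.array_equal(x, y)
```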

Guignard and Kim (1987) showed that the LD bound can strictly dominate the LR bounds obtained by dualizing either set of constraints:

Theorem 3

$$ \begin{aligned} v(\mathrm{LD}) &= \mathrm{Max}_{\lambda}\, \big[\, \mathrm{Min}_x\, \{ (f - \lambda)x \mid Ax \le b,\ x \in X \} + \mathrm{Min}_y\, \{ \lambda y \mid Cy \le d,\ y \in X \} \,\big] \\ &= \mathrm{Min}_x\, \big\{ fx \mid x \in \mathrm{Co}\{ x \in X \mid Ax \le b \} \cap \mathrm{Co}\{ x \in X \mid Cx \le d \} \big\}. \end{aligned} $$

This new geometric interpretation is demonstrated in Fig. 3.

Fig. 3: Geometric interpretation of Lagrangian decomposition

Corollary 3

  • If one of the subproblems has the Integrality Property, then v(LD) is equal to the better of the two LR bounds corresponding to dualizing either Ax ≤ b or Cx ≤ d.

  • If both subproblems have the Integrality Property, then v(LD) = v(LP).

A very important application of the variable splitting scheme can be found in stochastic optimization, when the uncertainty is represented by 2-stage or multistage scenario trees. The non-anticipativity constraints (NAC) must be satisfied by the variables attached to the scenario groups or nodes in the tree. Splitting variables in the NAC and dualizing the copy constraints produces a Lagrangian decomposition of the Deterministic Equivalent Model. See Escudero (2009) and Birge and Louveaux (2011), among others.

Occasionally the variable splitting will correspond to a physical split of one of the problem’s decision variables. This is illustrated by the following example.

Example 1

Guignard and Yan (1993) described the following problem and scheme for a hydroelectric power management problem.

Electric utility production planning is the selection of power generation and energy efficiency resources to meet customer demands for electricity over a multi-period time horizon. The project described in the paper is a real-world hydropower plant operations management problem of a dispatch type. The system consists of a chain of 10 consecutive hydropower plants separated by reservoirs and falls with 23 identical machines installed to generate electric power. Specifically there are two machines installed in eight power plants (plants 1, 2, 3, 4, 5, 6, 7, and 10), three machines in one power plant (plant 8) and four machines in the last power plant (plant 9). Each machine has two or four work parts for producing electric power, according to different water throughput. Since demand for electric power varies with different time periods, power plant managers must make optimal decisions concerning the number of machines that should be operated in each power plant during each time period. Managing the power generation requires decisions concerning water releases at each plant k in each time period. A period is two hours. The model (which is confidential) was constructed by an independent consulting firm. This results in a large mixed-integer program. The problem is complex, with 2,691 variables, 384 of which are binary, and 12,073 constraints. The firm had tried to solve the problem for the utility company with several of the best MIP software packages available, with help from the software companies themselves. Yet they did not succeed. Guignard and Yan repeated the tests with several solvers running under GAMS, on several RISC systems, also to no avail. The best result after 5 days and six hours on an HP workstation was a bracket [3174.97, 3534.17], i.e., a residual gap of more than 11%.

In order to reduce the complexity of the model, they tried several Lagrangian relaxations and decompositions. One of the decompositions tested consists in “cutting” each reservoir in half (see Fig. 4), i.e. splitting the water level variable in each reservoir, and dualizing the following copy constraint:

$$ \text{high water level in } k + 1 = \text{low water level in } k. $$
Fig. 4: Lagrangian decomposition splits the water level

This Lagrangian decomposition produces one power management problem per power plant k. These subproblems do not have a special structure, but are much simpler and smaller than the original problem, are readily solvable by commercial software, and do not have the Integrality Property. They were solved by Branch-and-Bound.

This LD shrinks problem size, and yields Lagrangian bounds much stronger than the LP bounds. In addition the Lagrangian solutions can be modified to provide feasible schedules.

  (3) One can dualize linking constraints:

After possibly some reformulation, problems may contain independent structures linked by some constraints: \( \mathrm{Min}_{x,y}\, \{ fx + gy \mid Ax \le b,\ x \in X,\ Cy \le d,\ y \in Y,\ Ex + Fy \le h \} \). Dualizing the linking constraints Ex + Fy ≤ h splits the problem into an x-problem and a y-problem. Sometimes the original problem contains only x, and a reformulation introduces a new variable y, the relationship between x and y being captured by the new constraints Ex + Fy ≤ h.

Example 2

A production problem over multiple facilities contains constraints related to individual facilities, while the demand constraints link all plant productions. If one dualizes the demand constraints, the Lagrangian problem decomposes into a production problem for each facility, which is typically much easier to solve than the overall problem. If at least one of these subproblems does not have the Integrality Property, this LR may yield a tighter bound than the LP bound. In (Andalaft et al. 2003), a forest company must harvest geographically distinct areas, and dualizing the demand constraints splits the problem into one subproblem per area.

  (4) One can sometimes dualize aggregate rather than individual copies of variables.

Instead of creating a copy y of variable x and introducing y into model (P) by rewriting the constraint Cx ≤ d as Cy ≤ d, to yield the equivalent model (P′): \( \mathrm{Min}_{x,y}\, \{ fx \mid Ax \le b,\ x \in X,\ Cy \le d,\ y \in X,\ x = y \} \), one can also create a problem (P″) equivalent to problem (P) by introducing a new variable y and forcing the constraint Dy = Cx. This constraint is in general weaker than the constraint x = y. Model (P″) is \( \mathrm{Min}_{x,y}\, \{ fx \mid Ax \le b,\ x \in X,\ Dy \le d,\ y \in X,\ Dy = Cx \} \). The LR introduced here dualizes the aggregate copy constraint Dy = Cx.

Notice that the copy constraint is an equality constraint; therefore, if the Lagrangian subproblems have optimal solutions x and y that satisfy the aggregate copy constraint, i.e., if Dy = Cx, then the x-solution is optimal for the IP problem.

Example 3

Consider the bi-knapsack problem

(BKP) \( \mathrm{Max}_x\, \{ \sum\nolimits_i c_i x_i \mid \sum\nolimits_i b_i x_i \le m,\ \sum\nolimits_i d_i x_i \le n,\ x_i \in \{0,1\},\ \forall i \}. \)

One can introduce a new variable y, and write \( \sum\nolimits_i {b_i } {x_i} = \sum\nolimits_i {b_i } {y_i}. \) The equivalent problem is

(BKP′) \( \mathrm{Max}_{x,y}\, \{ \sum\nolimits_i c_i x_i \mid \sum\nolimits_i b_i y_i \le m,\ \sum\nolimits_i d_i x_i \le n,\ \sum\nolimits_i b_i x_i = \sum\nolimits_i b_i y_i,\ x_i, y_i \in \{0,1\},\ \forall i \} \)

and the LR problem is

$$ \begin{aligned} (\mathrm{LR}_{\lambda})\quad &\mathrm{Max}_{x,y}\, \Big\{ \sum\nolimits_i c_i x_i - \lambda \Big( \sum\nolimits_i b_i x_i - \sum\nolimits_i b_i y_i \Big) \;\Big|\; \sum\nolimits_i b_i y_i \le m,\ \sum\nolimits_i d_i x_i \le n,\ x_i, y_i \in \{0,1\},\ \forall i \Big\} \\ &= \mathrm{Max}_x\, \Big\{ \sum\nolimits_i (c_i - \lambda b_i)\, x_i \;\Big|\; \sum\nolimits_i d_i x_i \le n,\ x_i \in \{0,1\},\ \forall i \Big\} \\ &\quad + \mathrm{Max}_y\, \Big\{ \lambda \sum\nolimits_i b_i y_i \;\Big|\; \sum\nolimits_i b_i y_i \le m,\ y_i \in \{0,1\},\ \forall i \Big\}. \end{aligned} $$

Here λ is a single real multiplier of arbitrary sign. The Lagrangian bound produced by this scheme lies between the LP bound and the Lagrangian decomposition bound obtained by dualizing x_i = y_i, ∀i. This is similar in spirit to the copy constraints introduced in Reinoso and Maculan (1992).

It would seem natural that a reduction in the number of multipliers should imply a reduction in the quality of the LR bound obtained. This is not always the case, however, as shown in Example 4.

Example 4

Chen and Guignard (1998) considered an aggregate Lagrangian relaxation of the capacitated facility location problem. The model uses continuous variables x ij that represent the percentage of the demand d j of customer j supplied by facility i, and binary variables y i , equal to 1 if facility i with capacity a i is operational. The constraint \( \sum\limits_j {d_j } {x_{ij }} \leq {a_i}{y_i} \) imposes a conditional capacity restriction on the total amount that can be shipped from potential facility i.

In the standard form used by Geoffrion and McBride (1978), with fixed costs f_i and assignment costs c_ij, the model is

(CPLP)

$$ \begin{aligned} \mathrm{Min}\ & \sum\nolimits_{i,j} c_{ij} x_{ij} + \sum\nolimits_i f_i y_i \\ \mathrm{s.t.}\ & \sum\nolimits_i x_{ij} = 1, \quad \forall j \qquad (\mathrm{D}) \\ & x_{ij} \le y_i, \quad \forall i, j \qquad (\mathrm{B}) \\ & \sum\nolimits_j d_j x_{ij} \le a_i y_i, \quad \forall i \qquad (\mathrm{C}) \\ & \sum\nolimits_i a_i y_i \ge \sum\nolimits_j d_j \qquad (\mathrm{T}) \\ & x_{ij} \ge 0,\ y_i \in \{0,1\}, \quad \forall i, j. \end{aligned} $$

Constraint (T) is redundant, but may help getting tighter Lagrangian relaxation bounds.

The three best Lagrangian schemes are:

(LR) (Geoffrion and McBride 1978)

One dualizes (D) then uses the integer linearization property. The subproblems to solve are one continuous knapsack problem per plant ((C) with y i = 1) and one 0-1 knapsack problem over all plants (constraint (T)). The Lagrangian relaxation bound is tight, and it is obtained at a small computational cost.

(LD) (Guignard and Kim 1987).

Duplicate (T). Make copies x_ij = \( x'_{ij} \) and y_i = \( y'_{i} \) and use \( x'_{ij} \) and \( y'_{i} \) in (C) and in one of the (T)'s. One obtains the split

{(D), (B), (T)} → APLP

{(B), (T), (C)} → this is like in (LR)

This LD bound is tighter than the (LR) bound, but expensive to compute, in particular because of a large number of multipliers.

(LS) (Chen and Guignard 1998).

Copy ∑ j d j x ij = ∑ j d j \( x'_{ij} \) and y i = \( y'_{i} \) in (C). This yields the same split as (LD), and the same bound. This is very surprising, as it is less expensive to solve (LS) than (LD), in particular because (LS) has far fewer multipliers.

In Example 4, creating new copy variables \( x'_{ij} \) and \( y'_{i} \), one can create an LS by dualizing the aggregate (linking) copy constraints \( \sum\nolimits_j d_j x_{ij} = \sum\nolimits_j d_j x'_{ij} \) and \( a_i y_i = a_i y'_i \). Surprisingly, one can prove that the LS bound for this problem is as strong as the LD bound obtained by dualizing individual copies x_ij = \( x'_{ij} \) and y_i = \( y'_{i} \). This suggests that “aggregating” variables before copying them may be an attractive alternative to Lagrangian decomposition, at least for some problem structures. A more general structure than CPLP is actually described in Chen and Guignard (1998).

Characteristics of the Lagrangian Function

The Lagrangian function z(λ) = v(LR_λ) is an implicit function of λ. Suppose that the set \( \mathrm{Co}\{ x \in X \mid Cx \le d \} \) is a polytope, i.e., a bounded polyhedron; then there exists a finite family {x^1, x^2,…, x^K} of extreme points of \( \mathrm{Co}\{ x \in X \mid Cx \le d \} \), i.e., of points of \( \{ x \in X \mid Cx \le d \} \), such that \( \mathrm{Co}\{ x \in X \mid Cx \le d \} = \mathrm{Co}\, \{ x^1, x^2, \ldots, x^K \} \). It then follows that

$$ \mathrm{Min}_x\, \{ fx + \lambda(Ax - b) \mid Cx \le d,\ x \in X \} = \mathrm{Min}_{k = 1,\ldots,K}\, \{ fx^k + \lambda(Ax^k - b) \}, $$

and z(λ) is the lower envelope of a family of linear functions of λ, \( fx^k + \lambda(Ax^k - b) \), k = 1,…,K, and thus is a concave function of λ, with breakpoints where it is not differentiable, i.e., where the optimal solution of (LR_λ) is not unique. Figure 5 shows a Lagrangian function for the case where (P) is a maximization problem; then (LR) is a minimization problem and z(λ) a convex function of λ.

Fig. 5: The Lagrangian function of a maximization problem
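For tiny illustrative instances one can observe this lower envelope directly by enumeration; a sketch (enumerating the points x^k is, of course, only sensible for toy data):

```python
import numpy as np

def z_of_lam(f, A, b, points, lam):
    """z(lam) as the lower envelope above: the minimum over the
    enumerated points x^1..x^K of f x^k + lam (A x^k - b).  Evaluated
    on a grid of lam values, this exhibits the concave piecewise
    linear shape, with breakpoints where the minimizer changes."""
    return min(f @ xk + lam @ (A @ xk - b) for xk in points)
```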

A concave function f(x) is continuous over the relative interior of its domain, and it is differentiable almost everywhere, i.e., except over a set of measure 0. At points where it is not differentiable, the function does not have a gradient, but it always has subgradients.

Definition 5

A vector y ∈ (ℝ^n)* is a subgradient of a concave function f(x) at a point x^0 ∈ ℝ^n if for all x ∈ ℝ^n

$$ f(x) - f\left( {x^0 } \right) \leq y\cdot(x - {x^0}). $$

Definition 6

The set of all subgradients of a concave function f(x) at a point x^0 is called the subdifferential of f at x^0, and it is denoted ∂f(x^0).

Theorem 4

The subdifferential ∂f(x^0) of a concave function f(x) at a point x^0 is always nonempty, closed, convex, and bounded.

If the subdifferential of f at x0 consists of a single element, that element is the gradient of f at x0, denoted by \( \nabla f(x^0) \).

The dual problem (LR) is

$$ (\mathrm{LR})\quad \mathrm{Max}_{\lambda \ge 0}\, v(\mathrm{LR}_{\lambda}) = \mathrm{Max}_{\lambda \ge 0}\, z(\lambda) = \mathrm{Max}_{\lambda \ge 0}\, \mathrm{Min}_{k = 1,\ldots,K}\, \{ fx^k + \lambda(Ax^k - b) \} = \mathrm{Max}_{\lambda \ge 0,\, \eta}\, \{ \eta \mid \eta \le fx^k + \lambda(Ax^k - b),\ k = 1,\ldots,K \}. $$

Let λ* be a maximizer of z(λ), η* = z(λ*); let λ^k be a current “guess” at λ*, let η^k = z(λ^k), and let \( \mathrm{H}_k = \{ \lambda \mid fx^{(k)} + \lambda(Ax^{(k)} - b) = \eta^k \} \) be a level hyperplane passing through λ^k, where x^{(k)} is an optimal solution of \( (\mathrm{LR}_{\lambda^k}) \).

  • If z(λ) is differentiable at λ^k, i.e., if \( (\mathrm{LR}_{\lambda^k}) \) has a unique optimal solution x^{(k)}, it has a gradient ∇z(λ^k) at λ^k:

    $$ \nabla^{\mathrm{T}} z(\lambda^k) = (Ax^{(k)} - b) \perp \mathrm{H}_k. $$
  • If z(λ) is nondifferentiable at λ^k, i.e., if \( (\mathrm{LR}_{\lambda^k}) \) has multiple optimal solutions, the vector \( s^k = (Ax^{(k)} - b)^{\mathrm{T}} \) is a subgradient of z(λ) at λ^k. That vector s^k is orthogonal to H_k.

If one considers the contours \( \mathrm{C}(\alpha) = \{ \lambda \in \mathbb{R}_{+}^m \mid z(\lambda) \ge \alpha \} \), α a scalar, these contours are convex polyhedral sets. See Fig. 6.

Fig. 6: Contours and subgradient

Note: A subgradient is not necessarily a direction of increase for the function, even locally, as seen in Fig. 6.

Theorem 5

The vector \( (Ax^{(k)} - b)^{\mathrm{T}} \) is a subgradient of z(λ) at λ^k.

Primal and Dual Methods to Solve Relaxation Duals

A number of methods have been proposed to solve Lagrangian duals. They are either ad hoc, like dual ascent methods, or general purpose, usually aimed at solving a generic nonsmooth convex optimization problem. This section reviews the most important approaches.

Subgradient Method

This method was proposed in (Held and Karp 1971). It is an iterative method in which, at iteration k, given the current multiplier vector λ^k, a step is taken along a subgradient of z at λ^k; then, if necessary, the resulting point is projected onto the nonnegative orthant.

Let x^{(k)} be an optimal solution of \( (\mathrm{LR}_{\lambda^k}) \). Then \( s^k = (Ax^{(k)} - b)^{\mathrm{T}} \) is a subgradient of z(λ) at λ^k. If λ* is an (unknown) optimal solution of (LR), with η* = z(λ*), let λ′^{k+1} be the projection of λ^k on the hyperplane H* parallel to H_k, defined by

$$ \mathrm{H}^{*} = \{ \lambda \mid fx^{(k)} + \lambda(Ax^{(k)} - b) = \eta^{*} \}. $$

The vector s^k is perpendicular to both H_k and H*; therefore λ′^{k+1} − λ^k is a nonnegative multiple of s^k:

$$ \lambda'^{\,k+1} - \lambda^k = \mu s^k, \quad \mu \ge 0. $$

Also, λ′^{k+1} belongs to H*:

$$ fx^{(k)} + \lambda'^{\,k+1}(Ax^{(k)} - b) = \eta^{*}, $$

therefore \( fx^{(k)} + (\lambda^k + \mu s^k)(Ax^{(k)} - b) = \eta^k + \mu\, s^k \cdot s^k = \eta^{*} \)

and \( \mu = (\eta^{*} - \eta^k)/\|s^k\|^2 \),

so that \( \lambda'^{\,k+1} = \lambda^k + s^k (\eta^{*} - \eta^k)/\|s^k\|^2 \).

Finally, define \( \lambda^{k+1} = [\lambda'^{\,k+1}]^{+} \), i.e., define the next iterate λ^{k+1} as the projection of λ′^{k+1} onto the nonnegative orthant, as λ must be nonnegative. Given the geometric projections described above, it is clear that λ^{k+1} is closer to λ* than λ^k, thus the sequence \( \|\lambda^k - \lambda^{*}\|^2 \) is monotone nonincreasing.

Remark

This formula unfortunately uses the unknown optimal value η* of (LR). One can try to use an estimate of that value, but then one may be using either too small or too large a multiple of s^k. If the objective function values do not improve for too many iterations, one should suspect that η* has been overestimated (for a maximization problem) and that one is overshooting; one should then try to reduce the difference η* − η^k. This can be achieved by introducing from the start a positive factor ε_k ∈ (0, 2) in the subgradient formula:

$$ \lambda^{k+1} = \lambda^k + \epsilon_k\, s^k (\eta^{*} - \eta^k)/\|s^k\|^2, $$

and reducing the scalar ε_k when there is no improvement for too long.

Practical convergence of the subgradient method is unpredictable, sometimes quick and fairly reliable, sometimes erratic. Many authors have studied this problem and have proposed a variety of remedies.
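A minimal sketch of the method in Python (assumptions: solve_sub is a user-supplied oracle returning an optimal solution of (LR_λ); eta_target is the estimate of η* discussed in the Remark; the stall threshold and halving rule for ε_k are illustrative choices, not prescribed by the method):

```python
import numpy as np

def subgradient_method(f, A, b, solve_sub, eta_target, max_iter=200):
    """Sketch of the subgradient method for (LR): max_{lam>=0} z(lam),
    with z(lam) = Min_x { f x + lam(Ax - b) | Cx <= d, x in X }.

    solve_sub(lam) must return an optimal x of the Lagrangian
    subproblem (LR_lam); eta_target estimates the unknown eta*.
    """
    lam = np.zeros(A.shape[0])
    best, eps, stall = -np.inf, 2.0, 0
    for _ in range(max_iter):
        x = solve_sub(lam)                    # Lagrangian solution x(lam)
        z = f @ x + lam @ (A @ x - b)         # z(lam), lower bound on v(P)
        if z > best + 1e-9:
            best, stall = z, 0
        else:
            stall += 1
            if stall >= 20:                   # no improvement for too long:
                eps, stall = eps / 2.0, 0     # reduce eps_k (see Remark)
        s = A @ x - b                         # subgradient of z at lam
        if s @ s < 1e-20:
            break                             # x(lam) feasible and CS holds
        lam = lam + eps * (eta_target - z) / (s @ s) * s
        lam = np.maximum(lam, 0.0)            # project onto lam >= 0
    return best, lam
```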

Dual Ascent Methods

In this kind of approach, one takes advantage of the structure of the Lagrangian dual to create a sequence of multipliers that guarantees a monotone increase in the Lagrangian function value. This approach was pioneered by Bilde and Krarup (1967, 1977) for approximately solving the LP relaxation of the uncapacitated facility location problem (UFLP). General principles for developing a successful Lagrangian dual ascent method can be found in (Guignard and Rosenwein 1989).

Constraint Generation Method (Also Called Cutting Plane Method, or CP)

In this method, one uses the fact that z(λ) is the lower envelope of a family of linear functions:

$$ (\mathrm{LR})\quad \mathrm{Max}_{\lambda \ge 0}\, v(\mathrm{LR}_{\lambda}) = \mathrm{Max}_{\lambda \ge 0}\, z(\lambda) = \mathrm{Max}_{\lambda \ge 0}\, \mathrm{Min}_{k = 1,\ldots,K}\, \{ fx^k + \lambda(Ax^k - b) \} = \mathrm{Max}_{\lambda \ge 0,\, \eta}\, \{ \eta \mid \eta \le fx^k + \lambda(Ax^k - b),\ k = 1,\ldots,K \}. $$

At each iteration k, one generates one or more cuts of the form

$$ \eta \le fx^{(k)} + \lambda(Ax^{(k)} - b), $$

by solving the Lagrangian subproblem \( (\mathrm{LR}_{\lambda^k}) \), with solution x^{(k)}. These cuts are added to those generated in previous iterations to form the current LP master problem:

$$ (\mathrm{MP}^k)\quad \mathrm{Max}_{\lambda \ge 0,\, \eta}\, \{ \eta \mid \eta \le fx^{(h)} + \lambda(Ax^{(h)} - b),\ h = 1,\ldots,k \}, $$

whose solution is the next iterate λ^{k+1}. The process terminates when v(MP^k) = z(λ^{k+1}); this value is then the optimal value of (LR).
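A sketch of this loop in Python, using scipy's LP solver for the master problem (assumptions: solve_sub is the same user-supplied Lagrangian oracle as in the subgradient sketch, and lam_box is an artificial upper bound on λ that keeps the first master problems bounded; it is a modeling convenience, not part of the method):

```python
import numpy as np
from scipy.optimize import linprog

def cutting_plane(f, A, b, solve_sub, max_iter=50, lam_box=1e4, tol=1e-7):
    """Sketch of the constraint generation (CP) method for (LR).

    solve_sub(lam) returns an optimal x of (LR_lam).  lam_box should be
    loose enough not to cut off the optimal multipliers lam*.
    """
    m = A.shape[0]
    cuts_g, cuts_rhs = [], []         # cut h:  eta <= f x^(h) + lam . g_h
    lam, eta = np.zeros(m), np.inf
    for _ in range(max_iter):
        x = solve_sub(lam)
        g = A @ x - b
        z = f @ x + lam @ g           # z(lam) at the current iterate
        if eta - z <= tol:            # v(MP^k) = z(lam^{k+1}): optimal
            break
        cuts_g.append(g)
        cuts_rhs.append(f @ x)
        # master LP: max eta  s.t.  eta - lam.g_h <= f x^(h),  lam >= 0
        c = np.zeros(m + 1); c[-1] = -1.0         # linprog minimizes -eta
        A_ub = np.hstack([-np.array(cuts_g), np.ones((len(cuts_g), 1))])
        res = linprog(c, A_ub=A_ub, b_ub=np.array(cuts_rhs),
                      bounds=[(0, lam_box)] * m + [(None, None)])
        lam, eta = res.x[:m], res.x[m]
    return z, lam
```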

Column Generation (CG)

(CG) has been used extensively, in particular for solving very large scheduling problems (airline, buses, etc.). It consists in reformulating a problem as an LP (or an IP) whose activities (or columns) correspond to feasible solutions of a subset of the problem constraints, subject to the remaining constraints. The variables are weights attached to these solutions.

There are two aspects to column generation. First, the process is dual to Lagrangian relaxation and to CP. Second, it can be viewed as an application of Dantzig and Wolfe's decomposition algorithm (Dantzig and Wolfe 1960, 1961).

Let the \( x^k \in \{ x \in X \mid Cx \le d \} \), k ∈ K, be chosen such that \( \mathrm{Co}\{ x^k \} = \mathrm{Co}\{ x \in X \mid Cx \le d \} \). A possible choice for the x^k's is all the points of \( \{ x \in X \mid Cx \le d \} \), but a cheaper option is all extreme points of \( \mathrm{Co}\{ x \in X \mid Cx \le d \} \).

Problem (P): \( {\rm{Mi}}{{\rm{n}}_x}\left\{ {fx\left| A \right.x \leq b,Cx \leq d,x \in X} \right\} \) yields the Lagrangian dual (i.e., in the λ-space) problem

$$ \left( {\rm{LR}} \right)\quad {\rm{Ma}}{{\rm{x}}_{{\lambda \geq 0}}}{\rm{Mi}}{{\rm{n}}_x}\left\{ {fx + \lambda (Ax - b)\left| C \right.x \leq d,x \in X} \right\} $$

which is equivalent to the primal (i.e., in the x-space) problem

$$ (\mathrm{PR})\quad \mathrm{Min}_x\, \{ fx \mid Ax \le b,\ x \in \mathrm{Co}\{ x \in X \mid Cx \le d \} \}, $$

which itself can be rewritten as

$$ (\mathrm{PR})\quad \mathrm{Min}_{\mu}\, \Big\{ f \Big( \sum\nolimits_{k \in K} \mu_k x^k \Big) \;\Big|\; A \Big( \sum\nolimits_{k \in K} \mu_k x^k \Big) \le b \Big\} = \mathrm{Min}_{\mu}\, \Big\{ \sum\nolimits_{k \in K} \mu_k (fx^k) \;\Big|\; \sum\nolimits_{k \in K} \mu_k (Ax^k) \le b \Big\}, $$

given that one can write \( x \in \mathrm{Co}\{ x \in X \mid Cx \le d \} \) as \( x = \sum\nolimits_{k \in K} \mu_k x^k \), with \( \sum\nolimits_{k \in K} \mu_k = 1 \) and \( \mu_k \ge 0 \).

The separation of a problem into a master problem and a subproblem is equivalent to the separation of the constraints into kept and dualized constraints. The columns generated are solutions of integer subproblems that have the same constraints as the Lagrangian subproblems.

The value of the LP relaxation of the master problem is equal to the Lagrangian relaxation bound. The strength of a CG or LR scheme would then seem to be based on the fact that the subproblems do not have the integrality property. It may happen however that such a scheme can be successful at solving problems with the integrality property because it permits the indirect computation of v(LP) when this value could not be computed directly, e.g., because of an exponential number of constraints (Held and Karp 1970, 1971).

One substantial advantage of (CP) or (CG) over subgradient algorithms is the existence of a true termination criterion, v(MP^k) = z(λ^{k+1}).

Bundle Methods

Lemaréchal (1974) introduced an extension of subgradient methods, called bundle methods, in which past information is collected to provide a better approximation of the Lagrangian function. The standard CP algorithm uses the bundle of subgradients already generated and constructs a piecewise linear approximation of the Lagrangian function; this method is usually slow and unstable. Three different stabilization approaches have been proposed. At any moment, one has a model representing the Lagrangian function, and a so-called stability center, which should be a reasonable approximation of the true optimal solution. One generates a next iterate as a compromise between improving the objective function and staying close to the stability center. The next iterate becomes the new stability center (a serious step) only if the objective function improvement is “good enough”; otherwise, one has a null step, which nevertheless improves the function approximation. The three stabilization approaches propose different ways of controlling the move that is allowed: either the next iterate must remain within a so-called trust region, or one adds to the approximation of the function a penalty term that increases with the distance from the stability center, or one remains within a region where the approximation of the function stays above a certain level (for a maximization problem). This proximity measure is the one parameter that may be delicate to adjust in practical implementations. There is a trade-off between the safety net provided by this small-move concept and the possibly small size of the bound improvement.
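For concreteness, the penalty-based variant (the second stabilization approach above) can be sketched as replacing the CP master by a proximal master, where \( \hat{\lambda}^k \) is the stability center and t_k > 0 is the proximity parameter (the exact update rules for \( \hat{\lambda}^k \) and t_k vary between implementations):

$$ \lambda^{k+1} = \arg\max_{\lambda \ge 0}\, \Big\{ \min_{h \le k}\, \big[ fx^{(h)} + \lambda(Ax^{(h)} - b) \big] - \frac{1}{2 t_k} \big\| \lambda - \hat{\lambda}^k \big\|^2 \Big\}. $$

A serious step moves the center to λ^{k+1}; a null step keeps the center and only enriches the collection of cuts.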

The Volume Algorithm (VA)

The volume algorithm (Barahona and Anbil 2000), an extension of the subgradient algorithm, can be seen as a fast way to approximate Dantzig-Wolfe decomposition, with a better stopping criterion, and it produces primal as well as dual vectors by estimating the volume below the faces that are active at an optimal dual solution. It has been used successfully to solve large-scale LP’s arising in combinatorial optimization, such as set partitioning or location problems.

Subproblem Decomposition

In many cases, the Lagrangian subproblem decomposes into smaller problems, which means that the feasible region is actually the Cartesian product of several smaller regions. One clear advantage is the reduction in computational complexity of the Lagrangian subproblems: it is generally much easier to solve 50 problems with 100 binary variables each, say, than a single problem with 5,000 (i.e., 50 × 100) binary variables.

It also means that in column generation, the columns (i.e., the vectors that are feasible solutions of the kept constraints) decompose into smaller subcolumns, and each subcolumn is a convex combination of extreme points of a small region. By assigning different sets of weights to these convex combinations, one allows mix-and-match solutions; in other words, one may combine a subcolumn for the first subproblem that was generated at iteration 10, say, with a subcolumn for the second subproblem generated at iteration 7, etc., to form a full-size column. If one had not decomposed the problem ahead of time, one might have had to wait a long time for such a complete column to be generated.

By duality, this means that in a cutting plane environment, one can also generate sub-cuts for each subproblem, which amounts to first replacing η by z − λb in

$$ \begin{aligned} (\mathrm{MP}^k)\quad &\mathrm{Max}_{\lambda \ge 0,\, \eta}\, \{ \eta \mid \eta \le fx^{(h)} + \lambda(Ax^{(h)} - b),\ h = 1,\ldots,k \} \\ &= \mathrm{Max}_{\lambda \ge 0,\, z}\, \{ z - \lambda b \mid z \le (f + \lambda A)\, x^{(h)},\ h = 1,\ldots,k \}, \end{aligned} $$

and then z by a sum of scalars z_l, with \( z_l \le (f^l + \lambda A^l)\, x_l^{(h)} \), where l is the index of the Lagrangian subproblem, \( f^l \), \( A^l \), and \( x_l^{(h)} \) are the lth portions of the corresponding subvectors and submatrices, and \( x_l^{(h)} \) is a Lagrangian solution of the lth subproblem found at iteration h, yielding the disaggregated master problem

$$ (\mathrm{MPD}^k)\quad \mathrm{Max}_{\lambda \ge 0,\, z_l}\, \Big\{ \sum\nolimits_l z_l - \lambda b \;\Big|\; z_l \le (f^l + \lambda A^l)\, x_l^{(h)},\ \forall l,\ h = 1,\ldots,k \Big\}. $$

Example 5

Consider the Generalized Assignment Problem, or GAP (for the minimization case, although it would work in exactly the same way with maximization).

$$ \begin{aligned} (\mathrm{GAP})\quad &\mathrm{Min}\ \sum\nolimits_i \sum\nolimits_j c_{ij} x_{ij} \\ &\mathrm{s.t.}\ \sum\nolimits_j a_{ij} x_{ij} \le b_i, \quad \forall i \in I \quad (\mathrm{KP}) \\ &\qquad\ \sum\nolimits_i x_{ij} = 1, \quad \forall j \in J \quad (\mathrm{MC}) \\ &\qquad\ x_{ij} \in \{0,1\}, \quad \forall i \in I,\ j \in J. \end{aligned} $$

Its strong Lagrangian relaxation is

$$ \begin{aligned} (\mathrm{LR}_{\lambda})\quad &\mathrm{Min}\ \sum\nolimits_{i,j} c_{ij} x_{ij} + \sum\nolimits_j \lambda_j \Big( 1 - \sum\nolimits_i x_{ij} \Big) \\ &\mathrm{s.t.}\ \sum\nolimits_j a_{ij} x_{ij} \le b_i, \quad \forall i \quad (\mathrm{KP}), \qquad x_{ij} \in \{0,1\},\ \forall i, j \\ &= \sum\nolimits_j \lambda_j + \sum\nolimits_i \Big\{ \mathrm{Min}\ \sum\nolimits_j (c_{ij} - \lambda_j)\, x_{ij} \;\Big|\; \sum\nolimits_j a_{ij} x_{ij} \le b_i;\ x_{ij} \in \{0,1\},\ \forall j \Big\}, \end{aligned} $$

and (LR) is the maximum with respect to λ of v(LR λ ).

Let \( \mathrm{EP}(\mathrm{KP}) = \{ x^k \mid k \in K \} \) be the set of all integer feasible solutions of the constraints (KP), and let \( \mathrm{EP}(\mathrm{KP}_i) = \{ x_{i\cdot}^k \mid k \in K_i \} \) be the set of all integer feasible solutions of the ith knapsack, with \( K = \prod\nolimits_i K_i \). Then a feasible solution of (LR_λ) can be described by \( x_{ij} = \sum\nolimits_{k \in K_i} \mu_k^i x_{ij}^k,\ \forall i, j. \)

The Lagrangian dual is equivalent to the aggregate master problem AMP:

$$ \begin{aligned} (\mathrm{AMP})\quad &\mathrm{Max}_{\lambda,\, \zeta}\, \Big\{ \zeta \;\Big|\; \zeta \le \sum\nolimits_{i,j} c_{ij} x_{ij}^k + \sum\nolimits_j \lambda_j \Big( 1 - \sum\nolimits_i x_{ij}^k \Big),\ \forall k \in K \Big\} \\ &= \mathrm{Max}_{\lambda,\, z}\, \Big\{ z + \sum\nolimits_j \lambda_j \;\Big|\; z \le \sum\nolimits_{i,j} (c_{ij} - \lambda_j)\, x_{ij}^k,\ \forall k \in K \Big\} \end{aligned} $$

with the substitution \( \zeta = z + \sum\nolimits_j {\lambda_j } \).

If one had first written the column generation formulation for the Lagrangian dual, one would naturally have decoupled the solutions of the independent knapsack subproblems, using the independent sets K_i instead of K, and the column generation master problem would have been disaggregated:

$$ \begin{aligned} (\mathrm{DMP})\quad &\mathrm{Max}_{\lambda,\, z}\ \sum\nolimits_i z_i + \sum\nolimits_j \lambda_j \\ &\mathrm{s.t.}\ z_i \le \sum\nolimits_j (c_{ij} - \lambda_j)\, x_{ij}^k, \quad \forall i,\ \forall k \in K_i \end{aligned} $$

and its dual

$$ \mathrm{Min}_{\mu}\, \Big\{ \sum\nolimits_i \sum\nolimits_{k \in K_i} \Big( \sum\nolimits_j c_{ij} x_{ij}^k \Big) \mu_k^{(i)} \;\Big|\; \sum\nolimits_i \sum\nolimits_{k \in K_i} x_{ij}^k\, \mu_k^{(i)} = 1,\ \forall j;\ \ \sum\nolimits_{k \in K_i} \mu_k^{(i)} = 1,\ \forall i;\ \ \mu_k^{(i)} \ge 0 \Big\}, $$

is clearly the Dantzig-Wolfe decomposition of the primal equivalent

$$ (\mathrm{PR})\quad \mathrm{Min}_x\, \Big\{ \sum\nolimits_{i,j} c_{ij} x_{ij} \;\Big|\; \sum\nolimits_i x_{ij} = 1,\ \forall j;\ \ x_{i\cdot} \in \mathrm{Co}(\mathrm{EP}(\mathrm{KP}_i)),\ \forall i \Big\} $$

of (LR).
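The per-machine subproblems of this example are plain 0-1 knapsacks, so the Lagrangian bound is cheap to evaluate. Here is a self-contained sketch (assuming integer weights and capacities; solve_gap_lr could be plugged into the earlier subgradient sketch as the subproblem oracle):

```python
import numpy as np

def knapsack_01(p, w, cap):
    """Textbook DP for max p.x s.t. w.x <= cap, x binary, integer w."""
    cap = int(cap)
    best = np.zeros(cap + 1)
    keep = np.zeros((len(p), cap + 1), dtype=bool)
    for j, (pj, wj) in enumerate(zip(p, w)):
        if pj <= 0:
            continue                         # items that never pay off
        wj = int(wj)
        for r in range(cap, wj - 1, -1):
            if best[r - wj] + pj > best[r]:
                best[r] = best[r - wj] + pj
                keep[j, r] = True
    sel, r = [], cap                         # trace the choices back
    for j in range(len(p) - 1, -1, -1):
        if keep[j, r]:
            sel.append(j)
            r -= int(w[j])
    return sel

def solve_gap_lr(lam, c, a, b_cap):
    """z(lam) for (LR_lam) above: sum_j lam_j plus, for each machine i,
    Min sum_j (c_ij - lam_j) x_ij over the ith knapsack, solved as a
    max-knapsack with profits lam_j - c_ij."""
    x = np.zeros_like(c, dtype=int)
    z = lam.sum()
    for i in range(c.shape[0]):
        sel = knapsack_01(lam - c[i], a[i], b_cap[i])
        x[i, sel] = 1
        z -= (lam - c[i])[sel].sum()
    return x, z
```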

Relax-and-Cut

One question that often arises in the context of Lagrangian relaxation is how to strengthen the Lagrangian relaxation bound. One possible answer is the addition of cuts that are currently violated by the Lagrangian solution. It is clear however that adding these to the Lagrangian problem will change its structure and may make it much harder to solve. One possible way out is to dualize these cuts (for a more detailed analysis, see Guignard 1998). Remember that dualizing does not mean discarding! The cuts will be added to the set of complicating constraints, and intuitively they will be useful only if the intersection NI (for “new intersection”) of the new relaxed polyhedron and of the convex hull of the integer solutions of the kept constraints is “smaller” than the intersection OI (for “old intersection”) of the old relaxed polyhedron and of the convex hull of the integer solutions of the kept constraints. This in turn is only possible if the new relaxed polyhedron is smaller than the old one, since the kept constraints are the same in both cases. This has the following implications. Consider a cut that is violated by the current Lagrangian solution:

  (1) If the cut is just a convex combination of the current constraints, dualized and/or kept, it cannot possibly reduce the intersection, since every point of the “old” intersection will also satisfy it; so in particular surrogate constraints of the dualized constraints cannot help.

  (2) If the cut is a valid inequality for the Lagrangian problem, then every point in the convex hull of the integer points of the kept constraints satisfies it, because every integer feasible solution of the Lagrangian subproblem does.

  (3) It is thus necessary for the cut to use “integer” information from both the dualized and the kept constraints, and to remove part of the intersection. (Remember that the Lagrangian solution is an integer point required to satisfy only the kept constraints.)

A Relax-and-Cut scheme could proceed as follows (a code sketch follows the list):

  1. Initialize the Lagrangian multipliers λ.

  2. Solve the current Lagrangian problem, and let x(λ) be the Lagrangian solution. If the Lagrangian dual is not solved yet, update λ; else stop.

  3. Identify a cut that is violated by x(λ), and dualize it. Go back to 2.
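The loop can be sketched as follows (all three oracles — solve_lr, find_violated_cut, and dual_update — are hypothetical, user-supplied functions; dual_update would typically be a subgradient step as in the earlier sketch, extended to the multipliers of the dualized cuts):

```python
def relax_and_cut(lam0, solve_lr, find_violated_cut, dual_update,
                  max_iter=100):
    """Sketch of the Relax-and-Cut loop above.

    solve_lr(lam, cuts) solves the Lagrangian problem with the original
    dualized rows plus the dualized `cuts`; find_violated_cut(x) returns
    a cut violated by x, or None; dual_update(lam, x, cuts) performs one
    dual step (it is also in charge of the stopping test of step 2).
    """
    lam, cuts = lam0, []                   # step 1: initialize multipliers
    for _ in range(max_iter):
        x = solve_lr(lam, cuts)            # step 2: Lagrangian problem
        cut = find_violated_cut(x)         # step 3: separation
        if cut is not None:
            cuts.append(cut)               # dualize the new cut (its new
        lam = dual_update(lam, x, cuts)    # multiplier can start at 0)
    return lam, cuts
```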

The term Relax-and-Cut was first used by Escudero et al. (1994). In that paper, a partial description of the constraint set was used, and violated constraints (not cuts) were identified, added to the model, and immediately dualized. The idea, if not the name, had actually been used earlier; for instance, in solving TSP problems, subtour elimination constraints were generated on the fly and immediately dualized in Balas and Christofides (1981). The usefulness of constraints is obvious, contrary to that of cuts: a missing constraint can obviously change the problem solution.

Here are examples of cuts that if dualized cannot possibly tighten Lagrangian relaxation bounds.

Non-improving Dualized Cuts: Example for the GAP

Consider again the GAP model.

If one dualizes (MC), the Lagrangian relaxation problem decomposes into one subproblem per machine i:

$$ \begin{aligned} (\mathrm{LR}_{\lambda})\quad &\mathrm{Min}\ \sum\nolimits_{i,j} c_{ij} x_{ij} + \sum\nolimits_j \lambda_j \Big( 1 - \sum\nolimits_i x_{ij} \Big) \\ &\mathrm{s.t.}\ \sum\nolimits_j a_{ij} x_{ij} \le b_i, \quad \forall i \quad (\mathrm{KP}) \\ &= \mathrm{Min}\, \Big\{ \sum\nolimits_{i,j} (c_{ij} - \lambda_j)\, x_{ij} + \sum\nolimits_j \lambda_j \;\Big|\; \sum\nolimits_j a_{ij} x_{ij} \le b_i,\ \forall i;\ x_{ij} \in \{0,1\},\ \forall i, j \Big\} \\ &= \sum\nolimits_j \lambda_j + \sum\nolimits_i \Big\{ \mathrm{Min}\ \sum\nolimits_j (c_{ij} - \lambda_j)\, x_{ij} \;\Big|\; \sum\nolimits_j a_{ij} x_{ij} \le b_i;\ x_{ij} \in \{0,1\},\ \forall j \Big\}. \end{aligned} $$

Thus the ith Lagrangian subproblem is a knapsack problem for the ith machine. After solving all knapsack problems, the solution x(λ) may violate some multiple choice constraint, i.e., there may exist some j for which \( \sum\nolimits_i {} {x_{ij }} \ne 1 \), and as a consequence the condition \( \sum\nolimits_i {\sum\nolimits_j {} } {x_{ij }} = |J| \) may be violated. Adding this “cut” (it indeed cuts out the current Lagrangian solution!), and immediately dualizing it, does not reduce the intersection, as every point of the old intersection OI already satisfies all multiple choice constraints (MC), i.e., the dualized constraints.

Can Kept Cuts Strengthen the Lagrangian Bound?

What happens if one keeps the cuts instead of dualizing them? It is clear that adding these to the Lagrangian problem will change its structure, but it may still be solvable rather easily. The cuts will be added to the set of easy constraints, and intuitively they will be useful only if the intersection NI (for “new intersection”) of the relaxed polyhedron and of the new convex hull of the integer solutions of the kept constraints is smaller than the intersection OI (for “old intersection”) of the relaxed polyhedron and of the old convex hull of the integer solutions of the kept constraints. This in turn is only possible if the new convex hull polyhedron is smaller than the old one, since the dualized constraints are the same in both cases.

Example 6

Consider again the GAP, and its weak Lagrangian relaxation in which the knapsack constraints (KP) are dualized. One could add to the remaining multiple choice constraints a surrogate constraint of the dualized constraints, for instance the sum of all knapsack constraints, which is obviously weaker than the original knapsack constraints. The Lagrangian problem does not decompose anymore, but its new structure is that of a multiple choice knapsack problem, which is usually easy to solve with specialized software, and much easier than the aggregate knapsack without multiple choice constraints. The above strengthening of the Lagrangian bound is simple, yet potentially powerful.

Lagrangian Heuristics and Branch-and-Price

Lagrangian relaxation provides bounds, but it also generates Lagrangian solutions. If a Lagrangian solution is feasible and satisfies complementary slackness (CS), one knows that it is an optimal solution of the IP problem. If it is feasible but CS does not hold, it is at least a feasible solution of the IP problem, and one still has to determine, by BB or otherwise, whether it is optimal. Otherwise, Lagrangian relaxation generates infeasible integer solutions. Yet quite often these solutions are nearly feasible, as large constraint violations were penalized. There exists a very large body of literature dealing with possible ways of modifying existing infeasible Lagrangian solutions to make them feasible. Lagrangian heuristics are essentially problem dependent. Here are a few hints on how one may want to proceed. One may for instance try to get feasible solutions in the following ways:

  (1) By modifying the solution to correct its infeasibilities while keeping the objective function deterioration small (a toy sketch for the GAP follows this list).

Example: in production scheduling, if one relaxes the demand constraints, one may try to change production levels (down or up) so as to meet the demand (de Matta and Guignard 1994).

  (2) By fixing (at 1 or 0) some of the meaningful decision variables according to their value in the current Lagrangian solution, and solving optimally the remaining problem. Chajakis et al. (1996) called this generic approach the lazy Lagrangian heuristic. One guiding principle may be to fix variables that satisfy relaxed constraints.
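To make approach (1) concrete for the GAP of Example 5 (where a Lagrangian solution may assign a job to several machines, or to none), here is a toy repair sketch; the greedy rule is an illustrative choice, not the method of any of the cited papers:

```python
import numpy as np

def gap_repair(x_lr, c, a, b_cap):
    """Repair a GAP Lagrangian solution into one feasible for (MC):
    keep at most one machine per job, preferring the machines chosen
    by the Lagrangian solution, then the cheapest ones with room."""
    x = np.zeros_like(x_lr)
    used = np.zeros(len(b_cap))
    for j in range(c.shape[1]):
        owners = list(np.flatnonzero(x_lr[:, j]))
        others = [i for i in np.argsort(c[:, j]) if i not in owners]
        for i in sorted(owners, key=lambda i: c[i, j]) + others:
            if used[i] + a[i, j] <= b_cap[i]:
                x[i, j] = 1
                used[i] += a[i, j]
                break
        else:
            return None          # no machine has room: repair failed
    return x                     # satisfies (KP) and (MC)
```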

Part of the success of Lagrangian relaxation comes from clever implementations of methods for solving the Lagrangian dual, with powerful heuristics embedded at every iteration. In many cases, the remaining duality gap, i.e., the relative percentage gap between the best Lagrangian bound found and the best feasible solution found by heuristics, is sufficiently small to forego enumeration. In some instances, however, an optimal or almost optimal solution is desired, and a Branch-and-Bound scheme adapted to replace LP bounds by LR bounds can be used. If the Lagrangian dual is solved by column generation, the scheme is called Branch-and-Price, as new columns may need to be priced out as one keeps branching (see Desrosiers et al. 1984; Barnhart et al. 1998). In that case, branching rules need to be carefully designed. The hope is that such schemes will converge faster than LP-based Branch-and-Bound, as bounds will normally be tighter and nodes may be pruned faster. The amount of work done at a node, though, may be substantially more than solving an LP.

Concluding Remarks

  • Lagrangian relaxation is a powerful family of tools for solving approximately integer programming problems. It provides

    • stronger bounds than the LP relaxation when the Lagrangian subproblems do not have the Integrality Property.

    • good starting points for heuristic search.

  • The availability of powerful interfaces (GAMS, AMPL, etc.) and of flexible IP packages makes it possible for the user to try various schemes and to implement and test them.

  • As illustrated by the varied examples described in this paper, Lagrangian relaxation is very flexible. Often some reformulation is necessary for a really good scheme to appear.

  • It is not necessary to have special structures embedded in a problem to try to use Lagrangian schemes. If it is possible to decompose the problem structurally into meaningful components and to split them through constraint dualization, possibly after having introduced new variable expressions, it is probably worth trying.

  • Finally, solutions to one or more of the Lagrangian subproblems might lend themselves to Lagrangian heuristics, possibly followed by interchange heuristics, to obtain good feasible solutions.

Lagrangian relaxation bounds coupled with Lagrangian heuristics provide the analyst with brackets around the optimal integer value. These are usually much tighter than the brackets coming from LP-based bounds and heuristics.

See

Branch and Bound

Convex Hull

Convex Optimization

Heuristics

Integer and Combinatorial Optimization

Traveling Salesman Problem