1 Introduction

In this chapter we give a unified overview of the Lagrangian relaxation technique for finite dimensional optimization problems with continuous and/or discrete decision variables, and we show, by way of example, how to apply this technique to some classical transportation and location models. The main goal of this chapter is to give an introduction to Lagrangian duality accessible to less mathematically oriented readers and to explain its main ideas. At the same time we try to be rigorous, proving the most well-known results about strong duality for the class of K-convex optimization problems by means of the simplest possible proofs. All the proofs related to strong duality will be given, with one exception: we will only state, without proof, the most basic separation result in convex analysis, which separates a nonempty convex set from a point outside this set. In general, contrary to continuous K-convex optimization problems, no strong duality holds for integer linear programming problems. By means of elementary proofs, we show how to apply the Lagrangian relaxation technique to these problems. A more abstract approach to Lagrangian duality, the perturbation function approach (Frenk and Kassay, 2005; Rockafellar, 1970), requires knowledge of conjugate and biconjugate functions well known in convex analysis.

To show strong duality results for K-convex optimization problems, this approach uses the Fenchel–Moreau theorem, which states that under some mild topological conditions a convex function equals its biconjugate function. Although the perturbation function approach is more general and has the advantage of shortening proofs by applying the Fenchel–Moreau theorem directly, it hides the intuitive ideas that come out more clearly in the Lagrangian relaxation approach. For this reason, we have chosen the latter approach to explain the main results in duality theory. This approach is closely related to the minimax approach used in a two-person noncooperative game with the decision maker playing against nature. In this framework strong duality is equivalent to the existence of a Nash equilibrium point (Frenk and Kassay, 2006, 2008).

In the first section we introduce the Lagrangian relaxation approach and its application to finite dimensional optimization problems. In the second section we apply this approach to (integer) linear programming problems and discuss the main relations between such an integer linear program and some of the most well-known relaxations used within this field. In the third section we consider the subgradient method for finding the optimal Lagrangian multiplier in a Lagrangian dual problem. Finally, in the last section we consider some classical location and transportation models and show how the Lagrangian relaxation technique yields heuristics for solving these problems.

Although no new results are discussed, this chapter introduces the reader to the main ideas of the Lagrangian relaxation technique, originally introduced by Joseph-Louis Lagrange in his treatise “Leçons sur le calcul des fonctions” (nouvelle édition) published in 1806 (Lagrange, 1806) and revealed in a series of letters to Euler between 1754 and 1756. Lagrange applied his approach to infinite dimensional optimization problems, and it eventually led to the development of optimal control theory. To prove the claims of Lagrange, the theory of convex sets and functions was later developed. Without any difficulty, one can also extend the finite dimensional approach discussed in this chapter to infinite dimensional optimization problems using the well-known Hahn–Banach theorem of functional analysis (Choquet, 1976) or the related minimax approach (Frenk, Kas, and Kassay, 2007).

This chapter also tries to be complete, discussing in a hopefully transparent way some of the mathematical technicalities found in the books on convex analysis. The rationale behind the Lagrangian relaxation approach can be seen as one of the main ideas in optimization theory: it relates a difficult constrained optimization problem to a more easily solvable optimization problem with fewer constraints and another objective function. In this newly selected objective function the deleted constraints of the original problem are penalized using a linear penalty function.

Developments occurring 150 years later within the field of nonlinear and linear programming, like the primal–dual relations in linear programming and the well-known Karush–Kuhn–Tucker conditions in nonlinear programming, are special instances of this Lagrangian relaxation approach. The development of penalty-based optimization methods in nonlinear programming (Fiacco and McCormick, 1969) can also be seen as an easy extension of the ideas behind this relaxation approach. These penalty methods eventually led to the development of interior point polynomial algorithms in convex programming (see page 2 of Nesterov and Nemirovski (2001)). From these observations, it seems clear that the ideas of Lagrange play a fundamental role in the development of optimization theory as we know it today.

2 On the Principle of Lagrange

Before discussing the principle of Lagrange for general optimization problems with continuous and/or discrete decision variables, we first introduce some definitions. If the sets \(A,B\subseteq \mathbb {R}^{n}\) are nonempty sets, then for every \(\alpha ,\beta \in \mathbb {R}\) we define the so-called Minkowski sum:

$$\displaystyle \begin{aligned} \alpha A+\beta B:=\{\alpha \mathbf{x}+\beta \mathbf{y}:\mathbf{x}\in A, \mathbf{y}\in B\}. \end{aligned} $$
(1)
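For finite sets, the Minkowski combination in relation (1) can be computed directly by enumeration. The following is a minimal Python sketch (the sets A and B are illustrative choices of our own, not taken from the text):

```python
# Minkowski combination alpha*A + beta*B for finite subsets of R^2,
# following definition (1): all points alpha*x + beta*y with x in A, y in B.
def minkowski(alpha, A, beta, B):
    return {(alpha * x1 + beta * y1, alpha * x2 + beta * y2)
            for (x1, x2) in A for (y1, y2) in B}

A = {(0, 0), (1, 0)}
B = {(0, 0), (0, 1)}
# 1*A + 1*B gives the four corners of the unit square.
corners = minkowski(1, A, 1, B)
assert corners == {(0, 0), (1, 0), (0, 1), (1, 1)}
```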

We also denote by \(\mathbb {Z}_{+}^{n}\) \((\mathbb {R}_{+}^{n})\) the set of nonnegative integer valued (real valued) n-dimensional vectors, and we set \( \mathbb {Z}_{-}^{n}:=-\mathbb {Z}_{+}^{n}\) and \(\mathbb {R}_{-}^{n}:=-\mathbb {R} _{+}^{n}.\)

Definition 2.1

A nonempty set \(L\subseteq \mathbb {R} ^{n}\) is called a linear space if αL + βL ⊆ L for every \(\alpha ,\beta \in \mathbb {R}.\) A nonempty set \(M\subseteq \mathbb {R} ^{n}\) is called affine if αM + (1 − α)M ⊆ M for every \(\alpha \in \mathbb {R}\). A nonempty set \(C\subseteq \mathbb {R}^{n}\) is called convex if αC + (1 − α)C ⊆ C for every 0 < α < 1. A nonempty set \(K\subseteq \mathbb {R}^{n}\) is called a cone if αK ⊆ K for every α > 0 and it is called a pointed cone if K is a cone satisfying K ∩ (−K) = {0}. A nonempty set \(A\subseteq \mathbb {R}^{n}\) is called proper if the set A is strictly contained in \(\mathbb {R}^{n}.\)

It is easy to verify that a pointed cone is proper and a cone K is convex if and only if K + K ⊆ K. To represent feasible sets within optimization theory, we need to introduce an ordering on a set.

Definition 2.2

A binary relation ≼ on a nonempty set A is called a transitive ordering on the set A if for every x i ∈ A, i = 1, 2, 3, satisfying x 1 ≼x 2 and x 2 ≼x 3 it follows that x 1 ≼x 3. A binary relation ≼ on a nonempty set A is called a partial ordering on the set A if it is a transitive ordering and for every x ∈ A it satisfies x ≼x (reflexive property) and for every x i ∈ A, i = 1, 2, satisfying x 1 ≼x 2 and x 2 ≼x 1 it holds that x 1 = x 2 (antisymmetry property).

For any convex cone \(K\subseteq \mathbb {R}^{n}\), we introduce the transitive ordering ≼K on the set \(\mathbb {R}^{n}\) (Boyd and Vandenberghe, 2004; Frenk and Kassay, 1999) given by

$$\displaystyle \begin{aligned} \mathbf{y}\preceq _{K}\mathbf{x}\Leftrightarrow \mathbf{x}-\mathbf{y}\in K. {} \end{aligned} $$
(2)

It follows that the transitive ordering ≼K on \(\mathbb {R}^{n}\) for any convex pointed cone K is a partial ordering on \(\mathbb {R}^{n}\). In most cases, unless otherwise specified, we will only consider orderings ≼K on \(\mathbb {R}^{n}\) with K a convex pointed cone. Consider now a nonempty set \(X\subseteq \mathbb {R}^{n}\) and \(K\subseteq \mathbb {R}^{m}\) a nonempty proper convex cone. Moreover, let \(f:\mathbb {R}^{n}\rightarrow \mathbb {R}\) be some finite valued function and \( \mathbf {g}:\mathbb {R}^{n}\rightarrow \mathbb {R}^{m}\) a finite valued vector function given by g(x) := (g 1(x), …, g m(x)) with \(g_{i}:\mathbb {R}^{n}\rightarrow \mathbb {R}\) and introduce the minimization problem:

$$\displaystyle \begin{aligned} \upsilon (P):=\inf \{f(\mathbf{x}):\mathbf{x\in \mathcal{F}}\} {} \end{aligned} $$
(P)

with

$$\displaystyle \begin{aligned} \mathcal{F}:=\{\mathbf{x}\in X:\mathbf{g}(\mathbf{x})\preceq _{K}\mathbf{0}\} {} \end{aligned} $$
(3)

the so-called feasible region. The optimization problem ( P ) is called a primal problem, and using the ordering ≼K with K a proper convex cone makes it possible to model many different optimization problems. The value υ(P) is by definition equal to ∞ if the feasible region \(\mathcal {F}\) of optimization problem ( P ) is empty. In this case optimization problem ( P ) is called infeasible. If the feasible region \(\mathcal {F}\) is nonempty, then optimization problem ( P ) is called feasible and −∞ ≤ υ(P) < ∞. It is not assumed that an optimal feasible solution exists, so we cannot replace \(\inf \) by \(\min \). Since

$$\displaystyle \begin{aligned} -\sup \{-f(\mathbf{x}):\mathbf{x\in \mathcal{F}}\}=\inf \{f(\mathbf{x)}: \mathbf{x\in }\mathcal{F}\}, \end{aligned}$$

we also cover maximization problems in this framework. Important instances of optimization problem ( P ) are given by nonlinear optimization problems with feasible region \(\mathcal {F}\) consisting of p inequality and m − p equality constraints (Bazaraa, Sherali and Shetty, 1993; Nocedal and Wright, 2006). In this case, the convex cone K is given by \(K=\mathbb {R}_{+}^{p}\times \{\mathbf {0}\},\mathbf {0}\in \mathbb {R} ^{m-p},0\leq p\leq m\). For p = 0, this reduces to a feasible region \(\mathcal {F}\) consisting only of m equality constraints, and we obtain K = {0}. Special cases are linear programming problems (Bazaraa, Jarvis, and Sherali, 1990; Chvatal, 1983) and, for X a mixed discrete–continuous set, mixed integer linear programming problems (Nemhauser and Wolsey, 1988; Wolsey, 1998). Finally we also mention conic convex programming problems (Ben-Tal and Nemirovski, 2001) given by

$$\displaystyle \begin{aligned} \inf \{{\mathbf{c}}^{\top }\mathbf{x}:\mathbf{x}-\mathbf{b}\in L,\mathbf{x}\in K\} \end{aligned} $$
(4)

with K a proper convex cone and L a linear space. In most cases it is difficult to find a feasible solution of optimization problem (P) and even more difficult (if it exists!) to find an optimal solution. A possible way to overcome this problem is to replace optimization problem ( P ) by an easier optimization problem selected in such a way that it resembles the original problem. Having this idea in mind we introduce the next definition.

Definition 2.3

An optimization problem given by

$$\displaystyle \begin{aligned} \upsilon (R):=\inf \{f_{R}(\mathbf{x}):\mathbf{x}\in \mathcal{F}_{R}\} {} \end{aligned} $$
(R)

is called a relaxation of the primal problem ( P ) if the feasible region \(\mathcal {F}_{R}\) satisfies \(\mathcal {F}\subseteq \mathcal {F}_{R}\) and f R(x) ≤ f(x) for every x belonging to \(\mathcal {F}.\)

An immediate consequence of the definition of a relaxation is given by

$$\displaystyle \begin{aligned} -\infty \leq \upsilon (\mbox{{{$R$}}})\leq \upsilon (\mbox{{{P}}})\leq \infty . {} \end{aligned} $$
(5)

An easy way to obtain a relaxation is to delete some of the difficult constraints within the feasible set \(\mathcal {F}\) given by relation (3). However, to achieve that the newly constructed optimization problem resembles the original problem (P), it is in general not sufficient to delete constraints. A more refined way to achieve this goal is to delete constraints and at the same time incorporate within the objective function some penalty cost to compensate for the loss of these constraints and/or give some reward for still satisfying feasibility. To formalize this idea, we first introduce the standard inner product on \(\mathbb {R}^{n}\) given by

$$\displaystyle \begin{aligned} {\mathbf{x}}^{\intercal }\mathbf{y:}=\sum\nolimits_{i=1}^{n}x_{i}y_{i}. {} \end{aligned} $$
(6)

The simplest way to penalize the deleted constraints is to consider a linear penalty function and introduce for some penalty parameter λ \(\in \mathbb {R}^{m}\) the problem

$$\displaystyle \begin{aligned} \inf \{f(\mathbf{x})+\boldsymbol{\lambda }^{\intercal }\mathbf{g}(\mathbf{x}): \mathbf{x}\in X\}. {} \end{aligned} $$
(7)

Clearly it follows that \(\mathcal {F}\subseteq X\) and to guarantee that

$$\displaystyle \begin{aligned} f(\mathbf{x})+\boldsymbol{\lambda }^{\intercal }\mathbf{g}(\mathbf{x})\leq f( \mathbf{x}) {} \end{aligned} $$
(8)

for every x belonging to \(\mathcal {F}\), it is necessary and sufficient to assume that \(\boldsymbol {\lambda }^{\intercal }\mathbf {g}(\mathbf {x} )\leq 0\) for every x belonging to \(\mathcal {F}\). Since we are dealing with minimization problems, a reward means that \(\boldsymbol {\lambda } ^{\intercal }\mathbf {g}(\mathbf {x})\leq 0\) for every feasible x. Since g(x) ≼K 0 for every \(\mathbf {x} \in \mathcal {F}\), introducing the so-called dual cone K∗ of a set \(K\subseteq \mathbb {R}^{m}\) given by

$$\displaystyle \begin{aligned} K^{\ast }:=\{\mathbf{y}\in \mathbb{R}^{m}:{\mathbf{y}}^{\intercal }\mathbf{x} \geq 0~\text{ for}\ \text{every}\ \mathbf{x}\in K\}, {} \end{aligned} $$
(9)

this inequality certainly holds for every λ belonging to K∗. Also by the definition of a dual cone, the set K∗ is a nonempty closed convex cone. A similar operation for the special case of a linear space L \(\subseteq \mathbb {R}^{m}\) is to introduce its orthogonal complement \(L^{\perp }:=\{\mathbf {y}\in \mathbb {R}^{m}:{\mathbf {y}}^{\intercal }\mathbf {x}=0\) for every x ∈ L}. It is easy to show for a linear space L that

$$\displaystyle \begin{aligned} L^{\ast }=L^{\perp }. {} \end{aligned} $$
(10)
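Membership in the dual cone of relation (9) can be illustrated numerically. The sketch below checks the defining inequality against a finite sample of the cone, which with finitely many sample points is only a necessary condition for membership; the sample of \(K=\mathbb{R}_{+}^{2}\) (whose dual cone is again \(\mathbb{R}_{+}^{2}\)) is our own illustrative choice:

```python
# Sampling-based check of dual-cone membership per definition (9):
# y belongs to K* iff y.x >= 0 for every x in K.  With a finite sample
# of K this is only a necessary condition, but it illustrates the idea.
def in_dual_cone(y, cone_sample):
    return all(sum(yi * xi for yi, xi in zip(y, x)) >= 0 for x in cone_sample)

# Sample points of the nonnegative orthant K = R^2_+ .
K_sample = [(1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (0.5, 0.5)]
assert in_dual_cone((1.0, 1.0), K_sample)       # (1,1) lies in K* = R^2_+
assert not in_dual_cone((1.0, -1.0), K_sample)  # fails on x = (0,1)
```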

At the moment, except for finiteness of the functions f and g on their domain, we do not impose any other conditions on these functions, and we formally introduce the minimization problem:

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda }):=\inf \{L(\mathbf{x},\boldsymbol{\lambda }):\mathbf{x} \in X\} {} \end{aligned} $$
(Dλ )

with \(L:X\times K^{\ast }\rightarrow \mathbb {R}\) the so-called Lagrangian function given by

$$\displaystyle \begin{aligned} L(\mathbf{x},\boldsymbol{\lambda }):=f(\mathbf{x})+\boldsymbol{\lambda }^{\intercal } \mathbf{g}(\mathbf{x}). {} \end{aligned} $$
(11)

Since the set X is nonempty, it follows that −∞ ≤ θ(λ) < ∞. The objective function θ : K∗→ [−∞, ∞) is called the Lagrangian dual function and the vector λ the vector of Lagrangian multipliers or dual variables. To extend the definition of the function θ to \(\mathbb {R}^{m}\), we set θ(λ) = −∞ for every λ not belonging to K∗. By the construction of the Lagrangian dual function, the following result is easy to verify. Implicitly it is always assumed that solving optimization problem (D(λ)) for any λ is much easier than solving the original problem (P).

Lemma 2.1

The function \(\theta :\mathbb {R}^{m}\rightarrow \lbrack -\infty ,\infty )\) is concave and satisfies θ(λ) ≤ υ( P ) for every \(\boldsymbol {\lambda }\in \mathbb {R}^{m}\).

However, even for a primal problem having an optimal solution, it might happen that the optimization problem (D(λ)) does not have an optimal solution for any λ ∈ K∗ and so θ(λ) = −∞ for every \(\boldsymbol {\lambda } \in \mathbb {R}^{m}\). This is shown by the following example. If this happens, the convex set \(dom(\theta )=\{\boldsymbol {\lambda }\in \mathbb {R}^{m}:\theta (\boldsymbol { \lambda })>-\infty \}\) is empty and solving optimization problem (D(λ)) for some given λ does not give any information about the value υ(P).

Example 2.1

Consider the function \(f:\mathbb {R}\rightarrow \mathbb {R}\) given by f(x) = −x 2 if x < 0 and f(x) = x if x ≥ 0 and consider the optimization problem:

$$\displaystyle \begin{aligned} \inf\{f(x):0\leq x<\infty\}=\inf\{f(x):x\preceq_{K}0\} \end{aligned}$$

with \(K=\mathbb {R}_{-}\) and \(X=\mathbb {R}\). This problem has optimal solution x = 0, and the dual cone K is given by \(K^{\ast }=\mathbb {R}_{-}\). Constructing now the Lagrangian function, we obtain for λ ≤ 0 the Lagrangian dual function:

$$\displaystyle \begin{aligned} \theta (\lambda )=\inf \{f(x)+\lambda x:x\in \mathbb{R}\}. \end{aligned}$$

Since \(\lim _{x\rightarrow -\infty }(f(x)+\lambda x)=-\infty \) for every λ ≤ 0, it follows that θ(λ) = −∞ for every λ ≤ 0.
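The unboundedness of the Lagrangian in Example 2.1 is easy to observe numerically; the following minimal Python sketch evaluates f(x) + λx along a sequence x → −∞ for a sample multiplier λ = −0.5:

```python
# Numerical illustration of Example 2.1: f(x) = -x^2 for x < 0 and
# f(x) = x for x >= 0.  For any lambda <= 0 the Lagrangian f(x) + lambda*x
# is unbounded below as x -> -infinity, hence theta(lambda) = -infinity.
def f(x):
    return -x * x if x < 0 else x

def lagrangian(x, lam):
    return f(x) + lam * x

lam = -0.5
values = [lagrangian(-10.0 ** k, lam) for k in range(1, 6)]
# The Lagrangian decreases without bound along x = -10, -100, ...
assert all(v2 < v1 for v1, v2 in zip(values, values[1:]))
assert values[-1] < -1e9
```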

We now discuss how much the Lagrangian relaxation optimization problem (D(λ)) resembles the original problem ( P ). A similar result for an optimization problem consisting only of equality constraints or equivalently K = {0} is discussed by Everett (1963).

Lemma 2.2

If x opt(λ) is an optimal solution of optimization problem:

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda })=\inf \{f(\mathbf{x})+\boldsymbol{\lambda }^{\intercal }\mathbf{g}(\mathbf{x}):\mathbf{x}\in X\} \end{aligned}$$

for some λ ∈ K with K a convex pointed cone, then x opt(λ) is an optimal solution of the perturbed primal problem:

$$\displaystyle \begin{aligned} \inf \{f(\mathbf{x}):\mathbf{g}(\mathbf{x})\preceq _{K}\mathbf{g}(\mathbf{x} _{opt}(\boldsymbol{\lambda })),\mathbf{x}\in X\}. \end{aligned}$$

Proof

Let x opt(λ) be an optimal solution of \(\inf \{f( \mathbf {x})+\boldsymbol {\lambda }^{\intercal }\mathbf {g}(\mathbf {x}):\mathbf {x} \in X\}\) for some λ ∈ K∗ and consider an arbitrary x ∈ X satisfying g(x) ≼K g(x opt(λ)). We show that x opt(λ) is feasible for the perturbed primal problem and f(x opt(λ)) ≤ f(x). Since the cone is pointed and hence 0 ∈ K, the feasibility is obvious. To show f(x opt(λ)) ≤ f(x) for every x ∈ X with g(x) ≼K g(x opt(λ)), we observe for λ ∈ K∗ that \(\boldsymbol {\lambda } ^{\intercal }(\mathbf {g}({\mathbf {x}}_{opt}(\boldsymbol {\lambda }))-\mathbf {g}( \mathbf {x}))\geq 0\). This yields for λ ∈ K∗ and x opt(λ) an optimal solution of the minimization problem (D(λ)) that

$$\displaystyle \begin{aligned} \begin{array}{lllll} f({\mathbf{x}}_{opt}(\boldsymbol{\lambda })) & = & L({\mathbf{x}}_{opt}(\boldsymbol{\lambda }),\boldsymbol{\lambda })-\boldsymbol{\lambda }^{\intercal }\mathbf{g}( {\mathbf{x}}_{opt}(\boldsymbol{\lambda })) & \leq & L(\mathbf{x},\boldsymbol{\lambda } )-\boldsymbol{\lambda }^{\intercal }\mathbf{g}({\mathbf{x}}_{opt}(\boldsymbol{\lambda } )) \\ & = & f(\mathbf{x})+\boldsymbol{\lambda }^{\intercal }(\mathbf{g}(\mathbf{x})- \mathbf{g}({\mathbf{x}}_{opt}(\boldsymbol{\lambda }))) & \leq & f(\mathbf{x}) \end{array} \end{aligned}$$

and we have verified the result. □

To approximate the objective value υ( P ) of a primal problem ( P ) as well as possible by means of the relaxation approach, it is therefore by Lemma 2.1 necessary (though maybe not sufficient!) to maximize the dual function θ(λ) over the set K∗. This motivates the following definition.

Definition 2.4

For a primal problem ( P ), the optimization problem (D) given by

$$\displaystyle \begin{aligned} \upsilon (D)=\sup \{\theta (\boldsymbol{\lambda }):\boldsymbol{\lambda }\in K^{\ast }\} {} \end{aligned} $$
(D)

is called the Lagrangian dual of the primal problem ( P ).

An immediate consequence of Lemma 2.1 is given by the next result, known as weak duality.

Lemma 2.3

It follows that υ(D) ≤ υ(P) ≤ ∞.

In the next example we use the principle of Lagrange to construct the Lagrangian dual problem of a canonical linear programming problem as proposed on page 237 of Bazaraa, Sherali and Shetty (1993). Different books define canonical linear programming problems differently, and one can derive similar results for those linear programs.

Example 2.2

Consider the canonical linear programming problem (Bazaraa, Sherali and Shetty, 1993):

$$\displaystyle \begin{aligned} \upsilon (P):=\inf \{{\mathbf{c}}^{\intercal }\mathbf{x}:A\mathbf{x\geq b}, \mathbf{x}\geq \mathbf{0\}} \end{aligned}$$

with A some m × n matrix, \(\mathbf {c}\in \mathbb {R}^{n}\), and \( \mathbf {b}\in \mathbb {R}^{m}.\) It follows that

$$\displaystyle \begin{aligned} \upsilon (P)=\inf \{{\mathbf{c}}^{\intercal }\mathbf{x:g}(\mathbf{x})\preceq _{K}\mathbf{0,x}\geq \mathbf{0}\} \end{aligned}$$

with g(x) = b − A x and \(K=\mathbb {R} _{+}^{m}\). Hence by penalizing the constraints b − A x, we obtain for every \(\boldsymbol {\lambda }\in K^{\ast }=\mathbb {R}_{+}^{m}\) that the Lagrangian dual function \(\theta :\mathbb {R}_{+}^{m}\rightarrow \lbrack -\infty ,\infty )\) equals

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda })=\boldsymbol{\lambda }^{\intercal }\mathbf{b+}\inf \{( \mathbf{c-}A^{\intercal }\boldsymbol{\lambda })^{\intercal }\mathbf{x}:\mathbf{x} \geq \mathbf{0}\}=\left\{ \begin{array}{ll} \boldsymbol{\lambda }^{\intercal }\mathbf{b} & \ \quad \text{if } \ \mathbf{c-} A^{\intercal }\boldsymbol{\lambda }\geq \mathbf{0} \\ -\infty & \ \quad \text{ otherwise}\ \end{array} \right. \end{aligned}$$

This yields that the Lagrangian dual problem ( D ) is given by

$$\displaystyle \begin{aligned} \sup \{\theta (\boldsymbol{\lambda }):{\boldsymbol{\lambda }}\in \mathbb{R}^{m}\}=\sup \{{\boldsymbol{\lambda }}^{\intercal }\mathbf{b}:A^{\intercal }{\boldsymbol{\lambda }} \leq \mathbf{c},\boldsymbol{\lambda} \geq \boldsymbol{0}\}, \end{aligned}$$

and this coincides with the so-called dual linear program of the above canonical linear programming problem, mostly introduced without any explanation. To generalize this construction to conic convex problems, we first observe that for X = K

$$\displaystyle \begin{aligned} \inf \{(\mathbf{c}-\boldsymbol{\lambda })^{\top }\mathbf{x}:\mathbf{x}\in X\}= \left\{ \begin{array}{ll} 0 & \ \quad \text{ if }~\mathbf{c}-\boldsymbol{\lambda }\in K^{\ast } \\ -\infty & \ \quad \text{ otherwise} \end{array} \right.{} \end{aligned} $$
(12)

Introducing the vector function g(x) = b −x and using that any linear space is a convex cone and L∗ = L⊥, it follows that

$$\displaystyle \begin{aligned} \inf \{{\mathbf{c}}^{\top }\mathbf{x}:\mathbf{x-b}\in L,\mathbf{x}\in K\}=\inf \{{\mathbf{c}}^{\top }\mathbf{x:g(x})\preceq _{L}\mathbf{0,x}\in K\}. \end{aligned}$$

By a similar reasoning as in the first part of this example, using relation ( 12) we obtain that the Lagrangian dual of the above conic convex programming problem is given by \(\sup \{{\boldsymbol {\lambda }} ^{\top }\mathbf {b}:\mathbf {c-\lambda }\in K^{\ast },{\boldsymbol {\lambda }}\in L^{\perp }\}.\)
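The closed-form dual function derived in Example 2.2 can be tried out on a tiny instance. The following Python sketch uses instance data of our own choosing (c = (1, 1), A the identity, b = (1, 2), with optimal value 3 attained at x = (1, 2)) and maximizes θ over a grid of λ ∈ K∗:

```python
import math
import itertools

# Tiny instance of the canonical LP of Example 2.2:
#   inf { c^T x : A x >= b, x >= 0 }  with  c=(1,1), A=I, b=(1,2);
# the optimum is x=(1,2) with value 3.  The closed form derived above gives
# theta(lambda) = lambda^T b if c - A^T lambda >= 0, and -infinity otherwise.
c = (1.0, 1.0)
A = ((1.0, 0.0), (0.0, 1.0))
b = (1.0, 2.0)

def theta(lam):
    reduced = [c[j] - sum(A[i][j] * lam[i] for i in range(2)) for j in range(2)]
    if all(r >= 0 for r in reduced):
        return sum(l * bi for l, bi in zip(lam, b))
    return -math.inf

# Maximize theta over a grid of lambda in K* = R^2_+ .
grid = [k / 10 for k in range(0, 16)]
v_dual = max(theta((l1, l2)) for l1, l2 in itertools.product(grid, grid))
v_primal = 3.0  # attained at x = (1, 2)
assert v_dual == v_primal  # no duality gap for this LP
```

For this well-behaved LP the grid maximum coincides with the primal value, in line with the strong duality results discussed later for linear programs.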

In Example 2.2 it is shown that the well-known LP-dual of a linear programming problem (Bazaraa, Sherali and Shetty, 1993), mostly introduced without any explanation, can be constructed using the principle of Lagrange. A direct consequence of Lemma 2.3 for standard linear programming problems is the observation that υ(P) = −∞ implies that the dual problem is infeasible. If in Lemma 2.3 we have υ(D) < υ(P), a so-called duality gap exists. As shown by the following example, such a duality gap can even occur for linear programming problems (see also page 60 of Chvatal (1983)).

Example 2.3

Consider the linear programming problem:

$$\displaystyle \begin{aligned} \inf \{-x_{1}-x_{2}:x_{1}-x_{2}\geq 1,-x_{1}+x_{2}\geq 1,\mathbf{x}\in \mathbb{R}_{+}^{2}\}. \end{aligned}$$

Clearly this optimization problem is infeasible and so υ( P ) = ∞. Penalizing the constraints x 1 − x 2 − 1 ≥ 0 and − x 1 + x 2 − 1 ≥ 0 using the nonpositive Lagrangian multipliers λ 1 and λ 2, we obtain that the Lagrangian dual function \(\theta :\mathbb {R}_{-}^{2}\rightarrow \lbrack -\infty ,\infty )\) is given by

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=\inf \{x_{1}(\lambda _{1}-\lambda _{2}-1)+x_{2}(\lambda _{2}-\lambda _{1}-1):\mathbf{x}\in \mathbb{R}_{+}^{2}\}-\lambda _{1}-\lambda _{2}. \end{aligned}$$

Observe now for every \({\boldsymbol {\lambda }}\in \mathbb {R}_{-}^{2}\) that λ 1 − λ 2 − 1 ≥ 0 implies λ 2 − λ 1 − 1 ≤−2 and λ 2 − λ 1 − 1 ≥ 0 implies λ 1 − λ 2 − 1 ≤−2, so the two coefficients of x 1 and x 2 can never be nonnegative simultaneously. By this observation it follows that θ(λ) = −∞ for every \({\boldsymbol {\lambda }}\in \mathbb {R} _{-}^{2}\), or equivalently υ( D ) = −∞.
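Example 2.3 can also be checked numerically. Writing out the Lagrangian \(-x_1-x_2+\lambda_1(x_1-x_2-1)+\lambda_2(-x_1+x_2-1)\), the sketch below evaluates the resulting dual function on a sample grid of nonpositive multipliers:

```python
import math

# Example 2.3 numerically: collecting terms in the Lagrangian gives
#   theta(l1, l2) = inf over x >= 0 of
#     x1*(l1 - l2 - 1) + x2*(l2 - l1 - 1)  - l1 - l2,
# which is finite only if both coefficients are nonnegative -- impossible,
# since the coefficients sum to -2.  So theta is -infinity on all of R^2_- .
def theta(l1, l2):
    c1, c2 = l1 - l2 - 1, l2 - l1 - 1
    if c1 >= 0 and c2 >= 0:
        return -l1 - l2
    return -math.inf

samples = [(-a / 4, -b / 4) for a in range(0, 9) for b in range(0, 9)]
assert all(theta(l1, l2) == -math.inf for l1, l2 in samples)
```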

By Example 2.3, it is clear that one should be careful in applying the principle of Lagrange to a primal problem with an empty feasible region. Fortunately, in a lot of practical problems one can decide beforehand that the feasible region \(\mathcal {F}\) is nonempty, and so we only consider primal problems satisfying −∞ ≤ υ( P ) < ∞. By Lemma 2.3, we obtain that

$$\displaystyle \begin{aligned} \upsilon (\mbox{{{$P$}}})=-\infty \Rightarrow \upsilon (\mbox{{{P}}})=\upsilon (\mbox{{{$D$}}}) {} \end{aligned} $$
(13)

and so in this case no duality gap exists and every λ ∈ K∗ is an optimal dual solution. By relation (13) and Example 2.3, we therefore only need to check whether a duality gap exists for finite υ(P). For the class of feasible primal problems satisfying υ( P ) > −∞, it is now important to decide under which conditions

$$\displaystyle \begin{aligned} \upsilon (\mbox{{{$D$}}})=\upsilon (\mbox{{{$P$}}}). \end{aligned} $$
(14)

An important issue in optimization theory is to determine the largest possible class satisfying the above equality. By identifying such a class, we are able in theory to solve an optimization problem belonging to this class using the dual approach or a combination of the dual and primal approach. The most well-known example of such an approach is the primal–dual simplex method in linear programming (Chvatal, 1983). We return to this identification problem in the next section. Recall that in Example 2.1 we constructed a primal optimization problem not having the property listed in relation (14). In that example an optimal solution of the primal optimization problem exists, υ(P) > υ(D) = −∞, and the Lagrangian dual problem does not have any optimal solution. As shown in the next section, this never happens for linear programming problems or K-convex optimization problems satisfying a so-called regularity condition.

Despite the negative result shown by Example 2.1, the Lagrangian relaxation approach is often used. This applies in particular to discrete optimization problems, since for these problems strong duality mostly fails. During the search for a bound on the objective value υ( P ), the optimization problem (D(λ)) is solved a number of times for different values λ i ∈ K∗, i = 1, …, k, and we obtain the optimal solutions x opt(λ i), 1 ≤ i ≤ k, of these relaxed problems. Sometimes such a solution x opt(λ i) might also be primal feasible, or equivalently g(x opt(λ i)) ≼K 0. However, in most cases this does not happen and so x opt(λ i) is not primal feasible. If x opt(λ i) is not primal feasible, it is sometimes possible to construct, using the primal nonfeasible x opt(λ i), i = 1, …, k, a primal feasible x, and we obtain a feasible candidate for our primal problem. The heuristic procedure that converts a primal nonfeasible x opt(λ) into a primal feasible x is called a Lagrangian heuristic. On the other hand, it is also possible to generate for most problems a feasible solution by means of a so-called primal heuristic. Such a primal heuristic is mostly an ad hoc procedure particularly suited to the specific problem, and it does not use dual information as a Lagrangian heuristic does. We consider some examples of this approach in location and transportation problems. One might now wonder whether it is possible to say something about the quality of the primal feasible solution generated by a Lagrangian heuristic with respect to the optimal objective value of the primal problem. This motivates the next definition.

Definition 2.5

For any 𝜖 > 0, a vector \(\mathbf { x}\in \mathbb {R}^{n}\) is called an 𝜖-optimal solution of the optimization problem \(\upsilon (P)=\inf \{f(\mathbf {x}):\mathbf {x}\in \mathcal {F}\}\) if f(x) ≤ υ(P) + 𝜖 and \(\mathbf { x\in }\mathcal {F}.\)

The next result discusses the quality of a primal feasible 𝜖 -optimal solution of the optimization problem (D(λ)). The assumption of the next lemma automatically implies that υ(P) is finite.

Lemma 2.4

If for some λ ∈ K there exists a primal feasible x 0 with x 0 an 𝜖 -optimal solution of optimization problem:

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda })=\inf \{f(\mathbf{x})+{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x}):\mathbf{x}\in X\}, \end{aligned}$$

then

$$\displaystyle \begin{aligned} \upsilon (\mathit{\mbox{{{$P$}}}})\leq f({\mathbf{x}}_{0})\leq \upsilon (\mathit{\mbox{{{P}}}})+\epsilon -{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x} _{0})<\infty . \end{aligned}$$

Proof

Since x 0 is primal feasible, we obtain by the weak duality result given by Lemma 2.3 that

$$\displaystyle \begin{aligned} \begin{array}{lll} \upsilon (\mbox{{{$P$}}}) & \leq & f({\mathbf{x}}_{0})=L({\mathbf{x}}_{0}, {\boldsymbol{\lambda }})-{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x} _{0}) \\ & \leq & \theta ({\boldsymbol{\lambda }})+\epsilon -{\boldsymbol{\lambda }}^{\intercal } \mathbf{g}({\mathbf{x}}_{0})\leq \upsilon (\mbox{{{$D$}}})+\epsilon - {\boldsymbol{\lambda }}^{\intercal }\mathbf{g}({\mathbf{x}}_{0}) \\ & \leq & \upsilon (\mbox{{{$P$}}})+\epsilon -{\boldsymbol{\lambda }} ^{\intercal }\mathbf{g}({\mathbf{x}}_{0}) \end{array} \end{aligned}$$

and this shows the result. □

The next result is partly an easy implication of Lemma 2.4.

Lemma 2.5

The following conditions are equivalent:

  1.
    The vector x 0 is primal feasible (g(x 0) ≼K 0), λ 0 is dual feasible (λ 0 ∈ K∗), and f(x 0) = θ(λ 0).

  2.
    The vector x 0 is an optimal solution of optimization problem ( P ), the Lagrangian multiplier λ 0 ∈ K∗ is an optimal solution of the Lagrangian dual problem ( D ), and υ( P ) = υ( D ).

  3.
    The primal feasible solution x 0 is an optimal solution of (D(λ 0)) with λ 0 ∈ K∗ and θ(λ 0) finite and \({\boldsymbol {\lambda }} _{0}^{\intercal }\mathbf {g}({\mathbf {x}}_{0})=0\).

Proof

To prove the implication 1 → 3, we observe

$$\displaystyle \begin{aligned} f({\mathbf{x}}_{0})=\theta ({\boldsymbol{\lambda }}_{0})\leq f({\mathbf{x}}_{0})+ {\boldsymbol{\lambda }}_{0}^{\intercal }\mathbf{g}({\mathbf{x}}_{0}) {} \end{aligned} $$
(15)

and this shows \({\boldsymbol {\lambda }}_{0}^{\intercal }\mathbf {g}(\mathbf {x} _{0})\geq 0\). Since g(x 0) ≼K 0 and λ 0 ∈ K∗, this yields \({\boldsymbol {\lambda }} _{0}^{\intercal }\mathbf {g}({\mathbf {x}}_{0})=0\). By relation (15), we obtain \(f({\mathbf {x}}_{0})+{\boldsymbol {\lambda }}_{0}^{\intercal }\mathbf {g}(\mathbf {x} _{0})=\theta ({\boldsymbol {\lambda }} _{0})\) and so x 0 is an optimal solution of problem (D(λ 0)), showing part 3. The implication 3 → 2 is a direct consequence of Lemma 2.4, while the implication 2 → 1 is trivial.

The condition \({\boldsymbol {\lambda }}_{0}^{\intercal }\mathbf {g}(\mathbf {x} _{0})=0 \) mentioned in part 3 of Lemma 2.5 is called the complementary slackness condition. Observe that if K = {0}, this complementary slackness condition is automatically satisfied for x 0 primal feasible. Unfortunately, in most integer linear programming problems there exists a duality gap unless the matrix describing the feasible region has a special structure like total unimodularity (Nemhauser and Wolsey, 1988), and so in general condition 3 in Lemma 2.5 does not hold. Observe that we are interested in condition 3 since our procedure is to solve optimization problems (D(λ)) for different values of λ ∈ K∗. If during this procedure we find a vector x 0 satisfying condition 3 of Lemma 2.5, then this vector x 0 is an optimal solution of our problem ( P ). In Lemma 2.5 we assume that no duality gap exists, and so it is important to know for which optimization problems such a result holds. This is discussed in the next section.
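The conditions of Lemma 2.5, including complementary slackness, can be verified mechanically on a concrete candidate pair. The sketch below does so for a tiny instance of the canonical LP of Example 2.2, with instance data and candidate pair of our own choosing:

```python
# Checking the equivalent conditions of Lemma 2.5 on a tiny canonical LP
# (data is ours, not from the text): c=(1,1), A=I, b=(1,2), K = R^2_+,
# g(x) = b - A x.  Candidate pair: x0 = (1, 2), lambda0 = (1, 1).
x0 = (1.0, 2.0)
lam0 = (1.0, 1.0)
c, b = (1.0, 1.0), (1.0, 2.0)

g_x0 = [b[i] - x0[i] for i in range(2)]          # A = identity, so g = b - x
f_x0 = sum(ci * xi for ci, xi in zip(c, x0))     # f(x0) = c^T x0 = 3
slack = sum(l * g for l, g in zip(lam0, g_x0))   # lambda0^T g(x0)

assert all(gi <= 0 for gi in g_x0)  # x0 primal feasible: g(x0) <= 0
assert all(l >= 0 for l in lam0)    # lambda0 dual feasible: lambda0 in K*
assert slack == 0.0                 # complementary slackness holds
# theta(lambda0) = lambda0^T b = 3 = f(x0), so condition 1 also holds.
assert f_x0 == sum(l * bi for l, bi in zip(lam0, b))
```

By Lemma 2.5 this certifies that x0 is optimal for this instance and that no duality gap exists for it.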

2.1 K-Convex Minimization Problems and Strong Duality

In this section we consider a class of optimization problems ( P ) for which, under a so-called regularity condition on the feasible region, no duality gap exists. To identify this class of optimization problems, we introduce the following class of vector valued functions (Giannessi, 1984; Wolkowicz, 1981). Observe that in this definition we do not assume that K is a convex pointed cone.

Definition 2.6

Let \(X\subseteq \mathbb {R}^{n}\) be a nonempty convex set and \(K\subseteq \mathbb {R}^{m}\) a nonempty convex cone. The vector valued function \(\mathbf {g}:X\rightarrow \mathbb {R}^{m}\) is called K-convex if and only if

$$\displaystyle \begin{aligned} \mathbf{g}(\alpha {\mathbf{x}}_{1}+(1-\alpha ){\mathbf{x}}_{2})\preceq _{K}\alpha \mathbf{g}({\mathbf{x}}_{1})+(1-\alpha )\mathbf{g}({\mathbf{x}}_{2}) {} \end{aligned} $$
(16)

for every 0 < α < 1 and x 1, x 2 ∈ X, with ≼K the transitive ordering introduced in relation (2).
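Definition 2.6 can be checked numerically in the special case \(K=\mathbb {R}_{+}^{m}\), where ≼K is the componentwise ordering. The following sketch (with made-up maps `g_affine` and `g_convex`, added here purely for illustration) verifies relation (16) on random points: an affine map satisfies it with equality, and a componentwise-convex map satisfies the componentwise inequality.

```python
import numpy as np

# Sanity check of relation (16) for K = R^m_+, where u <=_K v means the
# componentwise inequality u <= v. Both maps below are hypothetical examples.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(3, 2)), rng.normal(size=3)

def g_affine(x):
    # affine maps are K-convex for any convex pointed cone K (equality in (16))
    return A @ x + b

def g_convex(x):
    # each component is a convex function of x, so g is R^m_+-convex
    return np.array([x @ x, np.exp(x[0]), abs(x[1])])

for g in (g_affine, g_convex):
    for _ in range(100):
        x1, x2 = rng.normal(size=2), rng.normal(size=2)
        alpha = rng.uniform(0, 1)
        lhs = g(alpha * x1 + (1 - alpha) * x2)
        rhs = alpha * g(x1) + (1 - alpha) * g(x2)
        assert np.all(lhs <= rhs + 1e-12)  # relation (16) componentwise
print("relation (16) holds componentwise for both maps")
```

For a general cone K the same test would instead check that the difference of the right-hand and left-hand sides belongs to K.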

If \(K=\mathbb {R}_{+}^{m}\) (and so a convex pointed cone), then K-convexity reduces to the classical definition of convexity. Also it is easy to check that any affine function g(x) = A x + b is K-convex for any nonempty convex pointed cone K. To relate the above definition of a K-convex function for \(K\subseteq \mathbb {R}^{m}\) a convex pointed cone to the modern definition of a convex function (Rockafellar, 1970), we introduce for any convex pointed cone \( K\subseteq \mathbb {R}^{m}\) and any vector function \(\mathbf {g}:\mathbb {R}^{n}\rightarrow (\mathbb {R}\cup \{\infty \})^{m}\) the so-called epigraph epi K(g) given by

$$\displaystyle \begin{aligned} epi_{K}(\mathbf{g}):=\{(\mathbf{x},\mathbf{r}):\mathbf{g}(\mathbf{x})\preceq _{K} \mathbf{r}\}\subseteq \mathbb{R}^{n+m}. \end{aligned}$$

Since K is a convex pointed cone, we know that the ordering ≼K is a partial ordering. It is shown in (Frenk and Kassay, 2007) for K a convex pointed cone and \(\mathbf {h}:X\rightarrow \mathbb {R}^{m}\) some given vector function on a nonempty domain \(X\subseteq \mathbb {R}^{n}\) that the set epi K(h) is a convex set if and only if X is a convex set and \(\mathbf {h}:X\rightarrow \mathbb {R}^{m}\) is K-convex and so for K a convex pointed cone the old definition of convexity coincides with the modern definition of convexity.

Definition 2.7

The optimization problem \(\inf \{f(\mathbf {x}):\mathbf {g(x} )\preceq _{K}\mathbf {0,x}\in X\}\) with K a proper convex cone and nonempty feasible region \(\mathcal {F}\) is called a K-convex minimization problem if the set \(X\subseteq \mathbb {R}^{n}\) is convex, the function \( f:X\rightarrow \mathbb {R}\) is convex, and the function \(\mathbf {g} :X\rightarrow \mathbb {R}^{m}\) is K-convex.

In the above definition we could also have assumed that K is an arbitrary convex cone. However, using this alternative definition we only added the case \(K=\mathbb {R}^{m}\), and since in this case K∗ = {0}, the Lagrangian dual problem reduces to the original problem. At the same time, we could have imposed that the cone K is pointed (mostly assumed for K-convex optimization problems), but this condition only implies that the ordering ≼K is a partial ordering and this property is not necessary for achieving strong duality for K-convex optimization problems. Since by relation (16) the function x↦ λᵀg(x) is convex for λ ∈ K∗ and any K-convex function \(\mathbf {g}:X\rightarrow \mathbb {R}^{m}\), it follows for any K-convex minimization problem that the optimization problem (D(λ)) is a convex minimization problem for every λ ∈ K∗. To show that for the class of K-convex minimization problems satisfying some additional constraint qualification the strong duality property holds, we need the following definition.

Definition 2.8

  1.

    If \(S\subseteq \mathbb {R}^{n}\) is nonempty, then the set aff(S) is the smallest affine set containing S. This set is called the affine hull of S.

  2.

    If \(S\subseteq \mathbb {R}^{n}\) and y ∈ S, then the point y is called a relative interior point of the set S if there exists some 𝜖 > 0 such that the intersection of aff(S) and the set \(\mathbf {y}+\epsilon \mathcal {B}\) with \(\mathcal {B}\) the closed Euclidean ball given by

    $$\displaystyle \begin{aligned} \mathcal{B}:=\{\mathbf{x}\in \mathbb{R}^{n}:\parallel \mathbf{x\parallel } \leq 1\} \end{aligned}$$

    and ∥⋅∥ the Euclidean norm, is contained in S.

  3.

    The relative interior ri(S) of a set \(S\subseteq \mathbb {R}^{n}\) is the collection of all relative interior points of the set S.
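As a small example (added here for illustration, not from the text) of why the relative interior rather than the interior is the appropriate notion: a line segment embedded in the plane has empty interior, but its relative interior, taken inside its affine hull, is nonempty.

```latex
% Example: for the segment S = {(x,0) : 0 <= x <= 1} in R^2 we have
%   aff(S) = R x {0},  int(S) = empty,  ri(S) = {(x,0) : 0 < x < 1},
% since inside aff(S) every point (x,0) with 0 < x < 1 satisfies
% aff(S) \cap ((x,0) + eps*B) \subseteq S for eps small enough.
S=\{(x,0):0\le x\le 1\}\subseteq\mathbb{R}^{2},\qquad
\operatorname{aff}(S)=\mathbb{R}\times\{0\},\qquad
\operatorname{int}(S)=\emptyset,\qquad
ri(S)=\{(x,0):0<x<1\}.
```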

By the generalization of inequalities to cones, we needed the concept of K-convex functions. However, as for classical convex programming problems with equality constraints, the property of K-convexity is stronger than what is needed to prove that no duality gap exists for K-convex optimization problems. What we need for this proof is the following implication of K-convexity, which is weaker than K-convexity itself.

Lemma 2.6

If \(K\subseteq \mathbb {R}^{m}\) is a nonempty convex cone and the function \(\mathbf {g}:X\rightarrow \mathbb {R}^{m}\) is K-convex, then the set g(X) + ri(K) is nonempty and convex.

Proof

Since the set \(K\subseteq \mathbb {R}^{m}\) is a nonempty convex cone, it is well-known (Frenk and Kassay, 2005; Rockafellar, 1970) that ri(K) is also a nonempty convex cone and so the set g(X) + ri(K) is nonempty. To check that the set g(X) + ri(K) is convex, we need to verify for any x 1, x 2 belonging to X and 0 < α < 1 that

$$\displaystyle \begin{aligned} \alpha (\mathbf{g}({\mathbf{x}}_{1})+ri(K))+(1-\alpha )(\mathbf{g}(\mathbf{x} _{2})+ri(K))\subseteq \mathbf{g}(X)+ri(K). \end{aligned}$$

By the convexity of the set ri(K), it follows that αri(K) + (1 − α)ri(K) ⊆ ri(K) and this implies that

$$\displaystyle \begin{aligned} \alpha (\mathbf{g}({\mathbf{x}}_{1})+ri(K))+(1-\alpha )(\mathbf{g}(\mathbf{x} _{2})+ri(K))\subseteq \alpha \mathbf{g}({\mathbf{x}}_{1})+(1-\alpha )\mathbf{g} ({\mathbf{x}}_{2})+ri(K). \end{aligned}$$

By the K-convexity of the function g and relation (2), we know that

$$\displaystyle \begin{aligned} \alpha \mathbf{g}({\mathbf{x}}_{1})+(1-\alpha )\mathbf{g}({\mathbf{x}}_{2})\in \mathbf{g}(\alpha {\mathbf{x}}_{1}+(1-\alpha ){\mathbf{x}}_{2})+K {}. \end{aligned} $$
(17)

Since it is well-known that K + ri(K) ⊆ ri(K) (Frenk and Kassay, 2005; Rockafellar, 1970) and the set X is convex, we obtain by relation (17) that

$$\displaystyle \begin{aligned} \begin{array}{lll} \alpha (\mathbf{g}({\mathbf{x}}_{1})+ri(K))+(1-\alpha )(\mathbf{g}(\mathbf{x} _{2})+ri(K)) & \subseteq & \mathbf{g}(X)+K+ri(K) \\ & \subseteq & \mathbf{g}(X)+ri(K) \end{array} \end{aligned}$$

and we have verified the desired result. __

We now need a separation result to prove that under some regularity condition no duality gap exists for the class of K-convex minimization problems. However, we did not discuss what we mean by separation and so we list the following definition (Frenk and Kassay, 2005; Rockafellar, 1970).

Definition 2.9

If \(C\subseteq \mathbb {R}^{n}\) is a nonempty set and \(\mathbf {0}\in \mathbb {R}^{n}\) does not belong to C, then the sets C and {0} are said to be properly separated if there exists some vector σ belonging to \(\mathbb {R}^{n}\) satisfying

$$\displaystyle \begin{aligned} \inf \{\boldsymbol{\sigma }^{\intercal }\mathbf{x}:\mathbf{x}\in C\}\geq 0 \end{aligned}$$

and for some x belonging to C it follows that \(\boldsymbol {\sigma } ^{\intercal }\mathbf {x}>0.\) The vector σ is called the normal vector of the separating hyperplane between the sets C and {0}.

By Definition 2.9, it follows that σ is not equal to the zero vector. The next theorem mentions an important separation result for an arbitrary proper convex set (Frenk and Kassay, 2005; Hiriart-Urruty and Lemaréchal, 2013; Rockafellar, 1970).

Theorem 2.1

If \(C\subseteq \mathbb {R}^{n}\) is a nonempty convex set and \( \mathbf {0}\in \mathbb {R}^{n}\) does not belong to C, then the sets C and {0} can be properly separated. Moreover, the normal vector σ of the separating hyperplane can be chosen to belong to the affine hull aff(C) of the set C.

By the above separation result, it is possible to show that any K-convex minimization problem satisfying an additional constraint qualification has no duality gap.

Theorem 2.2

If the K-convex minimization problem satisfies −∞ < υ( P ) < ∞ and the vector 0 belongs to ri(g(X) + K), then the Lagrangian dual ( D ) of the K-convex optimization problem given by \(\sup \{\theta ({\boldsymbol {\lambda }}):{\boldsymbol {\lambda }}\in K^{\ast }\}\) has an optimal solution and the duality gap equals zero.

Proof

Without loss of generality, we may assume that υ( P ) = 0. Since optimization problem ( P ) is a K-convex optimization problem, we obtain that the function \(\mathbf {h}:X\rightarrow \mathbb {R}^{m+1}\) given by

$$\displaystyle \begin{aligned} \mathbf{h}(\mathbf{x})=(f(\mathbf{x}),\mathbf{g}(\mathbf{x})) \end{aligned}$$

is K e-convex with K e = [0, ∞) × K. This implies by Lemma 2.6 that the set h(X) + ri(K e) is convex. If 0 belongs to the convex set h(X) + ri(K e), then there exists some x 0 ∈ X satisfying h(x 0) ∈ − ri(K e). Since ri(K e) = (0, ∞) × ri(K), we obtain f(x 0) < 0 and g(x 0) ≼K 0. This contradicts our assumption υ( P ) = 0 and so 0∉h(X) + ri(K e). Applying now Theorem 2.1, there exists some nonzero vector σ = (μ, λ 0) satisfying for every x ∈ X and k e = (r, k) with r > 0 and k ∈ ri(K)

$$\displaystyle \begin{aligned} \boldsymbol{\sigma }^{\top }(\mathbf{h}(\mathbf{x})+{\mathbf{k}}_{e})=\mu (f( \mathbf{x})+r)+{\boldsymbol{\lambda }}_{0}^{\intercal }(\mathbf{g}(\mathbf{x})+ \mathbf{k})\geq 0 {} \end{aligned} $$
(18)

with a strict inequality for at least one x 0 ∈ X, r 0 > 0 and k 0 ∈ ri(K). Also by Theorem 2.1, the component λ 0 of this vector belongs to the set aff(g(X) + K). We now show that μ > 0. Since ri(K e) is a convex cone, this implies for every α > 0 and k e = (r, k) ∈ ri(K e) that α k e ∈ ri(K e) and so we obtain by relation (18) for x ∈ X

$$\displaystyle \begin{aligned} \boldsymbol{\sigma }^{\top }(\mathbf{h}(\mathbf{x})+\alpha {\mathbf{k}}_{e})\geq 0. {} \end{aligned} $$
(19)

This shows by contradiction for x ∈ X fixed and letting α tend to infinity in relation (19) that σᵀk e ≥ 0 for every k e ∈ ri(K e) and so σ belongs to \((ri(K_{e}))^{\ast }=K_{e}^{\ast }=[0,\infty )\times K^{\ast }.\) Hence it follows that μ ≥ 0, and if we assume that μ = 0, or equivalently σ = (0, λ 0), it follows again by relation (18) that

$$\displaystyle \begin{aligned} {\boldsymbol{\lambda }}_{0}^{\intercal }(\mathbf{g}(\mathbf{x})+\mathbf{k})\geq 0 {} \end{aligned} $$
(20)

for every x ∈ X and k ∈ ri(K). Since by assumption 0 belongs to ri(g(X) + K) and λ 0 ∈ aff(g(X) + K), one can find some 𝜖 > 0 such that \(-\epsilon {\boldsymbol {\lambda }}_{0}\in \mathbf {g}(X)+K\) and this implies by relation (20) that

$$\displaystyle \begin{aligned} -\epsilon {\boldsymbol{\lambda }}_{0}^{\intercal }{\boldsymbol{\lambda }}_{0}=-\epsilon \parallel {\boldsymbol{\lambda }}_{0}\parallel ^{2}\geq 0 \end{aligned}$$

or equivalently λ 0 = 0. Hence the normal vector σ equals the null vector and this contradicts Theorem 2.1. Therefore μ > 0 and applying now relation (19), we obtain for any α > 0, r > 0, x ∈ X and k ∈ ri(K) that

$$\displaystyle \begin{aligned} (f(\mathbf{x})+\alpha r)+\mu ^{-1}{\boldsymbol{\lambda }}_{0}^{\intercal }(\mathbf{ g}(\mathbf{x})+\mathbf{k})\geq 0 {}. \end{aligned} $$
(21)

Since for k ∈ ri(K) it follows that α k ∈ ri(K) for any α > 0, we can replace k in relation (21) by α k and letting α tend to zero we conclude from the same relation that for every x ∈ X

$$\displaystyle \begin{aligned} f(\mathbf{x})+\mu ^{-1}{\boldsymbol{\lambda }}_{0}^{\intercal }\mathbf{g}(\mathbf{x })\geq 0 \end{aligned}$$

with μ −1 λ 0 ∈ K∗. This shows θ(μ −1 λ 0) ≥ 0 and applying the weak duality result yields 0 = υ( D ) and μ −1 λ 0 is an optimal dual solution. __

The additional constraint qualification in Theorem 2.2 is called a Slater-type condition and as shown in the proof of Theorem 2.2 this condition is sufficient to guarantee that an optimal solution of the Lagrangian dual problem ( D ) of a general K-convex minimization problem exists. The constraint qualification is slightly stronger than the feasibility condition since

$$\displaystyle \begin{aligned} \mathcal{F}=\{\mathbf{x}\in X:\mathbf{g}(\mathbf{x})\preceq _{K}\mathbf{0}\}\text{ nonempty } \Leftrightarrow \mathbf{0\in g}(X)+K \end{aligned}$$

and it means that the feasible region has a nonempty relative interior. At the same time, it might happen that the primal problem does not have an optimal solution. Actually the Slater-type condition is a sufficient condition to guarantee the existence of an optimal solution of the dual problem, and this condition guarantees that an optimal solution of the dual problem is contained in a compact subset of K∗. Due to the convexity-type assumptions on the bifunction \(L:X\times K^{\ast }\rightarrow \mathbb {R}\) given by \( L(\mathbf {x},\boldsymbol {\lambda })=f(\mathbf {x})+{\boldsymbol {\lambda }}^{\intercal } \mathbf {g(x})\) as listed in relation (11), the strong duality also follows immediately as a corollary of Sion's minmax theorem, a special case of which was already proven in 1928 by von Neumann (1928) using Brouwer's fixed point theorem. This result was later generalized by Sion (1958), who proved it using the Knaster–Kuratowski–Mazurkiewicz (KKM) lemma (George Xian-Zhi Yuan, 1999). This lemma is equivalent to Brouwer's fixed point theorem. This means that the Lagrangian relaxation approach can be seen as a particular instance of a noncooperative two-person game in which nature plays against the decision maker. The action set of nature is given by K∗, and the action set of the decision maker is given by \(X\subseteq \mathbb {R}^{n}.\) The strong duality result states now that the value of this particular two-person game exists, and Sion's minmax theorem shows under which sufficient conditions on the bifunction L this game has a value. In fact the strong duality result is an immediate consequence of the property that a convex set is connected and so it is a topological result. For more details on how to prove a generalization of Sion's minmax theorem using a more elementary approach by contradiction, Joó's method, and the obvious fact that a convex set is connected, the reader should consult (Frenk and Kassay, 2006).
Connectedness of a set means that any two points in such a set can be connected by a continuous curve contained within this set.

For linear programming problems given by \(\inf \{{\mathbf {c}}^{\top }\mathbf {x}:A\mathbf {x}\leq \mathbf {b},\mathbf {x}\in \mathbb {R}^{n}\}\), one can verify by inspection of the last step of the simplex method (Chvatal, 1983; Saigal, 1997) used to solve the above linear programming problem that the Lagrangian dual of this linear programming problem has an optimal solution and no duality gap exists. This means that for linear programming problems we only need to assume that υ ( P ) is finite and so no Slater-type condition is needed. Next to linear programming, another important class of optimization problems is represented by the function \(d_{cl(K)}:\mathbb {R} ^{n}\rightarrow \mathbb {R}\) with

$$\displaystyle \begin{aligned} \begin{array}{lll} d_{cl(K)}(\mathbf{x}) & := & \inf \{\frac{1}{2}\Vert \mathbf{x}-\mathbf{y} \Vert ^{2}:\mathbf{y}\in cl(K)\} \\ & \; = & \inf \{\frac{1}{2}\Vert \mathbf{x}-\mathbf{y}\Vert ^{2}:-\mathbf{ y\preceq }_{cl(K)}\mathbf{0}\} \end{array} \end{aligned}$$

and \(K\subseteq \mathbb {R}^{n}\) a nonempty proper convex cone, ∥⋅∥ denoting the Euclidean norm, and cl(K) the closure of this convex cone. This optimization problem determines the orthogonal projection of a vector onto a given closed convex cone, and such a problem shows up in the subgradient method. It is easy to see that

$$\displaystyle \begin{aligned} d_{cl(K)}(\mathbf{x})=0\Leftrightarrow \mathbf{x}\in cl(K) {} \end{aligned} $$
(22)

and for any fixed x the above optimization problem is clearly a K-convex programming problem (even convex!) with \(f(\mathbf {y})=\frac {1}{2} \Vert \mathbf {x}-\mathbf {y}\Vert ^{2}\), g(y) = −y, and \(X=\mathbb {R}^{n}.\) It follows trivially that \(\mathbf {g}(\mathbb {R} ^{n})=\mathbb {R}^{n}\) and so the Slater-type condition reduces to \(\mathbf {0} \in ri(\mathbb {R}^{n}+cl(K))=ri(\mathbb {R}^{n})=\mathbb {R}^{n}\), which is automatically satisfied. Since (cl(K))∗ = K∗, applying Theorem 2.2 we obtain that

$$\displaystyle \begin{aligned} d_{cl(K)}(\mathbf{x})=\sup \{\theta ({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\in K^{\ast }\} \end{aligned}$$

with

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=\inf \left\{\frac{1}{2}\Vert \mathbf{x}-\mathbf{y} \Vert ^{2}-{\boldsymbol{\lambda }}^{\top }\mathbf{y:y}\in \mathbb{R}^{n}\right\}. \end{aligned}$$

Since the function \(\mathbf {y}\rightarrow \frac {1}{2}\Vert \mathbf {x}- \mathbf {y}\Vert ^{2}-{\boldsymbol {\lambda }}^{\top }\mathbf {y}\) is strictly convex and differentiable on \(\mathbb {R}^{n}\), an optimal solution for the optimization problem (D(λ)) is obtained by solving the necessary and sufficient first-order conditions. This implies with y opt(λ) denoting the optimal feasible solution of the optimization problem (D(λ)) that

$$\displaystyle \begin{aligned} -(\mathbf{x}-{\mathbf{y}}_{opt}({\boldsymbol{\lambda }}))-{\boldsymbol{\lambda }}=\mathbf{ 0\Leftrightarrow y}_{opt}({\boldsymbol{\lambda }})=\mathbf{x+\lambda } \end{aligned}$$

and so \(\theta ({\boldsymbol {\lambda }})=\frac {1}{2}\Vert {\boldsymbol {\lambda }}\Vert ^{2}-{\boldsymbol {\lambda }}^{\top }(\mathbf {x+\lambda })=-\frac {1}{2}\Vert {\boldsymbol {\lambda }}\Vert ^{2}-{\boldsymbol {\lambda }}^{\top }\mathbf {x.}\) This finally shows

$$\displaystyle \begin{aligned} d_{cl(K)}(\mathbf{x})=\max \left\{-\frac{1}{2}\Vert {\boldsymbol{\lambda }}\Vert ^{2}- {\boldsymbol{\lambda }}^{\top }\mathbf{x:\lambda }\in K^{\ast }\right\}. {} \end{aligned} $$
(23)

and using relation (23) the following important so-called bipolar theorem can be verified.

Theorem 2.3

If \(K\subseteq \mathbb {R}^{n}\) is a nonempty convex cone, then it follows that cl(K) = K ∗∗ with K ∗∗ := (K ).

Proof

If x belongs to K, then by the definition of K∗ it follows for every λ ∈ K∗ that \({\boldsymbol {\lambda }} ^{\intercal }\mathbf {x}\geq 0\) and this shows that x belongs to K∗∗. Hence K ⊆ K∗∗ and since K∗∗ is closed we obtain cl(K) ⊆ K∗∗. To prove the reverse inclusion, let x belong to K∗∗. By the definition of K∗∗, it follows that λᵀx ≥ 0 for every λ ∈ K∗ and this shows \(-\frac {1}{2}\Vert {\boldsymbol {\lambda }}\Vert ^{2}-{\boldsymbol {\lambda }}^{\top }\mathbf {x\leq }-\frac {1 }{2}\Vert {\boldsymbol {\lambda }}\Vert ^{2}\) for every λ ∈ K∗. Since 0 belongs to K∗, this implies by relation (23) that

$$\displaystyle \begin{aligned} 0\leq d_{cl(K)}(\mathbf{x})\leq \max \left\{ -\frac{1}{2}\Vert \mathbf{ \lambda }\Vert ^{2}:{\boldsymbol{\lambda }}\in K^{\ast }\right\} =0 \end{aligned}$$

and hence by relation (22) we obtain x ∈ cl(K). _
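Relation (23) can also be checked numerically in the special case \(K=cl(K)=\mathbb {R}_{+}^{n}\), where \(K^{\ast }=\mathbb {R}_{+}^{n}\) and both sides have closed forms. The following sketch (added here for illustration) confirms that the primal projection value and the dual maximum coincide, i.e., the duality gap is zero:

```python
import numpy as np

# Check of relation (23) for K = cl(K) = R^n_+, so K* = R^n_+. Here
#   d_K(x) = (1/2)||x - max(x,0)||^2 = (1/2)||min(x,0)||^2,
# and the concave dual objective -(1/2)||lambda||^2 - lambda^T x is
# maximized over lambda >= 0 coordinatewise at lambda = max(-x, 0).
rng = np.random.default_rng(1)
for _ in range(100):
    x = rng.normal(size=5)
    primal = 0.5 * np.linalg.norm(np.minimum(x, 0)) ** 2
    lam = np.maximum(-x, 0)                    # dual optimum in K* = R^n_+
    dual = -0.5 * np.linalg.norm(lam) ** 2 - lam @ x
    assert abs(primal - dual) < 1e-12          # zero duality gap, as in Theorem 2.2
print("relation (23) verified for K = R^n_+")
```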

A very special case of the bipolar theorem is the so-called lemma of Farkas known in linear programming, and this result can be used to show (Saigal, 1997) that any linear programming problem with a finite optimal objective function value satisfies the strong duality property. By the above bipolar theorem, it is easy to give a univariate characterization of K -convex functions in case K is a closed convex cone.

Lemma 2.7

For a closed convex cone K, it follows that the function \(\mathbf {x} \rightarrow {\boldsymbol {\lambda }}^{\intercal }\mathbf {h}(\mathbf {x})\) is convex on the convex set X for every λ ∈ K∗ if and only if h is K-convex.

An important implication of Theorem 2.2 and Lemma 2.7 is given by the following result. Observe a strictly convex function is a convex function with the inequalities in the definition replaced by strict inequalities for any strict convex combination of any two distinct vectors.

Lemma 2.8

If the objective function f in the K-convex optimization problem

$$\displaystyle \begin{aligned} \inf\{f(\mathbf{x}):\mathbf{g(x})\preceq_{K}\mathbf{0,x}\in X\mathbf{\}} \end{aligned}$$

is strictly convex and 0 ∈ ri(g(X) + K) and this optimization problem has an optimal solution, then the Lagrangian dual problem has an optimal solution λ opt and the optimal solution x opt(λ opt) of the optimization problem D(λ opt) exists and is unique and coincides with the unique optimal solution of the primal problem.

Proof

Since it is assumed that the primal K-convex optimization problem (P) has an optimal solution and the objective function f is strictly convex, the optimal solution of the problem (P) denoted by x opt is unique. By Theorem 2.2, we know that an optimal solution λ opt ∈ K of the Lagrangian dual problem exists and it satisfies

$$\displaystyle \begin{aligned} f({\mathbf{x}}_{opt})=\upsilon(P)=\upsilon(D)=\theta(\boldsymbol{\lambda}_{opt}). \end{aligned}$$

This shows that condition 1 of Lemma 2.5 is satisfied and again by Lemma 2.5 the primal feasible solution x opt is an optimal solution of (D(λ opt)). Since by Lemma 2.7 we obtain that the function \(\mathbf {x}\rightarrow \boldsymbol {\lambda }_{opt}^{\intercal }\mathbf {g(x)}\) is convex on X and hence the function \(\mathbf {x}\rightarrow f(\mathbf {x})+\boldsymbol {\lambda }_{opt}^{\intercal }\mathbf {g(x)}\) is strictly convex, the optimization problem (D(λ opt)) has a unique optimal solution, and since x opt is such an optimal solution we obtain the desired result.

From a computational point of view, the following theorem for K-convex optimization problems is important and is a direct consequence of Lemma 2.5 and Theorem 2.2.

Theorem 2.4

If the optimization problem ( P ) is a K-convex minimization problem and 0 belongs to ri(g(X) + K), then the vector x 0 is an optimal solution of ( P ) if and only if for some optimal dual variable λ 0 ∈ K∗ it follows that

$$\displaystyle \begin{aligned} {\mathbf{x}}_{0}=\mathit{\text{argmin}}\{f(\mathbf{x})+{\boldsymbol{\lambda }}_{0}^{\intercal }\mathbf{g}(\mathbf{x}):\mathbf{x}\in X\},{\boldsymbol{\lambda }}_{0}^{\intercal } \mathbf{g}({\mathbf{x}}_{0})=0\mathit{\text{ and }}-\mathbf{g}({\mathbf{x}}_{0})\in K. \end{aligned}$$

Proof

Since ( P ) is a K-convex optimization problem and 0 ∈ ri(g(X) + K), it follows by Theorem 2.2 that there exists an optimal dual solution λ 0 and υ( P ) = υ( D ). Hence condition 2 of Lemma 2.5 is satisfied and this yields by 2 → 3 of Lemma 2.5 that x 0 satisfies the above system of equations. The reverse implication is also a direct consequence of Lemma 2.5.

In the next result the so-called Karush–Kuhn–Tucker conditions for K-convex programs with differentiable functions are given and this result is an immediate consequence of Theorem 2.4. It also shows that the Karush–Kuhn–Tucker vector is an optimal solution of the Lagrangian dual problem.

Theorem 2.5

If the optimization problem ( P ) is a K-convex optimization problem consisting of differentiable functions and the convex set X equals \(\mathbb {R}^{n}\) , then under the regularity condition 0 ∈ ri(g(X) + K) the vector x 0 is an optimal solution of ( P ) if and only if for some optimal dual variable λ ∈ K∗ it follows that

$$\displaystyle \begin{aligned} \nabla f({\mathbf{x}}_{0})+\sum\nolimits_{i=1}^{m}\lambda _{i}\nabla g_{i}( {\mathbf{x}}_{0})=\mathbf{0},{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x }_{0})=0\mathit{\text{ and }}-\mathbf{g}({\mathbf{x}}_{0})\in K. \end{aligned}$$

Proof

By Theorem 2.4, the vector x 0 is an optimal solution of ( P ) if and only if for some optimal dual variable λ ∈ K we obtain

$$\displaystyle \begin{aligned} {\mathbf{x}}_{0}=\text{argmin}\{f(\mathbf{x})+{\boldsymbol{\lambda }}^{\intercal } \mathbf{g}(\mathbf{x}):\mathbf{x}\in \mathbb{R}^{n}\},{\boldsymbol{\lambda }} ^{\intercal }\mathbf{g}({\mathbf{x}}_{0})=0,-\mathbf{g}({\mathbf{x}}_{0})\in K. \end{aligned}$$

Since the optimization problem ( P ) is a differentiable K-convex optimization problem, it follows that the function \(\mathbf {x}\rightarrow f( \mathbf {x})+{\boldsymbol {\lambda }}^{\intercal }\mathbf {g}(\mathbf {x})\) is convex and differentiable on \(\mathbb {R}^{n}\) and so the result follows by the sufficient and necessary first-order conditions for an unconstrained convex optimization problem.
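The conditions of Theorem 2.5 can be verified on a toy instance. In the following sketch (a made-up example with \(K=\mathbb {R}_{+}\), not taken from the text) we check stationarity, complementary slackness, and feasibility at the candidate optimum:

```python
import numpy as np

# Hypothetical differentiable K-convex problem with K = R_+ and X = R^2:
#   minimize f(x) = (x1 - 2)^2 + (x2 - 2)^2
#   subject to g(x) = x1 + x2 - 2 <= 0.
# By symmetry the optimum is x0 = (1, 1) with multiplier lambda = 2.
x0 = np.array([1.0, 1.0])
lam = 2.0

grad_f = 2 * (x0 - 2)          # gradient of f at x0
grad_g = np.array([1.0, 1.0])  # gradient of g at x0
g_x0 = x0.sum() - 2            # g(x0)

assert np.allclose(grad_f + lam * grad_g, 0)  # stationarity
assert abs(lam * g_x0) < 1e-12                # complementary slackness
assert -g_x0 >= 0 and lam >= 0                # -g(x0) in K, lambda in K* = R_+
print("Karush-Kuhn-Tucker conditions hold at x0 =", x0, "with lambda =", lam)
```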

If we consider a K-convex optimization problem satisfying Slater's constraint qualification and having an optimal solution and we have identified an optimal dual solution, then by a similar proof as in the first part of Lemma 2.8 we can only show that the optimal solution set

$$\displaystyle \begin{aligned} \arg \min \{f(\mathbf{x})+{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x} ):\mathbf{x}\in X\} \end{aligned}$$

contains a primal feasible optimal solution satisfying the complementary slackness condition. However, the optimal solution set of the optimization problem (D(λ)) can contain more than one element and so it might be computationally difficult to identify this primal feasible element satisfying the complementary slackness condition. The implication of Theorem 2.5 is that for differentiable K-convex minimization problems satisfying the Slater-type condition, the Karush–Kuhn–Tucker vectors λ coincide with the optimal dual variables, and so the Karush–Kuhn–Tucker conditions for K-convex programs are actually duality results. These conditions are also known in linear programming as primal–dual relations. We show in the next section the relation between Lagrangian relaxation and linear programming relaxation applied to an integer linear programming problem.

2.2 On Integer Linear Programming and Lagrangian and LP Relaxations

In this section we consider for D some p × n matrix and A some m × n matrix the integer linear programming problem:

$$\displaystyle \begin{aligned} \inf \{{\mathbf{c}}^{\top }\mathbf{x:x}\in \mathcal{F}_{INT}\} {} \end{aligned} $$
(INT)

with

$$\displaystyle \begin{aligned} \mathcal{F}_{INT}:=\{\mathbf{x}\in \mathbb{Z}_{+}^{n}:D\mathbf{x}\leq \mathbf{d},A\mathbf{x}=\mathbf{b}\} \end{aligned}$$

a nonempty feasible region and apply the Lagrangian relaxation approach to this problem. In the above representation it is assumed that the optimization problem \( \inf \{{\mathbf {h}}^{\top }\mathbf {x:A\mathbf {x}=\mathbf {b}},\mathbf {x}\in \mathbb {Z}_{+}^{n}\mathbf {\}}\) is relatively easy to solve for any \(\mathbf {h }\in \mathbb {R}^{n}\) by means of a polynomially bounded algorithm, while the original problem ( INT ) is extremely difficult to solve. Due to this assumption, we penalize in optimization problem ( INT ) the constraint D x − d ≤ 0 and so the Lagrangian dual problem (sometimes called the partial dual problem) is given by

$$\displaystyle \begin{aligned} \sup \{\theta ({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\geq \mathbf{0}\} {} \end{aligned} $$
(D)

with

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=-{\boldsymbol{\lambda }}^{\top }\mathbf{d}+\inf \{( \mathbf{c}+D^{\top }{\boldsymbol{\lambda }})^{\top }\mathbf{x}:A\mathbf{x=b}, \mathbf{x}\in \mathbb{Z}_{+}^{n}\}. {} \end{aligned} $$
(24)
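The dual function in relation (24) can be evaluated exactly whenever the "easy" set X is small enough to enumerate. The following sketch (a made-up two-variable instance, added for illustration) computes θ(λ) on a grid and compares the resulting lower bound υ(D) with υ(INT):

```python
import numpy as np
from itertools import product

# Hypothetical tiny instance: the "easy" set is X = {x in Z^2_+ : x1 + x2 = 2}
# (enumerable here), and the penalized "hard" constraint is Dx <= d.
c = np.array([1.0, 3.0])
D = np.array([[2.0, 1.0]])
d = np.array([3.0])
X = [np.array(x) for x in product(range(3), repeat=2) if sum(x) == 2]

def theta(lam):
    # relation (24): theta(lam) = -lam^T d + min_{x in X} (c + D^T lam)^T x
    return -lam @ d + min((c + D.T @ lam) @ x for x in X)

grid = [np.array([t]) for t in np.linspace(0, 5, 501)]
best = max(grid, key=theta)
v_int = min(c @ x for x in X if np.all(D @ x <= d))   # v(INT) by enumeration
print(f"v(D) ~ {theta(best):.3f} at lambda ~ {best[0]:.2f}, v(INT) = {v_int}")
```

In this particular instance the dual maximum θ(2) = 4 equals υ(INT), so no duality gap occurs, although in general only υ(D) ≤ υ(INT) is guaranteed.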

Also, a relaxation of optimization problem ( INT ) is given by the optimization problem:

$$\displaystyle \begin{aligned} \inf \{{\mathbf{c}}^{\top }\mathbf{x:}D\mathbf{x}\leq \mathbf{d},\mathbf{x}\in conv(X)\} {} \end{aligned} $$
(LPR)

with conv(X) denoting the smallest convex set containing \(X=\{\mathbf {x} \in \mathbb {Z}_{+}^{n}:A\mathbf {x}=\mathbf {b}\}\). The set conv(X) (also called the convex hull of the set X) can be completely characterized by linear inequalities and so the optimization problem ( LPR ) is actually a linear programming problem. Unfortunately, in general it is not possible to generate the linear inequality description of the set conv(X) and so optimization problem ( LPR ) cannot be solved by a linear programming solver. Remember that we only use a relaxation of the original problem if this relaxation can be solved easily, and for problem ( LPR ) this is certainly not true. A relaxation of problem ( INT ) that can be solved by a linear programming solver is given by the standard LP relaxation:

$$\displaystyle \begin{aligned} \inf \{{\mathbf{c}}^{\top }\mathbf{x:}D\mathbf{x}\leq \mathbf{d},A\mathbf{x}= \mathbf{b},\mathbf{x}\geq 0\} {} \end{aligned} $$
(SLPR)

of optimization problem ( INT ). In the next result we relate the optimal objective function values of the above optimization problems.

Lemma 2.9

If \(\mathcal {F}_{INT}\) is nonempty, then

$$\displaystyle \begin{aligned} \upsilon (\mbox{{{$SLPR$}}})\leq \upsilon (\mbox{{{$D$}}} )=\upsilon (\mbox{{{$LPR$}}})\leq \upsilon (\mbox{{{$INT$}}}). \end{aligned}$$

Proof

The inequalities υ(SLPR) ≤ υ(LPR) ≤ υ(INT) are obvious and so we only need to verify that υ(D) = υ(LPR). As observed, the optimization problem ( LPR ) is a linear programming problem and if υ(LPR) is finite, we obtain by strong duality and the observation after Theorem 2.2 that \(\upsilon (\mbox{{{LPR}}})=\sup \nolimits _{{\boldsymbol {\lambda }}\geq \mathbf { 0}}\{\theta (\boldsymbol {\lambda })\}\) with

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda })=-{\boldsymbol{\lambda }}^{\intercal }\mathbf{d}+\inf \{( \mathbf{c+}D^{\top }\boldsymbol{\lambda })^{\top }\mathbf{x}:\mathbf{x}\in conv(X)\}. \end{aligned}$$

Since the function \(\mathbf {x}\rightarrow (\mathbf {c}+D^{\intercal }{\boldsymbol {\lambda }})^{\intercal }\mathbf {x}\) is linear on conv(X), it follows that

$$\displaystyle \begin{aligned} \begin{array} [c]{lll} \inf\nolimits_{\mathbf{x}\in conv(X)}\{(\mathbf{c}+D^{\intercal}{\boldsymbol{\lambda }})^{\intercal}\mathbf{x\}} & = & \inf\nolimits_{\mathbf{x}\in X}\{(\mathbf{c}+D^{\intercal}\boldsymbol{\lambda})^{\intercal}\mathbf{x\} }\\ & = & \inf\{(\mathbf{c}+D^{\intercal}\boldsymbol{\lambda})^{\intercal} \mathbf{x:}A\mathbf{x}=\mathbf{b},\mathbf{x}\in\mathbb{Z}_{+}^{n}\mathbf{\}} \end{array} \end{aligned}$$

and this shows by relation (24) the desired result. __

In Lemma 2.9 it might happen that the optimal objective function value υ(D) is strictly larger than υ(SLPR), and in this case we obtain a stronger lower bound on υ( INT ). Hence it might be computationally more efficient to compute υ(D) and use this approach in a branch and bound procedure. In general it is computationally easier to compute υ(SLPR) than υ(D), but due to the difference in lower bounds, computing υ(D) might lead to more efficient branching in a classical branch and bound procedure. However, under some conditions the values υ(SLPR) and υ(D) are equal, and this happens if the optimization problem satisfies the so-called integrability property. Observe that in this case all the lower bounds discussed in Lemma 2.9 are the same.

Definition 2.10

The optimization problem \(\upsilon (R)=\inf \{\mathbf {h} ^{\top }\mathbf {x:}A\mathbf {x}=\mathbf {b,x}\in \mathbb {Z}_{+}^{n}\mathbf {\}}\) satisfies the so-called integrability property if \(\upsilon (R)=\inf \{\mathbf { h}^{\top }\mathbf {x:}A\mathbf {x}=b,\mathbf {x}\geq \mathbf {0}\}\) for every vector h.

A sufficient condition for the integrability property is that the matrix A is totally unimodular (Schrijver, 1998), and these matrices are extremely important in polyhedral combinatorics. For sufficient conditions to check whether a matrix is totally unimodular, consult Schrijver (1998). These matrices appear frequently in network flow problems. By a similar argument as used in Lemma 2.9, one can now show the following result.
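Total unimodularity can be tested directly from its definition for tiny matrices: every square submatrix must have determinant −1, 0, or +1. The following brute-force sketch (added for illustration; exponential in the matrix size, so only for small examples) checks this, using a node-arc incidence matrix of a directed graph as a classical totally unimodular example:

```python
import numpy as np
from itertools import combinations

def is_totally_unimodular(A, tol=1e-9):
    """Brute-force check (for tiny matrices only) that every square
    submatrix of A has determinant in {-1, 0, +1}."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                det = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(det - v) for v in (-1, 0, 1)) > tol:
                    return False
    return True

# Node-arc incidence matrix of a directed graph on 3 nodes with arcs
# (1,2), (1,3), (2,3): each column has exactly one +1 and one -1.
incidence = np.array([[ 1,  1,  0],
                      [-1,  0,  1],
                      [ 0, -1, -1]])
print(is_totally_unimodular(incidence))                    # True
print(is_totally_unimodular(np.array([[1, 1], [1, -1]])))  # det = -2: False
```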

Lemma 2.10

If \(\mathcal {F}_{INT}\) is nonempty and the optimization problem \(\inf \{{\mathbf {h}}^{\top }\mathbf {x:}A\mathbf {x}=\mathbf {b,x}\in \mathbb {Z}_{+}^{n}\mathbf {\}}\) satisfies the integrability property, then υ(SLPR) = υ(D) = υ(LPR).

Proof

By penalizing the constraints D x ≤d in problem ( SLPR ) and using the integrability property, we obtain by the strong duality result that \(\upsilon (\mbox{{{SLPR}}})=\sup \nolimits _{{\boldsymbol {\lambda }}\geq \mathbf {0}}\{\theta (\boldsymbol {\lambda })\}\) with

$$\displaystyle \begin{aligned} \begin{array}{lll} \theta (\boldsymbol{\lambda }) & = & -{\boldsymbol{\lambda }}^{\top }\mathbf{d}+\inf \{(\mathbf{c}+D^{\top }{\boldsymbol{\lambda }})^{\top }\mathbf{x}:A\mathbf{x}= \mathbf{b,x}\geq \mathbf{0}\}\\ & = & -{\boldsymbol{\lambda }}^{\top }\mathbf{d}+\inf \{(\mathbf{c}+D^{\top }\mathbf{\lambda })^{\top }\mathbf{x}:A\mathbf{x}=\mathbf{b,x}\in \mathbb{Z} _{+}^{n}\} \end{array} \end{aligned}$$

and the desired result follows. __

By the above observation, it follows that the Lagrangian dual has the same optimal objective function value as the standard linear programming relaxation of ( INT ) and so it seems useless to solve the Lagrangian dual. However, from a computational point of view it is sometimes more efficient to solve the Lagrangian dual than to use a standard linear programming package to solve the standard linear programming relaxation of ( INT ). Hence whether to solve the Lagrangian dual or the standard linear programming relaxation depends on the problem under consideration. This concludes our discussion of the principle of Lagrange for finite dimensional optimization problems. Until now we only dealt in general with the principle of Lagrange and we did not discuss how to optimize the Lagrangian dual function. This is the topic of the next section.

2.3 On Subgradient Optimization and the Dual Problem

Until now we only dealt with the general structure of a Lagrangian dual problem and we did not discuss how to solve the Lagrangian dual problem \( \sup \{\theta ({\boldsymbol {\lambda }}):{\boldsymbol {\lambda }}\in K^{\ast }\}\) with K a closed convex cone and the Lagrangian dual function θ : K ∗ → [−∞, ∞) given by

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=\inf \{f(\mathbf{x})+{\boldsymbol{\lambda }}^{\intercal }\mathbf{g}(\mathbf{x}):\mathbf{x}\in X\} {}. \end{aligned} $$
(25)

Since the set X ⊆ R n is nonempty, it follows that −∞ ≤ θ(λ) < ∞. Moreover, by Lemma 2.1 the function − θ is convex and

$$\displaystyle \begin{aligned} -\sup \{\theta ({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\in K^{\ast }\}=\inf \{-\theta ({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\in K^{\ast }\} . {} \end{aligned} $$
(−D)

The optimization problem (−D) has the same optimal solution as optimization problem ( D ) if an optimal solution exists, and without loss of generality we always assume that the set dom(−θ) := {λ ∈ K ∗ : θ(λ) > −∞} is nonempty. In most practical cases the nonemptiness of the set dom(−θ) can be easily verified (look at location problems!). It is now easy to check that

$$\displaystyle \begin{aligned} -\infty \leq \upsilon (\mbox{{{$-D$}}})<\infty \Leftrightarrow \text{ dom}(-\theta )\text{ is}\ \text{nonempty.} \end{aligned}$$

Observe that the function − θ : K ∗ → (−∞, ∞] is in general not differentiable. This can be easily checked by means of a picture if we assume for simplicity that the set X consists of two elements. Also the optimization problem (−D) is an unconstrained convex programming problem in case K = {0} and a constrained convex programming problem otherwise. Hence in principle we can apply to this problem the subgradient optimization procedure (Hiriart-Urruty and Lemaréchal, 2013) if it is possible to compute an 𝜖-subgradient with 𝜖 ≥ 0 of the function − θ : K ∗ → (−∞, ∞] at every point λ ∈ K ∗. Observe that the set of 𝜖-subgradients of the function − θ at the point λ ∈ K ∗ is called the 𝜖-subgradient set and this set is denoted by ∂ 𝜖(−θ)(λ). In case 𝜖 equals 0 a 0-subgradient can be seen as a generalization of the gradient of a differentiable function.

Definition 2.11

A vector \({\boldsymbol {\lambda }}_{0}^{\ast }\in \mathbb {R}^{n}\) is called an 𝜖-subgradient for some 𝜖 ≥ 0 of the function f : K ∗ → (−∞, ∞] at the point λ 0 ∈ K ∗ if for every λ ∈ K ∗ it follows that

$$\displaystyle \begin{aligned} f({\boldsymbol{\lambda }})\geq f({\boldsymbol{\lambda }}_{0})+({\boldsymbol{\lambda }}-\mathbf{ \lambda }_{0})^{\intercal }{\boldsymbol{\lambda }}_{0}^{\ast }-\epsilon. \end{aligned}$$

It is called a subgradient if 𝜖 = 0.

In general it is difficult to compute an 𝜖-subgradient of an arbitrary finite valued convex function. However, for the function − θ an 𝜖-subgradient at a given point λ ∈ K ∗ is easy to compute by solving optimization problem (D(λ)). This is shown in the next lemma, which can be easily proved.

Lemma 2.11

If for some λ 0 ∈ K ∗ the optimization problem:

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }}_{0}):=\inf \{f(\mathbf{x})+{\boldsymbol{\lambda }} _{0}^{\intercal }\mathbf{g}(\mathbf{x}):\mathbf{x}\in X\} \end{aligned}$$

has an optimal solution x opt(λ 0), then it follows that −g(x opt(λ 0)) is a subgradient of the function − θ at λ 0. Moreover, if the feasible solution x 𝜖(λ 0) is an 𝜖-optimal solution of the optimization problem (D(λ 0)), then the vector −g(x 𝜖(λ 0)) is an 𝜖-subgradient of the function − θ at λ 0.

Observe that we introduce 𝜖-subgradients since in general one cannot perform exact calculations on a computer and so in most cases we are actually calculating an 𝜖-optimal solution. Solving the optimization problem (D(λ)) with λ = λ 0 and obtaining the optimal solution x opt(λ 0) yield both the value − θ(λ 0) and the affine function \( {\boldsymbol {\lambda }}\rightarrow -\theta ({\boldsymbol {\lambda }}_{0})-({\boldsymbol {\lambda }}-{\boldsymbol {\lambda }}_{0})^{\intercal }\mathbf {g}({\mathbf {x}}_{opt}(\boldsymbol {\lambda }_{0}))\). By Lemma 2.11, the above affine function serves as a lower bound on the convex function − θ on K ∗.
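Lemma 2.11 turns the dual evaluation into a subgradient oracle. The sketch below is our own illustration (the toy functions f, g and the two-point set X are assumptions, with X finite so that (D(λ)) can be solved by enumeration); it evaluates θ(λ) and returns −g(x opt(λ)) as a subgradient of −θ:

```python
import numpy as np

def dual_value_and_subgradient(f, g, X, lam):
    """Evaluate theta(lam) = min_{x in X} f(x) + lam^T g(x) over a finite
    set X and return -g(x_opt(lam)), a subgradient of -theta at lam."""
    values = [f(x) + lam @ g(x) for x in X]
    k = int(np.argmin(values))
    return values[k], -g(X[k])

# Toy instance with |X| = 2, so theta is the minimum of two affine functions.
f = lambda x: float(x[0] ** 2)
g = lambda x: np.array([1.0 - x[0]])
X = [np.array([0.0]), np.array([2.0])]

theta, sub = dual_value_and_subgradient(f, g, X, np.array([1.0]))
```

With only two points in X, θ is the pointwise minimum of two affine functions of λ, which also illustrates why −θ is in general not differentiable.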

We now discuss in detail the subgradient method applied to the optimization problem:

$$\displaystyle \begin{aligned} \inf \{f({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\in D\} {} \end{aligned} $$
(C)

with \(D\subseteq \mathbb {R}^{n}\) a closed convex set and \(f:\mathbb {R} ^{n}\rightarrow (-\infty ,\infty ]\) with f(λ) < ∞ for some λ ∈ D. By our assumption, it follows that −∞ ≤ υ(C) < ∞. Moreover, for this problem we assume that for every λ ∈ D an 𝜖-subgradient of f at λ can be computed, and by Lemma 2.11 this condition is clearly satisfied by our optimization problem \(\inf \{-\theta ({\boldsymbol {\lambda }}): {\boldsymbol {\lambda }}\in K^{\ast }\}\) for selecting an optimal Lagrangian multiplier. To introduce the subgradient method, we observe the following. If λ opt is the unknown optimal solution of optimization problem ( C ) and we apply an algorithm to find this optimal solution, this algorithm generates a sequence {λ n : n ∈ N}⊆ D. It is desirable that at each next step of this algorithm the newly generated point λ n+1 is closer to λ opt than the presently generated point λ n. This means that the inequality:

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}_{opt}\parallel ^{2}<\parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}_{opt}\parallel ^{2} {} \end{aligned} $$
(26)

with

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}\parallel ^{2}:=\sum\nolimits_{i=1}^{n}\lambda _{i}^{2} \end{aligned}$$

the squared Euclidean norm should hold for our algorithm. To achieve this, suppose that we are after n steps of our iteration procedure at the vector λ n and some oracle supplies us with a search direction d n. If the new point λ n+1 is given by

$$\displaystyle \begin{aligned} {\boldsymbol{\lambda }}_{n+1}:={\boldsymbol{\lambda }}_{n}-t_{n}{\mathbf{d}}_{n} \end{aligned}$$

for some t n > 0, then it follows that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}_{opt}\parallel ^{2}=\parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}_{opt}\parallel ^{2}+t_{n}^{2}\parallel {\mathbf{d}}_{n}\parallel ^{2}+2t_{n}{\mathbf{d}}_{n}^{\intercal }({\boldsymbol{\lambda }}_{opt}-{\boldsymbol{\lambda }}_{n}). {} \end{aligned} $$
(27)

To guarantee that relation (26) holds, it is necessary (maybe not sufficient!) to choose the direction d n in such a way that

$$\displaystyle \begin{aligned} {\mathbf{d}}_{n}^{\intercal }({\boldsymbol{\lambda }}_{opt}-{\boldsymbol{\lambda }} _{n})\leq 0 {}. \end{aligned} $$
(28)

If d n is an 𝜖-subgradient of the function f at λ n for some 𝜖 ≥ 0 (remember for 𝜖 = 0 the 𝜖-subgradient is actually a subgradient!), this implies

$$\displaystyle \begin{aligned} {\mathbf{d}}_{n}^{\intercal }({\boldsymbol{\lambda }}_{opt}-{\boldsymbol{\lambda }} _{n})\leq f({\boldsymbol{\lambda }}_{opt})-f({\boldsymbol{\lambda }}_{n})+\epsilon \leq \epsilon {} \end{aligned} $$
(29)

and since it is necessary that relation (28) holds, it follows by relation (29) that a hopefully good choice of the direction d n is an 𝜖-subgradient of f at λ n. By this observation, we obtain the usual subgradient iteration scheme given by the following algorithm.

Algorithm 2.1

Subgradient optimization scheme:

$$\displaystyle \begin{aligned} {\boldsymbol{\lambda }}_{n+1}:={\boldsymbol{\lambda }}_{n}-t_{n}{\boldsymbol{\lambda }} _{n}^{\ast }\text{, }t_{n}>0\text{ and }{\boldsymbol{\lambda }}_{n}^{\ast }\text{ an } \epsilon \text{-subgradient}\ \text{of}\ f\text{ at }{\boldsymbol{\lambda }}_{n}. \end{aligned}$$

However, using the above formula might create problems for constrained optimization problems. It can happen that the point λ n belongs to D and λ n+1 does not belong to D and so the point λ n+1 becomes infeasible. A way to solve this is to use the orthogonal projection of λ n+1 onto the closed convex set D. This means that the modified subgradient iteration scheme has the form:

Algorithm 2.2

Modified subgradient iteration scheme:

$$\displaystyle \begin{aligned} {\boldsymbol{\lambda }}_{n+1}=P_{D}({\boldsymbol{\lambda }}_{n}-t_{n}{\boldsymbol{\lambda }} _{n}^{\ast }),t_{n}>0\text{ and }{\boldsymbol{\lambda }}_{n}^{\ast }\text{ an } \epsilon \text{-subgradient}\ \text{of }f\text{ at }{\boldsymbol{\lambda }}_{n} {} \end{aligned} $$
(30)

with

$$\displaystyle \begin{aligned} P_{D}({\mathbf{x}}_{0}):=\arg \min \{\parallel {\mathbf{x}}_{0}-{\boldsymbol{\lambda }} \parallel ^{2}:{\boldsymbol{\lambda }}\in D\},{\mathbf{x}}_{0}\text{ fixed}\ \text{vector} \end{aligned}$$

and argmin{Q} denoting the optimal solution of optimization problem (Q).
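As a concrete illustration, for D = ℝ_+^m ×{0}^(n−m) (the shape of K ∗ arising for mixed inequality and equality constraints) the projection reduces to coordinatewise clipping. A minimal sketch of our own, where the vector layout (inequality multipliers first) is an assumption:

```python
import numpy as np

def project_dual_cone(lam, m):
    """Orthogonal projection onto D = R_+^m x {0}: clip the first m
    coordinates (inequality multipliers) at zero, zero out the rest."""
    out = np.zeros_like(lam)
    out[:m] = np.maximum(lam[:m], 0.0)
    return out

p = project_dual_cone(np.array([1.5, -2.0, 3.0]), 2)
```

One can check directly that this map is nonexpansive in the sense of relation (31).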

Since the function λ →∥x 0 −λ ∥2 for x 0 fixed is strictly convex and D is a closed nonempty convex set, there exists a unique optimal solution and so the vector P D(x 0) is well-defined or equivalently x → P D(x) represents a function. Moreover, we mention without proof (check first-order conditions of the above constrained optimization problem!) that the function x → P D(x) is a so-called contraction and this means that

$$\displaystyle \begin{aligned} \parallel P_{D}({\mathbf{x}}_{0})-P_{D}({\mathbf{x}}_{1})\parallel \leq \parallel {\mathbf{x}}_{0}-{\mathbf{x}}_{1}\parallel {} \end{aligned} $$
(31)

for every x 0 and x 1 belonging to \(\mathbb {R}^{n}\). For our primal optimization problem, it almost always follows that \(K=\mathbb {R} _{+}^{m}\times \mathbb {R}^{n-m}\) and so \( K^{\ast }=\mathbb {R}_{+}^{m}\times \{\mathbf {0}\}.\) This means that we can compute the projection by hand (make a picture!). It is now possible to show the following fundamental inequality (Correa and Lemaréchal, 1993).

Lemma 2.12

If λ ∈ D and \({\boldsymbol {\lambda }} _{n}^{\ast }\) is an 𝜖 n -subgradient of the function f at the present iteration point λ n for some 𝜖 n ≥ 0 and \({\boldsymbol {\lambda }}_{n+1}=P_{D}({\boldsymbol {\lambda }}_{n}-t_{n}\mathbf { \lambda }_{n}^{\ast })\) , then it follows that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}\parallel ^{2}+t_{n}^{2}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2t_{n}(f( {\boldsymbol{\lambda }})-f({\boldsymbol{\lambda }}_{n})+\epsilon _{n}). \end{aligned}$$

Proof

Since obviously P D(λ) = λ for λ ∈ D, it follows by relation (30) that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}\parallel ^{2}=\parallel P_{D}({\boldsymbol{\lambda }}_{n}-t_{n}{\boldsymbol{\lambda }}_{n}^{\ast })-P_{D}( {\boldsymbol{\lambda }})\parallel ^{2} \end{aligned}$$

and this implies by relation (31) that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}\parallel ^{2}\leq \Vert {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}\parallel ^{2}+t_{n}^{2}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2t_{n}(\boldsymbol{\lambda -\lambda } _{n})^{\intercal }{\boldsymbol{\lambda }}_{n}^{\ast } {}. \end{aligned} $$
(32)

Since \({\boldsymbol {\lambda }}_{n}^{\ast }\) is an 𝜖 n-subgradient of the function f at λ n, we obtain that \(f(\mathbf { \lambda })-f({\boldsymbol {\lambda }}_{n})\geq ({\boldsymbol {\lambda }}-{\boldsymbol {\lambda }} _{n})^{\intercal }{\boldsymbol {\lambda }}_{n}^{\ast }-\epsilon _{n}\) and this implies by relation (32) that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}\parallel ^{2}+t_{n}^{2}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2t_{n}(f( {\boldsymbol{\lambda }})-f({\boldsymbol{\lambda }}_{n})+\epsilon _{n}) \end{aligned}$$

showing the desired inequality. □

By the above inequality, the convergence of our subgradient optimization scheme is easily proved and this is shown by the following result.

Theorem 2.6

If limn↑∞ 𝜖 n = 0 and the positive sequence \( \{t_{n}:n\in \mathbb {N}\}\) satisfies

$$\displaystyle \begin{aligned} \sum\nolimits_{n=1}^{\infty }t_{n}=\infty \mathit{\text{ and }}\lim\nolimits_{n \uparrow \infty }t_{n}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}=0, \end{aligned}$$

then it follows that

$$\displaystyle \begin{aligned} \lim\nolimits_{n\uparrow \infty }m_{n}=\upsilon (\mbox{{{$C$}}})\geq -\infty \end{aligned}$$

with

$$\displaystyle \begin{aligned} m_{n}:=\min \{f({\boldsymbol{\lambda }}_{k}):k\leq n\}. \end{aligned}$$

Proof

Clearly the sequence m n, n ∈ N is decreasing and so it has a limit c. If the limit c equals −∞, the result is proved and so we assume that the limit c is finite. If c > υ( C ), one can find by the definition of an infimum some δ > 0 and λ e ∈ D satisfying

$$\displaystyle \begin{aligned} v(C)\leq f({\boldsymbol{\lambda }}_{e})\leq c-\delta\leq f({\boldsymbol{\lambda }}_{n})-\delta {} \end{aligned} $$
(33)

for every \(n\in \mathbb {N}\). Hence by Lemma 2.12 with λ = λ e, we obtain that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}+t_{n}^{2}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2t_{n}(f( {\boldsymbol{\lambda }}_{e})-f({\boldsymbol{\lambda }}_{n})+\epsilon _{n}). \end{aligned}$$

This implies by relation (33) that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}+t_{n}(t_{n}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2(\epsilon _{n}-\delta )). \end{aligned}$$

Since we assume that

$$\displaystyle \begin{aligned} \lim\nolimits_{n\uparrow \infty }t_{n}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}=0\text{ and }\lim\nolimits_{n\uparrow \infty }\epsilon _{n}=0, \end{aligned}$$

one can find some n 0 ∈ N such that for every n ≥ n 0 it follows that

$$\displaystyle \begin{aligned} t_{n}\parallel {\boldsymbol{\lambda }}_{n}^{\ast }\parallel ^{2}+2(\epsilon _{n}-\delta )\leq -\delta \end{aligned}$$

and this yields for every n ≥ n 0 that

$$\displaystyle \begin{aligned} \parallel {\boldsymbol{\lambda }}_{n+1}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}-\delta t_{n} {}. \end{aligned} $$
(34)

Hence we obtain by relation (34) that

$$\displaystyle \begin{aligned} 0\leq \parallel {\boldsymbol{\lambda }}_{m}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}\leq \parallel {\boldsymbol{\lambda }}_{n_{0}}-{\boldsymbol{\lambda }}_{e}\parallel ^{2}-\delta \sum\nolimits_{k=n_{0}}^{m-1}t_{k} \end{aligned}$$

for every m ≥ n 0 and this implies by the assumption \( \sum _{k=n_{0}}^{\infty }t_{k}=\infty \) that

$$\displaystyle \begin{aligned} 0\leq \lim_{m\uparrow \infty }\parallel {\boldsymbol{\lambda }}_{m}-\mathbf{ \lambda }_{e}\parallel ^{2}=-\infty. \end{aligned}$$

Hence we obtain a contradiction and so it must follow that c = υ(C). □
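The scheme analyzed in Theorem 2.6 can be sketched as follows. This is our own illustration: the step size rule t_n = 1∕n and the toy objective f(λ) = |λ − 3| over D = ℝ_+ are assumptions, chosen so that the step size conditions of the theorem hold whenever the subgradients stay bounded.

```python
import numpy as np

def projected_subgradient(oracle, project, lam0, n_iter=500):
    """Modified subgradient scheme (30) with step sizes t_n = 1/n.
    Tracks the running best value m_n, since f(lam_n) need not decrease."""
    lam, best_val, best_lam = lam0, np.inf, lam0
    for n in range(1, n_iter + 1):
        val, sub = oracle(lam)            # f(lam_n) and a subgradient at lam_n
        if val < best_val:
            best_val, best_lam = val, lam
        lam = project(lam - (1.0 / n) * sub)
    return best_val, best_lam

# Toy check: minimize f(lam) = |lam - 3| over D = R_+.
oracle = lambda lam: (abs(float(lam[0]) - 3.0),
                      np.array([1.0 if lam[0] > 3.0 else -1.0]))
project = lambda lam: np.maximum(lam, 0.0)
best_val, best_lam = projected_subgradient(oracle, project, np.array([0.0]))
```

Note that only the running best value m_n is guaranteed to approach υ(C); the final iterate itself may still oscillate around the optimum.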

Since it can happen that an optimal solution of optimization problem ( C ) does not exist, we cannot show that the sequence λ n generated by the subgradient method converges to the optimal solution of ( C ). In general it also does not hold that the sequence f(λ n) is decreasing. Although we have shown under fairly general conditions that the sequence m n converges to v( C ), it is very difficult to give a proper stopping rule for the subgradient method. Also in case we apply the subgradient method to optimization problem (D ), the selection of the step sizes t n, n ∈ N is a matter of trial and error (Nemhauser and Wolsey, 1988) and so devising a proper scheme is an art in itself. For more details on these practical issues, the reader is referred to Beasley et al. (1993). We finally consider one theoretical case in which one knows in advance that the current iteration point is indeed optimal. This result is presented in the next lemma.

Lemma 2.13

If at the (n + 1)th step it follows that

$$\displaystyle \begin{aligned} {\boldsymbol{\lambda }}_{n+1}:=P_{D}({\boldsymbol{\lambda }}_{n}-t_{n}{\boldsymbol{\lambda }} _{n}^{\ast }) \end{aligned}$$

equals λ n with \({\boldsymbol {\lambda }}_{n}^{\ast }\) a 𝜖 n -subgradient of the function f at λ n , then the vector λ n is an 𝜖 n -optimal solution of optimization problem ( C ).

Proof

We only give a proof of this result in case \(\boldsymbol {\lambda }_{n}-t_{n} \boldsymbol {\lambda }_{n}^{\ast }\) belongs to D. If this holds, it follows since t n > 0 that

$$\displaystyle \begin{aligned} \boldsymbol{\lambda}_{n+1}:=P_{D}(\boldsymbol{\lambda}_{n}-t_{n}\boldsymbol{\lambda} _{n}^{\ast})=\boldsymbol{\lambda}_{n}-t_{n}\boldsymbol{\lambda}_{n}^{\ast} \end{aligned}$$

and so \(\boldsymbol {\lambda }_{n}^{\ast }=\mathbf {0}.\) Since \(\boldsymbol {\lambda } _{n}^{\ast }\) is an 𝜖 n-subgradient, this implies that

$$\displaystyle \begin{aligned} f(\boldsymbol{\lambda})\geq f(\boldsymbol{\lambda}_{n})-\epsilon_{n} \end{aligned}$$

for every λ ∈ D and the desired result follows. □
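Lemma 2.13 thus gives a simple (though rarely triggered) stopping test for the iteration: if the projected step does not move the current point, we may stop. A small sketch of this check, our own, with D = ℝ_+^m as an assumed feasible set:

```python
import numpy as np

def projected_step_is_fixed(project, lam, t, sub, tol=1e-12):
    """Stopping test from Lemma 2.13: if P_D(lam - t*sub) equals lam,
    the current iterate lam is an eps_n-optimal solution."""
    return bool(np.linalg.norm(project(lam - t * sub) - lam) <= tol)

project = lambda lam: np.maximum(lam, 0.0)   # D = R_+^m
fixed = projected_step_is_fixed(project, np.array([1.0, 0.5]), 0.1, np.zeros(2))
moving = projected_step_is_fixed(project, np.array([1.0, 0.5]), 0.1, np.array([1.0, 0.0]))
```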

For more theoretical results regarding the subgradient method, the reader is referred to Shor (2012) and Correa and Lemaréchal (1993). In the next section we apply the above results to derive Lagrangian heuristics for location and transportation problems.

3 Application of the Principle of Lagrange to Problems in Transportation and Location Theory

In this section we consider two classical problems in location and transportation theory and apply to these problems the Lagrangian relaxation approach. In the first subsection we discuss the well-known set covering problem and in the second subsection several versions of the capacitated facility location problem.

3.1 The Set Covering Problem

In this subsection we formulate the set covering problem. Observe that this problem shows up in a lot of different applications within computer and management science. To give a particular example in transportation, consider a company that needs to serve every day m different customers given by the set \(\mathcal {M}=\{1,\ldots ,m\}\). To serve these customers, the company selects from a given set \(\mathcal {N}=\{1,\ldots ,n\}\) of known routes a subset of these routes covering all present day customers. Each route \(j\in \mathcal {N}\) has a known cost c j > 0, and the company wants to select the subset that serves all customers on a given day at minimal cost. Selecting a route \(j\in \mathcal {N}\) for today is a yes/no decision and so we introduce for each route \(j\in \mathcal {N}\) the decision variable

$$\displaystyle \begin{aligned} x_{j}=\left\{ \begin{array}{ll} 1 & \ \quad \mbox{ if route } j\mbox{ is selected} \\ 0 & \ \quad \mbox{ otherwise} \end{array} \right.. \end{aligned}$$

To model that every customer who needs to be served today is at least covered by one selected route, we construct the m × n matrix A = (a ij), given by

$$\displaystyle \begin{aligned} a_{ij}=\left\{ \begin{array}{ll} 1 & \ \quad \mbox{ if customer }~i\mbox{ is on route }j \\ 0 & \ \quad \mbox{ otherwise } \end{array} \right.. \end{aligned}$$

Clearly the value \(\sum \nolimits _{j\in \mathcal {N}}a_{ij}x_{j},i\in \mathcal {M}\) yields the number of selected routes covering customer \(i\in \mathcal {M}\) and so we need to solve the optimization problem:

$$\displaystyle \begin{aligned} \inf\left\{ \sum\nolimits_{j\in \mathcal{N}}c_{j}x_{j}:\sum\nolimits_{j\in \mathcal{N}}a_{ij}x_{j}\geq 1,i\in \mathcal{M},x_{j}\in \mathbb{B},j\in \mathcal{N}\right\} {} \end{aligned} $$
(SP)

with \(\mathbb {B=\{}0,1\}\). Hence the set covering problem (SP) is given by

$$\displaystyle \begin{aligned} \inf\{\sum\nolimits_{j\in\mathcal{N}}c_{j}x_{j}:\mathbf{g}(\mathbf{x} )\preceq_{K}\mathbf{0,x}\in\mathbb{B}^{n}\} \end{aligned}$$

with \(\mathbf {g}(\mathbf {x})=(g_{1}(\mathbf {x}),\ldots ,g_{m}(\mathbf {x} )),g_{i}(\mathbf {x})=1-\sum _{j\in \mathcal {N}}a_{ij}x_{j}\) and \(K=\mathbb {R} _{+}^{m}\). The feasible set contains only a finite number of elements, bounded above by \(2^{n}\), and so the set covering problem has an optimal solution. It is well-known that the set covering problem is \(\mathcal {N}\mathcal {P}\)-hard (Garey and Johnson, 1979) and so by present-day belief it seems very unlikely that there exists a polynomially bounded algorithm to solve this problem. To analyze this problem by means of the Lagrangian relaxation approach, we penalize the constraints \(\sum \nolimits _{j\in \mathcal {N}}a_{ij}x_{j} \geq 1,i\in \mathcal {M}\) and construct the Lagrangian dual function \(\theta : \mathbb {R}_{+}^{m}\rightarrow \mathbb {R}\) given by

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=\min \left\{ \sum\nolimits_{j\in \mathcal{N} }c_{j}x_{j}+\sum\nolimits_{i\in \mathcal{M}}\lambda _{i}(1-\sum\nolimits_{j\in \mathcal{N}}a_{ij}x_{j}):\mathbf{x}\in \mathbb{B} ^{n}\right\}. \end{aligned}$$

After some easy calculations, it follows that

$$\displaystyle \begin{aligned} \theta(\boldsymbol{\lambda})=\sum\nolimits_{i\in\mathcal{M}}\lambda_{i} +\sum\nolimits_{j\in\mathcal{N}}\min\left\{ c_{j}-\sum\nolimits_{i\in \mathcal{M}}a_{ij}\lambda_{i},0\right\} \end{aligned} $$
(35)

and the optimal solution x(λ) = (x 1(λ), …, x n(λ)) is given by

$$\displaystyle \begin{aligned} x_{j}(\boldsymbol{\lambda} )=1_{S_{j}}(\boldsymbol{\lambda }) \end{aligned} $$
(36)

with \(S_{j}=\{{\boldsymbol {\lambda }}\in \mathbb {R }_{+}^{m}:c_{j}\leq \sum \nolimits _{i\in \mathcal {M}}a_{ij}\lambda _{i}\},j\in \mathcal {N}.\) We now apply the subgradient method to solve the Lagrangian dual problem

$$\displaystyle \begin{aligned} \upsilon(D)=\min\nolimits_{\lambda\geq\mathbf{0}}\left\{ \sum\nolimits_{i\in \mathcal{M}}\lambda_{i}+\sum\nolimits_{j\in\mathcal{N}}\min\{c_{j} -\sum\nolimits_{i\in\mathcal{M}}a_{ij}\lambda_{i},0\}\right\} \end{aligned}$$

and obtain as an optimal dual solution a vector λ opt and construct the solution x(λ opt) given in relation (36). By Lemma 2.9, this dual problem satisfies

$$\displaystyle \begin{aligned} \upsilon (D)=\min \left\{ \sum\nolimits_{j\in \mathcal{N} }c_{j}x_{j}:\sum\nolimits_{j\in \mathcal{N}}a_{ij}x_{j}\geq 1,i\in \mathcal{M},0\leq x_{j}\leq 1,j\in \mathcal{N}\right\}. \end{aligned}$$
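The closed forms (35) and (36) make θ(λ) and the candidate solution x(λ) cheap to evaluate at any multiplier, which is what makes the subgradient method attractive here. A minimal sketch of our own (the 2-customer, 3-route instance is an assumption):

```python
import numpy as np

def set_cover_dual(A, c, lam):
    """Evaluate the set covering dual function (35) and the minimizer (36):
    x_j(lam) = 1 exactly when the reduced cost c_j - sum_i a_ij*lam_i <= 0."""
    reduced = c - A.T @ lam                       # reduced cost of every route j
    theta = lam.sum() + np.minimum(reduced, 0.0).sum()
    x = (reduced <= 0.0).astype(int)
    return theta, x

# Two customers, three routes; route 2 covers both customers.
A = np.array([[1, 1, 0],
              [0, 1, 1]])
c = np.array([3.0, 4.0, 2.0])
theta, x = set_cover_dual(A, c, np.array([2.0, 2.0]))
```

Here θ(λ) = 4, which by weak duality is a lower bound on υ(SP); in this tiny instance route 2 alone is a feasible cover of cost 4, so the bound is tight.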

If the solution x(λ opt) is primal feasible and it satisfies the complementary slackness conditions

$$\displaystyle \begin{aligned} \lambda_{i}(1-\sum\nolimits_{j\in\mathcal{N}}a_{ij}x_{j}({\boldsymbol{\lambda }}_{opt}))=0,i\in\mathcal{M}, \end{aligned}$$

then it follows by Lemma 2.5 that this solution is an optimal primal solution. If the solution x(λ opt) is only primal feasible, we have found a feasible solution for our problem and so \(\theta (\boldsymbol {\lambda }_{opt})\leq \upsilon (SP)\leq \sum \nolimits _{j\in \mathcal {N}}c_{j}x_{j}(\boldsymbol {\lambda }_{opt}).\) If the solution is not primal feasible, then we use the following Lagrangian heuristic to convert the solution x(λ opt) into a primal feasible solution. Consider the set of routes selected by our solution x(λ opt) given by

$$\displaystyle \begin{aligned} R(\boldsymbol{\lambda}_{opt})=\{1\leq j\leq n:x_{j}(\boldsymbol{\lambda}_{opt})=1\}. \end{aligned}$$

Starting with the initial set R(λ opt), we check for each customer \(i\in \mathcal {M}\) whether this customer is visited by one of the routes in the presently selected set. If not, we select the cheapest possible route covering this customer and add this route to this presently selected set of routes. After evaluating all the customers, we have constructed a set of routes that covers all customers. Some customers might be assigned to different routes and we delete them from all the routes except one. Observe that this heuristic is applied in a branch and bound procedure and at each branch we solve a special case of the set covering problem with some routes already selected. This concludes our discussion of the set covering problem. In the next section we consider the (un)capacitated facility location problem.
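Returning to the repair step of the heuristic just described, a minimal sketch of the covering pass (our own version; it only performs the pass that adds, for each uncovered customer, the cheapest route containing that customer, and the tiny instance is an assumption):

```python
import numpy as np

def repair_cover(A, c, x):
    """Make x primal feasible: for every customer not yet covered by the
    selected routes, add the cheapest route that contains this customer."""
    x = x.copy()
    for i in range(A.shape[0]):
        if A[i] @ x == 0:                          # customer i is uncovered
            candidates = np.where(A[i] == 1)[0]    # routes containing customer i
            x[candidates[np.argmin(c[candidates])]] = 1
    return x

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0]])
c = np.array([3.0, 4.0, 2.0])
x = repair_cover(A, c, np.array([0, 0, 1]))        # start: only route 3 selected
```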

3.2 Fixed Charge Location Models on Discrete Spaces

To introduce the so-called fixed charge location problems in discrete location, we denote as before the set of customers by \(\mathcal {M}=\{1,\ldots ,m\} \) and the set of possible sites of facilities by \(\mathcal {N}=\{1,\ldots ,n\}.\) We denote by \(f_{j},j\in \mathcal {N}\) the fixed setup cost of site j, while c ij denotes the transportation costs of supplying the demand of customer \(i\in \mathcal {M}\) by site \(j\in \mathcal {N}\). The uncapacitated facility location problem tries to determine a subset of the set \(\mathcal {N}\) of possible sites in such a way that the total transportation and setup costs are minimized. In this problem the demand of all customers should be satisfied. Moreover, each open facility has unlimited capacity. A more difficult and related problem is given by the capacitated version of this problem. In this problem each site j has a fixed capacity q j. For the capacitated facility location problem, there actually exist two versions. In the first version the total demand of each customer can only be supplied by one facility and this problem is called the single source capacitated facility location problem, while in the other version the total demand of each customer can be supplied by more than one facility. This problem is called the multiple source capacitated facility location problem. To formulate an optimization problem for the uncapacitated version, we introduce the binary decision vector \(\mathbf {y}=(y_{j})_{j\in \mathcal {N}}\) given by

$$\displaystyle \begin{aligned} y_{j}=1\Leftrightarrow \text{facility }j~\text{is}\ \text{opened}\ \text{at}\ \text{site }j. \end{aligned}$$

Since in the uncapacitated facility location problem the capacity of each facility is unlimited, we assign each customer to the “cheapest ” open facility and this implies that the model can be represented by the optimization problem:

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\min \{c_{ij}:y_{j}=1\}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:\mathbf{y}\in \mathbb{B}^{n}\right\}. \end{aligned}$$

However, in this optimization problem the objective function is not linear and to linearize the objective function we introduce the decision variables \( \mathbf {x}=(x_{ij})_{i\in \mathcal {M},j\in \mathcal {N}}\) given by

$$\displaystyle \begin{aligned} x_{ij}=\text{fraction }\ \text{of}\ \text{demand}\ \text{of}\ \text{customer }i\in \mathcal{M}\text{ supplied}\ \text{by}\ \text{site }j\in \mathcal{N}. \end{aligned}$$

Using these decision variables, the objective function is given by

$$\displaystyle \begin{aligned} \sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N} }c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}. \end{aligned}$$

Since all demand should be delivered, it is clear that these decision variables need to satisfy the so-called assignment constraints \( \sum \nolimits _{j\in \mathcal {N}}x_{ij}=1\) for every \(i\in \mathcal {M}\) and so the feasible region \(\mathcal {F}_{UFL}\) of the uncapacitated facility location problem is given by

$$\displaystyle \begin{aligned} \mathcal{F}_{UFL}=\{(\mathbf{x},\mathbf{y}):0\leq x_{ij}\leq y_{j},\sum\nolimits_{j\in \mathcal{N}}x_{ij}=1,y_{j}\in \mathbb{B},i\in \mathcal{M},j\in \mathcal{N}\}. {} \end{aligned} $$
(37)

Hence the optimization problem for the uncapacitated facility location problem is given by

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N} }c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:(\mathbf{x},\mathbf{ y})\in \mathcal{F}_{UFL}\right\}. {} \end{aligned} $$
(UFL)

If we consider the multiple-source capacitated facility location problem, we need to add the capacity constraints \(\sum \nolimits _{i\in \mathcal {M} }d_{i}x_{ij}\leq q_{j}y_{j}\) for every \(j\in \mathcal {N}\) with d i denoting the total demand of customer \(i\in \mathcal {M}.\) Hence the feasible region \(\mathcal {F}_{MSCFL}\) of the multiple-source capacitated facility location problem is given by

$$\displaystyle \begin{aligned} \mathcal{F}_{MSCFL}=\mathcal{F}_{UFL}\cap \left\{(\mathbf{x},\mathbf{y} ):\sum\nolimits_{i\in \mathcal{M}}d_{i}x_{ij}\leq q_{j}y_{j},j\in \mathcal{N} \right\}\end{aligned} $$

and we need to solve the optimization problem:

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N} }c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:(\mathbf{x},\mathbf{ y})\in \mathcal{F}_{MSCFL}\right\}. \end{aligned} $$
(MSCFL)

If we consider the single source capacitated facility location problem, each customer must be supplied by exactly one open facility and this implies that the decision variables \(x_{ij}\in \mathbb {B}\). Hence the feasible region \(\mathcal {F}_{SSCFL}\) is given by

$$\displaystyle \begin{aligned} \mathcal{F}_{SSCFL}=\mathcal{F}_{UFL}\cap \left\{(\mathbf{x},\mathbf{y} ):\sum\nolimits_{i\in \mathcal{M}}d_{i}x_{ij}\leq q_{j}y_{j},x_{ij}\in \mathbb{B},i\in \mathcal{M},j\in \mathcal{N}\right\}\end{aligned} $$

and the optimization problem is given by

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N} }c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:(\mathbf{x},\mathbf{ y})\in \mathcal{F}_{SSCFL}\right\}. \end{aligned} $$
(SSCFL)

For an extensive overview of transportation-location models on discrete spaces, the reader should consult Daskin (2011) or Francis, McGinnis and White (1992). It is well-known that the uncapacitated facility location problem ( UFL ) is \(\mathcal {N}\mathcal {P}\)-hard (Garey and Johnson, 1979) and, as already observed, it can be represented by

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N}}c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:\mathbf{g}(\mathbf{x})\preceq _{K}\mathbf{0},\ 0\leq x_{ij}\leq y_{j},\ y_{j}\in \mathbb{B},\ i\in \mathcal{M},\ j\in \mathcal{N}\right\} \end{aligned}$$

with

$$\displaystyle \begin{aligned} \mathbf{g(x})=(g_{1}(\mathbf{x}),\ldots,g_{m}(\mathbf{x))}, g_{i} (\mathbf{x})=\sum\nolimits_{j\in\mathcal{N}}x_{ij}-1 \end{aligned}$$

and \(K=\{\mathbf {0}\}\subset \mathbb {R}^{m}\). To construct the Lagrangian dual function of the uncapacitated facility location model, we first observe that for f j = 0 we can always open a facility at site j and so without loss of generality we may assume that f j > 0. In the uncapacitated facility location model we penalize the assignment constraints \(\sum _{j\in \mathcal {N}}x_{ij}-1=0\) for every \(i\in \mathcal {M}\). Since \(K^{\ast }=\{\mathbf {0}\}^{\ast }=\mathbb {R}^{m}\), the Lagrangian dual function \(\theta :\mathbb {R}^{m}\rightarrow \lbrack -\infty ,\infty )\) is defined on \(\mathbb {R}^{m}\) and it is given by

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=-\sum\nolimits_{i\in \mathcal{M} }\lambda _{i}+\sum\nolimits_{j\in \mathcal{N}}\inf \left\{ \sum\nolimits_{i\in \mathcal{M}}(c_{ij}+\lambda _{i})x_{ij}+f_{j}y_{j}:(\mathbf{x},\mathbf{y})\in X_{j}\right\} \end{aligned} $$
(38)

with \(X_{j}:=\{(\mathbf {x},\mathbf { y}):0\leq x_{ij}\leq y_{j},i\in \mathcal {M},y_{j}\in \mathbb {B}\}\), \(j\in \mathcal {N}\). To solve the minimization problem:

$$\displaystyle \begin{aligned} \inf\left\{ \sum\nolimits_{i\in\mathcal{M}}(c_{ij}+\lambda_{i})x_{ij} +f_{j}y_{j}:(\mathbf{x,y)}\in X_{j}\right\} \end{aligned} $$
(Hj )

for a given \(j\in \mathcal {N}\), we take x ij as large (small) as possible in case c ij + λ i is negative (positive). This implies that an optimal solution (x(λ), y(λ)) of optimization problem (H j) is given by

$$\displaystyle \begin{aligned} x_{ij}(\boldsymbol{\lambda})=\left\{ \begin{array} [c]{ll} y_{j}(\boldsymbol{\lambda}) & \ \quad \text{ if }~c_{ij}+\lambda_{i}<0\\ 0 & \ \quad \text{ if }~c_{ij}+\lambda_{i}\geq0 \end{array} \right.. \end{aligned} $$
(39)

By relation (39), it follows that the decision variables \(x_{ij}\), \(i\in \mathcal {M},j\in \mathcal {N}\), can be eliminated and we obtain

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}(c_{ij}+\lambda _{i})x_{ij}+f_{j}y_{j}:(\mathbf{x},\mathbf{y})\in X_{j}\right\}=\inf \{(f_{j}+b_{j}({\boldsymbol{\lambda }} ))y_{j}:y_{j}\in \mathbb{B}\} {} \end{aligned} $$
(40)

with \(b_{j}:\mathbb {R}^{m}\rightarrow (-\infty ,\infty )\) given by

$$\displaystyle \begin{aligned} b_{j}({\boldsymbol{\lambda }}):=\sum\nolimits_{i\in \mathcal{M}}\min \{c_{ij}+\lambda _{i},0\}. \end{aligned} $$
(41)

To determine the optimal objective value of problem (H j), we finally observe that

$$\displaystyle \begin{aligned} \inf\{(f_{j}+b_{j}(\boldsymbol{\lambda}))y_{j}:y_{j}\in\mathbb{B}\}=\min \{f_{j}+b_{j}(\boldsymbol{\lambda}),0\} \end{aligned} $$
(42)

with optimal solution \(\mathbf {y}(\boldsymbol {\lambda })=(y_{j}(\boldsymbol {\lambda } ))_{j\in \mathcal {N}}\) given by

$$\displaystyle \begin{aligned} y_{j}(\boldsymbol{\lambda})=\left\{ \begin{array} [c]{ll} 1 & \ \quad \mbox{ if }f_{j}+b_{j}(\boldsymbol{\lambda})\leq0\\ 0 & \ \quad \mbox{ if }f_{j}+b_{j}(\boldsymbol{\lambda})>0 \end{array} \right.. \end{aligned} $$
(43)

Hence the Lagrangian dual function \(\theta :\mathbb {R}^{m}\rightarrow \mathbb {R}\) of the uncapacitated facility location problem is given by

$$\displaystyle \begin{aligned} \theta ({\boldsymbol{\lambda }})=-\sum\nolimits_{i\in \mathcal{M} }\lambda _{i}+\sum\nolimits_{j\in \mathcal{N}}\min \{f_{j}+b_{j}({\boldsymbol{\lambda }} ),0\} \end{aligned}$$

and its Lagrangian dual by

$$\displaystyle \begin{aligned} \sup \{\theta ({\boldsymbol{\lambda }}):{\boldsymbol{\lambda }}\in \mathbb{R}^{m}\}. \end{aligned}$$

Applying the subgradient method to \(\inf \{-\theta ({\boldsymbol {\lambda }}):{\boldsymbol {\lambda }}\in \mathbb {R}^{m}\}\), we determine the optimal Lagrangian multiplier λ opt. If the solution (x(λ opt), y(λ opt)) given by relations (39) and (43) is also primal feasible, then, since K = {0}, the complementary slackness conditions are automatically satisfied. By Lemma 2.5 this solution is then an optimal solution of the uncapacitated facility location problem. In most cases, however, this solution is not primal feasible, and this means that either no facilities are opened or there exists a customer whose demand is not satisfied. The following Lagrangian heuristic transforms the solution (x(λ opt), y(λ opt)) into a primal feasible solution.
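As an illustration, the evaluation of the dual function via relations (38)–(43), together with one subgradient ascent step on λ, can be sketched as follows. The function names, the small cost data in the comments, and the fixed step size are illustrative assumptions, not taken from the text.

```python
# Hedged sketch (assumed names and data): evaluating the Lagrangian dual
# function theta of (UFL) via relations (38)-(43), plus one projected
# subgradient ascent step on lambda.

def theta(c, f, lam):
    """Return theta(lam) and the minimizers (x(lam), y(lam)) of (38)."""
    m, n = len(c), len(c[0])
    value = -sum(lam)                        # the -sum_i lambda_i term in (38)
    x = [[0.0] * n for _ in range(m)]
    y = [0] * n
    for j in range(n):
        # b_j(lambda) as in relation (41)
        b = sum(min(c[i][j] + lam[i], 0.0) for i in range(m))
        if f[j] + b <= 0:                    # open facility j, relation (43)
            y[j] = 1
            for i in range(m):               # relation (39): x_ij = y_j or 0
                if c[i][j] + lam[i] < 0:
                    x[i][j] = 1.0
        value += min(f[j] + b, 0.0)          # relation (42)
    return value, x, y


def subgradient_step(c, f, lam, step):
    """One ascent step: a subgradient of theta at lam has components
    sum_j x_ij(lam) - 1, the residuals of the assignment constraints."""
    _, x, _ = theta(c, f, lam)
    return [lam_i + step * (sum(row) - 1.0) for lam_i, row in zip(lam, x)]
```

In an actual subgradient scheme one would use a diminishing step-size sequence \(t_{k}\downarrow 0\) with \(\sum _{k}t_{k}=\infty \), as usual for subgradient methods.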

Algorithm 3.1

Lagrangian heuristic:

  1.

    Consider the possibly empty set S 1(λ opt) of open facilities given by

    $$\displaystyle \begin{aligned} S_{1}(\boldsymbol{\lambda}_{opt}):=\{j\in \mathcal{N}:y_{j}(\boldsymbol{\lambda}_{opt})=1\} \end{aligned}$$

    and go to step 2.

  2.

    If this set is empty, open a single facility and assign all customers to it. If this set is nonempty, unassign every customer that is either “overassigned” (assigned to more than one open facility) or not assigned at all, and then assign each of these nonassigned customers to the cheapest facility belonging to the set S 1(λ opt).
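A minimal sketch of Algorithm 3.1 follows. The text leaves open which facility to open when S 1(λ opt) is empty; as an illustrative assumption we pick the site minimizing setup plus total assignment cost. All names and data are illustrative.

```python
# Hedged sketch of Algorithm 3.1 (Lagrangian heuristic for UFL).

def lagrangian_heuristic(c, f, y):
    """Given y = y(lam_opt), return (assignment, total cost) with every
    customer assigned to exactly one open facility."""
    m, n = len(c), len(c[0])
    # Step 1: the (possibly empty) set S_1(lam_opt) of open facilities.
    open_sites = [j for j in range(n) if y[j] == 1]
    # Step 2, empty case: open one facility; as an illustrative tie-break
    # we choose the site minimizing setup plus total assignment cost.
    if not open_sites:
        open_sites = [min(range(n),
                          key=lambda j: f[j] + sum(c[i][j] for i in range(m)))]
    # Step 2, nonempty case: assign every customer to its cheapest open site.
    assign = [min(open_sites, key=lambda j: c[i][j]) for i in range(m)]
    cost = (sum(f[j] for j in open_sites)
            + sum(c[i][assign[i]] for i in range(m)))
    return assign, cost
```

The resulting pair (assign, cost) is always primal feasible, so its cost is an upper bound that can be compared with the lower bound θ(λ opt).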

As for the set covering problem, we need to solve in a branch and bound procedure at each branch a special instance of the uncapacitated facility location problem with some facilities already opened and/or some customers already assigned to some open facilities. A more efficient special purpose procedure to solve the uncapacitated facility location problem starts with the observation that for positive setup costs we may replace the decision variables \(y_{j}\in \mathbb {B}\) in problem ( UFL ) by \(y_{j}\in \mathbb {Z}_{+}\) without changing the set of optimal solutions and the optimal objective value. This means that we replace the feasible region \(\mathcal {F}_{UFL}\) by the bigger set \(\mathcal {F}_{RUFL}\) given by

$$\displaystyle \begin{aligned} \mathcal{F}_{RUFL}:=\{(\mathbf{x},\mathbf{y}):0\leq x_{ij}\leq y_{j},\sum\nolimits_{j\in \mathcal{N}}x_{ij}=1,y_{j}\in \mathbb{Z}_{+}\}, \end{aligned}$$

and so we obtain the relaxation:

$$\displaystyle \begin{aligned} \inf \left\{\sum\nolimits_{i\in \mathcal{M}}\sum\nolimits_{j\in \mathcal{N} }c_{ij}x_{ij}+\sum\nolimits_{j\in \mathcal{N}}f_{j}y_{j}:(\mathbf{x},\mathbf{ y})\in \mathcal{F}_{RUFL}\right\}. {} \end{aligned} $$
(RUFL)

As already observed, it follows that υ(UFL) = υ(RUFL) and the optimal solutions of both optimization problems are the same. Penalizing now the assignment constraints in the above optimization problem ( RUFL ), it follows by the same procedure as used for the uncapacitated facility location problem that the Lagrangian dual problem is given by

$$\displaystyle \begin{aligned} \upsilon (D)=\sup \left\{-\sum\nolimits_{i\in \mathcal{M}}\lambda _{i}:f_{j}+b_{j}({\boldsymbol{\lambda }})\geq 0,j\in \mathcal{N},{\boldsymbol{\lambda }} \in \mathbb{R}^{m}\right\}. \end{aligned}$$

This Lagrangian dual problem is also called the condensed dual. There exists a special purpose heuristic called DUALOC (Erlenkotter, 1978) to solve the above dual problem, and it is surprisingly effective. Finally, we look at the multisource capacitated version of the facility location problem. Again penalizing the assignment constraints, we obtain by the same approach as for the uncapacitated version that the Lagrangian dual function \(\theta :\mathbb {R}^{m}\rightarrow \mathbb {R}\) is given by

$$\displaystyle \begin{aligned} \theta (\boldsymbol{\lambda })=-\sum\nolimits_{i\in \mathcal{M}}\lambda _{i}+\sum\nolimits_{j\in \mathcal{N}}\min \{0,f_{j}+h_{j}(\boldsymbol{\lambda })\} \end{aligned}$$

with

$$\displaystyle \begin{aligned} \begin{array}{lll} h_{j}(\boldsymbol{\lambda }) & = & \inf \left\{ \sum\nolimits_{i\in \mathcal{M} }(c_{ij}+\lambda _{i})x_{ij}:0\leq x_{ij}\leq 1,\sum\nolimits_{i\in \mathcal{M}}d_{i}x_{ij}\leq q_{j}\right\} \\ & = & -\sup \left\{\sum\nolimits_{i\in \mathcal{M}}(-c_{ij}-\lambda _{i})x_{ij}:0\leq x_{ij}\leq 1,\sum\nolimits_{i\in \mathcal{M} }d_{i}x_{ij}\leq q_{j}\right\}. \end{array} \end{aligned}$$

The last problem is an LP-relaxation of the knapsack problem. Without loss of generality, we may now assume that \(-c_{ij}-\lambda _{i}>0\) for every \(i\in \mathcal {M}\) (otherwise set \(x_{ij}=0\) and remove this variable from the problem). To solve this problem, it is well-known that we need to order the ratios \(\frac{-c_{ij}-\lambda _{i}}{d_{i}}\) in decreasing order, given by

$$\displaystyle \begin{aligned} \frac{-c_{\pi (1)j}-\lambda _{\pi (1)}}{d_{\pi (1)}}\geq \frac{-c_{\pi (2)j}-\lambda _{\pi (2)}}{d_{\pi (2)}}\geq \ldots \geq \frac{-c_{\pi (m)j}-\lambda _{\pi (m)}}{d_{\pi (m)}} \end{aligned}$$

for some permutation π, and start assigning the demand of customers π(1), π(2), …, π(m) to location j until the capacity of this location is exhausted. The last customer assigned might only get part of his demand delivered from that facility j. As for the uncapacitated version, we can easily construct a Lagrangian heuristic to convert the optimal solution of θ(λ) into a primal feasible solution. Observe that for the single source capacitated facility location problem we obtain the classical knapsack problem. Although this problem is \(\mathcal {N}\mathcal {P}\)-hard, there are fast heuristics solving it approximately (Martello, 1990). This ends our discussion of fixed charge location problems on discrete spaces.
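The greedy rule above for the LP-relaxation of the knapsack subproblem defining \(h_{j}(\boldsymbol{\lambda })\) can be sketched as follows; the function name and the data in the comments are illustrative assumptions.

```python
# Hedged sketch: greedy solution of the LP-relaxation of the knapsack
# subproblem defining h_j(lambda).  Items with profit -c_ij - lambda_i <= 0
# are dropped (x_ij = 0); the rest are taken in decreasing ratio order until
# the capacity q_j is exhausted, the last item possibly fractionally.

def h_j(col_cost, lam, d, q):
    """col_cost[i] = c_ij for the fixed site j; returns h_j(lambda)."""
    m = len(d)
    profits = [-col_cost[i] - lam[i] for i in range(m)]
    # keep only profitable items, sorted by profit/demand ratio, decreasing
    items = sorted((i for i in range(m) if profits[i] > 0),
                   key=lambda i: profits[i] / d[i], reverse=True)
    cap, gain = q, 0.0
    for i in items:
        take = min(1.0, cap / d[i])      # fraction of demand d_i assigned
        gain += take * profits[i]
        cap -= take * d[i]
        if cap <= 0:
            break
    return -gain                         # h_j(lambda) = -sup{...}
```

Only the last item taken can be fractional, which is exactly the observation in the text that the last customer assigned might get only part of his demand delivered from facility j.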

4 Conclusion

In this study, the Lagrangian relaxation approach and the main ideas behind it are discussed in detail for finite dimensional optimization problems with continuous and/or discrete decision variables. Contrary to many courses taught in graduate programs, which distinguish between continuous and discrete optimization problems, we start in this chapter with a generic description of an optimization problem incorporating both types of problems and apply the Lagrangian relaxation approach to it. By introducing the main ideas behind this approach, we identify the so-called Lagrangian dual problem and a class of continuous optimization problems for which the optimal objective value of the primal problem equals the optimal objective value of the Lagrangian dual problem. At the same time, we show that many important results developed much later in the field of Operational Research are special instances of the Lagrangian relaxation approach. The most important ones are the primal–dual relations in linear programming, the dual linear program, and the Karush–Kuhn–Tucker conditions in nonlinear programming. As such, this chapter does not contain any new results but tries to give an easy introduction to duality theory and its use in finite dimensional optimization problems for less mathematically oriented readers, using the simplest possible proofs. Since many models in Operational Research can be formulated as optimization problems and the Lagrangian relaxation technique tries to find solutions of these models, a basic understanding of this technique is important for more application-oriented researchers in this field. It is also important to understand that seemingly different algorithms for solving certain problems are actually based on the same ideas. As an example (although not discussed in this chapter), we mention the Dantzig–Wolfe decomposition technique in linear programming and its dual version, known as Benders decomposition.
To illustrate their use on some specific examples, we showed in the last part of this chapter how these techniques can be applied to generate approximate solutions for the classical set covering and fixed charge facility location models.