Mathematics Subject Classification

Primary 90C05; Secondary 49N15

Synonyms

Linear optimization (LP)

Short Definition

The Linear Programming Problem (LP) is the problem of maximizing or minimizing a linear function of one or more, and typically thousands of, variables subject to a similarly large number of equality and/or inequality constraints.

Description

Although Leonid Kantorovich [3] is generally credited with being the first to recognize the importance of linear programming as a tool for solving many practical operational problems, much credit goes to George Dantzig for independently coming to this realization a few years later (see [1, 2]). Originally, most applications arose out of military operations. However, it was quickly appreciated that important applications appear in all areas of science, engineering, and business analytics.

A problem is said to be in symmetric standard form if all the constraints are inequalities and all of the variables are nonnegative:

$$\displaystyle{ \begin{array}{lrcl@{}l} \mbox{ maximize } &c^{T}x\\ \mbox{subject to }&Ax& \leq &b \\ & x& \geq &0&.\end{array} }$$
(1)

Here, \(A\) is an \(m \times n\) matrix whose \((i,j)\)-th element is \(a_{i,j}\), \(b\) is an \(m\)-vector whose \(i\)-th element is \(b_{i}\), and \(c\) is an \(n\)-vector whose \(j\)-th element is \(c_{j}\). The linear function \(c^{T}x\) is called the objective function. A particular choice of x is said to be feasible if it satisfies the constraints of the problem.

It is easy to convert any linear programming problem into an equivalent one in standard form. For example, any greater-than-or-equal-to constraint can be converted to a less-than-or-equal-to constraint by multiplying by minus one, any equality constraint can be replaced with a pair of inequality constraints, a minimization problem can be converted to maximization by negating the objective function, and every unconstrained variable can be replaced by a difference of two nonnegative variables.
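These transformations are mechanical enough to sketch in code. The following Python fragment is purely illustrative (the function name and its restriction to an all-equality problem with free variables are our own choices): it converts minimize \(c^{T}x\) subject to \(A_{eq}x = b_{eq}\), with \(x\) free, into symmetric standard form.

```python
import numpy as np

def to_symmetric_standard_form(c, A_eq, b_eq):
    """Convert  minimize c^T x  s.t.  A_eq x = b_eq  (x free)
    into  maximize c'^T u  s.t.  A' u <= b', u >= 0,
    using the transformations described in the text."""
    c = -np.asarray(c, float)                # minimize -> maximize: negate c
    A = np.asarray(A_eq, float)
    b = np.asarray(b_eq, float)
    # each equality Ax = b becomes the pair  Ax <= b  and  -Ax <= -b
    A2 = np.vstack([A, -A])
    b2 = np.concatenate([b, -b])
    # each free x_j becomes x_j^+ - x_j^- with x_j^+, x_j^- >= 0,
    # so u stacks the positive and negative parts of x
    A3 = np.hstack([A2, -A2])
    c3 = np.concatenate([c, -c])
    return c3, A3, b2
```

A point \(x\) of the original problem is represented by any \(u = (x^{+}, x^{-})\) with \(x = x^{+} - x^{-}\); the transformed problem has \(2n\) variables and \(2m\) inequality constraints.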

Duality

Associated with every linear programming problem is a dual problem. The dual problem associated with (1) is

$$\displaystyle{ \begin{array}{lrcl@{}l} \mbox{ minimize } & b^{T}y \\ \mbox{ subject to }&A^{T}y& \geq &c \\ & y& \geq &0&.\end{array} }$$
(2)

Written in standard form, the dual problem is

$$\displaystyle{\begin{array}{lrcl@{}l} -\mbox{ maximize } & - b^{T}y \\ \phantom{ -}\mbox{ subject to }& - A^{T}y& \leq &- c \\ & y& \geq &0 &.\end{array} }$$

From this form we see that the dual of the dual is the primal. We also see that the dual problem is in some sense the negative-transpose of the primal problem.

The weak duality theorem states that, if x is feasible for the primal problem and y is feasible for the dual problem, then \(c^{T}x \leq b^{T}y\). The proof is trivial: \(c^{T}x \leq y^{T}Ax \leq y^{T}b\). The weak duality theorem is useful in that it provides a certificate of optimality: if x is feasible for the primal problem and y is feasible for the dual problem and \(c^{T}x = b^{T}y\), then x is optimal for the primal problem and y is optimal for the dual problem.
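Both duality theorems are easy to observe numerically. Here is a small Python illustration using scipy.optimize.linprog; the instance data are made up for illustration, and note that linprog minimizes, so the primal objective is negated.

```python
import numpy as np
from scipy.optimize import linprog

# A made-up instance of (1): maximize 3x1 + 2x2
# subject to x1 + x2 <= 4, 2x1 + x2 <= 6, x >= 0.
A = np.array([[1.0, 1.0], [2.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

# linprog minimizes, so maximize c^T x by minimizing -c^T x;
# variable bounds default to x >= 0
primal = linprog(-c, A_ub=A, b_ub=b)
# the dual (2): minimize b^T y  s.t.  A^T y >= c, y >= 0,
# written as -A^T y <= -c for linprog's <= convention
dual = linprog(b, A_ub=-A.T, b_ub=-c)

print(-primal.fun, dual.fun)  # the two optimal values agree (strong duality)
```

For this instance both optimal values equal 10, so each solution certifies the optimality of the other, exactly as the weak duality theorem promises.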

There is also a strong duality theorem. It says that, if x is optimal for the primal problem, then there exists a y that is optimal for the dual problem and the two objective function values agree: \(c^{T}x = b^{T}y\).

All algorithms for linear programming are based on simultaneously finding an optimal solution for both the primal and the dual problem (or showing that the primal problem is either infeasible or unbounded). The value of the dual solution is that it proves that the primal solution is optimal.

Slack Variables and Complementarity

It is useful to introduce slack variables into the primal and dual problems so that all inequalities are simple nonnegativities:

Primal Problem:

$$\displaystyle{\begin{array}{l@{}r@{}c@{}l@{}l} \mbox{ maximize } & c^{T}x\\ \mbox{ subject to } &Ax + w&\,\,=\,\,&b \\ & x,w&\,\,\geq \,\,&0&. \end{array} }$$

Dual Problem:

$$\displaystyle{\begin{array}{l@{}r@{}c@{}l@{}l} \mbox{ minimize } & b^{T}y \\ \mbox{ subject to }&A^{T}y - z&\,\, =\,\,&c \\ & y,z&\,\,\geq \,\,&0&. \end{array} }$$

It is trivial to check that \((c + z)^{T}x = y^{T}Ax = y^{T}(b - w)\). Hence, if x and w are feasible for the primal problem and y and z are feasible for the dual problem and \(c^{T}x = b^{T}y\), then it follows that x is optimal for the primal, y is optimal for the dual and \(z^{T}x + y^{T}w = 0\). Since all of the terms in these inner products are nonnegative, it follows that

$$\displaystyle{z_{j}x_{j} = 0\quad \mbox{ for all $j$}\qquad \mbox{ and}\qquad y_{i}w_{i} = 0\quad \mbox{ for all $i$}.}$$

This condition is called complementarity.
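Complementarity is easy to verify numerically at an optimal primal/dual pair. This Python sketch (with made-up data; scipy.optimize.linprog minimizes, hence the sign flips) recovers the slacks \(w = b - Ax\) and \(z = A^{T}y - c\) and checks the products:

```python
import numpy as np
from scipy.optimize import linprog

# Made-up instance: maximize 3x1 + 2x2
# subject to x1 + x2 <= 4, 2x1 + x2 <= 6, x >= 0.
A = np.array([[1.0, 1.0], [2.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

x = linprog(-c, A_ub=A, b_ub=b).x        # optimal primal solution
y = linprog(b, A_ub=-A.T, b_ub=-c).x     # optimal dual solution
w = b - A @ x                            # primal slacks
z = A.T @ y - c                          # dual slacks

# complementarity: every product z_j x_j and y_i w_i vanishes
print(z * x, y * w)
```

Each printed vector is (up to roundoff) zero: wherever a primal variable is positive, the corresponding dual slack is zero, and vice versa.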

Geometry

The feasible set is an n-dimensional polytope defined by the intersection of n + m halfspaces, where each halfspace is determined either by one of the m constraint inequalities, \(Ax \leq b\), or by one of the n nonnegativity constraints on the variables, \(x \geq 0\). Generally speaking, the vertices of this polytope correspond to the intersection of n hyperplanes defined as the boundaries of a specific choice of n out of the n + m halfspaces. Except in degenerate cases, the optimal solution to the LP occurs at one of the vertices.

Ignoring, momentarily, which side of a hyperplane is feasible and which is not, the n + m hyperplanes generate up to \((n + m)!/n!m!\) possible vertices, corresponding to the many ways one can choose n hyperplanes from the n + m. Assuming that these points of intersection are distinct from one another, these points in n-space are called basic solutions. The intersections that lie in the feasible set itself are called basic feasible solutions.
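For a concrete two-variable instance (the data below are made up for illustration), the basic and basic feasible solutions can be enumerated directly. This Python sketch examines every choice of n hyperplanes out of the n + m:

```python
import numpy as np
from itertools import combinations

# Made-up instance of Ax <= b, x >= 0 with n = 2 variables, m = 2 constraints.
A = np.array([[1.0, 1.0], [2.0, 1.0]])
b = np.array([4.0, 6.0])
m, n = A.shape

# The n + m halfspaces: rows of [A; -I] with right-hand side [b; 0].
H = np.vstack([A, -np.eye(n)])
h = np.concatenate([b, np.zeros(n)])

basic, feasible = [], []
for rows in combinations(range(n + m), n):   # choose n of the n + m hyperplanes
    M = H[list(rows)]
    if abs(np.linalg.det(M)) < 1e-9:
        continue                             # hyperplanes meet in no single point
    x = np.linalg.solve(M, h[list(rows)])    # point of intersection: a basic solution
    basic.append(x)
    if np.all(H @ x <= h + 1e-9):
        feasible.append(x)                   # lies in the feasible set: a vertex
```

Here \((n+m)!/n!m! = 6\) choices all yield distinct intersection points, of which four are basic feasible solutions, i.e., vertices of the feasible polytope.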

Simplex Methods

Inspired by the geometric view of the problem, George Dantzig introduced a class of algorithms, called simplex methods, that start at the origin and repeatedly jump from one basic solution to an adjacent basic solution in a systematic manner such that eventually a basic feasible solution is found and then ultimately an optimal vertex is found.

With the slack variables defined, the problem has n + m variables. As the slack variables w and the original variables x are treated the same by the simplex method, it is convenient to use a common notation:

$$\displaystyle{x\longleftarrow \left [\begin{array}{c} x\\ w \end{array} \right ].}$$

A basic solution corresponds to choosing n of these variables to be set to zero. The m equations given by

$$\displaystyle{ Ax + w = b }$$
(3)

can then be used to solve for the remaining m variables. Let \(\mathcal{N}\) denote a particular choice of n of the n + m indices and let \(\mathcal{B}\) denote the complement of this set (so that \(\mathcal{B}\cup \mathcal{N} =\{ 1,\ldots,n + m\}\)). Let \(x_{\mathcal{N}}\) denote the n-vector consisting of the variables x j , \(j \in \mathcal{N}\). These variables are called nonbasic variables. Let \(x_{\mathcal{B}}\) denote the m-vector consisting of the rest of the variables. They are called basic variables. Initially, \(x_{\mathcal{N}} = [x_{1}\;\cdots \;x_{n}]^{T}\) and \(x_{\mathcal{B}} = [x_{n+1}\;\cdots \;x_{n+m}]^{T}\) so that (3) can be rewritten as

$$\displaystyle{ x_{\mathcal{B}} = b - Ax_{\mathcal{N}}. }$$
(4)

As the method jumps from one basic solution to another, this system of equations is rearranged so that the basic variables always remain on the left and the nonbasic variables appear on the right. After some number of pivots, these equations take the form

$$\displaystyle{ x_{\mathcal{B}} = x_{\mathcal{B}}^{{\ast}}- B^{-1}Nx_{ \mathcal{N}} }$$
(5)

where B denotes the m × m invertible matrix consisting of the columns of the matrix \([A\;\;I]\) associated with the basic variables \(\mathcal{B}\), N denotes those columns of that matrix associated with the nonbasic variables \(\mathcal{N}\), and \(x_{\mathcal{B}}^{{\ast}} = B^{-1}b\). Equation (5) is called a primal dictionary because it defines the primal basic variables in terms of the primal nonbasic variables. The process of updating equation (5) from one iteration to the next is called a simplex pivot.

Associated with each dictionary is a basic solution obtained by setting the nonbasic variables to zero and reading from the dictionary the values of the basic variables

$$\displaystyle{x_{\mathcal{N}} = 0\qquad \mbox{ and}\qquad x_{\mathcal{B}} = x_{\mathcal{B}}^{{\ast}}.}$$

In going from one iteration to the next, a single element of \(\mathcal{N}\), say \(j^{{\ast}}\), and a single element of \(\mathcal{B}\), say \(i^{{\ast}}\), are chosen, and these two variables are swapped between the two sets. The variable \(x_{j^{{\ast}}}\) is called the entering variable and \(x_{i^{{\ast}}}\) is called the leaving variable.

In complete analogy with the primal problem, one can write down a dual dictionary and read off a dual basic solution. The initial primal/dual pair had a symmetry that we called the negative-transpose property. It turns out that this symmetry is preserved by the pivot operation. As a consequence, it follows that primal/dual complementarity holds in every primal/dual basic solution. Hence, a basic solution is optimal if and only if it is primal feasible and dual feasible.
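The whole scheme fits in a short sketch. The following Python/NumPy implementation is our own illustrative code, not a production simplex method: it uses the greatest-rate-of-increase entering rule and a ratio test for the leaving variable, assumes \(b \geq 0\) so the all-slack basis is an initial basic feasible solution, and omits anti-cycling safeguards. It maintains the basis \(\mathcal{B}\) and recomputes the dictionary quantities \(x_{\mathcal{B}}^{\ast} = B^{-1}b\) and the reduced costs at every pivot.

```python
import numpy as np

def simplex(A, b, c, max_iters=1000):
    """Dictionary-based simplex sketch for  max c^T x  s.t.  Ax <= b, x >= 0.
    Assumes b >= 0; tolerances and names are illustrative choices."""
    A, b, c = np.asarray(A, float), np.asarray(b, float), np.asarray(c, float)
    m, n = A.shape
    T = np.hstack([A, np.eye(m)])            # the matrix [A I]
    cbar = np.concatenate([c, np.zeros(m)])  # objective over (x, w)
    basis = list(range(n, n + m))            # slacks start out basic
    nonbasis = list(range(n))
    for _ in range(max_iters):
        B, N = T[:, basis], T[:, nonbasis]
        xB = np.linalg.solve(B, b)                   # x_B* = B^{-1} b
        y = np.linalg.solve(B.T, cbar[basis])        # dual basic solution
        reduced = cbar[nonbasis] - N.T @ y           # reduced costs
        if np.all(reduced <= 1e-9):          # primal and dual feasible: optimal
            x = np.zeros(n + m)
            x[basis] = xB
            return x[:n], cbar[basis] @ xB
        j = int(np.argmax(reduced))          # entering: greatest rate of increase
        d = np.linalg.solve(B, T[:, nonbasis[j]])
        if np.all(d <= 1e-9):
            raise ValueError("problem is unbounded")
        ratios = np.full(m, np.inf)
        pos = d > 1e-9
        ratios[pos] = xB[pos] / d[pos]       # ratio test preserves feasibility
        i = int(np.argmin(ratios))           # row of the leaving variable
        basis[i], nonbasis[j] = nonbasis[j], basis[i]
    raise RuntimeError("iteration limit reached")
```

The optimality test makes the theory above concrete: the basic solution is declared optimal exactly when it is primal feasible and all reduced costs are nonpositive, i.e., the associated dual basic solution is also feasible.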

Degeneracy and Cycling

Every variant of the simplex method chooses the entering and leaving variables at each iteration with the intention of improving some specific measure of distance from either feasibility or optimality. If such a move makes a strict improvement at every iteration, then it easily follows that the algorithm finds an optimal solution in a finite number of pivots, because there are only finitely many ways to partition the set \(\{1,2,\ldots,n + m\}\) into m basic and n nonbasic components: an algorithm whose measure strictly improves at every iteration can never return to a place it has been before. However, it can happen that a simplex pivot makes zero improvement in one or more iterations. Such pivots are called degenerate pivots. It is possible, although exceedingly rare, for simple variants of the simplex method to produce a sequence of degenerate pivots eventually returning to a basic solution already visited. If the algorithm chooses the entering and leaving variables according to a deterministic rule, then returning once implies returning infinitely often, and the algorithm fails. This failure is called cycling. There are many safeguards to prevent cycling, perhaps the simplest being to add a random element to the entering/leaving variable selection rules. All modern implementations of the simplex method include anti-cycling safeguards.

Empirical Average-Case Performance

Given the anti-cycling safeguards, it follows that the simplex method is a finite algorithm. But how fast is it in practice? The answer is that, on average, most variants of the simplex method take roughly on the order of min(n, m) pivots to find an optimal solution. Such average-case performance is about the best one could hope for, and it accounts for much of the practical usefulness of linear programming in solving important everyday problems.

Worst-Case Performance

One popular variant of the simplex method assumes that the initial primal dictionary is feasible; at each iteration it selects as the entering variable the nonbasic variable that provides the greatest rate of increase of the objective function, and it then chooses the leaving variable so as to preserve primal feasibility. In 1972, Klee and Minty [6] constructed a simple family of LPs in which the n-th instance involves n variables and a feasible polytope that is topologically equivalent to an n-cube, but for which the pivot rule described above takes short steps in directions of high rate of increase rather than huge steps in directions with a low rate of increase. In so doing it visits all \(2^{n}\) vertices of this distorted n-cube in \(2^{n} - 1\) pivots, showing that this particular variant of the simplex method has exponential worst-case complexity. It is an open question whether or not there exists some variant of the simplex method whose worst-case performance is better than exponential.

Interior-Point Methods

For years it was unknown whether or not there existed an algorithm for linear programming that is guaranteed to solve problems in polynomial time. In 1979, Leonid Khachiyan [5] discovered the first such algorithm, but in practice his algorithm was much slower than the simplex method. In 1984, Narendra Karmarkar [4] developed a completely different polynomial-time algorithm. It turns out that his algorithm, and the many variants of it that have appeared over time, are highly competitive with the simplex method.

The class of algorithms inspired by Karmarkar’s algorithm are called interior-point algorithms. Most implementations of algorithms of this type belong to a generalization of this class called infeasible interior-point algorithms. These algorithms are iterative algorithms that approach optimality only in the limit – that is, they are not finite algorithms. But, for any \(\epsilon> 0\), they get within \(\epsilon\) of optimality in polynomial time. The adjective “infeasible” points to the fact that these algorithms may, and often do, approach optimality from outside the feasible set. The adjective “interior” means that even though the iterates may be infeasible, it is required that all components of all primal and dual variables be strictly positive at every iteration.
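As a concrete, and much simplified, illustration of these ideas, here is a barebones primal-dual path-following sketch in Python. It is not Karmarkar's algorithm; the starting point, centering parameter, and step-length rule are arbitrary choices made for illustration. It works on the equality standard form minimize \(c^{T}x\) subject to \(Ax = b\), \(x \geq 0\), may start (and remain for a while) infeasible, and keeps the components of \(x\) and \(z\) strictly positive at every iteration.

```python
import numpy as np

def ipm(A, b, c, tol=1e-8, max_iters=100):
    """Infeasible interior-point sketch for  min c^T x  s.t.  Ax = b, x >= 0.
    Illustrative only; constants and names are our own choices."""
    m, n = A.shape
    x, y, z = np.ones(n), np.zeros(m), np.ones(n)   # x, z strictly positive
    for _ in range(max_iters):
        rp = b - A @ x                 # primal residual
        rd = c - A.T @ y - z           # dual residual
        mu = x @ z / n                 # duality measure
        if max(np.linalg.norm(rp), np.linalg.norm(rd), mu) < tol:
            break
        sigma = 0.1                    # centering parameter
        # Newton step for the perturbed KKT system, via normal equations
        D = x / z
        S = A @ (D[:, None] * A.T)     # the matrix A D A^T
        rhs = rp + A @ (D * (rd + z - sigma * mu / x))
        dy = np.linalg.solve(S, rhs)
        dx = D * (A.T @ dy + sigma * mu / x - z - rd)
        dz = sigma * mu / x - z - (z / x) * dx
        def step(v, dv):               # stay strictly inside the positive orthant
            neg = dv < 0
            return min(1.0, 0.9 * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0
        ap, ad = step(x, dx), step(z, dz)
        x, y, z = x + ap * dx, y + ad * dy, z + ad * dz
    return x, y, z
```

At each iteration a Newton step is taken toward the point on the "central path" where every product \(x_{j}z_{j}\) equals \(\sigma\mu\); damping the step keeps the iterates interior, and the duality measure \(\mu\) shrinks toward zero only in the limit, as described above.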

Complexity

In the worst case, Karmarkar’s algorithm requires on the order of \(\sqrt{n}\log (1/\epsilon)\) iterations to get within \(\epsilon\) of an optimal solution. But, an iteration of an interior-point method is more computationally intensive (order \(n^{3}\)) than an iteration of the simplex method (order \(n^{2}\)). Counting arithmetic operations, one finds that interior-point methods require on the order of \(n^{3.5}\log (1/\epsilon)\) arithmetic operations in the worst case, which is comparable to the average-case performance of the simplex method.