In this chapter, we briefly review the results concerning the minimization of quadratic functions to the extent which is sufficient for understanding the algorithms described in Part II. The results are presented with specialized arguments, typically algebraic, that exploit the specific structure of these problems. Systematic exposition of optimization theory in the framework of nonlinear optimization can be found in the books by Bertsekas [1], Nocedal and Wright [2], Conn, Gould, and Toint [3], Bazaraa, Sherali, and Shetty [4], or Griva, Nash, and Sofer [5].

1 Optimization Problems and Solutions

Optimization problems considered in this book are described by a cost (objective, target) function f defined on a subset \({\mathscr {D}}\subseteq \mathbb {R}^n\) and by a constraint set \(\varOmega \subseteq {\mathscr {D}}\). The elements of \(\varOmega \) are called feasible vectors. Important ingredients of scalable algorithms for the frictionless contact problems are efficient algorithms for the solution of quadratic programming (QP) problems with a quadratic cost function f and a constraint set \(\varOmega \subseteq \mathbb {R}^n\) described by linear equalities and inequalities. The solution of problems with friction requires effective algorithms for special quadratically constrained quadratic programmes (QCQP) with the constraints described by linear equalities and separable quadratic inequalities.

We look either for a solution \(\mathbf {\overline{x}}\in \mathbb {R}^n\) of the unconstrained minimization problem which satisfies

$$\begin{aligned} f(\mathbf {\overline{x}})\le f(\mathbf {x}), \ \mathbf {x}\in \mathbb {R}^n, \end{aligned}$$
(3.1)

or for a solution \(\mathbf {\overline{x}}\in \varOmega \) of the constrained minimization problem

$$\begin{aligned} f(\mathbf {\overline{x}})\le f(\mathbf {x}), \ \ \mathbf {x}\in \varOmega , \ \ \varOmega \subset \mathbb {R}^n. \end{aligned}$$
(3.2)

A solution of the minimization problem is called its minimizer or global minimizer.

A nonzero vector \(\mathbf {d}\in \mathbb {R}^n\) is a feasible direction of \(\varOmega \) at a feasible point \(\mathbf {x}\) if \(\mathbf {x}+\varepsilon \mathbf {d}\in \varOmega \) for all sufficiently small \(\varepsilon >0\). A nonzero vector \(\mathbf {d}\in \mathbb {R}^n\) is a recession direction, or simply a direction, of \(\varOmega \) if for each \(\mathbf {x}\in \varOmega \), \(\mathbf {x}+\alpha \mathbf {d}\in \varOmega \) for all \(\alpha >0\).

2 Unconstrained Quadratic Programming

Let us first recall some simple results which concern unconstrained quadratic programming.

2.1 Quadratic Cost Functions

We consider the cost functions in the form

$$\begin{aligned} f(\mathbf {x})=\frac{1}{2}\mathbf {x}^T{\mathsf {A}}\mathbf {x}-\mathbf {b}^T\mathbf {x}, \end{aligned}$$
(3.3)

where \({\mathsf {A}}\in \mathbb {R}^{n\times n}\) denotes a given SPS or SPD matrix of order n and \(\mathbf {b}\in \mathbb {R}^n\).

If \(\mathbf {x, d}\in \mathbb {R}^n\), then using elementary computations and \({\mathsf {A=A}}^T\), we get

$$\begin{aligned} f(\mathbf {x}+ \mathbf {d})=f(\mathbf {x})+ ({\mathsf {A}}\mathbf {x}-\mathbf {b})^T \mathbf {d} +\frac{1}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}. \end{aligned}$$
(3.4)

The formula (3.4) is Taylor’s expansion of f at \(\mathbf {x}\), so that the gradient of f at \(\mathbf {x}\) is given by

$$\begin{aligned} \nabla f(\mathbf {x})={\mathsf {A}}\mathbf {x}-\mathbf {b}, \end{aligned}$$
(3.5)

and the Hessian of f at \(\mathbf {x}\) is given by

$$\nabla ^2 f(\mathbf {x})={\mathsf {A}}. $$

Taylor’s expansion will be our simple but powerful tool in what follows.

A vector \(\mathbf {d}\) is a decrease direction of f at \(\mathbf {x}\) if

$$f(\mathbf {x}+\varepsilon \mathbf {d})<f(\mathbf {x}) $$

for all sufficiently small values of \(\varepsilon >0\). Using Taylor’s expansion (3.4) in the form

$$f(\mathbf {x}+ \varepsilon \mathbf {d})=f(\mathbf {x})+ \varepsilon ({\mathsf {A}}\mathbf {x}-\mathbf {b})^T \mathbf {d} +\frac{\varepsilon ^2}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}, $$

we get that \(\mathbf {d}\) is a decrease direction if and only if

$$({\mathsf {A}}\mathbf {x}-\mathbf {b})^T\mathbf {d}<0. $$
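The following short numerical sketch (Python with NumPy; the matrix, vector, step sizes, and tolerances are illustrative, not taken from the text) checks the gradient formula (3.5) by finite differences and verifies that \(\mathbf {d}=-\nabla f(\mathbf {x})\) satisfies the decrease condition above.

```python
import numpy as np

# A minimal check of (3.4)-(3.5): for f(x) = (1/2) x'Ax - b'x the gradient
# is Ax - b, and d is a decrease direction iff (Ax - b)'d < 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
A = A.T @ A + np.eye(5)                 # a generic SPD matrix (illustrative)
b = rng.standard_normal(5)

f = lambda x: 0.5 * x @ A @ x - b @ x
x = rng.standard_normal(5)
g = A @ x - b                           # gradient (3.5)

# finite-difference check of the gradient
e = 1e-6
fd = np.array([(f(x + e * np.eye(5)[i]) - f(x - e * np.eye(5)[i])) / (2 * e)
               for i in range(5)])
print(np.allclose(fd, g, atol=1e-5))    # True

d = -g                                  # satisfies (Ax - b)'d = -||g||^2 < 0
print(f(x + 1e-3 * d) < f(x))           # True: d is a decrease direction
```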

2.2 Unconstrained Minimization of Quadratic Functions

The following proposition gives algebraic conditions that are satisfied by the solutions of the unconstrained QP problem to find

$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^n} f(\mathbf {x}), \end{aligned}$$
(3.6)

where f is a quadratic function defined by (3.3).

Proposition 3.1

Let the quadratic function f be defined by an SPS or SPD matrix \({\mathsf {A}}\in \mathbb {R}^{n\times n}\) and \(\mathbf {b}\in \mathbb {R}^n\). Then the following statements hold:

(i) A vector \(\mathbf {\overline{x}}\) is a solution of the unconstrained minimization problem (3.6) if and only if

$$\begin{aligned} \nabla f(\mathbf {\overline{x}})={\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}=\mathbf {o}. \end{aligned}$$
(3.7)

(ii) The minimization problem (3.6) has a unique solution if and only if \({\mathsf {A}}\) is SPD.

Proof

The proof is a simple corollary of Taylor’s expansion formula (3.4).    \(\square \)

Remark 3.1

Condition (3.7) can be written as a variational equality

$$({\mathsf {A}}\mathbf {\overline{x}})^T(\mathbf {x}-\overline{\mathbf {x}})=\mathbf {b}^T(\mathbf {x}-\overline{\mathbf {x}}), \quad \mathbf {x}\in \mathbb {R}^n. $$

Examining the gradient condition (3.7), we get that problem (3.6) has a solution if and only if \({\mathsf {A}}\) is SPS and

$$\begin{aligned} \mathbf {b}\in \mathrm {Im}{\mathsf {A}}. \end{aligned}$$
(3.8)

Denoting by \({\mathsf {R}}\) a matrix the columns of which span Ker\({\mathsf {A}}\), we can rewrite (3.8) as \({\mathsf {R}}^T \mathbf {b}=\mathbf {o}\). This condition has a simple mechanical interpretation: if a mechanical system is in equilibrium, the external forces must be orthogonal to the rigid body motions.

If \(\mathbf {b}\in \mathrm {Im}{\mathsf {A}}\), a solution of (3.6) is given by

$$\mathbf {\overline{x}}={\mathsf {A}}^+\mathbf {b}, $$

where \({\mathsf {A}}^+\) is a left generalized inverse introduced in Sect. 2.3. After substituting into f and simple manipulations, we get

$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^n}f(\mathbf {x})=-\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^+\mathbf {b}. \end{aligned}$$
(3.9)

In particular, if \({\mathsf {A}}\) is positive definite, then

$$\begin{aligned} \min _{\mathbf {x}\in \mathbb {R}^n}f(\mathbf {x})=-\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^{-1}\mathbf {b}. \end{aligned}$$
(3.10)

The above formulae can be used to develop useful estimates. Indeed, if (3.8) holds and \(\mathbf {x}\in \mathbb {R}^n\), we get

$$ f(\mathbf {x})\ge -\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^+\mathbf {b} =-\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^{\dagger }\mathbf {b} \ge -\frac{1}{2}\Vert {\mathsf {A}}^{\dagger }\Vert \Vert \mathbf {b}\Vert ^2 =-\frac{\Vert \mathbf {b}\Vert ^2}{2\overline{\lambda }_{\min }}, $$

where \({\mathsf {A}}^{\dagger }\) denotes the Moore–Penrose generalized inverse and \(\overline{\lambda }_{\min }\) denotes the least nonzero eigenvalue of \({\mathsf {A}}\). In particular, it follows that if \({\mathsf {A}}\) is positive definite and \(\lambda _{\min }\) denotes the least eigenvalue of \({\mathsf {A}}\), then for any \(\mathbf {x}\in \mathbb {R}^n\)

$$\begin{aligned} f(\mathbf {x})\ge -\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^{-1}\mathbf {b} \ge -\frac{1}{2}\Vert {\mathsf {A}}^{-1}\Vert \Vert \mathbf {b}\Vert ^2 =-\frac{\Vert \mathbf {b}\Vert ^2}{2\lambda _{\min }}. \end{aligned}$$
(3.11)
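A small numerical sketch can illustrate formula (3.9) and the lower bound above; here we use Python with NumPy and a randomly generated singular SPS matrix with \(\mathbf {b}\in \mathrm {Im}{\mathsf {A}}\) (the data are illustrative only).

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.standard_normal((6, 4))
A = Z @ Z.T                               # singular SPS matrix of rank 4
b = A @ rng.standard_normal(6)            # ensures b is in Im(A), cf. (3.8)

f = lambda x: 0.5 * x @ A @ x - b @ x
Aplus = np.linalg.pinv(A)                 # Moore-Penrose generalized inverse
x_bar = Aplus @ b                         # a minimizer x = A^+ b
f_min = -0.5 * b @ Aplus @ b              # optimal value (3.9)

lam = np.linalg.eigvalsh(A)
lam_min_nz = lam[lam > 1e-10].min()       # least nonzero eigenvalue of A
print(np.isclose(f(x_bar), f_min))                     # True
print(f_min >= -b @ b / (2 * lam_min_nz) - 1e-12)      # lower bound above
```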

If the dimension n of the unconstrained minimization problem (3.6) is large, then it can be too ambitious to look for a solution which satisfies the gradient condition (3.7) exactly. A natural idea is to consider the weaker condition

$$\begin{aligned} \Vert \nabla f(\mathbf {x})\Vert \le \varepsilon \end{aligned}$$
(3.12)

with a small \(\varepsilon >0\). If \(\mathbf {x}\) satisfies the latter condition with \(\varepsilon \) sufficiently small and \({\mathsf {A}}\) nonsingular, then \(\mathbf {x}\) is near the unique solution \(\mathbf {\widehat{x}}\), since

$$\begin{aligned} \Vert \mathbf {x}-\mathbf {\widehat{x}}\Vert =\Vert {\mathsf {A}}^{-1}{\mathsf {A}}\left( \mathbf {x}-\mathbf {\widehat{x}}\right) \Vert =\Vert {\mathsf {A}}^{-1}({\mathsf {A}}\mathbf {x}-\mathbf {b})\Vert \le \Vert {\mathsf {A}}^{-1}\Vert \Vert \nabla f(\mathbf {x})\Vert . \end{aligned}$$
(3.13)

The typical “solution” returned by an iterative solver is just a vector \(\mathbf {x}\) that satisfies (3.12).
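The following sketch (Python with NumPy, illustrative data) checks the error bound (3.13) for a perturbed minimizer.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))
A = A.T @ A + np.eye(8)                    # SPD, hence nonsingular
b = rng.standard_normal(8)

x_hat = np.linalg.solve(A, b)              # exact minimizer, grad f(x_hat) = o
x = x_hat + 1e-4 * rng.standard_normal(8)  # an inexact "solution"

grad = A @ x - b
bound = np.linalg.norm(np.linalg.inv(A), 2) * np.linalg.norm(grad)
print(np.linalg.norm(x - x_hat) <= bound + 1e-12)   # (3.13) holds
```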

3 Convexity

Intuitively, convexity is a property of sets that contain, with any two of their points, the whole segment joining them. More formally, a subset \(\varOmega \) of \(\mathbb {R}^n\) is convex if for any \(\mathbf {x}\) and \(\mathbf {y}\) in \(\varOmega \) and \(\alpha \in (0,1)\), the vector \(\mathbf {s}=\alpha \mathbf {x}+(1-\alpha )\mathbf {y}\) is also in \(\varOmega \).

Let \(\mathbf {x}_1, \dots , \mathbf {x}_k\) be vectors of \(\mathbb {R}^n\). If \(\alpha _1, \dots , \alpha _k\) are scalars such that

$$\alpha _i\ge 0, \quad i=1, \dots , k, \qquad \sum _{i=1}^k\alpha _i=1, $$

then the vector \(\mathbf {v}=\sum _{i=1}^k\alpha _i\mathbf {x}_i\) is said to be a convex combination of vectors \(\mathbf {x}_1, \dots , \mathbf {x}_k\). The convex hull of \(\mathbf {x}_1, \dots , \mathbf {x}_k\), denoted \(\mathrm {Conv}\{\mathbf {x}_1, \dots , \mathbf {x}_k\}\), is the set of all convex combinations of \(\mathbf {x}_1, \dots , \mathbf {x}_k\).

3.1 Convex Quadratic Functions

Given a convex set \(\varOmega \subseteq \mathbb {R}^n\), a mapping \(h: \varOmega \rightarrow \mathbb {R}\) is said to be a convex function if its epigraph is convex, that is, if

$$h\left( \alpha \mathbf {x}+(1-\alpha )\mathbf {y}\right) \le \alpha h(\mathbf {x}) +(1-\alpha )h(\mathbf {y}) $$

for all \(\mathbf {x}, \mathbf {y}\in \varOmega \) and \(\alpha \in (0,1)\), and it is strictly convex if

$$h\bigl (\alpha \mathbf {x}+(1-\alpha )\mathbf {y}\bigr )< \alpha h(\mathbf {x}) +(1-\alpha )h(\mathbf {y}) $$

for all \(\mathbf {x}, \mathbf {y}\in \varOmega , \ \mathbf {x}\ne \mathbf {y}\), and \(\alpha \in (0,1)\).

The following proposition gives a characterization of convex functions.

Proposition 3.2

Let V be a subspace of \(\mathbb {R}^n\). The restriction f|V of a quadratic function f with the Hessian matrix \({\mathsf {A}}\) to V is convex if and only if \({\mathsf {A}}|{V}\) is positive semidefinite, and f|V is strictly convex if and only if \({\mathsf {A}}|{V}\) is positive definite.

Proof

Let V be a subspace, let \(\mathbf {x}, \mathbf {y}\in V\), \(\alpha ~\in ~(0,1)\), and \(\mathbf {s}=\alpha \mathbf {x}+(1-\alpha )\mathbf {y}\). Then by Taylor’s expansion (3.4) of f at \(\mathbf {s}\)

$$\begin{aligned} f(\mathbf {s})+ \nabla f(\mathbf {s})^T(\mathbf {x}-\mathbf {s})+\frac{1}{2} (\mathbf {x}-\mathbf {s})^T{\mathsf {A}}(\mathbf {x}-\mathbf {s})= f(\mathbf {x}),\\ f(\mathbf {s})+ \nabla f(\mathbf {s})^T(\mathbf {y}-\mathbf {s})+\frac{1}{2} (\mathbf {y}-\mathbf {s})^T{\mathsf {A}}(\mathbf {y}-\mathbf {s})= f(\mathbf {y}). \end{aligned}$$

Multiplying the first equation by \(\alpha \), the second equation by \(1-\alpha \), and summing up, we get

$$\begin{aligned} \begin{aligned}f(\mathbf {s})+\frac{\alpha }{2} (\mathbf {x}-\mathbf {s})^T{\mathsf {A}}(\mathbf {x}-\mathbf {s})+ \frac{1-\alpha }{2} (\mathbf {y}-\mathbf {s})^T&{\mathsf {A}}(\mathbf {y}-\mathbf {s})\\&= \alpha f(\mathbf {x}) +(1-\alpha )f(\mathbf {y}). \end{aligned} \end{aligned}$$
(3.14)

It follows that if \({\mathsf {A}}|{V}\) is positive semidefinite, then f|V is convex. Moreover, since \(\mathbf {x}=\mathbf {y}\) is equivalent to \(\mathbf {x}=\mathbf {s}\) and \(\mathbf {y}=\mathbf {s}\), it follows that if \({\mathsf {A}}|{V}\) is positive definite, then f|V is strictly convex.

Let us now assume that f|V is convex, let \(\mathbf {z}\in {V}\), \(\alpha =\frac{1}{2}\), and denote \(\mathbf {x}=2\mathbf {z}\), \(\mathbf {y}=\mathbf {o}\). Then \(\mathbf {s}=\mathbf {z}, \ \mathbf {x-s}=\mathbf {z}, \ \mathbf {y-s}=-\mathbf {z}\), and substituting into (3.14) results in

$$f(\mathbf {s})+ \frac{1}{2}\mathbf {z}^T{\mathsf {A}}\mathbf {z}= \alpha f(\mathbf {x}) +(1-\alpha )f(\mathbf {y}). $$

Since \(\mathbf {z}\in {V}\) is arbitrary and f|V is assumed to be convex, it follows that

$$\frac{1}{2}\mathbf {z}^T{\mathsf {A}}\mathbf {z}= \alpha f(\mathbf {x}) +(1-\alpha )f(\mathbf {y}) -f\left( \alpha \mathbf {x}+(1-\alpha )\mathbf {y}\right) \ge 0. $$

Thus \({\mathsf {A}}|{V}\) is positive semidefinite. Moreover, if f|V is strictly convex, then \({\mathsf {A}}|{V}\) is positive definite.    \(\square \)

Strictly convex quadratic functions have the nice property that \(f(\mathbf {x})\rightarrow \infty \) when \(\Vert \mathbf {x}\Vert \rightarrow \infty \). Functions with this property are called coercive functions. More generally, a function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) is said to be coercive on \(\varOmega \subseteq \mathbb {R}^n\) if

$$f(\mathbf {x})\rightarrow \infty \quad \mathrm {for} \quad \Vert \mathbf {x}\Vert \rightarrow \infty , \ \mathbf {x}\in \varOmega .$$

3.2 Minimizers of Convex Functions

Under the convexity assumptions, each local minimizer is a global minimizer. We shall formulate this result together with some observations concerning the set of solutions.

Proposition 3.3

Let f and \(\varOmega \subseteq \mathbb {R}^n\) be a convex quadratic function defined by (3.3) and a closed convex set, respectively. Then the following statements hold:

(i) Each local minimizer of f subject to \(\mathbf {x}\in \varOmega \) is a global minimizer of f subject to \(\mathbf {x}\in \varOmega \).

(ii) If \(\mathbf {\overline{x}}\), \(\mathbf {\overline{y}}\) are two minimizers of f subject to \(\mathbf {x}\in \varOmega \), then

$$\mathbf {\overline{x}}-\mathbf {\overline{y}}\in \mathrm{Ker}{\mathsf {A}}\cap \mathrm {Span}\{\mathbf {b}\}^{\bot }. $$

(iii) If f is strictly convex on \(\varOmega \) and \(\mathbf {\overline{x}}\), \(\mathbf {\overline{y}}\) are two minimizers of f subject to \(\mathbf {x}\in \varOmega \), then \(\mathbf {\overline{x}}=\mathbf {\overline{y}}\).

Proof

(i) Assume, by contradiction, that \(\mathbf {\overline{y}}\in \varOmega \) is a local minimizer of f subject to \(\mathbf {x}\in \varOmega \) which is not a global minimizer, so that there is \(\mathbf {\overline{x}}\in \varOmega \) with \(f(\mathbf {\overline{x}})<f(\mathbf {\overline{y}})\). Denoting \(\mathbf {y}_{\alpha } =\alpha \mathbf {\overline{x}}+(1-\alpha )\mathbf {\overline{y}}\) and using that f is convex, we get

$$f(\mathbf {y}_{\alpha }) =f(\alpha \mathbf {\overline{x}}+(1-\alpha )\mathbf {\overline{y}}) \le \alpha f(\mathbf {\overline{x}})+(1-\alpha )f(\mathbf {\overline{y}}) <f(\mathbf {\overline{y}}) $$

for every \(\alpha \in (0, 1)\). Since

$$\Vert \mathbf {\overline{y}}-\mathbf {y}_{\alpha }\Vert =\alpha \Vert \mathbf {\overline{y}} -\mathbf {\overline{x}}\Vert ,$$

the inequality contradicts the assumption that \(\mathbf {\overline{y}}\) is a local minimizer.

(ii) Let \(\mathbf {\overline{x}}\) and \(\mathbf {\overline{y}}\) be global minimizers of f on \(\varOmega \). Then for any \(\alpha \in [0,1]\)

$$\mathbf {\overline{x}}+\alpha (\mathbf {\overline{y}}-\mathbf {\overline{x}}) = (1-\alpha )\mathbf {\overline{x}}+\alpha \mathbf {\overline{y}}\in \varOmega , \quad \mathbf {\overline{y}}+\alpha (\mathbf {\overline{x}}-\mathbf {\overline{y}}) = (1-\alpha )\mathbf {\overline{y}}+\alpha \mathbf {\overline{x}}\in \varOmega .$$

Moreover, using Taylor’s formula, we get

$$\begin{aligned} 0\le & {} f\bigl (\mathbf {\overline{x}}+\alpha (\mathbf {\overline{y}}-\mathbf {\overline{x}})\bigr )- f(\mathbf {\overline{x}})= \alpha ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T(\mathbf {\overline{y}}-\mathbf {\overline{x}}) +\frac{\alpha ^2}{2}(\mathbf {\overline{y}}-\mathbf {\overline{x}})^T{\mathsf {A}}(\mathbf {\overline{y}}-\mathbf {\overline{x}}),\nonumber \\ 0\le & {} f\bigl (\mathbf {\overline{y}}+\alpha (\mathbf {\overline{x}}-\mathbf {\overline{y}})\bigr )- f(\mathbf {\overline{y}})= \alpha ({\mathsf {A}}\mathbf {\overline{y}}-\mathbf {b})^T(\mathbf {\overline{x}}-\mathbf {\overline{y}}) +\frac{\alpha ^2}{2}(\mathbf {\overline{x}}-\mathbf {\overline{y}})^T{\mathsf {A}}(\mathbf {\overline{x}}-\mathbf {\overline{y}}). \nonumber \end{aligned}$$

Since the latter inequalities hold for arbitrarily small \(\alpha \), it follows that

$$({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T(\mathbf {\overline{y}}-\mathbf {\overline{x}})\ge 0 \quad \mathrm {and} \quad ({\mathsf {A}}\mathbf {\overline{y}}-\mathbf {b})^T(\mathbf {\overline{x}}-\mathbf {\overline{y}})\ge 0. $$

After summing up the latter inequalities and simple manipulations, we have

$$-(\mathbf {\overline{x}}-\mathbf {\overline{y}})^T{\mathsf {A}}(\mathbf {\overline{x}}-\mathbf {\overline{y}})\ge 0. $$

Since the convexity of f implies by Proposition 3.2 that \({\mathsf {A}}\) is positive semidefinite, it follows that \(\mathbf {\overline{x}}-\mathbf {\overline{y}}\in \mathrm{Ker}{\mathsf {A}}\).

If \(f(\mathbf {\overline{x}})=f(\mathbf {\overline{y}})\) and \(\mathbf {\overline{x}}-\mathbf {\overline{y}}\in \mathrm {Ker}{\mathsf {A}}\), then

$$\mathbf {b}^T(\mathbf {\overline{x}} - \mathbf {\overline{y}})=f(\mathbf {\overline{y}})-f(\mathbf {\overline{x}})=0, $$

i.e., \( \mathbf {\overline{x}} - \mathbf {\overline{y}}\in \mathrm {Span}\{\mathbf {b}\}^{\bot }\).

(iii) Let f be strictly convex and let \(\mathbf {\overline{x}}, \mathbf {\overline{y}}\in \varOmega \) be global minimizers of f on \(\varOmega \). Since strict convexity of f implies \(\mathrm{Ker}{\mathsf {A}}=\{\mathbf {o}\}\) by Proposition 3.2, it follows by (ii) that \(\mathbf {\overline{x}}-\mathbf {\overline{y}}=\mathbf {o}\), i.e., \(\mathbf {\overline{x}}=\mathbf {\overline{y}}\).    \(\square \)

3.3 Existence of Minimizers

Since quadratic functions are continuous, existence of at least one minimizer is guaranteed by the Weierstrass theorem provided \(\varOmega \) is compact, that is, closed and bounded. The following standard results do not assume that \(\varOmega \) is bounded.

Proposition 3.4

Let f be a convex quadratic function and let \(\varOmega \) denote a closed convex set. Then the following statements hold:

(i) If f is strictly convex, then there is a unique minimizer of f subject to \(\mathbf {x}\in \varOmega \).

(ii) If f is coercive on \(\varOmega \), then a global minimizer of f subject to \(\mathbf {x}\in \varOmega \) exists.

(iii) If f is bounded from below on \(\varOmega \), then there is a global minimizer of f subject to \(\mathbf {x}\in \varOmega \).

Proof

(i) If f is strictly convex, it follows by Proposition 3.2 that \({\mathsf {A}}\) is SPD and \(\mathbf {z}={\mathsf {A}}^{-1}\mathbf {b}\) is by Proposition 3.1 the unique minimizer of f on \(\mathbb {R}^n\). Thus for any \(\mathbf {x}\in \mathbb {R}^n\)

$$f(\mathbf {x})\ge f(\mathbf {z}).$$

It follows that the infimum of \(f(\mathbf {x})\) subject to \(\mathbf {x}\in \varOmega \) exists, and there is a sequence of vectors \(\mathbf {x}^k\in \varOmega \) such that

$$\lim _{k \rightarrow \infty } f(\mathbf {x}^k) =\inf _{\mathbf {x}\in \varOmega } f(\mathbf {x}). $$

The sequence \(\{\mathbf {x}^k\}\) is bounded as

$$f(\mathbf {x}^k) -f(\mathbf {z})= \frac{1}{2}(\mathbf {x}^k -\mathbf {z})^T{\mathsf {A}}(\mathbf {x}^k -\mathbf {z}) \ge \frac{\lambda _{\min }}{2}\Vert \mathbf {x}^k -\mathbf {z}\Vert ^2, $$

where \(\lambda _{\min }\) denotes the least eigenvalue of \({\mathsf {A}}\). It follows that \(\{\mathbf {x}^k\}\) has at least one cluster point \(\mathbf {\overline{x}}\in \varOmega \). Since f is continuous, we get

$$f(\mathbf {\overline{x}})=\inf _{\mathbf {x}\in \varOmega } f(\mathbf {x}). $$

The uniqueness follows by Proposition 3.3.

(ii) The proof is similar to that of (i). See, e.g., Bertsekas [1, Proposition A.8].

(iii) The statement is the well-known Frank–Wolfe theorem [6].    \(\square \)

3.4 Projections to Convex Sets

Let us define the projection \(P_{\varOmega }\) to the (closed) convex set \(\varOmega \subset \mathbb {R}^n\) as a mapping which assigns to each \(\mathbf {x}\in \mathbb {R}^n\) its nearest vector \(\mathbf {\widehat{x}}\in \varOmega \) as in Fig. 3.1. The following proposition concerns the projection induced by the Euclidean scalar product.

Fig. 3.1 Projection to the convex set

Proposition 3.5

Let \(\varOmega \subseteq \mathbb {R}^n\) be a nonempty closed convex set and \(\mathbf {x}\in \mathbb {R}^n\). Then there is a unique point \(\mathbf {\widehat{x}}\in \varOmega \) with the minimum Euclidean distance from \(\mathbf {x}\), and for any \(\mathbf {y}\in \varOmega \)

$$\begin{aligned} (\mathbf {x}-\mathbf {\widehat{x}})^T(\mathbf {y}-\mathbf {\widehat{x}})\le 0. \end{aligned}$$
(3.15)

Proof

Since the proof is trivial for \(\mathbf {x}\in \varOmega \), let us assume that \(\mathbf {x}\notin \varOmega \) is arbitrary but fixed and observe that the function f defined on \(\mathbb {R}^n\) by

$$f(\mathbf {y})=\Vert \mathbf {x}-\mathbf {y}\Vert ^2=\mathbf {y}^T\mathbf {y} -2\mathbf {y}^T\mathbf {x}+\Vert \mathbf {x}\Vert ^2 $$

has the Hessian

$$\nabla ^2 f(\mathbf {y})=2{\mathsf {I}}. $$

The identity matrix being positive definite, it follows by Proposition 3.2 that f is strictly convex, so that the unique minimizer \(\mathbf {\widehat{x}}\in \varOmega \) of \(f(\mathbf {y})\) subject to \(\mathbf {y}\in \varOmega \) exists by Proposition 3.4(i).

If \(\mathbf {y}\in \varOmega \) and \(\alpha \in (0,1)\), then by convexity of \(\varOmega \)

$$(1-\alpha )\mathbf {\widehat{x}}+\alpha \mathbf {y}=\mathbf {\widehat{x}}+\alpha (\mathbf {y}-\mathbf {\widehat{x}})\in \varOmega , $$

so that, by the definition of \(\mathbf {\widehat{x}}\) as the minimizer of \(f(\mathbf {y})\) subject to \(\mathbf {y}\in \varOmega \),

$$\Vert \mathbf {x}-\mathbf {\widehat{x}}\Vert ^2 \le \Vert \mathbf {x}-\mathbf {\widehat{x}}-\alpha (\mathbf {y}-\mathbf {\widehat{x}})\Vert ^2. $$

Using simple manipulations and the latter inequality, we get

$$\begin{aligned} \Vert \mathbf {x}-\mathbf {\widehat{x}}-\alpha (\mathbf {y}-\mathbf {\widehat{x}})\Vert ^2= & {} \Vert \mathbf {\widehat{x}}-\mathbf {x}\Vert ^2+\alpha ^2\Vert \mathbf {y}-\mathbf {\widehat{x}}\Vert ^2-2\alpha (\mathbf {x}-\mathbf {\widehat{x}})^T(\mathbf {y}-\mathbf {\widehat{x}})\nonumber \\\le & {} \Vert \mathbf {x}-\mathbf {\widehat{x}}-\alpha (\mathbf {y}-\mathbf {\widehat{x}})\Vert ^2\nonumber \\&+\alpha ^2\Vert \mathbf {y}-\mathbf {\widehat{x}}\Vert ^2 -2\alpha (\mathbf {x}-\mathbf {\widehat{x}})^T(\mathbf {y}-\mathbf {\widehat{x}})\nonumber . \end{aligned}$$

Thus

$$2\alpha (\mathbf {x}-\mathbf {\widehat{x}})^T(\mathbf {y}-\mathbf {\widehat{x}})\le \alpha ^2\Vert \mathbf {y}-\mathbf {\widehat{x}}\Vert ^2 $$

for any \(\alpha \in (0,1)\). To obtain (3.15), just divide the last inequality by \(\alpha \) and observe that \(\alpha \) may be arbitrarily small. \(\square \)

Using Proposition 3.5, it is not difficult to show that the mapping \(P_{\varOmega }\) which assigns to each \(\mathbf {x}\in \mathbb {R}^n\) its projection to \(\varOmega \) is nonexpansive as in Fig. 3.2.

Fig. 3.2 Projection \(P_{\varOmega }\) is nonexpansive

Corollary 3.1

Let \(\varOmega \subseteq \mathbb {R}^n\) be a nonempty closed convex set, and for any \(\mathbf {x}\in \mathbb {R}^n\), let \(\mathbf {\widehat{x}}\in \varOmega \) denote the projection of \(\mathbf {x}\) to \(\varOmega \). Then for any \(\mathbf {x, y}\in \mathbb {R}^n\)

$$\begin{aligned} \Vert \mathbf {\widehat{x}}-\mathbf {\widehat{y}}\Vert \le \Vert \mathbf {x}-\mathbf {y}\Vert . \end{aligned}$$
(3.16)

Proof

If \(\mathbf {x, y}\in \mathbb {R}^n\), then by Proposition 3.5 their projections \(\mathbf {\widehat{x}, \widehat{y}}\) to \(\varOmega \) satisfy

$$(\mathbf {x}-\mathbf {\widehat{x}})^T(\mathbf {z}-\mathbf {\widehat{x}})\le 0\quad \mathrm{and} \quad (\mathbf {y}-\mathbf {\widehat{y}})^T(\mathbf {z}-\mathbf {\widehat{y}})\le 0 $$

for any \(\mathbf {z}\in \varOmega \). Substituting \(\mathbf {z=\widehat{y}}\) into the first inequality, \(\mathbf {z=\widehat{x}}\) into the second inequality, and summing up, we get

$$(\mathbf {x}-\mathbf {\widehat{x}}-\mathbf {y}+\mathbf {\widehat{y}})^T(\mathbf {\widehat{y}}-\mathbf {\widehat{x}})\le 0. $$

After rearranging the entries and using the Schwarz inequality, we get

$$\Vert \mathbf {\widehat{x}}-\mathbf {\widehat{y}}\Vert ^2\le (\mathbf {x}-\mathbf {y})^T(\mathbf {\widehat{x}}-\mathbf {\widehat{y}}) \le \Vert \mathbf {x}-\mathbf {y}\Vert \Vert \mathbf {\widehat{x}}-\mathbf {\widehat{y}}\Vert , $$

which proves (3.16).    \(\square \)
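As a concrete illustration, consider the projection onto a box \(\{\mathbf {x}: \varvec{\ell }\le \mathbf {x}\le \mathbf {u}\}\), which reduces to componentwise clipping (a standard special case, not derived above). The sketch below (Python with NumPy, illustrative data) checks the variational inequality (3.15) and the nonexpansiveness (3.16).

```python
import numpy as np

# Projection onto the box Omega = {x : l <= x <= u}, a closed convex set;
# for a box the projection is componentwise clipping.
rng = np.random.default_rng(3)
l, u = -np.ones(5), np.ones(5)
P = lambda x: np.minimum(np.maximum(x, l), u)

x, y = 3 * rng.standard_normal(5), 3 * rng.standard_normal(5)
xh, yh = P(x), P(y)
z = rng.uniform(l, u)                     # an arbitrary feasible point

print((x - xh) @ (z - xh) <= 1e-12)                               # (3.15)
print(np.linalg.norm(xh - yh) <= np.linalg.norm(x - y) + 1e-12)   # (3.16)
```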

4 Equality Constrained Problems

We shall now consider the problems with the constraint set described by a set of linear equations. More formally, we shall look for

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _E} f(\mathbf {x}), \end{aligned}$$
(3.17)

where f is a convex quadratic function defined by (3.3), \(\varOmega _E=\{\mathbf {x}\in \mathbb {R}^n: {\mathsf {B}}\mathbf {x}=\mathbf {c}\}\), \({\mathsf {B}}\in \mathbb {R}^{m\times n}\), and \(\mathbf {c}\in \text {Im}{\mathsf {B}}\). We assume that \({\mathsf {B}}\ne {\mathsf {O}}\) is not a full column rank matrix, so that \(\mathrm{Ker}{\mathsf {B}}\ne \{\mathbf {o}\}\), but we admit dependent rows of \({\mathsf {B}}\). It is easy to check that \(\varOmega _E\) is a nonempty closed convex set.

A feasible set \(\varOmega _E\) is a linear manifold of the form

$$\varOmega _E =\mathbf {\overline{x}}+\mathrm{Ker}{\mathsf {B}}, $$

where \(\mathbf {\overline{x}}\) is any vector which satisfies

$${\mathsf {B}}\mathbf {\overline{x}}=\mathbf {c}. $$

Thus, a nonzero vector \(\mathbf {d}\in \mathbb {R}^n\) is a feasible direction of \(\varOmega _E\) at any \(\mathbf {x}\in \varOmega _E\) if and only if \(\mathbf {d}\in \mathrm{Ker}{\mathsf {B}}\), and \(\mathbf {d}\) is a recession direction of \(\varOmega _E\) if and only if \(\mathbf {d}\in \mathrm{Ker}{\mathsf {B}}\).

Substituting \(\mathbf {x}=\mathbf {\overline{x}}+\mathbf {z}, \ \mathbf {z}\in \mathrm{Ker}{\mathsf {B}}\), we can reduce (3.17) to the minimization of

$$\begin{aligned} f_{\overline{x}}(\mathbf {z}) =\frac{1}{2}\mathbf {z}^T{\mathsf {A}}\mathbf {z}-\left( \mathbf {b}-{\mathsf {A}}\mathbf {\overline{x}}\right) ^T\mathbf {z} \end{aligned}$$
(3.18)

over the subspace \(\mathrm{Ker}{\mathsf {B}}\). Thus we can assume, without loss of generality, that \(\mathbf {c}=\mathbf {o}\) in the definition of \(\varOmega _E\). We shall occasionally use this assumption to simplify our exposition.
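The substitution above corresponds to the well-known null-space approach to (3.17). The following sketch (Python with NumPy and SciPy; the data are randomly generated and only illustrative) computes a particular solution of \({\mathsf {B}}\mathbf {x}=\mathbf {c}\), a basis of \(\mathrm{Ker}{\mathsf {B}}\), minimizes (3.18) over that subspace, and checks feasibility and the optimality condition (3.23).

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(4)
n, m = 6, 2
A = rng.standard_normal((n, n)); A = A.T @ A + np.eye(n)   # SPD Hessian
b = rng.standard_normal(n)
B = rng.standard_normal((m, n))            # generically full row rank
c = B @ rng.standard_normal(n)             # c in Im(B)

x_bar = np.linalg.lstsq(B, c, rcond=None)[0]   # some x_bar with B x_bar = c
Z = null_space(B)                              # columns span Ker(B)

# minimize f_xbar(z) = (1/2) z'(Z'AZ)z - (b - A x_bar)'(Z z), cf. (3.18)
z = np.linalg.solve(Z.T @ A @ Z, Z.T @ (b - A @ x_bar))
x = x_bar + Z @ z

print(np.allclose(B @ x, c))                   # x is feasible
print(np.allclose(Z.T @ (A @ x - b), 0))       # (A x - b) orthogonal to Ker(B), cf. (3.23)
```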

A useful tool for the analysis of equality constrained problems is the Lagrangian function \(L_0:\mathbb {R}^{n+m} \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} L_0(\mathbf {x},\varvec{\lambda })=f(\mathbf {x})+ \varvec{\lambda }^T({\mathsf {B}}\mathbf {x}-\mathbf {c}) =\frac{1}{2}\mathbf {x}^T{\mathsf {A}}\mathbf {x}-\mathbf {b}^T\mathbf {x}+({\mathsf {B}}\mathbf {x}-\mathbf {c})^T\varvec{\lambda }. \end{aligned}$$
(3.19)

Obviously

$$\begin{aligned} \nabla _{\mathbf {xx}}^2 L_0(\mathbf {x},\varvec{\lambda })= & {} \nabla ^2 f(\mathbf {x}) ={\mathsf {A}},\end{aligned}$$
(3.20)
$$\begin{aligned} \nabla _{\mathbf {x}} L_0(\mathbf {x},\varvec{\lambda })= & {} \nabla f(\mathbf {x})+ {\mathsf {B}}^T\varvec{\lambda } ={\mathsf {A}}\mathbf {x}-\mathbf {b}+{\mathsf {B}}^T\varvec{\lambda },\end{aligned}$$
(3.21)
$$\begin{aligned} L_0(\mathbf {x}+\mathbf {d},\varvec{\lambda })= & {} L_0(\mathbf {x},\varvec{\lambda }) +({\mathsf {A}}\mathbf {x}-\mathbf {b}+{\mathsf {B}}^T\varvec{\lambda })^T\mathbf {d} +\frac{1}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}. \end{aligned}$$
(3.22)

The Lagrangian function is defined in such a way that if considered as a function of \(\mathbf {x}\), then its Hessian and its restriction to \(\varOmega _E\) are exactly those of f, but its gradient \(\nabla _{\mathbf {x}} L_0(\mathbf {x},\varvec{\lambda })\) varies depending on the choice of \(\varvec{\lambda }\). It simply follows that if f is convex, then \(L_0\) is convex for any fixed \(\varvec{\lambda }\), and the global minimizer of \(L_0\) with respect to \(\mathbf {x}\) also varies with \(\varvec{\lambda }\). We shall see that it is possible to give conditions on \({\mathsf {A}}\), \({\mathsf {B}}\), and \(\mathbf {b}\) such that with a suitable choice \(\varvec{\lambda }=\widehat{\varvec{\lambda }}\), the solution of the constrained minimization problem (3.17) reduces to the unconstrained minimization of \(L_0\) as in Fig. 3.3.

Fig. 3.3 Geometric illustration of the Lagrangian function

4.1 Optimality Conditions

The main questions concerning the optimality and solvability conditions of (3.17) are answered by the next proposition.

Proposition 3.6

Let the equality constrained problem (3.17) be defined by an SPS or SPD matrix \({\mathsf {A}}\in \mathbb {R}^{n\times n}\), a constraint matrix \({\mathsf {B}}\in \mathbb {R}^{m\times n}\) the column rank of which is less than n, and vectors \(\mathbf {b}\in \mathbb {R}^n\), \(\mathbf {c}\in \mathrm{Im}{\mathsf {B}}\). Then the following statements hold:

(i) A vector \(\mathbf {\overline{x}}\in \varOmega _E\) is a solution of (3.17) if and only if

$$\begin{aligned} ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T\mathbf {d}=0 \end{aligned}$$
(3.23)

for any \(\mathbf {d}\in \mathrm{Ker}{\mathsf {B}}\).

(ii) A vector \(\mathbf {\overline{x}}\in \varOmega _E\) is a solution of (3.17) if and only if there is a vector \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that

$$\begin{aligned} {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}=\mathbf {o}. \end{aligned}$$
(3.24)

Proof

(i) Let \(\mathbf {\overline{x}}\) be a solution of the equality constrained minimization problem (3.17), so that for any \(\mathbf {d}\in \mathrm{Ker}{\mathsf {B}}\) and \(\alpha \in \mathbb {R}\)

$$\begin{aligned} 0\le f(\mathbf {\overline{x}}+ \alpha \mathbf {d})-f(\mathbf {\overline{x}})= \alpha ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T \mathbf {d} +\frac{\alpha ^2}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}. \end{aligned}$$
(3.25)

For sufficiently small values of \(\alpha \) and \(({\mathsf {A}}\mathbf {\overline{x}} -\mathbf {b})^T\mathbf {d}\ne 0\), the sign of the right-hand side of (3.25) is determined by the sign of \(\alpha ({\mathsf {A}}\mathbf {\overline{x}} -\mathbf {b})^T\mathbf {d}\). Since we can choose the sign of \(\alpha \) arbitrarily and the right-hand side of (3.25) is nonnegative, we conclude that (3.23) holds for any \(\mathbf {d}\in \mathrm {Ker}{\mathsf {B}}\).

Let us now assume that (3.23) holds for a vector \(\mathbf {\overline{x}}\in \varOmega _E\). Then

$$f(\mathbf {\overline{x}}+ \mathbf {d})-f(\mathbf {\overline{x}})= \frac{1}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}\ge 0 $$

for any \(\mathbf {d}\in \mathrm {Ker}{\mathsf {B}}\), so that \(\mathbf {\overline{x}}\) is a solution of (3.17).

(ii) Let \(\mathbf {\overline{x}}\) be a solution of (3.17), so that by (i) \(\mathbf {\overline{x}}\) satisfies (3.23) for any \(\mathbf {d}\in \mathrm {Ker}{\mathsf {B}}\). The latter condition is by (2.30) equivalent to \({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}\in \mathrm {Im}{\mathsf {B}}^T\), so that there is \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that (3.24) holds.

If there are \(\overline{\varvec{\lambda }}\) and \(\mathbf {\overline{x}}\in \varOmega _E\) such that (3.24) holds, then by Taylor’s expansion (3.22)

$$f(\mathbf {\overline{x}}+ \mathbf {d})-f(\mathbf {\overline{x}}) =L_0(\mathbf {\overline{x}}+ \mathbf {d}, \overline{\varvec{\lambda }})-L_0(\mathbf {\overline{x}}, \overline{\varvec{\lambda }})=\frac{1}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}\ge 0 $$

for any \(\mathbf {d}\in \mathrm {Ker}{\mathsf {B}}\), so \(\mathbf {\overline{x}}\) is a solution of the equality constrained problem (3.17).    \(\square \)

The conditions (ii) of Proposition 3.6 are known as the Karush–Kuhn–Tucker (KKT) conditions for the solution of the equality constrained problem (3.17). If \(\mathbf {\overline{x}}\in \varOmega _E\) and \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) satisfy (3.24), then \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is called a KKT pair of problem (3.17). Its second component \(\overline{\varvec{\lambda }}\) is called the vector of Lagrange multipliers or simply the multiplier. We shall often use the notation \(\mathbf {\widehat{x}}\) or \(\widehat{\varvec{\lambda }}\) to denote the components of a KKT pair that are uniquely determined.

Proposition 3.6 has a simple geometrical interpretation. The condition (3.23) requires that the gradient of f at a solution \(\mathbf {\overline{x}}\) is orthogonal to \(\mathrm {Ker}{\mathsf {B}}\), the set of feasible directions of \(\varOmega _E\), so that there is no feasible decrease direction as illustrated in Fig. 3.4. Since \(\mathbf {d}\) is by (2.30) orthogonal to \(\mathrm {Ker}{\mathsf {B}}\) if and only if \(\mathbf {d}\in \mathrm {Im}{\mathsf {B}}^T\), it follows that (3.23) is equivalent to the possibility to choose \(\overline{\varvec{\lambda }}\) so that \(\nabla _{\mathbf {x}}L_0(\mathbf {\overline{x}}, \overline{\varvec{\lambda }})=\mathbf {o}\). If f is convex, then the latter condition is equivalent to the condition for the unconstrained minimizer of \(L_0\) with respect to \(\mathbf {x}\) as illustrated in Fig. 3.5.

Fig. 3.4 Solvability condition (i)

Fig. 3.5 Solvability condition (ii)

Notice that if f is convex, then the vector of Lagrange multipliers which is the component of a KKT pair modifies the linear term of the original problem in such a way that the solution of the unconstrained modified problem is exactly the same as the solution of the original constrained problem. In terms of mechanics, if the original problem describes the equilibrium of a constrained elastic body subject to traction, then the modified problem is unconstrained with the constraints replaced by the reaction forces.

4.2 Existence and Uniqueness

Using the optimality conditions of Sect. 3.4.1, we can formulate the conditions that guarantee the existence or uniqueness of a solution of (3.17).

Proposition 3.7

Let the equality constrained problem (3.17) be defined by an SPS or SPD matrix \({\mathsf {A}}\in \mathbb {R}^{n\times n}\), a constraint matrix \({\mathsf {B}}\in \mathbb {R}^{m\times n}\) the column rank of which is less than n, and vectors \(\mathbf {b}\in \mathbb {R}^n, \mathbf {c}\in \mathrm{Im}{\mathsf {B}}\). Let \({\mathsf {R}}\) denote a matrix the columns of which span \(\mathrm{Ker}{\mathsf {A}}\). Then the following statements hold:

(i) If \({\mathsf {A}}\) is an SPS matrix, then problem (3.17) has a solution if and only if

$$\begin{aligned} {\mathsf {R}}^T\mathbf {b}\in \mathrm{Im}({\mathsf {R}}^T{\mathsf {B}}^T). \end{aligned}$$
(3.26)

(ii) If \({\mathsf {A}}|\mathrm{Ker}{\mathsf {B}}\) is positive definite, then problem (3.17) has a unique solution.

(iii) If \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) and \((\mathbf {\overline{y}}, \overline{\varvec{\mu }})\) are KKT couples for problem (3.17), then

$$\mathbf {\overline{x}}- \mathbf {\overline{y}}\in \mathrm {Ker}{\mathsf {A}}\cap \mathrm {Span}\{\mathbf {b}\}^{\bot } \quad \mathrm {and} \quad \overline{\varvec{\lambda }}- \overline{\varvec{\mu }}\in \mathrm{Ker}{\mathsf {B}}^T. $$

In particular, if problem (3.17) has a solution and

$$ \mathrm {Ker}{\mathsf {B}}^T=\{\mathbf {o}\}, $$

then there is a unique Lagrange multiplier \(\widehat{\varvec{\lambda }}\).

Proof

(i) Using Proposition 3.6(ii), we have that problem (3.17) has a solution if and only if there is \(\varvec{\lambda }\) such that \(\mathbf {b}-{\mathsf {B}}^T\varvec{\lambda }\in \mathrm{Im}{\mathsf {A}}\), or, equivalently, that \(\mathbf {b}-{\mathsf {B}}^T\varvec{\lambda }\) is orthogonal to \(\mathrm{Ker}{\mathsf {A}}\). The latter condition reads \({\mathsf {R}}^T\mathbf {b}-{\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }=\mathbf {o}\) and can be rewritten as (3.26).

(ii) First observe that if \({\mathsf {A}}|\mathrm {Ker}{\mathsf {B}}\) is positive definite, then \(f|\mathrm {Ker}{\mathsf {B}}\) is strictly convex by Proposition 3.2 and it is easy to check that \(f|\varOmega _E\) is strictly convex. Since \(\varOmega _E\) is closed, convex, and nonempty, it follows by Proposition 3.4(i) that the equality constrained problem (3.17) has a unique solution.

(iii) First observe that \(\mathrm{Ker}{\mathsf {B}}=\{\mathbf {x}-\mathbf {y}: \mathbf {x}, \mathbf {y}\in \varOmega _E\}\) and that f is convex on \(\mathrm{Ker}{\mathsf {B}}\) by the assumption and Proposition 3.2. Thus if \(\mathbf {\overline{x}}\) and \(\mathbf {\overline{y}}\) are any solutions of (3.17), then the left relation follows by Proposition 3.3(ii). The rest follows by a simple analysis of the KKT conditions (3.24).    \(\square \)

If \({\mathsf {B}}\) is not a full row rank matrix and \(\overline{\varvec{\lambda }}\) is a Lagrange multiplier for (3.17), then by Proposition 3.7(iii) any Lagrange multiplier \(\varvec{\lambda }\) can be expressed in the form

$$\begin{aligned} \varvec{\lambda }=\overline{\varvec{\lambda }}+\mathbf {d}, \ \ \mathbf {d}\in \mathrm{Ker}{\mathsf {B}}^T. \end{aligned}$$
(3.27)

The Lagrange multiplier \(\varvec{\lambda }_\mathrm{LS}\) which minimizes the Euclidean norm is called the least squares Lagrange multiplier; it is the unique multiplier which belongs to \(\mathrm {Im}{\mathsf {B}}\). If \(\overline{\varvec{\lambda }}\) is a vector of Lagrange multipliers, then \(\varvec{\lambda }_\mathrm{LS}\) can be evaluated by

$$\begin{aligned} \varvec{\lambda }_\mathrm{LS}=\left( {\mathsf {B}}^{\dagger }\right) ^T{\mathsf {B}}^T\overline{\varvec{\lambda }} \end{aligned}$$
(3.28)

and

$$ \overline{\varvec{\lambda }}=\varvec{\lambda }_\mathrm{LS}+\mathbf {d}, \ \mathbf {d}\in \mathrm{Ker}{\mathsf {B}}^T. $$
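For illustration, the following sketch (Python with NumPy, illustrative data) evaluates (3.28) for a matrix with dependent rows and checks that the difference \(\overline{\varvec{\lambda }}-\varvec{\lambda }_\mathrm{LS}\) lies in \(\mathrm{Ker}{\mathsf {B}}^T\).

```python
import numpy as np

rng = np.random.default_rng(5)
B = rng.standard_normal((4, 6))
B[3] = B[0] + B[1]                        # dependent rows: Ker(B^T) != {o}

lam_bar = rng.standard_normal(4)          # some vector of Lagrange multipliers
lam_ls = np.linalg.pinv(B).T @ (B.T @ lam_bar)      # formula (3.28)

# lam_ls is the component of lam_bar in Im(B); the difference is in Ker(B^T)
print(np.allclose(B.T @ (lam_bar - lam_ls), 0))
print(np.linalg.norm(lam_ls) <= np.linalg.norm(lam_bar) + 1e-12)
```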

If \({\mathsf {A}}\) is positive definite, then the unique solution \(\mathbf {\widehat{x}}\) of (3.17) is by Proposition 3.6 fully determined by the matrix equation

$$\begin{aligned} \left[ \begin{array}{cc} {\mathsf {A}} &{} {\mathsf {B}}^T\\ {\mathsf {B}} &{} {\mathsf {O}} \end{array} \right] \left[ \begin{array}{c} \mathbf {x}\\ \varvec{\lambda } \end{array}\right] =\left[ \begin{array}{c} \mathbf {b}\\ \mathbf {c} \end{array}\right] , \end{aligned}$$
(3.29)

which is known as the Karush–Kuhn–Tucker system, briefly the KKT system or KKT conditions, for the equality constrained problem (3.17). Proposition 3.6 does not require that the related KKT system be nonsingular, in agreement with the observation that the solution of the equality constrained problem should not depend on the description of \(\varOmega _E\).
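For illustration, the following sketch (Python with NumPy, randomly generated data) assembles and solves the KKT system (3.29) and checks feasibility and stationarity of the computed pair; it is not meant as a recommended solution method for large problems.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 5, 2
A = rng.standard_normal((n, n)); A = A.T @ A + np.eye(n)   # SPD
B = rng.standard_normal((m, n))                            # generically full row rank
b, c = rng.standard_normal(n), rng.standard_normal(m)

K = np.block([[A, B.T], [B, np.zeros((m, m))]])            # KKT matrix of (3.29)
sol = np.linalg.solve(K, np.concatenate([b, c]))
x_hat, lam_hat = sol[:n], sol[n:]

print(np.allclose(B @ x_hat, c))                           # feasibility
print(np.allclose(A @ x_hat - b + B.T @ lam_hat, 0))       # stationarity (3.24)
```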

4.3 Sensitivity

The Lagrange multipliers emerged in Proposition 3.6 as auxiliary variables which nobody had asked for, but which turned out to be useful in alternative formulations of the optimality conditions. However, it turns out that the Lagrange multipliers frequently have an interesting interpretation in specific practical contexts, as we have mentioned at the end of Sect. 3.4.1, where we briefly described their mechanical interpretation. Here we show that if they are uniquely determined by the KKT conditions (3.29), then they are related to the rates of change of the optimal cost due to the violation of constraints.

Fig. 3.6 Minimization with perturbed constraints

Let us assume that \({\mathsf {A}}\) is positive definite and that \({\mathsf {B}}\) has full row rank, so that there is a unique KKT couple \((\mathbf {\widehat{x}}, \widehat{\varvec{\lambda }})\) of the equality constrained problem (3.17). For \(\mathbf {u}\in \mathbb {R}^m\), let us consider also the perturbed problem

$$\min _{{\mathsf {B}}\mathbf {x}=\mathbf {c}+\mathbf {u}} f(\mathbf {x}) $$

as in Fig. 3.6. Its solution \(\mathbf {x}(\mathbf {u})\) and the corresponding vector of Lagrange multipliers \(\varvec{\lambda }(\mathbf {u})\) are fully determined by the KKT conditions

$$\left[ \begin{array}{cc} {\mathsf {A}} &{} {\mathsf {B}}^T\\ {\mathsf {B}} &{} {\mathsf {O}} \end{array} \right] \left[ \begin{array}{c} \mathbf { x}(\mathbf {u})\\ \varvec{\lambda }(\mathbf {u}) \end{array}\right] =\left[ \begin{array}{c} \mathbf {b}\\ \mathbf {c}+\mathbf {u} \end{array}\right] , $$

so that

$$ \left[ \begin{array}{c} \mathbf { x}(\mathbf {u})\\ \varvec{\lambda }(\mathbf {u}) \end{array}\right] =\left[ \begin{array}{cc} {\mathsf {A}} &{} {\mathsf {B}}^T\\ {\mathsf {B}} &{} {\mathsf {O}} \end{array} \right] ^{-1} \left[ \begin{array}{c} \mathbf {b}\\ \mathbf {c}+\mathbf {u} \end{array}\right] =\left[ \begin{array}{cc} {\mathsf {A}} &{} {\mathsf {B}}^T\\ {\mathsf {B}} &{} {\mathsf {O}} \end{array} \right] ^{-1} \left[ \begin{array}{c} \mathbf {b}\\ \mathbf {c} \end{array}\right] +\left[ \begin{array}{cc} {\mathsf {A}} &{} {\mathsf {B}}^T\\ {\mathsf {B}} &{} {\mathsf {O}} \end{array} \right] ^{-1} \left[ \begin{array}{c} \mathbf {o}\\ \mathbf {u} \end{array}\right] . $$

First observe that \(\mathbf {d}(\mathbf {u})=\mathbf {x}(\mathbf {u})-\mathbf {\widehat{x}}\) satisfies

$$ {\mathsf {B}}\mathbf {d}(\mathbf {u})={\mathsf {B}}\mathbf {x}(\mathbf {u})-{\mathsf {B}}\mathbf {\widehat{x}}=\mathbf {u}, $$

so that we can use \(\nabla f(\mathbf {\widehat{x}})=-{\mathsf {B}}^T\widehat{\varvec{\lambda }}\) to approximate the change of optimal cost by

$$ \nabla f(\mathbf {\widehat{x}})^T\mathbf {d}(\mathbf {u}) =-({\mathsf {B}}^T\widehat{\varvec{\lambda }})^T\mathbf {d}(\mathbf {u}) =-\widehat{\varvec{\lambda }}^T{\mathsf {B}}\mathbf {d}(\mathbf {u}) =-\widehat{\varvec{\lambda }}^T\mathbf {u}. $$

It follows that \(-[\widehat{\varvec{\lambda }}]_i\) can be used to approximate the change of the optimal cost due to the violation of the ith constraint by \([\mathbf {u}]_i\).

To give a more detailed analysis of the sensitivity of the optimal cost with respect to the violation of constraints, let us define for each \(\mathbf {u}\in \mathbb {R}^m\) the primal function

$$ p(\mathbf {u})=f\left( \mathbf {x}(\mathbf {u})\right) . $$

Observing that \(\mathbf {\widehat{x}}=\mathbf {x}(\mathbf {o})\) and using the explicit formula (2.4) to evaluate the inverse of the KKT system, we get

$$ \mathbf {x}(\mathbf {u})=\mathbf {\widehat{x}}+{\mathsf {A}}^{-1}{\mathsf {B}}^T{\mathsf {S}}^{-1}\mathbf {u}, $$

where \({\mathsf {S}}={\mathsf {B}}{\mathsf {A}}^{-1}{\mathsf {B}}^T\) denotes the Schur complement matrix. Thus

$$ \mathbf {x}(\mathbf {u}) -\mathbf {\widehat{x}}= {\mathsf {A}}^{-1}{\mathsf {B}}^T{\mathsf {S}}^{-1}\mathbf {u}, $$

so that

$$\begin{aligned} p(\mathbf {u})-p(\mathbf {o})= & {} f\left( \mathbf {x}(\mathbf {u})\right) -f(\mathbf {\widehat{x}})\nonumber \\= & {} \nabla f(\mathbf {\widehat{x}})^T \bigl (\mathbf {x}(\mathbf {u})-\mathbf {\widehat{x}}\bigr ) +\frac{1}{2}\bigl (\mathbf {x}(\mathbf {u})-\mathbf {\widehat{x}}\bigr )^T{\mathsf {A}}\bigl (\mathbf {x}(\mathbf {u})-\mathbf {\widehat{x}}\bigr )\nonumber \\= & {} \nabla f(\mathbf {\widehat{x}})^T {\mathsf {A}}^{-1}{\mathsf {B}}^T{\mathsf {S}}^{-1}\mathbf {u} +\frac{1}{2}\mathbf {u}^T{\mathsf {S}}^{-1}{\mathsf {B}}{\mathsf {A}}^{-1}{\mathsf {B}}^T{\mathsf {S}}^{-1}\mathbf {u}.\nonumber \end{aligned}$$

It follows that the gradient of the primal function p at \(\mathbf {o}\) is given by

$$ \nabla p(\mathbf {o})=\left( \nabla f(\mathbf {\widehat{x}})^T{\mathsf {A}}^{-1}{\mathsf {B}}^T{\mathsf {S}}^{-1}\right) ^T={\mathsf {S}}^{-1}{\mathsf {B}}{\mathsf {A}}^{-1}\nabla f(\mathbf {\widehat{x}}). $$

Recalling that \(\nabla f({\mathbf {\widehat{x}}})=-{\mathsf {B}}^T\widehat{\varvec{\lambda }}\), we get

$$\begin{aligned} \nabla p(\mathbf {o}) =-{\mathsf {S}}^{-1}{\mathsf {B}}{\mathsf {A}}^{-1}{\mathsf {B}}^T\widehat{\varvec{\lambda }} =-\widehat{\varvec{\lambda }}. \end{aligned}$$
(3.30)

The analysis shows that the decrease of the total differential of f outside \(\varOmega _E\) near \(\mathbf {\widehat{x}}\) is compensated by the increase of \(\widehat{\varvec{\lambda }}^T({\mathsf {B}}\mathbf {x}-\mathbf {c})\). See also Fig. 3.3. The components of \(\widehat{\varvec{\lambda }}\) are also called shadow prices after their interpretation in economics.
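The relation (3.30) can be checked numerically. The sketch below (Python with NumPy, randomly generated data, all names illustrative) solves the perturbed KKT systems, approximates \(\nabla p(\mathbf {o})\) by central differences, and compares it with \(-\widehat{\varvec{\lambda }}\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 5, 2
A = rng.standard_normal((n, n)); A = A.T @ A + np.eye(n)   # SPD
B = rng.standard_normal((m, n))                            # generically full row rank
b, c = rng.standard_normal(n), rng.standard_normal(m)
f = lambda x: 0.5 * x @ A @ x - b @ x

def solve_kkt(rhs):
    """Solve the KKT system with the constraint right-hand side rhs."""
    K = np.block([[A, B.T], [B, np.zeros((m, m))]])
    s = np.linalg.solve(K, np.concatenate([b, rhs]))
    return s[:n], s[n:]

x_hat, lam_hat = solve_kkt(c)

# finite-difference gradient of the primal function p(u) = f(x(u)) at u = o
eps, grad_p = 1e-6, np.zeros(m)
for i in range(m):
    u = eps * np.eye(m)[i]
    grad_p[i] = (f(solve_kkt(c + u)[0]) - f(solve_kkt(c - u)[0])) / (2 * eps)

print(np.allclose(grad_p, -lam_hat, atol=1e-4))   # (3.30): grad p(o) = -lambda_hat
```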

5 Inequality Constrained Problems

Let us now consider the problems

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _I} f(\mathbf {x}), \quad \varOmega _I=\{\mathbf {x}\in \mathbb {R}^n: \mathbf {h}(\mathbf {x})\le \mathbf {o}\}, \end{aligned}$$
(3.31)

where f is a quadratic function defined by (3.3) and the constraints are defined by continuously differentiable convex functions \(h_i(\mathbf {x})=[\mathbf {h}(\mathbf {x})]_i, i=1, \dots , s,\) that satisfy \(\nabla h_i(\mathbf {x})\ne \mathbf {o}\) when \(h_i(\mathbf {x})=0\). In our applications, \(h_i\) are either linear forms

$$ h_i(\mathbf {x})=\mathbf {b}^T_i\mathbf {x}-c_i, \ \ c_i\in \mathbb {R}, $$

or strictly convex separable quadratic functions, i.e.,

$$ {h}_i(\mathbf {x})=(\mathbf {x}_i -\mathbf {y}_i)^T{\mathsf {H}}_i(\mathbf {x}_i-\mathbf {y}_i)-c_i, \ \ \mathbf {x}_i, \mathbf {y}_i\in \mathbb {R}^2, \ \ {\mathsf {H}}_i \ \ \mathrm {SPD}, \ \ c_i>0. $$

We assume that \(\varOmega _I\) is nonempty. If the definition of \(\varOmega _I\) includes a quadratic inequality, we call (3.31) a QCQP (quadratically constrained quadratic programming) problem.
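For illustration, a single separable quadratic constraint of the above form and its gradient can be evaluated as in the following sketch (Python with NumPy; the block \(\mathbf {x}_i\), the matrix \({\mathsf {H}}_i\), and all data are illustrative only).

```python
import numpy as np

# One separable quadratic constraint acting on a 2D block x_i of x,
# h_i(x) = (x_i - y_i)' H_i (x_i - y_i) - c_i.
H_i = np.array([[2.0, 0.5],
                [0.5, 1.0]])              # SPD
y_i = np.array([1.0, -1.0])
c_i = 0.25

def h_i(x_block):
    return (x_block - y_i) @ H_i @ (x_block - y_i) - c_i

def grad_h_i(x_block):
    return 2.0 * H_i @ (x_block - y_i)    # gradient with respect to the block

x_block = np.array([1.2, -0.9])
print(h_i(x_block) <= 0.0)                # True: the block satisfies h_i <= 0
print(grad_h_i(x_block))                  # gradient of the constraint at x_block
```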

At any feasible point \(\mathbf {x}\), we define the active set

$$ {\mathscr {A}}(\mathbf {x}) = \{ i\in \{1,\dots , s\}: \ h_i(\mathbf {x}) = 0\}. $$

In particular, if \(\mathbf {\overline{x}}\) is a solution of (3.31) with \(\mathbf {h}(\mathbf {x})={\mathsf {B}}\mathbf {x}-\mathbf {c}\), \({\mathsf {B}}\in \mathbb {R}^{s\times n}\), then each feasible direction of \(\overline{\varOmega }_E =\{\mathbf {x}\in \mathbb {R}^n: \ [{\mathsf {B}}\mathbf {x}]_{{\mathscr {A}}(\mathbf {\overline{x}})}=\mathbf {c}_{{\mathscr {A}}(\mathbf {\overline{x}})}\}\) at \(\mathbf {\overline{x}}\) is a feasible direction of \(\varOmega _I\) at \(\mathbf {\overline{x}}\). Using the arguments of Sect. 3.4.1, we get that \(\mathbf {\overline{x}}\) is also a solution of the equality constrained problem

$$\begin{aligned} \min _{\mathbf {x}\in \overline{\varOmega }_E} f(\mathbf {x}),\quad \overline{\varOmega }_E =\{\mathbf {x}\in \mathbb {R}^n: \ [{\mathsf {B}}\mathbf {x}]_{{\mathscr {A}}(\mathbf {\overline{x}})}=\mathbf {c}_{{\mathscr {A}}(\mathbf {\overline{x}})}\}. \end{aligned}$$
(3.32)

Thus (3.31) is a more difficult problem than the equality constrained problem (3.17), as its solution necessarily involves the identification of \({\mathscr {A}}(\mathbf {\overline{x}})\).

5.1 Optimality Conditions for Linear Constraints

We shall start our exposition with the following optimality conditions.

Proposition 3.8

Let the inequality constrained problem (3.31) be defined by an SPS or SPD matrix \({\mathsf {A}}\in \mathbb {R}^{n\times n}\), the constraint matrix \({\mathsf {B}}\in \mathbb {R}^{m\times n}\), and the vectors \(\mathbf {b}\in \mathbb {R}^n\), \(\mathbf {c}\in \mathbb {R}^m\). Let \(\varOmega _I\ne \emptyset \). Then the following statements hold:

(i) \(\ \mathbf {\overline{x}}\in \varOmega _I\) is a solution of (3.31) if and only if

$$\begin{aligned} ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T\mathbf {d}\ge 0 \end{aligned}$$
(3.33)

for any feasible direction \(\mathbf {d}\) of \(\varOmega _I\) at \(\mathbf {\overline{x}}\).

(ii) \(\ \mathbf {\overline{x}}\in \varOmega _I\) is a solution of (3.31) with linear inequality constraints \(\mathbf {h}(\mathbf {x})={\mathsf {B}}\mathbf {x}-\mathbf {c}\) if and only if there is \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that

$$\begin{aligned} \overline{\varvec{\lambda }}\ge \mathbf {o}, \quad {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}=\mathbf {o}, \quad and \quad \overline{\varvec{\lambda }}^T({\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c})=0. \end{aligned}$$
(3.34)

Proof

(i) Let \(\mathbf {\overline{x}}\) be a solution of the inequality constrained problem (3.31) and let \(\mathbf {d}\) denote a feasible direction of \(\varOmega _I\) at \(\mathbf {\overline{x}}\), so that the right-hand side of

$$\begin{aligned} f(\mathbf {\overline{x}}+ \alpha \mathbf {d})-f(\mathbf {\overline{x}})= \alpha ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T \mathbf {d} +\frac{\alpha ^2}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d} \end{aligned}$$
(3.35)

is nonnegative for all sufficiently small \(\alpha >0\). To prove (3.33), it is enough to take \(\alpha >0\) so small that the nonnegativity of the right-hand side of (3.35) implies that

$$\begin{aligned} \alpha ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T\mathbf {d}\ge 0. \end{aligned}$$

Let us assume that \(\mathbf {\overline{x}}\in \varOmega _I\) satisfies (3.33) and \(\mathbf {x}\in \varOmega _I\). Since \(\varOmega _I\) is convex, it follows that \(\mathbf {d}=\mathbf {x}-\mathbf {\overline{x}}\) is a feasible direction of \(\varOmega _I\) at \(\mathbf {\overline{x}}\), so that, using Taylor’s expansion and the assumptions, we have

$$f(\mathbf {x})-f(\mathbf {\overline{x}})=({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T\mathbf {d} +\frac{1}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d}\ge 0. $$

(ii) Notice that any solution \(\mathbf {\overline{x}}\) of (3.31) solves (3.32), so that by Proposition 3.6(ii) there is \(\mathbf {y}\) such that

$${\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}+{\mathsf {B}}^T_{{\mathscr {A}}(\mathbf {\overline{x}})}\mathbf {y} =\mathbf {o}, $$

and \(\mathbf {y}\ge \mathbf {o}\) by the arguments based on the sensitivity of the minimum in Sect. 3.4.3. To finish the proof, it is enough to define \(\overline{\varvec{\lambda }}\) as \(\mathbf {y}\) padded with zeros.

If (3.34) holds and \(\overline{\mathbf {x}}+\mathbf {d}\in \varOmega _I\), then \({\mathsf {A}}\overline{\mathbf {x}}-\mathbf {b}=-{\mathsf {B}}^T\overline{\varvec{\lambda }}\) and

$$\ \ f(\overline{\mathbf {x}}+\mathbf {d})-f(\overline{\mathbf {x}}) =({\mathsf {A}}\overline{\mathbf {x}}-\mathbf {b})^T\mathbf {d} +\frac{1}{2}{\mathbf {d}}^T{\mathsf {A}}{\mathbf {d}} \ge -\overline{\varvec{\lambda }}^T{\mathsf {B}}{\mathbf {d}} =-\overline{\varvec{\lambda }}^T\left( {\mathsf {B}}(\overline{\mathbf {x}}+{\mathbf {d}})-\mathbf {c}\right) \ge 0. $$

\(\square \)

Remark 3.2

Condition (3.33) can be written as a variational inequality

$$({\mathsf {A}}\mathbf {\overline{x}})^T(\mathbf {x}-\overline{\mathbf {x}})\ge \mathbf {b}^T(\mathbf {x}-\overline{\mathbf {x}}), \quad \mathbf {x}\in \varOmega _{I}. $$

The conditions (3.34) are called the KKT conditions for inequality constraints. The last of these conditions is called the condition of complementarity.

5.2 Optimality Conditions for Bound Constrained Problems

A special case of problem (3.31) is the bound constrained problem

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _B} f(\mathbf {x}), \ \ \varOmega _B=\{\mathbf {x}\in \mathbb {R}^n: \ \mathbf {x}\ge \ell \}, \end{aligned}$$
(3.36)

where f is a quadratic function defined by (3.3) and \(\ell \in \mathbb {R}^n\). The optimality conditions for convex bound constrained problems can be written in a more convenient form.

Proposition 3.9

Let f be a convex quadratic function defined by (3.3) with a positive semidefinite Hessian \({\mathsf {A}}\). Then \(\mathbf {\overline{x}}\in \varOmega _B\) solves (3.36) if and only if

$$\begin{aligned} {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}\ge \mathbf {o} \quad {and} \quad ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T(\mathbf {\overline{x}}-\ell )=0. \end{aligned}$$
(3.37)

Proof

First observe that denoting \({\mathsf {B}}=-{\mathsf {I}}_n, \ \mathbf {c}=-\ell \), and

$$\varOmega _I=\{\mathbf {x}\in \mathbb {R}^n: \ {\mathsf {B}}\mathbf {x}\le \mathbf {c}\},$$

the bound constrained problem (3.36) becomes the standard inequality constrained problem (3.31) with \(\varOmega _I=\varOmega _B\). Using Proposition 3.8(ii), it follows that \(\mathbf {\overline{x}}\in \varOmega _B\) is the solution of (3.36) if and only if there is \(\varvec{\lambda }\in \mathbb {R}^n\) such that

$$\begin{aligned} \varvec{\lambda }\ge \mathbf {o}, \quad {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}-{\mathsf {I}}\varvec{\lambda }=\mathbf {o}, \quad \mathrm{and} \quad \varvec{\lambda }^T(\mathbf {\overline{x}}-\ell )=0. \end{aligned}$$
(3.38)

We complete the proof by observing that (3.37) can be obtained from (3.38) and vice versa by substituting \(\varvec{\lambda }={\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}\).    \(\square \)

In the proof, we have shown that \(\varvec{\lambda }=\nabla f(\mathbf {\overline{x}})\) is a vector of Lagrange multipliers for the constraints \(-\mathbf {x}\le -\ell \), or, equivalently, for \(\mathbf {x}\ge \ell \). Notice that the conditions (3.37) require that none of the vectors \(\mathbf {s}_i\), \(i\in {\mathscr {A}}(\mathbf {\overline{x}})\), is a feasible decrease direction of \(\varOmega _B\) at \(\mathbf {\overline{x}}\), where \(\mathbf {s}_i\) denotes the ith vector of the standard basis of \(\mathbb {R}^n\), i.e., the ith column of \({\mathsf {I}}_n\).
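The sufficiency of (3.37) can be illustrated numerically. The sketch below (Python with NumPy) constructs a vector \(\mathbf {\overline{x}}\) and a right-hand side \(\mathbf {b}\) so that (3.37) holds by design, and then checks on a random sample of feasible points that none of them has a smaller cost; all data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n)); A = A.T @ A + np.eye(n)   # SPD
l = np.zeros(n)                                            # lower bound

# Construct x_bar and b so that (3.37) holds by design: components 0, 1 are
# active (x_bar = l, gradient >= 0) and components 2, 3 are free (gradient = 0).
x_bar = np.array([0.0, 0.0, 1.0, 2.0])
g = np.array([0.5, 1.0, 0.0, 0.0])        # prescribed gradient A x_bar - b
b = A @ x_bar - g

f = lambda x: 0.5 * x @ A @ x - b @ x
# by Proposition 3.9, x_bar is the global minimizer on Omega_B = {x >= l};
# a random sample of feasible points should never beat it
trials = l + np.abs(rng.standard_normal((1000, n)))
print(all(f(x) >= f(x_bar) - 1e-12 for x in trials))       # True
```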

5.3 Optimality Conditions for More General Constraints

If the constraints \(h_i\) that define (3.31) are nonlinear, it can happen that their first-order representation by means of the gradients \(\nabla h_i\) is not adequate. The reason is illustrated in Fig. 3.7, where the linear cone \({\mathscr {L}}_{\varOmega _{I}}(\mathbf {\overline{x}})\) of \(\varOmega _{I}\) at \(\mathbf {\overline{x}}\) on the boundary of \(\varOmega _{I}\) defined by

$${\mathscr {L}}_{\varOmega _{I}}(\mathbf {\overline{x}}) = \{\mathbf {d}\in \mathbb {R}^n:\mathbf {d}^T \nabla h_i(\mathbf {\overline{x}})\le 0,\ i\in {\mathscr {A}}(\mathbf {\overline{x}})\} $$

comprises the whole line, while the tangent cone \({\mathscr {T}}_{\varOmega _{I}}(\mathbf {\overline{x}})\) of \(\varOmega _{I}\) at \(\mathbf {\overline{x}}\) on the boundary of \(\varOmega _{I}\) defined by

$$ {\mathscr {T}}_{\varOmega _{I}}(\mathbf {\overline{x}}) = \{\mathbf {d}\in \mathbb {R}^n: \ \mathbf {d}=\lim _{i\rightarrow \infty }\mathbf {d}^i, \ \mathbf {\overline{x}}+\alpha _i \mathbf {d}^i\in \varOmega _I, \ \lim _{i\rightarrow \infty }\alpha _i=0,\ \alpha _i>0\} $$

comprises only one point. To avoid such pathological situations, we shall assume that \({\mathscr {L}}_{\varOmega _{I}}(\mathbf {\overline{x}}) ={\mathscr {T}}_{\varOmega _{I}}(\mathbf {\overline{x}})\). This is also called the Abadie constraint qualification (ACQ) [5]. Notice that linear constraints satisfy ACQ. The ACQ assumption reduces the analysis of the conditions of minima to the linear case, so that we can formulate the following proposition.

Fig. 3.7 Example of \(\varOmega _{I}=\{\overline{\mathbf {x}}\}\), \({\mathscr {L}}_{\varOmega _{I}}(\overline{\mathbf {x}})\ne {\mathscr {T}}_{\varOmega _{I}}(\overline{\mathbf {x}})\)

Proposition 3.10

Let the inequality constrained problem (3.31) be defined by an SPS or SPD matrix \({\mathsf {A}}\in \mathbb {R}^{n\times n}\) and convex differentiable functions \(h_i\), and let \(\varOmega _I\) satisfy ACQ. Then \(\mathbf {\overline{x}}\in \varOmega _I\) is a solution of (3.31) if and only if there is \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that

$$\begin{aligned} \overline{\varvec{\lambda }}\ge \mathbf {o}, \quad {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b} +\nabla \mathbf {h}(\mathbf {\overline{x}}){\overline{\varvec{\lambda }}}=\mathbf {o}, \quad and \quad \overline{\varvec{\lambda }}^T\mathbf {h}(\mathbf {\overline{x}})=0. \end{aligned}$$
(3.39)

Proof

First notice that due to the definition of the tangent cone, \(\mathbf {\overline{x}}\) solves (3.31) if and only if

$$ f(\mathbf {\overline{x}})=\min _{\mathbf {\overline{x}}+ {\mathscr {T}}_{\varOmega _{I}}(\mathbf {\overline{x}})}f(\mathbf {x}). $$

Since we assume \({\mathscr {T}}_{\varOmega _{I}}(\mathbf {\overline{x}}) ={\mathscr {L}}_{\varOmega _{I}}(\mathbf {\overline{x}})\), the latter problem has the same solution as

$$ \min _{\mathbf {\overline{x}}+ {\mathscr {L}}_{\varOmega _{I}}(\mathbf {\overline{x}})}f(\mathbf {x}). $$

Using Proposition 3.8, we get that \(\mathbf {\overline{x}}\in \varOmega _I\) solves (3.31) if and only if \(\mathbf {\overline{x}}\) satisfies (3.39).    \(\square \)

5.4 Existence and Uniqueness

In our discussion of the existence and uniqueness results for the inequality constrained QP problem (3.31), we restrict our attention to the following results that are useful in our applications.

Proposition 3.11

Let the inequality constrained problem (3.31) be defined by convex functions \(h_i\) and f. Let \({\mathscr {C}}\) denote the cone of recession directions of the nonempty feasible set \(\varOmega _I\). Then the following statements hold:

(i) If problem (3.31) has a solution, then

$$\begin{aligned} \mathbf {d}^T\mathbf {b}\le 0 \quad \mathrm {for} \quad \mathbf {d}\in {{\mathscr {C}}}\cap \mathrm{Ker}{\mathsf {A}}.\end{aligned}$$
(3.40)

(ii) If the constraints are linear, then (3.40) is sufficient for the existence of minima.

(iii) If the constraints are linear and \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) and \((\mathbf {\overline{y}}, \overline{\varvec{\mu }})\) are KKT couples for (3.31), then

$$\begin{aligned} \mathbf {\overline{x}}-\mathbf {\overline{y}}\in \mathrm{Ker}{\mathsf {A}}\cap \mathrm {Span}\{\mathbf {b}\}^{\bot } \quad \mathrm {and} \quad \overline{\varvec{\lambda }}-\overline{\varvec{\mu }}\in \mathrm{Ker}{\mathsf {B}}^T. \end{aligned}$$
(3.41)

(iv) If \({\mathsf {A}}\) is positive definite, then the inequality constrained minimization problem (3.31) has a unique solution.

Proof

(i) Let \(\mathbf {\overline{x}}\) be a global solution of the inequality constrained minimization problem (3.31), and recall that

$$\begin{aligned} f(\mathbf {\overline{x}}+ \alpha \mathbf {d})-f(\mathbf {\overline{x}})= \alpha ({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T \mathbf {d} +\frac{\alpha ^2}{2}\mathbf {d}^T{\mathsf {A}}\mathbf {d} \end{aligned}$$
(3.42)

for any \(\mathbf {d}\in \mathbb {R}^n\) and \(\alpha \in \mathbb {R}\). Taking \(\mathbf {d}\in {{\mathscr {C}}}\cap \mathrm{Ker}{\mathsf {A}}\), (3.42) reduces to

$$f(\mathbf {\overline{x}}+ \alpha \mathbf {d})-f(\mathbf {\overline{x}})= -\alpha \mathbf {b}^T \mathbf {d}, $$

which is nonnegative for any \(\alpha \ge 0\) if and only if \(\mathbf {b}^T \mathbf {d}\le 0\).

(ii) See Dostál [7].

(iii) The first inclusion of (3.41) holds by Proposition 3.3(ii) for the solutions of any convex problem. The inclusion for multipliers follows by the KKT condition (3.34).

(iv) If \({\mathsf {A}}\) is SPD, then f is strictly convex by Proposition 3.2, so by Proposition 3.4 there is a unique minimizer of f subject to \(\mathbf {x}\in \varOmega _I\).    \(\square \)
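Statement (i) can be illustrated numerically. In the following sketch (illustrative data, assuming NumPy), \({\mathsf {A}}\) is singular, the feasible set \(\{\mathbf {x}: \ x_2\ge 0\}\) has the recession direction \(\mathbf {d}=(0,1)^T\in \mathrm{Ker}{\mathsf {A}}\), and \(\mathbf {b}^T\mathbf {d}>0\), so condition (3.40) fails and f decreases without bound along \(\mathbf {d}\).

```python
import numpy as np

# Illustrative SPS data: A singular, Omega_I = {x : x2 >= 0}
A = np.array([[1.0, 0.0], [0.0, 0.0]])
b = np.array([0.0, 1.0])            # b^T d > 0 for d = (0,1): condition (3.40) fails

d = np.array([0.0, 1.0])            # recession direction of Omega_I lying in Ker A
f = lambda x: 0.5 * x @ A @ x - b @ x

# f decreases without bound along x0 + alpha d, so no minimizer exists
x0 = np.array([0.0, 0.0])
print([f(x0 + alpha * d) for alpha in (1.0, 10.0, 100.0)])   # -1, -10, -100
print(b @ d <= 0)                                            # False: (3.40) violated
```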

6 Equality and Inequality Constrained Problems

In the previous sections, we have obtained the results concerning optimization problems with either equality or inequality constraints. Here we extend these results to the optimization problems with both equality and inequality constraints. More formally, we look for

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _{IE}} f(\mathbf {x}), \quad \varOmega _{IE}=\{\mathbf {x}\in \mathbb {R}^n: \ [\mathbf {h(x)}]_{{\mathscr {I}}}\le \mathbf {o}_{{\mathscr {I}}}, \ [\mathbf {h(x)}]_{{\mathscr {E}}}= \mathbf {o}_{{\mathscr {E}}}\}, \end{aligned}$$
(3.43)

where f is a quadratic function with the SPS Hessian \({\mathsf {A}}\in \mathbb {R}^{n\times n}\) and the linear term defined by \(\mathbf {b}\in \mathbb {R}^n\), \({\mathscr {I}}, \ {\mathscr {E}}\) are disjoint sets of indices which decompose \(\{1,\dots ,m\}\), and the equality and inequality constraints are defined by linear and by continuously differentiable convex functions \([\mathbf {h}(\mathbf {x})]_i=h_i(\mathbf {x})\), \(i=1, \dots , m\), respectively. We assume that \(\nabla h_i(\mathbf {x})\ne \mathbf {o}\) when \(h_i(\mathbf {x})=0\) and that \(\varOmega _{IE}\ne \emptyset \). We are especially interested in linear equality constraints and inequality constraints defined either by linear forms or by strictly convex separable quadratic functions.

If we describe the conditions that define \(\varOmega _{IE}\) in components, we get

$$\varOmega _{IE}=\{\mathbf {x}\in \mathbb {R}^n: \ {h}_i(\mathbf {x})\le 0, \ i\in {{\mathscr {I}}}, \ \mathbf {b}_i^T\mathbf {x}=c_i, \ i \in {{\mathscr {E}}}\}, $$

which makes sense even for \({\mathscr {I}}=\emptyset \) or \({\mathscr {E}}=\emptyset \); we consider the conditions which concern the empty set as always satisfied. For example, \({\mathscr {E}}=\emptyset \) gives

$$\varOmega _{IE} =\{\mathbf {x}\in \mathbb {R}^n: \ {h}_i(\mathbf {x})\le 0, \ i\in {{\mathscr {I}}}\}, $$

and the kernel of an “empty” matrix is defined by

$$\mathrm{Ker}{\mathsf {B}}_{{\mathscr {E}} *}=\{\mathbf {x}\in \mathbb {R}^n: \ \mathbf {b}^T_i\mathbf {x}=0, \ i\in {{\mathscr {E}}}\}=\mathbb {R}^n. $$

If all constraints are linear, then \(\varOmega _{IE}\) is defined by \({\mathsf {B}}\in \mathbb {R}^{m\times n}\) and \(\mathbf {c}\in \mathbb {R}^m\),

$$ {\mathsf {B}}=\left[ \begin{array}{c} \mathbf {b}_1^T\\ \vdots \\ \mathbf {b}_m^T \end{array}\right] =\left[ \begin{array}{c} {\mathsf {B}}_I\\ {\mathsf {B}}_E \end{array}\right] , \quad \mathbf {c}=\left[ \begin{array}{c} \mathbf {c}_I\\ \mathbf {c}_E \end{array}\right] , $$

and we get a QP variant of (3.43)

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _{IE}} f(\mathbf {x}), \quad \varOmega _{IE}=\{\mathbf {x}\in \mathbb {R}^n: \ {\mathsf {B}}_I\mathbf {x}\le \mathbf {c}_I, \ \ {\mathsf {B}}_E\mathbf {x}= \mathbf {c}_E\}. \end{aligned}$$
(3.44)

6.1 Optimality Conditions

First observe that any equality constraint \(\mathbf {b}_i^T\mathbf {x}=c_i, \ i\in {\mathscr {E}}\), can be replaced by the pair of inequalities \(\mathbf {b}_i^T\mathbf {x}\le c_i\) and \(-\mathbf {b}_i^T\mathbf {x}\le -c_i\). We can thus use the results obtained by the analysis of the inequality constrained problems in Sect. 3.5 to get similar results for the general equality and inequality constrained QP problem (3.43).
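The rewriting of the linear constraints of (3.44) as a single system of inequalities is purely mechanical; a minimal sketch follows (assuming NumPy; the helper name and data are illustrative only).

```python
import numpy as np

def equalities_to_inequalities(B_I, c_I, B_E, c_E):
    """Rewrite B_I x <= c_I, B_E x = c_E as one system G x <= g, cf. (3.48) below."""
    G = np.vstack([B_I, B_E, -B_E])
    g = np.concatenate([c_I, c_E, -c_E])
    return G, g

# illustrative data: x1 <= 1 and x2 = 2
B_I = np.array([[1.0, 0.0]]); c_I = np.array([1.0])
B_E = np.array([[0.0, 1.0]]); c_E = np.array([2.0])
G, g = equalities_to_inequalities(B_I, c_I, B_E, c_E)
print(G, g)
```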

Proposition 3.12

Let the quadratic function f be defined by the SPS matrix \({\mathsf {A}}\) and \(\mathbf {b}\in \mathbb {R}^n\). Let \(\varOmega _{IE}\ne \emptyset \) be defined by

$$\mathbf {h}_{{\mathscr {E}}}(\mathbf {x})={\mathsf {B}}_E\mathbf {x}-\mathbf {c}_E $$

and convex differentiable functions \(h_i, \ i\in {\mathscr {I}}\). Then the following statements hold:

(i) \(\mathbf {\overline{x}}\in \varOmega _{IE}\) is a solution of (3.43) if and only if

$$\begin{aligned} \nabla f(\mathbf {\overline{x}})^T\mathbf {d}=({\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b})^T\mathbf {d}\ge 0 \end{aligned}$$
(3.45)

for any feasible direction \(\mathbf {d}\) of \(\varOmega _{IE}\) at \(\mathbf {\overline{x}}\).

(ii) If \(\varOmega _{IE}\) is defined by linear constraints (3.44), then \(\mathbf {\overline{x}}\in \varOmega _{IE}\) is a solution of (3.43) if and only if there is a vector \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that

$$\begin{aligned} \overline{\varvec{\lambda }}_{{\mathscr {I}}}\ge \mathbf {o}, \quad {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}=\mathbf {o}, \quad \mathrm {and} \quad \overline{\varvec{\lambda }}^T_{{\mathscr {I}}}[{\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c}]_{{\mathscr {I}}}=0. \end{aligned}$$
(3.46)

(iii) If \(\varOmega _{IE}\) is a feasible set for the problem (3.43) and the constraints satisfy ACQ, then \(\mathbf {\overline{x}}\in \varOmega _{IE}\) is a solution of (3.43) if and only if there is a vector \(\overline{\varvec{\lambda }}\in \mathbb {R}^m\) such that

$$\begin{aligned} \overline{\varvec{\lambda }}_{{\mathscr {I}}}\ge \mathbf {o}, \quad {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b} +\nabla \mathbf {h}(\mathbf {\overline{x}})\overline{\varvec{\lambda }}=\mathbf {o}, \quad \mathrm {and} \quad \overline{\varvec{\lambda }}^T_{{\mathscr {I}}}\mathbf {h}_{{\mathscr {I}}}(\mathbf {\overline{x}})=0. \end{aligned}$$
(3.47)

Proof

First observe that if \({{\mathscr {E}}}=\emptyset \), then the statements of the above proposition reduce to Propositions 3.8 and 3.10, and if \({{\mathscr {I}}}=\emptyset \), then they reduce to Proposition 3.6. Thus we can assume in the rest of the proof that \({{\mathscr {I}}}\ne \emptyset \) and \({{\mathscr {E}}}\ne \emptyset \).

As mentioned above, (3.43) may be rewritten also as

$$\begin{aligned} \min _{\mathbf {x}\in \varOmega _{I}} f(\mathbf {x}), \ \ \varOmega _{I}=\{\mathbf {x}\in \mathbb {R}^n: \ [{\mathsf {B}}\mathbf {x}]_{{\mathscr {I}}}\le \mathbf {c}_{{\mathscr {I}}}, \ [{\mathsf {B}}\mathbf {x}]_{{\mathscr {E}}}\le \mathbf {c}_{{\mathscr {E}}}, -[{\mathsf {B}}\mathbf {x}]_{{\mathscr {E}}}\le -\mathbf {c}_{{\mathscr {E}}}\}, \end{aligned}$$
(3.48)

where \(\varOmega _{I}=\varOmega _{IE}\). Thus the statement (i) is a special case of Proposition 3.8. Observing that we can always ignore one of the two inequality constraints related to the same equality constraint, we easily obtain (ii) and (iii) from Propositions 3.8 and 3.10, respectively.    \(\square \)

Remark 3.3

Condition (3.45) can be written as a variational inequality

$$({\mathsf {A}}\mathbf {\overline{x}})^T(\mathbf {x}-\overline{\mathbf {x}})\ge \mathbf {b}^T(\mathbf {x}-\overline{\mathbf {x}}), \quad \mathbf {x}\in \varOmega _{IE}. $$
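The variational inequality can be checked numerically on sampled feasible points. The following sketch (assuming NumPy; illustrative data with \({\mathsf {A}}=2{\mathsf {I}}\), \(\mathbf {b}=(2,2)^T\), and the single constraint \(x_1+x_2\le 1\), for which \(\mathbf {\overline{x}}=(0.5,0.5)^T\)) tests the inequality on random feasible samples.

```python
import numpy as np

# Illustrative data: A = 2I, b = (2,2), constraint x1 + x2 <= 1, solution x_bar = (0.5, 0.5)
A = np.array([[2.0, 0.0], [0.0, 2.0]])
b = np.array([2.0, 2.0])
x_bar = np.array([0.5, 0.5])

rng = np.random.default_rng(0)
ok = True
for _ in range(1000):
    x = rng.uniform(-2.0, 2.0, size=2)
    if x[0] + x[1] <= 1.0:                                   # x feasible
        ok &= (A @ x_bar) @ (x - x_bar) >= b @ (x - x_bar) - 1e-12
print(ok)   # True: the variational inequality holds at x_bar
```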

7 Duality for Quadratic Programming Problems

Duality associates with each problem (3.43), which we shall also call the primal problem, a maximization problem in the Lagrange multipliers that we shall call the dual problem. A solution of the dual problem is a vector of Lagrange multipliers associated with a solution of the primal problem, so that, having solved the dual problem, we can recover a solution of the primal problem by solving an unconstrained problem. Here we limit our attention to the QP problem (3.44), postponing the discussion of more general cases to specific applications.

The cost function of the dual problem is the dual function

$$\begin{aligned} \varTheta (\varvec{\lambda }) =\inf _{\mathbf {x}\in \mathbb {R}^n}L_0(\mathbf {x},\varvec{\lambda }). \end{aligned}$$
(3.49)

If \({\mathsf {A}}\) is SPD, then \(L_0(\cdot ,\varvec{\lambda })\) is a quadratic function with the SPD Hessian \({\mathsf {A}}\), and \(\varTheta \) is a quadratic function in \(\varvec{\lambda }\) which can be given by an explicit formula. However, in our applications, it often happens that \({\mathsf {A}}\) is only SPS, so that the Lagrangian \(L_0(\cdot ,\varvec{\lambda })\) need not be bounded from below and \(-\infty \) can be in the range of the dual function \(\varTheta \). We resolve this problem by keeping \(\varTheta \) quadratic at the cost of introducing equality constraints.
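For SPD \({\mathsf {A}}\), the infimum in (3.49) is attained at \(\mathbf {x}(\varvec{\lambda })={\mathsf {A}}^{-1}(\mathbf {b}-{\mathsf {B}}^T\varvec{\lambda })\), which yields the explicit formula that appears, with \({\mathsf {A}}^+\) in place of \({\mathsf {A}}^{-1}\), in Proposition 3.13 below. A minimal numerical sketch, assuming NumPy and illustrative data:

```python
import numpy as np

# For SPD A, Theta(lam) = inf_x L_0(x, lam) is attained at x(lam) = A^{-1}(b - B^T lam)
A = np.array([[2.0, 0.0], [0.0, 4.0]])
b = np.array([1.0, 1.0])
B = np.array([[1.0, 1.0]])
c = np.array([0.5])

def L0(x, lam):   # Lagrangian of (3.44)
    return 0.5 * x @ A @ x - b @ x + lam @ (B @ x - c)

def Theta(lam):   # explicit dual function, cf. (3.50) below with A^+ = A^{-1}
    Ainv = np.linalg.inv(A)
    return (-0.5 * lam @ B @ Ainv @ B.T @ lam
            + lam @ (B @ Ainv @ b - c) - 0.5 * b @ Ainv @ b)

lam = np.array([0.3])
x_lam = np.linalg.solve(A, b - B.T @ lam)
print(np.isclose(L0(x_lam, lam), Theta(lam)))   # True
```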

Proposition 3.13

Let matrices \({\mathsf {A, B}}\), vectors \(\mathbf {b, c}\), and index sets \({{\mathscr {I}}},{{\mathscr {E}}}\) be those of the definition of problem (3.44) with \({\mathsf {A}}\) positive semidefinite and \(\varOmega _{IE}\ne \emptyset \). Let \({\mathsf {R}}\in \mathbb {R}^{n\times d}\) be a full rank matrix such that

$$\begin{aligned} \mathrm {Im}{\mathsf {R}}=\mathrm {Ker}{\mathsf {A}}, \end{aligned}$$

let \({\mathsf {A}}^+\) denote an SPS generalized inverse of \({\mathsf {A}}\), and let

$$\begin{aligned} \varTheta (\varvec{\lambda }) =-\frac{1}{2}\varvec{\lambda }^T{\mathsf {B}}{\mathsf {A}}^+{\mathsf {B}}^T\varvec{\lambda } +\varvec{\lambda }^T({\mathsf {B}}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}) -\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^+\mathbf {b}. \end{aligned}$$
(3.50)

Then the following statements hold:

(i) If \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is a KKT pair for (3.44), then \(\overline{\varvec{\lambda }}\) is a solution of

$$\begin{aligned} \max _{\varvec{\lambda }\in \varOmega _{BE}} \varTheta (\varvec{\lambda }), \quad \varOmega _{BE}=\{\varvec{\lambda }\in \mathbb {R}^m: \ \varvec{\lambda }_{{\mathscr {I}}}\ge \mathbf {o}, \ {\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }={\mathsf {R}}^T\mathbf {b}\}. \end{aligned}$$
(3.51)

Moreover, there is \(\overline{\varvec{\alpha }}\in \mathbb {R}^d\) such that \((\overline{\varvec{\lambda }}, \overline{\varvec{\alpha }})\) is a KKT pair for problem (3.51) and

$$\begin{aligned} \mathbf {\overline{x}}={\mathsf {A}}^{+}(\mathbf {b}-{\mathsf {B}}^T\overline{\varvec{\lambda }}) +{\mathsf {R}}\overline{\varvec{\alpha }}. \end{aligned}$$
(3.52)

(ii) If \((\overline{\varvec{\lambda }}, \overline{\varvec{\alpha }})\) is a KKT pair for problem (3.51), then \(\mathbf {\overline{x}}\) defined by (3.52) is a solution of the equality and inequality constrained problem (3.44).

(iii) If \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is a KKT pair for problem (3.44), then

$$\begin{aligned} f(\mathbf {\overline{x}})=\varTheta (\overline{\varvec{\lambda }}). \end{aligned}$$
(3.53)

Proof

(i) Assume that \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is a KKT pair for (3.44), so that \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is by Proposition 3.12 a solution of

$$\begin{aligned} \varvec{\lambda }_{{\mathscr {I}}}\ge & {} \mathbf {o}, \end{aligned}$$
(3.54)
$$\begin{aligned} \nabla _{\mathbf {x}}L_0({\mathbf {x}},\varvec{\lambda })= & {} {\mathsf {A}}\mathbf {x}-\mathbf {b}+{\mathsf {B}}^T\varvec{\lambda }=\mathbf {o},\end{aligned}$$
(3.55)
$$\begin{aligned} \left[ \nabla _{\varvec{\lambda }}L_0({\mathbf {x}},\varvec{\lambda })\right] _{{\mathscr {I}}}= & {} \left[ {\mathsf {B}}\mathbf {x}-\mathbf {c}\right] _{{\mathscr {I}}}\le \mathbf {o},\end{aligned}$$
(3.56)
$$\begin{aligned} \left[ \nabla _{\varvec{\lambda }}L_0({\mathbf {x}},\varvec{\lambda })\right] _{{\mathscr {E}}}= & {} \left[ {\mathsf {B}}\mathbf {x}-\mathbf {c}\right] _{{\mathscr {E}}}=\mathbf {o},\end{aligned}$$
(3.57)
$$\begin{aligned} \varvec{\lambda }^T_{{\mathscr {I}}}[{\mathsf {B}}\mathbf {x}-\mathbf {c}]_{{\mathscr {I}}}= & {} 0. \end{aligned}$$
(3.58)

Notice that given a vector \(\varvec{\lambda }\in \mathbb {R}^m\), we can express the condition

$$ \mathbf {b}-{\mathsf {B}}^T\varvec{\lambda }\in \mathrm {Im}{\mathsf {A}}, $$

which guarantees solvability of (3.55) with respect to \(\mathbf {x}\), conveniently as

$$\begin{aligned} {\mathsf {R}}^T({\mathsf {B}}^T\varvec{\lambda }-\mathbf {b})=\mathbf {o}. \end{aligned}$$
(3.59)

If the latter condition is satisfied, then we can use any symmetric left generalized inverse \({\mathsf {A}}^+\) to find all solutions of (3.55) with respect to \(\mathbf {x}\) in the form

$$ \mathbf {x}(\varvec{\lambda }, \varvec{\alpha })={\mathsf {A}}^{+}(\mathbf {b}-{\mathsf {B}}^T\varvec{\lambda })+{\mathsf {R}}\varvec{\alpha }, \ \varvec{\alpha }\in \mathbb {R}^d, $$

where d is the dimension of \(\mathrm {Ker}{\mathsf {A}}\). After substituting for \(\mathbf {x}\) into (3.56)–(3.58), we get

$$\begin{aligned}&[\,-{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda }+({\mathsf {B}}{\mathsf {A}}^{+}\mathbf {b}-\mathbf {c}) +{\mathsf {B}}{\mathsf {R}}\varvec{\alpha }]_{{\mathscr {I}}}\le \mathbf {o},\end{aligned}$$
(3.60)
$$\begin{aligned}&[\,-{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda }+({\mathsf {B}}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}) +{\mathsf {B}}{\mathsf {R}}\varvec{\alpha }]_{{\mathscr {E}}}=\mathbf {o}, \end{aligned}$$
(3.61)
$$\begin{aligned} \varvec{\lambda }_{{\mathscr {I}}}^T&[\,-{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda } +({\mathsf {B}}{\mathsf {A}}^+\mathbf {b}-\mathbf {c})+{\mathsf {B}}{\mathsf {R}}\varvec{\alpha }]_{{\mathscr {I}}} =0. \end{aligned}$$
(3.62)

The formulae in (3.60)–(3.62) have a familiar structure. Indeed, introducing the vector \(\varvec{\alpha }\) of Lagrange multipliers for the equality constraint (3.59) and denoting

$$\begin{aligned} \varLambda (\varvec{\lambda }, \varvec{\alpha })= & {} \varTheta (\varvec{\lambda })+\varvec{\alpha }^T({\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }-{\mathsf {R}}^T\mathbf {b})\nonumber \\= & {} -\frac{1}{2}\varvec{\lambda }^T{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda }+\varvec{\lambda }^T({\mathsf {B}}{\mathsf {A}}^{+}\mathbf {b}-\mathbf {c})-\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^+\mathbf {b}\nonumber \\&+\varvec{\alpha }^T({\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }-{\mathsf {R}}^T\mathbf {b}),\nonumber \end{aligned}$$
$$ \mathbf {g}=\nabla _{\varvec{\lambda }}\varLambda (\varvec{\lambda }, \varvec{\alpha }) =-{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda }+({\mathsf {B}}{\mathsf {A}}^{+}\mathbf {b}-\mathbf {c}) +{\mathsf {B}}{\mathsf {R}}\varvec{\alpha }, $$

we can rewrite the relations (3.60)–(3.62) as

$$\begin{aligned} \mathbf {g}_{{\mathscr {I}}}\le \mathbf {o}, \quad \ \mathbf {g}_{{\mathscr {E}}}= \mathbf {o},\quad \mathrm {and} \quad \varvec{\lambda }_{{\mathscr {I}}}^T\mathbf {g}_{{\mathscr {I}}}=0. \end{aligned}$$
(3.63)

Comparing (3.63) with the KKT conditions for the bound and equality constrained problem, we conclude that (3.63) are the KKT conditions for

$$\begin{aligned} \max \varTheta (\varvec{\lambda }) \quad \text {subject} \ \text {to} \quad {\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }-{\mathsf {R}}^T\mathbf {b}=\mathbf {o} \quad \text {and} \quad \varvec{\lambda }_{{\mathscr {I}}}\ge \mathbf {o}. \end{aligned}$$
(3.64)

We have thus proved that if \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) solves (3.54)–(3.58), then \(\overline{\varvec{\lambda }}\) is a feasible vector for problem (3.64) which satisfies the related KKT conditions. Recalling that \({\mathsf {A}}^+\) is by the assumption symmetric positive semidefinite, so that \({\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\) is also positive semidefinite, we conclude that \(\overline{\varvec{\lambda }}\) solves (3.51). Moreover, we have shown that any solution \(\mathbf {\overline{x}}\) can be obtained in the form (3.52) with a KKT pair \((\overline{\varvec{\lambda }}, \overline{\varvec{\alpha }})\), where \(\overline{\varvec{\alpha }}\) is a vector of the Lagrange multipliers for the equality constraints in (3.51).

(ii) Let \((\overline{\varvec{\lambda }}, \overline{\varvec{\alpha }})\) be a KKT pair for problem (3.51), so that \((\overline{\varvec{\lambda }}, \overline{\varvec{\alpha }})\) satisfies (3.59)–(3.62) and \(\overline{\varvec{\lambda }}_{{\mathscr {I}}}\ge \mathbf {o}\). If we denote

$$ \mathbf {\overline{x}}={\mathsf {A}}^{+}(\mathbf {b}-{\mathsf {B}}^T\overline{\varvec{\lambda }})+{\mathsf {R}}\overline{\varvec{\alpha }}, $$

we can use (3.60)–(3.62) to verify directly that \(\mathbf {\overline{x}}\) is feasible and that \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) satisfies the complementarity conditions. Finally, using (3.59), we get that there is \(\mathbf {y}\in \mathbb {R}^n\) such that

$$ \mathbf {b}-{\mathsf {B}}^T\overline{\varvec{\lambda }}={\mathsf {A}}\mathbf {y}. $$

Thus

$$\begin{aligned} {\mathsf {A}}\mathbf {\overline{x}}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}= & {} {\mathsf {A}}\bigl ({\mathsf {A}}^+(\mathbf {b}-{\mathsf {B}}^T\overline{\varvec{\lambda }})+{\mathsf {R}}\overline{\varvec{\alpha }}\bigr )-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}\nonumber \\= & {} {\mathsf {AA}}^+{\mathsf {A}}\mathbf {y}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }} =\mathbf {b}-{\mathsf {B}}^T\overline{\varvec{\lambda }}-\mathbf {b}+{\mathsf {B}}^T\overline{\varvec{\lambda }}=\mathbf {o}, \nonumber \end{aligned}$$

which proves that \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is a KKT pair for (3.44).

(iii) Let \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) be a KKT pair for (3.44). Using the feasibility condition (3.57) and the complementarity condition (3.58), we get

$$ \overline{\varvec{\lambda }}^T({\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c})= \overline{\varvec{\lambda }}^T_{{\mathscr {E}}}[{\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c}]_{{\mathscr {E}}}+ \overline{\varvec{\lambda }}^T_{{\mathscr {I}}}[{\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c}]_{{\mathscr {I}}}= 0. $$

Hence

$$ f(\mathbf {\overline{x}})=f(\mathbf {\overline{x}})+\overline{\varvec{\lambda }}^T({\mathsf {B}}\mathbf {\overline{x}}-\mathbf {c}) =L_0(\mathbf {\overline{x}}, \overline{\varvec{\lambda }}). $$

Next recall that if \((\mathbf {\overline{x}}, \overline{\varvec{\lambda }})\) is a KKT pair, then

$$ \nabla _{\mathbf {x}} L_0(\mathbf {\overline{x}}, \overline{\varvec{\lambda }})=\mathbf {o}. $$

Since \(L_0\) is convex, the latter is the gradient condition for the unconstrained minimizer of \(L_0\) with respect to \(\mathbf {x}\); therefore

$$ L_0(\mathbf {\overline{x}},\overline{\varvec{\lambda }}) =\min _{\mathbf {x}\in \mathbb {R}^n}L_0(\mathbf { x},\overline{\varvec{\lambda }}) =\varTheta (\overline{\varvec{\lambda }}). $$

Thus

$$ f(\mathbf {\overline{x}}) =L_0(\mathbf {\overline{x}},\overline{\varvec{\lambda }}) =\varTheta (\overline{\varvec{\lambda }}). $$

   \(\square \)
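The statements of Proposition 3.13 can be checked on a small example with a singular Hessian. The following sketch (assuming NumPy; the data are illustrative, with \(\mathrm {Ker}{\mathsf {A}}\) spanned by \((0,1)^T\) and a single equality constraint \(x_2=2\)) verifies the KKT stationarity, the dual feasibility condition (3.59), the reconstruction formula (3.52), and the equality \(f(\mathbf {\overline{x}})=\varTheta (\overline{\varvec{\lambda }})\) of (3.53).

```python
import numpy as np

# Illustrative SPS data for (3.44): Ker A = span{(0,1)}, one equality constraint x2 = 2
A = np.diag([1.0, 0.0]); b = np.array([1.0, 1.0])
B = np.array([[0.0, 1.0]]); c = np.array([2.0])
R = np.array([[0.0], [1.0]])            # Im R = Ker A
A_plus = np.diag([1.0, 0.0])            # SPS generalized inverse of A

x_bar = np.array([1.0, 2.0]); lam_bar = np.array([1.0]); alpha_bar = np.array([2.0])

Theta = (-0.5 * lam_bar @ B @ A_plus @ B.T @ lam_bar
         + lam_bar @ (B @ A_plus @ b - c) - 0.5 * b @ A_plus @ b)        # (3.50)
f_bar = 0.5 * x_bar @ A @ x_bar - b @ x_bar

print(np.allclose(A @ x_bar - b + B.T @ lam_bar, 0))                     # KKT stationarity
print(np.allclose(R.T @ B.T @ lam_bar, R.T @ b))                         # dual feasibility (3.59)
print(np.allclose(A_plus @ (b - B.T @ lam_bar) + R @ alpha_bar, x_bar))  # (3.52)
print(np.isclose(f_bar, Theta))                                          # (3.53)
```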

Since the constant term is not essential in our applications and we formulate our algorithms for minimization problems, we shall consider the function

$$\begin{aligned} \theta (\varvec{\lambda })=-\varTheta (\varvec{\lambda })-\frac{1}{2}\mathbf {b}^T{\mathsf {A}}^+\mathbf {b} =\frac{1}{2}\varvec{\lambda }^T{\mathsf {B}}{\mathsf {A}}^+{\mathsf {B}}^T\varvec{\lambda }-\varvec{\lambda }^T({\mathsf {B}}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}), \end{aligned}$$
(3.65)

so that

$$ \arg \min _{\varvec{\lambda }\in \varOmega _{BE}} \theta (\varvec{\lambda }) =\arg \max _{\varvec{\lambda }\in \varOmega _{BE}} \varTheta (\varvec{\lambda }). $$

7.1 Uniqueness of a KKT Pair

We shall complete our exposition of duality by formulating the results concerning the uniqueness of the solution for the constrained dual problem

$$\begin{aligned} \min _{\varvec{\lambda }\in \varOmega _{BE}} \theta (\varvec{\lambda }), \quad \varOmega _{BE}=\{\varvec{\lambda }\in \mathbb {R}^m: \ \varvec{\lambda }_{{\mathscr {I}}}\ge \mathbf {o}, \ {\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }={\mathsf {R}}^T\mathbf {b}\}, \end{aligned}$$
(3.66)

where \(\theta \) is defined by (3.65).

Proposition 3.14

Let the matrices \({\mathsf {A}}\), \({\mathsf {B}}\), the vectors \(\mathbf {b}\), \(\mathbf {c}\), and the index sets \({\mathscr {I}}\), \({\mathscr {E}}\) be those from the definition of problem (3.44) with \({\mathsf {A}}\) positive semidefinite, \(\varOmega _{IE}\ne \emptyset \), and \(\varOmega _{BE}\ne \emptyset \). Let \({\mathsf {R}}\in \mathbb {R}^{n\times d}\) be a full rank matrix such that

$$ \mathrm {Im}{\mathsf {R}}=\mathrm {Ker}{\mathsf {A}}. $$

Then the following statements hold:

(i) If \(\ {\mathsf {B}}^T\) and \({\mathsf {BR}}\) are full column rank matrices, then there is a unique solution \(\widehat{\varvec{\lambda }}\) of problem (3.66).

(ii) If \(\widehat{\varvec{\lambda }}\) is a unique solution of the constrained dual problem (3.66),

$$ {\mathscr {A}}=\{i: \ [\widehat{\varvec{\lambda }}]_i>0\}\cup {{\mathscr {E}}}, $$

and \({\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {R}}\) is a full column rank matrix, then there is a unique triple \((\mathbf {\widehat{x}}, \widehat{\varvec{\lambda }}, \widehat{\varvec{\alpha }})\) such that \((\mathbf {\widehat{x}}, \widehat{\varvec{\lambda }})\) solves the primal problem (3.44) and \((\widehat{\varvec{\lambda }},\widehat{\varvec{\alpha }})\) solves the constrained dual problem (3.66). If \(\widehat{\varvec{\lambda }}\) is known, then

$$\begin{aligned} \widehat{\varvec{\alpha }}=({\mathsf {R}}^T{\mathsf {B}}_{{\mathscr {A}} *}^T{\mathsf {B}}_{{\mathscr {A}} *}{\mathsf {R}})^{-1}{\mathsf {R}}^T{\mathsf {B}}_{{\mathscr {A}} *}^T \left( {\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {A}}^{+}{\mathsf {B}}^T\widehat{\varvec{\lambda }} -({\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}_{{\mathscr {A}}})\right) \end{aligned}$$
(3.67)

and

$$\begin{aligned} \mathbf {\widehat{x}}={\mathsf {A}}^{+}(\mathbf {b}-{\mathsf {B}}^T\widehat{\varvec{\lambda }}) +{\mathsf {R}}\widehat{\varvec{\alpha }}. \end{aligned}$$
(3.68)

(iii) If \({\mathsf {B}}^T\) and \(\ {\mathsf {B}}_{{\mathscr {E}} *}{\mathsf {R}}\) are full column rank matrices, then there is a unique triple \((\mathbf {\widehat{x}}, \widehat{\varvec{\lambda }}, \widehat{\varvec{\alpha }})\) such that \((\mathbf {\widehat{x}}, \widehat{\varvec{\lambda }})\) solves the primal problem (3.44) and \((\widehat{\varvec{\lambda }}, \widehat{\varvec{\alpha }})\) solves the constrained dual problem (3.66).

Proof

(i) Let \({\mathsf {B}}^T\) and \({\mathsf {BR}}\) be full column rank matrices. To show that there is a unique solution of (3.66), we examine the Hessian \({\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\) of \(\theta \). Let \({\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }=\mathbf {o}\) and \({\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda }=\mathbf {o}\). Using the definition of \({\mathsf {R}}\), it follows that \({\mathsf {B}}^T\varvec{\lambda }\in \mathrm {Im}{\mathsf {A}}\). Hence there is \(\varvec{\mu }\in \mathbb {R}^n\) such that

$$ {\mathsf {B}}^T\varvec{\lambda }={\mathsf {A}}\varvec{\mu } $$

and

$$ \varvec{\mu }^T{\mathsf {A}}\varvec{\mu }=\varvec{\mu }^T{\mathsf {A}}{\mathsf {A}}^+{\mathsf {A}}\varvec{\mu } =\varvec{\lambda }^T{\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\varvec{\lambda } =0. $$

Thus \(\varvec{\mu }\in \mathrm {Ker}{\mathsf {A}}\) and

$$ {\mathsf {B}}^T\varvec{\lambda }={\mathsf {A}}\varvec{\mu }=\mathbf {o}. $$

Since we assume that \({\mathsf {B}}^T\) has independent columns, we conclude that \(\varvec{\lambda }=\mathbf {o}\). We have thus proved that the restriction of \({\mathsf {B}}{\mathsf {A}}^{+}{\mathsf {B}}^T\) to \(\mathrm {Ker}({\mathsf {R}}^T{\mathsf {B}}^T)\) is positive definite, so that \(\theta |_{\mathrm {Ker}({\mathsf {R}}^T{\mathsf {B}}^T)}\) is strictly convex by Proposition 3.4, and it is easy to check that \(\theta \) is also strictly convex on

$$ {\mathscr {U}}=\{\varvec{\lambda }\in \mathbb {R}^m: \ {\mathsf {R}}^T{\mathsf {B}}^T\varvec{\lambda }={\mathsf {R}}^T\mathbf {b}\}. $$

Since \(\varOmega _{BE}\ne \emptyset \) and \(\varOmega _{BE}\subseteq {{\mathscr {U}}}\), we have that \(\theta \) is strictly convex on \(\varOmega _{BE}\), and it follows by Proposition 3.3 that there is a unique solution \(\widehat{\varvec{\lambda }}\) of (3.66).

(ii) Let \(\widehat{\varvec{\lambda }}\) be a unique solution of problem (3.66). Since the solution satisfies the related KKT conditions, it follows that there is \(\widehat{\varvec{\alpha }}\) such that

$$ {\mathsf {B}}_{{\mathscr {A}} *}{\mathsf {A}}^{+}{\mathsf {B}}^T\widehat{\varvec{\lambda }} -({\mathsf {B}}_{{\mathscr {A}} *}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}_{{\mathscr {A}}}) -{\mathsf {B}}_{{\mathscr {A}} *}{\mathsf {R}}\widehat{\varvec{\alpha }} =\mathbf {o}. $$

After multiplying on the left by \({\mathsf {R}}^T{\mathsf {B}}_{{\mathscr {A}}*}^T\) and simple manipulations, we get (3.67). The inverse exists and the solution \(\widehat{\varvec{\alpha }}\) is unique due to the uniqueness of \(\widehat{\varvec{\lambda }}\) and the assumption on the full column rank of \({\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {R}}\).

(iii) If \({\mathsf {B}}^T\) and \({\mathsf {B}}_{{\mathscr {E}} *}{\mathsf {R}}\) are full column rank matrices, then \({\mathsf {B}}{\mathsf {R}}\) is also a full column rank matrix. Hence there is a unique solution \(\widehat{\varvec{\lambda }}\) of problem (3.66) by (i). Since \({{\mathscr {E}}}\subseteq {{\mathscr {A}}}\) and \({\mathsf {B}}_{{\mathscr {E}} *}{\mathsf {R}}\) has independent columns, it follows that \({\mathsf {B}}_{{\mathscr {A}} *}{\mathsf {R}}\) also has independent columns. Thus we can use (ii) to finish the proof.    \(\square \)

The reconstruction formula (3.67) can be modified in order to work whenever the dual problem has a solution \(\overline{\varvec{\lambda }}\). The resulting formula obtained by the analysis of the related KKT conditions then reads

$$\begin{aligned} \overline{\varvec{\alpha }}=({\mathsf {R}}^T{\mathsf {B}}_{{\mathscr {A}}*}^T{\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {R}})^{+}{\mathsf {R}}^T{\mathsf {B}}_{{\mathscr {A}}*}^T \left( {\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {A}}^{+}{\mathsf {B}}^T\overline{\varvec{\lambda }}-({\mathsf {B}}_{{\mathscr {A}}*}{\mathsf {A}}^+\mathbf {b}-\mathbf {c}_{{\mathscr {A}}})\right) . \end{aligned}$$
(3.69)
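A minimal sketch of the reconstruction (3.69) and (3.68) follows, assuming NumPy and reusing the illustrative data from the sketch after Proposition 3.13 (singular \({\mathsf {A}}\) with \(\mathrm {Ker}{\mathsf {A}}\) spanned by \((0,1)^T\), one equality constraint \(x_2=2\), and the dual solution \(\widehat{\varvec{\lambda }}=(1)\)).

```python
import numpy as np

# Illustrative data: A SPS with Ker A = span{(0,1)}, one equality constraint x2 = 2
A_plus = np.diag([1.0, 0.0])                     # SPS generalized inverse of A
b = np.array([1.0, 1.0])
B = np.array([[0.0, 1.0]]); c = np.array([2.0])
R = np.array([[0.0], [1.0]])                     # Im R = Ker A
lam_hat = np.array([1.0])                        # dual solution
Act = [0]                                        # active set: the equality constraint

B_A = B[Act]; c_A = c[Act]
rhs = R.T @ B_A.T @ (B_A @ A_plus @ B.T @ lam_hat - (B_A @ A_plus @ b - c_A))
alpha_hat = np.linalg.pinv(R.T @ B_A.T @ B_A @ R) @ rhs        # (3.69)
x_hat = A_plus @ (b - B.T @ lam_hat) + R @ alpha_hat           # (3.68)
print(alpha_hat, x_hat)                                        # [2.] [1. 2.]
```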

The duality theory can be illustrated by the problem of finding the displacement \(\mathbf {x}\) of an elastic body under traction \(\mathbf {b}\). After the finite element discretization, we get a convex QP problem. We assume that the body is fixed in the normal direction on a part of the boundary, so that the vector of nodal displacements satisfies \({\mathsf {B}}_{{\mathscr {E}}*}\mathbf {x}=\mathbf {c}_{{\mathscr {E}}}\) as in Fig. 3.9. Moreover, the body may be required not to penetrate an obstacle, so that \({\mathsf {B}}_{{\mathscr {I}}*}\mathbf {x}\le \mathbf {c}_{{\mathscr {I}}}\) as in Fig. 3.8.

Fig. 3.8 Unique displacement

Fig. 3.9 Nonunique displacement

The displacement \(\mathbf {\overline{x}}\) of the body in equilibrium is a minimizer of the convex energy function f. The Hessian \({\mathsf {A}}\) of f is only positive semidefinite when the discretized body without the constraints admits rigid body motions. The Lagrange multipliers solve the dual problem. The condition \({\mathsf {R}}^T\mathbf {b}={\mathsf {R}}^T{\mathsf {B}}^T\widehat{\varvec{\lambda }}\) requires that the resulting forces are balanced in the directions of the rigid body motions, and \(\widehat{\varvec{\lambda }}_{{\mathscr {I}}}\ge \mathbf {o}\) guarantees that the body is not glued to the obstacle. If the reaction forces \({\mathsf {B}}^T\widehat{\varvec{\lambda }}\) determine the components of \(\widehat{\varvec{\lambda }}\), then \(\widehat{\varvec{\lambda }}\) is uniquely determined by the conditions of equilibrium. Notice that \({\mathsf {B}}^T\widehat{\varvec{\lambda }}\) is always uniquely determined by the conditions of equilibrium. If no rigid body motion is possible due to the active constraints \({\mathsf {B}}_{{\mathscr {A}} *}\mathbf {x}=\mathbf {c}_{{\mathscr {A}}}\) as in Fig. 3.8, then the displacement \(\mathbf {x}\) is uniquely determined. If this is not the case, then the displacement is determined only up to some rigid body motion, as in Fig. 3.9.