
1 Introduction

When solving a matrix computational problem in finite machine arithmetic obeying the IEEE Standard 754-2008 [29], there are three main factors determining the accuracy of the computed solution: (a) the parameters of the machine arithmetic, in particular the rounding unit; (b) the sensitivity of the problem, in particular its condition number; and (c) the properties of the numerical algorithm, in particular the parameters specifying its numerical stability. Only by taking these factors into account is it possible to derive an error estimate for the computed solution, see e.g. [27, 28, 38, 75]. Without such an estimate a computational procedure cannot be accepted as reliable.

The sensitivity of matrix computational problems can be revealed by the methods and techniques of perturbation (or sensitivity) analysis. In turn, the perturbation analysis may be norm-wise or componentwise [38, 87] (below we present results on norm-wise analysis). The necessity of such an analysis is motivated by at least two further reasons. First, it illuminates the very nature of the problem, independently of its practical applicability. Second, the mathematical models of real systems and processes are subject to parametric and measurement uncertainties [95]; within these uncertainties we actually have a family of models, and such a family can be characterized by the methods of perturbation analysis.

In this chapter we consider three classes of matrix perturbation problems: matrix equations, unitary decompositions of matrices and modal control of controllable systems. Besides numerous articles, there are many dissertations and books devoted to these and related problems, see e.g. [2, 4, 16–18, 31, 32, 86] and [38, 45–47]. Related matrix and control problems are considered in [5, 23, 60, 66, 75].

A considerable contribution to the perturbation theory of matrix equations and decompositions, as well as to the corresponding numerical methods, has been made by Volker Mehrmann and many of his coauthors since 1991; see [1, 10, 12, 38–41, 64, 68–70, 72] for theoretical considerations and [6–9, 25, 28, 65–67, 71, 76, 78] for numerical methods, algorithms and software.

2 Notation

In what follows we denote by \(\mathbb{R}^{p\times q}\) (resp. \(\mathbb{C}^{p\times q}\)) the space of p × q matrices over \(\mathbb{R}\) (resp. \(\mathbb{C}\)); we write \(\mathbb{R}^{p}\) for \(\mathbb{R}^{p\times 1}\), and \(\mathbf{A}^{\top }\), \(\overline{\mathbf{A}}\) and \(\mathbf{A}^{\mathrm{H}} = \overline{\mathbf{A}}^{\top }\) for the transpose, complex conjugate and complex conjugate transpose of the matrix A with elements \(a_{k,l}\), respectively (we use bold for matrices and vectors). For block matrices we use MATLAB-like notation, e.g. \(\mathbf{A} = \left [\begin{array}{cc} \mathbf{A}_{1,1} & \mathbf{A}_{1,2} \\ \mathbf{A}_{2,1} & \mathbf{A}_{2,2} \end{array} \right ] = [\mathbf{A}_{1,1},\mathbf{A}_{1,2};\mathbf{A}_{2,1},\mathbf{A}_{2,2}]\). In particular the vectorized column-wise form of the (p × q)-matrix \(\mathbf{A} = [\mathbf{a}_{1},\mathbf{a}_{2},\ldots,\mathbf{a}_{q}]\) with columns \(\mathbf{a}_{k}\) is the column pq-vector \(\mathrm{vec}(\mathbf{A}) = [\mathbf{a}_{1};\mathbf{a}_{2};\ldots;\mathbf{a}_{q}]\).

The size of the involved matrices (in particular the size of the identity matrix I) shall be clear from the context, but we also use the notation \(\mathbf{I}_{n}\) for the identity (n × n)-matrix and \(\mathbf{0}_{p\times q}\) for the zero (p × q)-matrix.

The Frobenius and the 2-norm of a matrix A are denoted by \(\|\mathbf{A}\|\) and \(\|\mathbf{A}\|_{2}\), respectively. We recall that \(\|\mathbf{A}\|^{2} =\sum _{k,l}\vert a_{k,l}\vert ^{2}\) and that \(\|\mathbf{A}\|_{2}\) is the square root of the maximum eigenvalue of the matrix \(\mathbf{A}^{\mathrm{H}}\mathbf{A}\) (i.e. the maximum singular value of A). The Kronecker product of the matrices \(\mathbf{A} = [a_{k,l}]\) and B is \(\mathbf{A} \otimes \mathbf{B} = [a_{k,l}\mathbf{B}]\).

The Frobenius norm is very useful in matrix perturbation problems. Indeed, if Y = AXB then \(\mathrm{vec}(\mathbf{Y}) = (\mathbf{B}^{\top }\otimes \mathbf{A})\mathrm{vec}(\mathbf{X})\) and \(\|\mathbf{Y}\| \leq \| (\mathbf{B}^{\top }\otimes \mathbf{A})\|_{2}\|\mathbf{X}\|\), where the equality \(\|\mathbf{Y}\| =\| (\mathbf{B}^{\top }\otimes \mathbf{A})\|_{2}\|\mathbf{X}\|\) is attainable. In addition, if the matrix A is perturbed to A + E then we usually have a bound \(\|\mathbf{E}\| \leq \delta\) on the perturbation E in terms of its Frobenius norm \(\|\mathbf{E}\|\) rather than a bound \(\|\mathbf{E}\|_{2} \leq \delta\) in terms of its 2-norm \(\|\mathbf{E}\|_{2}\).
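
As an illustration, the vec identity and the resulting Frobenius-norm bound can be checked numerically. The following NumPy sketch (with arbitrarily chosen random matrices, not part of the original text) uses the column-wise vec convention adopted here:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 2))

vec = lambda M: M.reshape(-1, order="F")   # column-wise vectorization

Y = A @ X @ B
lhs = vec(Y)
rhs = np.kron(B.T, A) @ vec(X)             # vec(A X B) = (B^T kron A) vec(X)
assert np.allclose(lhs, rhs)

# Frobenius-norm bound ||Y|| <= ||B^T kron A||_2 ||X||
bound = np.linalg.norm(np.kron(B.T, A), 2) * np.linalg.norm(X, "fro")
assert np.linalg.norm(Y, "fro") <= bound * (1 + 1e-12)
```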

Finally, the notation ‘:=’ means ‘equal by definition’.

3 Problem Statement

Explicit matrix problems may be written as

$$\displaystyle{\mathbf{X} =\varPhi (\mathbf{A}),}$$

where the data A and the result X are nonzero real or complex matrices (or collections of matrices), while implicit problems are defined via matrix equations

$$\displaystyle{F(\mathbf{A},\mathbf{X}) = 0.}$$

The function Φ satisfies the Lipschitz condition

$$\displaystyle{\|\varPhi (\mathbf{A} + \mathbf{E}) -\varPhi (\mathbf{A})\| \leq L\|\mathbf{E}\|,}$$

where E is a certain perturbation in the data A. Here for some η > 0 the Lipschitz constant L = L(A, η) is the supremum of the quantity

$$\displaystyle{\frac{\|\varPhi (\mathbf{A} + \mathbf{E}) -\varPhi (\mathbf{A})\|} {\|\mathbf{E}\|} }$$

over all E satisfying \(0 <\| \mathbf{E}\| \leq \eta\). Denoting by

$$\displaystyle{\mathbf{Y} = \mathbf{Y}(\mathbf{A},\mathbf{E}):=\varPhi (\mathbf{A} + \mathbf{E}) -\varPhi (\mathbf{A})}$$

the perturbation in the solution we obtain

$$\displaystyle{ \frac{\|\mathbf{Y}\|} {\|\mathbf{X}\|} \leq K \frac{\|\mathbf{E}\|} {\|\mathbf{A}\|},\ \mathbf{X}\neq 0,\ \mathbf{A}\neq 0, }$$
(7.1)

where the quantity

$$\displaystyle{ K = K(\mathbf{A},\eta ):= L(\mathbf{A},\eta )\frac{\|\mathbf{A}\|} {\|\mathbf{X}\|} }$$
(7.2)

is the relative condition number of the problem. We stress that the estimate (7.1), (7.2) is nonlocal since it holds true for all E with \(\|\mathbf{E}\| \leq \eta\).

Let the problem be solved by a numerically stable algorithm in floating-point arithmetic with rounding unit u (in double precision mode \(\mathrm{u} = 2^{-53} \simeq 1.1 \times 10^{-16}\)). Then the computed solution \(\tilde{\mathbf{X}}\) may be represented as

$$\displaystyle{\tilde{\mathbf{X}} =\tilde{\varPhi } (\mathbf{A}),}$$

where \(\tilde{\varPhi }(\mathbf{A})\) is close to \(\varPhi (\tilde{\mathbf{A}})\) for some data \(\tilde{\mathbf{A}}\) which in turn is close to A in the sense that [28, 75]

$$\displaystyle{ \|\tilde{\varPhi }(\mathbf{A}) -\varPhi (\tilde{\mathbf{A}})\| \leq b\mathrm{u}\|\mathbf{X}\|\ \ \mbox{ and}\ \ \|\tilde{\mathbf{A}} -\mathbf{A}\| \leq a\mathrm{u}\|\mathbf{A}\| }$$
(7.3)

for some positive constants a, b depending on the algorithm (and, possibly, on the data).

It may be shown [28] that within first order terms in u we have the relative accuracy estimate

$$\displaystyle{ \frac{\|\tilde{\mathbf{X}} -\mathbf{X}\|} {\|\mathbf{X}\|} \leq \mathrm{u}(\mathit{aK} + b). }$$
(7.4)

For a = 0 the algorithm is forward numerically stable, while for b = 0 it is backward numerically stable according to the definitions given in [94], see also [28].

The inequality (7.4) in view of (7.3) is one of the most useful estimates in matrix perturbation analysis. It reveals the influence of the three main factors determining the accuracy of the solution: the machine arithmetic (via u), the problem (via K) and the algorithm (via a and b). But there are some difficulties in applying this approach.

First, it is hard to estimate the constants a and b. That is why it is often heuristically assumed that a = 1 and b = 0. This is the case when the algorithm is backward stable (the computed solution is the exact solution of a close problem) and the only errors in the computational process are introduced when rounding the data A to some machine matrix fl(A) with \(\|\mathrm{fl}(\mathbf{A}) -\mathbf{A}\| \leq \mathrm{u}\|\mathbf{A}\|\). As computational practice suggests, this approach is very successful. Under the heuristic assumption the relative error in the computed solution is bounded by Ku,

$$\displaystyle{\frac{\|\tilde{\mathbf{X}} -\mathbf{X}\|} {\|\mathbf{X}\|} \leq K\mathrm{u},}$$

and for Ku ≪ 1 (which is most often the case) we may expect about − log10(Ku) true decimal digits in the solution.
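
The rule of thumb above is easy to try out numerically. The sketch below (a NumPy illustration with an arbitrarily chosen random system, not part of the original text) solves a linear system by a backward stable method and compares the observed relative error with the heuristic bound Ku, taking K to be the standard 2-norm condition number:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

x = np.linalg.solve(A, b)            # backward stable LU-based solve
K = np.linalg.cond(A, 2)             # (local) condition number of A
u = 2.0**-53                         # double-precision rounding unit

rel_err = np.linalg.norm(x - x_true) / np.linalg.norm(x_true)
expected_digits = -np.log10(K * u)   # heuristic: roughly this many true digits
print(f"K = {K:.2e}, predicted digits ~ {expected_digits:.1f}, "
      f"actual relative error = {rel_err:.2e}")
```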

Second, estimating the conditioning of the problem may also be difficult. Various aspects of the conditioning of computational problems are considered in [13, 21, 22, 34, 80]. It may be hard to compute L(A, η) and, in addition, it is not clear how to determine η a priori. That is why it is often assumed that

$$\displaystyle{L \approx L(\mathbf{A},0) =\|\varPhi '(\mathbf{A})\|,}$$

where Φ′(A) is the Fréchet derivative of Φ computed at the data A. This results in a local perturbation analysis when the norm of the perturbation in the solution is assumed to be bounded by a linear function of the perturbation in the data. Unfortunately this assumption may be severely violated as the next simple example shows.

Consider the linear scalar equation

$$\displaystyle{\mathit{AX} = 1.}$$

For A = 1 the solution is X = 1. Let the data A = 1 be perturbed to 1 + E, where E > −1. Then the solution X = 1 is perturbed to 1 + Y, where

$$\displaystyle{Y = \frac{-E} {1 + E}.}$$

For any η ∈ (0, 1) the Lipschitz constant is

$$\displaystyle{L(1,\eta ) = \frac{1} {1-\eta }}$$

and the correct perturbation bound (7.1) is

$$\displaystyle{\vert Y \vert \leq L(1,\eta )\vert E\vert,\ \vert E\vert \leq \eta.}$$

If we use L(1, 0) = 1 instead of L(1, η), then the local analysis gives the approximate estimate | Y | ≤ | E | (with no restrictions on E), while at the same time \(\vert Y \vert \rightarrow \infty \) as E → −1! Moreover, the local bound “works” even for \(E = -1\), when there is no solution at all. Of course, we know that E should be small, but in a real problem we do not know what “small” means.
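
The failure of the local bound in this example can be observed directly in a few lines of Python (an illustrative sketch; the sample values of E are chosen arbitrarily):

```python
# Local vs. nonlocal bounds for the scalar equation A*X = 1 at A = 1.
eta = 0.9
L_nonlocal = 1.0 / (1.0 - eta)        # Lipschitz constant L(1, eta)

for E in (0.5, -0.5, -0.8, -0.89):
    Y = -E / (1.0 + E)                # exact perturbation of the solution X = 1
    assert abs(Y) <= L_nonlocal * abs(E)   # the nonlocal bound always holds
    if abs(Y) > abs(E):
        # the local bound |Y| <= |E| fails for E close to -1
        print(f"E = {E:+.2f}: |Y| = {abs(Y):.3f} exceeds local bound |E| = {abs(E):.3f}")
```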

The drawbacks of the local analysis may be overcome by the techniques of nonlocal perturbation analysis. In this case a quantity r > 0 and a non-decreasing function

$$\displaystyle{f: [0,r] \rightarrow \mathbb{R}_{+}}$$

are defined such that f(0) = 0 and

$$\displaystyle{\|\mathbf{Y}\| \leq f(\|\mathbf{E}\|),\ \|\mathbf{E}\| \leq r.}$$

This is the desired nonlocal (and in general nonlinear) perturbation estimate.

In many cases A is not a single matrix but a collection

$$\displaystyle{\mathbf{A} = (\mathbf{A}_{1},\mathbf{A}_{2},\ldots,\mathbf{A}_{m}) \in \mathcal{A}}$$

of m matrices \(\mathbf{A}_{k}\). If the data matrices are perturbed as \(\mathbf{A}_{k} \rightarrow \mathbf{A}_{k} + \mathbf{E}_{k}\) with

$$\displaystyle{\|\mathbf{E}_{k}\| \leq \delta _{k}\ (k = 1,2,\ldots,m)}$$

the problem is to estimate (locally or nonlocally) the norm \(\|\mathbf{Y}\|\) of the perturbation Y in the solution X as a function of the perturbation vector

$$\displaystyle{\delta = [\delta _{1};\delta _{2};\ldots;\delta _{m}] \in \mathbb{R}_{+}^{m}.}$$

4 Matrix Equations

4.1 Introductory Remarks

Consider the matrix equation

$$\displaystyle{ F(\mathbf{A},\mathbf{X}) = \mathbf{0}, }$$
(7.5)

where A is a collection of matrices as above, \(\mathbf{X} \in \mathcal{X}\) is the solution and the function

$$\displaystyle{F(\mathbf{A},\cdot ): \mathcal{X} \rightarrow \mathcal{X}}$$

is Fréchet differentiable (\(\mathcal{X}\) is a certain space of matrices), while the function

$$\displaystyle{F(\cdot,\mathbf{X}): \mathcal{A}\rightarrow \mathcal{X}}$$

is at least Fréchet pseudo-differentiable [38]. The latter case occurs in complex equations whose data include both a matrix \(\mathbf{A}_{k}\) and its complex conjugate \(\overline{\mathbf{A}}_{k}\) for some k. The correct treatment of this case was first given in [43], see also [38, 47] and [92].

Denoting by

$$\displaystyle{\mathbf{E} = (\mathbf{E}_{1},\mathbf{E}_{2},\ldots,\mathbf{E}_{m}) \in \mathcal{A}}$$

the collection of perturbations, the perturbed equation is written as

$$\displaystyle{ F(\mathbf{A} + \mathbf{E},\mathbf{X} + \mathbf{Y}) = \mathbf{0}. }$$
(7.6)

If the Fréchet derivative \(F_{\mathbf{X}}(\mathbf{A},\mathbf{X})\) is invertible we may rewrite (7.6) as an equivalent operator equation

$$\displaystyle{ \mathbf{Y} =\varPi (\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y}), }$$
(7.7)

where

$$\displaystyle\begin{array}{rcl} \varPi (\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y})&:=& -F_{\mathbf{X}}^{-1}(\mathbf{A},\mathbf{X})(F_{\mathbf{ A}}(\mathbf{A},\mathbf{X})(\mathbf{E}) + G(\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y})), {}\\ G(\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y})&:=& F(\mathbf{A} + \mathbf{E},\mathbf{X} + \mathbf{Y}) - F(\mathbf{A},\mathbf{X}) - F_{\mathbf{A}}(\mathbf{A},\mathbf{X})(\mathbf{E}) - F_{\mathbf{X}}(\mathbf{A},\mathbf{X})(\mathbf{Y}){}\\ \end{array}$$

(here we use that F(A, X) = 0 for the particular solution X).

If for example

$$\displaystyle{F(\mathbf{A},\mathbf{X}):= \mathbf{A}_{1}+\mathbf{A}_{2}\mathbf{X}\mathbf{A}_{3}+\mathbf{A}_{4}\mathbf{X}\mathbf{A}_{5}\mathbf{X}\mathbf{A}_{6}+\mathbf{A}_{7}\mathbf{X}\mathbf{A}_{8}\mathbf{X}\mathbf{A}_{9}\mathbf{X}\mathbf{A}_{10}+\mathbf{A}_{11}\mathbf{X}^{-1}\mathbf{A}_{ 12}}$$

then

$$\displaystyle\begin{array}{rcl} F_{\mathbf{X}}(\mathbf{A},\mathbf{X})(\mathbf{Y})& =& \mathbf{A}_{2}\mathbf{Y}\mathbf{A}_{3} + \mathbf{A}_{4}\mathbf{X}\mathbf{A}_{5}\mathbf{Y}\mathbf{A}_{6} + \mathbf{A}_{4}\mathbf{Y}\mathbf{A}_{5}\mathbf{X}\mathbf{A}_{6} +\ \mathbf{A}_{7}\mathbf{X}\mathbf{A}_{8}\mathbf{X}\mathbf{A}_{9}\mathbf{Y}\mathbf{A}_{10} {}\\ & & +\ \mathbf{A}_{7}\mathbf{X}\mathbf{A}_{8}\mathbf{Y}\mathbf{A}_{9}\mathbf{X}\mathbf{A}_{10} + \mathbf{A}_{7}\mathbf{Y}\mathbf{A}_{8}\mathbf{X}\mathbf{A}_{9}\mathbf{X}\mathbf{A}_{10} -\mathbf{A}_{11}\mathbf{X}^{-1}\mathbf{Y}\mathbf{X}^{-1}\mathbf{A}_{ 12}.{}\\ \end{array}$$
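
A Fréchet derivative of this kind is conveniently checked against a finite-difference approximation. The following NumPy sketch (with random data and an arbitrarily chosen size, purely illustrative) verifies the formula above for \(F_{\mathbf{X}}(\mathbf{A},\mathbf{X})(\mathbf{Y})\):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = [rng.standard_normal((n, n)) for _ in range(12)]   # plays A_1, ..., A_12
X = rng.standard_normal((n, n)) + 6 * np.eye(n)        # shifted so that X is invertible

def F(X):
    A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12 = A
    Xi = np.linalg.inv(X)
    return (A1 + A2 @ X @ A3 + A4 @ X @ A5 @ X @ A6
            + A7 @ X @ A8 @ X @ A9 @ X @ A10 + A11 @ Xi @ A12)

def FX(X, Y):
    # the Fréchet derivative formula above, applied to the direction Y
    A1, A2, A3, A4, A5, A6, A7, A8, A9, A10, A11, A12 = A
    Xi = np.linalg.inv(X)
    return (A2 @ Y @ A3 + A4 @ X @ A5 @ Y @ A6 + A4 @ Y @ A5 @ X @ A6
            + A7 @ X @ A8 @ X @ A9 @ Y @ A10 + A7 @ X @ A8 @ Y @ A9 @ X @ A10
            + A7 @ Y @ A8 @ X @ A9 @ X @ A10 - A11 @ Xi @ Y @ Xi @ A12)

Y = rng.standard_normal((n, n))
t = 1e-5
fd = (F(X + t * Y) - F(X - t * Y)) / (2 * t)           # central finite difference
err = np.linalg.norm(fd - FX(X, Y)) / np.linalg.norm(FX(X, Y))
assert err < 1e-6
```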

The perturbation analysis of algebraic matrix equations is the subject of numerous studies, e.g. [19, 20, 26, 33, 37, 49, 50, 79, 81] and [2, 3, 35, 38, 51, 92, 93]; see also the bibliography in the monograph [38]. A general framework for such an analysis is given in [38, 41]. Perturbation analysis of general coupled matrix quadratic equations is given in [36]. Such an analysis for the \(H_{\infty }\) problem, involving two Riccati equations and other relations, is done in [59], while perturbation analysis of differential matrix quadratic equations is presented in [33, 42].

4.2 Local Estimates

Neglecting second order terms in δ in (7.7) it is possible to derive an expression

$$\displaystyle{\mathbf{y} \approx \mathbf{z}:=\sum _{ k=1}^{m}\mathbf{L}_{ k}\mathbf{e}_{k},}$$

where

$$\displaystyle{\mathbf{y}:= \mathrm{vec}(\mathbf{Y}),\ \mathbf{e}_{k}:= \mathrm{vec}(\mathbf{E}_{k})}$$

and \(\mathbf{L}_{k}\) are certain matrices. Since

$$\displaystyle{\|\mathbf{Y}\| =\| \mathbf{y}\| \approx \|\mathbf{z}\|}$$

the problem is to find a tight bound on \(\|\mathbf{z}\|\) as a function of δ. Such an improved norm-wise estimate is given in [38, 55]

$$\displaystyle{ \|\mathbf{z}\| \leq \mathrm{est}(\mathbf{L};\delta ):=\min \left \{\|\mathbf{L}\|_{2}\|\delta \|,\sqrt{\delta ^{\top } \varLambda (\mathbf{L} )\delta }\right \}, }$$
(7.8)

where \(\mathbf{L}:= [\mathbf{L}_{1},\mathbf{L}_{2},\ldots,\mathbf{L}_{m}]\) and \(\varLambda =\varLambda (\mathbf{L}) \in \mathbb{R}_{+}^{m\times m}\) is a matrix with elements

$$\displaystyle{\lambda _{k,l}:= \left \|\mathbf{L}_{k}^{\mathrm{H}}\mathbf{L}_{ l}\right \|_{2}\ (k,l = 1,2,\ldots,m).}$$

Note that est(L; ⋅ ) is a continuous function \(\mathbb{R}_{+}^{m} \rightarrow \mathbb{R}_{+}\) which in general fails to be differentiable.

The estimate (7.8) is in general better than the linear estimate

$$\displaystyle{\|\mathbf{z}\| \leq \sum _{k=1}^{m}\|\mathbf{L}_{ k}\|_{2}\delta _{k}}$$

based on the individual absolute condition numbers \(\|\mathbf{L}_{k}\|_{2}\).
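
A direct implementation of the estimate (7.8) is straightforward. The NumPy sketch below (with arbitrary random data, an illustration rather than library code) computes est(L; δ) and confirms that it never exceeds the linear estimate based on the individual condition numbers:

```python
import numpy as np

def est(Ls, delta):
    """Improved norm-wise bound (7.8): the minimum of ||[L_1,...,L_m]||_2 ||delta||
    and sqrt(delta^T Lambda delta), where Lambda[k,l] = ||L_k^H L_l||_2."""
    m = len(Ls)
    L = np.hstack(Ls)
    Lam = np.array([[np.linalg.norm(Ls[k].conj().T @ Ls[l], 2)
                     for l in range(m)] for k in range(m)])
    delta = np.asarray(delta, dtype=float)
    return min(np.linalg.norm(L, 2) * np.linalg.norm(delta),
               np.sqrt(delta @ Lam @ delta))

rng = np.random.default_rng(3)
Ls = [rng.standard_normal((6, 4)) for _ in range(3)]
delta = [1e-3, 5e-4, 2e-3]

linear = sum(np.linalg.norm(Lk, 2) * dk for Lk, dk in zip(Ls, delta))
assert est(Ls, delta) <= linear * (1 + 1e-12)   # (7.8) is never worse than the linear bound
```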

Componentwise local estimates for various classes of matrix equations are also known, see [38] and the bibliography therein.

4.3 Nonlocal Estimates

Nonlocal perturbation estimates for matrix equations may be derived by the technique of Lyapunov majorants [24, 30, 38, 63] and using fixed point principles.

The exact Lyapunov majorant for Eq. (7.7) is

$$\displaystyle{l(\delta,\rho ):=\max \left \{\|\varPi (\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y})\|:\| \mathbf{E}_{k}\| \leq \delta _{k},\ \|\mathbf{Y}\| \leq \rho \right \}}$$

(the dependence of l on A and X is not marked since the latter two quantities are considered fixed in the framework of the perturbation problem).

The exact Lyapunov majorant for nonlinear algebraic matrix equations is nonlinear and strictly convex in ρ. However, with rare exceptions, exact Lyapunov majorants cannot be constructed explicitly. That is why we use Lyapunov majorants

$$\displaystyle{h(\delta,\rho ) \geq l(\delta,\rho )}$$

which are not exact but can be constructed in an explicit form.

The technique of Lyapunov majorants uses the majorant equation

$$\displaystyle{ \rho = h(\delta,\rho ) }$$
(7.9)

for determining ρ as a function of δ. A complete analysis of the different types of majorant equations is presented in [38, 47].

Let \(\varDelta \subset \mathbb{R}_{+}^{m}\) be the set of all \(\delta \in \mathbb{R}_{+}^{m}\) such that Eq. (7.9) has a nonnegative solution. The following facts are established in [38] under certain general assumptions (see also [24, 63]).

  • The interior Δ o of Δ is nonempty.

  • On a part of the boundary of Δ the majorant equation has a double solution.

  • For δ ∈ Δ o the majorant equation has two positive solutions f(δ), g(δ) such that f(δ) < g(δ).

Moreover, the function f is increasing in its arguments and f(0) = 0. This function is referred to as the small solution of the majorant equation.

Since the operator Π(A, X, E, ⋅ ) maps the central ball of radius f(δ) into itself, according to the Schauder fixed point principle Eq. (7.7) has a solution Y such that

$$\displaystyle{ \|\mathbf{Y}\| \leq f(\delta ),\ \delta \in \varDelta. }$$
(7.10)

This is the desired nonlocal nonlinear perturbation estimate.

Explicit expressions for f(δ) can be derived for polynomial and fractional-affine matrix equations [38]. If for example the matrix equation is quadratic then

$$\displaystyle{ h(\delta,\rho ) = a_{0}(\delta ) + a_{1}(\delta )\rho + a_{2}(\delta )\rho ^{2}, }$$
(7.11)

where \(a_{0}(\delta )\) and \(a_{1}(\delta )\) are expressions of type \(\mathrm{est}(\mathbf{L};\delta )\). In particular \(a_{0}(0) = a_{1}(0) = 0\) and hence

$$\displaystyle{ \|\mathbf{Y}\| \leq f(\delta ):= \frac{2a_{0}(\delta )} {1 - a_{1}(\delta ) + \sqrt{(1 - a_{1 } (\delta ))^{2 } - 4a_{0 } (\delta )a_{2 } (\delta )}} }$$
(7.12)

for

$$\displaystyle{\delta \in \varDelta:= \left \{\delta \in \mathbb{R}_{+}^{m}: a_{ 1}(\delta ) + 2\sqrt{a_{0 } (\delta )a_{2 } (\delta )} \leq 1\right \}.}$$

Relations (7.11), (7.12) constitute the nonlocal estimate in this case.
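
For a quadratic majorant the small solution (7.12) and the domain Δ are easy to evaluate. The following sketch (with arbitrarily chosen coefficient values \(a_{0}\), \(a_{1}\), \(a_{2}\), not taken from any concrete equation) computes f(δ) and checks that it is indeed the smaller fixed point of the majorant equation:

```python
import numpy as np

def small_solution(a0, a1, a2):
    """Small root f of rho = a0 + a1*rho + a2*rho^2 via (7.12);
    returns None when delta lies outside the domain Delta."""
    if a1 + 2.0 * np.sqrt(a0 * a2) > 1.0:
        return None
    disc = (1.0 - a1)**2 - 4.0 * a0 * a2
    return 2.0 * a0 / (1.0 - a1 + np.sqrt(disc))

a0, a1, a2 = 0.01, 0.1, 2.0
f = small_solution(a0, a1, a2)
assert f is not None
# f is a fixed point of the majorant equation ...
assert abs(f - (a0 + a1 * f + a2 * f * f)) < 1e-12
# ... and it is the smaller of the two positive roots:
g = max(np.roots([a2, a1 - 1.0, a0]).real)
assert f <= g + 1e-12
```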

We note finally that local, nonlocal, norm-wise and componentwise perturbation estimates for linear, polynomial and fractional-affine algebraic matrix equations are given in the monograph [38]. Such estimates are used in modern reliable computational methods and algorithms [66, 71, 75].

4.4 Linear Equations

As a first example consider the Lyapunov matrix equation

$$\displaystyle{ F(\mathbf{A},\mathbf{X}):= \mathbf{A}_{1} + \mathbf{A}_{2}\mathbf{X} + \mathbf{X}\mathbf{A}_{2}^{\mathrm{H}} = \mathbf{0}, }$$
(7.13)

where \(\mathbf{A}_{1},\mathbf{A}_{2},\mathbf{X} \in \mathbb{C}^{n\times n}\). Suppose that

$$\displaystyle{\lambda _{p}(\mathbf{A}_{2}) + \overline{\lambda }_{q}(\mathbf{A}_{2})\neq 0\ \ (p,q = 1,2,\ldots,n),}$$

where \(\lambda _{p}(\mathbf{A}_{2})\) are the eigenvalues of the matrix \(\mathbf{A}_{2}\) counted according to their algebraic multiplicities. Under this assumption the matrix

$$\displaystyle{\mathbf{A}_{0}:= \mathbf{I}_{n} \otimes \mathbf{A}_{2} + \overline{\mathbf{A}_{2}} \otimes \mathbf{I}_{n} \in \mathbb{C}^{n^{2}\times n^{2} }}$$

of the Lyapunov operator \(\mathbf{X}\mapsto \mathbf{A}_{2}\mathbf{X} + \mathbf{X}\mathbf{A}_{2}^{\mathrm{H}}\) is invertible and Eq. (7.13) has a unique solution X. Moreover, if \(\mathbf{A}_{1}^{\mathrm{H}} = \mathbf{A}_{1}\) then \(\mathbf{X}^{\mathrm{H}} = \mathbf{X}\) as well.

The perturbed Lyapunov equation is

$$\displaystyle{ F(\mathbf{A}+\mathbf{E},\mathbf{X}+\mathbf{Y}) = \mathbf{A}_{1} +\mathbf{E}_{1} +(\mathbf{A}_{2} +\mathbf{E}_{2})(\mathbf{X}+\mathbf{Y})+(\mathbf{X}+\mathbf{Y})(\mathbf{A}_{2} +\mathbf{E}_{2})^{\mathrm{H}} = \mathbf{0}, }$$
(7.14)

where the perturbations in the data are bounded as

$$\displaystyle{\|\mathbf{E}_{k}\| \leq \delta _{k}\ \ (k = 1,2).}$$

The condition

$$\displaystyle{\delta _{2} <\delta _{ 2}^{0}:= \frac{1} {2l_{1}},\ \ l_{1}:= \left \|\mathbf{A}_{0}^{-1}\right \|_{ 2}}$$

is sufficient for Eq. (7.14) to have a unique solution. At the same time, when \(\delta _{2} \geq \delta _{2}^{0}\) this equation may have no solution or may have a variety of solutions.

We have

$$\displaystyle{\mathbf{y}:= \mathrm{vec}(\mathbf{Y}) = \mathbf{z} + \mathrm{O}(\|\delta \|^{2}),\ \delta \rightarrow 0,}$$

where

$$\displaystyle{\mathbf{z}:= \mathbf{L}_{1}\mathbf{e}_{1} + \mathbf{L}_{2}\mathbf{e}_{2} + \mathbf{L}_{3}\overline{\mathbf{e}}_{2},\ \mathbf{e}_{k} = \mathrm{vec}(\mathbf{E}_{k})}$$

and

$$\displaystyle{\mathbf{L}_{1}:= -\mathbf{A}_{0}^{-1},\ \mathbf{L}_{ 2}:= \mathbf{L}_{1}(\mathbf{X}^{\top }\otimes \mathbf{I}_{ n}),\ \mathbf{L}_{3}:= \mathbf{L}_{1}(\mathbf{I}_{n} \otimes \mathbf{X})\mathcal{V}_{n}.}$$

Here \(\mathcal{V}_{n} \in \mathbb{R}^{n^{2}\times n^{2} }\) is the vec-permutation matrix such that \(\mathrm{vec}(\mathbf{Z}^{\top }) = \mathcal{V}_{n}\mathrm{vec}(\mathbf{Z})\).
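
The vec-permutation matrix admits a simple direct construction. The sketch below (a NumPy illustration) builds \(\mathcal{V}_{n}\) for the column-wise vec convention and verifies the defining relation \(\mathrm{vec}(\mathbf{Z}^{\top }) = \mathcal{V}_{n}\mathrm{vec}(\mathbf{Z})\):

```python
import numpy as np

def vec_permutation(n):
    # V_n maps vec(Z) to vec(Z^T) under column-wise vectorization:
    # vec(Z)[i + j*n] = Z[i, j], so V_n[j + i*n, i + j*n] = 1.
    V = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            V[j + i * n, i + j * n] = 1.0
    return V

vec = lambda M: M.reshape(-1, order="F")
n = 4
Z = np.random.default_rng(4).standard_normal((n, n))
Vn = vec_permutation(n)
assert np.allclose(Vn @ vec(Z), vec(Z.T))
```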

According to [38, 47] we have the local perturbation estimate

$$\displaystyle{\|\mathbf{z}\| \leq \mathrm{est}(\mathbf{M}_{1},\mathbf{M}_{2};\delta _{1},\delta _{2}),}$$

where

$$\displaystyle{\mathbf{M}_{1}:= \left [\begin{array}{rr} \mathbf{L}_{10} & -\mathbf{L}_{11} \\ \mathbf{L}_{11} & \mathbf{L}_{10} \end{array} \right ],\ \mathbf{M}_{2}:= \left [\begin{array}{cc} \mathbf{L}_{20} + \mathbf{L}_{30} & \mathbf{L}_{31} -\mathbf{L}_{21} \\ \mathbf{L}_{21} + \mathbf{L}_{31} & \mathbf{L}_{20} -\mathbf{L}_{30} \end{array} \right ]}$$

and \(\mathbf{L}_{k} = \mathbf{L}_{k0} + \imath \mathbf{L}_{k1}\), \(\imath ^{2} = -1\).

To obtain a nonlocal estimate we rewrite the equivalent operator equation for Y in a vector form for y = vec(Y) as

$$\displaystyle{\mathbf{y} =\pi (\mathbf{A},\mathbf{E},\mathbf{y}):= \mathbf{L}_{1}\mathbf{e}_{1} + \mathbf{L}_{2}\mathbf{e}_{2} + \mathbf{L}_{3}\overline{\mathbf{e}}_{2} + \mathbf{L}_{1}\mathrm{vec}(\mathbf{E}_{2}\mathbf{Y} + \mathbf{Y}\mathbf{E}_{2}^{\mathrm{H}}).}$$

Therefore the Lyapunov majorant h is defined by

$$\displaystyle{\|\pi (\mathbf{A},\mathbf{E},\mathbf{y})\| \leq h(\delta,\rho ):= \mathrm{est}(\mathbf{M}_{1},\mathbf{M}_{2};\delta _{1},\delta _{2}) + 2l_{1}\delta _{2}\rho,\ \|\mathbf{y}\| \leq \rho.}$$

Hence for δ 2 < δ 2 0 we have the nonlocal estimate

$$\displaystyle{\|\mathbf{Y}\| =\| \mathbf{y}\| \leq \frac{\mathrm{est}(\mathbf{M}_{1},\mathbf{M}_{2};\delta _{1},\delta _{2})} {1 - 2l_{1}\delta _{2}}.}$$
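
The quantities entering this bound can be computed explicitly from the Kronecker form of the Lyapunov operator. The NumPy sketch below is an illustration with random real stable data; as an assumption on my part, the simple sum \(l_{1}\delta _{1} + (l_{2} + l_{3})\delta _{2}\) is used as a coarse upper bound in place of est(M₁, M₂; δ₁, δ₂). It evaluates \(l_{1}\), the threshold \(\delta _{2}^{0}\), the nonlocal bound, and checks the bound against an actual perturbed solution:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5
vec = lambda M: M.reshape(-1, order="F")
unvec = lambda v: v.reshape((n, n), order="F")

A2 = rng.standard_normal((n, n)) - 3 * n * np.eye(n)  # shifted so the eigenvalue condition holds
A1 = rng.standard_normal((n, n)); A1 = A1 + A1.T      # Hermitian (here real symmetric) A1

In = np.eye(n)
A0 = np.kron(In, A2) + np.kron(A2, In)                # real data: conj(A2) = A2
X = unvec(np.linalg.solve(A0, -vec(A1)))
assert np.allclose(A1 + A2 @ X + X @ A2.T, 0)

L1 = -np.linalg.inv(A0)
l1 = np.linalg.norm(L1, 2)
l2 = np.linalg.norm(L1 @ np.kron(X.T, In), 2)         # ||L2||_2
l3 = np.linalg.norm(L1 @ np.kron(In, X), 2)           # ||L3||_2 (the vec-permutation factor is orthogonal)

d1 = d2 = 1e-6
assert d2 < 1.0 / (2.0 * l1)                          # delta_2 is below the threshold delta_2^0
bound = (l1 * d1 + (l2 + l3) * d2) / (1.0 - 2.0 * l1 * d2)

# Compare with an actual perturbation of Frobenius norm d1, d2.
E1 = rng.standard_normal((n, n)); E1 = d1 * E1 / np.linalg.norm(E1, "fro")
E2 = rng.standard_normal((n, n)); E2 = d2 * E2 / np.linalg.norm(E2, "fro")
A0p = np.kron(In, A2 + E2) + np.kron(A2 + E2, In)
Y = unvec(np.linalg.solve(A0p, -vec(A1 + E1))) - X
assert np.linalg.norm(Y, "fro") <= bound
```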

Norm-wise bounds of this type, as well as component-wise bounds, for more general linear matrix equations

$$\displaystyle{ \mathbf{B}_{0} +\sum _{ k=1}^{r}\mathbf{B}_{ k}\mathbf{X}\mathbf{C}_{k} = \mathbf{0} }$$
(7.15)

with \(m = 2r + 1\) matrix coefficients \(\mathbf{A} = (\mathbf{B}_{0},\mathbf{B}_{1},\mathbf{C}_{1},\ldots,\mathbf{B}_{r},\mathbf{C}_{r})\) are analyzed by the same scheme, see e.g. [38, 56]. Here the coefficients \(\mathbf{B}_{k}\), \(\mathbf{C}_{k}\) need not be independent, and relations such as \(\mathbf{B}_{k} = \mathbf{C}_{k}^{\mathrm{H}}\) are possible for some indices k.

4.5 Quadratic Equations

Consider the matrix quadratic equation

$$\displaystyle{ F(\mathbf{A},\mathbf{X}):= \mathbf{A}_{1} + \mathbf{A}_{2}\mathbf{X} + \mathbf{X}\mathbf{A}_{3} + \mathbf{X}\mathbf{A}_{4}\mathbf{X} = \mathbf{0}, }$$
(7.16)

where the coefficients \(\mathbf{A}_{k}\) and the solution X are matrices of appropriate size (we shall use the same notation \(\mathbf{A}_{k}\), \(\mathbf{L}_{k}\), \(a_{k}\), h, etc. for different quantities in each subsection of this chapter). An important particular case of a quadratic equation is the Riccati equation

$$\displaystyle{\mathbf{A}_{1} + \mathbf{A}_{2}\mathbf{X} + \mathbf{X}\mathbf{A}_{2}^{\top }-\mathbf{X}\mathbf{A}_{ 3}\mathbf{X} = \mathbf{0};\ \ \mathbf{A}_{1}^{\top } = \mathbf{A}_{ 1},\ \mathbf{A}_{3}^{\top } = \mathbf{A}_{ 3},}$$

arising in the theory of control and filtering of linear continuous time-invariant systems.

Let the matrices \(\mathbf{A}_{k}\) in (7.16) be perturbed to \(\mathbf{A}_{k} + \mathbf{E}_{k}\) with \(\|\mathbf{E}_{k}\| \leq \delta _{k}\) (k = 1, 2, 3, 4), and let X + Y be the solution to the perturbed equation

$$\displaystyle{F(\mathbf{A} + \mathbf{E},\mathbf{X} + \mathbf{Y}) = 0.}$$

Suppose that the matrix

$$\displaystyle{\mathbf{A}_{0}:= \mathbf{I} \otimes (\mathbf{A}_{2} + \mathbf{X}\mathbf{A}_{4}) + (\mathbf{A}_{3} + \mathbf{A}_{4}\mathbf{X})^{\top }\otimes \mathbf{I}}$$

of the linear matrix operator \(\mathbf{Y}\mapsto (\mathbf{A}_{2} + \mathbf{X}\mathbf{A}_{4})\mathbf{Y} + \mathbf{Y}(\mathbf{A}_{3} + \mathbf{A}_{4}\mathbf{X})\) is invertible and denote

$$\displaystyle{\mathbf{L}_{1}:= -\mathbf{A}_{0}^{-1},\ \mathbf{L}_{ 2}:= \mathbf{L}_{1}(\mathbf{X}^{\top }\otimes \mathbf{I}),\ \mathbf{L}_{ 3}:= \mathbf{L}_{1}(\mathbf{I} \otimes \mathbf{X}),\ \mathbf{L}_{4}:= \mathbf{L}_{1}(\mathbf{X}^{\top }\otimes \mathbf{X}).}$$

Then we have the local bound

$$\displaystyle{\|\mathbf{Y}\| = a_{0}(\delta ) + \mathrm{O}(\|\delta \|^{2}),\ \delta \rightarrow 0,}$$

and the nonlocal bound (7.12), where (see [47])

$$\displaystyle\begin{array}{rcl} a_{0}(\delta )&:=& \mathrm{est}(\mathbf{L}_{1},\mathbf{L}_{2},\mathbf{L}_{3},\mathbf{L}_{4};\delta _{1},\delta _{2},\delta _{3},\delta _{4}), {}\\ a_{1}(\delta )&:=& \mathrm{est}(\mathbf{L}_{1},\mathbf{L}_{2},\mathbf{L}_{3};\delta _{2} +\delta _{3},\delta _{4},\delta _{4}), {}\\ a_{2}(\delta )&:=& l_{1}(\|\mathbf{A}_{4}\|_{2} +\delta _{4}),\ l_{1}:=\| \mathbf{L}_{1}\|_{2}. {}\\ \end{array}$$

Matrix quadratic equations involving more general expressions of the form \(\mathbf{B}_{k}\mathbf{X}\mathbf{C}_{k}\mathbf{X}\mathbf{D}_{k}\) with matrix coefficients \(\mathbf{B}_{k}\), \(\mathbf{C}_{k}\), \(\mathbf{D}_{k}\) are analyzed similarly [50].

4.6 Polynomial Equations of Degree d > 2

Matrix polynomial equations of degree d > 2 in X and with m matrix coefficients give rise to Lyapunov majorants

$$\displaystyle{h(\delta,\rho ):=\sum _{ k=0}^{d}a_{ k}(\delta )\rho ^{k},\ \delta = [\delta _{ 1};\delta _{2};\ldots;\delta _{m}] \in \mathbb{R}_{+}^{m}.}$$

Here the \(a_{k}\) are continuous nonnegative non-decreasing functions of δ, of type est or polynomials in δ, satisfying

$$\displaystyle{a_{0}(0) = 0,\ \ a_{1}(0) < 1,\ \ a_{d}(0) > 0}$$

(usually we even have \(a_{1}(0) = 0\)). An example of a simple third degree matrix equation is

$$\displaystyle{ \mathbf{A}_{1} + \mathbf{A}_{2}\mathbf{X} + \mathbf{A}_{3}\mathbf{X}^{2} + \mathbf{A}_{ 4}\mathbf{X}^{3} = 0. }$$
(7.17)

For δ sufficiently small (in particular we must at least guarantee that \(a_{1}(\delta ) < 1\)) the majorant equation (ME)

$$\displaystyle{\rho = h(\delta,\rho )}$$

in ρ has a small positive solution ρ = f(δ) vanishing together with δ and such that the Frobenius norm \(\|\mathbf{Y}\|\) of the perturbation Y in the solution X satisfies

$$\displaystyle{\|\mathbf{Y}\| \leq f(\delta ),\ \delta \in \varDelta \subset \mathbb{R}_{+}^{m}.}$$

Here Δ is the domain of all δ for which the majorant equation has nonnegative roots.

The boundary ∂ Δ of Δ is defined by the pair of equations

$$\displaystyle{ \rho = h(\delta,\rho ),\ \ \frac{\partial h(\delta,\rho )} {\partial \rho } = 1 }$$
(7.18)

and the inequalities \(\delta _{k} \geq 0\) (\(k = 1,2,\ldots,m\)). Hence for δ ∈ ∂ Δ either the discriminant of the algebraic equation

$$\displaystyle{a_{0}(\delta ) - (1 - a_{1}(\delta ))\rho +\sum _{ k=2}^{d}a_{ k}(\delta )\rho ^{k} = 0}$$

in ρ is zero or δ k  = 0 for some k.

In general there is no convenient explicit expression for f(δ) when d > 2. Therefore the problem is to find a tight easily computable upper bound \(\hat{f}(\delta )\) for f(δ). For this purpose the dth degree Lyapunov majorant h(δ, ρ) is replaced by a second degree Lyapunov majorant

$$\displaystyle{\hat{h}(\delta,\rho ):= a_{0}(\delta ) + a_{1}(\delta )\rho +\hat{ a}_{2}(\delta )\rho ^{2} \geq h(\delta,\rho ),\ \rho \in [0,\tau (\delta )].}$$

Here the positive quantity τ(δ) satisfies h(δ, τ(δ)) ≤ τ(δ).

Denoting by \(\hat{f}(\delta )\) the small solution of the new ME

$$\displaystyle{ \rho =\hat{ h}(\delta,\rho ) }$$
(7.19)

we get the perturbation estimate

$$\displaystyle{ \|\mathbf{Y}\| \leq \hat{ f}(\delta ):= \frac{2a_{0}(\delta )} {1 - a_{1}(\delta ) + \sqrt{(1 - a_{1 } (\delta ))^{2 } - 4a_{0 } (\delta )\hat{a}_{2 } (\delta )}} }$$
(7.20)

provided

$$\displaystyle{a_{1}(\delta ) + 2\sqrt{a_{0 } (\delta )\hat{a}_{2 } (\delta )} \leq 1\ \ \mathrm{and}\ \ h(\delta,\tau (\delta )) \leq \tau (\delta ).}$$

We note that both f(δ) and \(\hat{f}(\delta )\) have asymptotic order

$$\displaystyle{ \frac{a_{0}(\delta )} {1 - a_{1}(0)} + \mathrm{O}(\|\delta \|^{2}),\ \delta \rightarrow 0.}$$

To find \(\hat{h}(\delta,\rho )\) and τ(δ) we proceed as follows. For any τ > 0 and ρ ≤ τ we have

$$\displaystyle{h(\delta,\rho ) \leq a_{0}(\delta ) + a_{1}(\delta )\rho +\beta (\delta,\tau )\rho ^{2},}$$

where

$$\displaystyle{ \beta (\delta,\tau ):= a_{2}(\delta ) +\sum _{ k=2}^{d-1}a_{ k+1}(\delta )\tau ^{k-1}. }$$
(7.21)

Let now τ(δ) be a positive nondecreasing expression in δ and ρ ≤ τ(δ). Then we may find an upper bound \(\hat{a}_{2}(\delta )\) for β(δ, τ(δ)) and use it in the estimate (7.20). Choosing different expressions for τ(δ) we obtain different upper bounds \(\hat{a}_{2}(\delta )\) for β(δ, τ(δ)) and different Lyapunov majorants \(\hat{h}(\delta,\rho )\). As a result we get different estimates \(\|\mathbf{Y}\| \leq \hat{ f}(\delta )\).

It must be stressed that if the ME (7.19) has positive roots then its small root \(\hat{f}(\delta )\) does not exceed the value of ρ for which the second equation

$$\displaystyle{ 1 =\omega (\delta,\rho ):=\sum _{ k=0}^{d-1}(k + 1)a_{ k+1}(\delta )\rho ^{k} }$$
(7.22)

in (7.18) is fulfilled. For sufficiently small δ we have

$$\displaystyle{\omega (\delta,0) = a_{1}(\delta ) < 1\ \ \mathrm{and}\ \ \omega (\delta,r) > 1,\ \ r:= (\mathit{da}_{d}(\delta ))^{1/(1-d)}.}$$

Hence there is a unique positive solution ρ = τ(δ) of Eq. (7.22). This solution may exist even when the ME (7.19) has no positive solutions. But if Eq. (7.19) has positive solutions \(\hat{f}(\delta ) \leq \hat{ g}(\delta )\) then

$$\displaystyle{f(\delta ) \leq \tau (\delta ) \leq g(\delta )\ \mathrm{and}\ h(\delta,\tau (\delta )) \leq \tau (\delta )}$$

by necessity. Using this approach we distinguish two cases.

The case \(\boldsymbol{d = 3}\). Here τ(δ) may be computed directly from (7.22) which in this case is a quadratic equation

$$\displaystyle{3a_{3}(\delta )\tau ^{2} + 2a_{ 2}(\delta )\tau - (1 - a_{1}(\delta )) = 0.}$$

We have

$$\displaystyle{ \tau (\delta ) = \frac{1 - a_{1}(\delta )} {a_{2}(\delta ) + \sqrt{a_{2 }^{2 }(\delta ) + 3a_{3 } (\delta )(1 - a_{1 } (\delta ))}}. }$$
(7.23)

For ρ ≤ τ(δ) we have

$$\displaystyle\begin{array}{rcl} h(\delta,\rho ) \leq \hat{ h}(\delta,\rho ):= a_{0}(\delta ) + a_{1}(\delta )\rho +\hat{ a}_{2}(\delta )\rho ^{2},\ \hat{a}_{ 2}(\delta ):= a_{2}(\delta ) + a_{3}(\delta )\tau (\delta ).& &{}\end{array}$$
(7.24)

Hence \(\hat{h}(\delta,\rho )\) is a new quadratic Lyapunov majorant. As a result we get the nonlocal perturbation estimate (7.20) with \(\hat{a}_{2}(\delta )\) defined in (7.24).
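
Formula (7.23) and the resulting coefficient \(\hat{a}_{2}(\delta )\) in (7.24) can be checked directly. The sketch below (with arbitrarily chosen coefficient values, purely illustrative) verifies that τ(δ) solves the quadratic equation above:

```python
import numpy as np

def tau_d3(a1, a2, a3):
    """Unique positive root of 3*a3*t^2 + 2*a2*t - (1 - a1) = 0, as in (7.23)."""
    return (1.0 - a1) / (a2 + np.sqrt(a2 * a2 + 3.0 * a3 * (1.0 - a1)))

a1, a2, a3 = 0.05, 0.4, 1.5
t = tau_d3(a1, a2, a3)
assert abs(3 * a3 * t * t + 2 * a2 * t - (1 - a1)) < 1e-12

a2_hat = a2 + a3 * t        # quadratic-majorant coefficient from (7.24)
assert a2_hat >= a2
```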

Consider again the matrix cubic equation (7.17). Suppose that the matrix

$$\displaystyle{\mathbf{A}_{0}:= \mathbf{I} \otimes (\mathbf{A}_{2} + \mathbf{A}_{3}\mathbf{X} + \mathbf{A}_{4}\mathbf{X}^{2}) + \mathbf{X}^{\top }\otimes \mathbf{A}_{ 3} + (\mathbf{X}^{\top }\otimes \mathbf{A}_{ 4})(\mathbf{I} \otimes \mathbf{X} + \mathbf{X}^{\top }\otimes \mathbf{I})}$$

of the linear operator

$$\displaystyle{\mathbf{Y}\mapsto (\mathbf{A}_{2} + \mathbf{A}_{3}\mathbf{X} + \mathbf{A}_{4}\mathbf{X}^{2})\mathbf{Y} + \mathbf{A}_{ 3}\mathbf{Y}\mathbf{X} + \mathbf{A}_{4}(\mathbf{X}\mathbf{Y} + \mathbf{Y}\mathbf{X})\mathbf{X}}$$

is invertible. Denote

$$\displaystyle{\mathbf{L}_{1}:= -\mathbf{A}_{0}^{-1},\ \mathbf{L}_{ k}:= \mathbf{L}_{1}\left ((\mathbf{X}^{k-1})^{\top }\otimes \mathbf{I}\right ),\ l_{ k}:=\| \mathbf{L}_{k}\|_{2}\ (k = 1,2,3,4),\ x:=\| \mathbf{X}\|_{2}.}$$

Then the Lyapunov majorant for Eq. (7.17) is

$$\displaystyle{a_{0}(\delta ) + a_{1}(\delta )\rho + a_{2}(\delta )\rho ^{2} + a_{ 3}(\delta )\rho ^{3},}$$

where

$$\displaystyle\begin{array}{rcl} a_{0}(\delta )&:=& \mathrm{est}(\mathbf{L}_{1},\mathbf{L}_{2},\mathbf{L}_{3},\mathbf{L}_{4};\delta _{1},\delta _{2},\delta _{3},\delta _{4}), {}\\ a_{1}(\delta )&:=& l_{1}\delta _{2} + (l_{1}x + l_{2})\delta _{3} + (l_{1}x^{2} + l_{ 2}x + l_{3})\delta _{4}, {}\\ a_{2}(\delta )&:=& \|\mathbf{L}_{1}(\mathbf{I} \otimes \mathbf{A}_{3})\|_{2} + (1 + x)\left \|\mathbf{L}_{1}\left (\mathbf{X}^{\top }\otimes \mathbf{A}_{ 4}\right )\right \|_{2} +\| \mathbf{L}_{1}(\mathbf{I} \otimes (\mathbf{A}_{4}\mathbf{X}))\|_{2} {}\\ & & +\ l_{1}\delta _{3} + (l_{2} + 2l_{1}x)\delta _{4}, {}\\ a_{3}(\delta )&:=& \|\mathbf{L}_{1}(\mathbf{I} \otimes \mathbf{A}_{4})\|_{2} + l_{1}\delta _{4}. {}\\ \end{array}$$

Now we may apply the estimate (7.20) in view of (7.24).

The case \(\boldsymbol{d > 3}\). Here the estimation of the quantity τ(δ) is more subtle. We shall work with certain easily computable quantities

$$\displaystyle{\gamma _{k+1}(\delta ) \geq a_{k+1}(\delta )\tau ^{k-1}(\delta )}$$

in (7.21), see [38].

Consider again Eq. (7.22) in ρ for a given sufficiently small δ, which guarantees that (7.22) has a (unique) root ρ = τ(δ). For \(k = 2,3,\ldots,d - 1\) we have

$$\displaystyle{(k + 1)a_{k+1}(\delta )\tau ^{k}(\delta ) \leq 1 - a_{ 1}(\delta )}$$

and

$$\displaystyle{\tau (\delta ) \leq \left ( \frac{1 - a_{1}(\delta )} {(k + 1)a_{k+1}(\delta )}\right )^{1/k},\ a_{ k+1}(\delta ) > 0.}$$

Hence

$$\displaystyle{a_{k+1}(\delta )\tau ^{k-1}(\delta ) \leq \gamma _{ k+1}(\delta ):= a_{k+1}^{1/k}(\delta )\left (\frac{1 - a_{1}(\delta )} {k + 1} \right )^{1-1/k}}$$

and

$$\displaystyle{ \beta (\delta,\tau (\delta )) \leq \hat{ a}_{2}(\delta ):= a_{2}(\delta ) +\sum _{ k=2}^{d-1}\gamma _{ k+1}(\delta ). }$$
(7.25)

Hence we get again the nonlocal perturbation estimate (7.20) with \(\hat{a}_{2}(\delta )\) defined in (7.25).
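The quantities \(\gamma _{k+1}(\delta )\) are indeed easily computable. A small Python sketch (with hypothetical coefficient values) checks that the bound is attained with equality when τ(δ) equals its admissible upper bound from the previous inequality.

```python
def gamma(k, a1, ak1):
    # gamma_{k+1}(delta): upper bound for a_{k+1}(delta) * tau^{k-1}(delta),
    # valid for 2 <= k <= d - 1, a_{k+1} > 0 and a_1 < 1
    return ak1 ** (1.0 / k) * ((1.0 - a1) / (k + 1)) ** (1.0 - 1.0 / k)

# Hypothetical sample values: a1 = a_1(delta), ak1 = a_{k+1}(delta)
a1, ak1, k = 0.2, 0.3, 3
# Admissible upper bound for tau(delta) from (k+1) a_{k+1} tau^k <= 1 - a_1
tau_max = ((1.0 - a1) / ((k + 1) * ak1)) ** (1.0 / k)
```

Substituting tau_max into \(a_{k+1}\tau ^{k-1}\) reproduces \(\gamma _{k+1}\) exactly, which is how the closed-form bound above is obtained.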

4.7 Fractional-Affine Equations

Fractional-affine matrix equations (FAME) involve inversions of affine expressions in X such as the left-hand side of Eq. (7.15). A famous example of a FAME is the equation

$$\displaystyle{\mathbf{A}_{1} -\mathbf{X} + \mathbf{A}_{2}^{\mathrm{H}}(\mathbf{I} + \mathbf{A}_{ 3}\mathbf{X})^{-1}\mathbf{A}_{ 2} = 0}$$

arising in linear-quadratic optimization and filtering of discrete-time dynamic systems [66, 75]. Here the matrices \(\mathbf{A}_{1} = \mathbf{A}_{1}^{\mathrm{H}}\) and \(\mathbf{A}_{3} = \mathbf{A}_{3}^{\mathrm{H}}\) are nonnegative definite, the pair \([\mathbf{A}_{2},\mathbf{A}_{3})\) is stabilizable and the pair \((\mathbf{A}_{1},\mathbf{A}_{2}]\) is detectable (we denote matrix pairs so that the state matrix \(\mathbf{A}_{2}\) is near the square bracket).

The Lyapunov majorant h(δ, ρ) for such equations may be chosen as a rational function of ρ with coefficients depending on the perturbation vector δ (see [47] for more details)

$$\displaystyle{ h(\delta,\rho ):= b_{0}(\delta ) + b_{1}(\delta )\rho + \frac{b_{2}(\delta ) + b_{3}(\delta )\rho + b_{4}(\delta )\rho ^{2}} {b_{5}(\delta ) - b_{6}(\delta )\rho }. }$$
(7.26)

Here the following conditions are fulfilled (some of them for δ sufficiently small).

  1. 1.

    The functions \(b_{1},b_{2},\ldots,b_{6}\) are nonnegative and continuous in δ.

  2. 2.

    The functions \(b_{k}\) are nondecreasing for k ≠ 5, while the function \(b_{5}\) is positive and non-increasing in δ.

  3. 3.

    The relations

    $$\displaystyle{b_{0}(0) = b_{2}(0) = 0,\ b_{1}(0) < 1,\ b_{5}(0) > 0,\ b_{1}(0) + \frac{b_{3}(0)} {b_{5}(0)} < 1}$$

    take place.

Denote

$$\displaystyle\begin{array}{rcl} c_{0}(\delta )&:=& b_{2}(\delta ) + b_{0}(\delta )b_{5}(\delta ), {}\\ c_{1}(\delta )&:=& b_{5}(\delta )(1 - b_{1}(\delta )) + b_{0}(\delta )b_{6}(\delta ) - b_{3}(\delta ), {}\\ c_{2}(\delta )&:=& b_{4}(\delta ) + b_{6}(\delta )(1 - b_{1}(\delta )). {}\\ \end{array}$$

Then the majorant equation ρ = h(δ, ρ) takes the form

$$\displaystyle{c_{2}(\delta )\rho ^{2} - c_{ 1}(\delta )\rho + c_{0}(\delta ) = 0.}$$

On the other hand we have \(c_{0}(0) = 0\) and \(c_{1}(0) > 0\). Hence for small δ it is fulfilled that \(c_{1}(\delta ) > 0\) and \(c_{1}^{2}(\delta ) > 4c_{0}(\delta )c_{2}(\delta )\). Set

$$\displaystyle{\varDelta:= \left \{\delta \in \mathbb{R}_{+}^{m}: c_{ 1}(\delta ) > 0,\ c_{1}^{2}(\delta ) \geq 4c_{ 0}(\delta )c_{2}(\delta )\right \}.}$$

It may be shown that the set Δ has nonempty interior. Hence the perturbation bound corresponding to the Lyapunov majorant (7.26) is

$$\displaystyle{ f(\delta ) = \frac{2c_{0}(\delta )} {c_{1}(\delta ) + \sqrt{c_{1 }^{2 }(\delta ) - 4c_{0 } (\delta )c_{2 } (\delta )}},\ \delta \in \varDelta. }$$
(7.27)
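The bound (7.27) is just the smaller root of the quadratic \(c_{2}(\delta )\rho ^{2} - c_{1}(\delta )\rho + c_{0}(\delta ) = 0\). A minimal Python sketch with hypothetical coefficient values:

```python
import math

def fame_bound(c0, c1, c2):
    # Smaller root (7.27) of c2*rho**2 - c1*rho + c0 = 0; defined for delta in Delta
    disc = c1 ** 2 - 4.0 * c0 * c2
    if c1 <= 0.0 or disc < 0.0:
        raise ValueError("delta lies outside the set Delta")
    return 2.0 * c0 / (c1 + math.sqrt(disc))

# Hypothetical sample values of c_0(delta), c_1(delta), c_2(delta)
c0, c1, c2 = 0.01, 1.0, 0.5
rho = fame_bound(c0, c1, c2)
```

The returned value satisfies the majorant equation and, as expected for the smaller root, is below the first-order estimate \(2c_{0}/c_{1}\).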

As a particular example consider the FAME

$$\displaystyle{ F(\mathbf{A},\mathbf{X}):= \mathbf{A}_{1} + \mathbf{A}_{2}\mathbf{X} + \mathbf{X}\mathbf{A}_{3} + \mathbf{A}_{4}\mathbf{X}^{-1}\mathbf{A}_{ 5} = \mathbf{0}. }$$
(7.28)

The perturbation analysis of such equations uses the technique of Lyapunov majorants combined with certain useful matrix identities.

The solution X of (7.28) is invertible and so is the matrix \(\mathbf{Z}:= \mathbf{X} + \mathbf{Y}\) for small matrix perturbations Y. If in particular \(\|\mathbf{Y}\| \leq \rho <\sigma\) then Z is invertible and

$$\displaystyle{\|\mathbf{Z}^{-1}\|_{ 2} \leq \frac{1} {\sigma -\rho },}$$

where \(\sigma =\| \mathbf{X}^{-1}\|_{2}^{-1}\) is the minimum singular value of X. We also have the identities

$$\displaystyle\begin{array}{rcl} \mathbf{Z}^{-1}& =& \mathbf{X}^{-1} -\mathbf{X}^{-1}\mathbf{Y}\mathbf{Z}^{-1} = \mathbf{X}^{-1} -\mathbf{Z}^{-1}\mathbf{Y}\mathbf{X}^{-1} {}\\ & =& \mathbf{X}^{-1} -\mathbf{X}^{-1}\mathbf{Y}\mathbf{X}^{-1} + \mathbf{X}^{-1}\mathbf{Y}\mathbf{Z}^{-1}\mathbf{Y}\mathbf{X}^{-1}. {}\\ \end{array}$$
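Both the resolvent bound and the inversion identities are easy to verify numerically. A minimal NumPy sketch with a random well-conditioned X and a small perturbation Y (all norms are spectral norms here):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
X = rng.standard_normal((n, n)) + 10.0 * np.eye(n)  # well conditioned, invertible
Y = 1e-2 * rng.standard_normal((n, n))              # small perturbation
Z = X + Y
Xi, Zi = np.linalg.inv(X), np.linalg.inv(Z)

sigma = 1.0 / np.linalg.norm(Xi, 2)  # minimum singular value of X
rho = np.linalg.norm(Y, 2)           # here rho < sigma, so Z is invertible
```

The three identities hold exactly (they follow from \(\mathbf{Z} = \mathbf{X} + \mathbf{Y}\) by direct multiplication), and the resolvent bound \(\|\mathbf{Z}^{-1}\|_{2} \leq 1/(\sigma -\rho )\) is satisfied.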

As a result we get the following matrix identity typical for the proper perturbation analysis of FAME [47, 54]

$$\displaystyle\begin{array}{rcl} F(\mathbf{A} + \mathbf{E},\mathbf{X} + \mathbf{Y})& =& F(\mathbf{A},\mathbf{X}) + F_{\mathbf{X}}(\mathbf{A},\mathbf{X})(\mathbf{Y}) {}\\ & & +\ F_{0}(\mathbf{A},\mathbf{X},\mathbf{E}) + F_{1}(\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y}) + F_{2}(\mathbf{A},\mathbf{X},\mathbf{Y}), {}\\ \end{array}$$

where the Fréchet derivative \(F_{\mathbf{X}}(\mathbf{A},\mathbf{X})\) is determined by

$$\displaystyle{F_{\mathbf{X}}(\mathbf{A},\mathbf{X})(\mathbf{Y}) = \mathbf{A}_{2}\mathbf{Y} + \mathbf{Y}\mathbf{A}_{3} -\mathbf{A}_{4}\mathbf{X}^{-1}\mathbf{Y}\mathbf{X}^{-1}\mathbf{A}_{ 5}}$$

and

$$\displaystyle\begin{array}{rcl} F_{0}(\mathbf{A},\mathbf{X},\mathbf{E})&:=& \mathbf{E}_{1} + \mathbf{E}_{2}\mathbf{X} + \mathbf{X}\mathbf{E}_{3} + \mathbf{A}_{4}\mathbf{X}^{-1}\mathbf{E}_{ 5} + \mathbf{E}_{4}\mathbf{X}^{-1}\mathbf{A}_{ 5} + \mathbf{E}_{4}\mathbf{Z}^{-1}\mathbf{E}_{ 5}, {}\\ F_{1}(\mathbf{A},\mathbf{X},\mathbf{E},\mathbf{Y})&:=& \mathbf{E}_{2}\mathbf{Y} + \mathbf{Y}\mathbf{E}_{3} -\mathbf{A}_{4}\mathbf{X}^{-1}\mathbf{Y}\mathbf{Z}^{-1}\mathbf{E}_{ 5} -\mathbf{E}_{4}\mathbf{Z}^{-1}\mathbf{Y}\mathbf{X}^{-1}\mathbf{A}_{ 5}, {}\\ F_{2}(\mathbf{A},\mathbf{X},\mathbf{Y})&:=& \mathbf{A}_{4}\mathbf{X}^{-1}\mathbf{Y}\mathbf{Z}^{-1}\mathbf{Y}\mathbf{X}^{-1}\mathbf{A}_{ 5}. {}\\ \end{array}$$

Suppose that the matrix

$$\displaystyle{\mathbf{A}_{0}:= \mathbf{I} \otimes \mathbf{A}_{2} + \mathbf{A}_{3}^{\top }\otimes \mathbf{I} -\mathbf{B}}$$

of the linear matrix operator F X (A, X) is invertible, where

$$\displaystyle{\mathbf{B}:= \left (\mathbf{X}^{-1}\mathbf{A}_{ 5}\right )^{\top }\otimes \left (\mathbf{A}_{ 4}\mathbf{X}^{-1}\right ),}$$

and denote

$$\displaystyle\begin{array}{rcl} & & \mathbf{L}_{1}:= -\mathbf{A}_{0}^{-1},\ \mathbf{L}_{ 2}:= \mathbf{L}_{1}\left (\mathbf{X}^{\top }\otimes \mathbf{I}\right ),\ \mathbf{L}_{3}:= \mathbf{L}_{1}(\mathbf{I} \otimes \mathbf{X}), {}\\ & & \mathbf{L}_{4}:= \mathbf{L}_{1}\left (\left (\mathbf{X}^{-1}\mathbf{A}_{ 5}\right )^{\top }\otimes \mathbf{I}\right ),\ \mathbf{L}_{ 5}:= \mathbf{L}_{1}\left (\mathbf{I} \otimes \left (\mathbf{A}_{4}\mathbf{X}^{-1}\right )\right ). {}\\ \end{array}$$

As it is shown in [47] the Lyapunov majorant of type (7.26) for Eq. (7.28) is determined by the relations

$$\displaystyle\begin{array}{rcl} b_{0}(\delta )&:=& \mathrm{est}(\mathbf{L}_{1},\mathbf{L}_{2},\mathbf{L}_{3},\mathbf{L}_{4},\mathbf{L}_{5};\delta _{1},\delta _{2},\delta _{3},\delta _{4},\delta _{5}),\ b_{1}(\delta ):= l_{1}(\delta _{2} +\delta _{3}),\ b_{2}(\delta ):= l_{1}\delta _{4}\delta _{5}, \\ b_{3}(\delta )&:=& \mathrm{est}(\mathbf{L}_{4},\mathbf{L}_{5};\delta _{4},\delta _{5}),\ b_{4}:=\| \mathbf{L}_{1}\mathbf{B}\|_{2},\ b_{5} =\sigma,\ b_{6}:= 1,\ l_{1}:=\| \mathbf{L}_{1}\|_{2}. {}\end{array}$$
(7.29)

Therefore the perturbation bound for Eq. (7.28) is given by (7.27), where the quantities \(b_{1},b_{2},\ldots,b_{6}\) are defined in (7.29).

5 Matrix Decompositions

5.1 Introductory Remarks

A complex or real square matrix \(\mathbf{U}\) is said to be unitary if \(\mathbf{U}^{\mathrm{H}}\mathbf{U} = \mathbf{I}\), where I is the identity matrix. The group of unitary (n × n)-matrices is denoted as \(\mathcal{U}(n)\). Real unitary matrices satisfy \(\mathbf{U}^{\top }\mathbf{U} = \mathbf{I}\) and are also called orthogonal (a term which is not very suitable, since orthogonality is a relation between two objects). The columns \(\mathbf{u}_{k}\) of a unitary matrix U satisfy \(\mathbf{u}_{k}^{\mathrm{H}}\mathbf{u}_{l} =\delta _{k,l}\) (the Kronecker delta).

Unitary matrices and transformations play a major role in theoretical and numerical matrix algebra [23, 27]. Moreover, a computational matrix algorithm for implementation in finite machine arithmetic can hardly be recognized as reliable unless it is based on unitary transformations (algorithms like Gaussian elimination are among the rare exceptions from this rule).

It suffices to mention the QR decomposition and singular value decomposition (SVD) of rectangular matrices, the Schur decomposition and anti-triangular Schur decomposition of square matrices and the Hamiltonian-Schur and block-Schur decomposition of Hamiltonian matrices [1, 62].

Unitary (n × n)-matrices U have excellent numerical properties since they are easily inverted (without rounding errors) and have small norm:

$$\displaystyle{\mathbf{U}^{-1} = \mathbf{U}^{\mathrm{H}}\ \ \mathrm{and}\ \ \|\mathbf{U}\|_{ 2} = 1,\ \|\mathbf{U}\| = \sqrt{n}\,.}$$

Perturbation bounds for these and other matrix decompositions are presented in many articles and monographs [11, 14, 15, 40, 52, 57, 58, 8490].

A unified effective approach to the perturbation analysis of matrix problems involving unitary matrices was first proposed in [52, 74]. It is called the Method of Splitting Operators and Lyapunov Majorants (MSOLM). This method and its main applications are presented in [44, 45]. The main applications of MSOLM include the unitary matrix decompositions mentioned above as well as certain control problems. Among the latter we shall mention the synthesis of linear systems with desired equivalent form (pole assignment synthesis in particular) and the transformation into canonical or condensed unitary (orthogonal in particular) form.

5.2 Splitting Operators and Lyapunov Majorants

MSOLM is based on splitting of matrix operators \(\mathcal{P}: \mathbb{C}^{m\times n} \rightarrow \mathbb{C}^{m\times n}\) and their matrix arguments X into strictly lower, diagonal and strictly upper parts

$$\displaystyle{\mathbf{X}_{1} = \mathrm{Low}(\mathbf{X}),\ \mathbf{X}_{2} = \mathrm{Diag}(\mathbf{X})\ \ \mathrm{and}\ \ \mathbf{X}_{3} = \mathrm{Up}(\mathbf{X}),}$$

namely

$$\displaystyle{\mathbf{X} = \mathbf{X}_{1} + \mathbf{X}_{2} + \mathbf{X}_{3}\ \ \mathrm{and}\ \ \mathcal{P} = \mathcal{P}_{1} + \mathcal{P}_{2} + \mathcal{P}_{3},}$$

where

$$\displaystyle{\mathcal{P}_{1}:= \mathrm{Low} \circ \mathcal{P},\ \mathcal{P}_{2}:= \mathrm{Diag} \circ \mathcal{P}\ \ \mathrm{and}\ \ \mathcal{P}_{3}:= \mathrm{Up} \circ \mathcal{P}.}$$

If for example \(\mathbf{X} = [x_{k,l}] \in \mathbb{C}^{3\times 3}\) then

$$\displaystyle{\mathbf{X}_{1} = \left [\begin{array}{ccc} 0 & 0 &0\\ x_{ 2,1} & 0 &0 \\ x_{3,1} & x_{3,2} & 0 \end{array} \right ],\ \mathbf{X}_{2} = \left [\begin{array}{ccc} x_{1,1} & 0 & 0 \\ 0 &x_{2,2} & 0 \\ 0 & 0 &x_{3,3} \end{array} \right ]\ \ \mathrm{and}\ \ \mathbf{X}_{3} = \left [\begin{array}{ccc} 0&x_{1,2} & x_{1,3} \\ 0& 0 &x_{2,3} \\ 0& 0 & 0 \end{array} \right ].}$$

The operators Low, Diag and Up are projectors of the matrix space \(\mathbb{C}^{m\times n}\) onto the subspaces of strictly lower triangular, diagonal and strictly upper triangular matrices. The properties of splitting operators are studied in detail in [44, 45].
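In NumPy the three projectors may be realized via `tril` and `triu`. A minimal sketch illustrating the splitting on the 3 × 3 example above (the helper names are ours):

```python
import numpy as np

def low(X):
    return np.tril(X, -1)        # strictly lower part Low(X)

def diag(X):
    return np.diag(np.diag(X))   # diagonal part Diag(X)

def up(X):
    return np.triu(X, 1)         # strictly upper part Up(X)

X = np.arange(1.0, 10.0).reshape(3, 3)   # the 3 x 3 example above
```

The three parts sum back to X, and each operator is idempotent, i.e. a projector.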

Let for simplicity m = n and denote by \(\mathcal{T}_{n} \subset \mathbb{C}^{n\times n}\) the set of upper triangular matrices. Then we have

$$\displaystyle{\mathrm{Up}(\mathbf{X}) = \mathrm{Low}(\mathbf{X}^{\top })^{\top },\ \mathcal{T}_{ n} = (\mathrm{Diag} + \mathrm{Up})(\mathbb{C}^{n\times n}).}$$

If \(\mathbf{e}_{k} \in \mathbb{C}^{n}\) is the unit vector with e k (l) = δ k, l (the Kronecker delta) then

$$\displaystyle{\mathrm{Low}(\mathbf{X}) =\sum _{ l=1}^{n-1}\sum _{ k=l+1}^{n}\mathbf{e}_{ k}\mathbf{e}_{k}^{\top }\mathbf{X}\mathbf{e}_{ l}\mathbf{e}_{l}^{\top },\ \mathrm{Diag}(\mathbf{X}) =\sum _{ k=1}^{n}\mathbf{e}_{ k}\mathbf{e}_{k}^{\top }\mathbf{X}\mathbf{e}_{ k}\mathbf{e}_{k}^{\top }.}$$

The vectorizations of the splitting operators contain many zeros. That is why we prefer to work with the compressed vectorizations of the splitting operators Low and Diag, namely

$$\displaystyle{\mathrm{lvec}: \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{\nu },\ \mathrm{dvec}: \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{n},}$$

where \(\nu:= n(n - 1)/2\) and

$$\displaystyle\begin{array}{rcl} \mathrm{lvec}(\mathbf{X})&:=& [x_{2,1},x_{3,1},\ldots,x_{n,1},x_{3,2},x_{4,2},\ldots,x_{n,2},\ldots,x_{n-1,n-2},x_{n,n-2},x_{n,n-1}]^{\top }\in \,\mathbb{C}^{\nu }, {}\\ \mathrm{dvec}(\mathbf{X})&:=& [x_{1,1},x_{2,2},\ldots,x_{n,n}]^{\top }\in \mathbb{C}^{n}. {}\\ \end{array}$$

Thus the vector \(\mathrm{lvec}(\mathbf{X})\) contains the strictly lower part of the matrix Low(X) stacked column-wise, and dvec(X) is the vector of the diagonal elements of the matrix Diag(X).

We have

$$\displaystyle{\mathrm{lvec}(\mathbf{X}):= \mathbf{M}_{\mathrm{lvec}}\mathrm{vec}(\mathbf{X}),\ \mathrm{dvec}(\mathbf{X}):= \mathbf{M}_{\mathrm{dvec}}\mathrm{vec}(\mathbf{X}),}$$

where the matrices \(\mathbf{M}_{\mathrm{lvec}}\), \(\mathbf{M}_{\mathrm{dvec}}\) of the operators lvec, dvec are given by

$$\displaystyle\begin{array}{rcl} \mathbf{M}_{\mathrm{lvec}}&:=& \left [\mathrm{diag}(\mathbf{N}_{1},\mathbf{N}_{2},\ldots,\mathbf{N}_{n-1}),\mathbf{0}_{\nu \times n}\right ] \in \mathbb{R}^{\nu \times n^{2} }, {}\\ \mathbf{M}_{\mathrm{dvec}}&:=& \mathrm{diag}\left (\mathbf{e}_{1}^{\top },\mathbf{e}_{ 2}^{\top },\ldots,\mathbf{e}_{ n}^{\top }\right ) \in \mathbb{R}^{n\times n^{2} }, {}\\ \end{array}$$

where

$$\displaystyle{\mathbf{N}_{k}:= [\mathbf{0}_{(n-k)\times k},\mathbf{I}_{n-k}] \in \mathbb{R}^{(n-k)\times n}\ (k = 1,2,\ldots,n - 1).}$$

Let

$$\displaystyle{\mathrm{lvec}^{\dag }: \mathbb{C}^{\nu } \rightarrow \mathbb{C}^{n\times n}}$$

be the right inverse of the operator lvec such that

$$\displaystyle{\mathrm{lvec} \circ \mathrm{lvec}^{\dag } = \mathbf{I}_{\nu \times \nu }\ \ \mathrm{and}\ \ \mathrm{lvec}^{\dag }\circ \mathrm{lvec} = \mathrm{Low}.}$$

Then the matrix of \(\mathrm{lvec}^{\dag }\) is

$$\displaystyle{\mathbf{M}_{\mathrm{lvec}^{\dag }} = \mathbf{M}_{\mathrm{lvec}}^{\top }.}$$
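A minimal NumPy sketch of \(\mathbf{M}_{\mathrm{lvec}}\) (for column-major vectorization), confirming the two properties \(\mathrm{lvec} \circ \mathrm{lvec}^{\dag } = \mathbf{I}_{\nu \times \nu }\) and \(\mathrm{lvec}^{\dag }\circ \mathrm{lvec} = \mathrm{Low}\); the helper name m_lvec is ours:

```python
import numpy as np

def m_lvec(n):
    # Row j of M_lvec selects the j-th strictly lower entry of the
    # column-major vec(X), in the order x_{2,1},...,x_{n,1},x_{3,2},...
    idx = [l * n + k for l in range(n) for k in range(l + 1, n)]
    M = np.zeros((len(idx), n * n))
    M[np.arange(len(idx)), idx] = 1.0
    return M

n = 4
nu = n * (n - 1) // 2
M = m_lvec(n)
X = np.random.default_rng(2).standard_normal((n, n))
lx = M @ X.flatten('F')                        # lvec(X)
back = (M.T @ lx).reshape((n, n), order='F')   # lvec† ∘ lvec applied to X
```

Since the rows of \(\mathbf{M}_{\mathrm{lvec}}\) are distinct unit vectors, \(\mathbf{M}_{\mathrm{lvec}}\mathbf{M}_{\mathrm{lvec}}^{\top } = \mathbf{I}_{\nu }\), while \(\mathbf{M}_{\mathrm{lvec}}^{\top }\mathbf{M}_{\mathrm{lvec}}\) reproduces the projector Low.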

An important property of the above splittings is that if \(\mathbf{T} \in \mathcal{T}_{n}\) and \(\mathbf{X} \in \mathbb{C}^{n\times n}\) then the matrices Low(XT) and Low(TX) depend only on the strictly lower part Low(X) of X rather than on the whole matrix X.

In a more general setting we have the following result [44]. Let the matrix operator \(\mathcal{L}: \mathbb{C}^{n\times n} \rightarrow \mathbb{C}^{n\times n}\) be determined from

$$\displaystyle{\mathcal{L}(\mathbf{X}) = \mathbf{A}\mathbf{X}\mathbf{B},}$$

where \(\mathbf{A},\mathbf{B} \in \mathcal{T}_{n}\) are given matrices. Then

$$\displaystyle{\mathrm{Low} \circ \mathcal{L} = \mathrm{Low} \circ \mathcal{L}\circ \mathrm{Low}.}$$

The matrix problems considered in this and the next sections may be formulated as follows. We have a collection \(\mathbf{A} = (\mathbf{A}_{1},\mathbf{A}_{2},\ldots,\mathbf{A}_{m}) \in \mathcal{A}\) of data matrices \(\mathbf{A}_{k}\). The resulting matrix (or the solution)

$$\displaystyle{\mathbf{R} =\varPsi (\mathbf{A},\mathbf{U})}$$

is an upper triangular (or anti-triangular) matrix, where \(\mathbf{U} = (\mathbf{U}_{1},\mathbf{U}_{2},\ldots,\mathbf{U}_{k})\) is a collection of matrices \(\mathbf{U}_{p} \in \mathcal{U}(n)\). At the same time the matrix arguments A and U of Ψ are not independent. Moreover, in many problems the matrix collection U is determined by the data A “almost uniquely” via the requirement

$$\displaystyle{\mathrm{Low}(\varPsi (\mathbf{A},\mathbf{U})) = \mathbf{0}.}$$

(We stress that the result R may also be a collection rather than a single matrix.)

In certain problems with k = 1 the matrix \(\mathbf{U} = \mathbf{U}_{1}\) is uniquely determined by the data. This may be the case when R is the canonical form of A for a certain multiplicative action of the group \(\mathcal{U}(n)\) on the set of data [45, 48]. Let for example the matrix \(\mathbf{A} \in \mathbb{C}^{n\times n}\) be invertible and

$$\displaystyle{\mathbf{A} = \mathbf{U}\mathbf{R}}$$

be its QR decomposition, where \(\mathbf{U} \in \mathcal{U}(n)\) and Low(R) = 0 (for uniformity of notation we denote this decomposition as A = UR instead of the widely used A = QR). Under certain additional requirements we may consider

$$\displaystyle{\mathbf{R} =\varPsi (\mathbf{A},\mathbf{U}):= \mathbf{U}^{\mathrm{H}}\mathbf{A}}$$

as the canonical form of A under the left multiplicative action of the group \(\mathcal{U}(n)\). Since the diagonal elements of R are nonzero we may force them to be real and positive. In this case U is determined uniquely and R is the canonical form of A for this action [45].

In other problems the transformation matrix \(\mathbf{U} \in \mathcal{U}(n)\) cannot be chosen uniquely. For example, in the Schur decomposition A = URU H of A, the Schur form R (either canonical or only condensed) satisfies Low(R) = 0. At the same time any other matrix ν U is also unitary and transforms A into R provided that ν = exp(ı ω) and \(\omega \in \mathbb{R}\).

However, in practice condensed rather than canonical forms are used. In this case, due to the non-uniqueness of the condensed forms and their perturbations, the perturbation bounds are valid only for some (but not for all) of the solutions of the perturbed problem.

Let \(\mathbf{E} = (\mathbf{E}_{1},\mathbf{E}_{2},\ldots,\mathbf{E}_{m})\) be a perturbation in the collection \(\mathbf{A}\) such that

$$\displaystyle{\|\mathbf{E}_{k}\| \leq \delta _{k}\ \ (k = 1,2,\ldots,m).}$$

Suppose that the perturbed problem with data A + E has a solution

$$\displaystyle{\mathbf{R} + \mathbf{Z} =\varPsi (\mathbf{A} + \mathbf{E},\mathbf{U} + \mathbf{V}) \in \mathcal{T}_{n},}$$

such that the perturbation Z in R satisfies Low(Z) = 0 and let

$$\displaystyle{\mathbf{U} + \mathbf{V} = \mathbf{U}(\mathbf{I} + \mathbf{X}) \in \mathcal{U}(n),\ \mathbf{X}:= \mathbf{U}^{\mathrm{H}}\mathbf{V}}$$

be the perturbed transformation matrix.

The norm-wise perturbation problem is to estimate the norm of the perturbation

$$\displaystyle{\mathbf{Z} = Z(\mathbf{A},\mathbf{E},\mathbf{X}) =\varPsi (\mathbf{A} + \mathbf{E},\mathbf{U}(\mathbf{I} + \mathbf{X})) -\varPsi (\mathbf{A},\mathbf{U})}$$

in the solution R as well as the norm of the perturbation V in the transformation matrix U as functions of the perturbation vector

$$\displaystyle{\delta = [\delta _{1};\delta _{2};\ldots;\delta _{m}] \in \mathbb{R}_{+}^{m},}$$

e.g.

$$\displaystyle{ \|\mathbf{Z}\| \leq f_{\mathbf{R}}(\delta ),\ \|\mathbf{V}\| =\| \mathbf{X}\| \leq f_{\mathbf{U}}(\delta ), }$$
(7.30)

where the nonnegative valued functions f R and f U are continuous and nondecreasing in δ, and \(f_{\mathbf{R}}(0) = f_{\mathbf{U}}(0) = 0\).

We stress again that the perturbed problem may have solutions in which V is not small even when E is small or even zero (in the latter case the problem is actually not perturbed). This may occur when we deal with condensed forms or with canonical forms for which U is not uniquely determined.

To illustrate this phenomenon consider again the Schur decomposition \(\mathbf{A} = \mathbf{U}\mathbf{R}\mathbf{U}^{\mathrm{H}}\) of the matrix A and let E = 0. Choosing \(\mathbf{V} = -2\mathbf{U}\) we see that the matrix \(\mathbf{U} + \mathbf{V} = -\mathbf{U} \in \mathcal{U}(n)\) also solves the problem, i.e. transforms A into R. However, in this case the norm \(\|\mathbf{V}\| = 2\sqrt{n}\) of the perturbation V is the maximum possible and does not satisfy any estimate of type (7.30). Similar effects may arise in some perturbation problems in control theory for which the solution set is not even bounded! So what is the way out of this situation?

We can only assert that the perturbation estimates (7.30) are valid for some perturbations Z and V. At the same time the perturbed problem may have other solutions which are not small and for which the inequalities (7.30) do not hold. We may formalize these considerations as follows.

For a given fixed perturbation E let \(\mathbb{V}_{\mathbf{E}} \subset \mathbb{C}^{n\times n}\) be the set of all V satisfying the perturbed problem

$$\displaystyle{\mathrm{Low}(\varPsi (\mathbf{A} + \mathbf{E},\mathbf{U} + \mathbf{V}) -\varPsi (\mathbf{A},\mathbf{U})) = \mathbf{0},\ \mathbf{U} + \mathbf{V} \in \mathcal{U}(n).}$$

The set \(\mathbb{V}_{\mathbf{E}}\) is defined by a system of polynomial equations and is bounded (since \(\mathbf{U} + \mathbf{V}\) is unitary); hence it is compact and the infimum

$$\displaystyle{\inf \{\|\mathbf{V}\|: \mathbf{V} \in \mathbb{V}_{\mathbf{E}}\} =\| \mathbf{V}_{0}\|}$$

is reached for some \(\mathbf{V}_{0} \in \mathbb{V}_{\mathbf{E}}\). Now we may choose V = V 0 and claim that the estimates (7.30) will be valid for this particular value of V.

Since the matrix I + X is unitary we have \((\mathbf{I} + \mathbf{X})^{\mathrm{H}}(\mathbf{I} + \mathbf{X}) = \mathbf{I}\) and

$$\displaystyle{\mathbf{X}^{\mathrm{H}} + \mathbf{X} + \mathbf{X}^{\mathrm{H}}\mathbf{X} = \mathbf{0}.}$$

Hence

$$\displaystyle{\mathbf{X}^{\mathrm{H}} = -\mathbf{X} + \mathrm{O}(\|\mathbf{X}\|^{2}),\ \mathbf{X} \rightarrow \mathbf{0}.}$$
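The identity is easily checked on a random orthogonal matrix close to the identity (real case, so the conjugate transpose reduces to transposition). A NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
# A real orthogonal (hence unitary) matrix I + X close to the identity
Q, _ = np.linalg.qr(np.eye(n) + 1e-3 * rng.standard_normal((n, n)))
Q = Q * np.sign(np.diag(Q))     # fix column signs: take the branch closest to I
X = Q - np.eye(n)
residual = X.T + X + X.T @ X    # vanishes by unitarity of I + X
```

The residual vanishes (to machine precision), and consequently \(\|\mathbf{X}^{\top } + \mathbf{X}\|\) is of second order in \(\|\mathbf{X}\|\).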

Now the problem is to estimate the norm of X. Splitting X as \(\mathbf{X}_{1} + \mathbf{X}_{2} + \mathbf{X}_{3}\) as above, we rewrite the perturbed problem as an operator equation

$$\displaystyle{\mathbf{X} =\varPi (\mathbf{A},\mathbf{E},\mathbf{X}),}$$

or as a system of three operator equations

$$\displaystyle\begin{array}{rcl} \mathbf{X}_{1}& =& \varPi _{1}(\mathbf{A},\mathbf{E},\mathbf{X}), \\ \mathbf{X}_{2}& =& \varPi _{2}(\mathbf{X}):= -0.5\,\mathrm{Diag}(\mathbf{X}^{\mathrm{H}}\mathbf{X}), \\ \mathbf{X}_{3}& =& \varPi _{3}(\mathbf{X}):= -\mathrm{Up}(\mathbf{X}^{\mathrm{H}}) -\mathrm{Up}(\mathbf{X}^{\mathrm{H}}\mathbf{X}).{}\end{array}$$
(7.31)

The right-hand side Π 1(A, E, X) of the first equation in (7.31) depends on the particular problem, while the second and the third equalities are universal equations. The only information that we need about the universal equations is that

$$\displaystyle{\|\varPi _{2}(\mathbf{X})\| \leq 0.5\|\mathbf{X}\|^{2},\ \|\varPi _{ 3}(\mathbf{X})\| \leq \|\mathbf{X}_{1}\| +\mu _{n}\|\mathbf{X}\|^{2},}$$

where

$$\displaystyle{ \mu _{n}:= \sqrt{\frac{n - 1} {2n}} \,. }$$
(7.32)
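Both universal bounds can be verified numerically in the Frobenius norm. A NumPy sketch with a random real X (so H is transposition):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.standard_normal((n, n))
mu_n = np.sqrt((n - 1) / (2.0 * n))           # (7.32)

pi2 = -0.5 * np.diag(np.diag(X.T @ X))        # Pi_2(X)
pi3 = -np.triu(X.T, 1) - np.triu(X.T @ X, 1)  # Pi_3(X)
x = np.linalg.norm(X, 'fro')
x1 = np.linalg.norm(np.tril(X, -1), 'fro')    # ||X_1||
```

The second bound uses \(\|\mathrm{Up}(\mathbf{X}^{\mathrm{H}})\| =\| \mathbf{X}_{1}\|\) together with the fact that \(\mathbf{X}^{\mathrm{H}}\mathbf{X}\) is Hermitian nonnegative definite with trace \(\|\mathbf{X}\|^{2}\), which is where the constant μ n comes from.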

Further on we introduce a generalized norm \(\vert \cdot \vert: \mathbb{C}^{n\times n} \rightarrow \mathbb{R}_{+}^{3}\) from

$$\displaystyle{\vert \mathbf{X}\vert:= [\|\mathbf{X}_{1}\|;\|\mathbf{X}_{2}\|;\|\mathbf{X}_{3}\|] \in \mathbb{R}_{+}^{3}\,.}$$

In certain problems the splitting of X and Π is done in p > 3 parts \(\mathbf{X}_{1},\mathbf{X}_{2},\ldots,\mathbf{X}_{p}\) and \(\varPi _{1},\varPi _{2},\ldots,\varPi _{p}\). Here the generalized norm | X | of X is a nonnegative p-vector (see [40]),

$$\displaystyle{\vert \mathbf{X}\vert:= [\|\mathbf{X}_{1}\|;\|\mathbf{X}_{2}\|;\ldots;\|\mathbf{X}_{p}\|] \in \mathbb{R}_{+}^{p}\,.}$$

Here the vector Lyapunov majorant

$$\displaystyle{\mathbf{h} = [h_{1};h_{2};\ldots;h_{p}]: \mathbb{R}_{+}^{m} \times \mathbb{R}_{ +}^{p} \rightarrow \mathbb{R}_{ +}^{p}}$$

is a continuous function which satisfies the following conditions.

  1. 1.

    For all perturbations E and matrices \(\mathbf{X} \in \mathbb{C}^{n\times n}\), and for all vectors \(\delta \in \mathbb{R}_{+}^{m}\), \(\xi \in \mathbb{R}_{+}^{p}\) with

    $$\displaystyle{\vert \mathbf{E}\vert \preceq \delta,\ \vert \mathbf{X}\vert \preceq \xi }$$

    it is fulfilled

    $$\displaystyle{\vert \varPi (\mathbf{E},\mathbf{X})\vert \preceq \mathbf{h}(\delta,\xi )}$$

    where \(\preceq \) is the component-wise partial order relation.

  2. 2.

    For each \(\delta \in \mathbb{R}_{+}^{m}\) the function \(\mathbf{h}(\delta,\cdot ): \mathbb{R}_{+}^{p} \rightarrow \mathbb{R}_{+}^{p}\) is differentiable in the domain \(\mathbb{R}_{+}^{p}\setminus \{\mathbf{0}\}\).

  3. 3.

    The elements h k of h are non-decreasing strictly convex functions of all their arguments and h(0, 0) = 0.

  4. 4.

    There is a continuous matrix function \(J: \mathbb{R}_{+}^{m} \times \mathbb{R}_{+}^{p} \rightarrow \mathbb{R}^{p\times p}\) such that

    $$\displaystyle{\frac{\partial \mathbf{h}(\delta,\xi )} {\partial \xi } \preceq J(\delta,\xi );\ \mathbf{0}\preceq \delta,\ \mathbf{0} \prec \boldsymbol{\xi }}$$

    and the spectral radius of J(0, 0) is less than 1 (here 0 ≺ ξ means that all elements of ξ are positive).

Applying the Schauder fixed point principle it may be shown [38] that under these conditions and for some \(\delta _{0} \succ 0\) the vector majorant equation

$$\displaystyle{\xi = \mathbf{h}(\delta,\xi ),\ \delta \preceq \delta _{0}}$$

has a solution \(\xi = \mathbf{f}(\delta )\succeq \mathbf{0}\) which tends to 0 together with δ. Finally the desired perturbation estimate for X is

$$\displaystyle{\vert \mathbf{X}\vert \preceq \mathbf{f}(\delta )}$$

which also yields

$$\displaystyle{\|\mathbf{V}\| =\| \mathbf{X}\| = \sqrt{\|\mathbf{X} _{1 } \|^{2 } +\| \mathbf{X} _{2 } \|^{2 } +\| \mathbf{X} _{3 } \|^{2}} \leq \|\mathbf{f}(\delta )\|.}$$

To construct the operator Π 1(A, E, ⋅ ) in (7.31) we represent Z as

$$\displaystyle{\mathbf{Z} = Z(\mathbf{A},\mathbf{E},\mathbf{X}) = \mathcal{L}(\mathbf{A})(\mathbf{X}) -\varOmega (\mathbf{A},\mathbf{E},\mathbf{X}),}$$

where \(\mathcal{L}(\mathbf{A})(\mathbf{X})\) is the main part of Z(A, E, X) with respect to X (in particular \(\mathcal{L}(\mathbf{A})\) can be the Fréchet derivative of Z in X computed for E = 0, X = 0). In turn, the expression

$$\displaystyle{\varOmega (\mathbf{A},\mathbf{E},\mathbf{X}):= \mathcal{L}(\mathbf{A})(\mathbf{X}) - Z(\mathbf{A},\mathbf{E},\mathbf{X})}$$

contains first order terms in \(\|\mathbf{E}\|\) and higher order terms in \(\|\mathbf{E}\| +\| \mathbf{X}\|\).

Since Low(Z) = 0 we have

$$\displaystyle{\mathcal{L}_{\mathrm{low}}(\mathbf{A})(\mathrm{lvec}(\mathbf{X})) = \mathrm{lvec}(\varOmega (\mathbf{A},\mathbf{E},\mathbf{X})).}$$

Here \(\mathcal{L}_{\mathrm{low}}(\mathbf{A}): \mathbb{C}^{\nu } \rightarrow \mathbb{C}^{\nu }\) is the lower compression of \(\mathcal{L}(\mathbf{A})\) with matrix

$$\displaystyle{ \mathbf{L}_{\mathrm{low}}(\mathbf{A}):= \mathbf{M}_{\mathrm{lvec}}\mathbf{L}(\mathbf{A})\mathbf{M}_{\mathrm{lvec}}^{\top }, }$$
(7.33)

where \(\mathbf{L}(\mathbf{A}) \in \mathbb{C}^{\nu \times \nu }\) is the matrix of the operator \(\mathcal{L}(\mathbf{A})\).

Under certain generic conditions the operator \(\mathcal{L}_{\mathrm{low}}(\mathbf{A})\) is invertible even when \(\mathcal{L}(\mathbf{A})\) is not. Thus we have

$$\displaystyle{\mathbf{X}_{1} =\varPi _{1}(\mathbf{A},\mathbf{E},\mathbf{X}):= \mathrm{lvec}^{\dag }\circ \mathcal{L}_{\mathrm{ low}}^{-1}(\mathbf{A}) \circ \mathrm{lvec}(\varOmega (\mathbf{A},\mathbf{E},\mathbf{X})).}$$

The explicit expression for Π 1(A, E, X) need not be constructed. Instead, to apply the technique of vector Lyapunov majorants it suffices to use the estimate

$$\displaystyle{\|\varPi _{1}(\mathbf{A},\mathbf{E},\mathbf{X})\| \leq \lambda \|\varOmega (\mathbf{A},\mathbf{E},\mathbf{X})\|,}$$

where

$$\displaystyle{ \lambda:= \left \|\mathbf{L}_{\mathrm{low}}^{-1}(\mathbf{A})\right \|_{ 2}. }$$
(7.34)

Fortunately, a Lyapunov majorant for Ω(A, E, X) is usually constructed relatively easily.

This is in brief the general scheme for perturbation analysis of matrix problems involving unitary transformations. Applications of this scheme to particular problems are outlined in the next subsections.

5.3 QR Decomposition

Perturbation analysis of the QR decomposition

$$\displaystyle{\mathbf{A} = \mathbf{U}\mathbf{R},\ \mathrm{Low}(\mathbf{R}) = 0,\ \mathbf{U} \in \mathcal{U}(n)}$$

of the matrix \(\mathbf{A} \in \mathbb{C}^{n\times n}\) was done in [88]. Later on such an analysis was performed by MSOLM [44], yielding tighter perturbation bounds.

Here the matrix L low(A) from (7.33) is

$$\displaystyle{\mathbf{L}_{\mathrm{low}}(\mathbf{A}):= \mathbf{M}_{\mathrm{lvec}}(\mathbf{R}^{\top }\otimes \mathbf{I})\mathbf{M}_{\mathrm{ lvec}}^{\top } = \mathbf{M}_{\mathrm{ lvec}}((\mathbf{A}^{\top }\overline{\mathbf{U}}) \otimes \mathbf{I})\mathbf{M}_{\mathrm{ lvec}}^{\top },}$$

where

$$\displaystyle{\mathbf{R} = \mathbf{U}^{\mathrm{H}}\mathbf{A} = [r_{ k,l}]\ (k,l = 1,2,\ldots,n).}$$

The eigenvalues of the matrix \(\mathbf{L}_{\mathrm{low}}(\mathbf{A})\) are \(r_{1,1}\) with multiplicity n − 1, \(r_{2,2}\) with multiplicity n − 2, …, and \(r_{n-1,n-1}\) with multiplicity 1. Let either rank(A) = n, or \(\mathrm{rank}(\mathbf{A}) = n - 1\). In the latter case let us rearrange the columns of A so that the first n − 1 columns of A are linearly independent. Then the matrix \(\mathbf{L}_{\mathrm{low}}(\mathbf{A})\) is invertible.

The perturbed QR decomposition is

$$\displaystyle{\mathbf{A} + \mathbf{E} = \mathbf{U}(\mathbf{I} + \mathbf{X})(\mathbf{R} + \mathbf{Z}),\ \mathbf{I} + \mathbf{X} \in \mathcal{U}(n),\ \mathrm{Low}(\mathbf{Z}) = \mathbf{0}}$$

and the problem is to estimate \(\|\mathbf{Z}\|\) and \(\|\mathbf{X}\|\) as functions of \(\delta:=\| \mathbf{E}\|\).

In this case the vector majorant equation is equivalent to a quadratic equation which yields the estimates [44, 45]

$$\displaystyle\begin{array}{rcl} \|\mathbf{Z}\|& \leq & f_{\mathbf{R}}(\delta ):=\| \mathbf{A}\|_{2}f_{\mathbf{U}}(\delta )+\delta, \\ \|\mathbf{X}\|& \leq & f_{\mathbf{U}}(\delta ):= \frac{2\lambda \delta } {\sqrt{1 - 2\lambda \mu _{n } \delta + \sqrt{w(\delta )}}}{}\end{array}$$
(7.35)

provided

$$\displaystyle{\delta \leq \frac{1} {\lambda \left (2\mu _{n} + \sqrt{2 + 8\mu _{n }^{2}}\right )}.}$$

Here

$$\displaystyle{w(\delta ):= (1 - 2\lambda \mu _{n}\delta )^{2} - 2\lambda ^{2}(1 + 4\mu _{ n}^{2})\delta ^{2}}$$

and the quantities μ n and λ are given in (7.32) and (7.34) respectively.
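The whole scheme can be illustrated numerically for a small real matrix: build \(\mathbf{L}_{\mathrm{low}}(\mathbf{A})\) from (7.33), evaluate λ, μ n and the bounds (7.35), and compare with an actual perturbed QR decomposition (the positive-diagonal normalization makes the decomposition unique). The helper names m_lvec and qr_pos are ours; this is a sketch of the estimate as stated, not a substitute for the derivation in [44, 45].

```python
import numpy as np

def m_lvec(n):
    # Matrix of lvec for the column-major vec(X)
    idx = [l * n + k for l in range(n) for k in range(l + 1, n)]
    M = np.zeros((len(idx), n * n))
    M[np.arange(len(idx)), idx] = 1.0
    return M

def qr_pos(A):
    # Unique QR decomposition of an invertible A with positive diagonal of R
    Q, R = np.linalg.qr(A)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s, s[:, None] * R

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
U, R = qr_pos(A)

M = m_lvec(n)
L_low = M @ np.kron(R.T, np.eye(n)) @ M.T      # matrix (7.33)
lam = np.linalg.norm(np.linalg.inv(L_low), 2)  # lambda from (7.34)
mu_n = np.sqrt((n - 1) / (2.0 * n))            # (7.32)

E = 1e-5 * rng.standard_normal((n, n))
delta = np.linalg.norm(E, 'fro')
assert delta <= 1.0 / (lam * (2 * mu_n + np.sqrt(2 + 8 * mu_n ** 2)))  # domain of (7.35)

w = (1 - 2 * lam * mu_n * delta) ** 2 - 2 * lam ** 2 * (1 + 4 * mu_n ** 2) * delta ** 2
f_U = 2 * lam * delta / np.sqrt(1 - 2 * lam * mu_n * delta + np.sqrt(w))
f_R = np.linalg.norm(A, 2) * f_U + delta

U2, R2 = qr_pos(A + E)
X = U.T @ U2 - np.eye(n)   # I + X = U^H (U + V)
Z = R2 - R
```

For a small generic perturbation the actual values \(\|\mathbf{X}\|\) and \(\|\mathbf{Z}\|\) stay below f U(δ) and f R(δ) with a comfortable margin.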

5.4 Singular Value Decomposition

Consider the singular value decomposition (SVD)

$$\displaystyle{\mathbf{A} = \mathbf{U}_{1}\mathbf{R}\mathbf{U}_{2}^{\mathrm{H}},\ \ \mathbf{U}_{ 1},\mathbf{U}_{2} \in \mathcal{U}(n),\ \mathrm{Low}(\mathbf{R}) = \mathrm{Low}(\mathbf{R}^{\top }) = \mathbf{0},}$$

of the invertible matrix \(\mathbf{A} \in \mathbb{C}^{n\times n}\) (the case of general matrices A is more subtle but is treated similarly). We have \(\mathbf{R} = \mathrm{diag}(r_{1,1},r_{2,2},\ldots,r_{n,n})\), where \(r_{1,1} \geq r_{2,2} \geq \cdots \geq r_{n,n} > 0\). The singular values \(\sigma _{k} = r_{k,k}\) of A are the square roots of the eigenvalues of the matrix \(\mathbf{A}^{\mathrm{H}}\mathbf{A}\). Thus R is the canonical form of A for the multiplicative action

$$\displaystyle{\mathbf{A}\mapsto \mathbf{U}_{1}^{\mathrm{H}}\mathbf{A}\mathbf{U}_{ 2}}$$

of the group \(\mathcal{U}(n) \times \mathcal{U}(n)\) on the set of invertible matrices.

Using splittings we may introduce a condensed form \(\hat{\mathbf{R}}\) for this action from

$$\displaystyle{\mathrm{Low}(\hat{\mathbf{R}}) = \mathrm{Low}(\hat{\mathbf{R}}^{\top }) = \mathbf{0}}$$

without ordering the elements of \(\hat{\mathbf{R}}\).

Let A be perturbed to A + E with \(\varepsilon:=\| \mathbf{E}\|_{2} <\sigma _{n}\) and let

$$\displaystyle{\mathbf{A} + \mathbf{E} = \mathbf{U}_{1}(\mathbf{I} + \mathbf{X}_{1})(\mathbf{R} + \mathbf{Z})(\mathbf{I} + \mathbf{X}_{2}^{\mathrm{H}})\mathbf{U}_{ 2}^{\mathrm{H}},}$$

where

$$\displaystyle{\ \mathbf{I} + \mathbf{X}_{k} \in \mathcal{U}(n),\ \mathrm{Low}(\mathbf{Z}) = \mathrm{Low}(\mathbf{Z}^{\top }) = \mathbf{0},}$$

be the SVD of the matrix A + E. Here

$$\displaystyle{\mathbf{R} + \mathbf{Z} = \mathrm{diag}(\tau _{1},\tau _{2},\ldots,\tau _{n})}$$

and

$$\displaystyle{0 <\sigma _{k}-\varepsilon \leq \tau _{k} \leq \sigma _{k} +\varepsilon \ \ (k = 1,2,\ldots,n).}$$

Thus we have the perturbation estimates

$$\displaystyle{\|\mathbf{Z}\|_{2} \leq \varepsilon \ \ \mathrm{and}\ \ \|\mathbf{Z}\| \leq \varepsilon \sqrt{n}\,.}$$

This reflects the well-known fact that the SVD is well conditioned. In this case a variant of MSOLM may be used to estimate the norms of the perturbations \(\mathbf{X}_{1}\), \(\mathbf{X}_{2}\) as well.
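These bounds are easy to verify numerically: by Weyl's inequality each singular value moves by at most \(\|\mathbf{E}\|_{2}\). A small NumPy check (random data, sizes of our choosing):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
E = 1e-3 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

sigma = np.linalg.svd(A, compute_uv=False)      # singular values of A
tau = np.linalg.svd(A + E, compute_uv=False)    # singular values of A + E
eps = np.linalg.norm(E, 2)                      # spectral norm of E

# each singular value moves by at most ||E||_2, hence ||Z||_2 <= eps ...
assert np.all(np.abs(tau - sigma) <= eps + 1e-12)
# ... and in the Frobenius norm ||Z|| <= eps * sqrt(n)
assert np.linalg.norm(tau - sigma) <= eps * np.sqrt(n) + 1e-12
```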

5.5 Schur Decomposition

The sensitivity of subspaces connected to certain eigenvalue problems is considered in [82, 83]. Nonlocal and local perturbation analysis of the Schur decomposition

$$\displaystyle{\mathbf{A} = \mathbf{U}\mathbf{R}\mathbf{U}^{\mathrm{H}},\ \ \mathbf{U} \in \mathcal{U}(n),\ \mathrm{Low}(\mathbf{R}) = \mathbf{0},}$$

of the matrix \(\mathbf{A} \in \mathbb{C}^{n\times n}\) was first carried out in [52]. Here the matrix \(\mathbf{L}_{\mathrm{low}}(\mathbf{A})\) from (7.33) is

$$\displaystyle{\mathbf{L}_{\mathrm{low}}(\mathbf{A}):= \mathbf{M}_{\mathrm{lvec}}(\mathbf{I} \otimes \mathbf{R} -\mathbf{R}^{\top }\otimes \mathbf{I})\mathbf{M}_{\mathrm{ lvec}}^{\top },\ \mathbf{R} = [r_{ p,q}] = \mathbf{U}^{\mathrm{H}}\mathbf{A}\mathbf{U}.}$$

The eigenvalues of the matrix \(\mathbf{L}_{\mathrm{low}}(\mathbf{A})\) are

$$\displaystyle{r_{p,p} - r_{q,q} =\lambda _{p}(\mathbf{A}) -\lambda _{q}(\mathbf{A})\ \ (q = 1,2,\ldots,n - 1,\ p = q + 1,q + 2,\ldots,n).}$$

Hence it is invertible if and only if A has distinct eigenvalues \(\lambda _{1}(\mathbf{A}),\lambda _{2}(\mathbf{A}),\ldots,\lambda _{n}(\mathbf{A})\) (note that if A has multiple eigenvalues the Schur form of A + E may even be discontinuous as a function of E!).
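This spectral structure can be confirmed numerically. The sketch below builds a selection matrix for the strictly lower triangular part (our concrete realization of \(\mathbf{M}_{\mathrm{lvec}}\), using column-major vec ordering) and checks that the eigenvalues of \(\mathbf{M}_{\mathrm{lvec}}(\mathbf{I} \otimes \mathbf{R} -\mathbf{R}^{\top }\otimes \mathbf{I})\mathbf{M}_{\mathrm{lvec}}^{\top }\) are exactly the differences of the diagonal entries of R:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
# an upper triangular R (the Schur factor); distinct diagonal entries
# chosen so that all pairwise differences are distinct as well
R = np.triu(rng.standard_normal((n, n)))
np.fill_diagonal(R, 2.0 ** np.arange(n))   # 1, 2, 4, 8, 16

# M_lvec selects the strictly lower triangular entries of vec(X)
# (column-major vec: entry (i, j) sits at position i + j*n)
low_idx = [i + j * n for j in range(n) for i in range(j + 1, n)]
M = np.zeros((len(low_idx), n * n))
for row, k in enumerate(low_idx):
    M[row, k] = 1.0

K = np.kron(np.eye(n), R) - np.kron(R.T, np.eye(n))   # vec(X) -> vec(RX - XR)
L_low = M @ K @ M.T

eigs = np.sort(np.linalg.eigvals(L_low).real)
diffs = np.sort([R[p, p] - R[q, q] for q in range(n) for p in range(q + 1, n)])
assert np.allclose(eigs, diffs)
```

Under a suitable ordering of the strictly lower entries the operator is triangular, which is why its eigenvalues coincide with the diagonal entries \(r_{p,p} - r_{q,q}\).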

The perturbed Schur decomposition is

$$\displaystyle{\mathbf{A} + \mathbf{E} = \mathbf{U}(\mathbf{I} + \mathbf{X})(\mathbf{R} + \mathbf{Z})(\mathbf{I} + \mathbf{X}^{\mathrm{H}})\mathbf{U}^{\mathrm{H}},}$$

where

$$\displaystyle{\mathbf{I} + \mathbf{X} \in \mathcal{U}(n),\ \mathrm{Low}(\mathbf{Z}) = \mathbf{0}.}$$

The corresponding vector majorant equation is equivalent to a 6th degree algebraic equation. After certain manipulations it is replaced by a quadratic equation which yields explicit nonlocal perturbation estimates of type (7.35), see [47, 52].

5.6 Anti-triangular Schur Decomposition

The anti-triangular Schur decomposition of the matrix \(\mathbf{A} \in \mathbb{C}^{n\times n}\) is described in [65]:

$$\displaystyle{\mathbf{A} = \overline{\mathbf{U}}\mathbf{R}\mathbf{U}^{\mathrm{H}},\ \mathbf{U} \in \mathcal{U}(n),}$$

where the matrix R is anti-triangular,

$$\displaystyle{\mathbf{R} = \left [\begin{array}{ccccc} 0 & 0 &\ldots & 0 & r_{1,n} \\ 0 & 0 &\ldots & r_{2,n-1} & r_{2,n}\\ \vdots & \vdots &\ddots & \vdots & \vdots \\ 0 &r_{n-1,2} & \ldots & r_{n-1,n-1} & r_{n-1,n} \\ r_{n,1} & r_{n,2} & \ldots & r_{n,n-1} & r_{n,n} \end{array} \right ]}$$

This decomposition arises in solving palindromic eigenvalue problems [65].

Special matrix splittings have been derived for the perturbation analysis of this decomposition, and a variant of MSOLM for this purpose has recently been presented in [15].

5.7 Hamiltonian-Schur Decomposition

The Hamiltonian-Schur decomposition of a Hamiltonian matrix

$$\displaystyle{\mathbf{A} = \left [\begin{array}{cc} \mathbf{A}_{1} & \mathbf{A}_{2} \\ \mathbf{A}_{3} & -\mathbf{A}_{1}^{\mathrm{H}} \end{array} \right ] \in \mathbb{C}^{2n\times 2n},\ \mathbf{A}_{ 2}^{\mathrm{H}} = \mathbf{A}_{ 2},\ \mathbf{A}_{3}^{\mathrm{H}} = \mathbf{A}_{ 3}}$$

is considered in [64]. When A has no purely imaginary eigenvalues there exists a matrix

$$\displaystyle{\mathbf{U} = \left [\begin{array}{rr} \mathbf{U}_{1} & \mathbf{U}_{2} \\ -\mathbf{U}_{2} & \mathbf{U}_{1} \end{array} \right ] \in \mathcal{U}\mathcal{S}(2n),}$$

where \(\mathcal{U}\mathcal{S}(2n) \subset \mathcal{U}(2n)\) is the group of unitary symplectic matrices, such that

$$\displaystyle{\mathbf{R}:= \mathbf{U}^{\mathrm{H}}\mathbf{A}\mathbf{U} = \left [\begin{array}{cc} \mathbf{R}_{1} & \mathbf{R}_{2} \\ \mathbf{0} & -\mathbf{R}_{1}^{\mathrm{H}} \end{array} \right ],\ \mathrm{Low}(\mathbf{R}_{1}) = \mathbf{0},\ \mathbf{R}_{2}^{\mathrm{H}} = \mathbf{R}_{ 2}.}$$

Less condensed forms

$$\displaystyle{\hat{\mathbf{R}} = \left [\begin{array}{cc} \hat{\mathbf{R}}_{1,1} & \hat{\mathbf{R}}_{1,2} \\ \mathbf{0} &\hat{\mathbf{R}}_{2,2} \end{array} \right ]}$$

(block Hamiltonian-Schur form relative to \(\mathcal{U}\mathcal{S}(2n)\) and block Schur form relative to \(\mathcal{U}(2n)\)) of Hamiltonian matrices are introduced in [40]. The (1,1) and (2,2) blocks in \(\hat{\mathbf{R}}\) are less structured in comparison with the corresponding blocks of R.

Local and nonlocal perturbation analysis of the Hamiltonian-Schur and block-Schur forms of Hamiltonian matrices using MSOLM is presented in [40, 45]. For Hamiltonian-Schur (resp. block-Schur) forms the vector Lyapunov majorant h has 4 components and the vector majorant equation is equivalent to an 8th degree (resp. 6th degree) algebraic equation. After certain manipulations it is replaced by a bi-quadratic equation which yields explicit nonlocal perturbation bounds.

6 Control Problems

Perturbation analysis of control problems has been the subject of many investigations [45, 58, 73, 90]. Here we briefly mention two such major problems.

6.1 Unitary Canonical Forms

Unitary canonical forms (UCF) of linear time-invariant control systems

$$\displaystyle{\mathbf{x}'(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t),}$$

where \(\mathbf{x}(t) \in \mathbb{C}^{n}\) is the state vector, \(\mathbf{u}(t) \in \mathbb{C}^{m}\) is the control vector and

$$\displaystyle{\mathbf{A} \in \mathbb{C}^{n\times n},\ \mathbf{B} \in \mathbb{C}^{n\times m},\ \mathrm{rank}(\mathbf{B}) = m < n,}$$

have been introduced in [48, 61]. The rigorous definition of these forms is given in [48]. The perturbation analysis of UCF is done in [57, 58, 74, 90] by MSOLM. We stress that UCF now play a major role in the analysis and synthesis of linear time-invariant systems.

The action of the group \(\mathcal{U}(n)\) on the set of controllable matrix pairs \([\mathbf{A},\mathbf{B}) \in \mathbb{C}^{n\times n} \times \mathbb{C}^{n\times m}\) is given by

$$\displaystyle{[\mathbf{A},\mathbf{B})\mapsto [\mathbf{R},\mathbf{T}):= [\mathbf{U}^{\mathrm{H}}\mathbf{A}\mathbf{U},\mathbf{U}^{\mathrm{H}}\mathbf{B}),\ \mathbf{U} \in \mathcal{U}(n).}$$

(We prefer to denote the system matrices by B and A instead of the more ‘consistent’ notation \(\mathbf{A}_{1}\) and \(\mathbf{A}_{2}\).)

The canonical pair [R, T) has a very involved structure depending on the controllability indexes of [A, B), see [48]. In particular the matrix \([\mathbf{T},\mathbf{R}] \in \mathbb{C}^{n\times (n+m)}\) is block upper triangular. Hence R is a block Hessenberg matrix, as shown below:

$$\displaystyle{[\mathbf{T},\mathbf{R}] = \left [\begin{array}{rrrrrrr} \mathbf{T}_{1,0} & \mathbf{R}_{1,1} & \mathbf{R}_{1,2} & \ldots & \mathbf{R}_{1,p-2} & \mathbf{R}_{1,p-1} & \mathbf{R}_{1,p} \\ \mathbf{0}&\mathbf{R}_{2,1} & \mathbf{R}_{2,2} & \ldots & \mathbf{R}_{2,p-2} & \mathbf{R}_{2,p-1} & \mathbf{R}_{2,p} \\ \mathbf{0}& \mathbf{0}&\mathbf{R}_{3,2} & \ldots & \mathbf{R}_{3,p-2} & \mathbf{R}_{3,p-1} & \mathbf{R}_{3,p}\\ \vdots & \vdots & \vdots &\ddots & \vdots & \vdots &\vdots \\ \mathbf{0}& \mathbf{0}& \mathbf{0}&\ldots &\mathbf{R}_{p-1,p-2} & \mathbf{R}_{p-1,p-1} & \mathbf{R}_{p-1,p} \\ \mathbf{0}& \mathbf{0}& \mathbf{0}&\ldots & \mathbf{0}& \mathbf{R}_{p,p-1} & \mathbf{R}_{p,p} \end{array} \right ]}$$

where \(\mathbf{T}_{1,0} \in \mathbb{R}^{m_{1}\times m_{0}}\), \(\mathbf{R}_{k,k-1} \in \mathbb{R}^{m_{k}\times m_{k-1}}\), \(m_{1} = m\) and

$$\displaystyle{\mathrm{Low}(\mathbf{T}_{1,0}) = \mathbf{0},\ \ \mathrm{Low}(\mathbf{R}_{k,k-1}) = \mathbf{0}\ (k = 1,2,\ldots,p).}$$

Here p > 1 is the controllability index and \(m_{1} \geq m_{2} \geq \cdots \geq m_{p} \geq 1\) are the conjugate Kronecker indexes of the pair [A, B). Note that in the generic case \(m = m_{1} = m_{2} = \cdots = m_{p-1}\).
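The indexes \(m_{k}\) can also be computed as rank increments of the controllability matrices \([\mathbf{B},\mathbf{A}\mathbf{B},\ldots,\mathbf{A}^{k-1}\mathbf{B}]\), a standard characterization. The helper below is our illustrative sketch of this fact, not the algorithm of [48] (which works with unitary staircase reductions rather than explicit ranks):

```python
import numpy as np

def kronecker_indices(A, B, tol=1e-10):
    """Block sizes m_1 >= m_2 >= ... of the staircase (block Hessenberg)
    form, computed as rank increments of [B, AB, ..., A^{k-1} B].
    Hypothetical helper for illustration; returns a partial list when
    the pair [A, B) is not controllable."""
    n = A.shape[0]
    C = B.copy()          # growing controllability matrix
    prev = 0              # rank of the previous controllability matrix
    ms = []
    while prev < n:
        r = np.linalg.matrix_rank(C, tol=tol)
        mk = r - prev
        if mk == 0:
            break         # rank stagnated: pair not controllable
        ms.append(mk)
        prev = r
        C = np.hstack([C, np.linalg.matrix_power(A, len(ms)) @ B])
    return ms
```

For a single-input chain (A the down-shift matrix, B the first unit vector) this returns [1, 1, ..., 1] with p = n, matching the generic-case remark above.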

The complete set of arithmetic and algebraic invariants relative to the action of various unitary (orthogonal in particular) matrix groups is described in [45, 48].

Consider the perturbed pair \([\mathbf{A} + \mathbf{E}_{2},\mathbf{B} + \mathbf{E}_{1})\) with

$$\displaystyle{\|\mathbf{E}_{1}\| \leq \delta _{1},\ \|\mathbf{E}_{2}\|_{2} \leq \delta _{2},}$$

which is reduced to UCF \([\mathbf{R} + \mathbf{Z},\mathbf{T} + \mathbf{Y})\) by the transformation matrix \(\mathbf{U}(\mathbf{I} + \mathbf{X}) \in \mathcal{U}(n)\). The perturbation problem here is to estimate the norms of X, Y and Z as functions of the perturbation vector \(\delta = [\delta _{1};\delta _{2}] \in \mathbb{R}_{+}^{2}\). Local and nonlocal perturbation estimates for this problem are presented in [45, 57, 58, 73, 90]. The most general and tight results are those given in [58].

6.2 Modal Control

Consider the linear time-invariant system

$$\displaystyle{\mathbf{x}'(t) = \mathbf{A}\mathbf{x}(t) + \mathbf{B}\mathbf{u}(t),\ \mathbf{y}(t) = \mathbf{C}\mathbf{x}(t),}$$

where \((\mathbf{C},\mathbf{A},\mathbf{B}) \in \mathbb{C}^{r\times n} \times \mathbb{C}^{n\times n} \times \mathbb{C}^{n\times m}\) and mr ≥ n, r ≤ n, m < n. We suppose that the triple (C, A, B) is complete, i.e. that the pair [A, B) is controllable and the pair (C, A] is observable.

The static feedback u(t) = Ky(t) results in the closed-loop system

$$\displaystyle{\mathbf{x}'(t) = (\mathbf{A} + \mathbf{B}\mathbf{K}\mathbf{C})\mathbf{x}(t).}$$

The purpose of modal control is to find an output feedback matrix \(\mathbf{K} \in \mathbb{C}^{m\times r}\) such that the closed-loop system matrix A + BKC has certain desirable properties. In particular it should have a prescribed set \(\{\lambda _{1},\lambda _{2},\ldots,\lambda _{n}\}\) of eigenvalues (pole assignment synthesis), or, more generally, should be similar to a given matrix \(\mathbf{D} \in \mathbb{C}^{n\times n}\), i.e.

$$\displaystyle{ \mathbf{U}^{-1}(\mathbf{A} + \mathbf{B}\mathbf{K}\mathbf{C})\mathbf{U} = \mathbf{D} }$$
(7.36)

for some U ∈ Γ, where the matrix group \(\varGamma \subset \mathbb{C}^{n\times n}\) is either the unitary group \(\mathcal{U}(n)\) or the group \(\mathcal{G}\mathcal{L}(n)\) of invertible matrices. In particular we may choose a desired form D with Low(D) = 0 and with diagonal elements \(d_{k,k} = \lambda _{k}\) \((k = 1,2,\ldots,n)\). We suppose that \(\varGamma = \mathcal{U}(n)\) in order to achieve reliability of the numerical procedure for feedback synthesis, as proposed in [77].
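For the special case of full state feedback (r = n, C = I) a numerical pole assignment routine is available in SciPy. Note that `scipy.signal.place_poles` uses the closed-loop convention A − BK rather than A + BKC, so the sign of the gain is flipped relative to (7.36). A minimal sketch with random data (our choice of sizes and poles):

```python
import numpy as np
from scipy.signal import place_poles

rng = np.random.default_rng(3)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))   # random [A, B) is controllable a.s.

desired = np.array([-4.0, -3.0, -2.0, -1.0])
res = place_poles(A, B, desired)  # state feedback: C = I, r = n
K = res.gain_matrix               # closed-loop matrix is A - B K

closed = np.sort(np.linalg.eigvals(A - B @ K).real)
assert np.allclose(closed, desired, atol=1e-6)
```

With m > 1 the gain K is not unique; SciPy's default method exploits the remaining freedom to improve the robustness of the assigned eigenstructure, in the spirit of the design freedom discussed below.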

Conditions for solvability of the problem of determining U and K from (7.36) are given in [45, 77]. When the problem is solvable its solution for K is an (mrn)-parametric algebraic variety \(\mathcal{K}(\mathbf{C},\mathbf{A},\mathbf{B},\mathbf{D}) \subset \mathbb{C}^{m\times r}\).

If the data (C, A, B) and D are perturbed to \((\mathbf{C} + \mathbf{E}_{3},\mathbf{A} + \mathbf{E}_{2},\mathbf{B} + \mathbf{E}_{1})\) and \(\mathbf{D} + \mathbf{E}_{4}\) with \(\|\mathbf{E}_{k}\| \leq \delta _{k}\) (k = 1, 2, 3, 4), then under certain conditions the perturbed problem

$$\displaystyle{(\mathbf{I} + \mathbf{X}^{\mathrm{H}})\mathbf{U}^{\mathrm{H}}(\mathbf{A} + \mathbf{E}_{ 2} + (\mathbf{B} + \mathbf{E}_{1})(\mathbf{K} + \mathbf{Z})(\mathbf{C} + \mathbf{E}_{3}))\mathbf{U}(\mathbf{I} + \mathbf{X}) = \mathbf{D} + \mathbf{E}_{4}}$$

has a solution Z, X with \(\mathbf{I} + \mathbf{X} \in \mathcal{U}(n)\). The task now is to estimate the quantities \(\|\mathbf{Z}\|\) and \(\|\mathbf{X}\|\) as functions of the perturbation vector \(\delta = [\delta _{1};\delta _{2};\delta _{3};\delta _{4}] \in \mathbb{R}_{+}^{4}\).

Perturbation analysis for the pole assignment synthesis problem is presented in [91] for the particular case when r = n and the desired poles \(\lambda _{k}\) are pairwise distinct, using specific matrix techniques. However, this restriction on \(\lambda _{k}\) is not necessary, and more general and tighter perturbation bounds may be derived. This is done in [53] using MSOLM.

An important feature of the feedback synthesis of linear systems is the possibility of using the freedom in the solution \(\mathbf{K} \in \mathcal{K}(\mathbf{C},\mathbf{A},\mathbf{B},\mathbf{D})\) when mr > n for other design purposes. A reliable algorithm for this purpose is proposed in [77].