1 Introduction

The algebraic and differential Riccati equations (AREs/DREs) play a fundamental role in the solution of problems in systems and control theory. They have found widespread applications in applied mathematics and engineering, many of which can be found in the monographs [1, 36, 68, 86]. In this chapter, we focus on Riccati equations associated to control problems, as these have always inspired Volker Mehrmann’s work, and he has mainly focused on the resulting symmetric Riccati equations – symmetric in the sense that the associated Riccati operators map symmetric (Hermitian) matrices onto symmetric (Hermitian) matrices. Hence, also the solutions to the Riccati equations we will consider are expected to be symmetric (Hermitian). A class of nonsymmetric AREs that arises, e.g., in queuing theory, certain fluid flow problems, and transport theory (see, e.g., [36]) is of importance as well, but for conciseness, we will omit these AREs even though Volker has also contributed to this area [81].

In most of the literature on AREs and DREs, the motivation is taken from the classical linear-quadratic regulator (LQR) problem. This was the topic of Volker’s habilitation thesis [74], where, building upon earlier work by Bender and Laub [6, 7], he extended the LQR theory to so-called descriptor systems, i.e., systems with dynamics described by differential-algebraic equations (DAEs). Much of Volker’s early work on AREs culminated in this thesis, which later became the acclaimed book [76]. We therefore also start by formulating the LQR problems in continuous and discrete time and describing how they relate to AREs. These optimal control problems can be formulated at various levels of generality. The setting we consider is:

minimize

$$\displaystyle{ \mathcal{J} (x^{0},u)\ =\ \frac{1} {2}\int \limits _{0}^{\infty }\left (y(t)^{T}\mathit{Qy}(t) + 2y(t)^{T}\mathit{Su}(t) + u(t)^{T}\mathit{Ru}(t)\right )\mathit{dt} }$$
(4.1)

subject to the linear time-invariant (LTI) system

$$\displaystyle\begin{array}{rcl} E\dot{x}(t)& =& \mathit{Ax}(t) + \mathit{Bu}(t),\qquad x(0) = x^{0},{}\end{array}$$
(4.2a)
$$\displaystyle\begin{array}{rcl} y(t)& =& \mathit{Cx}(t),{}\end{array}$$
(4.2b)

in the continuous-time case and

minimize

$$\displaystyle{ \mathcal{J} (x^{0},u)\ =\ \frac{1} {2}\sum _{k=0}^{\infty }\left (y_{ k}^{T}\mathit{Qy}_{ k} + 2y_{k}^{T}\mathit{Su}_{ k} + u_{k}^{T}\mathit{Ru}_{ k}\right ) }$$
(4.3)

subject to the discrete-time LTI system

$$\displaystyle\begin{array}{rcl} \mathit{Ex}_{k+1}& =& \mathit{Ax}_{k} + \mathit{Bu}_{k},\qquad x_{0} = x^{0},{}\end{array}$$
(4.4a)
$$\displaystyle\begin{array}{rcl} y_{k}& =& \mathit{Cx}_{k},{}\end{array}$$
(4.4b)

in the discrete-time case. In both settings, \(A,E \in \mathbb{R}^{n\times n}\), \(B \in \mathbb{R}^{n\times m}\), \(C \in \mathbb{R}^{p\times n}\), \(Q \in \mathbb{R}^{p\times p}\), \(R \in \mathbb{R}^{m\times m}\), and \(S \in \mathbb{R}^{p\times m}\). Furthermore, we assume Q and R to be symmetric. In both cases, the initial state \(x^{0} \in \mathbb{R}^{n}\) can be chosen freely if E is nonsingular and is constrained to a manifold in the descriptor case; see Chap. 16 for more details on this. In the continuous-time case, u is considered as an element of an appropriate function space such as the k times continuously differentiable functions \(C^{k}(0,\infty )\) or the square-integrable functions \(L^{2}(0,\infty )\). No further constraints are imposed in this setting. In discrete time, u represents a sequence \((u_{k})_{k=0}^{\infty }\) that should be (square-)summable in an appropriate sense. A formulation in complex arithmetic is possible, and most of the results considered here remain valid (cf. [68] for a detailed treatment of real and complex AREs), but as most practical applications are formulated using real data, we stick to this case here.

Under fairly general conditions, the LQR problems have solutions of the form \(u(t) = \mathcal{F}_{c}(x(t))\) and \(u_{k} = \mathcal{F}_{d}(x_{k})\), respectively. As they appear in feedback form, that is, the current control input depends on the current state information, this observation lays the basis for modern (feedback) control [54]. Possible sufficient conditions to obtain such feedback solutions are: E is nonsingular, Q and R are positive definite, S is “small” enough such that \(\left [\begin{matrix}\scriptstyle Q &\scriptstyle S \\ \scriptstyle S^{T}&\scriptstyle R\end{matrix}\right ]\) is positive semidefinite, and the matrix triples (E, A, B) and (E, A, C) are stabilizable and detectable, respectively. Here, stabilizable means that \(\mathop{\mathrm{rank}}\nolimits [A -\lambda E,B] = n\) for all \(\lambda \in \{ z \in \mathbb{C}\,\vert \,\mathfrak{R}(z) \geq 0\}\) in the continuous-time case and for all \(\lambda \in \{ z \in \mathbb{C}\,\vert \,\vert z\vert \geq 1\}\) in the discrete-time case, while detectability of (E, A, C) is equivalent to stabilizability of \((E^{T},A^{T},C^{T})\). Under these conditions, the LQR problems have unique solutions. These are given by the feedback laws

$$\displaystyle{ u(t) = -R^{-1}(B^{T}X_{ c}E + S^{T}C)x(t) =: -F_{ c}x(t),\qquad t \geq 0, }$$
(4.5)

in the continuous-time case and

$$\displaystyle{ u_{k} = -(R + B^{T}X_{ d}B)^{-1}(B^{T}X_{ d}A + S^{T}C)x_{ k} =: -F_{d}x_{k},\quad k = 0,1,\ldots, }$$
(4.6)

in the discrete-time case. Now, the relation of the LQR problem to AREs becomes evident as X c is the unique stabilizing solution to the (generalized) continuous-time algebraic Riccati equation (CARE)

$$\displaystyle{ \begin{array}{ll} 0 = \mathcal{R}_{c}(X)&:= C^{T}\mathit{QC} + A^{T}\mathit{XE} + E^{T}\mathit{XA}- \\ &\qquad \qquad \quad (E^{T}\mathit{XB} + C^{T}S)R^{-1}(B^{T}\mathit{XE} + S^{T}C), \end{array} }$$
(4.7)

while X d is the unique stabilizing solution of the (generalized) discrete-time algebraic Riccati equation (DARE)

$$\displaystyle{ \begin{array}{ll} 0 = \mathcal{R}_{d}(X)&:= C^{T}\mathit{QC} + A^{T}\mathit{XA} - E^{T}\mathit{XE}- \\ &\qquad \qquad \quad (A^{T}\mathit{XB} + C^{T}S)(R + B^{T}\mathit{XB})^{-1}(B^{T}\mathit{XA} + S^{T}C). \end{array} }$$
(4.8)

The solutions X c and X d are stabilizing in the sense that the feedback solutions generated by inserting (4.5) and (4.6) into (4.2) and (4.4), respectively, are asymptotically stable, i.e., x(t), x k  → 0 for t, k → ∞ and all initial values \(x^{0} \in \mathbb{R}^{n}\). Under the given assumptions, CAREs and DAREs have exactly one stabilizing solution, despite the fact that there may exist many other solutions [68]. These stabilizing solutions are symmetric and positive semidefinite; the latter property again uniquely identifies the solutions X c , X d in the respective solution sets.
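For small dense problems with nonsingular E, the stabilizing solutions and feedback gains can be computed with standard software. The following minimal sketch uses SciPy's ARE solvers on a hypothetical randomly generated test system with \(C = I_{n}\) and S = 0 (so that \(C^{T}\mathit{QC}\) reduces to Q); the gain formulas mirror (4.5) and (4.6), and all names are illustrative rather than taken from any benchmark.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_discrete_are

rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, m))
E = np.eye(n)                  # nonsingular E; here simply the identity
Q = np.eye(n)                  # with C = I_n the state weight C^T Q C is just Q
R = np.eye(m)

# Continuous time: stabilizing solution X_c of (4.7) and gain F_c from (4.5)
Xc = solve_continuous_are(A, B, Q, R, e=E)
Fc = np.linalg.solve(R, B.T @ Xc @ E)
print(np.linalg.eigvals(A - B @ Fc).real.max())     # < 0: closed loop asymptotically stable

# Discrete time: stabilizing solution X_d of (4.8) and gain F_d from (4.6)
Xd = solve_discrete_are(A, B, Q, R, e=E)
Fd = np.linalg.solve(R + B.T @ Xd @ B, B.T @ Xd @ A)
print(np.abs(np.linalg.eigvals(A - B @ Fd)).max())  # < 1: closed loop asymptotically stable
```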

In his work, Volker has often considered the case that E is singular. This is in particular the case in his habilitation thesis and the resulting book [76]. In general, in this case the relation between the AREs (4.7) and (4.8) and the feedback solutions to the LQR problems is lost. Several modifications of the AREs to re-establish this connection have been suggested in the literature. They usually require special conditions, and the resulting generalized AREs are often not easily solvable numerically. Only recently, efficient methods for a class of generalized CAREs with singular E have been suggested in [35]. Similarly to [6, 7], Volker shows in [76, § 5] how the LQR problem for descriptor systems (i.e., E singular) can be reduced to a problem with nonsingular E for which the Riccati solutions exist. Nevertheless, this procedure requires strong conditions, in particular quite complicated consistency conditions for the initial values, that are often not satisfied in practice. This observation, among others, led Volker to work on alternative approaches to solve the LQR problem for descriptor systems, avoiding AREs completely. This is the topic of [63, 64, 66, 67], where also extensions to systems with time-varying coefficients are considered and the theory of strangeness-free DAEs [65] is applied to the LQR problem. We will not further discuss these fundamental contributions to optimal control for DAEs, as they do not directly relate to Riccati equations anymore. Recently, a solution concept for LQR problems based on purely algebraic considerations was derived in the Ph.D. thesis of Matthias Voigt [88]. This theory relates the feedback solution of the LQR problem for descriptor systems to dissipation inequalities, even matrix pencils (see the Chaps. 6 and 12), and the stabilizing solutions of Lur’e equations. This work may complete three decades of the quest for a general algebraic solution theory for the LQR problem that does not require a special index concept or restrictions on the index of the underlying DAE. Notably, Matthias is Volker’s academic grandchild!

In [76], Volker also considers LQR problems with finite time horizon, i.e., the cost functionals are replaced by finite integrals \(\int _{0}^{T}\), where \(0 < T < \infty \), or finite sums \(\sum _{k=0}^{K-1}\), \(K \in \mathbb{N}\). In this situation, usually the final states x(T), x K are also penalized in the cost functional. Often, the LQR problem is used for stabilization problems so that it is desired that the final state is close to zero, which suggests a positive definite quadratic weighting function. This is particularly the case when the modeling is done in such a way that x represents the deviation of the state of a dynamical system from a desired state. In the continuous-time case, the LQR problem can then be formulated as follows:

minimize

$$\displaystyle{\mathcal{J} (x^{0},u)\ =\ \frac{1} {2}\left (x(T)^{T}\mathit{Mx}(T) +\int \limits _{ 0}^{T}\left (y(t)^{T}\mathit{Qy}(t) + 2y(t)^{T}\mathit{Su}(t) + u(t)^{T}\mathit{Ru}(t)\right )\mathit{dt}\right )}$$

subject to (4.2)

with \(M = M^{T} \in \mathbb{R}^{n\times n}\) positive semidefinite. Note that for consistency, one could write the penalty term at T in terms of y(T) instead of x(T). As we can replace y(T) by Cx(T) using (4.2), this is merely a notational issue.

Under similar conditions as in the infinite-time horizon case, the unique solution of the finite-time LQR problem is then given by the feedback control

$$\displaystyle{ u(t) = -R^{-1}(B^{T}X(t)E + S^{T}C)x(t) =: -F_{ c}(t)x(t),\qquad t \geq 0, }$$
(4.9)

where X(t) is the unique solution of the DRE

$$\displaystyle{ -E^{T}\dot{X}(t)E = \mathcal{R}_{ c}(X(t)) }$$
(4.10)

with terminal condition \(E^{T}X(T)E = M\) and \(\mathcal{R}_{c}\) as in (4.7). The existence and uniqueness of the DRE solution is considered in [1, 86]. The theory can also be extended to time-varying systems, i.e., systems with A = A(t), etc. In general, if E is singular, again the relation of the solution to the finite-time LQR problem and the DRE is lost, but it is re-established using a reduction procedure described by Volker in [76, § 3]. We will re-visit this technique, as well as a numerical procedure to solve (4.10) suggested by Kunkel and Mehrmann in [62], in the next section.

The finite horizon discrete-time LQR problem is also treated in [76]. This leads to the solution of difference Riccati equations of the form

$$\displaystyle{ -E^{T}X_{ k}E = \mathcal{R}_{d}(X_{k+1}) }$$
(4.11)

with terminal condition \(E^{T}X_{K}E = M\). As in the other cases considered so far, a reduction to the case E nonsingular is necessary to establish the link between the LQR problem and (4.11). Such a reduction procedure is suggested again in [76, § 3]. A numerical procedure to solve (4.11) is sketched in [76, § 19]. As (4.11) is solved backwards starting from X K , advancing from X k+1 to X k can be achieved by simply evaluating \(\mathcal{R}_{d}(X_{k+1})\) using, e.g., a Cholesky decomposition of \(R + B^{T}X_{k+1}B\), and then solving a linear system of equations using, e.g., the LU decomposition of E. Due to its conceptual simplicity, we will not discuss this approach here any further.
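Since each step only requires evaluating \(\mathcal{R}_{d}\) and solving with E, the backward sweep is easy to spell out. The following hypothetical helper follows (4.8) and (4.11) literally, using a Cholesky factorization of \(R + B^{T}X_{k+1}B\) and an LU factorization of E as indicated above; it is an illustration only, not the procedure from [76, § 19].

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve, lu_factor, lu_solve

def dare_operator(X, E, A, B, C, Q, R, S):
    """Evaluate R_d(X) from (4.8), using a Cholesky factorization of R + B^T X B."""
    W = A.T @ X @ B + C.T @ S
    cf = cho_factor(R + B.T @ X @ B)           # positive definite under the LQR assumptions
    return C.T @ Q @ C + A.T @ X @ A - E.T @ X @ E - W @ cho_solve(cf, W.T)

def difference_riccati_sweep(E, A, B, C, Q, R, S, M, K):
    """Backward sweep for (4.11): from E^T X_K E = M, compute X_k via
    -E^T X_k E = R_d(X_{k+1}) for k = K-1, ..., 0 (hypothetical helper)."""
    lu = lu_factor(E)
    solve_EtXE = lambda W: lu_solve(lu, lu_solve(lu, W, trans=1).T, trans=1).T  # E^{-T} W E^{-1}
    X = solve_EtXE(M)                           # terminal value X_K
    traj = [X]
    for _ in range(K):
        X = solve_EtXE(-dare_operator(X, E, A, B, C, Q, R, S))
        traj.append(X)
    return traj[::-1]                           # [X_0, ..., X_K]
```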

The remainder of this chapter is organized as follows. As already mentioned above, we will focus on Volker’s contributions to the solution of DREs in the next section. The relation between CAREs and DAREs is explored in Sect. 4.3. The bulk of this chapter summarizes Volker’s contributions to the numerical solution of AREs and comprises Sect. 4.4, while we briefly touch upon Volker’s passion to solve control problems avoiding AREs in Sect. 4.5. Final remarks are given in Sect. 4.6.

2 Numerical Methods for Differential Riccati Equations

In [62], the DRE (4.10) is considered with S = 0, and in the simplified representation

$$\displaystyle{ -E^{T}\dot{X}(t)E = F + A^{T}X(t)E + E^{T}X(t)A - E^{T}X(t)\mathit{GX}(t)E }$$
(4.12)

with terminal condition \(E^{T}X(T)E = M\), where even time-dependence of A, E, F, G is allowed under the assumption of sufficient smoothness.

The contributions of [62] and [76, §§ 3, 19] are twofold. First, conditions are derived under which the solution of the finite-time horizon LQR problem is given via the feedback control law (4.9), defined by the solution of the DRE. In the case of singular E, this requires a reduction procedure resulting in a DRE with a nonsingular E matrix. We will sketch this procedure briefly. Second, a numerical solution method for (4.12) is derived that resolves some singularity issues when (4.12) is solved in vectorized form as a DAE. In the notation derived later on by Kunkel and Mehrmann [65], it is assumed that the DAE \(E\dot{x}(t) = \mathit{Ax}(t)\) is strangeness-free (see Chap. 16 for details), (E, A) is a regular matrix pair, and \(\mathop{\mathrm{rank}}\nolimits (E)\) is constant on [0, T]. For ease of presentation, in the following we will only consider the time-invariant case, but the derivations in the time-varying case in [62] are formally completely analogous.

2.1 Reduction to a Standard LQR Problem

Basically, the singular value decomposition of E,

$$\displaystyle{ E = U\left [\begin{array}{*{10}c} \varSigma &0\\ 0 &0 \end{array} \right ]V ^{T} }$$
(4.13)

is the key step in the reduction procedure. Here, \(U,V \in \mathbb{R}^{n\times n}\) are orthogonal, \(\varSigma \in \mathbb{R}^{r\times r}\) is nonsingular, and \(r =\mathop{ \mathrm{rank}}\nolimits (E)\). Note that we do not necessarily need a reduction of Σ to diagonal form; a URV-type decomposition with a nonsingular Σ suffices. Therefore, we do not assume symmetry of Σ in the following. If we now insert the SVD of E into (4.2), multiply the first equation by \(U^{T}\) from the left, and use

$$\displaystyle{U^{T}\mathit{AV } = \left [\begin{array}{*{10}c} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array} \right ],\quad U^{T}B = \left [\begin{array}{*{10}c} B_{1} \\ B_{2} \end{array} \right ],\quad \mathit{CV } = \left [\begin{array}{*{10}c} C_{1},\;C_{2} \end{array} \right ],}$$

as well as the change of basis \(\left [\begin{matrix}\scriptstyle x_{1} \\ \scriptstyle x_{2}\end{matrix}\right ] = V ^{T}x\) with the partitioning implied by the SVD (4.13), the descriptor system (4.2) becomes

$$\displaystyle\begin{array}{rcl} \varSigma \dot{x}_{1}(t)& =& A_{11}x_{1}(t) + A_{12}x_{2}(t) + B_{1}u(t),{}\end{array}$$
(4.14a)
$$\displaystyle\begin{array}{rcl} 0& =& A_{21}x_{1}(t) + A_{22}x_{2}(t) + B_{2}u(t),{}\end{array}$$
(4.14b)
$$\displaystyle\begin{array}{rcl} y(t)& =& C_{1}x_{1}(t) + C_{2}x_{2}(t).{}\end{array}$$
(4.14c)

The strangeness-freeness assumption implies that A 22 is nonsingular, so that we can solve the second of these equations for x 2 and insert the result in the first and third equations. This leads to the standard LTI system

$$\displaystyle\begin{array}{rcl} \varSigma \dot{x}_{1}(t)& =& \underbrace{\mathop{(A_{11} - A_{12}A_{22}^{-1}A_{21})}}\limits _{=:\ \hat{A}}x_{1}(t) +\underbrace{\mathop{ (B_{1} - A_{12}A_{22}^{-1}B_{2})}}\limits _{=:\ \hat{B}}u(t),{}\end{array}$$
(4.15)
$$\displaystyle\begin{array}{rcl} y(t)& =& \underbrace{\mathop{(C_{1} - C_{2}A_{22}^{-1}A_{21})}}\limits _{=:\ \hat{C}}x_{1}(t) +\underbrace{\mathop{ (-C_{2}A_{22}^{-1}B_{2})}}\limits _{=:\ \hat{D}}u(t).{}\end{array}$$
(4.16)

Now also performing the change of basis in the cost functional results in a standard finite-time horizon LQR problem which can be solved via the DRE

$$\displaystyle{ -\varSigma ^{T}\dot{\hat{X}}(t)\varSigma =\hat{ Q} +\varSigma ^{T}\hat{X}\hat{A} +\hat{ A}^{T}\hat{X}\varSigma -\left (\hat{B}^{T}\hat{X}\varSigma +\hat{ S}^{T}\right )^{T}\hat{R}^{-1}\left (\hat{B}^{T}\hat{X}\varSigma +\hat{ S}^{T}\right ) }$$
(4.17)

with terminal condition \(\varSigma ^{T}\hat{X}\varSigma =\hat{ M}\), where \(\hat{M}\) results from the terminal condition of the original LQR problem after the change of coordinates. Note that when \(B_{2}\not =0\), u(T) appears in the terminal penalty term which changes the nature of the problem. As M can usually be chosen freely, it is therefore convenient to assume in the partitioning of \(V ^{T}\mathit{MV } = \left [\begin{matrix}\scriptstyle M_{11} &\scriptstyle M_{12} \\ \scriptstyle M_{12}^{T}&\scriptstyle M_{ 22}\end{matrix}\right ]\) that \(M_{12} = 0,M_{22} = 0\). The coefficient matrices \(\hat{Q},\hat{S},\hat{R}\) in (4.17) can be read off from the cost functional after applying the change of coordinates and inserting the modified output equation (4.16) in the first integrand:

$$\displaystyle\begin{array}{rcl} y(t)^{T}\mathit{Qy}(t)& =& (\hat{C}x_{ 1}(t) +\hat{ D}u(t))^{T}Q(\hat{C}x_{ 1}(t) +\hat{ D}u(t)) {}\\ & =& x_{1}(t)^{T}\underbrace{\mathop{ \hat{C}^{T}Q\hat{C}}}\limits _{ =:\ \hat{Q}}x_{1}(t) + 2x_{1}(t)^{T}\underbrace{\mathop{ \hat{C}^{T}Q\hat{D}}}\limits _{ =:\ \hat{S}}u(t) + u(t)^{T}\hat{D}^{T}Q\hat{D}u(t).{}\\ \end{array}$$

With \(\hat{R} = R +\hat{ D}^{T}Q\hat{D}\) and the assumption that S = 0, the LQR theory for LTI systems yields the feedback solution via the DRE (4.17). In order for this reduction procedure to work, it is of course necessary that the modified matrices defining the LTI system and cost functional inherit properties like positive (semi)definiteness of \(\hat{Q},\hat{R}\) as well as the stabilizability and detectability properties. In a numerical procedure, this needs to be checked before the solution of the original LQR problem can be derived from that of the reduced LQR problem. While these properties often indeed carry over, a more severe restriction is caused by the consistency condition implied by (4.14b) for t = 0: if \(\left [\begin{matrix}\scriptstyle x_{1}^{0} \\ \scriptstyle x_{2}^{0}\end{matrix}\right ] = V ^{T}x^{0}\), then (4.14b) implies

$$\displaystyle{x_{2}^{0} = -A_{ 22}^{-1}\left (A_{ 21}x_{1}^{0} + B_{ 2}u(0)\right ).}$$

If \(B_{2}\not =0\), this is a restriction on the possible controls, while for B 2 = 0, it yields a consistency condition for the initial values. Whether or not this restricts the applicability of the reduction approach to the LQR problem for descriptor systems certainly depends on the application. It should also be noted that under certain conditions on the system matrices, higher-index DAE problems can be reduced to regular index-1 problems using output feedback. This topic is discussed in more detail in Chap. 15.
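To make the bookkeeping of the reduction (4.13)–(4.16) and of the modified cost coefficients \(\hat{Q},\hat{S},\hat{R}\) explicit, the following hypothetical helper transcribes the formulas above for the time-invariant, strangeness-free case with S = 0 in the original cost functional; it is a sketch, not production code.

```python
import numpy as np

def reduce_descriptor_lqr(E, A, B, C, Q, R, tol=1e-12):
    """SVD-based reduction (4.13)-(4.16) of the descriptor LQR problem
    to a standard one; returns the reduced system and cost data (sketch)."""
    U, s, Vt = np.linalg.svd(E)
    r = int(np.sum(s > tol * s[0]))              # numerical rank of E
    Sigma = np.diag(s[:r])
    UAV = U.T @ A @ Vt.T                         # U^T A V, partitioned as in (4.14)
    A11, A12, A21, A22 = UAV[:r, :r], UAV[:r, r:], UAV[r:, :r], UAV[r:, r:]
    B1, B2 = (U.T @ B)[:r, :], (U.T @ B)[r:, :]
    C1, C2 = (C @ Vt.T)[:, :r], (C @ Vt.T)[:, r:]
    A22_A21 = np.linalg.solve(A22, A21)          # A22 is nonsingular (strangeness-free case)
    A22_B2 = np.linalg.solve(A22, B2)
    A_hat = A11 - A12 @ A22_A21
    B_hat = B1 - A12 @ A22_B2
    C_hat = C1 - C2 @ A22_A21
    D_hat = -C2 @ A22_B2
    Q_hat = C_hat.T @ Q @ C_hat                  # coefficients of the reduced cost functional
    S_hat = C_hat.T @ Q @ D_hat
    R_hat = R + D_hat.T @ Q @ D_hat
    return Sigma, A_hat, B_hat, C_hat, D_hat, Q_hat, S_hat, R_hat
```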

Remark 1

The reduction procedure using the SVD (4.13) of E can also be applied directly to the DRE (4.12) without considering the LQR background. This basically leads again to the DRE (4.17) with additional consistency and solvability conditions, see [62, Section 3].

2.2 Numerical Solution of DREs

In [62] and [76, § 19], it is suggested to solve the DRE (4.12) by vectorization. This uses the “vec” operator

$$\displaystyle{\mathop{\mathrm{vec}}: \mathbb{R}^{\ell\times k} \rightarrow \mathbb{R}^{\ell k}\,: X\mapsto \mathop{\mathrm{vec}}(X),}$$

which stacks the columns of a matrix on top of each other. Vectorization of expressions like AXB is simplified by using the Kronecker product ⊗, see, e.g., [61, 69]. In particular, the formulas

$$\displaystyle{ \mathop{\mathrm{vec}}(\mathit{AXB}) = \left (B^{T} \otimes A\right )\mathop{\mathrm{vec}}(X),\quad \left (A^{T} \otimes B^{T}\right ) = \left (A \otimes B\right )^{T} }$$
(4.18)

are useful for our purposes. Applying vectorization to (4.12) and using (4.18) yields the system of ordinary differential equations (ODEs) or DAEs

$$\displaystyle{ \begin{array}{ll} -\left (E \otimes E\right )^{T}\mathop{ \mathrm{vec}}(\dot{X}(t))& =\mathop{ \mathrm{vec}}(F) + \left (\left (E \otimes A\right )^{T} + \left (A \otimes E\right )^{T}\right )\mathop{\mathrm{vec}}(X(t)) \\ &\qquad -\left (E \otimes E\right )^{T}\left (X(t) \otimes X(t)\right )\mathop{\mathrm{vec}}(G).\end{array} }$$
(4.19)

The terminal condition yields

$$\displaystyle{ \left (E \otimes E\right )^{T}\mathop{ \mathrm{vec}}(X(T)) =\mathop{ \mathrm{vec}}(M). }$$
(4.20)

Together, (4.19) and (4.20) form an initial value problem (in reverse time), with quadratic nonlinearity, for a system of ODEs if E is nonsingular and of DAEs if E is singular. (In the case n = 1, a classical Riccati differential equation is obtained, hence the name “differential Riccati equation”.)
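The identities (4.18) and the right-hand side of the vectorized DRE (4.19) are easy to check numerically against the matrix form (4.12). The following small sanity check uses hypothetical random coefficients, with X taken symmetric as it is for DRE solutions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
E, A = rng.standard_normal((n, n)), rng.standard_normal((n, n))
F = rng.standard_normal((n, n)); F = F + F.T
G = rng.standard_normal((n, n)); G = G + G.T
X = rng.standard_normal((n, n)); X = X + X.T

vec = lambda M: M.reshape(-1, order="F")          # column-stacking "vec" operator

# Matrix form of the right-hand side of (4.12)
rhs_matrix = F + A.T @ X @ E + E.T @ X @ A - E.T @ X @ G @ X @ E

# Vectorized form as in (4.19), built with vec(AXB) = (B^T kron A) vec(X) from (4.18)
EE = np.kron(E, E).T
AE = np.kron(E, A).T + np.kron(A, E).T
rhs_vec = vec(F) + AE @ vec(X) - EE @ np.kron(X, X) @ vec(G)

print(np.allclose(rhs_vec, vec(rhs_matrix)))      # True: both representations agree
```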

If E is nonsingular, this initial value problem can be solved by any integration method for ODEs. As (4.19) is then a system of \(n^{2}\) ODEs, already one time step of an explicit integrator would in general require \(\mathcal{O}(n^{4})\) floating point operations (flops) for the matrix-vector products. In an implicit integration technique, linear systems of equations need to be solved at a computational cost of \(\mathcal{O}(n^{6})\) flops. Therefore, this approach is limited to very small dimensions n, even if one exploits the symmetry of the DRE as suggested in [62] and works only with the \(n(n + 1)/2\) necessary equations, ignoring the redundant ones (which is possible, but the data handling is cumbersome). After this initial work on numerical methods for DREs by Kunkel and Mehrmann, it was suggested in [48] to rewrite ODE integrators in matrix form. This reduces the cost to \(\mathcal{O}(n^{3})\) flops in dense arithmetic. Due to the inherent stiffness of DREs, mostly implicit integrators are discussed in the literature. In [48], in particular, the backward differentiation formulas (BDF) were considered. This was followed up in [53], where an efficient implementation of the matrix-valued BDF for DREs was described. Later, this was extended to large-scale problems exploiting sparsity in the coefficient matrices in [29]. Other popular implicit integration techniques were also investigated by Volker’s academic descendants, e.g., the Rosenbrock methods in matrix form in [30, 70].
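The matrix-valued idea can be illustrated with the simplest implicit scheme: for E = I and \(G = \mathit{BR}^{-1}B^{T}\), one backward implicit Euler (BDF1) step applied to (4.12) amounts to an algebraic Riccati equation in the new iterate. The sketch below is a hypothetical, much simplified stand-in for the methods of [48, 53], which use higher-order BDF and a far more careful implementation; each step ARE is solved here with SciPy, taking the stabilizing solution, the usual choice in such schemes.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def dre_implicit_euler(A, B, R, F, M, T, steps):
    """Matrix-valued implicit Euler for the DRE (4.12) with E = I and G = B R^{-1} B^T,
    integrated backward in time from X(T) = M (illustrative sketch only)."""
    h = T / steps
    shift = np.eye(A.shape[0]) / (2.0 * h)
    X = M
    for _ in range(steps):
        # One step: 0 = (F + X/h) + (A - I/(2h))^T Z + Z (A - I/(2h)) - Z B R^{-1} B^T Z
        X = solve_continuous_are(A - shift, B, F + X / h, R)
        X = 0.5 * (X + X.T)             # enforce symmetry against roundoff
    return X                             # approximation to X(0)
```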

The case that E is singular is much more involved, though. Let \(\tilde{E}:= \left (E \otimes E\right )^{T}\) and \(\tilde{A}:= \left (E \otimes A\right )^{T} + \left (A \otimes E\right )^{T}\). Then it is shown in [62] that even in the simplest form of a strangeness-free DAE \(E\dot{x} = \mathit{Ax}\), the matrix pencil \(\lambda \tilde{E} -\tilde{ A}\) is singular. If the nilpotency index of (E, A) is 1, then so is the nilpotency index of \(\lambda \tilde{E} -\tilde{ A}\), and the DRE (4.19) can be solved by simply omitting the singular “zero blocks”. A procedure to achieve this is suggested in [62, Section 5]. Apart from the SVD (4.13) and the associated reduction, it includes consistency checks and requires additional algebraic equations to be solved. The reduced DRE is then solved by standard DAE integrators for index-1 systems. Also, a variant for time-dependent coefficients \(A(t),E(t),\ldots\) is provided in [62]. If the nilpotency index of the regular matrix pencil \(\lambda E - A\) is larger than 1, then in addition to singularity of \(\lambda \tilde{E} -\tilde{ A}\), this matrix pencil may even have a larger nilpotency index, and also different kinds of singular blocks. Therefore, a numerical procedure for this situation cannot be derived easily, and to the best of this author’s knowledge this is still an open problem. An interesting question for further research would be whether the singularity problem can be avoided in matrix-valued DRE solvers. Recently, a first step in this direction was proposed by another of Volker’s many Ph.D. students, Jan Heiland. In his thesis [58], he suggests a matrix-valued implicit Euler scheme for a special index-2 DRE, which he calls a differential-algebraic Riccati equation and which arises in the finite-time horizon LQR problem for flow control problems. Further research in the area of matrix-valued solvers for DREs or differential-algebraic Riccati equations is certainly needed.

3 A Unified Treatment of CAREs and DAREs?

A unified treatment of discrete- and continuous-time algebraic Riccati equations is discussed in [77]. We will recall the main result from this paper related to AREs: a then-new result on the solution of DAREs using the idea of a unified treatment of continuous- and discrete-time control problems. As this result was derived in complex arithmetic, we will (in contrast to the other sections) also formulate the results of this section in complex arithmetic. For this, we denote by \(M^{H}\) the complex conjugate transpose of a matrix or vector M, while \(\overline{\mu }\) denotes as usual the complex conjugate of the scalar \(\mu \in \mathbb{C}\).

In addition to the Hamiltonian and symplectic matrices introduced already in Chap. 1, we will need the following structures in this section.

Definition 1

Let \(J = \left [\begin{matrix}\scriptstyle 0_{n} &\scriptstyle I_{n} \\ \scriptstyle -I_{n}&\scriptstyle 0_{n}\end{matrix}\right ] \in \mathbb{R}^{2n\times 2n}\) with I n , 0 n the identity and zero matrices of order n.

  1. (a)

    A matrix pencil \(\lambda K - H\) is Hamiltonian iff \(\mathit{KJH}^{H} + \mathit{HJK}^{H} = 0\).

  2. (b)

    A matrix pencil \(\lambda T - S\) is symplectic iff \(\mathit{TJT}^{H} = \mathit{SJS}^{H}\).

In this and the next section, a special class of n-dimensional subspaces in \(\mathbb{C}^{2n}\) plays a prominent role.

Definition 2

Let \(\mathcal{L}\subset \mathbb{C}^{2n}\) and \(\dim (\mathcal{L}) = n\). Then \(\mathcal{L}\) is Lagrangian if \(x^{H}\mathit{Jy} = 0\) for all \(x,y \in \mathcal{L}\).

A typical example of a Lagrangian subspace is the graph subspace spanned by the columns of \(\left [\begin{matrix}\scriptstyle I_{n} \\ \scriptstyle M\end{matrix}\right ]\) for a Hermitian matrix M; see Chap. 5 for more on the use of graph subspaces.

If we now consider the continuous- and discrete-time LQR problems from Sect. 4.1 with S = 0 for simplicity, and the associated AREs, then we have the following well-known observation: let E = I n , and define F: = C H QC, \(G = \mathit{BR}^{-1}B^{H}\), and

$$\displaystyle{ H = \left [\begin{array}{*{10}c} A& G\\ F &-A^{H} \end{array} \right ],\qquad \lambda T-S:=\lambda \left [\begin{array}{*{10}c} I_{n}&-G \\ 0_{n}& A^{H} \end{array} \right ]-\left [\begin{array}{*{10}c} A&0_{n} \\ F &I_{n} \end{array} \right ]. }$$
(4.21)

Then H is Hamiltonian and \(\lambda T - S\) is symplectic. We then have the following result, which is a collection of results that can be found, e.g., in [68]:

Proposition 1

Let \(E = I_{n},S = 0\) , and H,S,T as in (4.21).

  1. (a)

    Assume that \(X_{c} = X_{c}^{H}\) is a solution to the CARE (4.7) , then the columns of \(\left [\begin{matrix}\scriptstyle I_{n} \\ \scriptstyle -X_{c}\end{matrix}\right ]\) span a Lagrangian H-invariant subspace. On the other hand, if the columns of \(\left [\begin{matrix}\scriptstyle U \\ \scriptstyle V \end{matrix}\right ]\) span a Lagrangian H-invariant subspace with \(U \in \mathbb{C}^{n\times n}\) invertible, then \(X_{c} = -\mathit{VU}^{-1}\) is a Hermitian solution to the CARE (4.7) .

  2. (b)

    Let A be nonsingular. Assume that \(X_{d} = X_{d}^{H}\) is a solution to the DARE (4.8) , then the columns of \(\left [\begin{matrix}\scriptstyle I_{n} \\ \scriptstyle -X_{d}\end{matrix}\right ]\) span a Lagrangian deflating subspace of \(\lambda T - S\) . On the other hand, if the columns of \(\left [\begin{matrix}\scriptstyle U \\ \scriptstyle V \end{matrix}\right ]\) span a Lagrangian deflating subspace of λT − S with \(U \in \mathbb{C}^{n\times n}\) invertible, then \(X_{d} = -\mathit{VU}^{-1}\) is a Hermitian solution to the DARE (4.8) .

Note that the CARE (DARE) solutions requested in the LQR problems are associated to the stable invariant (deflating) subspaces of H (of \(\lambda T - S\)), that is, those associated to the n eigenvalues in the open left half of the complex plane (inside the open unit disk).

This result can be used in order to link the CARE and DARE in the following way, employing the generalized Cayley transformation: let \(\mu \in \mathbb{C}\) with \(\vert \mu \vert = 1\) and \(\mu ^{2}\not = - 1\) be given. Then the generalized Cayley transformation of a matrix pencil \(\lambda L - M\) is given by the matrix pair

$$\displaystyle{ \mathcal{C}_{\mu }(L,M):= \left ((L -\mu M),(\mu L + M)\right ). }$$
(4.22)

With this, Volker proved the following result:

Lemma 1 ([77, Lemma 2])

  1. (a)

    If λT − S is a symplectic pencil, then \(\mathcal{C}_{\mu }(T,S) = (K,H)\) defines a Hamiltonian pencil λK − H.

  2. (b)

    If λK − H is a Hamiltonian pencil, then \(\mathcal{C}_{-\mu }(K,H) = (T,S)\) defines a symplectic pencil λT − S.

As the Hamiltonian matrix H from (4.21) also defines a Hamiltonian pencil \(\lambda I - H\), this lemma relates the Hamiltonian matrix and associated CARE to a symplectic pencil via Lemma 1(b), to which, using some cumbersome algebraic manipulations and some further assumptions on the Cayley shift μ, a DARE can be associated. Vice versa, the symplectic pencil associated to the DARE (4.8) can be related to a Hamiltonian pencil via Lemma 1(a). As it is easy to see that the Cayley transformation leaves deflating subspaces invariant, we can thus use numerical methods for computing deflating subspaces of Hamiltonian pencils to solve DAREs via Cayley transformation and Proposition 1. An algorithm for the Hamiltonian pencil eigenproblem with the desirable properties of being numerically backward stable, requiring only \(\mathcal{O}(n^{3})\) flops, and being structure-preserving in the sense that the symmetry of the spectrum is preserved exactly, was later derived in [28]. This algorithm extends the method for Hamiltonian matrices based on the symplectic URV decomposition presented in the same paper and discussed in detail in Sect. 1.4. Unfortunately, this algorithm only computes eigenvalues, but no deflating subspaces. Already the extension of the method for Hamiltonian matrices to also compute invariant subspaces turned out to be rather complicated [27], so an extension for the Hamiltonian pencil algorithm was never derived. But it turned out a bit later that the better structure to consider was that of skew-Hamiltonian/Hamiltonian pencils (see Chap. 2), or, even more generally, that of even/odd pencils or palindromic pencils (see Chap. 3). Numerically stable and efficient algorithms respecting these structures have been derived by Volker and co-workers over the last decades; see various chapters in this book and the recent overview [24].
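Definition 1, the pencils in (4.21), the transformation (4.22), and Lemma 1 are straightforward to check numerically on small examples; the following sketch uses hypothetical random data and an arbitrary unimodular shift μ.

```python
import numpy as np

def cayley(L, M, mu):
    """Generalized Cayley transformation (4.22): C_mu(L, M) = (L - mu*M, mu*L + M)."""
    return L - mu * M, mu * L + M

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
F = rng.standard_normal((n, n)); F = F + F.T
G = rng.standard_normal((n, n)); G = G + G.T
I, Z = np.eye(n), np.zeros((n, n))
J = np.block([[Z, I], [-I, Z]])

T = np.block([[I, -G], [Z, A.T]])                 # pencil lambda*T - S from (4.21)
S = np.block([[A, Z], [F, I]])
print(np.allclose(T @ J @ T.T, S @ J @ S.T))      # True: the pencil is symplectic (Def. 1(b))

mu = np.exp(0.3j)                                 # |mu| = 1, mu^2 != -1
K, H = cayley(T, S, mu)
res = K @ J @ H.conj().T + H @ J @ K.conj().T
print(np.allclose(res, 0))                        # True: lambda*K - H is Hamiltonian (Lemma 1(a))
```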

The main contribution of [77] was to extend the use of the generalized Cayley transformation to more general situations than those discussed in Proposition 1 and Lemma 1. In particular, \(E\not =I_{n}\) is allowed, and the quite restrictive assumption of A being nonsingular in the discrete-time case is dropped. In the situations where \(E\not =I_{n}\), the matrix pencils associated to the CARE and DARE are no longer Hamiltonian and symplectic, respectively. Nevertheless, the same spectral symmetries are observed (which is evident when one considers, as done later, the associated even and palindromic pencil structures). A number of interesting relations of the Cayley transformed pencils are derived in [77], and it is shown that controllability properties of the linear time-invariant systems associated to the Hamiltonian and symplectic matrix pencils before and after Cayley transformation remain invariant. This led Volker to suggest an implication scheme which states that if one can prove “A ⇒ B” for a continuous-time system, one can prove the analogous result for the discrete-time system obtained via Cayley transformation (and vice versa). This eventually led to the proof of the following result on the existence of DARE solutions, which generalizes Proposition 1(b) and is proved using this implication scheme applied to a variant of Proposition 1(a).

Theorem 1 ([77, Theorem 14(b)])

Consider the symplectic matrix pencil

$$\displaystyle{\lambda T-S:=\lambda \left [\begin{array}{*{10}c} A&0_{n} \\ F &I_{n} \end{array} \right ]-\left [\begin{array}{*{10}c} I_{n}&-\mathit{BR}^{-1}B^{H} \\ 0_{n}& A^{H} \end{array} \right ].}$$

Let the columns of \(\left [\begin{matrix}\scriptstyle U \\ \scriptstyle V \end{matrix}\right ]\) span an n-dimensional deflating subspace of λT − S with \(U,V \in \mathbb{C}^{n\times n}\) , \(V ^{H}U = U^{H}V\) , and not containing eigenvectors corresponding to infinite eigenvalues. Suppose there exists \(\mu \in \mathbb{C}\) , |μ| = 1, such that I n −μA and \(\mu I + A -\mathit{BR}^{-1}B^{H}(\mu I_{n} - A^{H})^{-1}F\) are invertible, as well as

$$\displaystyle{\varPsi (\mu ):= R + B^{H}(A -\mu I_{ n})^{-H}F(A -\mu I_{ n})^{-1}B}$$

is definite. Assume further that (A,B) is controllable (i.e., \(\mathop{\mathrm{rank}}\nolimits ([A -\lambda I_{n},\,B]) = n\) for all \(\lambda \in \mathbb{C}\) ).

Then U is invertible and \(X_{d}:= -\mathit{VU}^{-1}\) solves the DARE

$$\displaystyle{0 = F + A^{H}\mathit{XA} - X + A^{H}\mathit{XB}(R + B^{H}\mathit{XB})^{-1}B^{H}\mathit{XA}.}$$

This result does not require A to be invertible. The nonsingularity assumptions in the theorem are needed to apply the generalized Cayley transformation with shift μ to \(\lambda T - S\) and to convert the resulting Hamiltonian pencil to a Hamiltonian matrix, for which a variant of Proposition 1(a) can then be used to prove the assertions in the continuous-time case. The proof then follows from the implication scheme.

Highlighting this result gives a glimpse of the often interdisciplinary work of Volker Mehrmann, here linking systems and control theory with matrix analysis. It also demonstrates his keen interest in deriving fundamental and general principles that allow classes of problems to be solved rather than just specific instances of a given problem.

4 Numerical Methods for AREs

We note that this section is based in large part on [10], which concentrated on the CARE and continued the effort of providing surveys of ARE solvers as given in [4] and by Volker with co-workers in [40]. We use an abridged and updated version of the survey given there, with pointers to corresponding methods for DAREs which appear throughout Volker’s work. In this section, we restrict ourselves again to AREs in real arithmetic. Also, we will now always assume that E is nonsingular, so that there is a clear relation between the LQR problem and the related ARE as outlined in the Introduction. For ease of presentation, we will use the following simplified ARE versions. We consider CAREs of the form

$$\displaystyle{ 0 = \mathcal{R}_{c}(X)\:=\ F + A^{T}X + \mathit{XA} -\mathit{XGX}, }$$
(4.23)

and DAREs

$$\displaystyle{ 0 = \mathcal{R}_{d}(X) = F + A^{T}\mathit{XA} - X - (A^{T}\mathit{XB})(R + B^{T}\mathit{XB})^{-1}(A^{T}\mathit{XB})^{T}, }$$
(4.24)

where in the LQR setting, F: = C T QC and \(G:= \mathit{BR}^{-1}B^{T}\). The solution of (4.23) yielding the optimal feedback control then is the unique stabilizing solution X c , i.e., AGX c is stable in the sense that all its eigenvalues are in the open left half plane \(\mathbb{C}^{-}\). Similarly, in the discrete-time case, the stabilizing solution X d yields a stable closed-loop matrix \(A - B(R + B^{T}X_{d}B)^{-1}B^{T}X_{d}A\) in the sense that all its eigenvalues are inside the open unit disk.

It should be noted that in the non-descriptor, non-singular case, i.e., when E and R are invertible, it is always possible to rewrite CAREs and DAREs in this simplified way. In practice, though, this should only be done in case all transformations are well-conditioned. See, e.g., [4, 9, 31, 76] for algorithms working in the more general formulations and avoiding inversions and matrix products in forming the coefficients as far as possible and necessary.
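For the continuous-time case, this rewriting is a short exercise in completing the square. The hypothetical helper below sketches it under the assumptions just stated: with the returned coefficients, \(Y = E^{T}\mathit{XE}\) solves the simplified CARE (4.23), and X is recovered as \(X = E^{-T}YE^{-1}\).

```python
import numpy as np

def simplify_care(E, A, B, C, Q, R, S):
    """Rewrite the generalized CARE (4.7) as 0 = F + A_t^T Y + Y A_t - Y G Y with
    Y = E^T X E (sketch; only advisable if E and R are well conditioned)."""
    Ei_A = np.linalg.solve(E, A)                 # E^{-1} A
    Ei_B = np.linalg.solve(E, B)                 # E^{-1} B
    Ri_StC = np.linalg.solve(R, S.T @ C)         # R^{-1} S^T C
    A_t = Ei_A - Ei_B @ Ri_StC                   # E^{-1}(A - B R^{-1} S^T C)
    G = Ei_B @ np.linalg.solve(R, Ei_B.T)        # E^{-1} B R^{-1} B^T E^{-T}
    F = C.T @ Q @ C - (S.T @ C).T @ Ri_StC       # C^T (Q - S R^{-1} S^T) C
    return A_t, G, F
```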

4.1 Methods Based on the Hamiltonian and Symplectic Eigenproblems

As discussed in the previous section, solving AREs can be achieved using methods to compute certain invariant or deflating subspaces of Hamiltonian matrices or symplectic pencils. If such a subspace \(\left [\begin{matrix}\scriptstyle U \\ \scriptstyle V \end{matrix}\right ]\) is Lagrangian with \(U \in \mathbb{R}^{n\times n}\) invertible, the formula

$$\displaystyle{ X = -\mathit{VU}^{-1} }$$
(4.25)

yields a symmetric solution to the CARE or DARE, respectively. Of course, for numerical stability the condition number of U should be small, which can be achieved by certain scaling procedures (see [9] for a comparison of several scaling strategies) or, more recently, using the principle of permuted graph matrices introduced by Mehrmann and Poloni, see the Chap. 5 for details of this approach.

The required solutions for solving the LQR problems are obtained if \(\left [\begin{matrix}\scriptstyle U \\ \scriptstyle V \end{matrix}\right ]\) corresponds to the stable eigenvalues of the corresponding Hamiltonian matrix or symplectic pencil as explained in the previous section. Due to spectral symmetries of Hamiltonian matrices H explained in Chap. 1, i.e., with λ also \(-\overline{\lambda }\) is an eigenvalue of H, and the analogous property of symplectic pencils (with λ also \(1/\overline{\lambda }\) is an eigenvalue), a sufficient condition for the existence of n-dimensional stable invariant subspaces is spectral dichotomy, i.e., no eigenvalues lie on the boundary of the respective stability region. This dichotomy property is usually satisfied under assumptions that lead to the unique solvability of LQR problems via AREs and is thus assumed throughout this section.

First, we will consider methods for CAREs based on solving the Hamiltonian eigenproblem for H as in (4.21). Using Proposition 1(a), in order to solve the CARE it is sufficient to find a nonsingular matrix \(T \in \mathbb{R}^{2n\times 2n}\) such that

$$\displaystyle{ T^{-1}\mathit{HT} = \left [\begin{array}{cc} H_{11} & H_{12} \\ 0 &H_{22} \end{array} \right ], }$$
(4.26)

where the eigenvalues of \(H_{11} \in \mathbb{R}^{n\times n}\) are all in \(\mathbb{C}^{-}\). Partitioning T analogously, the columns of \(\left [\begin{matrix}\scriptstyle T_{11} \\ \scriptstyle T_{21}\end{matrix}\right ]\) span the stable Lagrangian H-invariant subspace, and the desired stabilizing solution X c of the CARE (4.23) is obtained via (4.25) setting U = T 11 and V = T 21.
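A compact numerical sketch of this subspace approach for the simplified CARE (4.23) can be based on an ordered real Schur form of H; this is essentially the Schur vector method of [71] discussed below, and the helper name is illustrative.

```python
import numpy as np
from scipy.linalg import schur

def care_via_stable_subspace(A, G, F):
    """Solve 0 = F + A^T X + X A - X G X via the stable invariant subspace of
    H = [[A, G], [F, -A^T]], cf. (4.21), (4.25), (4.26) (sketch)."""
    n = A.shape[0]
    H = np.block([[A, G], [F, -A.T]])
    _, Z, sdim = schur(H, sort="lhp")        # stable eigenvalues ordered to the top left
    assert sdim == n                         # spectral dichotomy assumed
    U, V = Z[:n, :n], Z[n:, :n]
    X = np.linalg.solve(U.T, -V.T).T         # X = -V U^{-1}, formula (4.25)
    return 0.5 * (X + X.T)
```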

Most algorithms for solving matrix eigenproblems, i.e., for computing eigenvalues and -vectors or invariant subspaces of some matrix \(M \in \mathbb{R}^{\ell\times \ell}\), are based on the following approach:

Generic procedure to compute invariant subspaces:

  1. 1.

    Compute an initial transformation matrix \(S_{0} \in \mathbb{R}^{\ell\times \ell}\) in order to reduce M to some condensed form, i.e., compute

    $$\displaystyle{ M_{0}\:=\ S_{0}^{-1}\mathit{MS}_{ 0}. }$$
    (4.27)
  2. 2.

    Then construct a sequence of similarity transformations such that in each step

    $$\displaystyle{ M_{j+1}\:=\ S_{j+1}^{-1}M_{ j}S_{j+1},\qquad j = 0,1,2,\ldots, }$$
    (4.28)

    the reduced form is preserved and, moreover, if we define \(T_{j}:=\prod _{ k=0}^{j}S_{k}\) , then \(\lim _{j\rightarrow \infty }T_{j} = T\) and \(\lim _{j\rightarrow \infty }M_{j} = M_{{\ast}}\) exist and eigenvalues and eigenvectors and/or M-invariant subspaces can be read off from \(M_{{\ast}}\) and T.

The purpose of the initial reduction to a condensed form and the preservation of this form throughout the iteration is twofold: first, such a reduction is usually necessary in order to satisfy the complexity requirements – an iteration step (4.28) on a reduced form can usually be implemented much more cheaply than for a full matrix; second, using such a reduced form it is usually easier to track the progress of the iteration and detect whether the problem can be decoupled (deflation) into smaller subproblems that can then be treated separately. For details see [56, Chapters 7–8].

For numerical stability, one usually requires all S j and thus T j , T to be orthogonal. Under this requirement, the real Schur decomposition of H is a reasonable choice in (4.26), and this was suggested in the seminal paper by Laub [71]. A disadvantage is that the structure of the Hamiltonian matrix is already destroyed in the initial Hessenberg reduction, leading to the loss of spectral symmetry in finite precision arithmetic. In the worst case, this may lead to a violation of the spectral dichotomy, as perturbations may move eigenvalues onto or across the boundary of the stability region. This can be avoided by using symplectic similarity transformations, as then all iterates in the generic procedure above remain Hamiltonian and thus the spectral symmetry is preserved. As already seen in Chap. 1, implementing the above procedure while satisfying both demands, i.e., using similarity transformations that are both symplectic and orthogonal, is only possible under certain assumptions due to the lack of an efficient procedure for computing the Hamiltonian Schur form, also called Van Loan’s curse.

We will now briefly discuss the numerical methods for the Hamiltonian and symplectic eigenproblems that can be used to solve CAREs and DAREs. As most of them have already been discussed in Chap. 1, we keep this discussion short and highlight only Volker’s contributions in more or less chronological order.

4.1.1 The SR Algorithm

In [41], Bunse-Gerstner and Mehrmann suggest using the SR algorithm to implement the generic procedure to compute the stable H-invariant subspace and then to solve the CARE via (4.25). The SR algorithm is described in some detail in Sect. 1.2. Its obvious advantage is the preservation of the Hamiltonian structure due to the exclusive use of symplectic similarity transformations. Another advantage is that it is fast: it requires only \(\mathcal{O}(n)\) flops per iteration and generically converges with a cubic rate. As non-orthogonal transformations are required, numerical stability is lost and the computed CARE solution may not be as accurate as desired. A possible remedy for this problem is defect correction (see Sect. 4.4.2), which can be used to improve the accuracy of an approximate CARE solution.

Given that the method is not numerically backward stable, some variations have been suggested that compute the approximation to the CARE solution even faster by giving up orthogonal transformations altogether, see [42]. Another idea is to iterate directly on the approximation of the CARE rather than computing the Lagrangian invariant subspaces explicitly. For this, an updating procedure is suggested in [47].

Moreover, an SR algorithm for symplectic eigenvalue problems was also derived by Volker and co-workers, see [55] for a first variant and Chap. 1 for further developments of this method.

4.1.2 The Hamiltonian QR Algorithm

The ideal algorithm for the Hamiltonian and symplectic eigenvalue problems would be a method that computes the Hamiltonian Schur form using the generic method described above, where only symplectic and simultaneously orthogonal similarity transformations are used. The existence of the Hamiltonian Schur form under rather generic assumptions, in particular under the spectral dichotomy condition we assume here, was proved in [82]. The resulting quest for a Hamiltonian QR algorithm became known under the name Van Loan’s curse and is described in some detail in Chap. 1. In summary, the lack of the existence of a Hamiltonian or symplectic QR algorithm of cubic complexity under the same general assumptions that allow for structured Schur forms is due to the non-existence of structured Hessenberg-like forms that stay invariant under structured QR iterations. Byers [43, 44] was able to derive such a Hamiltonian Hessenberg form for the special case that \(\mathop{\mathrm{rank}}\nolimits (F) = 1\) or \(\mathop{\mathrm{rank}}\nolimits (G) = 1\) in (4.21). This allows to compute the Hamiltonian Schur form using orthogonal and symplectic similarity transformations in \(\mathcal{O}(n^{3})\) flops and to solve the CARE (4.23) and the associated continuous-time LQR problem via (4.25).

Under the same rank assumptions on G or F for the symplectic pencil in (4.21), Volker was able to derive an analogous QZ-type algorithm [75]. This algorithm then was probably the most involved eigenvalue algorithm of QR-type, with a technically demanding sequence of elementary eliminations in the bulge-chasing process!

In the following, we will discuss the Hamiltonian QR algorithm and Volker’s contribution to stop the search for a Hamiltonian Hessenberg form in the general situation. We omit the symplectic case as it is technically much more involved and refer to the original paper [75] for details.

As already noted, the Hamiltonian QR algorithm should be based on orthogonal and symplectic similarity transformations. This implies a special structure.

Lemma 2 ([82])

If \(U \in \mathbb{R}^{2n\times 2n}\) is orthogonal and symplectic, then

$$\displaystyle{ U = \left [\begin{array}{cc} U_{1} & U_{2} \\ - U_{2} & U_{1} \end{array} \right ],\qquad U_{1},U_{2} \in \mathbb{R}^{n\times n}. }$$
(4.29)

Moreover, as the intersection of two matrix groups, orthogonal symplectic matrices form a matrix group \(\mathcal{U}\!\!\mathcal{S}_{2n}\) with respect to matrix multiplication.

As elements of \(\mathcal{U}\!\!\mathcal{S}_{2n}\) are determined by the \(2n^{2}\) parameters given by the entries of U 1, U 2, only these parameters need to be stored and updated throughout a sequence of similarity transformations.
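Matrices with the structure (4.29) are easy to generate and test numerically: the block pattern (4.29) is orthogonal and symplectic exactly when \(U_{1} + \imath U_{2}\) is unitary (a standard characterization, used here only for a quick check on hypothetical random data).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Qu, _ = np.linalg.qr(W)                      # a unitary n x n matrix U1 + i*U2
U1, U2 = Qu.real, Qu.imag

U = np.block([[U1, U2], [-U2, U1]])          # the pattern (4.29)
J = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])
print(np.allclose(U.T @ U, np.eye(2 * n)))   # True: U is orthogonal
print(np.allclose(U.T @ J @ U, J))           # True: U is symplectic
```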

Now, the following theorem raises the hope that it is possible to find an algorithm based on symplectic and orthogonal similarity transformations for solving CAREs.

Theorem 2 ([82])

If H is Hamiltonian and \(\varLambda (H) \cap \imath \mathbb{R} = \varnothing \) , then there exists \(U \in \mathcal{U}\!\!\mathcal{S}_{2n}\) such that

$$\displaystyle{ U^{T}\mathit{HU}\ =\ \left [\begin{array}{cc} \hat{H}_{11} & \hat{H}_{12} \\ 0 & -\hat{ H}_{11}^{T} \end{array} \right ],\qquad \hat{H}_{11},\hat{H}_{12} \in \mathbb{R}^{n\times n}, }$$
(4.30)

where \(\hat{H}_{11}\) is in real Schur form and \(\varLambda (\hat{H}_{11}) =\varDelta\) (the stable part of the spectrum of H).

Partitioning U from (4.30) as in (4.29), we have from (4.25) that the stabilizing solution of the CARE (4.23) is given by \(X_{c} = U_{2}U_{1}^{-1}\).

Remark 2

The decomposition given in (4.30) is called the Hamiltonian Schur form. It can be shown that such a form may also exist if eigenvalues on the imaginary axis are present. They have to satisfy certain properties, the most obvious one is that their algebraic multiplicity needs to be even; see [72] and Chap. 6.

As the QR algorithm is considered to be the best method for solving the dense non-symmetric eigenproblem, it is straightforward to strive for a symplectic variant of the QR algorithm converging to the Hamiltonian Schur form given in (4.30). A framework for such an algorithm can easily be derived. Denote the iterates of such an algorithm by H j . If we choose the QR decomposition performed in each step, i.e., \(p_{j}(H_{j}) = S_{j+1}R_{j+1}\), such that all S j+1 are symplectic and orthogonal, then it follows that all iterates \(H_{j+1} = S_{j+1}^{T}H_{j}S_{j+1}\) are Hamiltonian. Unfortunately, such a symplectic QR decomposition does not always exist. Sets of matrices in \(\mathbb{R}^{2n\times 2n}\) for which it exists are described in [39]. In particular, it is also shown there (see [44] for a constructive proof) that if M is symplectic, then there exists \(S \in \mathcal{U}\!\!\mathcal{S}_{2n}\) such that

$$\displaystyle{ S^{T}M\ =\ R\ =\ \left [\begin{array}{cc} R_{11} & R_{12} \\ 0 &R_{11}^{-T} \end{array} \right ], }$$
(4.31)

where \(R_{11},R_{12} \in \mathbb{R}^{n\times n}\) and \(R_{11}\) is upper triangular. Uniqueness of this decomposition can be achieved by requiring all diagonal entries of R 11 to be positive.

As the matrix R in (4.31) is permutationally similar to an upper triangular matrix and the Hamiltonian Schur form is similar to the real Schur form using the same permutations, it can be shown under mild assumptions that such a Hamiltonian QR algorithm converges to the Hamiltonian Schur form if it exists. Moreover, as only similarity transformations in \(\mathcal{U}\!\!\mathcal{S}_{2n}\) are used, the algorithm can be shown to be strongly backward stable in the sense of Bunch [38].

Byers shows in [44] that if the rational function p j is chosen to be the Cayley shift \(c_{k}(t):= (t -\mu _{k})(t +\mu _{k})^{-1}\), where μ k is an approximate real eigenvalue of H, or \(d_{k}(t):= (t -\mu _{k})(t -\overline{\mu _{k}})(t +\mu _{k})^{-1}(t + \overline{\mu _{k}})^{-1}\), where μ k is an approximate complex eigenvalue of H, then p j (H j ) is symplectic, and hence, the symplectic QR decomposition of p j (H j ) exists. In case ±μ k are exact eigenvalues of H and hence of H j , then deflation is possible, and we can proceed with the deflated problem of smaller dimension without ever being forced to invert a singular matrix. In this way, a double or quadruple Hamiltonian QR step can be implemented.
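The symplecticity of the Cayley-shifted matrix is again easy to verify numerically in the real single-shift case: for any Hamiltonian H and real shift μ with \(H +\mu I\) nonsingular, \(c_{k}(H) = (H -\mu I)(H +\mu I)^{-1}\) is symplectic. A quick check on hypothetical random data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n))
F = rng.standard_normal((n, n)); F = F + F.T
G = rng.standard_normal((n, n)); G = G + G.T
H = np.block([[A, G], [F, -A.T]])            # Hamiltonian matrix, cf. (4.21)
J = np.block([[np.zeros((n, n)), np.eye(n)], [-np.eye(n), np.zeros((n, n))]])

mu = 0.7                                     # real shift; H + mu*I assumed nonsingular
Ck = (H - mu * np.eye(2 * n)) @ np.linalg.inv(H + mu * np.eye(2 * n))
print(np.allclose(Ck.T @ J @ Ck, J))         # True: c_k(H) is symplectic
```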

Unfortunately, the algorithm derived in this way is of complexity \(\mathcal{O}(n^{4})\), as each symplectic QR decomposition requires \(\mathcal{O}(n^{3})\) flops and usually \(\mathcal{O}(n)\) iterations are required (based on the experience that 1–2 iterations are needed for each eigenvalue). The missing part that would bring the computational cost down to \(\mathcal{O}(n^{3})\) is an initial reduction analogous to the Hessenberg reduction in the QR algorithm that

  • is invariant under the similarity transformation performed in each step of the Hamiltonian QR algorithm (the Hamiltonian QR step);

  • admits an implementation of the Hamiltonian QR step using only \(\mathcal{O}(n^{2})\) flops.

In [44] Byers shows that such a form exists.

Definition 3

A Hamiltonian matrix \(H \in \mathbb{R}^{2n\times 2n}\) is in Hamiltonian Hessenberg form if

$$\displaystyle{ H = \left [\begin{array}{cc} H_{11} & H_{12} \\ H_{21} & -H_{11}^{T} \end{array} \right ], }$$
(4.32)

where \(H_{ij} \in \mathbb{R}^{n\times n}\), i, j = 1, 2, H 11 is upper Hessenberg, and \(H_{21} =\varphi e_{n}e_{n}^{T}\) with \(\varphi \in \mathbb{R}\) and e n being the nth unit vector. The Hamiltonian Hessenberg matrix H is unreduced if \(h_{i+1,i}\not =0\), \(i = 1,\ldots,n - 1\), and \(\varphi \not =0\).

Byers [44] shows that if H j is in Hamiltonian Hessenberg form and the rational function p j is chosen as a Cayley shift, then H j+1 is in Hamiltonian Hessenberg form again and the Hamiltonian QR step can be implemented in \(\mathcal{O}(n^{2})\) flops.

The crux of this algorithm is the initial reduction of a Hamiltonian matrix to Hamiltonian Hessenberg form. Byers shows how this can be achieved if one of the off-diagonal blocks of the Hamiltonian matrix H in (4.21) has rank 1. (This is related to control systems of the form (4.2) having only one input (m = 1), i.e., single-input systems, and/or only one output (p = 1), i.e., single-output systems.) Unfortunately, no algorithm is known for reducing a general Hamiltonian matrix to Hamiltonian Hessenberg form, and the situation is even worse. Analogous to the standard QR algorithm, where the QR step is performed on unreduced Hessenberg matrices (zeros appearing on the subdiagonal are used for deflation, i.e., for splitting the problem into two or more subproblems consisting of unreduced Hessenberg matrices), the Hamiltonian QR algorithm works on unreduced Hamiltonian Hessenberg matrices. The following theorem due to Ammar and Mehrmann [3] shows that the situation is in general hopeless with respect to the existence of the unreduced Hamiltonian Hessenberg form.

Theorem 3

If \(H \in \mathbb{R}^{2n\times 2n}\) is Hamiltonian, then there exists an orthogonal and symplectic matrix transforming H to unreduced Hamiltonian Hessenberg form if and only if the nonlinear set of equations

$$\displaystyle{x^{T}x = 1\qquad \mbox{ and}\qquad x^{T}\mathit{JH}^{2k-1}x = 0\quad \mbox{ for}\quad k = 1,\ldots,n - 1,}$$

has a solution that is not contained in an H-invariant subspace of dimension n or less.

Obviously, if JH is positive definite, such a vector cannot exist, showing that there really exist situations in which the unreduced Hamiltonian Hessenberg form does not exist. A similar result holds in the symplectic case. Therefore, other approaches have been investigated during the last decade.

4.1.3 The Multishift Algorithm

From Theorem 3 we know that the reduction to Hamiltonian Hessenberg form, which is necessary to implement the Hamiltonian QR algorithm efficiently, is in general not possible. Nevertheless, the same paper [3] suggested a possible alternative method that became the topic of my diploma thesis [8] and eventually led to the paper [2].

The basis of this idea is that by orthogonal symplectic similarity transformations, the following reduction due to Paige and Van Loan [82] can be achieved.

Theorem 4

Let \(H \in \mathbb{R}^{2n\times 2n}\) . Then there exists \(U \in \mathcal{U}\!\!\mathcal{S}_{2n}\) such that

$$\displaystyle{ U^{T}\mathit{HU}\ =\ \left [\begin{array}{cc} H_{11} & H_{12} \\ H_{21} & H_{22} \end{array} \right ], }$$
(4.33)

where \(H_{11} \in \mathbb{R}^{n\times n}\) is upper Hessenberg and \(H_{21} \in \mathbb{R}^{n\times n}\) is upper triangular. The transformation matrix U can be chosen such that

$$\displaystyle{ U = \left [\begin{array}{cc|cc} 1 & 0 &0& 0\\ 0 & \tilde{U}_{ 1} & 0&\tilde{U}_{2} \\ \hline 0& 0 &1& 0\\ 0 & -\tilde{ U}_{ 2} & 0&\tilde{U}_{1} \end{array} \right ]. }$$
(4.34)

If in addition, H is Hamiltonian, then

$$\displaystyle{ U^{T}\mathit{HU}\ =\ \left [\begin{array}{cc} H_{11} & H_{12} \\ H_{21} & -H_{11}^{T} \end{array} \right ], }$$
(4.35)

i.e., H 21 is diagonal and H 12 is symmetric.

The reduced form (4.35) of a Hamiltonian matrix will be called PVL form in the following. An algorithm for computing the transformation given in (4.35) is derived in [82]. It can be implemented using a finite number of similarity transformations and requires \(\mathcal{O}(n^{3})\) flops. Unfortunately, the PVL form is not preserved under the Hamiltonian QR iteration and can therefore not serve for the initial reduction step of the Hamiltonian QR algorithm. In the following, we will see that the PVL form can be used in what can be considered as a Hamiltonian multishift QR algorithm.

First, we need some more theory. As before, we denote \(J = \left [\begin{matrix}\scriptstyle 0_{n} &\scriptstyle I_{n} \\ \scriptstyle -I_{n}&\scriptstyle 0_{n}\end{matrix}\right ]\).

Definition 4

A real subspace \(\mathcal{L}\subset \mathbb{R}^{2n}\) is isotropic iff \(x^{T}\mathit{Jy} = 0\) for all \(x,y \in \mathcal{L}\). If \(\mathcal{L}\) is maximal, i.e., not contained in an isotropic subspace of larger dimension, then \(\mathcal{L}\) is a Lagrangian subspace (cf. Definition 2).

By the definition of symplectic matrices, we obtain immediately the following lemma.

Lemma 3

Let \(S \in \mathbb{R}^{2n\times 2n}\) be symplectic. Then the first r columns of S, 1 ≤ r ≤ n, span an isotropic subspace of \(\mathbb{R}^{2n}\) . For r = n, this subspace is Lagrangian.

The basis for the multishift algorithm is contained in the following result.

Proposition 2 ([3])

Let \(H \in \mathbb{R}^{2n\times 2n}\) be a Hamiltonian matrix with spectrum \(\varLambda (H) =\varDelta _{n} \cup (-\varDelta _{n})\) , \(\varDelta _{n} \cap (-\varDelta _{n}) = \varnothing \) , and \(\varDelta _{n} = \overline{\varDelta _{n}} =\{\,\lambda _{1},\ldots,\lambda _{n}\}\) . Then the multishift vector

$$\displaystyle{ x =\alpha (H -\lambda _{1}I_{2n})\cdots (H -\lambda _{n}I_{2n})e_{1},\quad \alpha \in \mathbb{R}, }$$
(4.36)

where \(e_{1} \in \mathbb{R}^{2n}\) is the first unit vector, is contained in the n-dimensional H-invariant subspace corresponding to −Δ n . Moreover, this subspace is Lagrangian. In particular, if \(\varDelta _{n} \subset \mathbb{C}^{+}:=\{ z \in \mathbb{C}\,\vert \,\mathfrak{R}(z) > 0\}\) , then this Lagrangian subspace is the stable H-invariant subspace.
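Proposition 2 is easy to check numerically: choosing the shifts as the eigenvalues in the open right half plane, the vector (4.36) ends up in the stable invariant subspace. A small sketch on hypothetical random Hamiltonian test data (assumed to have no purely imaginary eigenvalues):

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(4)
n = 4
A = rng.standard_normal((n, n))
G = rng.standard_normal((n, n)); G = G @ G.T           # symmetric positive semidefinite
F = rng.standard_normal((n, n)); F = F @ F.T
H = np.block([[A, G], [F, -A.T]])                      # Hamiltonian matrix as in (4.21)

lam = np.linalg.eigvals(H)
Delta = lam[lam.real > 0]                              # Delta_n: the n eigenvalues in C^+

x = np.eye(2 * n)[:, 0].astype(complex)                # multishift vector (4.36)
for l in Delta:
    x = H @ x - l * x
x = x.real / np.linalg.norm(x.real)                    # real, since Delta_n is closed under conjugation

_, Z, sdim = schur(H, sort="lhp")                      # stable invariant subspace (for -Delta_n)
Zs = Z[:, :sdim]
print(np.linalg.norm(x - Zs @ (Zs.T @ x)))             # ~ 0: x lies in that subspace
```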

So, once we know the spectrum of H, we can compute one vector that is contained in the subspace required for solving the corresponding CARE. This observation can be combined with the computation of the PVL form in order to derive a multishift step as follows – assuming for simplicity that H has no eigenvalues on the imaginary axis.

Algorithm 1 One step of the multishift algorithm for Hamiltonian matrices

Multishift step

Using this approach, it is possible to get the whole stable H-invariant subspace. The following theorem will indicate how Algorithm 1 can be used to achieve this.

Theorem 5

Let \(H \in \mathbb{R}^{2n\times 2n}\) be Hamiltonian and let \(\mathcal{V}_{n}\) be an n-dimensional H-invariant Lagrangian subspace corresponding to \(\varDelta _{n} \subset \varLambda (H)\) with Δ n as in Proposition  2 . Further, let the multishift vector x from (4.36) be computed using \(-\varDelta _{n} =\{\,\lambda _{1},\ldots,\lambda _{n}\,\}\) as the shifts. If 1 ≤ p ≤ n is the dimension of the minimal isotropic H-invariant subspace \(\mathcal{V}_{p}\) containing x, then after Step 4 of the multishift step, H 2 has the form

$$\displaystyle{ H_{2} = \left [\begin{array}{cc|cc} A_{11} & A_{12} & G_{11} & G_{21} \\ 0 &A_{22} & G_{21}^{T} & G_{22} \\ \hline 0& 0 & - A_{11}^{T}& 0 \\ 0 &F_{22} & - A_{12}^{T}& - A_{22}^{T} \end{array} \right ]\!\!\begin{array}{l} \}p \\ \}n - p\\ \}p \\ \}n - p \end{array}, }$$
(4.37)

where \(A_{11} \in \mathbb{R}^{p\times p}\) , \(\varLambda (A_{11}) \subset \varDelta _{n}\) , and the Hamiltonian submatrix

$$\displaystyle{H_{22}:= \left [\begin{array}{*{10}c} A_{22} & G_{22} \\ F_{22} & -A_{22}^{T} \end{array} \right ] \in \mathbb{R}^{2(n-p)\times 2(n-p)}}$$

is in PVL form.

Furthermore, for \(U_{1},U_{2} \in \mathcal{U}\!\!\mathcal{S}_{2n}\) from the multishift step we have

$$\displaystyle{U:= U_{1}U_{2}\,=\,[\,u_{1},\ldots,u_{p},u_{p+1},\ldots,u_{2n}\,] \in \mathcal{U}\!\!\mathcal{S}_{2n},\quad u_{j} \in \mathbb{R}^{2n}\ \mbox{ for}\ j\,=\,1,\ldots,2n,}$$

and \(\mathrm{span}\left \{u_{1},\ldots,u_{p}\right \} = \mathcal{V}_{p} \subset \mathcal{V}_{n}\) .

The detailed proof of this result is contained in [10].

The theorem shows that if the multishift vector x from (4.36) has components in all directions of a Lagrangian H-invariant subspace, then after one multishift step, a basis for this invariant subspace is given by the first n columns of U 1 U 2. Otherwise, the first p columns of U 1 U 2 span a p-dimensional H-invariant subspace contained in this subspace and the problem decouples into two subproblems. Algorithm 1 can then repeatedly be applied to the resulting Hamiltonian submatrix \(H_{22} \in \mathbb{R}^{2(n-p)\times 2(n-p)}\) until p = n. The implementation of this algorithm is described in detail in [2].

As only orthogonal symplectic similarity transformations are used, a multishift step is strongly backward stable. The computational cost of one multishift step for p = 0 is around 15 % of that of the Schur vector method [71]. The complete computational cost depends on the number of iteration steps necessary. In a worst-case scenario, i.e., if in each step only one basis vector of \(\mathcal{V}_{n}\) is found, the complexity of this algorithm becomes essentially \(\mathcal{O}(n^{4})\). This is rarely observed in practice, though. On the other hand, rounding errors during the computation, in particular while forming the multishift vector, together with the fact that the eigenvalues are usually only known approximately, make it practically impossible for deflation to occur exactly. Often, some iteration steps are necessary to detect deflation when using finite precision arithmetic. Generally speaking, as long as the size of the problem is modest (n ≤ 100), the method is feasible and the number of required iterations is acceptable.

When solving CAREs, usually the stable H-invariant subspace is required. In that case, \(\varDelta _{n}\) in Proposition 2 has to be chosen such that \(\mathfrak{R}(\lambda _{j}) < 0\) for all j = 1,…,n. Note that the stable H-invariant subspace is Lagrangian; see, e.g., [3, 76]. But observe that, in principle, the multishift algorithm can be used to compute the CARE solution corresponding to any Lagrangian H-invariant subspace. This is of particular importance in some applications: e.g., in some \(\mathcal{H}_{\infty }\)-control problems, ARE solutions exist and have to be computed if H has eigenvalues on the imaginary axis. As long as these eigenvalues permit a Lagrangian invariant subspace, the corresponding ARE solutions can be computed by the multishift algorithm.

The computation of the multishift vector in (4.36) requires the knowledge of the spectrum of H. Hence, what remains to be shown is how to obtain the eigenvalues of a Hamiltonian matrix H. One possibility is to run the QR algorithm without accumulating the transformations, but then the problems with eigenvalues close to the imaginary axis mentioned above have to be expected. A different approach, which costs only about one third of the QR algorithm and takes the symmetry of Λ(H) into account, was suggested by Van Loan [87]. Consider \(K:= H^{2}\). Obviously, if λ ∈ Λ(H), then \(\lambda _{K}:=\lambda ^{2} \in \varLambda (K)\). If \(\mathfrak{R}(\lambda )\not =0\), then \(\lambda _{K}\) is a double eigenvalue of K due to the symmetry of Λ(H). Squared Hamiltonian matrices are skew-Hamiltonian, that is, they satisfy \(\mathit{KJ} = -(\mathit{KJ})^{T}\) and therefore have the explicit block structure

$$\displaystyle{ K = \left [\begin{array}{*{10}c} K_{1} & K_{2} \\ K_{3} & K_{1}^{T} \end{array} \right ],\qquad K_{2} = -K_{2}^{T},\quad K_{ 3} = -K_{3}^{T}. }$$
(4.38)

The skew-Hamiltonian structure is preserved under symplectic similarity transformations [87]. Hence, computing the PVL form (4.33) for skew-Hamiltonian matrices yields

$$\displaystyle{ U^{T}KU = \left [\begin{array}{cc} \tilde{K}_{1} & \tilde{K}_{2} \\ 0 &\tilde{K}_{1}^{T} \end{array} \right ],\qquad \tilde{K}_{2} = -\tilde{K}_{2}^{T}. }$$
(4.39)

Hence, Λ(K) can be obtained by computing the eigenvalues of the upper Hessenberg matrix \(\tilde{K}_{1}\), e.g., by applying the QR iteration to \(\tilde{K}_{1}\). Let \(\varLambda (\tilde{K}_{1}) =\{\,\mu _{1},\ldots,\mu _{n}\,\}\), then \(\varLambda (H) =\{\, \pm \sqrt{\mu _{1}},\ldots,\pm \sqrt{\mu _{n}}\,\}\). Note that no information about eigenvectors or invariant subspaces of H is obtained.

The resulting method is strongly backward stable for K and preserves the symmetry structures of Λ(K) and Λ(H). An implicit version of this algorithm is also suggested in [87]; U from (4.39) is applied directly to the Hamiltonian matrix such that \(\tilde{H}:= U^{T}HU\) is square-reduced, i.e., \(\tilde{H}^{2}\) has the form given in (4.39). The disadvantage of Van Loan’s method is that a loss of up to half of the significant digits in the computed eigenvalues of H is possible. An error analysis in [87] shows that for a computed simple eigenvalue \(\tilde{\lambda }\) corresponding to λ ∈ Λ(H) we have

$$\displaystyle{ \vert \lambda -\tilde{\lambda }\vert \approx \min \left \{\frac{\varepsilon \Vert H\Vert _{2}^{2}} {s(\lambda )\vert \lambda \vert },\; \frac{\sqrt{\varepsilon }\Vert H\Vert _{2}} {s(\lambda )} \right \} =\varepsilon \frac{\Vert H\Vert _{2}} {s(\lambda )} \times \min \left \{\frac{\Vert H\Vert _{2}} {\vert \lambda \vert },\; \frac{1} {\sqrt{\varepsilon }}\right \}, }$$
(4.40)

where s(λ), the reciprocal condition number of λ, is the cosine of the acute angle between the left and right eigenvectors of H corresponding to λ. Basically, this error estimate indicates that eigenvalues computed by Van Loan’s method are as accurate as those computed by a numerically backward stable method provided that \(\vert \lambda \vert \approx \| H\|_{2}\), while for \(\vert \lambda \vert \ll \| H\|_{2}\) the error grows with the ratio \(\|H\|_{2}/\vert \lambda \vert \).
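The following small experiment illustrates this effect (a hedged Python/NumPy sketch on a constructed Hamiltonian test matrix with known eigenvalues ±10⁻⁴, ±1, ±2, ±3; it ignores the structure-preserving reduction (4.39) and simply forms H² explicitly, so it only mimics the accuracy behavior of the squared-matrix approach):

import numpy as np

n = 4
d = np.array([1e-4, 1.0, 2.0, 3.0])
H0 = np.diag(np.concatenate([d, -d]))                    # Hamiltonian with A = diag(d), G = F = 0

rng = np.random.default_rng(2)
Qc, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
U = np.block([[Qc.real, Qc.imag], [-Qc.imag, Qc.real]])  # orthogonal and symplectic
H = U.T @ H0 @ U                                         # dense Hamiltonian matrix, same eigenvalues
                                                         # (this H happens to be symmetric, so s(lambda) = 1
                                                         #  and any accuracy loss is due to squaring alone)

err = lambda lam: np.min(np.abs(lam - 1e-4))             # error in the eigenvalue of smallest modulus
lam_direct = np.linalg.eigvals(H)
lam_squared = np.emath.sqrt(np.linalg.eigvals(H @ H))    # square roots of the eigenvalues of K = H^2
print("direct :", err(lam_direct))                       # roughly eps * ||H||
print("squared:", err(lam_squared))                      # noticeably larger, in line with (4.40)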

Usually, eigenvalues computed by Van Loan’s method are satisfactory as shifts for the multishift algorithm and in most other practical circumstances. On the other hand, removing the possible \(1/\sqrt{\varepsilon }\) loss of accuracy provides the motivation for the algorithm presented in the next subsection.

4.1.4 A Method Based on the Symplectic URV Decomposition

The method described in this subsection was the key to breaking Van Loan’s curse, as already described in Chap. 1. As we investigate it here in the context of solving CAREs, we will need some of its details and therefore repeat the essential steps.

The central problem of Van Loan’s method is that squaring the Hamiltonian matrix leads to a possible loss of half of the accuracy. For products of general matrices, this possible loss of accuracy caused by forming the product can be circumvented by employing the periodic or cyclic QR algorithm [37, 59, 60]. If \(A = A_{1} \cdot A_{2}\cdots A_{p}\), where \(A_{j} \in \mathbb{R}^{n\times n}\), j = 1,…,p, then this algorithm computes the real Schur form of A without forming A explicitly. This is achieved by cyclically reducing the factors A j to (quasi-)upper triangular form:

$$\displaystyle{ U_{1}^{T}AU_{1} = (U_{1}^{T}A_{1}U_{2})(U_{2}^{T}A_{2}U_{3})\cdots (U_{p}^{T}A_{p}U_{1}). }$$
(4.41)

Here, \(U_{1}^{T}A_{1}U_{2}\) is in real Schur form while \(U_{j}^{T}A_{j}U_{(j+1)\bmod p}\), j = 2,…,p, are upper triangular such that the product is in real Schur form. The eigenvalues are then obtained from computing the eigenvalues of the 1 × 1 and 2 × 2 blocks on the diagonal of the product in (4.41). This method is numerically backward stable and avoids the loss of accuracy in the eigenvalues as the product A is never formed explicitly.

The idea is now to employ this approach for \(H^{2}\) by replacing the reduction of \(H^{2}\) to PVL form by \(U^{T}H^{2}U = (U^{T}\mathit{HV })(V ^{T}\mathit{HU})\), where \(U,V \in \mathcal{U}\!\!\mathcal{S}_{2n}\). This can be achieved by the symplectic URV-like decomposition given in [28].

Proposition 3

For \(H \in \mathbb{R}^{2n\times 2n}\) there exist \(U,V \in \mathcal{U}\!\!\mathcal{S}_{2n}\) such that

(4.42)

i.e., H 1 is upper triangular and H 2 is upper Hessenberg. If, in addition, H is Hamiltonian, then

(4.43)

and the eigenvalues of H are the positive and negative square roots of the eigenvalues of the upper Hessenberg matrix H 2 H 1 .

That is, using the decomposition given in (4.42) we obtain the PVL form of H 2 without explicitly squaring H. In order to obtain the eigenvalues of H we then apply the periodic QR algorithm to \(H_{2}H_{1}\).

In [28] an algorithm for computing the decomposition given in (4.42) is presented. It requires a finite number of transformations. The combined cost of computing the decomposition (4.42) and applying the periodic QR algorithm to H 2 H 1 is about 48n 3 flops – this is 1.5 × the computational cost of Van Loan’s method and about 60 % of the cost of the QR algorithm applied to a non-symmetric 2n × 2n matrix. The method is numerically backward stable as only orthogonal transformations are used. The symmetry property of Λ(H) is preserved and in this sense the method can be considered to be strongly backward stable.

A detailed error analysis of the above method yields the following result [28]. Essentially (under mild assumptions), for a nonzero and simple eigenvalue λ of a Hamiltonian matrix \(H \in \mathbb{R}^{2n\times 2n}\), the algorithm based on the symplectic URV-like decomposition followed by applying the periodic QR algorithm to H 2 H 1 from (4.42) yields a computed eigenvalue \(\tilde{\lambda }\) satisfying

$$\displaystyle{\vert \tilde{\lambda }-\lambda \vert \leq \frac{2\Vert H\Vert \varepsilon } {s(\lambda )} + \mathcal{O}(\varepsilon ^{2}).}$$

This is the accuracy to be expected from any backward stable method such as the QR algorithm, and it shows that by avoiding the explicit squaring of H, we obtain the full possible accuracy.

Nevertheless, like Van Loan’s method, the approach presented above does not provide the H-invariant subspaces. But based on (4.42), it is possible to derive an algorithm that computes the stable H-invariant subspace and the solution of the CARE (4.23) [27]. The basis for this algorithm is the following theorem.

Theorem 6 ([27])

Let \(A \in \mathbb{R}^{n\times n}\) and define \(B = \left [{ \,0\;\;A\, \atop \,A\;\;0\,} \right ]\). Then the spectra of A and B are related via \(\varLambda (B) =\varLambda (A) \cup (-\varLambda (A))\). Further, let \(\varLambda (A) \cap \imath \mathbb{R} = \varnothing \). If the columns of \(\left [U_{1}^{T},U_{2}^{T}\right ]^{T} \in \mathbb{R}^{2n\times n}\) form an orthogonal basis of a B-invariant subspace such that

$$\displaystyle{B\left [\begin{array}{l} U_{1} \\ U_{2} \end{array} \right ] = \left [\begin{array}{l} U_{1} \\ U_{2} \end{array} \right ]R,\qquad \varLambda (R) \subset \mathbb{C}^{+}\cap \varLambda (B),}$$

then \(\mathrm{range}\left (U_{1} + U_{2}\right )\) is the A-invariant subspace corresponding to \(\varLambda (A) \cap \mathbb{C}^{+}\) and \(\mathrm{range}\left (U_{1} - U_{2}\right )\) is the stable A-invariant subspace.

An orthogonal basis for the subspace defined by \(\mathrm{range}\left (U_{1} - U_{2}\right )\) can be obtained, e.g., from a rank-revealing QR decomposition of U 1U 2; see, e.g., [56].
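Theorem 6 can be checked directly in a few lines of Python (NumPy/SciPy on a random test matrix; the ordered Schur decompositions and the pivoted QR below are unstructured stand-ins for the structured computations described in the remainder of this subsection):

import numpy as np
from scipy.linalg import schur, qr, subspace_angles

rng = np.random.default_rng(3)
n = 6
A = rng.standard_normal((n, n))
A = A - np.mean(np.linalg.eigvals(A).real) * np.eye(n)   # ensure eigenvalues on both sides of the axis
B = np.block([[np.zeros((n, n)), A], [A, np.zeros((n, n))]])

# orthonormal basis of the B-invariant subspace for the eigenvalues in the open right half plane
T, Z, k = schur(B, sort='rhp')                           # k = n if A has no purely imaginary eigenvalues
U1, U2 = Z[:n, :k], Z[n:, :k]

# range(U1 - U2) is the stable A-invariant subspace; extract a basis by rank-revealing (pivoted) QR
Q, R, _ = qr(U1 - U2, pivoting=True)
ns = np.sum(np.abs(np.diag(R)) > 1e-10 * np.abs(R[0, 0]))
Qs = Q[:, :ns]

# compare with the stable invariant subspace computed directly from A
_, ZA, kA = schur(A, sort='lhp')
print(ns == kA, np.max(subspace_angles(Qs, ZA[:, :kA])))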

In general it is of course not advisable to use the above result in order to obtain the stable invariant subspace of a matrix A as one would have to double the dimension and thereby increase the computational cost and required workspace significantly as compared to applying the QR algorithm to A. But we will see that for Hamiltonian matrices, the given structure makes this approach very attractive.

Let \(H \in \mathbb{R}^{2n\times 2n}\) be Hamiltonian with \(\varLambda (H) \cap \imath \mathbb{R} = \varnothing \). Define a permutation matrix \(P \in \mathbb{R}^{4n\times 4n}\) by

$$\displaystyle{P = \left [\begin{array}{cccc} I_{n}& 0 & 0 & 0 \\ 0 & 0 &I_{n}& 0 \\ 0 &I_{n}& 0 & 0 \\ 0 & 0 & 0 &I_{n} \end{array} \right ].}$$

Then \(P^{T}\left [\begin{matrix}\scriptstyle 0 &\scriptstyle H \\ \scriptstyle H&\scriptstyle 0 \end{matrix}\right ]P\) is a Hamiltonian matrix in \(\mathbb{R}^{4n\times 4n}\). The basic idea is now to employ the decomposition (4.42) in order to make this matrix block upper triangular. To this end, let \(\hat{U},\hat{V } \in \mathcal{U}\!\!\mathcal{S}_{2n}\) be as in Proposition 3 such that \(\hat{V }^{T}H\hat{U}\) has the form given in (4.42). Then we apply the periodic QR algorithm to \(H_{2}H_{1}\). From this we obtain orthogonal matrices \(V _{1},V _{2} \in \mathbb{R}^{n\times n}\) such that both the product

$$\displaystyle{(V _{1}^{T}H_{ 2}V _{2})(V _{2}^{T}H_{ 1}V _{1}) =:\hat{ H}_{2}\hat{H}_{1}}$$

and \(\hat{H}_{2}\) are in upper real Schur form, while \(\hat{H}_{1}\) is upper triangular. Define

$$\displaystyle{U_{1}:=\hat{ U}\left [\begin{array}{cc} V _{1} & 0 \\ 0 &V _{1} \end{array} \right ],\quad U_{2}:=\hat{ V }\left [\begin{array}{cc} V _{2} & 0 \\ 0 &V _{2} \end{array} \right ],\quad \mbox{ and}\quad U:= \left [\begin{array}{cc} U_{1} & 0 \\ 0 &U_{2} \end{array} \right ].}$$

Then

$$\displaystyle{B:= P^{T}U^{T}\left [\begin{array}{cc} 0 &H \\ H & 0 \end{array} \right ]\mathit{UP} = \left [\begin{array}{cccc} 0 &\hat{H}_{2} & 0 & \hat{H}_{3}^{T} \\ \hat{H}_{1} & 0 & \hat{H}_{3} & 0 \\ 0 & 0 & 0 & -\hat{ H}_{1}^{T} \\ 0 & 0 & -\hat{ H}_{2}^{T}& 0 \end{array} \right ]}$$

is Hamiltonian and block upper triangular with \(\hat{H}_{1}\) upper triangular, \(\hat{H}_{2}\) in real Schur form, and \(\hat{H}_{3} = V _{2}^{T}(H_{2}H_{3} - H_{3}^{T}H_{2}^{T})V _{1}\).

Now let U 3 be orthogonal such that

$$\displaystyle{ U_{3}^{T}\left [\begin{array}{cc} 0 &\hat{H}_{2} \\ \hat{H}_{1} & 0 \end{array} \right ]U_{3} = \left [\begin{array}{cc} T_{1} & T_{3} \\ 0 & - T_{2} \end{array} \right ] }$$
(4.44)

is in upper real Schur form with \(T_{j} \in \mathbb{R}^{n\times n}\), j = 1, 2, 3, and \(\varLambda (T_{1}) =\varLambda (T_{2}) \subset \mathbb{C}^{+}\). Note that this is possible as the eigenvalues of \(\left [{ \;0\;\;\;\;\hat{H}_{2} \atop \hat{H}_{1}\;\;\;0\,} \right ]\) are exactly those of H and \(\varLambda (H) \cap \imath \mathbb{R} = \varnothing \). Hence,

$$\displaystyle{\tilde{B}:= \left [\begin{array}{cc} U_{3}^{T}& 0 \\ 0 &U_{3}^{T} \end{array} \right ]B\left [\begin{array}{cc} U_{3} & 0 \\ 0 &U_{3} \end{array} \right ] = \left [\begin{array}{cccc} T_{1} & T_{3} & R_{1} & R_{2} \\ 0 & - T_{2} & R_{2}^{T} & R_{3} \\ 0 & 0 & - T_{1}^{T}& 0 \\ 0 & 0 & - T_{3}^{T}&T_{2}^{T} \end{array} \right ]}$$

is in Hamiltonian Schur form. In order to apply Theorem 6, we need to reorder the eigenvalues in the Hamiltonian Schur form such that all eigenvalues in the upper left 2n × 2n block are in the open right half plane. This can be achieved, e.g., by the symplectic re-ordering algorithm due to Byers [43, 44]. With this algorithm it is possible to determine \(\tilde{U} \in \mathcal{U}\!\!\mathcal{S}_{2n}\) such that

$$\displaystyle{\tilde{U}^{T}\tilde{B}\tilde{U} = \left [\begin{array}{cccc} T_{1} & \tilde{T}_{3} & R_{1} & \tilde{R}_{2} \\ 0 &\tilde{T}_{2} & \tilde{R}_{2}^{T} & R_{3} \\ 0 & 0 & - T_{1}^{T}& 0 \\ 0 & 0 & -\tilde{ T}_{3}^{T}& -\tilde{ T}_{2}^{T} \end{array} \right ],\qquad \varLambda (\tilde{T}_{2}) =\varLambda (T_{2}).}$$

Now define

$$\displaystyle{ S:= P^{T}\left [\begin{array}{cc} U_{1} & 0 \\ 0 &U_{2} \end{array} \right ]P\left [\begin{array}{cc} U_{3} & 0 \\ 0 &U_{3} \end{array} \right ]\tilde{U}. }$$
(4.45)

Then \(S \in \mathcal{U}\!\!\mathcal{S}_{4n}\) and

$$\displaystyle{ T:= S^{T}P^{T}\left [\begin{array}{cc} 0 &H \\ H & 0 \end{array} \right ]\mathit{PS} =: \left [\begin{array}{cc} T_{11} & T_{12} \\ 0 & - T_{11}^{T} \end{array} \right ] }$$
(4.46)

is in Hamiltonian Schur form with \(\varLambda (T_{11}) \subset \mathbb{C}^{+}\). Now we can apply Theorem 6 with A replaced by H and R: = T 11.

Corollary 1

Let \(H \in \mathbb{R}^{2n\times 2n}\) be Hamiltonian with \(\varLambda (H) \cap \imath \mathbb{R} = \varnothing \) and let S be as in (4.45) such that (4.46) holds. If \(\mathit{PS}:= \left [\begin{matrix}\scriptstyle S_{11}&\scriptstyle S_{12} \\ \scriptstyle S_{21}&\scriptstyle S_{22}\end{matrix}\right ]\) , with \(S_{\mathit{ij}} \in \mathbb{R}^{2n\times 2n}\) , then the n-dimensional, stable H-invariant subspace is given by \(\mathrm{range}\left (S_{11} - S_{21}\right )\) .

The above transformations yielding S are described in more detail in [27]. The solution of the CARE can be obtained from an orthogonal basis of \(\mathrm{range}\left (S_{11} - S_{21}\right )\) computed by a rank-revealing QR decomposition or directly from S 11S 21; for details see [27]. The latter approach saves a significant amount of work such that the cost of the algorithm described above for computing the stabilizing solution of the CARE (4.23) is approximately 60 % of the cost of the Schur vector method.

Remark 3

The transformation of \(\left [\begin{matrix}\scriptstyle 0 &\scriptstyle \hat{H}_{2} \\ \scriptstyle \hat{H}_{1}&\scriptstyle 0 \end{matrix}\right ]\) to real Schur form and the computation of the matrix U 3 in (4.44) can be efficiently implemented employing the available structure. An algorithm for this is given in [27].

It is shown in [27] that the algorithm presented above is strongly backward stable in \(\mathbb{R}^{4n\times 4n}\). That is, if \(\tilde{S}\) is the analogue to S from (4.45) computed in finite precision arithmetic, then

$$\displaystyle{\tilde{S}^{T}P^{T}\left [\begin{array}{cc} 0 &H \\ H & 0 \end{array} \right ]P\tilde{S} = T+E,}$$

with T as in (4.46), where \(\Vert E\Vert _{2} \leq c\varepsilon \Vert H\Vert _{2}\) for a small constant c and \(E \in \mathbb{R}^{4n\times 4n}\) is Hamiltonian. Moreover, it is shown in [27] that the computed invariant subspace is as accurate as the maximum of its condition number and the condition number of its complementary (antistable) H-invariant subspace permits. This is to be expected, since by Theorem 6 we compute the antistable H-invariant subspace at the same time as the stable one. In that sense the algorithm is not optimal, as we would like the accuracy of the computed subspace to be limited only by its own condition number.

The algorithms described in this section are implemented in Fortran 77, see [19], while an implementation of Van Loan’s method with scaling is provided in [12]. All these methods are also integrated in SLICOT, the Subroutine LIbrary in COntrol Theory [26]. This subroutine library is based on a joint initiative of European researchers, in which Volker was also a driving force.

The invariant subspace computation as described above turned out not to be completely satisfactory. For some examples with eigenvalues close to the imaginary axis, unexpectedly large errors are encountered in the computed CARE solutions. Almost 10 years later, Volker and co-workers made the observation that the orthogonal and symplectic matrix U that puts \(H^{2}\) into skew-Hamiltonian Schur form contains information that can be used to construct the stable H-invariant subspace in a recursive fashion [49]. This matrix U is obtained from the symplectic URV decomposition (4.42) with the additional step of accumulating the matrices obtained from the periodic QR algorithm applied to \(H_{2}H_{1}\) into U, V. This approach is described in some detail in Sect. 1.4; we therefore refrain from recapitulating it here. An intuitive explanation of the method was given by Watkins in [89], and a block version of the algorithm that can better deal with clusters of eigenvalues is described in [79]. It can be concluded that Volker and co-workers have eventually found a method to solve CAREs via the approach based on invariant subspaces of the Hamiltonian eigenproblem that satisfies all desired properties regarding numerical stability, structure preservation, and efficiency.

Analogous procedures for solving the DARE via the symplectic pencil approach have not yet been derived. On the other hand, by employing the generalized Cayley transformation approach described in the previous section, one may use the generalizations of this approach to skew-Hamiltonian/Hamiltonian or even/odd matrix pencils described by Volker and co-workers in [15, 24]. Nevertheless, there is still room for further improvement: the methods discussed in [15, 24] extend the approach from [27] rather than the more robust methods from [49, 79]. Also, the available software for structured matrix pencils [33, 34] is still based on the skew-Hamiltonian/Hamiltonian pencil structure rather than the more general structure of even or alternating matrix pencils.

4.2 Defect Correction

In this subsection, we revisit a topic that is very important for obtaining solutions to CAREs and DAREs at the highest possible accuracy. When a numerical solution \(\tilde{X}\) of the CARE (4.23) is computed, the process is prone to roundoff errors, so \(\tilde{X}\) can only be an approximation to the solution X of (4.23). These roundoff errors may lead to deteriorating accuracy, in particular for algorithms that are not numerically backward stable, like the SR algorithm discussed in Chap. 1 and the previous subsection or methods based on sign and disk functions, but also for symplectic QR algorithms in case the final step of obtaining X via \(X = -U_{2}U_{1}^{-1}\) as in (4.25) is ill-conditioned. Improving this approximation may be achieved using Newton’s method, but alternatively also by defect correction. The necessary theory was derived in [80] and [76, Chapter 10]. We present it here in some detail, as the defect correction principle has, despite its practical importance, received little attention since, and in particular new computing platforms including hardware accelerators may use this principle to obtain fast and reliable algorithms, as has already been suggested for Lyapunov equations in [18].

The defect correction principle is most easily explained for linear systems of equations. Given an approximate solution \(\tilde{x}\) of the underlying problem

$$\displaystyle{\mathit{Ax} = b,}$$

with \(A \in \mathbb{R}^{n\times n}\) nonsingular and \(b \in \mathbb{R}^{n}\), we determine the residual \(\;r = b - A\tilde{x}\), then compute the solution δ of the linear system

$$\displaystyle{\;A\delta = r,}$$

and set

$$\displaystyle{x^{+} =\tilde{ x} +\delta.}$$

This process can be iterated until a given stopping criterion is satisfied. Defect correction is successful as long as the residual can be computed accurately enough, i.e., with higher precision than that used to compute \(\tilde{x}\). Such higher precision may be obtained, e.g., by using double precision in case \(\tilde{x}\) was computed in single precision. There is no need to compute \(\tilde{x}\) and δ by the same algorithm, which provides great flexibility to the defect correction principle.
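As a minimal illustration, the following Python sketch (random test data; the low-precision solver is mimicked by solving in single precision, while the residual is evaluated in double precision) implements this iteration for a linear system:

import numpy as np

rng = np.random.default_rng(4)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)          # reasonably well-conditioned test matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)
x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)   # low-precision approximation x~

for _ in range(3):
    r = b - A @ x                                        # residual in higher (double) precision
    delta = np.linalg.solve(A32, r.astype(np.float32)).astype(np.float64)
    x = x + delta                                        # x+ = x~ + delta
    print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))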

The basis for applying defect correction to CAREs is provided by the following theorem due to Mehrmann and Tan [80].

Theorem 7

Let X = X T be a solution of (4.23) , and \(\tilde{X}\) a symmetric approximation to this solution. Define \(P = X -\tilde{ X},\;\tilde{A} = A - G\tilde{X}\) , and let

$$\displaystyle{\tilde{F} = F + A^{T}\tilde{X} +\tilde{ X}A -\tilde{ X}G\tilde{X}}$$

be the residual of (4.23) with respect to \(\tilde{X}\) . Then P satisfies the CARE

$$\displaystyle{ 0 =\tilde{ F} + P\tilde{A} +\tilde{ A}^{T}P -\mathit{PGP}. }$$
(4.47)

The theorem is proved by simple algebraic manipulations after inserting \(X =\tilde{ X} + P\) in (4.23).

Due to the non-uniqueness of CARE solutions, it is important to guarantee that the updated approximate solution \(\tilde{X} + P\) is still the one of interest, usually the stabilizing solution in control applications. In order to show that this is indeed the case, first it is necessary to check whether the stabilizability property related to (4.23) carries over to (4.47).

Lemma 4

Given (4.23) with (A,G) stabilizable, i.e.,

$$\displaystyle{\mathop{\mathrm{rank}}\nolimits [\lambda I - A,\;G] = n\quad \forall \;\lambda \in \mathbb{C}:\, \mathfrak{R}(\lambda ) \geq 0.}$$

Then \(\tilde{A} = A - G\tilde{X}\) is stabilizable as well.

Proof

$$\displaystyle\begin{array}{rcl} \mathop{\mathrm{rank}}\nolimits [\lambda I_{n} -\tilde{ A},G]& =& \mathop{\mathrm{rank}}\nolimits [\lambda I_{n} - A + G\tilde{X},G] {}\\ & =& \mathop{\mathrm{rank}}\nolimits \left [[\lambda I_{n} - A,G]\left [\begin{array}{cc} I_{n}& 0 \\ \tilde{X} &I_{n} \end{array} \right ]\right ]\ =\ \mathop{\mathrm{rank}}\nolimits [\lambda I_{n} - A,G].\qquad \qquad {}\\ \end{array}$$

 □ 

As we have seen before, the positive semidefinite solution of the CARE (4.23) can be obtained from the invariant subspace of the Hamiltonian matrix

$$\displaystyle{H = \left [\begin{array}{*{10}c} A& G\\ F &-A^{T} \end{array} \right ]}$$

corresponding to the eigenvalues in \(\mathbb{C}^{-}\). Forming the Hamiltonian matrix corresponding to the defect CARE (4.47),

$$\displaystyle{\tilde{H} = \left [\begin{array}{cc} \tilde{A}& G\\ \tilde{F} & -\tilde{ A}^{T} \end{array} \right ],}$$

we see that \(\tilde{H} = Q^{-1}\mathit{HQ}\) using

$$\displaystyle{Q = \left [\begin{array}{cc} I &0\\ -\tilde{ X} &I \end{array} \right ].}$$

This immediately implies that \(\varLambda (H) =\varLambda (\tilde{H})\), so that also \(\tilde{H}\) has exactly n eigenvalues with negative real parts. Now assume the \(\tilde{H}\)–invariant subspace corresponding to these eigenvalues is spanned by \(\left [\begin{array}{c} Z_{1} \\ Z_{2} \end{array} \right ] \in \mathbb{R}^{2n\times n}\), that is,

$$\displaystyle{ \left [\begin{array}{cc} \tilde{A}& G\\ \tilde{F} & -\tilde{ A}^{T} \end{array} \right ]\left [\begin{array}{c} Z_{1} \\ Z_{2} \end{array} \right ] = \left [\begin{array}{c} Z_{1} \\ Z_{2} \end{array} \right ]Z, }$$
(4.48)

where all eigenvalues of \(Z \in \mathbb{R}^{n\times n}\) have negative real parts.

By Lemma 4, \((\tilde{A},G)\) is stabilizable. Using standard arguments from control theory [52], it can then be shown that \(Z_{1}\) is invertible and that \(Z_{2}Z_{1}^{-1}\) is symmetric negative semidefinite. Inserting \(P:= -Z_{2}Z_{1}^{-1}\) in (4.48), we obtain

$$\displaystyle{ \left [\begin{array}{cc} \tilde{A}& G\\ \tilde{F} & -\tilde{ A}^{T} \end{array} \right ]\left [\begin{array}{c} I\\ -P \end{array} \right ] = \left [\begin{array}{c} I\\ -P \end{array} \right ]\tilde{Z}, }$$
(4.49)

where \(\tilde{Z} = Z_{1}ZZ_{1}^{-1}\) has only eigenvalues with negative real parts, too.

Expanding terms in (4.49), the first row yields

$$\displaystyle{ \tilde{A} -\mathit{GP} =\tilde{ Z}, }$$
(4.50)

implying that P is stabilizing, and from the second row,

$$\displaystyle{\tilde{F} +\tilde{ A}^{T}P = -P\tilde{Z} = -P\tilde{A} + \mathit{PGP},}$$

we see that P is a solution of (4.47). In summary, P must be the unique symmetric positive semidefinite solution of (4.47).

Now with \(\tilde{A} = A - G\tilde{X}\) it follows from (4.50), that

$$\displaystyle{A - G(\tilde{X} + P) =\tilde{ Z}}$$

is stable, so that in summary we obtain the following result:

Theorem 8

Suppose the invariant subspace of the Hamiltonian matrix

$$\displaystyle{\tilde{H} = \left [\begin{array}{cc} \tilde{A}& G\\ \tilde{F} & -\tilde{ A}^{T} \end{array} \right ]}$$

corresponding to the defect CARE (4.47) and associated with the eigenvalues in the open left half plane is spanned by the columns of \(\left [\begin{matrix}\scriptstyle Z_{1} \\ \scriptstyle Z_{2}\end{matrix}\right ] \in \mathbb{R}^{2n\times n}\), and let \(P = -Z_{2}Z_{1}^{-1}\). Then \(\tilde{X} + P\) is the unique symmetric positive semidefinite solution of (4.23).

Based on Theorems 7 and 8, we may now formulate a defect correction algorithm for CAREs.

Algorithm 2 Defect correction for the CARE (4.23) (DC_CARE)

1. Compute an approximate solution \(\tilde{X} =\tilde{ X}^{T}\) of the CARE (4.23).
2. Compute the residual \(\tilde{F} = F + A^{T}\tilde{X} +\tilde{ X}A -\tilde{ X}G\tilde{X}\) and set \(\tilde{A} = A - G\tilde{X}\).
3. Compute the stabilizing solution P of the defect CARE (4.47).
4. Set \(\tilde{X} \leftarrow \tilde{X} + P\) and repeat from Step 2 until a stopping criterion is satisfied.

As there are no specifications given for the methods employed in Steps 1 and 3 of Algorithm 2, one could use any numerical method to solve the CAREs, and possibly different ones in Steps 1 and 3. For instance, one could simply use a fast, but potentially unstable, method such as the SR algorithm, as there is no need to have an \(\tilde{X}\) of high accuracy. As \(\|P\|\) will be very small in general, the quadratic term in the defect CARE is basically negligible, so Newton’s method is a natural choice in Step 3. Another option is to employ an algorithm like the orthogonal symplectic multishift algorithm, in which part of the computations from Step 1 can be re-used in Step 3, so that the cost of Step 3 can be reduced. Such a variant is discussed in [2, 8].
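A hedged sketch of such a defect correction loop in Python is given below (random stabilizable test data with \(F = C^{T}QC = I\); scipy.linalg.solve_continuous_are is used as a stand-in both for the basic solver in Step 1, fed with data rounded to single precision to mimic an inaccurate method, and for the defect CARE solver in Step 3):

import numpy as np
from scipy.linalg import solve_continuous_are

rng = np.random.default_rng(5)
n, m = 8, 2
A = rng.standard_normal((n, n)) - 2 * np.eye(n)
B = rng.standard_normal((n, m))
R = np.eye(m)
F = np.eye(n)                                   # constant term, here C^T Q C with C = Q = I
G = B @ np.linalg.solve(R, B.T)                 # G = B R^{-1} B^T

residual = lambda X: F + A.T @ X + X @ A - X @ G @ X

# Step 1: inaccurate approximate solution (data rounded to single precision)
Xt = solve_continuous_are(A.astype(np.float32), B.astype(np.float32),
                          F.astype(np.float32), R.astype(np.float32)).astype(np.float64)

for _ in range(2):                              # Steps 2-4 of Algorithm 2
    Ft = residual(Xt)                           # residual F~ of (4.23) at X~
    At = A - G @ Xt                             # A~ = A - G X~
    P = solve_continuous_are(At, B, Ft, R)      # stabilizing solution of (4.47)
    Xt = Xt + P
    print(np.linalg.norm(residual(Xt)))

In exact arithmetic, the first corrected iterate already equals the stabilizing solution of (4.23); in floating point arithmetic, the printed residual norms typically drop by several orders of magnitude within one or two sweeps.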

The Algorithm DC_CARE is numerically backward stable in the sense that \(\tilde{X}\) is the exact solution of the following CARE with perturbed data:

$$\displaystyle{0 = F -\tilde{ F} + A^{T}X + \mathit{XA} -\mathit{XGX},}$$

where \(\tilde{F}\) is the residual from Theorem 7. If the leading significant digits of the defect P in Step 3 are computed correctly, the approximate solution \(\tilde{X}\) converges to the exact stabilizing solution X of (4.23). In practice, cancellation will reduce the accuracy of the computed residual \(\tilde{F}\), which limits the obtainable accuracy. Nevertheless, the accuracy of the CARE solution is often greatly improved by one or two steps of defect correction, as several examples in [8] indicate.

A defect correction procedure for the DARE (4.24) can be derived in a completely analogous fashion; see [80] and [76, Chapter 10]. An interesting aspect of further research would be to derive a mixed-precision CPU-GPU variant of Algorithm DC_CARE in the fashion of [18]. Also, in the case of the large-scale sparse CARE solvers recently reviewed in [32], where low-rank approximations to the stabilizing solution are computed, it would be necessary to represent \(\tilde{F}\) in low-rank format in order to use this concept. Whether this is possible or not is an open problem.

4.3 Other Contributions to the Numerical Solution of AREs

Volker’s many contributions to the numerical solution of AREs go well beyond the described methods based on structured eigenproblems. A classical solution method of AREs is Newton’s method, given their nature as nonlinear systems of equations. Volker has contributed to the convergence theory of Newton’s method applied to the generalized CAREs (4.7) and DAREs (4.8), see [76, § 11]. He attributes these contributions mainly to Ludwig Elsner, but they had not been published before.

A notable contribution to understanding why and when the sign function method for CAREs [45] can be expected to yield accurate solutions was the perturbation analysis derived in [46].

Recently, Volker, together with Federico Poloni, has explored the use of permuted graph matrices to represent invariant and deflating subspaces of Hamiltonian matrices and pencils. This often has a tremendously positive effect on the numerical accuracy of iterative methods based on the inverse-free iteration/disk function method [5, 11, 31] and doubling-like algorithms [50, 51]. See Chap. 5 for further details and references.

An important aspect in deriving numerical methods for AREs is to test them on challenging examples and to compare their performance to existing methods using well-defined benchmarks. Volker was the driving force in establishing benchmark collections for CAREs [20] and DAREs [21]; see [22] for an overview. This had a very positive effect on the publication practice for new numerical ARE methods, as non-competitive methods can be identified easily since these benchmark collections became available. Later on, he also inspired a number of other benchmark collections for computational control problems that became part of the SLICOT project, see also [25].

Together with Petko Petkov and Mihail Konstantinov, Volker also contributed and tested mathematical software for solving CAREs [84, 85].

Although Volker had not published on solving large-scale AREs until recently [78], he inspired much of the work in this area by putting his Ph.D. student Thilo Penzl on this track. Penzl’s thesis [83] is now considered the starting point for many of the currently used methods for large-scale matrix equations; see also the later paper [23], which was published only 8 years after Thilo’s unfortunate passing in 1999 due to a tragic accident. A survey on the developments of this prospering field was recently given in [32].

5 Avoiding AREs

Already in his early work related to his habilitation thesis and the book [76], Volker often made the point that in solving LQR problems, it may not be necessary to solve AREs explicitly. The basic idea of this can be presented using the continuous-time LQR problem. Borrowing the most recent representation, we can associate an even matrix pencil to the LQR problem (4.1) for the descriptor or LTI system (4.2):

$$\displaystyle{ \lambda K-L:=\lambda \left [\begin{array}{*{10}c} 0 &E &0 \\ -E^{T}& 0 &0\\ 0 & 0 &0 \end{array} \right ]-\left [\begin{array}{*{10}c} 0 & A & B \\ A^{T} &C^{T}QC &-C^{T}S \\ B^{T}& S^{T}C & R \end{array} \right ]. }$$
(4.51)

Considered as a matrix polynomial, this pencil has a skew-symmetric leading coefficient and a symmetric constant term; it is therefore a skew-symmetric/symmetric or, more generally, an alternating matrix polynomial, see Chaps. 2, 6, and 12 for more on these structures. Assuming for simplicity that E and R are nonsingular, the key observation now is the following: if the feedback solution \(u(t) = F_{c}x(t)\) is sought, and an n-dimensional stable deflating subspace of the even pencil \(\lambda K - L\) from (4.51) exists, spanned by the columns of \(\left [V ^{T},\,U^{T},\,W^{T}\right ]^{T}\) with \(U \in \mathbb{R}^{n\times n}\) invertible and V, W of sizes according to (4.51), then, following [76, § 6], we get

$$\displaystyle{F_{c} = -R^{-1}(B^{T}X_{ c}E + S^{T}C) = \mathit{WU}^{-1}.}$$

Hence, the optimal feedback control law can be determined without computing the CARE solution! The latter is determined via \(X_{c}E = \mathit{VU}^{-1}\) in this setting. Working with the even matrix pencil in (4.51) has the additional advantage that no linear systems with R need to be solved in forming the coefficients and thus, rounding errors in forming these coefficients are avoided [14]. This principle also carries over to the discrete-time case, see again [76, § 6] for the unstructured setting.
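A hedged sketch of this computation in Python is as follows (all matrices are randomly generated test data with the simplifications E = I, C = I, S = 0; an unstructured QZ decomposition with reordering, scipy.linalg.ordqz, is used in place of a structure-preserving solver for even pencils, a crude threshold filters the infinite eigenvalues, and it is assumed that exactly n stable finite eigenvalues are selected):

import numpy as np
from scipy.linalg import ordqz, solve_continuous_are

rng = np.random.default_rng(6)
n, m = 6, 2
E, C = np.eye(n), np.eye(n)
A = rng.standard_normal((n, n)) - 2 * np.eye(n)
B = rng.standard_normal((n, m))
Q, S, R = np.eye(n), np.zeros((n, m)), np.eye(m)

K = np.block([[np.zeros((n, n)), E,                np.zeros((n, m))],
              [-E.T,             np.zeros((n, n)), np.zeros((n, m))],
              [np.zeros((m, n)), np.zeros((m, n)), np.zeros((m, m))]])
L = np.block([[np.zeros((n, n)), A,           B       ],
              [A.T,              C.T @ Q @ C, -C.T @ S],
              [B.T,              S.T @ C,     R       ]])

# select the stable finite eigenvalues of the pencil lambda*K - L  (L z = lambda K z)
stable_finite = lambda alpha, beta: (np.abs(beta) > 1e-8) & (alpha.real * beta < 0)
_, _, alpha, beta, _, Z = ordqz(L, K, sort=stable_finite, output='real')

V, U, W = Z[:n, :n], Z[n:2 * n, :n], Z[2 * n:, :n]   # first n columns span the stable deflating subspace
Fc = W @ np.linalg.inv(U)                            # feedback gain F_c = W U^{-1}

Xc = solve_continuous_are(A, B, Q, R)                # cross-check against the CARE-based gain
print(np.linalg.norm(Fc + np.linalg.solve(R, B.T @ Xc)))

For this simple data set, the printed difference between \(WU^{-1}\) and \(-R^{-1}B^{T}X_{c}\) should be close to machine precision, up to the conditioning of the problem.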

Avoiding the solution of AREs is even more desirable in \(\mathcal{H}_{\infty }\) control. We will not go into the details of this problem, which has become a paradigm in robust control. It suffices to understand that the \(\mathcal{H}_{\infty }\)-optimal controller is usually determined using the so-called γ-iteration and subsequent controller formulas based on the output of this iteration. The γ-iteration is classically formulated in such a way that (in the continuous-time case) two CAREs need to be solved in each iteration step, and a spectral condition on their product is checked, see, e.g., [57, 90]. The crux is that these CAREs usually have indefinite quadratic terms, which prevents solving them with Newton’s method, and the eigenvalues of the associated Hamiltonian matrices are often close to the imaginary axis (in particular close to the “optimal” γ), which makes them difficult to solve by methods based on invariant or deflating subspaces of the associated Hamiltonian matrices. Moreover, their norms often tend to infinity, which yields poorly scaled problems and implies additional numerical difficulties. For all these reasons, it was desirable to avoid the CAREs in the process. Volker and co-workers were able to derive a numerically robust method that achieves this, see [16, 17] for the LTI case and [73] for the descriptor case. Later, controller formulas based on this approach were also presented in [13]. For details, we refer to [24], where also other applications of extended pencil formulations in even or alternating form are discussed.

6 Concluding Remarks

Over the last 20 years, Volker and I have collaborated on methods for algebraic Riccati equations, and on how to avoid them. We often had differing opinions on whether or not the ARE is the right concept to use for solving certain control problems. These were always inspiring and fruitful discussions, and they have certainly driven both of us to improve methods and ideas more and more. Today, we basically agree on which problems should be solved using AREs, and where one should avoid them by using structured matrix pencil methods as discussed, e.g., in Chaps. 1–3 and 5. Therefore, I foresee interesting research tasks in both directions for certainly another decade and more, as with increasing computer power, model complexity in control engineering problems is increasing and will require new ideas and further development of the methods for DREs, AREs, and the related pencil problems at hand. I truly hope Volker and I will also be part of this development. While writing this chapter, a number of open problems and ideas already emerged, and I hope some of these can lead to improved methods and further insight into the theoretical and numerical treatment of DREs and AREs.