1 Introduction

We consider the numerical solution of large scale Lyapunov equations of the form

$$\begin{aligned} AX+XA^T=-BB^T, \end{aligned}$$
(1.1)

where \(A\in \mathbb {R}^{n\times n},~B\in \mathbb {R}^{n\times r}\). Lyapunov equations play a fundamental role in many areas of applications, such as control theory, model reduction and signal processing, see, e.g. [2, 14]. Here we assume that the spectrum of A, \(\varLambda (A)\), lies in the open left half plane, i.e. \(\varLambda (A)\subset \mathbb {C}_-\), and that the right hand side is of low rank, i.e. \(r\ll n\), which is often the case in practice. A large and growing body of literature considers the solution of (1.1), see [45] and references therein for an overview of developments and methods.

The solution matrix X of (1.1) is, in general, dense, making it virtually impossible to store it for large dimensions n. For a low-rank right hand side, however, the solution X often has a very small numerical rank [3, 5, 25, 36, 40, 51] and many algorithms have been developed that approximate X by a low-rank matrix \(X\approx ZZ^T\), where \(Z\in \mathbb {R}^{n\times s}\), \(s\ll n\). Important low-rank algorithms are, for instance, projection type methods based on Krylov subspaces [20, 21, 28, 43, 44] and low-rank alternating directions implicit (ADI) methods [12,13,14, 30, 33, 36].

This paper considers both the rational Krylov subspace method (from the family of projection type methods) and the low-rank ADI method for the low rank solution to (1.1). One of the computationally most expensive parts in both methods is that, in each iteration step, shifted linear systems of the form

$$\begin{aligned} (A-\sigma I)y=z,\quad z\in \mathbb {R}^{n\times r}, \end{aligned}$$

need to be solved, where the shift \(\sigma \) is usually variable and both the shift \(\sigma \) and the right hand side z depend on the particular algorithm used. Normally these linear systems are solved by sparse-direct or iterative methods. When iterative methods, such as preconditioned Krylov subspace methods, are used to solve the linear systems, then these solves are implemented inexactly and we obtain a so-called inner-outer iterative method (sometimes the term “inexact method” is used). The outer method is (in our case) a rational Krylov subspace method or a low-rank ADI iteration. The inner problem is the iterative solution to the linear systems. The inner solves are often carried out at least as accurately as the required solution accuracy for the Lyapunov equation (cf., e.g., the numerical experiments in [20]), usually in terms of the associated Lyapunov residual norms. It turns out that this is not necessary, and the underlying theory is the main contribution of this paper.

Inexact methods have been considered for the solution to linear systems and eigenvalue problems (see [17, 23, 42, 47, 53] and references therein). One focus has been on inexact inverse iteration and similar methods, where, in general, the accuracy of the inexact solve has to be increased to obtain convergence [22]. For subspace expansion methods, such as the Krylov methods considered in [17, 24, 42, 47, 53], it has been observed numerically and shown theoretically that it is possible to relax the solve tolerance as the outer iteration proceeds.

In this paper we show that for both the rational Krylov subspace method and the low-rank ADI algorithm we can terminate the inner iteration before full precision has been reached (in a way that depends on the outer iteration accuracy) and still obtain very good accuracy in the final solution to the Lyapunov equation. We develop a strategy which shows that we can relax (i.e. increase) the stopping tolerance within the inner solve as the outer iteration proceeds. The key approach is to consider the so-called “residual gap” between the exact and inexact methods for solving Lyapunov equations and show that it depends on the linear system accuracy. Using this connection between inner and outer solve accuracy we can derive relaxation criteria for the inexact solves within Lyapunov solvers, which is the main contribution of this article.

This behavior is similar to that observed for Krylov methods with inexact matrix-vector products applied to eigenvalue problems [23, 42], linear systems [17, 47, 53], and matrix functions [24]. We provide practical relaxation strategies for both methods and give numerical examples which support the theory, and show that the amount of computation time can be significantly reduced by using these strategies.

The paper is organised as follows. In Sect. 2 we review results about rational Krylov subspace methods. Those are used to show important properties about Galerkin projections. A new inexact rational Arnoldi decomposition is derived, extending the theory of [37]. We show in Theorem 2.1, Corollaries 2.1 and 2.2 that the entries of the solution to the projected Lyapunov equation have a decreasing pattern, as seen in Fig. 1. This crucial novel result is then used to demonstrate that, in the inexact rational Krylov subspace method, the linear system solve can be relaxed, in a way inversely proportional to this pattern. Section 3 is devoted to low-rank ADI methods. We show that the low-rank factors arising within the inexact ADI iteration satisfy an Arnoldi like decomposition in Theorem 3.2. This theory is again new and significant for deriving a specially tailored relaxation strategy for inexact low-rank ADI methods in Theorem 3.3. Finally, in Sect. 4 we test several practical relaxation strategies and provide numerical evidence for our findings. Our examples show that, in particular for very large problems, we can save up to half the number of inner iterations within both methods for solving Lyapunov equations.

Notation Throughout the paper, if not stated otherwise, we use \(\Vert \cdot \Vert \) to denote the Euclidean vector norm and the associated induced matrix norm. \((\cdot )^*\) stands for the transpose and complex conjugate transpose for real and, respectively, complex matrices and vectors. The identity matrix of dimension k is denoted by \(I_k\), with the subscript omitted if the dimension is clear from the context. The jth column of the identity matrix is \(e_j\). The spectrum of a matrix A and of a matrix pair \((A,M)\) is denoted by \(\varLambda (A)\) and \(\varLambda (A,M)\), respectively. The smallest (largest) eigenvalues and singular values of a matrix X are denoted by \(\lambda _{\min }(X)\) (\(\lambda _{\max }(X)\)) and \(\sigma _{\min }(X)\) (\(\sigma _{\max }(X))\).

2 Rational Krylov subspace methods and inexact solves

2.1 Introduction and rational Arnoldi decompositions

Projection methods for (1.1) follow a Galerkin principle similar to, e.g., the Arnoldi method for eigenvalue computations or linear systems (in this case called full orthogonalization method). Let \(\mathscr {Q}=\mathop {\mathrm {range}}\left( Q \right) \subset \mathbb {C}^{n}\) with a matrix \(Q\in \mathbb {C}^{n\times k}\), \(k\ll n\) whose columns form an orthonormal basis of \(\mathscr {Q}\): \(Q^*Q=I_k\). We look for low-rank approximate solutions to (1.1) in the form \(Q\tilde{X}Q^*\) with \(\tilde{X}=\tilde{X}^*\in \mathbb {C}^{k\times k}\), i.e.

$$\begin{aligned} X\in \mathscr {Z}(\mathscr {Q}):=\lbrace Q\tilde{X}Q^*\in \mathbb {C}^{n\times n},~\tilde{X}=\tilde{X}^*,~\mathop {\mathrm {range}}\left( Q \right) =\mathscr {Q}\rbrace . \end{aligned}$$

The Lyapunov residual matrix for approximate solutions \(X\in \mathscr {Z}(\mathscr {Q})\) is

$$\begin{aligned} R:=R(Q\tilde{X}Q^*)=A(Q\tilde{X}Q^*)+(Q\tilde{X}Q^*)A^*+BB^*. \end{aligned}$$
(2.1)

Imposing a Galerkin condition \(R\perp \mathscr {Z}\) [28, 38] leads to the projected Lyapunov equation

$$\begin{aligned} T \tilde{X}+\tilde{X}T^*+Q^*BB^*Q=0,\quad T:= Q^*AQ, \end{aligned}$$
(2.2)

i.e., \(\tilde{X}\) is the solution of a small, k-dimensional Lyapunov equation which can be solved by algorithms employing dense numerical linear algebra, e.g., the Bartels–Stewart method [4]. Throughout this article we assume that the compressed problem (2.2) admits a unique solution which is ensured by \(\varLambda (T)\cap -\varLambda (T)=\emptyset \), which holds, for example, when \(\varLambda (T)\subset \mathbb {C}_-\). A sufficient condition for ensuring \(\varLambda (T)\subset \mathbb {C}_-\) is \(A+A^*<0\). In practice, however, this condition is rather restrictive because, on the one hand, often \(\varLambda (T)\subset \mathbb {C}_-\) holds even for indefinite \(A+A^*\) and, on the other hand, the discussed projection methods will still work if the restriction T has eigenvalues in \(\mathbb {C}_+\) provided that \(\varLambda (T)\cap -\varLambda (T)=\emptyset \) holds.
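
As a concrete illustration, the following Python sketch (NumPy/SciPy; the test matrix, trial space, and dimensions are illustrative choices, not taken from the paper) forms the Galerkin projection (2.2) for a given orthonormal basis, solves the compressed equation with a dense Bartels–Stewart type solver, and evaluates the residual (2.1).

```python
import numpy as np
from scipy.linalg import qr, solve_continuous_lyapunov

# Illustrative data: a small stable matrix with A + A^T < 0 and a rank-1 right hand side
rng = np.random.default_rng(0)
n, k = 100, 10
A = 0.05 * rng.standard_normal((n, n)) - np.eye(n)
B = rng.standard_normal((n, 1))

# Some k-dimensional trial space; here simply an orthonormalized block Krylov basis (2.3)
K = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(k)])
Q, _ = qr(K, mode='economic')                       # Q^*Q = I_k

# Galerkin projection (2.2) and dense (Bartels-Stewart type) solve of the small equation
T = Q.T @ (A @ Q)
Bt = Q.T @ B
Ytilde = solve_continuous_lyapunov(T, -Bt @ Bt.T)   # T Y + Y T^* = -(Q^*B)(Q^*B)^*

# Lyapunov residual (2.1) of the low-rank approximation Q Ytilde Q^*
X = Q @ Ytilde @ Q.T
R = A @ X + X @ A.T + B @ B.T
print("residual norm:", np.linalg.norm(R, 2))
```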

Usually, one produces sequences of subspaces of increasing dimensions in an iterative manner, e.g. \(\mathscr {Q}_1\subseteq \mathscr {Q}_2\subseteq \ldots \subseteq \mathscr {Q}_j\). For practical problems, using the standard (block) Krylov subspace

$$\begin{aligned} \mathscr {Q}_j=\mathscr {K}_j(A,B)=\mathop {\mathrm {range}}\left( \left[ B,AB,\ldots ,A^{j-1}B\right] \right) \end{aligned}$$
(2.3)

is not sufficient and leads to a slowly converging process and, hence, large low-rank solution factors. A better performance can in most cases be achieved by using rational Krylov subspaces which we use here in the form

$$\begin{aligned} \mathscr {Q}_j=\mathscr {K}^{\text {rat}}_j(A,B,\varvec{\xi }):=\mathop {\mathrm {range}}\left( \left[ B,(A-\xi _2 I)^{-1}B,\ldots ,\prod \limits _{i=2}^j(A-\xi _i I)^{-1}B\right] \right) , \end{aligned}$$
(2.4)

with shifts \(\xi _i\in \mathbb {C}_+\) (hence \(\xi _i\notin \varLambda (A)\)), \(i = 2,\ldots ,j\). An orthonormal basis for \(\mathscr {K}^{\text {rat}}_j\) can be computed by the (block) rational Arnoldi process [37] leading to \(Q_j\in \mathbb {C}^{n\times jr}\) of full column rank. The resulting rational Krylov subspace method (RKSM) for computing approximate solutions to (1.1) is given in Algorithm 1. The shift parameters are crucial for a rapid reduction of the Lyapunov residual norm \(\Vert R_j\Vert :=\Vert R(Q_{j}Y_{j}Q_{j}^*)\Vert \) (cf. (2.1)) and can be generated a-priori or adaptively in the course of the iteration [21]. The extension of the orthonormal basis \(\mathop {\mathrm {range}}\left( Q_j \right) \) by w in Line 6 of Algorithm 1 should be done by a robust orthogonalization process, e.g., using a repeated (block) Gram–Schmidt process. The orthogonalization coefficients are accumulated in the block Hessenberg matrix \(H_j=[h_{i,k}]\in \mathbb {C}^{jr\times jr}\), \(h_{i,k}\in \mathbb {C}^{r\times r}\), \(i,k=1,\ldots ,j\), \(i<k+2\). If the new basis blocks are normalized using a thin QR factorization, then the \(h_{i+1,i}\) matrices are upper triangular. In the following we summarize properties of the RKSM method, in particular known results about the rational Arnoldi decomposition and the Lyapunov residual, which will be crucial later on. We simplify this discussion of the rational Arnoldi process and Algorithm 1 to the case \(r=1\). From there, one can generalize to the situation \(r>1\).

[Algorithm 1: the rational Krylov subspace method (RKSM) for computing approximate solutions to (1.1)]
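
The listing of Algorithm 1 is not reproduced above; the following Python sketch (NumPy/SciPy) outlines one plausible realization of the RKSM loop for \(r=1\): the shifted solve (Line 3), the basis extension (Line 6), the projected Lyapunov equation (Line 8), and the residual check (Line 9). The test matrix, shift list, and the explicitly formed residual are illustrative simplifications; in the inexact method studied below, the direct solve would be replaced by a preconditioned iterative solver with tolerance \(\tau ^{\text {LS}}_j\).

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def rksm_lyapunov(A, B, shifts, tol=1e-8):
    """Sketch of the rational Krylov subspace method for A X + X A^T = -B B^T, r = 1.

    Per step: shifted solve (Line 3), basis extension (Line 6), projected Lyapunov
    equation (Line 8), residual check (Line 9).  Inner solves are direct here; an
    inexact variant would use an iterative solver with (relaxed) tolerance instead.
    """
    n = A.shape[0]
    Q = B / np.linalg.norm(B)                                # first basis vector
    Y = None
    for j, xi in enumerate(shifts, start=1):
        w = np.linalg.solve(A - xi * np.eye(n), Q[:, -1:])   # shifted solve
        for _ in range(2):                                   # repeated Gram-Schmidt
            w -= Q @ (Q.T @ w)
        Q = np.hstack([Q, w / np.linalg.norm(w)])
        T = Q.T @ (A @ Q)                                    # explicit restriction of A
        Bt = Q.T @ B
        Y = solve_continuous_lyapunov(T, -Bt @ Bt.T)         # projected equation (2.8)
        X = Q @ Y @ Q.T                                      # only formed because n is small
        res = np.linalg.norm(A @ X + X @ A.T + B @ B.T, 2)
        print(f"RKSM step {j}: dim = {Q.shape[1]}, ||R_j|| = {res:.2e}")
        if res < tol:
            break
    return Q, Y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 200
    A = 0.05 * rng.standard_normal((n, n)) - np.eye(n)      # A + A^T < 0
    B = rng.standard_normal((n, 1))
    rksm_lyapunov(A, B, shifts=np.linspace(0.5, 5.0, 20))
```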

Assuming at first that the linear systems in Line 3 of Algorithm 1 are solved exactly, the quantities generated after j steps of Algorithm 1 satisfy a rational Arnoldi decomposition [21, 27, 37]

$$\begin{aligned} AQ_j&=Q_jT_j+g_jh_{j+1,j}e_{j}^*H_j^{-1},\quad g_j:=q_{j+1}\xi _{j+1}-(I-Q_jQ_j^*)A q_{j+1}, \end{aligned}$$
(2.5)

which can also be expressed as

$$\begin{aligned}&AQ_{j+1}H_{j+1,j}=Q_{j+1}M_{j+1,j},\quad M_{j+1,j}:=\left[ \begin{array}{l} I+H_jD_j\\ \xi _{j+1}h_{j+1,j}e_{j}^*\end{array}\right] , \end{aligned}$$
(2.6)

where \(H_{j+1,j}\) and \(M_{j+1,j}\) are \((j+1)\times j\) matrices, \(D_j={\text {diag}}\!\left( \xi _2,\ldots ,\xi _{j+1}\right) \), \(H_j\) and \(D_j\) are square matrices of size j, see, e.g., [16, 37]. Here we generally use \(H_j =H_{j,j}\) for simplicity. From the relations (2.5) and (2.6) it follows that the restriction of A onto \(\mathscr {K}^{\text {rat}}_j\) can be written as

$$\begin{aligned} T_j&=Q_j^*(AQ_j)=(I+H_jD_j)H_j^{-1}-Q_j^*Aq_{j+1}h_{j+1,j}e_j^*H_j^{-1}, \end{aligned}$$
(2.7)

which makes it possible to compute the restriction \(T_j\) in a memory-efficient way. The restriction \(T_j\) can also be associated with the Hessenberg–Hessenberg pair \((M_{j,j},H_{j,j})\) [16, 37].

The final part of this section discusses how the Lyapunov residual in Line 9 of Algorithm 1 can be computed efficiently. If the projected Lyapunov equation in Line 8

$$\begin{aligned} T_jY_j+Y_jT_j^*+\tilde{B}_j\tilde{B}_j^*&=0 \end{aligned}$$
(2.8)

is solved for \(Y_j\), then for the Lyapunov residual (2.1) after j steps of the RKSM, we have

$$\begin{aligned} R_j= R(Q_jY_j Q_j^*)&=F_j+F_j^*, \end{aligned}$$
(2.9a)

with

$$\begin{aligned} F_j&:=L_jQ_j^*\in \mathbb {C}^{n\times n},\quad L_j:=g_jh_{j+1,j}e_{j}^*H_j^{-1}Y_j\in \mathbb {C}^{n\times j}, \end{aligned}$$
(2.9b)

as was shown in [21, 44]. The term \(L_j\) is sometimes referred to as “semi-residual”. Since \(g_j\in \mathbb {C}^{n}\), \(h_{j+1,j}e_{j}^*H_j^{-1}Y_j\in \mathbb {C}^{1\times j}\), the matrix \(F_j\) is of rank one. In general for \(B\in \mathbb {R}^{n\times r}\), \(r>1\) and a block-RKSM, \(F_j\) is of rank r.

Note that relation (2.9a) is common for projection methods for (1.1), but the special structure of \(L_j\) in (2.9b) arises from the rational Arnoldi process. Because \(g_j^*Q_j=0\) we have that \(F_j^2=0\) and \(\Vert R_j\Vert =\Vert F_j\Vert =\Vert L_j\Vert \) (using the relationship between the spectral norm and the largest eigenvalues/singular values). This enables an efficient way for computing the norm of the Lyapunov residual \(R_j\) via

$$\begin{aligned} \Vert R_j\Vert&=\Vert \Vert g_j\Vert h_{j+1,j}e_{j}^*H_j^{-1}Y_j\Vert , \end{aligned}$$
(2.10)

for \(g_j\in \mathbb {C}^{n}\).
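
The following short script (NumPy/SciPy, \(r=1\), exact solves, illustrative data) runs a few rational Arnoldi steps while recording the orthogonalization coefficients \(H_j\) and confirms numerically that the cheap expression (2.10) coincides with the spectral norm of the explicitly formed Lyapunov residual.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(2)
n, m = 120, 8
A = 0.05 * rng.standard_normal((n, n)) - np.eye(n)        # A + A^T < 0
b = rng.standard_normal(n)
xi = np.linspace(0.5, 4.0, m)                              # shifts xi_2, ..., xi_{m+1}

Q = np.zeros((n, m + 1))
Q[:, 0] = b / np.linalg.norm(b)
H = np.zeros((m + 1, m))                                   # orthogonalization coefficients
for j in range(m):
    w = np.linalg.solve(A - xi[j] * np.eye(n), Q[:, j])    # rational Arnoldi solve
    for i in range(j + 1):                                 # modified Gram-Schmidt
        H[i, j] = Q[:, i] @ w
        w = w - H[i, j] * Q[:, i]
    H[j + 1, j] = np.linalg.norm(w)
    Q[:, j + 1] = w / H[j + 1, j]

Qm, qn = Q[:, :m], Q[:, m]                                 # Q_m and q_{m+1}
T = Qm.T @ (A @ Qm)
Bt = Qm.T @ b.reshape(-1, 1)
Y = solve_continuous_lyapunov(T, -Bt @ Bt.T)

g = xi[-1] * qn - (A @ qn - Qm @ (Qm.T @ (A @ qn)))        # g_m from (2.5)
row = np.linalg.solve(H[:m, :m], Y)[m - 1, :]              # e_m^* H_m^{-1} Y_m
cheap = np.linalg.norm(g) * abs(H[m, m - 1]) * np.linalg.norm(row)   # formula (2.10)
X = Qm @ Y @ Qm.T
print(cheap, np.linalg.norm(A @ X + X @ A.T + np.outer(b, b), 2))    # should agree
```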

2.2 The inexact rational Arnoldi method

The solution of the linear systems for \(w_{j+1}\) in each step of RKSM (line 3 in Algorithm 1) is one of the computationally most expensive stages of the rational Arnoldi method. In this section we investigate inexact solves of these linear systems, e.g., by iterative Krylov subspace methods, but we assume that this is the only source of inaccuracy in the algorithm. Clearly, some of the above properties do not hold anymore if the linear systems are solved inaccurately. Let

$$\begin{aligned} s_j:=q_j-(A-\xi _{j+1} I)\tilde{w}_{j+1},\quad \text {with}\quad \Vert s_j\Vert \le \tau ^{\text {LS}}_j \end{aligned}$$

be the residual vectors with respect to the linear systems and inexact solution vectors \(\tilde{w}_{j+1}\) with \(\tau ^{\text {LS}}_j<1\) being the solve tolerance of the linear system at step j of the rational Arnoldi method.

The derivations in [37], [21, Proof of Prop. 4.2.] can be modified with

$$\begin{aligned} q_j&=(A-\xi _{j+1}I)Q_{j+1}h_{1:j+1,j}+s_j, \end{aligned}$$

in order to obtain

$$\begin{aligned} AQ_{j+1}H_{j+1,j}=Q_{j+1}M_{j+1,j}-S_j,~S_j:=[s_1,\ldots ,s_j], \end{aligned}$$

where \(M_{j+1,j}\) is as in (2.6). Note that for \(S_j=0\) we recover (2.6), as expected. Hence \(AQ_jH_j=[Q_j(I+H_jD_j)+(\xi _{j+1}q_{j+1}-Aq_{j+1})e^*_jh_{j+1,j}]-S_j\) which leads to the perturbed rational Arnoldi relation

$$\begin{aligned} AQ_j&=Q_jT^{\text {impl.}}_j+g_je_j^*h_{j+1,j}H_j^{-1}-S_jH_j^{-1}, \end{aligned}$$
(2.11)

where \(T^{\text {impl.}}_j:=[(I+H_jD_j)-Q_j^*Aq_{j+1}h_{j+1,j}e^*_j]H_j^{-1}\) marks the restriction of A by the implicit formula (2.7). However, the explicit restriction of A can be written as

$$\begin{aligned} \begin{aligned} T^{\text {expl.}}_j&:=Q_j^*(AQ_j)=T^{\text {impl.}}_j-Q_j^*S_jH_j^{-1},\\&=\left( (I+H_jD_j)-Q_j^*Aq_{j+1}h_{j+1,j}e^*_j-Q_j^*S_j\right) H_j^{-1} \end{aligned} \end{aligned}$$
(2.12)

which highlights the problem that the implicit (computed) restriction \(T^{\text {impl.}}_j\) from (2.7) is not the true restriction of A onto \(\mathop {\mathrm {range}}\left( Q_j \right) \). In fact, the above derivations reveal that

$$\begin{aligned} T^{\text {impl.}}_j=Q_j^*(A+E_j)Q_j,\quad E_j:=S_jH_j^{-1}Q_j^*, \end{aligned}$$

i.e., the implicit restriction (2.7) is the exact restriction of a perturbation of A [26, 31].

Similar to [26] we prefer to use \(T^{\text {expl.}}_j\) for defining the projected problem as this keeps the whole process slightly closer to the original matrix A. Moreover, (2.12) reveals that unlike \(T^{\text {impl.}}\), the explicit restriction \(T^{\text {expl.}}_j\) is not connected to a Hessenberg–Hessenberg pair because the term \(Q_j^*S_j\) has no Hessenberg structure. Computing \(T^{\text {expl.}}_j\) by either (2.12) or explicitly generating \(T^{\text {expl.}}_j=Q_j^*(AQ_j)\) by adding new columns and rows to \(T^{\text {expl.}}_{j-1}\) will double the memory requirements because an additional \(n\times j\) array, \(S_j\in \mathbb {C}^{n\times j}\) or \(W_j:=AQ_j\), has to be stored. Hence, the per-step storage requirements are similar to the extended Krylov subspace method (see, e.g., [43] and the corresponding paragraph in Sect. 2.5). Alternatively, if matrix vector products with \(A^*\) are available, we can also update the restriction

$$\begin{aligned} T_j^{\text {expl.}}=Q_j^*(AQ_j)=\left[ \begin{array}{l@{\quad }l} T^{\text {expl.}}_{j-1}&{} Q_{j-1}^*(Aq_j)\\ (z_j)^*Q_{j-1}&{} q_{j}^*(Aq_j) \end{array}\right] ,\quad z_j:=A^*q_j, \end{aligned}$$

thereby bypassing the increased storage requirements. We will use this approach in our numerical experiments. Trivially, we may also compute \(T_j^{\text {expl.}}\) from scratch row/columnwise, requiring j matrix vector products with A but circumventing the storage increase. Using \(T^{\text {expl.}}_j\) leads to the inexact rational Arnoldi relation

$$\begin{aligned} AQ_j&=Q_jT^{\text {expl.}}_j+\tilde{g}_jH_j^{-1},\quad \tilde{g}_j:=g_jh_{j+1,j}e_j^*-(I-Q_jQ_j^*)S_j, \end{aligned}$$
(2.13)

which we employ in the subsequent investigations.
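
The bordered update of \(T^{\text {expl.}}_j\) described above takes only a few lines; a minimal sketch (our own helper names, NumPy, real A) is:

```python
import numpy as np

def update_T_expl(T_prev, A, Q_prev, q_new):
    """Bordered update of T_j^{expl.} = Q_j^* (A Q_j) from T_{j-1}^{expl.} using one
    product with A and one with A^* (A real here), without storing A Q_j or S_j."""
    Aq = A @ q_new                              # A q_j
    z = A.T @ q_new                             # z_j := A^* q_j
    return np.block([[T_prev, Q_prev.T @ Aq],
                     [z.T @ Q_prev, q_new.T @ Aq]])

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A = rng.standard_normal((50, 50))
    Q, _ = np.linalg.qr(rng.standard_normal((50, 4)))
    T3 = Q[:, :3].T @ (A @ Q[:, :3])
    print(np.allclose(update_T_expl(T3, A, Q[:, :3], Q[:, 3:4]), Q.T @ (A @ Q)))  # True
```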

Lemma 2.1

Consider the approximate solution \(Q_jY_jQ_j^*\) of (1.1) after j iterations of inexact RKSM, where \(Y_j\) solves the projected Lyapunov equation (2.8) defined by either \(T^{\text {expl.}}_j\) or \(T^{\text {impl.}}_j\). Then the true Lyapunov residual matrix \(R^{\text {true}}_j=R(Q_jY_jQ_j^*)\) can be written in the following forms.

  (a)

    If \(T^{\text {expl.}}_j\) is used it holds

    $$\begin{aligned} R^{\text {true}}_j= & {} F^{\text {expl.}}_j+(F^{\text {expl.}}_j)^*,\quad F^{\text {expl.}}_j:=\tilde{g}_jH_j^{-1}Y_jQ_j^*\nonumber \\= & {} F_j-(I-Q_jQ_j^*)S_jH_j^{-1}Y_jQ_j^*, \end{aligned}$$
    (2.14a)
  (b)

    and, if otherwise \(T^{\text {impl.}}_j\) is used, it holds

    $$\begin{aligned} R^{\text {true}}_j&=F^{\text {impl.}}_j+(F^{\text {impl.}}_j)^*,\quad F^{\text {impl.}}_j:=F_j-S_jH_j^{-1}Y_jQ_j^*. \end{aligned}$$
    (2.14b)

Proof

For case (a), using (2.13) immediately yields

$$\begin{aligned} R^{\text {true}}_j =&R(Q_jY_j Q_j^*)=AQ_jY_jQ_j^*+Q_jY_jQ_j^*A^*+BB^*\\ =&[Q_jT^{\text {expl.}}_j+\tilde{g}_jH_j^{-1}]Y_jQ_j^*+Q_jY_j[Q_jT^{\text {expl.}}_j\\&+\tilde{g}_jH_j^{-1}]^*+Q_jQ_j^*BB^*Q_jQ_j^*\\ =&Q_j\left[ T^{\text {expl.}}_jY_j+Y_j(T^{\text {expl.}}_j)^*+Q_j^*BB^*Q_j\right] Q_j^*\\&+\tilde{g}_jH_j^{-1}Y_jQ_j^*+Q_jY_jH_j^{-*}\tilde{g}^*_j=F^{\text {expl.}}_j+(F^{\text {expl.}}_j)^*. \end{aligned}$$

Case (b) follows similarly using (2.11). \(\square \)

Hence in both cases the true residual \(R^{\text {true}}_j\) is a perturbation of the computed residual given by (2.9a)–(2.9b) which we denote in the remainder by \(R^{\text {comp.}}_j\). In case \(T^{\text {expl.}}_j\) is used, since \(Q_j\perp \tilde{g}_j\) one can easily see that \(\Vert R^{\text {true}}_j\Vert =\Vert F^{\text {expl.}}_j\Vert =\Vert \tilde{g}_jH_j^{-1}Y_j\Vert \), a property not shared when using \(T^{\text {impl.}}_j\).

Our next step is to analyze the difference between the true and computed residual.

Definition 1

The residual gap after j steps of inexact RKSM for Lyapunov equations is given by

$$\begin{aligned} \varDelta R_j&:=R_j^{\text {true}}-R^{\text {comp.}}_j = \eta ^{\text {expl./impl.}}_j+(\eta ^{\text {expl./impl.}}_j)^*,\\ \text {where}\quad \eta ^{\text {expl.}}_j&:=F_j-F^{\text {expl.}}_j=(I-Q_jQ_j^*)S_jH_j^{-1}Y_jQ_j^*\quad (\text {if}~T^{\text {expl.}}_j~\text {is used}),\\ \eta ^{\text {impl.}}_j&:=F_j-F^{\text {impl.}}_j=S_jH_j^{-1}Y_jQ_j^*\quad (\text {if}~T^{\text {impl.}}~\text {is used}). \end{aligned}$$

We have

$$\begin{aligned} \Vert \varDelta R_j\Vert ={\left\{ \begin{array}{ll} \Vert \eta ^{\text {expl.}}_j+(\eta ^{\text {expl.}}_j)^*\Vert &{}=\Vert \eta ^{\text {expl.}}_j\Vert ,\\ \Vert \eta ^{\text {impl.}}_j+(\eta ^{\text {impl.}}_j)^*\Vert &{}\le 2\Vert \eta ^{\text {impl.}}_j\Vert . \end{array}\right. } \end{aligned}$$

The result for \(\Vert \eta ^{\text {expl.}}_j\Vert \) follows from the orthogonality of left and rightmost factors of \(\eta ^{\text {expl.}}_j\): with \(\hat{Q}:=[Q_j,Q_j^{\perp }]\in \mathbb {C}^{n\times n}\) unitary we have

$$\begin{aligned} \Vert \varDelta R_j\Vert _2&= \left| \lambda _{\max }(\eta ^{\text {expl.}}_j+(\eta ^{\text {expl.}}_j)^*)\right| =\left| \lambda _{\max }(\hat{Q}^*(\eta ^{\text {expl.}}_j+(\eta ^{\text {expl.}}_j)^*)\hat{Q})\right| \\&=\left| \lambda _{\max }\left( \left[ \begin{array}{ll} 0&{}\quad {Q_j^{\perp }}^*\eta ^{\text {expl.}}_jQ_j\\ ({Q_j^{\perp }}^*\eta ^{\text {expl.}}_jQ_j)^*&{} \quad 0\end{array}\right] \right) \right| =\sigma _{\max }(\eta ^{\text {expl.}}_j)=\Vert \eta ^{\text {expl.}}_j\Vert _2. \end{aligned}$$

In the following we use \(T^{\text {expl.}}_j\) to define the projected problem and omit the superscripts \(^{\text {expl.}}\), \(^{\text {impl.}}\). Suppose the desired accuracy is such that \(\Vert R^{\text {true}}_j\Vert \le \varepsilon \), where \(\varepsilon >0\) is a given threshold. In practice the computed residual norms often show a decreasing behavior very similar to the exact method. However, the norm of the residual gap \(\Vert \eta _j\Vert \) indicates the attainable accuracy of the inexact rational Arnoldi method: since \(\Vert R^{\text {true}}_j\Vert \le \Vert R^{\text {comp.}}_j\Vert +\Vert \eta _j\Vert \), the true residual norm cannot be expected to fall below the level of \(\Vert \eta _j\Vert \), even if \(\Vert R^{\text {comp.}}_j\Vert \le \varepsilon \) already indicates convergence of the computed residuals. This motivates enforcing \(\Vert \eta _j\Vert <\varepsilon \), such that small true residual norms \(\Vert R^{\text {true}}_j\Vert \le 2\varepsilon \) are obtained overall. Since, at step j,

$$\begin{aligned} \Vert \eta _j\Vert&\le \Vert S_jH_j^{-1}Y_j\Vert =\left\| \sum \limits _{k=1}^js_ke_k^*H_j^{-1}Y_j\right\| \le \sum \limits _{k=1}^j\Vert s_k\Vert \Vert e_k^*H_j^{-1}Y_j\Vert , \end{aligned}$$
(2.15)

it is sufficient that, in each addend of the sum, one factor is small while the other remains bounded by, say, \(\mathscr {O}(1)\), in order to achieve \(\Vert \eta _j\Vert \le \varepsilon \). In particular, if the \(\Vert e_k^*H_j^{-1}Y_j\Vert \) terms decrease with k, the linear residual norms \(\Vert s_k\Vert \) are allowed to increase to some extent while still yielding a small residual gap \(\Vert \eta _j\Vert \). Hence, the solve tolerance \(\tau ^{\text {LS}}_k\) of the linear solves can be relaxed in the course of the outer iteration, which has coined the term relaxation. For this to happen, however, we first need to investigate whether \(\Vert e_k^*H_j^{-1}Y_j\Vert \) indeed decays as k increases, which we will do next.
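
Once the inner residuals \(s_k\) are collected, the quantities in (2.15) are cheap to evaluate; a small illustrative helper (NumPy; the function name and interface are ours) is:

```python
import numpy as np

def residual_gap_bound(S, H, Y):
    """Evaluate the bounds in (2.15): ||eta_j|| <= ||S_j H_j^{-1} Y_j|| and the
    column-wise sum bound; S holds the inner residuals s_1,...,s_j as columns."""
    HinvY = np.linalg.solve(H, Y)
    middle = np.linalg.norm(S @ HinvY, 2)
    summed = sum(np.linalg.norm(S[:, k]) * np.linalg.norm(HinvY[k, :])
                 for k in range(H.shape[0]))
    return middle, summed
```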

2.3 Properties of the solution of the Galerkin system

By (2.15), the norm of the residual gap \(\eta _j\) strongly depends on the solution \(Y_j\) of the projected Lyapunov equation (2.8). We will see in Theorem 2.1 and Corollary 2.1 that the entries of \(Y_j\) decrease away from the diagonal in a manner proportional to the Lyapunov residual norm.

In the second part of this section we consider the rows of \(H_j^{-1}Y_j\), since the residual formula (2.9a)–(2.9b) and the residual gap (2.15) depend on this quantity. It turns out that the norm of those rows also decays with the Lyapunov residual norms. Both decay bounds will later be used to develop practical relaxation criteria for achieving \(\Vert \eta _j\Vert \le \varepsilon \).

Consider the solution to the projected Lyapunov equation (2.8). We are interested in the transition from step k to j, where \(k<j\). At first, we investigate this transition for a general Galerkin method including RKSM as a special case.

Theorem 2.1

Assume \(A+A^*<0\) and consider a Galerkin projection method for (1.1) with \(T_j=Q_j^*(AQ_j)\), \(Q_j^*Q_j=I_j\), and the first basis vector given by \(B=q_1\beta \). Let the \(k\times k\) matrix \(Y_k\) and the \(j\times j\) matrix \(Y_j\) be the solution to the projected Lyapunov equation (2.8) after k and j steps of this method (e.g., Algorithm 1 with \(r=1\)), respectively, where \(k<j\). Consider the \(j\times j\) difference matrix \(\varDelta Y_{j,k}:=Y_{j}-\left[ \begin{array}{ll} Y_k&{}\quad 0\\ 0&{}\quad 0\end{array}\right] \), where the zero blocks are of appropriate size. Then

$$\begin{aligned} \Vert \varDelta Y_{j,k}\Vert&\le c_A\Vert R^{\text {true}}_k\Vert ,\quad c_A:=\frac{(1+\sqrt{2})^2}{2\alpha _A}, \end{aligned}$$
(2.16)

where \(\alpha _A:=\frac{1}{2}\vert \lambda _{\max }(A+A^*)\vert \), and \(R^{\text {true}}_k\) is the Lyapunov residual matrix after k steps of Algorithm 1.

Proof

The residual matrix \(N_{j,k}\) of (2.8) w.r.t. \(T_j\) and \(\left[ \begin{array}{cc} Y_k&{}\\ &{}0\end{array}\right] \) is given by

$$\begin{aligned} N_{j,k}:=T_{j}\left[ \begin{array}{cc} Y_k&{}\\ &{} 0\end{array}\right] +\left[ \begin{array}{cc} Y_k&{}\\ &{}0\end{array}\right] T_{j}^*+\left[ \begin{array}{cc}\left[ \begin{array}{cc}\beta \beta ^*&{}\quad 0\\ 0&{}\quad 0\end{array}\right] &{}0\\ 0 &{}0\end{array}\right] \end{aligned}$$

as \(T_j\) is built cumulatively. Since \(Q_j[\begin{array}{ll} Y_k&{} \\ &{} 0 \end{array}] Q_j^*=Q_kY_kQ_k^*\), it follows that \(\Vert N_{j,k}\Vert \le \Vert R^{\text {true}}_k\Vert \). The difference matrix \(\varDelta Y_{j,k}\) satisfies the Lyapunov equation \(T_{j}\varDelta Y_{j,k}+\varDelta Y_{j,k}T_{j}^*=-N_{j,k}\), and since \(\varLambda (T_j)\subset \mathbb {C}_-\) (using \(A+A^*<0\)) it can be expressed via the integral

$$\begin{aligned} \varDelta Y_{j,k}&=\int \limits _{0}^{\infty }\mathop {\mathrm {e}^{T_{j}t}}N_{j,k}\mathop {\mathrm {e}^{T^*_{j}t}}dt. \end{aligned}$$
(2.17)

Moreover, using results from [19], we have

$$\begin{aligned} \Vert \mathop {\mathrm {e}^{T_{j}t}}\Vert \le (1+\sqrt{2})\max \limits _{z\in \mathscr {W}(T_j)}\vert \mathop {\mathrm {e}^{zt}}\vert =(1+\sqrt{2})\mathop {\mathrm {e}^{\max \limits _{z\in \mathscr {W}(T_j)}\mathop {\mathrm {Re}}\,\left( z\right) t}}, \end{aligned}$$

where \(\mathscr {W}(\cdot )\) denotes the field of values. Using the assumption \(A+A^*<0\) it holds that \(\mathscr {W}(A)\subset \mathbb {C}_-\) and, consequently, \(\mathscr {W}(T_{j})\subseteq \mathscr {W}(A),~\forall j>0\). Hence,

$$\begin{aligned} \mathop {\mathrm {e}^{\max \limits _{z\in \mathscr {W}(T_j)}\mathop {\mathrm {Re}}\,\left( z\right) t}} \le \mathop {\mathrm {e}^{-\alpha _A t}}\quad \text {with}\quad \alpha _A:=\frac{1}{2}\vert \lambda _{\max }(A+A^*)\vert . \end{aligned}$$

Finally, from (2.17) we obtain \(\Vert \varDelta Y_{j,k}\Vert \le \displaystyle \int \limits _{0}^{\infty }\Vert \mathop {\mathrm {e}^{T_{j}t}}\Vert ^2dt \Vert N_{j,k}\Vert \le \displaystyle \frac{(1+\sqrt{2})^2\Vert R^{\text {true}}_k\Vert }{2\alpha _A}\). \(\square \)

Theorem 2.1 shows that the difference matrix \(\varDelta Y_{j,k}\) decays at a similar rate as the Lyapunov residual norms.

Remark 2.1

Note that the constant \(c_A\) in Theorem 2.1 is a bound on \(\int \limits _{0}^{\infty }\Vert \mathop {\mathrm {e}^{T_{j}t}}\Vert ^2dt\) and could be much smaller than the given value. In the following considerations we assume that \(c_A\) is small or of moderate size. If \(c_A\) is large (e.g., the field of values is close to the imaginary axis) then the ability to approximate X by a low-rank factorization might be reduced, see, e.g. [3, 25]. Hence, this situation could be difficult for the low-rank solvers, regardless of whether exact or inexact linear solves are used, and it is therefore appropriate to assume that \(c_A\) is small (or of moderate size).

The above theorem can be used to obtain results about the entries of the solution \(Y_j\) of the projected Lyapunov equation (2.8).

Corollary 2.1

Let the assumptions of Theorem 2.1 hold. For the \((\ell ,i)\)th entry of \(Y_j\) we have

$$\begin{aligned} \vert e_{\ell }^*Y_je_i\vert \le c_A\Vert R^{\text {true}}_{k}\Vert ,\quad \ell ,i=1,\ldots ,j, \end{aligned}$$
(2.18)

where \(k< \max (\ell ,i)\) and \(c_A\) is as in Theorem 2.1.

Proof

We have

$$\begin{aligned} e_{\ell }^*Y_je_i=e_{\ell }^*\left[ \begin{array}{ll} Y_k&{}\quad 0\\ 0&{}\quad 0\end{array}\right] e_i+e_{\ell }^*\varDelta Y_{j,k}e_i. \end{aligned}$$
(2.19)

For \(k<\ell \) the first summand vanishes and hence, using Theorem 2.1

$$\begin{aligned} \vert e_{\ell }^*Y_je_i\vert =\vert e_{\ell }^*\varDelta Y_{j,k}e_i\vert \le \frac{(1+\sqrt{2})^2}{2\alpha _A}\Vert R^{\text {true}}_k\Vert ,\quad k=1,\ldots ,\ell -1. \end{aligned}$$
(2.20)

Since \(Y_j=Y_j^{*}\), the indices \(\ell \), i can be interchanged, s.t. \(k< \max (\ell ,i)\), and we end up with the final bound (2.18). \(\square \)

Remark 2.2

Note that the sequence \(\lbrace \Vert R^{\text {true}}_k\Vert \rbrace \) is not necessarily monotonically decreasing and we can further extend the bound in (2.18) and obtain

$$\begin{aligned} \vert e_{\ell }^*Y_je_i\vert \le c_A\min \limits _{p=1,\ldots ,\max (\ell ,i)}\Vert R^{\text {true}}_{p-1}\Vert ,\quad \ell ,i=1,\ldots ,j. \end{aligned}$$

Corollary 2.1 shows that the \((\ell ,i)\)-th entry of \(Y_j\) can be bounded from above using the Lyapunov residual norm at step \(k<\max (\ell ,i)\). This indicates that the further away from the diagonal, the smaller the entries of \(Y_j\) will become in the course of the iteration, provided the true residual norms exhibit a sufficiently decreasing behavior. We can observe this characteristic in Fig. 1 in Sect. 4. The decay of Lyapunov solutions has also been investigated by different approaches. Especially when \(T_j\) is banded, which is, e.g., the case when a Lanczos process is applied to \(A=A^*\), more refined decay bounds for the entries of \(Y_j\) can be established, see, e.g., [18, 35].

In exact and inexact RKSM, the formulas for the residuals (2.9), (2.14) and for the residual gap (2.15) depend not only on \(Y_j\) but rather on \(H_j^{-1}Y_j\). In particular, the rows of \(H_j^{-1}Y_j\) appear in (2.15) and will later be essential for defining relaxation strategies. Hence, we shall investigate if the norms \(\Vert e_k^*H_j^{-1}Y_j\Vert \) also exhibit a decay for \(1\le k\le j\). For the last row, i.e. \(k=j\), using (2.10) readily reveals

$$\begin{aligned} \Vert e_j^*H_j^{-1}Y_j\Vert = \frac{\Vert R^{\text {comp.}}_j\Vert }{\vert h_{j+1,j}\vert \Vert g_j\Vert },\quad \text {assuming}\quad g_j\ne 0,\,h_{j+1,j}\ne 0. \end{aligned}$$

In the spirit of Theorem 2.1, Corollary 2.1, we would like to bound the \(\ell \)th row of \(H_j^{-1}Y_j\) by the \((\ell -1)\)th computed Lyapunov residual norm \(\Vert R^{\text {comp.}}_{\ell -1}\Vert \). For this we require the following lemma, motivated by similar results in [53], which shows that the first k (\(1< k\le j\)) entries of \(e_k^*H_j^{-1}\) are, up to scaling, essentially determined by the left null space of \(\underline{H}_j:=H_{j+1,j}\) and by \(e_{k-1}^*H_{k-1}^{-1}\).

Lemma 2.2

Let \(\underline{H}_j=\left[ \begin{array}{l} H_j\\ h_{j+1,j}e_j^*\end{array}\right] \in \mathbb {C}^{(j+1)\times j}\) with \(H_j\) an unreduced upper Hessenberg matrix, with rank\((\underline{H}_j)=j\), \(H_k:=H_{1:k,1:k}\) nonsingular \(\forall 1\le k\le j\), and let \(\omega \in \mathbb {C}^{1\times (j+1)}\), \(\omega \ne 0\) satisfy \(\omega \underline{H}_j=0\). Define the vectors \(f^{(k)}_m:=e_k^*H_m^{-1}\), \(1\le k\le m\le j\). Then, for \(1\le k\le j\), we have

$$\begin{aligned} f^{(j)}_j&=-\frac{\omega _{1:j}}{\omega _{j+1}h_{j+1,j}},\quad f^{(k)}_j=\frac{v^{(k)}_j}{\phi _{j}^{(k)}},\quad \text {where} \end{aligned}$$
(2.21a)
$$\begin{aligned} v^{(k)}_j&:=\omega _{1:j}+[0_{1,k},[0_{1,j-k-1},h_{j+1,j}\omega _{j+1}]H_{k+1:j,k+1:j}^{-1}],\quad \phi _{j}^{(k)}:=v^{(k)}_jH_{j}e_k. \end{aligned}$$
(2.21b)

Moreover, for \(k>1\), the first k entries of \(v^{(k)}_j\) can be expressed by

$$\begin{aligned} (v^{(k)}_j)_{1:k}&=[-h_{k,k-1}f^{(k-1)}_{k-1},1]\omega _k. \end{aligned}$$
(2.21c)

Proof

See “Appendix A”. \(\square \)

Corollary 2.2

Let the assumptions of Theorem 2.1, Corollary 2.1, and Lemma 2.2 hold and consider the RKSM as a special Galerkin projection method. Then

$$\begin{aligned} \Vert e_{\ell }^*H_j^{-1}Y_j\Vert \le {\left\{ \begin{array}{ll} c_A\Vert e_{1}^*H_j^{-1}\Vert \Vert R^{\text {true}}_{0}\Vert ,&{}\ell =1,\\ \frac{1}{\phi _{j}^{(\ell )}\Vert g_{\ell -1}\Vert }\Vert R^{\text {comp.}}_{\ell -1}\Vert +c_A\Vert e_{\ell }^*H_j^{-1}\Vert \Vert R^{\text {true}}_{\ell -1}\Vert ,&{} \ell =2,\ldots ,j \end{array}\right. } \end{aligned}$$
(2.22)

with \(g_{\ell }\) from (2.5) and \(\phi _{j}^{(\ell )}\) from Lemma 2.2.

Proof

Using Lemma 2.2 for \(f_j^{(\ell )}:=e_{\ell }^*H_j^{-1}\), \(1<\ell \le j\), and the structure (2.9b) of the computed Lyapunov residuals yields

$$\begin{aligned} e_{\ell }^*H_j^{-1}Y_j&=f_j^{(\ell )}\left( \left[ \begin{array}{ll} Y_{\ell -1}&{}\quad 0\\ 0&{}\quad 0\end{array}\right] +\varDelta Y_{j,\ell -1}\right) \\&=[-h_{\ell ,\ell -1}f_{\ell -1}^{(\ell -1)}\omega _{\ell }]/\phi _{j}^{(\ell )}Y_{\ell -1}+f_j^{(\ell )}\varDelta Y_{j,\ell -1}\\&=-\frac{g_{\ell -1}^*L_{\ell -1}}{\Vert g_{\ell -1}\Vert ^2\phi _{j}^{(\ell )}}\omega _{\ell }+f_j^{(\ell )}\varDelta Y_{j,\ell -1}. \end{aligned}$$

Taking norms, using that \(\omega \) is a unit norm vector, noticing that for \(\ell =1\) only the second term exists, and applying Theorem 2.1 gives the result. \(\square \)

Corollary 2.2 shows that, similar to the entries of \(Y_j\), the rows of \(H_j^{-1}Y_j\) can be bounded using the previous Lyapunov residual norm. However, due to the influence of \(H_j^{-1}\), the occurring constants in front of the Lyapunov residual norms can potentially be very large.

2.4 Relaxation strategies and stopping criteria for the inner iteration

In order to achieve accurate results, the difference between the true and computed residual, the residual gap, needs to be small.

Corollary 2.2 indicates that \( \Vert e_k^* H_j^{-1}Y_j\Vert \) decreases with the computed Lyapunov residual, and hence \(\Vert s_k\Vert \) may be relaxed during the RKSM iteration in a manner inversely proportional to the norm of the Lyapunov residual (assuming the Lyapunov residual norm decreases).

Theorem 2.2

(Theoretical relaxation strategy in RKSM) Let the assumptions of Theorem 2.1 and Corollary 2.2 hold. Assume we carry out j steps of Algorithm 1 always using the explicit projection \(T^{\text {expl.}}_j\). If we choose the solve tolerances \(\Vert s_k\Vert \), \(1\le k\le j\) for the inexact solves within RKSM such that

$$\begin{aligned} \Vert s_k\Vert \le \tau _k^{\text {LS}}={\left\{ \begin{array}{ll} \tfrac{\varepsilon }{jc_A\Vert e_{1}^*H_j^{-1}\Vert \Vert R^{\text {true}}_{0}\Vert },&{}k=1,\\ \tfrac{\varepsilon }{\frac{j}{\phi _{j}^{(k)}\Vert g_{k-1}\Vert }\Vert R^{\text {comp.}}_{k-1}\Vert +jc_A\Vert e_{k}^*H_j^{-1}\Vert \Vert R^{\text {true}}_{k-1}\Vert },&{}1<k\le j,\\ \end{array}\right. } \end{aligned}$$
(2.23)

with the same notation as before, then the residual gap satisfies \(\Vert \eta _j\Vert \le \varepsilon \).

Proof

Consider a single addend in the sum expression (2.15) for the norm of the residual gap

$$\begin{aligned} \Vert s_k\Vert \Vert e_k^* H_j^{-1}Y_j\Vert \le \Vert s_k\Vert \left( \frac{1}{\phi _{j}^{(k)}\Vert g_{k-1}\Vert }\Vert R^{\text {comp.}}_{k-1}\Vert +c_A\Vert e_{k}^*H_j^{-1}\Vert \Vert R^{\text {true}}_{k-1} \Vert \right) , k>1. \end{aligned}$$

Choosing \(s_k\) such that (2.23) is satisfied for \(1\le k\le j\) then gives \(\Vert \eta _j\Vert \le \sum \nolimits _{k=1}^j\frac{\varepsilon }{j}=\varepsilon \) where we have used (2.15), Theorem 2.1, Corollaries 2.1 and 2.2. \(\square \)

The true norms \(\Vert R^{\text {true}}_{k-1}\Vert \) can be estimated by \(\Vert R^{\text {true}}_{k-1}\Vert \le \Vert R^{\text {comp.}}_{k-1}\Vert +\Vert \eta _{k-1}\Vert \). For this, we might either use an estimate of \(\Vert \eta _{k-1}\Vert \) or simply assume that all previous residual gaps were sufficiently small, i.e., \(\Vert \eta _{k-1}\Vert \le \varepsilon \).

Practical relaxation strategies for inexact RKSM The relaxation strategy in Theorem 2.2 is far from practical. First, the established bounds for the entries and rows of \(Y_j\) and \(H_j^{-1}Y_j\), respectively, can be a vast overestimation of the true norms. Hence, the potentially large denominators in (2.23) result in very small solve tolerances \(\tau _k^{\text {LS}}\) and, therefore, prevent a substantial relaxation of the inner solve accuracies. Second, several quantities in the used bounds are unknown at step \(k<j\), e.g. \(H_j^{-1}\), \(Y_j\), and the constants \(\phi _{j}^{(k)}\). If we knew \(H_j^{-1}\), \(Y_j\), we could employ a relaxation strategy of the form \(\Vert s_k\Vert \le \frac{\varepsilon }{j \Vert e_k^* H_j^{-1}Y_j\Vert }\) and only use Corollary 2.2 as theoretical indication that \(\Vert e_k^* H_j^{-1}Y_j\Vert \) decreases as the outer method converges.

In the following we therefore aim to develop a more practical relaxation strategy by trying to estimate the relevant quantity \(\Vert e_k^* H_j^{-1}Y_j\Vert \) differently using the most recent available data. Suppose an approximate solution with residual norm \(\Vert R^{\text {true}}_{k}\Vert \le \varepsilon \), \(0<\varepsilon \ll 1\) is desired which should be found after at most \(j_{\max }\) rational Arnoldi steps. This goal is achieved if \(\Vert \eta _{j_{\max }}\Vert \le \tfrac{\varepsilon }{2}\) and if \(\Vert R^{\text {comp.}}_{j_{\max }}\Vert \le \tfrac{\varepsilon }{2}\) is obtained by the inexact RKSM.

Consider the left null vectors of the augmented Hessenberg matrices, \(\omega _{k}\underline{H}_{k}=0\), \(k\le j_{\max }\). It is easy to show that \(\omega _{k}\) can be updated recursively, in particular it is possible to compute \(\omega _{m}=(\omega _{k})_{1:m+1}\), \(m\le k\le j_{\max }\). Consequently, at the beginning of step k we already have \(\underline{H}_{k-1}\) and hence, also the first k entries of \(\omega _{j_{\max }}\) without knowing the full matrix \(\underline{H}_{j_{\max }}\). Using Lemma 2.2 and proceeding similar as in the proof of Corollary 2.2 results in

$$\begin{aligned} \Vert e_k^* H_{j_{\max }}^{-1}Y_{j_{\max }}\Vert&=\left\| e_k^* H_{j_{\max }}^{-1}\left( \left[ \begin{array}{ll} Y_{k-1}&{}\quad 0\\ 0&{}\quad 0\end{array}\right] +\varDelta Y_{{j_{\max }},k-1}\right) \right\| \\&=\left\| -\frac{\omega _{j_{\max }}(k)}{\phi _{j_{\max }}^{(k)}}[h_{k,k-1}e_{k-1}^*H_{k-1}^{-1},*]\left[ \begin{array}{ll} Y_{k-1}&{}\quad 0\\ 0&{}\quad 0\end{array}\right] +e_k^* H_{j_{\max }}^{-1}\varDelta Y_{{j_{\max }},k-1}\right\| \\&=\left\| -\frac{h_{k,k-1}\omega _k(k)}{\phi _{j_{\max }}^{(k)}}e_{k-1}^*H_{k-1}^{-1}Y_{k-1}+e_k^* H_{j_{\max }}^{-1}\varDelta Y_{{j_{\max }},k-1}\right\| \\&\le \left| \frac{h_{k,k-1}\omega _k(k)}{\phi _{j_{\max }}^{(k)}}\right| \Vert e_{k-1}^*H_{k-1}^{-1}Y_{k-1}\Vert +\Vert e_k^*H_{j_{\max }}^{-1}\Vert c_A\Vert R^{\text {true}}_{k-1}\Vert \\&\approx \left| \frac{h_{k,k-1}\omega _k(k)}{\phi _{j_{\max }}^{(k)}}\right| \Vert e_{k-1}^*H_{k-1}^{-1}Y_{k-1}\Vert , \end{aligned}$$

if \(\Vert e_k^*H_{j_{\max }}^{-1}\Vert c_A\Vert R^{\text {true}}_{k-1}\Vert \) is small. Only the scaling parameter \(\phi _{j_{\max }}^{(k)}\) contains missing data at the beginning of step k, because \(H_{k-1},~Y_{k-1}\), \(\omega _k\) are known from the previous step. We suggest omitting the unknown data and using the following practical relaxation strategy

$$\begin{aligned} \Vert s_k\Vert \le \tau _k^{\text {LS}}= {\left\{ \begin{array}{ll} \delta \frac{\varepsilon }{j_{\max }},&{}k=1,\\ \delta \frac{\varepsilon }{j_{\max }\Vert h_{k,k-1}e_{k-1}^*H_{k-1}^{-1}Y_{k-1}\Vert },&{}k>1, \end{array}\right. } \end{aligned}$$
(2.24a)

where \(0<\delta \le 1\) is a safeguard constant to account for the estimation error resulting from approximating \(\Vert e_k^* H_{j_{\max }}^{-1}Y_{j_{\max }}\Vert \) and from omitting unknown quantities (e.g., \(\phi _{j_{\max }}^{(k)}\)).

Remark 2.3

The parameter \(\delta \) is a safeguard constant. Choosing \(\delta \) too large may lead to divergence of the inexact method, while choosing it too small could lead to unnecessary extra work. Investigating the choice of this constant is future work; however, we remark that similar constants have been used in other works on inexact iterative solves, e.g. [46].

By following Algorithm 1 closely, the reader might notice that the subspace built up at the beginning of iteration step \(k\le j_{\max }\) is already k-dimensional and, since we are using the explicit projection \(T^{\text {expl.}}_k\) to define the Galerkin systems, \(Y_k\) can already be computed directly after building \(Q_k\). This amounts to a simple rearrangement of Algorithm 1 by moving Lines 7–8 before Line 3. Hence, the slight variation

$$\begin{aligned} \Vert e_k^* H_{j_{\max }}^{-1}Y_{j_{\max }}\Vert \approx \left| \frac{\omega _{k}(k)}{\phi _{j_{\max }}^{(k)}}\right| \Vert [-h_{k,k-1}e_{k-1}^*H_{k-1}^{-1},1]Y_k\Vert \end{aligned}$$

of the above estimate is obtained, which suggests the use of the (slightly different) practical relaxation strategy

$$\begin{aligned} \Vert s_k\Vert \le \tau _k^{\text {LS}}= {\left\{ \begin{array}{ll} \delta \frac{\varepsilon }{j_{\max }},&{}k=1,\\ \delta \frac{\varepsilon }{j_{\max }\Vert [-h_{k,k-1}e_{k-1}^*H_{k-1}^{-1},1]Y_k\Vert },&{}k>1. \end{array}\right. } \end{aligned}$$
(2.24b)

For both dynamic stopping criteria, in order to avoid both overly inaccurate and unnecessarily accurate linear solves, it is reasonable to enforce \(\tau _k^{\text {LS}}\in [\tau ^{\text {LS}}_{\min },\tau ^{\text {LS}}_{\max }]\), where \(0<\tau ^{\text {LS}}_{\min }<\tau ^{\text {LS}}_{\max }\le 1\) denote minimal and maximal linear solve thresholds.
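
The criterion (2.24a), including the clamping to \([\tau ^{\text {LS}}_{\min },\tau ^{\text {LS}}_{\max }]\), translates into a few lines of code; a minimal sketch for \(r=1\) (the default values of \(\delta \), \(\tau ^{\text {LS}}_{\min }\), \(\tau ^{\text {LS}}_{\max }\) are illustrative choices, not recommendations from the paper):

```python
import numpy as np

def rksm_inner_tolerance(k, eps, j_max, H_prev, Y_prev, h_sub,
                         delta=0.1, tau_min=1e-12, tau_max=1e-1):
    """Solve tolerance tau_k^LS following (2.24a) for r = 1, clamped to
    [tau_min, tau_max].  H_prev = H_{k-1} and Y_prev = Y_{k-1} stem from the
    previous step, h_sub = h_{k,k-1}; delta is the safeguard constant of Remark 2.3."""
    if k == 1:
        tau = delta * eps / j_max
    else:
        last_row = np.linalg.solve(H_prev, Y_prev)[-1, :]   # e_{k-1}^* H_{k-1}^{-1} Y_{k-1}
        tau = delta * eps / (j_max * abs(h_sub) * np.linalg.norm(last_row))
    return min(max(tau, tau_min), tau_max)
```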

The numerical examples in Sect. 4 show that these practical relaxation strategies are effective and can reduce the number of inner iterations for RKSM by up to 50%.

2.5 Implementation issues and generalizations

This section contains several remarks on the implementation of the inexact RKSM, in particular the case when the right hand side of the Lyapunov equation has rank greater than one, as well as considerations of the inner iterative solver and preconditioning. We also briefly comment on extensions to generalized Lyapunov equations and algebraic Riccati equations.

The case \(r>1\) The previous analysis was restricted to the case \(r=1\) but the block situation, \(r>1\), can be handled similarly by using appropriate block versions of the relevant quantities, e.g., \(q_k,~w_k,~s_k\in \mathbb {C}^{n\times r}\), \(h_{i,k}\in \mathbb {C}^{r\times r}\), \(\omega \in \mathbb {C}^{r\times (j+1)r}\), and \(e_k\) by \(e_k\otimes I_r\), as well as replacing absolute values by spectral norms in the right places. When solving the linear system with r right hand sides \(q_k\), such that \(\Vert s_k\Vert \le \tau _k^{\text {LS}}\) is achieved, one can either use block variants of the iterative methods (see, e.g., [49]), or simply apply a single vector method and sequentially consider every column \(q_k(:,\ell )\), \(\ell =1,\ldots ,r\), demanding that \(\Vert s_k(:,\ell )\Vert \le \tau _k^{\text {LS}}/r\).

Choice of inner solver One purpose of low-rank solvers for large matrix equations is to work in a memory efficient way. Using a long-recurrence method such as GMRES to solve unsymmetric inner linear systems defies this purpose in some sense because it requires storing the full Krylov basis. Unless a very good preconditioner is available, this can lead to significant additional storage requirements within the inexact low-rank method, where the Krylov method consumes more memory than the actual low-rank Lyapunov solution factor of interest. Therefore we exclusively used short-recurrence Krylov methods (e.g., BiCGstab) for the numerical examples defined by an unsymmetric matrix A.

Preconditioning The preceding relaxation strategies relate to the residuals \(s_k\) of the underlying linear systems. For enhancing the performance of the Krylov subspace methods, using preconditioners is typically inevitable. Then the inner iteration itself inherently only works with the preconditioned residuals \(s^{\text {Prec.}}_k\) which, if left or two-sided preconditioning is used, are different from the residuals \(s_k\) of the original linear systems. Since \(\Vert s^{\text {Prec.}}_k\Vert \le \tau _k^{\text {LS}}\) does not imply \(\Vert s_k\Vert \le \tau _k^{\text {LS}}\), this can result in linear systems that are not solved accurately enough to ensure sufficiently small Lyapunov residual gaps. Hence, some additional care is needed to respond to these effects from preconditioning. The obvious approach is to use right preconditioning, which gives \(\Vert s^{\text {Prec.}}_k\Vert =\Vert s_k\Vert \).
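
A minimal sketch of such a right-preconditioned shifted solve (SciPy, keyword names as of SciPy \(\ge \) 1.12; the matrix, the ILU preconditioner, and all tolerances are illustrative): the GMRES stopping test then acts on the unpreconditioned residual \(s_k\), as required by the relaxation criteria.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# One shifted solve (A - sigma I) y = z with *right* preconditioning, so that the
# stopping test acts on the unpreconditioned residual s = z - (A - sigma I) y.
rng = np.random.default_rng(4)
n, sigma, tau = 400, 1.5, 1e-6
A = (sp.diags(-2.0 - rng.random(n)) + 0.1 * sp.random(n, n, density=5 / n,
                                                      random_state=4)).tocsc()
z = rng.standard_normal(n)

shifted = (A - sigma * sp.identity(n)).tocsc()
P = spla.spilu(shifted, drop_tol=1e-3)                      # ILU preconditioner (illustrative)

# Solve (A - sigma I) P^{-1} u = z by GMRES, then recover y = P^{-1} u.
op = spla.LinearOperator((n, n), matvec=lambda v: shifted @ P.solve(v))
u, info = spla.gmres(op, z, rtol=tau, atol=0.0)
y = P.solve(u)
print(np.linalg.norm(z - shifted @ y) / np.linalg.norm(z))  # true relative residual <= tau
```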

Complex shifts In practice A, B are usually real but some of the shift parameters can occur in complex conjugate pairs. Then it is advised to reduce the amount of complex operations by working with a real rational Arnoldi method [37] that constructs a real rational Arnoldi decomposition and slightly modified formulae for the computed Lyapunov residuals, in particular for \(F_j\). The actual derivations are tedious and are omitted here for the sake of brevity, but our implementation for the numerical experiments works exclusively with the real Arnoldi process.

Generalized Lyapunov equations In many practically relevant situations, generalized Lyapunov equations of the form

$$\begin{aligned} AXM^*+MXA^*=-BB^*, \end{aligned}$$
(2.25)

with an additional, nonsingular matrix \(M\in \mathbb {R}^{n\times n}\) have to be solved. Projection methods tackle (2.25) by implicitly running on equivalent Lyapunov equations defined by \(A_M:=L_M^{-1}AU_M^{-1}\), \(B_M:=L_M^{-1}B\) using a factorization \(M=L_MU_M\), which could be a LU-factorization or, if M is positive definite, a Cholesky factorization (\(U_M=L_M^*\)). Other possibilities are \(L_M=M,~U_M=I\) and \(L_M=I,~U_M=M\). Basis vectors for the projection subspace are obtained by

$$\begin{aligned} Q_1\beta&=L_M^{-1}B,\quad (A-\xi _{j+1}M)\hat{w}_{j+1}=L_Mq_{j},\quad w_{j+1}=U_M\hat{w}_{j+1}. \end{aligned}$$
(2.26)

After convergence, the low-rank approximate solution of the original problem (2.25) is given by \(X\approx (U_M^{-1}Q_j)Y_j(U_M^{-1}Q_j)^*\), where \(Y_j\) solves the reduced Lyapunov equation defined by the restrictions of \(A_M\) and \(B_M\). This requires solving extra linear systems defined by (factors of) M in certain stages of Algorithm 1: setting up the first basis vector, building the restriction \(T_j\) of \(A_M\) (either explicitly or implicitly using (2.7)), and recovering the approximate solution after termination. Since the coefficients of these linear systems do not change throughout the iteration, often sparse factorizations of M are computed once at the beginning and reused every time they are needed. In this case the above analysis can be applied with minor changes of the form

$$\begin{aligned} s_j:=L_Mq_j-(A-\xi _{j+1} M)\hat{\tilde{w}}_{j+1},\quad \Vert s_j\Vert \le \tau _j^{\mathrm{LS}},\quad w_{j+1}=U_M\hat{\tilde{w}}_{j+1}, \end{aligned}$$

for the inexact linear solves. We obtain an inexact rational Arnoldi decomposition with respect to \(A_M\) of the form

$$\begin{aligned} A_MQ_j&=Q_j\hat{T}^{\text {expl.}}_j+\hat{g}_je_j^*h_{j+1,j}H_j^{-1}-(I-Q_jQ_j^*)L_M^{-1}S_jH_j^{-1},\\ \text {with}\quad \hat{T}^{\text {expl.}}_j&=Q_j^*A_MQ_j,\quad \hat{g}_j=q_{j+1}\xi _{j+1}-(I-Q_jQ_j^*)A_M q_{j+1}. \end{aligned}$$

If \(Q_jY_jQ_j^*\) is an approximate solution of the equivalent Lyapunov equation defined by \(A_M,~B_M\), and \(Y_j\) solves the reduced Lyapunov equation defined by \(\hat{T}^{\text {expl.}}_j\) and \(Q_j^*B_M\), then the associated residual is

$$\begin{aligned}&A_MQ_jY_jQ_j^*+Q_jY_jQ_j^*A_M^*+B_MB_M^*=\hat{F}_j-\varDelta \hat{F}_j+(\hat{F}_j-\varDelta \hat{F}_j)^*,\\&\hat{F}_j :=\hat{g}_jh_{j+1,j}e_j^*H_j^{-1}Y_jQ_j^*,\quad \varDelta \hat{F}_j :=(I-Q_jQ_j^*)L_M^{-1}S_jH_j^{-1}Y_jQ_j^*. \end{aligned}$$

Hence, the generalized residual gap is \(\hat{\eta }_j=\varDelta \hat{F}_j\). If \(L_M=I\), \(U_M=M\), bounding \(\Vert \hat{\eta }_j\Vert \) works in the same way as in the case \(M=I\); otherwise the established bounds have to be multiplied by an additional constant \(1/\sigma _{\min }(L_M)\) (or an estimate thereof). Allowing inexact solves of the linear systems defined by (factors of) M substantially complicates the analysis. In particular, the transformation to a standard Lyapunov equation is then essentially not carried out exactly since only a perturbed version of \(A_M\) and its restriction are available. This situation is, hence, similar to the case when no exact matrix vector products with A are available. If \(L_M\ne I\), then \(B_M\) is also not available exactly, leading to further perturbations in the basis generation. For these reasons, we will not further pursue inexact solves with M or its factors. This is also motivated by practical situations, where solving with M, or computing a sparse factorization thereof, is usually much less demanding than factorizing \(A-\xi _{j+1} M\).

Extended Krylov subspace methods A special case of the rational Krylov subspace appears when only the shifts zero and infinity are used, leading to the extended Krylov subspace \(\mathscr {E}\mathscr {K}_k(A_M,B_M)=\mathscr {K}_k(A_M,B_M)\cup \mathscr {K}_k(A^{-1}_M,A^{-1}_MB_M)\) (using the notation from the previous subsection). Usually, in the resulting extended Arnoldi process the basis is expanded by vectors from \(\mathscr {K}_k(A_M,B_M)\) and \(\mathscr {K}_k(A^{-1}_M,A^{-1}_MB_M)\) in an alternating fashion, starting with \(\mathscr {K}_k(A_M,B_M)\). The extended Krylov subspace method (EKSM) [43] for (1.1) and (2.25) uses a Galerkin projection onto \(\mathscr {E}\mathscr {K}_k(A_M,B_M)\). In each step the basis is orthogonally expanded by \(w_{j+1}=[A_Mq_j(:,1:r),A_M^{-1}q_j(:,r+1:2r)]\), where \(q_j\) contains the last 2r basis vectors. This translates to the following linear systems and matrix vector products

$$\begin{aligned} U_Mz&=q_j(:,1:r),\quad L_Mw_{j+1}(:,1:r)=Az,\\ \quad \text {and}\quad Az&=L_Mq_j(:,r+1:2r),\quad w_{j+1}(:,r+1:2r)=U_Mz, \end{aligned}$$

that have to be dealt with. Similar formulas for the implicit restriction of A and the Lyapunov residual as in RKSM can be found in [43]. Since these coefficient matrices do not change during the iteration, a very efficient strategy is to compute, if possible, sparse factorizations of A and M once before the algorithm and reuse them in every step. Incorporating inexact linear solves by using inexact sparse factorizations \(A\approx \tilde{L}_A\tilde{U}_A\), \(M\approx \tilde{L}_M\tilde{U}_M\) would make it difficult to incorporate relaxed solve tolerances, since there is little reason to compute a less accurate factorization once a highly accurate one has been constructed. For the same reasons stated in the paragraph above, we restrict ourselves to the iterative solution of the linear systems defined by A. These linear systems affect only the columns in \(w_{j+1}(:,r+1:2r)\). In particular, by proceeding as for inexact RKSM, one can show that

$$\begin{aligned} A_MQ_j&=Q_{j+1}T_{j+1,j}^{\text {expl.}}-(I-Q_jQ_j^*)S_j^{\text {EK}}\quad \text {with}\\ T_{j+1,j}^{\text {expl.}}&=Q_{j+1}^*A_MQ_j,\quad S_j^{\text {EK}}:=[s_1^{\text {EK}},\ldots ,s_j^{\text {EK}}],\quad s_i^{\text {EK}}:=[0,U_M^{-1}s_i]\in \mathbb {C}^{n\times 2r}, \end{aligned}$$

where \(s_i:=L_Mq_i(:,r+1:2r)-Az\), \(1\le i\le j\). Note that, allowing inexact solves w.r.t. M or even inexact matrix vector products with A,  M, would destroy the zero block columns in \(s_i^{\text {EK}}\).

Algebraic Riccati equations The RKSM in Algorithm 1 can be generalized to compute low-rank approximate solutions of generalized algebraic Riccati equations (AREs)

$$\begin{aligned} AXM^*+MXA^*-MXCC^*XM^*+BB^*=0,\quad C\in \mathbb {C}^{n\times p},\quad p\ll n, \end{aligned}$$
(2.27)

see, e.g., [44, 48]. The majority of steps in Algorithm 1 remain unchanged; the main difference is that the Galerkin system is now a reduced ARE

$$\begin{aligned} T_jY_j+Y_jT_j^*-Y_j(Q_j^*C_M)(C_M^*Q_j)Y_j+Q_j^*B_MB_M^*Q_j=0,\quad C_M:=U_M^{-*}C, \end{aligned}$$

which has to be solved. Since we do not alter the underlying rational Arnoldi process, most of the properties of the RKSM hold again, especially the residual formulas in both the exact and inexact case, and a residual gap is defined again as in the Lyapunov case. Differences occur in the bounds for the entries of \(Y_j\) and rows of \(H_j^{-1}Y_j\), since Theorem 2.1 cannot be formulated in the same way. Under some additional assumptions, a bound of the form \(\Vert \varDelta Y_{j,j-1}\Vert \le c_{\text {ARE}}\Vert R_{j-1}\Vert \) can be established [44], where the constant \(c_{\text {ARE}}\) is different from \(c_A\). We leave concrete generalizations of Theorem 2.1, Corollaries 2.1–2.2 for future research and only show in some experiments that relaxation strategies of the form (2.24a), (2.24b) also work for the inexact RKSM for AREs.
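
As described above, the Galerkin system in the ARE case is a reduced ARE; a minimal helper (our own mapping onto SciPy's dense CARE solver, with illustrative test data and \(M=I\)) could read:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def solve_projected_are(T, Bt, Ct):
    """Solve the reduced ARE  T Y + Y T^* - Y Ct Ct^* Y + Bt Bt^* = 0 with SciPy's
    dense CARE solver (convention X A + A^H X - X B R^{-1} B^H X + Q = 0),
    via A = T^*, B = Ct, R = I, Q = Bt Bt^*."""
    return solve_continuous_are(T.conj().T, Ct, Bt @ Bt.conj().T, np.eye(Ct.shape[1]))

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    k = 8
    T = 0.1 * rng.standard_normal((k, k)) - np.eye(k)      # stable restriction
    Bt, Ct = rng.standard_normal((k, 1)), rng.standard_normal((k, 1))
    Y = solve_projected_are(T, Bt, Ct)
    print(np.linalg.norm(T @ Y + Y @ T.T - Y @ Ct @ Ct.T @ Y + Bt @ Bt.T))  # ~ 0
```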

3 The inexact low-rank ADI iteration

3.1 Derivation, basic properties, and introduction of inexact solves

Using the Cayley transformation \(C(A,\alpha ):=(A+\alpha I)^{-1}(A-\overline{\alpha } I)\), for \(\alpha \in \mathbb {C}_-\), (1.1) can be reformulated as a discrete-time Lyapunov equation (symmetric Stein equation)

$$\begin{aligned} X=C(A,\alpha )XC(A,\alpha )^*-2\mathop {\mathrm {Re}}\,\left( \alpha \right) B(\alpha )B(\alpha )^*,~B(\alpha ):=(A+\alpha I)^{-1}B. \end{aligned}$$

For suitably chosen \(\alpha _j\) this motivates the iteration

$$\begin{aligned} \begin{aligned} X_{j}&=C(A,\alpha _j)X_{j-1}C(A,\alpha _j)^*-2\mathop {\mathrm {Re}}\,\left( \alpha _j\right) B(\alpha _j)B(\alpha _j)^*, \end{aligned} \end{aligned}$$
(3.1)

which is known as alternating directions implicit (ADI) iteration [54] for Lyapunov equations. It converges for shift parameters \(\alpha _j\in \mathbb {C}_-\) because \(\rho (C(A,\alpha _j))<1\) and it holds [10, 33, 40, 54]

$$\begin{aligned} X_j-X&=\mathfrak {C}_j(X_0-X)\mathfrak {C}_j^*, \end{aligned}$$
(3.2)
$$\begin{aligned} R_j&=A X_j+X_jA^*+BB^*=\mathfrak {C}_jR_0\mathfrak {C}_j^*, \end{aligned}$$
(3.3)

where \(\mathfrak {C}_j:=\prod \nolimits _{i=1}^{j}C(A,\alpha _i)\).

A low-rank version of the ADI iteration is obtained by setting \(X_0=0\) in (3.1), exploiting that \((A+\alpha _j I)^{-1}\) and \((A-\overline{\alpha _i} I)\) commute for \(i,j\ge 1\), and realising that the iterates are given in low-rank factored form \(X_j=Z_jZ_j^*\) with low-rank factors \(Z_j\) constructed by

$$\begin{aligned} \begin{aligned} Z_j&=\left[ \gamma _1v_1,~\gamma _2v_2,~\ldots ,\gamma _jv_j\right] =\left[ Z_{j-1},\gamma _jv_j\right] ,\quad \gamma _j:=\sqrt{-2\mathop {\mathrm {Re}}\,\left( \alpha _j\right) },\\ v_j&=(A-\overline{\alpha _{j-1}}I)(A+\alpha _jI)^{-1}v_{j-1},~j\ge 2,\quad v_1:=(A+\alpha _1 I)^{-1}B, \end{aligned} \end{aligned}$$
(3.4)

see [33, 39] for more detailed derivations. Thus, in each step \(Z_{j-1}\) is augmented by r new columns \(v_j\). Moreover, from (3.3) with \(X_0=0\) and (3.4) it is evident that

$$\begin{aligned} R_j=w_jw_j^*,\quad w_j:=\mathfrak {C}_jB=w_{j-1}+\gamma ^2_j(A+\alpha _jI)^{-1}w_{j-1},\quad w_0:=B, \end{aligned}$$
(3.5)

see also [10, 55]. Hence, the residual matrix has at most rank r and its norm can be cheaply computed as \(\Vert R_j\Vert _2=\Vert w_j^*w_j\Vert _2\) which coined the name residual factors for the \(w_j\). The low-rank ADI (LR-ADI) iteration using these residual factors is

$$\begin{aligned} v_j =(A+\alpha _j I)^{-1}w_{j-1},\quad w_j=w_{j-1}+\gamma _j^2v_j,\quad w_0:=B. \end{aligned}$$
(3.6a)

For generalized Lyapunov equations (2.25), this iteration can be formally applied to an equivalent Lyapunov equation defined, e.g., by \(M^{-1}A\), \(M^{-1}B\). Basic manipulations (see, e.g. [10, 30]) resolving the inverses of M yield the generalized LR-ADI iteration

$$\begin{aligned} v_j =(A+\alpha _j M)^{-1}w_{j-1},\quad w_j=w_{j-1}+\gamma _j^2Mv_j,\quad w_0:=B. \end{aligned}$$
(3.6b)

As for RKSM, the choice of shift parameters \(\alpha _j\) is crucial for a rapid convergence. Many approaches have been developed for this problem, see, e.g., [40, 52, 54], where shifts are typically obtained from (approximate) eigenvalues of the pair \((A,M)\). Among those exist asymptotically optimal shifts for certain situations, e.g., \(A=A^*\), \(M=M^*\) and \(\varLambda (A,M)\subset {\mathbb {R}}_-\). Adaptive shift generation approaches, where shifts are computed during the running iteration, were proposed in [12, 30] and often yield better results, especially for nonsymmetric coefficients with complex spectra. In this work we mainly use these dynamic shift selection techniques. The main computational effort in each step is obviously the solution of the shifted linear system with \((A+\alpha _jM)\) and r right hand sides for \(v_j\) in (3.6). Allowing inexact linear solves but keeping the other steps in (3.6) unchanged results in the inexact low-rank ADI iteration illustrated in Algorithm 2. We point out that a different notion of an inexact ADI iteration can be found in [34] in the context of operator Lyapunov equations, where inexactness refers to the approximation of infinite dimensional operators by finite dimensional ones.

Algorithm 2 (Inexact LR-ADI iteration)

Of course, allowing \(s_j\ne 0\) will violate some of the properties that were used to derive the LR-ADI iteration. Inexact solves within the closely related Smith methods have been investigated in, e.g., [40, 50], from the viewpoint of an inexact nonstationary iteration (3.1), which led to rather conservative bounds on the allowed norm of the linear system residual \(s_j\). The analysis we present here follows a different path by exploiting the well-known connection of the LR-ADI iteration to rational Krylov subspaces [20, 32, 33].

Theorem 3.1

[32, 33, 55] The low-rank solution factors \(Z_j\) after j steps of the exact LR-ADI iteration (\(\Vert s_j\Vert =0,~\forall j\ge 1\)) span a (block) rational Krylov subspace:

$$\begin{aligned} \mathop {\mathrm {range}}\left( Z_j \right) \subseteq \mathop {\mathrm {range}}\left( \left[ (A+\alpha _1 M)^{-1}B,\ldots ,\left[ \prod \limits _{i=2}^j(A+\alpha _iM)^{-1}\right] (A+\alpha _1 M)^{-1}B\right] \right) . \end{aligned}$$
(3.7)

Although for LR-ADI the \(Z_j\) do not have orthonormal columns and there is no rational Arnoldi process in Algorithm 2, it is still possible to find decompositions similar to (2.5) and (2.11).

Theorem 3.2

The low-rank solution factors \(Z_j\) after j steps of the inexact LR-ADI iteration (Algorithm 2) satisfy a rational Arnoldi-like decomposition

$$\begin{aligned} AZ_j&=MZ_jT_j+w_jg_j^*-S_j\varGamma _j,\quad S_j:=[s_1,\ldots ,s_j],\quad \text {where} \end{aligned}$$
(3.8)
$$\begin{aligned} T_j&=-\begin{bmatrix}\alpha _1 &{} 0 &{}\cdots &{} 0\\ \gamma _1\gamma _2 &{} \alpha _2 &{}\ddots &{}\vdots \\ \vdots &{} &{}\ddots &{} 0\\ \gamma _1\gamma _j &{}\cdots &{} &{}\alpha _j \end{bmatrix}\otimes I_r,\quad g_j:=\begin{bmatrix} \gamma _1\\ \vdots \\ \gamma _j\end{bmatrix}\otimes I_r,\quad \varGamma _j:={\text {diag}}\!\left( g_j\right) . \end{aligned}$$
(3.9)

Moreover, the Lyapunov residual matrix associated with the approximation \(X_j= Z_jZ_j^*\approx X\) is given by

$$\begin{aligned} R_j^{\text {true}}&=AZ_jZ_j^*M^*+MZ_jZ_j^*A^*+BB^* \nonumber \\&= -S_j\varGamma _jZ_j^*M^*-MZ_j\varGamma _jS_j^*+w_jw_j^*. \end{aligned}$$
(3.10)

Note that here, \(T_j\) and \(g_j\) denote different quantities than in Sect. 2.

Proof

For \(S_j=0\) the decomposition (3.8) was established in [30, 55] and can be derived entirely from the relations (3.6). Inexact solves in the sense \(w_{j-1}-(A+\alpha _jM)v_j=s_j\) can be inserted in a straightforward way, leading to (3.8). By construction we have \(B=w_0=w_j-MZ_jg_j\) which, together with (3.8), gives

$$\begin{aligned} R_j^{\text {true}}&=MZ_j(T_j+T_j^*+g_jg_j^*)Z_j^*M^*+w_jw_j^*-S_j\varGamma _jZ_j^*M^*-MZ_j\varGamma _jS_j^* \end{aligned}$$

and (3.10) follows by verifying that \(T_j+T_j^*=-g_jg_j^*\).

Remark 3.1

The LR-ADI iteration is in general not a typical Galerkin projection method using orthonormal bases of the search spaces and solutions of reduced matrix equations. For instance, in [55] it is shown that the exact LR-ADI iteration can be seen as an implicit Petrov-Galerkin type method with a hidden left projection subspace. In particular, as we exploited in the proof above, the relation \(T_j+T_j^*+g_jg_j^*=0\) can be interpreted as a reduced matrix equation solved by the identity matrix \(I_j\). It is also possible to state a decomposition similar to (3.8) which incorporates \(w_0=B\) instead of \(w_j\) [20, 55]. This can be used to state conditions which indicate when the LR-ADI approximation satisfies a Galerkin condition [20, Theorem 3.4]. Similar to inexact RKSM, if \(s_j\ne 0\) these results in general no longer hold.

3.2 Computed Lyapunov residuals, residual gap, and relaxation strategies in inexact LR-ADI

Similar to inexact RKSM, when inexact solves are allowed in LR-ADI the computed Lyapunov residuals \(R^{\text {comp.}}_j = w_jw_j^*\) differ from the true residuals \(R_j^{\text {true}}\) and, thus, \(\Vert w_j\Vert ^2\) ceases to be a safe way to assess the quality of the current approximation \(Z_jZ_j^*\). In the exact case, \(s_j=0\), it follows from (3.5) and \(\rho _j:=\rho (C(A,\alpha _j))<1\) that the Lyapunov residual norms decrease in the form \(\Vert R_j\Vert \le c\rho _j^2\Vert R_{j-1}\Vert \) for some \(c\ge 1\). For the factors \(w_j\) of the computed Lyapunov residuals \(R^{\text {comp.}}_j\) in the inexact method we have the following result.

Lemma 3.1

Let \(w^{\text {exact}}_{j}\) be the factors of the Lyapunov residuals of the exact LR-ADI method, i.e., \(s_i=0\), and \(C_i:=C(A,\alpha _i)\), \(1\le i\le j\). The factors \(w_j\) of the computed Lyapunov residuals of the inexact LR-ADI are given by

$$\begin{aligned} w_j=w^{\text {exact}}_{j}-\sum \limits _{i=1}^j\left[ \prod \limits _{k=i+1}^jC_k\right] (C_i-I) s_i. \end{aligned}$$

Proof

For simplicity, let \(r=1\), \(M=I_n\), and exploit \(C_j=A_{-\overline{\alpha _j}}A_{\alpha _j}^{-1}=I+\gamma _j^2A_{\alpha _j}^{-1}\), \(A_{\alpha }:=A+\alpha I\). Denoting the errors by \(\delta v_j=A_{\alpha _j}^{-1}w_{j-1}-v_j=A_{\alpha _j}^{-1}s_j\) for \(j\ge 1\), we have

$$\begin{aligned} w_j&=w_{j-1}+\gamma _j^2 v_j=w_{j-1}+\gamma _j^2 A_{\alpha _j}^{-1}w_{j-1}-\gamma _j^2\delta v_j=C_jw_{j-1}-\gamma _j^2\delta v_j\\&=\ldots =\prod \limits _{k=1}^jC_k w_0-\sum \limits _{i=1}^j\gamma _i^2\left[ \prod \limits _{k=i+1}^jC_k\right] \delta v_i\\&=\prod \limits _{k=1}^jC_k w_0-\sum \limits _{i=1}^j\left[ \prod \limits _{k=i+1}^jC_k\right] (C_i-I) s_i \end{aligned}$$

and we notice that the first term is precisely the Lyapunov residual factor (3.5) of the exact iteration. \(\square \)

Constructing \(w_jw_j^*\) from the above formula and taking norms indicates that, by the contraction property of the \(C_i\), the linear system residuals \(s_i\) are damped. In fact, similar to the inexact projection methods, in practice we often observe that the computed Lyapunov residual norms \(\Vert R^{\text {comp.}}_j\Vert =\Vert w_jw_j^*\Vert \) also decrease.

In the next step we analyze the difference between the computed and true residuals, in a similar manner as for RKSM in Sect. 2. Theorem 3.2 motivates the definition of a residual gap analogous to the one for inexact RKSM.

Definition 2

The residual gap after j steps of the inexact LR-ADI iteration is given by

$$\begin{aligned} \varDelta R_j^{\text {ADI}}&:=R_j^{\text {true}}-R^{\text {comp.}}_j = R_j^{\text {true}}-w_jw_j^*=\eta _j^{\text {ADI}}+(\eta _j^{\text {ADI}})^*,\nonumber \\ \eta _j^{\text {ADI}}&:=-S_j\varGamma _jZ_j^*M^*. \end{aligned}$$
(3.11)

If \(\Vert R^{\text {comp.}}_j\Vert =\Vert w_jw_j^*\Vert \le \varepsilon \) and we are able to bound the residual gap by \(\Vert \varDelta R_j^{\text {ADI}}\Vert \le \varepsilon \) as well, then we achieve a small true residual norm \(\Vert R_j^{\text {true}}\Vert \le 2\varepsilon \). A theoretical approach for bounding \(\Vert \varDelta R_j^{\text {ADI}}\Vert \) is given in the next theorem.
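This is simply the triangle inequality applied to Definition 2:

$$\begin{aligned} \Vert R_j^{\text {true}}\Vert =\Vert R^{\text {comp.}}_j+\varDelta R_j^{\text {ADI}}\Vert \le \Vert R^{\text {comp.}}_j\Vert +\Vert \varDelta R_j^{\text {ADI}}\Vert \le 2\varepsilon . \end{aligned}$$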

Theorem 3.3

(Theoretical relaxation for inexact LR-ADI) Let the residual gap be given by Definition 2 with \(w_j\), \(\gamma _j\) as in (3.4)–(3.6b). Let \(j_{\max }\) be the maximum number of steps of Algorithm 2, \(\sigma _{k}:=\Vert M(A+\alpha _kM)^{-1}\Vert \), and \(0<\varepsilon <1\) the desired residual tolerance.

  (a)

    If, for \(1\le k\le j_{\max }\), the linear system residual satisfies

    $$\begin{aligned} \Vert s_k\Vert \le \frac{1}{2}\left( \sqrt{\Vert w_{k-1}\Vert ^2+\frac{2\varepsilon }{\sigma _{k}\gamma _k^2j_{\max }}}-\Vert w_{k-1}\Vert \right) , \end{aligned}$$
    (3.12a)

    then \(\Vert \varDelta R^{\text {ADI}}_{j_{\max }}\Vert \le \varepsilon \).

  (b)

    If, for \(1\le k\le j_{\max }\), the linear system residual satisfies

    $$\begin{aligned} \Vert s_k\Vert \le \frac{1}{2}\left( \sqrt{\Vert w_{k-1}\Vert ^2+\left( \frac{k\varepsilon }{j_{\max }}-2\Vert \eta ^{\text {ADI}}_{k-1}\Vert \right) \frac{2}{\sigma _{k}\gamma _k^2}}-\Vert w_{k-1}\Vert \right) , \end{aligned}$$
    (3.12b)

    then \(\Vert \varDelta R^{\text {ADI}}_{j_{\max }}\Vert \le \varepsilon \).

Proof

Consider the following estimate

$$\begin{aligned} \Vert \varDelta R_{j_{\max }}^{\text {ADI}}\Vert&\le 2\Vert \eta _{j_{\max }}^{ \text {ADI}}\Vert =2\Vert S_{j_{\max }}\varGamma _{j_{\max }}Z_{j_{\max }}^*M^*\Vert \nonumber \\&\le 2\Vert S_{{j_{\max }}-1}\varGamma _{{j_{\max }}-1}Z_{{j_{\max }}-1}^*M^*\Vert +2\gamma _{j_{\max }}^2\Vert s_{j_{\max }}\Vert \Vert Mv_{j_{\max }}\Vert \end{aligned}$$
(3.13)
$$\begin{aligned}&\le 2\sum \limits _{k=1}^{j_{\max }}\gamma _k^2\Vert s_k\Vert \Vert Mv_k\Vert . \end{aligned}$$
(3.14)

Moreover,

$$\begin{aligned} \Vert Mv_k\Vert =\Vert M(A+\alpha _k M)^{-1}(w_{k-1}-s_k)\Vert \le \sigma _k(\Vert w_{k-1}\Vert +\Vert s_k\Vert ). \end{aligned}$$
(3.15)

If the linear residual norms \(\Vert s_k\Vert \) are such that each summand in the sum (3.14) is smaller than \(\frac{\varepsilon }{{j_{\max }}}\), we achieve \(\Vert \eta _{j_{\max }}^{\text {ADI}}\Vert \le \varepsilon /2\). With \(\phi _k:=\displaystyle 2\gamma _k^2\sigma _{k}\), \(\omega _{k-1}:=\Vert w_{k-1}\Vert =\sqrt{\Vert R_{k-1}\Vert }\) this means we require \(\phi _k(\omega _{k-1} \Vert s_k\Vert +\Vert s_k\Vert ^2)\le \displaystyle \frac{\varepsilon }{j_{\max }}.\) Hence, the largest allowed value of \(\Vert s_k\Vert \) is given by the positive root of the quadratic equation \(\phi _k(\omega _{k-1} \varsigma +\varsigma ^2)-\frac{\varepsilon }{j_{\max }}=0\), such that

$$\begin{aligned} \Vert s_k\Vert \le \frac{1}{2}\left( \sqrt{\omega _{k-1}^2+\frac{4\varepsilon }{\phi _kj_{\max }}}-\omega _{k-1}\right) \end{aligned}$$
(3.16)

leading to the desired result (a). The second strategy is shown similarly by using (3.13) and

$$\begin{aligned} 2\Vert \eta ^{\text {ADI}}_{k}\Vert \le 2\Vert \eta ^{\text {ADI}}_{k-1}\Vert +\phi _k(\Vert w_{k-1}\Vert \Vert s_k\Vert +\Vert s_k\Vert ^2), \end{aligned}$$

and then finding \(\Vert s_k\Vert \) such that the right hand side in the above inequality is less than \(\displaystyle \frac{k\varepsilon }{j_{\max }}\). \(\square \)

The motivation behind the second stopping strategy (3.12b) is that it takes the previous \(\Vert \eta ^{\text {ADI}}_{k-1}\Vert \) into account. This can be helpful if at steps \(i\le k-1\) the iterative method produced smaller linear residual norms than demanded, since the linear residual norms \(\Vert s_i\Vert \) are then allowed to grow slightly larger at later iteration steps \(i>k-1\).
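For illustration, the two bounds can be evaluated as follows in MATLAB; the function and argument names (eps_abs for \(\varepsilon \), gamma2_k for \(\gamma _k^2\), sigma_k for \(\sigma _k\), normw_prev for \(\Vert w_{k-1}\Vert \) and eta_prev for a bound on \(\Vert \eta ^{\text {ADI}}_{k-1}\Vert \)) are our own, and in practice \(\sigma _k\) would have to be estimated.

function tau = adi_theoretical_tol(strategy, k, jmax, eps_abs, sigma_k, gamma2_k, normw_prev, eta_prev)
% Largest linear residual norm ||s_k|| allowed by Theorem 3.3.
  if strcmp(strategy, 'a')               % criterion (3.12a)
    budget = 2*eps_abs/(sigma_k*gamma2_k*jmax);
  else                                   % criterion (3.12b), reuses slack from earlier steps
    budget = (k*eps_abs/jmax - 2*eta_prev)*2/(sigma_k*gamma2_k);
  end
  % tau is the positive root of phi_k*(||w_{k-1}||*s + s^2) = target, where
  % phi_k = 2*gamma2_k*sigma_k and budget = 4*target/phi_k, cf. (3.16)
  tau = 0.5*(sqrt(normw_prev^2 + budget) - normw_prev);
end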

Convergence analysis of inexact ADI with relaxed solve tolerances. Using the relaxation strategies in Theorem 3.3 reveals, under some conditions, further insight into the behaviour of the computed LR-ADI residual norms.

Theorem 3.4

(Decay of computed LR-ADI residuals) Assume \(\Vert C_j\Vert <1\), \(j=1,\ldots ,j_{\max }\), such that the residuals \(\Vert R^{\text {exact}}_{j}\Vert \) of the exact LR-ADI iteration decrease. Let the assumptions of Theorem 3.3 hold. If the relaxation criterion (3.12a) is used in the inexact LR-ADI iteration, then

$$\begin{aligned} \Vert R^{\text {comp.}}_{j_{\max }}\Vert \le \Vert R^{\text {exact}}_{j_{\max }}\Vert +\varepsilon . \end{aligned}$$

Proof

For simplicity, let \(r=1\), \(M = I_n\) and \(\lbrace \alpha _j\rbrace _{j=1}^{j_{\max }}\subset {\mathbb {R}}_-\). The assumption \(\Vert C_j\Vert <1\) gives \(\xi _j:=\gamma _j^2\sigma _j=\Vert C_j-I\Vert <2\) for \(1\le j\le j_{\max }\). Consider the identity \(w_j=C_jw_{j-1}-(C_j-I)s_j\) established in the proof of Lemma 3.1. Then

$$\begin{aligned} \Vert R^{\text {comp.}}_{j}\Vert&=\Vert w_j\Vert ^2=\Vert C_jw_{j-1}\Vert ^2+\Vert (C_j-I)s_j\Vert ^2-2w_{j-1}^*C_j^*(C_j-I)s_j\\&\le \Vert C_jw_{j-1}\Vert ^2+\xi _j^2\Vert s_j\Vert ^2+2\omega _{j-1}\xi _j\Vert s_j\Vert , \end{aligned}$$

where we have used \(\omega _{j-1} = \Vert w_{j-1}\Vert \). Inserting (3.12a) yields

$$\begin{aligned} \xi _j^2\Vert s_j\Vert ^2+2\omega _{j-1}\xi _j \Vert s_j\Vert \le&\frac{\xi _j^2}{4}\left( \sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}-\omega _{j-1}\right) ^2\\&+\omega _{j-1}\xi _j\left( \sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}-\omega _{j-1}\right) \\ =&\frac{\xi _j^2}{4} \left( \omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}-2\omega _{j-1}\sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}+\omega _{j-1}^2\right) \\&\quad \quad \quad \quad \quad \quad +\omega _{j-1}\xi _j\sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}-\omega _{j-1}^2\xi _j. \end{aligned}$$

Now, using \(\xi _j<2\) we obtain

$$\begin{aligned} \xi _j^2\Vert s_j\Vert ^2+2\omega _{j-1}\xi _j \Vert s_j\Vert \le&\frac{\xi _j}{2} \left( 2\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}-2\omega _{j-1}\sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}\right) \\&+\omega _{j-1}\xi _j\sqrt{\omega _{j-1}^2+\tfrac{2\varepsilon }{\xi _j j_{\max }}}-\omega _{j-1}^2\xi _j\\ =&\tfrac{\varepsilon }{j_{\max }}. \end{aligned}$$

This results in \(\Vert R^{\text {comp.}}_{j}\Vert \le \Vert C_jw_{j-1}\Vert ^2+\frac{\varepsilon }{j_{\max }}\). Using Lemma 3.1 again for \(w_{j-1}\) and repeating the above steps for \(1\le j\le j_{\max }\) gives

$$\begin{aligned} \Vert R^{\text {comp.}}_{j_{\max }}\Vert&=\Vert w_{j_{\max }}\Vert ^2\le \left\| \prod \limits _{j=1}^{j_{\max }}C_jw_{0}\right\| ^2+\sum \limits _{j=1}^{j_{\max }}\frac{\varepsilon }{j_{\max }}=\Vert R^{\text {exact}}_{j_{\max }}\Vert +\varepsilon . \end{aligned}$$

\(\square \)

Using Theorem 3.4 together with (3.11) yields the following conclusion.

Corollary 3.1

Under the same conditions as in Theorem 3.4 we have \(\Vert R^{\text {true}}_{j_{\max }}\Vert \le \Vert R^{\text {exact}}_{j_{\max }}\Vert +2\varepsilon \).

Hence, if (3.12a) is used, then the (true) Lyapunov residual norms in inexact LR-ADI are a small perturbation of the residuals of the exact method.

Remark 3.2

Another way to enforce decreasing computed Lyapunov residual norms \(\Vert R^{\text {comp.}}_{j}\Vert \) is to use the stronger condition \(\Vert C_j\Vert <1\) together with a proportional inner accuracy \(\Vert s_j\Vert <\mu \Vert R^{\text {comp.}}_{j-1}\Vert \), \(0<\mu <1\), similar to [41] in the context of inexact stationary iterations. For the LR-ADI iteration this does, however, not ensure small residual gaps \(\Vert \varDelta R^{\text {ADI}}_{j}\Vert <\varepsilon \) and, thus, does not guarantee small true residuals.

Practical relaxation strategies for inexact LR-ADI. The proposed stopping criteria (3.12) are not very practical, since the employed bound (3.15) often overestimates \(\Vert Mv_k\Vert \) by several orders of magnitude which, in turn, results in smaller (i.e., stricter) inner tolerances \(\tau ^{\text {LS}}\) than actually needed. Furthermore, computing or estimating the norm of the large matrix \(M(A+\alpha _kM)^{-1}\) is possible, e.g., by inexact Lanczos-type approaches, but the additional computational effort for this does not pay off because it can easily be more expensive than the remaining parts of a single iteration step. Here we propose some variations of the above approaches that are better suited for an actual implementation. From the algorithmic description of the LR-ADI iteration (Algorithm 2) we have

$$\begin{aligned} \Vert Mv_k\Vert =\frac{1}{\gamma ^2_k}\Vert w_k-w_{k-1}\Vert \le \frac{1}{\gamma ^2_k}(\Vert w_k\Vert +\Vert w_{k-1}\Vert ). \end{aligned}$$
(3.17)

In practice, the sequence \(\lbrace \Vert w_k\Vert \rbrace =\lbrace \sqrt{\Vert R^{\text {comp.}}_{k}\Vert }\rbrace \) is often monotonically decreasing. Assuming \(\Vert w_k\Vert \le \Vert w_{k-1}\Vert \) suggests using \(\Vert Mv_k\Vert \le \frac{2\Vert w_{k-1}\Vert }{\gamma _k^2}\) in (3.14), leading to the relaxation criterion

$$\begin{aligned} \Vert s_k\Vert \le \tau ^{\text {LS}}_k=\frac{\varepsilon }{4j_{\max }\sqrt{\Vert R^{\text {comp.}}_{k-1}\Vert }}. \end{aligned}$$
(3.18a)

Starting from (3.13), assuming \(\Vert \eta _{k-1}^{\text {ADI}}\Vert \le \frac{(k-1)\varepsilon }{2j_{\max }}\), and enforcing \(\Vert \varDelta R_k^{\text {ADI}}\Vert <\frac{k\varepsilon }{j_{\max }}\) we obtain

$$\begin{aligned} \Vert s_k\Vert \le \tau ^{\text {LS}}_k=\frac{\frac{k\varepsilon }{j_{\max }}-2\Vert \eta _{k-1}^{\text {ADI}}\Vert }{4\sqrt{\Vert R^{\text {comp.}}_{k-1}\Vert }}. \end{aligned}$$
(3.18b)

The second relaxation strategy (3.18b) requires \(\Vert \eta _{k-1}^{\text {ADI}}\Vert =\Vert S_{k-1}\varGamma _{k-1}Z_{k-1}^*M^*\Vert \) or an approximation thereof. From \(\Vert \eta _{k-1}^{\text {ADI}}\Vert \le \Vert \eta _{k-2}^{\text {ADI}}\Vert +\gamma _{k-1}^2\Vert Mv_{k-1}\Vert \Vert s_{k-1}\Vert \), simple upper bounds \(\Vert \eta _{i}^{\text {ADI}}\Vert \le u_i\), \(i=0,\ldots ,k-1\), can be computed in each step via

$$\begin{aligned} \Vert \eta _{k-1}^{\text {ADI}}\Vert \le u_{k-1}:=u_{k-2}+\gamma _{k-1}^2\Vert Mv_{k-1}\Vert \Vert s_{k-1}\Vert ,\quad u_0:=0. \end{aligned}$$
(3.19)

Note that \(Mv_{k-1}\) is required anyway to continue Algorithm 2. The linear residual norms \(\Vert s_i\Vert ,~1\le i\le k-1\), are sometimes available as byproducts of Krylov subspace solvers for linear systems if right preconditioning is used. In case of other forms of preconditioning, the \(s_i\) might need to be computed explicitly, which requires extra matrix vector products with A (and M), or the norms \(\Vert s_i\Vert \) have to be estimated in some other way. Similar to RKSM, for problems defined by real but unsymmetric coefficients, pairs of complex conjugate shifts can occur in LR-ADI. These can be dealt with efficiently using the machinery developed in [10, 11, 30] to ensure that the majority of operations remains in real arithmetic. Following these results, it is easily shown that if steps \(k-2, k-1\) used a complex conjugate pair of shifts, then the real and imaginary parts of both \(v_{k-2}\) and \(s_{k-2}\) enter the update in the formula (3.19).
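A possible realisation of the practical criteria inside the inexact LR-ADI iteration is sketched below in MATLAB; here linsolve_fun is an assumed handle to a (preconditioned) Krylov solver that returns the approximate solution of \((A+\alpha _kM)v=w_{k-1}\) together with the attained linear residual norm \(\Vert s_k\Vert \), real shifts are assumed, and all names are illustrative rather than part of an existing toolbox.

function [Z, w] = inexact_lr_adi_sketch(A, M, B, alpha, jmax, eps_abs, tau_min, tau_max, strategy, linsolve_fun)
% Inexact LR-ADI with the relaxed inner tolerances (3.18a)/(3.18b) and the bound (3.19).
  w = B;  Z = [];  u = 0;                % u = u_{k-1} >= ||eta^{ADI}_{k-1}||, see (3.19)
  for k = 1:jmax
    Rcomp_prev = norm(full(w'*w));       % ||R^comp_{k-1}|| = ||w_{k-1}^* w_{k-1}||
    if strcmp(strategy, 'a')             % criterion (3.18a)
      tau = eps_abs/(4*jmax*sqrt(Rcomp_prev));
    else                                 % criterion (3.18b), uses the running bound u
      tau = (k*eps_abs/jmax - 2*u)/(4*sqrt(Rcomp_prev));
    end
    tau = min(max(tau, tau_min), tau_max);              % safeguard the inner tolerance
    [v, snorm] = linsolve_fun(A + alpha(k)*M, w, tau);  % inexact solve, ||s_k|| <= tau
    gamma2 = -2*real(alpha(k));
    Mv = M*v;
    u = u + gamma2*norm(Mv)*snorm;       % update of the bound (3.19)
    w = w + gamma2*Mv;                   % residual factor update, cf. Algorithm 2
    Z = [Z, sqrt(gamma2)*v];             % append gamma_k v_k to the low-rank factor
    if norm(full(w'*w)) <= eps_abs       % stop once the computed residual is small
      break
    end
  end
end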

At first glance, (3.18) appears to allow somewhat less relaxation compared to RKSM, since only the square roots of the computed Lyapunov residual norms appear in the denominator. However, the numerical examples in the next section show that with these relaxation strategies we can reduce the amount of work for inexact LR-ADI by up to 50%.

4 Numerical examples

In this section we consider several numerical examples and apply inexact RKSM and inexact LR-ADI with our practical relaxation strategies. The experiments were carried out in MATLAB® 2016a on an Intel® Core™ i7-7500U CPU @ 2.7 GHz with 16 GB RAM. We wish to obtain an approximate solution such that the scaled true Lyapunov residual norm satisfies

$$\begin{aligned} \mathfrak {R}:=\Vert R^{\text {true}}\Vert /\Vert B\Vert ^2\le \hat{\varepsilon },\quad 0<\hat{\varepsilon }\ll 1, \end{aligned}$$

i.e., \(\varepsilon =\hat{\varepsilon }\Vert B\Vert ^2\). In all tests we aim to achieve this accuracy with \(\hat{\varepsilon }=10^{-8}\) within at most \(j_{\max }=50\) iteration steps. In all but one case we employ dynamic shift generation strategies, using the approach in [21] for RKSM and shifts based on a projection idea for LR-ADI, see [12, 30] for details. For the latter, the projection basis is chosen as the last \(\min (\mathrm {coldim}(Z_j),10r)\) columns of the low-rank factor \(Z_j\). For one case, the symmetric problem, optimal shifts following [54] are used for the LR-ADI iteration. For comparative purposes, the exact (direct) EKSM is also tested for some examples.

Table 1 Properties and settings of the test equations, including the safeguard constant \(\delta \) in (2.24) and the type of preconditioner used

We employ different Krylov subspace solvers and stopping criteria for the arising linear systems of equations; in particular, we use fixed as well as dynamically chosen inner solver tolerances \(\tau ^{\text {LS}}\) with the proposed relaxation strategies. The fixed solve tolerances were determined a priori via trial and error such that the residual behavior of the inexact method mimicked the residuals of the method using sparse-direct solution approaches for the linear systems. If not stated otherwise, the values \(\tau ^{\text {LS}}_{\min }=10^{-12}\), \(\tau ^{\text {LS}}_{\max }=0.1\) are taken as minimal and maximal linear solve tolerances. As Krylov subspace solvers we use BiCGstab and MINRES for problems with unsymmetric and symmetric (\(A=A^*\), \(M=M^*\)) coefficients, respectively. Sparse-direct solves were carried out by the MATLAB backslash routine or, in case of EKSM, by precomputed sparse LU (or Cholesky) factorizations of A. EKSM and RKSM handled the generalized problems by means of the equivalent problem using sparse Cholesky factors of the matrix M as described in Sect. 2.5. We consider five examples, two standard Lyapunov equations (cd2d and heat3d, where heat3d is an example for which \(r>1\)), two generalized Lyapunov equations (fem3d, msd) and an algebraic Riccati equation (fem3d-are), in order to illustrate the theoretical results in this paper. Details and setup of the examples are summarized in Table 1. The matrices B for examples cd2d and heat3d were generated randomly with a standard Gaussian distribution and the initialization randn('state', 0). Examples fem3d, fem3d-are provide the vectors B, C [8]. Because of the symmetric coefficient in heat3d, we use this example to experiment with precomputed shifts from [54] instead of the dynamic shifts used in all other experiments. The example msd represents coefficients of a first order linearization of a second order dynamical system and is a modified version of an example used in [52]. Further details on the setup of this example can be found in "Appendix B". For efficient iterative solves, this example required updating the preconditioner in every iteration step.

At first we briefly investigate the results in Sect. 2.3 on the decay of the Galerkin solution \(Y_j\) using example cd2d. We run the exact RKSM and plot, in Fig. 1, the row norms of \(Y_j\), \(H_j^{-1}Y_j\) and the corresponding bounds obtained in Corollaries 2.1 and 2.2 against the iteration number j (left plot), as well as the absolute values of the entries of the final Galerkin solution \(Y_j\) (right plot). Real shift parameters were used for this experiment. The figure shows that our theoretical bounds can indeed be verified, but that they significantly overestimate the true norms. The right plot shows the decay of the entries of \(Y_j\) as predicted by Corollary 2.1. Similar results are obtained for the other examples.

Fig. 1 Illustration of the decay of the Galerkin solution for Example cd2d. Left: row norms of \(Y_j\), \(H_j^{-1}Y_j\) and the corresponding bounds (Corollaries 2.1 and 2.2) against the iteration number j. Right: absolute values of the entries of the final \(Y_j\)

We now experiment with the different practical relaxation strategies (2.24) and (3.18) from Sects. 2 and 3 for the inner iteration. In Table 2 we report the results for all examples. Besides other relevant information on the performance of the outer method under inexact inner solves, we give the final obtained scaled computed residual norms \(\mathfrak {R}^{\text {comp}}_{j}\) (using the formula (2.9) for RKSM and (3.5) for LR-ADI). For assessing the reliability of the value of \(\mathfrak {R}^{\text {comp}}_{j}\), the distance \(\delta \mathfrak {R}_j:=\vert \mathfrak {R}^{\text {comp}}_j-\mathfrak {R}_j^{\text {true}}\vert \) to the true scaled residual norm is also listed, where \(\mathfrak {R}_j^{\text {true}}\) was computed using the Lanczos process on \(\mathscr {R}_j\).

First, we observe that in all examples the difference between the true and the computed Lyapunov residual norm, \(\delta \mathfrak {R}_j\), is of the order \(\mathscr {O}(10^{-9})\), or smaller.

The second observation we make is that both practical relaxation criteria for RKSM, namely (2.24a) and (2.24b), are effective and lead to a reduction in the overall number of inner iterations. They both lead to nearly the same number of inner iterations; the gain in using (2.24b) over (2.24a) is very minor. For our examples, up to 50 per cent of the inner iterations can be saved by using the relaxed stopping criterion compared to a fixed inner tolerance.

For the LR-ADI method we consider the two relaxation criteria (3.18a) and (3.18b). Again, we observe a reduction in the total number of iterations for both relaxation strategies, but we also see that the second relaxation criterion (3.18b) reduces the iteration numbers even further compared to (3.18a), so the use of (3.18b) is generally recommended. Here it pays off that the second strategy takes the information from the previous steps into account. The total savings in inner iterations between fixed and relaxed tolerance solves for LR-ADI are between 14 and 42% in our examples.

Table 2 Experimental results

A fairer comparison between the fixed and relaxed tolerance solves is the computation time; in Table 2 we report the computation time for direct solves as well as iterative solves with fixed and relaxed solve tolerances. For all examples we see that the relaxed solve tolerance also leads to (sometimes significant) savings in computation time. For Example fem3d, the generalized Lyapunov equation, the absolute time saving is not so significant (both for RKSM and LR-ADI); however, for the other three examples the savings are quite large, in particular for inexact RKSM. The results for the Riccati example fem3d-are indicate that the proposed relaxation criteria also work for inexact RKSM applied to Riccati equations. Note that for Example cd2d, the direct solver outperforms the iterative methods with a fixed small solve tolerance (both for RKSM and LR-ADI); however, the relaxed tolerance versions perform similarly to the direct methods in terms of computation time. This is to be expected, as example cd2d represents a two-dimensional problem, where sparse-direct solvers are usually very efficient. For the three-dimensional examples heat3d, fem3d and fem3d-are the iterative solvers significantly outperform the direct method.

Regarding the comparison with EKSM, we observe that EKSM sometimes achieves the smallest computation times (e.g., examples cd2d, msd) but always generates the largest subspace dimensions among all tested methods. Using a block MINRES for the heat3d example (with \(r=4\)) largely led to very similar results in terms of the number of inner iteration steps, but the overall computing times were slightly larger.

We point out that, in every experiment, RKSM and LR-ADI generated new shift parameters on their own. Interestingly, and perhaps surprisingly, this did not lead to substantial differences regarding the Lyapunov residual behaviour, although the employed adaptive shift generation techniques [12, 21] are based on the eigenvalues of \(T_j\) (rational Ritz values of A) and, hence, depend on the generated subspace. One explanation might be that the eigenvalue generation itself can be seen as an inexact rational Arnoldi process for the eigenvalue problem, and if the inner solve tolerances are chosen intelligently, no large differences regarding the Ritz values should appear [23, 31, 42]. Some small differences in the required number of outer iteration steps are observed for example msd where, for instance, the inexact LR-ADI required fewer outer steps than the exact LR-ADI. One possible explanation could be that for this example slightly different shifts were generated in the exact and inexact methods.

For the heat3d example, precomputed shifts [54] were used within LR-ADI, leading to almost indistinguishable residual curves (cf. right plot in Fig. 2) and indicating that the optimal shift approaches for certain problem classes still work under inexact solves. Similar observations were made for the remaining examples when precomputed shifts were used for both exact and inexact methods.

Fig. 2 Results for Example heat3d: scaled computed residual norms \(\mathfrak {R}^{\text {comp}}\) and inner tolerances \(\tau ^{\text {LS}}\) vs iteration numbers obtained by (in)exact RKSM (left) and LR-ADI (right). Here, LR-ADI uses precomputed shifts

Figure 2 shows the scaled computed residual norms \(\mathfrak {R}^{\text {comp}}\) and inner tolerances \(\tau ^{\text {LS}}\) vs iteration numbers obtained by (in)exact RKSM and LR-ADI for Example heat3d.

On the left we plot the convergence history for the exact RKSM (i.e., direct solves within the inner iteration), the inexact RKSM with a fixed small solve tolerance within the iterative solve, and the inexact RKSM with the relaxation criteria (2.24a) and (2.24b) within the iterative solution of the inner linear system. All computed residual norms are decreasing and virtually indistinguishable. The solve tolerances when using relaxation criterion (2.24a) are shown as dotted lines with diamonds and the ones using criterion (2.24b) as dashed lines with red circles. The relaxation criteria lead to increasing inner solve tolerances, but as already observed in Table 2, both criteria for inexact RKSM give nearly the same results.

The right plot in Fig. 2 shows the same results for LR-ADI. Again, the convergence history of the residual norms for inexact LR-ADI using the two relaxation strategies is not visibly distinguishable from the exact LR-ADI. However, we observe that the second relaxation criterion (3.18b), shown as dashed lines with red circles, gives better results, i.e., more relaxation and hence fewer inner iterations, than the first relaxation criterion (3.18a), shown as dotted lines with diamonds, a result also observed in Table 2. Similar plots as in Fig. 2 can be obtained for the other examples.

5 Conclusions and future work

The numerical solution of large scale Lyapunov equations is an important problem in many applications. The rational Krylov subspace method (RKSM) and the low-rank alternating directions implicit (LR-ADI) iteration are well-established methods for computing low-rank solution factors of large-scale Lyapunov equations. The main computational task in both methods is the solution of a shifted linear system at each step, which is usually carried out iteratively and hence inexactly.

We observed empirically that, when solving the linear system at each iteration step, the accuracy of the solve may be relaxed while maintaining convergence to the solution of the Lyapunov equation. In this paper we have presented a theoretical foundation explaining this phenomenon, both for the inexact RKSM and the inexact low-rank ADI iteration. For both methods we introduced a so-called residual gap, which depends on the accuracy of the linear system solve and on quantities arising from the solution methods for the large scale Lyapunov equation. Analyzing this gap for each method provided theoretical relaxation criteria for both inexact RKSM and inexact LR-ADI. These criteria are often not applicable in practice as they contain unknown and/or overestimated quantities; hence, we also gave practical relaxation criteria for both methods. Our numerical results indicate that using flexible accuracies gives very good results and can reduce the amount of work for solving large scale Lyapunov equations by up to 50%.

One numerical experiment with inexact RKSM indicated that relaxation strategies might also be fruitful for low-rank methods for algebraic Riccati equations [6,7,8, 12, 14, 44, 48], making this an obvious future research topic, together with inexact linear solves in low-rank methods for Sylvester equations [9, 13, 30]. In this work, we restricted ourselves to standard preconditioning techniques. Improved concepts such as tuned preconditioners and similar ideas [15, 23] might further enhance the performance of the inner iteration process. Preliminary tests, however, did not yield any performance gain from these techniques worth mentioning; further investigations in this direction are necessary. A further research direction worth pursuing is to reduce the computational effort for solving the sequences of shifted linear systems by storing the Krylov basis obtained from solving one (e.g., the first) linear system and employing subspace recycling techniques as, e.g., discussed for LR-ADI in [32] and for rational Krylov methods in the context of model order reduction in [1]. Allowing inexact matrix vector products, and in the case of generalized equations also inexact solves with M, represents a further, more challenging research perspective.