1 Introduction

Let \(A\in \mathbb {R}^{n\times n}\) be a large nonsymmetric and nonsingular matrix, and let V and \(W\in \mathbb {R}^{n\times s}\) be two block vectors with \(1\le s\ll n\). We are interested in approximating expressions of the form

$$ \mathcal{I}_{tr}:=\text{trace}(W^{T}f(A)V)\quad \text{ and }\quad \mathcal{I}:=W^{T}f(A)V, $$
(1)

where f is a function such that f(A) is well defined. Evaluation of the expressions (1) arises in various applications such as network analysis (\(f(t)=\exp (t)\) or \(f(t)=t^{3}\)) [10], machine learning \((f(t)=\log (t))\) [13, 20], the theory of Quantum Chromodynamics (\(f(t)=t^{1/2}\)), electronic structure computation [3, 7, 22], and the solution of ill-posed problems [11, 14]. The matrix function f(A) can be defined by the spectral factorization of A; see, e.g., [14, 16] for discussions of several possible definitions of matrix functions. In the present paper, we assume that the matrix A is so large that it is impractical to evaluate its spectral factorization.
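For moderate n, the spectral definition can be made concrete. The sketch below (sizes, the test matrix, and the choice f(t) = exp(t) are illustrative, not from the paper) builds f(A) from an eigendecomposition and evaluates both expressions in (1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, s = 60, 3  # illustrative sizes; the paper targets much larger n

# Diagonalizable nonsymmetric test matrix with eigenvalues near [1, 2]
A = np.diag(np.linspace(1.0, 2.0, n)) + 0.1 * rng.standard_normal((n, n)) / n
V = rng.standard_normal((n, s))
W = rng.standard_normal((n, s))

# Spectral definition: f(A) = X f(Lambda) X^{-1}, here with f(t) = exp(t)
lam, X = np.linalg.eig(A)
fA = ((X * np.exp(lam)) @ np.linalg.inv(X)).real

I_mat = W.T @ fA @ V          # I    = W^T f(A) V   (an s x s matrix)
I_tr = np.trace(I_mat)        # I_tr = trace(W^T f(A) V)
```

This direct route costs O(n^3) and is exactly what the Krylov-type methods below are designed to avoid.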

Approximation of \(\mathcal {I}_{tr}\) by global methods in the case when A is a symmetric matrix and W = V is well studied in [4]. Global Krylov subspace techniques were first proposed in [17] for solving linear equations with multiple right-hand sides and Lyapunov equations. Consider the global Krylov subspace

$$ {\mathbb{K}^{gl}_{m}(A,V)}=\text{span}\{V,AV,\ldots,A^{m-1}V\}=\{p(A)V : p\in\varPi_{m-1}\}, $$
(2)

where \(\varPi_{m-1}\) denotes the set of polynomials of degree at most m − 1. Applying m steps of the global Lanczos method [18] to A with the initial block vector V leads to the decomposition

$$ A\mathbb{V}_{m}=\mathbb{V}_{m}(T_{m}\otimes I_{s})+\beta_{m}V_{m+1}{E^{T}_{m}}, $$
(3)

where ⊗ stands for the Kronecker product. The block columns V1,…,Vm of the matrix \(\mathbb {V}_{m}=[V_{1},\ldots ,V_{m}]\in \mathbb {R}^{n\times ms}\) form an F-orthonormal (with respect to the Frobenius inner product) basis of the global Krylov subspace (2) which means that

$$ \langle V_{j},V_{k} \rangle_{F} := \text{trace}({V_{k}^{T}}V_{j})= \left\{\begin{array}{cc} 1 & j=k,\\ 0 & j\ne k. \end{array}\right. $$

Moreover, the matrix \(T_{m}\in \mathbb {R}^{m\times m}\) in (3) is symmetric and tridiagonal, \(I_{s}\in \mathbb {R}^{s\times s}\) denotes the identity matrix, βm ≥ 0, and \(E_{m}\in \mathbb {R}^{ms\times s}\) is the m th block axis vector, i.e., Em corresponds to columns (m − 1)s + 1,…,ms of the identity matrix Ims.
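The decomposition (3) and the F-orthonormality of the block columns are easy to check numerically. The following minimal sketch of the symmetric global Lanczos process (illustrative sizes and test matrix, not the authors' code) runs m steps and forms the residual of (3):

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, m = 40, 2, 5  # illustrative sizes

B = rng.standard_normal((n, n))
A = (B + B.T) / 2                       # symmetric test matrix
V = rng.standard_normal((n, s))

def global_lanczos_sym(A, V, m):
    """m steps of the symmetric global Lanczos process (F-inner product)."""
    Vs = [V / np.linalg.norm(V, 'fro')]
    alpha, beta = [], []
    for j in range(m):
        U = A @ Vs[j]
        if j > 0:
            U = U - beta[-1] * Vs[j - 1]
        a = np.trace(Vs[j].T @ U)       # alpha_j = <A V_j, V_j>_F
        U = U - a * Vs[j]
        b = np.linalg.norm(U, 'fro')    # beta_{j+1}
        alpha.append(a); beta.append(b)
        Vs.append(U / b)
    Tm = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
    return Vs, Tm, beta[-1]

Vs, Tm, beta_m = global_lanczos_sym(A, V, m)
Vm = np.hstack(Vs[:m])                  # [V_1, ..., V_m], an n x ms matrix

# Residual of (3): A V_m - V_m (T_m ⊗ I_s) should equal beta_m V_{m+1} E_m^T
R = A @ Vm - Vm @ np.kron(Tm, np.eye(s))
R[:, (m - 1) * s:] -= beta_m * Vs[m]
```

After subtracting the rank-one tail term, the residual R vanishes to rounding error.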

As with the classical Krylov subspace methods, in the symmetric case it is attractive to approximate the expression \(\mathcal {I}_{tr}\) in (1) by

$$ \mathcal{G}_{m}(f):=\|V\|^{2}_{F} \widetilde{e}^{T}_{1}f(T_{m})\widetilde{e}_{1}, $$
(4)

where \(\widetilde {e}_{1}\) is the first vector of the canonical basis of \(\mathbb {R}^{m}\). One can show that the approximation (4) is exact for polynomials of degree at most 2m − 1, i.e.,

$$ \mathcal{I}_{tr}(p)=\mathcal{G}_{m}(p),\quad \forall p\in\varPi_{2m-1}, $$

see Bellalij et al. [4] for more details. When A is nonsymmetric, the F-orthonormal basis \(\mathbb {V}_{m}\) of \({\mathbb {K}^{gl}_{m}(A,V)}\) cannot be generated with short recurrence relations. The nonsymmetric global Lanczos method [18] generates, with short recurrence relations, a pair of F-biorthonormal bases \(\mathbb {V}_{m}=[V_{1},\ldots ,V_{m}]\) and \(\mathbb {W}_{m}=[W_{1},\ldots ,W_{m}]\) for the Krylov subspaces \(\mathbb {K}_{m}(A,V)\) and \(\mathbb {K}_{m}(A^{T},W)\), respectively. Application of m steps of the nonsymmetric global Lanczos algorithm yields the following relations:

$$ \begin{array}{@{}rcl@{}} A\mathbb{V}_{m}&=&\mathbb{V}_{m}({T^{b}_{m}}\otimes I_{s})+\beta_{m}V_{m+1}{E^{T}_{m}},\\ A^{T}\mathbb{W}_{m}&=&\mathbb{W}_{m}(T^{b,T}_{m}\otimes I_{s})+\gamma_{m}W_{m+1}{E^{T}_{m}}. \end{array} $$

The block vectors \(V_{i}\) and \(W_{i}\) of \(\mathbb {V}_{m}=[V_{1},\ldots ,V_{m}]\) and \(\mathbb {W}_{m}=[W_{1},\ldots ,W_{m}]\), with \(V_{i},W_{i}\in \mathbb {R}^{n\times s}\), are said to be F-biorthonormal if

$$ \begin{array}{@{}rcl@{}} \langle V_{j},W_{k}\rangle_{F} := \text{trace}({W_{k}^{T}}V_{j})= \left\{\begin{array}{cc} 1 & j=k,\\ 0 & j\ne k. \end{array}\right. \end{array} $$

The matrix \({T^{b}_{m}}\in \mathbb {R}^{m\times m}\) is nonsymmetric and tridiagonal. The coefficients \(\beta _{m},\gamma _{m}\in \mathbb {R}\), and Em is the matrix defined in (3). The expression \(\mathcal {I}_{tr}\) in (1) can be approximated by

$$ \mathcal{G}_{m}^{b}(f):=\langle V,W \rangle_{F}\widetilde{e}^{T}_{1}f({T^{b}_{m}})\widetilde{e}_{1}. $$

In the same way as in [12], one can show that the approximation \(\mathcal {G}_{m}^{b}\) is exact for polynomials of degree at most 2m − 1, i.e.,

$$ \mathcal{I}_{tr}(p)=\mathcal{G}_{m}^{b}(p)\quad \forall p\in\varPi_{2m-1}. $$
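This exactness property can be checked numerically. The sketch below is a hedged transcription of the nonsymmetric global Lanczos recurrences with the Frobenius inner product (illustrative sizes and test matrix; serious breakdown is not handled):

```python
import numpy as np

rng = np.random.default_rng(2)
n, s, m = 30, 2, 4  # illustrative sizes

A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
V = rng.standard_normal((n, s))
W = rng.standard_normal((n, s))

def fdot(X, Y):
    return np.trace(Y.T @ X)            # Frobenius inner product <X, Y>_F

def global_bilanczos(A, V, W, m):
    """Sketch: m steps of the nonsymmetric global Lanczos recurrences."""
    d = fdot(V, W)
    Vs, Ws = [V / np.sqrt(abs(d))], [W / (d / np.sqrt(abs(d)))]
    T = np.zeros((m, m))
    for j in range(m):
        r, q = A @ Vs[j], A.T @ Ws[j]
        T[j, j] = fdot(r, Ws[j])        # alpha_j = <A v_j, w_j>_F
        r = r - T[j, j] * Vs[j]
        q = q - T[j, j] * Ws[j]
        if j > 0:
            r = r - T[j - 1, j] * Vs[j - 1]
            q = q - T[j, j - 1] * Ws[j - 1]
        if j < m - 1:
            d = fdot(r, q)
            b = np.sqrt(abs(d))         # subdiagonal entry
            c = d / b                   # superdiagonal entry
            T[j + 1, j], T[j, j + 1] = b, c
            Vs.append(r / b); Ws.append(q / c)
    return T

Tb = global_bilanczos(A, V, W, m)

# G_m^b(p) matches trace(W^T p(A) V) for p(t) = t^3 (degree 3 <= 2m - 1)
e1 = np.zeros(m); e1[0] = 1.0
exact = np.trace(W.T @ np.linalg.matrix_power(A, 3) @ V)
approx = fdot(V, W) * (e1 @ np.linalg.matrix_power(Tb, 3) @ e1)
```

In exact arithmetic the two quantities agree for every monomial of degree at most 2m − 1.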

Approximation of \(W^{T}f(A)V\) is well studied in the literature; see, e.g., [12, 21] and the references therein. For nonsymmetric matrices, Reichel et al. [21] approximate \(\mathcal {I}\) by performing m steps of the nonsymmetric block Lanczos method applied to A with initial block vectors W and V; this leads to the following algebraic relations:

$$ \begin{array}{rcl} A[V_{1},\ldots,V_{m}]&=&[V_{1},\ldots,V_{m}]J_{m}+V_{m+1}\varGamma_{m}{E^{T}_{m}},\\ A^{T}[W_{1},\ldots,W_{m}]&=&[W_{1},\ldots,W_{m}]{J^{T}_{m}}+W_{m+1}{\varDelta}_{m}{E^{T}_{m}}, \end{array} $$

where \(W_{j},V_{j}\in \mathbb {R}^{n\times s}\) are bi-orthonormal, i.e.,

$$ {W^{T}_{k}}V_{j} := \left\{\begin{array}{cc} I_{s} & j=k,\\ O_{s} & j\ne k. \end{array}\right. $$

Os denotes the zero matrix of order s. The matrix \(J_{m}\in \mathbb {R}^{ms\times ms}\) is a block nonsymmetric tridiagonal matrix with block entries of size s × s, \(\varGamma _{m},{\varDelta }_{m}\in \mathbb {R}^{s\times s}\), and Em is the matrix defined in (3). In [21], it was shown that \({\mathcal {G}^{\text {block}}_{m}(f)}\) defined by

$$ {\mathcal{G}^{\text{block}}_{m}(f)}:=\widetilde{E}^{T}_{1}f(J_{m})\widetilde{E}_{1}, $$

where \(\widetilde {E}_{1}\in \mathbb {R}^{m\times s}\) corresponds to the first s columns of the identity matrix Im, can be used to approximate \(\mathcal {I}\). Also \({\mathcal {G}^{\text {block}}_{m}}\) is exact for polynomials of degree at most 2m − 1, i.e.,

$$ \mathcal{I}(p)={\mathcal{G}^{\text{block}}_{m}(p)},\quad \forall p\in\varPi_{2m-1}. $$

In the context of approximating expressions of the form f(A)b for some vector \(b\in \mathbb {R}^{n}\), Druskin et al. [9, 19] computed approximations using the standard extended Krylov subspace generated by the vectors \(A^{-m}b,\ldots ,A^{-1}b,b,Ab,\ldots ,A^{m-1}b\).
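The benefit of mixing positive and negative powers can be sketched as follows. This is a simplified orthonormal-basis variant for a symmetric matrix (one Galerkin projection, not the bi-orthonormal process studied below); the matrix, sizes, and choice f = exp are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 80, 6  # subspace dimension is 2m

Q0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q0 @ np.diag(np.linspace(0.5, 3.0, n)) @ Q0.T   # SPD test matrix
b = rng.standard_normal(n)

# Raw extended Krylov vectors b, Ab, ..., A^{m-1}b, A^{-1}b, ..., A^{-m}b
cols, x = [], b.copy()
for _ in range(m):
    cols.append(x); x = A @ x
x = np.linalg.solve(A, b)
for _ in range(m):
    cols.append(x); x = np.linalg.solve(A, x)
Q = np.linalg.qr(np.column_stack(cols))[0]          # orthonormal, n x 2m

# Galerkin approximation of f(A)b for f = exp
H = Q.T @ A @ Q
lam, U = np.linalg.eigh(H)
approx = Q @ ((U * np.exp(lam)) @ (U.T @ (Q.T @ b)))

lamA, UA = np.linalg.eigh(A)
exact = (UA * np.exp(lamA)) @ (UA.T @ b)            # reference value
rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
```

Each step costs one multiplication by A and one solve with A, which is the same trade-off underlying the extended global process of Section 2.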

We are interested in exploring approximations of (1) by using a pair of F-biorthonormal bases from the two extended global Krylov subspaces

$$ \begin{array}{rl} \mathbb{K}_{m}^{e}(A,V)&=\text{span}\{V,A^{-1}V,AV,\ldots,A^{m-1}V,A^{-m}V\},\\ \mathbb{K}_{m}^{e}(A^{T},W)&=\text{span}\{W,A^{-T}W,A^{T}W,\ldots,A^{m-1,T}W,A^{-m,T}W\}. \end{array} $$
(5)

This paper is organized as follows. Section 2 introduces the extended nonsymmetric global Lanczos process for generating a pair of F-biorthonormal bases for the two extended global Krylov subspaces defined by (5). Section 3 describes the application of this process to the approximation of the two matrix functions given in (1) and gives some of their properties. Section 4 presents some numerical experiments that illustrate the quality of the computed approximations.

2 Some algebraic properties of the extended nonsymmetric global Lanczos process

2.1 Preliminaries and notations

We begin by recalling some notations and definitions that will be used throughout this paper. The Kronecker product satisfies the following properties:

  1.

    \((A\otimes B)(C\otimes D)=AC\otimes BD\).

  2.

    \((A\otimes B)^{T}=A^{T}\otimes B^{T}\).

Definition 1

Partition the matrices \(M=[M_{1},\ldots ,M_{p}]\in \mathbb {R}^{n\times ps}\) and \(N=[N_{1},\ldots ,N_{l}]\in \mathbb {R}^{n\times ls}\) into block columns \(M_{i},N_{j}\in \mathbb {R}^{n\times s}\), and define the ◇-product of the matrices M and N as

$$ M^{T}\diamond N=[\text{trace}({M^{T}_{i}}N_{j})]_{i=1,\ldots,p}^{j=1,\ldots,l}\in{\mathbb R}^{p\times l}. $$
(6)
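A direct numpy transcription of (6) (block sizes are illustrative, and `diamond` is a hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(4)
n, s, p, l = 20, 3, 4, 5  # illustrative sizes

def diamond(M, N, s):
    """M^T ◇ N: entry (i, j) is trace(M_i^T N_j) over n x s block columns."""
    p, l = M.shape[1] // s, N.shape[1] // s
    return np.array([[np.trace(M[:, i*s:(i+1)*s].T @ N[:, j*s:(j+1)*s])
                      for j in range(l)] for i in range(p)])

M = rng.standard_normal((n, p * s))
N = rng.standard_normal((n, l * s))
D = diamond(M, N, s)                    # a p x l matrix
```

Note that for s = 1 the ◇-product reduces to the ordinary matrix product \(M^{T}N\).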

The following proposition gives some properties satisfied by the above product.

Proposition 1

[5, 6] Let \(A,B,C\in \mathbb {R}^{n\times ps}\), \(D\in \mathbb {R}^{n\times n}\), \(L\in \mathbb {R}^{p\times p}\), and \(\alpha \in \mathbb {R}^{n\times n}\). Then, we have

  1.

    \((A+B)^{T}\diamond C=A^{T}\diamond C+B^{T}\diamond C\)

  2.

    \(A^{T}\diamond (B+C)=A^{T}\diamond B+A^{T}\diamond C\)

  3.

    \((\alpha A)^{T}\diamond C=\alpha (A^{T}\diamond C)\)

  4.

    \((A^{T}\diamond B)^{T}=B^{T}\diamond A\)

  5.

    \((DA)^{T}\diamond B=A^{T}\diamond (D^{T}B)\)

  6.

    \(A^{T}\diamond (B(L\otimes I_{s}))=(A^{T}\diamond B)L\)
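Properties 5 and 6 can be verified directly with a small helper implementing the ◇-product (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, s, p = 15, 2, 4  # illustrative sizes

def diamond(X, Y, s):
    """X^T ◇ Y for matrices partitioned into n x s block columns."""
    q, r = X.shape[1] // s, Y.shape[1] // s
    return np.einsum('nis,njs->ij',
                     X.reshape(X.shape[0], q, s), Y.reshape(Y.shape[0], r, s))

A = rng.standard_normal((n, p * s))
B = rng.standard_normal((n, p * s))
D = rng.standard_normal((n, n))
L = rng.standard_normal((p, p))

lhs5 = diamond(D @ A, B, s)                       # (DA)^T ◇ B
rhs5 = diamond(A, D.T @ B, s)                     # A^T ◇ (D^T B)
lhs6 = diamond(A, B @ np.kron(L, np.eye(s)), s)   # A^T ◇ (B (L ⊗ I_s))
rhs6 = diamond(A, B, s) @ L                       # (A^T ◇ B) L
```

Property 6 is the workhorse used in the proof of Proposition 2 below.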

Definition 2

Let X be an n × s matrix. The vectorization of the matrix X, denoted vec(X), is the ns × 1 column vector obtained by stacking the columns of the matrix X, i.e.,

$$\text{vec}(X)=[X_{1,1},\ldots,X_{n,1},X_{1,2},\ldots,X_{n,2},\ldots,X_{1,s},\ldots,X_{n,s}]^{T}.$$

vec satisfies the following properties:

  1.

    \(\text{vec}(MXN)=(N^{T}\otimes M)\text{vec}(X)\), \(\forall M\in \mathbb {R}^{p\times q},N\in \mathbb {R}^{r\times s},X\in \mathbb {R}^{q\times r}\).

  2.

    \(\text{vec}(A)^{T}\text{vec}(B)=\text{trace}(A^{T}B)\), for all matrices A and B of the same size.
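Both properties are one-liners in numpy, where stacking columns corresponds to Fortran-order reshaping (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
p, q, r, s = 4, 5, 6, 3  # illustrative sizes

def vec(X):
    return X.reshape(-1, order='F')     # stack the columns of X

M = rng.standard_normal((p, q))
X = rng.standard_normal((q, r))
N = rng.standard_normal((r, s))

lhs1 = vec(M @ X @ N)                   # vec(M X N)
rhs1 = np.kron(N.T, M) @ vec(X)         # (N^T ⊗ M) vec(X)

A = rng.standard_normal((p, q))
B = rng.standard_normal((p, q))
lhs2 = vec(A) @ vec(B)                  # vec(A)^T vec(B)
rhs2 = np.trace(A.T @ B)                # trace(A^T B)
```

Property 2 is exactly the identity linking the Frobenius inner product of blocks to the standard inner product of their vectorizations, used repeatedly in Section 2.2.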

Let M = [M1,…,Mm] and \(N=[N_{1},\ldots ,N_{m}]\in \mathbb {R}^{n\times ms}\), with \(M_{i},N_{i}\in \mathbb {R}^{n\times s}\). The following algorithm applied to the pair (M, N) allows us to obtain two F-biorthonormal matrices \(\mathbb {V},\mathbb {W}\in \mathbb {R}^{n\times ms}\).

[Algorithm 1: F-biorthonormalization of the pair (M, N); listing not reproduced]
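A minimal sketch of such an F-biorthonormalization (Gram–Schmidt with the Frobenius inner product; an illustrative reconstruction with illustrative sizes and no breakdown handling, not the authors' listing):

```python
import numpy as np

rng = np.random.default_rng(7)
n, s, m = 30, 2, 4  # illustrative sizes

M = rng.standard_normal((n, m * s))
N = rng.standard_normal((n, m * s))

def fdot(X, Y):
    return np.trace(Y.T @ X)            # <X, Y>_F

def global_biortho(M, N, s):
    """Gram-Schmidt-type F-biorthonormalization of the pair (M, N)."""
    m = M.shape[1] // s
    H, G = np.zeros((m, m)), np.zeros((m, m))
    Vs, Ws = [], []
    for j in range(m):
        vh, wh = M[:, j*s:(j+1)*s].copy(), N[:, j*s:(j+1)*s].copy()
        for i in range(j):
            H[i, j] = fdot(vh, Ws[i])   # h_{i,j} = <M_j, W_i>_F
            G[i, j] = fdot(wh, Vs[i])   # g_{i,j} = <N_j, V_i>_F
            vh = vh - H[i, j] * Vs[i]
            wh = wh - G[i, j] * Ws[i]
        d = fdot(vh, wh)
        H[j, j] = np.sqrt(abs(d))
        G[j, j] = d / H[j, j]           # so that <V_j, W_j>_F = 1
        Vs.append(vh / H[j, j]); Ws.append(wh / G[j, j])
    return np.hstack(Vs), np.hstack(Ws), H, G

Vb, Wb, H, G = global_biortho(M, N, s)
```

By construction H and G are upper triangular, and the factorizations of Proposition 2 hold.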

Proposition 2

Let M and N be the matrices defined above and let \(\mathbb {V}\) and \(\mathbb {W}\in \mathbb {R}^{n\times ms}\) be the corresponding matrices obtained from Algorithm 1. Then we have the following decompositions:

$$M=\mathbb{V}(H\otimes I_{s}), \quad N=\mathbb{W}(G\otimes I_{s}),$$

where \(H=[h_{i,j}]\) and \(G=[g_{i,j}]\) are the two m × m upper triangular matrices determined by Algorithm 1, satisfying

$$H=\mathbb{W}^{T}\diamond M \quad \text{and} \quad G=\mathbb{V}^{T}\diamond N.$$

Proof

From Algorithm 1, the block vectors \(M_{j}\) and \(N_{j}\) can be written as follows:

$$ M_{j}=\sum\limits_{i=1}^{j} h_{i,j}V_{i},\text{ and } N_{j}=\sum\limits_{i=1}^{j} g_{i,j}W_{i}, $$

and because hi, j and gi, j vanish for i > j, we obtain

$$ M_{j}=\sum\limits_{i=1}^{m} V_{i}(h_{i,j}\otimes I_{s})\text{ and } N_{j}=\sum\limits_{i=1}^{m} W_{i}(g_{i,j}\otimes I_{s}). $$

Then

$$M=\mathbb{V}(H\otimes I_{s}) \text{ and } N=\mathbb{W}(G\otimes I_{s}).$$

Using the properties of ◇-product, it follows that

$$ \begin{array}{@{}rcl@{}} H&=&(\mathbb{W}^{T}\diamond\mathbb{V})H=\mathbb{W}^{T}\diamond(\mathbb{V}(H\otimes I_{s}))=\mathbb{W}^{T}\diamond M.\\ G&=&(\mathbb{V}^{T}\diamond\mathbb{W})G=\mathbb{V}^{T}\diamond(\mathbb{W}(G\otimes I_{s}))=\mathbb{V}^{T}\diamond N. \end{array} $$

2.2 Description of the process

In this section, we recall the standard extended nonsymmetric Lanczos method given in [25]. Proofs in this section are formulated from results given in [25]. Let v and w be two vectors in \(\mathbb {R}^{n}\). The standard extended nonsymmetric Lanczos algorithm applied to the pairs (A, v) and (AT,w) determines two bi-orthonormal bases \(\mathbb {V}_{2m+2}=[v_{1},\ldots ,v_{2m+2}]\) and \(\mathbb {W}_{2m+2}=[w_{1},\ldots ,w_{2m+2}]\) of dimension n × (2m + 2), with \(v_{i},w_{i}\in \mathbb {R}^{n}\) (see [25, Algorithm 3]). The odd vectors \(v_{2j+1}\) and \(w_{2j+1}\) then satisfy the following equations:

$$ \begin{array}{ll} \alpha=\sqrt{|w^{T}v|}, v_{1}&=v/\alpha, \beta= w^{T}v/\alpha, w_{1}=w/\beta, \\ h_{2j+1,2j-1}v_{2j+1}&=Av_{2j-1}-\sum \limits_{{i=2j-3}}^{2j} h_{i,2j-1}v_{i}=\widehat{v}_{2j+1},\\ g_{2j+1,2j-1}w_{2j+1}&=A^{T}w_{2j-1}-\sum \limits_{{i=2j-3}}^{2j} g_{i,2j-1}w_{i}=\widehat{w}_{2j+1}, \end{array} j=1,\ldots,m. $$
(7)

where

$$ h_{i,2j-1}={w_{i}^{T}}Av_{2j-1} \text{ and } g_{i,2j-1}={v_{i}^{T}}A^{T}w_{2j-1}, \text{ for } i=1,\ldots,2j. $$

The coefficients \(h_{2j+1,2j-1}\) and \(g_{2j+1,2j-1}\) are chosen such that \(w_{2j+1}^{T}v_{2j+1}=1\). Hence,

$$ h_{2j+1,2j-1}=\sqrt{|\widehat{w}_{2j+1}^{T}\widehat{v}_{2j+1}|}, g_{2j+1,2j-1}=(\widehat{w}_{2j+1}^{T}\widehat{v}_{2j+1})/h_{2j+1,2j-1}. $$

Similarly, the even vectors \(v_{2j+2}\) and \(w_{2j+2}\) are computed by the following relations:

$$ \begin{array}{lll} \widehat{v}_{2}&=A^{-1}v-({w_{1}^{T}}A^{-1}v)v_{1}, &\widehat{w}_{2}=A^{-T}w-({v_{1}^{T}}A^{-T}w)w_{1}\\ \gamma&=\sqrt{|\widehat{w}_{2}^{T}\widehat{v}_{2}|}, v_{2}=\widehat{v}_{2}/\gamma, &\varGamma= \widehat{w}_{2}^{T}\widehat{v}_{2}/\gamma, w_{2}=\widehat{w}_{2}/\varGamma, \end{array} $$
$$ \begin{array}{ll} h_{2j+2,2j}v_{2j+2}&=A^{-1}v_{2j}-\sum\limits_{{i=2j-2}}^{2j+1} h_{i,2j}v_{i}=\widehat{v}_{2j+2},\\ g_{2j+2,2j}w_{2j+2}&=A^{-T}w_{2j}-\sum\limits_{{i=2j-2}}^{2j+1} g_{i,2j}w_{i}=\widehat{w}_{2j+2}, \end{array} j=1,\ldots,m. $$
(8)

where

$$ h_{i,2j}={w_{i}^{T}}A^{-1}v_{2j} \text{ and } g_{i,2j}={v_{i}^{T}}A^{-T}w_{2j}, \text{ for } i=1,\ldots,2j+1. $$

The coefficients \(h_{2j+2,2j}\) and \(g_{2j+2,2j}\) are chosen such that \(w_{2j+2}^{T}v_{2j+2}=1\). Hence,

$$ h_{2j+2,2j}=\sqrt{|\widehat{w}_{2j+2}^{T}\widehat{v}_{2j+2}|}, g_{2j+2,2j}=\widehat{w}_{2j+2}^{T}\widehat{v}_{2j+2}/h_{2j+2,2j}. $$
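The alternating pattern of (7) and (8) — one multiplication by A, then one solve with A — can be sketched as follows for the vector case. For numerical robustness this sketch biorthogonalizes each new pair against all previous vectors, which coincides with the short recurrences (7) and (8) in exact arithmetic; the matrix and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
n, m = 40, 3  # illustrative sizes

A = np.eye(n) + 0.3 * rng.standard_normal((n, n)) / np.sqrt(n)
v, w = rng.standard_normal(n), rng.standard_normal(n)

def biorth_pair(rv, rw, Vs, Ws):
    """Biorthogonalize rv against Ws and rw against Vs, then normalize."""
    for x, y in zip(Vs, Ws):
        rv = rv - (y @ rv) * x
        rw = rw - (x @ rw) * y
    d = rw @ rv
    h = np.sqrt(abs(d))                 # h-coefficient; the g-coefficient is d / h
    return rv / h, rw / (d / h)

Vs, Ws = [], []
v1, w1 = biorth_pair(v, w, Vs, Ws)      # v1 = v/alpha, w1 = w/beta
Vs.append(v1); Ws.append(w1)
v2, w2 = biorth_pair(np.linalg.solve(A, v), np.linalg.solve(A.T, w), Vs, Ws)
Vs.append(v2); Ws.append(w2)
for j in range(m):                      # odd step with A, even step with A^{-1}
    nv, nw = biorth_pair(A @ Vs[2*j], A.T @ Ws[2*j], Vs, Ws)
    Vs.append(nv); Ws.append(nw)
    nv, nw = biorth_pair(np.linalg.solve(A, Vs[2*j + 1]),
                         np.linalg.solve(A.T, Ws[2*j + 1]), Vs, Ws)
    Vs.append(nv); Ws.append(nw)
```

The loop produces the 2m + 2 bi-orthonormal vectors described above.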

In the case where V and W are two blocks of size n × s, the implementation of the extended nonsymmetric global Lanczos algorithm applied to (A, V ) and (AT,W) is similar to the standard algorithm [25, Algorithm 3], except that the standard inner product is replaced by the Frobenius inner product 〈⋅,⋅〉F. This algorithm provides two F-biorthonormal bases \(\mathbb {V}_{2m+2}=[v_{1},\ldots ,v_{2m+2}]\) and \(\mathbb {W}_{2m+2}=[w_{1},\ldots ,w_{2m+2}]\) of dimension n × 2(m + 1)s, where \(v_{i},w_{i}\in \mathbb {R}^{n\times s}\) are the i th block vectors of \(\mathbb {V}_{2m+2}\) and \(\mathbb {W}_{2m+2}\), respectively. In addition, the block vectors vi and wi satisfy the relations (7) and (8). Following the idea in [24], the column vectors of the bases \(\mathbb {V}_{2m+2}\) and \(\mathbb {W}_{2m+2}\) can be computed two blocks at a time as follows. Introduce

$$V_{j}=[v_{2j-1} , v_{2j}],\quad W_{j}=[w_{2j-1} , w_{2j}],$$

and

$$ H_{i,j}=\left[\begin{array}{ll} h_{2i-1,2j-1} & h_{2i-1,2j} \\ h_{2i,2j-1} & h_{2i,2j} \end{array}\right],\quad G_{i,j}=\left[\begin{array}{ll} g_{2i-1,2j-1} & g_{2i-1,2j} \\ g_{2i,2j-1} & g_{2i,2j} \end{array}\right], $$
(9)

where Vj and Wj are the j th n × 2s block vectors of the matrices \(\mathbb {V}_{2m+2}=[V_{1},\ldots ,V_{m+1}]\) and \(\mathbb {W}_{2m+2}=[W_{1},\ldots ,W_{m+1}]\), respectively; Hi, j and Gi, j are 2 × 2 matrices. The block vectors Vj and Wj are such that

$$ [V , A^{-1}V]=V_{1}(H_{0} \ \otimes I_{s}), \quad [W , A^{-T}W]=W_{1}(G_{0}\ \otimes I_{s}), $$

where H0 and \(G_{0}\in \mathbb {R}^{2\times 2}\) are computed by applying Algorithm 1 to [V, A− 1V ] and \([W,A^{-T}W]\in \mathbb {R}^{n\times 2s}\). And

$$ \begin{array}{@{}rcl@{}} V_{j+1}(H_{j+1,j}\otimes I_{s})&=&[Av_{2j-1},A^{-1}v_{2j}]-V_{j-1}(H_{j-1,j}\otimes I_{s})-V_{j}(H_{j,j}\otimes I_{s}),\\ W_{j+1}(G_{j+1,j}\otimes I_{s})&=&[A^{T}w_{2j-1},A^{-T}w_{2j}] - W_{j-1}(G_{j-1,j}\!\otimes\! I_{s}) - W_{j}(G_{j,j}\!\otimes\! I_{s}). \end{array} $$

Set

$$ H_{0}=\left( \begin{array}{ll} \alpha_{1,1} & \alpha_{1,2}\\ 0 & \alpha_{2,2} \end{array}\right)\quad G_{0}=\left( \begin{array}{ll} \beta_{1,1} & \beta_{1,2}\\ 0 & \beta_{2,2} \end{array}\right). $$
(10)

As the n × 2s blocks V1,…,Vm and W1,…,Wm are F-biorthonormal (with respect to the Frobenius inner product), by using the properties of the ◇-product it follows that

$$ H_{i,j}={W^{T}_{i}}\diamond [Av_{2j-1} , A^{-1}v_{2j}] \text{ and } G_{i,j}={V^{T}_{i}}\diamond [A^{T}w_{2j-1} , A^{-T}w_{2j}]. $$

The extended nonsymmetric global Lanczos process is summarized in the following algorithm (Algorithm 2).

Proposition 3

Let v1,v2,…,v2m+ 2 and w1,w2,…,w2m+ 2 be the F-biorthonormal block vectors determined by Algorithm 2. Then

$$ \begin{array}{ccc} v_{2j-1}&=&p^{V}_{j-1}(A)V+q^{V}_{j-1}(A^{-1})V,\\ v_{2j}&=&r^{V}_{j-1}(A^{-1})A^{-1}V+s^{V}_{j-1}(A)V, \end{array} $$

where \(\deg (p^{V}_{j-1})=\deg (r^{V}_{j-1})=j-1\), \(\deg (q^{V}_{j-1})\le j-1\), and \(\deg (s^{V}_{j-1})\le j-1\). These relations also hold when V and A are replaced by W and AT, respectively.

Proof

The result follows easily from Algorithm 2. □

We note that in the previous proposition, the block vectors [V1,…,Vm+ 1] and [W1,…,Wm+ 1] form an F-biorthonormal basis of \(\mathbb {K}^{e}_{m+1}(A,V)\) and \(\mathbb {K}^e_{m+1}(A^T,W)\), respectively. When inverting A or solving linear systems via the LU decomposition is impractical, matrix products of the form \(A^{-1}V\) and \(A^{-T}W\) can be computed by an efficient iterative linear solver such as the well-known GMRES method.

[Algorithm 2: the extended nonsymmetric global Lanczos process; listing not reproduced]

We now introduce the 2m × 2m matrices T2m and S2m, given by

$$ T_{2m}=\mathbb{W}_{2m}^{T}\diamond A\mathbb{V}_{2m},\quad S_{2m}=\mathbb{W}_{2m}^{T}\diamond A^{-1}\mathbb{V}_{2m}, $$
(11)

where \(\mathbb {V}_{2m}=[V_1,\ldots ,V_m]\) and \(\mathbb {W}_{2m}=[W_1,\ldots ,W_m]\). An important result concerning the connection between the proposed method and the standard extended nonsymmetric Lanczos method is given by the following theorem.

Theorem 1

Let V, W be the initial given block vectors of \(\mathbb {R}^{n\times s}\). Set \(\mathcal {A}=I_s\otimes A\), v = vec(V ), and \(w=\text{vec}(W)\in \mathbb {R}^{ns}\), and perform m steps of the standard extended nonsymmetric Lanczos algorithm applied to the pairs \((\mathcal {A},v)\) and \((\mathcal {A}^T,w)\). Then we obtain the same F-biorthonormal bases \(\mathbb {V}_{2m+2}\), \(\mathbb {W}_{2m+2}\) as the ones determined by the extended nonsymmetric global Lanczos algorithm. Moreover, these bases satisfy the following relations:

$$ \begin{array}{@{}rcl@{}} A\mathbb{V}_{2m}&=&\mathbb{V}_{2m}(T_{2m}\otimes I_{s})+v_{2m+1} \left( \left[\begin{array}{ll} t_{2m+1,2m-1}, & t_{2m+1,2m} \end{array}\right] {E^{T}_{m}}\otimes I_{s}\right), \end{array} $$
(12)
$$ \begin{array}{@{}rcl@{}} A^{-1}\mathbb{V}_{2m}&=&\mathbb{V}_{2m}(S_{2m}\otimes I_{s})+ V_{m+1}\left( \left[\begin{array}{ll} 0 & s_{2m+1,2m}\\0 & s_{2m+2,2m} \end{array}\right] {E^{T}_{m}}\otimes I_{s}\right),\\ A^{T}\mathbb{W}_{2m}&=&\mathbb{W}_{2m}(T^{T}_{2m}\otimes I_{s})+w_{2m+1} \left( \left[\begin{array}{ll} \widetilde{t}_{2m+1,2m-1}, & \widetilde{t}_{2m+1,2m} \end{array}\right] {E^{T}_{m}}\!\otimes\! I_{s}\right), \end{array} $$
(13)
$$ \begin{array}{@{}rcl@{}} A^{-T}\mathbb{W}_{2m}&=&\mathbb{W}_{2m}(S^{T}_{2m}\otimes I_{s})+ W_{m+1}\left( \left[\begin{array}{ll} 0 & \widetilde{s}_{2m+1,2m}\\ 0 & \widetilde{s}_{2m+2,2m} \end{array}\right] {E^{T}_{m}}\otimes I_{s}\right), \end{array} $$

where \([\widetilde {t}_{2m+1,2m-1},\widetilde {t}_{2m+1,2m}]=[\langle A^Tw_{2m-1},v_{2m+1}\rangle _F,\langle A^Tw_{2m},v_{2m+1}\rangle _F]\), and \([\widetilde {s}_{2m+1,2m},\widetilde {s}_{2m+2,2m}]=[\langle A^{-T}w_{2m},v_{2m+1}\rangle _F,\langle A^{-T}w_{2m},v_{2m+2}\rangle _F]\).

\(E_m=[e_{2m-1},e_{2m}]\in \mathbb {R}^{2m\times 2}\) corresponds to the last two columns of the identity matrix I2m.

Proof

Performing m steps of the extended Lanczos algorithm applied to the pairs \((\mathcal {A},v)\) and \((\mathcal {A}^T,w)\) yields two ns × (2m + 2) bi-orthonormal bases \(\mathcal {V}_{2m+2}=[x_1,\ldots ,x_{2m+2}]\) and \(\mathcal {W}_{2m+2}=[y_1,\ldots ,y_{2m+2}]\). These bases satisfy the relations (7) and (8) for the matrix \(\mathcal {A}\). Setting \(X_j,Y_j\in \mathbb {R}^{n\times s}\) such that xj = vec(Xj) and yj = vec(Yj), we obtain

$$ \text{trace}({Y_{i}^{T}}X_{j})=vec(Y_{i})^{T}\text{vec}(X_{j})={y^{T}_{i}}x_{j}=\delta_{i,j}, $$

which means that Xj,Yj are F-biorthonormal. In addition, it follows that Xj = Vj and Yj = Wj since

$$ \begin{array}{@{}rcl@{}} h_{i,2j-1}&=&{y^{T}_{i}}\mathcal{A}x_{2j-1}=\text{vec}(Y_{i})^{T}(I_{s}\otimes A)\text{vec}(X_{2j-1})\\ &=&\text{vec}(Y_{i})^{T}\text{vec}(AX_{2j-1})=\langle AX_{2j-1},Y_{i}\rangle_{F},\\ g_{i,2j-1}&=&{x^{T}_{i}}\mathcal{A}^{T}y_{2j-1}=\langle A^{T}Y_{2j-1},X_{i}\rangle_{F},\\ h_{i,2j}&=&{y^{T}_{i}}\mathcal{A}^{-1}x_{2j}=\langle A^{-1}X_{2j},Y_{i}\rangle_{F},\\ g_{i,2j}&=&{x^{T}_{i}}\mathcal{A}^{-T}y_{2j}=\langle A^{-T}Y_{2j},X_{i}\rangle_{F}. \end{array} $$

Using the same techniques as above, we obtain that \(T_{2m}=\mathcal {W}^T_{2m}\mathcal {A}\mathcal {V}_{2m}\) and \(S_{2m}=\mathcal {W}^T_{2m}\mathcal {A}^{-1}\mathcal {V}_{2m}\). Let us prove the relation (12). Using [25, Lemma 3.1], it holds that

$$ \mathcal{A}\mathcal{V}_{2m}= \mathcal{V}_{2m}\mathcal{T}_{2m}+x_{2m+1}[t_{2m+1,2m-1},t_{2m+1,2m}]{E^{T}_{m}}, $$

where \(\mathcal {T}_{2m}=\mathcal {W}^T_{2m}\mathcal {A}\mathcal {V}_{2m}=T_{2m}\).

On the other hand,

$$ \begin{array}{@{}rcl@{}} \mathcal{A}\mathcal{V}_{2m}&=&(I_{s}\otimes A)[vec(v_{1}),\ldots,vec(v_{2m})], \\ &=&[vec(Av_{1}),\ldots, vec(Av_{2m})], \end{array} $$

while

$$ \begin{array}{@{}rcl@{}} \mathcal{V}_{2m}T_{2m}&=&\left[\begin{array}{l} \sum\limits_{i=1}^{2m}[T_{2m}e_{1}]_{i} \text{vec}(v_{i}),\ldots,\sum\limits_{i=1}^{2m}[T_{2m}e_{2m}]_{i} \text{vec}(v_{i}) \end{array}\right]\\ &=&[vec(\mathbb{V}_{2m}(T_{2m}e_{1}\otimes I_{s})),\ldots,vec(\mathbb{V}_{2m}(T_{2m}e_{2m}\otimes I_{s}))]. \end{array} $$

Define \(\tau =[t_{2m+1,2m-1},t_{2m+1,2m}]E^T_m\), then

$$ x_{2m+1}\tau=[vec(t_{2m+1,2m-1}v_{2m+1}),vec(t_{2m+1,2m}v_{2m+1})]{E^{T}_{m}}. $$

For i = 1,…,2m − 2, one shows that

$$ \text{vec}(Av_{i})=\text{vec} (\mathbb{V}_{2m}(T_{2m}e_{i}\otimes I_{s})) $$

and

$$ \begin{array}{@{}rcl@{}} \text{vec}(Av_{2m-1})&=& \text{vec} (\mathbb{V}_{2m}(T_{2m}e_{2m-1}\otimes I_{s}))+\text{vec}(v_{2m+1}(t_{2m+1,2m-1}\otimes I_{s})),\\ \text{vec}(Av_{2m})&=& \text{vec} (\mathbb{V}_{2m}(T_{2m}e_{2m}\otimes I_{s}))+\text{vec}(v_{2m+1}(t_{2m+1,2m}\otimes I_{s})). \end{array} $$

This completes the proof of (12). The proof of the other relations is similar. □

The next two propositions express the entries of T2m and S2m in terms of the recursion coefficients. This will allow us to compute the entries quite efficiently.

Proposition 4

Let the coefficients \(h_{i,j}\) and \(\alpha_{i,j}\) be as defined in (9) and (10), respectively. The matrix \(T_{2m}=[t_{i,j}]\) in (11) is pentadiagonal with the nontrivial entries

$$ \begin{array}{@{}rcl@{}} t_{i,2j-1}&=&h_{i,2j-1}\quad \text{for } i\in\{2j-3,\ldots,2j+1\},\\ t_{1,2}&=&\frac{1}{\alpha_{2,2}}(\alpha_{1,1}-\alpha_{1,2}h_{1,1}),\\ t_{2,2}&=&\frac{-1}{\alpha_{2,2}}\alpha_{1,2}h_{2,1},\\ t_{3,2}&=&\frac{-1}{\alpha_{2,2}}\alpha_{1,2}h_{3,1}. \end{array} $$

For j = 1,2,…,m − 1,

$$ \begin{array}{@{}rcl@{}} t_{2j+1,2j+2}&=&\frac{-1}{h_{2j+2,2j}}t_{2j+1,2j-1:2j+1}h_{2j-1:2j+1,2j},\\ t_{2j+2,2j+2}&=&\frac{-1}{h_{2j+2,2j}}t_{2j+2,2j+1}h_{2j+1,2j},\\ t_{2j+3,2j+2}&=&\frac{-1}{h_{2j+2,2j}}t_{2j+3,2j+1}h_{2j+1,2j}. \end{array} $$

Proof

The proof is similar to the one given in [25, Lemma 3.3]. □

Proposition 5

Let the coefficients \(h_{i,j}\) be as defined in (9). The matrix \(S_{2m}=[s_{i,j}]\) in (11) is also pentadiagonal, with the nontrivial entries, for j = 1,…, m − 1,

$$ \begin{array}{@{}rcl@{}} s_{i,2j}&=&h_{i,2j}\quad \text{for } i\in\{2j-2,\ldots,2j+2\},\\ s_{2j,2j+1}&=&\frac{-1}{h_{2j+1,2j-1}}s_{2j,2j-2:2j+1}h_{2j-2:2j+1,2j-1},\\ s_{2j+1,2j+1}&=&\frac{-1}{h_{2j+1,2j-1}}[s_{2j+1,2j}h_{2j,2j-1}+s_{2j+1,2j-1}h_{2j+1,2j-1}],\\ s_{2j+2,2j+1}&=&\frac{-1}{h_{2j+1,2j-1}}[s_{2j+2,2j}h_{2j,2j-1}+s_{2j+2,2j+1}h_{2j+1,2j+1}]. \end{array} $$

Proof

The proof is analogous to the proof of Proposition 4. □

The following result relates positive powers of S2m to negative powers of T2m.

Lemma 1

Let T2m and S2m be as given by (11), and let \(e_1=[1,0,\ldots ,0]^T\in \mathbb {R}^{2m}\). Then

$$ S_{2m}^{j}e_{1}=T_{2m}^{-j}e_{1}\quad \text{for}~~ j=1,2,\ldots,m. $$

Lemma 2

Let the matrices T2m and S2m be as defined by (11), and let \(\mathbb {V}_{2m}=[v_1,v_2,\ldots ,v_{2m}]\) be the matrix given in (12). Then we have

$$ \begin{array}{@{}rcl@{}} A^{j}v_{1}&=&\mathbb{V}_{2m}(T^{j}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m-1, \end{array} $$
(14)
$$ \begin{array}{@{}rcl@{}} A^{-j}v_{1}&=&\mathbb{V}_{2m}(S^{j}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m, \end{array} $$
(15)
$$ \begin{array}{@{}rcl@{}} A^{-j}v_{1}&=&\mathbb{V}_{2m}(T^{-j}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m. \end{array} $$
(16)

Proof

It was shown in [25] that when performing m steps of standard extended nonsymmetric Lanczos method to the pairs \((\mathcal {A},v)\) and \((\mathcal {A}^T,w)\), it holds that

$$ \mathcal{A}^{j}x_{1}=\mathcal{V}_{2m}\mathcal{T}^{j}_{2m}e_{1},\quad j=0,\ldots,m-1, $$

by using the properties of the ⊗-product and the “vec” operation, we obtain

$$ \mathcal{A}^{j}x_{1}=\text{vec}(A^{j}v_{1}) \text{ and } \mathcal{V}_{2m}\mathcal{T}^{j}_{2m}e_{1}=\sum\limits_{i=1}^{2m}[T^{j}_{2m}e_{1}]_{i}\text{vec}(v_{i})=\text{vec}(\mathbb{V}_{2m}(T^{j}_{2m}e_{1}\otimes I_{s})), $$

which proves (14). Using the same techniques as above, we can prove (15). Finally, (16) follows from (15) and Lemma 1. □

Lemma 3

Let the matrices T2m and S2m be as defined by (11), and let \(\mathbb {W}_{2m}=[w_1,w_2,\ldots ,w_{2m}]\) be the matrix given in (13). Then we have

$$ \begin{array}{@{}rcl@{}} A^{j,T}w_{1}&=&\mathbb{W}_{2m}(T^{j,T}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m-1,\\ A^{-j,T}w_{1}&=&\mathbb{W}_{2m}(S^{j,T}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m,\\ A^{-j,T}w_{1}&=&\mathbb{W}_{2m}(T^{-j,T}_{2m}e_{1}\otimes I_{s}),\quad j=0,1,\ldots,m. \end{array} $$

Proof

The proof is similar to the one of Lemma 2. □

3 Application to the approximation of matrix functions

3.1 Computing the matrix function trace(WTf(A)V )

The expression \(\mathcal {I}_{tr}\) defined in (1) can be approximated by

$$ \begin{array}{@{}rcl@{}} \mathcal{G}^{e}_{2m}(f):=\langle W,V \rangle_{F}{e^{T}_{1}}f(T_{2m})e_{1} \end{array} $$
(17)

Lemma 4

Let the matrices T2m and S2m be as defined by (11), and let v1 and w1 be the initial block vectors computed by Algorithm 2. Then the following equalities hold:

$$ \begin{array}{@{}rcl@{}} \text{trace}({w^{T}_{1}}A^{j}v_{1})&=&{e^{T}_{1}}T^{j}_{2m}e_{1}\quad\text{for } j=0,1,\ldots,2m-1,\\ \text{trace}({w^{T}_{1}}A^{-j}v_{1})&=&{e^{T}_{1}}T^{-j}_{2m}e_{1}\quad\text{for } j=0,1,\ldots,2m. \end{array} $$
(18)

Proof

We have \(\text{trace}(w_1^TA^jv_1)=\text{vec}(w_1)^T\text{vec}(A^jv_1)\). From Theorem 1, we get y1 = vec(w1) and x1 = vec(v1). By using the properties of the “vec” operation, it follows that \(\text{trace}(w_1^TA^jv_1)=y^T_1\mathcal {A}^jx_1\). Then, using the result given in [25, Lemma 4.1], we obtain \(y^T_1\mathcal {A}^jx_1=e^T_1\mathcal {T}^{j}_{2m}e_1\). This shows the first equality of (18), since \(\mathcal {T}_{2m}=T_{2m}\). The second equality can be established by using the same techniques as above. □

Theorem 2

After m steps of Algorithm 2 with initial block vectors \(V,W\in \mathbb {R}^{n\times s}\), we get

$$ \text{trace}(W^{T}p(A)V)= \langle V,W \rangle_{F} {e^{T}_{1}}p(T_{2m})e_{1}\quad \forall p\in{\varDelta}_{-2m,2m-1}, $$

where

$$ {\varDelta}_{-2m,2m-1}=\text{span}\{x^{-2m},\ldots,x^{-1},1,x,\ldots,x^{2m-1}\}. $$

Proof

From Algorithm 2, V and W are collinear with V1 and W1, respectively, i.e., V = αV1 and W = βW1 with \(\alpha\beta=\langle V,W\rangle_F\). This implies that \(\text{trace}(W^Tp(A)V)=\langle V,W\rangle _F \text{trace}(W_1^Tp(A)V_1)\). Using the results of Lemma 4, we get \(\text{trace}(W_1^Tp(A)V_1)=e^T_1p(T_{2m})e_1\). This completes the proof. □
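Theorem 2 can be checked numerically. The sketch below takes s = 1 (so the F-inner product reduces to the ordinary one), builds F-biorthonormal bases of the two extended Krylov subspaces by explicit two-sided Gram–Schmidt over the raw columns v, A^{-1}v, Av, A^{-2}v, …, and forms T_2m as in (11); the matrix and sizes are illustrative, and the explicit orthogonalization replaces the short recurrences of Algorithm 2 for clarity:

```python
import numpy as np

rng = np.random.default_rng(9)
n, m = 50, 3  # illustrative sizes

A = np.eye(n) + 0.4 * rng.standard_normal((n, n)) / np.sqrt(n)
v, w = rng.standard_normal(n), rng.standard_normal(n)

def raw_cols(A, x, m):
    """Columns x, A^{-1}x, Ax, A^{-2}x, A^2 x, ... (2m columns in total)."""
    cols, xp, xm = [x], x, x
    for j in range(1, m + 1):
        xm = np.linalg.solve(A, xm); cols.append(xm)
        if j < m:
            xp = A @ xp; cols.append(xp)
    return np.column_stack(cols)

Kv, Kw = raw_cols(A, v, m), raw_cols(A.T, w, m)
V2m, W2m = np.zeros((n, 2 * m)), np.zeros((n, 2 * m))
for j in range(2 * m):
    rv, rw = Kv[:, j].copy(), Kw[:, j].copy()
    for i in range(j):
        rv -= (W2m[:, i] @ rv) * V2m[:, i]
        rw -= (V2m[:, i] @ rw) * W2m[:, i]
    d = rw @ rv
    V2m[:, j], W2m[:, j] = rv / np.sqrt(abs(d)), rw / (d / np.sqrt(abs(d)))

T2m = W2m.T @ A @ V2m           # the projected matrix of (11) for s = 1
```

Both sides of the identity of Theorem 2 then agree for every Laurent monomial \(x^{j}\) with −2m ≤ j ≤ 2m − 1.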

[Algorithm 3: listing not reproduced]

3.2 Computing the matrix function WTf(A)V

The aim of this subsection is to show how to use the extended nonsymmetric global Lanczos algorithm to approximate \(\mathcal {I}\) in (1) (see Algorithm 2). Before describing the application of the proposed method, we notice that the extended nonsymmetric block Lanczos method (ENBL) given in [2] can also be used to approximate \(\mathcal {I}\). After m steps of this algorithm applied to the pairs (A, V ) and (AT,W), we obtain two n × 2ms bi-orthonormal bases \({\mathcal{V}}_{2m}\) and \({\mathcal{W}}_{2m}\), i.e., \({\mathcal{W}}^T_{2m}{\mathcal{V}}_{2m}=I_{2ms}\). Then the expression \(\mathcal {I}\) can be approximated by \(W^T({\mathcal{V}}_{2m}f({\mathcal{T}}_{2m}){\mathcal{W}}^T_{2m})V\), where \({\mathcal{T}}_{2m}\) is a 2ms × 2ms block tridiagonal matrix with 2s × 2s blocks. The matrix \({\mathcal{T}}_{2m}={\mathcal{W}}^T_{2m}A{\mathcal{V}}_{2m}\) is computed recursively without requiring additional matrix-vector products with A (see [2, Proposition 3]). The ENBL algorithm becomes expensive as the number m of iterations increases, and also for large values of s.

Now, let us come back to our proposed method. Applying Algorithm 2 to the pairs (A, V ) and (AT,W) allows us to obtain two F-biorthonormal bases \(\mathbb {V}_{2m}\) and \(\mathbb {W}_{2m}\) such that V = αv1 and W = βw1. It is clear that \(\mathbb {W}_{2m}^+\mathbb {V}_{2m}=I_{2ms}\), and we can then consider the projector \(\mathcal {P}_{2m}\) defined as

$$ \begin{array}{cccc} \mathcal{P}_{2m}:& \mathbb{R}^{n\times s}&\longrightarrow &\mathbb{K}^{e}_{m}(A,V)\\ &X &\longmapsto &\mathbb{V}_{2m}\mathbb{W}_{2m}^{+}X. \end{array} $$
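The defining property \(\mathbb{W}_{2m}^{+}\mathbb{V}_{2m}=I_{2ms}\) makes \(\mathcal{P}_{2m}\) an oblique projector onto the range of \(\mathbb{V}_{2m}\). A generic sketch with synthetic factors (sizes are illustrative, and V is constructed from W so that the pseudoinverse identity holds):

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 30, 8                     # k plays the role of 2ms

W = rng.standard_normal((n, k))
Z = rng.standard_normal((n, k))
Z = Z - W @ np.linalg.solve(W.T @ W, W.T @ Z)   # make W^T Z = 0
V = W + Z                        # then pinv(W) @ V = I_k

P = V @ np.linalg.pinv(W)        # X -> V W^+ X, an oblique projector
```

P is idempotent and leaves every element of range(V) fixed, which is exactly what is used when reducing \(\mathcal{I}\) below.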

Applying the projector \(\mathcal {P}_{2m}\) to \(\mathcal {I}\) gives the reduced matrix function

$$ \mathcal{B}^{e}_{2m}(f)=C^{T}_{2m}f(A_{2m})B_{2m}, $$
(19)

where \(A_{2m}=\mathbb {W}^+_{2m}A\mathbb {V}_{2m}\in \mathbb {R}^{2ms\times 2ms}\), \(B_{2m}=\mathbb {W}^+_{2m}V\in \mathbb {R}^{2ms\times s}\) and \(C_{2m}=\mathbb {V}^T_{2m}W\in \mathbb {R}^{2ms\times s}\).

The next proposition allows us to compute A2m and B2m from the recursion matrix T2m without requiring the computation of \(\mathbb {W}^+_{2m}\) and \(A\mathbb {V}_{2m}\).

Proposition 6

Let V, W be the initial block vectors, where V = αv1. Then the matrices A2m and B2m defined above can be computed as follows:

$$ \begin{array}{lll} A_{2m}&=&(T_{2m}\otimes I_{s})+\mathbb{W}^{+}_{2m}v_{2m+1}(\left[\begin{array}{ll} t_{2m+1,2m-1},& t_{2m+1,2m} \end{array}\right]{E^{T}_{m}}\otimes I_{s}),\\ B_{2m}&=&\alpha \mathcal{E}_{1}, \end{array} $$

where \(\left [\begin{array}{ll} t_{2m+1,2m-1},& t_{2m+1,2m} \end{array}\right ]\) is defined by (12), and \({\mathcal{E}}_1\) corresponds to the first s columns of the identity matrix \(I_{2ms}\).

Proof

Let \(E_j=[e_{2j-1},e_{2j}]\in \mathbb {R}^{2m\times 2}\), for j = 1,…,m. Multiplying the matrix A2m from the right by \(E_{j}\otimes I_{s}\) gives

$$A_{2m}(E_{j}\otimes I_{s})=\mathbb{W}^{+}_{2m}A\mathbb{V}_{2m}(E_{j}\otimes I_{s}).$$

Using (12), we get

$$ A_{2m}(E_{j}\otimes I_{s})=(T_{2m}E_{j}\otimes I_{s})+\mathbb{W}_{2m}^{+}v_{2m+1}\left( \left[\begin{array}{ll} t_{2m+1,2m-1},& t_{2m+1,2m} \end{array}\right]{E^{T}_{m}}E_{j}\!\otimes\! I_{s}\right). $$

It follows that, for j = 1,…,m − 1,

$$A_{2m}(E_{j}\otimes I_{s})=T_{2m}E_{j}\otimes I_{s},$$

while for j = m, it results that

$$ A_{2m}(E_{m}\otimes I_{s})=(T_{2m}E_{m}\otimes I_{s})+\mathbb{W}_{2m}^{+}v_{2m+1}\left( \left[\begin{array}{ll} t_{2m+1,2m-1},& t_{2m+1,2m} \end{array}\right]\otimes I_{s}\right). $$

Therefore,

$$ A_{2m}=(T_{2m}\otimes I_{s})+\mathbb{W}^{+}_{2m}v_{2m+1}\left( \left[\begin{array}{ll} t_{2m+1,2m-1},& t_{2m+1,2m} \end{array}\right]{E^{T}_{m}}\otimes I_{s}\right). $$

To show the expression of B2m, we use the fact that V = αv1, which implies that

$$ \begin{array}{@{}rcl@{}} B_{2m}&=&\mathbb{W}_{2m}^{+}V=\alpha \mathbb{W}_{2m}^{+}v_{1}\\ &=&\alpha \mathbb{W}_{2m}^{+}\mathbb{V}_{2m}\mathcal{E}_{1}=\alpha \mathcal{E}_{1}. \end{array} $$

Lemma 5

Let A2m be the matrix defined in (19), and let \(\mathbb {V}_{2m}=[v_1,\ldots ,v_{2m}]\) be defined by (12). Then

$$ \begin{array}{@{}rcl@{}} A^{j}v_{1}&=&\mathbb{V}_{2m}A^{j}_{2m}\mathcal{E}_{1}\quad \text{for } j=0,\ldots,m-1,\\ A^{-j}v_{1}&=&\mathbb{V}_{2m}A^{-j}_{2m}\mathcal{E}_{1}\quad \text{for } j=0,\ldots,m. \end{array} $$
(20)

Proof

The first equation is shown by induction. Av1 and v1 are two elements of \(\mathbb {K}^e_m(A,v_1)\) and then

$$ Av_{1}=\mathbb{V}_{2m}\mathbb{W}^{+}_{2m}Av_{1}=\mathbb{V}_{2m}\mathbb{W}^{+}_{2m}A\mathbb{V}_{2m}\mathbb{W}^{+}_{2m}v_{1}=\mathbb{V}_{2m}A_{2m}\mathcal{E}_{1}. $$

The equality is true for j = 1. Let j = 2,…,m − 1, and assume that

$$ A^{k}v_{1}=\mathbb{V}_{2m}A^{k}_{2m}\mathcal{E}_{1} \quad k=1,\ldots,j-1. $$

Since \(A^jv_1\in \mathbb {K}^e_m(A,v_1)\), it follows that \(A^jv_1=\mathbb {V}_{2m}\mathbb {W}^+_{2m}A^jv_1\). By induction, we have

$$ A^{j}v_{1}=\mathbb{V}_{2m}\mathbb{W}^{+}_{2m}A\mathbb{V}_{2m}A^{j-1}_{2m}\mathcal{E}_{1},$$

and since \(\mathbb {W}^{+}_{2m}A\mathbb {V}_{2m}=A_{2m}\), this yields \(A^{j}v_{1}=\mathbb {V}_{2m}A_{2m}A^{j-1}_{2m}\mathcal {E}_{1}=\mathbb {V}_{2m}A^{j}_{2m}\mathcal {E}_{1}\), which completes the proof of the first equation in (20).

The second equation follows from the fact that \(\mathbb {V}_{2m}\mathbb {W}^+_{2m}A^{-j}v_1=A^{-j}v_1\) for j = 1,…,m, together with the same induction argument as above. □

Lemma 6

Let A2m be the matrix defined by (19) and let \(\mathbb {W}_{2m}=[ w_1,\ldots ,w_{2m}]\) be the matrix in (13). Then

$$ \begin{array}{@{}rcl@{}} A^{j,T}w_{1}&=&\mathbb{W}^{+,T}_{2m}A^{j,T}_{2m}\mathbb{V}^{T}_{2m}w_{1}\quad \text{for } j=0,\ldots,m-1,\\ A^{-j,T}w_{1}&=&\mathbb{W}^{+,T}_{2m}A^{-j,T}_{2m}\mathbb{V}^{T}_{2m}w_{1}\quad \text{for } j=0,\ldots,m. \end{array} $$

Proof

We have \(A^{j,T}w_1\in \mathbb {K}^{e}_{m}(A^T,W)\), which means that \(\mathbb {W}^{+,T}_{2m}\mathbb {V}^T_{2m}A^{j,T}w_1=A^{j,T}w_1\) for any integer j such that −m ≤ j ≤ m − 1. Combining this equality with the same induction argument as in the proof of Lemma 5 establishes both equations. □

Proposition 7

After m steps of the process, we have

$$ \begin{array}{@{}rcl@{}} W^{T}A^{-j}V&=&C^{T}_{2m}A^{-j}_{2m}B_{2m},\quad \text{for } j=0,\ldots,2m,\\ W^{T}A^{j}V&=&C^{T}_{2m}A^{j}_{2m}B_{2m},\quad \text{for } j=0,\ldots,2m-1. \end{array} $$

Proof

The proof applies the results of Lemmas 5 and 6. Write j = j1 + j2 with j1,j2 ∈{0,…,m}. Since V = αv1 and W = βw1, we have

$$ \begin{array}{@{}rcl@{}} W^{T}A^{-j}V&=&\alpha\beta {w^{T}_{1}}A^{-j_{1}}A^{-j_{2}}v_{1}\\ &=&\alpha\beta(A^{-j_{1},T}w_{1})^{T}(A^{-j_{2}}v_{1})\\ \end{array} $$

Using the equations of Lemmas 5 and 6, we obtain

$$ \begin{array}{@{}rcl@{}} &=&\alpha\beta(\mathbb{W}^{+,T}_{2m}A^{-j_{1},T}_{2m}\mathbb{V}^{T}_{2m}w_{1})^{T}(\mathbb{V}_{2m}A^{-j_{2}}_{2m}\mathcal{E}_{1})\\ &=&W^{T}\mathbb{V}_{2m}A^{-j}_{2m}\mathbb{W}^{+}_{2m}V\\ &=&C_{2m}^{T}A^{-j}_{2m}B_{2m}, \end{array} $$

which completes the proof of the first relation. The proof of the second relation is similar. □

We now compare the operation requirements of the extended nonsymmetric block Lanczos method (ENBL) and the extended nonsymmetric global Lanczos method (ENGL). Both methods require the same cost for computing matrix-matrix products AV with \(V\in \mathbb {R}^{n\times s}\), and the same cost for solving the associated linear systems. The ENBL method requires 4ns2 operations to compute the n × 2s matrix VH and another 4ns2 to compute the 2s × 2s matrix WTV, while the ENGL method only needs 4ns operations to compute WTV and 4ns to compute V (H ⊗ Is). To update the new bases Vj+ 1 and Wj+ 1, the ENBL method has to perform the bi-orthonormalization of \(\widehat {V}_{j+1}\) and \(\widehat {W}_{j+1}\), which costs 16ns2 per step, while the global bi-orthonormalization costs only 16ns. Computing \({\mathcal{T}}_{2m}\) using [2, Proposition 3] requires 2s(4ns + n + 1)(m + 1) operations, while computing T2m in the ENGL method costs only (10ns + 2)(m + 1). Moreover, the computation of \({\mathcal{T}}_{2m}\) requires solving linear systems of size s × s at every step, while the ENGL method only needs divisions by scalars (see Proposition 4). In Table 1, we summarize the number of operations after m iterations of the ENBL and ENGL algorithms. The ENGL algorithm becomes increasingly cheaper than the ENBL algorithm as the number of iterations m grows and when the block size s is not small.
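To make this comparison concrete, the per-step and total counts quoted above can be tallied for given (n, s, m). The following Python sketch uses only the terms listed in the text; the function names are ours, and any per-step costs not quoted above are deliberately omitted:

```python
def enbl_step_cost(n, s):
    # Per-iteration counts quoted for the block method (ENBL):
    # 4ns^2 for V*H, 4ns^2 for W^T V, 16ns^2 for bi-orthonormalization.
    return 4*n*s**2 + 4*n*s**2 + 16*n*s**2

def engl_step_cost(n, s):
    # Global counterparts (ENGL): each factor s^2 drops to s.
    return 4*n*s + 4*n*s + 16*n*s

def enbl_total(n, s, m):
    # m iterations plus the quoted cost 2s(4ns + n + 1)(m + 1) of forming T_2m.
    return m*enbl_step_cost(n, s) + 2*s*(4*n*s + n + 1)*(m + 1)

def engl_total(n, s, m):
    # m iterations plus the quoted cost (10ns + 2)(m + 1) of forming T_2m.
    return m*engl_step_cost(n, s) + (10*n*s + 2)*(m + 1)
```

The step costs differ by exactly a factor s, consistent with the observation that ENGL is the cheaper method once the block size s is not small.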

Table 1 Comparison of the extended nonsymmetric block Lanczos (ENBL) and the extended nonsymmetric global Lanczos (ENGL) algorithms

4 Numerical experiments

In this section, we present some numerical examples that illustrate the performance of the extended nonsymmetric global Lanczos (ENGL) method. In the selected examples, the proposed method is applied to the approximation of expressions of the form \(\mathcal {I}_{tr}\) and \(\mathcal {I}\) given in (1). All experiments were carried out in MATLAB R2015a on a computer with an Intel Core i-3 processor and 3.89 GB of RAM. The computations were done with about 15 significant decimal digits.

As mentioned in the second section, A− 1 does not have to be computed explicitly. In all examples, the matrix products with A− 1 and AT in lines 2, 8, and 9 of Algorithm 4 are computed via an LU factorization or by using an iterative solver. We used a preconditioned block biconjugate gradient (PBBiCG) method as described in Algorithm 5.
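Applying A− 1 to a block through a single reusable LU factorization, rather than forming the inverse, can be sketched as follows. This is an illustrative Python/SciPy stand-in for the MATLAB computation, and the helper name is ours:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def apply_inverse_block(A, V):
    # Solve A X = V for the block V with one sparse LU factorization;
    # A^{-1} is never formed explicitly, and in an iteration the factors
    # would be computed once and reused for every block solve.
    lu = splu(sp.csc_matrix(A))
    return lu.solve(V)
```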


4.1 Examples for approximations of trace(WTf(A)V )

Example 1

In this experiment, we compared the performance of the ENGL algorithm with that of the extended global Arnoldi (EGA) algorithm described in [1, 15]. We approximate \(\text {trace}(E^T_1\exp (A)E_1)\), where A is the nonsymmetric adjacency matrix pesa of order n = 11738, obtained from the SuiteSparse Matrix Collection [8]. The block \(E_1\in \mathbb {R}^{n\times s}\) consists of the first s columns of the identity matrix In. Results for several choices of the block size s and the number of iterations m are reported in Table 2. In both the ENGL and EGA algorithms, we used the PBBiCG method of Algorithm 5 with an ILU preconditioner. As observed from this table, the approximations determined by ENGL are more accurate than those obtained with the EGA method.
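For validation on a small dense matrix, the reference value of \(\text {trace}(E^T_1\exp (A)E_1)\) reduces to the trace of the leading s × s block of exp(A), since E1 holds the first s columns of the identity. A minimal Python sketch of this check (the function name is ours; the paper's experiments use MATLAB):

```python
import numpy as np
from scipy.linalg import expm

def trace_block_exp(A, s):
    # trace(E1^T exp(A) E1) with E1 the first s columns of the identity:
    # this equals the trace of the leading s-by-s block of exp(A).
    return np.trace(expm(A)[:s, :s])
```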

Table 2 Example 1: \(A\in \mathbb {R}^{n\times n}\) is the nonsymmetric adjacency pesa matrix with n = 11738

Example 2

In this example, we used a diagonalizable matrix of order n = 1000 whose eigenvalues are log-uniformly distributed in the interval [10− 1,104] and whose eigenvectors are random. We computed approximations of \(\text {trace}(E^T_1f(A)E_1)\) with the ENGL method and with the standard nonsymmetric global Lanczos method (SNGL), using s = 6. In Table 3, we report the number of iterations, the relative errors, and the required CPU times for different functions f. Here we used an LU factorization to compute products of the form A− 1V and ATW. The results show that the ENGL algorithm is faster and gives smaller relative errors, while the SNGL algorithm is unable to determine an accurate approximation for any of the functions f used in this example.
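A test matrix of this type can be generated as follows. This is a sketch under our own assumptions: the paper does not specify the random eigenvector matrix or a seed, and the helper name is ours.

```python
import numpy as np

def log_uniform_test_matrix(n, lam_min=1e-1, lam_max=1e4, seed=0):
    # Diagonalizable nonsymmetric test matrix: eigenvalues log-uniformly
    # spaced in [lam_min, lam_max], eigenvector matrix drawn at random.
    rng = np.random.default_rng(seed)
    lam = np.logspace(np.log10(lam_min), np.log10(lam_max), n)
    X = rng.standard_normal((n, n))
    return X @ np.diag(lam) @ np.linalg.inv(X)
```

Since the trace is invariant under similarity, trace(A) equals the sum of the prescribed eigenvalues, which gives a cheap sanity check on the construction.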

Table 3 Example 2: \(A\in \mathbb {R}^{n\times n}\) has eigenvalues distributed in the interval [10− 1,104] and a random eigenvector matrix. The block size s = 6

4.2 Examples for the approximation of WTf(A)V

In this subsection, we present some results for the approximation of WTf(A)V using the ENGL algorithm. In the following experiments, we used the functions \(f(x)=\sqrt {x}\) and f(x) = x− 1/3. The blocks W and V were generated randomly with entries uniformly distributed on [0,1]. The matrix A was obtained from the centered finite difference discretization (CFDD) of the elliptic operators given by (21) on the unit square [0,1] × [0,1] with homogeneous Dirichlet boundary conditions. The number of inner grid points in each direction is n0, so that the matrices have dimension \(n=n_0^2\).

$$ \begin{array}{ll} \mathcal{L}_{1}(u)&=-100u_{xx}-u_{yy}+10xu_{x},\\ \mathcal{L}_{2}(u)&=-e^{-xy}u_{xx}-e^{xy}u_{yy}+1/(x+y)u_{x}. \end{array} $$
(21)
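For illustration, the discretization of \({\mathcal{L}}_1(u)\) can be assembled from Kronecker products of one-dimensional difference matrices. The sketch below makes assumptions the paper does not state (grid ordering with the x-index varying fastest), and the helper name cfdd_L1 is ours:

```python
import numpy as np
import scipy.sparse as sp

def cfdd_L1(n0):
    # Centered finite differences for L1(u) = -100 u_xx - u_yy + 10 x u_x
    # on (0,1)^2 with homogeneous Dirichlet conditions: n0 inner points
    # per direction, mesh width h = 1/(n0 + 1), matrix size n = n0^2.
    h = 1.0 / (n0 + 1)
    x = np.arange(1, n0 + 1) * h                              # inner grid points
    e = np.ones(n0)
    D2 = sp.diags([-e[:-1], 2*e, -e[:-1]], [-1, 0, 1]) / h**2  # 1-D stencil for -u''
    D1 = sp.diags([-e[:-1], e[:-1]], [-1, 1]) / (2*h)          # 1-D centered u'
    I = sp.identity(n0)
    # Assumed ordering: x varies within a block, y across blocks.
    A = 100*sp.kron(I, D2) + sp.kron(D2, I) + 10*sp.kron(I, sp.diags(x) @ D1)
    return sp.csr_matrix(A)
```

The convection term contributes nothing to the diagonal (the centered first-difference stencil has a zero center), so each diagonal entry is (100 + 1)·2/h2.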

Example 3

We consider the approximation of WTA1/2V and WTB1/2V, where \(A,B\in \mathbb {R}^{4900\times 4900}\) are nonsymmetric matrices obtained from the CFDD of the operators \({\mathcal{L}}_1(u)\) and \({\mathcal{L}}_2(u)\) given by (21), respectively. The block size was s = 4. In Fig. 1, we report the relative errors of the ENBL, ENGL, and SNBL algorithms versus the dimension of the projection subspace; the left plot corresponds to the matrix A and the right plot to the matrix B. Both plots show that the ENGL and ENBL algorithms yield significantly smaller errors than the SNBL algorithm.

Fig. 1 Approximation of \(E^T_1A^{1/2}E_1\) (left plot) and \(E^T_1B^{1/2}E_1\) (right plot)

Example 4

In this example, we consider nonsymmetric matrices obtained from the CFDD of the same operators as in Example 3. In Tables 4 and 5, we report results for the ENGL and ENBL algorithms when approximating \(\mathcal {I}\). We used the dimensions n ∈{2500,4900,7225,10000} with the block sizes s = 10,20 in Table 4 and s = 30,40 in Table 5. For the last two values of n, we used PBBiCG with a block ILU preconditioner (see [23]). As shown in Tables 4 and 5, when the block size s increases, the approximations of \(\mathcal {I}\) computed with ENGL are more accurate than those produced by the ENBL algorithm.

Table 4 Example 4: Approximation of WTf(A)V for two functions and different matrix dimensions for the operators given by (21)
Table 5 Example 4: Approximation of WTf(A)V for two functions and different matrix dimensions for the operators given by (21)

5 Conclusion

This paper describes an extended nonsymmetric global Lanczos method for the approximation of trace(WTf(A)V ) and WTf(A)V. Two F-biorthonormal bases of the extended Krylov subspaces given by (5) are computed by short recurrence formulas, and several useful algebraic relations were derived. The numerical results show that, when approximating trace(WTf(A)V ) and WTf(A)V, the extended nonsymmetric global Lanczos method requires fewer iterations and less CPU time than the standard nonsymmetric global Lanczos method and the extended global Arnoldi method.