1 Introduction

Let \(t'\ge t\in I\subseteq \mathbb {R}\) be variables—called times for convenience—in an interval I, and \(\mathsf {A}(t')\) be an \(N\times N\) time-dependent matrix. For a fixed t (usually \(t=0\)), the time-ordered exponential of \(\mathsf {A}(t')\) is defined as the unique solution \(\mathsf {U}(t',t)\) of the non-autonomous system of linear ordinary differential equations

$$\begin{aligned} \mathsf {A}(t') \mathsf {U}(t',t)=\frac{d}{dt'}\mathsf {U}(t',t), \quad \mathsf {U}(t,t)=\mathsf {Id}, \quad t' \ge t, \end{aligned}$$
(1.1)

with \(\mathsf {Id}\) the identity matrix. Note that t represents the time at which the initial condition is given, and that the (unusual) notation \(\mathsf {U}(t',t)\) will be useful later. If the matrix \(\mathsf {A}\) commutes with itself at all times, i.e., \(\mathsf {A}(\tau _1)\mathsf {A}(\tau _2)-\mathsf {A}(\tau _2)\mathsf {A}(\tau _1)=0\) for all \(\tau _1,\tau _2 \in I\), then the time-ordered exponential is given by the matrix exponential \(\mathsf {U}(t',t)=\exp \left( \int _t^{t'} \mathsf {A}(\tau )\, \text {d}\tau \right) .\) However, when \(\mathsf {A}\) does not commute with itself at all times, the time-ordered exponential has no known explicit form in terms of \(\mathsf {A}\) and is rather denoted

$$\begin{aligned} \mathsf {U}(t',t)=\mathcal {T}\exp \left( \int _t^{t'} \mathsf {A}(\tau )\, \text {d}\tau \right) , \end{aligned}$$

with \(\mathcal {T}\) the time-ordering operator [1]. This expression, introduced by Dyson in 1952, is more a notation than an explicit form as the action of the time-ordering operator is very difficult to evaluate. In particular, \(\mathsf {U}(t',t)\) does not have a Cauchy integral representation, and it cannot be evaluated via ordinary diagonalization. It is unlikely that a closed form expression for \(\mathsf {U}(t',t)\) in terms of \(\mathsf {A}\) exists at all since even when \(\mathsf {A}\) is \(2\times 2\), \(\mathsf {U}\) can involve very complicated special functions [2, 3].
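To make the definition concrete, the following minimal Python sketch integrates the defining system (1.1) numerically for a hypothetical non-commuting \(2\times 2\) matrix (not taken from the examples of this paper) and compares the result with the naive matrix exponential of \(\int _t^{t'}\mathsf {A}(\tau )\,\text {d}\tau \); the two agree only in the self-commuting case.

```python
import numpy as np
from scipy.integrate import solve_ivp, quad
from scipy.linalg import expm

# Hypothetical non-commuting example: A(t1) A(t2) != A(t2) A(t1) in general.
def A(t):
    return np.array([[0.0, 1.0],
                     [t,   0.0]])

t0, t1 = 0.0, 1.0

# Time-ordered exponential U(t1, t0): integrate d U/dt' = A(t') U with U(t0, t0) = Id.
def rhs(t, u):
    return (A(t) @ u.reshape(2, 2)).ravel()

U = solve_ivp(rhs, (t0, t1), np.eye(2).ravel(), rtol=1e-10).y[:, -1].reshape(2, 2)

# Naive guess exp(int_{t0}^{t1} A(tau) d tau): exact only when A commutes with itself.
intA = np.array([[quad(lambda s: A(s)[i, j], t0, t1)[0] for j in range(2)]
                 for i in range(2)])
print(U)           # time-ordered exponential
print(expm(intA))  # noticeably different here, since A does not commute with itself
```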

Evaluating time-ordered exponentials is a central question in the field of system dynamics, in particular in quantum physics where \(\mathsf {A}\) is the Hamiltonian operator. Situations where this operator does not commute with itself are routinely encountered [4], and the departure of the time-ordered exponential from a straightforward matrix exponential is responsible for many peculiar physical effects [5,6,7]. Further applications are found via differential Lyapunov and Riccati matrix equations, which frequently appear in control theory, filter design, and model reduction problems [8,9,10,11,12]. Indeed, the solutions of such differential equations involve time-ordered exponentials [13,14,15,16].

In [17], we introduced a tridiagonal form for the matrix \(\mathsf {A}(t')\) from which it is possible to express a time-ordered exponential via path-sum continued fractions of finite depth. More precisely, the described procedure formulates each element of a time-ordered exponential in terms of a finite and treatable number of scalar integro-differential equations. Such a tridiagonal form is obtained by using the so-called \(*\)-Lanczos algorithm. The algorithm also appeared in [18] as the paper’s motivation. Despite being at the core of the results in [17] and being the motivation of [18], the \(*\)-Lanczos algorithm construction and the proofs of its main properties have not yet appeared in a scientific journal. The present paper aims to fill this gap in the literature: it constructs the \(*\)-Lanczos algorithm from a (generalized) Krylov subspace perspective, proves the related Matching Moment Property, introduces a bound for the approximation error, and adds further results on the breakdown issue that may affect the method.

1.1 Existing analytical approaches: Pitfalls and drawbacks

In spite of the paramount importance of the time-ordered exponential, it is usually omitted from the literature on matrix functions. Until 2015, only two families of analytical approaches existed (numerical methods will be discussed in Sect. 5). The first one to have been devised relies on Floquet theory and requires \(\mathsf {A}(t')\) to be periodic (see, e.g., [4]). This method transforms Eq. (1.1) into an infinite system of coupled linear differential equations with constant coefficients. This system is then solved by a perturbative expansion at very low order, relying on the presence of a small parameter. Orders higher than 2 or 3 are typically too involved to be treated. The second method was developed in 1954 by Wilhelm Magnus [19]. It produces an infinite series of nested commutators of \(\mathsf {A}\) with itself at different times, the ordinary matrix exponential of which provides the desired solution \(\mathsf {U}(t',t)\). Magnus series are very much in use nowadays [4], especially because they guarantee that the approximation to \(\mathsf {U}(t',t)\) is unitary in quantum mechanical calculations [4]. Nevertheless, the Magnus series for \(\mathsf {U}(t',t)\) has a small convergence domain; see [20] and also [21,22,23,24,25].
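For comparison with the method developed below, a minimal sketch of a second-order Magnus approximation (assuming the same hypothetical \(2\times 2\) matrix as in the previous sketch) evaluates \(\Omega _1(t')=\int _0^{t'}\mathsf {A}(\tau )\,\text {d}\tau \) and \(\Omega _2(t')=\frac{1}{2}\int _0^{t'}\!\int _0^{\tau _1}[\mathsf {A}(\tau _1),\mathsf {A}(\tau _2)]\,\text {d}\tau _2\,\text {d}\tau _1\) by crude quadrature and returns \(\exp (\Omega _1+\Omega _2)\approx \mathsf {U}(t',0)\):

```python
import numpy as np
from scipy.linalg import expm

def A(t):  # hypothetical non-commuting matrix, as in the previous sketch
    return np.array([[0.0, 1.0], [t, 0.0]])

def comm(X, Y):
    return X @ Y - Y @ X

def magnus2(t1, m=400):
    # Crude left-endpoint quadrature for Omega_1 and Omega_2.
    taus, h = np.linspace(0.0, t1, m, endpoint=False), t1 / m
    omega1 = sum(A(s) for s in taus) * h
    omega2 = 0.5 * sum(comm(A(s1), A(s2)) * h * h
                       for i, s1 in enumerate(taus) for s2 in taus[:i])
    return expm(omega1 + omega2)

print(magnus2(0.5))  # second-order Magnus approximation of U(0.5, 0)
```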

In 2015, in the joint article [26], one of the authors presented a third method to obtain time-ordered exponentials using graph theory, requiring only the entries \(\mathsf {A}(t')_{k\ell }\) to be bounded functions of time [26]. The method formulates any desired entry or group of entries of \(\mathsf {U}(t',t)\) as a branched continued fraction of finite depth and breadth. It has been successfully used to solve challenging quantum dynamics problems, see e.g. [27, 28]. This approach is unconditionally convergent and it provides exact expressions in terms of a finite number of integrals and Volterra equations. However, it suffers from a complexity drawback. Indeed, it requires one to find all the simple cycles and simple paths of a certain graph G. These are the walks on G which are not self-intersecting. Unfortunately, the problem of enumerating such walks is \(\#\)P-complete [29], hindering the determination of exact solutions in large systems, which must instead be treated using a further property of analytical path-sums called scale-invariance [28]. The present work, together with the results in [17, 18], solves this issue by transforming the original matrix into a structurally simpler one on which the path-sum solution takes the form of an ordinary, finite continued fraction.

1.2 The non-Hermitian Lanczos algorithm: background

Consider the simpler case in which \(\mathsf {A}\) is not time-dependent. The solution of (1.1) is given by the matrix function \(\exp (\mathsf {A}(t'-t))\) which can be numerically approximated in several different ways (see, e.g., [30,31,32]). One possible method is the (non-Hermitian) Lanczos algorithm. Computing the \((k,\ell )\) element of \(\exp (\mathsf {A})\) is equivalent to computing the bilinear form \(\varvec{e}_k^H\exp (\mathsf {A}) \,\varvec{e}_\ell \), with \(\varvec{e}_k, \varvec{e}_\ell \) vectors from the canonical Euclidean basis, and \(\varvec{e}_k^H\) the usual Hermitian transpose (here it coincides with the transpose since the vector is real). The non-Hermitian Lanczos algorithm (e.g., [33,34,35,36]) gives, when no breakdown occurs, the matrices

$$\begin{aligned} \mathsf {V}_n = [\varvec{v}_0,\dots ,\varvec{v}_{n-1}], \quad \mathsf {W}_n = [\varvec{w}_0,\dots ,\varvec{w}_{n-1}], \end{aligned}$$

whose columns are biorthonormal bases respectively for the Krylov subspaces

$$\begin{aligned} \text {span}\{ \varvec{e}_\ell , \mathsf {A}\,\varvec{e}_\ell , \dots , \mathsf {A}^{n-1}\, \varvec{e}_\ell \}, \quad \text {span}\{ \varvec{e}_k, \mathsf {A}^H\varvec{e}_k, \dots , (\mathsf {A}^H)^{n-1}\, \varvec{e}_k \}. \end{aligned}$$

Note that for \(\mathsf {A}\) Hermitian and \(k=\ell \) we can equivalently use the Hermitian Lanczos algorithm (getting \(\mathsf {V}_n = \mathsf {W}_n\)). The so-called (complex) Jacobi matrix \(\mathsf {J}_n\) is the tridiagonal symmetric matrix with generally complex elements obtained by

$$\begin{aligned} \mathsf {J}_n = \mathsf {W}_n^H \mathsf {A} \,\mathsf {V}_n. \end{aligned}$$

As described in [35], we can use the approximation

$$\begin{aligned} \varvec{e}_k^H\exp (\mathsf {A}) \varvec{e}_\ell \approx \varvec{e}_1^H\exp (\mathsf {J}_n) \varvec{e}_1, \end{aligned}$$
(1.2)

which relies on the so-called matching moment property, i.e.,

$$\begin{aligned} \varvec{e}_k^H (\mathsf {A})^j \varvec{e}_\ell = \varvec{e}_1^H (\mathsf {J}_n)^j \varvec{e}_1, \quad j=0,1,\dots , 2n-1; \end{aligned}$$
(1.3)

see, e.g., [35, 36] for the Hermitian case, and [37, 38] for the non-Hermitian one. The approximation (1.2) is a model reduction in two ways. First, the size of \(\mathsf {J}_n\) is much smaller than that of \(\mathsf {A}\). Second, the structure of the matrix \(\mathsf {J}_n\) is much simpler since it is tridiagonal.
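To make this background concrete, here is a minimal two-sided (non-Hermitian) Lanczos sketch in Python, without breakdown handling or re-biorthogonalization, together with a numerical check of the matching moment property (1.3) for generic starting vectors normalized so that \(\varvec{w}^H\varvec{v}=1\) (the variable names below are illustrative and not taken from any particular reference).

```python
import numpy as np

def bilanczos(A, v, w, n):
    """Two-sided Lanczos: returns J_n = W_n^H A V_n (no breakdown handling)."""
    N = A.shape[0]
    V = np.zeros((N, n), dtype=complex)
    W = np.zeros((N, n), dtype=complex)
    V[:, 0], W[:, 0] = v, w                        # assumes w^H v = 1
    alpha = np.zeros(n, dtype=complex)
    beta = np.zeros(n, dtype=complex)              # superdiagonal entries
    delta = np.zeros(n, dtype=complex)             # subdiagonal entries
    for j in range(n):
        alpha[j] = W[:, j].conj() @ (A @ V[:, j])
        if j == n - 1:
            break
        vhat = A @ V[:, j] - alpha[j] * V[:, j] - (beta[j] * V[:, j - 1] if j else 0)
        what = A.conj().T @ W[:, j] - alpha[j].conj() * W[:, j] \
            - (delta[j].conj() * W[:, j - 1] if j else 0)
        prod = what.conj() @ vhat                  # = \hat w^H \hat v
        delta[j + 1] = np.sqrt(abs(prod))
        beta[j + 1] = prod / delta[j + 1]
        V[:, j + 1] = vhat / delta[j + 1]
        W[:, j + 1] = what / beta[j + 1].conj()
    return np.diag(alpha) + np.diag(delta[1:], -1) + np.diag(beta[1:], 1)

# Matching moment property (1.3): w^H A^j v = e_1^H J_n^j e_1, j = 0, ..., 2n-1.
rng = np.random.default_rng(0)
N, n = 8, 3
A = rng.standard_normal((N, N))
v, w = rng.standard_normal(N), rng.standard_normal(N)
v = v / (w @ v)                                    # enforce w^H v = 1
J = bilanczos(A, v, w, n)
for j in range(2 * n):
    print(j, w @ np.linalg.matrix_power(A, j) @ v,
          np.linalg.matrix_power(J, j)[0, 0].real)
```

Note that for \(\varvec{v}=\varvec{e}_\ell \) and \(\varvec{w}=\varvec{e}_k\) with \(k\ne \ell \) the normalization \(\varvec{w}^H\varvec{v}=1\) fails; this is precisely the breakdown issue revisited in Sect. 3.2.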

Given a matrix \(\mathsf {A}\) of size N, the Lanczos algorithm can be used as a method for its tridiagonalization (see, e.g., [39]). Assuming no breakdown, the Nth iteration of the non-Hermitian Lanczos algorithm with input the matrix \(\mathsf {A}\) and a pair of vectors \(\varvec{v}, \varvec{w}\) produces the tridiagonal matrix \(\mathsf {J}_N\), and the biorthogonal square matrices \(\mathsf {V}_N, \mathsf {W}_N\) so that

$$\begin{aligned} \mathsf {A}^j = \mathsf {V}_N \, (\mathsf {J}_N)^j\,\mathsf {W}_N^H, \quad j=0,1,\dots , \end{aligned}$$
(1.4)

giving the exact expression

$$\begin{aligned} \exp (\mathsf {A}) = \mathsf {V}_N \exp (\mathsf {J}_N)\, \mathsf {W}_N^H. \end{aligned}$$
(1.5)

Theorem 2.2, which we prove in this work, extends this result to time-ordered exponentials.

The Lanczos approximation (1.2) is connected with several further topics, such as (formal) orthogonal polynomials, Gauss quadrature, continued fractions, the moment problem, and many others. Information about these connections and references to the related rich and vast literature can be found, e.g., in the monographs [35, 36, 40] and the surveys [37, 38].

Inspired by approximation (1.2), the \(*\)-Lanczos algorithm produces a model reduction of a time-ordered exponential by providing a time-dependent tridiagonal matrix \(\mathsf {T}_n\) satisfying properties analogous to the ones described above [17]. In contrast with the classical case, the \(*\)-Lanczos algorithm works on subspaces of vector-valued distributions and has to deal with a non-commutative product.

The time-dependent framework in which the proposed method works is considerably more complicated than the time-independent Krylov subspace setting of the classical Lanczos algorithm. In this paper, we will not deal with the behavior of the \(*\)-Lanczos algorithm when taking into account approximations and finite-precision arithmetic.

1.3 Outline

The work is organized as follows: In Sect. 2, we build the \(*\)-Lanczos algorithm. The algorithm relies on a non-commutative \(*\)-product between generalized functions of two time variables, which we describe in Sect. 2.1. Then, in Sect. 2.2, we state the main result, Theorem 2.1, which underpins the Lanczos-like procedure: it establishes that the first 2n \(*\)-moments of a certain tridiagonal matrix \(\mathsf {T}_n\) match the corresponding \(*\)-moments of the original matrix \(\mathsf {A}\). Theorem 2.1 is proved with the tools developed in the subsequent Sect. 2.3. Section 3 is devoted to the convergence and breakdown properties of the algorithm, while examples of its use are given in Sect. 4. In Sect. 5 we outline a way to implement the Lanczos-like procedure numerically and we evaluate its computational cost. Section 6 concludes the paper.

2 The \(*\)-Lanczos algorithm

2.1 The \(*\)-product and \(*\)-moments

In this section, we recall the definition and the main properties of the product introduced in [18, Section 1.2].

Let t and \(t'\) be two real variables in a real time interval I. We consider the class \(\text {D}(I)\) of all distributions which are linear superpositions of Heaviside theta functions and Dirac delta derivatives with smooth coefficients over \(I^2\). That is, a distribution d is in \(\text {D}(I)\) if and only if it can be written as

$$\begin{aligned} d(t',t)=\widetilde{d}(t',t)\Theta (t'-t) + \sum _{i=0}^N \widetilde{d}_i(t',t)\delta ^{(i)}(t'-t), \end{aligned}$$
(2.1)

where \(N\in \mathbb {N}\) is finite, \(\Theta (\cdot )\) stands for the Heaviside theta function (with the convention \(\Theta (0)=1\)) and \(\delta ^{(i)}(\cdot )\) is the ith derivative of the Dirac delta distribution \(\delta = \delta ^{(0)}\). Here and from now on, a tilde over a function (e.g., \(\widetilde{d}(t',t)\)) indicates that it is an ordinary function smooth in both \(t'\in I\) and \(t\in I\). Note that we consider distributions as defined by Schwartz ([41]). Hence a distribution \(f \in \text {D}(I)\) should be interpreted as a linear functional applied to test functions.

We can endow the class \(\text {D}(I)\) with a non-commutative algebraic structure upon defining a product between its elements. For \(f_1, f_2 \in \text {D}(I)\) we define the convolution-like \(*\) product between \(f_1(t',t)\) and \(f_2(t',t)\) as

$$\begin{aligned} \big (f_2 * f_1\big )(t',t) := \int _{-\infty }^{\infty } f_2(t',\tau ) f_1(\tau , t) \, \text {d}\tau , \end{aligned}$$
(2.2)

that has as identity element the Dirac delta distribution, \(1_*:=\delta (t'-t)\). When \(f(t',t) = f(t'-t)\) has bounded support, the \(*\)-product \(f *g\) (and \(g *f\)) is equivalent to the convolution product for distributions defined by Schwartz ([42, § 11] and [41, Chapter VI]). Since \(\delta ^{(i)}(t'-t)\) has bounded support, given \(f \in \text {D}(I)\), the \(*\)-products \(\delta ^{(i)}(t'-t) *f\) and \(f *\delta ^{(i)}(t'-t)\) are well defined and are both elements of \(\text {D}(I)\); see [18] for further details. Moreover, it holds that

$$\begin{aligned} \delta ^{(i)}(t'-t) *\delta ^{(j)}(t'-t)&= \delta ^{(j)}(t'-t) *\delta ^{(i)}(t'-t) = \delta ^{(i+j)}(t'-t); \\ \Theta (t'-t)*\delta '(t'-t)&= \delta '(t'-t) *\Theta (t'-t) = \delta (t'-t). \end{aligned}$$

Consider the subclass \(\text {Sm}_\Theta (I)\) of \(\text {D}(I)\) comprising those distributions of the form

$$\begin{aligned} f(t',t)=\tilde{f}(t',t)\Theta (t'-t). \end{aligned}$$
(2.3)

For \(f_1,\,f_2\in \text {Sm}_\Theta (I)\), the \(*\)-product between \(f_1,f_2\) simplifies to

$$\begin{aligned} \big (f_2 * f_1\big )(t',t)&= \int _{-\infty }^{\infty } \tilde{f}_2(t',\tau ) \tilde{f}_1(\tau , t)\Theta (t'-\tau )\Theta (\tau -t) \, \text {d}\tau ,\\&=\Theta (t'-t)\int _t^{t'} \tilde{f}_2(t',\tau ) \tilde{f}_1(\tau , t) \, \text {d}\tau , \end{aligned}$$

which makes calculations involving such functions easier to carry out and shows that \(\text {Sm}_\Theta (I)\) is closed under \(*\)-multiplication. Together with the arguments above, this proves that \(\text {D}(I)\) is closed under \(*\)-multiplication. Hence, for \(f \in \text {D}(I)\), we can define its kth \(*\)-power \(f^{*k}\) as the k \(*\)-products \(f *f *\dots *f\), with the convention \(f^{*0} = \delta (t'-t)\). First examples of \(*\)-powers are

$$\begin{aligned} \Theta ^{*k}(t'-t)&= \frac{(t'-t)^{k-1}}{(k-1)!}\Theta (t'-t); \end{aligned}$$
(2.4)
$$\begin{aligned} \left( \delta ^{(j)}(t'-t)\right) ^{*k}&= \delta ^{(kj)}(t'-t). \end{aligned}$$
(2.5)

Note that, for members of \(\text {Sm}_\Theta (I)\), the \(*\)-product reduces exactly to the Volterra composition, a product between smooth functions of two variables developed by Volterra and Pérès [43].
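Equation (2.4) can be checked symbolically by iterating the simplified \(*\)-product for \(\text {Sm}_\Theta (I)\) functions; a small sympy sketch (working only with the smooth parts and assuming \(t'\ge \tau \ge t\), so that the Heaviside factors reduce the integral to \([t,t']\)):

```python
import sympy as sp

tp, t, tau = sp.symbols("tp t tau", real=True)

def star(f2, f1):
    # *-product of two Sm_Theta(I) functions, expressed through their smooth parts:
    # (f2 * f1)(t', t) = int_t^{t'} f2(t', tau) f1(tau, t) d tau.
    return sp.integrate(f2.subs(t, tau) * f1.subs(tp, tau), (tau, t, tp))

power = sp.Integer(1)                    # smooth part of Theta^{*1}
for k in range(2, 6):
    power = star(sp.Integer(1), power)   # Theta * Theta^{*(k-1)}
    print(k, sp.simplify(power - (tp - t)**(k - 1) / sp.factorial(k - 1)))  # -> 0
```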

We will not discuss the technicalities associated with \(*\)-products involving distributions such as Dirac delta derivatives, since doing so would take us too far from the paper’s goals. More details on the distributions themselves can be found in [41], while illustrative examples of \(*\)-products involving such distributions can be found in [18].

The \(*\)-product extends directly to distributions of \(\text {D}(I)\) whose smooth coefficients depend on fewer than two variables. Indeed, consider a generalized function \(f_3(t',t)=\tilde{f}_3(t')\delta ^{(i)}(t'-t)\) with \(i\ge -1\) and \(\delta ^{(-1)}= \Theta \). Then

$$\begin{aligned} \big (f_3 *f_1\big )(t',t)&= \tilde{f}_3(t')\int _{-\infty }^{+\infty } \delta ^{(i)}(t'-\tau )f_1(\tau , t) \, \text {d}\tau ,\\ \big (f_1*f_3\big )(t',t)&=\int _{-\infty }^{+\infty } f_1(t',\tau )\tilde{f}_3(\tau )\delta ^{(i)}(\tau -t) \, \text {d}\tau , \end{aligned}$$

where \(f_1(t',t)\in \text {Sm}_\Theta (I)\) is as defined on p. 6. Hence the variable of \(\tilde{f}_3(t')\) is treated as the left variable of a smooth function of two variables. This observation extends straightforwardly should \(\tilde{f}_3\) be constant and, by linearity, to any distribution of \(\text {D}(I)\).

The \(*\)-product also naturally extends to matrices whose entries are distributions of \(\text {D}(I)\). Consider two such matrices \(\mathsf {A}_1(t',t), \mathsf {A}_2(t',t)\in \text {D}(I)^{N\times N}\); then

$$\begin{aligned} \big (\mathsf {A}_2 * \mathsf {A}_1\big )(t',t) := \int _{-\infty }^{+\infty } \mathsf {A}_2(t',\tau ) \mathsf {A}_1(\tau , t) \, \text {d}\tau , \end{aligned}$$

where the sizes of \(\mathsf {A}_1, \mathsf {A}_2\) are compatible for the usual matrix product (here and in the following, we omit the dependency on \(t'\) and t when it is clear from the context) and the integral is treated component-wise. As earlier, the \(*\)-product is associative and distributive with respect to the addition, but it is non-commutative. The identity element with respect to this product is now \(\mathsf {Id}_*:=\mathsf {Id}\,1_*\), with \(\mathsf {Id}\) the identity matrix of appropriate size.

Given a square matrix \(\mathsf {A}(t',t)\) composed of elements from \(\text {D}(I)\), we define the k-th matrix \(*\)-power \(\mathsf {A}^{*k}\) as the k \(*\)-products \(\mathsf {A}*\mathsf {A} *\dots *\mathsf {A}\). In particular, by (2.4) we get the bound

$$\begin{aligned} \Vert \mathsf {A}^{*k}(t',t)\Vert _{\star } \le \left( \sup _{\begin{array}{c} \tau \ge \rho \\ \tau , \rho \in I \end{array}}\Vert \mathsf {A}(\tau ,\rho )\Vert _{\star }\right) ^{k} \frac{(t'-t)^{k-1}}{(k-1)!}\Theta (t'-t); \quad t',t \in I, \end{aligned}$$

with \(\Vert \cdot \Vert _\star \) any induced matrix norm. As a consequence, the \(*\)-resolvent of any matrix depending on at most two variables is well defined, as \(\mathsf {R}_{*}(\mathsf {A}):=\left( \mathsf {Id}_*-\mathsf {A}\right) ^{*-1}=\mathsf {Id}_{*}+\sum _{k\ge 1} \mathsf {A}^{*k}\) exists provided every entry of \(\mathsf {A}\) is bounded for all \(t',t \in I\) (see [26]). Then

$$\begin{aligned} \mathsf {U}(t',t)=\Theta (t'-t) *\mathsf {R}_{*}(\mathsf {A})(t',t) \end{aligned}$$
(2.6)

is the time-ordered exponential of \(\mathsf {A}(t',t)\); see [26]. We recall that \(\Theta (\cdot )\) here stands for the Heaviside function under the convention that \(\Theta (0)=1\). Note that time-ordered exponentials are usually presented with only one time variable, corresponding to \(\mathsf {U}(t)=\mathsf {U}(t,0)\). Yet, in general \(\mathsf {U}(t',t)\ne \mathsf {U}(t'-t,0)\).
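For intuition, Eq. (2.6) can also be checked numerically with the simple time discretization described later in Sect. 5: on a uniform grid, a distribution of \(\text {Sm}_\Theta (I)\) becomes a lower-triangular matrix, the \(*\)-product becomes a matrix product scaled by \(\Delta t\), and \(1_*\) becomes the identity divided by \(\Delta t\). The sketch below (with a hypothetical \(2\times 2\) matrix, not taken from the paper's examples) builds \(\mathsf {R}_{*}(\mathsf {A})\) in this discretized algebra, forms \(\Theta *\mathsf {R}_{*}(\mathsf {A})\), and compares it with a direct numerical solve of (1.1); it is only a first-order illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical 2x2 time-dependent matrix: A(t',t) = Atilde(t') Theta(t'-t).
def Atilde(t):
    return np.array([[np.cos(t), 1.0], [0.0, np.sin(t)]])

N, m = 2, 400
ts, dt = np.linspace(0.0, 1.0, m, retstep=True)
low = np.tril(np.ones((m, m)))                       # Theta(t'-t), with Theta(0)=1
Avals = np.array([Atilde(t) for t in ts])            # shape (m, N, N)

# Big block matrix: block (k,l) is the m x m discretization of A_{kl}(t',t).
MA = np.block([[Avals[:, k, l][:, None] * low for l in range(N)] for k in range(N)])

# *-resolvent (Id_* - A)^{*-1} maps to (I/dt - MA)^{-1} / dt^2, and Eq. (2.6),
# U = Theta * R_*(A), maps to (I_N kron low) @ R * dt.
R = np.linalg.inv(np.eye(N * m) / dt - MA) / dt**2
MU = np.kron(np.eye(N), low) @ R * dt

# Compare U(t_end, 0) with a direct numerical solve of (1.1).
sol = solve_ivp(lambda t, u: (Atilde(t) @ u.reshape(N, N)).ravel(),
                (ts[0], ts[-1]), np.eye(N).ravel(), rtol=1e-10)
print(sol.y[:, -1].reshape(N, N))                             # ODE solution
print(np.array([[MU[k * m + m - 1, l * m] for l in range(N)]  # agrees up to O(dt)
                for k in range(N)]))
```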

In the spirit of the Lanczos algorithm, given a time-dependent matrix \(\mathsf {A}(t',t)\), we will construct a matrix \(\mathsf {T}_n(t',t)\) of size \(n\le N\) with a simpler (tridiagonal) structure and such that, fixing the indices \(k,\ell \), it holds that

$$\begin{aligned} \big (\mathsf {A}^{*j}(t',t)\big )_{k,\ell } = \big (\mathsf {T}_n^{*j}(t',t)\big )_{1,1}, \quad \text { for } \quad j=0,\dots , 2n-1, \quad t',t \in I; \end{aligned}$$
(2.7)

compare it with (1.3). In particular, when \(n=N\) property (2.7) holds for every \(j\ge 0\), giving

$$\begin{aligned} \mathsf {R}_{*}(\mathsf {A})_{k,\ell } = \mathsf {R}_{*}(\mathsf {T}_N)_{1,1}. \end{aligned}$$

Hence the solution is given by path-sum techniques, exploiting the fact that the graph having \(\mathsf {T}_N\) as its adjacency matrix is a simple path with possible self-loops. More generally, given time-independent vectors \(\varvec{v},\varvec{w}\), we call the jth \(*\)-moment of \(\mathsf {A},\varvec{v},\varvec{w}\) the scalar function \(\varvec{w}^H (\mathsf {A}^{*j}(t',t))\, \varvec{v}\), for \(j \ge 0\) (note that when the \(*\)-product symbol is omitted, it stands for the usual matrix-vector product). Then property (2.7) is an instance of the more general case

$$\begin{aligned} \varvec{w}^H (\mathsf {A}^{*j}(t',t))\, \varvec{v} = \varvec{e}_1^H (\mathsf {T}_n^{*j}(t',t))\, \varvec{e}_1, \quad \text { for } \quad j=0,\dots , 2n-1, \quad t',t \in I. \end{aligned}$$

2.2 Building up the \(*\)-Lanczos process

Given a doubly time-dependent matrix \(\mathsf {A}(t',t)=\widetilde{\mathsf {A}}(t')\Theta (t'-t)\) and \(k+1\) scalar generalized functions \(\alpha _0(t',t), \alpha _1(t',t),\dots ,\alpha _k({t',t}) \in \text {D}(I)\) which play the role of the coefficients, we define the matrix \(*\)-polynomial \(p(\mathsf {A})(t',t)\) of degree k as

$$\begin{aligned} p(\mathsf {A})(t',t) := \sum _{j=0}^k \left( \mathsf {A}^{*j}*\alpha _j \right) (t',t); \end{aligned}$$

moreover, we define the corresponding dual matrix \(*\)-polynomial as

$$\begin{aligned} p^D(\mathsf {A})(t',t) := \sum _{j=0}^k \left( \bar{\alpha }_j*(\mathsf {A}^{*j}) \right) (t',t), \end{aligned}$$

where, in general, \(\bar{d}\) is the conjugate of \(d \in \text {D}(I)\), defined by conjugating the functions \(\widetilde{d}\) and \(\widetilde{d}_i\) in (2.1). Let \(\varvec{v}\) be a time-independent vector; we can then define the set of time-dependent vectors \(p(\mathsf {A})\varvec{v}\), with p a matrix \(*\)-polynomial. Such a set is a vector space with respect to the product \(*\) and with scalars \(\alpha _j(t',t)\) (the addition is the usual addition between vectors). Similarly, given a vector \(\varvec{w}^H\) not depending on time, we can define the vector space given by the dual vectors \(\varvec{w}^H p^D(\mathsf {A})\). In particular, we can define the \(*\)-Krylov subspaces

$$\begin{aligned} \mathcal {K}_n(\mathsf {A}, \varvec{v})(t',t)&:= \left\{ \, \left( \,p(\mathsf {A})\varvec{v}\right) (t',t) \,\, \vert \,\, p \text { of degree at most } n-1 \right\} , \\ \mathcal {K}^D_n(\mathsf {A}, \varvec{w})(t',t)&:= \left\{ \, \left( \,\varvec{w}^H p^D(\mathsf {A})\right) (t',t) \,\, \vert \,\, p \text { of degree at most } n-1 \right\} . \end{aligned}$$

The vectors \(\varvec{v}, \mathsf {A}\varvec{v}, \dots ,\mathsf {A}^{*(n-1)}\varvec{v}\) and \(\varvec{w}^H, \varvec{w}^H \mathsf {A}, \dots ,\varvec{w}^H \mathsf {A}^{*(n-1)}\) are bases respectively for \(\mathcal {K}_n(\mathsf {A}, \varvec{v})\) and \(\mathcal {K}^D_n(\mathsf {A}, \varvec{w})\). We aim to derive \(*\)-biorthonormal bases \(\varvec{v}_0, \dots , \varvec{v}_{n-1}\) and \(\varvec{w}_0^H, \dots , \varvec{w}_{n-1}^H\) for the \(*\)-Krylov subspaces, i.e., so that

$$\begin{aligned} \varvec{w}_i^H * \varvec{v}_j = \delta _{ij} \, 1_*, \end{aligned}$$
(2.8)

with \(\delta _{ij}\) the Kronecker delta.

Assume that \(\varvec{w}^H \varvec{v} = 1\). We can then use a non-Hermitian Lanczos-like biorthogonalization process for the triplet \((\varvec{w}, \mathsf {A}(t',t), \varvec{v})\). We shall call this method the \(*\)-Lanczos process. The first vectors of the biorthogonal bases are

$$\begin{aligned} \varvec{v}_0 = \varvec{v} \, 1_*, \quad \varvec{w}_0^H = \varvec{w}^H 1_*, \end{aligned}$$

so that \(\varvec{w}_0^H * \varvec{v}_0 = 1_*\). Consider now a vector \(\widehat{\varvec{v}}_1 \in \mathcal {K}_2(\mathsf {A},\varvec{v})\) given by

$$\begin{aligned} \widehat{\varvec{v}}_1 = \mathsf {A}*\varvec{v}_0 - \varvec{v}_0 * \alpha _0=\mathsf {A}\varvec{v} - \varvec{v}\alpha _0. \end{aligned}$$

Imposing the \(*\)-biorthogonality condition \(\varvec{w}_0^H * \widehat{\varvec{v}}_1 = 0\) gives

$$\begin{aligned} \alpha _0 = \varvec{w}_0^H * \mathsf {A} * \varvec{v}_0=\varvec{w}^H \mathsf {A}\,\varvec{v}. \end{aligned}$$
(2.9)

Similarly, we get the expression

$$\begin{aligned} \widehat{\varvec{w}}^H_1 = \varvec{w}^H_0*\mathsf {A} - \alpha _0*\varvec{w}^H_0=\varvec{w}^H\mathsf {A} - \alpha _0\varvec{w}^H, \end{aligned}$$

with \(\alpha _0\) given by (2.9). Hence the \(*\)-biorthonormal vectors are defined as

$$\begin{aligned} \varvec{v}_1 = \widehat{\varvec{v}}_1 * \beta _1^{*-1}, \quad \varvec{w}_1 = \widehat{\varvec{w}}_1, \end{aligned}$$

with \(\beta _1=\widehat{\varvec{w}}_1^H*\widehat{\varvec{v}}_1\) and \(\beta _1^{*-1}\) its \(*\)-inverse, i.e., \(\beta _1^{*-1}*\beta _1 = \beta _1 *\beta _1^{*-1}= 1_*\). We give sufficient conditions for the existence of such \(*\)-inverses below. Assume now that we have the \(*\)-biorthonormal bases \(\varvec{v}_0, \dots , \varvec{v}_{n-1}\) and \(\varvec{w}_0^H, \dots , \varvec{w}_{n-1}^H\). Then we can build the vector

$$\begin{aligned} \widehat{\varvec{v}}_n = \mathsf {A}*\varvec{v}_{n-1} - \sum _{i=0}^{n-1} \varvec{v}_{i}*\gamma _{i}, \end{aligned}$$

where the \(\gamma _i\) are determined by the condition \(\varvec{w}_j^H * \widehat{\varvec{v}}_n = \delta _{jn}1_*\), for \(j=0,\dots ,n-1\), giving

$$\begin{aligned} \gamma _j = \varvec{w}_j^H * \mathsf {A} * \varvec{v}_{n-1}, \quad j=0,\dots ,n-1. \end{aligned}$$

In particular, since \(\varvec{w}_j^H*\mathsf {A} \in \mathcal {K}_{j+2}^D(\mathsf {A},\varvec{w})\), we get \(\gamma _j = 0\) for \(j=0,\dots ,n-3\). This leads to the following three-term recurrences for \(n=1,2,\dots \) using the convention \(\varvec{v}_{-1} = \varvec{w}_{-1} = 0\),

$$\begin{aligned} \varvec{w}_n^H&= \varvec{w}_{n-1}^H*\mathsf {A} - \alpha _{n-1}*\varvec{w}_{n-1}^H - {\beta }_{n-1}*\varvec{w}^H_{n-2},\end{aligned}$$
(2.10a)
$$\begin{aligned} \varvec{v}_n*\beta _{n}&= \mathsf {A}*\varvec{v}_{n-1} - \varvec{v}_{n-1}*\alpha _{n-1} - \varvec{v}_{n-2}, \end{aligned}$$
(2.10b)

with the coefficients given by

$$\begin{aligned} \alpha _{n-1} = \varvec{w}_{n-1}^H * \mathsf {A} * \varvec{v}_{n-1}, \quad \beta _n = \varvec{w}_n^H * \mathsf {A} * \varvec{v}_{n-1}. \end{aligned}$$
(2.11)

Should \(\beta _n\) not be \(*\)-invertible, we would get a breakdown in the algorithm, since it would be impossible to compute \(\varvec{v}_n\). We developed a range of general methods to determine the \(*\)-inverse of functions of two time variables, which are gathered in [18]. These methods constructively show the existence of \(\beta _n^{*-1}\) almost everywhere on I under the following conditions:

  • \(\beta _n\) is not identically zero in \(I^2\);

  • \(\beta _n\in \text {Sm}_\Theta (I)\).

The question of whether all \(\alpha _n, \beta _n\in \text {Sm}_\Theta (I)\) was settled affirmatively in [17].

Since the issue of breakdowns of the \(*\)-Lanczos algorithm is connected with the behavior of (usual) Lanczos techniques, we proceed as is common when working with the non-Hermitian Lanczos algorithm. Thus, from now on, we assume all \(\beta _n\) to be \(*\)-invertible, and we come back to the issue of breakdowns in Sect. 3.2.

The \(*\)-orthogonalization process described above defines the \(*\)-Lanczos algorithm (Algorithm 1).

[Algorithm 1: The \(*\)-Lanczos algorithm, consisting of the initialization \(\varvec{v}_0=\varvec{v}\,1_*\), \(\varvec{w}_0^H=\varvec{w}^H 1_*\) followed by the three-term recurrences (2.10) with coefficients (2.11).]

Let us define the tridiagonal matrix

$$\begin{aligned} \mathsf {T}_n := \begin{bmatrix} \alpha _0 &{} 1_* &{} &{} \\ \beta _1 &{} \alpha _1 &{} \ddots &{} \\ &{} \ddots &{} \ddots &{} 1_* \\ &{} &{} \beta _{n-1} &{} \alpha _{n-1} \end{bmatrix}, \end{aligned}$$
(2.12)

and the matrices \(\mathsf {V}_n:=[\varvec{v}_0, \dots , \varvec{v}_{n-1}]\) and \(\mathsf {W}_n := [\varvec{w}_0, \dots , \varvec{w}_{n-1}]\). Then the three-term recurrences Eqs. (2.10) read, in matrix form,

$$\begin{aligned} \mathsf {A} * \mathsf {V}_n&= \mathsf {V}_n * \mathsf {T}_n + (\varvec{v}_{n} * \beta _n) \varvec{e}_n^H, \end{aligned}$$
(2.13)
$$\begin{aligned} \mathsf {W}_n^H * \mathsf {A}&= \mathsf {T}_n *\mathsf {W}_n^H + \varvec{e}_n \, \varvec{w}_n^H. \end{aligned}$$
(2.14)

Hence the tridiagonal matrix (2.12) can be expressed as

$$\begin{aligned} \mathsf {T}_n = \mathsf {W}_n^H * \mathsf {A} * \mathsf {V}_n. \end{aligned}$$

The following property of \(\mathsf {T}_n\) is fundamental for the time-ordered exponential approximation; its proof is given in Sect. 2.3.

Theorem 2.1

(Matching Moment Property) Let \(\mathsf {A},\varvec{w},\varvec{v}\) and \(\mathsf {T}_n\) be as described above, then

$$\begin{aligned} \varvec{w}^H (\mathsf {A}^{*j})\, \varvec{v} = \varvec{e}_1^H (\mathsf {T}_n^{*j})\,\varvec{e}_1, \quad \text { for } \quad j=0,\dots , 2n-1. \end{aligned}$$
(2.15)

Consider the time-ordered exponential \(\mathsf {U}_n\) given by the differential equation

$$\begin{aligned} \mathsf {T}_n(t',t) \mathsf {U}_n(t',t)=\frac{d}{dt'}\mathsf {U}_n(t',t), \quad \mathsf {U}_n(t,t)=\mathsf {Id}. \end{aligned}$$
(2.16)

Theorem 2.1 and Eq. (2.6) justify the use of the approximation \(\varvec{w}^H\mathsf {U}(t',t)\,\varvec{v} \approx \varvec{e}_1^H\mathsf {U}_n(t',t)\,\varvec{e}_1\) [17]. The system (2.16) can be seen as a reduced order model of the initial differential Eq. (1.1) from two points of view. First, n may be much smaller than the size of \(\mathsf {A}\); in this sense, in Sect. 3, we will discuss the convergence behavior of the approximation using Theorem 2.1. Secondly, looking at \(\mathsf {A}\) and \(\mathsf {T}_n\) as adjacency matrices, \(\mathsf {A}\) may correspond to a graph with a complex structure, while \(\mathsf {T}_n\) corresponds to a very simple graph composed of one path with possible self-loops. Then the path-sum method gives

$$\begin{aligned} \mathsf {R}_{*}(\mathsf {T}_n)_{1,1}(t',t) = \Big (1_*- \alpha _0-\big (1_*-\alpha _1-(1_*-...)^{*-1}*\beta _2\big )^{*-1}*\beta _1\Big )^{*-1}, \end{aligned}$$
(2.17)

see [26, 44]. This expression is analogous to the one for the first diagonal entry of the inverse of an ordinary tridiagonal matrix [45], except here all operations are taken with respect to the \(*\)-product.
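With the same simple time discretization (cf. Sect. 5 and the sketch below Eq. (2.6)), the continued fraction (2.17) can be evaluated from the inside out, since each level only requires the \(*\)-inverse of a distribution of the form \(1_*-(\text {element of } \text {Sm}_\Theta (I))\), which maps to an invertible lower-triangular matrix. A minimal sketch with hypothetical coefficients \(\alpha _j,\beta _j\in \text {Sm}_\Theta (I)\) (not taken from the examples of Sect. 4):

```python
import numpy as np

def star(F, G, dt):
    return F @ G * dt                           # discretized *-product

def star_inv_of_delta_minus(S, dt):
    """*-inverse of (1_* - s) when s is a smooth (Sm_Theta) part discretized as S."""
    m = S.shape[0]
    return np.linalg.inv(np.eye(m) / dt - S) / dt**2

def pathsum_R11(alphas, betas, dt):
    """Evaluate (2.17) for a tridiagonal T_n with diagonal alpha_0, ..., alpha_{n-1},
    subdiagonal beta_1, ..., beta_{n-1} (all in Sm_Theta) and superdiagonal 1_*."""
    G = np.zeros_like(alphas[0])                # innermost level starts from alpha_{n-1}
    for alpha, beta in zip(alphas[::-1], [None] + betas[::-1]):
        inner = alpha + (star(G, beta, dt) if beta is not None else 0)
        G = star_inv_of_delta_minus(inner, dt)
    return G                                    # = R_*(T_n)_{1,1}

# Hypothetical data for n = 2: alpha_0 = cos(t')Theta, alpha_1 = cos(t)Theta,
# beta_1 = (t'-t)Theta, on a uniform grid over [0, 1].
m = 300
ts, dt = np.linspace(0.0, 1.0, m, retstep=True)
low = np.tril(np.ones((m, m)))                  # Theta(t'-t), with Theta(0)=1
a0 = np.cos(ts)[:, None] * low
a1 = np.cos(ts)[None, :] * low
b1 = (ts[:, None] - ts[None, :]) * low
R11 = pathsum_R11([a0, a1], [b1], dt)
U11 = (low @ R11 * dt)[:, 0]                    # Theta * R, i.e. e_1^H U_2(t',0) e_1
print(U11[-1])
```

The innermost level corresponds to \((1_*-\alpha _{n-1})^{*-1}\), each outer level wraps it as in (2.17), and the final left \(*\)-multiplication by \(\Theta \) yields the approximation \(\varvec{e}_1^H\mathsf {U}_n(t',t)\,\varvec{e}_1\) discussed above.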

For \(n=N\), we get

$$\begin{aligned} \mathsf {V}_N * \mathsf {W}_N^H = \mathsf {W}_N^H * \mathsf {V}_N = \mathsf {Id} \, 1_*. \end{aligned}$$
(2.18)

Theorem 2.2

Let \(\mathsf {A},\mathsf {V}_N,\mathsf {W}_N\) and \(\mathsf {T}_N\) be as described above, then

$$\begin{aligned} \mathsf {A}^{*j} = \mathsf {V}_N *\mathsf {T}_N^{*j} *\mathsf {W}_N^H, \quad j = 0,1, \dots , \end{aligned}$$

and thus

$$\begin{aligned} \mathsf {R}_*(\mathsf {A}) = \mathsf {V}_N *\mathsf {R}_*(\mathsf {T}_N) *\mathsf {W}_N^H. \end{aligned}$$

The theorem follows by using (2.18). Here, any entry of \(\mathsf {R}_*(\mathsf {T}_N)\) is computable using a path-sum continued fraction of depth at most N.

Remark 2.3

The Lanczos-like method presented here for the time-ordered exponential is immediately valid for the ordinary matrix exponential function, since the latter is obtained from the former in the situation where \(\mathsf {A}\) commutes with itself at all times,

$$\begin{aligned} \mathcal {T}e^{\int _t^{t'} \mathsf {A}(\tau ) \, \text {d}\tau } = e^{\int _t^{t'} \mathsf {A}(\tau ) \, \text {d}\tau }. \end{aligned}$$

This situation includes the case where \(\mathsf {A}\) is time-independent, in which case setting \(t=0\) and \(t'=1\) above yields the matrix exponential of \(\mathsf {A}\). However, the \(*\)-Lanczos algorithm cannot be considered a generalization of the Lanczos algorithm, since its outputs on constant matrices are made of distributions and time-dependent functions.

2.3 Matching \(*\)-moments through \(*\)-biorthonormal polynomials

In order to prove Theorem 2.1, we will exploit the connection between the \(*\)-Lanczos algorithm and families of \(*\)-biorthonormal polynomials. Let us define the set of \(*\)-polynomials

$$\begin{aligned} \mathcal {P}_* := \left\{ p(\lambda ) = \sum _{j=0}^k \lambda ^{*j}*\gamma _j(t',t) \right\} , \end{aligned}$$

with \(\gamma _j(t',t) \in \text {D}(I)\). Consider a \(*\)-sesquilinear form \([\cdot , \cdot ]: \mathcal {P}_* \times \mathcal {P}_* \rightarrow \text {D}(I)\), i.e., so that given \(p_1,p_2,q_1,q_2 \in \mathcal {P}_*\) and \(\alpha , \beta \in \text {D}(I)\), it satisfies

$$\begin{aligned}{}[q_1*\alpha , p_1*\beta ]&= \bar{\alpha }*[q_1,p_1]*\beta , \\ [q_1 + q_2, p_1 + p_2]&= [q_1, p_1] + [q_2, p_1] + [q_1, p_2] + [q_2, p_2]. \end{aligned}$$

From now on we assume that every considered \(*\)-sesquilinear form \([\cdot , \cdot ]\) also satisfies

$$\begin{aligned}{}[\lambda *q,p] = [q, \lambda *p]. \end{aligned}$$
(2.19)

The \(*\)-sesquilinear form \([\cdot ,\cdot ]\) is determined by its \(*\)-moments defined as

$$\begin{aligned} m_j(t',t):= [\lambda ^{*j},1_*] = [1_*, \lambda ^{*j}], \quad j=0,1,\dots \,. \end{aligned}$$

We aim to build sequences of \(*\)-polynomials \(p_0, p_1, \dots \) and \(q_0, q_1, \dots \) so that they are \(*\)-biorthonormal with respect to \([\cdot ,\cdot ]\), i.e.,

$$\begin{aligned}{}[q_i, p_j] = \delta _{ij} 1_*, \end{aligned}$$
(2.20)

where the subindex j in \(p_j\) and \(q_j\) corresponds to the degree of the \(*\)-polynomial. Here and in the following we assume \(m_0 = 1_*\), getting \(p_0 = q_0 = 1_*\). Consider the \(*\)-polynomial

$$\begin{aligned} q_1(\lambda ) = \lambda * q_0(\lambda ) - q_0(\lambda )*\bar{\alpha }_0. \end{aligned}$$

The orthogonality conditions (2.20) give \(\alpha _0 = [\lambda *q_0, p_0]\). Similarly, we get the \(*\)-polynomial

$$\begin{aligned} p_1(\lambda ) * \beta _1 = \lambda * p_0(\lambda ) - p_0(\lambda ) * \alpha _0, \end{aligned}$$

with \(\alpha _0 = [q_0, \lambda *p_0], \beta _1 = [q_1, \lambda *p_0]\). Repeating the \(*\)-orthogonalization process, we obtain the three-term recurrences for \(n=1,2,\dots \)

$$\begin{aligned} q_n(\lambda )&= \lambda * q_{n-1}(\lambda ) - q_{n-1}(\lambda )*\bar{\alpha }_{n-1} - q_{n-2}(\lambda )*\bar{\beta }_{n-1} \end{aligned}$$
(2.21a)
$$\begin{aligned} p_n(\lambda )*\beta _n&= \lambda * p_{n-1}(\lambda ) - p_{n-1}(\lambda )*\alpha _{n-1} - p_{n-2}(\lambda ), \end{aligned}$$
(2.21b)

with \(p_{-1} = q_{-1} = 0\) and

$$\begin{aligned} \alpha _{n-1} = [q_{n-1}, \lambda * p_{n-1}], \quad \beta _{n} = [q_n, \lambda * p_{n-1}]. \end{aligned}$$
(2.22)

Note that deriving the recurrences needs property (2.19). The \(*\)-biorthonormal polynomials \(p_0, \dots , p_n\) and \(q_0, \dots , q_n\) exist under the assumption that \(\beta _1, \dots , \beta _n\) are \(*\)-invertible.

Let \(\mathsf {A}\) be a time-dependent matrix, and \(\varvec{w},\varvec{v}\) time-independent vectors such that \(\varvec{w}^H\varvec{v} = 1\). Consider the \(*\)-sesquilinear form \([\cdot , \cdot ]\) defined by

$$\begin{aligned}{}[q, p] = \varvec{w}^H \, q^D(\mathsf {A}) * p(\mathsf {A}) \, \varvec{v}. \end{aligned}$$

Assume that there exist \(*\)-polynomials \(p_0, \dots , p_n\) and \(q_0,\dots ,q_n\) \(*\)-biorthonormal with respect to \([\cdot , \cdot ]\). Defining the vectors

$$\begin{aligned} \varvec{v}_j = p_j(\mathsf {A})\,\varvec{v}, \qquad \varvec{w}^H_j = \varvec{w}^H\, q_j^D(\mathsf {A}), \end{aligned}$$

and using the recurrences (2.21) gives the \(*\)-Lanczos recurrences (2.10). Moreover, the coefficients in (2.22) are the \(*\)-Lanczos coefficients in (2.11).

Let \(\mathsf {T}_n\) be a tridiagonal matrix as in (2.12) composed of the coefficients (2.22) associated with the \(*\)-sesquilinear form \([\cdot ,\cdot ]\). Then we can define the \(*\)-sesquilinear form

$$\begin{aligned}{}[q, p]_n = \varvec{e}_1^H \, q^D(\mathsf {T}_n) * p(\mathsf {T}_n) \, \varvec{e}_1. \end{aligned}$$

The following lemmas show that

$$\begin{aligned} m_j = [\lambda ^{*j},1_*] = [\lambda ^{*j},1_*]_n, \quad j=0,\dots ,2n-1, \end{aligned}$$

proving Theorem 2.1.

Lemma 2.4

Let \(p_0,\dots ,p_n\) and \(q_0,\dots ,q_n\) be \(*\)-biorthonormal polynomials with respect to the \(*\)-sesquilinear form \([\cdot ,\cdot ]\). Assume that the coefficients \(\beta _1,\dots , \beta _n\) in the related recurrences (2.21) are \(*\)-invertible. Then the \(*\)-polynomials are also \(*\)-biorthonormal with respect to the form \([\cdot ,\cdot ]_n\) defined above.

Proof

Consider the vectors \(\varvec{y}_j^H = \varvec{e}_1^H \, \mathsf {T}_n^{*j}\) and \(\varvec{x}_j= \mathsf {T}_n^{*j} \, \varvec{e}_1\). Since the matrix \(\mathsf {T}_n\) is tridiagonal, for \(j=1,\dots ,n-1\), we have

$$\begin{aligned} \varvec{e}_i^H \varvec{x}_j = 0, \text { for } i \ge j+2,&\quad \text { and } \quad \varvec{e}_{j+1}^H \varvec{x}_j = \beta _j * \cdots * \beta _{1}, \\ \varvec{y}_j^H \varvec{e}_i = 0, \text { for } i \ge j+2,&\quad \text { and } \quad \varvec{y}_j^H \varvec{e}_{j+1} = 1_* \,. \end{aligned}$$

By assumption, the product \(\beta _j * \cdots * \beta _1\) is \(*\)-invertible. Therefore there exist \(*\)-polynomials \(\widehat{p}_0,\dots ,\widehat{p}_{n-1}\) and \(\widehat{q}_0,\dots ,\widehat{q}_{n-1}\) so that, for \(i=0,\dots ,n-1\), we get

$$\begin{aligned} 1_* \varvec{e}_{i+1}^H = \varvec{e}_1^H\,\widehat{q}_i^D(\mathsf {T}_n), \qquad 1_* \varvec{e}_{i+1} = \widehat{p}_i(\mathsf {T}_n)\, \varvec{e}_1. \end{aligned}$$

Such \(*\)-polynomials are \(*\)-biorthonormal with respect to \([\cdot ,\cdot ]_n\) since they satisfy

$$\begin{aligned}{}[\widehat{q}_i, \widehat{p}_j]_n = 1_*\varvec{e}_{i+1}^H *1_* \varvec{e}_{j+1} = \delta _{ij} 1_*. \end{aligned}$$

Moreover, for \(i=0,\dots ,n-1\), the corresponding recurrence coefficients (2.22) are the same as the ones of the \(*\)-polynomials \(p_0,\dots ,p_{n-1}\) and \(q_0,\dots ,q_{n-1}\). Indeed,

$$\begin{aligned} \widehat{\alpha }_{i-1}&= [\widehat{q}_{i-1}, \lambda *\widehat{p}_{i-1}]_n = \varvec{e}_{i}^H \mathsf {T}_n \, \varvec{e}_{i} = \alpha _{i-1}, \\ \widehat{\beta }_{i}&= [\widehat{q}_i, \lambda *\widehat{p}_{i-1}]_n = \varvec{e}_{i+1}^H \, \mathsf {T}_n \varvec{e}_{i} = \beta _{i}. \end{aligned}$$

Since \(\widehat{p}_0 = p_0 = \widehat{q}_0 = q_0 = 1_*\), we get \(\widehat{p}_i = p_i\) and \(\widehat{q}_i = q_i\) for \(i=0,\dots ,n-1\). \(\square \)

Lemma 2.5

Let \(p_0,\dots ,p_{n-1}\) and \(q_0,\dots ,q_{n-1}\) be \(*\)-biorthonormal polynomials with respect to a \(*\)-sesquilinear form \([\cdot ,\cdot ]_A\) and to a \(*\)-sesquilinear form \([\cdot ,\cdot ]_B\). If \([1_*,1_*]_A = [1_*,1_*]_B = 1_*\), then \([\lambda ^{*j},1_*]_A = [\lambda ^{*j},1_*]_B\) for \(j=0,\dots ,2n-1\).

Proof

We prove it by induction. Let \(m_j = [\lambda ^{*j},1_*]_A\) and \(\widehat{m}_j = [\lambda ^{*j},1_*]_B\) for \(j=0,1,\dots , 2n-1\). By the expression for the coefficients in (2.22) we get

$$\begin{aligned}{}[q_{0}, \lambda *p_{0}]_A = \alpha _0 = [q_{0}, \lambda *p_{0}]_B. \end{aligned}$$

Hence \(m_1 = \alpha _0 = \widehat{m}_1\). Assuming \(m_j = \widehat{m}_j\) for \(j=0,\dots ,2k-3\), we will prove that \(m_{2k-2} = \widehat{m}_{2k-2}\) and \(m_{2k-1} = \widehat{m}_{2k-1}\), for \(k = 2,\dots ,n\). The coefficient expressions in (2.22) give

$$\begin{aligned}{}[q_{k-1}, \lambda *p_{k-2}]_A = \beta _{k-1} = [q_{k-1}, \lambda *p_{k-2}]_B, \end{aligned}$$

which can be rewritten as

$$\begin{aligned} \sum _{i=0}^{k-1}\sum _{j=0}^{k-2} \bar{a}_i * m_{i+j+1} * b_j = \sum _{i=0}^{k-1}\sum _{j=0}^{k-2} \bar{a}_i * \widehat{m}_{i+j+1} * b_j, \end{aligned}$$

with \(a_i, b_j\) the coefficients respectively of \(q_{k-1}\) and \(p_{k-2}\). The inductive assumption implies

$$\begin{aligned} \bar{a}_{k-1} * m_{2k-2} * b_{k-2} = \bar{a}_{k-1} * \widehat{m}_{2k-2} * b_{k-2}. \end{aligned}$$

The leading coefficients of the \(*\)-polynomials \(q_{k-1}\) and \(p_{k-2}\) are respectively \(a_{k-1}= 1_*\) and \(b_{k-2}=(\beta _{k-2} * \cdots *\beta _{1})^{*-1}\). Hence \(m_{2k-2} = \widehat{m}_{2k-2}\). Repeating the same argument with the coefficient \(\alpha _{k-1}\) in (2.22) concludes the proof. \(\square \)

3 Convergence, breakdown, and related properties

3.1 The convergence behavior of intermediate approximations

Assuming no breakdown, the \(*\)-Lanczos algorithm in conjunction with the path-sum method converges to the solution \(\varvec{w}^H\mathsf {U}(t',t)\varvec{v}\) in N iterations, with N the size of \(\mathsf {A}\); see Theorem 2.2. Most importantly, intermediate \(*\)-Lanczos iterations provide a sequence of approximations \(\int _t^{t'} \mathsf {R}_{*}(\mathsf {T}_n)_{1,1}(\tau ,t)\, \text {d}\tau \), \(n=1,\dots ,N\), whose convergence behavior we analyze hereafter.

As already discussed, under the assumption that all entries of \(\mathsf {A}\) are smooth over I and that \(\beta _j^{(1,0)}(t,t)\ne 0\) for every \(t \in I\), all the \(\alpha _j\) and \(\beta _j\) distributions are elements of \(\text {Sm}_\Theta (I)\). The proof of this statement is very long and technical and serves only to establish the theoretical feasibility of tridiagonalization for systems of coupled linear differential equations with variable coefficients using smooth functions. It was therefore presented in a separate work. We refer the reader to [17] for a full exposition and proof, while here we only state some of the main results:

Theorem 3.1

[17] Let \(\mathsf {A}(t',t)= \widetilde{\mathsf {A}}(t')\Theta (t'-t)\) be an \(N\times N\) matrix composed of elements from \(\text {Sm}_\Theta (I)\). Let \(\alpha _{n-1}\) and \(\beta _n\) be the coefficients generated by Algorithm 1 running on \(\mathsf {A}\) and the time-independent vectors \(\varvec{w}, \varvec{v}\) (\(\varvec{w}^H \varvec{v}=1\)). Let

$$\begin{aligned} \beta _j^{(1,0)}(t',t) := \frac{\partial }{\partial t} \beta _j(t',t), \quad t',t \in I. \end{aligned}$$

For any \(1 \le n \le N\), assuming that \(\beta _j^{(1,0)}(t,t) \ne 0\) for every \(t \in I\), \(j=1,\dots ,n-1\), we get \(\beta _n(t',t), \alpha _{n-1}(t',t) \in \text {Sm}_\Theta (I)\). In addition, all required \(*\)-inverses \(\beta _{j}^{*-1}\) exist and are of the form

$$\begin{aligned} \beta _{j}^{*-1} = b_j^L(t',t) *\delta ^{(3)}(t'-t), \end{aligned}$$

with \(b_j^{L}\in \text {Sm}_\Theta (I)\).

All \(b_j^{L}\) have explicit expansions in terms of \(\beta _j\) which are given in [17] but not reproduced here owing to length concerns. When \(\beta _n \not \equiv 0\), but \(\beta _n^{(1,0)}(\rho ,\rho )=0\) for some \(\rho \in I\), the results of Theorem 3.1 hold if we restrict the initial interval I to a subinterval \(J \subset I\) so that \(\rho \notin J\).

Thanks to these regularity results, we can establish the following bound for the approximation error:

Proposition 3.2

Let us consider the setting and assumptions of Theorem 3.1. Moreover, let \(\mathsf {U}\) designate the time-ordered exponential of \(\mathsf {A}\) and let \(\mathsf {T}_n\) be the tridiagonal matrix (2.12) such that

$$\begin{aligned} \varvec{w}^H\mathsf {A}^{*j}\varvec{v}= (\mathsf {T}_n^{*j})_{1,1}, \quad \text { for } \quad j=0,\dots , 2n-1. \end{aligned}$$

Then, for \(t'\ge t\), with \(t', t \in I\),

$$\begin{aligned} \left|\varvec{w}^H\mathsf {U}(t',t)\varvec{v}-\int _t^{t'} \mathsf {R}_{*}(\mathsf {T}_n)_{1,1}(\tau ,t)\, \text {d}\tau \right|\le \frac{C^{2n}+D_n^{2n}}{(2n)!}(t'-t)^{2n}e^{(C+D_n)(t'-t)}. \end{aligned}$$

Here

$$\begin{aligned} C:=\sup _{t'\in I}\Vert {\mathsf {A}}(t')\Vert _{\infty }, \quad D_n:=3 \, \sup _{t',t\in I^2}\max _{0\le j\le n-1}\big \{\vert {\alpha }_j(t',t)\vert ,\vert {\beta }_j(t',t)\vert \big \} \end{aligned}$$

are both finite, with \(\Vert \cdot \Vert _\infty \) the matrix norm induced by the uniform norm.

Proof

Assume \(t' \ge t\). Observe that

$$\begin{aligned} \varvec{w}^H\mathsf {U}(t',t)\varvec{v}-\int _t^{t'} \!\!\mathsf {R}_{*}(\mathsf {T}_n)(\tau ,t)_{11}\, \text {d}\tau =\int _t^{t'}\!\!\sum _{j=2n}^{\infty }\varvec{w}^H\mathsf {A}^{*j}(\tau ,t)\varvec{v}-(\mathsf {T}_n^{*j})_{1,1}(\tau ,t) \, \text {d}\tau , \end{aligned}$$

so that

$$\begin{aligned} \left|\varvec{w}^H\mathsf {U}(t',t)\varvec{v}-\!\int _t^{t'} \!\!\mathsf {R}_{*}(\mathsf {T}_n)(\tau ,t)_{11}\,\text {d}\tau \right|&\le \int _t^{t'}\!\!\sum _{j=2n}^{\infty }\left|\varvec{w}^H\mathsf {A}^{*j}(\tau ,t)\varvec{v}\right|+\left|(\mathsf {T}_n^{*j})_{1,1}(\tau ,t)\right|\, \text {d}\tau . \end{aligned}$$

Now \(\sup _{t'\in I}\vert \varvec{w}^H{\mathsf {A}}(t')\varvec{v}\vert \le C\) and

$$\begin{aligned} \left|\int _t^{t'}\varvec{w}^H\mathsf {A}^{*j}(\tau ,t)\varvec{v}\, \text {d}\tau \right|\le \Theta (t'-t) *\big (C\Theta (t'-t)\big )^{*j} = C^{j} \frac{(t'-t)^{j}}{j!}. \end{aligned}$$

We proceed similarly for the terms involving \(\mathsf {T}_n\). Theorem 3.1 implies the existence of \(\widehat{D}_n := \sup _{t',t\in I^2}\max _{0\le j\le n-1}\big \{\vert {\alpha }_j(t',t)\vert ,\vert {\beta }_j(t',t)\vert \big \} < +\infty \). The matrix element \((\mathsf {T}_n^{*j})_{1,1}\) is given by the sum of \(*\)-products of coefficients \(\alpha _i, \beta _i\) and \(1_*\). Replacing all the factors in those \(*\)-products with \(\widehat{D}_n \Theta (t'-t)\) gives an upper bound for \((\mathsf {T}_n^{*j})_{1,1}\). Hence we get

$$\begin{aligned} \left|\left( \mathsf {T}_n^{*j}\right) _{1,1}\right|\le \left( \left( \widehat{D}_n \mathsf {P}_n\Theta (t'-t)\right) ^{*j} \right) _{1,1} \le \widehat{D}_n^j \Vert \mathsf {P}_n\Vert _{\infty }^j \Theta (t'-t)^{*j}, \end{aligned}$$

where \(\mathsf {P}_n\) is the \(n\times n\) tridiagonal matrix whose nonzero entries are equal to 1. Note that \(\Vert \mathsf {P}_n\Vert _{\infty }=3\). Hence the error can be bounded by

$$\begin{aligned} \sum _{j=2n}^\infty \left( C^{j}+ D_n^j\right) \frac{(t'-t)^{j}}{j!}&\le \frac{(C^{2n}+ D_n^{2n})}{(2n)!}(t'-t)^{2n} \sum _{j=0}^\infty \frac{(2n)!}{({2n+j})!} \left( C^{j}+ D_n^j\right) (t'-t)^{j}, \\&\le \frac{(C^{2n}+ D_n^{2n})}{(2n)!}(t'-t)^{2n} \sum _{j=0}^\infty \frac{(C+ D_n)^j(t'-t)^{j}}{j!}, \\&\le \frac{(C^{2n}+ D_n^{2n})}{(2n)!}(t'-t)^{2n} e^{(C+D_n)(t'-t)}, \end{aligned}$$

concluding the proof. \(\square \)

Analogously to the classical non-Hermitian Lanczos algorithm, we further need to assume that \(D_n\) is not too large for \(n\ge 1\) in order to get a meaningful bound. Such an assumption can be verified a posteriori. The bound of Proposition 3.2 demonstrates that, under reasonable assumptions, the approximation error decays super-linearly. Assuming no breakdown and exact arithmetic, we also recall that the algorithm necessarily converges in at most N steps, independently of the error bound. The computational cost of the algorithm is discussed separately in Sect. 5.
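As a rough illustration of this super-linear decay, one may evaluate the right-hand side of the bound in Proposition 3.2 for increasing n (the values of C and \(D_n\) below are purely illustrative and not tied to any particular matrix):

```python
from math import exp, factorial

def bound(n, C, Dn, dt):
    """Right-hand side of the bound in Proposition 3.2, with dt = t' - t."""
    return (C**(2 * n) + Dn**(2 * n)) / factorial(2 * n) * dt**(2 * n) * exp((C + Dn) * dt)

for n in range(1, 6):
    print(n, bound(n, C=2.0, Dn=3.0, dt=0.5))   # decays super-linearly in n
```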

3.2 Breakdown

In the classical non-Hermitian Lanczos algorithm, a breakdown appears either when an invariant Krylov subspace is produced (lucky breakdown) or when the last vectors of the biorthogonal bases \(\varvec{v}_n, \varvec{w}_n\) are nonzero, but \(\varvec{w}_n^H\varvec{v}_n = 0\) (serious breakdown); for further details refer, e.g., to [33, 34, 39, 46,47,48]. Analogously, in the \(*\)-Lanczos Algorithm 1, a lucky breakdown arises when either \(\varvec{w}_n \equiv 0\) or \(\widehat{\varvec{v}}_n \equiv 0\). In such a case, the algorithm has converged to the solution, as the following proposition shows.

Proposition 3.3

Assume that the \(*\)-Lanczos Algorithm 1 does not break down before the nth step, at which a lucky breakdown arises, i.e., \(\widehat{\varvec{v}}_n \equiv 0\) (or \(\varvec{w}_n \equiv 0\)). Then

$$\begin{aligned} \varvec{w}^H (\mathsf {A}^{*j})\, \varvec{v}&= \varvec{e}_1^H (\mathsf {T}_n^{*j})\,\varvec{e}_1, \quad \text { for } j\ge 0,\\ \varvec{w}^H\mathsf {U}(t',t)\varvec{v}&= \int _t^{t'}\mathsf {R}_*(\mathsf {T}_n)_{1,1}(\tau ,t)\,\text {d}\tau . \end{aligned}$$

Proof

Assuming \(\widehat{\varvec{v}}_n \equiv 0\), eq. (2.13) becomes

$$\begin{aligned} \mathsf {A} * \mathsf {V}_n = \mathsf {V}_n * \mathsf {T}_n. \end{aligned}$$
(3.1)

By \(*\)-multiplying (3.1) from the left by \(\mathsf {A}\), we get

$$\begin{aligned} \mathsf {A} * \mathsf {A} * \mathsf {V}_n = \mathsf {A} *\mathsf {V}_n * \mathsf {T}_n, \end{aligned}$$

which becomes

$$\begin{aligned} \mathsf {A}^{*2} \, * \mathsf {V}_n = \mathsf {V}_n * \mathsf {T}_n^{*2}, \end{aligned}$$

using (3.1). Repeating the argument gives

$$\begin{aligned} \mathsf {A}^{*j} \, * \mathsf {V}_n = \mathsf {V}_n * \mathsf {T}_n^{*j}, \quad j \ge 0, \end{aligned}$$

from which we get that \(\mathsf {W}_n^H * \mathsf {A}^{*j} \, * \mathsf {V}_n = \mathsf {T}_n^{*j}\), \(j\ge 0\). The proposition easily follows from here. An analogous proof holds for the case \(\widehat{\varvec{w}}_n \equiv 0\).

\(\square \)

The \(*\)-Lanczos algorithm construction and its polynomial interpretation in Sect. 2.3 suggest that it may be possible to deal with the serious breakdown issue by a look-ahead strategy analogous to the one for the non-Hermitian Lanczos algorithm; see, e.g., [33, 34]. Consider, for instance, the approximation of the entry \(\varvec{e}_i^H\mathsf {U}\varvec{e}_j\), i.e., \(\varvec{w}=\varvec{e}_i\) and \(\varvec{v}=\varvec{e}_j\). If \(i\ne j\), then \(\varvec{w}^H\varvec{v} = 0\), which does not satisfy the \(*\)-Lanczos assumption. Moreover, if \(i=j\) and \(\mathsf {A}\) is a sparse non-Hermitian matrix, then it may be possible that \(\mathsf {A}_{ii} \equiv 0\) and \((\mathsf {A}^{*2})_{ii} \equiv 0\). As a consequence, we get \(\beta _1 \equiv 0\). We can try to fix these problems by rewriting the approximation of the time-ordered exponential \(\mathsf {U}\) as

$$\begin{aligned} \varvec{e}_i^H\mathsf {U}\varvec{e}_j = (\varvec{e} + \varvec{e}_i)^H \mathsf {U} \varvec{e}_j - \varvec{e}^H \mathsf {U} \varvec{e}_j, \end{aligned}$$

with \(\varvec{e} = (1, \dots , 1)^H\). Then one can approximate \((\varvec{e} + \varvec{e}_i)^H \mathsf {U} \, \varvec{e}_j\) and \(\varvec{e}^H \mathsf {U} \, \varvec{e}_j\) separately; these computations are less likely to suffer a breakdown, thanks to the fact that \(\varvec{e}\) is a full vector; see, e.g., [35, Section 7.3]. Note that while this strategy can prevent a breakdown in the first iterations, a serious breakdown might still happen in later ones.

4 Examples

In this section, we use the \(*\)-Lanczos Algorithm 1 on examples in ascending order of difficulty. All the computations have been performed using Mathematica 11.

Example 4.1

(Ordinary matrix exponential) Let us first consider a constant matrix

$$\begin{aligned} \mathsf {A}= \begin{pmatrix} -1 &{} 1 &{} 1 \\ 1 &{} 0 &{} 1 \\ 1 &{} 1 &{} -1 \end{pmatrix}. \end{aligned}$$

Because \(\mathsf {A}\) commutes with itself at all times, its time-ordered exponential coincides with its ordinary exponential, \(\mathcal {T}e^{\int _0^{t'} \mathsf {A}(\tau )\, \text {d}\tau }\equiv e^{\mathsf {A}t'}\) (we set \(t=0\)). Note that the matrix chosen here is symmetric only to lead to concise expressions suitable for presentation in an article, e.g.,

$$\begin{aligned} \big (e^{\mathsf {A}t'}\big )_{11}=-\frac{1}{2} \sinh (2 t')+\frac{1}{2} \cosh (2 t')+\frac{1}{2} \cosh \left( \sqrt{2} t'\right) , \end{aligned}$$
(4.1)

and that such symmetries are not a requirement of the \(*\)-Lanczos approach.

Now let us find the result of Eq. (4.1) with Algorithm 1. We define \(\varvec{w}:=\varvec{v}^H:=(1,0,0)\), \(\varvec{w}_0=\varvec{w}1_*\), \(\varvec{v}_0=\varvec{v}1_*\), from which it follows that \(\alpha _0(t',t) = -1\times \Theta (t'-t)\) and \(\varvec{w}_1=\widehat{\varvec{v}}_1^H=(0,1,1)\Theta (t'-t)\). Furthermore, since \(\mathsf {A}\) is a constant matrix times \(\Theta (t'-t)\), we have \(\mathsf {A}^{*n} = \tilde{\mathsf {A}}^n \times \Theta ^{*n}(t'-t) = \tilde{\mathsf {A}}^n\times (t'-t)^{n-1}/(n-1)!\times \Theta (t'-t)\) and similarly \(\alpha _0^{*2}(t',t)=\tilde{\alpha }_0^2\times (t'-t)\Theta (t'-t)\). Thus

$$\begin{aligned}&\beta _1=\varvec{w}^H\mathsf {A}^2\varvec{v}\times (t'-t)\Theta (t'-t)-\tilde{\alpha }_0^2(t'-t)\Theta (t'-t) = 2(t'-t)\Theta (t'-t). \end{aligned}$$

The \(*\)-inverse follows as \(\beta _1^{*-1}=\frac{1}{2} \delta ''\) [18], from which we get

$$\begin{aligned} \varvec{v}_1 = \widehat{\varvec{v}}_1*\beta _1^{*-1} = (0,1,1)^H\frac{1}{2}\delta '(t'-t). \end{aligned}$$

Now it follows that

$$\begin{aligned} \alpha _1(t',t)&=\varvec{w}_1*\mathsf {A}*\varvec{v}_1=\frac{1}{2}\Theta (t'-t),\\ \varvec{w}_2(t',t)&=\varvec{w}_1*\mathsf {A}-\alpha _1*\varvec{w}_1-\beta _1*\varvec{w}_0=(0,1,-1)\frac{1}{2}(t'-t)\Theta (t'-t),\\ \widehat{\varvec{v}}_2(t',t)&=\mathsf {A}*\varvec{v}_1- \varvec{v}_1*\alpha _1-\varvec{v}_0=(0,1,-1)^H\frac{1}{4}\delta (t'-t),\\ \beta _2&=\varvec{w}_2*\mathsf {A}*\varvec{v}_1 = \frac{1}{4}(t'-t)\Theta (t'-t). \end{aligned}$$

Then \(\beta _2^{*-1}=4 \delta ''\) and so

$$\begin{aligned} \varvec{v}_2&= \widehat{\varvec{v}}_2*\beta _2^{*-1}=(0,1,-1)^H\delta ''(t'-t), \\ \alpha _2&=\varvec{w}_2*\mathsf {A}*\varvec{v}_2=-\frac{3}{2}\Theta (t'-t). \end{aligned}$$

At this point we have determined the \(*\)-Lanczos matrices \(\mathsf {T}\), \(\mathsf {V}\) and \(\mathsf {W}\) entirely

$$\begin{aligned} \mathsf {T}=\begin{pmatrix} -\Theta &{} \delta &{} 0 \\ 2 \Theta ^{*2} &{} \frac{1}{2}\Theta &{} \delta \\ 0 &{} \frac{1}{4}\Theta ^{*2} &{} -\frac{3}{2}\Theta \\ \end{pmatrix}, \quad \mathsf {V}=\begin{pmatrix} \delta &{} 0 &{} 0 \\ 0 &{} \frac{1}{2} \delta ' &{} \delta '' \\ 0 &{} \frac{1}{2} \delta ' &{} -\delta '' \end{pmatrix}, \quad \mathsf {W}^{H}=\begin{pmatrix} \delta &{} 0 &{} 0 \\ 0 &{} \Theta &{} \Theta \\ 0 &{} \frac{1}{2}\Theta ^{*2} &{} -\frac{1}{2}\Theta ^{*2} \end{pmatrix}. \end{aligned}$$

In all of these expressions, \(\Theta \) is a short-hand notation for \(\Theta (t'-t)\) and \(\delta \), \(\delta '\) and \(\delta ''\) are to be evaluated in \(t'-t\). It is now straightforward to verify the matching moment property \(\big (\mathsf {T}^{*j}\big )_{11}=\big (\mathsf {A}^{*j}\big )_{11}\) for all \(j\in \mathbb {N}\). We can also check directly that the time-ordered exponential of \(\mathsf {A}\) is correctly determined from \(\mathsf {T}\), using either the general formula of Eq. (2.17) or, because the situation is so simple that all entries depend only on \(t'-t\), a Laplace transform with respect to \(t'-t\). The latter gives \(\mathsf {T}(s)\), and the inverse Laplace transform of the resolvent \(\big (\mathsf {I}-\mathsf {T}(s)\big )^{-1}_{11}\) is the desired quantity. Both procedures give the same result, namely the derivative of \(\big (e^{\mathsf {A}t'}\big )_{11}\), as it should be [26], i.e.,

$$\begin{aligned} (\mathsf {Id}_*-\mathsf {T})^{*-1}_{11}(t',0) = \left( \sinh (2 t')+\frac{1}{\sqrt{2}}\sinh \left( \sqrt{2} t'\right) -\cosh (2 t')\right) \Theta (t'), \end{aligned}$$

which is indeed the derivative of Eq. (4.1).
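The closed forms above are easily double-checked symbolically; for instance, the following sympy sketch verifies Eq. (4.1) and confirms that the expression just obtained is indeed its derivative with respect to \(t'\):

```python
import sympy as sp

tp = sp.symbols("tp", positive=True)
A = sp.Matrix([[-1, 1, 1], [1, 0, 1], [1, 1, -1]])

e11 = (A * tp).exp()[0, 0]                      # (exp(A t'))_{11}
rhs41 = (-sp.sinh(2 * tp) + sp.cosh(2 * tp) + sp.cosh(sp.sqrt(2) * tp)) / 2
print(sp.simplify((e11 - rhs41).rewrite(sp.exp)))            # -> 0, i.e. Eq. (4.1)

res11 = sp.sinh(2 * tp) + sp.sinh(sp.sqrt(2) * tp) / sp.sqrt(2) - sp.cosh(2 * tp)
print(sp.simplify(sp.diff(rhs41, tp) - res11))               # -> 0
```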

Example 4.2

(Time-ordered exponential of a time-dependent matrix) In this example, we consider the \(5\times 5\) time-dependent matrix \(\mathsf {A}(t',t)=\tilde{\mathsf {A}}(t')\Theta (t'-t)\) with

$$\begin{aligned} \tilde{\mathsf {A}}(t') = \left( \begin{array}{ccccc} \cos (t') &{} 0 &{} 1 &{} 2 &{} 1 \\ 0 &{} \cos (t')-t' &{} 1-3 t' &{} t' &{} 0 \\ 0 &{} t' &{} 2 t'+\cos (t') &{} 0 &{} 0 \\ 0 &{} 1 &{} 2 t'+1 &{} t'+\cos (t') &{} t' \\ t' &{} -t'-1 &{} -6 t'-1 &{} 1-2 t' &{} \cos (t')-2 t' \\ \end{array} \right) . \end{aligned}$$

The matrix \(\tilde{\mathsf {A}}\) does not commute with itself at different times, i.e., \(\tilde{\mathsf {A}}(t')\tilde{\mathsf {A}}(t)-\tilde{\mathsf {A}}(t)\tilde{\mathsf {A}}(t')\ne 0\), and the corresponding differential system Eq. (1.1) has no known analytical solution. We use Algorithm 1 to determine the tridiagonal matching moment matrix \(\mathsf {T}\) such that \(\big (\mathsf {A}^{*j}\big )_{11}=\big (\mathsf {T}^{*j}\big )_{11}\) for \(j \in \mathbb {N}\). We define \(\varvec{w}:=\varvec{v}^H:=(1,0,0,0,0)\), \(\varvec{w}_0=\varvec{w}1_*\), \(\varvec{v}_0=\varvec{v}1_*\), from which it follows that

$$\begin{aligned} \alpha _0(t',t)&= \cos (t')\Theta (t'-t),\\ \varvec{w}_1&=\big (0,0,1,2,1\big )\Theta (t'-t),\\ \widehat{\varvec{v}}_1&=\big (0,0,0,0,t'\big )^H\Theta (t'-t),\\ \beta _1(t',t)&=\frac{1}{2} \left( t'^2-t^2\right) \Theta (t'-t). \end{aligned}$$

Observing that \(\beta _1 = \Theta (t'-t)*t'\Theta (t'-t)\), we get \(\beta _1^{*-1} = \frac{1}{t} \delta '(t'-t)*\delta '(t'-t)=-\frac{1}{t^2}\delta '(t'-t)+\frac{1}{t}\delta ''(t'-t)\), so that

$$\begin{aligned} \varvec{v}_1=\widehat{\varvec{v}}_1*\beta _1^{*-1}=\big (0,\,0,\,0,\,0,\,1\big )^H\delta '(t'-t), \end{aligned}$$

which terminates the initialization phase of the Algorithm. We proceed with

$$\begin{aligned} \alpha _1(t',t)&=\varvec{w}_1*\mathsf {A}*\varvec{v}_1 = \cos (t)\Theta (t'-t),\\ \varvec{w}_2&=\varvec{w}_1*\mathsf {A}-\alpha _1*\varvec{w}_1-\beta _1*\varvec{w}_0,\\&=\big (0,\,t'-t,\,t'-t,\,t'-t,\, 0\big )\Theta (t'-t),\\ \widehat{\varvec{v}}_2&=\mathsf {A}*\varvec{v}_1-\varvec{v}_1*\alpha _1 -\varvec{v}_0= \big (0,\,0,\,0,\,t,\,-2t\big )^H\delta (t'-t),\\ \beta _2&=\varvec{w}_2*\mathsf {A}*\varvec{v}_1 =t (t'-t)\Theta (t'-t). \end{aligned}$$

As we did for \(\beta _1\), we factorize \(\beta _2 = \Theta (t'-t) *t\, \Theta (t'-t)\) so that its \(*\)-inverse is \(\beta _2^{*-1} = \frac{1}{t'} \delta '(t'-t)*\delta '(t'-t)=\frac{1}{t'}\delta ''\). Then

$$\begin{aligned} \varvec{v}_2 = \big (0,\,0,\,0,\,1,\,-2\big )^H\delta ''(t'-t). \end{aligned}$$

Continuing in this fashion yields the tridiagonal output matrix \(\mathsf {T}_5\equiv \mathsf {T}\),

$$\begin{aligned} \mathsf {T}\!\!=\!\!\begin{pmatrix} \cos (t') \Theta &{} \delta &{} 0 &{} 0 &{} 0 \\ \frac{1}{2}(t'^2\!-\!t^2) \Theta &{} \cos (t) \Theta &{} \delta &{} 0 &{} 0 \\ 0 &{} t (t'\!-\!t) \Theta &{} \widetilde{\alpha }_2(t',t) \Theta &{} \delta &{} 0 \\ 0 &{} 0 &{} -\frac{1}{2}(3 t^2\!-\!4 t t'\!+\!t'^2) \Theta &{} \widetilde{\alpha }_3(t',t) \Theta &{} \delta \\ 0 &{} 0 &{} 0 &{} (-2 t^2 \!+\! 3 t t' \!-\! t'^2) \Theta &{} \widetilde{\alpha }_4(t',t) \Theta \\ \end{pmatrix}\!\!, \end{aligned}$$

with

$$\begin{aligned} \widetilde{\alpha }_2(t',t)&= (t'-t) \sin (t)+\cos (t), \\ \widetilde{\alpha }_3(t',t)&=\frac{1}{2} \left( 4 (t'-t) \sin (t)-\left( (t-t')^2-2\right) \cos (t) \right) , \\ \widetilde{\alpha }_4(t',t)&=\frac{1}{6} \left( \left( (t-t')^2-18\right) (t-t') \sin (t)+\left( 6-9 (t-t')^2\right) \cos (t)\right) , \end{aligned}$$

and the basis matrices

$$\begin{aligned} \mathsf {V}_5= \left( \begin{array}{ccccc} \delta &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} \delta ^{(3)} &{} -2 \delta ^{(4)} \\ 0 &{} 0 &{} 0 &{} 0 &{} \delta ^{(4)} \\ 0 &{} 0 &{} \delta ''&{} -\delta ^{(3)} &{} \delta ^{(4)} \\ 0 &{} \delta ' &{} -2 \delta ''&{} 2 \delta ^{(3)} &{} -3 \delta ^{(4)} \\ \end{array} \right) , \quad \mathsf {W}_5^H = \left( \begin{array}{ccccc} \delta &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} \Theta &{} 2 \Theta &{} \Theta \\ 0 &{} \Theta ^{*2} &{} \Theta ^{*2} &{} \Theta ^{*2} &{} 0 \\ 0 &{} \Theta ^{*3} &{} 2\Theta ^{*3} &{} 0 &{} 0 \\ 0 &{} 0 &{} \Theta ^{*4} &{} 0 &{} 0 \\ \end{array} \right) . \end{aligned}$$

In all of these expressions, \(\Theta \) and \(\delta ^{(n)}\) are short-hand notations for \(\Theta (t'-t)\) and \(\delta ^{(n)}(t'-t)\), respectively. All the required \(\beta _j^{*-1}\) were calculated using the strategies described in [18], yielding the factorized \(*\)-inverses

$$\begin{aligned} \beta _3^{*-1} = \frac{1}{t}\Theta (t'-t) *\delta ^{(3)}(t'-t), \quad \beta _4^{*-1} = \frac{t'}{t^2} \Theta (t'-t) *\delta ^{(3)}(t'-t). \end{aligned}$$
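As a consistency check on this example, the \(*\)-biorthonormality relations underlying the algorithm can be verified entry by entry; for instance, \(\big (\mathsf {W}_5^H*\mathsf {V}_5\big )_{22}=\varvec{w}_1*\varvec{v}_1=\Theta *\delta '=\delta \). The matching moment property can likewise be checked by hand for small \(j\): for \(j=2\),

$$\begin{aligned} \big (\mathsf {A}^{*2}\big )_{11}&=\cos (t')\Theta *\cos (t')\Theta +\Theta *t'\Theta ,\\ \big (\mathsf {T}^{*2}\big )_{11}&=\cos (t')\Theta *\cos (t')\Theta +\delta *\tfrac{1}{2}\left( t'^2-t^2\right) \Theta , \end{aligned}$$

and both expressions equal \(\left( \cos (t')\left( \sin (t')-\sin (t)\right) +\tfrac{1}{2}\left( t'^2-t^2\right) \right) \Theta \).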

We have also verified that \(\big (\mathsf {A}^{*j}\big )_{11}=\big (\mathsf {T}^{*j}\big )_{11}\) holds for \(j\) up to 9. The \(*\)-resolvent of \(\mathsf {T}\) has no closed-form expression, its Neumann series likely converging to a hitherto undefined special function. Ultimately, such difficulties are connected with the propensity of systems of coupled linear ordinary differential equations with non-constant coefficients to produce transcendental solutions.

5 Outlook: numerical implementation

We do not expect closed forms to exist in most cases for the entries of time-ordered matrix exponentials, as these can involve complicated special functions [2]. Moreover, the algorithm must be able to treat very large matrices \(\mathsf {A}(t')\) for it to be relevant to most applications. For these reasons, it is fundamental to implement the \(*\)-Lanczos algorithm numerically, e.g., using time-discretization approximations.

As shown in [26], there exists an isometry \(\Phi \) between the algebra of distributions \(\text {D}(I)\) equipped with the \(*\)-product and the algebra of time-continuous operators (for which the time variables \(t'\) and \(t\) serve as row and column indices). Consider, for simplicity, a discretization of the interval I with constant time step \(\Delta t\); then these operators become ordinary matrices. Specifically, given \(f, g \in \text {Sm}_\Theta (I)\), their discretization counterparts are the lower triangular matrices \(\mathsf {F}, \mathsf {G}\), and the function \(f *g\) corresponds to the matrix \(\mathsf {F} \, \mathsf {G} \, \Delta t\), with the usual matrix product. In other words, the isometry \(\Phi \) followed by a time discretization sends the \(*\)-product to the ordinary matrix product times \(\Delta t\). Similarly, the Dirac delta distribution is sent to the identity matrix times \(1/\Delta t\), and the kth Dirac delta derivative \(\delta ^{(k)}\) is sent to the finite difference matrix

$$\begin{aligned} \big (\mathsf {M}_{\delta ^{(k)}}\big )_{ij}=\frac{1}{(\Delta t)^{k+1}}{\left\{ \begin{array}{ll} (-1)^{i-j}\left( {\begin{array}{c}k\\ i-j\end{array}}\right) ,&{}\text {if }i\ge j\\ 0,&{}\text {else} \end{array}\right. }, \end{aligned}$$

and \(\Theta \) is sent to the matrix \( (\mathsf {M}_\Theta )_{ij}=1 \) if \(i\ge j\) and 0 otherwise. Most importantly, in this picture, the \(*\)-inverse of a function \(f(t',t)\in \text {D}(I)\) is given as \(\mathsf {F}^{-1}/(\Delta t)^2\), with \(\mathsf {F}\) the lower triangular matrix corresponding to \(f\). Moreover, the time-discretized version of the path-sum formulation Eq. (2.17) only involves ordinary matrix resolvents, while the final integration of \(\mathsf {R}_*(\mathsf {T}_n)_{11}\) yielding \(\varvec{w}^H\mathsf {U}\,\varvec{v}\) becomes a left multiplication by \(\mathsf {M}_\Theta \). Therefore, a numerical implementation of the time-discretized \(*\)-Lanczos algorithm only requires ordinary operations on triangular matrices.
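As an illustration of these correspondence rules, the following short Python sketch (using NumPy) discretizes the \(*\)-product, recovers \(\Theta ^{*-1}=\delta '\) through \(\mathsf {F}^{-1}/(\Delta t)^2\), and reproduces the smooth \(*\)-product \(\Theta *t'\Theta =\tfrac{1}{2}(t'^2-t^2)\Theta \) encountered in Example 4.2 up to the expected \(O(\Delta t)\) quadrature error. The interval \(I=[0,1]\), the uniform grid \(t_i=i\Delta t\), the sampling rule \(\mathsf {F}_{ij}=f(t_i,t_j)\), and all names in the snippet are illustrative choices made here, not prescribed by [26]; the sketch is only meant to make the correspondence concrete.

```python
import numpy as np
from math import comb

# Illustrative discretization of the *-calculus on I = [0, 1] (assumed interval),
# with a uniform grid t_i = i*dt and the sampling convention F[i, j] = f(t_i, t_j).
Nt = 200                      # number of grid points
dt = 1.0 / Nt                 # constant time step
t = np.arange(Nt) * dt        # grid t_0, ..., t_{Nt-1}

def star(F, G):
    """Discretized *-product: f * g  ->  F @ G * dt."""
    return F @ G * dt

def star_inv(F):
    """Discretized *-inverse: f^{*-1}  ->  F^{-1} / dt^2."""
    return np.linalg.inv(F) / dt**2

def M_delta(k):
    """Finite-difference matrix representing the k-th derivative of the Dirac delta."""
    D = np.zeros((Nt, Nt))
    for m in range(k + 1):
        rows = np.arange(m, Nt)
        D[rows, rows - m] = (-1)**m * comb(k, m)
    return D / dt**(k + 1)

M_theta = np.tril(np.ones((Nt, Nt)))             # Theta(t'-t) -> lower triangular ones
assert np.allclose(M_delta(0), np.eye(Nt) / dt)  # delta -> identity / dt

# Theta^{*-1} = delta': the *-inverse of M_theta reproduces M_delta(1),
# and, equivalently, Theta * delta' = delta.
print(np.max(np.abs(star_inv(M_theta) - M_delta(1))) * dt**2)   # negligible
print(np.max(np.abs(star(M_theta, M_delta(1)) - M_delta(0))) * dt)

# Smooth *-product from Example 4.2:  Theta * t'Theta = (t'^2 - t^2)/2 * Theta.
G = np.tril(np.outer(t, np.ones(Nt)))                  # t' Theta(t'-t)
exact = np.tril(0.5 * np.subtract.outer(t**2, t**2))   # (t'^2 - t^2)/2 * Theta(t'-t)
print(np.max(np.abs(star(M_theta, G) - exact)))        # O(dt) quadrature error
```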

We can now meaningfully evaluate the numerical cost of the time-discretized version of the algorithm. Let \(N_t\) be the number of time subintervals in the discretization of I for both the t and \(t'\) time variables. Then time-discretized \(*\)-multiplications or \(*\)-inversions cost \(O(N_t^3)\) operations. Considering a sparse time-dependent matrix \(\mathsf {A}(t)\) with \(N_{nz}\) nonzero elements, the \(*\)-Lanczos algorithm therefore requires \(O(N_i\times N_t^3\times N_{nz})\) operations to obtain the desired \(\varvec{w}^H\mathsf {U}\, \varvec{v}\). Here \(N_i\) is the number of iterations needed to reach an error lower than a given tolerance. Unfortunately, as explained in [36], the presence of computational errors can slow down the convergence of the (usual) Lanczos algorithm. Hence, in general, we cannot assume \(N_i \approx N\), since the \(*\)-Lanczos algorithm could analogously require more iterations. However, in many cases the (usual) Lanczos algorithm needs only a few iterations to reach the tolerance, even in finite precision arithmetic. We expect the \(*\)-Lanczos algorithm to behave analogously, giving \(N_i \ll N\) in many cases. Concerning \(N_t\), there is no reason to expect that it would depend on N, since \(N_t\) controls the quality of individual generalized functions. We also remark that the \(*\)-Lanczos algorithm can exploit the sparsity structure of the matrix \(\mathsf {A}\), making it inherently competitive when dealing with the large sparse matrices that are typical of applications.

The classical numerical methods (e.g., Runge–Kutta methods) for the approximation of the system of ODEs (1.1) are known to perform poorly in certain cases, for example for very large systems or in the presence of highly oscillatory coefficients. Consequently, in the last decades, novel techniques have been sought and proposed, many of which are based on the Magnus series; see, for instance, [4, 23, 49,50,51,52,53,54,55,56]. However, for large matrices, these methods are known to be computationally expensive. This motivates the search for novel approaches, in particular for large-scale problems. Here, the guaranteed convergence of the \(*\)-Lanczos algorithm in a finite number of iterations, the sequence of approximations it produces, its ability to exploit matrix sparsity, and its relations with the numerically well-studied Lanczos procedures are all promising elements which justify further work on concrete numerical implementations. More precise theoretical results about the overall approximation quality and further issues concerning numerical applications of the present algorithm are beyond the scope of this work; they will appear together with a practical implementation of the algorithm in a future contribution.

6 Conclusion

In this work, we constructed the \(*\)-Lanczos algorithm as a biorthogonalization process for Krylov subspaces composed of distributions, with respect to the \(*\)-product, a convolution-like product. The algorithm relies on a non-commutative operation and is analogous in spirit to the non-Hermitian Lanczos algorithm. The \(*\)-Lanczos algorithm can express an element of an \(N \times N\) time-ordered exponential by the path-sum continued fraction (2.17). To our knowledge, such an expression is the only one composed of \(\mathcal {O}(N)\) scalar integro-differential equations. This time-ordered exponential approximation relies on the matching moment property proved in this paper.

The overall approach generates a controllable sequence of time-ordered exponential approximations, offers an innovative perspective on the connection between numerical linear algebra and differential calculus, and opens the door to efficient numerical algorithms for large-scale computations.