1 Introduction

Let \(\mu \) be a positive Borel measure on the real line whose support contains an infinite number of points and has finite moments:

$$\begin{aligned} -\infty<\int _{{\mathbb {R}}}x^n{\textrm{d}\mu }(x) < \infty ,\quad \forall n\in {\mathbb {N}}_0. \end{aligned}$$
(1.1)

We consider \(\textbf{P}(x) = (p_0(x) \quad p_1(x) \quad p_2(x) \quad \cdots )\) to be an orthonormal polynomial basis with respect to the inner-product:

$$\begin{aligned} \langle f, g\rangle _\mu = \int _{{\mathbb {R}}}f(x) g(x){\textrm{d}\mu }(x). \end{aligned}$$

That is, \(\langle p_m, p_n\rangle _\mu = \delta _{m,n}\). This inner-product induces a norm, \(\Vert f\Vert _\mu ^2 = \langle f, f\rangle _\mu \), and the Hilbert space \(L^2({{\mathbb {R}}}, {\textrm{d}\mu })\). We will assume that polynomials are dense in our Hilbert spaces, making them separable—necessary and sufficient conditions for this are discussed in [1, Theorem 2.3.3].

Orthonormal polynomials are important in numerical analysis for their ability to represent \(f\in L^2({{\mathbb {R}}}, {\textrm{d}\mu })\) via the expansion:

$$\begin{aligned} f(x) = \sum _{k=0}^\infty \langle f, p_k\rangle _\mu \,p_k(x) = \textbf{P}(x) \varvec{f}, \end{aligned}$$

where \(\varvec{f} = (\langle f, p_0\rangle _\mu , \langle f, p_1\rangle _\mu , \ldots )^\top \), establishing the well-known isometric isomorphism between \(L^2({{\mathbb {R}}}, {\textrm{d}\mu })\) and \(\ell ^2\) that identifies each such function with its unique coefficient vector. For simplicity we assume our function spaces are over the reals (that is, we only consider real-valued functions and vectors).

Given a nontrivial rational function

$$\begin{aligned} r(x) = \frac{u(x)}{v(x)} \end{aligned}$$

with polynomials u and v, we will also consider \(\textbf{Q}(x) = (q_0(x) \quad q_1(x) \quad q_2(x) \quad \cdots )\) to be an orthonormal polynomial basis in \(L^2({{\mathbb {R}}}, r{\textrm{d}\mu })\). The conditions on \({\textrm{d}\mu }\) are also imposed on \(r {\textrm{d}\mu }\); in particular, r(x) is nonnegative and nonsingular on \({\text {supp}}(\mu )\). We wish to describe efficient algorithms for the connection problem between the original and the modified orthogonal polynomials, defined by the upper-triangular matrix R in:

$$\begin{aligned} \textbf{P}(x) = \textbf{Q}(x)R. \end{aligned}$$

That is, the computation of the coefficients \(R_{k,n}\) (the entries of the matrix R) such that:

$$\begin{aligned} p_n(x) = \sum _{k=0}^n q_k(x)R_{k,n}. \end{aligned}$$

Before doing so, it is important to relate this connection problem to the central object of a family of orthonormal polynomials, the Jacobi matrix. Every family of orthonormal polynomials \(\textbf{P}(x)\) has a Jacobi matrix \(X_P\) (an infinite irreducible symmetric tridiagonal matrix) that implements multiplication by x:

$$\begin{aligned} x\textbf{P}(x) = \textbf{P}(x)X_P. \end{aligned}$$

That is, if we denote the three-term recurrence as:

$$\begin{aligned} x p_n(x) = \beta _{n-1} p_{n-1}(x) + \alpha _n p_n(x) + \beta _n p_{n+1}(x),\quad p_{-1}(x) \equiv 0, \end{aligned}$$

then the nonzero entries of the Jacobi matrix \(X_P\) are \((X_P)_{k,k} = \alpha _{k}\), \((X_P)_{k+1,k} = (X_P)_{k,k+1} = \beta _{k}\). As u and v are polynomials, it is now straightforward to define U and V by

$$\begin{aligned} u(x)\textbf{P}(x)&= \textbf{P}(x)\underbrace{u(X_P)}_U,\quad v(x)\textbf{P}(x) = \textbf{P}(x)\underbrace{v(X_P)}_V \end{aligned}$$

as symmetric banded commuting multiplication matrices whose bandwidths are respectively equal to the degrees of the corresponding polynomials. In practice the entries of U and V can be efficiently computed via Clenshaw’s algorithm [9, 52]. We also know that \(x\textbf{Q}(x) = \textbf{Q}(x)X_Q\), though at this point we assume only \(X_P\) is in hand.
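To make the preceding constructions concrete, the following NumPy sketch builds a finite section of the Jacobi matrix for the orthonormal Chebyshev polynomials of the second kind and forms the multiplication matrices for the illustrative choices \(u(x) = 2 + x^2\) and \(v(x) = 4 + x\); these choices, and the use of naive dense matrix arithmetic instead of the banded Clenshaw recurrence of [9, 52], are assumptions made purely for demonstration.

```python
import numpy as np

def chebyshevU_jacobi(N):
    """N x N finite section of the Jacobi matrix of the orthonormal
    Chebyshev polynomials of the second kind: zero diagonal, 1/2 off-diagonals."""
    b = 0.5*np.ones(N - 1)
    return np.diag(b, 1) + np.diag(b, -1)

N = 20
X = chebyshevU_jacobi(N)
I = np.eye(N)

# Multiplication matrices for the illustrative modifications u(x) = 2 + x^2, v(x) = 4 + x.
# (Entries of these finite sections agree with the infinite U and V except within
# deg(u), deg(v) rows/columns of the truncation boundary.)
U = 2*I + X @ X
V = 4*I + X

# Symmetric, banded with bandwidths deg(u) = 2 and deg(v) = 1, and commuting,
# being polynomials of the same matrix.
assert np.allclose(U, U.T) and np.allclose(V, V.T)
assert np.allclose(np.triu(U, 3), 0) and np.allclose(np.triu(V, 2), 0)
assert np.allclose(U @ V, V @ U)
```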

Definition 1.1

For \(0<p<\infty \), let \(\ell ^p\) denote the space of all sequences \(\varvec{v}\in {{\mathbb {R}}}^\infty \) satisfying:

$$\begin{aligned} \sum _{n=0}^\infty |v_n|^p < \infty . \end{aligned}$$

By continuity, we shall consider \(\ell ^0\) to be the space of sequences with finitely many non-zeros.

In order to establish the results of this paper we need to think of the infinite-dimensional matrices (\(X_P\), \(X_Q\), U, V) as operators acting on sequence spaces, in particular \(\ell ^0\) and \(\ell ^2\), as we require access to a functional calculus (square-roots, inverses, etc.). Note that an infinite-dimensional matrix is essentially an operator for which \(\varvec{e}_j^\top A \varvec{e}_k \equiv \langle \varvec{e}_j, A \varvec{e}_k \rangle _{\ell ^2}\) exists for all j and k.

Definition 1.2

Let \(P_n\) denote the canonical orthogonal projection:

$$\begin{aligned} P_n = \begin{pmatrix} I_n &{} 0\\ 0 &{} 0\end{pmatrix}, \end{aligned}$$
(1.2)

where \(I_n\) is the \(n\times n\) identity and the zero blocks are of conformable sizes.

On \(\ell ^0\), triangular matrices with nonzero diagonals are trivially invertible. This can be seen by examining:

$$\begin{aligned} P_nRP_n = \begin{pmatrix} R_n &{} 0\\ 0 &{} 0\end{pmatrix}, \end{aligned}$$

and noting that \((R^{-1})_{j,k} = (R_n^{-1})_{j,k}\) provided that \(j,k< n\); that is, provided the principal finite section is taken to be sufficiently large.
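The following short NumPy check illustrates this finite-section property on a randomly generated upper-triangular matrix (hypothetical data, unrelated to any particular connection problem): the leading principal block of the inverse coincides with the inverse of the leading principal block.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 12, 5
R = np.triu(rng.standard_normal((N, N))) + N*np.eye(N)   # upper triangular, nonzero diagonal

# The leading n x n block of R^{-1} equals the inverse of the leading n x n block of R.
assert np.allclose(np.linalg.inv(R)[:n, :n], np.linalg.inv(R[:n, :n]))
```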

Theorem 1.3

(Gautschi [21]) Let \(X_P\) and \(X_Q\) be the Jacobi matrices for the original and modified orthonormal polynomials, \(\textbf{P}(x)\) and \(\textbf{Q}(x)\), respectively. Then there exists a unique infinite invertible upper triangular operator \(R: \ell ^0 \rightarrow \ell ^0\) such that:

  1.

    \(\textbf{P}(x) = \textbf{Q}(x) R\) and,

  2.

    \(RX_P = X_QR\).

Proof

The proof is in two parts.

  1.

    Since both \(\textbf{P}(x)\) and \(\textbf{Q}(x)\) are complete orthonormal bases of degree-graded polynomials, the degree-n polynomial of one basis is a unique linear combination of the first \(n+1\) elements of the other, resulting in a unique upper triangular conversion operator. \(R^{-1}\) exists because all upper triangular operators with nonzero diagonals are invertible on \(\ell ^0\).

  2.

    This follows from:

    $$\begin{aligned} x\textbf{P}(x) =\left\{ \begin{array}{c} \textbf{P}(x)X_P = \textbf{Q}(x) RX_P,\\ x\textbf{Q}(x) R = \textbf{Q}(x) X_QR. \end{array}\right. \end{aligned}$$

\(\square \)

This result shows that there is an infinite-dimensional similarity transformation between the two Jacobi matrices (viewed as operators on \(\ell ^0\)):

$$\begin{aligned} X_Q = RX_PR^{-1}, \end{aligned}$$

and the principal finite section of \(X_Q\) may be computed in linear complexity. In particular, it depends only on \(X_P\) and the main and first super-diagonals of R:

$$\begin{aligned} (X_Q)_{0,0}&= \frac{R_{0,0}(X_P)_{0,0} + R_{0,1}(X_P)_{1,0}}{R_{0,0}},\\ (X_Q)_{i,i+1}&= (X_Q)_{i+1,i} = \frac{R_{i+1,i+1}(X_P)_{i+1,i}}{R_{i,i}},\\ (X_Q)_{i,i}&= \frac{R_{i,i}(X_P)_{i,i} + R_{i,i+1}(X_P)_{i+1,i} - (X_Q)_{i,i-1}R_{i-1,i}}{R_{i,i}}. \end{aligned}$$

As a finite-dimensional similarity transformation preserves the spectrum, only the \((n-1)\times (n-1)\) principal finite section of \(X_Q\) may be recovered from the \(n\times n\) principal finite sections of R and \(X_P\)—an observation of Kautsky and Golub [30] that follows from triangular and tridiagonal matrix multiplication:

$$\begin{aligned} P_{n-1}X_QP_{n-1} = (P_{n-1}RP_n) (P_nX_PP_n) (P_n R^{-1}P_{n-1}). \end{aligned}$$

Remark 1.4

In the modified Hilbert space \(L^2({\mathbb {R}},r\textrm{d}\mu )\), \(\textbf{P}(x)\) is not orthonormal (apart from trivial measure modifications); hence, Theorem 1.3 (1) may be thought of as a QR factorization in the sense of quasimatrices.

The main contributions of this paper are outlined in Table 1, where we relate the computation of the connection coefficient matrix directly to different matrix factorizations in a unified manner. We also consider approximating more general bounded modifications by rationals (Sect. 2.1), and when orthogonal polynomials have banded derivative relationships (Sect. 2.3). Some of these factorizations are exactly realisable with finite-dimensional linear algebra (e.g. the Cholesky and QR factorizations), while others are inherently infinite-dimensional (e.g. the reverse Cholesky and QL factorizations), though in practice they can be approximated effectively using truncations (Sect. 3). Finally, we present several applications and numerical experiments making use of the results described in this paper (Sect. 4).

Table 1 A collection of connection problems resulting from banded matrix factorizations

1.1 Previous Work

Polynomial and rational measure modifications and the associated connection problem in Theorem 1.3 are classical problems discussed at length in Gautschi’s book on computational orthogonal polynomials [23]. We mention some selected important advances on this problem without any claim to historical completeness. Much of the early development of methods to compute nonclassical orthogonal polynomials was motivated by their applications in quadrature methods.

As early as 1968, it was already known to Mysovskikh [37] that the connection coefficients are the upper-triangular Cholesky factor of the Gram matrix, the matrix that corresponds to the left-hand side of our Eq. (2.1) in the case of a rational modification. This abstract notion does not require polynomial or rational structure to the modification; and conversely, the sparsity structures present in the more specific modifications are absent in that discussion. The next year, Uvarov [61] discusses the rational measure modification problem, uncovering the banded sparsity in the polynomial connection problem and in a Fourier–Christoffel version of the rational connection problem, \(u(x)\textbf{Q}(x) = \textbf{P}(x)C\), where C is a banded matrix with lower bandwidth \(\deg (u)\) and upper bandwidth \(\deg (v)\). The banded sparsity of C in fact characterizes the generalized Christoffel–Darboux formula, which explains why the modified basis must be multiplied by u(x). While this connection problem has been solved since 1969, there are a few outstanding numerical issues. Firstly, while banded sparsity is important numerically, a general banded matrix such as C must be further factored in order to allow the solution of linear systems. Secondly, Uvarov’s method requires the computation of the roots and poles of r(x), which are unstable with respect to perturbations. The method is also encumbered by the use of second-kind functions, whose numerical evaluation is not universally available. As a matter of computational complexity, the cost of computing the first n columns of C is \({\mathcal {O}}([\deg (u)+\deg (v)]^3n)\), as each column requires the solution of linear systems on the order of the bandwidth. We will note that Uvarov’s Fourier–Christoffel connection problem may be reconstructed from either of lines (5) or (6) in Table 1 as, for example, \(C = UL^\top R_{I}^{-1}\). While our table of factorizations may appear more complicated, the factors are all orthogonal or triangular, hence directly applicable and invertible.

By 1970, the work of Mysovskikh and Uvarov had begun disseminating in the West, though incompletely, with Gautschi [21] using the Cholesky factor of the Gram matrix to compute the modified Jacobi matrices for the purposes of modified Gaussian quadrature. Kumar [35] independently proves Uvarov’s Lemma 1, that reciprocal linear polynomial measure modifications connect the modified polynomials to the original polynomials by coefficients that are computed by second-kind functions. He uses this to compute modified first, second, and third kind Chebyshev quadrature rules, as the second-kind Chebyshev functions are particularly easy to compute. Price [29] generalizes the work of Kumar to a general rational measure modification. In the case of a reciprocal polynomial measure modification, his Theorems 1 and 2 uncover the banded sparsity in this special case of the connection problem; see line (3) in Table 1. This method corresponds to an upper–lower factorization of \(v(X_P)\), though it relies on second-kind functions to compute the first row/column of the factors to allow such a factorization to proceed directly. Since Price’s method relies on monomial moments, it is not suitable for large degree problems. Skrzipek [49] builds on Price’s method by replacing the computation of monomial moments and second-kind functions with Gaussian quadrature for the starting columns in the upper–lower factorization of \(v(X_P)\). This improves the stability of the method, but results in a computational complexity that depends on the degree of the quadrature rule. Inevitably, this may become quite large when the poles of the rational modification come close to the support of \(\mu \).

It is well-known that Cholesky factorization is numerically stable, though the error may be large for an ill-conditioned matrix. In the context of polynomial modifications, if the roots of u(x) are near the support of \(\mu \), the condition number of U will be large. Kautsky and Golub [30] observed this phenomenon in the computation of \(X_Q\) by the similarity transformation implied by Theorem 1.3, and provided two alternative algorithms that maintain a much higher accuracy for the computation of the modified Jacobi matrix. The first requires an orthogonal diagonalization of a principal finite section of \(X_P\), while the second requires the polynomial u(x) to be factored in terms of its roots, which allows Cholesky and QR factorizations of successive modifications to take place. In related work, Buhmann and Iserles [7] use the unshifted infinite-dimensional QR algorithm to compute the \(r(x) = x^2\) measure modification and draw interesting connections with certain Sobolev orthogonal polynomials. Elhay and Kautsky [15] develop an inverse Cholesky method for a reciprocal polynomial modification of degree at most 2, where a finite-rank correction based on second-kind function evaluation is added to the top of V to allow the upper–lower factorization to proceed from the top down, that is, directly. Combined with a partial fraction decomposition of r(x) and the summation technique of Elhay et al. [14], the method enables a general rational weight modification. Elhay and Kautsky compare their method to Gautschi’s method [22] of minimal solutions of certain recurrence relations and find the performance of the methods complementary: where one is poor the other is excellent, and vice versa.

Finally, we also mention that aside from the Gram–Schmidt type approaches which constitute the primary methods referenced in the literature and the secondary line of research projects on factorization methods outlined above, there have also been tertiary approaches to computing non-classical orthogonal polynomials independent of these two. An example of such a method is to solve the corresponding Riemann–Hilbert problem for the modified orthogonal polynomials as discussed in [60].

1.2 Infinite-Dimensional Matrix Factorizations

In finite dimensions, matrix factorizations are a powerful collection of tools for the solution of various ubiquitous types of problems. Some commonly used factorizations include the LU, QR, Cholesky, and Schur factorizations among many others [24, 59]. Depending on the specific needs of a particular application, there also may exist different algorithms for these factorizations favouring either speed or accuracy and allowing different levels of parallelization, a typical example being the contrast of Gram–Schmidt and Householder-based QR factorization.

Cholesky factorization of a symmetric positive-definite square matrix \(M = M^\top \in {\mathbb {R}}^{n\times n}\) represents M as the unique product \(R^\top R\), with an upper-triangular matrix R with positive diagonal entries. Reverse Cholesky factorization produces an upper–lower factorization of the form \(M = L^\top L\). QR is one of the most well-studied and widely used matrix factorizations, in particular in the form of so-called thin or reduced QR factorization, which allows the representation of a matrix \(M\in {\mathbb {R}}^{m\times n}\) in terms of a product of a matrix \(Q\in {\mathbb {R}}^{m\times n}\) with orthonormal columns and an upper-triangular matrix \(R\in {\mathbb {R}}^{n\times n}\). This factorization is unique up to a phase, or a diagonal matrix \(D\in {\mathbb {R}}^{n\times n}\) with entries \(\pm 1\), since \(QR = QDDR\) and the matrices QD and DR are respectively also orthogonal and upper-triangular. The QL factorization is defined analogously, where the matrix L now is lower-triangular. For orthogonal-triangular matrix factorizations, we shall impose uniqueness by adopting the positive phase convention in L and R, whereby \(L_{i,i} > 0\) and \(R_{i,i}>0\), \(\forall i\).
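In finite dimensions all four factorizations are available through standard dense linear algebra; the NumPy sketch below (using a random symmetric positive-definite matrix as a stand-in) computes them side by side, obtaining the reverse Cholesky and QL variants by flipping rows and columns before and after the standard routines, and enforcing the positive phase convention.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
B = rng.standard_normal((n, n))
M = B @ B.T + n*np.eye(n)                 # symmetric positive definite

# Cholesky: M = R^T R, R upper triangular with positive diagonal.
R = np.linalg.cholesky(M).T

# Reverse Cholesky: M = L^T L, L lower triangular, via flipping rows and columns.
L = np.linalg.cholesky(M[::-1, ::-1]).T[::-1, ::-1]

# QR: M = Q R_qr, with signs fixed so that R_qr has a positive diagonal.
Q, R_qr = np.linalg.qr(M)
D = np.diag(np.sign(np.diag(R_qr)))
Q, R_qr = Q @ D, D @ R_qr

# QL: M = Q_ql L_ql, via QR of the flipped matrix.
Qf, Rf = np.linalg.qr(M[::-1, ::-1])
Q_ql, L_ql = Qf[::-1, ::-1], Rf[::-1, ::-1]
D = np.diag(np.sign(np.diag(L_ql)))
Q_ql, L_ql = Q_ql @ D, D @ L_ql

for A, name in [(R.T @ R, "Cholesky"), (L.T @ L, "reverse Cholesky"),
                (Q @ R_qr, "QR"), (Q_ql @ L_ql, "QL")]:
    assert np.allclose(A, M), name
```

The flipping trick is only a finite-dimensional convenience; as discussed below, it has no infinite-dimensional analogue because the bottom right corner is out of reach.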

In this work, we shall find it convenient to work with the infinite-dimensional analogues of Cholesky, reverse Cholesky, QR and QL factorizations. In a computing environment, one can represent infinite-dimensional linear operators as cached adaptive arrays [44], and this is facilitated and simplified in the case of banded operators. There is a distinction between Cholesky and QR on the one hand and reverse Cholesky and QL on the other, where finite sections of the former are computable (that is, with a finite number of operations), while finite sections of the latter are not. By perturbative arguments, we shall show in Sect. 3 when we expect (larger) finite section methods to approximate the true finite sections of the non-computable matrix factorizations.

If \(A:\ell ^2\rightarrow \ell ^2\) is a bounded invertible linear operator, then \(A = QR\), where Q is not only an isometry, \(Q^\top Q = I\), but also orthogonal, \(QQ^\top = I\), and R is upper-triangular, bounded, and invertible. We refer the reader to [10, 12, 27] for further details. If A is banded and unbounded, then a QR factorization still exists with an isometric Q because it is independent of a diagonal column scaling, say D, such that \(A D:\ell ^2\rightarrow \ell ^2\). If A has dense column space, then Q is also orthogonal as Q and A must have the same range, a fact that was used in [3]. Moreover, the diagonal column scaling shows that the upper-triangular operator R is unbounded with the same growth as A.

Cholesky and reverse Cholesky factorizations may be extended analogously to the infinite-dimensional case, but there is a subtlety to the QL factorization. For QL to exist with orthogonal Q, it is sufficient that A be invertible [63].

1.3 Contemporary Applications

Almost all previous work mentions modified Gaussian quadrature as the archetypal application. But there are numerous reasons to be interested in modified classical orthogonal polynomials. Our interest can be divided into two classes of problems. Firstly, we are interested in the orthogonal structure in and on algebraic curves. Recent advances in this direction include a description of the orthogonal structures on quadratic [46], planar cubic [16], and a class of planar algebraic curves [17] and quadratic surfaces of revolution [45]. Secondly, we are interested in the Koornwinder constructions of bivariate (and multivariate) orthogonal polynomials that are semi-classical. Orthogonal polynomials on half-disks and trapeziums are described in [54] and on spherical caps in [55] and we briefly introduce two more classes of polynomials on an annulus and a spherical band in Sects. 4.4 and 4.5, respectively. Orthogonal polynomials with nonclassical weights can further be used for computing random matrix statistics for general invariant ensembles [11], and in-particular, sampling their eigenvalues corresponds to a sequence of rational modifications of the measure [41].

2 Rational Measure Modification via Infinite-Dimensional Matrix Factorizations

In what follows, we denote the synthesis operator:

$$\begin{aligned} {{{\mathcal {S}}}}:\ell ^2\rightarrow L^2({{\mathbb {R}}},{\textrm{d}\mu }),\quad \varvec{f}\mapsto \textbf{P} \varvec{f} = f, \end{aligned}$$

and the analysis operator:

$$\begin{aligned} {{{\mathcal {A}}}}:L^2({{\mathbb {R}}}, {\textrm{d}\mu })\rightarrow \ell ^2,\quad f\mapsto \int _{{\mathbb {R}}}\textbf{P}(x)^\top f(x) {\textrm{d}\mu }(x) = \varvec{f}. \end{aligned}$$

For computational purposes, it is well-known that both operators have discrete analogues [42].
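As a simple illustration of such discrete analogues (a sketch, not the quadrature-based transforms of [42] themselves), the following approximates the analysis operator for the orthonormal Legendre basis by Gauss–Legendre quadrature and verifies that synthesis reproduces the sampled function; the target \(f(x) = e^x\) is an arbitrary choice.

```python
import numpy as np

def orthonormal_legendre(x, K):
    """Evaluate the orthonormal Legendre polynomials p_0, ..., p_{K-1} at the points x."""
    P = np.zeros((K, len(x)))
    P[0] = 1/np.sqrt(2)
    b = lambda n: (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    P[1] = x*P[0]/b(0)
    for n in range(1, K - 1):
        P[n + 1] = (x*P[n] - b(n - 1)*P[n - 1])/b(n)
    return P

K = 20
x, w = np.polynomial.legendre.leggauss(K)    # K-point Gauss-Legendre rule
P = orthonormal_legendre(x, K)

f = np.exp(x)
coeffs = P @ (w*f)                   # discrete analysis: f_k ~ <f, p_k>
assert np.allclose(P.T @ coeffs, f)  # discrete synthesis recovers the samples
```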

Note that any bounded invertible operator has QR and QL factorizations and any symmetric positive definite operator has a Cholesky factorization.

Proposition 2.1

[10, 12, 27] If \(A:\ell ^2\rightarrow \ell ^2\) is a bounded invertible operator then the QR factorization (\(A = QR\) for Q orthogonal and R invertible, bounded, and upper triangular) exists. If A is not invertible then Q is only guaranteed to be an isometry and R may not be invertible.

Proposition 2.2

[63] If \(A:\ell ^2\rightarrow \ell ^2\) has a bounded inverse, then the QL factorization (\(A = QL\) for Q orthogonal and L lower triangular) exists.

In the infinite-dimensional setting there are two possible definitions for positive definite (which are equivalent in finite-dimensions) as outlined in the following definition:

Definition 2.3

A linear operator \(A:\ell ^2\rightarrow \ell ^2\) is:

  1.

    Positive semi-definite if \(\langle \varvec{v}, A \varvec{v}\rangle \ge 0\), for all \(\varvec{v} \in \ell ^2\);

  2.

    Positive definite if \(\langle \varvec{v}, A \varvec{v}\rangle > 0\), for all nonzero \(\varvec{v} \in \ell ^2\); and,

  3.

    Strictly positive definite if \(\exists M>0\) such that \(\langle \varvec{v}, A \varvec{v}\rangle \ge M\langle \varvec{v}, \varvec{v}\rangle \), for all \(\varvec{v} \in \ell ^2\).

Proposition 2.4

[8, 25, 26] Let \(A:\ell ^2\rightarrow \ell ^2\) be a self-adjoint linear operator with dense domain in \(\ell ^2\). If \(A = R^\top R\) with upper-triangular R, then:

  1.

    A is positive semi-definite;

  2.

    R is unique and invertible on \(\ell ^0\) only if A is positive definite; and,

  3.

    R is unique and invertible on \(\ell ^2\) only if A is strictly positive definite.

Corollary 2.5

Let \(A:\ell ^2\rightarrow \ell ^2\) be a self-adjoint linear operator with dense domain in \(\ell ^2\). If A is invertible and \(A = L^\top L\) with lower-triangular L, then:

  1.

    \(A^{-1}\) is positive semi-definite;

  2.

    L is unique and invertible on \(\ell ^0\) only if \(A^{-1}\) is positive definite; and,

  3.

    L is unique and invertible on \(\ell ^2\) only if \(A^{-1}\) is strictly positive definite.

As happens for measures with unbounded support, Jacobi matrices may be unbounded linear operators. Our analytical framework for functional calculus must therefore include bounded and unbounded functions of bounded and unbounded operators. Thus we require the Borel functional calculus, which allows us to define functions of the Jacobi matrix through the projection-valued measure, \({\textrm{d}\mu }_\textbf{P}(x) = \textbf{P}(x)^\top \textbf{P}(x){\textrm{d}\mu }(x)\). We refer the interested reader to Reed and Simon [48, § VIII.3] for further details.

Definition 2.6

(p. 263 in [48]) Let r be a measurable function with finite \(\mu \)-moments:

$$\begin{aligned} -\infty<\int _{{\mathbb {R}}}x^n r(x){\textrm{d}\mu }(x) < \infty ,\quad \forall n\in {\mathbb {N}}_0. \end{aligned}$$

We define:

$$\begin{aligned} r(X_P):= \int _{{\mathbb {R}}}r(x) {\textrm{d}\mu }_\textbf{P}(x). \end{aligned}$$

If r is a polynomial, then \(r(X_P)\) is also realized by operator composition and the Clenshaw algorithm, as alluded to earlier, and if \(r(x) = \frac{u(x)}{v(x)}\) is a rational function, then it follows from \(\textbf{P}(x) r(X_P) = r(x) \textbf{P}(x) = \frac{u(x)}{v(x)} \textbf{P}(x)\) that:

$$\begin{aligned} r(X_P) = \underbrace{u(X_P) v(X_P)^{-1}}_{UV^{-1}} = \underbrace{v(X_P)^{-1} u(X_P)}_{ V^{-1} U}. \end{aligned}$$

Proposition 2.7

If \(r{\textrm{d}\mu }\) is a positive Borel measure with finite moments, as in Eq. (1.1), then \(r(X_P)\) is symmetric positive definite.

Proof

Symmetry of \(r(X_P)\) is a direct consequence of the orthonormality of \(\textbf{P}(x)\). To show positive definiteness we observe that for any nonzero \(\varvec{v} \in \ell ^2\) we have a nonzero \(v(x) = \textbf{P}(x) \varvec{v}\in L^2({{\mathbb {R}}}, {\textrm{d}\mu })\):

$$\begin{aligned} \langle \varvec{v}, r(X_P)\varvec{v}\rangle = \int _{{\mathbb {R}}}v(x)^2 r(x) {\textrm{d}\mu }(x)>0. \end{aligned}$$

See also [48, Theorem VIII.5 (f)]. \(\square \)

From this last result we can translate the computation of the conversion operator R between \(\textbf{P}(x)\) and \(\textbf{Q}(x)\) to computing a Cholesky factorization:

Lemma 2.8

If \(r{\textrm{d}\mu }\) is a positive Borel measure with finite moments, then the Cholesky factorization of \(r(X_P)\) encodes the conversion matrix R, i.e.:

$$\begin{aligned} r(X_P) = R^\top R, \end{aligned}$$
(2.1)

if and only if \(\textbf{P}(x) = \textbf{Q}(x) R\). Furthermore:

$$\begin{aligned} r(x) \textbf{Q}(x) = \textbf{P}(x) R^\top . \end{aligned}$$

Proof

By Proposition 2.7, \(r(X_P)\) is a self-adjoint positive definite operator hence its Cholesky factorization exists and R is invertible on \(\ell ^0\): this much was known to Mysovskikh [37], who called principal finite sections of \(r(X_P)\) the Gram matrix. Thus \({\tilde{{\textbf {Q}}}}(x) = \textbf{P}(x) R^{-1}\) connects two families of graded polynomials with leading coefficients of the same sign. While \(R^{-1}\) is not necessarily bounded on \(\ell ^2\) we have \(R^{-1} \varvec{e}_n \in \ell ^2\) for all n. Thus we find for \(\tilde{q}_n(x) = {\tilde{{\textbf {Q}}}}(x) \varvec{e}_n\):

$$\begin{aligned} \int _{{\mathbb {R}}}\tilde{q}_m(x)^\top r(x) \tilde{q}_n(x){\textrm{d}\mu }(x)&= \varvec{e}_m^\top R^{-\top } \int _{{\mathbb {R}}}\textbf{P}(x)^\top r(x) \textbf{P}(x){\textrm{d}\mu }(x) R^{-1} \varvec{e}_n \\&= \varvec{e}_m^\top R^{-\top } r(X_P) R^{-1} \varvec{e}_n = \delta _{m,n}, \end{aligned}$$

which shows by uniqueness \({\tilde{{\textbf {Q}}}}(x) = \textbf{Q}(x)\). For the other direction we have

$$\begin{aligned} r(X_P) = \int _{{\mathbb {R}}}\textbf{P}(x)^\top r(x) \textbf{P}(x) {\textrm{d}\mu }(x) = R^\top \int _{{\mathbb {R}}}\textbf{Q}(x)^\top r(x) \textbf{Q}(x) {\textrm{d}\mu }(x) R = R^\top R. \end{aligned}$$

The final statement follows since:

$$\begin{aligned} r(x) \textbf{Q}(x) = \textbf{P}(x) r(X_P) R^{-1} = \textbf{P}(x) R^\top . \end{aligned}$$

\(\square \)

The polynomial measure modification, where \(r(x) = u(x)\) is a polynomial, corresponds to the case \(V = I\), where the banded Cholesky factorization of U returns the connection coefficients, showing line (1) of Table 1, which is equivalent to the result of Gautschi [21]. That is, in this setting one can establish the result using only finite-dimensional linear algebra.
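The following NumPy sketch runs this pipeline end to end for an illustrative example (orthonormal Legendre polynomials modified by \(u(x) = 2 + x\); these choices, and the dense finite-section arithmetic, are assumptions made only for the demonstration). It computes R by Cholesky factorization of a finite section of \(U = u(X_P)\), builds \(X_Q\) both from a dense similarity transformation and from the linear-complexity entrywise formulas quoted after Theorem 1.3, and verifies that \(q_n(x) = \textbf{P}(x) R^{-1}\varvec{e}_n\) are orthonormal with respect to \((2+x)\,\textrm{d}x\) by Gauss–Legendre quadrature.

```python
import numpy as np

def legendre_jacobi(N):
    """N x N finite section of the Jacobi matrix of the orthonormal Legendre polynomials."""
    n = np.arange(N - 1)
    b = (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    return np.diag(b, 1) + np.diag(b, -1)

def orthonormal_legendre(x, K):
    """Evaluate p_0, ..., p_{K-1} at the points x via the three-term recurrence."""
    P = np.zeros((K, len(x)))
    P[0] = 1/np.sqrt(2)
    b = lambda n: (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    P[1] = x*P[0]/b(0)
    for n in range(1, K - 1):
        P[n + 1] = (x*P[n] - b(n - 1)*P[n - 1])/b(n)
    return P

N = 40
Xp = legendre_jacobi(N)
U = 2*np.eye(N) + Xp                  # U = u(X_P) for u(x) = 2 + x
R = np.linalg.cholesky(U).T           # U = R^T R: line (1) of Table 1

# Modified Jacobi matrix: dense similarity (exact on the leading (N-1) block) ...
m = N - 1
Xq_dense = (R @ Xp @ np.linalg.inv(R))[:m, :m]

# ... versus the linear-complexity entrywise formulas.
Xq = np.zeros((m, m))
Xq[0, 0] = (R[0, 0]*Xp[0, 0] + R[0, 1]*Xp[1, 0])/R[0, 0]
for i in range(m - 1):
    Xq[i, i + 1] = Xq[i + 1, i] = R[i + 1, i + 1]*Xp[i + 1, i]/R[i, i]
for i in range(1, m):
    Xq[i, i] = (R[i, i]*Xp[i, i] + R[i, i + 1]*Xp[i + 1, i]
                - Xq[i, i - 1]*R[i - 1, i])/R[i, i]
assert np.allclose(Xq, Xq_dense)

# q_n(x) = P(x) R^{-1} e_n, n < K, should be orthonormal w.r.t. (2 + x) dx on [-1, 1].
K = 10
x, w = np.polynomial.legendre.leggauss(2*K)
Qvals = np.linalg.solve(R[:K, :K].T, orthonormal_legendre(x, K))  # rows are q_0, ..., q_{K-1}
gram = Qvals @ np.diag(w*(2 + x)) @ Qvals.T
assert np.allclose(gram, np.eye(K))
```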

In the case that \(\sqrt{u(x)}\) is a polynomial we establish line (2) of Table 1 as follows:

Lemma 2.9

Suppose \(r(x) = u(x)\) where \(\sqrt{u(x)}\) is a polynomial. The QR factorization of \(\sqrt{U}\) encodes the conversion matrix R, i.e.:

$$\begin{aligned} \sqrt{U} = Q R, \end{aligned}$$
(2.2)

if and only if \(\textbf{P}(x) = \textbf{Q}(x) R\). Furthermore:

$$\begin{aligned} \sqrt{u(x)} \textbf{Q}(x) = \textbf{P}(x) Q. \end{aligned}$$

Proof

First note that \(R_{n,n}>0\) because the Cholesky factor of \(r(X_P)\), the square of a polynomial, is invertible on \(\ell ^0\) (exactly as argued above). The first result follows as above by observing that

$$\begin{aligned} U = \sqrt{U}^\top \sqrt{U}= R^\top Q^\top Q R = R^\top R. \end{aligned}$$

The second result follows from:

$$\begin{aligned} \sqrt{u(x)} \textbf{Q}(x) = \textbf{P}(x) \sqrt{U} R^{-1} = \textbf{P}(x) Q. \end{aligned}$$

\(\square \)
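A finite-section sanity check of Lemma 2.9, with the hypothetical choice \(u(x) = (2+x)^2\) for the orthonormal Legendre polynomials so that \(\sqrt{U} = 2I + X_P\): after enforcing the positive phase convention, the triangular factor of the QR factorization agrees with the Cholesky factor of U, and the orthogonal factor reproduces \(\sqrt{u(x)}\,\textbf{Q}(x) = \textbf{P}(x) Q\) at sample points, away from the truncation boundary where the finite section represents the infinite factors.

```python
import numpy as np

def legendre_jacobi(N):
    n = np.arange(N - 1)
    b = (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    return np.diag(b, 1) + np.diag(b, -1)

def orthonormal_legendre(x, K):
    P = np.zeros((K, len(x)))
    P[0] = 1/np.sqrt(2)
    b = lambda n: (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    P[1] = x*P[0]/b(0)
    for n in range(1, K - 1):
        P[n + 1] = (x*P[n] - b(n - 1)*P[n - 1])/b(n)
    return P

M = 30
A = 2*np.eye(M) + legendre_jacobi(M)   # sqrt(U) for u(x) = (2 + x)^2
U = A @ A

Q, R = np.linalg.qr(A)
D = np.diag(np.sign(np.diag(R)))
Q, R = Q @ D, D @ R                    # positive phase convention
assert np.allclose(R, np.linalg.cholesky(U).T)   # line (2) is consistent with line (1)

# Second statement of the lemma: (2 + x) q_n(x) = sum_k p_k(x) Q_{k,n}.
K = 8
x = np.polynomial.legendre.leggauss(K + 2)[0]
P = orthonormal_legendre(x, K + 2)
qvals = np.linalg.solve(R[:K, :K].T, P[:K])      # rows are q_0, ..., q_{K-1}
assert np.allclose((2 + x)*qvals, Q[:K + 2, :K].T @ P)
```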

We now consider \(r(x) = 1/v(x)\) to be a reciprocal polynomial. Note that the entries of \(V^{-1}\) are not computable. However, lines (3) and (4) of Table 1 can be deduced from factorizations of V directly: we know from the fact that r(x) has no poles in \({\text {supp}}(\mu )\) that V has a bounded inverse. By Proposition 2.2 and Corollary 2.5, there exist QL and reverse Cholesky factorizations and then the proofs follow along the same lines as above.

For other rational modifications we use factorizations of the left-hand side of Eq. (2.1) that respect the upper-triangular structure of the Cholesky factors on the right-hand side. Two avenues to proceed require the infinite-dimensional QL and reverse Cholesky algorithms.

Theorem 2.10

(Rational case 1) Suppose \(r \in L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })\) and let \(V = QL\). Then

$$\begin{aligned} Q^\top UL^\top = R_{I}^\top R_{I}, \end{aligned}$$

if and only if \(\textbf{Q}(x)R_{I}= \textbf{P}(x) L^\top \). Furthermore:

$$\begin{aligned} v(x) \textbf{P}(x) = \textbf{Q}(x) R_{I}Q^\top . \end{aligned}$$

Proof

As above, nonsingularity of r(x) on \({\text {supp}}(\mu )\) guarantees that V is invertible, and we find \(V^{-1} = L^{-1}Q^\top \). Define \({\tilde{\mathbf{{Q}}}}(x):= \textbf{P}(x) L^\top R_{I}^{-1}\) and \(\tilde{q}_n(x)\) as above, so that

$$\begin{aligned} \int _{{\mathbb {R}}}{{\tilde{q}}}_m(x)^\top {u(x) \over v(x)} {{\tilde{q}}}_n(x) {\textrm{d}\mu }(x)&= \varvec{e}_m^ \top R_{I}^{-\top } L V^{-1} U L^{\top } R_{I}^{-1} \varvec{e}_n = \delta _{m,n}. \end{aligned}$$

For the other direction we have \(R = R_{I}L^{-\top }\) hence

$$\begin{aligned} Q^\top U L^\top = L V^{-1} U L^\top = L R^\top \int _{{\mathbb {R}}}\textbf{Q}(x)^\top r(x) \textbf{Q}(x) {\textrm{d}\mu }(x) R L^\top =R_{I}^\top R_{I}. \end{aligned}$$

The final statement follows since \(V = V^\top = L^\top Q^\top \) and hence:

$$\begin{aligned} v(x) \textbf{P}(x) = \textbf{P}(x) V = \textbf{Q}(x) R_{I}L^{-\top } V = \textbf{Q}(x) R_{I}Q^\top . \end{aligned}$$

\(\square \)

Theorem 2.11

(Rational case 2) Suppose \(r \in L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })\) and let \(V = L^\top L\). Then

$$\begin{aligned} L UL^{-1} = R_{I \hspace{-2pt} I}^\top R_{I \hspace{-2pt} I}, \end{aligned}$$

if and only if \(\textbf{Q}(x)R_{I \hspace{-2pt} I}= \textbf{P}(x) L^\top \). Furthermore:

$$\begin{aligned} v(x) \textbf{P}(x) = \textbf{Q}(x) R_{I \hspace{-2pt} I}L. \end{aligned}$$

Proof

Note for the last time that nonsingularity of r(x) on \({\text {supp}}(\mu )\) guarantees that V is invertible, and we find \(V^{-1} = L^{-1}L^{-\top }\). Define \({\tilde{{\textbf {Q}}}}(x):= \textbf{P}(x) L^\top R_{I \hspace{-2pt} I}^{-1}\) and \(\tilde{q}_n(x)\) as above, so that

$$\begin{aligned} \int _{{\mathbb {R}}}\tilde{q}_m(x)^\top {u(x) \over v(x)} \tilde{q_n}(x) {\textrm{d}\mu }(x)&= \varvec{e}_m^ \top R_{I \hspace{-2pt} I}^{-\top } L V^{-1} U L^{\top } R_{I \hspace{-2pt} I}^{-1} \varvec{e}_n = \delta _{m,n}. \end{aligned}$$

For the other direction we have \(R = R_{I \hspace{-2pt} I}L^{-\top }\) hence

$$\begin{aligned} L U L^{-1} = L U V^{-1} L^\top = L \int _{{\mathbb {R}}}\textbf{P}(x)^\top r(x) \textbf{P}(x) {\textrm{d}\mu }(x) L^\top = R_{I \hspace{-2pt} I}^\top R_{I \hspace{-2pt} I}. \end{aligned}$$

The final statement follows since \(V = V^\top = L^\top L\) and hence:

$$\begin{aligned} v(x) \textbf{P}(x) = \textbf{P}(x) V = \textbf{Q}(x) R_{I \hspace{-2pt} I}L^{-\top } V = \textbf{Q}(x) R_{I \hspace{-2pt} I}L. \end{aligned}$$

\(\square \)

Through banded matrix factorizations, Table 1 generalizes many familiar identities for the classical orthogonal polynomials. We give a few examples using the orthonormal generalized Laguerre polynomials [40, § 18.3], related to the standard normalization by:

$$\begin{aligned} \tilde{L}_n^{(\alpha )}(x) = \sqrt{\frac{\Gamma (n+1)}{\Gamma (n+\alpha +1)}}L_n^{(\alpha )}(x). \end{aligned}$$

Given \(X_{\tilde{L}^{(\alpha )}}\), whose entries are determined by:

$$\begin{aligned} x\tilde{L}_n^{(\alpha )}(x)= & {} -\sqrt{n(n+\alpha )}\tilde{L}_{n-1}^{(\alpha )}(x) + (2n+\alpha +1)\tilde{L}_n^{(\alpha )}(x) \\{} & {} - \sqrt{(n+1)(n+\alpha +1)}\tilde{L}_{n+1}^{(\alpha )}(x), \end{aligned}$$

four other operators arise by Cholesky and QR factorization. These are:

  1.

    The raising operator that is the upper Cholesky factor of \(X_{\tilde{L}^{(\alpha )}} = R^\top R\):

    $$\begin{aligned} \tilde{L}_n^{(\alpha )}(x) = \sqrt{n+\alpha +1}\tilde{L}_n^{(\alpha +1)}(x) - \sqrt{n}\tilde{L}_{n-1}^{(\alpha +1)}(x), \end{aligned}$$
  2.

    The lowering operator that is the transpose of the raising operator:

    $$\begin{aligned} x\tilde{L}_n^{(\alpha +1)}(x) = \sqrt{n+\alpha +1}\tilde{L}_n^{(\alpha )}(x) - \sqrt{n+1}\tilde{L}_{n+1}^{(\alpha )}(x), \end{aligned}$$
  3.

    The iterated raising operator that is the triangular factor of \(X_{\tilde{L}^{(\alpha )}} = QR\):

    $$\begin{aligned} \tilde{L}_n^{(\alpha )}(x) =&\, \sqrt{(n+\alpha +1)(n+\alpha +2)}\tilde{L}_n^{(\alpha +2)}(x)\\&- 2\sqrt{n(n+\alpha +1)}\tilde{L}_{n-1}^{(\alpha +2)}(x) + \sqrt{n(n-1)}\tilde{L}_{n-2}^{(\alpha +2)}(x), \end{aligned}$$
  4.

    The corresponding orthogonal factor [42, Corollary 4.8]:

    $$\begin{aligned}&x\tilde{L}_n^{(\alpha +2)}(x) + \sqrt{\frac{n+1}{n+\alpha +2}}\tilde{L}_{n+1}^{(\alpha )}(x)\\&\quad \quad =(\alpha +1)\sum _{\ell =0}^n\sqrt{\frac{\Gamma (\ell +\alpha +1)}{\Gamma (\ell +1)} \frac{\Gamma (n+1)}{\Gamma (n+\alpha +3)}} \tilde{L}_\ell ^{(\alpha )}(x). \end{aligned}$$
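As a quick numerical check of item 1 above (a sketch with the arbitrary choice \(\alpha = 0.4\)): the upper Cholesky factor of a finite section of \(X_{\tilde{L}^{(\alpha )}}\) has diagonal entries \(\sqrt{n+\alpha +1}\) and superdiagonal entries \(-\sqrt{n+1}\), exactly the coefficients of the raising operator.

```python
import numpy as np

alpha, N = 0.4, 20
n = np.arange(N)
diag = 2*n + alpha + 1
off = -np.sqrt((n[:-1] + 1)*(n[:-1] + alpha + 1))
X = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)   # Jacobi matrix of orthonormal Laguerre

R = np.linalg.cholesky(X).T     # X = R^T R with R upper triangular

# Item 1 (raising operator): R_{n,n} = sqrt(n + alpha + 1), R_{n,n+1} = -sqrt(n + 1).
assert np.allclose(np.diag(R), np.sqrt(n + alpha + 1))
assert np.allclose(np.diag(R, 1), -np.sqrt(n[:-1] + 1))
```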

Remark 2.12

In the general rational cases, it is important to relate the main and first super-diagonals of R to the factored forms of the connection coefficients to efficiently compute the modified Jacobi matrix \(X_Q\). In the first general rational case, \(R = R_{I}L^{-\top }\), so:

$$\begin{aligned} R_{i,i}&= (R_{I})_{i,i}/L_{i,i},\\ R_{i,i+1}&= \frac{(R_{I})_{i,i+1} - R_{i,i}L_{i+1,i}}{L_{i+1,i+1}}. \end{aligned}$$

In the second general rational case, the same formulas hold with \(R_{I}\leftrightarrow R_{I \hspace{-2pt} I}\).

Finally, we observe that \(R_{I}\) is an upper-triangular banded matrix with upper bandwidth at most \(\deg (u)+\deg (v)\). This inference follows from the same bound on the lower bandwidth of \(Q^\top U L^\top \). Similarly, \(R_{I \hspace{-2pt} I}\) is an upper-triangular banded matrix with upper bandwidth at most \(\deg (u)\), a result that follows from comparison with \(LUL^{-1}\). Since \(R_{I}\) and \(R_{I \hspace{-2pt} I}\) are banded matrices, their principal finite sections are computable in \({\mathcal {O}}(n)\) flops.

2.1 \(L^\infty \) Measure Modifications

It is reasonable to consider irrational measure modifications r(x). We wish to relate the rational approximation error in \(L^\infty ({{\mathbb {R}}},{\textrm{d}\mu })\) to the 2-norm error in approximating the infinite-dimensional matrix \(r(X_P)\). In this way, we will understand when polynomial and rational approximants construct measure modification matrices that are near those of the true problems. This confers a significant computational advantage as then we are able to leverage the banded sparsity in the nearby problem.

Lemma 2.13

(Proposition 1, § VIII.3 in [48]) For any \(r\in L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })\):

$$\begin{aligned} \Vert r(X_P)\Vert _2 = \Vert r\Vert _{L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })}. \end{aligned}$$

A consequence of this is that we can control the error in approximating a measure modification by a rational function:

Theorem 2.14

Given \(\epsilon >0\) and a measure modification \(r\in L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })\), if it is approximated by a rational function u/v such that:

$$\begin{aligned} \left\| r - \frac{u}{v}\right\| _{L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })} \le \epsilon \Vert r\Vert _{L^\infty ({{\mathbb {R}}}, {\textrm{d}\mu })}, \end{aligned}$$

then:

$$\begin{aligned} \left\| r(X_P) - V^{-1}U\right\| _2 \le \epsilon \Vert r(X_P)\Vert _2. \end{aligned}$$

As an example, we consider a Jacobi polynomial approximation to an analytic function on \([-1,1]\). By classical results of approximation theory [58, Theorems 8.2 and 12.1], the canonical finite-dimensional interpolation and projection errors are dominated by exponential convergence; in particular, the degree dependence is \({\mathcal {O}}(\log \epsilon ^{-1})\) as \(\epsilon \rightarrow 0\). Thus, to relative error \(\epsilon \), the matrix \(r(X_P)\) is well-approximated by U (and thereby trivially \(V^{-1}U\)) of effectively finite bandwidth.
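A finite-section illustration of this effect (a sketch under arbitrary choices, not one of the paper's algorithms): we interpolate \(r(x) = e^x\) by a Chebyshev polynomial u of modest degree, evaluate \(u(X_N)\) by a matrix-valued Clenshaw recurrence on a finite section \(X_N\) of the Jacobi matrix of the orthonormal Chebyshev polynomials of the second kind, and observe that \(\Vert r(X_N) - u(X_N)\Vert _2\) is bounded by the uniform interpolation error, since both are functions of the same symmetric matrix with spectrum inside \((-1,1)\).

```python
import numpy as np

N, deg = 60, 6
b = 0.5*np.ones(N - 1)
X = np.diag(b, 1) + np.diag(b, -1)          # Jacobi matrix section, spectrum in (-1, 1)

r = np.exp
c = np.polynomial.chebyshev.chebinterpolate(r, deg)   # Chebyshev coefficients of the interpolant u

# r(X) via the spectral decomposition of the symmetric finite section.
evals, evecs = np.linalg.eigh(X)
rX = evecs @ np.diag(r(evals)) @ evecs.T

# u(X) by Clenshaw's recurrence applied to a matrix argument.
I = np.eye(N)
b1 = b2 = np.zeros((N, N))
for ck in c[:0:-1]:                         # c_deg, ..., c_1
    b1, b2 = ck*I + 2*X @ b1 - b2, b1
uX = c[0]*I + X @ b1 - b2

xx = np.linspace(-1, 1, 2001)
uniform_err = np.max(np.abs(r(xx) - np.polynomial.chebyshev.chebval(xx, c)))
matrix_err = np.linalg.norm(rX - uX, 2)
assert matrix_err <= uniform_err + 1e-9
print(matrix_err, uniform_err)
```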

On unbounded domains, it is impossible to approximate a bounded measure modification in the uniform norm by polynomials, partially motivating our interest in rational approximation.

2.2 Measure Modifications with M-Matrix Connection Coefficients

We include in this section a result on a property of the measure modification that implies a property of the connection coefficients. Since this property is inherently entry-wise, it is understood that in this section we work with principal finite sections of the connection coefficients; hence, all matrices are finite-dimensional.

Definition 2.15

A positive-definite matrix A is an M-matrix if \(A_{i,i} > 0\), \(A_{i,j}\le 0\) for \(i\ne j\). A symmetric M-matrix is also known as a Stieltjes matrix.

Triangular M-matrices and their inverses have tighter-than-general 2-norm condition number estimates [47]; and it has been useful to identify this property in the setting of a connection problem [31].

Lemma 2.16

(Fiedler and Pták [18]) Let A be an M-matrix. Then \(A=LU\) and both L and U are M-matrices.

In particular, this means that the Cholesky factors of a Stieltjes matrix are M-matrices.

Definition 2.17

(Fiedler and Schneider [19]) A smooth function \(f:(0,\infty )\rightarrow [0,\infty )\) is completely monotonic if \((-1)^kf^{(k)}(x)\ge 0\) for \(x>0\) and \(k\in {\mathbb {N}}_0\).

Theorem 2.18

(Fiedler and Schneider [19]) Let f be positive on \((0,\infty )\) and \(f'\) be completely monotonic. Then A is an M-matrix if and only if f(A) is an M-matrix.

Corollary 2.19

Let \(r(x) = f(s(x))\) where f is positive on \((0,\infty )\) and \(f'\) is completely monotonic and s is smooth. If principal finite sections of \(s(X_P)\) are Stieltjes matrices, then principal finite sections of \(r(X_P)\) are Stieltjes matrices if and only if principal finite sections of R are M-matrices.

Proof

This corollary constructively combines Lemma 2.16 and Theorem 2.18. \(\square \)

Given the Jacobi polynomials, it is easy to show that principal finite sections of \(I-X_P\) and \(I-X_P^2\) are Stieltjes matrices, corresponding to \(s(x) = 1-x\) and \(s(x) = 1-x^2\), respectively. For the Laguerre polynomials, principal finite sections of \(X_P\) itself are Stieltjes matrices. A few concrete examples of f positive and \(f'\) completely monotonic include:

  1.

    \(f(x) = x^\lambda \) for \(0\le \lambda \le 1\); and,

  2.

    \(\displaystyle f(x) = \frac{x}{x+\gamma }\) for \(\gamma >0\).
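The sketch below illustrates Corollary 2.19 with the first of these examples, applying Theorem 2.18 directly to a finite section (the choices of the orthonormal Legendre basis, \(s(x) = 1-x\) and \(f(x) = x^{1/2}\), i.e. \(r(x) = \sqrt{1-x}\), are arbitrary): the section of \(s(X_P)\) is a Stieltjes matrix, f of that section is an M-matrix, and so is its Cholesky factor.

```python
import numpy as np

def legendre_jacobi(N):
    n = np.arange(N - 1)
    b = (n + 1)/np.sqrt((2*n + 1)*(2*n + 3))
    return np.diag(b, 1) + np.diag(b, -1)

def is_M_matrix(A, tol=1e-12):
    off = A - np.diag(np.diag(A))
    return (np.all(np.diag(A) > 0) and np.all(off <= tol)
            and np.all(np.linalg.eigvals(A).real > 0))

N = 20
A = np.eye(N) - legendre_jacobi(N)    # finite section of s(X_P), s(x) = 1 - x
assert is_M_matrix(A)                  # a Stieltjes matrix

# f(x) = sqrt(x): f positive on (0, inf) with f' completely monotonic (lambda = 1/2).
w, Q = np.linalg.eigh(A)
fA = Q @ np.diag(np.sqrt(w)) @ Q.T     # f applied to the finite section
assert is_M_matrix(fA)                 # Theorem 2.18

R = np.linalg.cholesky(fA).T           # Cholesky factor of this section (cf. Corollary 2.19)
assert is_M_matrix(R)                  # Lemma 2.16
```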

2.3 Banded Derivatives of Modified Classical Orthogonal Polynomials

Classical orthogonal polynomials are characterized by Bochner [5] and Krall [34] as the polynomial solutions of the degree-preserving second-order linear differential equation:

$$\begin{aligned} \left( \sigma \mathcal {D}^2 + \tau \mathcal {D}+ \lambda _n\right) p_n = 0, \end{aligned}$$
(2.3)

where \(\sigma \) and \(\tau \) are polynomials in x independent of n that satisfy \(\deg (\sigma ) \le 2\) and \(\deg (\tau ) \le 1\), and the eigenvalues \(\lambda _n = -\frac{n}{2}\left[ (n-1)\sigma ''+2\tau '\right] \ge 0\). We call this differential equation degree-preserving because the degrees of the polynomial variable coefficients do not exceed the orders of the respective differential operators.

In addition, the measure is expressed in terms of a positive weight function, \({\textrm{d}\mu }(x) = w(x)\,\textrm{d}x\) supported on the (possibly unbounded) interval (ab), that satisfies the first-order Pearson differential equation:

$$\begin{aligned} \mathcal {D}(\sigma w) = \tau w. \end{aligned}$$
(2.4)

A useful property of \(\sigma \) is that it is zero at finite boundary points: for \(\chi \) equal to a or b if they are finite we have \( \sigma (\chi ) w(\chi ) = 0, \) annihilating boundary terms in integration by parts.

By considering the self-adjoint form of Eq. (2.3):

$$\begin{aligned} (-\mathcal {D})\left( \sigma w\mathcal {D}\right) p_n = \lambda _n w p_n, \end{aligned}$$
(2.5)

it follows that the polynomials \(\mathcal {D}p_n\) are orthogonal with respect to \(L^2({{\mathbb {R}}}, \sigma {\textrm{d}\mu })\). Thus the characterization of classical orthogonal polynomials is often stated as those polynomials whose derivatives are also orthogonal polynomials.

It follows that differentiation of the classical orthogonal polynomials can be represented by infinite-dimensional banded matrices:

$$\begin{aligned} \mathcal {D}\textbf{P}(x) = \textbf{P}'(x) D_P^{P'}, \end{aligned}$$
(2.6)

where \(\mathcal {D}\) denotes the derivative operator, \( \textbf{P}(x)\) are the orthonormal polynomials with respect to \({\textrm{d}\mu }(x) = w(x)\,\textrm{d}x\), \( \textbf{P}'(x)\) are the orthonormal polynomials with respect to \(\sigma {\textrm{d}\mu }\) and \(D_P^{P'}\) is a banded matrix with nonzero entries only on the first super-diagonal. Moreover, the super-diagonal entries of \(D_P^{P'}\) may be found by Eq. (2.5):

$$\begin{aligned} (D_P^{P'})^\top D_P^{P'} = \Lambda ,\quad \Lambda = {{\,\textrm{diag}\,}}(\lambda _0,\lambda _1,\ldots ). \end{aligned}$$

From a computational perspective, banded multiplication, raising, and differentiation enable sparse spectral methods to be constructed for linear differential equations with (polynomial) variable coefficients [42, 43].

In light of the current work on rationally modified orthogonal polynomials, we may ask:

Is there a basis in which the derivative of the rationally modified polynomials is banded?

The following theorem shows that banded derivatives exist for this problem and also more generally in the case of classical weights modified by algebraic powers of a product of polynomials. Let \(\varvec{\alpha } = (\alpha _1,\ldots ,\alpha _d)\) be a multi-index and \(\varvec{u}(x) = (u_1(x),\ldots ,u_d(x))\) be a collection of polynomials. We will use the following notation:

$$\begin{aligned} \varvec{u}^{\varvec{\alpha }}(x) = \prod _{i=1}^d u_i^{\alpha _i}(x), \end{aligned}$$

and the shorthand \(\displaystyle \deg (\varvec{u}^{\varvec{\alpha }}) = \sum _{i=1}^d \deg (u_i^{\alpha _i})\) if \(\varvec{\alpha }\in {\mathbb {N}}_0^d\). Define \(\lambda :{\mathbb {R}}^d\rightarrow {\mathbb {N}}_0^d\) by:

$$\begin{aligned} {[}\lambda (\varvec{\alpha })]_i = \left\{ \begin{array}{ll} 1 &{}\quad \alpha _i\ne 0,\\ 0 &{}\quad \alpha _i=0.\end{array}\right. \end{aligned}$$

Theorem 2.20

Let w(x) be a classical orthogonal polynomial weight, let \(\sigma (x)\) be the corresponding degree at most 2 polynomial, let:

$$\begin{aligned} {\textrm{d}\mu }^{(\varvec{\alpha },\beta )}(x) = w^{(\varvec{\alpha },\beta )}(x)\, \textrm{d}x= \varvec{u}^{\varvec{\alpha }}(x)\sigma ^\beta (x)w(x)\,\textrm{d}x, \end{aligned}$$

with \(\varvec{u}^{\varvec{\alpha }}(x)\sigma ^\beta (x)\) such that \({\textrm{d}\mu }^{(\varvec{\alpha },\beta )}(x)\) is a positive Borel measure with finite moments. Further, let \(\textbf{P}^{(\varvec{\alpha },\beta )}(x)\) be orthonormal polynomials with respect to \(L^2({{\mathbb {R}}}, {\textrm{d}\mu }^{(\varvec{\alpha },\beta )})\) for any real \(\varvec{\alpha }\) and \(\beta \) such that, in particular:

$$\begin{aligned} \lim _{x\rightarrow a^+} x^n w^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x) = \lim _{x\rightarrow b^-} x^n w^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x) = 0,\quad \forall n\in {\mathbb {N}}_0. \end{aligned}$$

Then:

$$\begin{aligned} \mathcal {D}\textbf{P}^{(\varvec{\alpha },\beta )}(x) = \textbf{P}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }), \beta +1)}(x)D_{(\varvec{\alpha }, \beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }), \beta +1)}, \end{aligned}$$
(2.7)

where \(D_{(\varvec{\alpha }, \beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }), \beta +1)}\) is strictly upper triangular and has upper bandwidth at most \(\deg (\varvec{u}^{\lambda (\varvec{\alpha })})+1\).

Proof

Differentiation of polynomials reduces the degree; hence, \(D_{(\varvec{\alpha }, \beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }), \beta +1)}\) is strictly upper triangular. Consider at first the relaxed problem:

$$\begin{aligned} \mathcal {D}\textbf{P}^{(\varvec{\alpha },\beta )}(x) = \textbf{P}^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\varvec{1},\beta +1)}. \end{aligned}$$

To find its upper bandwidth, we note that:

$$\begin{aligned}&(D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\varvec{1},\beta +1)})_{m,n}\\&\quad = \int _a^b p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x) \mathcal {D}p_n^{(\varvec{\alpha },\beta )}(x) w^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\,\textrm{d}x,\\&\quad = -\int _a^b \mathcal {D}\left[ p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)w^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\right] p_n^{(\varvec{\alpha },\beta )}(x)\,\textrm{d}x,\\&\quad = -\int _a^b \left\{ \mathcal {D}\left[ p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\right] w^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\right. \\&\qquad + p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\mathcal {D}\left[ \varvec{u}^{\varvec{\alpha }+\varvec{1}}(x)\right] \sigma ^{\beta +1}(x)w(x)\\&\qquad \left. + p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\varvec{u}^{\varvec{\alpha }+\varvec{1}}(x)\mathcal {D}\left[ \sigma ^{\beta +1}(x) w(x)\right] \right\} p_n^{(\varvec{\alpha },\beta )}(x)\,\textrm{d}x,\\&\quad = -\int _a^b \left\{ \mathcal {D}\left[ p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\right] \varvec{u}^{\varvec{1}}(x)\sigma (x)\right. \\&\qquad + p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\sum _{i=1}^d(\alpha _i+1)u'_i(x)\prod _{\begin{array}{c} j=1\\ j\ne i \end{array}}^du_j(x)\sigma (x)\\&\qquad \left. + p_m^{(\varvec{\alpha }+\varvec{1},\beta +1)}(x)\varvec{u}^{\varvec{1}}(x)(\tau +\beta {\sigma '})\right\} p_n^{(\varvec{\alpha },\beta )}(x) w^{(\varvec{\alpha },\beta )}(x)\,\textrm{d}x. \end{aligned}$$

It follows that the first term is a degree at most \(m+\deg (\varvec{u}^{\varvec{1}})+\max \{\deg (\sigma )-1,\deg (\tau )\}\) polynomial, proving that \((D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\varvec{1},\beta +1)})_{m,n}=0\) if \(n>m+\deg (\varvec{u}^{\varvec{1}})+1\), by orthogonality of \(p_n^{(\varvec{\alpha },\beta )}(x)\) with all polynomials of lesser degree. If any of the exponents \(\alpha _i\) is zero, then this overestimates the upper bandwidth, as \(u_i^0(x) \equiv 1\), no matter how large its degree. \(\square \)

While Theorem 2.20 shows that these classes of orthogonal polynomials have banded derivatives, it does not provide an algorithm for computations. In fact, there are many different formulas to compute these based on connection coefficients. For example, if:

$$\begin{aligned} \textbf{P}^{(\varvec{\alpha },\beta )}(x) = \textbf{P}^{(\varvec{\gamma },\delta )}(x)R_{(\varvec{\alpha },\beta )}^{(\varvec{\gamma },\delta )}, \end{aligned}$$

then it is true that:

$$\begin{aligned} D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}=R_{(\varvec{\gamma }+\lambda (\varvec{\gamma }),\delta +1)}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)} D_{(\varvec{\gamma },\delta )}^{(\varvec{\gamma }+\lambda (\varvec{\gamma }),\delta +1)} R_{(\varvec{\alpha },\beta )}^{(\varvec{\gamma },\delta )}. \end{aligned}$$
(2.8)

If \(\varvec{\gamma } = \varvec{0}\), then \(D_{(\varvec{\gamma },\delta )}^{(\varvec{\gamma }+\lambda (\varvec{\gamma }),\delta +1)}\) is a classical orthogonal polynomial differentiation matrix.

In the proof of Theorem 2.20, we found that:

$$\begin{aligned} \mathcal {D}(\sigma ^{\beta +1}w) = (\tau +\beta {\sigma '})\sigma ^\beta w, \end{aligned}$$

which shows that \(\sigma ^\beta (x)w(x)\) is also a classical weight. We include \(\sigma ^\beta (x)\) to enable convenient notation for higher order derivatives.

Without loss of generality, we discuss the banded higher order derivatives of rationally modified orthonormal polynomials. To represent a rational classical weight modification, it suffices to take \(d=2\) and identify \(\varvec{u}(x) = (u(x), v(x))\) and \(\varvec{\alpha } = (1,-1)\). Then:

$$\begin{aligned} \mathcal {D}\textbf{P}^{(1,-1,0)}(x) = \textbf{P}^{(2,0,1)}(x)D_{(1,-1,0)}^{(2,0,1)}. \end{aligned}$$

It follows that:

$$\begin{aligned} D_{(1,-1,0)}^{(2,0,1)} = R_{(0,0,1)}^{(2,0,1)} D_{(0,0,0)}^{(0,0,1)} (R_{(0,0,0)}^{(1,-1,0)})^{-1}. \end{aligned}$$

This formula finds the banded derivative \(D_{(1,-1,0)}^{(2,0,1)}\) in terms of a polynomial weight modification \(R_{(0,0,1)}^{(2,0,1)}\), the classical derivative \(D_{(0,0,0)}^{(0,0,1)}\), and the inverse of the rational weight modification \(R_{(0,0,0)}^{(1,-1,0)}\), all computable or well-approximable matrices with the connection problems described in Table 1. Subsequent differentiation is natural:

$$\begin{aligned} \mathcal {D}\textbf{P}^{(k+1,0,k)}(x) = \textbf{P}^{(k+2,0,k+1)}(x)D_{(k+1,0,k)}^{(k+2,0,k+1)},\quad \forall k\in {\mathbb {N}}, \end{aligned}$$

where:

$$\begin{aligned} D_{(1,0,0)}^{(2,0,1)}&= R_{(0,0,1)}^{(2,0,1)} D_{(0,0,0)}^{(0,0,1)} (R_{(0,0,0)}^{(1,0,0)})^{-1},\\ D_{(k+1,0,k)}^{(k+2,0,k+1)}&= R_{(k+1,0,k)}^{(k+2,0,k+1)} D_{(k,0,k-1)}^{(k+1,0,k)} (R_{(k,0,k-1)}^{(k+1,0,k)})^{-1},\quad \forall k\in {\mathbb {N}}. \end{aligned}$$

All of these connection problems are banded with a minimal bandwidth in the sense that they are independent of v(x).

Equation (2.6) offers two more practical interpretations. Firstly, the transpose of the differentiation matrix is the discretization of the weighted negative derivative:

$$\begin{aligned} (-{\mathcal {D}})\left[ \sigma (x)w(x)\textbf{P}'(x)\right] = w(x)\textbf{P}(x) (D_P^{P'})^\top . \end{aligned}$$

Secondly, an indefinite integral of \(\textbf{P}'(x)\) is discretized by the Moore–Penrose pseudoinverse [24]:

$$\begin{aligned} \int ^x \textbf{P}'(x)\,\textrm{d}x = \textbf{P}(x)(D_P^{P'})^+. \end{aligned}$$

Since \(D_P^{P'}\) has full row rank and only one nontrivial band, \((D_P^{P'})^+\) is a right inverse that is particularly easy to compute: it too only has one nontrivial band, and \((D_P^{P'})^+_{n+1,n} = (D_P^{P'})^{-1}_{n,n+1}\). Both of these properties carry over to the more general setting of Theorem 2.20.

Corollary 2.21

Consider \(\textbf{P}^{(\varvec{\alpha },\beta )}(x)\) to be the same as in Theorem 2.20.

  1.

    The weighted negative derivative is:

    $$\begin{aligned}&(-{\mathcal {D}})\left[ w^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x)\textbf{P}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x)\right] \\&\quad = w^{(\varvec{\alpha },\beta )}(x)\textbf{P}^{(\varvec{\alpha },\beta )}(x) (D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})^\top , \end{aligned}$$
  2.

    and an indefinite integral is:

    $$\begin{aligned} \int ^x \textbf{P}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x)\,\textrm{d}x = \textbf{P}^{(\varvec{\alpha },\beta )}(x)(D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})^+. \end{aligned}$$

While \(D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}\) in general has a larger upper bandwidth than \(D_P^{P'}\), its full row rank enables direct computation of successive columns of its pseudoinverse, preserving the lower bandwidth of 1. In particular, we may take the first row of \((D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})^+\) to be zero, and for the rest:

$$\begin{aligned} (D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})^+_{2:n+1,1:n} =(D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})_{1:n,2:n+1}^{-1}. \end{aligned}$$
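A small check of this construction on synthetic data (a hypothetical banded, strictly upper-triangular D with nonzero first superdiagonal, standing in for the differentiation matrix): the matrix assembled from the formula above is a right inverse with lower bandwidth 1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, b = 8, 3                                    # section size and upper bandwidth
A = rng.standard_normal((n, n + 1))
D = np.triu(A, 1) - np.triu(A, b + 1)          # entries only on superdiagonals 1, ..., b
D[np.arange(n), np.arange(1, n + 1)] += 2.0    # ensure a nonzero first superdiagonal

Dsub = D[:, 1:]                                # upper triangular, diagonal = superdiagonal of D
Dplus = np.zeros((n + 1, n))
Dplus[1:, :] = np.linalg.inv(Dsub)             # first row zero, then the block inverse

assert np.allclose(D @ Dplus, np.eye(n))       # a right inverse
assert np.allclose(np.tril(Dplus, -2), 0)      # lower bandwidth 1
```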

Combining Theorem 2.20 and Corollary 2.21 is now irresistible.

Theorem 2.22

Consider \(\textbf{P}^{(\varvec{\alpha },\beta )}(x)\) to be the same as in Theorem 2.20.

$$\begin{aligned}&(-{\mathcal {D}})\left[ w^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}(x){\mathcal {D}}\right] \textbf{P}^{(\varvec{\alpha },\beta )}(x)\\&\quad = w^{(\varvec{\alpha },\beta )}(x)\textbf{P}^{(\varvec{\alpha },\beta )}(x)(D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)})^\top D_{(\varvec{\alpha },\beta )}^{(\varvec{\alpha }+\lambda (\varvec{\alpha }),\beta +1)}. \end{aligned}$$

3 Algorithms for Infinite-Dimensional Matrix Factorizations

As mentioned above, there is an important distinction between the Cholesky and QR factorizations of infinite-dimensional matrices and their reverse Cholesky and QL counterparts: finite sections of infinite-dimensional Cholesky and QR factorizations are computable with a finite number of operations, while finite sections of the QL and reverse Cholesky factorizations in general are not. In finite dimensions, a QL factorization may proceed by orthogonal transformations designed to lower-triangularize the matrix in question. It follows that these orthogonal transformations must begin in the bottom right corner, but in infinite dimensions the bottom right corner is always out of reach, see [63]. A similar argument follows for the reverse Cholesky factorization, where in finite dimensions symmetric elementary transformations introduce zeros starting with the last row and column.

Since we only require the \(n\times n\) principal section of the connection coefficients, we develop iterative methods to compute the approximate infinite-dimensional QL and reverse Cholesky factorizations based on larger principal sections. Let \(N > n\) and partition V as follows:

$$\begin{aligned} V = \begin{pmatrix} V_N &{} V_b\\ V_b^\top &{} V_\infty \end{pmatrix}. \end{aligned}$$

Here, \(V_N\) is the \(N\times N\) principal section of V. It inherits the symmetric positive-definite and banded structure from V. Next, \(V_b\) is an \(N\times \infty \) block with a rank bounded by the bandwidth of V and sparsity pattern of a \(b\times b\) lower-triangular matrix in the first b columns and the last b rows. Finally, \(V_\infty \) is the trailing infinite-dimensional section.

3.1 Approximating the QL Factorization

We compute the QL factorization of \(V_N\) and we wish to analyse the proximity of its \(n\times n\) principal section to the same sized section of the infinite-dimensional QL factorization of a nearby matrix to V; thus, we shall make a perturbative argument. Recall that \(P_n\) is the canonical orthogonal projection of Eq. (1.2). We compute \(V_N = Q_N L_N\) and consider:

$$\begin{aligned} \begin{pmatrix} Q_N^\top \\ {} &{} I\end{pmatrix} V = \begin{pmatrix} L_N &{} Q_N^\top V_b\\ V_b^\top &{} V_\infty \end{pmatrix}. \end{aligned}$$

In the first n rows, the orthogonal transformation \(Q_N^\top \) has successfully lower triangularized the first N columns of V. However, in the next b columns, the entries \(P_n Q_N^\top V_b\) interfere with a complete lower triangularization. Were they 0, we would declare victory as the orthogonal lower triangularization would have succeeded. In general, this is impossible and yet we notice that in practice for \(N\gg n\) these entries may be 2-normwise small. Thus, we conclude with the following criterion for convergence of principal sections of the QL factorization.

Lemma 3.1

Given \(\epsilon >0\), if:

$$\begin{aligned} \Vert P_nQ_N^\top V_b\Vert _2 < \epsilon \Vert Q_N^\top V_b\Vert _2 = \epsilon \Vert V_b\Vert _2, \end{aligned}$$

then there exists a \(\tilde{V}_b\) such that:

$$\begin{aligned} P_n Q_N^\top \tilde{V}_b&= 0,\quad \textrm{and}\\ \Vert V_b-\tilde{V}_b\Vert _2&< \epsilon \Vert V_b\Vert _2. \end{aligned}$$

Proof

We choose \(\tilde{V}_b = Q_N(I_N-P_n)Q_N^\top V_b\). Then:

$$\begin{aligned} P_nQ_N^\top \tilde{V}_b = P_nQ_N^\top Q_N(I_N-P_n)Q_N^\top V_b = P_n(I_N-P_n)Q_N^\top V_b = 0, \end{aligned}$$

and:

$$\begin{aligned} V_b - \tilde{V}_b&= \left[ I_N - Q_N(I_N-P_n)Q_N^\top \right] V_b\\&= Q_N P_n Q_N^\top V_b, \end{aligned}$$

and the result follows. \(\square \)

Since \(\Vert V_b\Vert _2 \le \Vert V\Vert _2\), Lemma 3.1 suggests that introducing a 2-normwise small relative perturbation in \(V_b\) is sufficient to result in an n-row lower triangularization of a \(\tilde{V}\) by the orthogonal transformation \(Q_N^\top \). Moreover, as the perturbation is localized to the \(V_b\) block, the statement is in fact stronger than:

$$\begin{aligned} \Vert V-\tilde{V}\Vert _2 < \epsilon \Vert V\Vert _2. \end{aligned}$$

It may not be immediately obvious, but Lemma 3.1 guarantees that for every \(M>N\), if \(\tilde{V}_M = Q_ML_M\), then the first n rows of \(L_M\) coincide with those of \(L_N\), and the first n columns of \(Q_M\) are also the first n columns of \(Q_N\). Since \(L_M\) and \(L_N\) are lower-triangular matrices, their first \(n\times n\) principal sections are the same. Similarly, the \(N\times n\) principal sections of \(Q_M\) and \(Q_N\) are the same. If N is sufficiently large, then \(\tilde{V}\) is 2-normwise close to V and we have computed the exact partial QL factorization of a nearby matrix.

Our algorithm is as follows: given \(\epsilon >0\), we begin by setting \(N = 2n\), we compute \(V_N = Q_NL_N\), and we double N until we may invoke Lemma 3.1 and continue with the finite-dimensional aspect of the computations involved in the QL factorization. As there is no universal guarantee for this to occur, the software implementation of our algorithm issues a warning if N reaches a preset maximum value.
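
To make the procedure concrete, here is a minimal Python/NumPy sketch of this adaptive doubling strategy applied to the tridiagonal model problem of Sect. 3.3; it is an illustration only, not the FastTransforms implementation, and the helper names (`ql`, `adaptive_ql`, `section`, `corner`) are ours. The QL factorization of a finite matrix is obtained here from a QR factorization of the doubly-flipped matrix.

```python
import numpy as np

def ql(A):
    """QL factorization A = Q L via QR of the doubly-flipped matrix."""
    Qf, Rf = np.linalg.qr(A[::-1, ::-1])
    Q, L = Qf[::-1, ::-1], Rf[::-1, ::-1]
    s = np.where(np.diag(L) < 0, -1.0, 1.0)      # normalize so that diag(L) > 0
    return Q * s, L * s[:, None]

def adaptive_ql(section, corner, n, eps, Nmax=2**20):
    """Double N until the criterion of Lemma 3.1 holds; return Q_N, L_N.

    `section(N)` returns the N x N principal section V_N and `corner(N)` the
    b x b nonzero corner of V_b (its last b rows and first b columns).
    """
    N = 2 * n
    while N <= Nmax:
        V_N, V_b = section(N), corner(N)
        b = V_b.shape[0]
        Q_N, L_N = ql(V_N)
        head = Q_N[-b:, :n].T @ V_b              # nonzero columns of P_n Q_N^T V_b
        if np.linalg.norm(head, 2) < eps * np.linalg.norm(V_b, 2):
            return Q_N, L_N
        N *= 2
    raise RuntimeError("N reached the maximum value without satisfying Lemma 3.1")

# Model problem of Sect. 3.3: V = tridiag(beta, alpha, beta) with alpha > 2*beta > 0.
alpha, beta, n, eps = 3.0, 1.0, 10, 1e-12
def section(N):
    return (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
            + np.diag(np.full(N - 1, beta), -1))
corner = lambda N: np.array([[beta]])
Q_N, L_N = adaptive_ql(section, corner, n, eps)
print(L_N.shape)   # (40, 40): the criterion fails at N = 20 and is met at N = 40
```

The first \(n\times n\) block of the returned \(L_N\) then approximates the corresponding section of the infinite-dimensional factor, as quantified for this model problem in Sect. 3.3.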

3.2 Approximating the Reverse Cholesky Factorization

As above, we compute the reverse Cholesky factorization of \(V_N\), and we make a perturbative argument to consider the utility of its \(n\times n\) principal section. With \(V_N = L_N^\top L_N\) in hand:

$$\begin{aligned} \begin{pmatrix} L_N^{-\top }\\ {} &{} I\end{pmatrix} V = \begin{pmatrix} L_N &{} L_N^{-\top } V_b\\ V_b^\top &{} V_\infty \end{pmatrix}, \end{aligned}$$

and we wish to consider how closely we have lower-triangularized the first n rows.

Lemma 3.2

Given \(\epsilon >0\), if:

$$\begin{aligned} \Vert L_N^\top P_nL_N^{-\top } V_b\Vert _2 < \epsilon \Vert V_b\Vert _2, \end{aligned}$$

then there exists a \(\tilde{V}_b\) such that:

$$\begin{aligned} P_n L_N^{-\top }\tilde{V}_b&= 0,\quad \textrm{and}\\ \Vert V_b-\tilde{V}_b\Vert _2&< \epsilon \Vert V_b\Vert _2. \end{aligned}$$

Proof

We choose \(\tilde{V}_b = L_N^\top (I_N-P_n)L_N^{-\top } V_b\). Then:

$$\begin{aligned} P_nL_N^{-\top }\tilde{V}_b = P_nL_N^{-\top } L_N^\top (I_N-P_n)L_N^{-\top } V_b = P_n(I_N-P_n)L_N^{-\top } V_b = 0, \end{aligned}$$

and:

$$\begin{aligned} V_b - \tilde{V}_b&= \left[ I_N - L_N^\top (I_N-P_n)L_N^{-\top }\right] V_b\\&= L_N^\top P_n L_N^{-\top } V_b, \end{aligned}$$

and the result follows. \(\square \)

Since \(\Vert V_b\Vert _2 \le \Vert V\Vert _2\), Lemma 3.2 suggests that introducing a 2-normwise small relative perturbation in \(V_b\) is sufficient to result in an n-row lower triangularization of V by \(L_N^{-\top }\). Our algorithm is as follows: given \(\epsilon >0\), we begin by setting \(N = 2n\), we compute \(V_N=L_N^\top L_N\), and we double N until we may invoke Lemma 3.2 and continue with the finite-dimensional aspect of the computations involved in the reverse Cholesky factorization.
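
The analogous sketch for the reverse Cholesky factorization is given below; again this is an illustration on the model problem of Sect. 3.3 (the helpers `reverse_cholesky` and `adaptive_reverse_cholesky` are ours), with the finite reverse Cholesky factor obtained from a standard Cholesky factorization of the doubly-flipped matrix.

```python
import numpy as np
from scipy.linalg import solve_triangular

def reverse_cholesky(A):
    """Reverse Cholesky A = L^T L (L lower triangular) via Cholesky of the flip."""
    C = np.linalg.cholesky(A[::-1, ::-1])        # flipped A = C C^T with C lower
    return C.T[::-1, ::-1]

def adaptive_reverse_cholesky(section, corner, n, eps, Nmax=2**20):
    """Double N until the criterion of Lemma 3.2 holds; return L_N."""
    N = 2 * n
    while N <= Nmax:
        V_N, V_b = section(N), corner(N)
        b = V_b.shape[0]
        L_N = reverse_cholesky(V_N)
        Vb_cols = np.zeros((N, b)); Vb_cols[-b:, :] = V_b   # nonzero columns of V_b
        X = solve_triangular(L_N.T, Vb_cols, lower=False)   # L_N^{-T} V_b
        X[n:, :] = 0.0                                      # apply P_n
        if np.linalg.norm(L_N.T @ X, 2) < eps * np.linalg.norm(V_b, 2):
            return L_N
        N *= 2
    raise RuntimeError("N reached the maximum value without satisfying Lemma 3.2")

# Same model problem as above: V = tridiag(beta, alpha, beta).
alpha, beta, n, eps = 3.0, 1.0, 10, 1e-12
def section(N):
    return (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
            + np.diag(np.full(N - 1, beta), -1))
corner = lambda N: np.array([[beta]])
L_N = adaptive_reverse_cholesky(section, corner, n, eps)
```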

3.3 A Model for Infinite-Dimensional Matrix Factorizations

We consider a model problem of a rational modification describing a single simple pole off \([-1,1]\):

$$\begin{aligned} v(x) = \alpha +2\beta x. \end{aligned}$$

If this modifies the (orthonormal) Chebyshev polynomials of the second-kind, then:

$$\begin{aligned} V = \begin{pmatrix} \alpha &{} \beta \\ \beta &{} \alpha &{} \ddots \\ {} &{} \ddots &{} \ddots \end{pmatrix}, \end{aligned}$$

and \(|\alpha |> 2|\beta |\). This symmetric tridiagonal matrix is also Toeplitz [24, § 4.7], which allows us to draw from the literature on the QL and Wiener–Hopf factorizations of Toeplitz matrices.
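
As a quick check of this structure, the model matrix is nothing but \(v(X_P) = \alpha I + 2\beta X_P\), where \(X_P\) is the Jacobi matrix of the orthonormal second-kind Chebyshev polynomials (zero diagonal, off-diagonal entries \(\tfrac{1}{2}\)); a minimal sketch:

```python
import numpy as np

# Jacobi matrix (principal N x N section) of the orthonormal Chebyshev polynomials
# of the second kind: zero diagonal, off-diagonal entries 1/2.
def chebyshevU_jacobi(N):
    return np.diag(np.full(N - 1, 0.5), 1) + np.diag(np.full(N - 1, 0.5), -1)

alpha, beta, N = 3.0, 1.0, 6      # any alpha > 2|beta| keeps v positive on [-1, 1]
V = alpha * np.eye(N) + 2 * beta * chebyshevU_jacobi(N)   # v(X_P) for v(x) = alpha + 2 beta x
print(V)                           # tridiagonal Toeplitz: alpha on the diagonal, beta off it
```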

3.3.1 QL Factorizations

Let \(G_i(c, s)\) denote the real Givens rotation matrix in the \(e_ie_{i+1}\)-plane embedded in the infinite-dimensional identity:

$$\begin{aligned} G_i(c, s) = \begin{pmatrix} I_{i-1}\\ {} &{} c &{} s\\ {} &{} -s &{} c\\ {} &{} &{} &{} I\end{pmatrix}, \end{aligned}$$

where \(s^2+c^2=1\).

Lemma 3.3

(Theorem 5.2.6 in [63]) If \(\alpha > 2|\beta |\), the QL factorization of V exists and is given by:

$$\begin{aligned} V = \prod _{i=1}^\infty G_i\left( c, s\right) \begin{pmatrix} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ \frac{2\beta ^2}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}} &{} 2\beta &{} \ddots \\ {} &{} \ddots &{} \ddots \end{pmatrix}, \end{aligned}$$

where:

$$\begin{aligned} s = \frac{2\beta }{\alpha +\sqrt{\alpha ^2-(2\beta )^2}},\quad \textrm{and}\quad c = \sqrt{\frac{2\sqrt{\alpha ^2-(2\beta )^2}}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}}, \end{aligned}$$

and the Givens rotations are applied to the lower triangular matrix in the order determined by their index; that is, the infinite product iteratively prepends Givens rotations on the left.

Proof

We parameterize some of the entries of L by s and c. We apply \(G_1\) to L:

$$\begin{aligned}&G_1\left( c, s\right) \begin{pmatrix} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ s\beta &{} 2\beta &{} \ddots \\ {} &{} \ddots &{} \ddots \end{pmatrix}\\&\quad = \begin{pmatrix} \alpha &{} \beta \\ \beta \sqrt{\frac{2\sqrt{\alpha ^2-(2\beta )^2}}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}} &{} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ s\beta &{} 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ {} &{} \ddots &{} \ddots &{} \ddots \end{pmatrix},\\&\quad = \begin{pmatrix} \alpha &{} \beta \\ c\beta &{} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ s\beta &{} 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ {} &{} \ddots &{} \ddots &{} \ddots \end{pmatrix}. \end{aligned}$$

Then, applying \(G_2\) to the result, we find:

$$\begin{aligned}&G_2\left( c, s\right) \begin{pmatrix} \alpha &{} \beta \\ c\beta &{} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ s\beta &{} 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ {} &{} \ddots &{} \ddots &{} \ddots \end{pmatrix}\\&\quad = \begin{pmatrix} \alpha &{} \beta \\ \beta &{} \alpha &{} \beta \\ 0 &{} c\beta &{} \sqrt{\frac{(\alpha +\sqrt{\alpha ^2-(2\beta )^2})\sqrt{\alpha ^2-(2\beta )^2}}{2}}\\ &{} s\beta &{} 2\beta &{} \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\\ {} &{} &{} \ddots &{} \ddots &{} \ddots \end{pmatrix}. \end{aligned}$$

We have now shown that the first two rows of \(G_2G_1L\) are equal to the first two rows of V, and since the \(2:\infty \times 2:\infty \) section of \(G_2G_1L\) is equal to \(G_1L\), the full factorization follows by induction. \(\square \)

If \(\alpha = 2|\beta |\), then the infinite product of Givens rotations with \(|s|=1\) and \(c=0\) is an isometry but it is not orthogonal as every entry in the first column is zero.

Next, we compare this result to the computable factorization \(V_N = Q_NL_N\).

Lemma 3.4

If \(\alpha>2\beta >0\), the QL factorization of \(V_N\) is given by:

$$\begin{aligned} V_N = \prod _{i=1}^{N-1}G_i(c_i, s_i) L_N, \end{aligned}$$

where:

$$\begin{aligned} s_{N-k} = \sqrt{\frac{\sum _{j=1}^k a_j^2}{\sum _{j=1}^{k+1}a_j^2}}, \quad \textrm{and}\quad c_{N-k} = \sqrt{\frac{a_{k+1}^2}{\sum _{j=1}^{k+1}a_j^2}}, \end{aligned}$$
(3.1)

where:

$$\begin{aligned} a_j = \frac{\beta ^2}{\sqrt{\alpha ^2-(2\beta )^2}}\left[ \left( \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2\beta }\right) ^j - \left( \frac{2\beta }{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}\right) ^j\right] , \end{aligned}$$
(3.2)

and where:

$$\begin{aligned} (L_N)_{1,1}&= c_1\alpha -s_1c_2\beta , \end{aligned}$$
(3.3)
$$\begin{aligned} (L_N)_{k,k}&= \sqrt{(c_k\alpha -s_kc_{k+1}\beta )^2+\beta ^2},\quad k>1,\end{aligned}$$
(3.4)
$$\begin{aligned} (L_N)_{k+1,k}&= s_k\alpha +c_kc_{k+1}\beta ,\end{aligned}$$
(3.5)
$$\begin{aligned} (L_N)_{k+2,k}&= s_{k+1}\beta . \end{aligned}$$
(3.6)

Proof

As \(V_N\) is finite-dimensional, we may begin in the bottom right corner. As \(V_N\) is tridiagonal, we focus on the six entries that change for each Givens rotation. Note that:

$$\begin{aligned} s_N = 0,\quad \textrm{and}\quad c_N = 1. \end{aligned}$$

Then the next sine and cosine are determined by \(G_{N-1}(c_{N-1}, s_{N-1})^\top V_N\) introducing a zero in the \(N-1,N\) position:

$$\begin{aligned} \begin{pmatrix} c_{N-1} &{} -s_{N-1}\\ s_{N-1} &{} c_{N-1}\end{pmatrix}\begin{pmatrix} \beta &{} \alpha &{} \beta \\ 0 &{} \beta &{} \alpha \end{pmatrix}&= \begin{pmatrix} c_{N-1}\beta &{} c_{N-1}\alpha -s_{N-1}\beta &{} 0\\ s_{N-1}\beta &{} s_{N-1}\alpha +c_{N-1}\beta &{} \sqrt{\alpha ^2+\beta ^2}\end{pmatrix},\\&= \begin{pmatrix} c_{N-1}\beta &{} c_{N-1}\alpha -s_{N-1}c_N\beta &{} 0\\ s_{N-1}\beta &{} s_{N-1}\alpha +c_{N-1}c_N\beta &{} \sqrt{\alpha ^2+\beta ^2}\end{pmatrix}, \end{aligned}$$

or:

$$\begin{aligned} s_{N-1} = \frac{\beta }{\sqrt{\alpha ^2+\beta ^2}},\quad \textrm{and}\quad c_{N-1} = \frac{\alpha }{\sqrt{\alpha ^2+\beta ^2}}. \end{aligned}$$

Introducing another zero in the \(N-2,N-1\) position involves the computation \(G_{N-2}(c_{N-2}, s_{N-2})^\top G_{N-1}(c_{N-1}, s_{N-1})^\top V_N\):

$$\begin{aligned}&\begin{pmatrix} c_{N-2} &{} -s_{N-2}\\ s_{N-2} &{} c_{N-2}\end{pmatrix}\begin{pmatrix} \beta &{} \alpha &{} \beta \\ 0 &{} c_{N-1}\beta &{} c_{N-1}\alpha -s_{N-1}c_N\beta \end{pmatrix}\\&\quad = \begin{pmatrix} c_{N-2}\beta &{} c_{N-2}\alpha -s_{N-2}c_{N-1}\beta &{} 0\\ s_{N-2}\beta &{} s_{N-2}\alpha +c_{N-2}c_{N-1}\beta &{} \sqrt{(c_{N-1}\alpha -s_{N-1}c_N\beta )^2+\beta ^2}\end{pmatrix}, \end{aligned}$$

or:

$$\begin{aligned} s_{N-2}&= \frac{\beta }{\sqrt{(c_{N-1}\alpha -s_{N-1}c_N\beta )^2+\beta ^2}},\quad \textrm{and}\\ c_{N-2}&= \frac{c_{N-1}\alpha -s_{N-1}c_N\beta }{\sqrt{(c_{N-1}\alpha -s_{N-1}c_N\beta )^2+\beta ^2}}. \end{aligned}$$

After k steps, \(G_{N-k}(c_{N-k}, s_{N-k})^\top \cdots G_{N-1}(c_{N-1}, s_{N-1})^\top V_N\), the sine and cosine are determined by two nonlinear recurrence relations:

$$\begin{aligned} s_{N-k}^2&= \frac{\beta ^2}{(c_{N-k+1}\alpha -s_{N-k+1}c_{N-k+2}\beta )^2+\beta ^2},\quad \textrm{and} \end{aligned}$$
(3.7)
$$\begin{aligned} c_{N-k}^2&= \frac{(c_{N-k+1}\alpha -s_{N-k+1}c_{N-k+2}\beta )^2}{(c_{N-k+1}\alpha -s_{N-k+1}c_{N-k+2}\beta )^2+\beta ^2}. \end{aligned}$$
(3.8)

Substituting Eqs. (3.1) into Eq. (3.7), we find:

$$\begin{aligned} \beta ^2\sum _{j=1}^{k+1}a_j^2&= \left[ (c_{N-k+1}\alpha -s_{N-k+1}c_{N-k+2}\beta )^2+\beta ^2\right] \sum _{j=1}^ka_j^2,\\ \beta ^2a_{k+1}^2&= (c_{N-k+1}\alpha -s_{N-k+1}c_{N-k+2}\beta )^2\sum _{j=1}^ka_j^2,\\&= (a_k\alpha -a_{k-1}\beta )^2. \end{aligned}$$

Taking the positive square root on both sides, the linear recurrence relation:

$$\begin{aligned} a_{k+1}\beta = a_k\alpha -a_{k-1}\beta ,\quad a_1 = \beta ,\quad a_2 = \alpha , \end{aligned}$$

is solved by Eq. (3.2). A careful bookkeeping of the above procedure provides the entries of \(L_N\). \(\square \)
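
The closed forms in Lemma 3.4 are easily checked numerically; the following sketch (helper and variable names are ours) assembles \(s_k\), \(c_k\) and \(L_N\) from Eqs. (3.1)–(3.6) and compares against a directly computed QL factorization of \(V_N\).

```python
import numpy as np

alpha, beta, N = 3.0, 1.0, 8
root = np.sqrt(alpha**2 - (2 * beta)**2)
rho = (alpha + root) / (2 * beta)

# Eq. (3.2): a_1, ..., a_{N+1} (stored 0-based).
a = np.array([beta**2 / root * (rho**j - rho**(-j)) for j in range(1, N + 2)])

# Eq. (3.1): s_{N-k}, c_{N-k}; by convention s_N = 0 and c_N = 1.
s, c = np.zeros(N + 2), np.zeros(N + 2)
s[N], c[N], c[N + 1] = 0.0, 1.0, 1.0     # c_{N+1} is only ever multiplied by s_N = 0
for k in range(1, N):
    s[N - k] = np.sqrt(np.sum(a[:k]**2) / np.sum(a[:k + 1]**2))
    c[N - k] = np.sqrt(a[k]**2 / np.sum(a[:k + 1]**2))

# Eqs. (3.3)-(3.6): assemble L_N.
L_formula = np.zeros((N, N))
for k in range(1, N + 1):
    t = c[k] * alpha - s[k] * c[k + 1] * beta
    L_formula[k - 1, k - 1] = t if k == 1 else np.sqrt(t**2 + beta**2)
    if k < N:
        L_formula[k, k - 1] = s[k] * alpha + c[k] * c[k + 1] * beta
    if k < N - 1:
        L_formula[k + 1, k - 1] = s[k + 1] * beta

# Direct QL factorization of V_N for comparison (QR of the doubly-flipped matrix).
V_N = (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
       + np.diag(np.full(N - 1, beta), -1))
Qf, Rf = np.linalg.qr(V_N[::-1, ::-1])
L_direct = Rf[::-1, ::-1]
L_direct *= np.where(np.diag(L_direct) < 0, -1.0, 1.0)[:, None]
print(np.max(np.abs(L_direct - L_formula)))   # agreement to rounding error
```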

Theorem 3.5

Let \(\alpha>2\beta >0\) and let:

$$\begin{aligned} \rho = \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2\beta }>1. \end{aligned}$$

If:

$$\begin{aligned} N > n + \log _\rho \left( \frac{\rho -\rho ^{-1}}{2\epsilon } + \sqrt{\left( \frac{\rho -\rho ^{-1}}{2\epsilon }\right) ^2+1}\right) -1, \end{aligned}$$

then:

$$\begin{aligned} \left\| P_nL - P_n\begin{pmatrix} L_N &{} 0\\ 0 &{} 0\end{pmatrix}\right\| _2 < \epsilon \Vert L\Vert _2. \end{aligned}$$

Proof

We wish to invoke Lemma 3.1 with the Givens rotations from Lemma 3.4. Then:

$$\begin{aligned} P_nQ_N^\top V_b = P_n G_1^\top \cdots G_{N-1}^\top \beta e_N. \end{aligned}$$

As \(P_n\) commutes with \(G_i\) for \(i<n\):

$$\begin{aligned} \Vert P_nQ_N^\top V_b\Vert _2&= \Vert G_1^\top \cdots G_{n-1}^\top P_n G_n^\top \cdots G_{N-1}^\top \beta e_N \Vert _2,\\&= \Vert P_n G_n^\top \cdots G_{N-1}^\top \beta e_N \Vert _2,\\&= \Vert V_b\Vert _2 \prod _{k=1}^{N-n} |s_{N-k}|. \end{aligned}$$

Then, by Lemma 3.4:

$$\begin{aligned} \prod _{k=1}^{N-n} |s_{N-k}|&= \prod _{k=1}^{N-n} \sqrt{\frac{\sum _{j=1}^k a_j^2}{\sum _{j=1}^{k+1}a_j^2}},\\&= \sqrt{\frac{a_1^2}{\sum _{j=1}^{N-n+1}a_j^2}} < \sqrt{\frac{a_1^2}{a_{N-n+1}^2}},\\&= \frac{a_1}{a_{N-n+1}} = \frac{\rho -\rho ^{-1}}{\rho ^{N-n+1}-\rho ^{n-N-1}} = \frac{\rho -\rho ^{-1}}{2\sinh [(N-n+1)\log \rho ]}. \end{aligned}$$

By our lower bound on N, we guarantee that \(\Vert P_nQ_N^\top V_b\Vert _2 < \epsilon \Vert V_b\Vert _2\), and it follows from Lemma 3.1 and the subsequent discussion that the first \(n\times n\) principal section of \(L_N\) is 2-normwise close to the same principal section of L. \(\square \)

Remark 3.6

Consider the Bernstein ellipse:

$$\begin{aligned} {{{\mathcal {E}}}}_\rho = \left\{ z\in {\mathbb {C}}: z = \frac{\rho e^{\textrm{i}\theta } + \rho ^{-1} e^{-\textrm{i}\theta }}{2},\quad \theta \in [0,2\pi ),\quad \rho \ge 1\right\} . \end{aligned}$$

Then the real pole of r(x), or the root of v(x), is located at \(-\frac{\alpha }{2\beta }\), corresponding to the Bernstein ellipse parameter \(\rho = \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2\beta }\). It is no surprise that the rate of exponential convergence of the finite-dimensional sines to their asymptotic limits is precisely equal to the logarithm of the Bernstein ellipse parameter. But we feel the physical connection is important: the closer the pole, the slower the convergence of the finite-dimensional QL factorization to its infinite-dimensional counterpart.

3.3.2 Reverse Cholesky Factorizations

Lemma 3.7

If \(\alpha > 2|\beta |\), the reverse Cholesky factorization of V is given by:

$$\begin{aligned} V = \begin{pmatrix} l_d &{} l_o\\ {} &{} l_d &{} \ddots \\ {} &{} &{} \ddots \end{pmatrix}\begin{pmatrix} l_d\\ l_o &{} l_d\\ {} &{} \ddots &{} \ddots \end{pmatrix}, \end{aligned}$$

where:

$$\begin{aligned} l_d = \sqrt{\frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}},\quad \textrm{and}\quad l_o = \beta \sqrt{\frac{2}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}}. \end{aligned}$$

Proof

For the main diagonal entries, we find:

$$\begin{aligned} \alpha = l_d^2+l_o^2 = \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2} + \frac{2\beta ^2}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}, \end{aligned}$$

which is true. For the sub- and super-diagonal entries, it is also true that:

$$\begin{aligned} \beta = l_dl_o. \end{aligned}$$

\(\square \)

This lemma is an instance of the discrete Wiener–Hopf factorization [4, 6], which describes a UL factorization of a Toeplitz operator into Toeplitz factors, where all the work revolves around factoring the Toeplitz symbol.
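
A short numerical check of Lemma 3.7 (a minimal sketch, with variable names ours): the symbol-derived entries \(l_d\) and \(l_o\) reproduce the diagonal and sub-diagonal of V, and the entries of the finite-dimensional reverse Cholesky factor approach them away from the bottom-right corner, anticipating Lemma 3.8 and Theorem 3.9.

```python
import numpy as np

alpha, beta = 3.0, 1.0
root = np.sqrt(alpha**2 - (2 * beta)**2)
l_d = np.sqrt((alpha + root) / 2)
l_o = beta * np.sqrt(2 / (alpha + root))
print(l_d**2 + l_o**2 - alpha, l_d * l_o - beta)   # both zero to rounding error

# The finite-dimensional reverse Cholesky factor approaches (l_d, l_o) away from
# the bottom-right corner of V_N (cf. Lemma 3.8 and Theorem 3.9).
N = 40
V_N = (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
       + np.diag(np.full(N - 1, beta), -1))
C = np.linalg.cholesky(V_N[::-1, ::-1])   # reverse Cholesky via the flipped matrix
L_N = C.T[::-1, ::-1]
print(L_N[0, 0] - l_d, L_N[1, 0] - l_o)   # exponentially small in N
```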

As we may expect, the finite-dimensional reverse Cholesky factorization of a Toeplitz matrix is more involved.

Lemma 3.8

If \(\alpha >2|\beta |\), the reverse Cholesky factorization of \(V_N\) is given by:

$$\begin{aligned} V_N = L_N^\top L_N, \end{aligned}$$

where:

$$\begin{aligned} (L_N)_{N-k,N-k} = \sqrt{d_k},\quad \textrm{and}\quad (L_N)_{N-k,N-k-1} = \frac{\beta }{\sqrt{d_k}}, \end{aligned}$$
(3.9)

and where:

$$\begin{aligned} d_k = \frac{b_{k+1}}{b_k}, \end{aligned}$$
(3.10)

where:

$$\begin{aligned} b_k = \frac{1}{\sqrt{\alpha ^2-(2\beta )^2}}\left[ \left( \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2}\right) ^{k+1} - \left( \frac{2\beta ^2}{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}\right) ^{k+1}\right] . \end{aligned}$$
(3.11)

Proof

By considering the product \(L_N^\top L_N\), we recover the off-diagonal relationship in Eq. (3.9) and the nonlinear recurrence for the main diagonal:

$$\begin{aligned} (L_N)_{N-k-1,N-k-1}^2 + \frac{\beta ^2}{(L_N)_{N-k,N-k}^2} = \alpha ,\quad \textrm{and}\quad (L_N)_{N-k,N-k-1} = \frac{\beta }{(L_N)_{N-k,N-k}}. \end{aligned}$$

The recurrence relation is linearized by the definitions of \(d_k\) and \(b_k\), resulting in:

$$\begin{aligned} b_{k+2} -\alpha b_{k+1}+\beta ^2b_k = 0,\quad b_0 = 1,\quad b_1 = \alpha . \end{aligned}$$

This linear recurrence relation is solved by Eq. (3.11). \(\square \)
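
As with Lemma 3.4, the formulas of Lemma 3.8 may be verified numerically against a directly computed reverse Cholesky factorization; a minimal sketch (helper and variable names are ours):

```python
import numpy as np

alpha, beta, N = 3.0, 1.0, 12
root = np.sqrt(alpha**2 - (2 * beta)**2)

def b(k):   # Eq. (3.11)
    return (((alpha + root) / 2)**(k + 1) - (2 * beta**2 / (alpha + root))**(k + 1)) / root

d = np.array([b(k + 1) / b(k) for k in range(N)])   # Eq. (3.10)

# Eq. (3.9): diagonal and sub-diagonal of L_N.
L_formula = np.zeros((N, N))
for k in range(N):
    L_formula[N - 1 - k, N - 1 - k] = np.sqrt(d[k])
    if k < N - 1:
        L_formula[N - 1 - k, N - 2 - k] = beta / np.sqrt(d[k])

# Direct reverse Cholesky factorization of V_N for comparison.
V_N = (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
       + np.diag(np.full(N - 1, beta), -1))
L_direct = np.linalg.cholesky(V_N[::-1, ::-1]).T[::-1, ::-1]
print(np.max(np.abs(L_direct - L_formula)))   # agreement to rounding error
```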

We could replicate a result analogous to Theorem 3.5, but the estimation of \(\Vert L_N^\top P_n L_N^{-\top } V_b\Vert _2\) required to invoke Lemma 3.2 is more technically involved. Instead, we will directly show that the \(n\times n\) principal sections of L and \(L_N\) are 2-normwise close, provided that N is sufficiently large.

Theorem 3.9

Let \(\alpha>2\beta >0\) and let:

$$\begin{aligned} \rho = \frac{\alpha +\sqrt{\alpha ^2-(2\beta )^2}}{2\beta }>1. \end{aligned}$$

If:

$$\begin{aligned} N > n + \log _\rho \left( \frac{2}{\epsilon }\sqrt{\frac{\beta }{\alpha +2\beta }}\right) -\frac{1}{2}, \end{aligned}$$

then:

$$\begin{aligned} \left\| P_nL - P_n\begin{pmatrix} L_N &{} 0\\ 0 &{} 0\end{pmatrix}\right\| _2 < \epsilon \Vert L\Vert _2. \end{aligned}$$

Proof

As a reverse Cholesky factor, it follows that:

$$\begin{aligned} \Vert L\Vert _2 = \sqrt{\Vert V\Vert _2} = \sqrt{\alpha +2\beta }. \end{aligned}$$

The largest singular value of a matrix is bounded above by the maximum over all the absolute row and column sums [47]. For our bidiagonal matrix, this means:

$$\begin{aligned} \left\| P_nL - P_n\begin{pmatrix} L_N &{} 0\\ 0 &{} 0\end{pmatrix}\right\| _2&\le \max \left\{ \max _{2\le k\le n} \left[ |L_{k,k} - (L_N)_{k,k}| + |L_{k,k-1} - (L_N)_{k,k-1}|\right] ,\right. \\&\quad \left. \max _{1\le k\le n} \left[ |L_{k,k} - (L_N)_{k,k}| + |L_{k+1,k} - (L_N)_{k+1,k}|\right] \right\} . \end{aligned}$$

Since \(\sqrt{d_k}\) is monotonically decreasing to \(l_d\), the absolute row sums are uniformly larger than the absolute column sums and for \(n>1\), the maximum absolute row sum is attained by the last row:

$$\begin{aligned} \left\| P_nL - P_n\begin{pmatrix} L_N &{} 0\\ 0 &{} 0\end{pmatrix}\right\| _2 \le |L_{n,n} - (L_N)_{n,n}| + |L_{n,n-1} - (L_N)_{n,n-1}|. \end{aligned}$$

On the main diagonal, we have:

$$\begin{aligned} |l_d - \sqrt{d_k}|&= \left| \sqrt{\beta \rho } - \sqrt{\beta \frac{\rho ^{k+2}-\rho ^{-k-2}}{\rho ^{k+1}-\rho ^{-k-1}}}\right| ,\\&= \sqrt{\beta }\left| \sqrt{\rho } - \sqrt{\frac{\rho ^{k+2} - \rho ^{-k}}{\rho ^{k+1}-\rho ^{-k-1}} + \frac{\rho ^{-k}-\rho ^{-k-2}}{\rho ^{k+1}-\rho ^{-k-1}}}\right| ,\\&= \sqrt{\beta }\left| \sqrt{\rho } - \sqrt{\rho + \rho ^{-1-2k}\frac{1-\rho ^{-2}}{1-\rho ^{-2k-2}}}\right| . \end{aligned}$$

Since \(|\sqrt{x} -\sqrt{x+y}| < \sqrt{y}\) for \(x,y>0\), it follows that:

$$\begin{aligned} |l_d - \sqrt{d_k}|< \sqrt{\beta }\sqrt{\rho ^{-1-2k}\frac{1-\rho ^{-2}}{1-\rho ^{-2k-2}}} < \sqrt{\beta }\rho ^{-k-\frac{1}{2}}. \end{aligned}$$

On the sub-diagonal, we have the reciprocal difference:

$$\begin{aligned} \left| l_o - \frac{\beta }{\sqrt{d_k}}\right|&= \left| \sqrt{\frac{\beta }{\rho }} - \frac{\beta }{\sqrt{d_k}}\right| ,\\&= \sqrt{\beta }\left| \frac{1}{\sqrt{\rho }} - \frac{1}{\sqrt{\rho + \rho ^{-1-2k}\frac{1-\rho ^{-2}}{1-\rho ^{-2k-2}}}}\right| . \end{aligned}$$

This time, we use \(|\frac{1}{\sqrt{x}} - \frac{1}{\sqrt{x+y}}|< \sqrt{\frac{y}{x(x+y)}}< \frac{\sqrt{y}}{x} < \sqrt{y}\) for \(x>1\) and \(y>0\) to find:

$$\begin{aligned} \left| l_o - \frac{\beta }{\sqrt{d_k}}\right|< \sqrt{\beta }\sqrt{\rho ^{-1-2k}\frac{1-\rho ^{-2}}{1-\rho ^{-2k-2}}} < \sqrt{\beta }\rho ^{-k-\frac{1}{2}}. \end{aligned}$$

Setting \(k=N-n\), the last row sum is bounded by \(2\sqrt{\beta }\,\rho ^{n-N-\frac{1}{2}}\), which is less than \(\epsilon \sqrt{\alpha +2\beta } = \epsilon \Vert L\Vert _2\) by the lower bound on N. \(\square \)

If \(\beta <0\), a similar analysis reveals the same exponential convergence of the finite reverse Cholesky factor to its infinite-dimensional analogue.
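
The bound of Theorem 3.9 is easily exercised numerically: choosing the smallest N exceeding the stated lower bound, the \(n\times n\) principal sections of L and \(L_N\) agree to the prescribed relative tolerance. A minimal sketch (variable names are ours):

```python
import numpy as np

alpha, beta, n, eps = 3.0, 1.0, 10, 1e-10
root = np.sqrt(alpha**2 - (2 * beta)**2)
rho = (alpha + root) / (2 * beta)
l_d, l_o = np.sqrt((alpha + root) / 2), beta * np.sqrt(2 / (alpha + root))

# Smallest integer N strictly exceeding the lower bound of Theorem 3.9.
bound = n + np.log((2 / eps) * np.sqrt(beta / (alpha + 2 * beta))) / np.log(rho) - 0.5
N = int(np.floor(bound)) + 1

V_N = (np.diag(np.full(N, alpha)) + np.diag(np.full(N - 1, beta), 1)
       + np.diag(np.full(N - 1, beta), -1))
L_N = np.linalg.cholesky(V_N[::-1, ::-1]).T[::-1, ::-1]     # reverse Cholesky of V_N

L_inf = np.diag(np.full(n, l_d)) + np.diag(np.full(n - 1, l_o), -1)   # P_n L P_n
err = np.linalg.norm(L_inf - L_N[:n, :n], 2)
print(N, err, eps * np.sqrt(alpha + 2 * beta))   # err < eps * ||L||_2
```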

4 Applications and Numerical Experiments

The free and open-source implementation of our algorithms is available in C at FastTransforms [50]. Our numerical experiments are conducted on an iMac Pro (Early 2018) with a 2.3 GHz Intel Xeon W-2191B with 128 GB 2.67 GHz DDR4 RAM. The library is compiled with the -Ofast -march=native optimization flags.

4.1 Rapid Modified Jacobi Synthesis and Analysis

Consider the rational Jacobi weight modification:

$$\begin{aligned} r(x;\gamma ) =\frac{x^2+(500\gamma )^2}{[(x-\frac{1}{2})^2+\gamma ^2]^2[(x+\frac{3}{4})^2+\gamma ^2]}. \end{aligned}$$

We expand the corresponding orthonormal polynomials, \(q_n^{(\alpha ,\beta )}(x;\gamma )\), in Jacobi polynomials with the same parameters. Then, by a Jacobi–Chebyshev transform [42], we further convert to first-kind Chebyshev polynomials, and rapidly synthesize the modified polynomials (and expansions) on a Chebyshev grid via the fast inverse discrete cosine transform (iDCT) [20]. This entire procedure requires:

$$\begin{aligned} \underbrace{{\mathcal {O}}\left( n + \log _{\rho }\left[ \frac{\rho -\rho ^{-1}}{2\epsilon }+\sqrt{\left( \frac{\rho -\rho ^{-1}}{2\epsilon }\right) ^2+1}\right] \right) }_{\mathrm{rational~modification~by~Theorem~3.5}} + \underbrace{{\mathcal {O}}(n\log n\log \epsilon ^{-1})}_{\mathrm{Jacobi-Chebyshev}} + \underbrace{{\mathcal {O}}(n\log n)}_\textrm{iDCT}, \end{aligned}$$

flops, where \(\rho = \rho (\gamma )\) is the smallest Bernstein ellipse parameter for the four distinct poles of \(r(x;\gamma )\). The left panel of Fig. 1 illustrates one such polynomial and Szegő’s corresponding asymptotic envelope [57, Theorem 12.1.4].

With the connection problem in hand, we may also modify the Jacobi matrices for the construction of modified Gaussian quadrature rules. The right panel of Fig. 1 illustrates how the 30-point rule is modified for \(10^{-4}<\gamma <10^2\); in particular, we note that some nodes converge toward the projection of the poles of \(r(x;\gamma )\) onto the real axis as \(\gamma \rightarrow 0\) and are dispersed by the appearance of roots of \(r(x;\gamma )\) near the origin.
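
The nodes and weights of such rules follow from the symmetric tridiagonal eigenvalue problem for a principal section of the modified Jacobi matrix, i.e. the Golub–Welsch algorithm. The sketch below illustrates only this generic last step; the modified Jacobi matrix \(X_Q\) itself comes from the connection problem and is not reproduced here, so we substitute the unmodified orthonormal Legendre Jacobi matrix as a stand-in (helper names are ours).

```python
import numpy as np

def golub_welsch(X_n, mu0):
    """Gaussian quadrature from an n x n symmetric tridiagonal Jacobi matrix.

    Nodes are the eigenvalues of X_n; weights are mu0 times the squared first
    components of the orthonormal eigenvectors, mu0 being the total mass of the
    (possibly modified) measure.
    """
    nodes, vectors = np.linalg.eigh(X_n)
    return nodes, mu0 * vectors[0, :]**2

# Stand-in: the unmodified orthonormal Legendre Jacobi matrix with mu0 = 2.
# For the modified rule, X_n would be a principal section of X_Q.
n = 30
off = np.array([k / np.sqrt(4.0 * k**2 - 1.0) for k in range(1, n)])
X_n = np.diag(off, 1) + np.diag(off, -1)
nodes, weights = golub_welsch(X_n, mu0=2.0)
print(nodes.min(), nodes.max(), weights.sum())   # nodes in (-1, 1); weights sum to 2
```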

Fig. 1 Left: synthesis of \(q_{500}^{(-0.25,-0.75)}(x;\gamma =0.0001)\) on a Chebyshev grid with the Szegő envelope. Right: the nodes of the 30-point modified Gaussian quadrature rule with \((\alpha ,\beta ) = (-0.25, -0.75)\)

Figure 2 illustrates the relative error and calculation times when working with degree-n modified Jacobi polynomials. The left panel shows that the method is numerically stable. Notably, in the right panel, the complexity of the factorization closely matches the result of Theorem 3.5 for the simple pole modifying second-kind Chebyshev polynomials. This complexity includes a relatively large overhead at small degrees, since \(N\gg n\) is required to accurately compute 2-normwise nearby approximations to finite sections of the orthogonal and lower-triangular factors of V. With the precomputation in hand, the execution times appear to produce relatively clean and predictable complexity plots.

Fig. 2 Conversion of a degree-n expansion in modified Jacobi polynomials \(q_n^{(-0.25,-0.75)}(x;\gamma =0.01)\) with standard normally distributed pseudorandom coefficients to Jacobi polynomials with the same parameters. Left: 2-norm and \(\infty \)-norm relative error in the forward and backward transformation. Right: precomputation and execution time as well as a complexity estimate based on Theorem 3.5

4.2 Modified Laguerre Polynomials

As alluded to in Sect. 2.1, it would be impossible for a Laguerre polynomial series approximating \(r(x) = \frac{x}{x+\gamma }\) to also well-approximate the infinite matrix \(r(X_P)\). Using the rational representation directly, it is instead straightforward to consider the rationally modified Laguerre polynomials. Figure 3 illustrates the relative error and calculation times for transforming to and from these modified generalized Laguerre polynomials.

Notice that the relative error in forward-backward transformations appears to grow linearly with the truncation degree, which is markedly different from the asymptotically bounded relative error in the modified Jacobi polynomial transforms. We suspect this is due to the linear growth in the condition numbers of U and V, which nearly cancels in \(V^{-1}U\), and to the inherent instability of representing a rational function as the quotient of two polynomials, particularly on unbounded domains. We propose an avenue of future research that may address this issue in Sect. 5.

For generalized Laguerre series, the analogue of the Bernstein ellipse for Jacobi series is a parabola coaxial with the x-axis, opening to the right, with focus at the origin and vertex chosen to maximize the region of analyticity [56]. For the simple pole of r(x) at \(x=-\gamma \), the corresponding parabola is \(y^2 = 4\gamma (x+\gamma )\), and generalized Laguerre series converge root-exponentially at the rate \({\mathcal {O}}(e^{-2\sqrt{\gamma n}})\) as \(n\rightarrow \infty \). From this rate, we include in Fig. 3 a graph estimating the complexity of the precomputation based on this region of analyticity.

Fig. 3 Conversion of a degree-n expansion in modified generalized Laguerre polynomials \(q_n^{(0.25)}(x;\gamma =0.1)\) with standard normally distributed pseudorandom coefficients to generalized Laguerre polynomials with the same parameter. Left: 2-norm and \(\infty \)-norm relative error in the forward and backward transformation. Right: precomputation and execution time as well as a complexity estimate inspired by Theorem 3.5

4.3 An Orthonormal Basis for \(L^2({\mathbb {R}})\)

Consider the problem of finding an orthonormal basis, \(\textbf{F}(x) = (f_0(x) \quad f_1(x) \quad f_2(x) \quad \cdots )\), for \(L^2({\mathbb {R}})\) with algebraic decay:

$$\begin{aligned} \int _{-\infty }^\infty f_m(x)f_n(x)\,\textrm{d}x = \delta _{m,n},\quad |f_n(x)| = \Theta (x^{-\alpha }),\quad x\rightarrow \pm \infty , \end{aligned}$$

for some \(\frac{1}{2}<\alpha <\infty \). Llewellyn Smith and Luca [53] use the rational map \(\displaystyle x = \frac{t}{1-t^2}\) to transform the problem to one on \((-1,1)\), where orthonormality reads:

$$\begin{aligned} \int _{-1}^1 f_m(x(t)) f_n(x(t)) \frac{1+t^2}{(1-t^2)^2}\,\textrm{d}t = \delta _{m,n}. \end{aligned}$$

Define \(q_n(t)\) to be orthonormal polynomials in \(L^2([-1,1], (1+t^2)\textrm{d}t)\). Of course, they are connected to the normalized Legendre polynomials by \(\textbf{P}(t) = \textbf{Q}(t) R\) where R is the upper-triangular Cholesky factor of \(I+X_P^2\). It follows that:

$$\begin{aligned} f_n(x) = (1-t^2)q_n(t) = \frac{2}{\sqrt{1+4x^2}+1} q_n\left( \frac{2x}{\sqrt{1+4x^2}+1}\right) . \end{aligned}$$
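
Concretely, a minimal sketch (helper names are ours) of this construction computes a principal section of the Legendre Jacobi matrix \(X_P\), the upper-triangular Cholesky factor R of \(I+X_P^2\), and then evaluates \(f_n(x)\) from the formula above; a section only slightly larger than n suffices because the leading columns of R are unaffected by truncation.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def legendre_jacobi(m):
    """m x m Jacobi matrix of the orthonormal Legendre polynomials."""
    off = np.array([k / np.sqrt(4.0 * k**2 - 1.0) for k in range(1, m)])
    return np.diag(off, 1) + np.diag(off, -1)

def f_n(x, n):
    """Evaluate f_n(x) = (1 - t^2) q_n(t) with t = 2x / (sqrt(1 + 4x^2) + 1)."""
    m = n + 2                                        # leading columns of R are exact
    X = legendre_jacobi(m)
    R = cholesky(np.eye(m) + X @ X, lower=False)     # I + X_P^2 = R^T R, R upper
    coeffs = solve_triangular(R, np.eye(m)[:, n])    # n-th column of R^{-1}
    t = 2.0 * x / (np.sqrt(1.0 + 4.0 * x**2) + 1.0)
    # Orthonormal Legendre values p_0(t), ..., p_{m-1}(t) by the three-term recurrence.
    p = np.zeros(m)
    p[0] = np.sqrt(0.5)
    p[1] = t * p[0] / X[0, 1]
    for k in range(1, m - 1):
        p[k + 1] = (t * p[k] - X[k, k - 1] * p[k - 1]) / X[k, k + 1]
    return (1.0 - t**2) * (p @ coeffs)               # q_n(t) = P(t) R^{-1} e_n

print(f_n(0.3, 5))   # obeys the uniform pointwise bound derived below
```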

An important property of an orthonormal basis on \(L^2({\mathbb {R}})\) is that differentiation, \({\mathcal {D}}\), is a skew-adjoint linear operator. When evolving time-dependent partial differential equations numerically, if the discretization of this operator is also skew-adjoint and sparse for any principal finite section, then fast methods exist for its spectral decomposition [28], and the time evolution is stable.

We will now show that given the eigenvalue problem:

$$\begin{aligned} {\mathcal {D}}u(x) = \lambda u(x), \quad \lim _{x\rightarrow \pm \infty }u(x) = 0, \end{aligned}$$

there is a skew-definite and banded discretization in the orthonormal basis \(\textbf{F}(x)\):

$$\begin{aligned} {\mathcal {D}}{} \textbf{F}(x)U = \textbf{F}(x) D U&= \textbf{F}(x) U \Lambda ,\\ R^\top DRV&= R^\top RV\Lambda , \end{aligned}$$

and \(U = RV\).

By changing coordinates:

$$\begin{aligned} D = \int _{{\mathbb {R}}}\textbf{F}(x)^\top {\mathcal {D}}{} \textbf{F}(x)\,\textrm{d}x = \int _{-1}^1 (1-t^2)\textbf{Q}(t)^\top {\mathcal {D}}\left[ (1-t^2)\textbf{Q}(t)\right] \,\textrm{d}t, \end{aligned}$$

and the connection problem congruence transformation:

$$\begin{aligned} R^\top D R = \int _{-1}^1 (1-t^2)\textbf{P}(t)^\top {\mathcal {D}}\left[ (1-t^2)\textbf{P}(t)\right] \,\textrm{d}t, \end{aligned}$$

uncovers banded sparsity on the right-hand side, by orthogonality of Legendre polynomials. Legendre polynomials are an instance of Jacobi polynomials, and by classical connections we find two formulæ for the right-hand side that also show the skew symmetry:

$$\begin{aligned} R^\top D R = (I-X_P^2)(-D_P^{P'})^\top R_P^{P'} = (R_P^{P'})^\top D_P^{P'}(I-X_P^2). \end{aligned}$$

Another useful property for any basis is a uniform pointwise bound. We can show that:

$$\begin{aligned} |f_n(x)| < \sqrt{\frac{2}{\pi }}\frac{2^{\frac{7}{4}}}{(1+4x^2)^{\frac{3}{8}}},\quad \forall x\in {\mathbb {R}}. \end{aligned}$$

By using line (1) in Table 1, \((1+t^2)\textbf{Q}(t) = \textbf{P}(t)R^\top \), and noting that by symmetry R is banded with only two nontrivial bands:

$$\begin{aligned} |f_n(x)|&= |(1-t^2)q_n(t)|,\\&\le |(1-t^2)(1+t^2)q_n(t)|,\\&= |(1-t^2)(p_n(t)R_{n,n}+p_{n+2}(t)R_{n,n+2})|. \end{aligned}$$

Recall the sharpened Bernstein inequality [2]:

$$\begin{aligned} (1-x^2)^{\frac{1}{4}}|P_n(x)| < \sqrt{\frac{2}{\pi }}\frac{1}{\sqrt{n+\frac{1}{2}}}, \quad x\in [-1,1]. \end{aligned}$$

Then, since \(p_n(t)\) are the orthonormal Legendre polynomials:

$$\begin{aligned} |f_n(x)|&\le |(1-t^2)^{\frac{3}{4}}|\left[ |(1-t^2)^{\frac{1}{4}}p_n(t)||R_{n,n}| + |(1-t^2)^{\frac{1}{4}}p_{n+2}(t)||R_{n,n+2}|\right] ,\\&< |(1-t^2)^{\frac{3}{4}}|\sqrt{\frac{2}{\pi }}\left( |R_{n,n}| + |R_{n,n+2}|\right) . \end{aligned}$$

Now:

$$\begin{aligned} |R_{n,n}| + |R_{n,n+2}|&= \Vert R^\top e_n\Vert _1 \le \sqrt{2}\Vert R^\top e_n\Vert _2 \le \sqrt{2}\Vert R^\top \Vert _2,\\&= \sqrt{2}\sqrt{\Vert R^\top R\Vert _2} = 2,\quad \textrm{since}\quad \Vert R^\top R\Vert _2 = \sup _{t\in [-1,1]}|1+t^2| = 2. \end{aligned}$$

Finally, since \(1-t^2 = \frac{2}{\sqrt{1+4x^2}+1} \le \frac{2}{\sqrt{1+4x^2}}\), it follows that \(|(1-t^2)^{\frac{3}{4}}| \le \frac{2^{\frac{3}{4}}}{(1+4x^2)^{\frac{3}{8}}}\), and the claimed uniform bound follows.

4.4 Orthogonal Polynomials on an Annulus

Consider \(P_n^{t,(\alpha ,\beta ,\gamma )}(x)\) to be orthonormal polynomials in \(L^2([-1,1], (1-x)^\alpha (1+x)^\beta (t+x)^\gamma {\textrm{d}x})\), for parameter values \(\{t>1, \alpha ,\beta>-1, \gamma \in {\mathbb {R}}\}\cup \{t=1, \alpha ,\beta +\gamma >-1\}\). If \(\gamma \in {\mathbb {Z}}\), then \((t+x)^\gamma \) is either polynomial or rational and our algorithms to connect \(\textbf{P}^{t,(\alpha ,\beta ,\gamma )}(x)\) to the Jacobi polynomials \(\textbf{P}^{(\alpha ,\beta )}(x)\) are immediately applicable. For \(\gamma \in {\mathbb {R}}{\setminus }{\mathbb {Z}}\) and \(t>1\), \((t+x)^\gamma \) may be well-approximated by polynomials and rationals alike.

Real orthonormal annulus polynomials in:

$$\begin{aligned} L^2(\{(r,\theta ): \rho< r< 1, 0< \theta < 2\pi \}, r^{2\gamma +1}(r^2-\rho ^2)^\alpha (1-r^2)^\beta \,\textrm{d} r\,\textrm{d}\theta ), \end{aligned}$$

are:

$$\begin{aligned} Z_{\ell ,m}^{\rho ,(\alpha ,\beta ,\gamma )}(r,\theta ) =&\sqrt{2} \left( \frac{2}{1-\rho ^2}\right) ^{\frac{|m|+\alpha +\beta +\gamma +1}{2}} r^{|m|}P_{\frac{\ell -|m|}{2}}^{\frac{1+\rho ^2}{1-\rho ^2},(\beta ,\alpha ,|m|+\gamma )}\left( \frac{2r^2-1-\rho ^2}{1-\rho ^2}\right) \\&\times \sqrt{\frac{2-\delta _{m,0}}{2\pi }} \left\{ \begin{array}{ll} \cos (m\theta ) &{} \quad \textrm{for} \quad m \ge 0,\\ \sin (|m|\theta ) &{} \quad \textrm{for} \quad m < 0.\end{array}\right. \end{aligned}$$

For the purposes of synthesis and analysis on tensor-product grids on the annulus, we convert these polynomials to a direct sum of tensor-product bases through a sequence of orthogonal transformations, a generalization that has the same structure as Zernike polynomial transforms [42, 50]. These are computed by considering that decrementing the order m in steps of 2 requires the computation of line (2) in Table 1:

$$\begin{aligned} (t+x)\textbf{P}^{t,(\alpha ,\beta ,\gamma +m+2)}(x) = \textbf{P}^{t,(\alpha ,\beta ,\gamma +m)}(x)Q_{t,(\alpha ,\beta ,\gamma +m+2)}^{t,(\alpha ,\beta ,\gamma +m)}. \end{aligned}$$

Moreover, the similarity transformation between the Jacobi matrices for \(\textbf{P}^{t,(\alpha ,\beta ,\gamma +m)}(x)\) and \(\textbf{P}^{t,(\alpha ,\beta ,\gamma +m+2)}(x)\) requires the upper-triangular factor in line (2) in Table 1:

$$\begin{aligned} \textbf{P}^{t,(\alpha ,\beta ,\gamma +m)}(x) = \textbf{P}^{t,(\alpha ,\beta ,\gamma +m+2)}(x)R_{t,(\alpha ,\beta ,\gamma +m)}^{t,(\alpha ,\beta ,\gamma +m+2)}, \end{aligned}$$

enabling:

$$\begin{aligned} X_{t,(\alpha ,\beta ,\gamma +m+2)} = R_{t,(\alpha ,\beta ,\gamma +m)}^{t,(\alpha ,\beta ,\gamma +m+2)} X_{t,(\alpha ,\beta ,\gamma +m)} (R_{t,(\alpha ,\beta ,\gamma +m)}^{t,(\alpha ,\beta ,\gamma +m+2)})^{-1}. \end{aligned}$$

Both factors may be computed simultaneously from:

$$\begin{aligned} tI+X_{t,(\alpha ,\beta ,\gamma +m)} = Q_{t,(\alpha ,\beta ,\gamma +m+2)}^{t,(\alpha ,\beta ,\gamma +m)} R_{t,(\alpha ,\beta ,\gamma +m)}^{t,(\alpha ,\beta ,\gamma +m+2)}. \end{aligned}$$
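
A minimal sketch of this step (helper and variable names are ours): from a sufficiently large principal section of the Jacobi matrix, the QR factorization of \(tI+X\) yields both the orthogonal factor and the upper-triangular factor, and the similarity transformation produces the next Jacobi matrix; only the leading rows and columns, away from the truncation, are reliable. We illustrate with a stand-in Jacobi matrix, since \(X_{t,(\alpha ,\beta ,\gamma +m)}\) is not written out here; for the spherical band below, the same step applies to the quadratic \((t_1I-X)(t_2I+X)\).

```python
import numpy as np
from scipy.linalg import solve_triangular

def increment_order(X, t):
    """Connection and Jacobi-matrix update from the QR factorization of tI + X."""
    N = X.shape[0]
    Q, R = np.linalg.qr(t * np.eye(N) + X)
    d = np.where(np.diag(R) < 0, -1.0, 1.0)          # normalize so that diag(R) > 0
    Q, R = Q * d, R * d[:, None]
    X_new = solve_triangular(R.T, (R @ X).T, lower=True).T   # R X R^{-1}
    return Q, R, X_new

# Illustration with a stand-in Jacobi matrix (orthonormal Chebyshev U) and t = 2;
# only the leading rows/columns, away from the truncation, are reliable.
N = 50
X = np.diag(np.full(N - 1, 0.5), 1) + np.diag(np.full(N - 1, 0.5), -1)
Q, R, X_new = increment_order(X, t=2.0)
print(np.max(np.abs(np.triu(X_new[:10, :10], 2))))   # leading block is again tridiagonal
```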

4.5 Orthogonal Polynomials on a Spherical Band

For multi-indices \(\varvec{t}\) and \(\varvec{\alpha }\), consider \(P_n^{\varvec{t},(\varvec{\alpha })}(x)\) to be orthonormal polynomials in:

$$\begin{aligned} L^2([-1,1], (t_1-x)^{\alpha _1}(1-x)^{\alpha _2}(1+x)^{\alpha _3}(t_2+x)^{\alpha _4}{\textrm{d}x}). \end{aligned}$$

On a spherical band:

$$\begin{aligned} {\mathbb {S}}_{\varvec{\theta }}^2 = \{(\theta ,\varphi ): 0< \theta _1< \theta< \theta _2< \pi , 0< \varphi < 2\pi \}, \end{aligned}$$

with weight:

$$\begin{aligned} w^{\varvec{\theta },(\varvec{\alpha })}(\theta ) = (2\sin ^2\tfrac{\theta }{2})^{\alpha _1}(\cos \theta _1-\cos \theta )^{\alpha _2} (\cos \theta -\cos \theta _2)^{\alpha _3}(2\cos ^2\tfrac{\theta }{2})^{\alpha _4}, \end{aligned}$$

real orthonormal spherical band polynomials in \(L^2({\mathbb {S}}_{\varvec{\theta }}^2, w^{\varvec{\theta },(\varvec{\alpha })}(\theta )\sin \theta \,\textrm{d}\theta \,\textrm{d}\varphi )\) are:

$$\begin{aligned} Y_{\ell ,m}^{\varvec{\theta },(\varvec{\alpha })}(\theta ,\varphi ) =&\left( \frac{2}{\cos \theta _1-\cos \theta _2}\right) ^{\frac{|\varvec{\alpha }|+2|m|+1}{2}} \sin ^{|m|}\!\theta \\&\times P_{\ell -|m|}^{\varvec{t},(\varvec{\alpha }_{|m|})} \left( \frac{2\cos \theta -\cos \theta _1-\cos \theta _2}{\cos \theta _1-\cos \theta _2}\right) \\&\times \sqrt{\frac{2-\delta _{m,0}}{2\pi }} \left\{ \begin{array}{ll} \cos (m\varphi ) &{}\quad \text {for} \quad m \ge 0,\\ \sin (|m|\varphi ) &{}\quad \text {for} \quad m < 0,\end{array}\right. \end{aligned}$$

where:

$$\begin{aligned} \varvec{t} = (t_1, t_2)^\top = \left( \frac{2-\cos \theta _1-\cos \theta _2}{\cos \theta _1-\cos \theta _2}, \frac{2+\cos \theta _1+\cos \theta _2}{\cos \theta _1-\cos \theta _2}\right) ^\top , \end{aligned}$$

and \(\varvec{\alpha }_m = (\alpha _1+m, \alpha _2, \alpha _3, \alpha _4+m)^\top \) and \(|\varvec{\alpha }| = \alpha _1+\alpha _2+\alpha _3+\alpha _4\).

For the purposes of synthesis and analysis on tensor-product grids on the spherical band, we convert these polynomials to a direct sum of tensor-product bases through a sequence of orthogonal transformations, a generalization that has the same structure as spherical harmonic transforms [50, 51]. These are computed by considering that decrementing the order m in steps of 2 requires the computation of line (2) in Table 1:

$$\begin{aligned} (t_1-x)(t_2+x)\textbf{P}^{\varvec{t},(\varvec{\alpha }_{m+2})}(x) = \textbf{P}^{\varvec{t},(\varvec{\alpha }_m)}(x)Q_{\varvec{t},(\varvec{\alpha }_{m+2})}^{\varvec{t},(\varvec{\alpha }_m)}. \end{aligned}$$

Moreover, the similarity transformation between the Jacobi matrices for \(\textbf{P}^{\varvec{t},(\varvec{\alpha }_m)}(x)\) and \(\textbf{P}^{\varvec{t},(\varvec{\alpha }_{m+2})}(x)\) requires the upper-triangular factor in line (2) in Table 1:

$$\begin{aligned} \textbf{P}^{\varvec{t},(\varvec{\alpha }_m)}(x) = \textbf{P}^{\varvec{t},(\varvec{\alpha }_{m+2})}(x)R_{\varvec{t},(\varvec{\alpha }_m)}^{\varvec{t},(\varvec{\alpha }_{m+2})}, \end{aligned}$$

enabling:

$$\begin{aligned} X_{\varvec{t},(\varvec{\alpha }_{m+2})} = R_{\varvec{t},(\varvec{\alpha }_m)}^{\varvec{t},(\varvec{\alpha }_{m+2})} X_{\varvec{t},(\varvec{\alpha }_m)} (R_{\varvec{t},(\varvec{\alpha }_m)}^{\varvec{t},(\varvec{\alpha }_{m+2})})^{-1}. \end{aligned}$$

Both factors may be computed simultaneously from:

$$\begin{aligned} (t_1I-X_{\varvec{t},(\varvec{\alpha }_m)})(t_2I+X_{\varvec{t},(\varvec{\alpha }_m)}) = Q_{\varvec{t},(\varvec{\alpha }_{m+2})}^{\varvec{t},(\varvec{\alpha }_m)}R_{\varvec{t}, (\varvec{\alpha }_m)}^{\varvec{t},(\varvec{\alpha }_{m+2})}. \end{aligned}$$

Remark 4.1

The orthogonal structures of an annular sector and a spherical quadrangle follow naturally by replacing the Fourier modes with orthonormal polynomials on an arc.

5 Conclusions and Future Directions

A straightforward generalization of this work is to develop efficient algorithms in the multivariate setting. The Koornwinder construction [32] provides bivariate analogues of the classical orthogonal polynomials, and other multivariate orthogonal polynomials are known in greater than two dimensions [13]. Bivariate polynomial and rational measure modifications can extend these polynomials to new and more interesting measures, where polynomial modifications result in banded-block-banded matrices: block-banded matrices whose blocks are also banded. A challenge is to explore improving the complexity of banded-block-banded matrix factorizations. In higher dimensions, symmetric Jacobi matrices of Koornwinder-like constructions are tridiagonal-block-banded [33, 64, 65]; hence, the “bandwidths” of polynomial weight modifications grow with the degree, resulting in an \({\mathcal {O}}(n^4)\) Cholesky factorization with \({\mathcal {O}}(n^3)\) nonzero entries in two dimensions. These complexities stand in contrast to the linear complexities in the univariate setting.

Another logical extension is to consider the full semi-classical setting, where the weight function w(x) satisfies a first-order linear homogeneous differential equation with polynomial coefficients a(x) and b(x):

$$\begin{aligned} \mathcal {D}(aw) = bw. \end{aligned}$$

In this Pearson differential equation, cf. Eq. (2.4), the restrictions on the polynomial degrees are removed. Interesting connections between semi-classical orthogonal polynomials and the Painlevé transcendents [36] have been discovered. While this case may not exhibit a visible sparsity in the connection problem, such as bandedness, it would be worthwhile exploring methods to formulate the connection coefficients as solutions to matrix equations.

Compared with previous approaches to rational measure modifications [29, 49, 61], where the polynomials are factored as a product of degree-1 or 2 polynomials, our infinite-dimensional matrix factorizations leave u(x) and v(x) whole. This allows the more numerically stable use of orthogonal polynomial expansions of these variable coefficients, avoids the potentially unstable computation of their roots, and solves the connection problem in one step. Representing rational functions as a ratio of polynomials is no panacea. For example, the Newman rational [39] approximating |x| creates numerator and denominator polynomials with large dynamic ranges. Hence, the orthogonal polynomial expansion numerically introduces absolutely small negative parts into \(v(X_P)\) that lead to the nonexistence of the QL and reverse Cholesky factorizations. This example and others inducing Froissart doublets have led to the construction of an even more stable representation of rational functions that has been used to develop the so-called AAA algorithm [38]. In AAA, a rational function is represented as a ratio of rationals:

$$\begin{aligned} r(x) = \frac{\sum _{k=0}^n\frac{w_kf_k}{x-x_k}}{\sum _{k=0}^n\frac{w_k}{x-x_k}}. \end{aligned}$$

This definition is augmented by \(\lim _{x\rightarrow x_k}r(x) = f_k\), and the constants \(x_k\) and \(w_k\) are the parameters at one’s disposal to range over all type-(n, n) rational functions. In AAA, the support points \(x_k\) are greedily chosen from among a set of user-supplied sample points in the approximation domain. Consequently, both the numerator and denominator rationals have singularities on the approximation domain that cancel in the quotient but which pose a problem when (approximately) evaluating \(r(X_P)\). A partial fraction decomposition could reveal the poles as distinct from the support points, and the connection coefficients may be recovered by Cholesky factorization of a sum of invertible symmetric tridiagonal matrices [62, § 7.2.1].

A final avenue of future work is to investigate the structure of the connection coefficients with piecewise-defined polynomial and rational measure modifications. Relatedly, we note that \({\mathcal {O}}(n^2)\) algorithms exist [14] to construct Jacobi matrices for sums of weight functions. It remains to be seen whether or not these connection problems may be solved in reduced complexities for sums of weight functions. Here, we do not anticipate linear complexity; rather, we suspect quasi-optimal complexities based on hierarchical factorizations of the connection problem.