1 Introduction

We are interested in the compact representation and fast evaluation of a class of space- or time-varying linear integral operators with regular variations. Such operators appear in a large number of applications ranging from wireless communications [28, 37] to seismic data analysis [23], biology [22] and image processing [38].

In all these applications, a key numerical problem is to efficiently evaluate the action of the operator and its adjoint on given functions. This is necessary, for instance, to design fast inverse problem solvers. The main objective of this paper is to analyze the complexity of a set of approximation techniques coined product-convolution expansions.

We are interested in bounded linear integral operators \(H:L^2(\varOmega )\rightarrow L^2(\varOmega )\) defined from a kernel K by:

$$\begin{aligned} Hu(x) = \int _{\varOmega } K(x,y) u(y) \, \mathrm{d}y, \end{aligned}$$
(1)

for all \(u\in L^2(\varOmega )\), where \(\varOmega = {{\mathbb {R}}}/{{\mathbb {Z}}}\) is the one-dimensional torus. Extensions to bounded and higher-dimensional domains will be mentioned at the end of the paper. Evaluating integrals of type (1) is a major challenge in numerical analysis, and many methods have been developed in the literature. Nearly all methods share the same basic principle: decompose the operator kernel as a sum of low-rank matrices with a multi-scale structure. This is the case in panel clustering methods [24], hierarchical matrices [6], cross-approximations [35] or wavelet expansions [3, 13, 14]. The method proposed in this paper basically shares the same idea, except that the time-varying impulse response T of the operator is decomposed instead of the kernel K. The time-varying impulse response (TVIR) T of H is defined by:

$$\begin{aligned} T(x,y) = K(x+y,y), \quad \forall (x,y) \in \varOmega \times \varOmega . \end{aligned}$$
(2)

The TVIR representation of H allows formalizing the notion of regularly varying integral operator: The functions \(T(x,\cdot )\) should be “smooth” for all \(x\in \varOmega \). Intuitively, the smoothness assumption means that two neighboring impulse responses should only differ slightly. Under this assumption, it is tempting to approximate H locally by a convolution. Two different approaches have been proposed in the literature to achieve this. The first one is called convolution-product expansion of order m and consists of approximating H by an operator \(H_m\) of type:

$$\begin{aligned} H_m u = \sum _{k=1}^m w_k \odot (h_k \star u), \end{aligned}$$
(3)

where \(h_k\) and \(w_k\) are real-valued functions defined on \(\varOmega \), \(\odot \) denotes the standard multiplication for functions and the Hadamard product for vectors, and \(\star \) denotes the convolution operator. The second one, called product-convolution expansion of order m, is at the core of this paper and consists of using an expansion of type:

$$\begin{aligned} H_m u = \sum _{k=1}^m h_k \star (w_k \odot u). \end{aligned}$$
(4)

Function \(w_k\) is usually chosen as a windowing function localized in space, while \(h_k\) is a kernel describing the operator on the support of \(w_k\). These two types of approximations have been used for a long time in the field of imaging (and to a lesser extent mobile communications and biology) and progressively became more and more refined [1, 17, 21, 22, 27, 28, 33, 34, 40]. In particular, the recent work [17] provides a nice overview of existing choices for the functions \(h_k\) and \(w_k\) as well as new ideas leading to significant improvements. Many different names have been used in the literature to describe expansions of type (3) and (4) depending on the communities: sectional methods, overlap-add and overlap-save methods, piecewise convolutions, anisoplanatic convolutions, parallel product-convolution, filter flow, windowed-convolutions... The term product-convolution comes from the field of mathematics [7]. We believe that it precisely describes the set of expansions of type (4) and therefore chose this naming. It was already used in the field of imaging by [1]. Now that product-convolution expansions have been described, natural questions arise:

  (i) How to choose the functions \(h_k\) and \(w_k\)?

  (ii) What is the numerical complexity of evaluating products of type \(H_m u\)?

  (iii) What is the resulting approximation error \(\Vert H_m-H\Vert \), where \(\Vert \cdot \Vert \) is a norm over the space of operators?

  (iv) How many operations are needed in order to obtain an approximation \(H_m\) such that \(\Vert H_m-H\Vert \le \epsilon \)?

Elements (i) and (ii) have been studied thoroughly and improved over the years in the mentioned papers. The main questions addressed herein are points (iii) and (iv). To the best of our knowledge, they have been ignored until now. They are however necessary in order to evaluate the theoretical performance of different product-convolution expansions and to compare their respective advantages precisely.

The main outcome of this paper is the following: Under smoothness assumptions of type \(T(x,\cdot ) \in H^s(\varOmega )\) for all \(x\in \varOmega \) (the Hilbert space of functions in \(L^2(\varOmega )\) with s derivatives in \(L^2(\varOmega )\)), most methods proposed in the literature—if implemented correctly—ensure a decay of type \(\Vert H_m-H\Vert _{HS}= O(m^{-s})\), where \(\Vert \cdot \Vert _{HS}\) is the Hilbert–Schmidt norm. Moreover, this bound cannot be improved uniformly on the considered smoothness class. By adding a support condition of type \(\mathrm {supp}(T(x,\cdot )) \subseteq [-\kappa /2,\kappa /2]\), the bound becomes \(\Vert H_m-H\Vert _{HS}= O(\sqrt{\kappa } m^{-s})\). More importantly, bounded supports allow reducing the computational burden. After discretization on n time points, we show that the number of operations required to satisfy \(\Vert H_m-H\Vert _{HS}\le \epsilon \) varies from \(O\left( \kappa ^{\frac{1}{2s}}n\log _2(n)\epsilon ^{-1/s}\right) \) to \(O\left( \kappa ^{\frac{2s+1}{2s}} n\log _2(\kappa n)\epsilon ^{-1/s}\right) \) depending on the choices of \(w_k\) and \(h_k\). We also show that the compressed operator representations of Meyer [32] can be used under additional regularity assumptions.

An important difference of product-convolution expansions compared to most methods in the literature [3, 6, 20, 24, 35] is that they are insensitive to the smoothness of \(T(\cdot ,y)\). The smoothness in the x direction is a useful property to control the discretization error, but not the approximation rate. The proposed methodology might therefore be particularly competitive in applications with irregular impulse responses.

The paper is organized as follows. In Sect. 2, we describe the notation and introduce a few standard results of approximation theory. In Sect. 3, we precisely describe the class of operators studied in this paper, show how to discretize them and provide the numerical complexity of evaluating product-convolution expansions of type (4). Sections 4 and 5 contain the full approximation analysis for two different kinds of approaches, called linear and adaptive methods, respectively. Section 6 contains a summary and a few additional comments.

2 Notation

Let a and b denote functions depending on some parameters. The relationship \(a\asymp b\) means that a and b are equivalent, i.e., that there exist constants \(0<c_1\le c_2\) such that \(c_1 a \le b \le c_2 a\). Constants appearing in inequalities will be denoted by C and may vary at each occurrence. If a dependence on a parameter exists (e.g., \(\epsilon \)), we will use the notation \(C(\epsilon )\).

In most of the paper, we work on the unit circle \(\varOmega = {{\mathbb {R}}}/{{\mathbb {Z}}}\), sometimes identified with the interval \(\left[ -\frac{1}{2},\frac{1}{2}\right] \). This choice is driven by simplicity of exposition, and the results can be extended to bounded domains such as \(\varOmega =[0,1]^d\) (see Sect. 6.2). Let \(L^2(\varOmega )\) denote the space of square integrable functions on \(\varOmega \). The Sobolev space \(H^s(\varOmega )\) is defined as the set of functions in \(L^2(\varOmega )\) with weak derivatives up to order s in \(L^2(\varOmega )\). The k-th weak derivative of \(u\in H^s(\varOmega )\) is denoted \(u^{(k)}\). The norm and semi-norm of \(u\in H^s(\varOmega )\) are defined by:

$$\begin{aligned} \Vert u\Vert _{H^s(\varOmega )} = \sum _{k=0}^s \Vert u^{(k)}\Vert _{L^2(\varOmega )} \quad \text {and} \quad |u|_{H^s(\varOmega )} = \Vert u^{(s)}\Vert _{L^2(\varOmega )}. \end{aligned}$$
(5)

The sequence of functions \((e_k)_{k \in {{\mathbb {Z}}}}\) where \(e_k:x \mapsto \exp (-2i\pi k x)\) is an orthonormal basis of \(L^2(\varOmega )\) (see, e.g., [29]).

Definition 1

Let \(u\in L^2(\varOmega )\) and \(e_k:x \mapsto \exp (-2i\pi k x)\) denote the k-th Fourier atom. The Fourier series coefficients \({\hat{u}}[k]\) of u are defined for all \(k\in {{\mathbb {Z}}}\) by:

$$\begin{aligned} {\hat{u}}[k] = \int _{\varOmega } u(x) e_k(x)\,\mathrm{d}x. \end{aligned}$$
(6)

The space \(H^s(\varOmega )\) can be characterized through Fourier series.

Lemma 1

(Fourier characterization of Sobolev norms)

$$\begin{aligned} \Vert u\Vert ^2_{H^s(\varOmega )} \asymp \sum _{k \in {{\mathbb {Z}}}} |\hat{u}[k]|^2 (1 + |k|^2)^s. \end{aligned}$$
(7)

Definition 2

(B-spline of order \(\alpha \)) Let \(\alpha \in {{\mathbb {N}}}\) and \(m\ge \alpha +2\) be two integers. The B-spline of order 0 is defined by

$$\begin{aligned} B_{0,m} = \mathbbm {1}_{[-1/(2m),1/(2m)]}. \end{aligned}$$
(8)

The B-spline of order \(\alpha \in {{\mathbb {N}}}^*\) is defined recursively by:

$$\begin{aligned} B_{\alpha ,m} = m B_{0,m} \star B_{\alpha -1,m} = m^{\alpha } \underbrace{B_{0,m} \star \ldots \star B_{0,m}}_{\alpha +1 \text { factors}}. \end{aligned}$$
(9)

The set of cardinal B-splines of order \(\alpha \) is denoted \({\mathcal {B}}_{\alpha ,m}\) and defined by:

$$\begin{aligned} \begin{aligned} {\mathcal {B}}_{\alpha ,m} = \Bigg \{ f(\cdot )&= \sum _{k=0}^{m-1} c_k B_{\alpha ,m}(\cdot - k/m), \\&c_k\in {{\mathbb {R}}}, \ 0\le k \le m-1 \Bigg \}. \end{aligned} \end{aligned}$$
(10)
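The recursion (9) is easy to check numerically. The following minimal sketch (in Python with NumPy; the function name and the grid are our choices) samples \(B_{\alpha ,m}\) on a regular grid of the torus and discretizes each convolution in (9) with an FFT.

```python
import numpy as np

def bspline(alpha, m, n=1024):
    """Sample B_{alpha,m} on an n-point grid of the torus via Eq. (9)."""
    # Torus grid with x[0] = 0, so that circular convolutions of functions
    # centered at 0 stay centered at 0.
    x = np.arange(n) / n
    x[x >= 0.5] -= 1.0
    b = b0 = (np.abs(x) <= 1 / (2 * m)).astype(float)   # B_{0,m}, Eq. (8)
    for _ in range(alpha):
        # One step of the recursion (9); the integral is discretized as a
        # circular convolution with step 1/n.
        b = m / n * np.real(np.fft.ifft(np.fft.fft(b0) * np.fft.fft(b)))
    return x, b

x, b1 = bspline(alpha=1, m=8)   # hat function of width 2/8, peak value ~1
```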

In this work, we use the Daubechies wavelet basis for \(L^2({{\mathbb {R}}})\) [15]. We let \(\phi \) and \(\psi \) denote the scaling and mother wavelets and assume that the mother wavelet \(\psi \) has \(\alpha \) vanishing moments, i.e.,

$$\begin{aligned} \forall \, 0 \le p < \alpha , \quad \int _{{{\mathbb {R}}}} t^p \psi (t) \, \mathrm{d}t = 0. \end{aligned}$$
(11)

Daubechies wavelets satisfy \(\mathop {\mathrm {supp}}(\psi )=[-\alpha +1,\alpha ]\), see [31, Theorem 7.9, p. 294]. Translated and dilated versions of the wavelets are defined, for all \(j > 0\), by

$$\begin{aligned} \psi _{j,l}(x) = 2^{j/2} \psi \left( 2^{j} x - l \right) . \end{aligned}$$
(12)

The set of functions \((\psi _{j,l})_{j\in {{\mathbb {N}}}, l\in {{\mathbb {Z}}}}\) is an orthonormal basis of \(L^2({{\mathbb {R}}})\) with the convention \(\psi _{0,l}(x)=\phi (x-l)\). There are different ways to construct a wavelet basis on the interval \([-1/2,1/2]\) from a wavelet basis of \(L^2({{\mathbb {R}}})\). Here, we use the boundary wavelets defined in [12]. We refer to [16, 31] for more details on the construction of wavelet bases. This yields an orthonormal basis \((\psi _\lambda )_{\lambda \in \varLambda }\) of \(L^2(\varOmega )\), where

$$\begin{aligned} \varLambda = \left\{ (j,l), \ j \in {{\mathbb {N}}}, \ 0 \le l < 2^j \right\} . \end{aligned}$$
(13)

We let \(I_\lambda = \mathop {\mathrm {supp}}(\psi _\lambda )\) and for \(\lambda \in \varLambda \), we use the notation \(|\lambda |=j\).

Let u and v be two functions in \(L^2(\varOmega )\). The notation \(u\otimes v\) will be used both to denote the function \(w\in L^2(\varOmega \times \varOmega )\) defined by

$$\begin{aligned} w(x,y) = (u\otimes v)(x,y) = u(x) v(y), \end{aligned}$$
(14)

or the Hilbert–Schmidt operator \(w:L^2(\varOmega )\rightarrow L^2(\varOmega )\) defined for all \(f\in L^2(\varOmega )\) by:

$$\begin{aligned} w(f) = (u \otimes v) f = \langle u, f \rangle v. \end{aligned}$$
(15)

The intended meaning can be inferred from the context. Let \(H:L^2(\varOmega )\rightarrow L^2(\varOmega )\) denote a linear integral operator. Its kernel will always be denoted K and its time-varying impulse response T. The linear integral operator with kernel T will be denoted J.

The following result is an extension of the singular value decomposition to operators.

Lemma 2

(Schmidt decomposition [36, Theorem 2.2] or [26, Theorem 1 p. 215]) Let \(H:L^2(\varOmega ) \rightarrow L^2(\varOmega )\) denote a compact operator. There exist two finite or countable orthonormal systems \(\{e_1, \ldots \}\), \(\{f_1, \ldots \}\) of \(L^2(\varOmega )\) and a finite or infinite sequence \(\sigma _1 \ge \sigma _2 \ge \ldots \) of positive numbers (tending to zero if it is infinite), such that H can be decomposed as:

$$\begin{aligned} H = \sum _{k \ge 1} \sigma _k \cdot e_k \otimes f_k. \end{aligned}$$
(16)

A function \(u\in L^2(\varOmega )\) is denoted in regular font, whereas its discretized version \(\varvec{u}\in {{\mathbb {R}}}^n\) is denoted in bold font. The value of function u at \(x\in \varOmega \) is denoted u(x), while the i-th coefficient of vector \(\varvec{u}\in {{\mathbb {R}}}^n\) is denoted \(\varvec{u}[i]\). Similarly, an operator \(H : L^2(\varOmega ) \rightarrow L^2(\varOmega )\) is denoted in upper-case regular font, whereas its discretized version \(\varvec{H} \in {{\mathbb {R}}}^{n \times n}\) is denoted in upper-case bold font.

3 Preliminary Facts

In this section, we gather a few basic results necessary to derive approximation results.

3.1 Assumptions on the Operator and Examples

All the results stated in this paper rely on the assumption that the TVIR T of H is a sufficiently simple function. By simple, we mean that (i) the functions \(T(x,\cdot )\) are smooth for all \(x\in \varOmega \), and (ii) the impulse responses \(T(\cdot ,y)\) have a bounded support or a fast decay for all \(y\in \varOmega \).

There are numerous ways to capture the regularity of a function. In this paper, we assume that \(T(x,\cdot )\) lives in the Hilbert space \(H^s(\varOmega )\) for all \(x\in \varOmega \). This hypothesis is deliberately simple, in order to clarify the proofs and the main ideas.

Definition 3

(Class \({\mathcal {T}}^s\)) We let \({\mathcal {T}}^s\) denote the class of functions \(T : \varOmega \times \varOmega \rightarrow {{\mathbb {R}}}\) satisfying the smoothness condition: \(T(x,\cdot ) \in H^s(\varOmega ), \ \forall x \in \varOmega \), and \(\Vert T(x,\cdot )\Vert _{H^s(\varOmega )}\) is uniformly bounded in x, i.e.,

$$\begin{aligned} \sup _{x\in \varOmega }\Vert T(x,\cdot )\Vert _{H^{s}(\varOmega )} \le C <+\infty . \end{aligned}$$
(17)

Note that if \(T\in {\mathcal {T}}^s\), then H is a Hilbert–Schmidt operator since:

$$\begin{aligned} \Vert H\Vert _{HS}^2&= \int _{\varOmega } \int _{\varOmega } K(x,y)^2\,\mathrm{d}x\,\mathrm{d}y \end{aligned}$$
(18)
$$\begin{aligned}&= \int _{\varOmega } \int _{\varOmega } T(x,y)^2\,\mathrm{d}x\,\mathrm{d}y \end{aligned}$$
(19)
$$\begin{aligned}&= \int _{\varOmega } \Vert T(x,\cdot )\Vert _{L^2(\varOmega )}^2\,\mathrm{d}x <+\infty . \end{aligned}$$
(20)

We will often use the following regularity assumption.

Assumption 1

The TVIR T of H belongs to \({\mathcal {T}}^s\).

In many applications, the impulse responses have a bounded support, or at least a fast spatial decay that allows neglecting the tails. This property will be exploited to design faster algorithms. It can be expressed by the following assumption.

Assumption 2

\(T(x,y)=0, \forall |x|>\kappa /2\).

3.2 Examples

We provide three examples of kernels that may appear in applications. Figure 1 shows each kernel as a 2D image, the associated TVIR and the spectrum of the operator J (the linear integral operator with kernel T) computed with an SVD.

Example 1

A typical kernel that motivates our study is defined by:

$$\begin{aligned} K(x,y)= \frac{1}{\sqrt{2\pi }\sigma (y)} \exp \left( - \frac{(x-y)^2}{2\sigma ^2(y)}\right) . \end{aligned}$$
(21)

The impulse responses \(K(\cdot ,y)\) are Gaussian for all \(y\in \varOmega \). Their variance \(\sigma (y)>0\) varies depending on the position y. The TVIR of K is defined by:

$$\begin{aligned} T(x,y)= \frac{1}{\sqrt{2\pi }\sigma (y)} \exp \left( - \frac{x^2}{2\sigma ^2(y)}\right) . \end{aligned}$$
(22)

The impulse responses \(T(\cdot ,y)\) are not compactly supported; therefore, \(\kappa =1\) in Assumption 2. However, it is possible to truncate them by setting \(\kappa =3\sup _{y\in \varOmega } \sigma (y)\), for instance. This kernel satisfies Assumption 1 only if \(\sigma :\varOmega \rightarrow {{\mathbb {R}}}\) is sufficiently smooth. In Fig. 1, left column, we set \(\sigma (y) = 0.08+ 0.02\cos (2\pi y)\).

Example 2

The second example is given by:

$$\begin{aligned} T(x,y)= \frac{2}{\sigma (y)} \max (1 - 2\sigma (y) |x|,0). \end{aligned}$$
(23)

The impulse responses \(T(\cdot ,y)\) are cardinal B-splines of degree 1 and width \(\sigma (y)>0\). They are compactly supported, so that Assumption 2 holds with \(\kappa =\sup _{y\in \varOmega } \sigma (y)\). This kernel satisfies Assumption 1 only if \(\sigma :\varOmega \rightarrow {{\mathbb {R}}}\) is sufficiently smooth. In Fig. 1, central column, we set \(\sigma (y) = 0.1 + 0.3 (1-|y|)\); with this choice, the kernel satisfies Assumption 1 with \(s=1\).

Fig. 1 Different kernels K, the associated TVIR T and the spectrum of the operator J. Left column corresponds to Example 1, central column to Example 2 and right column to Example 3. a Kernel 1, b kernel 2, c kernel 3, d TVIR 1, e TVIR 2, f TVIR 3, g spectrum 1, h spectrum 2, and i spectrum 3

Example 3

The last example is a discontinuous TVIR. We set:

$$\begin{aligned} \begin{aligned} T(x,y)&= g_{\sigma _1}(x)\mathbbm {1}_{[-1/4,1/4]}(y) \\&\quad + g_{\sigma _2}(x)(1-\mathbbm {1}_{[-1/4,1/4]}(y)), \end{aligned} \end{aligned}$$
(24)

where \(g_{\sigma }(x) =\frac{1}{\sqrt{2\pi }} \exp \left( -\frac{x^2}{\sigma ^2}\right) \). This corresponds to the last column in Fig. 1, with \(\sigma _1=0.05\) and \(\sigma _2=0.1\). For this kernel, both Assumptions 1 and 2 are violated. Notice, however, that T is the sum of two tensor products and can therefore be represented using only four 1D functions. The spectrum of J should thus have only two nonzero elements. This is verified in Fig. 1i, where the spectrum is 0 (up to numerical errors of order \(10^{-13}\)), except for the first two elements.
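For readers wishing to reproduce the qualitative content of Fig. 1, the sketch below (Python with NumPy; the grid size and variable names are our choices) discretizes the TVIRs of Examples 1 and 3 and computes the spectrum of J with an SVD; the numerical rank 2 of Example 3 is clearly visible.

```python
import numpy as np

n = 256
x = np.arange(n) / n
x[x >= 0.5] -= 1.0                       # torus coordinate in [-1/2, 1/2)
y = x.copy()

# Example 1 (Eq. (22)): Gaussian impulse responses with varying width.
sigma = 0.08 + 0.02 * np.cos(2 * np.pi * y)
T1 = np.exp(-x[:, None]**2 / (2 * sigma[None, :]**2)) \
     / (np.sqrt(2 * np.pi) * sigma[None, :])

# Example 3 (Eq. (24)): discontinuous mixture of two Gaussian profiles.
g = lambda s: np.exp(-x**2 / s**2)[:, None] / np.sqrt(2 * np.pi)
ind = (np.abs(y) <= 0.25).astype(float)[None, :]
T3 = g(0.05) * ind + g(0.1) * (1.0 - ind)

# Spectra of J (cf. Fig. 1g-i): T1 decays fast, T3 has numerical rank 2.
print(np.linalg.svd(T1, compute_uv=False)[:4])
print(np.linalg.svd(T3, compute_uv=False)[:4])
```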

3.3 Product-Convolution Expansions as Low-Rank Approximations

Though similar in spirit, convolution-product (3) and product-convolution (4) expansions have quite different interpretations, captured by the following lemma.

Lemma 3

The TVIR \(T_m\) of the convolution-product expansion in (3) is given by:

$$\begin{aligned} T_m(x,y)= \sum _{k=1}^m h_k(x) w_k(x+y). \end{aligned}$$
(25)

The TVIR \(T_m\) of the product-convolution expansion in (4) is given by:

$$\begin{aligned} T_m(x,y)= \sum _{k=1}^m h_k(x) w_k(y). \end{aligned}$$
(26)

Proof

We only prove (26) since the proof of (25) relies on the same arguments. By definition:

$$\begin{aligned} (H_m u)(x)&= \left( \sum _{k=1}^m h_k \star (w_k \odot u)\right) (x) \end{aligned}$$
(27)
$$\begin{aligned}&= \int _{\varOmega } \sum _{k=1}^m h_k(x-y) w_k(y) u(y) \,\mathrm{d}y. \end{aligned}$$
(28)

By identification, this yields:

$$\begin{aligned} K_m(x,y) = \sum _{k=1}^m h_k(x-y) w_k(y), \end{aligned}$$
(29)

so that

$$\begin{aligned} T_m(x,y) = \sum _{k=1}^m h_k(x) w_k(y). \end{aligned}$$
(30)

\(\square \)

As can be seen in (26), product-convolution expansions consist of finding low-rank approximations of the TVIR. This interpretation was already proposed in [17] for instance and is the key observation to derive the forthcoming results. The expansion (25) does not share this simple interpretation, and we do not investigate it further in this paper.
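Lemma 3 can also be verified numerically. The sketch below (Python with NumPy; the random filters and windows are our choices) builds the discrete kernel (29) of a product-convolution expansion, extracts its TVIR via (2) and compares it with the rank-m form (26).

```python
import numpy as np

n, m = 128, 4
rng = np.random.default_rng(0)
h = rng.standard_normal((m, n))
w = rng.standard_normal((m, n))

# Kernel of the product-convolution expansion (4), cf. Eq. (29):
# K_m[i, j] = sum_k h_k[i - j] w_k[j], indices taken modulo n.
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
Km = sum(hk[(i - j) % n] * wk[j] for hk, wk in zip(h, w))

# TVIR (2): T_m[i, j] = K_m[i + j, j]; Lemma 3 predicts the form (26).
Tm = Km[(i + j) % n, j]
Tm_lemma = sum(np.outer(hk, wk) for hk, wk in zip(h, w))
print(np.abs(Tm - Tm_lemma).max())       # ~ 1e-15: both coincide
```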

3.4 Discretization

In order to implement a product-convolution expansion of type (4), the problem first needs to be discretized. We address this problem with a Galerkin formalism. Let \((\varphi _1,\ldots , \varphi _n)\) be a basis of a finite dimensional subspace \(V^n\) of \(L^2(\varOmega )\). Given an operator \(H:L^2(\varOmega )\rightarrow L^2(\varOmega )\), we can construct a matrix \(\varvec{H}^n\in {{\mathbb {R}}}^{n\times n}\) defined for all \(1\le i,j\le n\) by \(\varvec{H}^n[i,j] = \langle H\varphi _j, \varphi _i \rangle .\) Let \(S^n:H\mapsto \varvec{H}^n\) denote the discretization operator. From a matrix \(\varvec{H}^n\), an operator \(H^n\) can be reconstructed using, for instance, the pseudo-inverse \(S^{n,+}\) of \(S^n\). We let \(H^n=S^{n,+}(\varvec{H}^n)\). For instance, if \((\varphi _1,\ldots , \varphi _n)\) is an orthonormal basis of \(V^n\), the operator \(H^n\) is given by:

$$\begin{aligned} H^n = S^{n,+}(\varvec{H}^n) = \sum _{1\le i,j\le n} \varvec{H}^n[i,j] \varphi _i\otimes \varphi _j. \end{aligned}$$
(31)

This paper is dedicated to analyzing methods denoted \({\mathcal {A}}_m\) that provide an approximation \(H_m={\mathcal {A}}_m(H)\) of type (4), given an input operator H. Our analysis provides guarantees on the distance \(\Vert H-H_m\Vert _{HS}\) depending on m and the regularity properties of the input operator H, for different methods. Depending on the context, two different approaches can be used to implement \({\mathcal {A}}_m\).

  • Compute the matrix \(\varvec{H}_m^n = S^n(H_m)\) using numerical integration procedures. Then, create an operator \(H_m^n=S^{n,+}(\varvec{H}_m^n)\). This approach suffers from two defects: first, it is only possible if the kernel of H is given analytically; second, it might be computationally intractable.

  • In many applications, the operator H is not given explicitly. Instead, we only have access to its discretization \(\varvec{H}^n\). Then, it is possible to construct a discrete approximation algorithm \(\varvec{{\mathcal {A}}}_m\) yielding a discrete approximation \(\varvec{H}_m^n = \varvec{{\mathcal {A}}}_m(\varvec{H}^n)\). This matrix can then be mapped back to the continuous world using the pseudo-inverse: \(H_m^n=S^{n,+}(\varvec{H}_m^n)\). In this paper, we will analyze the construction complexity of \(\varvec{H}_m^n\) using this second approach.


Ideally, we would like to provide guarantees on \(\Vert H-H_m^n\Vert _{HS}\) depending on m and n. In the first approach, this is possible by using the following inequality:

$$\begin{aligned} \Vert H-H_m^n\Vert _{HS} \le \underbrace{\Vert H - H_m\Vert _{HS}}_{\epsilon _a(m)} + \underbrace{\Vert H_m - H_m^n\Vert _{HS}}_{\epsilon _d(n)}, \end{aligned}$$
(32)

where \(\epsilon _a(m)\) is the approximation error studied in this paper and \(\epsilon _d(n)\) is the discretization error. Under mild regularity assumptions on K, it is possible to obtain results of type \(\epsilon _d(n) = O(n^{-\gamma })\), where \(\gamma \) depends on the smoothness of K. For instance, if \(K\in H^{r}(\varOmega \times \varOmega )\), the error satisfies \(\epsilon _d(n) = O(n^{-r/2})\) for many bases including Fourier, wavelets and B-splines [10]. For \(K\in BV(\varOmega \times \varOmega )\), the space of functions with bounded variations, \(\epsilon _d(n) = O(n^{-1/4})\), see [31, Theorem 9.3]. As will be seen later, the approximation error \(\epsilon _a(m)\) behaves like \(O(m^{-s})\). Moreover, the proposed approximation technique is of interest only in the case \(m\ll n\), since otherwise it would require storing too much data. Under this assumption, the discretization error can be considered negligible compared to the approximation error. Throughout the paper, we therefore assume without further mention that \(\epsilon _d(n)\) is negligible compared to \(\epsilon _a(m)\).

In the second approach, the error analysis is more complex since there is an additional bias due to the algorithm discretization. This bias is captured by the following inequality:

$$\begin{aligned} \begin{aligned} \Vert H-H_m^n\Vert _{HS}&\le \underbrace{\Vert H - H^n\Vert _{HS}}_{\epsilon _d(n)} + \underbrace{\Vert H^n - {\mathcal {A}}_m(H^n) \Vert _{HS}}_{\epsilon _a(m)} \\&\qquad + \underbrace{\Vert {\mathcal {A}}_m(H^n) - H_m^n\Vert _{HS}}_{\epsilon _b(m,n)}. \end{aligned} \end{aligned}$$
(33)

The bias

$$\begin{aligned} \epsilon _b(m,n) = \Vert {\mathcal {A}}_m(S^{n,+}(S^n(H))) - S^{n,+}( \varvec{{\mathcal {A}}}_m( S^n(H))) \Vert _{HS} \end{aligned}$$
(34)

accounts for the difference between using the discrete or continuous approximation algorithm. In this paper, we do not study this bias error and assume that it is negligible compared to the approximation error \(\epsilon _a\).

3.5 Implementation and Complexity

Let \(\varvec{F}_n\in {\mathbb {C}}^{n\times n}\) denote the discrete inverse Fourier transform and \(\varvec{F}_n^*\) denote the discrete Fourier transform. Matrix-vector products \(\varvec{F}_n\varvec{u}\) or \(\varvec{F}_n^*\varvec{u}\) can be evaluated in \(O(n\log _2(n))\) operations using the fast Fourier transform (FFT). The discrete convolution product \(\varvec{v}=\varvec{h}\star \varvec{u}\) is defined for all \(1\le i\le n\) by \(\varvec{v}[i] = \sum _{j=1}^n \varvec{u}[i-j]\varvec{h}[j]\), where indices are understood modulo n (circular boundary conditions).

Discrete convolution-products can be evaluated in \(O(n\log _2(n))\) operations by using the following fundamental identity:

$$\begin{aligned} \varvec{v} = \varvec{F}_n \cdot ( (\varvec{F}_n^*\varvec{h}) \odot (\varvec{F}_n^*\varvec{u})). \end{aligned}$$
(35)

Hence, a convolution can be implemented using three FFTs (\(O(n\log _2(n))\) operations) and a point-wise multiplication (O(n) operations). This being said, it is straightforward to implement formula (4) with an \(O(mn\log _2(n))\) algorithm.
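As an illustration, here is a minimal sketch of this \(O(mn\log _2(n))\) evaluation and of the corresponding adjoint (Python with NumPy; the function names and the (m, n) array layout for the \(\varvec{h}_k\) and \(\varvec{w}_k\) are our choices). The FFTs of the filters could of course be precomputed and reused across products.

```python
import numpy as np

def apply_pc(h, w, u):
    """Evaluate H_m u = sum_k h_k * (w_k . u), Eq. (4), with circular FFT
    convolutions as in (35); h and w are (m, n) arrays, u has length n."""
    return np.real(np.fft.ifft(np.fft.fft(h) * np.fft.fft(w * u)).sum(axis=0))

def apply_pc_adjoint(h, w, u):
    """Adjoint H_m^* u = sum_k w_k . (flipped h_k * u), for real h_k, w_k:
    conjugating the filters' Fourier symbols flips the filters."""
    v = np.real(np.fft.ifft(np.conj(np.fft.fft(h)) * np.fft.fft(u)))
    return (w * v).sum(axis=0)
```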

Under the additional assumption that \(w_k\) and \(h_k\) are supported on bounded intervals, the complexity can be improved. We assume that, after discretization, \(\varvec{h}_k\) and \(\varvec{w}_k\) are compactly supported, with support length \(q_k\le n\) and \(p_k\le n\), respectively.

Lemma 4

A matrix-vector product of type (4) can be implemented with a complexity that does not exceed

$$\begin{aligned} O\left( \sum _{k=1}^m (p_k+q_k) \log _2(\min (p_k,q_k)) \right) \end{aligned}$$

operations.

Proof

A convolution-product of type \(\varvec{h}_k\star (\varvec{w}_k\odot \varvec{u})\) can be evaluated in \(O((p_k+q_k) \log (p_k+q_k))\) operations. Indeed, the support of \(\varvec{h}_k\star (\varvec{w}_k\odot \varvec{u})\) has no more than \(p_k+q_k\) contiguous nonzero elements. Using the Stockham sectioning algorithm [39], the complexity can be further decreased to \(O((p_k+q_k) \log _2(\min (p_k,q_k)))\) operations. This idea was proposed in [27]. \(\square \)
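A sketch of the support-restricted evaluation used in this proof is given below (Python with NumPy). For simplicity we assume that the discrete supports are contiguous, do not wrap around the boundary, and that \(p_k+q_k\le n\); only the \(p_k+q_k-1\) possibly nonzero output values are computed, and np.convolve is used for brevity in place of a short FFT or the Stockham sectioning of [39].

```python
import numpy as np

def apply_term_local(hk, wk, u):
    """One term h_k * (w_k . u) of (4), computed only where it can be
    nonzero, as in Lemma 4 (contiguous, non-wrapping supports assumed)."""
    (jh,), (jw,) = np.nonzero(hk), np.nonzero(wk)
    ker = hk[jh[0]:jh[-1] + 1]                        # support length q_k
    seg = wk[jw[0]:jw[-1] + 1] * u[jw[0]:jw[-1] + 1]  # support length p_k
    out = np.zeros_like(u)
    # The result lives on p_k + q_k - 1 consecutive samples (modulo n).
    pos = (jh[0] + jw[0] + np.arange(len(ker) + len(seg) - 1)) % len(u)
    out[pos] = np.convolve(ker, seg)
    return out
```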

4 Projections on Linear Subspaces

We now turn to the problem of choosing the functions \(h_k\) and \(w_k\) in Eq. (4). The idea studied in this section is to fix a subspace \(E_m=\mathrm {span}(e_k, k\in \{1,\ldots , m\})\) of \(L^2(\varOmega )\) and to approximate \(T(x,\cdot )\) as:

$$\begin{aligned} T_m(x,y) = \sum _{k=1}^m c_k(x) e_k(y). \end{aligned}$$
(36)

For instance, the coefficients \(c_k\) can be chosen so that \(T_m(x,\cdot )\) is a projection of \(T(x,\cdot )\) onto \(E_m\). We propose to analyze three different families of functions \(e_k\): Fourier atoms, wavelet atoms and B-splines. We analyze their complexity and approximation properties as well as their respective advantages.

Fig. 2 Kohn–Nirenberg symbols of the kernels given in Examples 1, 2 and 3 in \(\log _{10}\) scale. Observe how the decay speed from the center (low frequencies) to the outer parts (high frequencies) changes depending on the TVIR smoothness. Note that the lowest values of the Kohn–Nirenberg symbol have been set to \(10^{-4}\) for visualization purposes. a Kernel 1, b kernel 2 and c kernel 3

4.1 Fourier Decompositions

It is well known that functions in \(H^s(\varOmega )\) can be well approximated by linear combinations of low-frequency Fourier atoms. This loose statement is captured by the following lemma.

Lemma 5

([18, 19]) Let \(f\in H^s(\varOmega )\) and \(f_m\) denote its partial Fourier series:

$$\begin{aligned} f_m=\sum _{k=-m}^m {\hat{f}}[k] e_k, \end{aligned}$$
(37)

where \(e_k(y) =\exp (-2 i \pi k y)\). Then

$$\begin{aligned} \Vert f_m-f\Vert _{L^2(\varOmega )} \le C m^{-s} |f|_{H^s(\varOmega )}. \end{aligned}$$
(38)

The so-called Kohn–Nirenberg symbol N of H is defined for all \((x,k)\in \varOmega \times {{\mathbb {Z}}}\) by

$$\begin{aligned} N(x,k) = \int _{\varOmega } T(x,y) \exp (-2i\pi ky) \,\mathrm{d}y. \end{aligned}$$
(39)

Illustrations of different Kohn–Nirenberg symbols are provided in Fig. 2.

Corollary 1

Set \(e_k(y) =\exp (-2 i \pi k y)\) and define \(T_m\) by:

$$\begin{aligned} T_m(x,y)=\sum _{|k|\le m} N(x,k) e_k(y). \end{aligned}$$
(40)

Then, under Assumptions 1 and 2

$$\begin{aligned} \Vert H_m - H\Vert _{HS} \le C \sqrt{\kappa }m^{-s}. \end{aligned}$$
(41)

Proof

By Lemma 5 and Assumption 1,

$$\begin{aligned} \Vert T_m(x,\cdot ) - T(x,\cdot ) \Vert _{L^2(\varOmega )} \le C m^{-s} \end{aligned}$$

for some constant C and for all \(x \in \varOmega \). In addition, by Assumption 2, \(\Vert T_m(x,\cdot )-T(x,\cdot ) \Vert _{L^2(\varOmega )}=0\) for \(|x|>\kappa /2\). Therefore:

$$\begin{aligned} \Vert H_m-H\Vert _{HS}^2&= \int _\varOmega \int _\varOmega (T_m(x,y) - T(x,y))^2\, \mathrm{d}x\,\mathrm{d}y \end{aligned}$$
(42)
$$\begin{aligned}&= \int _\varOmega \Vert T_m(x,\cdot ) - T(x,\cdot ) \Vert _{L^2(\varOmega )}^2 \,\mathrm{d}x \end{aligned}$$
(43)
$$\begin{aligned}&\le \kappa C^2m^{-2s}. \end{aligned}$$
(44)

\(\square \)

As will be seen later, the convergence rate (41) is optimal in the sense that no product-convolution expansion of order m can achieve a better rate under the sole Assumptions 1 and 2.

Corollary 2

Let \(\epsilon >0\) and set \(m = \lceil C \epsilon ^{-1/s} \kappa ^{1/2s}\rceil \). Under Assumptions 1 and 2, \(H_m\) satisfies \(\Vert H - H_m \Vert _{HS} \le \epsilon \) and products with \(H_m\) and \(H_m^*\) can be evaluated with no more than \(O(\kappa ^{1/2s} n \log _2(n) \, \epsilon ^{-1/s})\) operations.

Proof

Since Fourier atoms are not localized in the time domain, the modulation functions \(\varvec{w}_k\) are supported on intervals of size \(p = n\). The complexity of computing a matrix-vector product is therefore \(O( m n \log (n) )\) operations by Lemma 4. \(\square \)

Finally, let us mention that computing the discrete Kohn–Nirenberg symbol \(\varvec{N}\) costs \(O(\kappa n^2\log _2(n))\) operations (\(\kappa n\) discrete Fourier transforms of size n). The storage cost of this Fourier representation is \(O(m\kappa n)\) since one has to store \(\kappa n\) coefficients for each of the m vectors \(\varvec{h}_k\).
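A minimal sketch of this construction is given below (Python with NumPy; the function name and array conventions are our choices): the symbol (39) is computed with FFTs along y, and the expansion (40) is assembled from the pairs \((h_k, w_k)\).

```python
import numpy as np

def fourier_tvir_approx(T, m):
    """Order-(2m+1) Fourier expansion (40) of a discretized TVIR T, whose
    rows are indexed by x and columns by y. Returns T_m and the pairs
    (h_k, w_k) with h_k(x) = N(x, k) and w_k a Fourier atom."""
    n = T.shape[1]
    N = np.fft.fft(T, axis=1) / n                  # symbol N(x, k), Eq. (39)
    ks = np.r_[0:m + 1, -m:0]                      # frequencies |k| <= m
    y = np.arange(n) / n
    h = [N[:, k] for k in ks]                      # negative k wraps around
    # Atoms exp(+2i pi k y); for real T this matches the expansion (40).
    w = [np.exp(2j * np.pi * k * y) for k in ks]
    Tm = np.real(sum(np.outer(hk, wk) for hk, wk in zip(h, w)))
    return Tm, h, w
```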

In the next two sections, we show that replacing Fourier atoms by wavelet atoms or B-splines preserves the optimal rate of convergence in \(O(\sqrt{\kappa } m^{-s})\), but has the additional advantage of being localized in space, thereby reducing complexity.

4.2 Spline Decompositions

B-splines form a Riesz basis, with a dual Riesz basis of the form [8]:

$$\begin{aligned} ({\tilde{B}}_{\alpha ,m}(\cdot - k/m))_{0\le k \le m-1}. \end{aligned}$$
(45)

The projection \(f_m\) of any \(f\in L^2(\varOmega )\) onto \({\mathcal {B}}_{\alpha ,m}\) can be expressed as:

$$\begin{aligned} f_m&= \mathop {\mathrm {arg\,min}}_{{\tilde{f}} \in {\mathcal {B}}_{\alpha ,m}} \Vert {\tilde{f}} -f \Vert _2^2 \end{aligned}$$
(46)
$$\begin{aligned}&=\sum _{k=0}^{m-1} \langle f, {\tilde{B}}_{\alpha ,m}(\cdot - k/m)\rangle B_{\alpha ,m}(\cdot - k/m). \end{aligned}$$
(47)

Theorem 1

([4, p. 87] or [19, p. 420]) Let \(f \in H^s(\varOmega )\) and \(\alpha \ge s\). Then

$$\begin{aligned} \Vert f - f_m \Vert _2 \le C m^{-s} \Vert f \Vert _{W^{s,2}}. \end{aligned}$$
(48)

The following result directly follows.

Corollary 3

Set \(\alpha \ge s\). For each \(x\in \varOmega \), let \((c_k(x))_{0\le k\le m-1}\) be defined as

$$\begin{aligned} c_k(x) = \langle T(x,\cdot ) , {\tilde{B}}_{\alpha ,m}(\cdot - k/m)\rangle . \end{aligned}$$
(49)

Define \(T_m\) by:

$$\begin{aligned} T_m(x,y)=\sum _{k=0}^{m-1} c_k(x) B_{\alpha ,m}(y - k/m). \end{aligned}$$
(50)

If \(\alpha \ge s\), then, under Assumptions 1 and 2,

$$\begin{aligned} \Vert H_m - H\Vert _{HS} \le C \sqrt{\kappa } m^{-s}. \end{aligned}$$
(51)

Proof

The proof is similar to that of Corollary 1. \(\square \)

Corollary 4

Let \(\epsilon >0\) and set \(m = \lceil C \epsilon ^{-1/s} \kappa ^{1/2s} \rceil \). Under Assumptions 1 and 2, \(H_m\) satisfies \(\Vert H - H_m \Vert _{HS} \le \epsilon \) and products with \(H_m\) and \(H_m^*\) can be evaluated with no more than

$$\begin{aligned} O\left( \left( s + \kappa ^{1+1/2s} \epsilon ^{-1/s} \right) n \log _2(\kappa n) \right) \end{aligned}$$
(52)

operations. For small \(\epsilon \) and large n, the complexity behaves like

$$\begin{aligned} O\left( \kappa ^{1+1/2s} n \log _2 (\kappa n) \epsilon ^{-1/s} \right) . \end{aligned}$$
(53)

Proof

In this approximation, m B-splines are used to cover \(\varOmega \). B-splines have a compact support of size \((\alpha +1)/m\). This property leads to windowing vectors \(\varvec{w}_k\) with supports of size \(p = \lceil (\alpha +1)\frac{n}{m}\rceil \). Furthermore, the vectors \((\varvec{h}_k)\) have a support of size \(q = \kappa n\). Combining these two results with Lemma 4 and Corollary 3 yields the result for the choice \(\alpha = s\). \(\square \)

The complexity of computing the vectors \(\varvec{c}_k\) is \(O(\kappa n^2\log (n))\) (\(\kappa n\) projections with complexity \(n\log (n)\), see, e.g., [41]).

As shown in Corollary 4, B-spline approximations are preferable to Fourier decompositions whenever the support size \(\kappa \) is small.
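Below is a sketch of the resulting construction (Python with NumPy; the function name and the use of normal equations in place of the dual functions \({\tilde{B}}_{\alpha ,m}\) are our choices). It reuses the B-spline sampler sketched after Definition 2.

```python
import numpy as np

def spline_tvir_approx(T, m, alpha, bspline):
    """Projection (46)-(50) of each row T(x, .) onto B_{alpha,m}. `bspline`
    is any routine sampling B_{alpha,m} on an n-point grid (e.g., the sketch
    after Definition 2). The dual-basis inner products (49) are replaced by
    the equivalent normal equations of the projection (46)."""
    n = T.shape[1]
    _, b = bspline(alpha, m, n)
    # Shifted atoms B_{alpha,m}(. - k/m), one per row.
    Phi = np.stack([np.roll(b, k * n // m) for k in range(m)])
    G = Phi @ Phi.T / n                            # Gram matrix of the atoms
    c = np.linalg.solve(G, Phi @ T.T / n)          # coefficients c_k(x), (49)
    return c.T @ Phi                               # T_m(x, y), Eq. (50)
```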

4.3 Wavelet Decompositions

Lemma 6

([31, Theorem 9.5]) Let \(f\in H^s(\varOmega )\) and \(f_m\) denote its partial wavelet series:

$$\begin{aligned} f_m=\sum _{|\mu | \le \lceil \log _2(m) \rceil } c_\mu \psi _{\mu }, \end{aligned}$$
(54)

where \(\psi \) is a Daubechies wavelet with \(\alpha > s\) vanishing moments and \(c_\mu =\langle \psi _{\mu },f\rangle \). Then

$$\begin{aligned} \Vert f_m-f\Vert _{L^2(\varOmega )} \le C m^{-s} |f|_{H^s(\varOmega )}. \end{aligned}$$
(55)

A direct consequence is the following corollary.

Corollary 5

Let \(\psi \) be a Daubechies wavelet with \(\alpha = s+1\) vanishing moments. Define \(T_m\) by:

$$\begin{aligned} T_m(x,y)=\sum _{|\mu |\le \lceil \log _2(m) \rceil } c_\mu (x) \psi _{\mu }(y), \end{aligned}$$
(56)

where \(c_\mu (x) = \langle \psi _{\mu },T(x,\cdot ) \rangle \). Then, under Assumptions 1 and 2

$$\begin{aligned} \Vert H_m - H\Vert _{HS} \le C \sqrt{\kappa } m^{-s}. \end{aligned}$$
(57)

Proof

The proof is identical to that of Corollary 1. \(\square \)

Proposition 1

Let \(\epsilon >0\) and set \(m = \lceil C \epsilon ^{-1/s} \kappa ^{1/2s} \rceil \). Under Assumptions 1 and 2, \(H_m\) satisfies \(\Vert H - H_m \Vert _{HS} \le \epsilon \) and products with \(H_m\) and \(H_m^*\) can be evaluated with no more than

$$\begin{aligned} O\left( \left( sn \log _2 \left( \epsilon ^{-1/s} \kappa ^{1/2s} \right) + \kappa ^{1 + 1/2s} n \epsilon ^{-1/s} \right) \log _2(\kappa n) \right) \end{aligned}$$
(58)

operations. For small \(\epsilon \), the complexity behaves like

$$\begin{aligned} O\left( \kappa ^{1+1/2s} n \log _2(\kappa n) \epsilon ^{-1/s} \right) . \end{aligned}$$
(59)

Proof

In (56), the windowing vectors \(\varvec{w}_k\) are wavelets \(\varvec{\psi }_\mu \) with supports of size \(\min ( (2s+1) n 2^{-|\mu |}, n )\). Therefore, each convolution has to be performed on an interval of size \(|\varvec{\psi }_\mu | + q + 1\). Since there are \(2^{j}\) wavelets at scale j, the total number of operations is:

$$\begin{aligned}&\sum _{\mu \, | \, |\mu | < \log _2(m)} ( |\varvec{\psi }_\mu | + q + 1 ) \log _2( \min ( |\varvec{\psi }_\mu |, q+1) ) \end{aligned}$$
(60)
$$\begin{aligned}&\quad \le \sum _{\mu \, | \, |\mu | < \log _2(m)} ( (2s+1) n 2^{-|\mu |} + \kappa n ) \log _2( \kappa n ) \end{aligned}$$
(61)
$$\begin{aligned}&\quad = \sum _{j=0}^{\log _2(m) - 1} 2^j \left( (2s+1)n 2^{-j} + \kappa n\right) \log _2( \kappa n) \end{aligned}$$
(62)
$$\begin{aligned}&\quad = \sum _{j=0}^{\log _2(m) - 1} \left( (2s+1)n + 2^j \kappa n \right) \log _2( \kappa n) \end{aligned}$$
(63)
$$\begin{aligned}&\quad \le \left( (2s+1)n \log _2(m) + m \kappa n\right) \log _2( \kappa n ) \end{aligned}$$
(64)
$$\begin{aligned}&\quad = \left( (2s+1)n \log _2(\epsilon ^{-1/s} \kappa ^{1/2s}) + \epsilon ^{-1/s} \kappa ^{1 + 1/2s} n\right) \log _2( \kappa n ). \end{aligned}$$
(65)

\(\square \)

Fig. 3 “Wavelet symbols” of the operators given in Examples 1, 2 and 3 in \(\log _{10}\) scale. The red bars indicate separations between scales. Notice that the wavelet coefficients of kernel 1 decay rapidly as the scale increases. The decay is slower for kernels 2 and 3, which are less regular. The adaptivity of wavelets can be visualized for kernel 3: some wavelet coefficients are nonzero at large scales, but they are all concentrated around the discontinuities. Therefore, only a small number of pairs \((c_\mu ,\psi _\mu )\) is necessary to encode the discontinuities. This was not the case with Fourier or B-spline atoms. a Kernel 1, b kernel 2 and c kernel 3 (Color figure online)

Computing the vectors \(\varvec{c}_\mu \) costs \(O(\kappa s n^2)\) operations (\(\kappa n\) discrete wavelet transforms of size n). The storage cost of this wavelet representation is \(O(m\kappa n)\) since one has to store \(\kappa n\) coefficients for each of the m functions \(\varvec{h}_k\).
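A compact sketch of the truncation (56) is given below. It assumes the PyWavelets package and replaces the boundary wavelets of [12] by periodized Daubechies wavelets, which changes the basis near the boundary but not the principle.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def wavelet_tvir_approx(T, J, wavelet="db4"):
    """Keep the coefficients c_mu(x) of T(x, .) up to scale J, as in (56),
    yielding a rank-O(2^J) product-convolution expansion. Rows of T are
    transformed along y with periodized Daubechies wavelets."""
    coeffs = pywt.wavedec(T, wavelet, mode="periodization", axis=1)
    # coeffs = [approximation, coarsest details, ..., finest details]:
    # zero out every band finer than scale J and reconstruct.
    kept = coeffs[:J + 1] + [np.zeros_like(c) for c in coeffs[J + 1:]]
    return pywt.waverec(kept, wavelet, mode="periodization", axis=1)
```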

As can be seen from this analysis, wavelet and B-spline approximations roughly have the same complexity over the class \({\mathcal {T}}^s\). The main advantage of wavelets compared to B-splines with fixed knots is that they are known to characterize much more general function spaces than \(H^s(\varOmega )\). For instance, if all functions \(T(x,\cdot )\) have a single discontinuity at a given \(y\in \varOmega \), only a few coefficients \(c_\mu (x)\) will have a large amplitude. Wavelets will be able to efficiently encode the discontinuity, while B-splines with fixed knots, which are not adaptive in nature, will fail to approximate the TVIR well. It is therefore possible to use wavelets in an adaptive way. This effect is visible in Fig. 3c: Despite discontinuities, only wavelets localized around the discontinuities yield large coefficients. In the next section, we propose two other adaptive methods, in the sense that they are able to automatically adapt to the TVIR regularity.

4.4 Interpolation Versus Approximation

In all previous results, we constructed the functions \(w_k\) and \(h_k\) in (4) by projecting \(T(x,\cdot )\) onto linear subspaces. This is only possible if the whole TVIR T is available. In very large-scale applications, this assumption is unrealistic, since the TVIR contains \(n^2\) coefficients, which cannot even be stored. Instead of assuming a full knowledge of T, some authors (e.g., [34]) assume that the impulse responses \(T(\cdot , y)\) are available only at a discrete set of points \(y_i=i/m\) for \(1\le i\le m\).

In that case, it is possible to interpolate the impulse responses instead of approximating them. Given a linear subspace \(E_m=\mathrm {span}(e_k, k\in \{1,\ldots , m\})\), where the atoms \(e_k\) are assumed to be linearly independent, the functions \(c_k(x)\) in (36) are chosen by solving the set of linear systems:

$$\begin{aligned} \sum _{k=1}^m c_k(x) e_k(y_i) = T(x,y_i) \quad \text {for} \quad 1\le i \le m. \end{aligned}$$
(66)

In the discrete setting, under Assumption 2, this amounts to solving \(\lceil \kappa n \rceil \) linear systems of size \(m\times m\). The analysis of such a method requires using very different tools. We refer the interested reader to our recent work [5], where we investigate the rates of convergence with respect to the number of impulse responses, their geometry and the level of noise on the data.
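In matrix form, and assuming the atom values \(e_k(y_i)\) are stored in an \(m\times m\) array, the systems (66) can be solved for all x simultaneously, as in the following sketch (Python with NumPy; the names are our choices).

```python
import numpy as np

def interp_coeffs(T_samples, E):
    """Interpolation variant of (36): T_samples[x, i] = T(x, y_i) holds the
    m available impulse-response samples, E[k, i] = e_k(y_i) the atom
    values. Solves the m x m systems (66) for all x at once."""
    # System for a fixed x: A c(x) = t(x), with A[i, k] = e_k(y_i) = E.T.
    return np.linalg.solve(E.T, T_samples.T).T     # c_k(x), shape (n_x, m)
```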

4.5 On Meyer’s Operator Representation

Up to now, we only assumed a regularity of T in the y direction, meaning that the impulse responses vary smoothly in space. In many applications, the impulse responses themselves are smooth. In this section, we show that this additional regularity assumption can be used to further compress the operator. Finding a compact operator representation is key to treating identification or estimation problems (e.g., blind deblurring in imaging), see, e.g., [30].

Since \((\psi _{\lambda })_{\lambda \in \varLambda }\) is a Hilbert basis of \(L^2(\varOmega )\), the set of tensor product functions \((\psi _\lambda \otimes \psi _\mu )_{\lambda \in \varLambda ,\mu \in \varLambda }\) is a Hilbert basis of \(L^2(\varOmega \times \varOmega )\). Therefore, any \(T\in L^2(\varOmega \times \varOmega )\) can be expanded as:

$$\begin{aligned} T(x,y) = \sum _{\lambda \in \varLambda } \sum _{\mu \in \varLambda } c_{\lambda ,\mu } \psi _\lambda (x) \psi _\mu (y). \end{aligned}$$
(67)

The main idea of the construction in this section consists of keeping only the coefficients \(c_{\lambda ,\mu }\) of large amplitude. A similar idea was proposed in the BCR paper [3], except that the kernel K was expanded instead of the TVIR T. Decomposing T was suggested by Beylkin at the end of [2] without a precise analysis.

In this section, we assume that \(T\in H^{r,s}(\varOmega \times \varOmega )\), where

$$\begin{aligned} \begin{aligned} H^{r,s}(\varOmega \times \varOmega )=\{T:&\varOmega \times \varOmega \rightarrow {{\mathbb {R}}},\ \partial _{x}^{\alpha _1}\partial _{y}^{\alpha _2}T \in L^2(\varOmega \times \varOmega ), \\&\forall \alpha _1\in \{0,\ldots , r\}, \forall \alpha _2\in \{0,\ldots , s\} \}. \end{aligned} \end{aligned}$$
(68)

This space arises naturally in applications, where the impulse response regularity r might differ from the regularity s of their variations. Notice that \(H^{2s}(\varOmega \times \varOmega )\subset H^{s,s}(\varOmega \times \varOmega ) \subset H^{s}(\varOmega \times \varOmega )\).

Theorem 2

Assume that \(T\in H^{r,s}(\varOmega \times \varOmega )\) and satisfies Assumption 2. Assume that \(\psi \) has \(\max (r,s)+1\) vanishing moments. Let \(c_{\lambda ,\mu }= \langle T, \psi _\lambda \otimes \psi _\mu \rangle \). Define

$$\begin{aligned} H_{m_1,m_2} = \sum _{|\lambda |\le \log _2(m_1)} \sum _{|\mu |\le \log _2(m_2)} c_{\lambda ,\mu } \psi _{\lambda }\otimes \psi _\mu . \end{aligned}$$
(69)

Let \(m\in {{\mathbb {N}}}\), set \(m_1 =\lceil m^{s/(r+s)}\rceil \), \(m_2=\lceil m^{r/(r+s)}\rceil \) and \(H_m=H_{m_1,m_2}\). Then

$$\begin{aligned} \Vert H-H_m\Vert _{HS} \le C \sqrt{\kappa } m^{-\frac{rs}{r+s}}. \end{aligned}$$
(70)

Proof

First notice that

$$\begin{aligned} T_{\infty , m_2} = \sum _{|\mu | \le \lceil \log _2(m_2) \rceil } c_\mu \otimes \psi _\mu , \end{aligned}$$
(71)

where \(c_\mu (x) = \langle T(x,\cdot ) , \psi _\mu \rangle \). From Corollary 5, we get:

$$\begin{aligned} \Vert T_{\infty , m_2} - T\Vert _{L^2(\varOmega \times \varOmega )} \le C \sqrt{\kappa } m_2^{-s}. \end{aligned}$$
(72)

Now, notice that \(c_\mu \in H^r(\varOmega )\). Indeed, for all \(0 \le k \le r\), we get:

$$\begin{aligned}&\int _{\varOmega } (\partial _x^k c_{\mu }(x)) ^2 \,\mathrm{d}x \end{aligned}$$
(73)
$$\begin{aligned}&\quad = \int _{\varOmega } \left( \partial _x^k \int _\varOmega T(x,y) \psi _\mu (y)\, \mathrm{d}y \right) ^2 \,\mathrm{d}x \end{aligned}$$
(74)
$$\begin{aligned}&\quad = \int _{\varOmega } \left( \int _\varOmega (\partial _x^k T)(x,y) \psi _\mu (y)\, \mathrm{d}y \right) ^2 \,\mathrm{d}x \end{aligned}$$
(75)
$$\begin{aligned}&\quad \le \int _{\varOmega } \Vert (\partial _x^k T)(x,\cdot )\Vert _{L^2(\varOmega )}^2 \Vert \psi _\mu \Vert _{L^2(\varOmega )}^2 \,\mathrm{d}x \end{aligned}$$
(76)
$$\begin{aligned}&\quad = \Vert \partial _x^k T \Vert _{L^2(\varOmega \times \varOmega )}^2<+\infty . \end{aligned}$$
(77)

Therefore, we can use Lemma 6 again to show:

$$\begin{aligned} \Vert T_{\infty ,m_2} - T_{m_1,m_2}\Vert _{L^2(\varOmega \times \varOmega )} \le C \sqrt{\kappa } m_1^{-r}. \end{aligned}$$
(78)

Finally, using the triangle inequality, we get:

$$\begin{aligned} \Vert T - T_{m_1,m_2}\Vert _{HS} \le C\sqrt{\kappa }(m_1^{-r} + m_2^{-s}). \end{aligned}$$
(79)

By setting \(m_1=m_2^{s/r}\), the two approximation errors in the right-hand side of (79) are balanced. This motivates the choice of \(m_1\) and \(m_2\) indicated in the theorem. \(\square \)

The approximation result in inequality (70) is worse than the previous ones. For instance if \(r=s\), then the bound becomes \(O(\sqrt{\kappa } m^{-s/2})\) instead of \(O(\sqrt{\kappa }m^{-s})\) in all previous theorems. The great advantage of this representation is the operator storage: Until now, the whole set of vectors \((\varvec{c}_\mu )\) had to be stored (\(O(\kappa n m)\) values), while now, only m coefficients \(c_{\lambda ,\mu }\) are required. For instance, in the case \(r=s\), for an equivalent precision, the storage cost of the new representation is \(O(\kappa m^2)\) instead of \(O(\kappa nm)\). Figure 4 illustrates the compression properties of Meyer’s representations.

In addition, evaluating matrix-vector products can be achieved rapidly by using the following trick:

$$\begin{aligned} \varvec{H_m} \varvec{u}&= \sum _{|\lambda |\le \log _2(m_1)} \sum _{|\mu |\le \log _2(m_2)} c_{\lambda ,\mu } \varvec{\psi }_{\lambda } \star (\varvec{\psi }_{\mu } \odot \varvec{u}) \end{aligned}$$
(80)
$$\begin{aligned}&= \sum _{|\mu |\le \log _2(m_2)} \left( \sum _{|\lambda |\le \log _2(m_1)} c_{\lambda ,\mu } \varvec{\psi }_{\lambda }\right) \star (\varvec{\psi }_{\mu } \odot \varvec{u}). \end{aligned}$$
(81)

By letting \(\tilde{\varvec{c}}_\mu = \sum _{|\lambda |\le \log _2(m_1)} c_{\lambda ,\mu } \varvec{\psi }_{\lambda }\), we get

$$\begin{aligned} \varvec{H}_m \varvec{u} = \sum _{|\mu |\le \log _2(m_2)} \tilde{\varvec{c}}_\mu \star (\varvec{\psi }_\mu \odot \varvec{u}), \end{aligned}$$
(82)

which can be computed in \(O(m_2 \kappa n \log _2(\kappa n))\) operations. This remark leads to the following proposition.

Proposition 2

Assume that \(T\in H^{r,s}(\varOmega \times \varOmega )\) and that it satisfies Assumption 2. Set \(m=\left\lceil \left( \frac{\epsilon }{C\sqrt{\kappa }} \right) ^{-(r+s)/rs}\right\rceil \). Then, the operator \(H_m\) defined in Theorem 2 satisfies \(\Vert H-H_m\Vert _{HS}\le \epsilon \) and the number of operations necessary to evaluate a product with \(H_m\) or \(H_m^*\) is bounded above by \(O\left( \epsilon ^{-1/s} \kappa ^{\frac{2s+1}{2s}} n \log _2(n)\right) \).

Notice that the complexity of matrix-vector products is unchanged compared to the wavelet or spline approaches, with a much better compression ability. However, this method requires a preprocessing step to compute \(\tilde{\varvec{c}}_\mu \), with complexity \(O(\epsilon ^{-1/s} \kappa ^{1/2s} n)\) (Fig. 4).
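The regrouping (81) and the evaluation (82) can be sketched as follows (Python with NumPy; the function names and array layouts are our choices, and the sampled wavelets are assumed to be precomputed).

```python
import numpy as np

def meyer_filters(C, Psi_x):
    """Regrouping (81): c_tilde_mu = sum_lambda c_{lambda,mu} psi_lambda.
    C holds the kept coefficients c_{lambda,mu} (shape (K1, K2)) and Psi_x
    the sampled x-side wavelets psi_lambda (shape (K1, n))."""
    return C.T @ Psi_x                               # one filter per mu

def meyer_matvec(c_tilde, Psi_y, u):
    """Evaluation (82): H_m u = sum_mu c_tilde_mu * (psi_mu . u), with one
    FFT convolution per kept window (rows of Psi_y)."""
    U = np.fft.fft(Psi_y * u)
    return np.real(np.fft.ifft(np.fft.fft(c_tilde) * U).sum(axis=0))
```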

Fig. 4 Meyer’s representations of the operators in Examples 1, 2 and 3 in \(\log _{10}\) scale. a Kernel 1, b kernel 2 and c kernel 3

5 Adaptive Decompositions

In the last section, all methods shared the same principle: project \(T(x,\cdot )\) on a fixed basis for each \(x\in \varOmega \). Instead of fixing a basis, one can try to find a basis adapted to the operator at hand. This idea was proposed in [21] and [17].

5.1 Singular Value Decompositions

The authors of [21] proposed to use a singular value decomposition (SVD) of the TVIR in order to construct the functions \(h_k\) and \(w_k\). In this section, we first detail this idea and then analyze it from an approximation-theoretic point of view. Let \(J:L^2(\varOmega )\rightarrow L^2(\varOmega )\) denote the linear integral operator with kernel \(T\in {\mathcal {T}}^s\). First notice that J is a Hilbert–Schmidt operator since \(\Vert J\Vert _{HS}=\Vert H\Vert _{HS}\). By Lemma 2 and since Hilbert–Schmidt operators are compact, there exist two orthonormal bases \((e_k)\) and \((f_k)\) of \(L^2(\varOmega )\) such that J can be decomposed as

$$\begin{aligned} J = \sum _{k\ge 1} \sigma _k \cdot e_k \otimes f_k, \end{aligned}$$
(83)

leading to

$$\begin{aligned} T(x,y)= \sum _{k=1}^{+\infty } \sigma _k f_k(x) e_k(y). \end{aligned}$$
(84)

The following result is standard.

Theorem 3

For a given m, a set of functions \((h_k)_{1\le k\le m }\) and \((w_k)_{1\le k\le m }\) that minimizes \(\Vert H_m - H\Vert _{HS}\) is given by:

$$\begin{aligned} h_k = \sigma _k f_k \quad \text {and} \quad w_k = e_k. \end{aligned}$$
(85)

Moreover, if \(T(x,\cdot )\) satisfies Assumptions 1 and 2, we get:

$$\begin{aligned} \Vert H_m - H\Vert _{HS} = O\left( \sqrt{\kappa } m^{-s} \right) . \end{aligned}$$
(86)

Proof

The optimality of the choice (85) is standard: It follows from the Schmidt decomposition of Lemma 2. Since \(T_m\) is then the best rank-m approximation of T, its error cannot exceed bound (41), yielding (86). \(\square \)

Theorem 4

For all \(\epsilon > 0\) and \(m<n\), there exists an operator H with TVIR satisfying Assumptions 1 and 2 such that:

$$\begin{aligned} \Vert H_m - H\Vert _{HS} \ge C \sqrt{\kappa } m^{-(s+\epsilon )}. \end{aligned}$$
(87)

Proof

In order to prove (87), we construct a “worst case” TVIR T. We first construct a kernel T with \(\kappa =1\) to exhibit a simple pathological TVIR. Define T by:

$$\begin{aligned} T(x,y) = \sum _{k \in {{\mathbb {Z}}}} \sigma _k f_k(x) f_k(y), \end{aligned}$$
(88)

where \(f_k(x) = \exp (2i\pi k x)\) is the k-th Fourier atom, \(\sigma _0=0\) and \(\sigma _k = \sigma _{-k} = \frac{1}{|k|^{s+1/2+\epsilon /2}}\) for \(|k|\ge 1\). With this choice,

$$\begin{aligned} T(x,y) = \sum _{k \ge 1} 2 \sigma _k \cos (2\pi k (x+y)) \end{aligned}$$
(89)

is real for all (x, y). We now prove that \(T\in {\mathcal {T}}^s\). The k-th Fourier coefficient of \(T(x,\cdot )\) is given by \(\sigma _k f_k(x)\), which is bounded in modulus by \(\sigma _k\) for all x. By Lemma 1, \(T(x,\cdot )\) therefore belongs to \(H^{s}(\varOmega )\) for all \(x\in \varOmega \). By construction, the spectrum of T is \((|\sigma _k|)_{k\in {{\mathbb {N}}}}\); therefore, for any rank-\((2m+1)\) approximation of T, we get:

$$\begin{aligned} \Vert T-T_{2m+1}\Vert _{HS}^2&\ge \sum _{|k|\ge m+1} \frac{1}{|k|^{2s+1+\epsilon }} \end{aligned}$$
(90)
$$\begin{aligned}&\ge \int _{m+1}^\infty \frac{2}{t^{2s+1+\epsilon }} \,dt \end{aligned}$$
(91)
$$\begin{aligned}&= \frac{1}{2s+\epsilon } \frac{2}{(m+1)^{2s+\epsilon }} \end{aligned}$$
(92)
$$\begin{aligned}&\ge C m^{-2s-\epsilon }, \end{aligned}$$
(93)

proving the result for \(\kappa =1\). Notice that the kernel K of the operator with TVIR T only depends on x:

$$\begin{aligned} K(x,y) = \sum _{k \ge 1} 2 \sigma _k \cos (2\pi k x). \end{aligned}$$
(94)

Therefore, the worst-case TVIR exhibited here is that of a rank 1 operator H. Obviously, it cannot be well approximated by product-convolution expansions.

Let us now construct a TVIR satisfying Assumption 2. For this, we first construct an orthonormal basis \((\tilde{f}_k)_{k\in {{\mathbb {Z}}}}\) of \(L^2([-\kappa /2,\kappa /2])\) defined by:

$$\begin{aligned} {\tilde{f}}_k(x) = \left\{ \begin{array}{ll} \frac{1}{\sqrt{\kappa }} f_k\left( \frac{x}{\kappa }\right) &{} \text {if}\ |x|\le \frac{\kappa }{2}, \\ 0 &{} \text {otherwise}. \end{array}\right. \end{aligned}$$
(95)

The worst-case operator considered now is defined by:

$$\begin{aligned} T(x,y) = \sum _{k\in {{\mathbb {Z}}}} {\tilde{\sigma }}_k {\tilde{f}}_k(x) f_k(y). \end{aligned}$$
(96)

Its spectrum is \((|{\tilde{\sigma }}_k|)_{k\in {{\mathbb {Z}}}}\), and we get

$$\begin{aligned} |\langle T(x,\cdot ), f_k \rangle | = |{\tilde{\sigma }}_k {\tilde{f}}_k(x)| = \frac{1}{\sqrt{\kappa }} |{\tilde{\sigma }}_k|. \end{aligned}$$
(97)

By Lemma 1, if \({\tilde{\sigma }}_k = \frac{\kappa }{(1+|k|^2)^s|k|^{1+\epsilon }}\), then \(\Vert T(x,\cdot )\Vert _{H^s(\varOmega )}\) is uniformly bounded by a constant independent of \(\kappa \). Moreover, by reproducing the reasoning in (90), we get:

$$\begin{aligned} \Vert T-T_{2m+1}\Vert _{HS}^2 \ge C \kappa m^{-2s-\epsilon }. \end{aligned}$$
(98)

\(\square \)

Even if the SVD provides an optimal decomposition, there is no guarantee that the functions \(e_k\) are supported on an interval of small size. As an example, it suffices to consider the “worst case” TVIR given in Eq. (88). Therefore, the vectors \(\varvec{w}_k\) are generically supported on intervals of size \(p = n\). This yields the following corollary.

Corollary 6

Let \(\epsilon >0\) and set \(m = \lceil C \epsilon ^{-1/s} \kappa ^{1/2s}\rceil \). Then, \(H_m\) satisfies \(\Vert H - H_m \Vert _{HS} \le \epsilon \) and products with \(H_m\) and \(H_m^*\) can be evaluated with no more than \(O(\kappa ^{1/2s} n \log _2(n) \, \epsilon ^{-1/s})\) operations.

Computing the first m singular vectors in (84) can be achieved in roughly \(O(\kappa n^2\log (m))\) operations thanks to recent advances in randomized algorithms [25]. The storage cost for this approach is O(mn) since the vectors \(\varvec{e}_k\) have no reason to be compactly supported.
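A self-contained sketch of this construction is shown below (Python with NumPy; a basic randomized range finder in the spirit of [25] is used, and the oversampling parameter is our choice).

```python
import numpy as np

def svd_pc(T, m, oversample=10, seed=0):
    """Rank-m expansion (85) from a randomized SVD of the discretized TVIR:
    a random sketch captures the range of T, then a small SVD is computed
    in that subspace."""
    rng = np.random.default_rng(seed)
    # Range finder: project T onto m + oversample random directions.
    Q, _ = np.linalg.qr(T @ rng.standard_normal((T.shape[1], m + oversample)))
    U, s, Vt = np.linalg.svd(Q.T @ T, full_matrices=False)
    f = Q @ U[:, :m]              # left singular vectors f_k
    return (f * s[:m]).T, Vt[:m]  # filters h_k = sigma_k f_k, windows w_k = e_k
```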

5.2 The Optimization Approach in [17]

In [17], the authors propose to construct the windowing functions \(w_k\) and the filters \(h_k\) using constrained optimization procedures. For a fixed m, they propose solving:

$$\begin{aligned} \min _{(h_k,w_k)_{1\le k \le m}} \left\| T - \sum _{k=1}^m h_k\otimes w_k \right\| _{HS}^2 \end{aligned}$$
(99)

under the additional constraint that \(\mathrm {supp}(w_k)\subset \omega _k\), with \(\omega _k\) chosen so that \(\cup _{k=1}^m \omega _k =\varOmega \). A decomposition of type (99) is known as a structured low-rank approximation [9]. This problem is nonconvex, and to the best of our knowledge, there currently exists no algorithm running in a reasonable time to find its global minimizer. It can however be solved approximately using alternating minimization-type algorithms, as sketched below.
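The following is a possible alternating least-squares sketch (Python with NumPy; the function name, the mask encoding of the supports \(\omega _k\) and the fixed iteration count are our choices).

```python
import numpy as np

def structured_lowrank(T, masks, iters=30, seed=0):
    """Alternating least-squares sketch for (99). T is the (n_x, n_y)
    discretized TVIR; masks is an (m, n_y) boolean array whose k-th row
    encodes supp(w_k) = omega_k. Being nonconvex, the problem is only
    driven to a stationary point, as noted in the text."""
    m, ny = masks.shape
    rng = np.random.default_rng(seed)
    W = np.where(masks, rng.standard_normal(masks.shape), 0.0)
    for _ in range(iters):
        # h-step: unconstrained least squares over the filters.
        H = np.linalg.lstsq(W.T, T.T, rcond=None)[0].T
        # w-step: one small least-squares problem per column y_j,
        # restricted to the windows whose support contains y_j.
        for j in range(ny):
            a = masks[:, j]
            if a.any():
                W[a, j] = np.linalg.lstsq(H[:, a], T[:, j], rcond=None)[0]
    return H, W   # T ~ H @ W, i.e., T_m(x, y) = sum_k h_k(x) w_k(y)
```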

Depending on the choice of the supports \(\omega _k\), different convergence rates can be expected. However, by using the results for B-splines in Sect. 4.2, we obtain the following proposition.

Proposition 3

Set \(\omega _k=[(k-1)/m,k/m+s/m]\) and let \((h_k,w_k)_{1\le k \le m}\) denote the global minimizer of (99). Define \(T_m\) by \(T_m(x,y) = \sum _{k=1}^m h_k(x) w_k(y)\). Then:

$$\begin{aligned} \Vert T - T_m\Vert _{HS} \le C \sqrt{\kappa }m^{-s}. \end{aligned}$$
(100)

Set \(m=\lceil \kappa ^{1/2s} C\epsilon ^{-1/s}\rceil \), then \(\Vert H_m-H\Vert _{HS}\le \epsilon \) and the evaluation of a product with \(H_m\) or \(H_m^*\) is of order

$$\begin{aligned} O(\kappa ^{1 + 1/2s} n\log (n) \epsilon ^{-1/s}). \end{aligned}$$
(101)

Proof

First notice that cardinal B-splines are also supported on \([(k-1)/m,k/m+s/m]\). Since the method in [17] provides the best choices for \((h_k,w_k)\), the distance \(\Vert H_m-H\Vert _{HS}\) is necessarily smaller than that obtained using B-splines in Corollary 3. \(\square \)

Finally, let us mention that—owing to Corollary 5—it might be interesting to use the optimization approach (99) with windows of varying sizes.

Table 1 Summary of the properties of different constructions

6 Summary and Extensions

6.1 A Summary of All Results

Table 1 summarizes the results derived so far under Assumptions 1 and 2. In the particular case of Meyer's method, we assume that \(T\in H^{r,s}(\varOmega \times \varOmega )\) instead of Assumption 1. As can be seen in this table, different methods should be used depending on the application. The best methods are:

  • Wavelets: They are adaptive and have a relatively low construction complexity, and matrix-vector products also have the best complexity.

  • Meyer: This method has a big advantage in terms of storage. The operator can be represented very compactly with this approach. It has a good potential for problems where the operator should be inferred (e.g., blind deblurring). It however requires stronger regularity assumptions.

  • The SVD and the method proposed in [17] both share an optimal adaptivity. The representation however depends on the operator, and it is more costly to compute.

6.2 Extensions to Higher Dimensions

Most of the results provided in this paper are based on standard approximation results in 1D, such as Lemmas 1, 5 and 6. All these lemmas can be extended to higher dimensions, and we refer the interested reader to [18, 19, 31, 36] for more details.

We now assume that \(\varOmega =[0,1]^d\) and that the diameter of the impulse responses is bounded by \(\kappa \in [0,1]\). Using the mentioned results, it is straightforward to show that the approximation rate of all methods now becomes

$$\begin{aligned} \Vert H-H_m\Vert _{HS} = O(\kappa ^{d/2} m^{-s/d}). \end{aligned}$$
(102)

The space \(\varOmega \) can be discretized on a finite dimensional space of size \(n^d\). Similarly, all complexity results given in Table 1 are still valid by replacing n by \(n^d\), \(\epsilon ^{-1/s}\) by \(\epsilon ^{-d/s}\) and \(\kappa \) by \(\kappa ^d\).

6.3 Extensions to Least Regular Spaces

Until now, we assumed that the TVIR T belongs to Hilbert spaces (see, e.g., Assumption 1). This assumption was deliberately kept simple to clarify the presentation. The results can most likely be extended to much more general spaces using nonlinear approximation theory results [18].

For instance, assume that \(T\in BV(\varOmega \times \varOmega )\), the space of functions with bounded variations. Then, it is well known (see, e.g., [11]) that T can be expressed compactly on an orthonormal basis of tensor product wavelets. Therefore, the product-convolution expansion (4) could still be used, together with the trick proposed in (82).

Similarly, most of the kernels found in partial differential equations (e.g., Calderón–Zygmund operators) are singular at the origin. Once again, it is well known [32] that wavelets are able to capture the singularities, and the proposed methods can most likely be applied to this setting too.

A precise treatment of these settings requires more work, and we leave this issue open for the future.

6.4 Controls in Other Norms

Throughout the paper, we only controlled the Hilbert–Schmidt norm \(\Vert \cdot \Vert _{HS}\). This choice simplifies the analysis and also allows getting bounds for the spectral norm

$$\begin{aligned} \Vert H\Vert _{2\rightarrow 2} = \sup _{\Vert u\Vert _{L^2(\varOmega )}\le 1} \Vert Hu\Vert _{L^2(\varOmega )}, \end{aligned}$$
(103)

since \(\Vert H\Vert _{2\rightarrow 2} \le \Vert H\Vert _{HS}\). In applications, it often makes sense to consider other operator norms defined by

$$\begin{aligned} \Vert H\Vert _{X\rightarrow Y} = \sup _{\Vert u\Vert _{X}\le 1} \Vert Hu\Vert _{Y}, \end{aligned}$$
(104)

where \(\Vert \cdot \Vert _{X}\) and \(\Vert \cdot \Vert _{Y}\) are norms characterizing some function spaces. We showed in [20] that this idea could highly improve practical approximation results.

Unfortunately, it is not clear yet how to extend the proposed results and algorithms to such a setting, and we leave this question open for the future. Let us simply mention that our previous experience suggests that this idea can significantly change the method's efficiency.

7 Conclusion

In this paper, we analyzed the approximation rates and numerical complexity of product-convolution expansions. This approach was shown to be efficient whenever the time- or space-varying impulse response of the operator is well approximated by a low-rank tensor. We showed that this situation occurs under mild regularity assumptions, making the approach relevant for a large class of applications. We also proposed a few original implementations of this method based on orthogonal wavelet decompositions and analyzed their respective advantages precisely. Finally, we suggested a few ideas to further improve the practical efficiency of the method.