Abstract
We describe a natural coisometry from the Hilbert space of all Hilbert-Schmidt operators on a separable reproducing kernel Hilbert space (RKHS) \(\mathcal {H}\) onto the RKHS \(\mathcal {G}\) associated with the squared-modulus of the reproducing kernel of \(\mathcal {H}\). Through this coisometry, trace-class integral operators defined by general measures and the reproducing kernel of \(\mathcal {H}\) are isometrically represented as potentials in \(\mathcal {G}\), and the quadrature approximation of these operators is equivalent to the approximation of integral functionals on \(\mathcal {G}\). We then discuss the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be regarded as a differentiable surrogate for the characterisation of low-rank approximations of integral operators.
1 Introduction
Integral operators with positive-semidefinite (PSD) kernels play a central role in the theory of reproducing kernel Hilbert spaces (RKHSs) and their applications; see for instance [4, 5, 19, 20, 26]. As an important instance, this class of operators encompasses the PSD matrices.
Under suitable conditions, an integral operator defined by a PSD kernel K and a measure \(\mu \) can be regarded as a Hilbert-Schmidt (HS) operator \(L_{\mu }\) on the RKHS \(\mathcal {H}\) associated with K; see e.g. [20,21,22]. Let \(\mathcal {G}\) be the RKHS for which the squared-modulus kernel \(|K|^{2}\) is reproducing. Following [10], when the integral of the diagonal of K with respect to the variation of \(\mu \) is finite, the HS operator \(L_{\mu }\) on \(\mathcal {H}\) can be isometrically represented as the Riesz representation \(g_{\mu }\in \mathcal {G}\) of the integral functional on \(\mathcal {G}\) defined by the measure \(\overline{\mu }\), the conjugate of \(\mu \). The operator \(L_{\mu }\) is in this case trace-class, and \(g_{\mu }\) is the potential, or kernel embedding, of the measure \(\mu \) in the RKHS \(\mathcal {G}\). In the Hilbert space \(\textrm{HS}(\mathcal {H})\) of all HS operators on \(\mathcal {H}\), the quadrature approximation of trace-class integral operators with kernel K is hence equivalent to the approximation of integral functionals on \(\mathcal {G}\). Considering another measure \(\nu \) and denoting by \(B_{\mathcal {G}}\) the closed unit ball of \(\mathcal {G}\), we more specifically have
so that the map \((\mu ,\nu )\mapsto \Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}\) corresponds to a generalised integral probability metric, or maximum mean discrepancy (see e.g. [2, 15, 16, 24, 27]).
We give an overall description of the framework surrounding such an isometric representation, and illustrate that it follows from the definition of a natural coisometry \(\Gamma \) from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {G}\); this coisometry maps self-adjoint operators to real-valued functions, and PSD operators to nonnegative functions (Sect. 2). Under adequate measurability conditions on K and assuming that the diagonal of K is integrable with respect to \(|\mu |\), we show that \(L_{\mu }\) always belongs to the initial space of \(\Gamma \), and that \(\Gamma [L_{\mu }]=g_{\mu }\). We then describe the equivalence between the quadrature approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels (Sect. 3).
For an approximate measure \(\nu \), and denoting by \(\mathcal {H}_{\nu }\) the closure in \(\mathcal {H}\) of the range of \(L_{|\nu |}\) (so that when \(\nu \) is finitely-supported, \(\mathcal {H}_{\nu }\) is fully characterised by the support of \(\nu \)), we next investigate the extent to which the approximation of potentials in \(\mathcal {G}\) can be used as a differentiable surrogate for the characterisation of approximations of \(L_{\mu }\) of the form \(P_{\nu }L_{\mu }\), \(L_{\mu }P_{\nu }\) or \(P_{\nu }L_{\mu }P_{\nu }\), with \(P_{\nu }\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{\nu }\) (Sect. 4). When the measure \(\mu \) is nonnegative, the operator \(L_{\mu }\) admits the decomposition \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\), with \(\iota _{\mu }^{}\) the natural embedding of \(\mathcal {H}\) in \(L^{2}(\mu )\). The three operators
can then also be regarded as integral operators defined by the kernel K and the measure \(\mu \), and through the partial embedding \(\iota _{\mu }^{}P_{\nu }\), a measure \(\nu \) characterises approximations of each of these operators. We study the properties of these approximations and further illustrate the connections between the low-rank approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels. We also describe the link between the considered framework and the low-rank approximation of PSD matrices through column sampling (Sect. 5). The presentation ends with a concluding discussion (Sect. 6) and some technical results are gathered in appendix (Appendix A). The approximation schemes considered in this note should be apprehended from the point of view of numerical strategies for discretisation or dimension reduction; in practical applications, approximations will generally be characterised by finitely-supported measures.
2 Framework, notations and basic properties
By default, all the Hilbert spaces considered in this note are complex; they are otherwise explicitly referred to as real Hilbert spaces; we use a similar convention for vector spaces. Inner products of complex Hilbert spaces are assumed to be linear with respect to their right argument. For \(z\in \mathbb {C}\), we denote by \(\overline{z}\), |z| and \(\Re (z)\) the conjugate, modulus and real part of z, respectively, and \(\textrm{i}\in \mathbb {C}\) is the imaginary unit. By analogy, for a complex-valued function f on a general set S, we denote by \(\overline{f}\) and |f| the functions defined as \(\overline{f}(s)=\overline{f(s)}\) and \(|f|(s)=|f(s)|\), \(s\in S\); we also use the notation \(|f|^{2}\) to refer to the function \(s\mapsto |f(s)|^{2}\).
For two Hilbert spaces H and F, we denote by \(A^{*}\) the adjoint of a bounded linear operator \(A:H\rightarrow F\). The map A is an isometry if \(A^{*}A={{\,\textrm{id}\,}}_{H}\), the identity operator on H, and A is a coisometry if \(A^{*}\) is an isometry (and so \(AA^{*}={{\,\textrm{id}\,}}_{F}\)). A coisometry A is a surjective partial isometry (that is, \(AA^{*}A=A\)), and \(A^{*}A\) is then the orthogonal projection from H onto the initial space \(\mathcal {I}(A)\) of A, with \(\mathcal {I}(A)\) the orthogonal complement in H of the nullspace of A. We denote by \({{\,\textrm{null}\,}}(A)\) the nullspace of A, and by \({{\,\textrm{range}\,}}(A)\) its range. Also, for a subset C of H, we denote by \(C^{\perp _{H}}\) the orthogonal complement of C in H, and by \(\overline{C}^{H}\) the closure of C in H.
2.1 RKHSs and Hilbert–Schmidt operators
Below, we introduce the various Hilbert spaces relevant to our study.
Underlying RKHS. Let \(\mathcal {H}\) be a separable RKHS of complex-valued functions on a general set \(\mathscr {X}\), with reproducing kernel \(K:\mathscr {X}\times \mathscr {X}\rightarrow \mathbb {C}\); see e.g. [1, 17]. For \(t\in \mathscr {X}\), let \(k_{t}\in \mathcal {H}\) be defined as \(k_{t}(x)=K(x,t)\), \(x\in \mathscr {X}\). For all \(h\in \mathcal {H}\), we have \(\langle {k_{t}\,|\,h}\rangle _{\mathcal {H}}=h(t)\), where \(\langle {.|.}\rangle _{\mathcal {H}}\) stands for the inner product of \(\mathcal {H}\) (this equality is often referred to as the reproducing property); we denote by \(\Vert {\cdot }\Vert _{\mathcal {H}}\) the norm of \(\mathcal {H}\), and we use a similar convention for the inner products and norms of all the Hilbert spaces encountered in this note.
Hilbert-Schmidt space. Let \(\textrm{HS}(\mathcal {H})\) be the Hilbert space of all HS operators on \(\mathcal {H}\); see e.g. [3, 7]. For \(T\in \textrm{HS}(\mathcal {H})\), we denote by \(T[h]\in \mathcal {H}\) the image of \(h\in \mathcal {H}\) through T, and by T[h](x) the value of the function T[h] at \(x\in \mathscr {X}\); we use similar notations for all function-valued operators. For a and \(b\in \mathcal {H}\), let \(T_{a,b}\in \textrm{HS}(\mathcal {H})\) be the rank-one operator given by
we also set \(S_{b}=T_{b,b}\).
Remark 2.1
An operator \(T\in \textrm{HS}(\mathcal {H})\) always admits a singular value decomposition (SVD) of the form \(T=\sum _{i\in \mathbb {I}}\sigma _{i}T_{u_{i},v_{i}}\), \(\mathbb {I}\subseteq \mathbb {N}\), where \(\{\sigma _{i}\}_{i\in \mathbb {I}}\in \ell ^{2}(\mathbb {I})\) is the set of all strictly-positive singular values of T, and where \(\{u_{i}\}_{i\in \mathbb {I}}\) and \(\{v_{i}\}_{i\in \mathbb {I}}\) are two orthonormal systems in \(\mathcal {H}\); the series converges in \(\textrm{HS}(\mathcal {H})\). \(\triangleleft \)
Remark 2.2
Let \(\mathcal {H}'\) be the continuous dual of \(\mathcal {H}\). For \(h\in \mathcal {H}\), let \(\xi _{h}\in \mathcal {H}'\) be the bounded linear functional such that \(\xi _{h}(f)=\langle {h\,|\,f}\rangle _{\mathcal {H}}\), \(f\in \mathcal {H}\). Endowed with the inner product \(\langle {\xi _{f}|\xi _{h}}\rangle _{\mathcal {H}'}=\langle {h\,|\,f}\rangle _{\mathcal {H}}\), the vector space \(\mathcal {H}'\) is a Hilbert space, and the Riesz map \(h\mapsto \xi _{h}\) is a bijective conjugate-linear isometry from \(\mathcal {H}\) to \(\mathcal {H}'\) (we may notice that \(\xi _{\alpha h}=\overline{\alpha }\xi _{h}\), \(\alpha \in \mathbb {C}\)). The linear map densely defined as \(T_{a,b}\mapsto a\otimes \xi _{b}\) (see Remark 2.1) is then a bijective isometry from the Hilbert space \(\textrm{HS}(\mathcal {H})\) to the tensor Hilbert space \(\mathcal {H}\otimes \mathcal {H}'\). \(\triangleleft \)
Conjugate RKHS. Let \(\overline{\mathcal {H}}\) be the RKHS of complex-valued functions on \(\mathscr {X}\) associated with the conjugate kernel \(\overline{K}\), with \(\overline{K}(x,t)=\overline{K(x,t)}\), x and \(t\in \mathscr {X}\). For all \(h\in \mathcal {H}\), we have \(\overline{h}\in \overline{\mathcal {H}}\) (that is, the function \(\overline{h}:x\mapsto \overline{h(x)}\) is a vector of \(\overline{\mathcal {H}}\)), and the map \(h\mapsto \overline{h}\) is a bijective conjugate-linear isometry from \(\mathcal {H}\) to \(\overline{\mathcal {H}}\). We have \(\overline{k_{t}}(x)=\overline{K(x,t)}=\overline{k_{t}(x)}\), and
We denote by \(\Psi \) the bijective linear isometry from \(\textrm{HS}(\mathcal {H})\) to the tensor Hilbert space \(\mathcal {H}\otimes \overline{\mathcal {H}}\), densely defined as \(\Psi (T_{a,b})=a\otimes \overline{b}\), a and \(b\in \mathcal {H}\).
Remark 2.3
Following Remark 2.2, the linear map \(\xi _{h}\mapsto \overline{h}\) is a bijective isometry from \(\mathcal {H}'\) to \(\overline{\mathcal {H}}\). Further, the linear map densely defined as \(a\otimes \xi _{b}\mapsto a\otimes \overline{b}\) is a bijective isometry from \(\mathcal {H}\otimes \mathcal {H}'\) to \(\mathcal {H}\otimes \overline{\mathcal {H}}\); the composition of this isometry with the bijective isometry from \(\textrm{HS}(\mathcal {H})\) to \(\mathcal {H}\otimes \mathcal {H}'\) discussed in Remark 2.2 yields the isometry \(\Psi :\textrm{HS}(\mathcal {H})\rightarrow \mathcal {H}\otimes \overline{\mathcal {H}}\); see the diagram (5). \(\triangleleft \)
Squared-kernel RKHS. The kernels K and \(\overline{K}\) being PSD, by the Schur-product theorem, so is the squared-modulus kernel \(|K|^{2}=K\overline{K}\), with
Let \(\mathcal {G}\) be the RKHS of complex-valued functions on \(\mathscr {X}\) for which \(|K|^{2}\) is reproducing (\(\mathcal {G}=\mathcal {H}\odot \overline{\mathcal {H}}\) is the product of the two RKHSs \(\mathcal {H}\) and \(\overline{\mathcal {H}}\); see e.g. [1, 17]).
Following [17, Chapter 5], we denote by \(C_{\Delta }:\mathcal {H}\otimes \overline{\mathcal {H}}\rightarrow \mathcal {G}\) the coisometry densely defined as \(C_{\Delta }(a\otimes \overline{b})=a\overline{b}\), a and \(b\in \mathcal {H}\), where \(a\overline{b}\in \mathcal {G}\) is the complex-valued function on \(\mathscr {X}\) given by
For \(\Upsilon \in \mathcal {H}\otimes \overline{\mathcal {H}}\), we more generally have
The initial space of \(C_{\Delta }\) is \(\mathcal {I}(C_{\Delta })=\overline{{{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{x}\otimes \overline{k_{x}} | x\in \mathscr {X}\}}^{\mathcal {H}\otimes \overline{\mathcal {H}}}\), the closure in \(\mathcal {H}\otimes \overline{\mathcal {H}}\) of the linear space spanned by the simple tensors \(k_{x}\otimes \overline{k_{x}}\), \(x\in \mathscr {X}\).
Remark 2.4
From (1), for all \(x\in \mathscr {X}\), we have \(C_{\Delta }^{*}[|k_{x}|^{2}]=k_{x}\otimes \overline{k_{x}}\). The linear space \({{\,\textrm{span}\,}}_{\mathbb {C}}\{|k_{x}|^{2} | x\in \mathscr {X}\}\) being dense in \(\mathcal {G}\) (see for instance [17, Chapter 2]), we have \(C_{\Delta }^{\,}C_{\Delta }^{*}={{\,\textrm{id}\,}}_{\mathcal {G}}\), so that \(C_{\Delta }^{*}\) is an isometry. \(\triangleleft \)
2.2 Natural coisometry from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {G}\)
We can now define a natural coisometry from the Hilbert space \(\textrm{HS}(\mathcal {H})\) of all HS operators on a RKHS \(\mathcal {H}\) onto the RKHS \(\mathcal {G}\) associated with the squared-modulus of the reproducing kernel of \(\mathcal {H}\). The terminology natural is used to emphasise that the considered construction does not depend on the choice of any specific basis.
Lemma 2.1
The linear map \(\Gamma =C_{\Delta }\Psi :\textrm{HS}(\mathcal {H})\rightarrow \mathcal {G}\) is a coisometry, and its initial space is
in addition, for all \(T\in \textrm{HS}(\mathcal {H})\), we have
Proof
The linear isometry \(\Psi \) being bijective, we have \(\Psi \Psi ^{*}={{\,\textrm{id}\,}}_{\mathcal {H}\otimes \overline{\mathcal {H}}}\), and so
By definition of \(C_{\Delta }\) and \(\Psi \), we have \(\Gamma ^{*}[|k_{x}|^{2}]=\Psi ^{*}[k_{x}\otimes \overline{k_{x}}]=S_{k_{x}}\), \(x\in \mathscr {X}\), so that (2) follows from the density of \({{\,\textrm{span}\,}}_{\mathbb {C}}\{|k_{x}|^{2} | x\in \mathscr {X}\}\) in \(\mathcal {G}\) (see Remark 2.4). The reproducing property in \(\mathcal {G}\) then gives
We next observe that
indeed, as \(T_{a,0}=0\), equality (4) trivially holds for \(b=0\), and for \(b\ne 0\), we have \(\langle {T_{a,b}|T}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {T_{a,b}[b]|T[b]}\rangle _{\mathcal {H}}/\Vert b\Vert _{\mathcal {H}}^{2}\), with \(T_{a,b}[b]=a\Vert b\Vert _{\mathcal {H}}^{2}\). Taking \({a=b={k_{x}}}\) in (4) gives \(\langle {S_{k_{x}}| T}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {k_{x}|T[k_{x}]}\rangle _{\mathcal {H}}=T[k_{x}](x)\), concluding the proof. \(\square \)
The following diagram summarises the construction of \(\Gamma \) (the \(\cong \) symbol refers to the two bijective linear isometries discussed in Remarks 2.2 and 2.3).
Through \(\Gamma \), the HS operators on \(\mathcal {H}\) belonging to \(\mathcal {I}(\Gamma )\) can be isometrically represented as functions in the RKHS \(\mathcal {G}\) associated with the squared-modulus kernel \(|K|^{2}\). In the framework of Remark 2.1, we may notice that if \(T=\sum _{i\in \mathbb {I}}\sigma _{i}T_{u_{i},v_{i}}\in \textrm{HS}(\mathcal {H})\), then \(\Gamma [T]=\sum _{i\in \mathbb {I}}\sigma _{i}u_{i}\overline{v_{i}}\).
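The identity \(\Gamma [T]=\sum _{i\in \mathbb {I}}\sigma _{i}u_{i}\overline{v_{i}}\) can be checked numerically in a finite-dimensional analogue. The sketch below is an illustration only (an assumption made for the example, not part of the construction): it takes the linear kernel \(K(x,t)=x^{\top }t\) on \(\mathbb {R}^{d}\), so that \(\mathcal {H}\cong \mathbb {R}^{d}\), \(k_{x}\cong x\), \(\textrm{HS}(\mathcal {H})\) is the space of \(d\times d\) matrices with the Frobenius inner product, and (3) reads \(\Gamma [T](x)=x^{\top }Tx\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
T = rng.standard_normal((d, d))   # an HS operator on H ≅ R^d (linear kernel)

# (3): Gamma[T](x) = <S_{k_x} | T>_HS = T[k_x](x); here this is x^T T x
def gamma_T(x):
    return x @ T @ x

# SVD expansion: Gamma[T] = sum_i sigma_i u_i v_i (real case, v-bar = v)
U, s, Vt = np.linalg.svd(T)
def gamma_T_svd(x):
    return sum(s[i] * (U[:, i] @ x) * (Vt[i] @ x) for i in range(d))

x = rng.standard_normal(d)
assert np.isclose(gamma_T(x), gamma_T_svd(x))
```

The agreement of the two evaluations reflects the fact that \(\Gamma \) acts term-by-term on the SVD of T.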
Lemma 2.2
The following assertions hold:
-
(i)
if \(T\in \textrm{HS}(\mathcal {H})\) is self-adjoint, then the function \(\Gamma [T]\) is real-valued;
-
(ii)
if \(T\in \textrm{HS}(\mathcal {H})\) is PSD, then the function \(\Gamma [T]\) is nonnegative;
-
(iii)
if \(T\in \textrm{HS}(\mathcal {H})\) is PSD and \(\Gamma [T]=0\), then \(T=0\); and
-
(iv)
if \(T\in \mathcal {I}(\Gamma )\), then \(T^{*}\in \mathcal {I}(\Gamma )\).
Proof
Assertions (i) and (ii) follow directly from (3). We assume that \(T\in \textrm{HS}(\mathcal {H})\) is PSD, and we consider a spectral expansion \(T=\sum _{j\in \mathbb {I}}\lambda _{j}S_{\varphi _{j}}\) of T, with \(\lambda _{j}\geqslant 0\), \(\varphi _{j}\in \mathcal {H}\) and \(\mathbb {I}\subseteq \mathbb {N}\); observing that \(\Gamma [S_{\varphi _{j}}]=|\varphi _{j}|^{2}\), \(j\in \mathbb {I}\), we obtain (iii). To prove assertion (iv), we first observe that if \(g\in \mathcal {G}\), then \(\overline{g}\in \mathcal {G}\) (that is, the function \(\overline{g}\) is a vector of \(\mathcal {G}\)); the map \(\Gamma \) is indeed surjective, and if \(g=\Gamma [T]\), \(T\in \textrm{HS}(\mathcal {H})\), then \(\overline{g}=\Gamma [T^{*}]\). By linearity, the real and imaginary parts of g are then also vectors of \(\mathcal {G}\), and so \(\Vert g\Vert _{\mathcal {G}}=\Vert \overline{g}\Vert _{\mathcal {G}}\) (see for instance [17, Chapter 5]; see also Remark 2.6). Since \(\Gamma \) is a partial isometry, for \(T\in \textrm{HS}(\mathcal {H})\), we have \(\Vert T\Vert _{\textrm{HS}(\mathcal {H})}\geqslant \Vert \Gamma [T]\Vert _{\mathcal {G}}\), with equality if and only if \(T\in \mathcal {I}(\Gamma )\); as \(\Vert T\Vert _{\textrm{HS}(\mathcal {H})}=\Vert T^{*}\Vert _{\textrm{HS}(\mathcal {H})}\), the result follows. \(\square \)
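Assertions (i) and (ii) admit a direct matrix analogue: with \(\Gamma [T](x)=\langle {k_{x}|T[k_{x}]}\rangle _{\mathcal {H}}\), self-adjointness of T forces the value to be real, and positive-semidefiniteness forces it to be nonnegative. The sketch below (a finite-dimensional illustration under the same linear-kernel identification as before, not from the paper) checks this for Hermitian and PSD matrices on \(\mathbb {C}^{d}\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

T_sa = A + A.conj().T       # self-adjoint operator
T_psd = A.conj().T @ A      # PSD operator

x = rng.standard_normal(d) + 1j * rng.standard_normal(d)

# (i) self-adjoint T => Gamma[T](x) = <k_x | T[k_x]> is real
val_sa = x.conj() @ T_sa @ x
assert abs(val_sa.imag) < 1e-12

# (ii) PSD T => Gamma[T](x) is nonnegative
val_psd = x.conj() @ T_psd @ x
assert abs(val_psd.imag) < 1e-12 and val_psd.real >= 0
```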
Remark 2.5
The diagram (5) is also well-defined when the involved Hilbert spaces are real. We in this case have \(\mathcal {I}(\Gamma )=\overline{{{\,\textrm{span}\,}}_{\mathbb {R}}\{S_{k_{x}} | x\in \mathscr {X}\}}^{\textrm{HS}(\mathcal {H})}\) and the operators in \(\mathcal {I}(\Gamma )\) are self-adjoint; also if \(T^{*}=-T\), then \(\Gamma [T]=0\). By comparison, in the complex case, if \(T^{*}=-T\), then the function \(\Gamma [T]\) is pure-imaginary. \(\triangleleft \)
Remark 2.6
The PSD kernel \(|K|^{2}\) being real-valued, it is the reproducing kernel of a real RKHS \(\mathcal {G}_{\mathbb {R}}\) of real-valued functions on \(\mathscr {X}\). The decomposition \({\mathcal {G}=\mathcal {G}_{\mathbb {R}}+\textrm{i}\mathcal {G}_{\mathbb {R}}}\) holds, and \(\mathcal {G}_{\mathbb {R}}\) is the real-linear subspace of all real-valued functions in \(\mathcal {G}\). This decomposition mirrors the decomposition \(\textrm{HS}(\mathcal {H})=\textrm{HS}_{\mathbb {R}}(\mathcal {H})+\textrm{i}\textrm{HS}_{\mathbb {R}}(\mathcal {H})\), with \(\textrm{HS}_{\mathbb {R}}(\mathcal {H})\subset \textrm{HS}(\mathcal {H})\) the real-linear subspace of all self-adjoint HS operators on \(\mathcal {H}\). Also, the real convex cone \(\textrm{HS}_{\mathbb {R}}^{+}(\mathcal {H})\subset \textrm{HS}_{\mathbb {R}}(\mathcal {H})\) of all PSD HS operators on \(\mathcal {H}\) is generating in \(\textrm{HS}_{\mathbb {R}}(\mathcal {H})\), and the real convex cone \(\mathcal {G}_{\mathbb {R}}^{+}\subset \mathcal {G}_{\mathbb {R}}\) of all nonnegative functions in \(\mathcal {G}_{\mathbb {R}}\) is generating in \(\mathcal {G}_{\mathbb {R}}\). \(\triangleleft \)
Remark 2.7
Let \(\mathcal {F}\) be another separable RKHS of complex-valued functions on \(\mathscr {X}\), with reproducing kernel \(J:\mathscr {X}\times \mathscr {X}\rightarrow \mathbb {C}\). We denote by \(\textrm{HS}(\mathcal {F}, \mathcal {H})\) the Hilbert space of all HS operators from \(\mathcal {F}\) to \(\mathcal {H}\), and let \(\mathcal {H}\odot \overline{\mathcal {F}}\) be the product of the RKHSs \(\mathcal {H}\) and \(\overline{\mathcal {F}}\), that is, the RKHS with kernel \(K\overline{J}\). Following (5), we can more generally define a natural coisometry from \(\textrm{HS}(\mathcal {F}, \mathcal {H})\) onto \(\mathcal {H}\odot \overline{\mathcal {F}}\). \(\triangleleft \)
3 Trace-class integral operators with PSD kernels
From Lemma 2.1, if \(T\in \textrm{HS}(\mathcal {H})\) is of the form \(T=\sum _{j=1}^{n}\omega _{j}S_{k_{s_{j}}}\), with \(n\in \mathbb {N}\), \(s_{j}\in \mathscr {X}\) and \(\omega _{j}\in \mathbb {C}\), then \(T\in \mathcal {I}(\Gamma )\). We in this case have
so that T can be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel K and the finitely-supported measure \(\sum _{j=1}^{n}\omega _{j}\delta _{s_{j}}\), with \(\delta _{x}\) the Dirac measure at \({x\in \mathscr {X}}\). We also have \(\Gamma [T]=\sum _{j=1}^{n}\omega _{j}|k_{s_{j}}|^{2}\), so that \(\langle {\Gamma [T]| g}\rangle _{\mathcal {G}}=\sum _{j=1}^{n}\overline{\omega _{j}}g(s_{j})\), \(g\in \mathcal {G}\), and \(\Gamma [T]\) is thus the Riesz representation of the integral functional on \(\mathcal {G}\) defined by the measure \(\sum _{j=1}^{n}\overline{\omega _{j}}\delta _{s_{j}}\). Under measurability conditions, this observation holds for all trace-class integral operators on \(\mathcal {H}\) defined by the reproducing kernel K of \(\mathcal {H}\) and general measures on \(\mathscr {X}\), as illustrated below.
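For a finitely-supported measure, the operator \(T=\sum _{j=1}^{n}\omega _{j}S_{k_{s_{j}}}\) and its image \(\Gamma [T]=\sum _{j=1}^{n}\omega _{j}|k_{s_{j}}|^{2}\) can be formed explicitly. The sketch below (again using the illustrative linear-kernel identification \(\mathcal {H}\cong \mathbb {R}^{d}\), \(S_{k_{s}}=ss^{\top }\), an assumption for the example only) checks that evaluating \(\Gamma [T]\) via (3) agrees with the explicit sum.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 3
S = rng.standard_normal((n, d))     # support points s_1, ..., s_n
w = rng.standard_normal(n)          # real weights omega_j

# L_mu = sum_j w_j S_{k_{s_j}}; with the linear kernel, S_{k_s} = s s^T
L = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))

# Gamma[L_mu] = sum_j w_j |k_{s_j}|^2, i.e. x -> sum_j w_j (s_j . x)^2
x = rng.standard_normal(d)
lhs = x @ L @ x                                        # (3): L_mu[k_x](x)
rhs = sum(w[j] * (S[j] @ x) ** 2 for j in range(n))    # explicit potential
assert np.isclose(lhs, rhs)
```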
3.1 Integral operators and kernel embedding of measures
Let \(\mathcal {A}\) be a \(\sigma \)-algebra of subsets of \(\mathscr {X}\). We consider the Borel \(\sigma \)-algebra of \(\mathbb {C}\), and make the following assumptions on K and the measurable space \((\mathscr {X},\mathcal {A})\):
-
A.1
for all \(t\in \mathscr {X}\), the function \(k_{t}:\mathscr {X}\rightarrow \mathbb {C}\) is measurable;
-
A.2
the diagonal of K is measurable.
We recall that \(K(t,t)=\Vert k_{t}\Vert _{\mathcal {H}}^{2} =\Vert S_{k_{t}}\Vert _{\textrm{HS}(\mathcal {H})}=\Vert |k_{t}|^{2} \Vert _{\mathcal {G}}, t\in \mathscr {X}\).
Remark 3.1
The RKHSs \(\mathcal {H}\) and \(\mathcal {G}\) being separable, A.1 ensures that all the functions in \(\mathcal {H}\) and \(\mathcal {G}\) are measurable; see for instance [25, Lemma 4.24]. Consequently, under A.1, the maps \(t\mapsto k_{t}\), \(t\mapsto |k_{t}|^{2}\) and \(t\mapsto S_{k_{t}}\), \(t\in \mathscr {X}\), are weakly-measurable, and since the Hilbert spaces \(\mathcal {H}\), \(\mathcal {G}\) and \(\textrm{HS}(\mathcal {H})\) are separable, by the Pettis measurability theorem, these maps are also strongly-measurable (see e.g. [8, 28]). \(\triangleleft \)
We denote by \(\mathcal {M}_{+}\), \(\mathcal {M}\), and \(\mathcal {M}_{\mathbb {C}}\) the set of all nonnegative, signed and complex measures on \((\mathscr {X},\mathcal {A})\), and we set \(\mathcal {M}_{\mathbb {F}}=\mathcal {M}\cup \mathcal {M}_{\mathbb {C}}\) (we have \(\mathcal {M}_{+}\subset \mathcal {M}\)). Noticing that \(K(t,t)\geqslant 0\), \(t\in \mathscr {X}\), from A.2, we define
We next introduce the sets \(\mathcal {T}_{+}(K)\), \(\mathcal {T}(K)\) and \(\mathcal {T}_{\mathbb {C}}(K)\) of all measures \(\mu \) in \(\mathcal {M}_{+}\), \(\mathcal {M}\), and \(\mathcal {M}_{\mathbb {C}}\) such that \(\tau _{\mu }\) is finite, respectively; the inclusion \(\mathcal {T}_{+}(K)\subset \mathcal {T}(K)\) holds, and we set \(\mathcal {T}_{\mathbb {F}}(K)=\mathcal {T}(K)\cup \mathcal {T}_{\mathbb {C}}(K)\).
Integral operators on \(\mathcal {H}\) with kernel K. By assumption, for \({\mu \in \mathcal {T}_{\mathbb {F}}(K)}\), the integral \(\int _{\mathscr {X}}\Vert S_{k_{t}}\Vert _{\textrm{HS}(\mathcal {H})}\textrm{d}|\mu |(t)=\tau _{\mu }\) is finite, and the map \(t\mapsto S_{k_{t}}\) is thus Bochner-integrable with respect to \(\mu \) (Bochner integrability criterion, see e.g. [8, 28]; see also Remark 3.1). We set
From (4), for \(h\in \mathcal {H}\) and \(x\in \mathscr {X}\), we have
so that \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) can be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel K and the measure \(\mu \).
Remark 3.2
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the operator \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) is the Riesz representation of the bounded linear functional \(Z_{\mu }:\textrm{HS}(\mathcal {H})\rightarrow \mathbb {C}\) given by
that is, \(Z_{\mu }(T)=\langle {L_{\mu }|T}\rangle _{\textrm{HS}(\mathcal {H})}\); from the CS inequality in \(\textrm{HS}(\mathcal {H})\), we in particular have \(|Z_{\mu }(T)| \leqslant \int _{\mathscr {X}}|\langle {S_{k_{t}}|T}\rangle _{\textrm{HS}(\mathcal {H})}|\textrm{d}|\mu |(t) \leqslant \Vert T\Vert _{\textrm{HS}(\mathcal {H})}\tau _{\mu }\).
By boundedness of the linear evaluation map \(T\mapsto T[h]\) from \(\textrm{HS}(\mathcal {H})\) to \(\mathcal {H}\), we obtain (see for instance [28, Chapter 5])
with, from the CS inequality in \(\mathcal {H}\), \(\int _{\mathscr {X}}\Vert k_{t}\Vert _{\mathcal {H}}|h(t)|\textrm{d}|\mu |(t)\leqslant \Vert h\Vert _{\mathcal {H}}\tau _{\mu }\). We also have
so that \(L_{\mu }[h]\) is the Riesz representation of the bounded linear functional \(\Theta _{h,\mu }\) on \(\mathcal {H}\), with \(\Theta _{h,\mu }:f\mapsto \int _{\mathscr {X}}\overline{h(t)}f(t)\textrm{d}\overline{\mu }(t)\), \(f\in \mathcal {H}\) (and \(|\Theta _{h,\mu }(f)|\leqslant \Vert h\Vert _{\mathcal {H}}\Vert f\Vert _{\mathcal {H}}\tau _{\mu }\)). \(\triangleleft \)
Lemma 3.1
For all \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the operator \(L_{\mu }\) is trace-class.
Proof
Let \(\{h_{i}\}_{i\in \mathbb {I}}\) be an orthonormal basis (ONB) of \(\mathcal {H}\), with \(\mathbb {I}\subseteq \mathbb {N}\). For all \(t\in \mathscr {X}\), we have \(k_{t}=\sum _{i\in \mathbb {I}}h_{i}\overline{h_{i}(t)}\), so that \(\{h_{i}(t)\}_{i\in \mathbb {I}}\in \ell ^{2}(\mathbb {I})\) and \(\sum _{i\in \mathbb {I}}|h_{i}(t)|^{2}=K(t,t)\); see e.g. [17, Chapter 2]. Let \(\{f_{i}\}_{i\in \mathbb {I}}\) be another ONB of \(\mathcal {H}\); from (6), and by monotone convergence and the CS inequality in \(\ell ^{2}(\mathbb {I})\), we obtain
so that \({{\,\textrm{trace}\,}}(|L_{\mu }|)\leqslant \tau _{\mu }\), with \(|L_{\mu }|=(L_{\mu }^{*}L_{\mu })^{1/2}\) the modulus of \(L_{\mu }\). \(\square \)
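The bound \({{\,\textrm{trace}\,}}(|L_{\mu }|)\leqslant \tau _{\mu }\) can be observed numerically for finitely-supported signed measures, for which \(\tau _{\mu }=\sum _{j}|\omega _{j}|K(s_{j},s_{j})\). The sketch below (illustrative linear-kernel analogue, where the trace norm is the sum of singular values and \(K(s,s)=\Vert s\Vert ^{2}\)) verifies the inequality.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 6, 4
S = rng.standard_normal((n, d))   # support points
w = rng.standard_normal(n)        # signed weights omega_j

L = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))

# trace(|L_mu|) = sum of singular values (trace/nuclear norm)
nuclear = np.linalg.svd(L, compute_uv=False).sum()
# tau_mu = integral of the diagonal K(t,t) against |mu|
tau = sum(abs(w[j]) * (S[j] @ S[j]) for j in range(n))
assert nuclear <= tau + 1e-10
```

The inequality is the subadditivity of the trace norm applied to the rank-one terms \(\omega _{j}S_{k_{s_{j}}}\).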
Kernel embedding of measures in \(\mathcal {G}\). By assumption again, for \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the integral \(\int _{\mathscr {X}}\Vert |k_{t}|^{2}\Vert _{\mathcal {G}}\textrm{d}|\mu |(t)\) is finite, and the map \(t\mapsto |k_{t}|^{2}\) is therefore Bochner-integrable with respect to \(\mu \). We set
We have \(\langle {g_{\mu }|g}\rangle _{\mathcal {G}} =\int _{\mathscr {X}}g(t)\textrm{d}\overline{\mu }(t)\), \(g\in \mathcal {G}\), so that \(g_{\mu }\) is the Riesz representation of the linear functional \(I_{\mu }:\mathcal {G}\rightarrow \mathbb {C}\), with \(I_{\mu }(g)=\int _{\mathscr {X}}g(t)\textrm{d}\overline{\mu }(t)\); we may observe that \(| I_{\mu }(g) |\leqslant \Vert g\Vert _{\mathcal {G}}\tau _{\mu }\), and that \(g_{\mu }(x)=\int _{\mathscr {X}}|K(x,t)|^{2}\textrm{d}\mu (t)\), \(x\in \mathscr {X}\). The vector \(g_{\mu }\) is referred to as the kernel embedding, or potential, of the measure \(\mu \) in the RKHS \(\mathcal {G}\); see for instance [6, 15, 24].
Theorem 3.1
For all \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(L_{\mu }\in \mathcal {I}(\Gamma )\) and \(\Gamma [L_{\mu }]=g_{\mu }\).
Proof
From Lemma 2.1 and by definition of \(L_{\mu }\) and \(g_{\mu }\), for all \(T\in \textrm{HS}(\mathcal {H})\), we have
so that \(\Gamma ^{*}[g_{\mu }]=L_{\mu }\). \(\square \)
Remark 3.3
Following Lemma 2.2, for a signed measure \(\mu \in \mathcal {T}(K)\), the function \(g_{\mu }\) is real-valued, and the operator \(L_{\mu }=\Gamma ^{*}[g_{\mu }]\) is self-adjoint. Also, for a nonnegative measure \(\mu \in \mathcal {T}_{+}(K)\), the function \(g_{\mu }\) is nonnegative, and the operator \(L_{\mu }\) is PSD. We may notice that \(L_{\delta _{x}}=S_{k_{x}}\), \(x\in \mathscr {X}\). \(\triangleleft \)
From Theorem 3.1, for \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), the following equalities hold:
relating the evaluation of inner products in \(\textrm{HS}(\mathcal {H})\) between trace-class integral operators with kernel K to the integration of potentials in \(\mathcal {G}\).
3.2 Quadrature approximation
Let \(B_{\mathcal {G}}=\{g\in \mathcal {G}| \Vert g\Vert _{\mathcal {G}} \leqslant 1 \}\) be the closed unit ball of \(\mathcal {G}\). We set
The map \(\mathfrak {M}_{\mathcal {G}}\) defines a pseudometric on \(\mathcal {T}_{\mathbb {F}}(K)\); for probability measures, such pseudometrics are referred to as integral probability metrics, or maximum mean discrepancies; see for instance [15, 16, 23, 24, 27].
The following Corollary 3.1 describes the equivalence between the quadrature approximation of trace-class integral operators with PSD kernels and the approximation of integral functionals on RKHSs with squared-modulus kernels.
Corollary 3.1
For all \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(\Vert L_{\mu } - L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})} =\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\).
Proof
From Theorem 3.1 and by linearity of \(\Gamma ^{*}\), we have \(\Gamma ^{*}[g_{\mu }-g_{\nu }]=L_{\mu }-L_{\nu }\). Since \(\Gamma ^{*}\) is an isometry, it follows that \(\Vert L_{\mu } - L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}=\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}\). The CS inequality in \(\mathcal {G}\) and the definition of \(g_{\mu }\) and \(g_{\nu }\) then give
We conclude by observing that for all \(g\in \mathcal {G}\), we have \(\overline{g}\in \mathcal {G}\) and \(\Vert \overline{g}\Vert _{\mathcal {G}}=\Vert g\Vert _{\mathcal {G}}\) (see the proof of Lemma 2.2), so that \(\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}=\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\). \(\square \)
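For finitely-supported measures, Corollary 3.1 reduces to an identity between a Frobenius-type distance of operators and a quadratic form in the Gram matrix of the squared-modulus kernel. The sketch below (illustrative linear-kernel analogue; the identification \(\textrm{HS}(\mathcal {H})\cong \) matrices with the Frobenius norm is an assumption for the example) checks the equality \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}=\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\).

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, n = 5, 3, 4
Xm, Xn = rng.standard_normal((m, d)), rng.standard_normal((n, d))
wm, wn = rng.standard_normal(m), rng.standard_normal(n)

# integral operators for mu = sum_i wm_i delta_{x_i}, nu = sum_j wn_j delta_{y_j}
Lmu = sum(wm[i] * np.outer(Xm[i], Xm[i]) for i in range(m))
Lnu = sum(wn[j] * np.outer(Xn[j], Xn[j]) for j in range(n))
hs_dist = np.linalg.norm(Lmu - Lnu)      # Hilbert-Schmidt (Frobenius) norm

# MMD in G: quadratic form of the signed coefficients in the |K|^2 Gram matrix
X = np.vstack([Xm, Xn])
c = np.concatenate([wm, -wn])
G2 = (X @ X.T) ** 2                      # |K(x_i, x_j)|^2 for K(x, t) = x . t
mmd = np.sqrt(c @ G2 @ c)
assert np.isclose(hs_dist, mmd)
```

Note that \(G2\) is PSD by the Schur-product theorem, so the quadratic form under the square root is nonnegative.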
3.3 Further properties
In this section and in anticipation of the forthcoming developments, we discuss some further properties verified by the integral operators considered in Theorem 3.1.
For \(\mu \in \mathcal {T}_{+}(K)\), let \(L^{2}(\mu )\) be the Hilbert space of all square-integrable functions with respect to \(\mu \). From the CS inequality in \(\mathcal {H}\), we have
so that the linear embedding \(\iota _{\mu }^{}:\mathcal {H}\rightarrow L^{2}(\mu )\), with \(\iota _{\mu }^{}[h]\) the equivalence class of all measurable functions \(\mu \)-almost everywhere equal to h, is bounded (see e.g. [26]).
Lemma 3.2
For all \(\mu \in \mathcal {T}_{+}(K)\), the map \(\iota _{\mu }^{}\) is HS and \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\).
Proof
Let \(\{h_{i}\}_{i\in \mathbb {I}}\) be an ONB of \(\mathcal {H}\), with \(\mathbb {I}\subseteq \mathbb {N}\). As \(K(t,t)=\sum _{i\in \mathbb {I}}\big |h_{i}(t)\big |^{2}\), \(t\in \mathscr {X}\) (see e.g. [17]), from (6) and by monotone convergence, we have
so that \(\iota _{\mu }^{}\) is HS. From (6), we also obtain
and so \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\). \(\square \)
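The factorisation \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\) is transparent for a finitely-supported nonnegative measure, where \(L^{2}(\mu )\) is a weighted \(\mathbb {C}^{n}\) and \(\iota _{\mu }^{}\) is the evaluation map at the support points. The sketch below (illustrative linear-kernel analogue, an assumption for the example) checks the factorisation and the resulting positive-semidefiniteness of \(L_{\mu }\).

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 5, 3
S = rng.standard_normal((n, d))
w = rng.random(n) + 0.1          # strictly positive weights (mu nonnegative)

# iota_mu : H -> L^2(mu), h |-> (h(s_1), ..., h(s_n)); here h(s_j) = s_j . h,
# so iota_mu is the matrix S, and its adjoint w.r.t. the mu-weighted inner
# product on R^n is S^T diag(w)
L_factored = S.T @ np.diag(w) @ S                               # iota* iota
L_direct = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))   # L_mu
assert np.allclose(L_factored, L_direct)

# mu nonnegative => L_mu is PSD
assert np.min(np.linalg.eigvalsh(L_direct)) >= -1e-12
```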
For \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we by definition have \(|\nu |\in \mathcal {T}_{+}(K)\) and \(\overline{\nu }\in \mathcal {T}_{\mathbb {F}}(K)\); from (6), we also have \(L_{\nu }^{*}=L_{\overline{\nu }}\). The following relation (Lemma 3.3) holds between the range of \(L_{\nu }\) and the range of \(L_{|\nu |}\).
Lemma 3.3
For all \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(\overline{{{\,\textrm{range}\,}}(L_{\nu })}^{\mathcal {H}}\subseteq \overline{{{\,\textrm{range}\,}}(L_{|\nu |})}^{\mathcal {H}}\).
Proof
From (6) and the CS inequality in \(L^{2}(|\nu |)\), we obtain
where (8) ensures that the embedding \(\iota _{|\nu |}^{}\) is well-defined. From Lemma 3.2, we have \({{\,\textrm{null}\,}}(L_{\mu })=\{h\in \mathcal {H}|\iota _{\mu }[h]=0\}\); inequality (9) then entails \({{\,\textrm{null}\,}}(L_{|\nu |})\subseteq {{\,\textrm{null}\,}}(L_{\overline{\nu }})\), and so \({{\,\textrm{null}\,}}(L_{\overline{\nu }})^{\perp _{\mathcal {H}}}\subseteq {{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}\). Recalling that \({{\,\textrm{null}\,}}(T^{*})={{\,\textrm{range}\,}}(T)^{\perp _{\mathcal {H}}}\), \(T\in \textrm{HS}(\mathcal {H})\), we conclude by noticing that \(L_{\overline{\nu }}^{*}=L^{\,}_{\nu }\) and \(L_{|\nu |}^{*}=L^{\,}_{|\nu |}\). \(\square \)
Lemma 3.4 illustrates that when the measure \(\nu \) is finitely-supported, the range of \(L_{|\nu |}\) is fully characterised by the support of \(\nu \).
Lemma 3.4
For \(\nu =\sum _{i=1}^{n}\upsilon _{i}\delta _{s_{i}}\), with \(n\in \mathbb {N}\), \(\upsilon _{i}\in \mathbb {C}\), \(\upsilon _{i}\ne 0\), and \(s_{i}\in \mathscr {X}\), we have \({{\,\textrm{range}\,}}(L_{|\nu |})={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{s_{1}}, \ldots , k_{s_{n}}\}\).
Proof
We have \(|\nu |=\sum _{i=1}^{n}|\upsilon _{i}|\delta _{s_{i}}\in \mathcal {T}_{+}(K)\), and \(L_{|\nu |}\) is PSD. From Lemma 3.2, we obtain that \({{{\,\textrm{null}\,}}(L_{|\nu |}) =\{h\in \mathcal {H}| \iota _{|\nu |}^{}[h]=0\} =\textstyle {\bigcap _{i=1}^{n}}\{h\in \mathcal {H}| \langle {k_{s_{i}}| h}\rangle _{\mathcal {H}}=0\}}\), and so \({{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{s_{1}}, \ldots , k_{s_{n}}\}\). Since \(L_{|\nu |}\) is self-adjoint, its range is \({{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}\), and the result follows. \(\square \)
4 Measures and projection-based approximations
In this section, we illustrate the extent to which the approximation of potentials in \(\mathcal {G}\) can be used as a surrogate for the characterisation of closed linear subspaces of \(\mathcal {H}\) for the approximation of \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) through projections (see Remark 4.1).
4.1 Additional notations and general properties
For a closed linear subspace \(\mathcal {H}_{S}\) of \(\mathcal {H}\), we denote by \(P_{S}\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{S}\). Endowed with the Hilbert structure of \(\mathcal {H}\), the vector space \(\mathcal {H}_{S}\) is a RKHS, and its reproducing kernel \(K_{S}\) verifies \(K_{S}(x,t)=P_{S}[k_{t}](x)\), x and \(t\in \mathscr {X}\).
Remark 4.1
The linear map \(T\mapsto P_{S}T\) is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {R}(\mathcal {H}_{S})=\{T\in \textrm{HS}(\mathcal {H}) | {{\,\textrm{range}\,}}(T)\subseteq \mathcal {H}_{S}\}\), the closed linear subspace of \(\textrm{HS}(\mathcal {H})\) of all operators with range included in \(\mathcal {H}_{S}\). Also, the linear map \(T\mapsto TP_{S}\) is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \({\mathcal {Z}(\mathcal {H}_{S})=\{T\in \textrm{HS}(\mathcal {H}) | {{\,\textrm{range}\,}}(T^{*})\subseteq \mathcal {H}_{S}\}}\). The two orthogonal projections \(T\mapsto P_{S}T\) and \(T\mapsto TP_{S}\) commute, and their composition, that is, the linear map \(T\mapsto P_{S}TP_{S}\), is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {R}(\mathcal {H}_{S})\cap \mathcal {Z}(\mathcal {H}_{S})\). As \((P_{S}T)^{*}=T^{*}P_{S}\), the orthogonal projections onto \(\mathcal {R}(\mathcal {H}_{S})\) and \(\mathcal {Z}(\mathcal {H}_{S})\) are intrinsically related; for this reason, in what follows, we mainly focus on approximations of the form \(P_{S}T\) and \(P_{S}TP_{S}\). By orthogonality, for all \(T\in \textrm{HS}(\mathcal {H})\), we have
with \(\Vert P_{S}TP_{S}\Vert _{\textrm{HS}(\mathcal {H})} \leqslant \Vert P_{S}T\Vert _{\textrm{HS}(\mathcal {H})} \leqslant \Vert T\Vert _{\textrm{HS}(\mathcal {H})}\); in particular, if T is self-adjoint, then so is \(P_{S}TP_{S}\). \(\triangleleft \)
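The projection structure described in Remark 4.1 can be checked directly in a finite-dimensional sketch. The following Python snippet is an illustration only, assuming \(\mathcal {H}\simeq \mathbb {C}^{d}\), so that \(\textrm{HS}(\mathcal {H})\) reduces to matrices with the Frobenius inner product; it verifies the orthogonality of the residual \(T-P_{S}T\), the commutation of the two projections, and the norm ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Orthogonal projection P_S onto a random 2-dimensional subspace of C^d.
Q, _ = np.linalg.qr(rng.standard_normal((d, 2)) + 1j * rng.standard_normal((d, 2)))
P_S = Q @ Q.conj().T

T = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Tt = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def hs(A, B):
    """HS (Frobenius) inner product <A|B> = tr(A* B)."""
    return np.trace(A.conj().T @ B)

# T -> P_S T projects orthogonally onto operators with range in H_S:
# the residual T - P_S T is orthogonal to any P_S Tt.
assert abs(hs(T - P_S @ T, P_S @ Tt)) < 1e-10

# The maps T -> P_S T and T -> T P_S commute.
assert np.allclose(P_S @ (T @ P_S), (P_S @ T) @ P_S)

# Norm ordering of Remark 4.1: ||P_S T P_S|| <= ||P_S T|| <= ||T||.
assert np.linalg.norm(P_S @ T @ P_S) <= np.linalg.norm(P_S @ T) + 1e-12
assert np.linalg.norm(P_S @ T) <= np.linalg.norm(T) + 1e-12
```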
Lemma 4.1
Let \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) be two closed linear subspaces of \(\mathcal {H}\), with \(\mathcal {H}_{R}\subseteq \mathcal {H}_{S}\). For all \(T\in \textrm{HS}(\mathcal {H})\), we have
Proof
We denote by \(\mathcal {H}_{e}\) the orthogonal complement of \(\mathcal {H}_{R}\) in \(\mathcal {H}_{S}\). We then have \(P_{S}=P_{R}+P_{e}\) and \(\langle {P_{R}T| P_{e}\tilde{T}}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {TP_{R}|\tilde{T}P_{e}}\rangle _{\textrm{HS}(\mathcal {H})}=0\), T and \(\tilde{T}\in \textrm{HS}(\mathcal {H})\). We hence obtain \(\Vert P_{S}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}=\Vert P_{R}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}+\Vert P_{e}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}\), and
completing the proof. \(\square \)
By boundedness of \(P_{S}\), for \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(P_{S}L_{\mu }=\int _{\mathscr {X}}P_{S}S_{k_{t}}\textrm{d}\mu (t)\), and so
The operator \(P_{S}L_{\mu }\in \textrm{HS}(\mathcal {H})\) can thus be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel \(K_{S}\) and the measure \(\mu \). Since \(K_{S}(t,t)\leqslant K(t,t)\), \(t\in \mathscr {X}\), we may notice that \(\mathcal {T}_{\mathbb {F}}(K)\subseteq \mathcal {T}_{\mathbb {F}}(K_{S})\). We have
see (10), (11) and Lemma A.1 in Appendix A for a detailed computation (see also Remark 4.2 for an alternative computation involving \(\Gamma \)).
Remark 4.2
Let \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) be two closed linear subspaces of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(x\in \mathscr {X}\), we have
From Theorem 3.1 and the properties of \(\Gamma \), we then obtain
For general subspaces \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\), the operator \(P_{V}L_{\mu }P_{U}\) does not necessarily belong to \(\mathcal {I}(\Gamma )\); see Remark 4.3 for an example where this situation occurs. \(\triangleleft \)
4.2 Projections defined by measures
Motivated by Lemmas 3.3 and 4.1, for \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we set \(\mathcal {H}_{\nu }=\overline{{{\,\textrm{range}\,}}(L_{|\nu |})}^{\mathcal {H}}\), and we denote by \(P_{\nu }\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{\nu }\).
Lemma 4.2
For all \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(L_{\nu }=P_{\nu }L_{\nu }=L_{\nu }P_{\nu }=P_{\nu }L_{\nu }P_{\nu }\).
Proof
From Lemma 3.3, we have \(L_{\nu }=P_{\nu }L_{\nu }\) and \(L_{\overline{\nu }}=P_{\nu }L_{\overline{\nu }}\). We then obtain \(L_{\nu }=L_{\overline{\nu }}^{*}=(P_{\nu }L_{\overline{\nu }})^{*}=L_{\nu }P_{\nu }\), and so \(L_{\nu }=P_{\nu }L_{\nu }P_{\nu }\). \(\square \)
For an initial operator \(L_{\mu }\), with \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), an approximate measure \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\) thus defines, through the orthogonal projection \(P_{\nu }\) and in addition to \(L_{\nu }\), the approximations \(P_{\nu }L_{\mu }\), \(L_{\mu }P_{\nu }\) and \(P_{\nu }L_{\mu }P_{\nu }\) of \(L_{\mu }\).
Lemma 4.3
For all \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have
Proof
Using the notations of Remark 4.1, Lemma 4.2 reads \(L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\). Observing that \(L_{\mu }-P_{\nu }L_{\mu }\) is orthogonal to \(\mathcal {R}(\mathcal {H}_{\nu })\) and that \(P_{\nu }L_{\mu }-L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\), we obtain (14). In the same way, \(L_{\mu }-P_{\nu }L_{\mu }P_{\nu }\) is orthogonal to \(\mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\), and we have \(P_{\nu }L_{\mu }P_{\nu }-L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\), leading to (15). \(\square \)
4.3 Error maps on sets of measures
In the framework of Sects. 3.2 and 4.2, the characterisation of measures leading to accurate approximations of an initial operator \(L_{\mu }\), \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), relates to the minimisation of error maps measuring the accuracy of the approximations induced by a measure \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\).
Quadrature approximation. We define the error map \({D_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}}\), with
Lemma 4.4
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the map \(D_{\mu }\) is convex on any convex set \(\mathscr {C}\subseteq \mathcal {T}_{\mathbb {F}}(K)\). For \(\nu \) and \(\eta \in \mathscr {C}\), the directional derivative of \(D_{\mu }\) at \(\nu \) along \(\eta -\nu \) is
Proof
For \(\xi =(1-\rho )\nu +\rho \eta \), \(\nu \) and \(\eta \in \mathscr {C}\), \(\rho \in [0,1]\), we have \(g_{\xi }=(1-\rho )g_{\nu }+\rho g_{\eta }\); the convexity of \(D_{\mu }\) on \(\mathscr {C}\) then follows from the convexity of the map \(g\mapsto \Vert g_{\mu }-g\Vert _{\mathcal {G}}^{2}\) on \(\mathcal {G}\). Next, the expansion of the squared norm \(\Vert g_{\mu }-g_{\nu }-\rho (g_{\eta }-g_{\nu })\Vert _{\mathcal {G}}^{2}\) provides the expected expression for the directional derivatives of \(D_{\mu }\). \(\square \)
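For discrete measures, the map \(D_{\mu }\) and its directional derivatives are explicitly computable from the squared-modulus kernel. The sketch below, with an assumed Gaussian kernel standing in for K and real nonnegative weights for simplicity, evaluates \(D_{\mu }(\nu )=\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}^{2}\) and checks the directional-derivative expression of Lemma 4.4 against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

def K(x, y):  # assumed Gaussian PSD kernel on the real line
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

def S(x, y):  # squared-modulus kernel of G
    return np.abs(K(x, y)) ** 2

x, a = rng.standard_normal(6), rng.random(6)   # mu = sum_i a_i delta_{x_i}
s, b = rng.standard_normal(4), rng.random(4)   # nu = sum_j b_j delta_{s_j}
t, c = rng.standard_normal(4), rng.random(4)   # eta = sum_j c_j delta_{t_j}

def D(s_, b_):
    """D_mu(nu) = ||g_mu - g_nu||_G^2 for discrete measures with real weights."""
    return a @ S(x, x) @ a - 2 * a @ S(x, s_) @ b_ + b_ @ S(s_, s_) @ b_

assert D(s, b) >= 0            # squared norm
assert abs(D(x, a)) < 1e-10    # D_mu(mu) = 0

# Directional derivative of D_mu at nu along eta - nu (real weights):
#   2 <g_nu - g_mu | g_eta - g_nu>_G,
# checked against a finite difference along xi_rho = (1 - rho) nu + rho eta.
deriv = 2 * (b @ S(s, t) @ c - b @ S(s, s) @ b
             - a @ S(x, t) @ c + a @ S(x, s) @ b)
rho = 1e-6
xi_s, xi_w = np.concatenate([s, t]), np.concatenate([(1 - rho) * b, rho * c])
assert abs((D(xi_s, xi_w) - D(s, b)) / rho - deriv) < 1e-4
```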
Projection-based approximation. We denote by \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}\) the error maps defined as
we may notice that \(C^{\textrm{X}}_{\mu }(\nu )=C^{\textrm{X}}_{\mu }(|\nu |)\), \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\).
Theorem 4.1
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\), the map \(C^{\textrm{X}}_{\mu }\) is convex on the real convex cone \(\mathcal {T}_{+}(K)\), and for all \(\nu \) and \(\eta \in \mathcal {T}_{+}(K)\), we have
Proof
For \(\nu ,\eta \in \mathcal {T}_{+}(K)\) and \(\rho \in (0,1)\), we set \(\xi =\nu +\rho (\eta -\nu )\in \mathcal {T}_{+}(K)\). The three operators \(L_{\nu }\), \(L_{\eta }\) and \(L_{\xi }\) being PSD, independently of \(\rho \in (0,1)\), we have \({{\,\textrm{null}\,}}(L_{\xi })={{\,\textrm{null}\,}}(L_{\nu })\cap {{\,\textrm{null}\,}}(L_{\eta })\), and so \(\mathcal {H}_{\xi }=\overline{\mathcal {H}_{\nu }+\mathcal {H}_{\eta }}^{\mathcal {H}}\). The two maps
are therefore constant on the open interval (0, 1). From Lemma 4.1 and (10), noticing that \(\mathcal {H}_{\nu }\subseteq \mathcal {H}_{\xi }\) and \(\mathcal {H}_{\eta }\subseteq \mathcal {H}_{\xi }\), we obtain \(C^{\textrm{P}}_{\mu }(\xi )\leqslant C^{\textrm{P}}_{\mu }(\nu )\) and \(C^{\textrm{P}}_{\mu }(\xi )\leqslant C^{\textrm{P}}_{\mu }(\eta )\); from (11), we also get \(C^{\textrm{PP}}_{\mu }(\xi )\leqslant C^{\textrm{PP}}_{\mu }(\nu )\) and \(C^{\textrm{PP}}_{\mu }(\xi )\leqslant C^{\textrm{PP}}_{\mu }(\eta )\), concluding the proof. \(\square \)
In view of Theorem 4.1, the maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\) are akin to piecewise-constant functions. By contrast (see Lemma 4.4), the directional derivatives of the map \(D_{\mu }\) are informative, in the sense that the landscape of \(D_{\mu }\) can be explored through steepest descents. From Remark 4.1 and Lemma 4.3, we have
with \(C^{\textrm{P}}_{\mu }(\mu )=D_{\mu }(\mu )=0\) and \(C^{\textrm{P}}_{\mu }(0)=D_{\mu }(0)=\Vert g_{\mu }\Vert _{\mathcal {G}}^{2}\) (see also Remark 4.3). The quadrature-approximation error map \(D_{\mu }\) may hence be regarded as a differentiable relaxation of the projection-based-approximation error maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\); see Fig. 1 for an illustration.
Remark 4.3
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(s\in \mathscr {X}\), introducing \(c_{\delta _{s}}=g_{\mu }(s)/|K(s,s)|^{2}\) if \(K(s,s)>0\), and \(c_{\delta _{s}}=0\) otherwise, we have \(P_{\delta _{s}}L_{\mu }P_{\delta _{s}}=c_{\delta _{s}}S_{k_{s}}\). For \(K(s,s)=0\), we indeed have \(k_{s}=0\), and so \(P_{\delta _{s}}=0\), and for \(K(s,s)>0\),
We obtain \(P_{\delta _{s}}L_{\mu }P_{\delta _{s}}\in \mathcal {I}(\Gamma )\) and \(C^{\textrm{PP}}_{\mu }(\delta _{s})=D_{\mu }(c_{\delta _{s}}\delta _{s})\), \(s\in \mathscr {X}\). \(\triangleleft \)
Remark 4.4
From a numerical standpoint, in view of (12) and (13), for \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), the evaluation of \(C^{\textrm{P}}_{\mu }(\nu )\) or \(C^{\textrm{PP}}_{\mu }(\nu )\) requires a suitable characterisation of the reproducing kernel \(K_{\nu }\) of \(\mathcal {H}_{\nu }\) (or equivalently, of the orthogonal projection \(P_{\nu }\)); in practice, \(K_{\nu }\) is a priori unknown and needs to be computed from K and \(\nu \) (see Remark 4.5). In comparison and in view of (7), the error map \(D_{\mu }\) only involves the kernel K; the projection-free nature of \(D_{\mu }\) is of notable interest for numerical applications. \(\triangleleft \)
Remark 4.5
Following Lemma 3.4, for a measure \(\nu \) supported by \({\mathcal {S}=\{s_{1},\ldots ,s_{n}\}}\), \(n\in \mathbb {N}\), the reproducing kernel \(K_{\nu }\) of \(\mathcal {H}_{\nu }\) can be expressed as
where \(\varkappa _{i,j}\) is the i, j entry of the pseudoinverse (Moore-Penrose inverse) of the \({n\times n}\) kernel matrix with i, j entry \(K(s_{i},s_{j})\). The worst-case time complexity of the evaluation of \(K_{\nu }\) at \(M\in \mathbb {N}\) distinct locations in \(\mathscr {X}\times \mathscr {X}\) is thus \(\mathcal {O}(n^{3}+n^{2}M)\). The term \(\mathcal {O}(n^{3})\) is related to the pseudoinversion of the kernel matrix defined by K and \(\mathcal {S}\), while the term \(\mathcal {O}(n^{2}M)\) corresponds to the evaluation, from this pseudoinverse and the kernel K, of \(K_{\nu }\) at M different locations. \(\triangleleft \)
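The evaluation of \(K_{\nu }\) described in Remark 4.5 can be sketched numerically as follows (an assumed Gaussian kernel plays the role of K). The assertions check that \(K_{\nu }\) coincides with K on \(\mathcal {S}\), and that its diagonal is dominated by that of K, as expected for the reproducing kernel of a subspace.

```python
import numpy as np

rng = np.random.default_rng(2)

def K(x, y):  # assumed Gaussian PSD kernel on the real line
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

s = rng.standard_normal(5)         # support S = {s_1, ..., s_n} of nu
kappa = np.linalg.pinv(K(s, s))    # O(n^3) pseudoinversion of the kernel matrix

def K_nu(x, y):
    """Reproducing kernel of H_nu = span{k_{s_1}, ..., k_{s_n}}; O(n^2) per entry."""
    return K(x, s) @ kappa @ K(s, y)

x = rng.standard_normal(7)

# K_nu reproduces K on the support of nu: K_nu(s_i, s_j) = K(s_i, s_j).
assert np.allclose(K_nu(s, s), K(s, s), atol=1e-8)

# Diagonal domination K_nu(t, t) <= K(t, t) (projection of k_t onto H_nu).
assert np.all(np.diag(K_nu(x, x)) <= np.diag(K(x, x)) + 1e-10)
```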
5 Nonnegative measures and \(L^{2}\)-embeddings
Following Sect. 3.3, for \(\mu \in \mathcal {T}_{+}(K)\), the embedding \(\iota _{\mu }^{}:\mathcal {H}\rightarrow L^{2}(\mu )\) is HS. For \(f\in L^{2}(\mu )\) and \(x\in \mathscr {X}\), we have
so that in addition to \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\), the three operators
can also be regarded as integral operators defined by the kernel K and the nonnegative measure \(\mu \). These four interpretations are inherent to K, which characterises \(\mathcal {H}\), and \(\mu \), which characterises \(L^{2}(\mu )\); see for instance [4, 19, 20, 22, 26] for illustrations. In each case, the corresponding operator is HS, and we denote by \(\textrm{HS}(\mu ,\mathcal {H})\), \(\textrm{HS}(\mu )\) and \(\textrm{HS}(\mathcal {H},\mu )\) the Hilbert spaces of all HS operators from \(L^{2}(\mu )\) to \(\mathcal {H}\), on \(L^{2}(\mu )\), and from \(\mathcal {H}\) to \(L^{2}(\mu )\), respectively.
5.1 Partial \(L^{2}\)-embeddings
For a closed linear subspace \(\mathcal {H}_{S}\subseteq \mathcal {H}\), the embedding \(\iota _{\mu }^{}\) can be approximated by the partial embedding \(\iota _{\mu }^{}P_{S}\). For \(f\in L^{2}(\mu )\) and \(x\in \mathscr {X}\), we have
so that \(P_{S}\iota _{\mu }^{*}\) corresponds to an integral operator with kernel \(K_{S}\). In the decomposition \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\), substituting each \(\iota _{\mu }^{}\) with \(\iota _{\mu }^{}P_{S}\) gives the approximation \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) discussed in Sect. 4. For the operators defined in (17), a similar substitution yields the approximations
see also Remark 5.1. In what follows, we mainly focus on the approximations related to \(\textrm{HS}(\mu ,\mathcal {H})\) and \(\textrm{HS}(\mu )\); the case of \(\textrm{HS}(\mathcal {H},\mu )\) is more briefly discussed in Remark 5.2.
Remark 5.1
In addition to \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\), the approximation of \(\iota _{\mu }^{}\) by \(\iota _{\mu }^{}P_{S}\) gives rise to the approximations \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\) and \(\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) of \(\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\) (see Sect. 4). Similarly, for \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H},\mu )\) and in addition to \(\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\), the approximations \(\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\) and \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) of \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H},\mu )\) may be considered. In these approximations, the substitution of \(\iota _{\mu }^{}\) with \(\iota _{\mu }^{}P_{S}\) is not applied to every occurrence of \(\iota _{\mu }^{}\). \(\triangleleft \)
Lemma 5.1
Let \(\mathcal {H}_{S}\) be a closed linear subspace of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), we have
Proof
Let \(\mathcal {H}_{0S}\) be the orthogonal complement of \(\mathcal {H}_{S}\) in \(\mathcal {H}\); endowed with the Hilbert structure of \(\mathcal {H}\), \(\mathcal {H}_{0S}\) is a RKHS, and \(K_{0S}=K-K_{S}\). For an ONB \(\{h_{j}\}_{j\in \mathbb {I}}\) of \(\mathcal {H}\), \(\mathbb {I}\subseteq \mathbb {N}\), we have
since \(\sum _{j\in \mathbb {I}}\big |P_{0S}[h_{j}](t)\big |^{2}=K_{0S}(t,t)\), \(t\in \mathscr {X}\) (see e.g. [17]), equality (18) follows by monotone convergence. We also have
so that (19) follows from Lemma A.1 (we recall that \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\)). \(\square \)
Remark 5.2
We consider four closed linear subspaces \(\mathcal {H}_{R}, \mathcal {H}_{S}, \mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) of \(\mathcal {H}\), and let \(\{h_{j}\}_{j\in \mathbb {I}}\) be an ONB of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), we have
with, from the CS inequality in \(\ell ^{2}(\mathbb {I})\) and in \(\mathcal {H}\),
From (20) and Fubini’s theorem, we then for instance obtain
where we should observe that \(\langle {\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}| \iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S} }\rangle _{\textrm{HS}(\mathcal {H},\mu )} =\Vert \iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H},\mu )}^{2}\). \(\triangleleft \)
The following inequality (Lemma 5.2) holds between the approximations in \(\textrm{HS}(\mu )\) and \(\textrm{HS}(\mathcal {H})\) defined by a subspace \(\mathcal {H}_{S}\). We recall that \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )}=\Vert \iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}\), and that \(\Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}^{} \leqslant \Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\Vert _{\textrm{HS}(\mathcal {H})}^{}\) (see Remark 4.1).
Lemma 5.2
Let \(\mathcal {H}_{S}\) be a closed linear subspace of \(\mathcal {H}\). For all \(\mu \in \mathcal {T}_{+}(K)\), we have
Proof
We have \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} =\Vert P_{0S}L_{\mu }P_{0S}\Vert _{\textrm{HS}(\mathcal {H})}\), with \(P_{0S}={{\,\textrm{id}\,}}_{\mathcal {H}}-P_{S}\). The operators \(P_{0S}L_{\mu }P_{0S}\) and \(P_{0S}L_{\mu }P_{S}\) being orthogonal in \(\textrm{HS}(\mathcal {H})\), we obtain
and so \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} \leqslant \Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}\). \(\square \)
5.2 Trace and Frobenius error maps
Following Sect. 4.3 and considering subspaces of \(\mathcal {H}\) defined by measures, we introduce the error maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}\), with
The notations \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\) are motivated by the relation between these maps and the trace and Frobenius norms; see Sect. 5.3. As observed for \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\), we may notice that \(C^{\textrm{X}}_{\mu }(\nu )=C^{\textrm{X}}_{\mu }(|\nu |)\), \(\textrm{X}\in \{\textrm{tr},\textrm{F}\}\).
Theorem 5.1
For \(\mu \in \mathcal {T}_{+}(K)\), the statement of Theorem 4.1 also holds for the maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\); that is, these two maps are convex on the real convex cone \(\mathcal {T}_{+}(K)\), and their directional derivatives take values in the set \(\{-\infty ,0\}\).
Proof
We follow the same reasoning as in the proof of Theorem 4.1. For two measures \(\nu \) and \(\eta \in \mathcal {T}_{+}(K)\) and for \(\rho \in (0,1)\), we set \(\xi =\nu +\rho (\eta -\nu )\in \mathcal {T}_{+}(K)\). We then have \(\mathcal {H}_{\xi }=\overline{\mathcal {H}_{\nu }+\mathcal {H}_{\eta }}^{\mathcal {H}}\) independently of \(\rho \in (0,1)\). We conclude by combining the inclusions \(\mathcal {H}_{\nu }\subseteq \mathcal {H}_{\xi }\) and \(\mathcal {H}_{\eta }\subseteq \mathcal {H}_{\xi }\) with the inequalities provided in Lemma A.2 (Appendix A). \(\square \)
As illustrated by Lemma 5.1 and Theorem 5.1, and as already observed for the error maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\), the error maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\) are akin to piecewise-constant functions, and their evaluation requires a suitable characterisation of the kernel of subspaces of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), the error maps \(C^{\textrm{X}}_{\mu }\), \(\textrm{X}\in \{\textrm{tr},\textrm{F},\textrm{P},\textrm{PP}\}\), can be regarded as alternative ways to assess the accuracy of the approximation of \(\iota ^{}_{\mu }\) by \(\iota ^{}_{\mu }P_{\nu }\), \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\). From the relation between the error maps \(D_{\mu }\) and \(C^{\textrm{X}}_{\mu }\), \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\) (see Lemma 4.3), the approximation of potentials in \(\mathcal {G}\) can hence more generally be regarded as a differentiable and projection-free surrogate for the characterisation of accurate partial embeddings. From Lemma 5.2, we may notice that \(C^{\textrm{F}}_{\mu }(\nu )\leqslant C^{\textrm{P}}_{\mu }(\nu )\), \({\nu \in \mathcal {T}_{\mathbb {F}}(K)}\), extending the sequence of inequalities (16).
Remark 5.3
Let \(\nu \in \mathcal {T}_{\mathbb {C}}(K)\) be a complex measure with real and imaginary parts \(\nu _{{r}}\) and \(\nu _{{i}}\in \mathcal {T}(K)\). For \(\mu \in \mathcal {T}(K)\), the three operators \(L_{\mu }\), \(L_{\nu _{{r}}}\) and \(L_{\nu _{i}}\) are self-adjoint; we thus have \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}^{2}=\Vert L_{\mu }-L_{\nu _{{r}}}\Vert _{\textrm{HS}(\mathcal {H})}^{2}+\Vert L_{\nu _{i}}\Vert _{\textrm{HS}(\mathcal {H})}^{2}\), and so \(D_{\mu }(\nu _{{r}})\leqslant D_{\mu }(\nu )\). Hence, when \(L_{\mu }\) is self-adjoint, the search for an approximate measure \(\nu \) for the approximation of \(L_{\mu }\) by \(L_{\nu }\) can be restricted to \(\mathcal {T}(K)\). \(\triangleleft \)
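The identity underlying Remark 5.3 can be verified in the discrete setting, where \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}^{2}\) reduces to a quadratic form in the weights whose matrix is the entrywise squared modulus of the kernel matrix. The snippet below (a real PSD kernel matrix is assumed for illustration) checks that dropping the imaginary part of \(\nu \) can only reduce the error.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
Km = A @ A.T          # real PSD kernel matrix (assumed example)
Sm = Km * Km          # entrywise squared modulus |K|^2, kernel matrix of G

mu = rng.random(6)                                # real weights
nu = rng.random(6) + 1j * rng.standard_normal(6)  # complex weights

def D(nu_):
    """||L_mu - L_nu||_HS^2 = c* S c with c = mu - nu (discrete measures)."""
    c = mu - nu_
    return (c.conj() @ Sm @ c).real

# ||L_mu - L_nu||^2 = ||L_mu - L_{nu_r}||^2 + ||L_{nu_i}||^2, so D(nu_r) <= D(nu).
assert abs(D(nu) - (D(nu.real) + nu.imag @ Sm @ nu.imag)) < 1e-9
assert D(nu.real) <= D(nu) + 1e-12
```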
5.3 Column sampling for PSD-matrix approximation
Let \(\textbf{K}\) be an \(N\times N\) PSD matrix, with \(N\in \mathbb {N}\); we denote by [N] the set of all integers between 1 and N. For a subset \(I\subseteq [N]\), the Nyström approximation (see Footnote 2) of \(\textbf{K}\) induced by I is the \(N\times N\) PSD matrix
where \(\textbf{K}_{\bullet ,I}\) is the matrix defined by the columns of \(\textbf{K}\) with index in I, and where \((\textbf{K}_{I,I})^{\dagger }\) is the pseudoinverse of the principal submatrix of \(\textbf{K}\) defined by I (and \(\textbf{K}_{I,\bullet }\) consists of rows of \(\textbf{K}\)); see e.g. [9, 11, 14, 18].
For i and \(j\in [N]\), the i, j entry of \(\textbf{K}\) may be regarded as the value K(i, j) of a PSD kernel K defined on the discrete set \(\mathscr {X}=[N]\). The j-th column of \(\textbf{K}\) then corresponds to the function \(k_{j}\in \mathcal {H}\), \(j\in \mathscr {X}\), and the subset I defines the closed linear subspace \(\mathcal {H}_{I}={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{j}|j\in I\}\subseteq \mathcal {H}\); in particular, the i, j entry of \(\hat{\textbf{K}}(I)\) is \(K_{I}(i,j)\), with \(K_{I}\) the reproducing kernel of \(\mathcal {H}_{I}\) (see e.g. [17], and Remark 4.5).
Introducing \(\mu =\sum _{i=1}^{N}\delta _{i}\), the Hilbert space \(L^{2}(\mu )\) can be identified with the Euclidean space \(\mathbb {C}^{N}\); following Sect. 5.2, we then observe that
-
the trace norm \(\Vert \textbf{K}-\hat{\textbf{K}}(I)\Vert _{\textrm{tr}}\) corresponds to (18), and
-
the squared Frobenius norm \(\Vert \textbf{K}-\hat{\textbf{K}}(I)\Vert _{\textrm{F}}^{2}\) corresponds to (19).
The column-sampling problem for the Nyström approximation of a PSD matrix \(\textbf{K}\), that is, the search for a subset \(I\subseteq [N]\) leading to an accurate approximation \(\hat{\textbf{K}}(I)\) of \(\textbf{K}\), is thus a special instance of the general framework discussed in Sect. 5.1. In particular, the support of an approximate measure \(\nu \) on \(\mathscr {X}=[N]\) defines a subset of columns of \(\textbf{K}\), and the approximation of potentials in the RKHS \(\mathcal {G}\) may be used as a surrogate for the characterisation of such measures. In the discrete setting, \(\mathcal {G}\) corresponds to the RKHS defined by the \(N\times N\) PSD matrix \(\textbf{S}\) with i, j entry \(|\textbf{K}_{i,j}|^{2}\) (that is, \(\textbf{S}\) is the element-wise product of \(\textbf{K}\) and \(\overline{\textbf{K}}\), the conjugate of \(\textbf{K}\)).
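A minimal numerical sketch of this discrete setting (an assumed Gaussian kernel matrix serves as \(\textbf{K}\)): the Nyström approximation induced by I, the associated trace and Frobenius errors, and the PSD surrogate matrix \(\textbf{S}\).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 40
X = rng.standard_normal((N, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Km = np.exp(-0.5 * D2)  # N x N PSD kernel matrix (assumed Gaussian example)

def nystrom(Km, I):
    """Nystrom approximation K_hat(I) = K[:, I] (K[I, I])^+ K[I, :]."""
    C = Km[:, I]
    return C @ np.linalg.pinv(Km[np.ix_(I, I)]) @ C.T

I = [0, 5, 17, 23]
Kh = nystrom(Km, I)

err_tr = np.trace(Km - Kh)               # trace-norm error (K - K_hat(I) is PSD)
err_F = np.linalg.norm(Km - Kh, "fro")   # Frobenius-norm error

assert err_tr >= -1e-10
assert err_F <= err_tr + 1e-10           # ||A||_F <= tr(A) for PSD A
# K_hat(I) matches K on the principal submatrix indexed by I.
assert np.allclose(Kh[np.ix_(I, I)], Km[np.ix_(I, I)], atol=1e-8)

# The surrogate matrix S (entrywise |K|^2) is PSD, by the Schur product theorem.
Sm = np.abs(Km) ** 2
assert np.linalg.eigvalsh(Sm).min() >= -1e-10
```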
6 Concluding discussion
We described the overall framework surrounding the isometric representation of integral operators with PSD kernels as potentials, and illustrated the equivalence between the quadrature approximation of such integral operators and the approximation of integral functionals on RKHSs with squared-modulus kernels. Through subspaces defined by measures and partial \(L^{2}\)-embeddings, we also discussed the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be used as a differentiable surrogate for the characterisation of projection-based approximation of integral operators with PSD kernels.
The link between integral-operator approximation and potential approximation may be leveraged to design sampling strategies for low-rank approximation (where approximations are characterised by sparse finitely-supported measures). The direct minimisation of \(D_{\mu }\) under sparsity-inducing constraints is for instance considered in [10], while the possibility of locally optimising the support of approximate measures using particle-flow techniques is studied in [13]. Sequential approaches, where support points are added one-at-a-time on the basis of information provided by the directional derivatives of \(D_{\mu }\), are investigated in [12]. The present work aims at supporting this type of approach by strengthening its theoretical underpinning.
Notes
1. We only consider finite complex measures, while signed measures may not be finite.
2. In the machine-learning literature, Nyström approximation refers to the low-rank approximation of PSD matrices through column sampling; although related, this terminology should not be confused with the quadrature method for the approximation of integral equations.
References
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bach, F.: On the equivalence between kernel quadrature rules and random feature expansions. J. Mach. Learn. Res. 18(1), 714–751 (2017)
Conway, J.B.: A Course in Functional Analysis, vol. 96. Springer (2019)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.-X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press (2007)
Damelin, S.B., Hickernell, F.J., Ragozin, D.L., Zeng, X.: On energy, discrepancy and group invariant measures on measurable subsets of Euclidean space. J. Fourier Anal. Appl. 16, 813–839 (2010)
Davidson, K.R.: Nest Algebras: Triangular Forms for Operator Algebras on Hilbert Space, volume 191. Longman Scientific and Technical (1988)
Diestel, J., Uhl, J.J.: Vector Measures, vol. 15. American Mathematical Society (1977)
Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)
Gauthier, B., Suykens, J.: Optimal quadrature-sparsification for integral operator approximation. SIAM J. Sci. Comput. 40, A3636–A3674 (2018)
Gittens, A., Mahoney, M.W.: Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res. 17, 1–65 (2016)
Hutchings, M., Gauthier, B.: Energy-based sequential sampling for low-rank PSD-matrix approximation. Preprint https://hal.science/hal-04102664 (2023)
Hutchings, M., Gauthier, B.: Local optimisation of Nyström samples through stochastic gradient descent. Mach. Learn. Optim. Data Sci. LOD 2022, 123–140 (2023)
Kumar, S., Mohri, M., Talwalkar, A.: Sampling methods for the Nyström method. J. Mach. Learn. Res. 13, 981–1006 (2012)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10, 1–141 (2017)
Müller, A.: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
Paulsen, V.I., Raghupathi, M.: An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, Cambridge University Press (2016)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT press, Cambridge, MA (2006)
Rosasco, L., Belkin, M., De Vito, E.: On learning with integral operators. J. Mach. Learn. Res. 11, 905–934 (2010)
Santin, G., Schaback, R.: Approximation of eigenfunctions in kernel-based spaces. Adv. Comput. Math. 42, 973–993 (2016)
Smale, S., Zhou, D.-X.: Learning theory estimates via integral operators and their approximations. Constr. Approx. 26, 153–172 (2007)
Smale, S., Zhou, D.-X.: Geometry on probability spaces. Constr. Approx. 30, 311–323 (2009)
Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the empirical estimation of integral probability metrics. Electron. J. Stat. 6, 1550–1599 (2012)
Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.G.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer (2008)
Steinwart, I., Scovel, C.: Mercer’s theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx. 35, 363–417 (2012)
Steinwart, I., Ziegel, J.F.: Strictly proper kernel scores and characteristic kernels on compact spaces. Appl. Comput. Harmon. Anal. 51, 510–542 (2021)
Yosida, K.: Functional Analysis. Springer (2012)
Author information
Contributions
BG wrote the entirety of the manuscript.
Ethics declarations
Conflict of interest
The author received no financial support for the research, authorship, and/or publication of this article. The author declares he has no competing interests.
Appendix A: Technical results
Lemma A.1
Let \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) be two closed linear subspaces of \(\mathcal {H}\). For all \({\mu \in \mathcal {T}_{\mathbb {F}}(K)}\), we have
Proof
We consider an ONB \(\{h_{j}\}_{j\in \mathbb {I}}\) of \(\mathcal {H}\), \(\mathbb {I}\subseteq \mathbb {N}\). From (6), we have
As \(\sum _{j\in \mathbb {I}}P_{U}[h_{j}](t)\overline{P_{U}[h_{j}](x)}=K_{U}(t,x)\), x and \(t\in \mathscr {X}\) (see e.g. [17]), and since \(K_{U}(t,t)\leqslant K(t,t)\) and \(K_{V}(t,t)\leqslant K(t,t)\), from the CS inequality in \(\ell ^{2}(\mathbb {I})\) and in \(\mathcal {H}\), we obtain
the result then follows from (22) and Fubini’s theorem. \(\square \)
Lemma A.2
Let \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) be two closed linear subspaces of \(\mathcal {H}\); we assume that \({\mathcal {H}_{R}\subseteq \mathcal {H}_{S}}\). For all \(\mu \in \mathcal {T}_{+}(K)\), we have \(\Vert \iota _{\mu }^{*}-P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})} \leqslant \Vert \iota _{\mu }^{*}-P_{R}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})}\) and \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} \leqslant \Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{R}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )}\).
Proof
We denote by \(\mathcal {H}_{e}\) the orthogonal complement of \(\mathcal {H}_{R}\) in \(\mathcal {H}_{S}\). Noticing that \(P_{S}=P_{R}+P_{e}\) and that \(\langle {\iota _{\mu }^{*}-P_{R}\iota _{\mu }^{*}|P_{e}\iota _{\mu }^{*}}\rangle _{\textrm{HS}(\mu ,\mathcal {H})} =\Vert P_{e}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})}^{2}\), we obtain
Denoting by \(\mathcal {H}_{0S}\) and \(\mathcal {H}_{0R}\) the orthogonal complements of \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) in \(\mathcal {H}\), respectively, we have \(\mathcal {H}_{0S}\subseteq \mathcal {H}_{0R}\); Lemma 4.1 then gives
completing the proof. \(\square \)
Cite this article
Gauthier, B. Kernel embedding of measures and low-rank approximation of integral operators. Positivity 28, 29 (2024). https://doi.org/10.1007/s11117-024-01041-8
Keywords
- Reproducing kernel Hilbert spaces
- Integral operators
- Low-rank approximation
- Kernel embedding of measures
- Differentiable relaxation