Abstract
We describe a natural coisometry from the Hilbert space of all Hilbert-Schmidt operators on a separable reproducing kernel Hilbert space (RKHS) \(\mathcal {H}\) onto the RKHS \(\mathcal {G}\) associated with the squared-modulus of the reproducing kernel of \(\mathcal {H}\). Through this coisometry, trace-class integral operators defined by general measures and the reproducing kernel of \(\mathcal {H}\) are isometrically represented as potentials in \(\mathcal {G}\), and the quadrature approximation of these operators is equivalent to the approximation of integral functionals on \(\mathcal {G}\). We then discuss the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be regarded as a differentiable surrogate for the characterisation of low-rank approximations of integral operators.
1 Introduction
Integral operators with positive-semidefinite (PSD) kernels play a central role in the theory of reproducing kernel Hilbert spaces (RKHSs) and their applications; see for instance [4, 5, 19, 20, 26]. As an important instance, this class of operators encompasses the PSD matrices.
Under suitable conditions, an integral operator defined by a PSD kernel K and a measure \(\mu \) can be regarded as a Hilbert-Schmidt (HS) operator \(L_{\mu }\) on the RKHS \(\mathcal {H}\) associated with K; see e.g. [20,21,22]. Let \(\mathcal {G}\) be the RKHS for which the squared-modulus kernel \(|K|^{2}\) is reproducing. Following [10], when the integral of the diagonal of K with respect to the variation of \(\mu \) is finite, the HS operator \(L_{\mu }\) on \(\mathcal {H}\) can be isometrically represented as the Riesz representation \(g_{\mu }\in \mathcal {G}\) of the integral functional on \(\mathcal {G}\) defined by the measure \(\overline{\mu }\), the conjugate of \(\mu \). The operator \(L_{\mu }\) is in this case trace-class, and \(g_{\mu }\) is the potential, or kernel embedding, of the measure \(\mu \) in the RKHS \(\mathcal {G}\). In the Hilbert space \(\textrm{HS}(\mathcal {H})\) of all HS operators on \(\mathcal {H}\), the quadrature approximation of trace-class integral operators with kernel K is hence equivalent to the approximation of integral functionals on \(\mathcal {G}\). Considering another measure \(\nu \) and denoting by \(B_{\mathcal {G}}\) the closed unit ball of \(\mathcal {G}\), we more specifically have
so that the map \((\mu ,\nu )\mapsto \Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}\) corresponds to a generalised integral probability metric, or maximum mean discrepancy (see e.g. [2, 15, 16, 24, 27]).
We give an overall description of the framework surrounding such an isometric representation, and illustrate that it follows from the definition of a natural coisometry \(\Gamma \) from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {G}\); this coisometry maps self-adjoint operators to real-valued functions, and PSD operators to nonnegative functions (Sect. 2). Under adequate measurability conditions on K and assuming that the diagonal of K is integrable with respect to \(|\mu |\), we show that \(L_{\mu }\) always belongs to the initial space of \(\Gamma \), and that \(\Gamma [L_{\mu }]=g_{\mu }\). We then describe the equivalence between the quadrature approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels (Sect. 3).
For an approximate measure \(\nu \), and denoting by \(\mathcal {H}_{\nu }\) the closure in \(\mathcal {H}\) of the range of \(L_{|\nu |}\) (so that when \(\nu \) is finitely-supported, \(\mathcal {H}_{\nu }\) is fully characterised by the support of \(\nu \)), we next investigate the extent to which the approximation of potentials in \(\mathcal {G}\) can be used as a differentiable surrogate for the characterisation of approximations of \(L_{\mu }\) of the form \(P_{\nu }L_{\mu }\), \(L_{\mu }P_{\nu }\) or \(P_{\nu }L_{\mu }P_{\nu }\), with \(P_{\nu }\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{\nu }\) (Sect. 4). When the measure \(\mu \) is nonnegative, the operator \(L_{\mu }\) admits the decomposition \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\), with \(\iota _{\mu }^{}\) the natural embedding of \(\mathcal {H}\) in \(L^{2}(\mu )\). The three operators
can then also be regarded as integral operators defined by the kernel K and the measure \(\mu \), and through the partial embedding \(\iota _{\mu }^{}P_{\nu }\), a measure \(\nu \) characterises approximations of each of these operators. We study the properties of these approximations and further illustrate the connections between the low-rank approximation of integral operators with PSD kernels and the approximation of potentials in RKHSs with squared-modulus kernels. We also describe the link between the considered framework and the low-rank approximation of PSD matrices through column sampling (Sect. 5). The presentation ends with a concluding discussion (Sect. 6) and some technical results are gathered in appendix (Appendix A). The approximation schemes considered in this note should be apprehended from the point of view of numerical strategies for discretisation or dimension reduction; in practical applications, approximations will generally be characterised by finitely-supported measures.
2 Framework, notations and basic properties
By default, all the Hilbert spaces considered in this note are complex; they are otherwise explicitly referred to as real Hilbert spaces; we use a similar convention for vector spaces. Inner products of complex Hilbert spaces are assumed to be linear with respect to their right argument. For \(z\in \mathbb {C}\), we denote by \(\overline{z}\), |z| and \(\Re (z)\) the conjugate, modulus and real part of z, respectively, and \(\textrm{i}\in \mathbb {C}\) is the imaginary unit. By analogy, for a complex-valued function f on a general set S, we denote by \(\overline{f}\) and |f| the functions defined as \(\overline{f}(s)=\overline{f(s)}\) and \(|f|(s)=|f(s)|\), \(s\in S\); we also use the notation \(|f|^{2}\) to refer to the function \(s\mapsto |f(s)|^{2}\).
For two Hilbert spaces H and F, we denote by \(A^{*}\) the adjoint of a bounded linear operator \(A:H\rightarrow F\). The map A is an isometry if \(A^{*}A={{\,\textrm{id}\,}}_{H}\), the identity operator on H, and A is a coisometry if \(A^{*}\) is an isometry (and so \(AA^{*}={{\,\textrm{id}\,}}_{F}\)). A coisometry A is a surjective partial isometry (that is, \(AA^{*}A=A\)), and \(A^{*}A\) is then the orthogonal projection from H onto the initial space \(\mathcal {I}(A)\) of A, with \(\mathcal {I}(A)\) the orthogonal complement in H of the nullspace of A. We denote by \({{\,\textrm{null}\,}}(A)\) the nullspace of A, and by \({{\,\textrm{range}\,}}(A)\) its range. Also, for a subset C of H, we denote by \(C^{\perp _{H}}\) the orthogonal complement of C in H, and by \(\overline{C}^{H}\) the closure of C in H.
2.1 RKHSs and Hilbert–Schmidt operators
Below, we introduce the various Hilbert spaces relevant to our study.
Underlying RKHS. Let \(\mathcal {H}\) be a separable RKHS of complex-valued functions on a general set \(\mathscr {X}\), with reproducing kernel \(K:\mathscr {X}\times \mathscr {X}\rightarrow \mathbb {C}\); see e.g. [1, 17]. For \(t\in \mathscr {X}\), let \(k_{t}\in \mathcal {H}\) be defined as \(k_{t}(x)=K(x,t)\), \(x\in \mathscr {X}\). For all \(h\in \mathcal {H}\), we have \(\langle {k_{t}\,|\,h}\rangle _{\mathcal {H}}=h(t)\), where \(\langle {.|.}\rangle _{\mathcal {H}}\) stands for the inner product of \(\mathcal {H}\) (this equality is often referred to as the reproducing property); we denote by \(\Vert {\cdot }\Vert _{\mathcal {H}}\) the norm of \(\mathcal {H}\), and we use a similar convention for the inner products and norms of all the Hilbert spaces encountered in this note.
Hilbert-Schmidt space. Let \(\textrm{HS}(\mathcal {H})\) be the Hilbert space of all HS operators on \(\mathcal {H}\); see e.g. [3, 7]. For \(T\in \textrm{HS}(\mathcal {H})\), we denote by \(T[h]\in \mathcal {H}\) the image of \(h\in \mathcal {H}\) through T, and by T[h](x) the value of the function T[h] at \(x\in \mathscr {X}\); we use similar notations for all function-valued operators. For a and \(b\in \mathcal {H}\), let \(T_{a,b}\in \textrm{HS}(\mathcal {H})\) be the rank-one operator given by
we also set \(S_{b}=T_{b,b}\).
Remark 2.1
An operator \(T\in \textrm{HS}(\mathcal {H})\) always admits a singular value decomposition (SVD) of the form \(T=\sum _{i\in \mathbb {I}}\sigma _{i}T_{u_{i},v_{i}}\), \(\mathbb {I}\subseteq \mathbb {N}\), where \(\{\sigma _{i}\}_{i\in \mathbb {I}}\in \ell ^{2}(\mathbb {I})\) is the set of all strictly-positive singular values of T, and where \(\{u_{i}\}_{i\in \mathbb {I}}\) and \(\{v_{i}\}_{i\in \mathbb {I}}\) are two orthonormal systems in \(\mathcal {H}\); the series converges in \(\textrm{HS}(\mathcal {H})\). \(\triangleleft \)
Remark 2.2
Let \(\mathcal {H}'\) be the continuous dual of \(\mathcal {H}\). For \(h\in \mathcal {H}\), let \(\xi _{h}\in \mathcal {H}'\) be the bounded linear functional such that \(\xi _{h}(f)=\langle {h\,|\,f}\rangle _{\mathcal {H}}\), \(f\in \mathcal {H}\). Endowed with the inner product \(\langle {\xi _{f}|\xi _{h}}\rangle _{\mathcal {H}'}=\langle {h\,|\,f}\rangle _{\mathcal {H}}\), the vector space \(\mathcal {H}'\) is a Hilbert space, and the Riesz map \(h\mapsto \xi _{h}\) is a bijective conjugate-linear isometry from \(\mathcal {H}\) to \(\mathcal {H}'\) (we may notice that \(\xi _{\alpha h}=\overline{\alpha }\xi _{h}\), \(\alpha \in \mathbb {C}\)). The linear map densely defined as \(T_{a,b}\mapsto a\otimes \xi _{b}\) (see Remark 2.1) is then a bijective isometry from the Hilbert space \(\textrm{HS}(\mathcal {H})\) to the tensor Hilbert space \(\mathcal {H}\otimes \mathcal {H}'\). \(\triangleleft \)
Conjugate RKHS. Let \(\overline{\mathcal {H}}\) be the RKHS of complex-valued functions on \(\mathscr {X}\) associated with the conjugate kernel \(\overline{K}\), with \(\overline{K}(x,t)=\overline{K(x,t)}\), x and \(t\in \mathscr {X}\). For all \(h\in \mathcal {H}\), we have \(\overline{h}\in \overline{\mathcal {H}}\) (that is, the function \(\overline{h}:x\mapsto \overline{h(x)}\) is a vector of \(\overline{\mathcal {H}}\)), and the map \(h\mapsto \overline{h}\) is a bijective conjugate-linear isometry from \(\mathcal {H}\) to \(\overline{\mathcal {H}}\). We have \(\overline{k_{t}}(x)=\overline{K(x,t)}=\overline{k_{t}(x)}\), and
We denote by \(\Psi \) the bijective linear isometry from \(\textrm{HS}(\mathcal {H})\) to the tensor Hilbert space \(\mathcal {H}\otimes \overline{\mathcal {H}}\), densely defined as \(\Psi (T_{a,b})=a\otimes \overline{b}\), a and \(b\in \mathcal {H}\).
Remark 2.3
Following Remark 2.2, the linear map \(\xi _{h}\mapsto \overline{h}\) is a bijective isometry from \(\mathcal {H}'\) to \(\overline{\mathcal {H}}\). Further, the linear map densely defined as \(a\otimes \xi _{b}\mapsto a\otimes \overline{b}\) is a bijective isometry from \(\mathcal {H}\otimes \mathcal {H}'\) to \(\mathcal {H}\otimes \overline{\mathcal {H}}\); the composition of this isometry with the bijective isometry from \(\textrm{HS}(\mathcal {H})\) to \(\mathcal {H}\otimes \mathcal {H}'\) discussed in Remark 2.2 yields the isometry \(\Psi :\textrm{HS}(\mathcal {H})\rightarrow \mathcal {H}\otimes \overline{\mathcal {H}}\); see the diagram (5). \(\triangleleft \)
Squared-kernel RKHS. The kernels K and \(\overline{K}\) being PSD, by the Schur-product theorem, so is the squared-modulus kernel \(|K|^{2}=K\overline{K}\), with
Let \(\mathcal {G}\) be the RKHS of complex-valued functions on \(\mathscr {X}\) for which \(|K|^{2}\) is reproducing (\(\mathcal {G}=\mathcal {H}\odot \overline{\mathcal {H}}\) is the product of the two RKHSs \(\mathcal {H}\) and \(\overline{\mathcal {H}}\); see e.g. [1, 17]).
Following [17, Chapter 5], we denote by \(C_{\Delta }:\mathcal {H}\otimes \overline{\mathcal {H}}\rightarrow \mathcal {G}\) the coisometry densely defined as \(C_{\Delta }(a\otimes \overline{b})=a\overline{b}\), a and \(b\in \mathcal {H}\), where \(a\overline{b}\in \mathcal {G}\) is the complex-valued function on \(\mathscr {X}\) given by
For \(\Upsilon \in \mathcal {H}\otimes \overline{\mathcal {H}}\), we more generally have
The initial space of \(C_{\Delta }\) is \(\mathcal {I}(C_{\Delta })=\overline{{{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{x}\otimes \overline{k_{x}} | x\in \mathscr {X}\}}^{\mathcal {H}\otimes \overline{\mathcal {H}}}\), the closure in \(\mathcal {H}\otimes \overline{\mathcal {H}}\) of the linear space spanned by the simple tensors \(k_{x}\otimes \overline{k_{x}}\), \(x\in \mathscr {X}\).
Remark 2.4
From (1), for all \(x\in \mathscr {X}\), we have \(C_{\Delta }^{*}[|k_{x}|^{2}]=k_{x}\otimes \overline{k_{x}}\). The linear space \({{\,\textrm{span}\,}}_{\mathbb {C}}\{|k_{x}|^{2} | x\in \mathscr {X}\}\) being dense in \(\mathcal {G}\) (see for instance [17, Chapter 2]), we have \(C_{\Delta }^{\,}C_{\Delta }^{*}={{\,\textrm{id}\,}}_{\mathcal {G}}\), so that \(C_{\Delta }^{*}\) is an isometry. \(\triangleleft \)
2.2 Natural coisometry from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {G}\)
We can now define a natural coisometry from the Hilbert space \(\textrm{HS}(\mathcal {H})\) of all HS operators on a RKHS \(\mathcal {H}\) onto the RKHS \(\mathcal {G}\) associated with the squared-modulus of the reproducing kernel of \(\mathcal {H}\). The terminology natural is used to emphasise that the considered construction does not depend on the choice of any specific basis.
Lemma 2.1
The linear map \(\Gamma =C_{\Delta }\Psi :\textrm{HS}(\mathcal {H})\rightarrow \mathcal {G}\) is a coisometry, and its initial space is
in addition, for all \(T\in \textrm{HS}(\mathcal {H})\), we have
Proof
The linear isometry \(\Psi \) being bijective, we have \(\Psi \Psi ^{*}={{\,\textrm{id}\,}}_{\mathcal {H}\otimes \overline{\mathcal {H}}}\), and so
By definition of \(C_{\Delta }\) and \(\Psi \), we have \(\Gamma ^{*}[|k_{x}|^{2}]=\Psi ^{*}[k_{x}\otimes \overline{k_{x}}]=S_{k_{x}}\), \(x\in \mathscr {X}\), so that (2) follows from the density of \({{\,\textrm{span}\,}}_{\mathbb {C}}\{|k_{x}|^{2} | x\in \mathscr {X}\}\) in \(\mathcal {G}\) (see Remark 2.4). The reproducing property in \(\mathcal {G}\) then gives
We next observe that
indeed, as \(T_{a,0}=0\), equality (4) trivially holds for \(b=0\), and for \(b\ne 0\), we have \(\langle {T_{a,b}|T}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {T_{a,b}[b]|T[b]}\rangle _{\mathcal {H}}/\Vert b\Vert _{\mathcal {H}}^{2}\), with \(T_{a,b}[b]=a\Vert b\Vert _{\mathcal {H}}^{2}\). Taking \({a=b={k_{x}}}\) in (4) gives \(\langle {S_{k_{x}}| T}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {k_{x}|T[k_{x}]}\rangle _{\mathcal {H}}=T[k_{x}](x)\), concluding the proof. \(\square \)
The following diagram summarises the construction of \(\Gamma \) (the \(\cong \) symbol refers to the two bijective linear isometries discussed in Remarks 2.2 and 2.3).
Through \(\Gamma \), the HS operators on \(\mathcal {H}\) belonging to \(\mathcal {I}(\Gamma )\) can be isometrically represented as functions in the RKHS \(\mathcal {G}\) associated with the squared-modulus kernel \(|K|^{2}\). In the framework of Remark 2.1, we may notice that if \(T=\sum _{i\in \mathbb {I}}\sigma _{i}T_{u_{i},v_{i}}\in \textrm{HS}(\mathcal {H})\), then \(\Gamma [T]=\sum _{i\in \mathbb {I}}\sigma _{i}u_{i}\overline{v_{i}}\).
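The identity \(\Gamma [T]=\sum _{i\in \mathbb {I}}\sigma _{i}u_{i}\overline{v_{i}}\) can be checked numerically in a finite-dimensional analogue. The sketch below is an illustration only (an assumption made for the example, not part of the construction): it takes the linear kernel \(K(x,t)=x^{\top }t\) on \(\mathbb {R}^{d}\), so that \(\mathcal {H}\cong \mathbb {R}^{d}\), \(k_{x}\cong x\), \(\textrm{HS}(\mathcal {H})\) is the space of \(d\times d\) matrices with the Frobenius inner product, and (3) reads \(\Gamma [T](x)=x^{\top }Tx\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
T = rng.standard_normal((d, d))   # an HS operator on H ≅ R^d (linear kernel)

# (3): Gamma[T](x) = <S_{k_x} | T>_HS = T[k_x](x); here this is x^T T x
def gamma_T(x):
    return x @ T @ x

# SVD expansion: Gamma[T] = sum_i sigma_i u_i v_i (real case, v-bar = v)
U, s, Vt = np.linalg.svd(T)
def gamma_T_svd(x):
    return sum(s[i] * (U[:, i] @ x) * (Vt[i] @ x) for i in range(d))

x = rng.standard_normal(d)
assert np.isclose(gamma_T(x), gamma_T_svd(x))
```

The agreement of the two evaluations reflects the fact that \(\Gamma \) acts term-by-term on the SVD of T.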
Lemma 2.2
The following assertions hold:
-
(i)
if \(T\in \textrm{HS}(\mathcal {H})\) is self-adjoint, then the function \(\Gamma [T]\) is real-valued;
-
(ii)
if \(T\in \textrm{HS}(\mathcal {H})\) is PSD, then the function \(\Gamma [T]\) is nonnegative;
-
(iii)
if \(T\in \textrm{HS}(\mathcal {H})\) is PSD and \(\Gamma [T]=0\), then \(T=0\); and
-
(iv)
if \(T\in \mathcal {I}(\Gamma )\), then \(T^{*}\in \mathcal {I}(\Gamma )\).
Proof
Assertions (i) and (ii) follow directly from (3). We assume that \(T\in \textrm{HS}(\mathcal {H})\) is PSD, and we consider a spectral expansion \(T=\sum _{j\in \mathbb {I}}\lambda _{j}S_{\varphi _{j}}\) of T, with \(\lambda _{j}\geqslant 0\), \(\varphi _{j}\in \mathcal {H}\) and \(\mathbb {I}\subseteq \mathbb {N}\); observing that \(\Gamma [S_{\varphi _{j}}]=|\varphi _{j}|^{2}\), \(j\in \mathbb {I}\), we obtain (iii). To prove assertion (iv), we first observe that if \(g\in \mathcal {G}\), then \(\overline{g}\in \mathcal {G}\) (that is, the function \(\overline{g}\) is a vector of \(\mathcal {G}\)); the map \(\Gamma \) is indeed surjective, and if \(g=\Gamma [T]\), \(T\in \textrm{HS}(\mathcal {H})\), then \(\overline{g}=\Gamma [T^{*}]\). By linearity, the real and imaginary parts of g are then also vectors of \(\mathcal {G}\), and so \(\Vert g\Vert _{\mathcal {G}}=\Vert \overline{g}\Vert _{\mathcal {G}}\) (see for instance [17, Chapter 5]; see also Remark 2.6). Since \(\Gamma \) is a partial isometry, for \(T\in \textrm{HS}(\mathcal {H})\), we have \(\Vert T\Vert _{\textrm{HS}(\mathcal {H})}\geqslant \Vert \Gamma [T]\Vert _{\mathcal {G}}\), with equality if and only if \(T\in \mathcal {I}(\Gamma )\); as \(\Vert T\Vert _{\textrm{HS}(\mathcal {H})}=\Vert T^{*}\Vert _{\textrm{HS}(\mathcal {H})}\), the result follows. \(\square \)
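Assertions (i) and (ii) admit a direct matrix analogue: with \(\Gamma [T](x)=\langle {k_{x}|T[k_{x}]}\rangle _{\mathcal {H}}\), self-adjointness of T forces the value to be real, and positive-semidefiniteness forces it to be nonnegative. The sketch below (a finite-dimensional illustration under the same linear-kernel identification as before, not from the paper) checks this for Hermitian and PSD matrices on \(\mathbb {C}^{d}\).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

T_sa = A + A.conj().T       # self-adjoint operator
T_psd = A.conj().T @ A      # PSD operator

x = rng.standard_normal(d) + 1j * rng.standard_normal(d)

# (i) self-adjoint T => Gamma[T](x) = <k_x | T[k_x]> is real
val_sa = x.conj() @ T_sa @ x
assert abs(val_sa.imag) < 1e-12

# (ii) PSD T => Gamma[T](x) is nonnegative
val_psd = x.conj() @ T_psd @ x
assert abs(val_psd.imag) < 1e-12 and val_psd.real >= 0
```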
Remark 2.5
The diagram (5) is also well-defined when the involved Hilbert spaces are real. We in this case have \(\mathcal {I}(\Gamma )=\overline{{{\,\textrm{span}\,}}_{\mathbb {R}}\{S_{k_{x}} | x\in \mathscr {X}\}}^{\textrm{HS}(\mathcal {H})}\) and the operators in \(\mathcal {I}(\Gamma )\) are self-adjoint; also if \(T^{*}=-T\), then \(\Gamma [T]=0\). By comparison, in the complex case, if \(T^{*}=-T\), then the function \(\Gamma [T]\) is pure-imaginary. \(\triangleleft \)
Remark 2.6
The PSD kernel \(|K|^{2}\) being real-valued, it is the reproducing kernel of a real RKHS \(\mathcal {G}_{\mathbb {R}}\) of real-valued functions on \(\mathscr {X}\). The decomposition \({\mathcal {G}=\mathcal {G}_{\mathbb {R}}+\textrm{i}\mathcal {G}_{\mathbb {R}}}\) holds, and \(\mathcal {G}_{\mathbb {R}}\) is the real-linear subspace of all real-valued functions in \(\mathcal {G}\). This decomposition mirrors the decomposition \(\textrm{HS}(\mathcal {H})=\textrm{HS}_{\mathbb {R}}(\mathcal {H})+\textrm{i}\textrm{HS}_{\mathbb {R}}(\mathcal {H})\), with \(\textrm{HS}_{\mathbb {R}}(\mathcal {H})\subset \textrm{HS}(\mathcal {H})\) the real-linear subspace of all self-adjoint HS operators on \(\mathcal {H}\). Also, the real convex cone \(\textrm{HS}_{\mathbb {R}}^{+}(\mathcal {H})\subset \textrm{HS}_{\mathbb {R}}(\mathcal {H})\) of all PSD HS operators on \(\mathcal {H}\) is generating in \(\textrm{HS}_{\mathbb {R}}(\mathcal {H})\), and the real convex cone \(\mathcal {G}_{\mathbb {R}}^{+}\subset \mathcal {G}_{\mathbb {R}}\) of all nonnegative functions in \(\mathcal {G}_{\mathbb {R}}\) is generating in \(\mathcal {G}_{\mathbb {R}}\). \(\triangleleft \)
Remark 2.7
Let \(\mathcal {F}\) be another separable RKHS of complex-valued functions on \(\mathscr {X}\), with reproducing kernel \(J:\mathscr {X}\times \mathscr {X}\rightarrow \mathbb {C}\). We denote by \(\textrm{HS}(\mathcal {F}, \mathcal {H})\) the Hilbert space of all HS operators from \(\mathcal {F}\) to \(\mathcal {H}\), and let \(\mathcal {H}\odot \overline{\mathcal {F}}\) be the product of the RKHSs \(\mathcal {H}\) and \(\overline{\mathcal {F}}\), that is, the RKHS with kernel \(K\overline{J}\). Following (5), we can more generally define a natural coisometry from \(\textrm{HS}(\mathcal {F}, \mathcal {H})\) onto \(\mathcal {H}\odot \overline{\mathcal {F}}\). \(\triangleleft \)
3 Trace-class integral operators with PSD kernels
From Lemma 2.1, if \(T\in \textrm{HS}(\mathcal {H})\) is of the form \(T=\sum _{j=1}^{n}\omega _{j}S_{k_{s_{j}}}\), with \(n\in \mathbb {N}\), \(s_{j}\in \mathscr {X}\) and \(\omega _{j}\in \mathbb {C}\), then \(T\in \mathcal {I}(\Gamma )\). We in this case have
so that T can be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel K and the finitely-supported measure \(\sum _{j=1}^{n}\omega _{j}\delta _{s_{j}}\), with \(\delta _{x}\) the Dirac measure at \({x\in \mathscr {X}}\). We also have \(\Gamma [T]=\sum _{j=1}^{n}\omega _{j}|k_{s_{j}}|^{2}\), so that \(\langle {\Gamma [T]| g}\rangle _{\mathcal {G}}=\sum _{j=1}^{n}\overline{\omega _{j}}g(s_{j})\), \(g\in \mathcal {G}\), and \(\Gamma [T]\) is thus the Riesz representation of the integral functional on \(\mathcal {G}\) defined by the measure \(\sum _{j=1}^{n}\overline{\omega _{j}}\delta _{s_{j}}\). Under measurability conditions, this observation holds for all trace-class integral operators on \(\mathcal {H}\) defined by the reproducing kernel K of \(\mathcal {H}\) and general measures on \(\mathscr {X}\), as illustrated below.
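For a finitely-supported measure, the operator \(T=\sum _{j=1}^{n}\omega _{j}S_{k_{s_{j}}}\) and its image \(\Gamma [T]=\sum _{j=1}^{n}\omega _{j}|k_{s_{j}}|^{2}\) can be formed explicitly. The sketch below (again using the illustrative linear-kernel identification \(\mathcal {H}\cong \mathbb {R}^{d}\), \(S_{k_{s}}=ss^{\top }\), an assumption for the example only) checks that evaluating \(\Gamma [T]\) via (3) agrees with the explicit sum.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 3
S = rng.standard_normal((n, d))     # support points s_1, ..., s_n
w = rng.standard_normal(n)          # real weights omega_j

# L_mu = sum_j w_j S_{k_{s_j}}; with the linear kernel, S_{k_s} = s s^T
L = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))

# Gamma[L_mu] = sum_j w_j |k_{s_j}|^2, i.e. x -> sum_j w_j (s_j . x)^2
x = rng.standard_normal(d)
lhs = x @ L @ x                                        # (3): L_mu[k_x](x)
rhs = sum(w[j] * (S[j] @ x) ** 2 for j in range(n))    # explicit potential
assert np.isclose(lhs, rhs)
```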
3.1 Integral operators and kernel embedding of measures
Let \(\mathcal {A}\) be a \(\sigma \)-algebra of subsets of \(\mathscr {X}\). We consider the Borel \(\sigma \)-algebra of \(\mathbb {C}\), and make the following assumptions on K and the measurable space \((\mathscr {X},\mathcal {A})\):
-
A.1
for all \(t\in \mathscr {X}\), the function \(k_{t}:\mathscr {X}\rightarrow \mathbb {C}\) is measurable;
-
A.2
the diagonal of K is measurable.
We recall that \(K(t,t)=\Vert k_{t}\Vert _{\mathcal {H}}^{2} =\Vert S_{k_{t}}\Vert _{\textrm{HS}(\mathcal {H})}=\Vert |k_{t}|^{2} \Vert _{\mathcal {G}}, t\in \mathscr {X}\).
Remark 3.1
The RKHSs \(\mathcal {H}\) and \(\mathcal {G}\) being separable, A.1 ensures that all the functions in \(\mathcal {H}\) and \(\mathcal {G}\) are measurable; see for instance [25, Lemma 4.24]. Consequently, under A.1, the maps \(t\mapsto k_{t}\), \(t\mapsto |k_{t}|^{2}\) and \(t\mapsto S_{k_{t}}\), \(t\in \mathscr {X}\), are weakly-measurable, and since the Hilbert spaces \(\mathcal {H}\), \(\mathcal {G}\) and \(\textrm{HS}(\mathcal {H})\) are separable, by the Pettis measurability theorem, these maps are also strongly-measurable (see e.g. [8, 28]). \(\triangleleft \)
We denote by \(\mathcal {M}_{+}\), \(\mathcal {M}\), and \(\mathcal {M}_{\mathbb {C}}\) the set of all nonnegative, signed and complex measures on \((\mathscr {X},\mathcal {A})\), and we set \(\mathcal {M}_{\mathbb {F}}=\mathcal {M}\cup \mathcal {M}_{\mathbb {C}}\) (we have \(\mathcal {M}_{+}\subset \mathcal {M}\)). Noticing that \(K(t,t)\geqslant 0\), \(t\in \mathscr {X}\), from A.2, we define
We next introduce the sets \(\mathcal {T}_{+}(K)\), \(\mathcal {T}(K)\) and \(\mathcal {T}_{\mathbb {C}}(K)\) of all measures \(\mu \) in \(\mathcal {M}_{+}\), \(\mathcal {M}\), and \(\mathcal {M}_{\mathbb {C}}\) such that \(\tau _{\mu }\) is finite, respectively; the inclusion \(\mathcal {T}_{+}(K)\subset \mathcal {T}(K)\) holds, and we set \(\mathcal {T}_{\mathbb {F}}(K)=\mathcal {T}(K)\cup \mathcal {T}_{\mathbb {C}}(K)\).
Integral operators on \(\mathcal {H}\) with kernel K. By assumption, for \({\mu \in \mathcal {T}_{\mathbb {F}}(K)}\), the integral \(\int _{\mathscr {X}}\Vert S_{k_{t}}\Vert _{\textrm{HS}(\mathcal {H})}\textrm{d}|\mu |(t)=\tau _{\mu }\) is finite, and the map \(t\mapsto S_{k_{t}}\) is thus Bochner-integrable with respect to \(\mu \) (Bochner integrability criterion, see e.g. [8, 28]; see also Remark 3.1). We set
From (4), for \(h\in \mathcal {H}\) and \(x\in \mathscr {X}\), we have
so that \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) can be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel K and the measure \(\mu \).
Remark 3.2
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the operator \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) is the Riesz representation of the bounded linear functional \(Z_{\mu }:\textrm{HS}(\mathcal {H})\rightarrow \mathbb {C}\) given by
that is, \(Z_{\mu }(T)=\langle {L_{\mu }|T}\rangle _{\textrm{HS}(\mathcal {H})}\); from the CS inequality in \(\textrm{HS}(\mathcal {H})\), we in particular have \(|Z_{\mu }(T)| \leqslant \int _{\mathscr {X}}|\langle {S_{k_{t}}|T}\rangle _{\textrm{HS}(\mathcal {H})}|\textrm{d}|\mu |(t) \leqslant \Vert T\Vert _{\textrm{HS}(\mathcal {H})}\tau _{\mu }\).
By boundedness of the linear evaluation map \(T\mapsto T[h]\) from \(\textrm{HS}(\mathcal {H})\) to \(\mathcal {H}\), we obtain (see for instance [28, Chapter 5])
with, from the CS inequality in \(\mathcal {H}\), \(\int _{\mathscr {X}}\Vert k_{t}\Vert _{\mathcal {H}}|h(t)|\textrm{d}|\mu |(t)\leqslant \Vert h\Vert _{\mathcal {H}}\tau _{\mu }\). We also have
so that \(L_{\mu }[h]\) is the Riesz representation of the bounded linear functional \(\Theta _{h,\mu }\) on \(\mathcal {H}\), with \(\Theta _{h,\mu }:f\mapsto \int _{\mathscr {X}}\overline{h(t)}f(t)\textrm{d}\overline{\mu }(t)\), \(f\in \mathcal {H}\) (and \(|\Theta _{h,\mu }(f)|\leqslant \Vert h\Vert _{\mathcal {H}}\Vert f\Vert _{\mathcal {H}}\tau _{\mu }\)). \(\triangleleft \)
Lemma 3.1
For all \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the operator \(L_{\mu }\) is trace-class.
Proof
Let \(\{h_{i}\}_{i\in \mathbb {I}}\) be an orthonormal basis (ONB) of \(\mathcal {H}\), with \(\mathbb {I}\subseteq \mathbb {N}\). For all \(t\in \mathscr {X}\), we have \(k_{t}=\sum _{i\in \mathbb {I}}h_{i}\overline{h_{i}(t)}\), so that \(\{h_{i}(t)\}_{i\in \mathbb {I}}\in \ell ^{2}(\mathbb {I})\) and \(\sum _{i\in \mathbb {I}}|h_{i}(t)|^{2}=K(t,t)\); see e.g. [17, Chapter 2]. Let \(\{f_{i}\}_{i\in \mathbb {I}}\) be another ONB of \(\mathcal {H}\); from (6), and by monotone convergence and the CS inequality in \(\ell ^{2}(\mathbb {I})\), we obtain
so that \({{\,\textrm{trace}\,}}(|L_{\mu }|)\leqslant \tau _{\mu }\), with \(|L_{\mu }|=(L_{\mu }^{*}L_{\mu })^{1/2}\) the modulus of \(L_{\mu }\). \(\square \)
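The bound \({{\,\textrm{trace}\,}}(|L_{\mu }|)\leqslant \tau _{\mu }\) can be observed numerically for finitely-supported signed measures, for which \(\tau _{\mu }=\sum _{j}|\omega _{j}|K(s_{j},s_{j})\). The sketch below (illustrative linear-kernel analogue, where the trace norm is the sum of singular values and \(K(s,s)=\Vert s\Vert ^{2}\)) verifies the inequality.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n = 6, 4
S = rng.standard_normal((n, d))   # support points
w = rng.standard_normal(n)        # signed weights omega_j

L = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))

# trace(|L_mu|) = sum of singular values (trace/nuclear norm)
nuclear = np.linalg.svd(L, compute_uv=False).sum()
# tau_mu = integral of the diagonal K(t,t) against |mu|
tau = sum(abs(w[j]) * (S[j] @ S[j]) for j in range(n))
assert nuclear <= tau + 1e-10
```

The inequality is the subadditivity of the trace norm applied to the rank-one terms \(\omega _{j}S_{k_{s_{j}}}\).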
Kernel embedding of measures in \(\mathcal {G}\). By assumption again, for \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the integral \(\int _{\mathscr {X}}\Vert |k_{t}|^{2}\Vert _{\mathcal {G}}\textrm{d}|\mu |(t)\) is finite, and the map \(t\mapsto |k_{t}|^{2}\) is therefore Bochner-integrable with respect to \(\mu \). We set
We have \(\langle {g_{\mu }|g}\rangle _{\mathcal {G}} =\int _{\mathscr {X}}g(t)\textrm{d}\overline{\mu }(t)\), \(g\in \mathcal {G}\), so that \(g_{\mu }\) is the Riesz representation of the linear functional \(I_{\mu }:\mathcal {G}\rightarrow \mathbb {C}\), with \(I_{\mu }(g)=\int _{\mathscr {X}}g(t)\textrm{d}\overline{\mu }(t)\); we may observe that \(| I_{\mu }(g) |\leqslant \Vert g\Vert _{\mathcal {G}}\tau _{\mu }\), and that \(g_{\mu }(x)=\int _{\mathscr {X}}|K(x,t)|^{2}\textrm{d}\mu (t)\), \(x\in \mathscr {X}\). The vector \(g_{\mu }\) is referred to as the kernel embedding, or potential, of the measure \(\mu \) in the RKHS \(\mathcal {G}\); see for instance [6, 15, 24].
Theorem 3.1
For all \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(L_{\mu }\in \mathcal {I}(\Gamma )\) and \(\Gamma [L_{\mu }]=g_{\mu }\).
Proof
From Lemma 2.1 and by definition of \(L_{\mu }\) and \(g_{\mu }\), for all \(T\in \textrm{HS}(\mathcal {H})\), we have
so that \(\Gamma ^{*}[g_{\mu }]=L_{\mu }\). \(\square \)
Remark 3.3
Following Lemma 2.2, for a signed measure \(\mu \in \mathcal {T}(K)\), the function \(g_{\mu }\) is real-valued, and the operator \(L_{\mu }=\Gamma ^{*}[g_{\mu }]\) is self-adjoint. Also, for a nonnegative measure \(\mu \in \mathcal {T}_{+}(K)\), the function \(g_{\mu }\) is nonnegative, and the operator \(L_{\mu }\) is PSD. We may notice that \(L_{\delta _{x}}=S_{k_{x}}\), \(x\in \mathscr {X}\). \(\triangleleft \)
From Theorem 3.1, for \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), the following equalities hold:
relating the evaluation of inner products in \(\textrm{HS}(\mathcal {H})\) between trace-class integral operators with kernel K to the integration of potentials in \(\mathcal {G}\).
3.2 Quadrature approximation
Let \(B_{\mathcal {G}}=\{g\in \mathcal {G}| \Vert g\Vert _{\mathcal {G}} \leqslant 1 \}\) be the closed unit ball of \(\mathcal {G}\). We set
The map \(\mathfrak {M}_{\mathcal {G}}\) defines a pseudometric on \(\mathcal {T}_{\mathbb {F}}(K)\); for probability measures, such pseudometrics are referred to as integral probability metrics, or maximum mean discrepancies; see for instance [15, 16, 23, 24, 27].
The following Corollary 3.1 describes the equivalence between the quadrature approximation of trace-class integral operators with PSD kernels and the approximation of integral functionals on RKHSs with squared-modulus kernels.
Corollary 3.1
For all \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(\Vert L_{\mu } - L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})} =\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\).
Proof
From Theorem 3.1 and by linearity of \(\Gamma ^{*}\), we have \(\Gamma ^{*}[g_{\mu }-g_{\nu }]=L_{\mu }-L_{\nu }\). Since \(\Gamma ^{*}\) is an isometry, it follows that \(\Vert L_{\mu } - L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}=\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}\). The CS inequality in \(\mathcal {G}\) and the definition of \(g_{\mu }\) and \(g_{\nu }\) then give
We conclude by observing that for all \(g\in \mathcal {G}\), we have \(\overline{g}\in \mathcal {G}\) and \(\Vert \overline{g}\Vert _{\mathcal {G}}=\Vert g\Vert _{\mathcal {G}}\) (see the proof of Lemma 2.2), so that \(\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}=\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\). \(\square \)
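For finitely-supported measures, Corollary 3.1 reduces to an identity between a Frobenius-type distance of operators and a quadratic form in the Gram matrix of the squared-modulus kernel. The sketch below (illustrative linear-kernel analogue; the identification \(\textrm{HS}(\mathcal {H})\cong \) matrices with the Frobenius norm is an assumption for the example) checks the equality \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}=\mathfrak {M}_{\mathcal {G}}(\mu ,\nu )\).

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, n = 5, 3, 4
Xm, Xn = rng.standard_normal((m, d)), rng.standard_normal((n, d))
wm, wn = rng.standard_normal(m), rng.standard_normal(n)

# integral operators for mu = sum_i wm_i delta_{x_i}, nu = sum_j wn_j delta_{y_j}
Lmu = sum(wm[i] * np.outer(Xm[i], Xm[i]) for i in range(m))
Lnu = sum(wn[j] * np.outer(Xn[j], Xn[j]) for j in range(n))
hs_dist = np.linalg.norm(Lmu - Lnu)      # Hilbert-Schmidt (Frobenius) norm

# MMD in G: quadratic form of the signed coefficients in the |K|^2 Gram matrix
X = np.vstack([Xm, Xn])
c = np.concatenate([wm, -wn])
G2 = (X @ X.T) ** 2                      # |K(x_i, x_j)|^2 for K(x, t) = x . t
mmd = np.sqrt(c @ G2 @ c)
assert np.isclose(hs_dist, mmd)
```

Note that \(G2\) is PSD by the Schur-product theorem, so the quadratic form under the square root is nonnegative.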
3.3 Further properties
In this section and in anticipation of the forthcoming developments, we discuss some further properties verified by the integral operators considered in Theorem 3.1.
For \(\mu \in \mathcal {T}_{+}(K)\), let \(L^{2}(\mu )\) be the Hilbert space of all square-integrable functions with respect to \(\mu \). From the CS inequality in \(\mathcal {H}\), we have
so that the linear embedding \(\iota _{\mu }^{}:\mathcal {H}\rightarrow L^{2}(\mu )\), with \(\iota _{\mu }^{}[h]\) the equivalence class of all measurable functions \(\mu \)-almost everywhere equal to h, is bounded (see e.g. [26]).
Lemma 3.2
For all \(\mu \in \mathcal {T}_{+}(K)\), the map \(\iota _{\mu }^{}\) is HS and \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\).
Proof
Let \(\{h_{i}\}_{i\in \mathbb {I}}\) be an ONB of \(\mathcal {H}\), with \(\mathbb {I}\subseteq \mathbb {N}\). As \(K(t,t)=\sum _{i\in \mathbb {I}}\big |h_{i}(t)\big |^{2}\), \(t\in \mathscr {X}\) (see e.g. [17]), from (6) and by monotone convergence, we have
so that \(\iota _{\mu }^{}\) is HS. From (6), we also obtain
and so \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\). \(\square \)
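The factorisation \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\) is transparent for a finitely-supported nonnegative measure, where \(L^{2}(\mu )\) is a weighted \(\mathbb {C}^{n}\) and \(\iota _{\mu }^{}\) is the evaluation map at the support points. The sketch below (illustrative linear-kernel analogue, an assumption for the example) checks the factorisation and the resulting positive-semidefiniteness of \(L_{\mu }\).

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 5, 3
S = rng.standard_normal((n, d))
w = rng.random(n) + 0.1          # strictly positive weights (mu nonnegative)

# iota_mu : H -> L^2(mu), h |-> (h(s_1), ..., h(s_n)); here h(s_j) = s_j . h,
# so iota_mu is the matrix S, and its adjoint w.r.t. the mu-weighted inner
# product on R^n is S^T diag(w)
L_factored = S.T @ np.diag(w) @ S                               # iota* iota
L_direct = sum(w[j] * np.outer(S[j], S[j]) for j in range(n))   # L_mu
assert np.allclose(L_factored, L_direct)

# mu nonnegative => L_mu is PSD
assert np.min(np.linalg.eigvalsh(L_direct)) >= -1e-12
```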
For \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we by definition have \(|\nu |\in \mathcal {T}_{+}(K)\) and \(\overline{\nu }\in \mathcal {T}_{\mathbb {F}}(K)\); from (6), we also have \(L_{\nu }^{*}=L_{\overline{\nu }}\). The following relation (Lemma 3.3) holds between the range of \(L_{\nu }\) and the range of \(L_{|\nu |}\).
Lemma 3.3
For all \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(\overline{{{\,\textrm{range}\,}}(L_{\nu })}^{\mathcal {H}}\subseteq \overline{{{\,\textrm{range}\,}}(L_{|\nu |})}^{\mathcal {H}}\).
Proof
From (6) and the CS inequality in \(L^{2}(|\nu |)\), we obtain
where (8) ensures that the embedding \(\iota _{|\nu |}^{}\) is well-defined. From Lemma 3.2, we have \({{\,\textrm{null}\,}}(L_{\mu })=\{h\in \mathcal {H}|\iota _{\mu }[h]=0\}\); inequality (9) then entails \({{\,\textrm{null}\,}}(L_{|\nu |})\subseteq {{\,\textrm{null}\,}}(L_{\overline{\nu }})\), and so \({{\,\textrm{null}\,}}(L_{\overline{\nu }})^{\perp _{\mathcal {H}}}\subseteq {{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}\). Recalling that \({{\,\textrm{null}\,}}(T^{*})={{\,\textrm{range}\,}}(T)^{\perp _{\mathcal {H}}}\), \(T\in \textrm{HS}(\mathcal {H})\), we conclude by noticing that \(L_{\overline{\nu }}^{*}=L^{\,}_{\nu }\) and \(L_{|\nu |}^{*}=L^{\,}_{|\nu |}\). \(\square \)
Lemma 3.4 illustrates that when the measure \(\nu \) is finitely-supported, the range of \(L_{|\nu |}\) is fully characterised by the support of \(\nu \).
Lemma 3.4
For \(\nu =\sum _{i=1}^{n}\upsilon _{i}\delta _{s_{i}}\), with \(n\in \mathbb {N}\), \(\upsilon _{i}\in \mathbb {C}\), \(\upsilon _{i}\ne 0\), and \(s_{i}\in \mathscr {X}\), we have \({{\,\textrm{range}\,}}(L_{|\nu |})={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{s_{1}}, \ldots , k_{s_{n}}\}\).
Proof
We have \(|\nu |=\sum _{i=1}^{n}|\upsilon _{i}|\delta _{s_{i}}\in \mathcal {T}_{+}(K)\), and \(L_{|\nu |}\) is PSD. From Lemma 3.2, we obtain that \({{{\,\textrm{null}\,}}(L_{|\nu |}) =\{h\in \mathcal {H}| \iota _{|\nu |}^{}[h]=0\} =\textstyle {\bigcap _{i=1}^{n}}\{h\in \mathcal {H}| \langle {k_{s_{i}}| h}\rangle _{\mathcal {H}}=0\}}\), and so \({{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{s_{1}}, \ldots , k_{s_{n}}\}\). Since \(L_{|\nu |}\) is self-adjoint, its range is \({{\,\textrm{null}\,}}(L_{|\nu |})^{\perp _{\mathcal {H}}}\), and the result follows. \(\square \)
4 Measures and projection-based approximations
In this section, we illustrate the extent to which the approximation of potentials in \(\mathcal {G}\) can be used as a surrogate for the characterisation of closed linear subspaces of \(\mathcal {H}\) for the approximation of \(L_{\mu }\in \textrm{HS}(\mathcal {H})\) through projections (see Remark 4.1).
4.1 Additional notations and general properties
For a closed linear subspace \(\mathcal {H}_{S}\) of \(\mathcal {H}\), we denote by \(P_{S}\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{S}\). Endowed with the Hilbert structure of \(\mathcal {H}\), the vector space \(\mathcal {H}_{S}\) is a RKHS, and its reproducing kernel \(K_{S}\) verifies \(K_{S}(x,t)=P_{S}[k_{t}](x)\), x and \(t\in \mathscr {X}\).
Remark 4.1
The linear map \(T\mapsto P_{S}T\) is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {R}(\mathcal {H}_{S})=\{T\in \textrm{HS}(\mathcal {H}) | {{\,\textrm{range}\,}}(T)\subseteq \mathcal {H}_{S}\}\), the closed linear subspace of \(\textrm{HS}(\mathcal {H})\) of all operators with range included in \(\mathcal {H}_{S}\). Also, the linear map \(T\mapsto TP_{S}\) is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \({\mathcal {Z}(\mathcal {H}_{S})=\{T\in \textrm{HS}(\mathcal {H}) | {{\,\textrm{range}\,}}(T^{*})\subseteq \mathcal {H}_{S}\}}\). The two orthogonal projections \(T\mapsto P_{S}T\) and \(T\mapsto TP_{S}\) commute, and their composition, that is, the linear map \(T\mapsto P_{S}TP_{S}\), is the orthogonal projection from \(\textrm{HS}(\mathcal {H})\) onto \(\mathcal {R}(\mathcal {H}_{S})\cap \mathcal {Z}(\mathcal {H}_{S})\). As \((P_{S}T)^{*}=T^{*}P_{S}\), the orthogonal projections onto \(\mathcal {R}(\mathcal {H}_{S})\) and \(\mathcal {Z}(\mathcal {H}_{S})\) are intrinsically related; for this reason, in what follows, we mainly focus on approximations of the form \(P_{S}T\) and \(P_{S}TP_{S}\). By orthogonality, for all \(T\in \textrm{HS}(\mathcal {H})\), we have
with \(\Vert P_{S}TP_{S}\Vert _{\textrm{HS}(\mathcal {H})} \leqslant \Vert P_{S}T\Vert _{\textrm{HS}(\mathcal {H})} \leqslant \Vert T\Vert _{\textrm{HS}(\mathcal {H})}\); in particular, if T is self-adjoint, then so is \(P_{S}TP_{S}\). \(\triangleleft \)
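The projection structure described in Remark 4.1 can be checked directly in a finite-dimensional sketch. The following Python snippet is an illustration only, assuming \(\mathcal {H}\simeq \mathbb {C}^{d}\), so that \(\textrm{HS}(\mathcal {H})\) reduces to matrices with the Frobenius inner product; it verifies the orthogonality of the residual \(T-P_{S}T\), the commutation of the two projections, and the norm ordering.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Orthogonal projection P_S onto a random 2-dimensional subspace of C^d.
Q, _ = np.linalg.qr(rng.standard_normal((d, 2)) + 1j * rng.standard_normal((d, 2)))
P_S = Q @ Q.conj().T

T = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
Tt = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def hs(A, B):
    """HS (Frobenius) inner product <A|B> = tr(A* B)."""
    return np.trace(A.conj().T @ B)

# T -> P_S T projects orthogonally onto operators with range in H_S:
# the residual T - P_S T is orthogonal to any P_S Tt.
assert abs(hs(T - P_S @ T, P_S @ Tt)) < 1e-10

# The maps T -> P_S T and T -> T P_S commute.
assert np.allclose(P_S @ (T @ P_S), (P_S @ T) @ P_S)

# Norm ordering of Remark 4.1: ||P_S T P_S|| <= ||P_S T|| <= ||T||.
assert np.linalg.norm(P_S @ T @ P_S) <= np.linalg.norm(P_S @ T) + 1e-12
assert np.linalg.norm(P_S @ T) <= np.linalg.norm(T) + 1e-12
```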
Lemma 4.1
Let \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) be two closed linear subspaces of \(\mathcal {H}\), with \(\mathcal {H}_{R}\subseteq \mathcal {H}_{S}\). For all \(T\in \textrm{HS}(\mathcal {H})\), we have
Proof
We denote by \(\mathcal {H}_{e}\) the orthogonal complement of \(\mathcal {H}_{R}\) in \(\mathcal {H}_{S}\). We then have \(P_{S}=P_{R}+P_{e}\) and \(\langle {P_{R}T| P_{e}\tilde{T}}\rangle _{\textrm{HS}(\mathcal {H})}=\langle {TP_{R}|\tilde{T}P_{e}}\rangle _{\textrm{HS}(\mathcal {H})}=0\), T and \(\tilde{T}\in \textrm{HS}(\mathcal {H})\). We hence obtain \(\Vert P_{S}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}=\Vert P_{R}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}+\Vert P_{e}T\Vert _{\textrm{HS}(\mathcal {H})}^{2}\), and
completing the proof. \(\square \)
By boundedness of \(P_{S}\), for \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(P_{S}L_{\mu }=\int _{\mathscr {X}}P_{S}S_{k_{t}}\textrm{d}\mu (t)\), and so
The operator \(P_{S}L_{\mu }\in \textrm{HS}(\mathcal {H})\) can thus be regarded as an integral operator on \(\mathcal {H}\) defined by the kernel \(K_{S}\) and the measure \(\mu \). Since \(K_{S}(t,t)\leqslant K(t,t)\), \(t\in \mathscr {X}\), we may notice that \(\mathcal {T}_{\mathbb {F}}(K)\subseteq \mathcal {T}_{\mathbb {F}}(K_{S})\). We have
see (10), (11) and Lemma A.1 in Appendix A for a detailed computation (see also Remark 4.2 for an alternative computation involving \(\Gamma \)).
Remark 4.2
Let \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) be two closed linear subspaces of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(x\in \mathscr {X}\), we have
From Theorem 3.1 and the properties of \(\Gamma \), we then obtain
For general subspaces \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\), the operator \(P_{V}L_{\mu }P_{U}\) does not necessarily belong to \(\mathcal {I}(\Gamma )\); see Remark 4.3 for an example where this situation occurs. \(\triangleleft \)
4.2 Projections defined by measures
Motivated by Lemmas 3.3 and 4.1, for \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we set \(\mathcal {H}_{\nu }=\overline{{{\,\textrm{range}\,}}(L_{|\nu |})}^{\mathcal {H}}\), and we denote by \(P_{\nu }\) the orthogonal projection from \(\mathcal {H}\) onto \(\mathcal {H}_{\nu }\).
Lemma 4.2
For all \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have \(L_{\nu }=P_{\nu }L_{\nu }=L_{\nu }P_{\nu }=P_{\nu }L_{\nu }P_{\nu }\).
Proof
From Lemma 3.3, we have \(L_{\nu }=P_{\nu }L_{\nu }\) and \(L_{\overline{\nu }}=P_{\nu }L_{\overline{\nu }}\). We then obtain \(L_{\nu }=L_{\overline{\nu }}^{*}=(P_{\nu }L_{\overline{\nu }})^{*}=L_{\nu }P_{\nu }\), and so \(L_{\nu }=P_{\nu }L_{\nu }P_{\nu }\). \(\square \)
For an initial operator \(L_{\mu }\), with \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), an approximate measure \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\) thus defines, through the orthogonal projection \(P_{\nu }\) and in addition to \(L_{\nu }\), the approximations \(P_{\nu }L_{\mu }\), \(L_{\mu }P_{\nu }\) and \(P_{\nu }L_{\mu }P_{\nu }\) of \(L_{\mu }\).
Lemma 4.3
For all \(\mu \) and \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), we have
Proof
Using the notations of Remark 4.1, Lemma 4.2 reads \(L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\). Observing that \(L_{\mu }-P_{\nu }L_{\mu }\) is orthogonal to \(\mathcal {R}(\mathcal {H}_{\nu })\) and that \(P_{\nu }L_{\mu }-L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\), we obtain (14). In the same way, \(L_{\mu }-P_{\nu }L_{\mu }P_{\nu }\) is orthogonal to \(\mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\), and we have \(P_{\nu }L_{\mu }P_{\nu }-L_{\nu }\in \mathcal {R}(\mathcal {H}_{\nu })\cap \mathcal {Z}(\mathcal {H}_{\nu })\), leading to (15). \(\square \)
4.3 Error maps on sets of measures
In the framework of Sects. 3.2 and 4.2, the characterisation of measures leading to accurate approximations of an initial operator \(L_{\mu }\), \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), relates to the minimisation of error maps measuring the accuracy of the approximations induced by a measure \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\).
Quadrature approximation. We define the error map \({D_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}}\), with
Lemma 4.4
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\), the map \(D_{\mu }\) is convex on any convex set \(\mathscr {C}\subseteq \mathcal {T}_{\mathbb {F}}(K)\). For \(\nu \) and \(\eta \in \mathscr {C}\), the directional derivative of \(D_{\mu }\) at \(\nu \) along \(\eta -\nu \) is
Proof
For \(\xi =(1-\rho )\nu +\rho \eta \), \(\nu \) and \(\eta \in \mathscr {C}\), \(\rho \in [0,1]\), we have \(g_{\xi }=(1-\rho )g_{\nu }+\rho g_{\eta }\); the convexity of \(D_{\mu }\) on \(\mathscr {C}\) then follows from the convexity of the map \(g\mapsto \Vert g_{\mu }-g\Vert _{\mathcal {G}}^{2}\) on \(\mathcal {G}\). Next, the expansion of the squared norm \(\Vert g_{\mu }-g_{\nu }-\rho (g_{\eta }-g_{\nu })\Vert _{\mathcal {G}}^{2}\) provides the expected expression for the directional derivatives of \(D_{\mu }\). \(\square \)
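For discrete measures, the map \(D_{\mu }\) and its directional derivatives are explicitly computable from the squared-modulus kernel. The sketch below, with an assumed Gaussian kernel standing in for K and real nonnegative weights for simplicity, evaluates \(D_{\mu }(\nu )=\Vert g_{\mu }-g_{\nu }\Vert _{\mathcal {G}}^{2}\) and checks the directional-derivative expression of Lemma 4.4 against a finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)

def K(x, y):  # assumed Gaussian PSD kernel on the real line
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

def S(x, y):  # squared-modulus kernel of G
    return np.abs(K(x, y)) ** 2

x, a = rng.standard_normal(6), rng.random(6)   # mu = sum_i a_i delta_{x_i}
s, b = rng.standard_normal(4), rng.random(4)   # nu = sum_j b_j delta_{s_j}
t, c = rng.standard_normal(4), rng.random(4)   # eta = sum_j c_j delta_{t_j}

def D(s_, b_):
    """D_mu(nu) = ||g_mu - g_nu||_G^2 for discrete measures with real weights."""
    return a @ S(x, x) @ a - 2 * a @ S(x, s_) @ b_ + b_ @ S(s_, s_) @ b_

assert D(s, b) >= 0            # squared norm
assert abs(D(x, a)) < 1e-10    # D_mu(mu) = 0

# Directional derivative of D_mu at nu along eta - nu (real weights):
#   2 <g_nu - g_mu | g_eta - g_nu>_G,
# checked against a finite difference along xi_rho = (1 - rho) nu + rho eta.
deriv = 2 * (b @ S(s, t) @ c - b @ S(s, s) @ b
             - a @ S(x, t) @ c + a @ S(x, s) @ b)
rho = 1e-6
xi_s, xi_w = np.concatenate([s, t]), np.concatenate([(1 - rho) * b, rho * c])
assert abs((D(xi_s, xi_w) - D(s, b)) / rho - deriv) < 1e-4
```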
Projection-based approximation. We denote by \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}\) the error maps defined as
we may notice that \(C^{\textrm{X}}_{\mu }(\nu )=C^{\textrm{X}}_{\mu }(|\nu |)\), \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\).
Theorem 4.1
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\), the map \(C^{\textrm{X}}_{\mu }\) is convex on the real convex cone \(\mathcal {T}_{+}(K)\), and for all \(\nu \) and \(\eta \in \mathcal {T}_{+}(K)\), we have
Proof
For \(\nu ,\eta \in \mathcal {T}_{+}(K)\) and \(\rho \in (0,1)\), we set \(\xi =\nu +\rho (\eta -\nu )\in \mathcal {T}_{+}(K)\). The three operators \(L_{\nu }\), \(L_{\eta }\) and \(L_{\xi }\) being PSD, independently of \(\rho \in (0,1)\), we have \({{\,\textrm{null}\,}}(L_{\xi })={{\,\textrm{null}\,}}(L_{\nu })\cap {{\,\textrm{null}\,}}(L_{\eta })\), and so \(\mathcal {H}_{\xi }=\overline{\mathcal {H}_{\nu }+\mathcal {H}_{\eta }}^{\mathcal {H}}\). The two maps
are therefore constant on the open interval (0, 1). From Lemma 4.1 and (10), noticing that \(\mathcal {H}_{\nu }\subseteq \mathcal {H}_{\xi }\) and \(\mathcal {H}_{\eta }\subseteq \mathcal {H}_{\xi }\), we obtain \(C^{\textrm{P}}_{\mu }(\xi )\leqslant C^{\textrm{P}}_{\mu }(\nu )\) and \(C^{\textrm{P}}_{\mu }(\xi )\leqslant C^{\textrm{P}}_{\mu }(\eta )\); from (11), we also get \(C^{\textrm{PP}}_{\mu }(\xi )\leqslant C^{\textrm{PP}}_{\mu }(\nu )\) and \(C^{\textrm{PP}}_{\mu }(\xi )\leqslant C^{\textrm{PP}}_{\mu }(\eta )\), concluding the proof. \(\square \)
In view of Theorem 4.1, the maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\) are akin to piecewise-constant functions. By contrast (see Lemma 4.4), the directional derivatives of the map \(D_{\mu }\) are informative, in the sense that the landscape of \(D_{\mu }\) can be explored through steepest descents. From Remark 4.1 and Lemma 4.3, we have
with \(C^{\textrm{P}}_{\mu }(\mu )=D_{\mu }(\mu )=0\) and \(C^{\textrm{P}}_{\mu }(0)=D_{\mu }(0)=\Vert g_{\mu }\Vert _{\mathcal {G}}^{2}\) (see also Remark 4.3). The quadrature-approximation error map \(D_{\mu }\) may hence be regarded as a differentiable relaxation of the projection-based-approximation error maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\); see Fig. 1 for an illustration.
Remark 4.3
For \(\mu \in \mathcal {T}_{\mathbb {F}}(K)\) and \(s\in \mathscr {X}\), introducing \(c_{\delta _{s}}=g_{\mu }(s)/|K(s,s)|^{2}\) if \(K(s,s)>0\), and \(c_{\delta _{s}}=0\) otherwise, we have \(P_{\delta _{s}}L_{\mu }P_{\delta _{s}}=c_{\delta _{s}}S_{k_{s}}\). For \(K(s,s)=0\), we indeed have \(k_{s}=0\), and so \(P_{\delta _{s}}=0\), and for \(K(s,s)>0\),
We obtain \(P_{\delta _{s}}L_{\mu }P_{\delta _{s}}\in \mathcal {I}(\Gamma )\) and \(C^{\textrm{PP}}_{\mu }(\delta _{s})=D_{\mu }(c_{\delta _{s}}\delta _{s})\), \(s\in \mathscr {X}\). \(\triangleleft \)
Remark 4.4
From a numerical standpoint, in view of (12) and (13), for \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\), the evaluation of \(C^{\textrm{P}}_{\mu }(\nu )\) or \(C^{\textrm{PP}}_{\mu }(\nu )\) requires a suitable characterisation of the reproducing kernel \(K_{\nu }\) of \(\mathcal {H}_{\nu }\) (or equivalently, of the orthogonal projection \(P_{\nu }\)); in practice, \(K_{\nu }\) is a priori unknown and needs to be computed from K and \(\nu \) (see Remark 4.5). In comparison and in view of (7), the error map \(D_{\mu }\) only involves the kernel K; the projection-free nature of \(D_{\mu }\) is of notable interest for numerical applications. \(\triangleleft \)
Remark 4.5
Following Lemma 3.4, for a measure \(\nu \) supported by \({\mathcal {S}=\{s_{1},\ldots ,s_{n}\}}\), \(n\in \mathbb {N}\), the reproducing kernel \(K_{\nu }\) of \(\mathcal {H}_{\nu }\) can be expressed as
where \(\varkappa _{i,j}\) is the i, j entry of the pseudoinverse (Moore-Penrose inverse) of the \({n\times n}\) kernel matrix with i, j entry \(K(s_{i},s_{j})\). The worst-case time complexity of the evaluation of \(K_{\nu }\) at \(M\in \mathbb {N}\) distinct locations in \(\mathscr {X}\times \mathscr {X}\) is thus \(\mathcal {O}(n^{3}+n^{2}M)\). The term \(\mathcal {O}(n^{3})\) is related to the pseudoinversion of the kernel matrix defined by K and \(\mathcal {S}\), while the term \(\mathcal {O}(n^{2}M)\) corresponds to the evaluation, from this pseudoinverse and the kernel K, of \(K_{\nu }\) at M different locations. \(\triangleleft \)
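The evaluation of \(K_{\nu }\) described in Remark 4.5 can be sketched numerically as follows (an assumed Gaussian kernel plays the role of K). The assertions check that \(K_{\nu }\) coincides with K on \(\mathcal {S}\), and that its diagonal is dominated by that of K, as expected for the reproducing kernel of a subspace.

```python
import numpy as np

rng = np.random.default_rng(2)

def K(x, y):  # assumed Gaussian PSD kernel on the real line
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2)

s = rng.standard_normal(5)         # support S = {s_1, ..., s_n} of nu
kappa = np.linalg.pinv(K(s, s))    # O(n^3) pseudoinversion of the kernel matrix

def K_nu(x, y):
    """Reproducing kernel of H_nu = span{k_{s_1}, ..., k_{s_n}}; O(n^2) per entry."""
    return K(x, s) @ kappa @ K(s, y)

x = rng.standard_normal(7)

# K_nu reproduces K on the support of nu: K_nu(s_i, s_j) = K(s_i, s_j).
assert np.allclose(K_nu(s, s), K(s, s), atol=1e-8)

# Diagonal domination K_nu(t, t) <= K(t, t) (projection of k_t onto H_nu).
assert np.all(np.diag(K_nu(x, x)) <= np.diag(K(x, x)) + 1e-10)
```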
5 Nonnegative measures and \(L^{2}\)-embeddings
Following Sect. 3.3, for \(\mu \in \mathcal {T}_{+}(K)\), the embedding \(\iota _{\mu }^{}:\mathcal {H}\rightarrow L^{2}(\mu )\) is HS. For \(f\in L^{2}(\mu )\) and \(x\in \mathscr {X}\), we have
so that in addition to \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\), the three operators
can also be regarded as integral operators defined by the kernel K and the nonnegative measure \(\mu \). These four interpretations are inherent to K, which characterises \(\mathcal {H}\), and \(\mu \), which characterises \(L^{2}(\mu )\); see for instance [4, 19, 20, 22, 26] for illustrations. In each case, the corresponding operator is HS, and we denote by \(\textrm{HS}(\mu ,\mathcal {H})\), \(\textrm{HS}(\mu )\) and \(\textrm{HS}(\mathcal {H},\mu )\) the Hilbert spaces of all HS operators from \(L^{2}(\mu )\) to \(\mathcal {H}\), on \(L^{2}(\mu )\), and from \(\mathcal {H}\) to \(L^{2}(\mu )\), respectively.
5.1 Partial \(L^{2}\)-embeddings
For a closed linear subspace \(\mathcal {H}_{S}\subseteq \mathcal {H}\), the embedding \(\iota _{\mu }^{}\) can be approximated by the partial embedding \(\iota _{\mu }^{}P_{S}\). For \(f\in L^{2}(\mu )\) and \(x\in \mathscr {X}\), we have
so that \(P_{S}\iota _{\mu }^{*}\) corresponds to an integral operator with kernel \(K_{S}\). In the decomposition \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\), substituting each \(\iota _{\mu }^{}\) with \(\iota _{\mu }^{}P_{S}\) gives the approximation \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) discussed in Sect. 4. For the operators defined in (17), a similar substitution yields the approximations
see also Remark 5.1. In what follows, we mainly focus on the approximations related to \(\textrm{HS}(\mu ,\mathcal {H})\) and \(\textrm{HS}(\mu )\); the case of \(\textrm{HS}(\mathcal {H},\mu )\) is more briefly discussed in Remark 5.2.
Remark 5.1
In addition to \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\), the approximation of \(\iota _{\mu }^{}\) by \(\iota _{\mu }^{}P_{S}\) gives rise to the approximations \(P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\) and \(\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) of \(\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H})\) (see Sect. 4). Similarly, for \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H},\mu )\) and in addition to \(\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\), the approximations \(\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\) and \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\) of \(\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}\in \textrm{HS}(\mathcal {H},\mu )\) may be considered. In these approximations, the substitution of \(\iota _{\mu }^{}\) with \(\iota _{\mu }^{}P_{S}\) is not applied to every occurrence of \(\iota _{\mu }^{}\). \(\triangleleft \)
Lemma 5.1
Let \(\mathcal {H}_{S}\) be a closed linear subspace of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), we have
Proof
Let \(\mathcal {H}_{0S}\) be the orthogonal complement of \(\mathcal {H}_{S}\) in \(\mathcal {H}\); endowed with the Hilbert structure of \(\mathcal {H}\), \(\mathcal {H}_{0S}\) is a RKHS, and \(K_{0S}=K-K_{S}\). For an ONB \(\{h_{j}\}_{j\in \mathbb {I}}\) of \(\mathcal {H}\), \(\mathbb {I}\subseteq \mathbb {N}\), we have
since \(\sum _{j\in \mathbb {I}}\big |P_{0S}[h_{j}](t)\big |^{2}=K_{0S}(t,t)\), \(t\in \mathscr {X}\) (see e.g. [17]), equality (18) follows by monotone convergence. We also have
so that (19) follows from Lemma A.1 (we recall that \(L_{\mu }=\iota _{\mu }^{*}\iota _{\mu }^{}\)). \(\square \)
Remark 5.2
We consider four closed linear subspaces \(\mathcal {H}_{R}, \mathcal {H}_{S}, \mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) of \(\mathcal {H}\), and let \(\{h_{j}\}_{j\in \mathbb {I}}\) be an ONB of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), we have
with, from the CS inequality in \(\ell ^{2}(\mathbb {I})\) and in \(\mathcal {H}\),
From (20) and Fubini’s theorem, we then for instance obtain
where we should observe that \(\langle {\iota _{\mu }^{}\iota _{\mu }^{*}\iota _{\mu }^{}| \iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S} }\rangle _{\textrm{HS}(\mathcal {H},\mu )} =\Vert \iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H},\mu )}^{2}\). \(\triangleleft \)
The following inequality (Lemma 5.2) holds between the approximations in \(\textrm{HS}(\mu )\) and \(\textrm{HS}(\mathcal {H})\) defined by a subspace \(\mathcal {H}_{S}\). We recall that \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )}=\Vert \iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}\), and that \(\Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}^{} \leqslant \Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}P_{S}\Vert _{\textrm{HS}(\mathcal {H})}^{}\) (see Remark 4.1).
Lemma 5.2
Let \(\mathcal {H}_{S}\) be a closed linear subspace of \(\mathcal {H}\). For all \(\mu \in \mathcal {T}_{+}(K)\), we have
Proof
We have \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} =\Vert P_{0S}L_{\mu }P_{0S}\Vert _{\textrm{HS}(\mathcal {H})}\), with \(P_{0S}={{\,\textrm{id}\,}}_{\mathcal {H}}-P_{S}\). The operators \(P_{0S}L_{\mu }P_{0S}\) and \(P_{0S}L_{\mu }P_{S}\) being orthogonal in \(\textrm{HS}(\mathcal {H})\), we obtain
and so \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} \leqslant \Vert \iota _{\mu }^{*}\iota _{\mu }^{}-P_{S}\iota _{\mu }^{*}\iota _{\mu }^{}\Vert _{\textrm{HS}(\mathcal {H})}\). \(\square \)
5.2 Trace and Frobenius error maps
Following Sect. 4.3 and considering subspaces of \(\mathcal {H}\) defined by measures, we introduce the error maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }:\mathcal {T}_{\mathbb {F}}(K)\rightarrow \mathbb {R}_{\geqslant 0}\), with
The notations \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\) are motivated by the relation between these maps and the trace and Frobenius norms; see Sect. 5.3. As observed for \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\), we may notice that \(C^{\textrm{X}}_{\mu }(\nu )=C^{\textrm{X}}_{\mu }(|\nu |)\), \(\textrm{X}\in \{\textrm{tr},\textrm{F}\}\).
Theorem 5.1
For \(\mu \in \mathcal {T}_{+}(K)\), the statement of Theorem 4.1 also holds for the maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\); that is, these two maps are convex on the real convex cone \(\mathcal {T}_{+}(K)\), and their directional derivatives take values in the set \(\{-\infty ,0\}\).
Proof
We follow the same reasoning as in the proof of Theorem 4.1. For two measures \(\nu \) and \(\eta \in \mathcal {T}_{+}(K)\) and for \(\rho \in (0,1)\), we set \(\xi =\nu +\rho (\eta -\nu )\in \mathcal {T}_{+}(K)\). We then have \(\mathcal {H}_{\xi }=\overline{\mathcal {H}_{\nu }+\mathcal {H}_{\eta }}^{\mathcal {H}}\) independently of \(\rho \in (0,1)\). We conclude by combining the inclusions \(\mathcal {H}_{\nu }\subseteq \mathcal {H}_{\xi }\) and \(\mathcal {H}_{\eta }\subseteq \mathcal {H}_{\xi }\) with the inequalities provided in Lemma A.2 (Appendix A). \(\square \)
As illustrated by Lemma 5.1 and Theorem 5.1, and as already observed for the error maps \(C^{\textrm{P}}_{\mu }\) and \(C^{\textrm{PP}}_{\mu }\), the error maps \(C^{\textrm{tr}}_{\mu }\) and \(C^{\textrm{F}}_{\mu }\) are akin to piecewise-constant functions, and their evaluation requires a suitable characterisation of the kernel of subspaces of \(\mathcal {H}\). For \(\mu \in \mathcal {T}_{+}(K)\), the error maps \(C^{\textrm{X}}_{\mu }\), \(\textrm{X}\in \{\textrm{tr},\textrm{F},\textrm{P},\textrm{PP}\}\), can be regarded as alternative ways to assess the accuracy of the approximation of \(\iota ^{}_{\mu }\) by \(\iota ^{}_{\mu }P_{\nu }\), \(\nu \in \mathcal {T}_{\mathbb {F}}(K)\). From the relation between the error maps \(D_{\mu }\) and \(C^{\textrm{X}}_{\mu }\), \(\textrm{X}\in \{\textrm{P},\textrm{PP}\}\) (see Lemma 4.3), the approximation of potentials in \(\mathcal {G}\) can hence more generally be regarded as a differentiable and projection-free surrogate for the characterisation of accurate partial embeddings. From Lemma 5.2, we may notice that \(C^{\textrm{F}}_{\mu }(\nu )\leqslant C^{\textrm{P}}_{\mu }(\nu )\), \({\nu \in \mathcal {T}_{\mathbb {F}}(K)}\), extending the sequence of inequalities (16).
Remark 5.3
Let \(\nu \in \mathcal {T}_{\mathbb {C}}(K)\) be a complex measure with real and imaginary parts \(\nu _{{r}}\) and \(\nu _{{i}}\in \mathcal {T}(K)\). For \(\mu \in \mathcal {T}(K)\), the three operators \(L_{\mu }\), \(L_{\nu _{{r}}}\) and \(L_{\nu _{i}}\) are self-adjoint; we thus have \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}^{2}=\Vert L_{\mu }-L_{\nu _{{r}}}\Vert _{\textrm{HS}(\mathcal {H})}^{2}+\Vert L_{\nu _{i}}\Vert _{\textrm{HS}(\mathcal {H})}^{2}\), and so \(D_{\mu }(\nu _{{r}})\leqslant D_{\mu }(\nu )\). Hence, when \(L_{\mu }\) is self-adjoint, the search for an approximate measure \(\nu \) for the approximation of \(L_{\mu }\) by \(L_{\nu }\) can be restricted to \(\mathcal {T}(K)\). \(\triangleleft \)
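The identity underlying Remark 5.3 can be verified in the discrete setting, where \(\Vert L_{\mu }-L_{\nu }\Vert _{\textrm{HS}(\mathcal {H})}^{2}\) reduces to a quadratic form in the weights whose matrix is the entrywise squared modulus of the kernel matrix. The snippet below (a real PSD kernel matrix is assumed for illustration) checks that dropping the imaginary part of \(\nu \) can only reduce the error.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 6))
Km = A @ A.T          # real PSD kernel matrix (assumed example)
Sm = Km * Km          # entrywise squared modulus |K|^2, kernel matrix of G

mu = rng.random(6)                                # real weights
nu = rng.random(6) + 1j * rng.standard_normal(6)  # complex weights

def D(nu_):
    """||L_mu - L_nu||_HS^2 = c* S c with c = mu - nu (discrete measures)."""
    c = mu - nu_
    return (c.conj() @ Sm @ c).real

# ||L_mu - L_nu||^2 = ||L_mu - L_{nu_r}||^2 + ||L_{nu_i}||^2, so D(nu_r) <= D(nu).
assert abs(D(nu) - (D(nu.real) + nu.imag @ Sm @ nu.imag)) < 1e-9
assert D(nu.real) <= D(nu) + 1e-12
```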
5.3 Column sampling for PSD-matrix approximation
Let \(\textbf{K}\) be an \(N\times N\) PSD matrix, with \(N\in \mathbb {N}\); we denote by [N] the set of all integers between 1 and N. For a subset \(I\subseteq [N]\), the Nyström approximation (see Footnote 2) of \(\textbf{K}\) induced by I is the \(N\times N\) PSD matrix
where \(\textbf{K}_{\bullet ,I}\) is the matrix defined by the columns of \(\textbf{K}\) with index in I, and where \((\textbf{K}_{I,I})^{\dagger }\) is the pseudoinverse of the principal submatrix of \(\textbf{K}\) defined by I (and \(\textbf{K}_{I,\bullet }\) consists of rows of \(\textbf{K}\)); see e.g. [9, 11, 14, 18].
For i and \(j\in [N]\), the i, j entry of \(\textbf{K}\) may be regarded as the value K(i, j) of a PSD kernel K defined on the discrete set \(\mathscr {X}=[N]\). The j-th column of \(\textbf{K}\) then corresponds to the function \(k_{j}\in \mathcal {H}\), \(j\in \mathscr {X}\), and the subset I defines the closed linear subspace \(\mathcal {H}_{I}={{\,\textrm{span}\,}}_{\mathbb {C}}\{k_{j}|j\in I\}\subseteq \mathcal {H}\); in particular, the i, j entry of \(\hat{\textbf{K}}(I)\) is \(K_{I}(i,j)\), with \(K_{I}\) the reproducing kernel of \(\mathcal {H}_{I}\) (see e.g. [17], and Remark 4.5).
Introducing \(\mu =\sum _{i=1}^{N}\delta _{i}\), the Hilbert space \(L^{2}(\mu )\) can be identified with the Euclidean space \(\mathbb {C}^{N}\); following Sect. 5.2, we then observe that
-
the trace norm \(\Vert \textbf{K}-\hat{\textbf{K}}(I)\Vert _{\textrm{tr}}\) corresponds to (18), and
-
the squared Frobenius norm \(\Vert \textbf{K}-\hat{\textbf{K}}(I)\Vert _{\textrm{F}}^{2}\) corresponds to (19).
The column-sampling problem for the Nyström approximation of a PSD matrix \(\textbf{K}\), that is, the search for a subset \(I\subseteq [N]\) leading to an accurate approximation \(\hat{\textbf{K}}(I)\) of \(\textbf{K}\), is thus a special instance of the general framework discussed in Sect. 5.1. In particular, the support of an approximate measure \(\nu \) on \(\mathscr {X}=[N]\) defines a subset of columns of \(\textbf{K}\), and the approximation of potentials in the RKHS \(\mathcal {G}\) may be used as a surrogate for the characterisation of such measures. In the discrete setting, \(\mathcal {G}\) corresponds to the RKHS defined by the \(N\times N\) PSD matrix \(\textbf{S}\) with i, j entry \(|\textbf{K}_{i,j}|^{2}\) (that is, \(\textbf{S}\) is the element-wise product of \(\textbf{K}\) and \(\overline{\textbf{K}}\), the conjugate of \(\textbf{K}\)).
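A minimal numerical sketch of this discrete setting (an assumed Gaussian kernel matrix serves as \(\textbf{K}\)): the Nyström approximation induced by I, the associated trace and Frobenius errors, and the PSD surrogate matrix \(\textbf{S}\).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 40
X = rng.standard_normal((N, 2))
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Km = np.exp(-0.5 * D2)  # N x N PSD kernel matrix (assumed Gaussian example)

def nystrom(Km, I):
    """Nystrom approximation K_hat(I) = K[:, I] (K[I, I])^+ K[I, :]."""
    C = Km[:, I]
    return C @ np.linalg.pinv(Km[np.ix_(I, I)]) @ C.T

I = [0, 5, 17, 23]
Kh = nystrom(Km, I)

err_tr = np.trace(Km - Kh)               # trace-norm error (K - K_hat(I) is PSD)
err_F = np.linalg.norm(Km - Kh, "fro")   # Frobenius-norm error

assert err_tr >= -1e-10
assert err_F <= err_tr + 1e-10           # ||A||_F <= tr(A) for PSD A
# K_hat(I) matches K on the principal submatrix indexed by I.
assert np.allclose(Kh[np.ix_(I, I)], Km[np.ix_(I, I)], atol=1e-8)

# The surrogate matrix S (entrywise |K|^2) is PSD, by the Schur product theorem.
Sm = np.abs(Km) ** 2
assert np.linalg.eigvalsh(Sm).min() >= -1e-10
```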
6 Concluding discussion
We described the overall framework surrounding the isometric representation of integral operators with PSD kernels as potentials, and illustrated the equivalence between the quadrature approximation of such integral operators and the approximation of integral functionals on RKHSs with squared-modulus kernels. Through subspaces defined by measures and partial \(L^{2}\)-embeddings, we also discussed the extent to which the approximation of potentials in RKHSs with squared-modulus kernels can be used as a differentiable surrogate for the characterisation of projection-based approximation of integral operators with PSD kernels.
The link between integral-operator approximation and potential approximation may be leveraged to design sampling strategies for low-rank approximation (where approximations are characterised by sparse finitely-supported measures). The direct minimisation of \(D_{\mu }\) under sparsity-inducing constraints is for instance considered in [10], while the possibility of locally optimising the support of approximate measures using particle-flow techniques is studied in [13]. Sequential approaches, where support points are added one-at-a-time on the basis of information provided by the directional derivatives of \(D_{\mu }\), are investigated in [12]. The present work aims at supporting this type of approach by strengthening its theoretical underpinning.
Notes
1. We only consider finite complex measures, while signed measures may not be finite.
2. In the machine-learning literature, Nyström approximation refers to the low-rank approximation of PSD matrices through column sampling; although related, this terminology should not be confused with the quadrature method for the approximation of integral equations.
References
Aronszajn, N.: Theory of reproducing kernels. Trans. Am. Math. Soc. 68(3), 337–404 (1950)
Bach, F.: On the equivalence between kernel quadrature rules and random feature expansions. J. Mach. Learn. Res. 18(1), 714–751 (2017)
Conway, J.B.: A Course in Functional Analysis, vol. 96. Springer (2019)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. Am. Math. Soc. 39, 1–49 (2002)
Cucker, F., Zhou, D.-X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press (2007)
Damelin, S.B., Hickernell, F.J., Ragozin, D.L., Zeng, X.: On energy, discrepancy and group invariant measures on measurable subsets of Euclidean space. J. Fourier Anal. Appl. 16, 813–839 (2010)
Davidson, K.R.: Nest Algebras: Triangular Forms for Operator Algebras on Hilbert Space, volume 191. Longman Scientific and Technical (1988)
Diestel, J., Uhl, J.J.: Vector Measures, vol. 15. American Mathematical Society (1977)
Drineas, P., Mahoney, M.W.: On the Nyström method for approximating a Gram matrix for improved kernel-based learning. J. Mach. Learn. Res. 6, 2153–2175 (2005)
Gauthier, B., Suykens, J.: Optimal quadrature-sparsification for integral operator approximation. SIAM J. Sci. Comput. 40, A3636–A3674 (2018)
Gittens, A., Mahoney, M.W.: Revisiting the Nyström method for improved large-scale machine learning. J. Mach. Learn. Res. 17, 1–65 (2016)
Hutchings, M., Gauthier, B.: Energy-based sequential sampling for low-rank PSD-matrix approximation. Preprint https://hal.science/hal-04102664 (2023)
Hutchings, M., Gauthier, B.: Local optimisation of Nyström samples through stochastic gradient descent. Mach. Learn. Optim. Data Sci. LOD 2022, 123–140 (2023)
Kumar, S., Mohri, M., Talwalkar, A.: Sampling methods for the Nyström method. J. Mach. Learn. Res. 13, 981–1006 (2012)
Muandet, K., Fukumizu, K., Sriperumbudur, B., Schölkopf, B.: Kernel mean embedding of distributions: a review and beyond. Found. Trends Mach. Learn. 10, 1–141 (2017)
Müller, A.: Integral probability metrics and their generating classes of functions. Adv. Appl. Probab. 29(2), 429–443 (1997)
Paulsen, V.I., Raghupathi, M.: An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, Cambridge University Press (2016)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT press, Cambridge, MA (2006)
Rosasco, L., Belkin, M., De Vito, E.: On learning with integral operators. J. Mach. Learn. Res. 11, 905–934 (2010)
Santin, G., Schaback, R.: Approximation of eigenfunctions in kernel-based spaces. Adv. Comput. Math. 42, 973–993 (2016)
Smale, S., Zhou, D.-X.: Learning theory estimates via integral operators and their approximations. Constr. Approx. 26, 153–172 (2007)
Smale, S., Zhou, D.-X.: Geometry on probability spaces. Constr. Approx. 30, 311–323 (2009)
Sriperumbudur, B.K., Fukumizu, K., Gretton, A., Schölkopf, B., Lanckriet, G.R.G.: On the empirical estimation of integral probability metrics. Electron. J. Stat. 6, 1550–1599 (2012)
Sriperumbudur, B.K., Gretton, A., Fukumizu, K., Schölkopf, B., Lanckriet, G.R.G.: Hilbert space embeddings and metrics on probability measures. J. Mach. Learn. Res. 11, 1517–1561 (2010)
Steinwart, I., Christmann, A.: Support Vector Machines. Springer (2008)
Steinwart, I., Scovel, C.: Mercer’s theorem on general domains: on the interaction between measures, kernels, and RKHSs. Constr. Approx. 35, 363–417 (2012)
Steinwart, I., Ziegel, J.F.: Strictly proper kernel scores and characteristic kernels on compact spaces. Appl. Comput. Harmon. Anal. 51, 510–542 (2021)
Yosida, K.: Functional Analysis. Springer (2012)
Author information
Contributions
BG wrote the entirety of the manuscript.
Ethics declarations
Conflict of interest
The author received no financial support for the research, authorship, and/or publication of this article. The author declares he has no competing interests.
Appendix A: Technical results
Lemma A.1
Let \(\mathcal {H}_{U}\) and \(\mathcal {H}_{V}\) be two closed linear subspaces of \(\mathcal {H}\). For all \({\mu \in \mathcal {T}_{\mathbb {F}}(K)}\), we have
Proof
We consider an ONB \(\{h_{j}\}_{j\in \mathbb {I}}\) of \(\mathcal {H}\), \(\mathbb {I}\subseteq \mathbb {N}\). From (6), we have
As \(\sum _{j\in \mathbb {I}}P_{U}[h_{j}](t)\overline{P_{U}[h_{j}](x)}=K_{U}(t,x)\), x and \(t\in \mathscr {X}\) (see e.g. [17]), and since \(K_{U}(t,t)\leqslant K(t,t)\) and \(K_{V}(t,t)\leqslant K(t,t)\), from the CS inequality in \(\ell ^{2}(\mathbb {I})\) and in \(\mathcal {H}\), we obtain
the result then follows from (22) and Fubini’s theorem. \(\square \)
Lemma A.2
Let \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) be two closed linear subspaces of \(\mathcal {H}\); we assume that \({\mathcal {H}_{R}\subseteq \mathcal {H}_{S}}\). For all \(\mu \in \mathcal {T}_{+}(K)\), we have \(\Vert \iota _{\mu }^{*}-P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})} \leqslant \Vert \iota _{\mu }^{*}-P_{R}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})}\) and \(\Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{S}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )} \leqslant \Vert \iota _{\mu }^{}\iota _{\mu }^{*}-\iota _{\mu }^{}P_{R}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu )}\).
Proof
We denote by \(\mathcal {H}_{e}\) the orthogonal complement of \(\mathcal {H}_{R}\) in \(\mathcal {H}_{S}\). Noticing that \(P_{S}=P_{R}+P_{e}\) and that \(\langle {\iota _{\mu }^{*}-P_{R}\iota _{\mu }^{*}|P_{e}\iota _{\mu }^{*}}\rangle _{\textrm{HS}(\mu ,\mathcal {H})} =\Vert P_{e}\iota _{\mu }^{*}\Vert _{\textrm{HS}(\mu ,\mathcal {H})}^{2}\), we obtain
Denoting by \(\mathcal {H}_{0S}\) and \(\mathcal {H}_{0R}\) the orthogonal complements of \(\mathcal {H}_{S}\) and \(\mathcal {H}_{R}\) in \(\mathcal {H}\), respectively, we have \(\mathcal {H}_{0S}\subseteq \mathcal {H}_{0R}\); Lemma 4.1 then gives
completing the proof. \(\square \)
Cite this article
Gauthier, B. Kernel embedding of measures and low-rank approximation of integral operators. Positivity 28, 29 (2024). https://doi.org/10.1007/s11117-024-01041-8
Keywords
- Reproducing kernel Hilbert spaces
- Integral operators
- Low-rank approximation
- Kernel embedding of measures
- Differentiable relaxation