1 Introduction

For the past few decades, radial basis functions have been established as one of the main tools in multivariate approximation theory. They allow the user to build high-order, meshfree approximation spaces and provide an extremely flexible tool for the reconstruction of functions from scattered data, see [6, 15, 46].

The main ingredients are, of course, radial functions, though many of the results also hold for non-radial functions.

Definition 1.

A radial function is a function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) of the form \(\varPhi (\mathbf{x}) =\phi (\|\mathbf{x}\|_{2})\), \(\mathbf{x} \in \mathbb{R}^{d}\), where \(\phi: [0,\infty ) \rightarrow \mathbb{R}\) is a univariate function and \(\|\mathbf{x}\|_{2} = (x_{1}^{2} + \cdots + x_{d}^{2})^{1/2}\) denotes the Euclidean norm of \(\mathbf{x} \in \mathbb{R}^{d}\).

Sometimes, the univariate function ϕ in the above definition is referred to as the radial function, but we will not do this here. We will, however, make the following general assumption.

Remark 1.

We will always assume that the univariate function \(\phi: [0,\infty ) \rightarrow \mathbb{R}\) is defined on all of \(\mathbb{R}\) by even extension, i.e. by \(\phi (-r):=\phi (r)\) for r > 0.

We will say that such a \(\phi: [0,\infty ) \rightarrow \mathbb{R}\) generates or induces a multivariate function on \(\mathbb{R}^{d}\). This induced function is simply defined to be \(\varPhi (\mathbf{x}):=\phi (\|\mathbf{x}\|_{2})\), \(\mathbf{x} \in \mathbb{R}^{d}\).

In this article, we will predominantly be dealing with radial functions \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) having a compact support, i.e. with functions for which

$$\displaystyle{\mbox{ supp}\varPhi:= \overline{\{\mathbf{x} \in \mathbb{R}^{d}:\varPhi (\mathbf{x})\neq 0\}}}$$

is bounded. In general, we will simply assume that the support of Φ is given by the unit ball. In that case, we can introduce scaled versions, i.e.

$$\displaystyle{\varPhi _{\delta }(\mathbf{x}):=\delta ^{-d}\varPhi (\mathbf{x}/\delta )}$$

with δ > 0, which obviously have support in the ball about the origin with radius δ. Besides scaling we can also shift a scaled basis function. This allows us to define our basic approximation spaces. To this end let \(\varOmega \subseteq \mathbb{R}^{d}\) be a bounded domain and let \(X =\{ \mathbf{x}_{1},\ldots,\mathbf{x}_{N}\} \subseteq \varOmega\) be a set of distinct points, the data sites. Associated to such a point set are two geometric quantities, the fill distance or mesh norm \(h_{X,\varOmega }\) and the separation distance \(q_{X}\), which are defined to be

$$\displaystyle\begin{array}{rcl} h_{X,\varOmega }&:=& \sup _{\mathbf{x}\in \varOmega }\min _{1\leq j\leq N}\|\mathbf{x} -\mathbf{x}_{j}\|_{2}, {}\\ q_{X}&:=& \min _{1\leq j\neq k\leq N}\|\mathbf{x}_{j} -\mathbf{x}_{k}\|_{2}, {}\\ \end{array}$$

respectively. The fill distance describes how well the scattered points X “cover” the domain Ω in the following sense: \(h_{X,\varOmega }\) is the radius of the largest ball with centre in Ω that contains no data site in its interior. The fill distance is the crucial quantity for understanding approximation orders. The separation distance \(q_{X}\) is the smallest distance between any two distinct data sites; it becomes important when looking at stability issues.
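Both quantities are straightforward to estimate numerically. The following is a minimal sketch (ours, for illustration only), assuming Ω = [0, 1]2 and approximating the supremum over Ω by a dense sample grid, whose resolution controls the accuracy of the fill distance estimate.

```python
import numpy as np
from scipy.spatial import cKDTree

def fill_distance(X, omega_samples):
    # h_{X,Omega}: largest distance from a point of Omega to its nearest
    # data site; Omega is represented by a dense point sample.
    tree = cKDTree(X)
    dist, _ = tree.query(omega_samples)
    return dist.max()

def separation_distance(X):
    # q_X: smallest distance between two distinct data sites; query with
    # k=2, since the nearest neighbour of each site is the site itself.
    tree = cKDTree(X)
    dist, _ = tree.query(X, k=2)
    return dist[:, 1].min()

rng = np.random.default_rng(0)
X = rng.random((200, 2))                       # scattered sites in [0,1]^2
grid = np.linspace(0.0, 1.0, 200)
omega = np.stack(np.meshgrid(grid, grid), axis=-1).reshape(-1, 2)
print(fill_distance(X, omega), separation_distance(X))
```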

A scaled, compactly supported kernel and a set of data sites are enough to define a first approximation space via

$$\displaystyle{ W_{X} = W_{X,\varPhi _{\delta }}:= \mbox{ span}\{\varPhi _{\delta }(\cdot -\mathbf{x}): \mathbf{x} \in X\}. }$$
(1)

We will refer to this as a local approximation space since we will not deal with only one data set and one associated approximation space, but with a sequence of data sets \(X_{1},X_{2},\ldots\) with mesh norms \(h_{j} = h_{X_{j},\varOmega }\), which are supposed to be monotonically decreasing. To ensure a certain uniformity in the decrease, we will assume that \(h_{j+1} \approx \mu h_{j}\) for some fixed μ ∈ (0, 1). For some of our results we will require that the sequence is quasi-uniform, which means that there is a constant \(c_{q}\) such that, with \(q_{j} = q_{X_{j}}\),

$$\displaystyle{q_{j} \leq h_{j} \leq c_{q}q_{j}.}$$

Having this sequence of data sets X j , it is clear that we can build a local approximation space \(W_{j} = W_{X_{j}}\) for each X j . To this end, we will define a kernel \(\varPhi _{j}:\varOmega \times \varOmega \rightarrow \mathbb{R}\) for each level. In our application this kernel will be given by the scaled version of a fixed translation invariant radial basis function. To be more precise, we assume that there is a compactly supported function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) with support in the unit ball B(0, 1) and that, for each level, there is a scaling parameter δ j  > 0 such that we can define

$$\displaystyle{\varPhi _{j}(\mathbf{x},\mathbf{y}) =\delta _{ j}^{-d}\varPhi ((\mathbf{x} -\mathbf{y})/\delta _{ j}).}$$

If we consider y to be fixed, it follows that the function Φ j (⋅ , y) has support in B(y, δ j ), the ball with radius δ j and centre y.

As we assume that the data sets become denser and denser, we will also assume that the support radii become smaller and smaller, usually in the same way, i.e., we will assume that \(\delta _{j} =\nu h_{j}\) with a constant ν > 0, which we will link to μ later on. Note that this also leads to \(\delta _{j+1} \approx \mu \delta _{j}\).

With the data sets and the associated kernels at hand, we can build approximation spaces of the form

$$\displaystyle{ W_{j} = \mbox{ span}\{\varPhi _{j}(\cdot,\mathbf{x}): \mathbf{x} \in X_{j}\}, }$$
(2)

representing details on level j. Thus, the approximation of our function will come from the sum of these spaces, i.e., we have to investigate the approximation power of

$$\displaystyle{V _{n}:= W_{1} + W_{2} + \cdots + W_{n}}$$

for n →∞.

The rest of the paper is organised as follows. In the next section, we will discuss the relevant material on optimal recovery, reproducing kernel Hilbert spaces and Sobolev spaces. After that, we will discuss compactly supported radial basis functions. In the fourth section, we will then address multiscale interpolation and approximation.

2 Optimal Recovery and Reproducing Kernel Hilbert Spaces

2.1 The Reconstruction Problem

Our main tool in using discrete approximation spaces W X of the form (1) will be optimal recovery and interpolation, which turns out to be the same in this context. Throughout this section, we will consider only local approximation spaces \(W_{X,\varPhi _{\delta }}\). In particular, we want to keep the scaling factor δ fixed but might vary the data set X.

To understand why radial basis functions, or, more generally, positive definite functions are natural tools in multivariate approximation theory, we start with an abstract result from linear algebra.

To this end, let H be a Hilbert space and denote its dual by \(H^{{\ast}}\). Suppose \(\varLambda =\{\lambda _{1},\ldots,\lambda _{N}\} \subseteq H^{{\ast}}\) is a set of linearly independent functionals on H and \(f_{1},\ldots,f_{N} \in \mathbb{R}\) are certain given values. Then a generalised recovery problem asks for a function s ∈ H with \(\lambda _{j}(s) = f_{j}\), 1 ≤ j ≤ N. We will call s a generalised interpolant. Usually, there are infinitely many possible generalised interpolants and hence there is a need to distinguish one of them. Moreover, if the data contain noise, interpolation is not an appropriate way of reconstructing the unknown function f. Hence, we will also consider a method which was originally introduced in the context of splines (see [42]).

Definition 2.

The norm-minimal generalised interpolant is the element s 0 ∈ H satisfying

$$\displaystyle{ \|s_{0}\| =\min \{\| s\|: s \in H,\lambda _{j}(s) = f_{j},1 \leq j \leq N\}. }$$
(3)

Given a smoothing parameter ɛ > 0, the optimal recovery or smoothing spline is the element s ɛ  ∈ H satisfying

$$\displaystyle{ \min \left \{\sum _{j=1}^{N}\vert \lambda _{ j}(s) - f_{j}\vert ^{2} +\varepsilon \| s\|_{ H}^{2}\right \}. }$$
(4)

The solutions to both problems (3) and (4) are unique. A proof of the following result can be found in [42, 46].

Theorem 1.

Let H be a Hilbert space, \(\lambda _{1},\ldots,\lambda _{N} \in H^{{\ast}}\) linearly independent functionals with Riesz representers \(v_{j}\), 1 ≤ j ≤ N, and let \(f_{1},\ldots,f_{N} \in \mathbb{R}\) be given. The unique solutions \(s_{0}\) of (3) and \(s_{\varepsilon }\) of (4) can be written as

$$\displaystyle{ s_{\varepsilon } =\sum _{ j=1}^{N}\alpha _{ j}v_{j}, }$$
(5)

ɛ ≥ 0, where the coefficients {α j } are the solution of the linear system

$$\displaystyle{ (A +\varepsilon I)\boldsymbol{\alpha } = \mathbf{f} }$$
(6)

where A = (λ i (v j )) and f = (f 1 ,…,f N ) T.

Hence, the optimal recovery problem can be solved constructively once the Riesz representers of the functionals are known. For general Hilbert spaces of functions this is not an easy task. However, for reproducing kernel Hilbert spaces it becomes simple once the reproducing kernel is known.

Definition 3.

A Hilbert space H of functions \(f:\varOmega \rightarrow \mathbb{R}\) is a reproducing kernel Hilbert space if there is a kernel \(\varPhi:\varOmega \times \varOmega \rightarrow \mathbb{R}\) with

  1. 1.

    Φ(⋅ , x) ∈ H for all x ∈ Ω,

  2. 2.

    f(x) = 〈f, Φ(⋅ , x)〉 H for all x ∈ Ω and all f ∈ H.

In such a reproducing kernel Hilbert space, the Riesz representer of any functional \(\lambda \in H^{{\ast}}\) is simply given by applying λ to one of the arguments of the kernel, i.e. \(v_{\lambda } =\lambda ^{y}\varPhi (\cdot,\mathbf{y})\). In this paper, we will exclusively be concerned with point evaluation functionals, i.e. \(\lambda _{j}(f) = f(\mathbf{x}_{j})\). Hence, the optimal recovery takes the simple form

$$\displaystyle{s_{\varepsilon } =\sum _{ j=1}^{N}\alpha _{ j}\varPhi (\cdot,\mathbf{x}_{j})}$$

and the matrix A thus has entries Φ(x i , x j ).
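As an illustration, here is a minimal sketch of this recovery procedure, using the compactly supported Wendland function ϕ 3,1 from Section 3.2 (positive definite for d ≤ 3) as the kernel. The function names are ours and not taken from any particular library.

```python
import numpy as np

def phi31(r):
    # Wendland function phi_{3,1}(r) = (1-r)_+^4 (4r+1), supported in [0,1]
    return np.where(r < 1.0, (1.0 - r)**4 * (4.0*r + 1.0), 0.0)

def kernel_matrix(X, Y, delta):
    # entries Phi_delta(x_i - y_j) with Phi_delta = delta^{-d} Phi(./delta)
    d = X.shape[1]
    r = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1) / delta
    return delta**(-d) * phi31(r)

def optimal_recovery(X, fvals, delta, eps=0.0):
    # solve (A + eps*I) alpha = f, cf. (6); eps = 0 gives the interpolant
    A = kernel_matrix(X, X, delta)
    alpha = np.linalg.solve(A + eps*np.eye(len(X)), fvals)
    return lambda Y: kernel_matrix(Y, X, delta) @ alpha

# usage: recover f(x,y) = exp(-x*y) from 100 scattered samples in [0,1]^2
rng = np.random.default_rng(1)
X = rng.random((100, 2))
s = optimal_recovery(X, np.exp(-X[:, 0]*X[:, 1]), delta=0.5)
```

Note that the sparsity of A is controlled by δ: only pairs of data sites within distance δ of each other contribute non-zero entries.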

2.2 Sobolev Spaces

The reproducing kernel Hilbert spaces we will be concerned with are Sobolev spaces. For an integer \(\sigma = k \in \mathbb{N}_{0}\) the Sobolev space H k(Ω) consists of all those functions u ∈ L 2(Ω) which have weak derivatives \(D^{\boldsymbol{\alpha }}u\) in L 2(Ω) for all \(\vert \boldsymbol{\alpha }\vert \leq k\). For non-integer σ > 0, the Sobolev space H σ(Ω) can be defined using interpolation theory, see, for example, [1].

The Sobolev embedding theorem guarantees that H σ(Ω) is a reproducing kernel Hilbert space for any σ > d∕2. However, since the reproducing kernel is uniquely determined by the inner product and also depends upon Ω in our setting, it is in general not possible to give an explicit formula for the reproducing kernel. We will circumvent this problem by two means. First of all, we will assume that we can extend our functions defined on Ω to all of \(\mathbb{R}^{d}\), see [4].

Proposition 1.

Let \(\varOmega \subseteq \mathbb{R}^{d}\) be a bounded domain with a Lipschitz boundary. Then, there is a universal extension operator \(E: H^{\sigma }(\varOmega ) \rightarrow H^{\sigma }(\mathbb{R}^{d})\) satisfying

  1. 1.

    Ef|Ω = f

  2. 2.

    \(\|Ef\|_{H^{\sigma }(\mathbb{R}^{d})} \leq C_{\sigma }\|f\|_{H^{\sigma }(\varOmega )}\)

for all f ∈ H σ (Ω) and all σ ≥ 0.

It is important that though the constant C σ  > 0 might depend on σ, the operator E itself does not. The existence of such an operator means that we can equip H σ(Ω) with the equivalent norm \(f\mapsto \|Ef\|_{H^{\sigma }(\mathbb{R}^{d})}\) and thus we can concentrate on determining the reproducing kernel of \(H^{\sigma }(\mathbb{R}^{d})\) for σ > d∕2. This becomes particularly easy if we write the norm on \(H^{\sigma }(\mathbb{R}^{d})\) using the Fourier transform \(\widehat{f}\) of \(f \in H^{\sigma }(\mathbb{R}^{d})\), σ > d∕2, defined by

$$\displaystyle{\widehat{f}(\boldsymbol{\omega }) = (2\pi )^{-d/2}\int _{ \mathbb{R}^{d}}f(\mathbf{x})e^{-i\mathbf{x}^{T}\boldsymbol{\omega } }d\mathbf{x}.}$$

With this, we can define the norm on \(H^{\sigma }(\mathbb{R}^{d})\) to be

$$\displaystyle{ \|f\|_{H^{\sigma }(\mathbb{R}^{d})}^{2} =\int _{ \mathbb{R}^{d}}\vert \widehat{f}(\boldsymbol{\omega })\vert ^{2}(1 +\| \boldsymbol{\omega }\|_{ 2}^{2})^{\sigma }d\boldsymbol{\omega }. }$$
(7)

The reproducing kernel can then be written in translation-invariant form, i.e. it satisfies Φ(x, y) = Φ 0(xy) with \(\varPhi _{0}: \mathbb{R}^{d} \rightarrow \mathbb{R}\) for which we will simply write Φ(x, y) = Φ(xy) in an abuse of notation. The function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) is then simply the inverse Fourier transform of one over the weight function, i.e.

$$\displaystyle{\widehat{\varPhi }(\boldsymbol{\omega }) = (1 +\| \boldsymbol{\omega }\|_{2}^{2})^{-\sigma },}$$

which is a consequence of the following general result, see [46].
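For instance, for d = 1 and σ = 1 this inverse Fourier transform can be evaluated in closed form (a standard computation, added here for illustration) and yields the exponential kernel

$$\displaystyle{\varPhi (x) = \sqrt{\pi /2}\,e^{-\vert x\vert },\qquad x \in \mathbb{R},}$$

so that \(H^{1}(\mathbb{R})\) possesses an explicit, though globally supported, reproducing kernel with respect to the norm (7).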

Proposition 2.

Suppose \(\varPhi \in L_{1}(\mathbb{R}^{d})\) has a Fourier transform satisfying

$$\displaystyle{ c_{1}(1 +\| \boldsymbol{\omega }\|_{2}^{2})^{-\sigma }\leq \widehat{\varPhi } (\boldsymbol{\omega }) \leq c_{ 2}(1 +\| \boldsymbol{\omega }\|_{2}^{2})^{-\sigma }. }$$
(8)

Then, Φ also defines a reproducing kernel on \(H^{\sigma }(\mathbb{R}^{d})\) with respect to the inner product defined by

$$\displaystyle{\langle f,g\rangle _{\varPhi }:=\int _{\mathbb{R}^{d}}\frac{\widehat{f}(\boldsymbol{\omega })\overline{\widehat{g}(\boldsymbol{\omega })}} {\widehat{\varPhi }(\boldsymbol{\omega })} d\boldsymbol{\omega }.}$$

The norm associated to this inner product is equivalent to the Sobolev norm (7).

This also opens the door to the construction of a variety of kernels, which are also reproducing kernels in \(H^{\sigma }(\mathbb{R}^{d})\). We will only consider compactly supported Φ’s having a Fourier transform with such a decay. However, since for t ≥ 0 and σ ≥ 0, we have

$$\displaystyle{1 + t^{\sigma } \leq (1 + t)^{\sigma } \leq 2^{\sigma }(1 + t^{\sigma })}$$

we see that the decay property (8) is equivalent to the decay property

$$\displaystyle{ c_{1}(1 +\| \boldsymbol{\omega }\|_{2}^{2\sigma })^{-1} \leq \widehat{\varPhi } (\boldsymbol{\omega }) \leq c_{ 2}(1 +\| \boldsymbol{\omega }\|_{2}^{2\sigma })^{-1}, }$$
(9)

which is slightly easier to handle in what follows. An immediate consequence of these observations is the following one.

Corollary 1.

Let \(\varPhi \in L_{1}(\mathbb{R}^{d})\) be a reproducing kernel of \(H^{\sigma }(\mathbb{R}^{d})\) , σ > d∕2, i.e. its Fourier transform satisfies (9). For δ ∈ (0,1] let Φ δ := δ −d Φ(⋅∕δ). Then, Φ δ is also a reproducing kernel of \(H^{\sigma }(\mathbb{R}^{d})\) and for every \(g \in H^{\sigma }(\mathbb{R}^{d})\) , we have the norm equivalence

$$\displaystyle{c_{1}^{1/2}\|g\|_{\varPhi _{\delta }} \leq \| g\|_{ H^{\sigma }(\mathbb{R}^{d})} \leq c_{2}^{1/2}\delta ^{-\sigma }\|g\|_{\varPhi _{\delta }}.}$$

Proof.

This follows immediately from \(\widehat{\varPhi _{\delta }} =\widehat{\varPhi } (\delta \cdot )\). □ 

It is important to note that the Hilbert space in which the scaled kernel Φ δ is the reproducing kernel is always \(H^{\sigma }(\mathbb{R}^{d})\), i.e. the space itself does not depend on the scaling parameter. However, the norm associated to the inner product in which Φ δ is the reproducing kernel does depend on δ > 0, and so do the equivalence constants. While we always have \(\|f\|_{\varPhi _{\delta }} \leq C\|f\|_{H^{\sigma }(\mathbb{R}^{d})}\) with C > 0 independent of δ > 0, the constant in the other estimate \(\|f\|_{H^{\sigma }(\mathbb{R}^{d})} \leq C\delta ^{-\sigma }\|f\|_{\varPhi _{\delta }}\) tends to infinity as δ → 0. This is not surprising at all, since \(\|\cdot \|_{\varPhi _{\delta }}\rightarrow \|\cdot \|_{L_{2}(\mathbb{R}^{d})}\) for δ → 0 and the constant has to balance this loss of derivatives in the norm.

3 Compactly Supported Radial Basis Functions

In this section we will review recent results on compactly supported radial basis functions. We are particularly interested in the construction of such basis functions and in those compactly supported RBFs which are reproducing kernels of Sobolev spaces.

Much of the general theory in this section can be found in [48]. Newer results are based on [8, 9, 22, 35, 37, 43, 50, 51].

3.1 Construction of Compactly Supported RBFs

A famous result of Bochner states that a continuous, integrable function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) is positive definite and hence the reproducing kernel of its associated reproducing kernel Hilbert space, if and only if it possesses a non-negative, non-vanishing Fourier transform

$$\displaystyle{\widehat{\varPhi }(\boldsymbol{\omega }):= (2\pi )^{-d/2}\int _{ \mathbb{R}^{d}}\varPhi (\mathbf{x})e^{-i\mathbf{x}^{T}\boldsymbol{\omega } }d\mathbf{x},\qquad \boldsymbol{\omega } \in \mathbb{R}^{d},}$$

see [46] for a proof. For a radial, positive definite function, it is well-known that the Fourier transform is also radial and can be reduced to a one-dimensional Hankel transform. This will be essential in what we now want to do. To formulate this, we need to introduce Bessel functions of the first kind.

Definition 4.

The Bessel function of the first kind of order \(\nu \in \mathbb{C}\) is defined by

$$\displaystyle{J_{\nu }(z):=\sum _{ m=0}^{\infty }\frac{(-1)^{m}(\frac{1} {2}z)^{2m+\nu }} {m!\varGamma (\nu +m + 1)},\qquad z \in \mathbb{C}.}$$

This gives the desired representation in the case of a positive definite, radial function.

Theorem 2.

Suppose \(\varPhi \in L_{1}(\mathbb{R}^{d}) \cap C(\mathbb{R}^{d})\) is radial, i.e. \(\varPhi (\mathbf{x}) =\phi (\|\mathbf{x}\|_{2})\), \(\mathbf{x} \in \mathbb{R}^{d}\). Then its Fourier transform \(\widehat{\varPhi }\) is also radial, i.e. \(\widehat{\varPhi }(\boldsymbol{\omega }) =\mathscr{ F}_{d}\phi (\|\boldsymbol{\omega }\|_{2})\) with

$$\displaystyle{ \mathscr{F}_{d}\phi (r) = r^{-\frac{d-2} {2} }\int _{0}^{\infty }\phi (t)t^{\frac{d} {2} }J_{\frac{d-2} {2} }(rt)dt. }$$
(10)

Hence, Φ is positive definite if and only if \(\mathscr{F}_{d}\phi \geq 0\) and \(\mathscr{F}_{d}\phi \not\equiv 0\).

An interesting consequence of (10) is that the operator \(\mathscr{F}_{d}\) can be defined not only for \(d \in \mathbb{N}\) but actually for all d ∈ [0,∞), which we will do from now on. However, the connection to a multivariate Fourier transform is, of course, only given if \(d \in \mathbb{N}\). Also, the existence of the integral has to be ensured.

Next, we introduce the key operators to construct compactly supported radial basis functions.

Definition 5.

  1. 1.

Let ϕ be given such that \(t\mapsto t\phi (t)\) is in \(L_{1}[0,\infty )\); then we define for r ≥ 0,

    $$\displaystyle{(\mathscr{I}\phi )(r) =\int _{ r}^{\infty }t\phi (t)dt.}$$
  2. 2.

    For even \(\phi \in C^{2}(\mathbb{R})\) we define for r ≥ 0,

    $$\displaystyle{(\mathscr{D}\phi )(r) = -\frac{1} {r}\phi ^{{\prime}}(r).}$$

In both cases the resulting functions should be seen as even functions by even extension.

Obviously, these operators are inverse to each other in the sense that \(\mathscr{I}\mathscr{D} = I\) and \(\mathscr{D}\mathscr{I} = I\) whenever these operations are well-defined, where I denotes the identity operator.
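For a concrete check, the following sketch verifies \(\mathscr{D}\mathscr{I}\phi =\phi\) symbolically for ϕ(t) = (1 − t)4(4t + 1) on [0, 1] (the Wendland function ϕ 3,1 of Section 3.2); since ϕ vanishes outside [0, 1], the integral up to ∞ reduces to an integral up to 1.

```python
import sympy as sp

r, t = sp.symbols('r t', positive=True)
phi = (1 - t)**4 * (4*t + 1)                  # compactly supported on [0, 1]

I_phi = sp.integrate(t * phi, (t, r, 1))      # (I phi)(r) = int_r^oo t phi(t) dt
DI_phi = sp.simplify(-sp.diff(I_phi, r) / r)  # (D I phi)(r) = -(I phi)'(r) / r
print(sp.simplify(DI_phi - phi.subs(t, r)))   # prints 0, i.e. D I phi = phi
```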

The importance of these operators comes from the fact that they interact nicely with the Fourier transform operator \(\mathscr{F}_{d}\). We will formulate this in a moment, but before that, we want to generalise them.

Definition 6.

Let α > 0 and assume that \(t\mapsto t^{2\alpha -1}\phi (t)\) is integrable over [0,∞). Then, we define for r ≥ 0,

$$\displaystyle{\mathscr{I}_{\alpha }\phi (r):= \frac{1} {2^{\alpha -1}\varGamma (\alpha )}\int _{r}^{\infty }\phi (t)t(t^{2} - r^{2})^{\alpha -1}dt}$$

and for r < 0 the function \(\mathscr{I}_{\alpha }\phi\) is defined by even extension.

Note that we can rewrite the integral in the definition as

$$\displaystyle{\int _{r}^{\infty }\phi (t)t(t + r)^{\alpha -1}(t - r)^{\alpha -1}dt.}$$

Hence, for α ∈ (0, 1), the singularity at t = r is integrable. Since we will merely be concerned with continuous, compactly supported functions in this paper, the operator \(\mathscr{I}_{\alpha }\) can always be applied to such a function for any α > 0.

We also see that for α = 1 we have \(\mathscr{I}_{1} =\mathscr{ I}\), i.e. both definitions coincide. Finally, we will particularly be interested in the case α = 1∕2. In this case the operator reduces to

$$\displaystyle{\mathscr{I}_{1/2}\phi (r) = \sqrt{\frac{2} {\pi }} \int _{r}^{\infty }\phi (t)t(t^{2} - r^{2})^{-1/2}dt.}$$

Since we are merely interested in compactly supported functions, it is important for us to note that \(\mathscr{I}_{\alpha }\) respects a compact support in the following sense.

Lemma 1.

Assume that the even function \(\phi: \mathbb{R} \rightarrow \mathbb{R}\) has compact support contained in the interval [−R,R]. Then, for each α > 0 the function \(\mathscr{I}_{\alpha }\phi\) also has compact support contained in [−R,R].

We are now going to extend the range of possible operators \(\mathscr{I}_{\alpha }\) by defining them also for α ≤ 0. We will do this first formally in the following definition.

Definition 7.

  1. 1.

    We define \(\mathscr{I}_{0}\) to be the identity operator, i.e. \(\mathscr{I}_{0}\phi =\phi\) for all even functions \(\phi: \mathbb{R} \rightarrow \mathbb{R}\).

  2. 2.

    For \(n \in \mathbb{N}\) we define \(\mathscr{I}_{-n}:=\mathscr{ D}^{n}\).

  3. 3.

    For α > 0 we let k: = ⌈α⌉ and define \(\mathscr{I}_{-\alpha }:=\mathscr{ I}_{-k}\mathscr{I}_{k-\alpha }\).

To see to which functions ϕ we can apply these operators, we restrict ourselves to compactly supported, even functions. If such a function possesses \(\ell+1\) continuous derivatives, then we first note that the derivatives must satisfy \(\phi ^{(j)}(r) = (-1)^{j}\phi ^{(j)}(-r)\). In particular, we see that for odd j the derivative at zero has to vanish. The Taylor expansion about the origin is thus

$$\displaystyle\begin{array}{rcl} \phi (r)& =& \sum _{j=0}^{\ell}\frac{\phi ^{(j)}(0)} {j!} r^{j} + \frac{1} {\ell!} \int _{0}^{r}(r - t)^{\ell}\phi ^{(\ell+1)}(t)dt {}\\ & =& \sum _{j=0}^{\lfloor \ell/2\rfloor }\frac{\phi ^{(2j)}(0)} {(2j)!} r^{2j} + \frac{1} {\ell!} \int _{0}^{r}(r - t)^{\ell}\phi ^{(\ell+1)}(t)dt. {}\\ \end{array}$$

Applying \(\mathscr{I}_{-1} =\mathscr{ D}\) to this expansion yields

$$\displaystyle\begin{array}{rcl} \mathscr{I}_{-1}\phi (r)& =& -\frac{1} {r} \frac{d} {dr}\phi (r) {}\\ & =& -\sum _{j=1}^{\lfloor \ell/2\rfloor } \frac{\phi ^{(2j)}(0)} {(2j - 1)!}r^{2j-2} - \frac{1} {(\ell-1)!} \frac{1} {r}\int _{0}^{r}(r - t)^{\ell-1}\phi ^{(\ell+1)}(t)dt. {}\\ \end{array}$$

The critical term on the right-hand side behaves like

$$\displaystyle{\lim _{r\rightarrow 0}\frac{1} {r}\int _{0}^{r}(r - t)^{\ell-1}\phi ^{(\ell+1)}(t)dt =\lim _{ r\rightarrow 0}(\ell-1)\int _{0}^{r}(r - t)^{\ell-2}\phi ^{(\ell+1)}(t)dt = 0}$$

as long as \(\ell\geq 2\) and like \(\phi ^{(2)}(0)\) for \(\ell= 1\). Hence, for even \(\phi \in C^{2}(\mathbb{R})\) with compact support, \(\mathscr{I}_{-1}\phi\) is well-defined and continuous. We can iterate this process to derive the following result.

Corollary 2.

Let α > 0 and k = ⌈α⌉. If \(\phi \in C^{2k}(\mathbb{R})\) is even and compactly supported, then \(\mathscr{I}_{-\alpha }\phi\) is well-defined, continuous and compactly supported.

However, more important to us is the following observation, which can be derived from the above corollary but also follows directly by induction.

Corollary 3.

If \(k \in \mathbb{N}\) , then for every even \(\phi \in C(\mathbb{R})\) with compact support, \(\psi:=\mathscr{ I}_{k}\phi\) and \(\mathscr{I}_{-k}\psi\) are well-defined and satisfy

$$\displaystyle{\mathscr{I}_{-k}\mathscr{I}_{k}\phi =\phi.}$$

Proof.

We already know that ψ is well-defined. To see that \(\mathscr{I}_{-k}\psi\) is also well-defined, we first recall that the fundamental theorem of calculus immediately yields \(\mathscr{I}_{-1}\mathscr{I}_{1}\phi =\phi\). For k ≥ 2, we compute

$$\displaystyle\begin{array}{rcl} \mathscr{I}_{-1}\mathscr{I}_{k}\phi (r)& =& - \frac{1} {2^{k-1}(k - 1)!} \frac{1} {r} \frac{d} {dr}\int _{r}^{\infty }\phi (t)t(t^{2} - r^{2})^{k-1}dt {}\\ & =& \frac{2r(k - 1)} {2^{k-1}(k - 1)!} \frac{1} {r}\int _{r}^{\infty }\phi (t)t(t^{2} - r^{2})^{k-2}dt {}\\ & =& \frac{1} {2^{k-2}(k - 2)!}\int _{r}^{\infty }\phi (t)t(t^{2} - r^{2})^{k-2}dt {}\\ & =& \mathscr{I}_{k-1}\phi (r) {}\\ \end{array}$$

and the rest follows by induction. □ 

To understand the interaction of the operators \(\mathscr{I}_{\alpha }\) and \(\mathscr{F}_{d}\), we need the following relation on Bessel functions.

Lemma 2.

For μ > −1∕2 and ν > −1 we have

$$\displaystyle{J_{\mu +\nu +1}(t) = \frac{t^{\nu +1}} {2^{\nu }\varGamma (\nu +1)}\int _{0}^{1}J_{\mu }(tu)u^{\mu +1}(1 - u^{2})^{\nu }du,\qquad t > 0.}$$

The proof of this lemma starts with the integral on the right-hand side. It uses the definition of the Bessel function as an infinite series, exchanges integration and summation, integrates the resulting terms and interprets the result as a Bessel function again. Details can be found in [38, Lemma  4.13].

Theorem 3.

For each d,α > 0 we have

$$\displaystyle{ \mathscr{F}_{d}\mathscr{I}_{\alpha }\phi =\mathscr{ F}_{d+2\alpha }\phi }$$
(11)

Hence, ϕ defines a positive definite function on \(\mathbb{R}^{d+2\alpha }\) if and only if \(\mathscr{I}_{\alpha }\phi\) defines a positive definite function on \(\mathbb{R}^{d}\).

Proof.

We compute \(\mathscr{F}_{d}\mathscr{I}_{\alpha }\) using Theorem 2 and Fubini’s theorem

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{d}\mathscr{I}_{\alpha }\phi (r)& =& \frac{1} {2^{\alpha -1}\varGamma (\alpha )}r^{-\frac{d-2} {2} }\int _{0}^{\infty }\int _{t}^{\infty }\phi (s)s(s^{2} - t^{2})^{\alpha -1}t^{\frac{d} {2} }J_{\frac{d-2} {2} }(rt)dsdt {}\\ & =& \frac{1} {2^{\alpha -1}\varGamma (\alpha )}r^{-\frac{d-2} {2} }\int _{0}^{\infty }\phi (s)s\int _{0}^{s}(s^{2} - t^{2})^{\alpha -1}t^{\frac{d} {2} }J_{\frac{d-2} {2} }(rt)dtds. {}\\ \end{array}$$

After the change of variables u := t∕s, the inner integral can be computed using Lemma 2:

$$\displaystyle\begin{array}{rcl} \int _{0}^{s}(s^{2} - t^{2})^{\alpha -1}t^{\frac{d} {2} }J_{\frac{d-2} {2} }(rt)dt& =& \int _{0}^{1}(s^{2} - u^{2}s^{2})^{\alpha -1}(us)^{\frac{d} {2} }J_{\frac{d-2} {2} }(rsu)sdu {}\\ & =& s^{2\alpha -1+\frac{d} {2} }\int _{0}^{1}(1 - u^{2})^{\alpha -1}u^{\frac{d} {2} }J_{\frac{d-2} {2} }(rsu)du {}\\ & =& s^{2\alpha -1+\frac{d} {2} }2^{\alpha -1}\varGamma (\alpha )(rs)^{-\alpha }J_{\frac{d-2} {2} +\alpha }(rs). {}\\ \end{array}$$

Inserting this into the above formula for \(\mathscr{F}_{d}\mathscr{I}_{\alpha }\) immediately yields

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{d}\mathscr{I}_{\alpha }\phi (r)& =& r^{-\frac{d-2} {2} -\alpha }\int _{0}^{\infty }\phi (s)ss^{\alpha -1+\frac{d} {2} }J_{\frac{d-2} {2} +\alpha }(rs)ds {}\\ & =& r^{-\frac{d+2\alpha -2} {2} }\int _{0}^{\infty }\phi (s)s^{\frac{d+2\alpha } {2} }J_{\frac{d+2\alpha -2} {2} }(rs)ds {}\\ & =& \mathscr{F}_{d+2\alpha }\phi (r). {}\\ \end{array}$$

 □ 

In particular, this means the following for the operators \(\mathscr{I} =\mathscr{ I}_{1}\) and \(\mathscr{I}_{1/2}\).

Corollary 4.

  • A function ϕ induces a positive definite function on \(\mathbb{R}^{d+2}\) if and only if \(\mathscr{I}\phi\) induces a positive definite function on \(\mathbb{R}^{d}\).

  • A function ϕ induces a positive definite function on \(\mathbb{R}^{d+1}\) if and only if \(\mathscr{I}_{1/2}\phi\) induces a positive definite function on \(\mathbb{R}^{d}\).

However, (11) has another important consequence. If \(\phi \in C(\mathbb{R})\) has compact support, then we can interpret the classical Fourier transform of ϕ also in the L 2 sense. In the L 2-sense, ϕ is uniquely determined by its Fourier transform and hence we have the following result.

Corollary 5.

Let α,β > 0. Assume that \(\phi \in C(\mathbb{R})\) is even and has compact support. Then we have

$$\displaystyle{ \mathscr{I}_{\alpha }\mathscr{I}_{\beta }\phi =\mathscr{ I}_{\alpha +\beta }\phi =\mathscr{ I}_{\beta }\mathscr{I}_{\alpha }\phi }$$
(12)

and

$$\displaystyle{ \mathscr{I}_{-\alpha }\mathscr{I}_{\alpha }\phi =\phi. }$$
(13)

Proof.

Using (11) yields

$$\displaystyle{\mathscr{F}_{d}\mathscr{I}_{\alpha }\mathscr{I}_{\beta }\phi =\mathscr{ F}_{d+2\alpha }\mathscr{I}_{\beta }\phi =\mathscr{ F}_{d+2\alpha +2\beta }\phi.}$$

The other expressions in the first identity have the same Fourier transform and hence all of the stated functions must be the same.

To prove the second equality, let k = ⌈α⌉. By definition, we have \(\mathscr{I}_{-\alpha } =\mathscr{ I}_{-k}\mathscr{I}_{k-\alpha }\). Hence,

$$\displaystyle{\mathscr{I}_{-\alpha }\mathscr{I}_{\alpha }\phi =\mathscr{ I}_{-k}\mathscr{I}_{k-\alpha }\mathscr{I}_{\alpha }\phi =\mathscr{ I}_{-k}\mathscr{I}_{k}\phi =\phi,}$$

where we first have used (12) and then Corollary 3. □ 

3.2 Wendland Functions and Generalised Wendland Functions

We will now describe a popular class of compactly supported radial basis functions, which is widely used in applications. The starting point is the cut-off power function

$$\displaystyle{ \phi _{\nu }(r) = (1 - r)_{+}^{\nu } }$$
(14)

which is known to induce a positive definite function on \(\mathbb{R}^{d}\) if ν ≥ (d + 1)∕2, see [3, 46].

Definition 8.

The Wendland function of smoothness 2k for space dimension d is defined by

$$\displaystyle\begin{array}{rcl} \phi _{d,k}(r)& =& \mathscr{I}_{k}\phi _{\lfloor d/2\rfloor +k+1}(r) {}\\ & =& \frac{1} {2^{k-1}(k - 1)!}\int _{r}^{1}(1 - t)^{\lfloor d/2\rfloor +k+1}t(t^{2} - r^{2})^{k-1}dt,\qquad 0 \leq r \leq 1. {}\\ \end{array}$$

The definition shows that we obtain the same function for even d = 2n and odd d = 2n + 1. We will address this issue later on. However, it is also clear from this definition that ϕ d, k restricted to [0, 1] is a polynomial of degree ⌊d∕2⌋ + 3k + 1, which can easily be computed. Some of them are, up to a constant factor, listed in Table 1.

Table 1 Wendland functions.
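The entries of such a table can be generated symbolically. The sketch below iterates the operator \(\mathscr{I}\) (which, by (12), reproduces \(\mathscr{I}_{k}\) including its normalisation) applied to the cut-off power (14); the output for d = 3, k = 1 matches the function ϕ 3,1 used in Lemma 4 below.

```python
import sympy as sp

r, t = sp.symbols('r t', nonnegative=True)

def wendland(d, k):
    # phi_{d,k} = I_k phi_{floor(d/2)+k+1}, computed by applying I k times
    psi = (1 - t)**(d // 2 + k + 1)           # cut-off power (14) on [0, 1]
    for _ in range(k):
        psi = sp.integrate(t * psi, (t, r, 1)).subs(r, t)
    return sp.factor(psi.subs(t, r))

print(wendland(3, 0))   # (1 - r)^2
print(wendland(3, 1))   # (1 - r)^4 (4r + 1)/20
print(wendland(3, 2))   # (1 - r)^6 (35r^2 + 18r + 3)/1680
```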

The reason why these functions have become quite popular is summarised in the following theorem.

Theorem 4.

The functions ϕ d,k are positive definite on \(\mathbb{R}^{d}\) and are of the form

$$\displaystyle{\phi _{d,k}(r) = \left \{\begin{array}{ll} p_{d,k}(r),&0 \leq r \leq 1, \\ 0, &r > 1, \end{array} \right.}$$

with a univariate polynomial p d,k of degree ⌊d∕2⌋ + 3k + 1. They possess continuous derivatives up to order 2k. The degree of p d,k is minimal for a given space dimension d and smoothness 2k, and the functions are, up to a constant factor, uniquely determined by this setting.

The above defined functions have been generalised in the following way, see [8, 9, 22, 35].

Definition 9.

Let α > 0 and ν > −1 such that α +ν > 0. The generalised Wendland functions are defined to be

$$\displaystyle{\psi _{\nu,\alpha }(r):=\mathscr{ I}_{\alpha }\phi _{\nu }(r) = \frac{1} {2^{\alpha -1}\varGamma (\alpha )}\int _{r}^{1}(1 - t)^{\nu }t(t^{2} - r^{2})^{\alpha -1}dt.}$$

Obviously, we have

$$\displaystyle{\phi _{d,k} =\psi _{\lfloor d/2\rfloor +k+1,k}}$$

so that the new functions are indeed a generalisation of the old ones. However, for arbitrary α and ν, we can neither expect ψ ν, α to be positive definite on \(\mathbb{R}^{d}\) nor will ψ ν, α in general be representable by a univariate polynomial within its support.

Nonetheless, using the machinery so far, we can compute the Fourier transform of these functions as

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{d}\psi _{\nu,\alpha }(r)& =& \mathscr{F}_{d+2\alpha }\phi _{\nu }(r) = r^{-(d-2)/2-\alpha }\int _{ 0}^{1}s^{d/2+\alpha }(1 - s)^{\nu }J_{ (d-2)/2+\alpha }(rs)ds {}\\ & =& r^{-d-2\alpha -\nu }\int _{ 0}^{r}t^{d/2+\alpha }(r - t)^{\nu }J_{ (d-2)/2+\alpha }(t)dt. {}\\ \end{array}$$

The latter integral has intensively been studied and can be expressed using hyper-geometric series. Recall that such a series is formally defined to be

$$\displaystyle\begin{array}{rcl} _{p}F_{q}(z)& \equiv & _{p}F_{q}[a_{1},\ldots,a_{p};b_{1},\ldots,b_{q};z] {}\\ &:=& \sum _{n=0}^{\infty }\frac{(a_{1})_{n}\cdots (a_{p})_{n}} {(b_{1})_{n}\cdots (b_{q})_{n}} \frac{z^{n}} {n!} =:\sum _{ n=0}^{\infty }\frac{(\mathbf{a})_{n}} {(\mathbf{b})_{n}} \frac{z^{n}} {n!}, {}\\ \end{array}$$

where we assume that none of the \(b_{1},\ldots,b_{q}\) is a non-positive integer and use the Pochhammer symbol defined as \((a)_{n}:= 1\) for n = 0 and \((a)_{n}:= a(a + 1)\cdots (a + n - 1) =\varGamma (a + n)/\varGamma (a)\), together with the abbreviations \((\mathbf{a})_{n}:= (a_{1})_{n}\cdots (a_{p})_{n}\) and \((\mathbf{b})_{n}:= (b_{1})_{n}\cdots (b_{q})_{n}\). Such a series is known to converge point-wise if p ≤ q, which is the case we are interested in.

A few simple properties and the connection to the integrals we are interested in are collected in the next lemma.

Lemma 3.

  1. 1.

    If a p = b q , then p F q = p−1 F q−1.

  2. 2.

    For β + μ > −1 and λ > −1 we have

    $$\displaystyle\begin{array}{rcl} & & \int _{0}^{r}(r - t)^{\lambda }t^{\mu }J_{\beta }(t)dt = \frac{\varGamma (\lambda +1)\varGamma (\beta +\mu + 1)} {\varGamma (\beta +1)\varGamma (\beta +\lambda +\mu +2)}2^{-\beta }r^{\beta +\lambda +\mu +1} {}\\ & & \quad \mbox{ } \cdot _{2}F_{3}\left [\frac{\beta +\mu + 1} {2}, \frac{\beta +\mu + 2} {2};\beta +1, \frac{\beta +\lambda +\mu +2} {2}, \frac{\beta +\lambda +\mu +3} {2};-r^{2}/4\right ]. {}\\ \end{array}$$
  3. 3.

    The derivatives of the hyper-geometric functions satisfy for \(n \in \mathbb{N}\) :

    $$\displaystyle\begin{array}{rcl} & & \frac{d^{n}} {dz^{n}}_{p}F_{q}[a_{1},\ldots,a_{p};b_{1},\ldots,b_{q};z] {}\\ & =& \frac{(a_{1})_{n}\cdots (a_{p})_{n}} {(b_{1})_{n}\cdots (b_{q})_{n}} _{p}F_{q}[a_{1} + n,\ldots a_{p} + n;b_{1} + n,\ldots,b_{q} + n;z]. {}\\ \end{array}$$
  4. 4.

    For r > 0 we have

    $$\displaystyle{_{1}F_{2}\left [a;a + \frac{b} {2},a + \frac{b + 1} {2};-r^{2}/4\right ] > 0}$$

    if a and b satisfy b ≥ 2a ≥ 0 or b ≥ a ≥ 1 or 0 ≤ a ≤ 1,b ≥ 1. The hyper-geometric series cannot be strictly positive if 0 ≤ b < a or if a = b ∈ (0,1).

Proof.

The first statement is obvious, the third statement can simply be checked by differentiation under the sum. The second statement can be found, for example, in [11, 13.1(56)]. The final statement is a consequence of the findings in [30]. □ 

Hence, we can continue to compute our Fourier transform. Setting μ: = d∕2 +α, λ: = ν and β: = (d − 2)∕2 +α, we see that the second statement of the lemma yields

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{d}\psi _{\nu,\alpha }(r)& =& r^{-d-2\alpha -\nu }\int _{ 0}^{r}t^{d/2+\alpha }(r - t)^{\nu }J_{ (d-2)/2+\alpha }(t)dt {}\\ & =& \frac{\varGamma (\nu +1)\varGamma (d + 2\alpha )} {2^{d/2+\alpha -1}\varGamma (d/2+\alpha )\varGamma (d + 2\alpha +\nu +1)} {}\\ & & \mbox{ } \cdot _{2}{}F_{3}\left [\frac{d} {2}+\alpha, \frac{d + 1} {2} +\alpha; \frac{d} {2}+\alpha, \frac{d +\nu +1} {2} +\alpha, \frac{d +\nu +2} {2} +\alpha;-\frac{r^{2}} {4} \right ] {}\\ & =& C_{\nu,\alpha }^{d}{}_{ 1}F_{2}\left [\frac{d + 1} {2} +\alpha; \frac{d +\nu +1} {2} +\alpha, \frac{d +\nu +2} {2} +\alpha;-\frac{r^{2}} {4} \right ], {}\\ \end{array}$$

with

$$\displaystyle{ C_{\nu,\alpha }^{d}:= \frac{\varGamma (\nu +1)\varGamma (d + 2\alpha )} {2^{d/2+\alpha -1}\varGamma (d/2+\alpha )\varGamma (d + 2\alpha +\nu +1)}. }$$
(15)

This allows us to make the following general statement on the generalised Wendland functions, which comes from [8].

Theorem 5.

Let α > 0 and ν > −1 with α + ν > 0. Then, the d-variate Fourier transform of the generalised Wendland function is given by

$$\displaystyle{\mathscr{F}_{d}\psi _{\nu,\alpha }(r) = C_{\nu,\alpha }^{d}\;_{ 1}F_{2}\left [\frac{d + 1} {2} +\alpha; \frac{d +\nu +1} {2} +\alpha, \frac{d +\nu +2} {2} +\alpha;-\frac{r^{2}} {4} \right ].}$$

Hence, ψ ν,α induces a positive definite function on \(\mathbb{R}^{d}\) if

$$\displaystyle{ \nu \geq \frac{d + 1} {2} +\alpha. }$$
(16)

Furthermore, the Fourier transform satisfies

$$\displaystyle{ \frac{d} {dr}\mathscr{F}_{d}\psi _{\nu,\alpha }(r) = -r\mathscr{F}_{d}\psi _{\nu,\alpha +1}(r). }$$
(17)

Proof.

Setting \(a = \frac{d+1} {2} +\alpha\) and b = ν shows that (16) is equivalent to b ≥ a ≥ 1. Hence, we can conclude from Lemma 3 that ψ ν, α is positive definite. To see (17), we also use Lemma 3, which yields

$$\displaystyle\begin{array}{rcl} \frac{d} {dr}\mathscr{F}_{d}\psi _{\nu,\alpha }(r)& =& C_{\nu,\alpha }^{d} \frac{d} {dr}_{1}F_{2}\left [\frac{d + 1} {2} +\alpha; \frac{d +\nu +1} {2} +\alpha, \frac{d +\nu +2} {2} +\alpha;-\frac{r^{2}} {4} \right ] {}\\ & =& \mbox{ } -\frac{r} {2}C_{\nu,\alpha }^{d} \frac{\frac{d+1} {2} +\alpha } {\left (\frac{d+\nu +1} {2} +\alpha \right )\left (\frac{d+\nu +2} {2} +\alpha \right )} {}\\ & & \mbox{ } \cdot _{1}F_{2}\left [\frac{d + 1} {2} +\alpha +1; \frac{d +\nu +1} {2} +\alpha +1, \frac{d +\nu +2} {2} +\alpha +1;-\frac{r^{2}} {4} \right ] {}\\ & =& \mbox{ } - r \frac{C_{\nu,\alpha }^{d}} {C_{\nu,\alpha +1}^{d}} \frac{d + 2\alpha + 1} {(d +\nu +2\alpha + 1)(d +\nu +2\alpha + 2)}\mathscr{F}_{d}\psi _{\nu,\alpha +1}(r) {}\\ & =& \mbox{ } - r\mathscr{F}_{d}\psi _{\nu,\alpha +1}(r), {}\\ \end{array}$$

where the constant expression in the last but one line can be simplified to 1 using the recurrence formula of the Γ-function. □ 

Note that the last property can also be expressed as \(\mathscr{I}_{-1}\mathscr{F}_{d}\psi _{\nu,\alpha } =\mathscr{ F}_{d}\psi _{\nu,\alpha +1}\). It also has the following interesting consequence.

Corollary 6.

The Fourier transform of the generalised Wendland function ψ ν,α is monotonically decreasing if

$$\displaystyle{ \nu \geq \frac{d + 1} {2} +\alpha +1. }$$
(18)

Since the classical Wendland functions ϕ d, k are a special case of the generalised Wendland functions ψ ν, α , with parameters ν = ⌊d∕2⌋ + k + 1 and α = k, we see that (16) is satisfied, i.e. we have an alternative proof that they are positive definite. However, the function ϕ d, k induces a positive definite function not only on \(\mathbb{R}^{d}\) but also on every \(\mathbb{R}^{n}\) with n ≤ d, and the monotonicity condition (18) becomes

$$\displaystyle{\lfloor d/2\rfloor + k + 1 \geq \frac{n + 1} {2} + k + 1}$$

or

$$\displaystyle{\lfloor d/2\rfloor \geq \frac{n + 1} {2}.}$$

Corollary 7.

The Wendland function ϕ d,k induces a positive definite function on \(\mathbb{R}^{n}\) with a monotonically decreasing Fourier transform for all n ≤ 2⌊d∕2⌋− 1.

In [9], there is also a discussion of the decay of the Fourier transform of the generalised Wendland functions. This generalises earlier results from [44].

Theorem 6.

The d-dimensional Fourier transform \(\mathscr{F}_{d}\psi _{\nu,\alpha }\) of the generalised Wendland function ψ ν,α with \(\nu \geq \alpha +\frac{d+1} {2}\) and α > 0 satisfies

$$\displaystyle{c_{1}(1 + r^{2})^{-\alpha -\frac{d+1} {2} } \leq \mathscr{ F}_{d}\psi _{\nu,\alpha }(r) \leq c_{2}(1 + r^{2})^{-\alpha -\frac{d+1} {2} },\qquad r \in \mathbb{R}}$$

with certain constants c 1 ,c 2 > 0 independent of r. Hence, ψ ν,α defines a reproducing kernel of

$$\displaystyle{H^{\alpha +\frac{d+1} {2} }(\mathbb{R}^{d}).}$$

As mentioned above, this recovers and generalises the decay rate established in [44] for \(\mathscr{F}_{d}\phi _{d,k}\), showing that the ϕ d, k are reproducing kernels for \(H^{\sigma }(\mathbb{R}^{d})\) with \(\sigma = \frac{d+1} {2} + k\). While these are integer order Sobolev spaces in odd space dimensions, they are of order integer plus a half in even dimensions. But we are now also in a position to state explicitly compactly supported RBFs which are reproducing kernels in Sobolev spaces of integer order in even dimensions. We only have to work with α = k − 1∕2, \(k \in \mathbb{N}\), and ν ≥ k + d∕2 to obtain a kernel for \(H^{k+d/2}(\mathbb{R}^{d})\). These kernels have a more complicated structure than the original Wendland kernels but are still easily computable. They can best be described by introducing the elementary functions

$$\displaystyle{S(r):= \sqrt{1 - r^{2}},\qquad L(r):=\log \left ( \frac{r} {1 + S(r)}\right ),\qquad r \in (0,1).}$$

Then, some of these functions, together with the space dimension d and the order σ of the Sobolev space in which they are reproducing kernels, are given, up to a constant factor, in Table 2. We have always chosen ν = k + d∕2 = σ, since the decay of the Fourier transform, and hence the Sobolev space, is independent of ν as long as ν ≥ α + (d + 1)∕2 = k + d∕2.

Table 2 The missing Wendland functions.
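Such closed forms can be cross-checked numerically. The sketch below compares the defining integral \(\mathscr{I}_{1/2}\phi _{\nu }\) with a closed-form expression in terms of S(r) and L(r) for ν = 2 (the case d = 2, k = 1, σ = 2); the substitution \(u = (t^{2} - r^{2})^{1/2}\) removes the integrable endpoint singularity. The closed form quoted in the code is our own evaluation of the integral, given here for illustration.

```python
import numpy as np
from scipy.integrate import quad

def psi_by_quadrature(r, nu=2):
    # psi_{nu,1/2}(r) = sqrt(2/pi) int_r^1 (1-t)^nu t (t^2-r^2)^(-1/2) dt;
    # substituting u = sqrt(t^2 - r^2) yields a regular integrand
    S = np.sqrt(1.0 - r**2)
    val, _ = quad(lambda u: (1.0 - np.sqrt(r**2 + u**2))**nu, 0.0, S)
    return np.sqrt(2.0/np.pi) * val

def psi_closed_form(r):
    # evaluation of the same integral for nu = 2 in terms of S(r) and L(r)
    S = np.sqrt(1.0 - r**2)
    L = np.log(r / (1.0 + S))
    return np.sqrt(2.0/np.pi) * ((2.0*r**2 + 1.0)*S + 3.0*r**2*L) / 3.0

r = 0.4
print(psi_by_quadrature(r), psi_closed_form(r))   # agree to quadrature accuracy
```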

Yet another consequence is the existence of compactly supported reproducing kernels in Sobolev spaces.

Corollary 8.

Each \(H^{\sigma }(\mathbb{R}^{d})\) with σ > d∕2 possesses a compactly supported, radial reproducing kernel.

Proof.

The results of Theorem 6 show that \(H^{\sigma }(\mathbb{R}^{d})\) has such a reproducing kernel, namely ψ ν, α , as long as \(\sigma =\alpha +\frac{d+1} {2} > \frac{d+1} {2}\). The case of σ ∈ (d∕2, (d + 1)∕2) cannot be handled with this technique but follows from another construction technique in [23]. □

As a matter of fact, in [23] Johnson constructed compactly supported radial functions ϕ having a d-variate Fourier transform satisfying (8) for a \(\sigma = k \in \mathbb{N}\) with k ≥ d∕4 if d is even and k ≥ max((d + 1)∕4, 2) if d is odd.

For k ∈ [d∕4, d∕2] for even d and k ∈ [(d + 1)∕4, d∕2] for odd d, this seems problematic at first, since it is well known that an integrable, continuous function with a non-negative Fourier transform automatically has an integrable Fourier transform (see [46, Corollary  6.12]). The resolution of this apparent contradiction is quite simple: for such values of k the function ϕ constructed in [23] is no longer continuous on all of \(\mathbb{R}\); it might even have a pole at the origin.

When it comes to the actual computation of the Fourier transform of a generalised Wendland function, it is better to reduce the d-variate Fourier transform to a univariate one:

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{d}\psi _{\nu,\alpha }(r)& =& \mathscr{F}_{1}\mathscr{I}_{\alpha +\frac{d-1} {2} }\phi _{\nu }(r) {}\\ & =& \frac{\sqrt{2/\pi }} {2^{\alpha -1+\frac{d-1} {2} }\varGamma (\alpha +\frac{d-1} {2})}\int _{0}^{1}\int _{ s}^{1}(1 - t)^{\nu }t(t^{2} - s^{2})^{\alpha -1+\frac{d-1} {2} }\cos (rs)dtds. {}\\ \end{array}$$

Specifying this to the case of ϕ d, k shows that we will naturally have a different Fourier transform for d even or d odd. The Fourier transform for odd d is easily calculated for the classical Wendland functions. As an example, we give the three-dimensional Fourier transform of

$$\displaystyle{\phi _{d,1}(r) = \frac{1} {20}(1 - r)^{4}(4r + 1).}$$

Lemma 4.

The Fourier transform of ϕ d,1 for d = 3 is given as

$$\displaystyle\begin{array}{rcl} \mathscr{F}_{3}\phi _{3,1}(r)& =& -\frac{6\sqrt{2}} {r^{8}\sqrt{\pi }}\left ((r^{2} - 24)\cos r - 9r\sin r - 4r^{2} + 24\right ). {}\\ \end{array}$$

Clearly, we see that the Fourier transform decays like \(\mathscr{O}(r^{-6})\), so that the corresponding Sobolev space is \(H^{3}(\mathbb{R}^{3})\) as predicted. However, note that the numerical evaluation near the origin is highly ill-conditioned because of severe cancellation.
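This cancellation can be observed numerically. The sketch below evaluates the closed form of Lemma 4 against the hypergeometric representation of Theorem 5 (here d = 3, ν = 3, α = 1, so that ψ 3,1 = ϕ 3,1), using mpmath with increased working precision.

```python
from mpmath import mp, cos, gamma, hyper, mpf, pi, sin, sqrt

mp.dps = 30                          # extra digits to tame the cancellation
d, nu, alpha = 3, 3, 1               # psi_{3,1} = phi_{3,1}

def F_hypergeometric(r):
    # Theorem 5: F_d psi_{nu,alpha}(r) = C^d_{nu,alpha} 1F2(...; -r^2/4)
    C = (gamma(nu + 1)*gamma(d + 2*alpha)
         / (2**(d/2 + alpha - 1)*gamma(d/2 + alpha)*gamma(d + 2*alpha + nu + 1)))
    a = (d + 1)/2 + alpha
    b1, b2 = (d + nu + 1)/2 + alpha, (d + nu + 2)/2 + alpha
    return C * hyper([a], [b1, b2], -mpf(r)**2/4)

def F_closed(r):
    # Lemma 4; in double precision this loses all digits for small r
    r = mpf(r)
    return -6*sqrt(2)/(r**8*sqrt(pi)) * ((r**2 - 24)*cos(r) - 9*r*sin(r)
                                         - 4*r**2 + 24)

for r in (0.01, 1.0, 10.0):
    print(r, F_hypergeometric(r), F_closed(r))   # the two values coincide
```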

3.3 Other Compactly Supported Radial Basis Functions

Besides the functions introduced and discussed in the previous sections, there are plenty of others in the literature. There is the construction by Buhmann in [5], which also produces “one piece” radial functions, but the decay of their Fourier transform is unknown. There are the earlier constructions by Wu in [50], which are also one piece polynomials, but of higher degree than those mentioned above, also having a Fourier transform with isolated zeros. Then, there is the construction by Johnson in [2, 23], which produces functions with multiple radial pieces. Each of these pieces is poly-harmonic. The functions are again reproducing kernels in integer order Sobolev spaces.

4 Multiscale Interpolation and Approximation

We are now coming to the second main part of this paper, the discussion of multiscale approximation using compactly supported radial basis functions.

Let us first point out why we need a multiscale approach when working with compactly supported radial basis functions. We start with the following negative result (see, for example, [47, 49]).

Recall that the norm-minimal interpolant \(s = s_{0} = I_{X,\varPhi _{\delta }}f\) to a function f ∈ C(Ω), which is based upon the data sites \(X =\{ \mathbf{x}_{1},\ldots,\mathbf{x}_{N}\} \subseteq \varOmega\) and uses the basis function \(\varPhi _{\delta }: \mathbb{R}^{d} \rightarrow \mathbb{R}\), can be written as

$$\displaystyle{I_{X,\varPhi _{\delta }}f(\mathbf{x}) =\sum _{ j=1}^{N}\alpha _{ j}\varPhi _{\delta }(\mathbf{x} -\mathbf{x}_{j}),\qquad \mathbf{x} \in \varOmega,}$$

where the coefficients \(\boldsymbol{\alpha } \in \mathbb{R}^{N}\) are determined by the linear system \(A\boldsymbol{\alpha } = \mathbf{f}\) with the interpolation matrix A = (Φ δ (x i x j )). From now on, we will refer to this norm-minimal interpolant just as the interpolant.

Theorem 7.

Let \(\varOmega \subseteq \mathbb{R}^{d}\) be a bounded domain with a Lipschitz boundary. Let Φ be a reproducing kernel of \(H^{\sigma }(\mathbb{R}^{d})\) with σ > d∕2, i.e. \(\widehat{\varPhi }\) satisfies (9). Let Φ δ = δ −d Φ(⋅∕δ). Finally, let \(X =\{ \mathbf{x}_{1},\ldots,\mathbf{x}_{N}\} \subseteq \varOmega\) be given. Then, there is a constant C > 0 such that the error between any f ∈ H σ (Ω) and its interpolant \(s = I_{X,\varPhi _{\delta }}f\) can be bounded by

$$\displaystyle{ \|f - I_{X,\varPhi _{\delta }}f\|_{L_{2}(\varOmega )} \leq C\left (\frac{h_{X,\varOmega }} {\delta } \right )^{\sigma }\|f\|_{H^{\sigma }(\varOmega )}. }$$
(19)

Consequently, if we denote by s j the interpolant to a function f from our local approximation space W j from (2) and choose the support radii δ j proportional to the fill distances \(h_{j} = h_{X_{j},\varOmega }\), we cannot expect s j to converge to f as j →∞. We can only expect convergence if \(h_{j}/\delta _{j} \rightarrow 0\) for j →∞, which means that we essentially lose the advantage of the compact support, since the interpolation matrices become denser and denser.

4.1 Quasi-Interpolation, Principal Shift-Invariant Spaces and the Strang-Fix Conditions

Even in the more ideal situation of infinite, regular data sites, and even if the interpolation process is replaced by a best approximation process, the above negative result remains true. This follows from the following short discussion of quasi-interpolation in principal shift-invariant spaces.

For a compactly supported, continuous function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) and h > 0, we define the space

$$\displaystyle{S_{h}:= \mbox{ span}\left \{\varPhi (h^{-1} \cdot -\boldsymbol{\alpha }): \boldsymbol{\alpha } \in \mathbb{Z}^{d}\right \} = \mbox{ span}\{\varPhi _{ h}(\cdot - h\boldsymbol{\alpha }): \boldsymbol{\alpha } \in \mathbb{Z}^{d}\}.}$$

Such spaces are called principal shift-invariant spaces and they mimic our local approximation spaces in the case of the data sites forming a regular grid of grid size h. These spaces have been studied extensively and their approximation properties are intrinsically connected to polynomial reproduction, i.e. to the question whether \(\pi _{m}(\mathbb{R}^{d}) \subseteq S_{h}\), or, alternatively, to the so-called Strang-Fix conditions.

Definition 10.

An integrable function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) satisfies the Strang-Fix conditions of order m if \(\widehat{\varPhi }(0)\neq 0\) and \(D^{\boldsymbol{\beta }}\widehat{\varPhi }(2\pi \boldsymbol{\alpha }) = 0\) for all \(\boldsymbol{\alpha } \in \mathbb{Z}^{d}\setminus \{\mathbf{0}\}\) and \(\boldsymbol{\beta } \in \mathbb{N}_{0}^{d}\) with \(\vert \boldsymbol{\beta }\vert \leq m\).
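A standard univariate example, included here for illustration, is the hat function B(x) = max (0, 1 −| x | ), whose Fourier transform in the convention of Section 2 is \(\widehat{B}(\omega ) = (2\pi )^{-1/2}\left (\sin (\omega /2)/(\omega /2)\right )^{2}\). The double zeros at ω = 2πα, α ≠ 0, show that B satisfies the Strang-Fix conditions of order m = 1, in accordance with the fact that its integer translates reproduce linear polynomials. A symbolic check:

```python
import sympy as sp

w = sp.Symbol('omega', real=True)
k = sp.Symbol('k', integer=True, nonzero=True)

Bhat = (sp.sin(w/2) / (w/2))**2 / sp.sqrt(2*sp.pi)   # FT of the hat function

print(sp.simplify(Bhat.subs(w, 2*sp.pi*k)))              # 0: value vanishes
print(sp.simplify(sp.diff(Bhat, w).subs(w, 2*sp.pi*k)))  # 0: derivative too
```

In dimension one such functions thus exist in abundance (B-splines); Theorem 9 below shows that this fails for radial functions as soon as d ≥ 2.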

The following result summarises these ideas. Its proof can be found in [39].

Theorem 8.

Let \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) be a compactly supported, continuous function. Then, the following statements are equivalent:

  1. 1.

    Φ satisfies the Strang-Fix conditions of order m.

  2. 2.

    For \(\boldsymbol{\beta } \in \mathbb{N}_{0}^{d}\) with \(\vert \boldsymbol{\beta }\vert \leq m\) the function

    $$\displaystyle{\mathbf{x}\mapsto \sum _{\boldsymbol{\alpha }\in \mathbb{Z}^{d}}\boldsymbol{\alpha }^{\boldsymbol{\beta } }\varPhi (\mathbf{x} -\boldsymbol{\alpha })}$$

is a polynomial from \(\pi _{\vert \boldsymbol{\beta }\vert }(\mathbb{R}^{d})\).

  3. 3.

    For each \(f \in H^{m+1}(\mathbb{R}^{d})\) and each h > 0 there are weights \(w_{\boldsymbol{\alpha }}^{h}\) such that as h → 0,

    $$\displaystyle\begin{array}{rcl} \left \|f -\sum _{\boldsymbol{\alpha }\in \mathbb{Z}^{d}}w_{\boldsymbol{\alpha }}^{h}\varPhi (h^{-1} \cdot -\boldsymbol{\alpha })\right \|_{ H^{s}(\mathbb{R}^{d})}& \leq & C_{s}h^{m+1-s}\|f\|_{ H^{s+1}(\mathbb{R}^{d})},\qquad 0 \leq s \leq p {}\\ \sum _{\boldsymbol{\alpha }\in \mathbb{Z}^{d}}\vert w_{\boldsymbol{\alpha }}^{h}\vert ^{2}& \leq & K\|f\|_{ L_{2}(\mathbb{R}^{d})}^{2}. {}\\ \end{array}$$

    The constants C s and K are independent of f.

Hence, in our terminology, at least when working on an infinite uniform grid, we could get away with just one of the approximation spaces \(S_{h} = W_{h\mathbb{Z}^{d}}\) if our compactly supported function satisfied the Strang-Fix conditions.

However, as we will see now, if the kernel is in addition radial, it will not satisfy the Strang-Fix conditions. To prove this negative result, which goes back to Wu [51], we need two auxiliary results.

The first one is concerned with a question about the density of functions of the form exp(imx) in the continuous functions. It was first stated by Pólya as a problem in 1931 ([34]) and then solved by Szász in 1933 ([40]).

Lemma 5 (Pólya).

Let the real numbers m 1 ,m 2 ,⋯ have the properties 0 < m 1 < m 2 < ⋯ and

$$\displaystyle{\liminf _{n\rightarrow \infty } \frac{n} {m_{n}} > \frac{b - a} {2\pi } > 0.}$$

Furthermore, let f be continuous in the closed interval [a,b]. Then it will follow from

$$\displaystyle{\int _{a}^{b}f(x)\cos (m_{ n}x)dx =\int _{ a}^{b}f(x)\sin (m_{ n}x)dx = 0}$$

that f vanishes identically on [a,b].

The second auxiliary result comes from number theory. It deals with the question which natural numbers can be represented as the sum of two squares, see, for example, [24].

Lemma 6.

Let a n be the nth natural number which can be expressed as a sum of two integer squares. There are constants c 1 ,c 2 > 0 and \(n_{0} \in \mathbb{N}\) such that

$$\displaystyle{c_{1}n < \frac{a_{n}} {\sqrt{\log (n)}} < c_{2}n,\qquad n \geq n_{0}.}$$

Using the operators introduced in Section 3.1, we are now able to show that there are no compactly supported radial functions satisfying the Strang-Fix conditions in dimensions larger than one.

Theorem 9.

For d ≥ 2, there is no non-vanishing, continuous, radial, compactly supported function \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) , which satisfies the Strang-Fix conditions.

Proof.

Assume that \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) is such a function, i.e. \(\varPhi =\phi (\|\cdot \|_{2})\) with an even, compactly supported and continuous function \(\phi: \mathbb{R} \rightarrow \mathbb{R}\). Then, its Fourier transform is given by

$$\displaystyle{\widehat{\varPhi }(\boldsymbol{\omega }) =\mathscr{ F}_{d}\phi (\|\boldsymbol{\omega }\|_{2}) =\mathscr{ F}_{1}\mathscr{I}_{\frac{d-1} {2} }\phi (\|\omega \|_{2}) =:\mathscr{ F}_{1}\psi (\|\boldsymbol{\omega }\|_{2}).}$$

with a new compactly supported, continuous function \(\psi:=\mathscr{ I}_{\frac{d-1} {2} }\phi\). If Φ satisfied the Strang-Fix conditions, we would in particular have

$$\displaystyle{0 =\mathscr{ F}_{1}\psi (2\pi \|\boldsymbol{\alpha }\|_{2}),\qquad \boldsymbol{\alpha } \in \mathbb{Z}^{d}\setminus \{\mathbf{0}\}.}$$

Next, let a n be the nth natural number which can be represented as a sum of two integer squares. Since d ≥ 2, for each such a n we can pick an \(\boldsymbol{\alpha }_{n} \in \mathbb{Z}^{d}\setminus \{\mathbf{0}\}\) such that \(a_{n} =\| \boldsymbol{\alpha }_{n}\|_{2}^{2}\). Thus, if we define \(m_{n}:= 2\pi \|\boldsymbol{\alpha }_{n}\|_{2} = 2\pi \sqrt{a_{n}}\), then Lemma 6 tells us that

$$\displaystyle{ \frac{n} {m_{n}} = \frac{1} {2\pi } \frac{n} {\sqrt{a_{n}}} > \frac{1} {2\pi \sqrt{c_{2}}} \frac{n} {\sqrt{n}(\log n)^{1/4}}.}$$

This particularly means that \(\liminf _{n\rightarrow \infty }(n/m_{n}) = \infty \) and hence that according to Lemma 5, we must have that \(\psi =\mathscr{ I}_{(d-1)/2}\phi \equiv 0\). But from (13) we can finally conclude that \(\phi =\mathscr{ I}_{-(d-1)/2}\psi = 0\). □ 

4.2 Multilevel Interpolation

In this section, we want to discuss and analyse the simplest case of a multilevel algorithm, which produces a global approximant from the space V n . Let us recall the general setting. We assume that we have a sequence of finer and finer finite point sets

$$\displaystyle{X_{1},X_{2},\ldots,X_{n},\ldots }$$

and a decreasing sequence of support radii

$$\displaystyle{\delta _{1} \geq \delta _{2} \geq \ldots \geq \delta _{n} \geq \ldots }$$

Then, using a compactly supported RBF \(\varPhi: \mathbb{R}^{d} \rightarrow \mathbb{R}\) and its scaled versions

$$\displaystyle{ \varPhi _{j}(\mathbf{x},\mathbf{y}):=\delta _{ j}^{-d}\varPhi ((\mathbf{x} -\mathbf{y})/\delta _{ j}), }$$
(20)

we build, as mentioned in the introduction, local approximation spaces

$$\displaystyle{ W_{j} = \mbox{ span}\{\varPhi _{j}(\cdot,\mathbf{x}): \mathbf{x} \in X_{j}\}, }$$
(21)

and global approximation spaces

$$\displaystyle{ V _{n}:= W_{1} + W_{2} +\ldots +W_{n}. }$$
(22)

In this situation, the simplest possible algorithm to compute an approximation f n  ∈ V n to a function f ∈ H is a residual correction scheme as described in Algorithm 1.

Algorithm 1: Multilevel Approximation
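The body of the scheme is simple: set f 0 = 0 and e 0 = f; for j = 1,…,n compute the interpolant \(s_{j} = I_{X_{j},\varPhi _{j}}e_{j-1}\) to the current residual on X j and update f j  = f j−1 + s j , e j  = e j−1 − s j . A minimal Python sketch of this residual correction loop, assuming some routine `interpolate(X, values, delta)` returning the kernel interpolant as a callable (for instance the `optimal_recovery` sketch from Section 2.1 with ɛ = 0):

```python
import numpy as np

def multilevel_approximation(f, levels, interpolate):
    # residual correction scheme of Algorithm 1:
    #   f_0 = 0, e_0 = f,
    #   s_j = I_{X_j,Phi_j} e_{j-1},  f_j = f_{j-1} + s_j,  e_j = e_{j-1} - s_j
    # 'f' is a callable accepting an array of points; 'levels' is a list of
    # pairs (X_j, delta_j) of data sites and support radii
    f_n = lambda x: np.zeros(len(x))
    for X, delta in levels:
        residual = f(X) - f_n(X)                 # e_{j-1} evaluated on X_j
        s = interpolate(X, residual, delta)      # s_j in W_j
        f_n = (lambda prev, s: lambda x: prev(x) + s(x))(f_n, s)
    return f_n
```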

We will now analyse the approximation properties of this algorithm. To this end, we need a general sampling inequality. The following result comes from [32].

Lemma 7.

Let \(\varOmega \subseteq \mathbb{R}^{d}\) be a bounded domain with Lipschitz boundary. Let σ > d∕2. Let X ⊆Ω be a finite point set with sufficiently small mesh norm h X,Ω . Then, there is a constant C > 0, independent of X, such that for all f ∈ H σ (Ω) vanishing on X, we have

$$\displaystyle{\|f\|_{H^{\mu }(\varOmega )} \leq Ch_{X,\varOmega }^{\sigma -\mu }\|f\|_{ H^{\sigma }(\varOmega )}}$$

for 0 ≤μ ≤σ.

Using \(s_{j} = I_{X_{j},\varPhi _{j}}e_{j-1}\) as the interpolant to e j−1 on X j with kernel Φ j , we have the following theorem.

Theorem 10.

Let \(\varOmega \subseteq \mathbb{R}^{d}\) be a bounded domain with Lipschitz boundary. Let X 1 ,X 2 ,… be a sequence of point sets in Ω with mesh norms h 1 ,h 2 ,… satisfying h j+1 = μh j for j = 1,2,… with fixed μ ∈ (0,1) and h 1 = μ sufficiently small. Let Φ be a kernel generating \(H^{\sigma }(\mathbb{R}^{d})\) , i.e. satisfying (9), and let Φ j be defined by (20) with scale factor \(\delta _{j} =\nu h_{j}\) . Let the target function f belong to H σ (Ω). Then, there exists a constant C 1 > 0 such that

$$\displaystyle{ \|Ee_{j}\|_{\varPhi _{j+1}} \leq C_{1}\left (\mu ^{\sigma } +\nu ^{-\sigma }\right )\|Ee_{ j-1}\|_{\varPhi _{j}}\qquad \mbox{ for }j = 1,2,3,\ldots }$$
(23)

and hence there exists a constant C > 0 such that

$$\displaystyle{\|f - f_{n}\|_{L_{2}(\varOmega )} \leq C\left [C_{1}\mu ^{\sigma } + C_{ 1}\nu ^{-\sigma }\right ]^{n}\|f\|_{ H^{\sigma }(\varOmega )}\qquad \mbox{ for }n = 1,2,\ldots,}$$

provided that μν ≥γ with a constant γ > 0. Thus there are constants μ 0 ∈ (0,1) and ν 0 > 1 such that the multiscale approximation f n converges linearly to f in the L 2 norm for all μ ≤μ 0 and ν ≥ν 0 with μν ≥γ.

Proof.

The proof of this theorem can be found in [49]. We will thus only review its critical steps. The first, important auxiliary observation is that the interpolant at X j to e j−1 is the same as the interpolant to Ee j−1 since both functions coincide on X j  ⊆ Ω. Here, E denotes the extension operator from Proposition 1. From this, it follows that

$$\displaystyle\begin{array}{rcl} \|e_{j}\|_{H^{\sigma }(\varOmega )}& =& \|e_{j-1} - I_{X_{j},\delta _{j}}e_{j-1}\|_{H^{\sigma }(\varOmega )} \\ & =& \|Ee_{j-1} - I_{X_{j},\delta _{j}}Ee_{j-1}\|_{H^{\sigma }(\varOmega )} \\ & \leq & \|Ee_{j-1} - I_{X_{j},\delta _{j}}Ee_{j-1}\|_{H^{\sigma }(\mathbb{R}^{d})} \\ & \leq & C\delta _{j}^{-\sigma }\|Ee_{ j-1} - I_{X_{j},\delta _{j}}Ee_{j-1}\|_{\varPhi _{j}} \\ & \leq & C\delta _{j}^{-\sigma }\|Ee_{ j-1}\|_{\varPhi _{j}}, {}\end{array}$$
(24)

where we have used Lemma 1 and the fact that the interpolant is norm-minimal with respect to the Φ j -norm.

Then, we have

$$\displaystyle{\|Ee_{j}\|_{\varPhi _{j+1}}^{2} \leq \frac{1} {c_{1}}\int _{\mathbb{R}^{d}}\vert \widehat{Ee_{j}}(\boldsymbol{\omega })\vert ^{2}(1 + (\delta _{ j+1}\|\boldsymbol{\omega }\|_{2})^{2\sigma })d\boldsymbol{\omega } =: \frac{1} {c_{1}}\left (I_{1} + I_{2}\right )}$$

with

$$\displaystyle\begin{array}{rcl} I_{1}&:=& \int _{\mathbb{R}^{d}}\vert \widehat{Ee_{j}}(\boldsymbol{\omega })\vert ^{2}d\boldsymbol{\omega }, {}\\ I_{2}&:=& \delta _{j+1}^{2\sigma }\int _{ \mathbb{R}^{d}}\vert \widehat{Ee_{j}}(\boldsymbol{\omega })\vert ^{2}\|\boldsymbol{\omega }\|_{ 2}^{2\sigma }d\boldsymbol{\omega }. {}\\ \end{array}$$

Obviously, using Plancherel’s theorem, the properties of the extension operator and (19), we see that the first integral can be bounded by

$$\displaystyle\begin{array}{rcl} I_{1}& =& \|Ee_{j}\|_{L_{2}(\mathbb{R}^{d})}^{2} \leq c\|e_{ j}\|_{L_{2}(\varOmega )}^{2} = c\|e_{ j-1} - s_{j}\|_{L_{2}(\varOmega )}^{2} {}\\ & \leq & c\left (\frac{h_{j}} {\delta _{j}} \right )^{2\sigma }\|e_{ j-1}\|_{H^{\sigma }(\varOmega )}^{2} \leq c\nu ^{-2\sigma }\|Ee_{ j-1}\|_{\varPhi _{j}}^{2}. {}\\ \end{array}$$

Similarly, for the second integral, we can use (24) to derive the bound

$$\displaystyle\begin{array}{rcl} I_{2}& =& \delta _{j+1}^{2\sigma }\int _{ \mathbb{R}^{d}}\vert \widehat{Ee_{j}}(\boldsymbol{\omega })\vert ^{2}\|\boldsymbol{\omega }\|_{ 2}^{2\sigma }d\boldsymbol{\omega } \leq \delta _{ j+1}^{2\sigma }\int _{ \mathbb{R}^{d}}\vert \widehat{Ee_{j}}(\boldsymbol{\omega })\vert ^{2}(1 +\| \boldsymbol{\omega }\|_{ 2}^{2\sigma })d\boldsymbol{\omega } {}\\ & =& \delta _{j+1}^{2\sigma }\|Ee_{ j}\|_{H^{\sigma }(\mathbb{R}^{d})}^{2} \leq \delta _{ j+1}^{2\sigma }\|e_{ j}\|_{H^{\sigma }(\varOmega )}^{2} \leq C\left (\frac{\delta _{j+1}} {\delta _{j}} \right )^{2\sigma }\|Ee_{ j-1}\|_{\varPhi _{j}}^{2} {}\\ & \leq & C\mu ^{2\sigma }\|Ee_{ j-1}\|_{\varPhi _{j}}^{2}. {}\\ \end{array}$$

Piecing these bounds together gives estimate (23). The rest then more or less follows by applying (23) iteratively together with the following observation. Since e n  = f − f n vanishes on X n , we have by Lemma 7 and Lemma 1 that

$$\displaystyle\begin{array}{rcl} \|f - f_{n}\|_{L_{2}(\varOmega )}& =& \|e_{n}\|_{L_{2}(\varOmega )} \leq Ch_{n}^{\sigma }\|e_{ n}\|_{H^{\sigma }(\varOmega )} \leq Ch_{n}^{\sigma }\|Ee_{ n}\|_{H^{\sigma }(\mathbb{R}^{d})} \\ & \leq & Ch_{n}^{\sigma }\delta _{ n+1}^{-\sigma }\|Ee_{ n}\|_{\varPhi _{n+1}} \leq C\|Ee_{n}\|_{\varPhi _{n+1}}, {}\end{array}$$
(25)

since \(h_{n}/\delta _{n+1} = h_{n}/(\nu h_{n+1}) = \frac{1} {\nu \mu } \leq \frac{1} {\gamma }\). □ 

Though we cannot directly determine μ 0 and ν 0 , equation (23) gives some insight into the influence of the two critical parameters μ and ν. On the one hand, the parameter μ determines how much we have to refine our data set from level to level. Hence, the smaller μ is, the more points we have to use on the next level.

On the other hand, the parameter ν determines the relation between the support radius and the fill distance. Here, a larger ν means that we have more non-zero entries per row in the interpolation matrix, which increases the computational cost. Nonetheless, increasing ν is less critical than decreasing μ.

But there is yet another consequence of this theorem. For simplicity, let us eliminate one of the parameters by setting ν = μ −1 so that (23) becomes

$$\displaystyle{ \|Ee_{j}\|_{\varPhi _{j+1}} \leq C_{1}\mu ^{\sigma }\|Ee_{ j-1}\|_{\varPhi _{j}} }$$
(26)

and we have convergence for all μ > 0 with C 1 μ σ < 1. However, we even have convergence if, for some ɛ with 0 < ɛ < σ, we have C 1 μ ɛ ≤ 1. In this case, (26) becomes

$$\displaystyle{ \|Ee_{j}\|_{\varPhi _{j+1}} \leq \mu ^{\sigma -\varepsilon }\|Ee_{ j-1}\|_{\varPhi _{j}}. }$$
(27)

We can also revisit (25) by choosing 0 ≤ β ≤ σ − ɛ. Then, Lemma 7 applied in the derivation of (25) yields

$$\displaystyle{\|f - f_{n}\|_{H^{\beta }(\varOmega )} \leq Ch_{n}^{\sigma -\beta }\|e_{ n}\|_{H^{\sigma }(\varOmega )} \leq Ch_{n}^{\sigma -\beta }\delta _{ n+1}^{-\sigma }\|Ee_{ n}\|_{\varPhi _{n+1}} = Ch_{n}^{-\beta }\|Ee_{ n}\|_{\varPhi _{n+1}}.}$$

Using (27) n times yields the estimate

$$\displaystyle{\|f - f_{n}\|_{H^{\beta }(\varOmega )} \leq Ch_{n}^{-\beta }\mu ^{n(\sigma -\varepsilon )}\|f\|_{ H^{\sigma }(\varOmega )}.}$$

Finally, the fact that h 1 = μ and h j+1 = μ h j shows that h n  = μ n so that we can rephrase the last estimate as

$$\displaystyle{\|f - f_{n}\|_{H^{\beta }(\varOmega )} \leq Ch_{n}^{\sigma -\varepsilon -\beta }\|f\|_{ H^{\sigma }(\varOmega )},}$$

i.e., we have not only derived an estimate for derivatives as well, but have also expressed the error in terms of the fill distance of the finest data set. The exponent is almost optimal.

Corollary 9.

Under the assumptions of Theorem  10 with ν = 1∕μ and 0 ≤β ≤σ −ɛ we have the error bound

$$\displaystyle{\|f - f_{n}\|_{H^{\beta }(\varOmega )} \leq Ch_{n}^{\sigma -\varepsilon -\beta }\|f\|_{ H^{\sigma }(\varOmega )},}$$

provided that μ is sufficiently small.

There are several possible extensions to this theorem. First of all, the condition h j+1 = μ h j can be relaxed to something like c μ h j  ≤ h j+1 ≤ μ h j with fixed μ, c ∈ (0, 1) without altering the result. Secondly, the algorithm also converges if the target function f is rougher, say f ∈ H τ(Ω) with d∕2 < τ < σ. Details can be found in [49].

From a numerical point of view, the multilevel scheme is extremely efficient. Once the neighbourhood information is known, i.e. once we know for each level \(\ell\) and each data site \(\mathbf{x}_{j}^{(\ell )}\) those data sites \(\mathbf{x}_{k}^{(\ell )}\) which are relevant for the computation, i.e. those with \(\|\mathbf{x}_{k}^{(\ell )} -\mathbf{x}_{j}^{(\ell )}\|_{2} \leq \delta _{\ell }\), we have the following computational cost.

Corollary 10.

If the data sets X j are quasi-uniform, i.e. q j  ≈ h j , then we have for the involved linear systems:

  • The number of non-zero entries per row is independent of the level.

  • The condition number is independent of the level.

  • The number of steps required by a non-preconditioned CG method is independent of the level.

  • The computational cost is linear in each level.

The neighbourhood information can be assembled in \(\mathscr{O}(N_{j}\log N_{j})\) time using tree-based search structures. Finally, we also have to compute the residuals. If the data sets are nested, we can restrict ourselves to computing residuals only on the finest level. If they are not nested, then we have to do this in step j for the remaining point sets X j+1 ,…,X n . Again, the neighbourhood information can be collected in \(\mathscr{O}(N_{j}\log N_{j})\) time for level j. Moreover, the number of levels is, because of the uniform refinement, at most \(\mathscr{O}(\log N_{n})\).
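To make these cost counts concrete, the following minimal Python sketch (our own illustration, not code from [49]; the function names wendland_c2, kernel_matrix and solve_level are ours, and a recent SciPy is assumed, where the CG tolerance keyword is rtol) assembles the neighbourhood information and the sparse collocation matrix of one level with a kd-tree and solves the system by unpreconditioned CG. The factor δ −d of the scaled kernel is omitted, since it only rescales the coefficients.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import cg

def wendland_c2(r):
    # C^2 Wendland function phi(r) = (1 - r)_+^4 (4 r + 1),
    # positive definite on R^d for d <= 3
    return np.where(r < 1.0, (1.0 - r) ** 4 * (4.0 * r + 1.0), 0.0)

def kernel_matrix(X, Y, delta):
    # Sparse matrix (phi(||x_j - y_k||_2 / delta))_{j,k}. The kd-tree query
    # returns the neighbourhood information, i.e. for each x_j all y_k with
    # ||x_j - y_k||_2 <= delta, in essentially O(N log N) time.
    nb = cKDTree(X).query_ball_tree(cKDTree(Y), delta)
    rows = np.repeat(np.arange(len(X)), [len(n) for n in nb])
    cols = np.concatenate([np.asarray(n, dtype=np.int64) for n in nb])
    vals = wendland_c2(np.linalg.norm(X[rows] - Y[cols], axis=1) / delta)
    return csr_matrix((vals, (rows, cols)), shape=(len(X), len(Y)))

def solve_level(X, residual_values, delta):
    # Interpolate the current residual on one level. The matrix is symmetric
    # positive definite, so unpreconditioned CG applies; by Corollary 10 the
    # number of CG steps is independent of the level.
    A = kernel_matrix(X, X, delta)
    alpha, info = cg(A, residual_values, rtol=1e-6)
    return alpha
```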

Before we come to numerical results, we want to discuss briefly two versions of this algorithm, one which discards unnecessary coefficients and one which only considers necessary data sites.

The first one was introduced in [25] as one of two discarding strategies for multilevel algorithms on the sphere. The general idea is, after computing the local interpolant at level j, which has a representation of the form

$$\displaystyle{s_{j} =\sum _{ k=1}^{N_{j} }\boldsymbol{\alpha }_{k}^{(j)}\varPhi _{ j}(\cdot,\mathbf{x}_{k}^{(j)}),}$$

to discard all coefficients \(\boldsymbol{\alpha }_{k}^{(j)}\) whose absolute value is smaller than a given threshold. This threshold can and should be level-dependent. Since the discarding is done during each level, this strategy was named dynamical discarding, in contrast to the strategy of discarding only after all steps have been computed. The algorithm is formally given in Algorithm 2.

Algorithm 2: Multilevel Approximation with Dynamical Discarding
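In code, the discarding step is a small filter applied after each level has been solved. A minimal sketch (names are ours; eps_j stands for the level-dependent threshold, e.g. εδ j as in the experiments below):

```python
import numpy as np

def discard_dynamically(alpha, X_j, eps_j):
    # keep only the centres whose coefficient has absolute value >= eps_j;
    # eps_j is the level-dependent threshold of Theorem 11 below
    keep = np.abs(alpha) >= eps_j
    return alpha[keep], X_j[keep]
```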

It is possible to show convergence of this algorithm in a very similar way to the proof of Theorem 10. Details can be found in [25].

Theorem 11.

Let ɛ > 0 be given. Assume that the assumptions of Theorem  10 are satisfied. Let \(\varepsilon _{j} \leq c_{1}\varepsilon \delta _{j}^{d/2}\) with a constant c 1 > 0. Finally, let \(\alpha:= C_{1}(\mu ^{\sigma } +\nu ^{-\sigma })\) . Then there is a constant C > 0 such that

$$\displaystyle{\|\widetilde{e}_{j}\|_{\varPhi _{j+1}} \leq \alpha \|\widetilde{ e}_{j-1}\|_{\varPhi _{j}} + C\varepsilon.}$$

Hence, after n steps the error can be bounded by

$$\displaystyle{\|f -\widetilde{ f}_{n}\|_{L_{2}(\varOmega )} \leq C\alpha ^{n}\|f\|_{ H^{\sigma }(\varOmega )} + C\varepsilon \frac{1 -\alpha ^{n}} {1-\alpha }.}$$

The second variation of our standard multilevel interpolation algorithm is an adaptive version. After computing the local interpolant s j on X j to e j−1 , we can check the error e j  = e j−1 − s j on the upcoming data set X j+1 . Then, instead of interpolating this error on all of X j+1 , we will actually only use those points of X j+1 on which e j has an absolute value larger than a given threshold ɛ j  > 0. To describe this algorithm in more detail, let us denote the initial point sets by \(\widetilde{X}_{j}\) and the adaptive point sets which are actually used by X j . Then, the algorithm can be stated as in Algorithm 3.

Algorithm 3: Adaptive Multilevel Approximation
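A minimal sketch of the selection step (names are ours; the residual of the current level is assumed to be known on the full candidate set):

```python
import numpy as np

def adaptive_subset(X_tilde_next, residual_on_X_tilde_next, eps_j):
    # keep only those candidate points of the next level where the current
    # error e_j still exceeds the threshold eps_j > 0
    mask = np.abs(residual_on_X_tilde_next) > eps_j
    return X_tilde_next[mask]
```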

An error analysis of this algorithm is more problematic, since it would require knowing e j on all of \(\widetilde{X}_{j}\), while we only know e j on \(X_{j} \subseteq \widetilde{ X}_{j}\). We could avoid this problem by creating an inner loop in which we check e j on \(\widetilde{X}_{j}\) and add those points for which e j is still too large. However, in practice this does not seem to be necessary.

4.3 Numerical Examples

We will now look at two examples. In the first example, we are interested in understanding the importance of the parameter ν, which is responsible for the relation between the support radius and the fill distance on each level, and of the parameter μ, which is responsible for the refinement of the data sets from level to level.

In this example, we will only be interested in varying the parameter ν. We will work on Ω = [0, 1]2 and use the Franke function, see, for example, [20] and the left half of Figure 1, as our target function. The RBF is given by \(\phi (r) = (1 - r)_{+}^{4}(4r + 1)\), i.e. the C 2 -Wendland function in \(\mathbb{R}^{2}\). We will work only on equidistant grids X j with grid size \(q_{j} = 2^{-j-2}\), 1 ≤ j ≤ 8, which is also the separation distance of the data set X j and hence equivalent to its fill distance. This means that we fix the refinement factor to be μ = 1∕2. We then vary the overlap factor by defining \(\delta _{j} =\widetilde{\nu } q_{j}\) and changing \(\widetilde{\nu }\). Finally, we measure the error on a fine grid of grid size \(q = 2^{-11}\).
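A possible driver for this experiment, reusing wendland_c2 and kernel_matrix from the sketch after Corollary 10, could look as follows. This is a sketch under the stated set-up, not the code used to produce the tables below; the error measurement on the fine evaluation grid is omitted.

```python
import numpy as np
from scipy.sparse.linalg import cg
# reuses wendland_c2 and kernel_matrix from the sketch after Corollary 10

def franke(X):
    # Franke's standard test function on [0, 1]^2
    x, y = X[:, 0], X[:, 1]
    return (0.75 * np.exp(-((9*x - 2)**2 + (9*y - 2)**2) / 4)
          + 0.75 * np.exp(-(9*x + 1)**2 / 49 - (9*y + 1) / 10)
          + 0.5  * np.exp(-((9*x - 7)**2 + (9*y - 3)**2) / 4)
          - 0.2  * np.exp(-(9*x - 4)**2 - (9*y - 7)**2))

def grid(q):
    # equidistant grid with spacing q on [0, 1]^2
    t = np.linspace(0.0, 1.0, round(1.0 / q) + 1)
    return np.stack(np.meshgrid(t, t), axis=-1).reshape(-1, 2)

def multilevel_interpolation(f, levels=8, nu_tilde=3.0):
    centres, deltas, coeffs = [], [], []
    for j in range(1, levels + 1):
        q_j = 2.0 ** (-j - 2)              # grid size q_j = 2^{-j-2}
        delta_j = nu_tilde * q_j           # support radius on level j
        X_j = grid(q_j)
        # residual e_{j-1} = f - f_{j-1}, evaluated on the new data sites
        res = f(X_j)
        for Xk, dk, ak in zip(centres, deltas, coeffs):
            res = res - kernel_matrix(X_j, Xk, dk) @ ak
        alpha, _ = cg(kernel_matrix(X_j, X_j, delta_j), res, rtol=1e-6)
        centres.append(X_j); deltas.append(delta_j); coeffs.append(alpha)
    return centres, deltas, coeffs
```

Note that with levels=8 the eight grids together contain over 1.4 million points, so for a quick test one may want to reduce the number of levels.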

Fig. 1 Franke function (left), step function (right).

A typical result can be found in Table 3, where we have chosen the overlap factor to be \(\widetilde{\nu }= 3\), which means that we have at most 25 non-zero entries per row in our matrix, independent of the level. The table contains the number of points per level, the discrete ℓ 2 - and ℓ ∞ -errors and the approximation order computed from comparing two successive levels. The function \(\phi _{2,1}\) is a reproducing kernel of \(H^{\sigma }(\mathbb{R}^{d})\) with σ = k + (d + 1)∕2 = 2.5, which means we would expect an L 2 -approximation order of σ = 2.5 and an L ∞ -approximation order of σ − d∕2 = 1.5. Finally, the table also contains the number of steps an unpreconditioned CG method requires to compute an approximate solution of the linear system with a relative accuracy of \(10^{-6}\); a higher accuracy does not lead to a significant change in the solutions but only to a larger number of CG steps.

Table 3 Approximation of the Franke function. Basis function \(\phi _{2,1} \in C^{2}\), \(\widetilde{\nu }= 3\).

In Table 4 we have collected the error and convergence order estimates for increasing overlaps. It seems that the expected order of 2.5 is not only matched but actually exceeded. The number of non-zero entries per row increases to at most 373 in the case of \(\widetilde{\nu }= 11\). The example also indicates that halving the fill distance, i.e. choosing μ = 1∕2, is an appropriate choice and that the convergence order can be increased solely by increasing the initial support radius.

Table 4 Approximation of the Franke function. Basis function \(\phi _{2,1} \in C^{2}\), various \(\widetilde{\nu }\).

In our second example, we also want to test the other two algorithms. To this end, we keep the general set-up of our first example but change the test function to a \(C^{\infty }\) step function given by

$$\displaystyle{ f(\mathbf{x}) = \frac{1} {2}\tanh (2000(x_{2} + x_{1} - 1)),\qquad \mathbf{x} \in [0,1]^{2}, }$$
(28)

which is shown in Figure 1 (right). We keep the overlap fixed at \(\widetilde{\nu }= 3\). The results of the standard multilevel algorithm are given in Table 5 and Figure 2. Clearly, convergence is much slower. In particular, we see the typical overshoots near the edge of the step function, at least in the first few levels.

Fig. 2 Approximation of the step function, levels 0 (upper left) to 7 (lower right).

Table 5 Approximation of the step function (28). Basis function \(\phi _{2,1} \in C^{2}\), \(\widetilde{\nu }= 3\).

The results of the dynamical discarding algorithm, Algorithm 2, are given in Table 6. For these results, we have set the overall threshold to \(\varepsilon = 10^{-5}\) and have then, on level j, discarded all coefficients with absolute value less than ɛδ j . Clearly, the results are very similar to those of the standard algorithm, Algorithm 1. But the total number of points used reduces from 1,402,168 in Algorithm 1 to just 31,506 in Algorithm 2, i.e. we use only 2.25% of the original points. As we can see from Figure 3, the reason for this is that only those basis functions centred at data sites near the edge of the step function or near the boundary of the domain have large coefficients.

Fig. 3 Approximation of the step function with dynamical discarding, used data sites for levels 0 (upper left) to 7 (lower right).

Table 6 Approximation of the step function with dynamical discarding. Basis function \(\phi _{2,1} \in C^{2}\), \(\widetilde{\nu }= 3\).

The results of the adaptive algorithm are given in Table 7. Here, we have used the thresholding strategy that a point of the next level is only considered if the error of the current interpolant there is larger than \(10^{-3}\) times the maximum error of the current interpolant on the next level. Again, the errors are comparable, though slightly worse. The total number of points used is now 74,029 out of 1,402,168, or 5.3%. Figure 4 shows the data sites which are actually used. The pattern is similar to the one created by the dynamical discarding algorithm, though more points are kept towards the boundary of the computational domain.

Fig. 4 Adaptive approximation of the step function, used data sites for levels 0 (upper left) to 7 (lower right).

Table 7 Adaptive approximation of the step function. Basis function \(\phi _{2,1} \in C^{2}\), \(\widetilde{\nu }= 3\).

5 Other Multilevel Schemes

So far, we have discussed multilevel and multiscale methods for interpolation problems. It is straightforward to extend the results to the optimal recovery or smoothing spline approach (4). The results of Theorem 10 and Corollary 9 remain valid if the smoothing parameter ɛ is chosen level-dependent, satisfying \(\varepsilon _{j} \leq c(h_{j}/\delta _{j})^{2\sigma }\), see [49].
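If, as is common for such penalized problems, the approach (4) leads to the regularized system \((A +\varepsilon _{j}I)\boldsymbol{\alpha } = e_{j-1}\vert _{X_{j}}\), then the only change to the earlier sketches is the system matrix. This is an assumption about the form of (4) on our part, not a statement from [49]; a minimal sketch:

```python
from scipy.sparse import identity
from scipy.sparse.linalg import cg

def solve_level_smoothing(A, residual_values, eps_j):
    # smoothing-spline variant of solve_level: regularize the sparse kernel
    # matrix A with a level-dependent parameter eps_j <= c (h_j/delta_j)^(2 sigma)
    alpha, info = cg(A + eps_j * identity(A.shape[0]), residual_values, rtol=1e-6)
    return alpha
```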

In [41], the multilevel scheme is analysed for target functions vanishing on the boundary. The centres and the support radii on each level are chosen such that the multilevel interpolant satisfies zero boundary conditions as well, i.e. the supports of the basis functions are always contained in the domain Ω.

In [13], a multilevel scheme for vector-valued, divergence-free functions is analysed. Here, matrix-valued kernels are employed. In this situation, it is no longer possible to keep the ratio between fill distance and support radius constant.

The paper [29] follows closely the original papers [26, 27] and discusses convergence orders for interpolation on the sphere, similar to those derived in Corollary 9. It also deals with inverse estimates, i.e. statements that show that a certain convergence rate must lead to a certain smoothness of the target function. These are mainly based upon the results from [36].

Shortly after the introduction of compactly supported RBFs and the initial, purely numerical papers on multilevel interpolation schemes such as [18, 19], and the first theoretical results such as [21, 31], which all differ substantially from the theoretical results given in this paper, first attempts at solving PDEs numerically with such a multilevel scheme, based upon collocation, were made, see, for example, [14, 16] and [7]. However, only recently could these numerical observations be underpinned with theoretical proofs in [12, 28]. There have also been first papers on the solution of PDEs based upon a weak formulation and a Galerkin method, see [10, 45]. The first of these two papers is purely numerical, while the second one contains theoretical results, which seem, however, to be too optimistic.

Finally, multiscale schemes like those discussed here have been used in computer graphics (see [33]), in the context of neural networks (see [17]) and in the context of machine learning (see [52]).