This chapter and its sequels consider several spectral methods for uncertainty quantification. At their core, these are orthogonal decomposition methods in which a random variable or stochastic process (usually the solution of interest) over a probability space \((\varTheta,\mathcal{F},\mu )\) is expanded with respect to an appropriate orthogonal basis of \(L^{2}(\varTheta,\mu; \mathbb{R})\). This chapter lays the foundations by considering spectral expansions in general, starting with the Karhunen–Loève bi-orthogonal decomposition, and continuing with orthogonal polynomial bases for \(L^{2}(\varTheta,\mu; \mathbb{R})\) and the resulting polynomial chaos decompositions. Chapters 12 and 13 will then treat two classes of methods for the determination of coefficients in spectral expansions, the intrusive and non-intrusive approaches respectively.

11.1 Karhunen–Loève Expansions

Fix a domain \(\mathcal{X} \subseteq \mathbb{R}^{d}\) (which could be thought of as ‘space’, ‘time’ or a general parameter space) and a probability space \((\varTheta,\mathcal{F},\mu )\). The Karhunen–Loève expansion of a square-integrable stochastic process \(U: \mathcal{X}\times \varTheta \rightarrow \mathbb{R}\) is a particularly nice spectral decomposition, in that it decomposes U in a bi-orthogonal fashion, i.e. in terms of components that are orthogonal both over the spatio-temporal domain \(\mathcal{X}\) and over the probability space \(\varTheta\). To be more precise, consider a stochastic process \(U: \mathcal{X}\times \varTheta \rightarrow \mathbb{R}\) such that

  • for all \(x \in \mathcal{X}\), \(U(x) \in L^{2}(\varTheta,\mu; \mathbb{R})\);

  • for all \(x \in \mathcal{X}\), \(\mathbb{E}_{\mu }[U(x)] = 0\);

  • the covariance function \(C_{U}(x,y):= \mathbb{E}_{\mu }[U(x)U(y)]\) is a well-defined continuous function of \(x,y \in \mathcal{X}\).

Remark 11.1.

  1. (a)

    The condition that U is a zero-mean process is not a serious restriction; if U is not a zero-mean process, then simply consider \(\tilde{U}\) defined by \(\tilde{U}(x,\theta ):= U(x,\theta ) - \mathbb{E}_{\mu }[U(x)]\).

  2. (b)

    It is common in practice for the covariance function to be interpreted as providing some information on the correlation length of the process U. That is, \(C_{U}(x,y)\) depends only upon \(\|x - y\|\): for some function \(g: [0,\infty ) \rightarrow [0,\infty )\), \(C_{U}(x,y) = g(\|x - y\|)\). A typical such g is \(g(r) =\exp (-r/r_{0})\), and the constant \(r_{0}\) encodes how similar the values of U at nearby points of \(\mathcal{X}\) are expected to be: when the correlation length \(r_{0}\) is small, values of U at nearby points can be quite dissimilar, and so the field is rough; when \(r_{0}\) is large, values of U at nearby points are similar, and so the field is smoother.

By abuse of notation, \(C_{U}\) will also denote the covariance operator of U, which is the linear operator \(C_{U}: L^{2}(\mathcal{X},\mathrm{d}x; \mathbb{R}) \rightarrow L^{2}(\mathcal{X},\mathrm{d}x; \mathbb{R})\) defined by

$$\displaystyle{(C_{U}f)(x):=\int _{\mathcal{X}}C_{U}(x,y)f(y)\,\mathrm{d}y\mbox{.}}$$

Now let \(\{\psi _{n}\mid n \in \mathbb{N}\}\) be an orthonormal basis of \(L^{2}(\mathcal{X},\mathrm{d}x; \mathbb{R})\) consisting of eigenfunctions of the covariance operator \(C_{U}\), with corresponding eigenvalues \(\{\lambda _{n}\mid n \in \mathbb{N}\}\), i.e.

$$\displaystyle\begin{array}{rcl} \int _{\mathcal{X}}C_{U}(x,y)\psi _{n}(y)\,\mathrm{d}y& =& \lambda _{n}\psi _{n}(x), {}\\ \int _{\mathcal{X}}\psi _{m}(x)\psi _{n}(x)\,\mathrm{d}x& =& \delta _{mn}. {}\\ \end{array}$$

Definition 11.2.

Let \(\mathcal{X}\) be a first-countable topological space. A function \(K: \mathcal{X}\times \mathcal{X} \rightarrow \mathbb{R}\) is called a Mercer kernel if

  1. (a)

    K is continuous;

  2. (b)

    K is symmetric, i.e. K(x, x′) = K(x′, x) for all \(x,x' \in \mathcal{X}\); and

  3. (c)

    K is positive semi-definite in the sense that, for all choices of finitely many points \(x_{1},\ldots,x_{n} \in \mathcal{X}\), the Gram matrix

    $$\displaystyle{G:= \left [\begin{array}{*{10}c} K(x_{1},x_{1}) &\cdots & K(x_{1},x_{n})\\ \vdots & \ddots & \vdots \\ K(x_{n},x_{1})&\cdots &K(x_{n},x_{n})\end{array} \right ]}$$

    is positive semi-definite, i.e. satisfies \(\xi \cdot G\xi \geq 0\) for all \(\xi \in \mathbb{R}^{n}\).

Theorem 11.3 (Mercer).

Let \(\mathcal{X}\) be a first-countable topological space equipped with a complete Borel measure μ. Let \(K: \mathcal{X}\times \mathcal{X} \rightarrow \mathbb{R}\) be a Mercer kernel. If x ↦ K(x,x) lies in \(L^{1}(\mathcal{X},\mu; \mathbb{R})\) , then there is an orthonormal basis \(\{\psi _{n}\}_{n\in \mathbb{N}}\) of \(L^{2}(\mathcal{X},\mu; \mathbb{R})\) consisting of eigenfunctions of the operator

$$\displaystyle{f\mapsto \int _{\mathcal{X}}K(\cdot,y)f(y)\,\mathrm{d}\mu (y)}$$

with non-negative eigenvalues \(\{\lambda _{n}\}_{n\in \mathbb{N}}\) . Furthermore, the eigenfunctions corresponding to non-zero eigenvalues are continuous, and

$$\displaystyle{K(x,y) =\sum _{n\in \mathbb{N}}\lambda _{n}\psi _{n}(x)\psi _{n}(y)\mbox{,}}$$

and this series converges absolutely, uniformly over compact subsets of \(\mathcal{X}\) .

The proof of Mercer’s theorem will be omitted, since the main use of the theorem is just to inform various statements about the eigendecomposition of the covariance operator in the Karhunen–Loève theorem. However, it is worth comparing the conditions of Mercer’s theorem to those of Sazonov’s theorem (Theorem 2.49): together, these two theorems show which integral kernels can be associated with covariance operators of Gaussian measures.

Theorem 11.4 (Karhunen–Loève).

Let \(U: \mathcal{X}\times \varTheta \rightarrow \mathbb{R}\) be a square-integrable stochastic process, with mean zero and continuous, square-integrable covariance function. Then

$$\displaystyle{U =\sum _{n\in \mathbb{N}}Z_{n}\psi _{n}}$$

in \(L^{2}\), where the \(\{\psi _{n}\}_{n\in \mathbb{N}}\) are orthonormal eigenfunctions of the covariance operator \(C_{U}\), the corresponding eigenvalues \(\{\lambda _{n}\}_{n\in \mathbb{N}}\) are non-negative, and the convergence of the series is in \(L^{2}(\varTheta,\mu; \mathbb{R})\), uniformly in x over compact subsets of \(\mathcal{X}\), with

$$\displaystyle{Z_{n} =\int _{\mathcal{X}}U(x)\psi _{n}(x)\,\mathrm{d}x\mbox{.}}$$

Furthermore, the random variables \(Z_{n}\) are centred, uncorrelated, and have variance \(\lambda _{n}\):

$$\displaystyle{\mathbb{E}_{\mu }[Z_{n}] = 0\mbox{, and }\mathbb{E}_{\mu }[Z_{m}Z_{n}] =\lambda _{n}\delta _{mn}\mbox{.}}$$

Proof.

By Exercise 2.1, and since the covariance function C U is continuous and square-integrable on \(\mathcal{X}\times \mathcal{X}\), it is integrable on the diagonal, and hence is a Mercer kernel. So, by Mercer’s theorem, there is an orthonormal basis \(\{\psi _{n}\}_{n\in \mathbb{N}}\) of \(L^{2}(\mathcal{X},\mathrm{d}x; \mathbb{R})\) consisting of eigenfunctions of the covariance operator with non-negative eigenvalues \(\{\lambda _{n}\}_{n\in \mathbb{N}}\). In this basis, the covariance function has the representation

$$\displaystyle{C_{U}(x,y) =\sum _{n\in \mathbb{N}}\lambda _{n}\psi _{n}(x)\psi _{n}(y)\mbox{.}}$$

Write the process U in terms of this basis as

$$\displaystyle{U(x,\theta ) =\sum _{n\in \mathbb{N}}Z_{n}(\theta )\psi _{n}(x)\mbox{,}}$$

where the coefficients \(Z_{n} = Z_{n}(\theta )\) are given by orthogonal projection:

$$\displaystyle{Z_{n}(\theta ):=\int _{\mathcal{X}}U(x,\theta )\psi _{n}(x)\,\mathrm{d}x\mbox{.}}$$

(Note that these coefficients Z n are real-valued random variables.) Then

$$\displaystyle{\mathbb{E}_{\mu }[Z_{n}] = \mathbb{E}_{\mu }\left [\int _{\mathcal{X}}U(x)\psi _{n}(x)\,\mathrm{d}x\right ] =\int _{\mathcal{X}}\mathbb{E}_{\mu }[U(x)]\psi _{n}(x)\,\mathrm{d}x = 0\mbox{,}}$$

and

$$\displaystyle\begin{array}{rcl} \mathbb{E}_{\mu }[Z_{m}Z_{n}]& =& \mathbb{E}_{\mu }\left [\int _{\mathcal{X}}U(x)\psi _{m}(x)\,\mathrm{d}x\int _{\mathcal{X}}U(x)\psi _{n}(x)\,\mathrm{d}x\right ] {}\\ & =& \mathbb{E}_{\mu }\left [\int _{\mathcal{X}}\int _{\mathcal{X}}\psi _{m}(x)U(x)U(y)\psi _{n}(y)\,\mathrm{d}y\mathrm{d}x\right ] {}\\ & =& \int _{\mathcal{X}}\psi _{m}(x)\int _{\mathcal{X}}\mathbb{E}_{\mu }[U(x)U(y)]\psi _{n}(y)\,\mathrm{d}y\mathrm{d}x {}\\ & =& \int _{\mathcal{X}}\psi _{m}(x)\int _{\mathcal{X}}C_{U}(x,y)\psi _{n}(y)\,\mathrm{d}y\mathrm{d}x {}\\ & =& \int _{\mathcal{X}}\psi _{m}(x)\lambda _{n}\psi _{n}(x)\,\mathrm{d}x {}\\ & =& \lambda _{n}\delta _{mn}\mbox{.} {}\\ \end{array}$$

Let \(S_{N}:=\sum _{ n=1}^{N}Z_{n}\psi _{n}: \mathcal{X}\times \varTheta \rightarrow \mathbb{R}\). Then, for any \(x \in \mathcal{X}\),

$$\displaystyle\begin{array}{rcl} & & \mathbb{E}_{\mu }\left [\vert U(x) - S_{N}(x)\vert ^{2}\right ] {}\\ & & \quad = \mathbb{E}_{\mu }[U(x)^{2}] + \mathbb{E}_{\mu }[S_{ N}(x)^{2}] - 2\mathbb{E}_{\mu }[U(x)S_{ N}(x)] {}\\ & & \quad = C_{U}(x,x) + \mathbb{E}_{\mu }\left [\sum _{n=1}^{N}\sum _{ m=1}^{N}Z_{ n}Z_{m}\psi _{m}(x)\psi _{n}(x)\right ] - 2\mathbb{E}_{\mu }\left [U(x)\sum _{n=1}^{N}Z_{ n}\psi _{n}(x)\right ] {}\\ & & \quad = C_{U}(x,x) +\sum _{ n=1}^{N}\lambda _{ n}\psi _{n}(x)^{2} - 2\mathbb{E}_{\mu }\left [\sum _{ n=1}^{N}\int _{ \mathcal{X}}U(x)U(y)\psi _{n}(y)\psi _{n}(x)\,\mathrm{d}y\right ] {}\\ & & \quad = C_{U}(x,x) +\sum _{ n=1}^{N}\lambda _{ n}\psi _{n}(x)^{2} - 2\sum _{ n=1}^{N}\int _{ \mathcal{X}}C_{U}(x,y)\psi _{n}(y)\psi _{n}(x)\,\mathrm{d}y {}\\ & & \quad = C_{U}(x,x) -\sum _{n=1}^{N}\lambda _{ n}\psi _{n}(x)^{2} {}\\ & & \quad \rightarrow 0\mbox{ as $N \rightarrow \infty $,} {}\\ \end{array}$$

where the convergence with respect to x, uniformly over compact subsets of \(\mathcal{X}\), follows from Mercer’s theorem. □ 

Among many possible decompositions of a random field, the Karhunen–Loève expansion is optimal in the sense that the mean-square error of any truncation of the expansion after finitely many terms is minimal. However, its utility is limited, since the covariance function of the solution process is often not known a priori. Nevertheless, the Karhunen–Loève expansion provides an effective means of representing input random processes when their covariance structure is known, and provides a simple method for sampling Gaussian measures on Hilbert spaces, which is a necessary step in the implementation of the methods outlined in Chapter 6.

Example 11.5.

Suppose that \(C: \mathcal{H}\rightarrow \mathcal{H}\) is a self-adjoint, positive-definite, nuclear operator on a Hilbert space \(\mathcal{H}\) and let \(m \in \mathcal{H}\). Let \((\lambda _{k},\psi _{k})_{k\in \mathbb{N}}\) be a sequence of orthonormal eigenpairs for C, ordered by decreasing eigenvalue \(\lambda _{k}\). Let \(\varXi _{1},\varXi _{2},\ldots\) be independently distributed according to the standard Gaussian measure \(\mathcal{N}(0,1)\) on \(\mathbb{R}\). Then, by the Karhunen–Loève theorem,

$$\displaystyle{ U:= m +\sum _{ k=1}^{\infty }\lambda _{ k}^{1/2}\varXi _{ k}\psi _{k} }$$
(11.1)

is an \(\mathcal{H}\)-valued random variable with distribution \(\mathcal{N}(m,C)\). Therefore, a finite sum of the form \(m +\sum _{ k=1}^{K}\lambda _{k}^{1/2}\varXi _{k}\psi _{k}\) for large K is a reasonable approximation to a \(\mathcal{N}(m,C)\)-distributed random variable; this is the procedure used to generate the sample paths in Figure 11.1.

Fig. 11.1

Approximate sample paths of the Gaussian distribution on \(H_{0}^{1}([0,1])\) that has mean path m(x) = x(1 − x) and covariance operator \({\bigl (-\frac{\mathrm{d}^{2}} {\mathrm{d}x^{2}} \bigr )}^{-1}\). Along with the mean path (black), six sample paths (grey) are shown for truncated Karhunen–Loève expansions using \(K \in \mathbb{N}\) terms. Except for the non-trivial mean, these are approximate draws from the unit Brownian bridge on [0, 1].
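The sampling procedure of Example 11.5 is easy to implement once the eigenpairs of C are known. The following sketch (Python with NumPy; the function name and the truncation level K are illustrative choices) reproduces the setting of Figure 11.1, using the standard fact, not restated above, that the covariance operator \({\bigl (-\frac{\mathrm{d}^{2}} {\mathrm{d}x^{2}} \bigr )}^{-1}\) on \(H_{0}^{1}([0,1])\) has eigenvalues \(\lambda _{k} = (k\pi )^{-2}\) and eigenfunctions \(\psi _{k}(x) = \sqrt{2}\sin (k\pi x)\).

```python
import numpy as np

def kl_sample(K=100, n_grid=513, rng=None):
    """Approximate draw from N(m, C) via the truncated expansion (11.1),
    with m(x) = x(1 - x) and C = (-d^2/dx^2)^(-1) on [0, 1], i.e. an
    approximate Brownian bridge plus a deterministic mean path."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n_grid)
    u = x * (1.0 - x)                       # mean path m(x)
    xi = rng.standard_normal(K)             # i.i.d. N(0, 1) coefficients Xi_k
    for k in range(1, K + 1):
        # sqrt(lambda_k) = 1/(k*pi);  psi_k(x) = sqrt(2) * sin(k*pi*x)
        u += xi[k - 1] * np.sqrt(2.0) * np.sin(k * np.pi * x) / (k * np.pi)
    return x, u
```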

Note that the real-valued random variable \(\lambda _{k}^{1/2}\varXi _{k}\) has Lebesgue density proportional to \(\exp (-\vert \xi _{k}\vert ^{2}/2\lambda _{k})\). Therefore, although Theorem 2.38 shows that the infinite product of Lebesgue measures on \(\mathop{\mathrm{span}}\nolimits \{\psi _{k}\mid k \in \mathbb{N}\}\) cannot define an infinite-dimensional Lebesgue measure on \(\mathcal{H}\), \(U - m\), with U defined by (11.1), may be said to have a ‘formal Lebesgue density’ proportional to

$$\displaystyle\begin{array}{rcl} \prod _{k\in \mathbb{N}}\exp \left (-\frac{\vert \xi _{k}\vert ^{2}} {2\lambda _{k}} \right )& =& \exp \left (-\frac{1} {2}\sum _{k\in \mathbb{N}} \frac{\vert \xi _{k}\vert ^{2}} {\lambda _{k}} \right ) {}\\ & =& \exp \left (-\frac{1} {2}\sum _{k\in \mathbb{N}} \frac{\vert \langle u - m,\psi _{k}\rangle _{\mathcal{H}}\vert ^{2}} {\lambda _{k}} \right ) {}\\ & =& \exp \left (-\frac{1} {2}{\bigl \|C^{-1/2}(u - m)\bigr \|}_{ \mathcal{H}}^{2}\right ) {}\\ \end{array}$$

by Parseval’s theorem and the eigenbasis representation of C. This formal derivation should make it intuitively reasonable that U is a Gaussian random variable on \(\mathcal{H}\) with mean m and covariance operator C. For more general sampling schemes of this type, see the later remarks on the sampling of Besov measures.

Principal Component Analysis.

As well as being useful for the analysis of random paths, surfaces, and so on, Karhunen–Loève expansions are also useful in the analysis of finite-dimensional random vectors and sample data:

Definition 11.6.

A principal component analysis of an \(\mathbb{R}^{N}\)-valued random vector U is the Karhunen–Loève expansion of U seen as a stochastic process \(U: \{1,\ldots,N\} \times \varTheta \rightarrow \mathbb{R}\). It is also known as the discrete Karhunen–Loève transform, the Hotelling transform and the proper orthogonal decomposition.

Principal component analysis is often applied to sample data, and is intimately related to the singular value decomposition:

Example 11.7.

Let \(X \in \mathbb{R}^{N\times M}\) be a matrix whose columns are M independent and identically distributed samples from some probability measure on \(\mathbb{R}^{N}\), and assume without loss of generality that the samples have empirical mean zero. The empirical covariance matrix of the samples is

$$\displaystyle{\widehat{C}:= \tfrac{1} {M}XX^{\mathsf{T}}.}$$

(If the samples do not have empirical mean zero, then the empirical mean should be subtracted first, and then \(\frac{1} {M}\) in the definition of \(\widehat{C}\) should be replaced by \(\frac{1} {M-1}\) so that \(\widehat{C}\) will be an unbiased estimator of the true covariance matrix C.) The eigenvalues \(\lambda _{n}\) and eigenfunctions ψ n of the Karhunen–Loève expansion are just the eigenvalues and eigenvectors of this matrix \(\widehat{C}\). Let \(\varLambda \in \mathbb{R}^{N\times N}\) be the diagonal matrix of the eigenvalues \(\lambda _{n}\) (which are non-negative, and are assumed to be in decreasing order) and \(\varPsi \in \mathbb{R}^{N\times N}\) the matrix of corresponding orthonormal eigenvectors, so that \(\widehat{C}\) diagonalizes as

$$\displaystyle{\widehat{C} =\varPsi \varLambda \varPsi ^{\mathsf{T}}.}$$

The principal component transform of the data X is \(W:=\varPsi ^{\mathsf{T}}X\); this is an orthogonal transformation of \(\mathbb{R}^{N}\) that transforms X to a new coordinate system in which the greatest component-wise variance comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

On the other hand, taking the singular value decomposition of the data (normalized by the number of samples) yields

$$\displaystyle{ \tfrac{1} {\sqrt{M}}X = U\varSigma V ^{\mathsf{T}},}$$

where \(U \in \mathbb{R}^{N\times N}\) and \(V \in \mathbb{R}^{M\times M}\) are orthogonal and \(\varSigma \in \mathbb{R}^{N\times M}\) is diagonal with decreasing non-negative diagonal entries (the singular values of \(\frac{1} {\sqrt{M}}X\)). Then

$$\displaystyle{\widehat{C} = U\varSigma V ^{\mathsf{T}}(U\varSigma V ^{\mathsf{T}})^{\mathsf{T}} = U\varSigma V ^{\mathsf{T}}V \varSigma ^{\mathsf{T}}U^{\mathsf{T}} = U\varSigma ^{2}U^{\mathsf{T}},}$$

from which we see that U = Ψ and \(\varSigma ^{2} =\varLambda\). This is just another instance of the well-known relation that, for any matrix A, the eigenvalues of \(AA^{\mathsf{T}}\) are the squares of the singular values of A and the eigenvectors of \(AA^{\mathsf{T}}\) are the left singular vectors of A; however, in this context, it also provides an alternative way to compute the principal component transform.

In fact, performing principal component analysis via the singular value decomposition is numerically preferable to forming and then diagonalizing the covariance matrix, since the formation of \(XX^{\mathsf{T}}\) can cause a disastrous loss of precision; the classic example of this phenomenon is the Läuchli matrix

$$\displaystyle{\left [\begin{array}{*{10}c} 1& \varepsilon &0&0\\ 1 &0 & \varepsilon &0 \\ 1&0&0& \varepsilon \end{array} \right ]\quad \mbox{ ($0 <\varepsilon \ll 1$),}}$$

for which taking the singular value decomposition (e.g. by bidiagonalization followed by QR iteration) is stable, but forming and diagonalizing \(XX^{\mathsf{T}}\) is unstable.
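The numerical point about the Läuchli matrix can be verified in a few lines. The sketch below (NumPy; the value of ε is an arbitrary illustrative choice) compares the two routes to the eigenvalues of \(XX^{\mathsf{T}}\): squaring the singular values of X, versus forming and diagonalizing \(XX^{\mathsf{T}}\) directly, in which the terms of size \(\varepsilon ^{2}\) are lost to rounding.

```python
import numpy as np

eps = 1e-9                                   # chosen so that eps**2 is lost next to 1.0
X = np.array([[1.0, eps, 0.0, 0.0],
              [1.0, 0.0, eps, 0.0],
              [1.0, 0.0, 0.0, eps]])

# Stable route: the squared singular values of X are the eigenvalues of X X^T.
s = np.linalg.svd(X, compute_uv=False)
print(s**2)                                  # approx [3 + eps^2, eps^2, eps^2]

# Unstable route: in double precision 1.0 + eps**2 == 1.0, so X X^T is
# numerically the rank-one all-ones matrix and two eigenvalues collapse to zero.
print(np.linalg.eigvalsh(X @ X.T))
```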

Karhunen–Loève Sampling of Non-Gaussian Besov Measures.

The Karhunen–Loève approach to generating samples from Gaussian measures of known covariance operator, as in Example 11.5, can be extended to more general settings, in which a basis is prescribed a priori and (not necessarily Gaussian) random coefficients with a suitable decay rate are used. The choice of basis elements and the rate of decay of the coefficients together control the smoothness of the sample realizations; the mathematical hard work lies in showing that such random series do indeed converge to a well-defined limit, and thereby define a probability measure on the desired function space.

One method for the construction of function spaces — and hence random functions — of desired smoothness is to use wavelets. Wavelet bases are particularly attractive because they allow for the representation of sharply localized features — e.g. the interface between two media with different material properties — in a way that globally smooth basis functions such as polynomials and the Fourier basis do not. Omitting several technicalities, a wavelet basis of \(L^{2}(\mathbb{R}^{d})\) or \(L^{2}(\mathbb{T}^{d})\) can be thought of as an orthonormal basis consisting of appropriately scaled and shifted copies of a single basic element that has some self-similarity. By controlling the rate of decay of the coefficients in a wavelet expansion, we obtain a family of function spaces — the Besov spaces — with three scales of smoothness, here denoted p, q and s. In what follows, for any function f on \(\mathbb{R}^{d}\) or \(\mathbb{T}^{d}\), define the scaled and shifted version f j, k of f for \(j,k \in \mathbb{Z}\) by

$$\displaystyle{ f_{j,k}(x):= f(2^{j}x - k). }$$
(11.2)

The starting point of a wavelet construction is a scaling function (also known as the averaging function or father wavelet) \(\widetilde{\phi }: \mathbb{R} \rightarrow \mathbb{R}\) and a family of closed subspaces \(\mathcal{V}_{j} \subseteq L^{2}(\mathbb{R})\), \(j \in \mathbb{Z}\), called a multiresolution analysis of \(L^{2}(\mathbb{R})\), satisfying

  1. (a)

    (nesting) for all \(j \in \mathbb{Z}\), \(\mathcal{V}_{j} \subseteq \mathcal{V}_{j+1}\);

  2. (b)

    (density and zero intersection) \(\overline{\bigcup _{j\in \mathbb{Z}}\mathcal{V}_{j}} = L^{2}(\mathbb{R})\) and \(\bigcap _{j\in \mathbb{Z}}\mathcal{V}_{j} =\{ 0\}\);

  3. (c)

    (scaling) for all \(j,k \in \mathbb{Z}\), \(f \in \mathcal{V}_{0}\;\Longleftrightarrow\;f_{j,k} \in \mathcal{V}_{j}\);

  4. (d)

    (translates of \(\widetilde{\phi }\) generate \(\mathcal{V}_{0}\)) \(\mathcal{V}_{0} =\mathop{ \mathrm{span}}\nolimits \{\widetilde{\phi }_{0,k}\mid k \in \mathbb{Z}\}\);

  5. (e)

    (Riesz basis) there are finite positive constants A and B such that, for all sequences \((c_{k})_{k\in \mathbb{Z}} \in \ell^{2}(\mathbb{Z})\),

    $$\displaystyle{A\|(c_{k})\|_{\ell^{2}(\mathbb{Z})} \leq \left \|\sum _{k\in \mathbb{Z}}c_{k}\widetilde{\phi }_{0,k}\right \|_{L^{2}(\mathbb{R})} \leq B\|(c_{k})\|_{\ell^{2}(\mathbb{Z})}.}$$

Given such a scaling function \(\widetilde{\phi }: \mathbb{R} \rightarrow \mathbb{R}\), the associated mother wavelet \(\widetilde{\psi }: \mathbb{R} \rightarrow \mathbb{R}\) is defined as follows:

$$\displaystyle\begin{array}{rcl} \mbox{ if }\widetilde{\phi }(x)& =& \sum _{k\in \mathbb{Z}}c_{k}\widetilde{\phi }(2x - k), {}\\ \mbox{ then }\widetilde{\psi }(x)& =& \sum _{k\in \mathbb{Z}}(-1)^{k}c_{ k+1}\widetilde{\phi }(2x + k). {}\\ \end{array}$$

It is the scaled and shifted copies of the mother wavelet \(\widetilde{\psi }\) that will form the desired orthonormal basis of L 2.

Example 11.8.

  1. (a)

    The indicator function \(\widetilde{\phi }= \mathbb{I}_{[0,1)}\) satisfies the self-similarity relation \(\widetilde{\phi }(x) =\widetilde{\phi } (2x) +\widetilde{\phi } (2x - 1)\); the associated \(\widetilde{\psi }\) given by

    $$\displaystyle{\widetilde{\psi }(x) =\widetilde{\phi } (2x)-\widetilde{\phi }(2x-1) = \left \{\begin{array}{@{}l@{\quad }l@{}} 1, \quad &\mbox{ if $0 \leq x < \tfrac{1} {2}$,} \\ -1,\quad &\mbox{ if $\tfrac{1} {2} \leq x < 1$,}\\ 0, \quad &\mbox{ otherwise.} \end{array} \right.}$$

    is called the Haar wavelet.

  2. (b)

    The B-spline scaling functions \(\sigma _{r}\), \(r \in \mathbb{N}_{0}\), are piecewise polynomial of degree r and globally \(\mathcal{C}^{r-1}\), and are defined recursively by convolution:

    $$\displaystyle{ \sigma _{r}:= \left \{\begin{array}{@{}l@{\quad }l@{}} \mathbb{I}_{[0,1)}, \quad &\mbox{ for $r = 0$,} \\ \sigma _{r-1} \star \sigma _{0},\quad &\mbox{ for $r \in \mathbb{N}$,}\end{array} \right. }$$
    (11.3)

    where

    $$\displaystyle{(f \star g)(x):=\int _{\mathbb{R}}f(y)g(x - y)\,\mathrm{d}y.}$$

Here, the presentation focusses on Besov spaces of 1-periodic functions, i.e. functions on the unit circle \(\mathbb{T}:= \mathbb{R}/\mathbb{Z}\), and on the d-dimensional unit torus \(\mathbb{T}^{d}:= \mathbb{R}^{d}/\mathbb{Z}^{d}\). To this end, set

$$\displaystyle{\phi (x):=\sum _{s\in \mathbb{Z}}\widetilde{\phi }(x + s)\quad \mbox{ and}\quad \psi (x):=\sum _{s\in \mathbb{Z}}\widetilde{\psi }(x + s).}$$

Scaled and translated versions of these functions are defined as usual by (11.2). Note that in the toroidal case the spaces \(\mathcal{V}_{j}\) for j < 0 consist of constant functions, and that, for each scale \(j \in \mathbb{N}_{0}\), \(\phi \in \mathcal{V}_{0}\) has only \(2^{j}\) distinct scaled translates \(\phi _{j,k} \in \mathcal{V}_{j}\), i.e. those with \(k = 0,\ldots,2^{j} - 1\). Let

$$\displaystyle\begin{array}{rcl} \mathcal{V}_{j}&:=& \mathop{\mathrm{span}}\nolimits \{\phi _{j,k}\mid k = 0,\ldots,2^{j} - 1\}, {}\\ \mathcal{W}_{j}&:=& \mathop{\mathrm{span}}\nolimits \{\psi _{j,k}\mid k = 0,\ldots,2^{j} - 1\}, {}\\ \end{array}$$

so that \(\mathcal{W}_{j}\) is the orthogonal complement of \(\mathcal{V}_{j}\) in \(\mathcal{V}_{j+1}\) and

$$\displaystyle{L^{2}(\mathbb{T}) = \overline{\bigcup _{ j\in \mathbb{N}_{0}}\mathcal{V}_{j}} =\bigoplus _{j\in \mathbb{N}_{0}}\mathcal{W}_{j}}$$

Indeed, if ψ has unit norm, then \(2^{j/2}\psi _{j,k}\) also has unit norm, and

$$\displaystyle\begin{array}{rcl} \{2^{j/2}\psi _{ j,k}\mid k = 0,\ldots,2^{j} - 1& \}& \mbox{ is an orthonormal basis of $\mathcal{W}_{ j}$, and} {}\\ \{2^{j/2}\psi _{ j,k}\mid j \in \mathbb{N}_{0},k = 0,\ldots,2^{j} - 1& \}& \mbox{ is an orthonormal basis of $L^{2}(\mathbb{T})$,} {}\\ \end{array}$$

a so-called wavelet basis.

To construct an analogous wavelet basis of \(L^{2}(\mathbb{T}^{d})\) for d ≥ 1, proceed as follows: for \(\nu \in \{ 0,1\}^{d}\setminus \{(0,\ldots,0)\}\), \(j \in \mathbb{N}_{0}\), and \(k \in \{ 0,\ldots,2^{j} - 1\}^{d}\), define the scaled and translated wavelet \(\psi _{j,k}^{\nu }: \mathbb{T}^{d} \rightarrow \mathbb{R}\) by

$$\displaystyle{\psi _{j,k}^{\nu }(x):= 2^{dj/2}\psi ^{\nu _{1} }(2^{j}x_{ 1} - k_{1})\cdots \psi ^{\nu _{d} }(2^{j}x_{ d} - k_{d})}$$

where ψ 0 = ϕ and ψ 1 = ψ. The system

$$\displaystyle{\left \{\psi _{j,k}^{\nu }\,\vert \,j \in \mathbb{N}_{ 0},k \in \{ 0,\ldots,2^{j} - 1\}^{d},\nu \in \{ 0,1\}^{d}\setminus \{(0,\ldots,0)\}\right \}}$$

is an orthonormal wavelet basis of \(L^{2}(\mathbb{T}^{d})\).

The Besov space \(B_{pq}^{s}(\mathbb{T}^{d})\) can be characterized in terms of the summability of wavelet coefficients at the various scales:

Definition 11.9.

Let \(1 \leq p,q < \infty \) and let s > 0. The Besov (p,q,s) norm of a function \(u =\sum _{j,k,\nu }u_{j,k}^{\nu }\psi _{j,k}^{\nu }: \mathbb{T}^{d} \rightarrow \mathbb{R}\) is defined by

$$\displaystyle\begin{array}{rcl} \left \|\sum _{j\in \mathbb{N}_{0}}\sum _{\nu,k}u_{j,k}^{\nu }\psi _{ j,k}^{\nu }\right \|_{ B_{pq}^{s}(\mathbb{T}^{d})}&:=& \left \|j\mapsto 2^{js}2^{jd(\frac{1} {2} -\frac{1} {p})}\left \|(k,\nu )\mapsto u_{j,k}^{\nu }\right \|_{\ell^{p}}\right \|_{ \ell^{q}(\mathbb{N}_{0})} {}\\ &:=& \left (\sum _{j\in \mathbb{N}_{0}}2^{qjs}2^{qjd(\frac{1} {2} -\frac{1} {p})}\left (\sum _{\nu,k}\vert u_{j,k}^{\nu }\vert ^{p}\right )^{q/p}\right )^{1/q}, {}\\ \end{array}$$

and the Besov space \(B_{pq}^{s}(\mathbb{T}^{d})\) is the completion of the space of functions for which this norm is finite.

Note that at each scale j, there are \((2^{d} - 1)2^{jd} = 2^{(j+1)d} - 2^{jd}\) wavelet coefficients. The indices j, k and ν can be combined into a single index \(\ell\in \mathbb{N}\). First, \(\ell= 1\) corresponds to the scaling function \(\phi (x_{1})\cdots \phi (x_{d})\). The remaining numbering is done scale by scale; that is, we first number the wavelets with j = 0, then those with j = 1, and so on. Within each scale \(j \in \mathbb{N}_{0}\), the \(2^{d} - 1\) indices ν are ordered by thinking of them as binary representations of integers, and an ordering of the \(2^{jd}\) translations k can be chosen arbitrarily. With this renumbering,

$$\displaystyle{\sum _{\ell=1}^{\infty }c_{\ell}\psi _{\ell} \in B_{ pq}^{s}(\mathbb{T}^{d})\;\Longleftrightarrow\;2^{js}2^{jd(\frac{1} {2} -\frac{1} {p})}\left (\sum _{\ell=2^{jd }}^{2^{(j+1)d}-1 }\vert c_{\ell}\vert ^{p}\right )^{1/p} \in \ell^{q}(\mathbb{N}_{ 0})}$$

For p = q, since at scale j it holds that \(2^{jd} \leq \ell < 2^{(j+1)d}\), an equivalent norm for \(B_{pp}^{s}(\mathbb{T}^{d})\) is

$$\displaystyle{\left \|\sum _{\ell\in \mathbb{N}}u_{\ell}\psi _{\ell}\right \|_{B_{pp}^{s}(\mathbb{T}^{d})} \simeq \left \|\sum _{\ell\in \mathbb{N}}u_{\ell}\psi _{\ell}\right \|_{X^{s,p}}:= \left (\sum _{\ell=1}^{\infty }\ell^{(ps/d+p/2-1)}\vert u_{\ell}\vert ^{p}\right )^{1/p};}$$

in particular, if the original scaling function and mother wavelet are r times differentiable with r > s, then \(B_{22}^{s}\) coincides with the Sobolev space \(H^{s}\). This leads to a Karhunen–Loève-type sampling procedure for \(B_{pp}^{s}(\mathbb{T}^{d})\), as in Example 11.5: U defined by

$$\displaystyle{ U:=\sum _{\ell\in \mathbb{N}}\ell^{-( \frac{s} {d}+\frac{1} {2} -\frac{1} {p})}\kappa ^{-\frac{1} {p} }\varXi _{\ell}\psi _{\ell}, }$$
(11.4)

where \(\varXi _{\ell}\) are sampled independently and identically from the generalized Gaussian measure on \(\mathbb{R}\) with Lebesgue density proportional to \(\exp (-\frac{1} {2}\vert \xi _{\ell}\vert ^{p})\), can be said to have ‘formal Lebesgue density’ proportional to \(\exp (-\frac{\kappa }{2}\|u\|_{B_{pp}^{s}}^{p})\), and is therefore a natural candidate for a ‘typical’ element of the Besov space \(B_{pp}^{s}(\mathbb{T}^{d})\). More generally, given any orthonormal basis \(\{\psi _{k}\mid k \in \mathbb{N}\}\) of some Hilbert space, one can define a Banach subspace X s, p with norm

$$\displaystyle{\left \|\sum _{\ell\in \mathbb{N}}u_{\ell}\psi _{\ell}\right \|_{X^{s,p}}:= \left (\sum _{\ell=1}^{\infty }\ell^{(ps/d+p/2-1)}\vert u_{\ell}\vert ^{p}\right )^{1/p}}$$

and define a Besov-distributed random variable U by (11.4).
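As an illustration of (11.4), the following sketch draws an approximate sample of such a Besov-type random function on \(\mathbb{T} = [0,1)\) (so d = 1), truncated at wavelet level J and using the \(L^{2}\)-normalized Haar wavelets of Example 11.8(a), numbered scale by scale. The parameter values, and the device of sampling the generalized Gaussian density \(\propto \exp (-\frac{1} {2}\vert \xi \vert ^{p})\) through a Gamma-distributed radius, are implementation choices rather than anything prescribed by the text.

```python
import numpy as np

def sample_besov_haar(s=1.0, p=1.5, kappa=1.0, J=10, n_grid=2048, rng=None):
    """Approximate sample of U in (11.4), d = 1, with the Haar basis on
    [0, 1): psi_1 is the constant function, and the remaining psi_ell are
    the wavelets 2**(j/2) * psi(2**j x - k), numbered scale by scale."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.linspace(0.0, 1.0, n_grid, endpoint=False)

    def xi():
        # Xi with density proportional to exp(-|xi|**p / 2):
        # |Xi|**p / 2 is Gamma(1/p, 1)-distributed and the sign is symmetric.
        g = rng.gamma(1.0 / p)
        return rng.choice([-1.0, 1.0]) * (2.0 * g) ** (1.0 / p)

    u = kappa ** (-1.0 / p) * xi() * np.ones_like(x)            # ell = 1 term
    ell = 2
    for j in range(J + 1):
        for k in range(2 ** j):
            in_supp = (x >= k / 2 ** j) & (x < (k + 1) / 2 ** j)
            first_half = x < (k + 0.5) / 2 ** j
            psi = 2 ** (j / 2) * np.where(in_supp, np.where(first_half, 1.0, -1.0), 0.0)
            decay = ell ** (-(s + 0.5 - 1.0 / p))                # d = 1 in the exponent
            u += decay * kappa ** (-1.0 / p) * xi() * psi
            ell += 1
    return x, u
```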

It remains, however, to check that (11.4) not only defines a measure, but that it assigns unit probability mass to the Besov space from which it is desired to draw samples. It turns out that the question of whether or not \(U \in X^{s,p}\) with probability one is closely related to having a Fernique theorem (q.v. Theorem 2.47) for Besov measures:

Theorem 11.10.

Let U be defined as in (11.4) , with \(1 \leq p < \infty \) and s > 0. Then

$$\displaystyle\begin{array}{rcl} \|U\|_{X^{t,p}} < \infty \mbox{ almost surely}& \;\Longleftrightarrow\;& \mathbb{E}[\exp (\alpha \|U\|_{X^{t,p}}^{p})] < \infty \mbox{ for all $\alpha \in (0, \tfrac{\kappa } {2})$} {}\\ & \;\Longleftrightarrow\;& t < s -\tfrac{d} {p} {}\\ \end{array}$$

Furthermore, for p ≥ 1, \(s > \frac{d} {p}\) , and \(t < s -\tfrac{d} {p}\) , there is a constant \(r^{\ast}\) depending only on p, d, s and t such that, for all \(\alpha \in (0, \frac{\kappa } {2r^{{\ast}}})\) ,

$$\displaystyle{\mathbb{E}[\exp (\alpha \|U\|_{\mathcal{C}^{t}})] < \infty \mbox{.}}$$

11.2 Wiener–Hermite Polynomial Chaos

The next section will cover polynomial chaos (PC) expansions in greater generality, and this section serves as an introductory prelude. In this, the classical and notationally simplest setting, we consider expansions of a real-valued random variable U with respect to a single standard Gaussian random variable \(\varXi\), using appropriate orthogonal polynomials of \(\varXi\), i.e. the Hermite polynomials. This setting was pioneered by Norbert Wiener, and so it is known as the Wiener–Hermite polynomial chaos. The term ‘chaos’ is perhaps a bit confusing, and is not related to the use of the term in the study of dynamical systems; its original meaning, as used by Wiener (1938), was something closer to what would nowadays be called a stochastic process:

“Of all the forms of chaos occurring in physics, there is only one class which has been studied with anything approaching completeness. This is the class of types of chaos connected with the theory of Brownian motion.”

Let \(\varXi \sim \gamma = \mathcal{N}(0,1)\) be a standard Gaussian random variable, and let \(\mathrm{He}_{n} \in \mathfrak{P}\), for \(n \in \mathbb{N}_{0}\), be the Hermite polynomials, the orthogonal polynomials for the standard Gaussian measure γ with the normalization

$$\displaystyle{\int _{\mathbb{R}}\mathrm{He}_{m}(\xi )\mathrm{He}_{n}(\xi )\,\mathrm{d}\gamma (\xi ) = n!\delta _{mn}.}$$

By the Weierstrass approximation theorem (Theorem 8.20) and the approximability of \(L^{2}\) functions by continuous ones, the Hermite polynomials form a complete orthogonal basis of the Hilbert space \(L^{2}(\mathbb{R},\gamma; \mathbb{R})\) with the inner product

$$\displaystyle{\langle U,V \rangle _{L^{2}(\gamma )}:= \mathbb{E}[U(\varXi )V (\varXi )] \equiv \int _{\mathbb{R}}U(\xi )V (\xi )\,\mathrm{d}\gamma (\xi ).}$$

Definition 11.11.

Let \(U \in L^{2}(\mathbb{R},\gamma; \mathbb{R})\) be a square-integrable real-valued random variable. The Wiener–Hermite polynomial chaos expansion of U with respect to the standard Gaussian \(\varXi\) is the expansion of U in the orthogonal basis \(\{\mathrm{He}_{n}\}_{n\in \mathbb{N}_{0}}\), i.e.

$$\displaystyle{U =\sum _{n\in \mathbb{N}_{0}}u_{n}\mathrm{He}_{n}(\varXi )}$$

with scalar Wiener–Hermite polynomial chaos coefficients \(\{u_{n}\}_{n\in \mathbb{N}_{0}} \subseteq \mathbb{R}\) given by

$$\displaystyle{u_{n} = \frac{\langle U,\mathrm{He}_{n}\rangle _{L^{2}(\gamma )}} {\|\mathrm{He}_{n}\|_{L^{2}(\gamma )}^{2}} = \frac{1} {n!\sqrt{2\pi }}\int _{-\infty }^{\infty }U(\xi )\mathrm{He}_{ n}(\xi )e^{-\xi ^{2}/2 }\,\mathrm{d}\xi.}$$

Note that, in particular, since \(\mathrm{He}_{0} \equiv 1\),

$$\displaystyle{\mathbb{E}[U] =\langle \mathrm{He}_{0},U\rangle _{L^{2}(\gamma )} =\sum _{n\in \mathbb{N}_{0}}u_{n}\langle \mathrm{He}_{0},\mathrm{He}_{n}\rangle _{L^{2}(\gamma )} = u_{0},}$$

so the expected value of U is simply its 0th PC coefficient. Similarly, its variance is a weighted sum of the squares of its PC coefficients:

$$\displaystyle\begin{array}{rcl} \mathbb{V}[U]& =& \mathbb{E}\left [\vert U - \mathbb{E}[U]\vert ^{2}\right ] {}\\ & =& \mathbb{E}\left [\left \vert \sum _{n\in \mathbb{N}}u_{n}\mathrm{He}_{n}\right \vert ^{2}\right ]\qquad \qquad \qquad \mbox{ since $\mathbb{E}[U] = u_{ 0}$} {}\\ & =& \sum _{m,n\in \mathbb{N}}u_{m}u_{n}\langle \mathrm{He}_{m},\mathrm{He}_{n}\rangle _{L^{2}(\gamma )} {}\\ & =& \sum _{n\in \mathbb{N}}u_{n}^{2}\|\mathrm{He}_{ n}\|_{L^{2}(\gamma )}^{2}\qquad \qquad \qquad \ \mbox{ by Hermitian orthogonality} {}\\ & =& \sum _{n\in \mathbb{N}}u_{n}^{2}n!. {}\\ \end{array}$$

Example 11.12.

Let \(X \sim \mathcal{N}(m,\sigma ^{2})\) be a real-valued Gaussian random variable with mean \(m \in \mathbb{R}\) and variance \(\sigma ^{2} \geq 0\). Let \(Y:= e^{X}\); since \(\log Y\) is normally distributed, the non-negative-valued random variable Y is said to be a log-normal random variable. As usual, let \(\varXi \sim \mathcal{N}(0,1)\) be the standard Gaussian random variable; clearly X has the same distribution as \(m+\sigma \varXi\), and Y has the same distribution as \(e^{m}e^{\sigma \varXi }\). The Wiener–Hermite expansion of Y as \(\sum _{k\in \mathbb{N}_{0}}y_{k}\mathrm{He}_{k}(\varXi )\) has coefficients

$$\displaystyle\begin{array}{rcl} y_{k}& =& \frac{\langle e^{m+\sigma \varXi },\mathrm{He}_{k}(\varXi )\rangle } {\|\mathrm{He}_{k}(\varXi )\|^{2}} {}\\ & =& \frac{e^{m}} {k!} \frac{1} {\sqrt{2\pi }}\int _{\mathbb{R}}e^{\sigma \xi }\mathrm{He}_{k}(\xi )e^{-\xi ^{2}/2 }\,\mathrm{d}\xi {}\\ & =& \frac{e^{m+\sigma ^{2}/2 }} {k!} \frac{1} {\sqrt{2\pi }}\int _{\mathbb{R}}\mathrm{He}_{k}(\xi )e^{-(\xi -\sigma )^{2}/2 }\,\mathrm{d}\xi {}\\ & =& \frac{e^{m+\sigma ^{2}/2 }} {k!} \frac{1} {\sqrt{2\pi }}\int _{\mathbb{R}}\mathrm{He}_{k}(w+\sigma )e^{-w^{2}/2 }\,\mathrm{d}w. {}\\ \end{array}$$

This Gaussian integral can be evaluated directly using the Cameron–Martin formula (Lemma 2.40), or else using the formula

$$\displaystyle{\mathrm{He}_{n}(x + y) =\sum _{ k=0}^{n}{n\choose k}x^{n-k}\mathrm{He}_{ k}(y),}$$

which follows from the derivative property \(\mathrm{He}'_{n} = n\mathrm{He}_{n-1}\), with \(x =\sigma\) and y = w: this formula yields that

$$\displaystyle{y_{k} = \frac{e^{m+\sigma ^{2}/2 }} {k!} \frac{1} {\sqrt{2\pi }}\int _{\mathbb{R}}\sum _{j=0}^{k}{k\choose j}\sigma ^{k-j}\mathrm{He}_{ j}(w)e^{-w^{2}/2 }\,\mathrm{d}w = \frac{e^{m+\sigma ^{2}/2 }\sigma ^{k}} {k!} }$$

since the orthogonality relation \(\langle \mathrm{He}_{m},\mathrm{He}_{n}\rangle _{L^{2}(\gamma )} = n!\delta _{mn}\) with n = 0 implies that every Hermite polynomial other than \(\mathrm{He}_{0}\) has mean 0 under standard Gaussian measure. That is,

$$\displaystyle{ Y = e^{m+\sigma ^{2}/2 }\sum _{k\in \mathbb{N}_{0}} \frac{\sigma ^{k}} {k!}\mathrm{He}_{k}(\varXi ). }$$
(11.5)

The Wiener–Hermite expansion (11.5) reveals that \(\mathbb{E}[Y ] = e^{m+\sigma ^{2}/2 }\) and

$$\displaystyle{\mathbb{V}[Y ] = e^{2m+\sigma ^{2} }\sum _{k\in \mathbb{N}}\left ( \frac{\sigma ^{k}} {k!}\right )^{2}\|\mathrm{He}_{ k}\|_{L^{2}(\gamma )}^{2} = e^{2m+\sigma ^{2} }\left (e^{\sigma ^{2} } - 1\right ).}$$
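The coefficients of Definition 11.11 can also be computed numerically by Gauss quadrature, which gives a quick check of the closed form (11.5). The sketch below uses NumPy's probabilists'-Hermite module `numpy.polynomial.hermite_e`; the truncation level and quadrature order are illustrative choices.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as H

def wiener_hermite_coeffs(u, K, n_quad=100):
    """Coefficients u_n = <U, He_n> / n! of U = u(Xi), Xi ~ N(0, 1),
    computed with Gauss-Hermite quadrature for the weight exp(-xi**2 / 2)."""
    xi, w = H.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)              # normalize to the N(0, 1) density
    return np.array([np.sum(w * u(xi) * H.hermeval(xi, [0.0] * n + [1.0]))
                     / factorial(n) for n in range(K + 1)])

# Check against (11.5) for the log-normal Y = exp(m + sigma * Xi).
m, sigma = 0.0, 0.5
numeric = wiener_hermite_coeffs(lambda xi: np.exp(m + sigma * xi), K=6)
exact = np.array([np.exp(m + sigma**2 / 2) * sigma**k / factorial(k) for k in range(7)])
print(np.max(np.abs(numeric - exact)))        # agreement to roughly machine precision
```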

Truncation of Wiener–Hermite Expansions.

Of course, in practice, the series expansion \(U =\sum _{k\in \mathbb{N}_{0}}u_{k}\mathrm{He}_{k}(\varXi )\) must be truncated after finitely many terms, and so it is natural to ask about the quality of the approximation

$$\displaystyle{U \approx U^{K}:=\sum _{ k=0}^{K}u_{ k}\mathrm{He}_{k}(\varXi ).}$$

Since the Hermite polynomials \(\{\mathrm{He}_{k}\}_{k\in \mathbb{N}_{0}}\) form a complete orthogonal basis for \(L^{2}(\mathbb{R},\gamma; \mathbb{R})\), the standard results about orthogonal approximations in Hilbert spaces apply. In particular, by Corollary 3.26, the truncation error \(U - U^{K}\) is orthogonal to the space from which \(U^{K}\) was chosen, i.e.

$$\displaystyle{\mathop{\mathrm{span}}\nolimits \{\mathrm{He}_{0},\mathrm{He}_{1},\ldots,\mathrm{He}_{K}\},}$$

and tends to zero in mean square; in the stochastic context, this observation was first made by Cameron and Martin (1947, Section 2).

Lemma 11.13.

The truncation error \(U - U^{K}\) is orthogonal to the subspace

$$\displaystyle{\mathop{\mathrm{span}}\nolimits \{\mathrm{He}_{0},\mathrm{He}_{1},\ldots,\mathrm{He}_{K}\}}$$

of \(L^{2}(\mathbb{R},\mathrm{d}\gamma; \mathbb{R})\) . Furthermore, \(\lim _{K\rightarrow \infty }U^{K} = U\) in \(L^{2}(\mathbb{R},\gamma; \mathbb{R})\) .

Proof.

Let \(V:=\sum _{ m=0}^{K}v_{m}\mathrm{He}_{m}\) be any element of the subspace of \(L^{2}(\mathbb{R},\gamma; \mathbb{R})\) spanned by the Hermite polynomials of degree at most K. Then

$$\displaystyle\begin{array}{rcl} \langle U - U^{K},V \rangle _{ L^{2}(\gamma )}& =& \left \langle \left (\sum _{n>K}u_{n}\mathrm{He}_{n}\right ),\left (\sum _{m=0}^{K}v_{ m}\mathrm{He}_{m}\right )\right \rangle {}\\ & =& \sum _{\begin{array}{c}n>K \\ m\in \{0,\ldots,K\}\end{array}}u_{n}v_{m}\langle \mathrm{He}_{n},\mathrm{He}_{m}\rangle {}\\ & =& 0. {}\\ \end{array}$$

Hence, by Pythagoras’ theorem,

$$\displaystyle{\|U\|_{L^{2}(\gamma )}^{2} =\| U^{K}\|_{ L^{2}(\gamma )}^{2} +\| U - U^{K}\|_{ L^{2}(\gamma )}^{2},}$$

and hence \(\|U - U^{K}\|_{L^{2}(\gamma )} \rightarrow 0\) as \(K \rightarrow \infty \). □ 

11.3 Generalized Polynomial Chaos Expansions

The ideas of polynomial chaos can be generalized well beyond the setting in which the elementary random variable \(\varXi\) used to generate the orthogonal decomposition is a standard Gaussian random variable, or even a vector \(\varXi = (\varXi _{1},\ldots,\varXi _{d})\) of mutually orthogonal Gaussian random variables. Such expansions are referred to as generalized polynomial chaos (gPC) expansions.

Let \(\varXi = (\varXi _{1},\ldots,\varXi _{d})\) be an \(\mathbb{R}^{d}\)-valued random variable with independent (and hence L 2-orthogonal) components, called the stochastic germ. Let the measurable rectangle \(\varTheta =\varTheta _{1} \times \ldots \times \varTheta _{d} \subseteq \mathbb{R}^{d}\) be the support (i.e. range) of \(\varXi\). Denote by \(\mu =\mu _{1} \otimes \ldots \otimes \mu _{d}\) the distribution of \(\varXi\) on \(\varTheta\). The objective is to express any function (random variable, random vector, or even random field) \(U \in L^{2}(\varTheta,\mu )\) in terms of elementary μ-orthogonal functions of the stochastic germ \(\varXi\).

As usual, let \(\mathfrak{P}^{d}\) denote the ring of all d-variate polynomials with real coefficients, and let \(\mathfrak{P}_{\leq p}^{d}\) denote those polynomials of total degree at most \(p \in \mathbb{N}_{0}\). Let \(\varGamma _{p} \subseteq \mathfrak{P}_{\leq p}^{d}\) be a collection of polynomials that are mutually orthogonal, orthogonal to \(\mathfrak{P}_{\leq p-1}^{d}\), and span \(\mathfrak{P}_{=p}^{d}\). Assuming for convenience, as usual, the completeness of the resulting system of orthogonal polynomials, this yields the orthogonal decomposition

$$\displaystyle{L^{2}(\varTheta,\mu; \mathbb{R}) =\bigoplus _{ p\in \mathbb{N}_{0}}\mathop{ \text{span}}\varGamma _{p}.}$$

It is important to note that there is a lack of uniqueness in these basis polynomials whenever d ≥ 2: each choice of ordering of multi-indices \(\alpha \in \mathbb{N}_{0}^{d}\) can yield a different orthogonal basis of \(L^{2}(\varTheta,\mu )\) when the Gram–Schmidt procedure is applied to the monomials \(\xi ^{\alpha }\).

Note that (as usual, assuming separability) the L 2 space over the product probability space \((\varTheta,\mathcal{F},\mu )\) is isomorphic to the Hilbert space tensor product of the L 2 spaces over the marginal probability spaces:

$$\displaystyle{L^{2}(\varTheta _{ 1} \times \ldots \times \varTheta _{d},\mu _{1} \otimes \ldots \otimes \mu _{d}; \mathbb{R}) =\bigotimes _{ i=1}^{d}L^{2}(\varTheta _{ i},\mu _{i}; \mathbb{R});}$$

hence, as in Theorem 8.25, an orthogonal system of multivariate polynomials for \(L^{2}(\varTheta,\mu; \mathbb{R})\) can be found by taking products of univariate orthogonal polynomials for the marginal spaces \(L^{2}(\varTheta _{i},\mu _{i}; \mathbb{R})\). A generalized polynomial chaos (gPC) expansion of a random variable or stochastic process U is simply the expansion of U with respect to such a complete orthogonal polynomial basis of \(L^{2}(\varTheta,\mu )\).

Example 11.14.

Let \(\varXi = (\varXi _{1},\varXi _{2})\) be such that \(\varXi _{1}\) and \(\varXi _{2}\) are independent (and hence orthogonal) and such that \(\varXi _{1}\) is a standard Gaussian random variable and \(\varXi _{2}\) is uniformly distributed on [−1, 1]. Hence, the univariate orthogonal polynomials for \(\varXi _{1}\) are the Hermite polynomials \(\mathrm{He}_{n}\) and the univariate orthogonal polynomials for \(\varXi _{2}\) are the Legendre polynomials \(\mathrm{Le}_{n}\). Thus, by Theorem 8.25, a system of orthogonal polynomials for \(\varXi\) up to total degree 3 is

$$\displaystyle\begin{array}{rcl} \varGamma _{0}& =& \{1\}, {}\\ \varGamma _{1}& =& \{\mathrm{He}_{1}(\xi _{1}),\mathrm{Le}_{1}(\xi _{2})\} {}\\ & =& \{\xi _{1},\xi _{2}\}, {}\\ \varGamma _{2}& =& \{\mathrm{He}_{2}(\xi _{1}),\mathrm{He}_{1}(\xi _{1})\mathrm{Le}_{1}(\xi _{2}),\mathrm{Le}_{2}(\xi _{2})\} {}\\ & =& \{\xi _{1}^{2} - 1,\xi _{ 1}\xi _{2}, \tfrac{1} {2}(3\xi _{2}^{2} - 1)\}, {}\\ \varGamma _{3}& =& \{\mathrm{He}_{3}(\xi _{1}),\mathrm{He}_{2}(\xi _{1})\mathrm{Le}_{1}(\xi _{2}),\mathrm{He}_{1}(\xi _{1})\mathrm{Le}_{2}(\xi _{2}),\mathrm{Le}_{3}(\xi _{2})\} {}\\ & =& \{\xi _{1}^{3} - 3\xi _{ 1},\xi _{1}^{2}\xi _{ 2} -\xi _{2}, \tfrac{1} {2}(3\xi _{1}\xi _{2}^{2} -\xi _{ 1}), \tfrac{1} {2}(5\xi _{2}^{3} - 3\xi _{ 2})\}. {}\\ \end{array}$$
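A quick numerical sanity check of Example 11.14 is to tabulate the Gram matrix \(\langle \varPsi _{j}\varPsi _{k}\rangle\) of these ten basis polynomials with a tensorized Gauss quadrature rule; it should come out diagonal. The following is a sketch using NumPy's `hermite_e` and `legendre` modules, with the graded lexicographic ordering used in Remark 11.15(b) below.

```python
import numpy as np
from numpy.polynomial import hermite_e, legendre

# Quadrature nodes/weights for N(0, 1) (Hermite) and Uniform[-1, 1] (Legendre).
x1, w1 = hermite_e.hermegauss(20); w1 = w1 / np.sqrt(2.0 * np.pi)
x2, w2 = legendre.leggauss(20);    w2 = w2 / 2.0

def He(n, x): return hermite_e.hermeval(x, [0.0] * n + [1.0])
def Le(n, x): return legendre.legval(x, [0.0] * n + [1.0])

# (Hermite degree, Legendre degree) pairs in graded lexicographic order.
alphas = [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2),
          (3, 0), (2, 1), (1, 2), (0, 3)]
Psi = [He(a, x1)[:, None] * Le(b, x2)[None, :] for a, b in alphas]
W = w1[:, None] * w2[None, :]

G = np.array([[np.sum(W * P * Q) for Q in Psi] for P in Psi])
print(np.round(G, 12))   # diagonal entries are <Psi_k^2>; off-diagonal entries vanish
```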

Remark 11.15.

To simplify the notation in what follows, the following conventions will be observed:

  1. (a)

    To simplify expectations, inner products and norms, \(\langle \cdot \rangle _{\mu }\) or simply \(\langle \cdot \rangle\) will denote integration (i.e. expectation) with respect to the probability measure μ, so that the L 2(μ) inner product is simply \(\langle X,Y \rangle _{L^{2}(\mu )} =\langle XY \rangle _{\mu }\).

  2. (b)

    Rather than have the orthogonal basis polynomials be indexed by multi-indices \(\alpha \in \mathbb{N}_{0}^{d}\), or have two scalar indices, one for the degree p and one within each set \(\varGamma _{p}\), it is convenient to order the basis polynomials using a single scalar index \(k \in \mathbb{N}_{0}\). It is common in practice to take \(\varPsi _{0} = 1\) and to have the polynomial degree be (weakly) increasing with respect to the new index k. So, to continue Example 11.14, one could use the graded lexicographic ordering on \(\alpha \in \mathbb{N}_{0}^{2}\) so that \(\varPsi _{0}(\xi ) = 1\) and

    $$\displaystyle{\begin{array}{rlllll} \varPsi _{1}(\xi )& =\xi _{1}, &\varPsi _{2}(\xi )& =\xi _{2}, &\varPsi _{3}(\xi )& =\xi _{ 1}^{2} - 1, \\ \varPsi _{4}(\xi )& =\xi _{1}\xi _{2}, &\varPsi _{5}(\xi )& = \tfrac{1} {2}(3\xi _{2}^{2} - 1), &\varPsi _{ 6}(\xi )& =\xi _{ 1}^{3} - 3\xi _{ 1}, \\ \varPsi _{7}(\xi )& =\xi _{ 1}^{2}\xi _{2} -\xi _{2},&\varPsi _{8}(\xi )& = \tfrac{1} {2}(3\xi _{1}\xi _{2}^{2} -\xi _{ 1}),&\varPsi _{9}(\xi )& = \tfrac{1} {2}(5\xi _{2}^{3} - 3\xi _{ 2}).\end{array} }$$
  3. (c)

    By abuse of notation, Ψ k will stand for both a polynomial function (which is a deterministic function from \(\mathbb{R}^{d}\) to \(\mathbb{R}\)) and for the real-valued random variable that is the composition of that polynomial with the stochastic germ \(\varXi\) (which is a function from an abstract probability space to \(\mathbb{R}\)).

Truncation of gPC Expansions.

Suppose that a gPC expansion of the form \(U =\sum _{k\in \mathbb{N}_{0}}u_{k}\varPsi _{k}\) is truncated, i.e. we consider

$$\displaystyle{U^{K} =\sum _{ k=0}^{K}u_{ k}\varPsi _{k}.}$$

It is an easy exercise to show that the truncation error \(U - U^{K}\) is orthogonal to \(\mathop{\mathrm{span}}\nolimits \{\varPsi _{0},\ldots,\varPsi _{K}\}\). It is also worth considering how many terms there are in such a truncated gPC expansion. Suppose that the stochastic germ \(\varXi\) has dimension d (i.e. has d independent components), and we work only with polynomials of total degree at most p. The total number of coefficients in the truncated expansion \(U^{K}\) is

$$\displaystyle{K + 1 = \frac{(d + p)!} {d!p!}.}$$

That is, the total number of gPC coefficients that must be calculated grows combinatorially as a function of the number of input random variables and the degree of polynomial approximation. Such rapid growth limits the usefulness of gPC expansions for practical applications where d and p are much greater than the order of 10 or so.
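For example, the count \(K + 1 = (d + p)!/(d!\,p!)\) can be tabulated in a couple of lines (a sketch; the particular values of d and p are arbitrary):

```python
from math import comb

# Number of gPC coefficients K + 1 = (d + p)! / (d! p!) = C(d + p, p).
for d in (2, 5, 10, 20):
    print(d, [comb(d + p, p) for p in (1, 2, 3, 4, 5)])
# e.g. d = 20 and p = 5 already call for 53130 coefficients.
```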

Expansions of Random Variables.

Consider a real-valued random variable U, which we expand in terms of a stochastic germ \(\varXi\) as

$$\displaystyle{U(\varXi ) =\sum _{ k\in \mathbb{N}_{0}}u_{k}\varPsi _{k}(\varXi ),}$$

where the basis functions \(\varPsi _{k}\) are orthogonal with respect to the law of \(\varXi\), and with the usual convention that \(\varPsi _{0} = 1\). A first, easy, observation is that

$$\displaystyle{\mathbb{E}[U] =\langle \varPsi _{0}U\rangle =\sum _{k\in \mathbb{N}_{0}}u_{k}\langle \varPsi _{0}\varPsi _{k}\rangle = u_{0},}$$

so the expected value of U is simply its 0th gPC coefficient. Similarly, its variance is a weighted sum of the squares of its gPC coefficients:

$$ \displaystyle\begin{array}{rcl} \mathbb{E}\left [\vert U - \mathbb{E}[U]\vert ^{2}\right ]& =& \mathbb{E}\left [\left \vert \sum _{ k\in \mathbb{N}}u_{k}\varPsi _{k}\right \vert ^{2}\right ] {}\\ & =& \sum _{k,\ell\in \mathbb{N}}u_{k}u_{\ell}\langle \varPsi _{k}\varPsi _{\ell}\rangle {}\\ & =& \sum _{k\in \mathbb{N}}u_{k}^{2}\langle \varPsi _{ k}^{2}\rangle. {}\\ \end{array} $$

Similar remarks apply to any truncation \(U^{K} =\sum _{ k=0}^{K}u_{k}\varPsi _{k}\) of the gPC expansion of U. In view of the expression for the variance, the gPC coefficients can be used as sensitivity indices. That is, a natural measure of how strongly U depends upon \(\varPsi _{k}(\varXi )\) is

$$\displaystyle{ \frac{u_{k}^{2}\langle \varPsi _{k}^{2}\rangle } {\sum _{\ell\geq 1}u_{\ell}^{2}\langle \varPsi _{\ell}^{2}\rangle }.}$$
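These identities translate directly into post-processing code: given the coefficients \(u_{k}\) and the norms \(\langle \varPsi _{k}^{2}\rangle\), the mean, variance and sensitivity indices are elementary array operations. A minimal sketch (the function name and storage convention are illustrative):

```python
import numpy as np

def gpc_summary(u, psi_sq):
    """Mean, variance, and normalized sensitivity indices of a gPC expansion
    U = sum_k u_k Psi_k, given the coefficients u_k and the norms <Psi_k^2>,
    both indexed from k = 0 with Psi_0 = 1."""
    u, psi_sq = np.asarray(u), np.asarray(psi_sq)
    contributions = u[1:] ** 2 * psi_sq[1:]      # term-by-term variance contributions
    variance = contributions.sum()
    return u[0], variance, contributions / variance
```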

Expansions of Random Vectors.

Similarly, if \(U_{1},\ldots,U_{n}\) are (not necessarily independent) real-valued random variables, then the \(\mathbb{R}^{n}\)-valued random variable \(\boldsymbol{U} = [U_{1},\ldots,U_{n}]^{\mathsf{T}}\) with the \(U_{i}\) as its components can be given a (possibly truncated) expansion

$$\displaystyle{\boldsymbol{U}(\xi ) =\sum _{k\in \mathbb{N}_{0}}\boldsymbol{u}_{k}\varPsi _{k}(\xi ),}$$

with vector-valued gPC coefficients \(\boldsymbol{u}_{k} = [u_{1,k},\ldots,u_{n,k}]^{\mathsf{T}} \in \mathbb{R}^{n}\) for each \(k \in \mathbb{N}_{0}\). As before,

$$\displaystyle{\mathbb{E}[\boldsymbol{U}] =\langle \varPsi _{0}\boldsymbol{U}\rangle =\sum _{k\in \mathbb{N}_{0}}\boldsymbol{u}_{k}\langle \varPsi _{0}\varPsi _{k}\rangle =\boldsymbol{ u}_{0} \in \mathbb{R}^{n}}$$

and the covariance matrix \(C \in \mathbb{R}^{n\times n}\) of U is given by

$$\displaystyle{C =\sum _{k\in \mathbb{N}}\boldsymbol{u}_{k}\boldsymbol{u}_{k}^{\mathsf{T}}\langle \varPsi _{ k}^{2}\rangle }$$

i.e. its components are \(C_{ij} =\sum _{k\in \mathbb{N}}u_{i,k}u_{j,k}\langle \varPsi _{k}^{2}\rangle\).

Expansions of Stochastic Processes.

Consider now a stochastic process U, i.e. a function \(U: \varTheta \times \mathcal{X} \rightarrow \mathbb{R}\). Suppose that U is square integrable in the sense that, for each \(x \in \mathcal{X}\), \(U(\cdot,x) \in L^{2}(\varTheta,\mu )\) is a real-valued random variable, and, for each \(\theta \in \varTheta\), \(U(\theta, \cdot ) \in L^{2}(\mathcal{X},\mathrm{d}x)\) is a scalar field on the domain \(\mathcal{X}\). Recall that

$$\displaystyle{L^{2}(\varTheta,\mu; \mathbb{R}) \otimes L^{2}(\mathcal{X},\mathrm{d}x; \mathbb{R})\mathop{\cong}L^{2}(\varTheta \times \mathcal{X},\mu \otimes \mathrm{d}x; \mathbb{R})\mathop{\cong}L^{2}{\bigl (\varTheta,\mu;L^{2}(\mathcal{X},\mathrm{d}x)\bigr )}\mbox{,}}$$

so U can be equivalently viewed as a linear combination of products of \(\mathbb{R}\)-valued random variables with deterministic scalar fields, or as a function on \(\varTheta \times \mathcal{X}\), or as a field-valued random variable. As usual, take \(\{\varPsi _{k}\mid k \in \mathbb{N}_{0}\}\) to be an orthogonal polynomial basis of \(L^{2}(\varTheta,\mu; \mathbb{R})\), ordered (weakly) by total degree, with Ψ 0 = 1. A gPC expansion of the random field U is an L 2-convergent expansion of the form

$$\displaystyle{U(x,\xi ) =\sum _{k\in \mathbb{N}_{0}}u_{k}(x)\varPsi _{k}(\xi ).}$$

The functions \(u_{k}: \mathcal{X} \rightarrow \mathbb{R}\) are called the stochastic modes of the process U. The stochastic mode \(u_{0}: \mathcal{X} \rightarrow \mathbb{R}\) is the mean field of U:

$$\displaystyle{\mathbb{E}[U(x)] = u_{0}(x).}$$

The variance of the field at \(x \in \mathcal{X}\) is

$$\displaystyle{\mathbb{V}[U(x)] =\sum _{k\in \mathbb{N}}u_{k}(x)^{2}\langle \varPsi _{ k}^{2}\rangle,}$$

whereas, for two points \(x,y \in \mathcal{X}\),

$$ \displaystyle\begin{array}{rcl} \mathbb{E}[U(x)U(y)]& =& \left \langle \sum _{k\in \mathbb{N}_{0}}u_{k}(x)\varPsi _{k}(\xi )\sum _{\ell\in \mathbb{N}_{0}}u_{\ell}(y)\varPsi _{\ell}(\xi )\right \rangle {}\\ & =& \sum _{k\in \mathbb{N}_{0}}u_{k}(x)u_{k}(y)\langle \varPsi _{k}^{2}\rangle {}\\ \end{array} $$

and so the covariance function of U is given by

$$\displaystyle{C_{U}(x,y) =\sum _{k\in \mathbb{N}}u_{k}(x)u_{k}(y)\langle \varPsi _{k}^{2}\rangle.}$$

The previous remarks about gPC expansions of vector-valued random variables are a special case of these remarks about stochastic processes, namely with \(\mathcal{X} =\{ 1,\ldots,n\}\). At least when \(\dim \mathcal{X}\) is low, it is very common to see the behaviour of a stochastic field U (or its truncation \(U^{K}\)) summarized by plots of the mean field and the variance field, as well as a few ‘typical’ sample realizations. The visualization of high-dimensional data is a subject unto itself, with many ingenious uses of shading, colour, transparency, videos and user interaction tools.
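The mean field, variance field and covariance function are, in turn, simple functions of the stochastic modes. A sketch, with the modes stored as the rows of an array (this storage convention is an assumption made here for concreteness):

```python
import numpy as np

def field_statistics(modes, psi_sq):
    """Mean field, variance field, and covariance matrix C_U(x_i, x_j) of
    U(x, xi) = sum_k u_k(x) Psi_k(xi), where modes[k, i] = u_k(x_i) on a
    grid (x_i) and psi_sq[k] = <Psi_k^2>, with Psi_0 = 1."""
    modes, psi_sq = np.asarray(modes), np.asarray(psi_sq)
    mean_field = modes[0]
    var_field = np.sum(modes[1:] ** 2 * psi_sq[1:, None], axis=0)
    cov = np.einsum('k,ki,kj->ij', psi_sq[1:], modes[1:], modes[1:])
    return mean_field, var_field, cov
```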

Changes of gPC Basis.

It is possible to change between representations of a stochastic quantity U with respect to gPC bases \(\{\varPsi _{k}\mid k \in \mathbb{N}_{0}\}\) and \(\{\varPhi _{k}\mid k \in \mathbb{N}_{0}\}\) generated by measures μ and ν respectively. Obviously, for such changes of basis to work in both directions, μ and ν must at least have the same support. Suppose that

$$\displaystyle{U =\sum _{k\in \mathbb{N}_{0}}u_{k}\varPsi _{k} =\sum _{k\in \mathbb{N}_{0}}v_{k}\varPhi _{k}.}$$

Then, taking the \(L^{2}(\nu )\)-inner product of this equation with \(\varPhi _{\ell}\),

$$\displaystyle{\langle U\varPhi _{\ell}\rangle _{\nu } =\sum _{k\in \mathbb{N}_{0}}u_{k}\langle \varPsi _{k}\varPhi _{\ell}\rangle _{\nu } = v_{\ell}\langle \varPhi _{\ell}^{2}\rangle _{ \nu },}$$

provided that \(\varPsi _{k}\varPhi _{\ell} \in L^{2}(\nu )\) for all \(k \in \mathbb{N}_{0}\), i.e.

$$\displaystyle{v_{\ell} =\sum _{k\in \mathbb{N}_{0}} \frac{u_{k}\langle \varPsi _{k}\varPhi _{\ell}\rangle _{\nu }} {\langle \varPhi _{\ell}^{2}\rangle _{\nu }}.}$$

Similarly, taking the \(L^{2}(\mu )\)-inner product of this equation with \(\varPsi _{\ell}\) yields that, provided that \(\varPhi _{k}\varPsi _{\ell} \in L^{2}(\mu )\) for all \(k \in \mathbb{N}_{0}\),

$$\displaystyle{u_{\ell} =\sum _{k\in \mathbb{N}_{0}} \frac{v_{k}\langle \varPhi _{k}\varPsi _{\ell}\rangle _{\mu }} {\langle \varPsi _{\ell}^{2}\rangle _{\mu }}.}$$
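As a concrete, and entirely illustrative, instance of this change of basis, the sketch below converts coefficients with respect to the Legendre basis (orthogonal for the uniform measure μ on [−1, 1]) into coefficients with respect to the Chebyshev basis (orthogonal for the arcsine measure ν on [−1, 1]); the inner products \(\langle \varPsi _{k}\varPhi _{\ell}\rangle _{\nu }\) and \(\langle \varPhi _{\ell}^{2}\rangle _{\nu }\) are evaluated by Gauss–Chebyshev quadrature. Since polynomials of degree at most K are mapped to polynomials of degree at most K, the truncated transfer matrix is exact in this case.

```python
import numpy as np
from numpy.polynomial import legendre, chebyshev

K = 5
x, w = chebyshev.chebgauss(64)        # nodes/weights for dnu proportional to dx/sqrt(1 - x^2)
w = w / np.pi                          # normalize nu to a probability measure

Psi = [legendre.legval(x, [0.0] * k + [1.0]) for k in range(K + 1)]     # Legendre basis
Phi = [chebyshev.chebval(x, [0.0] * k + [1.0]) for k in range(K + 1)]   # Chebyshev basis

# Transfer matrix: v_l = sum_k u_k <Psi_k Phi_l>_nu / <Phi_l^2>_nu.
T = np.array([[np.sum(w * Psi[k] * Phi[l]) / np.sum(w * Phi[l] ** 2)
               for k in range(K + 1)] for l in range(K + 1)])

u = np.array([1.0, 0.5, -0.25, 0.0, 0.1, 0.0])   # coefficients in the Legendre basis
v = T @ u                                         # coefficients in the Chebyshev basis
```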

Remark 11.16.

It is possible to adapt the notion of a gPC expansion to the situation of a stochastic germ \(\varXi\) with arbitrary dependencies among its components, but there are some complications. In summary, suppose that \(\varXi = (\varXi _{1},\ldots,\varXi _{d})\), taking values in \(\varTheta =\varTheta _{1} \times \ldots \times \varTheta _{d}\), has joint law μ, which is not necessarily a product measure. Nevertheless, let μ i denote the marginal law of \(\varXi _{i}\), i.e.

$$\displaystyle{\mu _{i}(E_{i}):=\mu (\varTheta _{1} \times \ldots \times \varTheta _{i-1} \times E_{i} \times \varTheta _{i+1} \times \ldots \times \varTheta _{d}).}$$

To simplify matters further, assume that μ (resp. \(\mu _{i}\)) has Lebesgue density ρ (resp. \(\rho _{i}\)). Now let \(\phi _{p}^{(i)} \in \mathfrak{P}\), \(p \in \mathbb{N}_{0}\), be univariate orthogonal polynomials for \(\mu _{i}\). The chaos function associated with a multi-index \(\alpha \in \mathbb{N}_{0}^{d}\) is defined to be

$$\displaystyle{\varPsi _{\alpha }(\xi ):= \sqrt{\frac{\rho _{1 } (\xi _{1 } )\ldots \rho _{d } (\xi _{d } )} {\rho (\xi )}} \phi _{\alpha _{1}}^{(1)}(\xi _{ 1})\ldots \phi _{\alpha _{d}}^{(d)}(\xi _{ d}).}$$

It can be shown that the family \(\{\varPsi _{\alpha }\mid \alpha \in \mathbb{N}_{0}^{d}\}\) is a complete orthonormal basis for \(L^{2}(\varTheta,\mu; \mathbb{R})\), so we have the usual series expansion \(U =\sum _{\alpha }u_{\alpha }\varPsi _{\alpha }\). Note, however, that with the exception of \(\varPsi _{0} = 1\), the functions \(\varPsi _{\alpha }\) are not polynomials. Nevertheless, we still have the usual properties that the truncation error is orthogonal to the approximation subspace, and

$$\displaystyle{\mathbb{E}_{\mu }[U] = u_{0},\quad \mathbb{V}_{\mu }[U] =\sum _{\alpha \neq 0}u_{\alpha }^{2}\langle \varPsi _{ \alpha }^{2}\rangle _{ \mu }.}$$

Remark 11.17.

Polynomial chaos expansions were originally introduced in stochastic analysis, and in that setting the stochastic germ \(\varXi\) typically has countably infinite dimension, i.e. \(\varXi = (\varXi _{1},\ldots,\varXi _{d},\ldots )\). Again, for simplicity, suppose that the components of \(\varXi\) are independent, and hence orthogonal; let \(\varTheta\) denote the range of \(\varXi\), which is an infinite product domain, and let \(\mu =\bigotimes _{d\in \mathbb{N}}\mu _{d}\) denote the law of \(\varXi\). For each \(d \in \mathbb{N}\), let \(\{\psi _{\alpha _{d}}^{(d)}\mid \alpha _{d} \in \mathbb{N}_{0}\}\) be a system of univariate orthogonal polynomials for \(\varXi _{d} \sim \mu _{d}\), again with the usual convention that \(\psi _{0}^{(d)} \equiv 1\). Products of the form

$$\displaystyle{\psi _{\alpha }(\xi ):=\prod _{d\in \mathbb{N}}\psi _{\alpha _{d}}^{(d)}(\xi _{ d})}$$

are again polynomials when only finitely many α d ≠ 0, and form an orthogonal system of polynomials in \(L^{2}(\varTheta,\mu; \mathbb{R})\).

As in the finite-dimensional case, there are many choices of ordering for the basis polynomials, some of which may lend themselves to particular problems. One possible orthogonal PC decomposition of \(u(\varXi )\) for \(u \in L^{2}(\varTheta,\mu; \mathbb{R})\), in which summands are arranged in order of increasing ‘complexity’, is

$$\displaystyle\begin{array}{rcl} u(\varXi ) = u_{0}& +& \sum _{d\in \mathbb{N}}u_{\alpha _{d}}\psi _{\alpha _{d}}^{(d)}(\varXi _{ d}) {}\\ & +& \sum _{d_{1},d_{2}\in \mathbb{N}}u_{\alpha _{d_{ 1}}\alpha _{d_{2}}}\psi _{\alpha _{d_{ 1}}}^{(d_{1})}(\varXi _{ d_{1}})\psi _{\alpha _{d_{2}}}^{(d_{2})}(\varXi _{ d_{2}}) {}\\ & \cdots & {}\\ & +& \sum _{d_{1},d_{2},\ldots,d_{k}\in \mathbb{N}}u_{\alpha _{d_{ 1}}\alpha _{d_{2}}\ldots \alpha _{d_{k}}}\psi _{\alpha _{d_{ 1}}}^{(d_{1})}(\varXi _{ d_{1}})\psi _{\alpha _{d_{2}}}^{(d_{2})}(\varXi _{ d_{2}})\cdots \psi _{\alpha _{d_{k}}}^{(d_{k})}(\varXi _{ d_{k}}) {}\\ & \cdots \,;& {}\\ \end{array}$$

i.e., writing \(\varPsi _{\alpha _{d}}^{(d)}\) for the image random variable \(\psi _{\alpha _{d}}^{(d)}(\varXi _{d})\),

$$\displaystyle\begin{array}{rcl} U = u_{0}& +& \sum _{d\in \mathbb{N}}u_{\alpha _{d}}\varPsi _{\alpha _{d}}^{(d)} {}\\ & +& \sum _{d_{1},d_{2}\in \mathbb{N}}u_{\alpha _{d_{ 1}}\alpha _{d_{2}}}\varPsi _{\alpha _{d_{ 1}}}^{(d_{1})}\varPsi _{ \alpha _{d_{2}}}^{(d_{2})} {}\\ & \cdots & {}\\ & +& \sum _{d_{1},d_{2},\ldots,d_{k}\in \mathbb{N}}u_{\alpha _{d_{ 1}}\alpha _{d_{2}}\ldots \alpha _{d_{k}}}\varPsi _{\alpha _{d_{ 1}}}^{(d_{1})}\varPsi _{ \alpha _{d_{2}}}^{(d_{2})}\cdots \varPsi _{\alpha _{ d_{k}}}^{(d_{k})} {}\\ & \cdots \,.& {}\\ \end{array}$$

The PC coefficients \(u_{\alpha _{d}} \in \mathbb{R}\), etc. are determined by the usual orthogonal projection relation. In practice, this expansion must be terminated at finite k, and provided that u is square-integrable, the L 2 truncation error decays to 0 as \(k \rightarrow \infty \), with more rapid decay for smoother u, as in, e.g., Theorem 8.23.

11.4 Wavelet Expansions

Recall from the earlier discussion of Gibbs’ phenomenon in Chapter 8 that expansions of non-smooth functions in terms of smooth basis functions such as polynomials, while guaranteed to be convergent in the L 2 sense, can have poor pointwise convergence properties. However, to remedy such problems, one can consider spectral expansions in terms of orthogonal bases of functions in \(L^{2}(\varTheta,\mu; \mathbb{R})\) that are no longer polynomials: a classic example of such a construction is the use of wavelets, which were developed to resolve the same problem in harmonic analysis and its applications. This section considers, by way of example, orthogonal decomposition of random variables using Haar wavelets, the so-called Wiener–Haar expansion.

Definition 11.18.

The Haar scaling function is \(\phi (x):= \mathbb{I}_{[0,1)}(x)\). For \(j \in \mathbb{N}_{0}\) and \(k \in \{ 0,\ldots,2^{j} - 1\}\), let \(\phi _{j,k}(x):= 2^{j/2}\phi (2^{j}x - k)\) and

$$\displaystyle{\mathcal{V}_{j}:=\mathop{ \mathrm{span}}\nolimits \{\phi _{j,0},\ldots,\phi _{j,2^{j}-1}\}.}$$

The Haar function (or Haar mother wavelet) \(\psi: \mathbb{R} \rightarrow \mathbb{R}\) is defined by

$$\displaystyle{\psi (x):= \left \{\begin{array}{@{}l@{\quad }l@{}} 1, \quad &\mbox{ if $0 \leq x < \tfrac{1} {2}$,} \\ -1,\quad &\mbox{ if $\tfrac{1} {2} \leq x < 1$,}\\ 0, \quad &\mbox{ otherwise.} \end{array} \right.}$$

The Haar wavelet family is the collection of scaled and shifted versions \(\psi _{j,k}\) of the mother wavelet ψ defined by

$$\displaystyle{\psi _{j,k}(x):= 2^{j/2}\psi (2^{j}x - k)\quad \mbox{ for $j \in \mathbb{N}_{ 0}$ and $k \in \{ 0,\ldots,2^{j} - 1\}$.}}$$

The spaces \(\mathcal{V}_{j}\) form an increasing family of subspaces of \(L^{2}([0,1],\mathrm{d}x; \mathbb{R})\), with the index j representing the level of ‘detail’ permissible in a function \(f \in \mathcal{V}_{j}\): more concretely, \(\mathcal{V}_{j}\) is the set of functions on [0, 1] that are constant on each half-open dyadic interval \([2^{-j}k, 2^{-j}(k + 1))\), \(k \in \{ 0,\ldots,2^{j} - 1\}\). A straightforward calculation from the above definition yields the following:

Lemma 11.19.

For all \(j,j' \in \mathbb{N}_{0}\) , \(k \in \{ 0,\ldots,2^{j} - 1\}\) and \(k' \in \{ 0,\ldots,2^{j'} - 1\}\) ,

$$\displaystyle\begin{array}{rcl} \int _{0}^{1}\psi _{ j,k}(x)\,\mathrm{d}x& =& 0,\quad \mbox{ and} {}\\ \int _{0}^{1}\psi _{ j,k}(x)\psi _{j',k'}(x)\,\mathrm{d}x& =& \delta _{jj'}\delta _{kk'}. {}\\ \end{array}$$

Hence, \(\{1\} \cup \{\psi _{j,k}\mid j \in \mathbb{N}_{0},k \in \{ 0,1,\ldots,2^{j} - 1\}\}\) is a complete orthonormal basis of \(L^{2}([0,1],\mathrm{d}x; \mathbb{R})\) . If \(\mathcal{W}_{j}\) denotes the orthogonal complement of \(\mathcal{V}_{j}\) in \(\mathcal{V}_{j+1}\) , then

$$\displaystyle\begin{array}{rcl} \mathcal{W}_{j}& =& \mathop{\mathrm{span}}\nolimits \{\psi _{j,0},\ldots,\psi _{j,2^{j}-1}\},\quad \mbox{ and} {}\\ L^{2}([0,1],\mathrm{d}x; \mathbb{R})& =& \mathcal{V}_{0} \oplus \bigoplus _{j\in \mathbb{N}_{0}}\mathcal{W}_{j}. {}\\ \end{array}$$
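As a quick numerical sanity check of Lemma 11.19, the following minimal sketch (with arbitrarily chosen indices) evaluates the Haar wavelets on a dyadic grid; the midpoint rule on that grid integrates functions that are constant on the grid cells exactly.

```python
# Minimal sketch: numerical check of the Haar orthonormality relations of
# Lemma 11.19.  The midpoint rule on 2^J cells is exact here because each
# psi_{j,k} is constant on dyadic cells of length 2^{-J} for J >= j + 1.
import numpy as np

def haar_mother(x):
    return np.where((0.0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def psi(j, k, x):
    return 2.0 ** (j / 2) * haar_mother(2.0 ** j * x - k)

J = 12
x = (np.arange(2 ** J) + 0.5) / 2 ** J           # midpoints of the dyadic cells
dx = 2.0 ** (-J)

print(np.sum(psi(3, 5, x)) * dx)                 # ~ 0  (zero mean)
print(np.sum(psi(3, 5, x) ** 2) * dx)            # ~ 1  (unit norm)
print(np.sum(psi(3, 5, x) * psi(2, 1, x)) * dx)  # ~ 0  (orthogonality)
```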

Consider a stochastic germ \(\varXi \sim \mu \in \mathcal{M}_{1}(\mathbb{R})\) with cumulative distribution function \(F_{\varXi }: \mathbb{R} \rightarrow [0,1]\). For simplicity, suppose that \(F_{\varXi }\) is continuous and strictly increasing, so that \(F_{\varXi }\) is differentiable (with \(F'_{\varXi } = \frac{\mathrm{d}\mu } {\mathrm{d}x} =\rho _{\varXi }\)) almost everywhere, and also invertible. We wish to write a random variable \(U \in L^{2}(\mathbb{R},\mu; \mathbb{R})\), in particular one that may be a non-smooth function of \(\varXi\), as

$$\displaystyle\begin{array}{rcl} U(\xi )& =& u_{0} +\sum _{j\in \mathbb{N}_{0}}\sum _{k=0}^{2^{j}-1 }u_{j,k}\psi _{j,k}(F_{\varXi }(\xi )) {}\\ & =& u_{0} +\sum _{j\in \mathbb{N}_{0}}\sum _{k=0}^{2^{j}-1 }u_{j,k}W_{j,k}(\xi ); {}\\ \end{array}$$

such an expansion will be called a Wiener–Haar expansion of U. See Figure 11.2 for an illustration comparing the cumulative distribution function of a truncated Wiener–Haar expansion to that of a standard Gaussian, showing the ‘clumping’ of probability mass that is to be expected of Wiener–Haar wavelet expansions but not of Wiener–Hermite polynomial chaos expansions. Indeed, the (sample) law of a Wiener–Haar expansion even has regions of zero probability mass.

Fig. 11.2

The cumulative distribution function and binned peak-normalized probability density function of \(10^{5}\) i.i.d. samples of a random variable U with truncated Wiener–Haar expansion \(U =\sum _{ j=0}^{J}\sum _{k=0}^{2^{j}-1 }u_{j,k}W_{j,k}(\varXi )\), where \(\varXi \sim \mathcal{N}(0,1)\). The coefficients \(u_{j,k}\) were sampled independently from \(u_{j,k} \sim 2^{-j}\mathcal{N}(0,1)\). The cumulative distribution function of a standard Gaussian is shown dashed for comparison.
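An experiment of the kind shown in Figure 11.2 can be reproduced along the following lines. This is a minimal sketch in which the truncation depth and random seed are illustrative and plotting is left to the reader; the coefficient law \(u_{j,k} \sim 2^{-j}\mathcal{N}(0,1)\) follows the caption.

```python
# Minimal sketch: samples of a truncated Wiener--Haar expansion driven by a
# standard Gaussian germ, in the spirit of Fig. 11.2.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
J = 4                                             # illustrative truncation depth
u = {(j, k): 2.0 ** (-j) * rng.standard_normal()
     for j in range(J + 1) for k in range(2 ** j)}

def haar(j, k, t):                                # psi_{j,k} on [0, 1]
    y = 2.0 ** j * t - k
    return 2.0 ** (j / 2) * (((0.0 <= y) & (y < 0.5)).astype(float)
                             - ((0.5 <= y) & (y < 1.0)).astype(float))

xi = rng.standard_normal(10 ** 5)                 # samples of the germ
F = norm.cdf(xi)                                  # W_{j,k}(xi) = psi_{j,k}(F(xi))
U = sum(c * haar(j, k, F) for (j, k), c in u.items())
# the empirical CDF of U can now be compared with norm.cdf, as in Fig. 11.2
```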

Note that, by a straightforward change of variables \(x = F_{\varXi }(\xi )\):

$$\displaystyle\begin{array}{rcl} \int _{\mathbb{R}}W_{j,k}(\xi )W_{j',k'}(\xi )\,\mathrm{d}\mu (\xi )& =& \int _{\mathbb{R}}W_{j,k}(\xi )W_{j',k'}(\xi )\rho _{\varXi }(\xi )\,\mathrm{d}\xi {}\\ & =& \int _{0}^{1}\psi _{ j,k}(x)\psi _{j',k'}(x)\,\mathrm{d}x {}\\ & =& \delta _{jj'}\delta _{kk'}, {}\\ \end{array}$$

so the family \(\{1\} \cup \{ W_{j,k}\mid j \in \mathbb{N}_{0},k \in \{ 0,\ldots,2^{j} - 1\}\}\) forms a complete orthonormal basis for \(L^{2}(\mathbb{R},\mu; \mathbb{R})\). Hence, the Wiener–Haar coefficients are determined by

$$\displaystyle\begin{array}{rcl} u_{j,k} =\langle UW_{j,k}\rangle & =& \int _{\mathbb{R}}U(\xi )W_{j,k}(\xi )\rho _{\varXi }(\xi )\,\mathrm{d}\xi {}\\ & =& \int _{0}^{1}U(F_{\varXi }^{-1}(x))\psi _{ j,k}(x)\,\mathrm{d}x. {}\\ \end{array}$$
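For a concrete illustration, consider the discontinuous observable \(U(\varXi ) = \mathrm{sgn}(\varXi )\) of a standard Gaussian germ. The following minimal sketch evaluates the projection formula above by the midpoint rule on [0, 1]; the choice of observable and the quadrature resolution are illustrative.

```python
# Minimal sketch: Wiener--Haar coefficients u_{j,k} of the discontinuous
# observable U(xi) = sign(xi), Xi ~ N(0,1), via the substitution x = F(xi):
#     u_{j,k} = int_0^1 U(F^{-1}(x)) psi_{j,k}(x) dx   (midpoint rule below).
import numpy as np
from scipy.stats import norm

def psi(j, k, x):
    y = 2.0 ** j * x - k
    return 2.0 ** (j / 2) * (((0.0 <= y) & (y < 0.5)).astype(float)
                             - ((0.5 <= y) & (y < 1.0)).astype(float))

U = np.sign                          # example observable of the germ

N = 2 ** 16
x = (np.arange(N) + 0.5) / N         # midpoints for the rule on [0, 1]
g = U(norm.ppf(x))                   # U composed with the inverse CDF

u0 = np.mean(g)                      # coefficient of the constant basis function
for j in range(3):
    for k in range(2 ** j):
        print(j, k, round(float(np.mean(g * psi(j, k, x))), 4))
```

Only \(u_{0,0}\) is (essentially) non-zero, since \(\mathrm{sgn}(\varXi ) = -W_{0,0}(\varXi )\): the Wiener–Haar basis captures this discontinuity with a single term, whereas a Hermite PC expansion of the same observable would spread it over many slowly decaying polynomial coefficients.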

As in the case of a gPC expansion, the usual expressions for the mean and variance of U hold:

$$\displaystyle{\mathbb{E}[U] = u_{0}\quad \mbox{ and}\quad \mathbb{V}[U] =\sum _{j\in \mathbb{N}_{0}}\sum _{k=0}^{2^{j}-1 }\vert u_{j,k}\vert ^{2}.}$$

Comparison of Wavelet and gPC Expansions.

Despite their formal similarities, wavelet and gPC spectral expansions behave quite differently in practice. For gPC expansions, the globally smooth orthogonal polynomials used as the basis elements have the property that expansions of smooth functions/random variables enjoy a fast convergence rate, as in Theorem 8.23; no such connection between smoothness and convergence rate is to be expected for Wiener–Haar expansions, in which the basis functions are non-smooth. However, when U exhibits a localized sharp variation or a discontinuity, a Wiener–Haar expansion may be more efficient than a gPC expansion, since the convergence rate of the latter would be impaired by Gibbs-type phenomena. Another distinctive feature of the Wiener–Haar expansion concerns products of piecewise constant processes: for \(f,g \in \mathcal{V}_{j}\) the product fg is again an element of \(\mathcal{V}_{j}\), whereas the product of two polynomials of degree at most n is in general a polynomial of degree up to 2n. Therefore, for problems with strong dependence upon high-detail features, or with multiplicative structure, Wiener–Haar expansions may be more appropriate than gPC expansions.

11.5 Bibliography

Mercer (1909) proved Theorem 11.3 for positive semi-definite kernels on [a, b] × [a, b]; the general theorem as used here can be found in many standard works on linear functional analysis, e.g. that of Dunford and Schwartz (1963, pp. 1087–1088); Steinwart and Scovel (2012) consider Mercer-type theorems for non-compact domains. The Karhunen–Loève expansion bears the names of Karhunen (1947) and Loève (1978), but Karhunen–Loève-type series expansions of stochastic processes were considered earlier by Kosambi (1943). Jolliffe (2002) gives a general introduction to principal component analysis, and de Leeuw (2013) gives a survey of the history of PCA and its nonlinear generalizations.

The application of Wiener–Hermite PC expansions to engineering systems was popularized by Ghanem and Spanos (1991); the extension to gPC and the connection with the Askey scheme is due to Xiu and Karniadakis (2002). The extension of gPC expansions to arbitrary dependency among the components of the stochastic germ, as in Remark 11.16, is due to Soize and Ghanem (2004). For a more pedagogical approach, see the discussion of the UQ applications of spectral expansions in the books of Le Maître and Knio (2010, Chapter 2), Smith (2014, Chapter 10), and Xiu (2010, Chapter 5).

The orthogonal decomposition properties of the Haar basis were first noted by Haar (1910). Meyer (1992) provides a thorough introduction to wavelets in general. Wavelet bases for UQ, which can better resolve locally non-smooth features of random fields, are discussed by Le Maître and Knio (2010, Chapter 8) and in articles of Le Maître et al. (2004a,b, 2007). Wavelets are also used in the construction and sampling of Besov measures, as in the articles of Dashti et al. (2012) and Lassas et al. (2009), and Theorem 11.10 is synthesized from results in those two papers. A thorough treatment of Besov spaces from a Fourier-analytic perspective can be found in Bahouri et al. (2011, Chapter 2).

11.6 Exercises

Exercise 11.1.

Consider the negative Laplacian operator \(\mathcal{L}:= -\frac{\mathrm{d}^{2}} {\mathrm{d}x^{2}}\) acting on real-valued functions on the interval [0, 1], with zero boundary conditions. Show that the eigenvalues \(\mu _{n}\) and normalized eigenfunctions \(\psi _{n}\) of \(\mathcal{L}\) are

$$\displaystyle\begin{array}{rcl} \mu _{n}& =& (\pi n)^{2}, {}\\ \psi _{n}(x)& =& \sqrt{2}\sin (\pi nx). {}\\ \end{array}$$

Hence show that \(C:= \mathcal{L}^{-1}\) has the same eigenfunctions with eigenvalues \(\lambda _{n} = (\pi n)^{-2}\). Hence, using the Karhunen–Loève theorem, generate figures similar to Figure 11.1 for your choice of mean field \(m: [0,1] \rightarrow \mathbb{R}\).
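A minimal sketch of the requested sampling, assuming a zero mean field and an illustrative truncation level and grid, might read as follows.

```python
# Minimal sketch for Exercise 11.1: truncated Karhunen--Loeve draws with
# lambda_n = (pi n)^{-2} and psi_n(x) = sqrt(2) sin(pi n x).  The mean field,
# truncation level and grid are illustrative choices.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0.0, 1.0, 501)
m = np.zeros_like(x)                                   # illustrative mean field
N = 200                                                # truncation level

n = np.arange(1, N + 1)
lam = (np.pi * n) ** (-2.0)
Psi = np.sqrt(2.0) * np.sin(np.pi * np.outer(n, x))    # shape (N, len(x))

rng = np.random.default_rng(1)
for _ in range(5):
    xi = rng.standard_normal(N)
    plt.plot(x, m + (np.sqrt(lam) * xi) @ Psi)         # one KL sample path
plt.show()
```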

Exercise 11.2.

Do the analogue of Exercise 11.1 for \(\mathcal{L} = (-\varDelta )^{\alpha }\) acting on real-valued functions on the square \([0,1]^{2}\), again with zero boundary conditions. Try α = 2 first, then try α = 1, and try coarser and finer meshes in each case. You should see that your numerical draws from the Gaussian field with α = 1 fail to converge, whereas they converge nicely for α > 1. Loosely speaking, the reason for this is that a Gaussian random variable with covariance \((-\varDelta )^{-\alpha }\) is almost surely in the Sobolev space \(H^{s}\) or the Hölder space \(\mathcal{C}^{s}\) for \(s <\alpha -\frac{d} {2}\), where d is the spatial dimension; thus, α = 1 on the two-dimensional square is exactly on the borderline of divergence.
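The following minimal sketch extends the previous one to the square, using the tensorized eigenpairs \(\lambda _{m,n} = (\pi ^{2}(m^{2} + n^{2}))^{-\alpha }\) with eigenfunctions \(2\sin (\pi mx)\sin (\pi ny)\); the value of α, the truncation and the mesh are illustrative and should be varied as the exercise suggests.

```python
# Minimal sketch for Exercise 11.2: one KL draw of the Gaussian field with
# covariance (-Laplacian)^(-alpha) on [0,1]^2.  Vary alpha and the mesh as
# suggested; alpha = 1 sits on the borderline of divergence.
import numpy as np

alpha, N, M = 2.0, 64, 129
x = np.linspace(0.0, 1.0, M)
S = np.sin(np.pi * np.outer(np.arange(1, N + 1), x))           # shape (N, M)

k = np.arange(1, N + 1)
lam = (np.pi ** 2 * (k[:, None] ** 2 + k[None, :] ** 2)) ** (-alpha)

rng = np.random.default_rng(2)
xi = rng.standard_normal((N, N))

# u(x_i, y_j) = sum_{m,n} sqrt(lam_{mn}) xi_{mn} * 2 sin(pi m x_i) sin(pi n y_j)
U = 2.0 * S.T @ (np.sqrt(lam) * xi) @ S                        # (M, M) array
# U can now be visualized with, e.g., matplotlib's imshow
```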

Exercise 11.3.

Show that the eigenvalues \(\lambda _{n}\) and eigenfunctions \(e_{n}\) of the exponential covariance function \(C(x,y) =\exp (-\vert x - y\vert /a)\) on [−b, b] are given by

$$\displaystyle{\lambda _{n} = \left \{\begin{array}{@{}l@{\quad }l@{}} \frac{2a} {1+a^{2}w_{n}^{2}},\quad &\mbox{ if $n \in 2\mathbb{Z}$,} \\ \frac{2a} {1+a^{2}v_{n}^{2}},\quad &\mbox{ if $n \in 2\mathbb{Z} + 1$,}\\ \quad \end{array} \right.}$$
$$\displaystyle{e_{n}(x) = \left \{\begin{array}{@{}l@{\quad }l@{}} \sin (w_{n}x)\big/\sqrt{b - \frac{\sin (2w_{n } b)} {2w_{n}}},\quad &\mbox{ if $n \in 2\mathbb{Z}$,} \\ \cos (v_{n}x)\big/\sqrt{b + \frac{\sin (2v_{n } b)} {2v_{n}}}, \quad &\mbox{ if $n \in 2\mathbb{Z} + 1$,}\\ \quad \end{array} \right.}$$

where \(w_{n}\) and \(v_{n}\) solve the transcendental equations

$$\displaystyle{\left \{\begin{array}{@{}l@{\quad }l@{}} aw_{n} +\tan (w_{n}b) = 0,\quad &\mbox{ for $n \in 2\mathbb{Z}$,} \\ 1 - av_{n}\tan (v_{n}b) = 0,\quad &\mbox{ for $n \in 2\mathbb{Z} + 1$.} \end{array} \right.}$$

Hence, using the Karhunen–Loève theorem, generate sample paths from the Gaussian measure with covariance kernel C and your choice of mean path. Note that you will need to use a numerical method such as Newton’s method to find approximate values for \(w_{n}\) and \(v_{n}\).
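A minimal sketch of one possible implementation is given below; it brackets the roots of the two transcendental equations by scanning for sign changes (discarding the spurious sign changes at the poles of tan) and refines them with Brent's method rather than Newton's. The parameters a, b and the truncation level are illustrative choices.

```python
# Minimal sketch for Exercise 11.3: roots of the transcendental equations by
# bracketing + Brent's method, then a truncated KL sample path on [-b, b].
import numpy as np
from scipy.optimize import brentq

a, b, N = 1.0, 1.0, 20

def first_roots(f, how_many, t_max=100.0, n_grid=200000):
    """First positive roots of f, found by scanning for sign changes."""
    grid = np.linspace(1e-6, t_max, n_grid)
    vals = f(grid)
    roots = []
    for i in range(n_grid - 1):
        if vals[i] * vals[i + 1] < 0:
            r = brentq(f, grid[i], grid[i + 1])
            if abs(f(r)) < 1e-6:              # discard sign changes at tan poles
                roots.append(r)
            if len(roots) == how_many:
                break
    return np.array(roots)

w = first_roots(lambda t: a * t + np.tan(t * b), N)            # 'even' family
v = first_roots(lambda t: 1.0 - a * t * np.tan(t * b), N)      # 'odd' family

lam_w = 2.0 * a / (1.0 + a ** 2 * w ** 2)
lam_v = 2.0 * a / (1.0 + a ** 2 * v ** 2)

x = np.linspace(-b, b, 401)
E_w = np.sin(np.outer(w, x)) / np.sqrt(b - np.sin(2 * w * b) / (2 * w))[:, None]
E_v = np.cos(np.outer(v, x)) / np.sqrt(b + np.sin(2 * v * b) / (2 * v))[:, None]

rng = np.random.default_rng(3)
path = (np.sqrt(lam_w) * rng.standard_normal(len(w))) @ E_w \
     + (np.sqrt(lam_v) * rng.standard_normal(len(v))) @ E_v     # zero-mean path
```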

Exercise 11.4 (Karhunen–Loève-type sampling of Besov measures).

Let \(\mathbb{T}^{d}:= \mathbb{R}^{d}/\mathbb{Z}^{d}\) denote the d-dimensional unit torus. Let \(\{\psi _{\ell}\mid \ell \in \mathbb{N}\}\) be an orthonormal basis for \(L^{2}(\mathbb{T}^{d},\mathrm{d}x; \mathbb{R})\). Let \(q \in [1,\infty )\) and \(s \in (0,\infty )\), and define a new norm \(\|\cdot \|_{X^{s,q}}\) on series \(u =\sum _{\ell\in \mathbb{N}}u_{\ell}\psi _{\ell}\) by

$$\displaystyle{\left \|\sum _{\ell\in \mathbb{N}}u_{\ell}\psi _{\ell}\right \|_{X^{s,q}}:= \left (\sum _{\ell\in \mathbb{N}}\ell^{ \frac{sq} {d} +\frac{q} {2} -1}\vert u_{\ell}\vert ^{q}\right )^{1/q}.}$$

Show that \(\|\cdot \|_{X^{s,q}}\) is indeed a norm and that the set of u with \(\|u\|_{X^{s,q}}\) finite forms a Banach space. Now, for \(q \in [1,\infty )\), s > 0, and κ > 0, define a random function U by

$$\displaystyle{U(x):=\sum _{\ell\in \mathbb{N}}\ell^{-( \frac{s} {d}+\frac{1} {2} -\frac{1} {q})}\kappa ^{-\frac{1} {q} }\varXi _{\ell}\psi _{\ell}(x)}$$

where \(\varXi _{\ell}\) are sampled independently and identically from the generalized Gaussian measure on \(\mathbb{R}\) with Lebesgue density proportional to \(\exp (-\frac{1} {2}\vert \xi \vert ^{q})\). By treating the above construction as an infinite product measure and considering the product of the densities \(\exp (-\frac{1} {2}\vert \xi _{\ell}\vert ^{q})\), show formally that U has ‘Lebesgue density’ proportional to \(\exp (-\frac{\kappa }{2}\|u\|_{X^{s,q}}^{q})\).

Generate sample realizations of U and investigate the effect of the various parameters q, s and κ. It may be useful to know that samples from the probability measure \(\frac{\beta ^{1/2}} {2\varGamma (1+\frac{1} {q})}\exp (-\beta ^{q/2}\vert x - m\vert ^{q})\,\mathrm{d}x\) can be generated as \(m +\beta ^{-1/2}SY ^{1/q}\), where S is uniformly distributed in { − 1, +1} and Y is distributed according to the gamma distribution on \([0,\infty )\) with shape parameter 1∕q and unit rate, which has Lebesgue density \(\frac{1} {\varGamma (1/q)}y^{1/q-1}e^{-y}\mathbb{I}_{[0,\infty )}(y)\).
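A minimal sketch for the one-dimensional torus, using a real Fourier basis and the sampling hint above, might read as follows; q, s, κ and the truncation are illustrative choices.

```python
# Minimal sketch for Exercise 11.4 on the one-dimensional torus (d = 1), with
# a real Fourier basis.  Here beta^{q/2} = 1/2, so Xi = S (2 Y)^{1/q} with
# S = +/-1 and Y ~ Gamma(shape 1/q, rate 1) has density prop. to
# exp(-|xi|^q / 2), as in the hint above.
import numpy as np

q, s, kappa, d, L = 1.5, 1.0, 1.0, 1, 256          # illustrative parameters
rng = np.random.default_rng(4)

S = rng.choice([-1.0, 1.0], size=L)
Y = rng.gamma(shape=1.0 / q, scale=1.0, size=L)
Xi = S * (2.0 * Y) ** (1.0 / q)

def psi(l, x):
    """Orthonormal real Fourier basis of L^2 of the unit torus."""
    if l == 1:
        return np.ones_like(x)
    k = l // 2
    trig = np.cos if l % 2 == 0 else np.sin
    return np.sqrt(2.0) * trig(2.0 * np.pi * k * x)

x = np.linspace(0.0, 1.0, 513)
ell = np.arange(1, L + 1)
Psi = np.array([psi(l, x) for l in ell])            # shape (L, len(x))

coeff = ell ** (-(s / d + 0.5 - 1.0 / q)) * kappa ** (-1.0 / q)
U = (coeff * Xi) @ Psi                              # one realization of U
```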