Construction of Tight Frames on Graphs and Application to Denoising

Göbel, Franziska; Blanchard, Gilles; Luxburg, Ulrike von

doi:10.1007/978-3-319-18284-1_20

Franziska Göbel⁷,
Gilles Blanchard⁷ &
Ulrike von Luxburg⁸

Part of the book series: Springer Handbooks of Computational Statistics ((SHCS))

4525 Accesses
8 Citations

Abstract

Given a neighborhood graph representation of a finite set of points $x_i\in \mathbb {R}^d,i=1,\ldots ,n,$ we construct a frame (redundant dictionary) for the space of real-valued functions defined on the graph. This frame is adapted to the underlying geometrical structure of the x _i, has finitely many elements, and these elements are localized in frequency as well as in space. This construction follows the ideas of Hammond et al. (Appl Comput Harmon Anal 30:129–150, 2011), with the key point that we construct a tight (or Parseval) frame. This means we have a very simple, explicit reconstruction formula for every function f defined on the graph from the coefficients given by its scalar product with the frame elements. We use this representation in the setting of denoising where we are given noisy observations of a function f defined on the graph. By applying a thresholding method to the coefficients in the reconstruction formula, we define an estimate of f whose risk satisfies a tight oracle inequality.

Access provided by CONRICYT-eBooks. Download chapter PDF

Optimization Methods for Frame Conditioning and Application to Graph Laplacian Scaling

Properties of tight frames that are regular schemes

Article 20 June 2019

Spectral Design of Signal-Adapted Tight Frames on Graphs

Keywords

1 Introduction

1.1 Motivation

When dealing with high-dimensional data, a general principle is that the curse of dimensionality can be efficiently fought if one assumes the data points to lie on a structure of smaller intrinsic dimensionality, typically a manifold. Some well-known methods to discover such a lower dimensional structure include Isomap (Tenenbaum et al. 2000), LLE (Roweis and Saul 2000), and Laplacian Eigenmaps (Belkin and Niyogi 2003).

In this work, our main interest is not in visualizing or representing by an explicit mapping the underlying structure of the observed data points; rather, we want to represent or estimate efficiently a real-valued function on these points. More specifically, we focus on the following denoising problem: assuming we observe a noisy version of the function f, y _i = f(x _i) + ε _i at points (x ₁, …, x _n), we would like to recover the values of f at these points. An important step for solving this problem is to find a dictionary of functions to represent the signal f, which is adapted to the structure of the data. Ideally, we would like this dictionary to exhibit the features of a wavelet basis. In traditional signal processing on a flat space, with data points on a regular grid, orthogonal wavelet bases offer a very powerful tool to sparsely represent signals with inhomogeneous regularity (such as a signal that is very smooth everywhere except at a few singular points where it is discontinuous). Such bases are in particular well suited to the denoising task. Can this be generalized to irregularly scattered data on a manifold?

We present such a method to construct a so-called Parseval frame of functions exhibiting wavelet-like properties while adapting to the intrinsic geometry of the data. Furthermore, we use this dictionary for the denoising task using a simple coefficient thresholding method.

This work is organized as follows. In the coming section, we discuss the relationship to previous work on which the present chapter is built, as well as pointing out our new contributions. In Sect. 20.2, we recall important notions of frame theory as well as of neighborhood graphs needed for our construction. The construction of the frame and its properties is presented in Sect. 20.3. In Sect. 20.4, we develop a coefficient thresholding strategy for the denoising problem. In Sect. 20.5, we present numerical results and method comparison on testbed data.

1.2 Relation to Previous Work

Regression methods that adapt to an underlying lower dimension of the data have been considered by Bickel and Li (2007), Kpotufe and Dasgupta (2012) and Kpotufe (2011) using local polynomial estimates, random projection trees, and nearest-neighbors, respectively. However, these methods are not constructed to adapt to an inhomogeneous regularity of the target function: in these three cases, the smoothing scale (determined by the smoothing kernel bandwidth, the tree partition’s average data diameter, or the number of neighbors, respectively) is fixed globally. In the experimental Sect. 20.5, for data lying on a smooth manifold but a target function exhibiting a sharp discontinuity, we demonstrate the advantage of our method over kernel smoothing.

Based on the motivations similar to ours, a method for constructing a wavelet-like basis on scattered data was proposed by Gavish et al. (2010). It is based on a hierarchical tree partition of the data, on which a Haar-like basis of 0–1 functions is constructed. However, the performance of that method is then adapted to the geometry of the tree, in the sense that the distance of two points is measured through tree path distance. This can strongly distort the original distance: two close points in original distance can find themselves in very separated subtrees.

The construction proposed here, based on a transform of the spectral decomposition of the graph Laplacian, follows closely the ideas of Hammond et al. (2011). Two important contributions brought forth in the present work are that we construct a Parseval (or tight) frame, rather than a general frame; and we consider an explicit thresholding method for the denoising problem. The former point is crucial to obtain sharp bounds for the thresholding method, and also eliminates the computational problem of signal reconstruction from the frame coefficients, since Parseval frames enjoy a reconstruction formula similar to that of an orthonormal basis. The choice of multiscale bandpass filter functions leading to the tight frame is inspired by the recent work of Coulhon et al. (2012), where the spectral decomposition principle is also studied, albeit in the setting of a quite general metric space.

2 Notation and Basics

2.1 Setting

We consider a sample of n points $x_i\in \mathbb {R}^d$. These points are assumed to belong to an unknown low-dimensional submanifold $\mathscr {M}\subset \mathbb {R}^d$. We denote the design by $\mathfrak {D}=\{{x_{1},\ldots ,x_{n}}\}\subset \mathscr {M}$. Furthermore, we observe on these points the (noisy) value of a function $f: \mathfrak {D} \rightarrow \mathbb {R}$. Since $\mathfrak {D}$ is finite, we can represent the function f as vector $f=(f(x_1),\ldots ,f(x_n))^t\in \mathbb {R}^n$. The space of all (square-integrable) functions f defined on $\mathfrak {D}$ is denoted $L^2(\mathfrak {D})$ and endowed with the usual Euclidean inner product.

We denote by y _i = f(x _i) + 𝜖 _i the noisy observation of f at x _i, where 𝜖 _i are independent identically distributed centered random variables. The problem we consider in this work is that of denoising, that is, try to recover the underlying value of the function f at the points x _i.

While the existence of a low-dimensional supporting manifold $\mathscr {M}$ for the design points motivates the construction of the proposed method, we underline (again) that $\mathscr {M}$ is not known to the user and the method only uses the knowledge of the design points. In such a setting, a key idea to recover implicitly some information on the geometry of $\mathscr {M}$ is to construct a neighborhood graph based on the design points (see Sect. 20.2.3 for details).

2.2 Frames

For the construction in Sect. 20.3, we rely on the notion of a vector frame, for which we recall here some important properties (see, e.g., Casazza et al. 2013; Han 2007; Christensen 2008). A frame is an overcomplete dictionary with particular properties allowing it to act almost as basis.

Definition 1

Let $\mathscr {H}$ be a Hilbert space. Then a countable set $\{z_i\}_{i\in I}\subset \mathscr {H}$ is a frame with frame bounds A and B for $\mathscr {H}$ if there exists constants 0 < A ≤ B < ∞ such that

$$\displaystyle \begin{aligned} \forall z\in\mathscr{H}:~\; A \left\lVert z \right\rVert^2 \leq \sum_{i\in I}\left\lvert \left\langle z,z_i \right\rangle \right\rvert^2\leq B \left\lVert z \right\rVert^2. \end{aligned} $$

(20.1)

A frame is called tight if A = B, in particular the frame is called Parseval if A = B = 1.

In the remainder of this work we consider the case of a Euclidean space $\mathscr {H}=\mathbb {R}^n$, and assume that {z _i}_{i ∈ I} is a frame with a finite number of elements. Two important operators associated to the frame are the analysis operator

$$\displaystyle \begin{aligned}T:\mathbb{R}^n \rightarrow \mathbb{R}^I,~ Tz:=(\left\langle z,z_i \right\rangle)_{i\in I}\end{aligned} $$

(20.2)

(sequence of frame coefficients), and its adjoint the synthesis operator:

$$\displaystyle \begin{aligned}T^*: \mathbb{R}^I\rightarrow \mathbb{R}^n,~ T^*a=T^* (a_i)_{i\in I}^t=\sum_{i\in I} a_i z_i.\end{aligned} $$

(20.3)

Further, the frame operator is defined as S = T ^∗ T:

$$\displaystyle \begin{aligned}S:\mathbb{R}^n \rightarrow \mathbb{R}^n, ~Sz=T^*Tz=\sum_{i\in I}\left\langle z,z_i \right\rangle z_i,\end{aligned} $$

(20.4)

and finally the Gramian operator as U = TT ^∗,

$$\displaystyle \begin{aligned}U: \mathbb{R}^I\rightarrow \mathbb{R}^I,~ Ua=TT^*a= \left\{\left\langle \sum_{i\in I} a_i z_i,z_k \right\rangle\right\}_{k\in I}.\end{aligned} $$

(20.5)

In matrix form, the columns of T ^∗ are the vectors z _i, i ∈ I, T is its transpose and $U_{ij} = \left \langle z_i,z_j \right \rangle $.

The definition of a frame implies that S is invertible, and it is possible to reconstruct any z from its frame coefficients by $z=\sum _{i\in I}\left \langle z,z_i \right \rangle z^*_i=\sum _{i\in I}\left \langle z,z^*_i \right \rangle z_i$, where $z^*_i:=S^{-1}z_i, i\in I$ is called the canonical dual frame of (z _i)_{i ∈ I}.

We recall some properties of finite Parseval frames over Euclidean spaces (see, e.g., Han 2007, chapter 3).

Theorem 1 (Properties of Parseval frames)

Let $\mathscr {H}$ be a Hilbert space with $\dim \mathscr {H}=n< \infty $ . The following statements are equivalent:

1.
$\{z_i\}_{1\leq i \leq k}\subset \mathscr {H}$ is a Parseval frame.
2.
$\forall y\in \mathscr {H}: ~~y=\sum _{i=1}^{k}\left \langle \,y,z_i \right \rangle z_i$.
3.
The frame operator S is the identity on $\mathbb {R}^n$.
4.
The Gramian operator U is an orthogonal projector of rank n in $\mathbb {R}^k$.

Furthermore if $\{z_i\}_{1..k}\subset \mathscr {H}$ is a Parseval frame, then

$\left \lVert z_i \right \rVert \leq 1$ for i ∈{1, …, k};
$\dim \mathscr {H}=n=\sum _{i=1}^{k}\left \lVert z_i \right \rVert ^2;$
the canonical dual frame is the frame itself.

For the present work, the two most important points of this theory are the following: first, the reconstruction formula (point two above), where we see that a Parseval frame acts similarly to an orthonormal basis; secondly, if we construct a vector v = T ^∗ a =∑_i a _i z _i from an arbitrary vector of coefficients (a _i), then

$$\displaystyle \begin{aligned} \left\lVert \sum_i a_i z_i \right\rVert^2 = \left\langle T^*a,T^*a\right\rangle = \left\langle a,Ua\right\rangle = \left\lVert Ua \right\rVert^2 \leq \left\lVert a \right\rVert^2, \end{aligned} $$

(20.6)

which follows from property 4 above.

2.3 Neighborhood Graphs

In order to exploit the structure and geometry of the unknown submanifold $\mathscr {M}$ on which the sample $\mathfrak {D}$ is supposed to lie, a powerful idea is to use a graph-based representation of the data $\mathfrak {D}$ through a neighborhood graph. The points in $\mathfrak {D}$ correspond to the vertices of the graph, and two vertices of the graph are joined by an edge when the two corresponding points are neighbors (in some appropriate sense) in $\mathbb {R}^n$. The underlying idea is that the local geometry of $\mathbb {R}^n$ is reflected in the local connectivity of the graph, while the long-range geometry of the graph reflects the geometrical properties of the manifold $\mathscr {M}$, rather than those of $\mathbb {R}^n$.

Formally, a finite graph G = (V, E) is given by a finite set of vertices V and a set of edges E ⊂ V × V . The |V |×|V | adjacency matrix A of the graph is defined by A _i,j = 1 if (v _i, v _j) ∈ E and A _i,j = 0 otherwise. An undirected graph is such that its adjacency matrix is symmetric.

The graph is called weighted if every edge e ∈ E has a positive weight $w(e)\in \mathbb {R}_{+}$. In this case the notion of adjacency matrix is extended to A _i,j = w((v _i, v _j)) if (v _i, v _j) ∈ E and A _i,j = 0 otherwise. The degree of a vertex v _i in a (possibly weighted) graph is defined as $d_i=d(i)=\sum _{j=1}^{|V|}A_{i,j}$.

As announced, we focus on geometric graphs, which (can) approximate the structure of the unknown $\mathscr {M}$. Each point x _i is represented by a vertex, say v _i. An edge between two vertices represents a small distance, or a high similarity, of the two associated points. The weight of an edge can quantify the similarity more finely.

We use the Euclidean distance $d(x_i,x_j)=\left \lVert x_i-x_j \right \rVert .$ We recall three usual ways to construct the edges of a neighborhood graph:

(undirected) k-nearest-neighbor graph: an undirected edge connects the two vertices v _i and v _j iff x _i belongs to the k nearest neighbors of x _j, or x _j belongs to the k nearest neighbors of x _i (“the k-NN-graph”).
𝜖-graph: an undirected edge connects two vertices v _i and v _j iff d(x _i, x _j) ≤ 𝜖.
complete weighted neighborhood graph: for each pair of vertices there exists an undirected edge with a weight depending on the distance/similarity of the two vertices.

A k-NN graph or an 𝜖-graph can be made weighted by additionally assigning weights to the edges depending on d(x _i, x _j), for instance by choosing Gaussian weights $w(\left \{i,j\right \})= \exp ( - d^2(x_i,x_j)/2\lambda ^2)$.

2.4 Spectral Graph Theory

If one considers real-valued functions $f:\mathscr {M}\rightarrow \mathbb {R}$ defined on a submanifold $\mathscr {M}\subset \mathbb {R}^d$, it is known that under some regularity assumptions on the submanifold $\mathscr {M}$, the eigenfunctions of the Laplace-Beltrami-operator give a basis of the space of squared-integrable functions on $\mathscr {M}$. Since $\mathscr {M}$ is unknown in our setting, the principle of the Laplacian Eigenmaps method (Belkin and Niyogi 2003) is to use a discrete analogon, namely the graph Laplace operator L on a neighborhood graph.

Given a finite weighted undirected graph with adjacency matrix A (n × n) and vertex degrees (d _i)_i, as introduced in the previous section, we will either use the unnormalized graph Laplace operator L ^u or the normalized (symmetric) graph Laplace operator L ^norm defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl}L^{u}&\displaystyle =&\displaystyle D-A \\ L^{\mathrm{norm}}&\displaystyle =&\displaystyle {\mathbf{I}}_{n}-D^{-1/2}AD^{-1/2}, \end{array} \end{aligned} $$

(20.7)

where $D={\mathrm {diag}\left (d_1,\ldots , d_n\right )}$ is a diagonal matrix with entries d _i on the diagonal. By construction L ^u and L ^norm are symmetric matrices. The positive semidefiniteness follows from

$$\displaystyle \begin{aligned}f^t L^{u}f=0.5 \sum_{(i,j)}A_{i,j} (f_i-f_j)^2 \mbox{ and } f^t L^{\mathrm{norm}}f=0.5 \sum_{(i,j)}A_{i,j} \big(\frac{f_i}{\sqrt{d_i}}-\frac{f_j}{\sqrt{d_j}}\big)^2,\end{aligned} $$

respectively. The spectral theorem for matrices indicates that the normalized eigenvectors Φ _i of the graph Laplace operator L (L ^u resp. L ^norm) form an orthonormal basis of $\mathbb {R}^n$ and all eigenvalues are nonnegative. Furthermore the number of components of the graph is given by the number of eigenvalues equal to 0.

3 Construction and Properties

3.1 Construction of a Tight Graph Frame

As discussed earlier, the principle of Laplacian Eigenmaps is to use the basis (Φ _i)_1≤i≤n to represent and process the data. An important advantage of this basis as compared with the natural basis of $\mathbb {R}^d$ is that it will be adapted to the geometry of the underlying submanifold $\mathscr {M}$ supporting the data distribution. For instance, in the denoising problem, a reasonable estimator of f could be a truncated expansion of the noisy vector of observations Y in the basis (Φ _i)_1≤i≤n.

On the other hand, a disadvantage of this basis is that it is not spatially localized. To get an intuitive view, consider the simple case of the interval [0, 1] with uniformly distributed data. In the population view, the eigenbasis of the Laplacian is the Fourier basis. While a truncated expansion in this basis is well-adapted to represent functions that are uniformly regular, it is not well-suited for functions exhibiting locally varying regularity (as an extreme example, a signal that is very smooth everywhere except at a few singular points where it is discontinuous). By contrast, wavelet bases, because they are localized both in space and frequency, allow for an efficient (i.e., sparse) representation of signals with locally varying regularity.

If we now think of data supported on a one-dimensional submanifold (curve) of $\mathbb {R}^d$, we can expect that the Laplacian eigenmaps method will discover a warped Fourier basis following the curve; and, for a more general submanifold $\mathscr {M}$, “harmonics” on $\mathscr {M}$.

In order to go from this basis to a spatially localized dictionary, following ideas of Coulhon et al. (2012) and Hammond et al. (2011), we use the principle of the Littlewood-Paley decomposition.

Let G be an undirected geometric neighborhood graph with adjacency matrix A constructed from $\mathfrak {D}$, and L be an associated symmetric graph Laplace operator with increasing eigenvalues 0 = λ ₁ ≤ λ ₂ ≤⋯ ≤ λ _n and normalized eigenvectors $\varPhi _i\in \mathbb {R}^n,i=1\ldots n$.

We first define a set of vectors using a decomposition of unity and a splitting operation and we will show that this vector set is a Parseval frame.

Definition 2

Let $\{\zeta _k\}_{k\in \mathbb {N}}$ be a sequence of functions $\zeta _k: \mathbb {R}_{+}\rightarrow [0,1]$ satisfying

(DoU):: ∑_j≥0 ζ _j(x) = 1 for all x ≥ 0;
(FD):: #{ζ _k : ζ _k(λ _i) ≠ 0} < ∞ for i = 1, …, n.

Then we define the set of column vectors $\{\varPsi _{kl}\in \mathbb {R}^n , 0\leq k \leq Q, 1 \leq j \leq n\}$ by

$$\displaystyle \begin{aligned} \varPsi_{kl}=\sum_{i=1}^{n}\sqrt{\zeta_k(\lambda_i)}\varPhi_i(x_l) \varPhi_i. \end{aligned} $$

(20.8)

with $Q:=\max \{k: \exists i \in \left \{1,\ldots ,n\right \} \text{with}\, \zeta _k(\lambda _i)>0\}$.

Theorem 2

{Ψ _kl}_k,l is a Parseval frame for $\mathscr {H}=\mathbb {R}^n$ , that is for all $x\in \mathbb {R}^n$ :

$$\displaystyle \begin{aligned} \sum_{k,l}\left\lvert \left\langle x,\varPsi_{kl} \right\rangle \right\rvert^2 = \left\lVert x \right\rVert^2\,. {} \end{aligned} $$

(20.9)

Proof

If we can show that $\sum _{(k,l)}\varPsi _{kl}^{} \varPsi _{kl}^t={\mathbf {I}}_{n}$, we get immediately

$$\displaystyle \begin{aligned} y={\mathbf{I}}_{n} y=\left(\sum_{(k,l)}\varPsi_{kl}^{} \varPsi_{kl}^t\right) y \stackrel{}{=} \sum_{(k,l)} \left\langle\,y,\varPsi_{kl} \right\rangle\varPsi_{kl}, \end{aligned} $$

(20.10)

for $y\in \mathbb {R}^n$. According to Theorem 1 this equation is equivalent to the condition (20.1) with A = B = 1. So we are done. It remains to show $\sum _{k,l}\varPsi _{kl}^{} \varPsi _{kl}^t={\mathbf {I}}_{n}$. We have (since we sum over a finite number of elements)

$$\displaystyle \begin{aligned} \begin{array}{rcl} \sum_{(k,l)} \varPsi_{kl}^{} \varPsi_{kl}^t &\displaystyle = &\displaystyle \sum_{k,l,i,j} \sqrt{\zeta_k(\lambda_i)} \sqrt{\zeta_k(\lambda_{j})} \varPhi_i(x_l) \varPhi_{j}(x_l) \varPhi_i \varPhi_{j}^t \\ &\displaystyle =&\displaystyle \sum_{i=1}^{n} \sum_{k=0}^{Q} \zeta_k(\lambda_i) \, \varPhi_i \varPhi_i^t \\ &\displaystyle = &\displaystyle \sum_{i=1}^{n} \varPhi_i \varPhi_i^t={\mathbf{I}}_{n}. \end{array} \end{aligned} $$

(20.11)

For the second equality, we have used that $\sum _l \varPhi _i(x_l) \varPhi _{j}(x_l) = \left \langle \varPhi _i,\varPhi _j\right \rangle = {\mathbf {1}\{i=j\}}$, since {Φ _i}_i is an orthonormal basis (onb). For the third equality, we used (DoU), and for the last again the onb property. □

We now choose a special sequence of functions satisfying the decomposition of unity (DoU) condition while also ensuring (a) a spectral localization property for the frame elements and (b) a multiscale decomposition interpretation of the resulting decomposition. This construction follows Coulhon et al. (2012), and is known in the context of functional analysis as a smooth Littlewood-Paley decomposition.

Definition 3 (Multiscale Bandpass Filter)

Let $g\in C^{\infty }(\mathbb {R}_{+})$, suppg ⊂ [0, 1], 0 ≤ g ≤ 1, g(u) = 1 for u ∈ [0, 1∕b] (for some constant b > 1). For $k\in \mathbb {N}=\{0,1,\ldots \}$ the functions $\zeta _k: \mathbb {R}_{+}\rightarrow [0,1]$ are defined by

$$\displaystyle \begin{aligned} \zeta_k(x):= \begin{cases}g(x) & \text{if}\, k=0\\ g(b^{-k}x)-g(b^{-k+1}x) & \text{if}\, k>0\\ \end{cases} \end{aligned} $$

(20.12)

The sequence {ζ _k}_k≥0 is called multiscale bandpass filter.

This definition leads to the following properties: ζ _k(x) = ζ ₁(b ^−k x) for k ≥ 1 (multiscale decomposition), $\zeta _k\in C^{\infty }(\mathbb {R}_{+})$, 0 ≤ ζ _k ≤ 1, suppζ ₀ ⊂ [0, 1], suppζ _k ⊂ [b ^k−2, b ^k] for k ≥ 1 (spectral localization property). Moreover, one can check readily

$$\displaystyle \begin{aligned}\sum_{j\geq 0}\zeta_j(x)=1,\end{aligned} $$

(20.13)

i.e., the (DoU) condition holds. In practice, we use a dyadic bandpass filter, that is, b = 2. The functions ζ ₀, …, ζ ₅ with b = 2 are displayed in Fig. 20.1b. By construction, the parameter k in Ψ _kl is naturally a spectral scale parameter, while l is a spatial localization parameter: the frame element Ψ _kl is localized around the point x _l, as we discuss next.

3.2 Spatial Localization

By construction, the elements of the frame are band-limited, i.e. localized in the spectral scale, in the sense that for a fixed k, the frame elements Ψ _kl (l = 1, …, n) are linear combinations of the eigenvectors of the graph Laplacian (“graph harmonics”) corresponding to eigenvalues in the range [b ^k−2, b ^k] only.

From our initial motivations, it is desirable that in contrast with the eigenfunctions of the Laplace operator, the frame elements Ψ _kl are spatially localized functions. In the classical Littlewood-Paley construction for the usual Laplacian on the interval [0, 1], this is a well-known fact: the use of linear combination of trigonometric functions $\varPsi _{k l}(y) := \sin {}(kl)\sin {}(ky)$ via smooth multiscale bandpass filters weights as described in Definition 3 gives rise to strongly localized functions (as illustrated in Fig. 20.1).

Regarding the corresponding discrete construction based on the graph Laplacian, this localization property is certainly observed in practice (as illustrated in Figs. 20.2 and 20.3, see Sect. 20.5 for the setup of the numerical experiments).

Concerning the theoretical perspective, we first review briefly the existing results of Hammond et al. (2011), denote d the shortest path distance in the graph. Theorem 5.5 of Hammond et al. (2011) gives the following localization result for graph frames:

$$\displaystyle \begin{aligned} \frac{\varPsi_{kl}(x)}{\left\lVert \varPsi_{kl} \right\rVert_2} \leq C b^{-k}\,, \end{aligned} $$

(20.14)

for all x with d(x, x _l) ≥ K, under the assumption that the scaling function ζ ₁ is K-times differentiable with vanishing first (K − 1) derivatives in 0, non-vanishing K-th derivative, and the scale parameter k is big enough. This says that Ψ _kl is “localized” around the point x _l. Unfortunately, this result is not informative in our framework for two reasons: first, we chose a function ζ ₁ (see (20.12)) vanishing in a neighborhood of zero, so that all derivatives vanish in the origin, contradicting one of the above assumptions. Secondly, and independently of this first issue, the condition “k is big enough,” and the factor C depend on the size n of the graph and of the largest eigenvalue of the Laplacian. As a consequence it is unclear if this bound covers any interesting part of the spectrum (for k too large, the spectral support [b ^k−2, b ^k] does not contain any eigenvalues, so that Ψ _kl is trivial). Finally, for fixed k the bound also does not give information on the behavior of Ψ _kl(x) when the path distance of x to x _l becomes very large.

On the other hand, the form of the scaling function ζ ₁ used in the present work is based on Coulhon et al. (2012) where a theory of multiscale frame analysis is developed on very general metric spaces under certain geometrical assumptions. In a nutshell, it is proved there that using this construction, the obtained frame functions Ψ _kl(x) are upper bounded by O((d(x, x _l)∕b ^k)^−ν) for ν arbitrary large. We observe that this type of localization estimate is sharper than (20.14) for fixed x and growing k, as well as for fixed scale k and varying x. We conjecture that these theoretical results apply meaningfully in the discrete setting considered here, under the assumption that x ₁, …, x _n are iid from a sufficiently regular distribution P ₀ on a regular manifold $\mathscr {M}$, but it is out of the intended scope of the present chapter to establish this formally. In particular “meaningfully” means that the constants involved in the bounds should be independent of the graph size (otherwise the bounds could potentially be devoid of interest for any particular graph, as pointed out above), a question that we are currently investigating.

4 Denoising

We consider the regression model for fixed design points $\mathfrak {D}=\left \{x_i, i=1\ldots n\right \}$ and observations y _i = f(x _i) + 𝜖 _i (𝜖 _i are independent and identically distributed random variables with ${\mathbf {E}\left (\epsilon _i\right )}=0$ and ${\mathbf {Var}\left (\varepsilon \right )}_i=\sigma ^2$). The aim of denoising is to recover the function $f:\mathfrak {D}\rightarrow \mathbb {R}$ at the design points themselves. We will use the proposed Parseval frame in order to define an estimate ${\widehat {f}}$ of the function f. In what follows, since the $\mathfrak {D}$ is fixed, we identify f with the vector (f(x ₁), …, f(x _n)) and denote y = (y ₁, …, y _n).

Given the frame $\mathscr {F}$ associated to the data points $\mathfrak {D}$ with a multiscale bandpass filter as from Definitions 2 and 3, we denote the frame coefficients $a_{kl}=\left \langle \varPsi _{kl},f \right \rangle $ for f and $b_{kl}=\left \langle \varPsi _{kl},y \right \rangle $ for y. Due to the linearity of the inner product we get $a_{kl}=b_{kl}-\left \langle \varPsi _{kl},\epsilon \right \rangle .$ We estimate the unknown coefficients a _kl by adjusting the known coefficients b _kl by soft-thresholding:

$$\displaystyle \begin{aligned} S_s\left(z, c\right)={\mathrm{sgn}}(z)\, (\left\lvert z \right\rvert-c)_+. \end{aligned} $$

(20.15)

In order to take into account that the frame elements Ψ _kl are not normalized, and generally have different norms, we use element-adapted thresholds of the form $c_{kl}=\sigma \left \lVert \varPsi _{kl} \right \rVert t$ which depend on the variance of $\left \langle \epsilon ,\varPsi _{kl} \right \rangle $ and some global parameter t. Equivalently, this corresponds to first normalizing the observed coefficients b _kl by dividing by their variance, then applying a global threshold to the normalized coefficients, and finally inverting the normalization.

The estimator of f is then the plug-in estimator

$$\displaystyle \begin{aligned} {\widehat{f}}_{S_{s}} = \sum_{k,l} {S_{s}}\left(b_{kl}, c_{kl}\right) \varPsi_{kl} {} = T^* {S_{s}}(b,c), \end{aligned} $$

(20.16)

where S _s(b, c) denotes the vector of thresholded coefficients, and T ^∗ is the synthesis operator of the frame as introduced in Sect. 20.2.2.

To measure the performance of this estimator, we use the risk measure

$$\displaystyle \begin{aligned}Risk({\widehat{f}},f)={{\mathbf{E}}_{\epsilon}\left({\left\lVert {\widehat{f}}-f \right\rVert^2}\right)},\end{aligned} $$

(20.17)

that is, the expected quadratic norm at the sampled points (where $\left \lVert f \right \rVert ^2=\sum _{i=1}^n f(x_i)^2$ is the Euclidean vector norm of f on the observation points), for the performance analysis of an estimator ${\widehat {f}}\in \mathbb {R}^n$.

For bounding the risk of the thresholding estimator ${\widehat {f}}_{S_{s}}$, rather than assuming some specific regularity properties on the function f, it is useful to compare the performance of ${\widehat {f}}_{S_{s}}$ to that of a group of reference estimators. This is called the oracle approach (Candès 2006; Donoho and Johnstone 1994): can the proposed estimator have a performance (almost) as good as the best estimator (for this specific f) in a reference family (that is to say, as good as if an oracle would have given us advance knowledge of which reference estimator is the best for this function f). We review here briefly some important results.

A suitable class of simple reference estimators consists of “keep or kill” (or diagonal projection) estimators, that keep without changes the observed coefficients b _k,l for (k, l) in some subset I, and put to zero the coefficients for indices outside of I:

$$\displaystyle \begin{aligned} {\widehat{f}}_I := \sum_{(k,l) \in I} b_{kl} \varPsi_{kl} = T^* {\widehat{a}}^I_{kl}, \end{aligned} $$

(20.18)

where ${\widehat {a}}^I_{kl}=b_{kl} {\mathbf {1}\{(k,l)\in I\}}$. Now using the frame reconstruction formula and (20.6), we obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} {{\mathbf{E}}_{\epsilon}\left({\left\lVert {\widehat{f}}_I-f \right\rVert^2}\right)} &\displaystyle = &\displaystyle {{\mathbf{E}}_{\epsilon}\left(\left\lVert T^*(a - {\widehat{a}}^I) \right\rVert^2\right)}\\ &\displaystyle \leq &\displaystyle {{\mathbf{E}}_{\epsilon}\left({\left\lVert a-{\widehat{a}}^I \right\rVert^2}\right)}\\ &\displaystyle = &\displaystyle \sum_{(k,l)} \big( a_{kl}^2 {\mathbf{1}\{(k,l)\not\in I\}}\\ &\displaystyle &\displaystyle + \sigma^2 \left\lVert \psi_{kl} \right\rVert^2 {\mathbf{1}\{(k,l) \in I\}}\big)\,. \end{array} \end{aligned} $$

(20.19)

Therefore, the optimal (oracle) choice of the index set I ^∗ obtained by minimizing the above upper bound is given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle (k,l)\in I^* ~~&\displaystyle \Leftrightarrow ~~\left\langle\,f,\varPsi_{kl} \right\rangle^2 \geq \sigma^2 \left\lVert \varPsi_{kl} \right\rVert^2 ~\mbox{ ~(keep) ~}\\ &\displaystyle (k,l)\notin I^*~~&\displaystyle \Leftrightarrow ~~\left\langle\,f,\varPsi_{kl} \right\rangle^2 \leq \sigma^2 \left\lVert \varPsi_{kl} \right\rVert^2 ~\mbox{ ~(kill) ~}.\vspace{-2.5pt} \end{array} \end{aligned} $$

(20.20)

One deduces from this that

$$\displaystyle \begin{aligned} \inf_I{{\mathbf{E}}_{\epsilon}\left({\left\lVert {\widehat{f}}_I-f \right\rVert^2}\right)} \leq \sum_{(k,l)\in N} \min\left(\left\langle\,f,\varPsi_{kl} \right\rangle^2, \sigma^2\left\lVert \varPsi_{kl} \right\rVert^2\right) =: OB(f)\,.\end{aligned} $$

(20.21)

The relation of soft thresholding estimators to the collection of keep-or-kill estimators on a Parseval frame is captured by the following oracle-type inequality (see Candès 2006, Section 9)^{Footnote 1}:

Theorem 3

Let {Ψ _kl}_k,l be a Parseval frame and consider the denoising observation model with Gaussian noise. Let ${\widehat {f}}_{S_s}=\sum _{k,l} S_s\left (\left \langle \,y,\varPsi _{kl} \right \rangle ,t_{kl}\right ) \varPsi _{kl}$ be the soft-threshold frame estimator from (20.16). Then with $t_{kl}=\sigma \left \lVert \varPsi _{kl} \right \rVert \sqrt {2 \log (n)}$ the following inequality holds:

$$\displaystyle \begin{aligned} {{\mathbf{E}}_{\epsilon}\left({\left\lVert {\widehat{f}}_{S_s}-f \right\rVert^2}\right)} \leq \left(2\log(n)+1\right) \left( \sigma^2 +OB(f)\right). \end{aligned} $$

(20.22)

To interpret this result, observe that if we renormalize the squared norm by $\frac {1}{n}$, so that it represents averaged squared error per point, we expect (depending on the regularity of f) the order of magnitude of n ⁻¹ OB(f) to be typically a polynomial rate O(n ^−ν) for some ν < 1. Then the term σ ²∕n is negligible in comparison, and the oracle inequality states that the performance of ${\widehat {f}}_{S_s}$ is only worse by a logarithmic factor than the performance obtained with the optimal, f-dependent choice of I in a keep-or-kill estimator.

For this tight oracle inequality to hold, it is particularly important that a Parseval frame is used. While thresholding strategies can also be applied to the coefficients of a frame that is not Parseval, the reconstruction step is less straightforward (the canonical dual frame must be computed for reconstruction from the thresholded coefficients, see Sect. 20.2.2); furthermore, an additional factor B∕A comes into the bound (A ≤ 1 ≤ B being the frame bounds from definition (20.1)) (see, for instance, Haltmeier and Munk 2014, Prop. 3.10). Therefore, the performance of simple thresholding estimates deteriorates when used with a non-Parseval frame.

5 Numerical Experiments

We investigate the performance of the proposed method for denoising on two testbed datasets where the ground truth is known and the design points are drawn randomly iid from a distribution on a manifold. More precisely, we will consider one example where the design points $\mathfrak {D}$ are drawn uniformly (n = 500) on the unit square, which is then rolled up into a “swiss roll” shape in 3D. We consider a very simple target function represented (on the original unit square) as a piecewise constant function (with values 5 and − 3) on two triangles, displaying a sharp discontinuity along one diagonal of the square and very smooth regularity elsewhere. This function is observed with an additional Gaussian noise of variance σ ² = 1. In the second example the design points $\mathfrak {D}$ are drawn uniformly (n = 500) on the unit sphere in $\mathbb {R}^3$. The target function remains a piecewise constant function, defined on the two parts of the sphere when intersecting it with a chosen plane. Again, this function is observed with an additional Gaussian noise of variance σ ² = 1. For the swiss roll example as well as for the sphere example, one sample consisting of design points and noisy function values is displayed in Fig. 20.4.

In each example, we consider the different types of neighborhood graphs described in Sect. 20.2.3. Following usual heuristics, for the construction of the k-NN graph we take $k=7 \approx \log n$; for the ε-graph, we take for ε the average distance to the k = 7th nearest neighbor, and for weighted graphs we take Gaussian weights, where the bandwidth λ is calibrated so that points at the distance ε defined above are given weight 0.5.

After constructing the (weighted or unweighted) graph Laplacian, we compute explicitly its eigendecomposition. For the construction of the frame via the multiscale bandpass filter, we use a ${\mathscr {C}}^3$ piecewise polynomial plateau function g satisfying the support constraints of Definition 3 for b = 2 (i.e., constant equal to 1 for x ≤ 0.5, and zero for x ≥ 1). While this function is not ${\mathscr {C}}^\infty $, it has the advantage of fast computation.

We compare the denoising performance of the following competitors: Parseval frame with soft thresholding, soft thresholding applied to the Laplacian Eigenmaps orthonormal basis, and truncated expansion in the Laplacian Eigenmaps basis (only the k coefficients corresponding to the first eigenvalues are kept, without thresholding). The latter method is in the spirit of Belkin and Niyogi (2002). It is well-known (from the regular grid case) that the “universal” theoretical threshold $\sigma \sqrt {\log n}$ is often too conservative in practice. For a fair comparison, we therefore compute the mean squared error (MSE) of both thresholding methods for varying threshold t (still modulated by $\left \lVert \varPsi _{kl} \right \rVert $ for the Parseval frame). Comparison of the MSE for one sample across the t-range for two particular settings is plotted in Fig. 20.4. For all studied settings (different graph and graph Laplacian types), for the same threshold level t we observed that the frame-based method systematically shows a noticeable improvement.

In Table 20.1 we report the minimum MSEs and their standard error (averaged over m = 50 samples of design points and independent noise) for different methods over the possible range of the parameter (threshold level t, resp. number of coefficients for truncated expansion), both for the swissroll and for the sphere example. We observe an improvement of 20–25% across the different settings (the best overall results being obtained with weighted graphs and the unnormalized Laplacian). We also compared to the more traditional methods of kernel smoothing (Nadaraya-Watson estimator) and kernel ridge regression, using a Gaussian kernel (also with optimal choices of bandwidth and regularization parameter), and observed a comparable performance improvement. While it is not realistic to assume that the optimal parameter choice is known in practice, it is fair to compare all methods under their respective optimal parameter settings, as parameter selection methods will induce a comparable performance hit with respect to the best setting.

Table 20.1 MSE performance under optimal parameter choice

Full size table

6 Outlook

Following the recently introduced idea of generalizing the Littlewood-Paley spectral decomposition, we constructed explicitly a Parseval frame of functions on a neighborhood graph formed on the data points. We established that a thresholding strategy on the frame coefficients has superior performance for the denoising problem as compared to usual, spectral or non-spectral, approaches. Future developments include extension of this methodology to the semisupervised learning setting, and a stronger theoretical basis for spatial localization.

Notes

1.
Candès (2006) only hints at the proof; we provide a proof in the appendix for completeness.

References

Belkin M, Niyogi P (2002) Using manifold structure for partially labeled classification. In: NIPS, pp 929–936
Google Scholar
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Article Google Scholar
Bickel P, Li B (2007) Local polynomial regression on unknown manifolds. In: Complex datasets and inverse problems: tomography, networks and beyond. IMS lecture notes, vol 54. Institute of Mathematical Statistics, Bethesda, pp 177–186
Chapter Google Scholar
Candès E (2006) Modern statistical estimation via oracle inequalities. Acta Numer 15:257–325
Article MathSciNet Google Scholar
Casazza P, Kutyniok G, Philipp F (2013) Introduction to finite frame theory. In: Casazza PG, Kutyniok G (eds) Finite frames, applied and numerical harmonic analysis. Birkhäuser, Boston, pp 1–53
Chapter Google Scholar
Christensen O (2008) Frames and bases: an introductory course. In: Applied and numerical harmonic analysis. Birkhäuser, Boston
Google Scholar
Coulhon T, Kerkyacharian G, Petrushev P (2012) Heat kernel generated frames in the setting of Dirichlet spaces. J Fourier Anal Appl 18(5):995–1066
Article MathSciNet Google Scholar
Donoho DL, Johnstone IM (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika 81(3):425–455
Article MathSciNet Google Scholar
Gavish M, Nadler B, Coifman RR (2010) Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In: Fürnkranz J, Joachims T (eds) ICML. Omnipress, Madison, pp 367–374
Google Scholar
Haltmeier M, Munk A (2014) Extreme value analysis of empirical frame coefficients and implications for denoising by soft-thresholding. Appl Comput Harmon Anal 36(3):434–460. https://doi.org/10.1016/j.acha.2013.07.004
Article MathSciNet Google Scholar
Hammond DK, Vandergheynst P, Gribonval R (2011) Wavelets on graphs via spectral graph theory. Appl Comput Harmon Anal 30(2):129–150
Article MathSciNet Google Scholar
Han D (2007) Frames for undergraduates. In: Student mathematical library. American Mathematical Society, Providence
Google Scholar
Kpotufe S (2011) k-NN regression adapts to local intrinsic dimension. In: NIPS, pp 729–737
Google Scholar
Kpotufe S, Dasgupta S (2012) A tree-based regressor that adapts to intrinsic dimension. J Comput Syst Sci 78(5):1496–1515
Article MathSciNet Google Scholar
Roweis S, Saul L (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290:2323–2326
Article Google Scholar
Tenenbaum J, de Silva V, Langford J (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290:2319–2323
Article Google Scholar

Download references

Acknowledgements

The authors acknowledge the financial support of the German DFG, under the Research Unit FOR-1735 “Structural Inference in Statistics—Adaptation and Efficiency.”

Author information

Authors and Affiliations

Institute of Mathematics, University of Potsdam, Potsdam, Germany
Franziska Göbel & Gilles Blanchard
Department of Computer Science, University of Tübingen, Tübingen, Germany
Ulrike von Luxburg

Authors

Franziska Göbel
View author publications
You can also search for this author in PubMed Google Scholar
Gilles Blanchard
View author publications
You can also search for this author in PubMed Google Scholar
Ulrike von Luxburg
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gilles Blanchard .

Editor information

Editors and Affiliations

Ladislaus von Bortkiewicz Chair of Statistics, C.A.S.E. Center for Applied Statistics & Economics, Humboldt-Universität zu Berlin, Berlin, Germany
Wolfgang Karl Härdle
Institute of Statistics, National Chiao Tung University, Hsinchu, Taiwan
Henry Horng-Shing Lu
School of Statistics, University of Minnesota, Minneapolis, USA
Xiaotong Shen

Appendix

1.1 Proof of Theorem 3

Theorem 3 states a oracle-type inequality which captures the relation of soft thresholding estimators $\hat {f}_{S_s}=\sum _{k,l} S_s \left (\left \langle \,y,\varPsi _{kl} \right \rangle ,t_{kl}\right ) \varPsi _{kl}$ defined in (20.16) to the collection of keep-or-kill estimators on a Parseval frame. This result is known in the literature (see Candès 2006, Section 9), but we provide a short self-contained proof for completeness, modulo a technical result from Donoho and Johnstone (1994) for soft thresholding of a single one-dimensional Gaussian variable, which is basic for the Proof of Theorem 3.

Lemma 1

For 0 ≤ δ ≤ 1∕2, $t=\sqrt {2\log (\delta ^{-1})}$ and $X\sim \mathscr {N}(\mu ,1)$

$$\displaystyle \begin{aligned} \begin{array}{rcl}{} {{\mathbf{E}}_{X}\left({\left(S_s(X,t)-\mu\right)^2}\right)}&\displaystyle \leq &\displaystyle (2\log ( \delta^{-1})+1)(\delta+\min(1,\mu^2) )\\ &\displaystyle =&\displaystyle (t^2+1)\left( \exp \left(-\frac{t^2}{2}\right) +\min(1,\mu^2) \right). \end{array} \end{aligned} $$

(20.23)

The proof of this lemma can be found in appendix 1 of Donoho and Johnstone (1994). Now we are able to prove Theorem 3.

Proof

First note that for y = τx, τ > 0, we have

$$\displaystyle \begin{aligned} S_s(y,u)=\tau S_s\left(x, \frac{u}{\tau}\right). \end{aligned} $$

(20.24)

Secondly we remark that

$$\displaystyle \begin{aligned} \frac{\left\langle\,y,\varPsi_{kl} \right\rangle}{\sigma\left\lVert \varPsi_{kl} \right\rVert}\sim\mathscr{N}\left(\frac{a_{kl}}{\sigma\left\lVert \varPsi_{kl} \right\rVert},1\right). \end{aligned} $$

(20.25)

Considering now the risk of the soft thresholding estimator $\hat {f}_{S_s}$ we get

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{E}\left(\left\lVert \hat{f}_{S_{s}}-f \right\rVert^2\right)} &\displaystyle =&\displaystyle {\mathbf{E}\left(\left\lVert \sum_{k,l} \left({S_{s}}\left(\left\langle\,y,\varPsi_{kl} \right\rangle, t_{kl}\right)-a_{kl}\right) \varPsi_{kl} \right\rVert^2\right)}\\ &\displaystyle \leq &\displaystyle {\mathbf{E}\left(\sum_{k,l}\left({S_{s}} \left(\left\langle\,y,\varPsi_{kl} \right\rangle, t_{kl} \right)-a_{kl}\right)^2\right)}\\ &\displaystyle =&\displaystyle \sum_{k,l} {\mathbf{E}\left(\left({S_{s}}\left(\left\langle\,y,\varPsi_{kl} \right\rangle, t_{kl}\right)-a_{kl}\right)^2\right)}. \end{array} \end{aligned} $$

(20.26)

by using inequality (20.6). By applying (20.24) and then (20.23) with $t=\sqrt {2\log (n)}$ it follows that

$$\displaystyle \begin{aligned} \begin{array}{rcl} &\displaystyle &\displaystyle {\mathbf{E}\left(\left\lVert \hat{f}_{S_s}-f \right\rVert^2\right)} \leq \sum_{k,l} \sigma^2\left\lVert \varPsi_{kl} \right\rVert^2 {\mathbf{E}\left(\left({S_{s}}\left(\frac{\left\langle\,y,\varPsi_{kl} \right\rangle}{\sigma\left\lVert \varPsi_{kl} \right\rVert}, \sqrt{2\log(n)}\right)-\frac{a_{kl}}{\sigma\left\lVert \varPsi_{kl} \right\rVert}\right)^2\right)}\\ &\displaystyle &\displaystyle \quad \leq \sum_{k,l} \sigma^2\left\lVert \varPsi_{kl} \right\rVert^2 (2\log(n)+1)\left(\exp \left(-\frac{2\log(n)}{2}\right) +\min \left(1,\frac{a_{kl}^2}{\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2}\right) \right)\\ &\displaystyle &\displaystyle \quad = \sum_{k,l} (2\log(n)+1)\left( \frac{1}{n}\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2 +\min \left(\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2,a_{kl}^2\right)\right )\\ &\displaystyle &\displaystyle \quad = (2\log(n)+1) \left( \frac{1}{n} \sum_{k,l} \sigma^2\left\lVert \varPsi_{kl} \right\rVert^2 +\sum_{k,l} \min \left(\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2,a_{kl}^2\right) \right). \end{array} \end{aligned} $$

(20.27)

Recalling the Parseval frame property $\sum _{k,l} \left \lVert \varPsi _{kl} \right \rVert ^2=n$, we finally obtain

$$\displaystyle \begin{aligned} \begin{array}{rcl} {\mathbf{E}\left(\left\lVert \hat{f}_{S_s}-f \right\rVert^2\right)}&\displaystyle \leq &\displaystyle (2\log(n)+1) \left( \frac{1}{n}n \sigma^2 +\sum_{k,l} \min \left(\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2,a_{kl}^2\right) \right)\\ &\displaystyle =&\displaystyle (2\log(n)+1) \left( \sigma^2 +\sum_{k,l} \min \left(\sigma^2\left\lVert \varPsi_{kl} \right\rVert^2,a_{kl}^2\right) \right). \end{array} \end{aligned} $$

(20.28)

where we recognize the upper bound $\sum _{k,l} \min \left (\sigma ^2\left \lVert \varPsi _{kl} \right \rVert ^2,a_{kl}^2 \right )=OB(f)$ for the oracle. □

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Göbel, F., Blanchard, G., Luxburg, U.v. (2018). Construction of Tight Frames on Graphs and Application to Denoising. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_20

Download citation

DOI: https://doi.org/10.1007/978-3-319-18284-1_20
Published: 18 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Construction of Tight Frames on Graphs and Application to Denoising

Abstract

Similar content being viewed by others

Optimization Methods for Frame Conditioning and Application to Graph Laplacian Scaling

Properties of tight frames that are regular schemes

Spectral Design of Signal-Adapted Tight Frames on Graphs

Keywords

1 Introduction

1.1 Motivation

1.2 Relation to Previous Work

2 Notation and Basics

2.1 Setting

2.2 Frames

Definition 1

Theorem 1 (Properties of Parseval frames)

2.3 Neighborhood Graphs

2.4 Spectral Graph Theory

3 Construction and Properties

3.1 Construction of a Tight Graph Frame

Definition 2

Theorem 2

Proof

Definition 3 (Multiscale Bandpass Filter)

3.2 Spatial Localization

4 Denoising

Theorem 3

5 Numerical Experiments

6 Outlook

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

1.1 Proof of Theorem 3

Lemma 1

Proof

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation