1 Introduction

Combining multiple basis representations and taking advantage of their sparsity properties to build a more efficient representation of complex data has a rather long history in applied harmonic analysis and signal processing. Among the earliest formulations of combined-basis representations, we recall the seminal papers of Coifman and Wickerhauser [5] and Mallat and Zhang [28]. Perhaps the first attempt to introduce a rigorous formalization of these ideas was the method of Basis Pursuit [4] by Chen et al. which established \(\ell ^1\)-norm minimization as a method to promote sparse representations from multiple bases. In more recent years, other mostly empirical papers have further exploited this point of view and provided remarkable applications to problems from signal and image processing. Starck et al. [31, 32], for instance, proposed an algorithmic approach, called morphological component analysis (MCA), which assumes that a signal is the linear mixture of several morphological components, each one endowed with specific geometric properties. Under the assumption that such components are sufficiently distinct and that each one is sparsely represented in a specific basis, MCA algorithms (using \(\ell ^1\)-norm minimization) are able to effectively separate the various signal components in many numerical applications. In addition to such work, we also recall the contributions in [9, 29, 30, 35].

More recently, Donoho and Kutyniok [7, 22] introduced a rigorous mathematical formalization of the problem of separating data into geometrically distinct components. Their motivation is the observation that the successes of many numerical algorithms based on multiple basis representations “stem from an interplay between geometric properties of objects to be separated and the harmonic analysis for singularities of various geometric types” (cf. [7]). As a mathematical idealization of two-dimensional data containing distinct geometric constituents, they consider distributions on \(\mathbb {R}^2\) of the form \(f= \mathcal {P}+ \mathcal {T}\), where \(\mathcal {P}\) is a collection of point-like singularities and \(\mathcal {T}\) is a cartoon-like image, that is, a planar region enclosed by a smooth closed curve. The question is: how to separate f into its components \(\mathcal {P}\) and \(\mathcal {T}\)?

To address this question, they observe that while points and curves may overlap spatially, they are separated microlocally. Therefore, they construct a sparse representation of f with respect to a joint wavelet-curvelet dictionary, where sparsity is enforced by minimizing the \(\ell ^1\)-norm of the expansion coefficients. The choice of such a dictionary is due to the fact that wavelets provide very sparse representations of point-like singularities, while curvelets [2] or shearlets [10, 27] provide very sparse representations of curve-like singularities. By applying an \(\ell ^1\)-norm minimization over the expansion coefficients of the combined dictionary, they prove that f can be separated into its components \(\mathcal {P}\) and \(\mathcal {T}\) asymptotically at fine scales.

The proof of this separation result in [7] relies on the heavy machinery of a sparse matrix representation of Fourier integral operators in \(\mathbb {R}^2\) and does not extend directly to the 3-dimensional setting. In [16], we introduced a different and simpler argument to deal with the 3-dimensional setting which is based on techniques we previously developed for the geometric characterization of edge singularities in terms of the shearlet transform [11, 18, 23]. However, this result was limited to the separation of point-wise and polyhedral singularities in \(\mathbb {R}^3\), as our techniques could not be extended to the more difficult situation of curvilinear singularities in \(\mathbb {R}^3\).

In this paper, we finally introduce a new and more powerful argument that allows us to handle the geometric separation problem in the case of 3-dimensional curvilinear singularities. Similar to the general approach in [7, 16], we consider a combined dictionary of wavelets and shearlets and adopt the important notion of cluster coherence as a main tool to prove geometric separation. The most critical and difficult part of this proof is the derivation of appropriate estimates on the cluster coherence of the wavelet and shearlet bases. As we will further explain below, this part of the proof is new and relies in part on techniques developed by the authors for the shearlet-based analysis of curvilinear edges [12, 14, 17].

In addition to solving the 3D geometric separation problem, our new approach also yields a much simpler and more streamlined argument for the corresponding 2D problem, without the need for the machinery based on Fourier integral operators used in the original arguments in [7]. We also recall that the geometric separation result has important implications for the solution of the inpainting problem, as shown by King et al. [21].

The rest of the paper is organized as follows. After setting some useful notation, we formulate the geometric separation problem and state our main theorem in Sect. 2. We present the proof of this theorem in Sect. 3.

Notation

In this paper, we adopt the convention that \(x \in \mathbb {R}^3\) is a column vector, i.e., \(x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}\), and that \(\xi \in \widehat{\mathbb {R}}^3\) (in the frequency domain) is a row vector, i.e., \(\xi =(\xi _1, \xi _2, \xi _3)\). A vector x multiplying a matrix \(A \in GL_3(\mathbb {R})\) on the right is understood to be a column vector, while a vector \(\xi \) multiplying A on the left is a row vector. Thus, \(A x \in \mathbb {R}^3\) and \(\xi A \in {\widehat{\mathbb {R}}}^3\). The Fourier transform of \(f \in L^1(\mathbb {R}^3)\) is defined as

$$\begin{aligned} \hat{f}(\xi ) = \int _{\mathbb {R}^3} f(x) \, e^{-2 \pi i \xi x} \, d x, \end{aligned}$$

where \(\xi \in \widehat{\mathbb {R}}^3\), and the inverse Fourier transform is

$$\begin{aligned} \check{f}(x) = \int _{\widehat{\mathbb {R}}^3} f(\xi ) \, e^{2 \pi i \xi x } \, d \xi . \end{aligned}$$
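As a quick sanity check of this normalization, the following sketch evaluates a one-dimensional analogue of the Fourier transform above by a Riemann sum. It is only an illustration under stated assumptions: the test function \(e^{-\pi x^2}\) (a standard example that equals its own Fourier transform under the \(e^{-2 \pi i \xi x}\) convention), the truncation interval and the step size are our choices and do not appear in the paper.

```python
import cmath
import math

def fourier_1d(f, xi, half_width=8.0, step=0.01):
    """Riemann-sum approximation of the 1D analogue of the transform above:
    f_hat(xi) = integral of f(x) * exp(-2*pi*i*xi*x) dx over [-half_width, half_width].
    half_width and step are illustrative choices, not values from the paper."""
    n = int(round(half_width / step))
    total = 0.0 + 0.0j
    for m in range(-n, n + 1):
        x = m * step
        total += f(x) * cmath.exp(-2j * math.pi * xi * x)
    return total * step

def gaussian(x):
    """Test function exp(-pi x^2), which is its own transform in this convention."""
    return math.exp(-math.pi * x * x)

approx = fourier_1d(gaussian, 0.5)
exact = gaussian(0.5)
print(abs(approx - exact))
```

With the factor \(2\pi\) placed in the exponent, no normalizing constant is needed in front of either the transform or its inverse, which is what the check above reflects.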

Given the functions f and g, we use the notation \(f \simeq g\) if there exist constants \(0<C_1 \le C_2 < \infty \), independent of x, such that \(C_1 \, g(x) \le f(x) \le C_2 \ g(x)\). Similarly, given the index set J, we use the notation \(f(j) \simeq g(j)\) if there exist \(0<C_1 \le C_2 < \infty \), independent of j, such that \(C_1 \, g(j) \le f(j) \le C_2 \, g(j)\) for all \(j \in J\).

2 The Geometric Separation Problem

As a model of multidimensional data found in many applications, it is often useful to consider functions or distributions containing different types of singularities; for instance, singularities supported at single points or surface boundaries if the domain is \(\mathbb {R}^3\). In this paper, we consider idealized three-dimensional objects of the form \(f = \mathcal {P}+\mathcal {T}\), where \(\mathcal {P}\) is a collection of point singularities and \(\mathcal {T}\) is a section of a paraboloid.

Our goal is to find a highly sparse representation of f, that is, to derive a representation method such that f can be accurately approximated using a ‘very small’ number of representation coefficients. As mentioned above, we can find bases that are ideally suited to specific types of singularities. Wavelets, in particular, offer optimally sparse representations, in a precise sense, for functions with point singularities, while shearlets were shown to provide optimally sparse representations for functions with discontinuities along piecewise smooth surfaces [13, 25]. However, neither wavelets nor shearlets alone (and no other single basis or traditional linear representation method) are very efficient at representing \(f=\mathcal {P}+\mathcal {T}\). This observation leads us to consider a multiple-basis dictionary comprising both wavelets and shearlets. Among all possible representations of f within this dictionary, we look for an ideally sparse representation where wavelets are used to sparsely represent \(\mathcal {P}\) and shearlets to sparsely represent \(\mathcal {T}\).

Let us be more precise about the statement of the problem and the singularities we consider. Following the general idea from [7], we take \(\mathcal {P}\) to be of the form

$$\begin{aligned} \mathcal {P}= \sum _{i=1}^N\big |x -y^{(i)}\big |^{-2}, \end{aligned}$$
(2.1)

which defines a distribution that is smooth away from the singularity points \(y^{(i)} \in \mathbb {R}^3\), \(i=1, \dots ,N\). Next, to define a singularity supported on a surface boundary, we let B be a section of a paraboloid in \(\mathbb {R}^3\) with graph \(z(u,v)\) for \((u,v) \in U \subset \mathbb {R}^2\). For \(\alpha \in C_0^\infty (U)\) and \(\phi \in \mathcal {S}(\mathbb {R}^3)\), we define a distribution \(\mathcal {T}\) concentrated on B as

$$\begin{aligned} \langle \mathcal {T},\phi \rangle = \int _U \phi (z(u,v),u,v) \, \alpha (u, v) \, d u \, d v. \end{aligned}$$

The reason for choosing the exponent \(-2\) in \(\mathcal {P}\) is that we want to match the energies of \(\mathcal {P}\) and \(\mathcal {T}\) at each scale \(2^{-2j}\), \(j \in \mathbb {Z}\). That is, we want to make the two singularities comparable at each scale. Without this assumption, it would be possible to separate the two components of f at different scales in a relatively simple way, as the energy of each singularity would dominate at a certain scale. By contrast, the model we adopt (as in [7]) makes the separation problem challenging at every scale.

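To justify our observation about the matching energies, we remark that \({\widehat{\mathcal {P}}}(\xi ) \simeq |\xi |^{-1}\) (cf. [34, Ch.4]), since the Fourier transform of a function that is homogeneous of degree \(-2\) on \(\mathbb {R}^3\) is homogeneous of degree \(-1\). Integrating \(|\xi |^{-2}\) over the frequency band \(2^{2j} \le |\xi | \le 2^{2j+2}\), whose volume is of order \(2^{6j}\), we obtain \(\int _{2^{2j}}^{2^{2j+2}} |\widehat{\mathcal {P}}(\xi )|^2 d\xi \simeq 2^{2j}\). In Sect. 3.2, we will show that \(\mathcal {T}\) satisfies a similar estimate so that also in this case \(\int _{2^{2j}}^{2^{2j+2}} |\widehat{\mathcal {T}}(\xi )|^2 d\xi \simeq 2^{2j}.\)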

Following the language introduced in [7], we hence state the geometric separation problem as follows.

Geometric separation problem. Given the observation \(f=\mathcal {P}+\mathcal {T}\), we want to recover the unknown components \(\mathcal {P}\) and \(\mathcal {T}\) of f based only on the knowledge that they are of the form (2.1) and (2.2).

To solve this problem, we will adapt the strategy based on \(\ell ^1\) minimization proposed by Donoho and Kutyniok [7]. That is, we will expand f with respect to a representation consisting of the union of a Parseval frame of wavelets in \(L^2(\mathbb {R}^3)\) and a Parseval frame of shearlets in \(L^2(\mathbb {R}^3)\), and we will enforce sparsity by minimizing the representation coefficients in the \(\ell ^1\)-norm. As mentioned above, the sparsity-inducing properties of the \(\ell ^1\)-norm are well known in applied harmonic analysis and play a critical role, for instance, in the celebrated theory of compressed sensing (cf. [3, 6]).

In the following, to simplify notation and avoid unnecessary calculations we will assume that \(\mathcal {P}\) contains only one singularity point. In addition we will assume that the point singularity is centered at the origin, that is

$$\begin{aligned} \mathcal {P}= |x|^{-2}. \end{aligned}$$

For the singularity supported on a surface boundary, we let \(B = \{(\tfrac{1}{2} (x_2^2 + x_3^2), x_2, x_3): \, (x_2, x_3) \in U \}\), where \( U = \{ (x_2, x_3): \, x_2^2 + x_3^2 \le 1 \}\). Hence, for \(\alpha \in C_0^\infty (U)\) and \(\phi \in \mathcal {S}(\mathbb {R}^3)\), we define \(\mathcal {T}\) as

$$\begin{aligned} \langle \mathcal {T},\phi \rangle = \int _U \phi \left( \tfrac{1}{2} \left( x_2^2 + x_3^2\right) , x_2, x_3\right) \, \alpha \left( x_2, x_3\right) \, d x_2 \, dx_3. \end{aligned}$$
(2.2)

Let us next define the wavelets and shearlets systems that we will use to represent \(f=\mathcal {P}+\mathcal {T}\).

2.1 A Parseval Frame of 3D Wavelets

As our wavelet system, we will choose a Parseval frame of Lemarié-Meyer wavelets (cf. [20]) in \(L^2(\mathbb {R}^3)\). This system will be denoted as \(\Phi = \{\phi _\lambda : \lambda \in \Lambda \} \), for \(\Lambda = \{\lambda = (j,k), j \ge -1, k \in \mathbb {Z}^3\}\), where the functions \(\phi _\lambda = \phi _{j,k} \in L^2(\mathbb {R}^3)\) are defined in the Fourier domain by

$$\begin{aligned} {\hat{\phi }}_{j,k}(\xi ) = {\left\{ \begin{array}{ll} 2^{-3j} \, W(2^{-2j} \xi ) \,e^{2 \pi i 2^{-2j} \xi k}, &{} \text {for } j\ge 0,\\ {\widetilde{W}}(\xi ) \,e^{2 \pi i \xi k}, &{} \text {for } j =-1, \end{array}\right. } \end{aligned}$$

and \(W, {\widetilde{W}} \in C_0^{\infty }(\mathbb {R}^3)\) satisfy the condition

$$\begin{aligned} |{\widetilde{W}}(\xi )|^2 + \sum _{j \ge 0} |W(2^{-2j}\xi )|^2 = 1, \quad \text {for a.e. }\xi \in {\widehat{\mathbb {R}}}^3. \end{aligned}$$
(2.3)

We assume that the window function W has support \({\mathrm{supp }}(W) \subset [-\frac{1}{2}, \frac{1}{2}]^3 \setminus [- \frac{1}{16}, \frac{1}{16}]^3\) so that the dilated functions \(W_j= W(2^{-2j} \, \cdot )\) have supports inside the Cartesian coronae

$$\begin{aligned} \left[ - 2^{2j-1}, 2^{2j-1}\right] ^3 \setminus \left[ - 2^{2j-4}, 2^{2j-4}\right] ^3 \subset {\widehat{\mathbb {R}}}^3, \end{aligned}$$
(2.4)

and the resulting collection of window functions \(|{\widetilde{W}}|^2, |W_j|^2\), \(j \ge 0\), produces a smooth tiling of the frequency space into concentric Cartesian coronae associated with frequency bands indexed by \(j \ge 0\).
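The paper does not specify W beyond condition (2.3) and the support requirement. One common way to obtain such windows, assumed here purely for illustration and not necessarily the authors' construction, is to start from a low-pass window \({\widetilde{W}}\) and set \(|W(\xi )|^2 = |{\widetilde{W}}(\xi /4)|^2 - |{\widetilde{W}}(\xi )|^2\), so that the sum in (2.3) telescopes. The sketch below checks (2.3) and the support of W numerically; the helper names, the particular smooth step and the thresholds 1/16 and 1/8 are hypothetical choices, and the smoothness of W itself is not examined.

```python
import math

def smooth_step(t):
    """C-infinity step: equals 0 for t <= 0 and 1 for t >= 1 (illustrative choice)."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    g = lambda s: math.exp(-1.0 / s)
    return g(t) / (g(t) + g(1.0 - t))

def low_pass_1d(x, a=1.0 / 16, b=1.0 / 8):
    """Equals 1 for |x| <= a and 0 for |x| >= b, non-increasing in |x|."""
    ax = abs(x)
    if ax <= a:
        return 1.0
    if ax >= b:
        return 0.0
    return math.cos(0.5 * math.pi * smooth_step((ax - a) / (b - a)))

def w_tilde(xi):
    """Coarse-scale window: tensor product of the 1D low-pass profiles."""
    return math.prod(low_pass_1d(x) for x in xi)

def w_squared(xi):
    """|W(xi)|^2, defined as a difference of two dilated low-pass windows."""
    return w_tilde([x / 4.0 for x in xi]) ** 2 - w_tilde(xi) ** 2

def partition_sum(xi, j_max):
    """|W~(xi)|^2 + sum over 0 <= j <= j_max of |W(2^{-2j} xi)|^2."""
    total = w_tilde(xi) ** 2
    for j in range(j_max + 1):
        total += w_squared([x / 4.0 ** j for x in xi])
    return total

# The partial sum equals |W~(4^{-(j_max+1)} xi)|^2, which is 1 once every
# coordinate of 4^{-(j_max+1)} xi has modulus at most 1/16.
print(partition_sum([37.5, -2.0, 81.0], 6))
```

With these choices \(|W|^2\) vanishes on \([-\frac{1}{16}, \frac{1}{16}]^3\) and outside \([-\frac{1}{2}, \frac{1}{2}]^3\), in agreement with the support assumption stated above.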

Recall that the Parseval frame condition implies that, for any \(f \in L^2(\mathbb {R}^3)\), we have the reproducing formula:

$$\begin{aligned} f = \sum _{\lambda \in \Lambda } \langle f,\phi _\lambda \rangle \, \phi _\lambda , \end{aligned}$$

with convergence in \(L^2\)-norm.

2.2 A Parseval Frame of 3D Shearlets

Shearlets were introduced to overcome certain limitations of conventional wavelets in the analysis of multivariate functions [27]. Similar to the curvelets of Candès and Donoho [2], they form a collection of well localized functions defined not only across several scales and locations, as the conventional wavelets, but also across several orientations and with highly anisotropic shapes, so that they can more efficiently represent functions containing distributed singularities, e.g., edges in images. By combining multiscale analysis and high directional sensitivity, shearlets are able to precisely characterize the geometry of singularities of functions and distributions of several variables [11, 12, 18, 24] and enable optimally sparse representations, in a precise sense, for a large class of multivariate functions where traditional wavelets are suboptimal [10, 13].

With respect to curvelets, shearlets have some distinctive features: their mathematical structure is derived from the theory of affine systems and the directionality is controlled by shear matrices rather than rotations. This last property enables a unified framework for both continuum and discrete settings since shear transformations preserve the rectangular lattice and this is an advantage in deriving faithful digital implementations [8, 26]. Furthermore, there is a well-developed shearlet-based theory for the analysis of singularities (cf. [19] in addition to the references cited above). This theory sets the foundation for the main ideas that we employ in this paper for the analysis of surface singularities and it is the main reason for selecting this representation in our approach to the geometric separation problem.

Our shearlet system is defined by introducing an angular subdivision within the multiscale decomposition associated with the window functions \({\widetilde{W}}^2, W^2_j\) used above for the construction of the wavelet system. For this construction, we start by partitioning the Fourier space \( {\widehat{\mathbb {R}}}^3\) into the following three pyramidal regions:

$$\begin{aligned} \mathcal {C}_1= & {} \left\{ (\xi _1,\xi _2, \xi _3) \in {\widehat{\mathbb {R}}}^3: \, |\frac{\xi _2}{\xi _1}| \le 1, |\frac{\xi _3}{\xi _1}| \le 1\right\} ,\\ \mathcal {C}_2= & {} \left\{ (\xi _1,\xi _2, \xi _3) \in {\widehat{\mathbb {R}}}^3: \, |\frac{\xi _1}{\xi _2}|< 1, |\frac{\xi _3}{\xi _2}| \le 1\right\} ,\\ \mathcal {C}_3= & {} \left\{ (\xi _1,\xi _2, \xi _3) \in {\widehat{\mathbb {R}}}^3: \, |\frac{\xi _1}{\xi _3}|< 1, |\frac{\xi _2}{\xi _3}| < 1\right\} . \end{aligned}$$
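The strict and non-strict inequalities above are arranged so that every nonzero frequency belongs to exactly one pyramid. A minimal sketch of this classification follows; the function name `pyramid_index` is hypothetical, and the ratio conditions are rewritten as comparisons of absolute values to avoid dividing by zero.

```python
def pyramid_index(xi):
    """Return d in {1, 2, 3} such that xi lies in the pyramid C_d defined above,
    or None for xi = 0, where the defining ratios are undefined."""
    a1, a2, a3 = (abs(c) for c in xi)
    if a1 == 0.0 and a2 == 0.0 and a3 == 0.0:
        return None
    # C_1: |xi_2/xi_1| <= 1 and |xi_3/xi_1| <= 1
    if a1 >= a2 and a1 >= a3:
        return 1
    # C_2: |xi_1/xi_2| < 1 and |xi_3/xi_2| <= 1
    if a2 > a1 and a2 >= a3:
        return 2
    # Remaining case is C_3: |xi_1/xi_3| < 1 and |xi_2/xi_3| < 1
    return 3

print(pyramid_index((1.0, 1.0, 1.0)), pyramid_index((0.5, 1.0, 1.0)), pyramid_index((0.5, 0.5, 1.0)))
```

Ties between coordinates are resolved in favor of the lower index, which is exactly what the placement of the signs \(\le \) and \(<\) in the definitions prescribes.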

We let \(W \in C^\infty _0(\mathbb {R}^3)\) be the same window as the one defined in Sect. 2.1 and let \( v \in C^\infty (\mathbb {R})\) be an appropriate ‘bump function’ satisfying \({\mathrm{supp }}v \subset [-1,1]\) and

$$\begin{aligned} |v(u-1)|^2+ |v(u)|^2 + |v(u+1)|^2 = 1 \quad \text { for } |u| \le 1. \end{aligned}$$
(2.5)

For \(d=1,2,3\), \(\ell =(\ell _1,\ell _2) \in \mathbb {Z}^2\), a 3D shearlet system associated with the pyramidal region \(\mathcal {C}_d\) is a collection

$$\begin{aligned} \left\{ \psi ^{(d)}_{j,\ell ,k}: \, j \ge 0, -2^j \le \ell _1, \ell _2 \le 2^j, k \in \mathbb {Z}^3\right\} , \end{aligned}$$
(2.6)

where

$$\begin{aligned} {\hat{\psi }}^{(d)}_{j,\ell ,k}(\xi ) = \left| \det A_{(d)}\right| ^{-j/2} \, W(2^{-2j} \xi ) \, V_{(d)}\left( \xi A_{(d)}^{-j} B_{(d)}^{{[-\ell ]}}\right) \, e^{2 \pi i \xi A_{(d)}^{-j} B_{(d)}^{{[-\ell ]}} k}, \end{aligned}$$
(2.7)

\( V_{(1)}(\xi _1,\xi _2, \xi _3)= v(\frac{\xi _2}{\xi _1}) v(\frac{\xi _3}{\xi _1}),\) \( V_{(2)}(\xi _1,\xi _2, \xi _3)= v(\frac{\xi _1}{\xi _2}) v(\frac{\xi _3}{\xi _2}),\) and \( V_{(3)}(\xi _1,\xi _2, \xi _3)= v(\frac{\xi _1}{\xi _3}) v(\frac{\xi _2}{\xi _3})\); the matrices \(A_{(d)}\) are given by

$$\begin{aligned} A_{(1)} = \begin{pmatrix} 4 &{} 0 &{} 0 \\ 0 &{} 2 &{} 0 \\ 0 &{} 0 &{} 2 \\ \end{pmatrix}, \, A_{(2)} = \begin{pmatrix} 2 &{} 0 &{} 0 \\ 0 &{} 4 &{} 0 \\ 0 &{} 0 &{} 2 \\ \end{pmatrix}, \, A_{(3)} = \begin{pmatrix} 2 &{} 0 &{} 0 \\ 0 &{} 2 &{} 0 \\ 0 &{} 0 &{} 4 \\ \end{pmatrix}, \end{aligned}$$

and the matrices \(B_{(d)}\), called shear matrices, are defined by

$$\begin{aligned} B_{(1)}^{[\ell ]}= \begin{pmatrix} 1 &{} \ell _1 &{} \ell _2\\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \\ \end{pmatrix}, \, B_{(2)}^{[\ell ]}= \begin{pmatrix} 1 &{} 0 &{} 0\\ \ell _1 &{} 1 &{} \ell _2 \\ 0 &{} 0 &{} 1 \\ \end{pmatrix}, \, B_{(3)}^{[\ell ]}= \begin{pmatrix} 1 &{} 0 &{} 0\\ 0 &{} 1 &{} 0 \\ \ell _1 &{} \ell _2 &{} 1 \\ \end{pmatrix}. \end{aligned}$$

Notice that \((B_{(d)}^{{[\ell ]}})^{-1}=B_{(d)}^{{[-\ell ]}}\). Let us make a few observations about the properties of these systems.

Due to the support conditions on W and v, the elements of the system of shearlets (2.6) have compact support in Fourier domain. In particular, for \(d=1\), the shearlets \({\hat{\psi }}^{(1)}_{j,\ell ,k}(\xi )\) can be written explicitly as

$$\begin{aligned} {\hat{\psi }}^{(1)}_{j,\ell _1,\ell _2,k}(\xi ) = 2^{-2j} \, W(2^{-2j} \xi ) \, v \Bigl (2^j \frac{\xi _2}{\xi _1}-\ell _1 \Bigr ) \, v \Bigl (2^j \frac{\xi _3}{\xi _1}-\ell _2 \Bigr ) \, e^{2 \pi i \xi A_{(1)}^{-j} B_{(1)}^{{[-\ell _1,-\ell _2]}} k}, \end{aligned}$$
(2.8)

showing that their supports are contained inside the regions

$$\begin{aligned}&U_{j,\ell } = U_{j,\ell _1, \ell _2} \nonumber \\&\quad = \left\{ \left( \xi _1,\xi _2,\xi _3\right) : \xi _1 \in \left[ -2^{2j-1},-2^{2j-4}\right] \cup \left[ 2^{2j-4},2^{2j-1}\right] , \left| \tfrac{\xi _2}{\xi _1} -\ell _1 2^{-j}\right| \right. \nonumber \\&\quad \left. \le 2^{-j}, \left| \tfrac{\xi _3}{\xi _1} -\ell _2 2^{-j}\right| \le 2^{-j} \right\} . \end{aligned}$$
(2.9)

That is, the shearlets \({\hat{\psi }}^{(1)}_{j,\ell ,k}\) have supports contained in trapezoidal regions defined at various scales, controlled by \(j \ge 0\), and various orientations, controlled by the shear parameters \(\ell _1,\ell _2\). This shows that the elements of the shearlet system (2.6) are well localized functions, defined over a range of scales, orientations and locations, controlled by the indices \(j, \ell =(\ell _1,\ell _2)\) and k, respectively.
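The passage from (2.7) to (2.8) rests on the fact that, for a row vector \(\xi \), the second and third components of \(\xi A_{(1)}^{-j} B_{(1)}^{[-\ell ]}\) divided by the first one equal \(2^j \frac{\xi _2}{\xi _1}-\ell _1\) and \(2^j \frac{\xi _3}{\xi _1}-\ell _2\). The following sketch verifies this identity numerically for \(d=1\) and adds a membership test for the region (2.9); all function names are hypothetical and the sample values are arbitrary.

```python
def mat_mul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def row_times(xi, m):
    """Row vector xi multiplying the 3x3 matrix m on the left."""
    return [sum(xi[k] * m[k][j] for k in range(3)) for j in range(3)]

def a1_inv_pow(j):
    """A_(1)^{-j} for A_(1) = diag(4, 2, 2)."""
    return [[4.0 ** -j, 0.0, 0.0], [0.0, 2.0 ** -j, 0.0], [0.0, 0.0, 2.0 ** -j]]

def b1(l1, l2):
    """Shear matrix B_(1)^{[l]} with l = (l1, l2)."""
    return [[1.0, l1, l2], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def sheared_ratios(xi, j, l1, l2):
    """Arguments of the two bump functions v in (2.7) for d = 1."""
    eta = row_times(row_times(xi, a1_inv_pow(j)), b1(-l1, -l2))
    return eta[1] / eta[0], eta[2] / eta[0]

def in_support_region(xi, j, l1, l2):
    """Membership test for the region U_{j, l1, l2} in (2.9)."""
    if not (2.0 ** (2 * j - 4) <= abs(xi[0]) <= 2.0 ** (2 * j - 1)):
        return False
    width = 2.0 ** -j
    return (abs(xi[1] / xi[0] - l1 * width) <= width
            and abs(xi[2] / xi[0] - l2 * width) <= width)

xi = [3.0, 1.5, -0.6]
print(sheared_ratios(xi, 2, 1, -2))                  # compare with 2^j xi_2/xi_1 - l1, 2^j xi_3/xi_1 - l2
print(mat_mul(b1(2.0, -3.0), b1(-2.0, 3.0)))         # identity: the inverse of B^{[l]} is B^{[-l]}
```

Since v is supported in \([-1,1]\), a frequency \(\xi \) can lie in the support of \({\hat{\psi }}^{(1)}_{j,\ell ,k}\) only when both ratios returned by `sheared_ratios` have modulus at most 1, which is the condition encoded in `in_support_region`.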

A Parseval frame of shearlets for \(L^2(\mathbb {R}^3)\) is obtained by combining the shearlet systems (2.6) associated with the pyramidal regions \(\mathcal {C}_d\) together with the coarse scale system \(\{\phi _{-1,k}: k \in \mathbb {Z}^3\}\). Note that this is the same coarse scale system of the Lemarié-Meyer wavelet system defined above. For brevity, in the following we will denote the Parseval frame of 3D shearlets as \(\Psi = \{\psi _\eta : \eta \in M\} \subset L^2(\mathbb {R}^3)\), where the index set is \(M= M_C \cup M_F\), \(M_C =\{k\in \mathbb {Z}^3\}\) is the set of indices associated with coarse-scale shearlets and \(M_F=\{\eta =(j,\ell ,k, d): \, j \ge 0, |\ell _1| \le 2^{j}, |\ell _2| \le 2^{j}, k \in \mathbb {Z}^3, d=1,2,3\}\) is the set of indices associated with fine-scale shearlets. As above, the Parseval frame condition implies that, for any \(f \in L^2(\mathbb {R}^3)\), we have the reproducing formula:

$$\begin{aligned} f = \sum _{\eta \in M} \langle f,\psi _\eta \rangle \, \psi _\eta , \end{aligned}$$

with convergence in \(L^2\)-norm.

Remark To simplify the presentation, our construction above omits a technical detail. To ensure that the frame of shearlets obtained by combining the elements from the different pyramidal systems is tight while guaranteeing that all such elements are \(C_0^\infty \) in the Fourier domain, one has to slightly modify the functions \(\psi ^{(d)}_{j,\ell _1,\ell _2,k}\), for \(\ell _1,\ell _2 = \pm 2^{j}\) (these are the functions whose support overlaps the boundaries of the regions \(\mathcal {C}_d\)) by merging shearlet elements from contiguous pyramidal regions. The construction of these boundary shearlets is rather technical and plays no role in the paper. We refer the interested reader to [13, 15].

2.3 Main Theorem

Our main theorem below shows that it is possible to separate geometrically distinct components of a distribution \(f=\mathcal {P}+\mathcal {T}\) by taking advantage of the sparsity properties of the Parseval frames of wavelets and shearlets. Similar to the result in [7, 16], the separation result holds asymptotically in scale, that is, we can separate point singularities and singularities along a parabolic surface only as a limiting process, when the scale tends to zero (i.e., \(j \rightarrow \infty \)). To formulate this result, we start by deriving an appropriate multiscale decomposition of f.

We recall that the window functions \(W_j\) used in the construction of the wavelet and shearlet systems produce a multiscale decomposition of the Fourier space \({\widehat{\mathbb {R}}}^3\) into the Cartesian coronae (2.4). Consistently with this decomposition, we define a family of band-pass filters \(F_j\), \(j \ge -1\), by \(\widehat{F_j}(\xi ) = W(2^{-2j} \xi )\), for \(j \ge 0\), \(\widehat{F_{-1}}(\xi ) = {\widetilde{W}}(\xi )\). By applying these filters to f, \(\mathcal {P}\) and \(\mathcal {T}\) we obtain

$$\begin{aligned} \mathcal {P}_j = \mathcal {P}* F_j, \quad \mathcal {T}_j = \mathcal {T}* F_j, \quad f_j = f *F_j, \end{aligned}$$
(2.10)

where, as observed above, we have that \(\Vert \mathcal {P}_j\Vert _2 \simeq 2^{j}\) and \(\Vert \mathcal {T}_j\Vert _2 \simeq 2^{j}\). By construction, the functions \(f_j\) are band-limited, with frequency support contained in the Cartesian coronae \([-2^{2j-1}, 2^{2j-1}]^3\setminus [-2^{2j-4},2^{2j-4}]^3 \subset {\widehat{\mathbb {R}}}^3\). In addition, for \( f \in L^2(\mathbb {R}^3)\), it follows from (2.3) that

$$\begin{aligned} f = \sum _j F_j *f_j, \end{aligned}$$
(2.11)

with convergence in the \(L^2\)-norm.

Let \(\mathcal {F}_j\) denote the range of the operator of convolution with \(F_j\). Using a simple calculation one can verify that shearlets and wavelets at level \(j'\) are orthogonal to \(\mathcal {F}_j\) unless \(|j'-j| \le 1\), that is, unless \(j' = j-1, j, j+1\). It is useful to introduce the notation

$$\begin{aligned} \Lambda _j = \{ \lambda = (j',k): \, |j'-j| \le 1, \, k \in \mathbb {Z}^3 \} \subset \Lambda \end{aligned}$$
(2.12)

and

$$\begin{aligned} M_j= & {} \left\{ \eta = \left( j',\ell ,k,d\right) : \, |j'-j| \le 1, \left| \ell _1\right| \le 2^{j'},\,\right. \nonumber \\&\left. \left| \ell _2\right| \le 2^{j'},\, k \in \mathbb {Z}^3, d =1,2,3 \right\} \subset M. \end{aligned}$$
(2.13)
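The calculation behind the restriction \(|j'-j| \le 1\) uses only the support of W. Writing \(|\xi |_\infty = \max _i |\xi _i|\) (a notation introduced only for this remark), the assumption \({\mathrm{supp }}(W) \subset [-\frac{1}{2}, \frac{1}{2}]^3 \setminus [- \frac{1}{16}, \frac{1}{16}]^3\) gives, for the fine-scale levels \(j \ge 0\),

$$\begin{aligned} {\mathrm{supp }}\, W(2^{-2j} \, \cdot ) \subset \left\{ \xi \in {\widehat{\mathbb {R}}}^3: \, 2^{2j-4} \le |\xi |_\infty \le 2^{2j-1} \right\} . \end{aligned}$$

For \(j' \ge j \ge 0\), the coronae at levels j and \(j'\) can overlap on a set of positive measure only if \(2^{2j'-4} < 2^{2j-1}\), that is, \(2(j'-j) < 3\), which forces \(j'-j \le 1\). For \(j' \ge j+2\) the Fourier supports meet at most on a set of measure zero, so the corresponding inner products vanish by Plancherel's theorem.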

Due to the Parseval frame property and the observation above, any function \(f_j \in \mathcal {F}_j\) can be expanded either using only the elements of the wavelet system indexed by \(\Lambda _j\) or using only the elements of the shearlet system indexed by \(M_j\). In other words, at the level j, we can use the wavelet system to represent \(f_j\) as

$$\begin{aligned} f_j = \sum _{j' = j-1}^{j'=j+1}\sum _{k' \in \mathbb {Z}^3} \langle f_j,\phi _{j', k'}\rangle \, \phi _{j', k'} = \sum _{\lambda \in \Lambda _j}\langle f_j,\phi _{\lambda }\rangle \, \phi _{\lambda }; \end{aligned}$$

or we can use the shearlet system to represent \(f_j\) as

$$\begin{aligned} f_j = \sum _{d=1}^3\sum _{j' = j-1}^{j'=j+1}\sum _{|\ell _1| \le 2^{j'}}\sum _{|\ell _2| \le 2^{j'}} \sum _{k \in \mathbb {Z}^3}\langle f_j,\psi _{j',\ell _1, \ell _2, k}^{(d)}\rangle \, \psi _{j',\ell _1, \ell _2, k}^{(d)} = \sum _{\eta \in M_j}\langle f_j,\psi _{\eta }\rangle \, \psi _\eta . \end{aligned}$$

Clearly, we can also consider a combined representation of the form

$$\begin{aligned} f_j = \sum _{\lambda \in \Lambda _j} p_\lambda \, \phi _{\lambda } + \sum _{\eta \in M_j} t_\eta \, \psi _\eta , \end{aligned}$$

for an appropriate choice of coefficients \(p=(p_\lambda )\) and \(t=(t_\eta )\). Since the last expression involves an overcomplete dictionary, there are many possible choices of coefficients p and t, some of which may provide sparser representations than either one of the two expansions above. Similar to [7, 16], we seek a solution enabling a geometric separation, that is, we consider the following dual-frame component separation problem based on \(\ell ^1\) minimization:

$$\begin{aligned} (P_j^*, T_j^*) = \arg \min (\Vert p\Vert _1 + \Vert t\Vert _1), \quad \text {subject to } f_j = P_j + T_j, \end{aligned}$$
(2.14)

where \(p_\lambda = \langle P_j,\phi _{\lambda }\rangle \), \(\lambda \in \Lambda _j\) and \(t_\eta = \langle T_j,\psi _{\eta }\rangle \), \(\eta \in M_j\). It follows from (2.11) that, if we let \(\tilde{P} = \sum _j F_j *P_j\), \(\tilde{T} = \sum _j F_j *T_j,\) then we can express f as the superposition \( f = \tilde{P} + \tilde{T}\).
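To make the objective in (2.14) concrete, the following finite-dimensional toy evaluates \(\Vert p\Vert _1 + \Vert t\Vert _1\) for several ways of splitting a vector f into two components. It is a sketch under loud assumptions: the two systems are orthonormal bases of \(\mathbb {R}^{16}\) (the standard basis and a normalized Hadamard basis) standing in for the wavelet and shearlet frames, the test signal is one spike plus one Hadamard atom, and all names are hypothetical. The code does not solve (2.14); it only compares the value of the objective at the true split with its value at other feasible splits.

```python
import random

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix; n is a power of two."""
    h = [[1.0]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

N = 16
# Rows of PSI form an orthonormal basis; PHI is the standard basis (left implicit).
PSI = [[v / N ** 0.5 for v in row] for row in hadamard(N)]

def analysis(atoms, g):
    """Coefficients <g, atom> for every atom in the system."""
    return [sum(a * x for a, x in zip(atom, g)) for atom in atoms]

def l1_cost(p_comp, f):
    """Objective of (2.14) for the split f = p_comp + (f - p_comp)."""
    t_comp = [fv - pv for fv, pv in zip(f, p_comp)]
    return sum(abs(v) for v in p_comp) + sum(abs(v) for v in analysis(PSI, t_comp))

# Toy signal: one spike (sparse in PHI) plus one Hadamard atom (sparse in PSI).
P_TRUE = [0.0] * N
P_TRUE[5] = 3.0
T_TRUE = [2.0 * v for v in PSI[7]]
F = [p + t for p, t in zip(P_TRUE, T_TRUE)]

base = l1_cost(P_TRUE, F)
rng = random.Random(0)
perturbed = []
for _ in range(200):
    q = [rng.gauss(0.0, 0.1) for _ in range(N)]
    perturbed.append(l1_cost([p + dq for p, dq in zip(P_TRUE, q)], F))

print(base, min(perturbed), l1_cost(F, F), l1_cost([0.0] * N, F))
```

In this toy the inner products between atoms of the two bases all have modulus 1/4, and the true split uses one atom from each basis; every other feasible split, including the two single-basis expansions of f, has a strictly larger \(\ell ^1\) cost. This is the finite-dimensional mechanism that the paper extends to the wavelet-shearlet setting.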

The main result of our paper is the following theorem, showing that, by applying \(\ell ^1\) minimization over the expansion coefficients of f with respect to our combined wavelet-shearlet dictionary, we achieve the separation of the distinct geometric objects \(\mathcal {P}\) and \(\mathcal {T}\), asymptotically at fine scales. That is, asymptotically as the scale tends to zero, the pointlike component of f is captured by the Parseval frame of wavelets and the curvilinear component of f is captured by the Parseval frame of shearlets.

Theorem 2.1

Let \(\Phi \) and \(\Psi \) be the Parseval frames of wavelets and shearlets, respectively, defined above, with \(\Lambda _j\), \(M_j\) given by (2.12)–(2.13). Let \(f_j=P_j+T_j\) be given as above and \(\mathcal {P}_j\), \(\mathcal {T}_j\) be given by (2.10). Then

$$\begin{aligned} \lim _{j \rightarrow \infty } \frac{\Vert P_j - \mathcal {P}_j \Vert _{1, \Phi } + \Vert T_j - \mathcal {T}_j \Vert _{1, \Psi }}{\Vert \mathcal {P}_j \Vert _2 + \Vert \mathcal {T}_j \Vert _2} =0. \end{aligned}$$

This theorem extends our previous result in [16], where \(\mathcal {T}\) was a piecewise linear singularity. We also recall that the geometric separation result originally obtained in [7] deals with 2D images containing point-like and smooth curve-like singularities. The approach presented in this paper can be easily adapted to the two-dimensional case, yielding a much simpler and more direct proof of the result in [7].

As already observed in [7], the method of geometric separation presented in this work can be generalized to other situations. For example, we can allow small perturbations and consider \(f =(\mathcal {P}+\mathcal {T}+ g) \, h\), where g, h are smooth functions of rapid decay. Then we can set \(||f_j||_2\) in the denominator of the expression in Theorem 2.1. One could also potentially consider the situation where \(f =(\mathcal {P}+\mathcal {T}+ \mathcal L)\), where \(\mathcal L\) is a line singularity. In this case, one should introduce a third dictionary in addition to wavelets and shearlets that sparsely represents \(\mathcal L\) and has some incoherence with respect to the other two dictionaries. One reasonable choice could be a dictionary of ridgelets [1], as it is optimally suited to deal with line singularities.

The rest of the paper is devoted to the proof of Theorem 2.1.

3 Proof of Main Theorem

Our proof follows the general architecture of the proof in [7, 16], which relies on the notion of cluster coherence. The most critical and difficult part of the proof consists in showing that the cluster coherence goes asymptotically to zero for the three-dimensional curvilinear discontinuities considered in this paper. The way we address this part of the proof is completely original and does not follow from the arguments in [7, 16], which in fact cannot handle this situation. The new technical elements of our argument are contained in the proofs of Lemmata 3.5, 3.6 and in the proof of Theorem 2.1 in Sect. 3.2.

Let \(\Phi =\{\phi _{\lambda }: \lambda \in \Lambda \}\) and \(\Psi =\{\psi _{\eta }: \eta \in M\}\) be the Parseval frames of 3D wavelets and 3D shearlets introduced above, respectively. For each level \(j \in \mathbb {Z}\), we will identify certain subsets of the indices \(\Lambda \) and M that we denote as \(S_{1,j} \subset \Lambda _j\) and \(S_{2,j} \subset M_j\). Following the terminology in [7], we refer to them as indices of significant wavelet coefficients and indices of significant shearlet coefficients, respectively. These index sets will identify, essentially, those wavelet and shearlet coefficients whose magnitude is above a certain scale-dependent threshold (hence, the name ‘significant’). Their explicit definition, when the expansion coefficients are computed on \(f=\mathcal {P}+\mathcal {T}\), will be determined in Sect. 3.1 (for \(S_{1,j}\)) and Sect. 3.2, in the proof of Theorem 2.1 (for \(S_{2,j}\)).

Corresponding to the sets \(S_{1,j}\) and \(S_{2,j}\), we define the wavelet approximation error and the shearlet approximation error at the level j as

$$\begin{aligned} \delta _{1,j} = \sum _{\lambda \in S_{1,j}^c} | \langle P_j,\phi _\lambda \rangle |, \quad \delta _{2,j} = \sum _{\eta \in S_{2,j}^c} | \langle T_j,\psi _\eta \rangle | \end{aligned}$$

respectively. As we will see below, it will be possible to determine the indices of significant wavelet and shearlet coefficients \(S_{1,j}\) and \(S_{2,j}\) in such a way that the wavelet and shearlet approximation errors are small, meaning that the \(\ell ^1\)-norm of the wavelet and shearlet coefficients is negligible (asymptotically, at fine scales), when the indices are outside the sets \(S_{1,j}\) and \(S_{2,j}\).

We define the cluster coherences as

$$\begin{aligned} \mu _c(S_{1,j}, \Phi ; \Psi ) = \max _{\eta } \sum _{\lambda \in S_{1,j}}|\langle \phi _\lambda ,\psi _\eta \rangle |, \quad \mu _c(S_{2,j}, \Psi ;\Phi ) = \max _{\lambda } \sum _{\eta \in S_{2,j}}|\langle \phi _\lambda ,\psi _\eta \rangle |. \end{aligned}$$

The notion of cluster coherence was originally proposed in [7]. Unlike the more standard definition of coherence, given by \(\mu (\Phi ,\Psi ) = \max _{\lambda ,\eta } |\langle \phi _\lambda ,\psi _\eta \rangle |\), the cluster coherence bounds coherence between a single member of a frame and a cluster of members of another frame.
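The difference between the two notions is easy to see in a small finite-dimensional example. The sketch below computes the standard coherence and the cluster coherence for two orthonormal bases of \(\mathbb {R}^4\), the standard basis and a normalized Hadamard basis; these bases, the index sets and the function names are illustrative assumptions, not objects from the paper.

```python
def inner(u, v):
    """Euclidean inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def coherence(phi, psi):
    """mu(Phi, Psi): largest modulus of an inner product between the two systems."""
    return max(abs(inner(p, q)) for p in phi for q in psi)

def cluster_coherence(cluster, phi, psi):
    """mu_c(S, Phi; Psi): maximum over psi_eta of the sum over lambda in S
    of |<phi_lambda, psi_eta>|."""
    return max(sum(abs(inner(phi[lam], q)) for lam in cluster) for q in psi)

PHI = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
PSI = [[0.5, 0.5, 0.5, 0.5],
       [0.5, -0.5, 0.5, -0.5],
       [0.5, 0.5, -0.5, -0.5],
       [0.5, -0.5, -0.5, 0.5]]

print(coherence(PHI, PSI))                   # single pair of atoms
print(cluster_coherence([0], PHI, PSI))      # one-element cluster: same as coherence here
print(cluster_coherence([0, 1], PHI, PSI))   # two-element cluster: contributions add up
```

The cluster coherence depends on the chosen index set: enlarging the cluster can only increase it. This is why the sets \(S_{1,j}\) and \(S_{2,j}\) have to be large enough to capture the significant coefficients and, at the same time, geometrically concentrated enough to keep the cluster coherence small.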

Let \(\Phi \) be the matrix representation of the Parseval frame of wavelets and \(\Psi \) the matrix representation of our Parseval frame of shearlets. For a \(g_j \in L^2(\mathbb {R}^3)\cap L^1(\mathbb {R}^3)\) such that \({\mathrm{supp }}({\hat{g}}_j) \subset \mathcal {F}_j\), let

$$\begin{aligned} \Vert 1_{S_{1,j}} \Phi ^T g_j \Vert _1 = \sum _{\lambda \in S_{1,j}} | \langle g_j,\phi _\lambda \rangle |, \quad \Vert 1_{S_{2,j}} \Psi ^T g_j \Vert _1 = \sum _{\eta \in S_{2,j}} | \langle g_j,\psi _\eta \rangle |. \end{aligned}$$

We define the joint concentration by

$$\begin{aligned} \kappa = \kappa (S_{1,j}, S_{2,j}) = \sup _{g_j} \frac{ \Vert 1_{S_{1,j}} \Phi ^T g_j \Vert _1 + \Vert 1_{S_{2,j}} \Psi ^T g_j \Vert _1}{\Vert \Phi ^T g_j \Vert _{1, \Phi } + \Vert \Psi ^T g_j \Vert _{1, \Psi }}. \end{aligned}$$

The following known observation (Proposition 2.1 in [7]) illustrates the relationship between joint concentration and data separation.

Proposition 3.1

Suppose that, for \(j \in \mathbb {Z}\), \(f_j= P_j +T_j\) so that each component of \(f_j\) is relatively sparse in \(\Phi \) or \(\Psi \), that is,

$$\begin{aligned} \Vert 1_{S_{1,j}^C} \Phi ^T P_j \Vert _1 \le \delta _{1,j}, \quad \Vert 1_{S_{2,j}^C} \Psi ^T T_j \Vert _1 \le \delta _{2,j}. \end{aligned}$$

If \((P_j^*,T_j^*)\) solves (2.14), then

$$\begin{aligned} \Vert P_j^* - \mathcal {P}_j \Vert _{1, \Phi } + \Vert T_j^* - \mathcal {T}_j \Vert _{1, \Psi } \le \frac{ 2 (\delta _{1,j} + \delta _{2,j})}{1 - 2\kappa }. \end{aligned}$$
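
For orientation, here is how the bound reads for a hypothetical value of the joint concentration (the value \(\tfrac{1}{4}\) is chosen only for illustration). The estimate is informative only when \(\kappa < \tfrac{1}{2}\); if, for instance, \(\kappa \le \tfrac{1}{4}\), then

$$\begin{aligned} \frac{ 2 (\delta _{1,j} + \delta _{2,j})}{1 - 2\kappa } \le \frac{ 2 (\delta _{1,j} + \delta _{2,j})}{1 - \tfrac{1}{2}} = 4 \, (\delta _{1,j} + \delta _{2,j}), \end{aligned}$$

so that the separation error is at most four times the total approximation error.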

A related observation from [7] is that the joint concentration is bounded above by the maximum of the cluster coherences. We have (cf. [7, Lemma 2.1]):

Lemma 3.2

$$\begin{aligned} \kappa (S_{1,j}, S_{2,j}) \le \max \{\mu _c(S_{1,j}, \Phi ; \Psi ),\, \mu _c(S_{2,j}, \Psi ; \Phi ) \} \end{aligned}$$

It follows from Proposition 3.1 and Lemma 3.2 that Theorem 2.1 is proved if we can construct appropriate sets of significant wavelet and shearlet coefficients \(S_{1,j}\) and \(S_{2,j}\) such that \(\delta _{1,j} = o(\Vert \mathcal {P}_j \Vert _{1, \Phi } + \Vert \mathcal {T}_j \Vert _{1, \Psi } )\), \(\delta _{2,j} = o(\Vert \mathcal {P}_j \Vert _{1, \Phi } + \Vert \mathcal {T}_j \Vert _{1, \Psi })\) and

$$\begin{aligned} \mu _c(S_{1,j}, \Phi ; \Psi ) \rightarrow 0, \quad \mu _c(S_{2,j}, \Psi ; \Phi ) \rightarrow 0, \quad \text { as } j \rightarrow \infty . \end{aligned}$$
(3.1)

The rest of the proof is hence devoted to constructing such sets \(S_{1,j}\), \(S_{2,j}\) and proving the estimates (3.1). In Sect. 3.1, we will select an appropriate set \(S_{1,j}\) and show that \(\mu _c(S_{1,j}, \Phi ; \Psi ) \rightarrow 0\) and \(\delta _{1,j} = o(\Vert \mathcal {P}_j \Vert _{1, \Phi } + \Vert \mathcal {T}_j \Vert _{1, \Psi } )\). This part of the proof is rather simple and follows from an idea similar to [7]. For the difficult part of the proof, concerning the analysis of curvilinear singularities, it is not possible to apply the argument from [7] or [16], and we introduce a novel approach, which is derived in Sect. 3.2. Our new argument is inspired in part by techniques for the analysis of singularities developed for the characterization of piecewise smooth boundaries of multivariate functions in [12, 14, 17].

In the following, for all our arguments, it will be sufficient to consider the shearlet system associated with the pyramidal region \(\mathcal {C}_1 \subset {\widehat{\mathbb {R}}}^3\) only, since the systems associated with \(\mathcal {C}_2\) and \(\mathcal {C}_3\) behave essentially in the same way. The elements (2.8) of such a shearlet system can be written (see Footnote 1) as

$$\begin{aligned} {\hat{\psi }}^{(1)}_{j,\ell _1,\ell _2,k}(\xi ) = 2^{-2j} \Gamma _{j,\ell _1, \ell _2}(\xi ) \, e^{2 \pi i \xi A_{(1)}^{-j} B_{(1)}^{{[-\ell _1,-\ell _2]}} k}, \end{aligned}$$
(3.2)

where

$$\begin{aligned} \Gamma _{j,\ell _1, \ell _2}(\xi ) = W(2^{-2j} \xi ) \, v \Bigl (2^j \frac{\xi _2}{\xi _1}-\ell _1 \Bigr ) \, v \Bigl (2^j \frac{\xi _3}{\xi _1}-\ell _2 \Bigr ). \end{aligned}$$

Note that \( A_{(1)}^{-j} B_{(1)}^{{[-\ell _1,-\ell _2]}} k =(2^{-2j} (k_1 - \ell _1 k_2 - \ell _2 k_3),\, 2^{-j} k_2,\, 2^{-j} k_3)\). Each function \(\Gamma _{j,\ell _1, \ell _2}\) is supported inside the set \(U_{j,\ell _1,\ell _2}\), given by (2.9). It is easy to verify that its measure satisfies \(|U_{j,\ell _1, \ell _2}| \le C \, 2^{4j}\).
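
The three components of the translation vector and the measure estimate can be checked as follows. This is only a sketch: it assumes that \(A_{(1)} = \mathrm{diag}(4,2,2)\), that \(B_{(1)}^{[\ell _1,\ell _2]}\) is the shear matrix with first row \((1, \ell _1, \ell _2)\), that \(W(2^{-2j}\, \cdot )\) is supported where \(|\xi | \le C\, 2^{2j}\), and that \(v\) is supported in \([-1,1]\) (the precise definitions are in Sect. 2 and are not reproduced here).

$$\begin{aligned} B_{(1)}^{[-\ell _1,-\ell _2]} k&= \begin{pmatrix} 1 &{} -\ell _1 &{} -\ell _2 \\ 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 1 \end{pmatrix} \begin{pmatrix} k_1 \\ k_2 \\ k_3 \end{pmatrix} = \begin{pmatrix} k_1 - \ell _1 k_2 - \ell _2 k_3 \\ k_2 \\ k_3 \end{pmatrix}, \\ A_{(1)}^{-j} B_{(1)}^{[-\ell _1,-\ell _2]} k&= \bigl ( 2^{-2j} (k_1 - \ell _1 k_2 - \ell _2 k_3),\, 2^{-j} k_2,\, 2^{-j} k_3 \bigr ), \\ |U_{j,\ell _1, \ell _2}|&\le C \, \underbrace{2^{2j}}_{\xi _1 \text { range}} \cdot \underbrace{2^{-j} \, 2^{2j}}_{\xi _2 \text { range}} \cdot \underbrace{2^{-j} \, 2^{2j}}_{\xi _3 \text { range}} = C \, 2^{4j}. \end{aligned}$$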

3.1 Analysis of Point Singularities

This section is very similar to the corresponding section in [16] and is reported below for completeness.

In the following, we will select a set \(S_{1,j}\) and prove that \(\mu _c(S_{1,j}, \Phi ; \Psi ) \rightarrow 0\) and \( \delta _{1,j} = o(\Vert \mathcal {P}_j \Vert _{1, \Phi } + \Vert \mathcal {T}_j \Vert _{1, \Psi })\), asymptotically as \(j \rightarrow \infty \).

Let \(\phi _{j',k^{\prime }}\) and \(\psi _{j,\ell _1, \ell _2,k}\) be generic elements from a Parseval frame of wavelets and shearlets, respectively. Due to our assumptions on the Fourier support of W, for any \(\ell _1, \ell _2, k\) and \(k'\) we have that \(\langle \widehat{\psi _{j,\ell ,k}} ,\widehat{\phi _{j^{\prime },k^{\prime }}} \rangle = 0\) if \(|j-j'| > 1\). Hence, for all large \(j'\) and \(j = j'-1, j', j'+1\), we observe

$$\begin{aligned}&|\langle \widehat{\psi _{j,\ell _1, \ell _2,k}} ,\widehat{{\phi }_{j^{\prime },k^{\prime }}} \rangle | \\&\quad =\left| \int _{\mathbb {R}^3} \left( 2^{-2j}\Gamma _{j,\ell _1, \ell _2}(\xi ) \, e^{- 2\pi i \xi A_{(1)}^{-j} B_{(1)}^{[-\ell _1, -\ell _2]} k}\right) \left( 2^{-3j'} W(2^{-2j'} \xi )e^{2 \pi i 2^{-2j'} \xi \cdot k^{\prime }} \right) d \xi \right| \\&\quad \le 2^{-2j} 2^{-3j'} \int _{\mathbb {R}^3} |\Gamma _{j,\ell _1, \ell _2}(\xi ) \, W(2^{-2j'} \xi )| \, d \xi \\&\quad \le C \, 2^{-2j} 2^{-3j'} \int _{U_{j, \ell _1,\ell _2}} d \xi \le C \,2^{-2j} 2^{-3j'} 2^{4j} \le C \, 2^{-j}, \end{aligned}$$

where C is independent of \(\ell _1, \ell _2, k, k'\) and j.

For a fixed \(\epsilon \in (0,\tfrac{1}{3})\), set \(S_{1,j} = \{(j', k'): j' = j-1, j, j+1; \, |k'| \le 2^{\epsilon j'}\}\). Since the number of \(k' \in \mathbb {Z}^3\) with \(|k'| \le 2^{\epsilon j'}\) is at most \(C \, 2^{3 \epsilon j'}\), the calculation above gives

$$\begin{aligned} \mu _c(S_{1,j}, \Phi ; \Psi ) \le C \max _{\ell _1,\ell _2,k} \sum _{j'=j-1}^{j+1} \sum _{|k'| \le 2^{\epsilon j'}} |\langle \widehat{\psi _{j,\ell _1,\ell _2,k}} , \widehat{\phi _{j',k^{\prime }}} \rangle | \le C 2^{(-1 + 3\epsilon ) j} \end{aligned}$$

and, thus, \(\mu _c(S_{1,j}, \Phi ; \Psi ) \rightarrow 0\), as \(j \rightarrow \infty \).

We also observe that \(\langle \widehat{\phi _{j',k'}} ,\widehat{\mathcal {P}_j}\rangle = 0\) for all \(k'\) if \(|j'-j| > 1\). For \(|j'-j| \le 1,\) we have that

$$\begin{aligned} \langle \widehat{\phi _{j',k'}} ,\widehat{\mathcal {P}_j}\rangle= & {} 2^{-3j'} \, C \int _{\mathbb {R}^3} W(2^{-2j'} \xi ) \, e^{2 \pi i 2^{-2j'} \xi \cdot k^{\prime }} W(2^{-2j} \xi ) |\xi |^{-2} \, d \xi \\= & {} C 2^{-j'} \int _{\mathbb {R}^3} W(\xi ) W(2^{2(j'-j)} \xi )\, e^{2 \pi i \xi \cdot k^{\prime }} |\xi |^{-2} \, d \xi . \end{aligned}$$

Hence, for \(|k'| \ge 2^{\epsilon j}\), integration by parts gives that

$$\begin{aligned} | \langle \widehat{\phi _{j', k'}} ,\widehat{\mathcal {P}_j}\rangle | \le C_N \, 2^{-j} (1 + |k'|)^{-N} \le C_N \, 2^{- (1 + N \epsilon ) j}. \end{aligned}$$

By choosing N sufficiently large, we conclude that

$$\begin{aligned} \delta _{1,j} = \sum _{\lambda \in S_{1,j}^c} | \langle \phi _\lambda ,P_j\rangle | \le C \, 2^{-2j} = o(2^{-j}) = o\left( \Vert \mathcal {P}_j \Vert _{1, \Phi } + \Vert \mathcal {T}_j \Vert _{1, \Psi } \right) . \end{aligned}$$
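
The choice of N can be made explicit. The following is a sketch that assumes the standard comparison of the lattice sum with an integral over \(\mathbb {R}^3\), valid for \(N > 3\) and \(R \ge 1\); the constants absorb the three scales \(j' = j-1, j, j+1\).

$$\begin{aligned} \sum _{k' \in \mathbb {Z}^3,\, |k'| > R} (1 + |k'|)^{-N} \le C_N \, R^{3-N}, \qquad \text {so that, with } R = 2^{\epsilon j}, \qquad 2^{-j} \, 2^{-(N-3) \epsilon j} \le 2^{-2j} \quad \text {whenever } (N-3)\, \epsilon \ge 1. \end{aligned}$$

In other words, any \(N \ge 3 + 1/\epsilon \) suffices for the bound \(\delta _{1,j} \le C \, 2^{-2j}\).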

3.2 Analysis of Curvilinear Singularities

We start by proving the following estimate for the functions \(\mathcal {T}_j\), \(j \ge 0\).

Proposition 3.3

For \(j \ge 0\) we have:

$$\begin{aligned} \Vert \mathcal {T}_j\Vert _2 \simeq 2^{j} \end{aligned}$$

To prove Proposition 3.3, we need the following lemma which is a special case of the classical method of stationary phase (cf. Proposition 8.6 in [33]).

Lemma 3.4

Let \(U_\epsilon \) be the ball in \(\mathbb {R}^2\) with center at the origin and radius \(\epsilon > 0\), and let \(\psi \in C_0^{\infty }(U_\epsilon )\). Let \(J(\lambda ) = \int _{\mathbb {R}^2} e^{i \, \lambda \, H(u)} \, \psi (u) \, du\). For \(u_0 \in {\mathrm{supp }}\,\psi \subset U_\epsilon \), if \(\nabla H(u_0) = 0\) and the determinant of the Hessian matrix of H at \(u_0\) is not zero, then

$$\begin{aligned} J(\lambda ) = e^{i \,\lambda \, H(u_0)} \lambda ^{-1} \left[ a(u_0) + O (\lambda ^{-\frac{1}{2}})\right] , \end{aligned}$$

as \(\lambda \rightarrow \infty \), where \( a(u_0)\) is a smooth function of \(u_0\).

Proof of Proposition 3.3

The Fourier transform of \(\mathcal {T}\), given by (2.2), is

$$\begin{aligned}&\widehat{\mathcal {T}}(\xi ) = \widehat{\chi _B}(\xi ) =\int _U e^{-2 \pi i \xi \cdot (\frac{1}{2}(u_1^2 + u_2^2), u)} \,\alpha (u) \,du, \end{aligned}$$

where \( \xi \in \mathbb {R}^3\) and the sets \(B\) and \(U\) are defined in Sect. 2. By converting to spherical coordinates with \(\xi = \rho \, \Theta (\theta , \phi )\), where \(\Theta (\theta , \phi ) = (\cos \theta \sin \phi , \sin \theta \sin \phi , \cos \phi )\), \(\theta \in [0,2 \pi ]\) and \(\phi \in [0,\pi ]\), we have

$$\begin{aligned}&\widehat{\mathcal {T}}(\rho , \theta , \phi ) = \int _U e^{-2 \pi i \rho \Theta (\theta , \phi ) \cdot (\frac{1}{2}(u_1^2 + u_2^2), u)} \,\alpha (u) \,du. \end{aligned}$$

Let \(H(u) = \Theta (\theta , \phi ) \cdot (\frac{1}{2} u_1^2 + \frac{1}{2} u_2^2, u)\). Then the gradient of H is \(\nabla H(u) = \big (\Theta (\theta , \phi ) \cdot (u_1, 1, 0), \Theta (\theta , \phi ) \cdot (u_2, 0, 1)\big )\). It is easy to see that, for the given \(\theta \) and \(\phi \), the solution \(u_{\theta , \phi }\) of \(\nabla H(u) = (0, 0)\) is \(u_{\theta , \phi } = - (\tan \theta , \sec \theta \cot \phi ).\) Without loss of generality we may assume that the solution \(u_{\theta , \phi } \in U\) for all \(\theta \) and \(\phi \).
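
For completeness, the stationary point and the non-degeneracy condition required by Lemma 3.4 can be verified componentwise. This is a short check using only the definition of \(\Theta (\theta , \phi )\), under the assumption \(\cos \theta \sin \phi \ne 0\):

$$\begin{aligned} \Theta (\theta , \phi ) \cdot (u_1, 1, 0)&= u_1 \cos \theta \sin \phi + \sin \theta \sin \phi = 0 \quad \Longrightarrow \quad u_1 = - \tan \theta , \\ \Theta (\theta , \phi ) \cdot (u_2, 0, 1)&= u_2 \cos \theta \sin \phi + \cos \phi = 0 \quad \Longrightarrow \quad u_2 = - \sec \theta \cot \phi , \\ \frac{\partial ^2 H}{\partial u_m \partial u_n}&= \cos \theta \sin \phi \; \delta _{mn}, \qquad \det \bigl ( \mathrm{Hess}\, H \bigr ) = \cos ^2 \theta \, \sin ^2 \phi \ne 0. \end{aligned}$$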

Now applying Lemma 3.4 and omitting the higher order decay term, for \(\rho \rightarrow \infty \), we have

$$\begin{aligned} {\widehat{\mathcal {T}}}(\rho , \theta , \phi ) = \frac{1}{\rho } \,a(u_{\theta , \phi }) \, e^{-2 \pi i \rho H(u_{\theta , \phi })} \end{aligned}$$
(3.3)

and, thus, we have

$$\begin{aligned} \Vert \mathcal {T}_j\Vert _2^2 \simeq \int _{2^{2j}}^{2^{2j + 2}} \int _0^{2 \pi } \int _0^{\pi } \rho ^{-2} a^2(u_{\theta , \phi }) \rho ^2 \sin \phi \, d\theta \, d \phi \, d \rho \simeq 2^{2j}. \end{aligned}$$

It follows that \(\Vert \mathcal {T}_j\Vert _2 \simeq 2^{j}\). \(\square \)
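
The radial part of this computation is elementary. As a sketch, assuming that \(a^2(u_{\theta , \phi })\) is bounded above and below on a set of angles of positive measure, so that the angular integral is comparable to a constant independent of j:

$$\begin{aligned} \int _{2^{2j}}^{2^{2j + 2}} \rho ^{-2} \, \rho ^2 \, d\rho = 2^{2j+2} - 2^{2j} = 3 \cdot 2^{2j}. \end{aligned}$$

The factor \(\rho ^{-2}\) coming from \(|\widehat{\mathcal {T}}|^2\) cancels exactly with the volume element \(\rho ^2\), so that only the length of the radial interval contributes.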

We also need the following lemma.

Lemma 3.5

Let \(k = (k_1, k_2, k_3) \in \mathbb {Z}^3\), \(k' = (k_1', k_2', k_3') \in \mathbb {Z}^3\) and \(\ell = (\ell _1, \ell _2) \in \mathbb {Z}^2\) with \(|\ell _1| \le 2^j,\, |\ell _2| \le 2^j\), and let \(Q_{k, k'} = \{\ell =(\ell _1,\ell _2): \, | \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 + k_1' + 2^{-j} \ell _1 k_2' + 2^{-j} \ell _2 k_3'| \le 2^{\frac{1}{2} j} \}\). Then the cardinality of the set \(Q_{k, k'}\) satisfies the inequality \(\#(Q_{k, k'}) \le C \, 2^{\frac{1}{2} j},\) with C independent of k and \(k'\).

Proof

We have that

$$\begin{aligned}&\frac{1}{2}\left( 1 + \ell _1\right) ^2 + \frac{1}{2} \left( 1 + \ell _2\right) ^2 + k_2 + k_3 + k_1' + 2^{-j} \ell _1 k_2' + 2^{-j} \ell _2 k_3'\\&\quad = \frac{1}{2}\left( 1 + \ell _1\right) ^2 + \frac{1}{2} \left( 1 + \ell _2\right) ^2 + k_2 + k_3 + k_1' + 2^{-j} k_2'\left( 1 + \ell _1\right) \\&\qquad - 2^{-j} k_2' + 2^{-j} k_3'\left( 1 + \ell _2\right) - 2^{-j} k_3'\\&\quad = \frac{1}{2}\left( 2^{-j} k_2' +1 + \ell _1\right) ^2 + \frac{1}{2} \left( 2^{-j} k_3' + 1 + \ell _2\right) ^2 \\&\qquad +\, k_2 + k_3 + k_1' - 2^{-j} k_2' - 2^{-j} k_3' - \frac{1}{2} 2^{-2j} (k_2')^2 - \frac{1}{2} 2^{-2j} (k_3')^2. \end{aligned}$$

Let \( \alpha = 2( k_2 + k_3 + k_1' - 2^{-j} k_2' - 2^{-j} k_3' - \frac{1}{2} 2^{-2j} (k_2')^2 - \frac{1}{2} 2^{-2j} (k_3')^2)\), which does not depend on \(\ell \). To prove the lemma, we consider two separate cases depending on the value of \(\alpha \).

Case 1: \( \alpha > - 2^{\frac{1}{2} j + 1}.\) In this case, we have \(Q_{k, k'} \subset \{\ell = (\ell _1, \ell _2): \, (2^{-j} k_2'+ 1 + \ell _1)^2 + (2^{-j} k_3' + 1 + \ell _2)^2 \le 2^{\frac{1}{2} j + 2 } \}.\) It follows that there is a constant C independent of k and \(k'\) such that

$$\begin{aligned} \#(Q_{k, k'})\le & {} C \, \int _{(x+1+2^{-j} k_2')^2 + (y+1+2^{-j} k_3')^2 \le \, 2^{\frac{1}{2} j + 4} } dx \,dy \\= & {} C \, 2 \pi \, \int _0^{2^{\frac{1}{4} j + 2}} r \, dr\\\le & {} C \, 2^{\frac{1}{2} j}. \end{aligned}$$

Case 2: \( \alpha \le - 2^{\frac{1}{2} j + 1}.\) In this case, we have

$$\begin{aligned} Q_{k, k'}= & {} \left\{ \ell = \left( \ell _1, \ell _2\right) : \, - \alpha - 2^{\frac{1}{2} j + 1} \le \left( 2^{-j} k_2' + 1 + \ell _1\right) ^2 \right. \\&\left. + \left( 2^{-j} k_3' + 1 +\ell _2\right) ^2 \le - \alpha + 2^{\frac{1}{2} j + 1} \right\} . \end{aligned}$$

It follows that there is a constant C independent of k and \(k'\) such that

$$\begin{aligned} \#(Q_{k, k'})\le & {} C \, \int _{- \alpha - 2^{\frac{1}{2} j + 1} \le (x+1+2^{-j} k_2')^2 + (y+1+2^{-j} k_3')^2 \le - \alpha + 2^{\frac{1}{2} j + 1} } dx \,dy \\= & {} C \, 2 \pi \, \int _{\sqrt{- \alpha - 2^{\frac{1}{2} j + 1}}}^{\sqrt{- \alpha + 2^{\frac{1}{2} j + 1}}} r \,dr \\\le & {} C \, 2^{\frac{1}{2} j}. \end{aligned}$$

This finishes the proof of the lemma. \(\square \)
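
The key point in Case 2 is that the area of the annulus does not depend on \(\alpha \). As a short check, writing \(r_1^2 = - \alpha - 2^{\frac{1}{2} j + 1} \ge 0\) and \(r_2^2 = - \alpha + 2^{\frac{1}{2} j + 1}\), we have

$$\begin{aligned} 2 \pi \int _{r_1}^{r_2} r \, dr = \pi \, (r_2^2 - r_1^2) = \pi \left( 2^{\frac{1}{2} j + 1} + 2^{\frac{1}{2} j + 1} \right) = 4 \pi \, 2^{\frac{1}{2} j}. \end{aligned}$$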

We can now complete the proof of our main theorem.

Proof of Theorem 2.1

From (3.3), we have that

$$\begin{aligned} \widehat{\mathcal {T}_j}(\rho , \theta , \phi ) = \tfrac{1}{\rho } W(2^{-2j} \rho \Theta (\theta , \phi )) \, a(u_{\theta , \phi }) \, e^{-2 \pi i \rho H(u_{\theta , \phi })}. \end{aligned}$$

Let \(\beta _{j, \ell , k} = \langle \mathcal {T}_j,\psi ^{(1)}_{j, \ell , k}\rangle \). Using the expression of \(\widehat{\mathcal {T}_j}\) above and of \({\hat{\psi }}^{(1)}\) in (3.2), we have

$$\begin{aligned}&\beta _{j, \ell , k} = \int _{\mathbb {R}^3} \overline{{\hat{\psi }}^{(1)}_{j,\ell ,k}(\xi )} \, \widehat{\mathcal {T}_j}(\xi ) \, d\xi \nonumber \\&\quad = 2^{-2j} \int _{\mathbb {R}^3} \, W^2(2^{-2j} \xi ) \, v \Bigl (2^j \tfrac{\xi _2}{\xi _1}-\ell _1 \Bigr ) \, v \Bigl (2^j \tfrac{\xi _3}{\xi _1}-\ell _2 \Bigr ) \, e^{2 \pi i \xi A_{(1)}^{-j} B_{(1)}^{{[-\ell _1,-\ell _2]}} k} \, \widehat{\mathcal {T}}(\xi ) \, d\xi \nonumber \\&\quad = 2^{-2j} \int _{\mathbb {R}^3} \, W^2(2^{-2j} \xi ) \, v \Bigl (2^j \tfrac{\xi _2}{\xi _1}-\ell _1 \Bigr ) \, v \Bigl (2^j \tfrac{\xi _3}{\xi _1}-\ell _2 \Bigr ) \, e^{2 \pi i \xi \cdot (2^{-2j}(k_1-\ell _1 k_2 - \ell _2 k_3), \, 2^{-j} k_2,\,2^{-j} k_3)} \, \widehat{\mathcal {T}}(\xi ) \, d\xi \nonumber \\&\quad = 2^{-2j} \int _0^\infty \int _0^{\pi } \int _0^{2 \pi } W^2(2^{-2j} \rho \Theta (\theta , \phi )) \, v \Bigl (2^j \tan \theta -\ell _1 \Bigr ) \, v \Bigl (2^j \sec \theta \cot \phi -\ell _2 \Bigr ) \times \nonumber \\&\qquad \times \, e^{2 \pi i \rho \Theta (\theta , \phi ) \cdot (2^{-2j}(k_1-\ell _1 k_2 - \ell _2 k_3), \, 2^{-j} k_2,\,2^{-j} k_3)} e^{-2 \pi i \rho H(u_{\theta , \phi })} \, \rho \, a(u_{\theta , \phi }) \, \sin \phi \, d \phi \, d\theta \, d\rho \nonumber \\&\quad = 2^{-2j} \int _0^\infty \int _0^{\pi } \int _0^{2 \pi } W^2(2^{-2j} \rho \Theta (\theta , \phi )) \, v \Bigl (2^j \tan \theta -\ell _1 \Bigr ) \, v \Bigl (2^j \sec \theta \cot \phi -\ell _2 \Bigr ) \times \nonumber \\&\qquad \times \, e^{2 \pi i \rho \Theta (\theta , \phi ) \cdot \left( (2^{-2j}(k_1-\ell _1 k_2 - \ell _2 k_3), \, 2^{-j} k_2,\,2^{-j} k_3) - ( \tfrac{1}{2} (\tan \theta )^2 + \tfrac{1}{2} (\sec \theta \cot \phi )^2, -\tan \theta , - \sec \theta \cot \phi )\right) } \nonumber \\&\qquad \times \,\rho a(u_{\theta , \phi }) \, \sin \phi \, d \phi \, d\theta \, d\rho \nonumber \\&\quad = 2^{2j} \int _0^\infty \int _0^{\pi } \int _0^{2 \pi } W^2(\rho \Theta (\theta , \phi )) \, v \Bigl (2^j \tan \theta -\ell _1 \Bigr 
) \, v \Bigl (2^j \sec \theta \cot \phi -\ell _2 \Bigr ) \nonumber \\&\qquad \times \, e^{2 \pi i \rho \Theta (\theta , \phi ) \cdot (k_1 - \ell _1 k_2 - \ell _2 k_3 - \frac{2^{2j}}{2}(\tan \theta )^2 - \frac{2^{2j}}{2} (\sec \theta \cot \phi )^2,\, 2^j k_2 + 2^{2j} \tan \theta ,\, 2^j k_3 + 2^{2j}\sec \theta \cot \phi )}\nonumber \\&\qquad \times \,\rho \, a(u_{\theta , \phi }) \sin \phi \, d \phi \, d\theta \, d\rho . \end{aligned}$$
(3.4)

Let L be the differential operator:

$$\begin{aligned} L = \left( I - (\tfrac{2^{2j}}{2 \pi })^2 \frac{\partial ^2}{\partial \xi _1^2} \right) \, \left( I - (\tfrac{2^{j}}{2 \pi })^2 \frac{\partial ^2}{\partial \xi _2^2} \right) \left( I - (\tfrac{2^{j}}{2 \pi })^2 \frac{\partial ^2}{\partial \xi _3^2} \right) . \end{aligned}$$

From (3.4), we have that for any \(N \in \mathbb {N}\)

$$\begin{aligned} \beta _{j, \ell , k}= & {} 2^{-2j} \int _{\mathbb {R}^3} \,L^N \left( W^2(2^{-2j} \xi ) \, \widehat{\mathcal {T}}(\xi ) \, v \Bigl (2^j \tfrac{\xi _2}{\xi _1}-\ell _1 \Bigr ) \, v \Bigl (2^j \tfrac{\xi _3}{\xi _1}-\ell _2 \Bigr ) \right) \\\times & {} L^{-N} \left( e^{2 \pi i \xi \cdot (2^{-2j}(k_1-\ell _1 k_2 - \ell _2 k_3), \, 2^{-j} k_2,\,2^{-j} k_3)} \right) \,d\xi . \end{aligned}$$

Hence, using the fact that W, v and \(\mathcal {T}\) are continuously differentiable N times, for any N, a direct computation shows that there exists a constant \(C_N\) such that

$$\begin{aligned} |\beta _{j, \ell , k}|\le & {} C_N \int _{U} \left( 1+(k_1-\ell _1 k_2-\ell _2 k_3-2^{2j}(\frac{1}{2} u_1^2 + \frac{1}{2} u_2^2))^2 \right) ^{-N} \\\times & {} \left( 1+(k_2-2^j u_1)^2 \right) ^{-N} \left( 1+(k_3-2^j u_2)^2 \right) ^{-N} \,du. \end{aligned}$$

Let

$$\begin{aligned} J = \{k = (k_1, k_2, k_3)\,, \, |k_2|> 2 \cdot 2^{j} \text{ or } |k_3|> 2 \cdot 2^{j} \text{ or } \,|k_1| > 3 \cdot 2^{2j} \}. \end{aligned}$$

Then \( |k_2 - 2^j u_1 | > 2^{j}\) or \( |k_3 - 2^j u_2| > 2^{j}\) or \(|k_1 - k_2 \ell _1 - k_3 \ell _2 - 2^{2j} (\frac{1}{2} u_1^2 + \frac{1}{2} u_2^2)| > 2^{2j}\) for \(k \in J\) and for all \((u_1, u_2) \in U\). It follows that

$$\begin{aligned} \sum _{k \in J} |\beta _{j, \ell , k}| \le C_N \,2^{-(2N -1)j}. \end{aligned}$$
(3.5)

Also we have

$$\begin{aligned}&\beta _{j, \ell , k} \\&\quad = 2^{2j} \int _0^\infty \int _0^{2 \pi } \int _0^{\pi } \, W^2(\rho \Theta (\theta , \phi )) \, v \Bigl (2^j \tan \theta -\ell _1 \Bigr ) \, v \Bigl (2^j \sec \theta \cot \phi -\ell _2 \Bigr )\\&\qquad \times \, e^{2 \pi i \rho \Theta (\theta , \phi ) \cdot (k_1 - \ell _1 k_2 - \ell _2 k_3 - \frac{2^{2j}}{2}\tan ^2 \theta - \frac{2^{2j}}{2} (\sec \theta \cot \phi )^2,\, 2^j k_2 + 2^{2j} \tan \theta ,\, 2^j k_3 + 2^{2j}\sec \theta \cot \phi )} \\&\qquad \times \,\rho \, a(u_{\theta , \phi }) \sin \phi \, d \phi \, d\theta \, d\rho \\&\quad = P_1(j, \ell , k) + P_2(j, \ell , k), \end{aligned}$$

where \(P_1(j, \ell , k)\) is obtained by restricting the integration with respect to the variable \(\theta \) to the interval \([-\frac{\pi }{2},\frac{\pi }{2}]\) and \(P_2(j, \ell , k)\) by restricting the integration with respect to the variable \(\theta \) to the interval \([\frac{\pi }{2},\frac{3\pi }{2}]\).

For \(P_1(j, \ell , k)\), we use the change of variables \( t_1 = 2^j \tan \theta -\ell _1,\, t_2 = 2^j \sec \theta \cot \phi -\ell _2\) so that \(\theta (t_1) = \tan ^{-1}(2^{-j}(t_1 + \ell _1))\), \(\phi (t_1, t_2) = \cot ^{-1}\bigl (2^{-j} \cos (\theta (t_1))(t_2 + \ell _2)\bigr ) \) and \(d \phi \, d\theta = 2^{-2j} \cos ^3(\theta (t_1)) \, \sin ^2(\phi (t_1, t_2)) \, dt_2 \, dt_1\). It follows that

$$\begin{aligned}&P_1(j, \ell , k) = \int _0^\infty \int _{-1}^1 \int _{-1}^1 W^2(\rho \Theta (\theta (t_1), \phi (t_1, t_2))) \, v(t_1) \, v(t_2) \, \rho \, a(u_{\theta (t_1), \phi (t_1, t_2)}) \, \cos ^3(\theta (t_1)) \,\sin ^3(\phi (t_1,t_2)) \\&\qquad \times \, e^{2 \pi i \rho \cos (\theta (t_1)) \sin (\phi (t_1, t_2)) (k_1 - \ell _1 k_2 - \ell _2 k_3 + \frac{1}{2}(t_1 + \ell _1)^2 + \frac{1}{2} (t_2+\ell _2)^2 + (t_1 + \ell _1) k_2 + (t_2 + \ell _2) k_3 )} \, dt_2 \,dt_1 \, d\rho \\&\quad = \int _0^\infty \int _{-1}^1 \int _{-1}^1 W^2(\rho \Theta (\theta (t_1), \phi (t_1, t_2))) \, v(t_1) \, v(t_2) \, \rho \, a(u_{\theta (t_1), \phi (t_1, t_2)}) \, \cos ^3(\theta (t_1)) \,\sin ^3(\phi (t_1,t_2))\\&\qquad \times \, e^{2 \pi i \rho \cos (\theta (t_1)) \sin (\phi (t_1, t_2)) (\frac{1}{2}(t_1 + \ell _1)^2 + \frac{1}{2} (t_2 + \ell _2)^2 + k_1 + t_1 k_2 + t_2 k_3)} \, dt_2 \,dt_1 \, d\rho . \end{aligned}$$
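
Both the Jacobian factor and the simplified phase can be verified directly. The following is a verification sketch that uses only the relations \(\tan \theta = 2^{-j}(t_1 + \ell _1)\) and \(\sec \theta \cot \phi = 2^{-j}(t_2 + \ell _2)\), together with the factorization \(\Theta (\theta , \phi ) = \cos \theta \sin \phi \, (1, \tan \theta , \sec \theta \cot \phi )\); the Jacobian is understood in absolute value, and the abbreviations \(X_1 = t_1 + \ell _1\), \(X_2 = t_2 + \ell _2\) are introduced here only for brevity.

$$\begin{aligned} dt_1&= 2^j \sec ^2 \theta \, d\theta , \qquad |dt_2| = 2^j \sec \theta \, \csc ^2 \phi \, d\phi \quad (\theta \text { fixed}), \qquad d\phi \, d\theta = 2^{-2j} \cos ^3 \theta \, \sin ^2 \phi \, dt_2 \, dt_1, \\ \frac{\Theta \cdot (\ldots )}{\cos \theta \sin \phi }&= k_1 - \ell _1 k_2 - \ell _2 k_3 - \tfrac{1}{2} X_1^2 - \tfrac{1}{2} X_2^2 + X_1 (k_2 + X_1) + X_2 (k_3 + X_2) \\&= k_1 + \tfrac{1}{2} X_1^2 + \tfrac{1}{2} X_2^2 + t_1 k_2 + t_2 k_3 . \end{aligned}$$

The prefactor \(2^{2j}\) in the expression for \(\beta _{j,\ell ,k}\) is cancelled exactly by the factor \(2^{-2j}\) of the Jacobian, and the additional \(\sin \phi \) of the volume element accounts for the power \(\sin ^3 \phi \).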

For \(P_2(j, \ell , k)\), we first let \(\theta ' = \theta - \pi \) which implies that \(\cos \theta = - \cos \theta ',\, \sin \theta = - \sin \theta ',\, \tan \theta = \tan \theta '\). Next, using the change of variables \( t_1 = 2^j \tan \theta ' -\ell _1,\, t_2 = - 2^j \sec \theta ' \cot \phi -\ell _2\) we have

$$\begin{aligned} P_2(j, \ell , k)= & {} \int _0^\infty \int _{-1}^1 \int _{-1}^1 W^2[\rho (-\cos (\theta '(t_1)) \sin (\phi (t_1, t_2)), \\&-\sin (\theta '(t_1)) \sin (\phi (t_1, t_2)), \cos (\phi (t_1, t_2)))] \, v(t_1) \, v(t_2) \\&\times \rho \, a(u_{\theta '(t_1), \phi (t_1, t_2)}) \, \cos ^3(\theta '(t_1)) \,\sin ^3(\phi (t_1,t_2)) \\&\times e^{- 2 \pi i \rho \cos (\theta '(t_1)) \sin (\phi (t_1, t_2)) (\frac{1}{2}(t_1 + \ell _1)^2 + \frac{1}{2} (t_2 + \ell _2)^2 + k_1 + t_1 k_2 + t_2 k_3)} dt_2 \,dt_1 \, d\rho . \end{aligned}$$

Next, we define the set \(S_{2,j}\) as \(S_{2,j} = \bigcup _{i=0}^4 F_{i, j}\), where

$$\begin{aligned}&F_{0, j} = \{(\ell , k):\, \left| \frac{1}{2} \ell _1^2 + \frac{1}{2} \ell _2^2 + k_1\right| \le 2^{\frac{1}{4} j} \},\\&F_{1, j} = \{(\ell , k):\, \left| \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_1 + k_2 + k_3\right| \le 2^{\frac{1}{4} j} \},\\&F_{2, j} = \{(\ell , k):\, \left| \frac{1}{2}(-1 + \ell _1)^2 + \frac{1}{2} (-1 + \ell _2)^2 + k_1 - k_2 - k_3\right| \le 2^{\frac{1}{4} j} \},\\&F_{3, j} = \{(\ell , k):\, \left| \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (-1 + \ell _2)^2 + k_1 + k_2 - k_3\right| \le 2^{\frac{1}{4} j} \},\\&F_{4, j} = \{(\ell , k):\, \left| \frac{1}{2}(-1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_1 - k_2 + k_3\right| \le 2^{\frac{1}{4} j} \}. \end{aligned}$$

Let \(g(t_1, t_2) = \frac{1}{2}(t_1 + \ell _1)^2 + \frac{1}{2} (t_2 + \ell _2)^2 + k_1 + t_1 k_2 + t_2 k_3\). Then \(\frac{\partial g}{\partial t_1}(t) = \ell _1 + t_1 + k_2\) and \(\frac{\partial g}{\partial t_2}(t) = \ell _2 + t_2 + k_3\). If \(\ell _1 + k_2 \ge 1\), then \(\frac{\partial g}{\partial t_1}(t) \ge 0\) on \([-1, 1]\), while if \(\ell _1 + k_2 \le -1\), then \(\frac{\partial g}{\partial t_1}(t) \le 0\) on \([-1, 1]\). It follows that for \(\ell _1 \not = - k_2\) and for each fixed \(t_2\), the function \(g(t_1, t_2)\) is monotone for \(t_1 \in [-1, 1]\). Similarly, for \(\ell _2 \not = - k_3\) and for each fixed \(t_1\), the function \(g(t_1, t_2)\) is monotone for \(t_2 \in [-1, 1]\). When \( \ell _1 = - k_2\) and \( \ell _2 = - k_3\), the function \(g(t_1, t_2)\) has only the critical point (0, 0). Therefore the function g(t) can attain its extreme values only at the points (1, 1), \((-1, -1)\), \((1, -1)\), \((-1, 1)\), (0, 0).
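
The link between these five points and the sets \(F_{i,j}\) becomes explicit by evaluating g directly from its definition:

$$\begin{aligned} g(0,0)&= \tfrac{1}{2} \ell _1^2 + \tfrac{1}{2} \ell _2^2 + k_1, \\ g(1,1)&= \tfrac{1}{2}(1 + \ell _1)^2 + \tfrac{1}{2} (1 + \ell _2)^2 + k_1 + k_2 + k_3, \\ g(-1,-1)&= \tfrac{1}{2}(-1 + \ell _1)^2 + \tfrac{1}{2} (-1 + \ell _2)^2 + k_1 - k_2 - k_3, \\ g(1,-1)&= \tfrac{1}{2}(1 + \ell _1)^2 + \tfrac{1}{2} (-1 + \ell _2)^2 + k_1 + k_2 - k_3, \\ g(-1,1)&= \tfrac{1}{2}(-1 + \ell _1)^2 + \tfrac{1}{2} (1 + \ell _2)^2 + k_1 - k_2 + k_3. \end{aligned}$$

Each set \(F_{i,j}\), \(i = 0, \dots , 4\), is thus the set of indices for which the corresponding value of g has magnitude at most \(2^{\frac{1}{4} j}\).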

Thus, since \(S_{2,j}^c = \bigcap _{i=0}^4 F_{i, j}^c \), we see that \(|g(t)| \ge 2^{\frac{1}{4} j}\) for all \( t \in [-1, 1]^2\) and all \((\ell , k) \in S_{2,j}^c\). Due to the assumptions on the support of the functions W and v, we have that \( \frac{1}{16}< \rho < \frac{1}{2}\), \(|\theta | \le \frac{\pi }{4}\), \(|\phi - \frac{\pi }{2}| \le \frac{\pi }{4}\) so that \( \rho \cos (\theta (t_1)) \sin (\phi (t_1, t_2)) \ge c > 0 \) for \(\rho \in {\mathrm{supp }}W\) and for \(|\theta | \le \frac{\pi }{4},\, |\phi - \frac{\pi }{2}| \le \frac{\pi }{4}\). It follows that, if \((\ell , k) \in S_{2, j}^c\), then integration by parts N times in the variable \(\rho \) of the integral \(P_1(j, \ell , k)\) yields that there is a constant \(C_N\) independent of \(\ell , k\) such that \( |P_1(j, \ell , k)| \le C_N 2^{- \frac{1}{4} N j}.\) Similarly, we have the estimate \( |P_2(j, \ell , k)| \le C_N 2^{- \frac{1}{4} N j}.\) This implies that there is a constant \(C_N\) independent of \(\ell , k\) such that

$$\begin{aligned} |\beta _{j, \ell , k}| \le 2 C_N 2^{- \frac{1}{4} N j}. \end{aligned}$$

Combining (3.5) and the above estimate, we have

$$\begin{aligned} \delta _{2,j}= & {} \sum _{(\ell , k) \in S_{2, j}^c} |\beta _{j, \ell , k}|\\= & {} \sum _{(\ell , k) \in S_{2, j}^c \bigcap J} |\beta _{j, \ell , k}| + \sum _{(\ell , k) \in S_{2, j}^c \bigcap J^c} |\beta _{j, \ell , k}|\\\le & {} C_N \, \left( 2^{-(2N-1) j} + 24 \cdot 2^{- \frac{1}{4} N j} \, 2^{4j}\right) , \end{aligned}$$

which is valid for any \(N \in \mathbb {N}\). By taking \(N = 13\), we have that \(\delta _{2,j} = o(2^j)\) as \(j \rightarrow \infty \).
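
The choice \(N = 13\) can be read off from the exponents of the bound above:

$$\begin{aligned} -(2N-1) \Big |_{N=13} = -25, \qquad -\tfrac{1}{4} N + 4 \Big |_{N=13} = \tfrac{3}{4} < 1, \qquad \text {so that} \qquad \delta _{2,j} \le C \left( 2^{-25 j} + 24 \cdot 2^{\frac{3}{4} j}\right) = o(2^j). \end{aligned}$$

More generally, the second exponent is below 1 exactly when \(N > 12\), so any \(N \ge 13\) gives the same conclusion.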

As we pointed out above, to complete the proof of Theorem 2.1 it remains to show that \(\mu _c(S_{2,j}, \Psi ; \Phi ) = \max _{k' \in \mathbb {Z}^3} \sum _{(\ell , k) \in S_{2, j}} |\langle \phi _{j, k'},\psi _{j, \ell , k}\rangle | \rightarrow 0\), as \(j \rightarrow \infty \). By the construction of \(S_{2, j}\), it follows that the proof of Theorem 2.1 is completed once we show the following lemma. \(\square \)

Lemma 3.6

Let \(\Phi \) and \(\Psi \) be defined as in Theorem 2.1. For any \(p=0,1,2,3,4\), there exists a constant \(C > 0\) independent of \( j,\, \ell , k, k'\) such that

$$\begin{aligned}&\,\, \sum _{(\ell , k) \in F_{p, j}} |\langle \psi _{j, \ell , k},\phi _{j, k'}\rangle | \le C \, 2^{- \tfrac{1}{2} j}. \end{aligned}$$

Proof

We will only verify the above inequality for \(F_{1, j}\) as all other cases can be handled using a very similar argument.

For fixed \(k_2, \, k_3\) and \(\ell \), let \(K_{k_2, k_3, \ell } = \{ k_1 :\, (\ell ,k) \in F_{1, j} \}. \) Since \(|\ell _1| \le 2^j,\;|\ell _2| \le 2^j\), for \(\ell = (\ell _1, \ell _2)\), we have that \(\Vert \ell \Vert \le 2^{j + \frac{1}{2}}\). We need to show that there is a constant C independent of \( j,\, \ell , k, k'\) such that

$$\begin{aligned} \sum _{\Vert \ell \Vert \le 2^{j + \frac{1}{2}}} \sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}}\, \sum _{k_1 \in K_{k_2, k_3, \ell } } |\langle \psi _{j, \ell _1,\ell _2, k},\phi _{j, k'}\rangle | \le C 2^{- \frac{1}{2} j}. \end{aligned}$$

Let \(L_1\) be the differential operator

$$\begin{aligned} L_1 = \left( I - \frac{1}{(2 \pi )^2} \frac{\partial ^2}{\partial \xi _1^2} \right) \, \left( I - \frac{1}{(2 \pi )^2} \frac{\partial ^2}{\partial \xi _2^2} \right) \left( I - \frac{1}{(2 \pi )^2} \frac{\partial ^2}{\partial \xi _3^2} \right) . \end{aligned}$$

For brevity, let

$$\begin{aligned} \alpha = \alpha (j,\ell ,k') = B_{(1)}^{\ell } A_{(1)}^j (2^{-2j} k^{\prime }) = (k_1^{\prime } + 2^{-j} \ell _1 k_2^{\prime } + 2^{-j} \ell _2 k_3^{\prime }, 2^{-j} k_2^{\prime }, 2^{-j} k_3^{\prime }), \end{aligned}$$

and \(\alpha =(\alpha _1, \alpha _2, \alpha _3)\). By direct calculation, for any positive integer N we have that

$$\begin{aligned}&\langle \widehat{\psi _{j,\ell , k}} , \widehat{\phi _{j,k'}} \rangle = \int _{\mathbb {R}^3} \left( 2^{-2j} \, \Gamma _{j,\ell _1,\ell _2}(\xi ) \, e^{2\pi i \xi A_{(1)}^{-j} B_{(1)}^{[-\ell ]} k}\right) \left( 2^{-3j} W(2^{-2j} \xi ) \, e^{-2 \pi i 2^{-2j} \xi \cdot k'} \right) d \xi \\&\quad = 2^{-5j} \int _{\mathbb {R}^3} \Gamma _{j,\ell _1,\ell _2}(\xi ) \, W(2^{-2j} \xi ) \, e^{ 2\pi i \xi [A_{(1)}^{-j} B_{(1)}^{[-\ell ]}(k - \alpha )]} \, d \xi \\&\quad = 2^{-j} \int _{\mathbb {R}^3} \hat{\psi _2}(\tfrac{\eta _2}{\eta _1})\hat{\psi _2}(\tfrac{\eta _3}{\eta _1}) \, W^2(\eta _1, 2^{-j}(\ell _1 \eta _1 + \eta _2), 2^{-j}(\ell _2 \eta _1 \!+\! \eta _3) )\, e^{ 2\pi i \eta \cdot (k - \alpha )} \, d \eta \\&\quad = 2^{-j} \int _{\mathbb {R}^3} L_1^N \left( \hat{\psi _2}(\tfrac{\eta _2}{\eta _1})\hat{\psi _2}(\tfrac{\eta _3}{\eta _1}) \, W^2(\eta _1, 2^{-j}(\ell _1 \eta _1 + \eta _2), 2^{-j}(\ell _2 \eta _1 + \eta _3) )\right) L_1^{-N} \left( e^{ 2\pi i \eta \cdot (k - \alpha )}\right) d \eta . \end{aligned}$$
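
The power of 2 in front of the integral in the variable \(\eta \) comes from the Jacobian of the linear change of variables. As a sketch, assuming \(A_{(1)} = \mathrm{diag}(4,2,2)\) and that the shear matrix \(B_{(1)}^{[\ell ]}\) has determinant 1 (both are defined in Sect. 2 and not reproduced here):

$$\begin{aligned} |\det A_{(1)}^{j}| = (4 \cdot 2 \cdot 2)^j = 2^{4j}, \qquad d\xi = 2^{4j} \, d\eta , \qquad 2^{-5j} \cdot 2^{4j} = 2^{-j}. \end{aligned}$$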

It follows that

$$\begin{aligned}&\sum _{\Vert \ell \Vert \le 2^{j + \frac{1}{2}}}\,\sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{k_1 \in K_{k_2, k_3, \ell } } |\langle \psi _{j, \ell _1,\ell _2, k},\phi _{j, k'}\rangle |\\&\le C \,2^{-j} \sum _{\Vert \ell \Vert \le 2^{j + \frac{1}{2}}}\,\sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{k_1 \in K_{k_2, k_3, \ell }} \left( 1 + \left( k_1-\alpha _1\right) ^2\right) ^{-N}\\&\left( 1 + \left( k_2 - \alpha _2\right) ^2\right) ^{-N} \left( 1 + \left( k_3 - \alpha _3\right) ^2\right) ^{-N}\\&\le C \,2^{-j} \sum _{\Vert \ell \Vert \le 2^{j + \frac{1}{2}}}\,\sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{k_1 \in K_{k_2, k_3, \ell } }\left( 1 + \left( k_1- k_1' - 2^{-j} \ell _1 k_2' - 2^{-j}\ell _2 k_3'\right) ^2\right) ^{-N}\\&\times \left( 1 + \left( k_2 - 2^{-j} k_2'\right) ^2\right) ^{-N} \left( 1 + \left( k_3 - 2^{-j} k_3'\right) ^2\right) ^{-N}. \end{aligned}$$

As in Lemma 3.5, we consider the sets

$$\begin{aligned} Q_{k, k'}= & {} \left\{ \ell =(\ell _1,\ell _2): \, \left| \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 + \alpha _1\right| \le 2^{\frac{1}{2} j} \right\} \\= & {} \left\{ \ell =(\ell _1,\ell _2): \, \left| \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 + k_1' + 2^{-j} \ell _1 k_2' +\, 2^{-j} \ell _2 k_3'\right| \le 2^{\frac{1}{2} j} \right\} . \end{aligned}$$

By Lemma 3.5, we have that there is a constant C independent of k and \(k'\) such that \(\#(Q_{k, k'}) \le C \, 2^{\frac{1}{2} j}\). It follows that

$$\begin{aligned}&\sum _{\ell \in Q_{k, k'}}\,\sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{k_1 \in K_{k_2, k_3, \ell } } |\langle \psi _{j, \ell _1,\ell _2, k},\phi _{j, k'}\rangle |\\&\le C_N \, 2^{-j} \sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{\ell \in Q_{k, k'}}\,\sum _{k_1 \in K_{k_2, k_3, \ell } } [1 + (k_1 - \alpha _1)^2]^{-N}\, [1 + (k_2 - \alpha _2)^2]^{-N} [1 + (k_3 - \alpha _3)^2]^{-N}\\&\le C_N \, 2^{-j} \sum _{\ell \in Q_{k, k'}}\, \sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \left( \sum _{k_1 \in \mathbb {Z}} (1 + (k_1- k_1' - 2^{-j} \ell _1 k_2' - 2^{-j}\ell _2 k_3')^2)^{-N} \right) (1 + (k_2 - 2^{-j} k_2')^2)^{-N} \\&\times (1 + (k_3 - 2^{-j} k_3')^2)^{-N} \le C_N \, 2^{-j} \, 2^{\frac{1}{2}j} = C_N \, 2^{-\frac{1}{2}j}. \end{aligned}$$

For \(\ell \in Q_{k, k'}^c \), we have \(|\frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 + \alpha _1| \ge 2^{\frac{1}{2} j} \). Since \(k_1\) satisfies the inequality \(|k_1 + \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3| \le 2^{\frac{1}{4} j} \), for all \(j \ge 4\), we have

$$\begin{aligned} |k_1 - \alpha _1|= & {} |k_1 + \frac{1}{2}(1 + \ell _1)^2 + \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 - (\frac{1}{2}(1 + \ell _1)^2 \\&+ \frac{1}{2} (1 + \ell _2)^2 + k_2 + k_3 + \alpha _1)|\\\ge & {} 2^{\frac{1}{2} j} - 2^{\frac{1}{4} j} \\\ge & {} \frac{1}{2} 2^{\frac{1}{2} j}. \end{aligned}$$
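
The restriction \(j \ge 4\) is exactly what the last inequality requires, as the following elementary check shows:

$$\begin{aligned} 2^{\frac{1}{2} j} - 2^{\frac{1}{4} j} \ge \tfrac{1}{2} \, 2^{\frac{1}{2} j} \iff 2^{\frac{1}{4} j} \le \tfrac{1}{2} \, 2^{\frac{1}{2} j} \iff 2^{\frac{1}{4} j} \ge 2 \iff j \ge 4 . \end{aligned}$$

For \(j = 4\) the inequality holds with equality, since \(2^2 - 2^1 = 2 = \tfrac{1}{2} \cdot 2^2\).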

It follows that

$$\begin{aligned}&\sum _{\ell \in Q_{k, k'}^c}\,\sum _{k_2 \in \mathbb {Z}}\sum _{k_3 \in \mathbb {Z}} \sum _{k_1 \in K_{k_2, k_3, \ell } } |\langle \psi _{j, \ell _1,\ell _2, k},\phi _{j, k'}\rangle |\\&\le C_N \sum _{\ell \in Q_{k, k'}^c} \sum _{k_2 \in \mathbb {Z}}\sum _{k_3 \in \mathbb {Z}} \,\sum _{k_1 \in K_{k_2, k_3, \ell } }[1 + (k_1 - \alpha _1)^2]^{-N}\, [1 + (k_2 - \alpha _2)^2]^{-N}\\&\quad [1 + (k_3 - \alpha _3)^2]^{-N}\\&\le C_N \sum _{|\ell | \le 2^{j+\frac{1}{2}}}\,\sum _{k_2 \in \mathbb {Z}} \sum _{k_3 \in \mathbb {Z}} \sum _{|k_1 - \alpha _1| \ge \frac{1}{2} 2^{\frac{1}{2} j}} [1 + (k_1 - \alpha _1)^2]^{-N}\, [1 + (k_2 - \alpha _2)^2]^{-N} \\&\quad [1 + (k_3 - \alpha _3)^2]^{-N}\\&\le C_N\, 2^{2j} 2^{-(2N-1) \frac{1}{2} j}\\&\le C \, 2^{- \frac{1}{2} j}, \end{aligned}$$

where, in the last inequality, we took \(N = 3\). \(\square \)
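
The exponent arithmetic behind the choice \(N = 3\) is as follows. This is a sketch that assumes the elementary tail bound \(\sum _{|m| \ge M} (1 + m^2)^{-N} \le C_N \, M^{-(2N-1)}\) for \(M \ge 1\), applied with \(M = \tfrac{1}{2} 2^{\frac{1}{2} j}\), and uses that the number of \(\ell \in \mathbb {Z}^2\) with \(\Vert \ell \Vert \le 2^{j+\frac{1}{2}}\) is at most \(C \, 2^{2j}\):

$$\begin{aligned} 2^{2j} \cdot 2^{-(2N-1) \frac{1}{2} j} \Big |_{N=3} = 2^{2j - \frac{5}{2} j} = 2^{-\frac{1}{2} j}. \end{aligned}$$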

Combining the above estimates, we have proved

$$\begin{aligned} \mu _c(S_{2,j}, \Psi ; \Phi ) \rightarrow 0 \quad \text {as } j \rightarrow \infty . \end{aligned}$$

This completes the proof of Theorem 2.1. \(\square \)