1 Introduction

Breast cancer is one of the major health concerns among women. The National Cancer Institute estimates that 1 in 8 women will be diagnosed with breast cancer during their lifetime. Early detection is proven to be the best strategy for improving prognosis. Most of the references dealing with automated breast cancer detection are based on microcalcifications (El-Naqa et al. 2002; Kestener et al. 2011; Bala and Audithan 2014; Netsch and Peitgen 1999; Wang and Karayiannis 1998). Recently, predicting disease from image data has become an active research area in statistics and machine learning (Reiss and Ogden 2010; Zhou et al. 2013; Zipunnikov et al. 2011; Reiss et al. 2005). For example, Reiss and Ogden (2010) proposed a functional generalized linear regression model with images as predictors. However, predicting breast cancer directly from tissue images is a black-box approach: physicians have a hard time summarizing the common features of the cancerous images, and the predictions are not easily interpreted. In this paper, we extract the scaling information from the tissue image and then predict breast cancer based on the estimated scaling parameter. It has been found in the literature that scaling information is efficient and accurate in the early detection of breast cancer (Hamilton et al. 2011; Nicolis et al. 2011; Ramírez-Cobo and Vidakovic 2013; Jeon et al. 2014). In fact, regular scaling is a common phenomenon in high-frequency signals and high-resolution digital images collected in real life. Examples can be found in a variety of fields including economics, telecommunications, physics, geosciences, as well as in biology and medicine (Feng and Vidakovic 2017; Engel Jr et al. 2009; Gregoriou et al. 2009; Katul et al. 2001; Park and Willinger 2000; Woods et al. 2016; Zhou 1996).

The standard measure of regular scaling is the Hurst exponent, denoted by H in the sequel. Recall that a stochastic process \(\left \{X\left (\boldsymbol {t}\right ), \boldsymbol {t}\in \mathbb {R}^d\right \}\) is self-similar with Hurst exponent H if, for any \(\lambda \in \mathbb {R}^+\), \(X\left (\boldsymbol {t}\right )\overset {\mathrm {d}}{=}\lambda ^{-H}X\left (\lambda \boldsymbol {t}\right )\). Here the notation \(\overset {\mathrm {d}}{=}\) means equality of all finite-dimensional distributions. The Hurst exponent quantifies the self-similarity and describes the rate at which autocorrelations decrease as the lag between two realizations of a time series increases. A value of H in the range (0, 0.5) indicates a zig-zagging, intermittent time series with long-term switching between high and low values in adjacent pairs. A value of H in the range (0.5, 1) indicates a time series with long-term positive autocorrelations, which preserves trends on a longer time horizon and gives the time series a more regular appearance.

Multiresolution analysis is one of many methods for estimating the Hurst exponent. An overview can be found in Abry et al. (2000, 1995, 2013). In particular, the non-decimated wavelet transform (NDWT) (Nason and Silverman 1995; Vidakovic 2009; Percival and Walden 2006) has several potential advantages when employed for Hurst exponent estimation. Input signals and images of arbitrary size can be transformed in a straightforward manner due to the absence of decimation. As a redundant transform, the NDWT can decrease the variance of the scaling estimate (Kang and Vidakovic 2017). Ordinary least-squares regression can be fitted to estimate H instead of weighted least-squares regression, since the variances of the level-wise derived distributions based on log NDWT coefficients do not depend on level. Local scaling can be assessed due to the time-invariance property. Of course, the dependence between coefficients in an NDWT is much more pronounced. Similar to Soltani et al. (2004), we control this dependence by systematic sampling of the coefficients on which the estimator is based.

Different wavelet-based methods for estimating H in the one-dimensional case have been proposed in the literature. Abry et al. (2000) suggested estimating H by weighted least-squares regression using the level-wise \(\log _2\left (\overline {d_j^2}\right )\), where d j indicates any detail coefficient at level j; in addition, the authors corrected for the bias caused by taking the logarithm after, rather than before, the averaging in \(\log _2\left (\overline {d_j^2}\right )\). We use d j,k to denote the kth coefficient at level j in the sequel. Soltani et al. (2004) defined a mid-energy as \(D_{j,k}=\left (d_{j,k}^2+d_{j,k+N_j/2}^2\right )\big /2\) and showed that the level-wise averages of log2 D j,k are asymptotically normal and more stable, which makes them suitable for estimating H by regression. The estimators in Soltani et al. (2004) consistently outperform those in Abry et al. (2000). Shen et al. (2007) showed that the method of Soltani et al. (2004) yields more accurate estimators because it takes the logarithm of the mid-energy first and then averages.
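To make the log-regression idea concrete, the following pure-Python sketch (hypothetical code, not from the paper) applies a decimated Haar transform to a simulated standard Brownian motion, which is an fBm with H = 0.5, and regresses the level-wise log2 average energy on the level index. With levels indexed from fine to coarse, the expected energy grows like 2^{m(2H+1)}, so the slope is approximately 2H + 1:

```python
import math
import random

def haar_step(signal):
    """One decimated Haar step with orthonormal filters: (approx, detail)."""
    a, d = [], []
    for i in range(0, len(signal) - 1, 2):
        a.append((signal[i] + signal[i + 1]) / math.sqrt(2))
        d.append((signal[i] - signal[i + 1]) / math.sqrt(2))
    return a, d

def estimate_hurst(signal, levels):
    """VA-style estimate: regress log2(mean d_m^2) on the coarseness level m."""
    approx, pairs = list(signal), []
    for m in range(1, max(levels) + 1):
        approx, detail = haar_step(approx)
        if m in levels:
            energy = sum(x * x for x in detail) / len(detail)
            pairs.append((m, math.log2(energy)))
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    # with coarseness indexing, E d_m^2 is proportional to 2^{m(2H+1)}
    return (slope - 1) / 2

random.seed(7)
N = 2 ** 14
bm, s = [], 0.0
for _ in range(N):                                   # Brownian motion as the
    s += random.gauss(0.0, 1.0 / math.sqrt(N))       # cumulative sum of i.i.d.
    bm.append(s)                                     # Gaussian increments
H_hat = estimate_hurst(bm, levels=range(3, 10))
print(round(H_hat, 2))   # close to the true H = 0.5
```

Replacing the level-wise mean energy with a robust location estimate is exactly the modification pursued in the sequel.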

The robust estimation of H has recently become a topic of interest due to the presence of outlier coefficients and outlier multiresolution levels, inter- and within-level dependences, and distributional contaminations (Franzke et al. 2012; Park and Park 2009; Shen et al. 2007; Sheng et al. 2011). Hamilton et al. (2011) proposed a robust approach based on Theil-type weighted regression (Theil 1992), a method for robust linear regression that takes a weighted average of all slopes defined by different pairs of regression points. Like the Veitch–Abry (VA) method, they regress the level-wise \(\log _2\left (\overline {d_j^2}\right )\) against the level indices, but instead of weighted least-squares regression they use Theil-type weighted regression, which is less sensitive to outlier levels. Kang and Vidakovic (2017) proposed the MEDL and MEDLA methods based on non-decimated wavelets to estimate H. MEDL estimates H by regressing the medians of \(\log d_j^2\) on level j, while MEDLA uses the level-wise medians of \(\log \left (\left (d_{j,k_1}^2+d_{j,k_2}^2\right )\big /2\right )\), where k 1 and k 2 are locations at level j properly selected to approximate independence.

Both MEDL and MEDLA use the median of the derived distribution instead of the mean, because the median is more robust to the potential outliers that occur when the logarithm of a squared wavelet coefficient is taken and the magnitude of the coefficient is close to zero. Although the median is outlier-resistant, it can behave unexpectedly as a result of its non-smooth character. The fact that the median is not universally the best outlier-resistant estimator motivates us to develop general trimean estimators of the level-wise derived distributions to estimate H. The general trimean estimator is a weighted average of the distribution's median and two quantiles symmetric about the median, combining the median's emphasis on central values with the quantiles' attention to the tails. Tukey's trimean estimator (Tukey 1977; Andrews and Hampel 2015) and the Gastwirth estimator (Gastwirth 1966; Gastwirth and Cohen 1970; Gastwirth and Rubin 1969) are two special cases of this general framework.

In this paper, we are concerned with the robust estimation of the Hurst exponent in self-similar signals. The focus here is on images, but the methodology applies to multiscale contexts of arbitrary dimension. The properties of the proposed Hurst exponent estimators are studied both theoretically and numerically. The performance of the robust approach is compared with other standard wavelet-based methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the median-based estimators MEDL and MEDLA, and the Theil-type (TT) weighted regression method.

The rest of the paper consists of six additional sections and an Appendix. Section 5.2 discusses the background of non-decimated wavelet transforms and wavelet-based spectra in the context of estimating the Hurst exponent of fractional Brownian motion (fBm). Section 5.3 introduces the general trimean estimators and discusses two special estimators that follow this general framework. Section 5.4 describes the estimation of the Hurst exponent using the general trimean estimators, presents the distributional results on which the proposed methods are based, and derives the optimal weights that minimize the variances of the estimators. Section 5.5 provides simulation results and compares the performance of the proposed methods to other standard wavelet-based methods. The proposed methods are applied to classify digitized mammogram images as cancerous or non-cancerous in Sect. 5.6. The paper is concluded with a summary and discussion in Sect. 5.7.

2 Background

2.1 Non-decimated Wavelet Transforms

The non-decimated wavelet transform (NDWT) (Nason and Silverman 1995; Vidakovic 2009; Percival and Walden 2006) is a redundant transform because it is performed by repeated filtering with a minimal shift, or a maximal sampling rate, at all dyadic scales. Consequently, the transformed signal contains the same number of coefficients as the original signal at each multiresolution level. We start by describing the algorithmic procedure of the 1-D NDWT and then extend it to the 2-D NDWT. Traditionally, a wavelet transform is performed as a convolution of the input data with wavelet and scaling filters; the principal difference between the NDWT and the DWT is the sampling rate.

Any square integrable function \(f(x)\in \boldsymbol {L}_2(\mathbb {R})\) can be expressed in the wavelet domain as

$$\displaystyle \begin{aligned}f(x)=\sum_k c_{J_0,k}\phi_{J_0,k}(x)+\sum_{j=J_0}^\infty\sum_k d_{j,k}\psi_{j,k}(x),\end{aligned}$$

where \(c_{J_0,k}\) denote coarse coefficients, d j,k indicate detail coefficients, \(\phi _{J_0,k}(x)\) represent scaling functions, and ψ j,k(x) signify wavelet functions. For specific choices of scaling and wavelet functions, the basis for NDWT can be formed from the atoms

$$\displaystyle \begin{aligned}\phi_{J_0,k}(x)=2^{J_0/2}\phi\left(2^{J_0}\left(x-k\right)\right)\ \mbox{and}\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j,k}(x)=2^{j/2}\psi\left(2^j\left(x-k\right)\right),\end{aligned}$$

where \(x\in \mathbb {R}\), j is a resolution level, J 0 is the coarsest level, and k is the location of an atom. Notice that atoms for NDWT have the constant location shift k at all levels, yielding the finest sampling rate on any level. The coarse coefficients \(c_{J_0,k}\) and detail coefficients d j,k can be obtained via

$$\displaystyle \begin{aligned} c_{J_0,k}=\int f(x)\ \phi_{J_0,k}(x)dx\ \ \mbox{and}\ \ \ d_{j,k}=\int f(x)\ \psi_{j,k}(x)dx. \end{aligned} $$
(5.1)

In a J-level decomposition of a 1-D input signal of size N, an NDWT yields N × (J + 1) wavelet coefficients: N × 1 coarse coefficients and N × J detail coefficients.
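This coefficient count can be checked with a minimal à trous (stationary) Haar implementation. The sketch below is hypothetical code with periodic boundary handling, not the transform used in the paper; it illustrates that the absence of decimation keeps every level at full length N:

```python
import math

def ndwt_haar(signal, J):
    """A-trous (stationary) Haar NDWT: at level j the Haar filter is upsampled
    by 2**j; periodic boundary handling keeps every level at full length N."""
    N = len(signal)
    approx, details = list(signal), []
    for j in range(J):
        step = 2 ** j
        a = [(approx[k] + approx[(k + step) % N]) / math.sqrt(2) for k in range(N)]
        d = [(approx[k] - approx[(k + step) % N]) / math.sqrt(2) for k in range(N)]
        details.append(d)
        approx = a
    return details, approx

x = [float(i % 5) for i in range(64)]   # arbitrary input; no decimation occurs
details, coarse = ndwt_haar(x, J=3)
# J detail vectors plus 1 coarse vector, each of length N: N x (J + 1) in total
total = sum(len(d) for d in details) + len(coarse)
print(total)  # 64 * (3 + 1) = 256
```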

Expanding on the 1-D definitions, we can describe the 2-D NDWT of f(x, y) with \((x,y)\in \mathbb {R}^2\). Several versions of the 2-D NDWT exist, but we focus only on the scale-mixing version, on which our methods are based. For the scale-mixing 2-D NDWT, the wavelet atoms are

$$\displaystyle \begin{aligned}\phi_{J_{01},J_{02};\boldsymbol{k}}(x,y) =2^{(J_{01}+J_{02})/2}\phi(2^{J_{01}}(x-k_1))\phi(2^{J_{02}}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{J_{01},j_2;\boldsymbol{k}}(x,y) =2^{(J_{01}+j_2)/2}\phi(2^{J_{01}}(x-k_1))\psi(2^{j_2}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j_1,J_{02};\boldsymbol{k}}(x,y) =2^{(j_1+J_{02})/2}\psi(2^{j_1}(x-k_1))\phi(2^{J_{02}}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j_1,j_2;\boldsymbol{k}}(x,y) =2^{(j_1+j_2)/2}\psi(2^{j_1}(x-k_1))\psi(2^{j_2}(y-k_2)),\end{aligned}$$

where k = (k 1, k 2) is the location index, J 01 and J 02 are coarsest levels, j 1 > J 01, and j 2 > J 02. The wavelet coefficients for f(x, y) after the scale-mixing NDWT can be obtained as

$$\displaystyle \begin{aligned} \begin{aligned} c_{J_{01},J_{02};\boldsymbol{k}}&=\iint f(x,y)\ \phi_{J_{01},J_{02};\boldsymbol{k}}(x,y) dxdy,\\ h_{J_{01},j_2;\boldsymbol{k}}&=\iint f(x,y)\ \psi_{J_{01},j_2;\boldsymbol{k}}(x,y) dxdy,\\ v_{j_1,J_{02};\boldsymbol{k}}&=\iint f(x,y)\ \psi_{j_1,J_{02};\boldsymbol{k}}(x,y) dxdy,\\ d_{j_1,j_2;\boldsymbol{k}}&=\iint f(x,y)\ \psi_{j_1,j_2;\boldsymbol{k}}(x,y) dxdy. \end{aligned} \end{aligned} $$
(5.2)

Note that \(c_{J_{01},J_{02};\boldsymbol {k}}\) are coarse coefficients and represent the coarsest approximation, \(h_{J_{01},j_2;\boldsymbol {k}}\) and \(v_{j_1,J_{02};\boldsymbol {k}}\) represent a mix of coarse and detail information, and \(d_{j_1,j_2;\boldsymbol {k}}\) carry information about details only. In our methods, only the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) are used to estimate H.

2.2 The fBm: Wavelet Coefficients and Spectra

Among the models proposed for analyzing self-similar phenomena, arguably the most popular is fractional Brownian motion (fBm), first described by Kolmogorov (1940) and formalized by Mandelbrot and Van Ness (1968).

In this section, an overview of the 1-D fBm and its extension to the 2-D fBm is provided. If a stochastic process \(\{X(t),t\in \mathbb {R}\}\) is self-similar with Hurst exponent H, then the 1-D detail coefficients defined in (5.1) satisfy

$$\displaystyle \begin{aligned}d_{jk}\overset{\mathrm{d}}{=}2^{-j(H+1/2)}d_{0k},\end{aligned}$$

for a fixed level j (Abry et al. 2003). If the process has stationary increments, i.e., the distribution of X(t + h) − X(t) does not depend on t, then \(\mathbb {E}(d_{0k})=0\) and \(\mathbb {E}(d_{0k}^2)=\mathbb {E}(d_{00}^2)\). We obtain

$$\displaystyle \begin{aligned} \mathbb{E}\left(d_{jk}^2\right) \propto 2^{-j(2H+1)}. \end{aligned} $$
(5.3)

The Hurst exponent can be estimated by taking logarithms of both sides of Eq. (5.3). The wavelet spectrum is defined by the sequence \(\left \{S(j)=\log \mathbb {E}\left (d_{jk}^2\right ), j\in \mathbb {Z}\right \}\). Fractional Brownian motion, denoted B H(t), is the unique Gaussian self-similar process with stationary increments (Abry et al. 2003; Abry 2003). The definition of the one-dimensional fBm can be extended to the multivariate case. In particular, a two-dimensional fBm, B H(t), for t ∈ [0, 1] × [0, 1] and H ∈ (0, 1), is a Gaussian process with stationary zero-mean increments, satisfying

$$\displaystyle \begin{aligned}B_H(a\boldsymbol{t})\overset{\mathrm{d}}{=}a^HB_H(\boldsymbol{t}).\end{aligned}$$

It can be shown that the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) defined in Eq. (5.2) satisfy

$$\displaystyle \begin{aligned}\log_2\mathbb{E}\left(|d_{j_1,j_2;\boldsymbol{k}}|{}^2\right)=-(2H+2)j+C,\end{aligned}$$

which defines the two-dimensional wavelet spectrum from which the Hurst exponent can be estimated. The methods proposed in the following sections build on, and improve upon, this spectrum.

3 General Trimean Estimators

Let X 1, X 2, …, X n be i.i.d. continuous random variables with pdf f(x) and cdf F(x). Let 0 < p < 1, and let ξ p denote the pth quantile of F, so that \(\xi _p = \inf \{x| F(x) \geq p\}.\) If F is continuous and strictly increasing, the pth quantile is simply defined by F(ξ p) = p.

Let Y p = X np⌋:n denote a sample pth quantile. Here ⌊np⌋ denotes the greatest integer that is less than or equal to np. The general trimean estimator is defined as a weighted average of the distribution’s median and its two quantiles Y p and Y 1−p, for p ∈ (0, 1∕2):

$$\displaystyle \begin{aligned} \hat{\mu}=\frac{\alpha}{2}\ Y_{p}+\left(1-\alpha\right)\ Y_{1/2}+\frac{\alpha}{2}\ Y_{1-p}. \end{aligned} $$
(5.4)

The two quantiles Y p and Y 1−p receive the same weight α∕2, with α ∈ [0, 1]. This is equivalent to a weighted sum of the median and the average of Y p and Y 1−p, with weights 1 − α and α:

$$\displaystyle \begin{aligned} \hat{\mu}=\left(1-\alpha\right)\ Y_{1/2}+\alpha\ \left(\frac{Y_{p}+Y_{1-p}}{2}\right). \end{aligned}$$
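The estimator of Eq. (5.4) can be sketched in a few lines. The following hypothetical code uses the sample quantile \(Y_p=X_{\lfloor np\rfloor :n}\) defined above; Tukey's and Gastwirth's weights, introduced below as special cases, appear here only as illustrative parameter choices:

```python
def sample_quantile(data, p):
    """Sample p-th quantile Y_p = X_{floor(np):n} (1-based order statistic)."""
    xs = sorted(data)
    k = max(int(len(xs) * p), 1)
    return xs[k - 1]

def general_trimean(data, alpha, p):
    """(alpha/2) Y_p + (1 - alpha) Y_{1/2} + (alpha/2) Y_{1-p}, as in Eq. (5.4)."""
    return ((alpha / 2) * sample_quantile(data, p)
            + (1 - alpha) * sample_quantile(data, 0.5)
            + (alpha / 2) * sample_quantile(data, 1 - p))

data = list(range(1, 101))                             # 1, 2, ..., 100
tukey = general_trimean(data, alpha=0.5, p=0.25)       # Tukey's trimean
gastwirth = general_trimean(data, alpha=0.6, p=1 / 3)  # Gastwirth estimator
print(tukey, gastwirth)  # both near the median of this symmetric sample
```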

This general trimean estimator turns out to be more robust than the mean but smoother than the median. To derive its asymptotic distribution, the asymptotic joint distribution of sample quantiles is needed; it is stated in Lemma 5.1, and a detailed proof can be found in DasGupta (2008).

Lemma 5.1

Consider r sample quantiles, \(Y_{p_1}, Y_{p_2},\ldots ,Y_{p_r}\) , where 0 < p 1 < p 2 < … < p r < 1. If \(\sqrt {n}\left (\lfloor np_i\rfloor /n-p_i\right )\to 0\) is satisfied for each 1 ≤ i ≤ r, then the asymptotic joint distribution of \(Y_{p_1}, Y_{p_2},\ldots ,Y_{p_r}\) is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\left[Y_{p_1}\ \cdots\ Y_{p_r}\right]^T-\left[\xi_{p_1}\ \cdots\ \xi_{p_r}\right]^T\right)\overset{\mathrm{d}}{\to}\mathcal{N}_r\left(\boldsymbol{0},\varSigma\right),\end{aligned}$$

where

$$\displaystyle \begin{aligned}\varSigma=\left(\sigma_{ij}\right)_{r\times r},\end{aligned}$$

and

$$\displaystyle \begin{aligned} \sigma_{ij}=\frac{p_i\left(1-p_j\right)}{f\left(x_{p_i}\right)f\left(x_{p_j}\right)},\ i\le j. \end{aligned} $$
(5.5)

From Lemma 5.1, the asymptotic distribution of the general trimean estimator is normal, since it is a linear combination of components that are jointly asymptotically normal. The general trimean estimator itself may be written in terms of order statistics as

$$\displaystyle \begin{aligned}\hat{\mu}=A\cdot\boldsymbol{y},\end{aligned}$$

where

$$\displaystyle \begin{aligned}A=\left[\frac{\alpha}{2}\ \ \ 1-\alpha\ \ \ \frac{\alpha}{2}\right],\ \ \mbox{and}\ \ \boldsymbol{y}=\left[Y_{p}\ \ \ Y_{1/2}\ \ \ Y_{1-p}\right]^T.\end{aligned}$$

It can be easily verified that \(\sqrt {n}\left (\lfloor pn\rfloor /n-p\right )\to 0\) for \(p\in \left (0,1/2\right ]\). If we denote by \(\boldsymbol {\xi }=\left [\xi _{p}\ \ \ \xi _{1/2}\ \ \ \xi _{1-p}\right ]^T\) the population quantiles, the asymptotic distribution of y is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\boldsymbol{y}-\boldsymbol{\xi}\right)\overset{\mathrm{d}}{\to}\mathcal{N}_3\left(\boldsymbol{0},\varSigma\right),\end{aligned}$$

where \(\varSigma =\left (\sigma _{ij}\right )_{3\times 3},\) and σ ij follows Eq. (5.5) for p 1 = p, p 2 = 1∕2, and p 3 = 1 − p. Therefore \(\hat {\mu }=A\cdot \boldsymbol {y}\) is asymptotically normal,

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}-A\cdot\boldsymbol{\xi}\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A\varSigma A^T\right),\end{aligned}$$

with the theoretical expectation and variance being

$$\displaystyle \begin{aligned} \mathbb{E}\left(\hat{\mu}\right)=\mathbb{E}\left(A\cdot\boldsymbol{y}\right)=A\cdot\mathbb{E}\left(\boldsymbol{y}\right)=A\cdot\boldsymbol{\xi}, \end{aligned} $$
(5.6)

and

$$\displaystyle \begin{aligned} \operatorname{\mathrm{Var}}\left(\hat{\mu}\right)=\operatorname{\mathrm{Var}}\left(A\cdot\boldsymbol{y}\right)=A\operatorname{\mathrm{Var}}\left(\boldsymbol{y}\right)A^T=\frac{1}{n}A\varSigma A^T. \end{aligned} $$
(5.7)
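The variance formula (5.7) can be checked numerically. The sketch below is hypothetical checking code, assuming standard normal data and Tukey's weights (α = 1∕2, p = 1∕4): it builds Σ from Eq. (5.5) and compares AΣA^T∕n with a Monte Carlo estimate of the variance of the trimean over repeated samples:

```python
import math
import random

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_quantile(p):
    """Standard normal quantile by bisection on the erf-based cdf."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, p = 0.5, 0.25                     # Tukey's weights
A = [alpha / 2, 1 - alpha, alpha / 2]
ps = [p, 0.5, 1 - p]
xi = [norm_quantile(q) for q in ps]
# sigma_ij = p_i (1 - p_j) / (f(xi_i) f(xi_j)) for i <= j, as in Eq. (5.5)
Sigma = [[min(ps[i], ps[j]) * (1 - max(ps[i], ps[j]))
          / (norm_pdf(xi[i]) * norm_pdf(xi[j])) for j in range(3)]
         for i in range(3)]
n = 400
asym_var = sum(A[i] * Sigma[i][j] * A[j]
               for i in range(3) for j in range(3)) / n   # Eq. (5.7)

def trimean(data):
    xs = sorted(data)
    m = len(xs)
    return sum(w * xs[max(int(m * q), 1) - 1] for w, q in zip(A, ps))

random.seed(1)
reps = [trimean([random.gauss(0, 1) for _ in range(n)]) for _ in range(3000)]
mean = sum(reps) / len(reps)
mc_var = sum((r - mean) ** 2 for r in reps) / (len(reps) - 1)
print(asym_var, mc_var)   # the two variances should nearly agree
```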

3.1 Tukey’s Trimean Estimator

Tukey’s trimean estimator is a special case of the general trimean estimator, with α = 1∕2 and p = 1∕4 in Eq. (5.4). To compute it, we first sort the data in ascending order. Next, we take the value one-fourth of the way up this sequence (the first quartile), the value halfway up the sequence (the median), and the value three-fourths of the way up the sequence (the third quartile). We then form the weighted average of these three values, giving the central (median) value a weight of 1∕2 and each of the two quartiles a weight of 1∕4.

If we denote Tukey’s trimean estimator as \(\hat {\mu }_T,\) then

$$\displaystyle \begin{aligned}\hat{\mu}_T=\frac{1}{4}\ Y_{1/4}+\frac{1}{2}\ Y_{1/2}+\frac{1}{4}\ Y_{3/4}.\end{aligned}$$

The asymptotic distribution is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}_T-A_T\boldsymbol{\xi}_T\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A_T\varSigma_T A_T^T\right),\end{aligned}$$

where \(A_T=\left [\frac {1}{4}\ \ \ \frac {1}{2}\ \ \ \frac {1}{4}\right ]\), \(\boldsymbol {\xi }_T=\left [\xi _{1/4}\ \ \ \xi _{1/2}\ \ \ \xi _{3/4}\right ]^T\), \(\varSigma _T=\left (\sigma _{ij}\right )_{3\times 3}\) is the covariance matrix of the asymptotic multivariate normal distribution, and σ ij follows Eq. (5.5) with p 1 = 1∕4, p 2 = 1∕2, and p 3 = 3∕4.

3.2 Gastwirth Estimator

Like Tukey’s estimator, the Gastwirth estimator is a special case of the general trimean estimator, with α = 0.6 and p = 1∕3 in Eq. (5.4).

If we denote this estimator as \(\hat {\mu }_G\), then

$$\displaystyle \begin{aligned}\hat{\mu}_G=0.3\ Y_{1/3}+0.4\ Y_{1/2}+0.3\ Y_{2/3}.\end{aligned}$$

The asymptotic distribution can be derived as

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}_G-A_G\boldsymbol{\xi}_G\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A_G\varSigma_G A_G^T\right),\end{aligned}$$

where \(A_G=\left [0.3\ \ \ 0.4\ \ \ 0.3\right ]\), \(\boldsymbol {\xi }_G=\left [\xi _{1/3}\ \ \ \xi _{1/2}\ \ \ \xi _{2/3}\right ]^T\), \(\varSigma _G=\left (\sigma _{ij}\right )_{3\times 3}\), and σ ij follows Eq. (5.5) with p 1 = 1∕3, p 2 = 1∕2, and p 3 = 2∕3.

4 Methods

Our proposal for robust estimation of the Hurst exponent H is based on the non-decimated wavelet transform (NDWT). In a J-depth decomposition of a 2-D fBm of size N × N, a scale-mixing 2-D NDWT generates (J + 1) × (J + 1) blocks of coefficients, each block of the same size as the original image, i.e., N × N. The tessellation of coefficients of the scale-mixing 2-D NDWT is shown in Fig. 5.1a. From the 2-D NDWT wavelet coefficients, our methods use the diagonal blocks (j 1 = j 2 = j) of the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) to estimate H, as shown in Fig. 5.1b.

Fig. 5.1
figure 1

(a) Four types of wavelet coefficients with their locations in the tessellation of a 2-D scale mixing NDWT of depth of 3 (J = 3), with each block the size of N × N. Coefficients c represent the coarsest approximation, h and v are the mix of coarse and detail information, and d carry detail information only. (b) Detail coefficients d and its diagonal blocks corresponding to 3 (J = 3) levels. (c) Symmetric random sampling from level-1 (j = 1) diagonal block divided into 6 × 6 (M = 6) grids

At each detail level j, the corresponding level-j diagonal block is of size N × N, the same size as the original image. Note that the coefficients d j,j;k in a level-j diagonal block are not independent; however, their autocorrelations decay exponentially, that is, they possess only short memory. We reduce this within-block dependence by dividing the block into M × M equal grids and then randomly sampling one coefficient from each grid, thereby increasing the distance between two consecutive coefficients. To improve efficiency, we apply symmetric sampling. Specifically, we partition the level-j diagonal block into four equal parts (top left, top right, bottom left, and bottom right), sample only from the M 2∕4 grids of the top-left part, and then take the corresponding coefficients that occupy the same locations in the other three parts, as shown in Fig. 5.1c.

Assume the coefficient \(d_{j,j;(k_{i1},k_{i2})}\) is randomly sampled from grid \(i\in \{1,\ldots ,\frac {M^2}{4}\}\) in the top-left part of the level-j diagonal block, with \(k_{i1}, k_{i2} \in \{1,2,\ldots ,\frac {N}{2}\}\) being the corresponding location indices. Then we extract the corresponding coefficients \(d_{j,j;(k_{i1},k_{i2}+\frac {N}{2})}\), \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2})}\), and \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2}+\frac {N}{2})}\) from the top-right, bottom-left, and bottom-right parts, respectively. From the set

$$\displaystyle \begin{aligned}\{d_{j,j;(k_{i1},k_{i2})}, d_{j,j;(k_{i1},k_{i2}+\frac{N}{2})}, d_{j,j;(k_{i1}+\frac{N}{2},k_{i2})}, d_{j,j;(k_{i1}+\frac{N}{2},k_{i2}+\frac{N}{2})}\},\end{aligned}$$

we could generate two mid-energies as

$$\displaystyle \begin{aligned} \begin{aligned} &D_{i, j}=\frac{d_{j,j;(k_{i1},k_{i2})}^2+d_{j,j;(k_{i1}+\frac{N}{2},k_{i2}+\frac{N}{2})}^2}{2}\\ &D_{i, j}^{\prime}=\frac{d_{j,j;(k_{i1},k_{i2}+\frac{N}{2})}^2+d_{j,j;(k_{i1}+\frac{N}{2},k_{i2})}^2}{2},\ i\in\{1,\ldots,\frac{M^2}{4}\}, \end{aligned} \end{aligned} $$
(5.8)

where D i,j and \(D_{i, j}^{\prime }\) denote the two mid-energies corresponding to grid i at level j. If we denote D j as the set of all mid-energies at level j, then

$$\displaystyle \begin{aligned} D_{j}=\{D_{1, j}, D_{1, j}^{\prime}, D_{2, j}, D_{2, j}^{\prime},\ldots, D_{\frac{M^2}{4}, j}, D_{\frac{M^2}{4}, j}^{\prime}\}. \end{aligned} $$
(5.9)

The M 2∕2 mid-energies at each level j are treated as if they were independent. Note that M must be even.
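The symmetric sampling scheme and the mid-energies of Eqs. (5.8)–(5.9) can be sketched as follows. This is hypothetical code: the block is filled with white noise purely to exercise the indexing, not to model actual fBm wavelet coefficients:

```python
import random

def mid_energies(block, M, rng):
    """Mid-energies of Eqs. (5.8)-(5.9) for one level-j diagonal block (N x N):
    sample one coefficient per grid of the top-left quadrant, mirror it to the
    other three quadrants, and pair the diagonal / anti-diagonal squared values."""
    N = len(block)
    assert M % 2 == 0 and N % M == 0
    g = N // M                             # side length of one grid
    half = N // 2
    D = []
    for gi in range(M // 2):               # grids of the top-left quadrant
        for gj in range(M // 2):
            k1 = gi * g + rng.randrange(g)
            k2 = gj * g + rng.randrange(g)
            a = block[k1][k2]                  # top left
            b = block[k1][k2 + half]           # top right
            c = block[k1 + half][k2]           # bottom left
            d = block[k1 + half][k2 + half]    # bottom right
            D.append((a * a + d * d) / 2)      # D_{i,j}
            D.append((b * b + c * c) / 2)      # D'_{i,j}
    return D

rng = random.Random(0)
N, M = 64, 8
block = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(N)]
D_j = mid_energies(block, M, rng)
print(len(D_j))  # M^2 / 2 = 32 mid-energies
```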

Our methods come in two versions: one based on the mid-energies D j, the other on the logged mid-energies \(\log {D_{j}}\). First, the distribution of D j (respectively, \(\log {D_{j}}\)) is derived under the approximation that \(d_{j,j;(k_{i1},k_{i2})}\), \(d_{j,j;(k_{i1},k_{i2}+\frac {N}{2})}\), \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2})}\), and \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2}+\frac {N}{2})}\) are independent. Next, we compute the general trimean estimators of the level-wise derived distributions to estimate H.

4.1 General Trimean of the Mid-energy (GTME) Method

At each decomposition level j, the asymptotic distribution of the general trimean estimator based on the M 2∕2 mid-energies in D j is derived, from which we find the relationship between the general trimean estimators and H. The general trimean of the mid-energy (GTME) method is described in the following theorem.

Theorem 5.1

Let \(\hat {\mu }_{j}\) be the general trimean estimator based on the M 2∕2 mid-energies in D j defined by (5.9) at level j in a J-level NDWT of a 2-D fBm of size N × N with Hurst exponent H. Then, the asymptotic distribution of \(\hat {\mu }_{j}\) is normal,

$$\displaystyle \begin{aligned}\hat{\mu}_{j}\sim\mathcal{N}\left(c\left(\alpha,p\right)\lambda_j,\ \frac{2f\left(\alpha,p\right)}{M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.10)

where

$$\displaystyle \begin{aligned}c\left(\alpha, p\right)= \frac{\alpha}{2}\log\left(\frac{1}{p\left(1-p\right)}\right)+\left(1-\alpha\right)\log2,\end{aligned}$$
$$\displaystyle \begin{aligned}f\left(\alpha,p\right)= \frac{\alpha(1-2p)(\alpha-4p)}{4p(1-p)}+1,\end{aligned}$$
$$\displaystyle \begin{aligned}\lambda_j=\sigma^2\cdot2^{-\left(2H+2\right)j},\end{aligned}$$

and σ 2 is the variance of the wavelet coefficients at level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}=-\frac{\hat{\beta}}{2}-1, \end{aligned} $$
(5.11)

where \(\hat {\beta }\) is the slope of the least-squares linear regression on pairs \(\left (j, \log _2\left (\hat {\mu }_{j}\right )\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimator \(\hat {H}\) follows the asymptotic normal distribution

$$\displaystyle \begin{aligned}\hat{H}\sim\mathcal{N}\left(H,\ V_1\right),\end{aligned}$$
(5.12)

where the asymptotic variance V 1 is a constant independent of the sample size N and level j,

$$\displaystyle \begin{aligned} V_1=\frac{6f(\alpha,p)}{(\log2)^2M^2c^2\left(\alpha,p\right)q(J_1,J_2)}, \end{aligned}$$

and

$$\displaystyle \begin{aligned} q(J_1,J_2)=(J_2-J_1)(J_2-J_1+1)(J_2-J_1+2). \end{aligned} $$
(5.13)

The proof of Theorem 5.1 is deferred to the Appendix.
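Although Theorem 5.1 concerns 2-D fBm, the regression step can be illustrated with a hypothetical 1-D analogue, assembled from pieces already introduced: decimated Haar details of a simulated Brownian motion (H = 0.5), SSB-style mid-energies, a Gastwirth trimean per level, and a least-squares slope. For 1-D fBm, \(\mathbb {E}(d_j^2)\propto 2^{-j(2H+1)}\); with levels indexed from fine to coarse the slope becomes 2H + 1, so H = (slope − 1)∕2 replaces Eq. (5.11). This sketch is illustrative code under those assumptions, not the paper's 2-D procedure:

```python
import math
import random

def haar_details(signal, levels):
    """Decimated Haar detail coefficients, keyed by coarseness level m."""
    out, approx = {}, list(signal)
    for m in range(1, max(levels) + 1):
        a, d = [], []
        for i in range(0, len(approx) - 1, 2):
            a.append((approx[i] + approx[i + 1]) / math.sqrt(2))
            d.append((approx[i] - approx[i + 1]) / math.sqrt(2))
        approx = a
        if m in levels:
            out[m] = d
    return out

def gastwirth(data):
    xs = sorted(data)
    n = len(xs)
    q = lambda p: xs[max(int(n * p), 1) - 1]
    return 0.3 * q(1 / 3) + 0.4 * q(0.5) + 0.3 * q(2 / 3)

random.seed(11)
N = 2 ** 14
bm, s = [], 0.0
for _ in range(N):                                  # Brownian motion, H = 0.5
    s += random.gauss(0.0, 1.0 / math.sqrt(N))
    bm.append(s)

pairs = []
for m, d in haar_details(bm, range(3, 10)).items():
    half = len(d) // 2
    mids = [(d[k] ** 2 + d[k + half] ** 2) / 2 for k in range(half)]  # SSB pairs
    pairs.append((m, math.log2(gastwirth(mids))))

mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
beta = (sum((x - mx) * (y - my) for x, y in pairs)
        / sum((x - mx) ** 2 for x, _ in pairs))
H_hat = (beta - 1) / 2    # 1-D analogue of H = -beta/2 - 1 in Eq. (5.11)
print(round(H_hat, 2))    # close to the true H = 0.5
```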

To find the optimal α and p that minimize the asymptotic variance of \(\hat {\mu }_{j}\), we take partial derivatives of \(f\left (\alpha , p\right )\) with respect to α and p and set them to 0. The optimal \(\hat {\alpha }\) and \(\hat {p}\) are obtained by solving

$$\displaystyle \begin{aligned} \begin{aligned} &\frac{\partial f\left(\alpha,p\right)}{\partial\alpha}=-\frac{2p-1}{2p\left(1-p\right)}\alpha+\frac{1+p}{2\left(1-p\right)}-\frac{3}{2}=0, \\ &\frac{\partial f\left(\alpha,p\right)}{\partial p}=\frac{\alpha\left(2-\alpha\right)}{2\left(1-p\right)^2}+\frac{\alpha^2\left(2p-1\right)}{4p^2\left(1-p\right)^2}=0.\\ \end{aligned} \end{aligned} $$
(5.14)

Since α ∈ [0, 1] and \(p\in \left (0,1/2\right )\), we get the unique solution α = 2p ≈ 0.6 and \(p=1-\sqrt {2}/2\approx 0.3\). The Hessian matrix of \(f\left (\alpha , p\right )\) is

$$\displaystyle \begin{aligned}H\left(f\right)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial\alpha^2} & \dfrac{\partial^2 f}{\partial\alpha\,\partial p}\\ \dfrac{\partial^2 f}{\partial p\,\partial\alpha} & \dfrac{\partial^2 f}{\partial p^2}\end{bmatrix},\qquad \dfrac{\partial^2 f}{\partial\alpha^2}=-\frac{2p-1}{2p\left(1-p\right)}.\end{aligned}$$

Since \(-\frac {2p-1}{2p\left (1-p\right )}>0\) and the determinant is 5.66 > 0 at α = 2p ≈ 0.6 and \(p=1-\sqrt {2}/2\approx 0.3\), the Hessian matrix is positive definite there. Therefore, \(\hat {\alpha }=2-\sqrt {2}\) and \(\hat {p}=1-\sqrt {2}/2\) provide the global minimum of \(f\left (\alpha , p\right )\), minimizing also the asymptotic variance of \(\hat {\mu }_{j}\). Comparing the optimal \(\hat {\alpha }\approx 0.6\) and \(\hat {p}\approx 0.3\) with α = 0.6 and p = 1∕3 of the Gastwirth estimator, we find, curiously, that the optimal general trimean estimator is very close to the Gastwirth estimator.
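The claimed optimum \(\alpha = 2-\sqrt {2}\), \(p=1-\sqrt {2}/2\) can also be verified by brute force. The following hypothetical checking code grid-searches f(α, p) from Theorem 5.1 over α ∈ [0, 1] and p ∈ (0, 0.5):

```python
import math

def f(alpha, p):
    """Variance factor f(alpha, p) of Theorem 5.1."""
    return alpha * (1 - 2 * p) * (alpha - 4 * p) / (4 * p * (1 - p)) + 1

# brute-force grid search over alpha in [0, 1], p in (0, 0.5), step 0.001
fmin, a_opt, p_opt = min(
    (f(a / 1000, q / 1000), a / 1000, q / 1000)
    for a in range(0, 1001) for q in range(5, 500))
print(a_opt, p_opt, round(fmin, 4))   # minimum value 0.8284 = 2*sqrt(2) - 2

alpha_star = 2 - math.sqrt(2)         # claimed optimum, about 0.586
p_star = 1 - math.sqrt(2) / 2         # claimed optimum, about 0.293
```

The grid minimum lands on the analytic optimum, and the minimal value \(f(\hat {\alpha },\hat {p})=2\sqrt {2}-2\approx 0.8284\) is slightly below the Gastwirth value f(0.6, 1∕3) = 0.835.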

4.2 General Trimean of the Logarithm of Mid-energy (GTLME) Method

The previously discussed GTME method computes the general trimean estimator of the mid-energies first and then takes the logarithm. In this section, we instead compute the general trimean estimator of the logged mid-energies at each level j. The following theorem describes the general trimean of the logarithm of mid-energy (GTLME) method.

Theorem 5.2

Let \(\hat {\mu }_{j}\) be the general trimean estimator based on \(\log (D_j)\) , the set of M 2∕2 logged mid-energies at level j in a J-level NDWT of a 2-D fBm of size N × N with Hurst exponent H, and 1 ≤ j ≤ J. Then, the asymptotic distribution of \(\hat {\mu }_{j}\) is normal,

$$\displaystyle \begin{aligned}\hat{\mu}_{j}\sim\mathcal{N}\left(\log\lambda_j+c\left(\alpha,p\right),\ \frac{2f\left(\alpha,p\right)}{M^2}\right),\end{aligned}$$
(5.15)

where

$$\displaystyle \begin{aligned}c\left(\alpha, p\right)=\frac{\alpha}{2}\log\left(\log\frac{1}{1-p}\cdot\log\frac{1}{p}\right)+\left(1-\alpha\right)\log\left(\log2\right),\end{aligned}$$
$$\displaystyle \begin{aligned}f\left(\alpha, p\right)=\frac{\alpha^2}{4g_1\left(p\right)}+\frac{\alpha\left(1-\alpha\right)}{2g_2\left(p\right)}+\frac{\left(1-\alpha\right)^2}{\left(\log2\right)^2},\end{aligned}$$

\(g_1\left (p\right )\) and \(g_2\left (p\right )\) are two functions of p given in the Appendix,

$$\displaystyle \begin{aligned}\lambda_j=\sigma^2\cdot2^{-\left(2H+2\right)j},\end{aligned}$$

and σ 2 is the variance of wavelet coefficients from level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}=-\frac{1}{2\log2}\hat{\beta}-1, \end{aligned} $$
(5.16)

where \(\hat {\beta }\) is the slope of the least-squares linear regression on pairs \(\left (j, \hat {\mu }_{j}\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimator \(\hat {H}\) follows the asymptotic normal distribution

$$\displaystyle \begin{aligned}\hat{H}\sim\mathcal{N}\left(H,\ V_2\right),\end{aligned}$$
(5.17)

where the asymptotic variance V 2 is a constant independent of the sample size N and level j,

$$\displaystyle \begin{aligned} V_2=\frac{6f(\alpha,p)}{(\log2)^2M^2q(J_1,J_2)}, \end{aligned}$$

and q(J 1, J 2) is given in Eq.(5.13).

The proof of Theorem 5.2 is provided in the Appendix. Similarly as for the GTME method, the optimal α and p minimizing the asymptotic variance of \(\hat {\mu }_{j}\) can be obtained by solving

$$\displaystyle \begin{aligned} \frac{\partial f\left(\alpha,p\right)}{\partial\alpha}=0, \ \mbox{and}\ \frac{\partial f\left(\alpha,p\right)}{\partial p}=0. \end{aligned} $$
(5.18)

From the first equation in (5.18) it can be derived that

$$\displaystyle \begin{aligned}\alpha=\frac{\frac{2}{\left(\log2\right)^2}-\frac{1}{2}g_2\left(p\right)}{\frac{1}{2}g_1\left(p\right)-g_2\left(p\right)+\frac{2}{\left(\log2\right)^2}}.\end{aligned}$$

The second equation in (5.18) cannot be reduced to a closed form. As an illustration, we plot \(f\left (\alpha ,p\right )\) with p ranging from 0 to 0.5 and α expressed as the above function of p; the plot of α against p is also shown in Fig. 5.2. Numerical computation gives \(\hat {\alpha }=0.5965\) and \(\hat {p}=0.24\). These optimal parameters are close to α = 0.5 and p = 0.25 of Tukey's trimean estimator, but put somewhat more weight on the median.

Fig. 5.2
figure 2

Plot of \(f \left (\alpha , p \right )\) against p on the left; plot of α against p on the right

4.3 Special Cases: Tukey’s Trimean and Gastwirth Estimators

The Tukey’s trimean of the mid-energy (TTME) and Gastwirth of the mid-energy (GME) methods are described in the following lemma.

Lemma 5.2

Let \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) be the Tukey’s trimean and Gastwirth estimators based on D j defined in (5.9). Then the asymptotic distributions of \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) are normal:

$$\displaystyle \begin{aligned}\hat{\mu}_{j}^T\sim\mathcal{N}\left(c_1\lambda_j,\ \frac{5}{3M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.19)
$$\displaystyle \begin{aligned}\hat{\mu}_{j}^G\sim\mathcal{N}\left(c_2\lambda_j,\ \frac{5.01}{3M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.20)

where c 1 and c 2 are constants given in the Appendix, \(\lambda _j=\sigma ^2\cdot 2^{-\left (2H+2\right )j}\) , and σ 2 is the variance of the wavelet coefficients at level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}^T=-\frac{\hat{\beta}^T}{2}-1, \ \mathit{\mbox{and}}\ \hat{H}^G=-\frac{\hat{\beta}^G}{2}-1, \end{aligned} $$
(5.21)

where \(\hat {\beta }^T\) and \(\hat {\beta }^G\) are the slopes of the least-squares linear regressions on pairs \(\left (j, \log _2\left (\hat {\mu }_{j}^T\right )\right )\) and pairs \(\left (j, \log _2\left (\hat {\mu }_{j}^G\right )\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimators \(\hat {H}^T\) and \(\hat {H}^G\) follow the asymptotic normal distributions

$$\displaystyle \begin{aligned}\hat{H}^T\sim\mathcal{N}\left(H,\ V^T_1\right)\ \ \mbox{and}\ \ \hat{H}^G\sim\mathcal{N}\left(H,\ V^G_1\right),\end{aligned}$$
(5.22)

where the asymptotic variances \(V^T_1\) and \(V^G_1\) are constants,

$$\displaystyle \begin{aligned} V^T_1=\frac{5}{(\log2)^2M^2c^2_1q(J_1,J_2)}, \end{aligned}$$
$$\displaystyle \begin{aligned} V^G_1=\frac{5.01}{(\log2)^2M^2c^2_2q(J_1,J_2)}. \end{aligned}$$

The function \(q(J_1, J_2)\) is the same as in Eq. (5.13) of Theorem 5.1.
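The TTME recipe of Lemma 5.2 can be sketched in a few lines: compute the level-wise mid-energies, take Tukey's trimean at each level, and regress \(\log_2\) of the trimeans on the level index, as in Eq. (5.21). The following is a minimal sketch (function and variable names are ours, not from the paper), assuming the level-j detail coefficients have variance \(\lambda_j=\sigma^2\, 2^{-(2H+2)j}\):

```python
import numpy as np

def tukey_trimean(x):
    """Tukey's trimean: 0.25*Q1 + 0.5*median + 0.25*Q3."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return 0.25 * q1 + 0.5 * med + 0.25 * q3

def estimate_H_ttme(detail_by_level, levels):
    """TTME estimate of H, following Eq. (5.21).

    detail_by_level[j] holds the level-j NDWT detail coefficients;
    the regression runs over the levels in `levels` (J1 <= j <= J2).
    """
    y = []
    for j in levels:
        d = np.asarray(detail_by_level[j])
        # mid-energy: average of squares of adjacent coefficients
        D = (d[0::2] ** 2 + d[1::2] ** 2) / 2.0
        y.append(np.log2(tukey_trimean(D)))
    beta = np.polyfit(levels, y, 1)[0]  # slope of log2-trimean vs. level
    return -beta / 2.0 - 1.0
```

Since the trimean of \(D_j\) scales as \(\lambda_j\), the regression slope is \(-(2H+2)\), and \(\hat H^T=-\hat\beta^T/2-1\) recovers H.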

The following lemma describes the Tukey’s trimean of the logarithm of the mid-energy (TTLME) and the Gastwirth of the logarithm of the mid-energy (GLME) methods.

Lemma 5.3

Let \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) be the Tukey’s trimean and Gastwirth estimators based on \(\log (D_j)\), defined in Theorem 5.2. The asymptotic distributions of \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) are normal,

(5.23)
(5.24)

where \(c_3\), \(V_T\), \(c_4\), and \(V_G\) are constants given in the Appendix. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}^T=-\frac{\hat{\beta}^T}{2\log2}-1,\ \mathit{\mbox{and}}\ \hat{H}^G=-\frac{\hat{\beta}^G}{2\log2}-1, \end{aligned} $$
(5.25)

where \(\hat {\beta }^T\) and \(\hat {\beta }^G\) are the slopes from the least squares linear regression on the pairs \(\left (j, \hat {\mu }_{j}^T\right )\) and \(\left (j, \hat {\mu }_{j}^G\right )\) over levels \(J_1 \le j \le J_2\). The estimators \(\hat {H}^T\) and \(\hat {H}^G\) follow the asymptotic normal distributions

(5.26)

where the asymptotic variances \(V^T_2\) and \(V^G_2\) are constants,

$$\displaystyle \begin{aligned} V^T_2=\frac{3V_T}{(\log2)^2q(J_1,J_2)}, \end{aligned}$$
$$\displaystyle \begin{aligned} V^G_2=\frac{3V_G}{(\log2)^2q(J_1,J_2)}. \end{aligned}$$

The function \(q(J_1, J_2)\) is provided in Eq. (5.13).
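Analogously, the logged-mid-energy recipe of Lemma 5.3 takes the trimean of \(\log(D_j)\) and rescales the regression slope by \(2\log 2\), as in Eq. (5.25). A minimal sketch using the Gastwirth weights (0.3, 0.4, 0.3 on the two tertiles and the median); function names are ours:

```python
import numpy as np

def gastwirth(x):
    """Gastwirth estimator: 0.3*Q(1/3) + 0.4*median + 0.3*Q(2/3)."""
    q13, med, q23 = np.percentile(x, [100.0 / 3, 50, 200.0 / 3])
    return 0.3 * q13 + 0.4 * med + 0.3 * q23

def estimate_H_glme(detail_by_level, levels):
    """GLME estimate of H, following Eq. (5.25)."""
    y = []
    for j in levels:
        d = np.asarray(detail_by_level[j])
        D = (d[0::2] ** 2 + d[1::2] ** 2) / 2.0  # mid-energies
        y.append(gastwirth(np.log(D)))           # trimean of log(D_j)
    beta = np.polyfit(levels, y, 1)[0]
    return -beta / (2.0 * np.log(2.0)) - 1.0
```

Here \(\log D_j = \log\lambda_j + \log X\) for a level-independent \(X\), so the slope in natural-log units is \(-(2H+2)\log 2\), and \(\hat H^G=-\hat\beta^G/(2\log 2)-1\).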

The proofs of Lemmas 5.2 and 5.3 are provided in the Appendix. To verify the asymptotic normal distributions of the estimators in Lemmas 5.2 and 5.3, we perform an NDWT of depth 10 on 300 simulated fBm’s with H = 0.3 and use the resulting wavelet coefficients from levels 4 to 10 inclusive to estimate H. Figure 5.3 shows the histograms and theoretical distributions of \(\hat {H}\) using the TTME, TTLME, GME, and GLME methods, respectively.

Fig. 5.3
figure 3

Histograms and theoretical distributions of \(\hat {H}\)

5 Simulation

We simulate 2-D fBm of size \(2^{10} \times 2^{10}\) (\(N = 2^{10}\)) with Hurst exponents H = 0.3, 0.5, 0.7, 0.8, and 0.9. An NDWT of depth J = 10 using the Haar wavelet is performed on each simulated signal to obtain wavelet coefficients. The two-dimensional fBm signals were simulated by the method of Wood and Chan (1994).

The proposed methods (with six variations) are applied to the NDWT detail coefficients to estimate the Hurst exponent H. Each level’s diagonal block is divided into a 16 × 16 grid (M = 16) for all proposed methods, and we use wavelet coefficients from levels 4 to 10 for the least squares linear regression. The estimation performance of the proposed methods is compared to five existing methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the MEDL method, the MEDLA method, and the Theil-type regression (TT) method. The GTME and GTLME methods are based on the optimal parameters that minimize the asymptotic variances. Estimation performance is reported in terms of mean, variance, and mean square error (MSE), based on 300 repetitions for each case.

The simulation results are shown in Table 5.1. For each H (corresponding to each row of the table), the smallest variances and MSEs are highlighted in bold. From the simulation results, all six of our variations outperform the SSB, MEDL, MEDLA, and TT methods for all values of H in terms of variance and MSE. Compared with the VA method, our methods yield significantly smaller variances and MSEs when H > 0.5; when H = 0.3, our methods are still comparable to VA. Although our six variations perform very similarly in terms of variance and MSE, the TTME method, based on the Tukey’s trimean estimator of the mid-energy, performs best among them. The variances of GTME, based on the optimal parameters, are very close or equal to those of the GME and TTME methods in most cases. Moreover, in most cases the optimized GTLME method has smaller variances than the other logged mid-energy methods, TTLME and GLME; however, this advantage is not significant, since the variances are close to each other.

Table 5.1 Simulation results for \(2^{10} \times 2^{10}\) fBm using the Haar wavelet (300 replications)

6 Application

In this section, we apply the proposed methodology to the classification of digitized mammogram images. The digitized mammograms were obtained from the University of South Florida’s Digital Database for Screening Mammography (DDSM) (Heath et al. 2000). All cases examined had biopsy results, which served as ground truth. Researchers used the HOWTEK scanner at the full 43.5-micron-per-pixel spatial resolution to scan 45 mammograms from patients with normal studies (control group) and 79 from patients with confirmed breast cancer (study group). Figure 5.4 shows an example mammogram from the study group; it is almost impossible for physicians to distinguish a cancerous mammogram from a non-cancerous one by eye. Each subject has two mammograms from a screening exam, one craniocaudal projection for each breast. We keep only one projection per subject, either the right or the left breast image. A sub-image of size 1024 × 1024 was extracted manually from each mammogram.

Fig. 5.4
figure 4

An example of mammograms with breast cancer

Our methods were then applied to each sub-image to estimate the Hurst exponent for each subject. Specifically, an NDWT of depth J = 10 using the Haar wavelet was performed on each sub-image to obtain wavelet coefficients. The proposed methods (with six variations) were applied to the NDWT detail coefficients to estimate the Hurst exponent H. Each level’s diagonal block was divided into a 16 × 16 grid (M = 16) for all proposed methods, and we used levels 4 to 10 for the least squares linear regression. The Veitch and Abry (VA), Soltani, Simard, and Boichu (SSB), MEDL, MEDLA, and Theil-type regression (TT) methods were applied as well for comparison with our methods.

Table 5.2 provides descriptive statistics of the estimated Hurst exponent \(\hat {H}\) in each group, using our proposed methods and the other standard methods for comparison. To visualize the difference in \(\hat {H}\) between the cancer and non-cancer groups, we present in Fig. 5.5 the boxplots of the estimated H and the fitted normal density curves for the two groups, based on the proposed GME method. As can be seen, the non-cancer group exhibits smaller mean and median values of \(\hat {H}\) and a slightly larger variance. Images with a smaller Hurst exponent tend to be more disordered and unsystematic; thus, healthy individuals tend to have rougher breast tissue images.

Fig. 5.5
figure 5

Using GME method to estimate Hurst exponent, boxplots in cancer and non-cancer groups on the left; normal density curves fitted in cancer and non-cancer groups on the right

Table 5.2 Descriptive statistics group summary

For subject i, we generated the data \(\{Y_i, H_i\}\), where \(H_i\) is the estimated Hurst exponent and \(Y_i\) is the indicator of disease status, with 1 and 0 signifying cancer and non-cancer, respectively. The subjects were classified using a logistic regression model treating \(H_i\) as the predictor and \(Y_i\) as the response. The overall classification accuracy, true positive rate (sensitivity), and true negative rate (specificity) were obtained using fourfold cross-validation. Instead of a constant 0.5 threshold, we used an adaptive threshold determined from the training data: in each fold, the threshold of the logistic regression was chosen to maximize the Youden index on the training set and then applied to the test set for classification.
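The adaptive cutoff can be illustrated with a short sketch (names are ours, purely illustrative): given the fitted probabilities on a training fold, the threshold maximizing Youden's J = sensitivity + specificity − 1 is selected and then reused on the held-out fold.

```python
import numpy as np

def youden_threshold(probs, labels):
    """Return the cutoff on `probs` that maximizes Youden's J on this set.

    probs  : fitted probabilities from the logistic model
    labels : 0/1 disease indicators (1 = cancer)
    """
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(probs):             # candidate cutoffs
        pred = probs >= t
        sens = np.mean(pred[labels == 1])  # true positive rate
        spec = np.mean(~pred[labels == 0]) # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

In a fourfold scheme, `youden_threshold` would be called on each training fold, and the returned cutoff used to classify the corresponding test fold.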

Table 5.3 summarizes the classification results for each estimation method. The best classification rate (0.6538) and sensitivity (0.7217) were both achieved using the GME estimator, and the best specificity (0.5530) was achieved using the TT or TTME estimator (highlighted in bold). In general, the six variations of our robust method performed better than the other methods in the classification of breast cancer from mammograms.

Table 5.3 Results of classification by logistic regression

Real-world images such as mammograms may exhibit non-stationary behavior, including extreme values, which produce outlier coefficients within multiresolution levels after the NDWT. The VA method estimates H by weighted least squares regression on the level-wise \(\log _2\left (\overline {d_{j,j}^2}\right )\), and the SSB method uses \(\overline {\log _2 D_{j}}\), with \(D_j\) defined in (5.9); both use the mean of derived distributions of level-wise detail coefficients and are therefore easily affected by within-level outliers. Additional outliers can arise when the logarithmic transform is applied to coefficients whose magnitude is close to zero. Like the VA method, the TT method regresses the level-wise \(\log _2\left (\overline {d_{j,j}^2}\right )\) against the level indices, but instead of weighted least squares regression it uses Theil-type weighted regression, a weighted average of the slopes between all pairs of regression points, which makes it less sensitive to outlier levels; it is still not robust to within-level outlier coefficients, however. The MEDL and MEDLA methods use the median of the derived distribution instead of the mean; although the median is outlier-resistant, it can behave unexpectedly because of its non-smooth character. To improve on this, our methods (six variations) apply the general trimean estimator to the non-decimated wavelet detail coefficients of the transformed data, combining the median’s emphasis on central values with the quantiles’ attention to the extremes. Moreover, in our setting, Theil-type regression is equivalent to least squares regression, since the variance of the pairwise slopes is independent of level and sample size. These considerations explain why our robust methods performed best in the classification of mammograms.
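The robustness argument above can be made concrete with a toy experiment (entirely illustrative, not from the paper): injecting a handful of extreme values into a standard normal sample drags the sample mean far more than Tukey's trimean.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
x[:10] = 50.0                      # 1% extreme outlier "coefficients"

def tukey_trimean(v):
    """Tukey's trimean: 0.25*Q1 + 0.5*median + 0.25*Q3."""
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return 0.25 * q1 + 0.5 * med + 0.25 * q3

# The mean is pulled toward the outliers by roughly 10*50/1000 = 0.5,
# while the quantile-based trimean stays near the true center 0.
print("mean:", np.mean(x), " trimean:", tukey_trimean(x))
```

The same mechanism protects the level-wise location estimates of the TTME/GME family from within-level outlier coefficients.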

7 Conclusions

In this paper, we proposed methodologies and derived six variations to improve the robustness of Hurst exponent estimation in the two-dimensional setting. The non-decimated wavelet transform (NDWT) is utilized for its redundancy and time-invariance. Instead of using the mean or median of the derived distribution of level-wise wavelet coefficients, we defined general trimean estimators that combine the median’s emphasis on central values with the quantiles’ attention to the extremes, and applied them to the level-wise derived distributions to estimate H.

The proposed variations were: (1) the Tukey’s trimean of the mid-energy (TTME) method; (2) the Tukey’s trimean of the logged mid-energy (TTLME) method; (3) the Gastwirth of the mid-energy (GME) method; (4) the Gastwirth of the logged mid-energy (GLME) method; (5) the general trimean of the mid-energy (GTME) method; and (6) the general trimean of the logarithm of the mid-energy (GTLME) method. The GTME and GTLME methods use the optimal parameters of the general trimean estimators, derived to minimize the asymptotic variances. The Tukey’s trimean and Gastwirth estimators are two special cases within the general trimean framework. These estimators are applied to both the mid-energy (as defined by Soltani et al. 2004) and the logarithm of the mid-energy in each level’s diagonal block of NDWT detail coefficients. The estimation performance of the proposed methods was compared to five existing methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the MEDL method, the MEDLA method, and the Theil-type regression (TT) method.

Simulation results indicate that all six of our variations outperform the SSB, MEDL, MEDLA, and TT methods for all values of H in terms of variance and MSE. Compared with the VA method, our methods yield significantly smaller variances and MSEs when H > 0.5, and remain comparable to VA when H = 0.3. Although the six variations perform very similarly, the TTME method, based on the Tukey’s trimean estimator of the mid-energy, performs best among them.

The proposed methods have been applied to digitized mammograms to classify patients with and without breast cancer. Our methods helped differentiate individuals based on the estimated Hurst parameters \(\hat {H}\): higher values of \(\hat {H}\) were found in the cancer group, indicating that individuals with breast cancer have smoother breast tissue images. This increase of regularity with increasing degree of pathology is common for many other biometric signals: EEG, EKG, high-frequency protein mass spectra, and high-resolution medical images of tissue, to name a few.