1 Introduction

Breast cancer is one of the major health concerns among women. The National Cancer Institute estimates that 1 in 8 women will be diagnosed with breast cancer during their lifetime. Early detection is proven to be the best strategy for improving prognosis. Most of the references dealing with automated breast cancer detection are based on microcalcifications (El-Naqa et al. 2002; Kestener et al. 2011; Bala and Audithan 2014; Netsch and Peitgen 1999; Wang and Karayiannis 1998). Recently, predicting disease from image data has become an active research area in statistics and machine learning (Reiss and Ogden 2010; Zhou et al. 2013; Zipunnikov et al. 2011; Reiss et al. 2005). For example, Reiss and Ogden (2010) proposed a functional generalized linear regression model with images as predictors. However, predicting breast cancer directly from tissue images is a black-box approach: physicians have a hard time summarizing the common features of the cancerous images, and the predictions are not easily interpreted. In this paper, we extract the scaling information from the tissue image and then predict breast cancer based on the estimated scaling parameter. It has been found in the literature that scaling information is efficient and accurate in the early detection of breast cancer (Hamilton et al. 2011; Nicolis et al. 2011; Ramírez-Cobo and Vidakovic 2013; Jeon et al. 2014). In fact, regular scaling is a common phenomenon in high-frequency signals and high-resolution digital images collected in real life. Examples can be found in a variety of fields including economics, telecommunications, physics, geosciences, as well as in biology and medicine (Feng and Vidakovic 2017; Engel Jr et al. 2009; Gregoriou et al. 2009; Katul et al. 2001; Park and Willinger 2000; Woods et al. 2016; Zhou 1996).

The standard measure of regular scaling is the Hurst exponent, denoted by H in the sequel. Recall that a stochastic process \(\left \{X\left (\boldsymbol {t}\right ), \boldsymbol {t}\in \mathbb {R}^d\right \}\) is self-similar with Hurst exponent H if, for any \(\lambda \in \mathbb {R}^+\), \(X\left (\boldsymbol {t}\right )\overset {\mathrm {d}}{=}\lambda ^{-H}X\left (\lambda \boldsymbol {t}\right )\). Here the notation \(\overset {\mathrm {d}}{=}\) means equality of all finite-dimensional distributions. The Hurst exponent quantifies the self-similarity and describes the rate at which autocorrelations decrease as the lag between two realizations of a time series increases. A value of H in the range (0, 0.5) indicates a zig-zagging, intermittent time series with long-term switching between high and low values in adjacent pairs. A value of H in the range (0.5, 1) indicates a time series with long-term positive autocorrelations, which preserves trends on a longer time horizon and gives the time series a more regular appearance.

Multiresolution analysis is one of many methods for estimating the Hurst exponent. An overview can be found in Abry et al. (2000, 1995, 2013). In particular, the non-decimated wavelet transform (NDWT) (Nason and Silverman 1995; Vidakovic 2009; Percival and Walden 2006) has several potential advantages when employed for Hurst exponent estimation. Input signals and images of arbitrary size can be transformed in a straightforward manner due to the absence of decimation. As a redundant transform, the NDWT can decrease the variance of the scaling estimate (Kang and Vidakovic 2017). Ordinary least-squares regression can be fitted to estimate H instead of weighted least-squares regression, since the variances of the level-wise derived distributions based on log NDWT coefficients do not depend on level. Local scaling can be assessed due to the time-invariance property. Of course, the dependence between coefficients in an NDWT is much more pronounced. Similar to Soltani et al. (2004), we control this dependence by systematic sampling of the coefficients on which the estimator is based.

Different wavelet-based methods for estimating H in the one-dimensional case have been proposed in the literature. Abry et al. (2000) suggested estimating H by weighted least-squares regression using the level-wise \(\log _2\left (\overline {d_j^2}\right )\), where d j indicates any detail coefficient at level j; in addition, the authors corrected for the bias caused by taking the logarithm after, rather than before, the averaging in \(\log _2\left (\overline {d_j^2}\right )\). We use d j,k to denote the kth coefficient at level j in the sequel. Soltani et al. (2004) defined a mid-energy as \(D_{j,k}=\left (d_{j,k}^2+d_{j,k+N_j/2}^2\right )\big /2\) and showed that the level-wise averages of log2 D j,k are asymptotically normal and more stable, which makes them suitable for estimating H by regression. The estimators in Soltani et al. (2004) consistently outperform those in Abry et al. (2000). Shen et al. (2007) showed that the method of Soltani et al. (2004) yields more accurate estimators because it takes the logarithm of the mid-energy first and then averages.
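To make the log-regression idea concrete, the following pure-Python sketch (hypothetical code, not from the paper) applies a decimated Haar transform to a simulated standard Brownian motion, which is an fBm with H = 0.5, and regresses the level-wise log2 average energy on the level index. With levels indexed from fine to coarse, the expected energy grows like 2^{m(2H+1)}, so the slope is approximately 2H + 1:

```python
import math
import random

def haar_step(signal):
    """One decimated Haar step with orthonormal filters: (approx, detail)."""
    a, d = [], []
    for i in range(0, len(signal) - 1, 2):
        a.append((signal[i] + signal[i + 1]) / math.sqrt(2))
        d.append((signal[i] - signal[i + 1]) / math.sqrt(2))
    return a, d

def estimate_hurst(signal, levels):
    """VA-style estimate: regress log2(mean d_m^2) on the coarseness level m."""
    approx, pairs = list(signal), []
    for m in range(1, max(levels) + 1):
        approx, detail = haar_step(approx)
        if m in levels:
            energy = sum(x * x for x in detail) / len(detail)
            pairs.append((m, math.log2(energy)))
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    slope = (sum((x - mx) * (y - my) for x, y in pairs)
             / sum((x - mx) ** 2 for x, _ in pairs))
    # with coarseness indexing, E d_m^2 is proportional to 2^{m(2H+1)}
    return (slope - 1) / 2

random.seed(7)
N = 2 ** 14
bm, s = [], 0.0
for _ in range(N):                                   # Brownian motion as the
    s += random.gauss(0.0, 1.0 / math.sqrt(N))       # cumulative sum of i.i.d.
    bm.append(s)                                     # Gaussian increments
H_hat = estimate_hurst(bm, levels=range(3, 10))
print(round(H_hat, 2))   # close to the true H = 0.5
```

Replacing the level-wise mean energy with a robust location estimate is exactly the modification pursued in the sequel.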

The robust estimation of H has recently become a topic of interest due to the presence of outlier coefficients and outlier multiresolution levels, inter- and within-level dependences, and distributional contaminations (Franzke et al. 2012; Park and Park 2009; Shen et al. 2007; Sheng et al. 2011). Hamilton et al. (2011) proposed a robust approach based on Theil-type weighted regression (Theil 1992), a method for robust linear regression that takes a weighted average of all slopes defined by different pairs of regression points. Like the Veitch–Abry (VA) method, they regress the level-wise \(\log _2\left (\overline {d_j^2}\right )\) against the level indices, but instead of weighted least-squares regression they use Theil-type weighted regression, which is less sensitive to outlier levels. Kang and Vidakovic (2017) proposed the MEDL and MEDLA methods based on non-decimated wavelets to estimate H. MEDL estimates H by regressing the medians of \(\log d_j^2\) on level j, while MEDLA uses the level-wise medians of \(\log \left (\left (d_{j,k_1}^2+d_{j,k_2}^2\right )\big /2\right )\), where k 1 and k 2 are locations at level j properly selected to approximate independence.

Both MEDL and MEDLA use the median of the derived distribution instead of the mean, because the median is more robust to the potential outliers that occur when the logarithm of a squared wavelet coefficient is taken and the magnitude of the coefficient is close to zero. Although the median is outlier-resistant, it can behave unexpectedly as a result of its non-smooth character. The fact that the median is not universally the best outlier-resistant estimator motivates us to develop general trimean estimators of the level-wise derived distributions to estimate H. The general trimean estimator is a weighted average of the distribution's median and two quantiles symmetric about the median, combining the median's emphasis on central values with the quantiles' attention to the tails. Tukey's trimean estimator (Tukey 1977; Andrews and Hampel 2015) and the Gastwirth estimator (Gastwirth 1966; Gastwirth and Cohen 1970; Gastwirth and Rubin 1969) are two special cases of this general framework.

In this paper, we are concerned with the robust estimation of the Hurst exponent in self-similar signals. The focus here is on images, but the methodology applies to multiscale contexts of arbitrary dimension. The properties of the proposed Hurst exponent estimators are studied both theoretically and numerically. The performance of the robust approach is compared with other standard wavelet-based methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the median-based estimators MEDL and MEDLA, and the Theil-type (TT) weighted regression method.

The rest of the paper consists of six additional sections and an Appendix. Section 5.2 discusses the background of non-decimated wavelet transforms and wavelet-based spectra in the context of estimating the Hurst exponent of fractional Brownian motion (fBm). Section 5.3 introduces the general trimean estimators and discusses two special estimators that follow this general framework. Section 5.4 describes the estimation of the Hurst exponent using the general trimean estimators, presents the distributional results on which the proposed methods are based, and derives the optimal weights that minimize the variances of the estimators. Section 5.5 provides simulation results and compares the performance of the proposed methods to other standard wavelet-based methods. The proposed methods are applied to classify digitized mammogram images as cancerous or non-cancerous in Sect. 5.6. The paper is concluded with a summary and discussion in Sect. 5.7.

2 Background

2.1 Non-decimated Wavelet Transforms

The non-decimated wavelet transform (NDWT) (Nason and Silverman 1995; Vidakovic 2009; Percival and Walden 2006) is a redundant transform because it is performed by repeated filtering with a minimal shift, or a maximal sampling rate, at all dyadic scales. Consequently, the transformed signal contains the same number of coefficients as the original signal at each multiresolution level. We start by describing the algorithmic procedure of the 1-D NDWT and then extend it to the 2-D NDWT. Traditionally, a wavelet transform is performed as a convolution of the input data with wavelet and scaling filters; the principal difference between the NDWT and the DWT is the sampling rate.

Any square integrable function \(f(x)\in \boldsymbol {L}_2(\mathbb {R})\) can be expressed in the wavelet domain as

$$\displaystyle \begin{aligned}f(x)=\sum_k c_{J_0,k}\phi_{J_0,k}(x)+\sum_{j=J_0}^\infty\sum_k d_{j,k}\psi_{j,k}(x),\end{aligned}$$

where \(c_{J_0,k}\) denote coarse coefficients, d j,k indicate detail coefficients, \(\phi _{J_0,k}(x)\) represent scaling functions, and ψ j,k(x) signify wavelet functions. For specific choices of scaling and wavelet functions, the basis for NDWT can be formed from the atoms

$$\displaystyle \begin{aligned}\phi_{J_0,k}(x)=2^{J_0/2}\phi\left(2^{J_0}\left(x-k\right)\right)\ \mbox{and}\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j,k}(x)=2^{j/2}\psi\left(2^j\left(x-k\right)\right),\end{aligned}$$

where \(x\in \mathbb {R}\), j is a resolution level, J 0 is the coarsest level, and k is the location of an atom. Notice that atoms for NDWT have the constant location shift k at all levels, yielding the finest sampling rate on any level. The coarse coefficients \(c_{J_0,k}\) and detail coefficients d j,k can be obtained via

$$\displaystyle \begin{aligned} c_{J_0,k}=\int f(x)\ \phi_{J_0,k}(x)dx\ \ \mbox{and}\ \ \ d_{j,k}=\int f(x)\ \psi_{j,k}(x)dx. \end{aligned} $$
(5.1)

In a J-level decomposition of a 1-D input signal of size N, an NDWT yields N × (J + 1) wavelet coefficients: N × 1 coarse coefficients and N × J detail coefficients.
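This coefficient count can be checked with a minimal à trous (stationary) Haar implementation. The sketch below is hypothetical code with periodic boundary handling, not the transform used in the paper; it illustrates that the absence of decimation keeps every level at full length N:

```python
import math

def ndwt_haar(signal, J):
    """A-trous (stationary) Haar NDWT: at level j the Haar filter is upsampled
    by 2**j; periodic boundary handling keeps every level at full length N."""
    N = len(signal)
    approx, details = list(signal), []
    for j in range(J):
        step = 2 ** j
        a = [(approx[k] + approx[(k + step) % N]) / math.sqrt(2) for k in range(N)]
        d = [(approx[k] - approx[(k + step) % N]) / math.sqrt(2) for k in range(N)]
        details.append(d)
        approx = a
    return details, approx

x = [float(i % 5) for i in range(64)]   # arbitrary input; no decimation occurs
details, coarse = ndwt_haar(x, J=3)
# J detail vectors plus 1 coarse vector, each of length N: N x (J + 1) in total
total = sum(len(d) for d in details) + len(coarse)
print(total)  # 64 * (3 + 1) = 256
```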

Expanding on the 1-D definitions, we can describe the 2-D NDWT of f(x, y) with \((x,y)\in \mathbb {R}^2\). Several versions of the 2-D NDWT exist, but we focus only on the scale-mixing version, on which our methods are based. For the scale-mixing 2-D NDWT, the wavelet atoms are

$$\displaystyle \begin{aligned}\phi_{J_{01},J_{02};\boldsymbol{k}}(x,y) =2^{(J_{01}+J_{02})/2}\phi(2^{J_{01}}(x-k_1))\phi(2^{J_{02}}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{J_{01},j_2;\boldsymbol{k}}(x,y) =2^{(J_{01}+j_2)/2}\phi(2^{J_{01}}(x-k_1))\psi(2^{j_2}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j_1,J_{02};\boldsymbol{k}}(x,y) =2^{(j_1+J_{02})/2}\psi(2^{j_1}(x-k_1))\phi(2^{J_{02}}(y-k_2)),\end{aligned}$$
$$\displaystyle \begin{aligned}\psi_{j_1,j_2;\boldsymbol{k}}(x,y) =2^{(j_1+j_2)/2}\psi(2^{j_1}(x-k_1))\psi(2^{j_2}(y-k_2)),\end{aligned}$$

where k = (k 1, k 2) is the location index, J 01 and J 02 are coarsest levels, j 1 > J 01, and j 2 > J 02. The wavelet coefficients for f(x, y) after the scale-mixing NDWT can be obtained as

$$\displaystyle \begin{aligned} \begin{aligned} c_{J_{01},J_{02};\boldsymbol{k}}&=\iint f(x,y)\ \phi_{J_{01},J_{02};\boldsymbol{k}}(x,y) dxdy,\\ h_{J_{01},j_2;\boldsymbol{k}}&=\iint f(x,y)\ \psi_{J_{01},j_2;\boldsymbol{k}}(x,y) dxdy,\\ v_{j_1,J_{02};\boldsymbol{k}}&=\iint f(x,y)\ \psi_{j_1,J_{02};\boldsymbol{k}}(x,y) dxdy,\\ d_{j_1,j_2;\boldsymbol{k}}&=\iint f(x,y)\ \psi_{j_1,j_2;\boldsymbol{k}}(x,y) dxdy. \end{aligned} \end{aligned} $$
(5.2)

Note that \(c_{J_{01},J_{02};\boldsymbol {k}}\) are coarse coefficients and represent the coarsest approximation, \(h_{J_{01},j_2;\boldsymbol {k}}\) and \(v_{j_1,J_{02};\boldsymbol {k}}\) represent a mix of coarse and detail information, and \(d_{j_1,j_2;\boldsymbol {k}}\) carry information about details only. In our methods, only the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) are used to estimate H.

2.2 The fBm: Wavelet Coefficients and Spectra

Among the models proposed for analyzing self-similar phenomena, arguably the most popular is fractional Brownian motion (fBm), first described by Kolmogorov (1940) and formalized by Mandelbrot and Van Ness (1968).

In this section, an overview of the 1-D fBm and its extension to the 2-D fBm is provided. If a stochastic process \(\{X(t),t\in \mathbb {R}\}\) is self-similar with Hurst exponent H, then the 1-D detail coefficients defined in (5.1) satisfy

$$\displaystyle \begin{aligned}d_{jk}\overset{\mathrm{d}}{=}2^{-j(H+1/2)}d_{0k},\end{aligned}$$

for a fixed level j (Abry et al. 2003). If the process has stationary increments, i.e., the distribution of X(t + h) − X(t) does not depend on t, then \(\mathbb {E}(d_{0k})=0\) and \(\mathbb {E}(d_{0k}^2)=\mathbb {E}(d_{00}^2)\). We obtain

$$\displaystyle \begin{aligned} \mathbb{E}\left(d_{jk}^2\right) \propto 2^{-j(2H+1)}. \end{aligned} $$
(5.3)

The Hurst exponent can be estimated by taking logarithms of both sides of Eq. (5.3). The wavelet spectrum is defined by the sequence \(\left \{S(j)=\log \mathbb {E}\left (d_{jk}^2\right ), j\in \mathbb {Z}\right \}\). Fractional Brownian motion, denoted B H(t), is the unique Gaussian self-similar process with stationary increments (Abry et al. 2003; Abry 2003). The definition of the one-dimensional fBm can be extended to the multivariate case. In particular, a two-dimensional fBm, B H(t), for t ∈ [0, 1] × [0, 1] and H ∈ (0, 1), is a Gaussian process with stationary zero-mean increments, satisfying

$$\displaystyle \begin{aligned}B_H(a\boldsymbol{t})\overset{\mathrm{d}}{=}a^HB_H(\boldsymbol{t}).\end{aligned}$$

It can be shown that the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) defined in Eq. (5.2) satisfy

$$\displaystyle \begin{aligned}\log_2\mathbb{E}\left(|d_{j_1,j_2;\boldsymbol{k}}|{}^2\right)=-(2H+2)j+C,\end{aligned}$$

which defines the two-dimensional wavelet spectrum from which the Hurst exponent can be estimated. The methods proposed in the following sections build on, and improve upon, this spectrum.

3 General Trimean Estimators

Let X 1, X 2, …, X n be i.i.d. continuous random variables with pdf f(x) and cdf F(x). Let 0 < p < 1, and let ξ p denote the pth quantile of F, so that \(\xi _p = \inf \{x| F(x) \geq p\}.\) If F is continuous and strictly increasing, the pth quantile is simply defined by F(ξ p) = p.

Let Y p = X np⌋:n denote a sample pth quantile. Here ⌊np⌋ denotes the greatest integer that is less than or equal to np. The general trimean estimator is defined as a weighted average of the distribution’s median and its two quantiles Y p and Y 1−p, for p ∈ (0, 1∕2):

$$\displaystyle \begin{aligned} \hat{\mu}=\frac{\alpha}{2}\ Y_{p}+\left(1-\alpha\right)\ Y_{1/2}+\frac{\alpha}{2}\ Y_{1-p}. \end{aligned} $$
(5.4)

The two quantiles Y p and Y 1−p receive the same weight α∕2, with α ∈ [0, 1]. This is equivalent to a weighted sum of the median and the average of Y p and Y 1−p, with weights 1 − α and α:

$$\displaystyle \begin{aligned} \hat{\mu}=\left(1-\alpha\right)\ Y_{1/2}+\alpha\ \left(\frac{Y_{p}+Y_{1-p}}{2}\right). \end{aligned}$$
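The estimator of Eq. (5.4) can be sketched in a few lines. The following hypothetical code uses the sample quantile \(Y_p=X_{\lfloor np\rfloor :n}\) defined above; Tukey's and Gastwirth's weights, introduced below as special cases, appear here only as illustrative parameter choices:

```python
def sample_quantile(data, p):
    """Sample p-th quantile Y_p = X_{floor(np):n} (1-based order statistic)."""
    xs = sorted(data)
    k = max(int(len(xs) * p), 1)
    return xs[k - 1]

def general_trimean(data, alpha, p):
    """(alpha/2) Y_p + (1 - alpha) Y_{1/2} + (alpha/2) Y_{1-p}, as in Eq. (5.4)."""
    return ((alpha / 2) * sample_quantile(data, p)
            + (1 - alpha) * sample_quantile(data, 0.5)
            + (alpha / 2) * sample_quantile(data, 1 - p))

data = list(range(1, 101))                             # 1, 2, ..., 100
tukey = general_trimean(data, alpha=0.5, p=0.25)       # Tukey's trimean
gastwirth = general_trimean(data, alpha=0.6, p=1 / 3)  # Gastwirth estimator
print(tukey, gastwirth)  # both near the median of this symmetric sample
```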

This general trimean estimator turns out to be more robust than the mean but smoother than the median. To derive its asymptotic distribution, the asymptotic joint distribution of sample quantiles is needed; it is stated in Lemma 5.1, and a detailed proof can be found in DasGupta (2008).

Lemma 5.1

Consider r sample quantiles, \(Y_{p_1}, Y_{p_2},\ldots ,Y_{p_r}\) , where 0 < p 1 < p 2 < … < p r < 1. If \(\sqrt {n}\left (\lfloor np_i\rfloor /n-p_i\right )\to 0\) is satisfied for each 1 ≤ i ≤ r, then the asymptotic joint distribution of \(Y_{p_1}, Y_{p_2},\ldots ,Y_{p_r}\) is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\left[Y_{p_1}\ \cdots\ Y_{p_r}\right]^T-\left[\xi_{p_1}\ \cdots\ \xi_{p_r}\right]^T\right)\overset{\mathrm{d}}{\to}\mathcal{N}_r\left(\boldsymbol{0},\varSigma\right),\end{aligned}$$

where

$$\displaystyle \begin{aligned}\varSigma=\left(\sigma_{ij}\right)_{r\times r},\end{aligned}$$

and

$$\displaystyle \begin{aligned} \sigma_{ij}=\frac{p_i\left(1-p_j\right)}{f\left(x_{p_i}\right)f\left(x_{p_j}\right)},\ i\le j. \end{aligned} $$
(5.5)

From Lemma 5.1, the asymptotic distribution of the general trimean estimator is normal, since it is a linear combination of components that are jointly asymptotically normal. The general trimean estimator itself may be written in terms of order statistics as

$$\displaystyle \begin{aligned}\hat{\mu}=A\cdot\boldsymbol{y},\end{aligned}$$

where

$$\displaystyle \begin{aligned}A=\left[\frac{\alpha}{2}\ \ \ 1-\alpha\ \ \ \frac{\alpha}{2}\right],\ \ \mbox{and}\ \ \boldsymbol{y}=\left[Y_{p}\ \ \ Y_{1/2}\ \ \ Y_{1-p}\right]^T.\end{aligned}$$

It can be easily verified that \(\sqrt {n}\left (\lfloor pn\rfloor /n-p\right )\to 0\) for \(p\in \left (0,1/2\right ]\). If we denote by \(\boldsymbol {\xi }=\left [\xi _{p}\ \ \ \xi _{1/2}\ \ \ \xi _{1-p}\right ]^T\) the population quantiles, the asymptotic distribution of y is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\boldsymbol{y}-\boldsymbol{\xi}\right)\overset{\mathrm{d}}{\to}\mathcal{N}_3\left(\boldsymbol{0},\varSigma\right),\end{aligned}$$

where \(\varSigma =\left (\sigma _{ij}\right )_{3\times 3},\) and σ ij follows Eq. (5.5) for p 1 = p, p 2 = 1∕2, and p 3 = 1 − p. Therefore \(\hat {\mu }=A\cdot \boldsymbol {y}\) is asymptotically normal,

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}-A\cdot\boldsymbol{\xi}\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A\varSigma A^T\right),\end{aligned}$$

with the theoretical expectation and variance being

$$\displaystyle \begin{aligned} \mathbb{E}\left(\hat{\mu}\right)=\mathbb{E}\left(A\cdot\boldsymbol{y}\right)=A\cdot\mathbb{E}\left(\boldsymbol{y}\right)=A\cdot\boldsymbol{\xi}, \end{aligned} $$
(5.6)

and

$$\displaystyle \begin{aligned} \operatorname{\mathrm{Var}}\left(\hat{\mu}\right)=\operatorname{\mathrm{Var}}\left(A\cdot\boldsymbol{y}\right)=A\operatorname{\mathrm{Var}}\left(\boldsymbol{y}\right)A^T=\frac{1}{n}A\varSigma A^T. \end{aligned} $$
(5.7)
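The variance formula (5.7) can be checked numerically. The sketch below is hypothetical checking code, assuming standard normal data and Tukey's weights (α = 1∕2, p = 1∕4): it builds Σ from Eq. (5.5) and compares AΣA^T∕n with a Monte Carlo estimate of the variance of the trimean over repeated samples:

```python
import math
import random

def norm_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def norm_quantile(p):
    """Standard normal quantile by bisection on the erf-based cdf."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, p = 0.5, 0.25                     # Tukey's weights
A = [alpha / 2, 1 - alpha, alpha / 2]
ps = [p, 0.5, 1 - p]
xi = [norm_quantile(q) for q in ps]
# sigma_ij = p_i (1 - p_j) / (f(xi_i) f(xi_j)) for i <= j, as in Eq. (5.5)
Sigma = [[min(ps[i], ps[j]) * (1 - max(ps[i], ps[j]))
          / (norm_pdf(xi[i]) * norm_pdf(xi[j])) for j in range(3)]
         for i in range(3)]
n = 400
asym_var = sum(A[i] * Sigma[i][j] * A[j]
               for i in range(3) for j in range(3)) / n   # Eq. (5.7)

def trimean(data):
    xs = sorted(data)
    m = len(xs)
    return sum(w * xs[max(int(m * q), 1) - 1] for w, q in zip(A, ps))

random.seed(1)
reps = [trimean([random.gauss(0, 1) for _ in range(n)]) for _ in range(3000)]
mean = sum(reps) / len(reps)
mc_var = sum((r - mean) ** 2 for r in reps) / (len(reps) - 1)
print(asym_var, mc_var)   # the two variances should nearly agree
```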

3.1 Tukey’s Trimean Estimator

Tukey’s trimean estimator is a special case of the general trimean estimator, with α = 1∕2 and p = 1∕4 in Eq. (5.4). To compute it, we first sort the data in ascending order. Next, we take the value one-fourth of the way up this sequence (the first quartile), the value halfway up the sequence (the median), and the value three-fourths of the way up the sequence (the third quartile). We then form the weighted average of these three values, giving the central (median) value a weight of 1∕2 and each of the two quartiles a weight of 1∕4.

If we denote Tukey’s trimean estimator as \(\hat {\mu }_T,\) then

$$\displaystyle \begin{aligned}\hat{\mu}_T=\frac{1}{4}\ Y_{1/4}+\frac{1}{2}\ Y_{1/2}+\frac{1}{4}\ Y_{3/4}.\end{aligned}$$

The asymptotic distribution is

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}_T-A_T\boldsymbol{\xi}_T\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A_T\varSigma_T A_T^T\right),\end{aligned}$$

where \(A_T=\left [\frac {1}{4}\ \ \ \frac {1}{2}\ \ \ \frac {1}{4}\right ]\), \(\boldsymbol {\xi }_T=\left [\xi _{1/4}\ \ \ \xi _{1/2}\ \ \ \xi _{3/4}\right ]^T\), \(\varSigma _T=\left (\sigma _{ij}\right )_{3\times 3}\) is the covariance matrix of the asymptotic multivariate normal distribution, and σ ij follows Eq. (5.5) with p 1 = 1∕4, p 2 = 1∕2, and p 3 = 3∕4.

3.2 Gastwirth Estimator

Like Tukey’s estimator, the Gastwirth estimator is a special case of the general trimean estimator, with α = 0.6 and p = 1∕3 in Eq. (5.4).

If we denote this estimator as \(\hat {\mu }_G\), then

$$\displaystyle \begin{aligned}\hat{\mu}_G=0.3\ Y_{1/3}+0.4\ Y_{1/2}+0.3\ Y_{2/3}.\end{aligned}$$

The asymptotic distribution can be derived as

$$\displaystyle \begin{aligned}\sqrt{n}\left(\hat{\mu}_G-A_G\boldsymbol{\xi}_G\right)\overset{\mathrm{d}}{\to}\mathcal{N}\left(0,\ A_G\varSigma_G A_G^T\right),\end{aligned}$$

where \(A_G=\left [0.3\ \ \ 0.4\ \ \ 0.3\right ]\), \(\boldsymbol {\xi }_G=\left [\xi _{1/3}\ \ \ \xi _{1/2}\ \ \ \xi _{2/3}\right ]^T\), \(\varSigma _G=\left (\sigma _{ij}\right )_{3\times 3}\), and σ ij follows Eq. (5.5) with p 1 = 1∕3, p 2 = 1∕2, and p 3 = 2∕3.

4 Methods

Our proposal for robust estimation of the Hurst exponent H is based on the non-decimated wavelet transform (NDWT). In a J-depth decomposition of a 2-D fBm of size N × N, a scale-mixing 2-D NDWT generates (J + 1) × (J + 1) blocks of coefficients, each block of the same size as the original image, i.e., N × N. The tessellation of coefficients of the scale-mixing 2-D NDWT is shown in Fig. 5.1a. From the 2-D NDWT wavelet coefficients, our methods use the diagonal blocks (j 1 = j 2 = j) of the detail coefficients \(d_{j_1,j_2;\boldsymbol {k}}\) to estimate H, as shown in Fig. 5.1b.

Fig. 5.1
figure 1

(a) Four types of wavelet coefficients with their locations in the tessellation of a 2-D scale mixing NDWT of depth of 3 (J = 3), with each block the size of N × N. Coefficients c represent the coarsest approximation, h and v are the mix of coarse and detail information, and d carry detail information only. (b) Detail coefficients d and its diagonal blocks corresponding to 3 (J = 3) levels. (c) Symmetric random sampling from level-1 (j = 1) diagonal block divided into 6 × 6 (M = 6) grids

At each detail level j, the corresponding level-j diagonal block is of size N × N, the same size as the original image. Note that the coefficients d j,j;k in a level-j diagonal block are not independent; however, their autocorrelations decay exponentially, that is, they possess only short memory. We reduce this within-block dependence by dividing the block into M × M equal grids and then randomly sampling one coefficient from each grid, thereby increasing the distance between two consecutive coefficients. To improve efficiency, we apply symmetric sampling. Specifically, we partition the level-j diagonal block into four equal parts (top left, top right, bottom left, and bottom right), sample only from the M 2∕4 grids of the top-left part, and then take the corresponding coefficients that occupy the same locations in the other three parts, as shown in Fig. 5.1c.

Assume the coefficient \(d_{j,j;(k_{i1},k_{i2})}\) is randomly sampled from grid \(i\in \{1,\ldots ,\frac {M^2}{4}\}\) in the top-left part of the level-j diagonal block, with \(k_{i1}, k_{i2} \in \{1,2,\ldots ,\frac {N}{2}\}\) being the corresponding location indices. Then we extract the corresponding coefficients \(d_{j,j;(k_{i1},k_{i2}+\frac {N}{2})}\), \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2})}\), and \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2}+\frac {N}{2})}\) from the top-right, bottom-left, and bottom-right parts, respectively. From the set

$$\displaystyle \begin{aligned}\{d_{j,j;(k_{i1},k_{i2})}, d_{j,j;(k_{i1},k_{i2}+\frac{N}{2})}, d_{j,j;(k_{i1}+\frac{N}{2},k_{i2})}, d_{j,j;(k_{i1}+\frac{N}{2},k_{i2}+\frac{N}{2})}\},\end{aligned}$$

we could generate two mid-energies as

$$\displaystyle \begin{aligned} \begin{aligned} &D_{i, j}=\frac{d_{j,j;(k_{i1},k_{i2})}^2+d_{j,j;(k_{i1}+\frac{N}{2},k_{i2}+\frac{N}{2})}^2}{2}\\ &D_{i, j}^{\prime}=\frac{d_{j,j;(k_{i1},k_{i2}+\frac{N}{2})}^2+d_{j,j;(k_{i1}+\frac{N}{2},k_{i2})}^2}{2},\ i\in\{1,\ldots,\frac{M^2}{4}\}, \end{aligned} \end{aligned} $$
(5.8)

where D i,j and \(D_{i, j}^{\prime }\) denote the two mid-energies corresponding to grid i at level j. If we denote D j as the set of all mid-energies at level j, then

$$\displaystyle \begin{aligned} D_{j}=\{D_{1, j}, D_{1, j}^{\prime}, D_{2, j}, D_{2, j}^{\prime},\ldots, D_{\frac{M^2}{4}, j}, D_{\frac{M^2}{4}, j}^{\prime}\}. \end{aligned} $$
(5.9)

The M 2∕2 mid-energies at each level j are treated as if they were independent. Note that M must be even.
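The symmetric sampling scheme and the mid-energies of Eqs. (5.8)–(5.9) can be sketched as follows. This is hypothetical code: the block is filled with white noise purely to exercise the indexing, not to model actual fBm wavelet coefficients:

```python
import random

def mid_energies(block, M, rng):
    """Mid-energies of Eqs. (5.8)-(5.9) for one level-j diagonal block (N x N):
    sample one coefficient per grid of the top-left quadrant, mirror it to the
    other three quadrants, and pair the diagonal / anti-diagonal squared values."""
    N = len(block)
    assert M % 2 == 0 and N % M == 0
    g = N // M                             # side length of one grid
    half = N // 2
    D = []
    for gi in range(M // 2):               # grids of the top-left quadrant
        for gj in range(M // 2):
            k1 = gi * g + rng.randrange(g)
            k2 = gj * g + rng.randrange(g)
            a = block[k1][k2]                  # top left
            b = block[k1][k2 + half]           # top right
            c = block[k1 + half][k2]           # bottom left
            d = block[k1 + half][k2 + half]    # bottom right
            D.append((a * a + d * d) / 2)      # D_{i,j}
            D.append((b * b + c * c) / 2)      # D'_{i,j}
    return D

rng = random.Random(0)
N, M = 64, 8
block = [[rng.gauss(0, 1) for _ in range(N)] for _ in range(N)]
D_j = mid_energies(block, M, rng)
print(len(D_j))  # M^2 / 2 = 32 mid-energies
```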

Our methods come in two versions: one based on the mid-energies D j, the other on the logged mid-energies \(\log {D_{j}}\). First, the distribution of D j (respectively, \(\log {D_{j}}\)) is derived under the approximation that \(d_{j,j;(k_{i1},k_{i2})}\), \(d_{j,j;(k_{i1},k_{i2}+\frac {N}{2})}\), \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2})}\), and \(d_{j,j;(k_{i1}+\frac {N}{2},k_{i2}+\frac {N}{2})}\) are independent. Next, we compute the general trimean estimators of the level-wise derived distributions to estimate H.

4.1 General Trimean of the Mid-energy (GTME) Method

At each decomposition level j, the asymptotic distribution of the general trimean estimator based on the M 2∕2 mid-energies in D j is derived, from which we find the relationship between the general trimean estimators and H. The general trimean of the mid-energy (GTME) method is described in the following theorem.

Theorem 5.1

Let \(\hat {\mu }_{j}\) be the general trimean estimator based on the M 2∕2 mid-energies in D j defined by (5.9) at level j in a J-level NDWT of a 2-D fBm of size N × N with Hurst exponent H. Then, the asymptotic distribution of \(\hat {\mu }_{j}\) is normal,

$$\displaystyle \begin{aligned}\hat{\mu}_{j}\sim\mathcal{N}\left(c\left(\alpha,p\right)\lambda_j,\ \frac{2f\left(\alpha,p\right)}{M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.10)

where

$$\displaystyle \begin{aligned}c\left(\alpha, p\right)= \frac{\alpha}{2}\log\left(\frac{1}{p\left(1-p\right)}\right)+\left(1-\alpha\right)\log2,\end{aligned}$$
$$\displaystyle \begin{aligned}f\left(\alpha,p\right)= \frac{\alpha(1-2p)(\alpha-4p)}{4p(1-p)}+1,\end{aligned}$$
$$\displaystyle \begin{aligned}\lambda_j=\sigma^2\cdot2^{-\left(2H+2\right)j},\end{aligned}$$

and σ 2 is the variance of the wavelet coefficients at level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}=-\frac{\hat{\beta}}{2}-1, \end{aligned} $$
(5.11)

where \(\hat {\beta }\) is the slope of the least-squares linear regression on pairs \(\left (j, \log _2\left (\hat {\mu }_{j}\right )\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimator \(\hat {H}\) follows the asymptotic normal distribution

$$\displaystyle \begin{aligned}\hat{H}\sim\mathcal{N}\left(H,\ V_1\right),\end{aligned}$$
(5.12)

where the asymptotic variance V 1 is a constant independent of the sample size N and level j,

$$\displaystyle \begin{aligned} V_1=\frac{6f(\alpha,p)}{(\log2)^2M^2c^2\left(\alpha,p\right)q(J_1,J_2)}, \end{aligned}$$

and

$$\displaystyle \begin{aligned} q(J_1,J_2)=(J_2-J_1)(J_2-J_1+1)(J_2-J_1+2). \end{aligned} $$
(5.13)

The proof of Theorem 5.1 is deferred to the Appendix.
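Although Theorem 5.1 concerns 2-D fBm, the regression step can be illustrated with a hypothetical 1-D analogue, assembled from pieces already introduced: decimated Haar details of a simulated Brownian motion (H = 0.5), SSB-style mid-energies, a Gastwirth trimean per level, and a least-squares slope. For 1-D fBm, \(\mathbb {E}(d_j^2)\propto 2^{-j(2H+1)}\); with levels indexed from fine to coarse the slope becomes 2H + 1, so H = (slope − 1)∕2 replaces Eq. (5.11). This sketch is illustrative code under those assumptions, not the paper's 2-D procedure:

```python
import math
import random

def haar_details(signal, levels):
    """Decimated Haar detail coefficients, keyed by coarseness level m."""
    out, approx = {}, list(signal)
    for m in range(1, max(levels) + 1):
        a, d = [], []
        for i in range(0, len(approx) - 1, 2):
            a.append((approx[i] + approx[i + 1]) / math.sqrt(2))
            d.append((approx[i] - approx[i + 1]) / math.sqrt(2))
        approx = a
        if m in levels:
            out[m] = d
    return out

def gastwirth(data):
    xs = sorted(data)
    n = len(xs)
    q = lambda p: xs[max(int(n * p), 1) - 1]
    return 0.3 * q(1 / 3) + 0.4 * q(0.5) + 0.3 * q(2 / 3)

random.seed(11)
N = 2 ** 14
bm, s = [], 0.0
for _ in range(N):                                  # Brownian motion, H = 0.5
    s += random.gauss(0.0, 1.0 / math.sqrt(N))
    bm.append(s)

pairs = []
for m, d in haar_details(bm, range(3, 10)).items():
    half = len(d) // 2
    mids = [(d[k] ** 2 + d[k + half] ** 2) / 2 for k in range(half)]  # SSB pairs
    pairs.append((m, math.log2(gastwirth(mids))))

mx = sum(x for x, _ in pairs) / len(pairs)
my = sum(y for _, y in pairs) / len(pairs)
beta = (sum((x - mx) * (y - my) for x, y in pairs)
        / sum((x - mx) ** 2 for x, _ in pairs))
H_hat = (beta - 1) / 2    # 1-D analogue of H = -beta/2 - 1 in Eq. (5.11)
print(round(H_hat, 2))    # close to the true H = 0.5
```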

To find the optimal α and p that minimize the asymptotic variance of \(\hat {\mu }_{j}\), we take partial derivatives of \(f\left (\alpha , p\right )\) with respect to α and p and set them to 0. The optimal \(\hat {\alpha }\) and \(\hat {p}\) are obtained by solving

$$\displaystyle \begin{aligned} \begin{aligned} &\frac{\partial f\left(\alpha,p\right)}{\partial\alpha}=-\frac{2p-1}{2p\left(1-p\right)}\alpha+\frac{1+p}{2\left(1-p\right)}-\frac{3}{2}=0, \\ &\frac{\partial f\left(\alpha,p\right)}{\partial p}=\frac{\alpha\left(2-\alpha\right)}{2\left(1-p\right)^2}+\frac{\alpha^2\left(2p-1\right)}{4p^2\left(1-p\right)^2}=0.\\ \end{aligned} \end{aligned} $$
(5.14)

Since α ∈ [0, 1] and \(p\in \left (0,1/2\right )\), we get the unique solution α = 2p ≈ 0.6 and \(p=1-\sqrt {2}/2\approx 0.3\). The Hessian matrix of \(f\left (\alpha , p\right )\) is

$$\displaystyle \begin{aligned}H\left(f\right)=\begin{bmatrix}\dfrac{\partial^2 f}{\partial\alpha^2} & \dfrac{\partial^2 f}{\partial\alpha\,\partial p}\\ \dfrac{\partial^2 f}{\partial p\,\partial\alpha} & \dfrac{\partial^2 f}{\partial p^2}\end{bmatrix},\qquad \dfrac{\partial^2 f}{\partial\alpha^2}=-\frac{2p-1}{2p\left(1-p\right)}.\end{aligned}$$

Since \(-\frac {2p-1}{2p\left (1-p\right )}>0\) and the determinant is 5.66 > 0 at α = 2p ≈ 0.6 and \(p=1-\sqrt {2}/2\approx 0.3\), the Hessian matrix is positive definite there. Therefore, \(\hat {\alpha }=2-\sqrt {2}\) and \(\hat {p}=1-\sqrt {2}/2\) provide the global minimum of \(f\left (\alpha , p\right )\), minimizing also the asymptotic variance of \(\hat {\mu }_{j}\). Comparing the optimal \(\hat {\alpha }\approx 0.6\) and \(\hat {p}\approx 0.3\) with α = 0.6 and p = 1∕3 of the Gastwirth estimator, we find, curiously, that the optimal general trimean estimator is very close to the Gastwirth estimator.
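The claimed optimum \(\alpha = 2-\sqrt {2}\), \(p=1-\sqrt {2}/2\) can also be verified by brute force. The following hypothetical checking code grid-searches f(α, p) from Theorem 5.1 over α ∈ [0, 1] and p ∈ (0, 0.5):

```python
import math

def f(alpha, p):
    """Variance factor f(alpha, p) of Theorem 5.1."""
    return alpha * (1 - 2 * p) * (alpha - 4 * p) / (4 * p * (1 - p)) + 1

# brute-force grid search over alpha in [0, 1], p in (0, 0.5), step 0.001
fmin, a_opt, p_opt = min(
    (f(a / 1000, q / 1000), a / 1000, q / 1000)
    for a in range(0, 1001) for q in range(5, 500))
print(a_opt, p_opt, round(fmin, 4))   # minimum value 0.8284 = 2*sqrt(2) - 2

alpha_star = 2 - math.sqrt(2)         # claimed optimum, about 0.586
p_star = 1 - math.sqrt(2) / 2         # claimed optimum, about 0.293
```

The grid minimum lands on the analytic optimum, and the minimal value \(f(\hat {\alpha },\hat {p})=2\sqrt {2}-2\approx 0.8284\) is slightly below the Gastwirth value f(0.6, 1∕3) = 0.835.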

4.2 General Trimean of the Logarithm of Mid-energy (GTLME) Method

The previously discussed GTME method computes the general trimean estimator of the mid-energies first and then takes the logarithm. In this section, we instead compute the general trimean estimator of the logged mid-energies at each level j. The following theorem describes the general trimean of the logarithm of mid-energy (GTLME) method.

Theorem 5.2

Let \(\hat {\mu }_{j}\) be the general trimean estimator based on \(\log (D_j)\) , the set of M 2∕2 logged mid-energies at level j in a J-level NDWT of a 2-D fBm of size N × N with Hurst exponent H, and 1 ≤ j ≤ J. Then, the asymptotic distribution of \(\hat {\mu }_{j}\) is normal,

$$\displaystyle \begin{aligned}\hat{\mu}_{j}\sim\mathcal{N}\left(\log\lambda_j+c\left(\alpha,p\right),\ \frac{2f\left(\alpha,p\right)}{M^2}\right),\end{aligned}$$
(5.15)

where

$$\displaystyle \begin{aligned}c\left(\alpha, p\right)=\frac{\alpha}{2}\log\left(\log\frac{1}{1-p}\cdot\log\frac{1}{p}\right)+\left(1-\alpha\right)\log\left(\log2\right),\end{aligned}$$
$$\displaystyle \begin{aligned}f\left(\alpha, p\right)=\frac{\alpha^2}{4g_1\left(p\right)}+\frac{\alpha\left(1-\alpha\right)}{2g_2\left(p\right)}+\frac{\left(1-\alpha\right)^2}{\left(\log2\right)^2},\end{aligned}$$

\(g_1\left (p\right )\) and \(g_2\left (p\right )\) are two functions of p given in the Appendix,

$$\displaystyle \begin{aligned}\lambda_j=\sigma^2\cdot2^{-\left(2H+2\right)j},\end{aligned}$$

and σ 2 is the variance of wavelet coefficients from level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}=-\frac{1}{2\log2}\hat{\beta}-1, \end{aligned} $$
(5.16)

where \(\hat {\beta }\) is the slope of the least-squares linear regression on pairs \(\left (j, \hat {\mu }_{j}\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimator \(\hat {H}\) follows the asymptotic normal distribution

$$\displaystyle \begin{aligned}\hat{H}\sim\mathcal{N}\left(H,\ V_2\right),\end{aligned}$$
(5.17)

where the asymptotic variance V 2 is a constant independent of the sample size N and level j,

$$\displaystyle \begin{aligned} V_2=\frac{6f(\alpha,p)}{(\log2)^2M^2q(J_1,J_2)}, \end{aligned}$$

and q(J 1, J 2) is given in Eq.(5.13).

The proof of Theorem 5.2 is provided in the Appendix. Similarly as for the GTME method, the optimal α and p minimizing the asymptotic variance of \(\hat {\mu }_{j}\) can be obtained by solving

$$\displaystyle \begin{aligned} \frac{\partial f\left(\alpha,p\right)}{\partial\alpha}=0, \ \mbox{and}\ \frac{\partial f\left(\alpha,p\right)}{\partial p}=0. \end{aligned} $$
(5.18)

From the first equation in (5.18) it can be derived that

$$\displaystyle \begin{aligned}\alpha=\frac{\frac{2}{\left(\log2\right)^2}-\frac{1}{2}g_2\left(p\right)}{\frac{1}{2}g_1\left(p\right)-g_2\left(p\right)+\frac{2}{\left(\log2\right)^2}}.\end{aligned}$$

The second equation in (5.18) cannot be reduced to a closed form. As an illustration, we plot \(f\left (\alpha ,p\right )\) with p ranging from 0 to 0.5 and α expressed as the above function of p; the plot of α against p is also shown in Fig. 5.2. Numerical computation gives \(\hat {\alpha }=0.5965\) and \(\hat {p}=0.24\). These optimal parameters are close to α = 0.5 and p = 0.25 of Tukey's trimean estimator, but put somewhat more weight on the median.

Fig. 5.2
figure 2

Plot of \(f \left (\alpha , p \right )\) against p on the left; plot of α against p on the right

4.3 Special Cases: Tukey’s Trimean and Gastwirth Estimators

The Tukey’s trimean of the mid-energy (TTME) and Gastwirth of the mid-energy (GME) methods are described in the following lemma.

Lemma 5.2

Let \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) be the Tukey’s trimean and Gastwirth estimators based on D j defined in (5.9). Then the asymptotic distributions of \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) are normal:

$$\displaystyle \begin{aligned}\hat{\mu}_{j}^T\sim\mathcal{N}\left(c_1\lambda_j,\ \frac{5}{3M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.19)
$$\displaystyle \begin{aligned}\hat{\mu}_{j}^G\sim\mathcal{N}\left(c_2\lambda_j,\ \frac{5.01}{3M^2}\,\lambda_j^2\right),\end{aligned}$$
(5.20)

where c 1 and c 2 are constants given in the Appendix, \(\lambda _j=\sigma ^2\cdot 2^{-\left (2H+2\right )j}\) , and σ 2 is the variance of the wavelet coefficients at level 0. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}^T=-\frac{\hat{\beta}^T}{2}-1, \ \mathit{\mbox{and}}\ \hat{H}^G=-\frac{\hat{\beta}^G}{2}-1, \end{aligned} $$
(5.21)

where \(\hat {\beta }^T\) and \(\hat {\beta }^G\) are the slopes of the least-squares linear regressions on pairs \(\left (j, \log _2\left (\hat {\mu }_{j}^T\right )\right )\) and pairs \(\left (j, \log _2\left (\hat {\mu }_{j}^G\right )\right )\) from level J 1 to J 2 , J 1 ≤ j ≤ J 2 . The estimators \(\hat {H}^T\) and \(\hat {H}^G\) follow the asymptotic normal distributions

$$\displaystyle \begin{aligned}\hat{H}^T\sim\mathcal{N}\left(H,\ V^T_1\right)\ \ \mbox{and}\ \ \hat{H}^G\sim\mathcal{N}\left(H,\ V^G_1\right),\end{aligned}$$
(5.22)

where the asymptotic variances \(V^T_1\) and \(V^G_1\) are constants,

$$\displaystyle \begin{aligned} V^T_1=\frac{5}{(\log2)^2M^2c^2_1q(J_1,J_2)}, \end{aligned}$$
$$\displaystyle \begin{aligned} V^G_1=\frac{5.01}{(\log2)^2M^2c^2_2q(J_1,J_2)}. \end{aligned}$$

The function \(q(J_1, J_2)\) is the same as in Eq. (5.13) of Theorem 5.1.
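The TTME recipe of Lemma 5.2 can be sketched in a few lines: compute the level-wise mid-energies, take Tukey's trimean at each level, and regress \(\log_2\) of the trimeans on the level index, as in Eq. (5.21). The following is a minimal sketch (function and variable names are ours, not from the paper), assuming the level-j detail coefficients have variance \(\lambda_j=\sigma^2\, 2^{-(2H+2)j}\):

```python
import numpy as np

def tukey_trimean(x):
    """Tukey's trimean: 0.25*Q1 + 0.5*median + 0.25*Q3."""
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    return 0.25 * q1 + 0.5 * med + 0.25 * q3

def estimate_H_ttme(detail_by_level, levels):
    """TTME estimate of H, following Eq. (5.21).

    detail_by_level[j] holds the level-j NDWT detail coefficients;
    the regression runs over the levels in `levels` (J1 <= j <= J2).
    """
    y = []
    for j in levels:
        d = np.asarray(detail_by_level[j])
        # mid-energy: average of squares of adjacent coefficients
        D = (d[0::2] ** 2 + d[1::2] ** 2) / 2.0
        y.append(np.log2(tukey_trimean(D)))
    beta = np.polyfit(levels, y, 1)[0]  # slope of log2-trimean vs. level
    return -beta / 2.0 - 1.0
```

Since the trimean of \(D_j\) scales as \(\lambda_j\), the regression slope is \(-(2H+2)\), and \(\hat H^T=-\hat\beta^T/2-1\) recovers H.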

The following lemma describes the Tukey’s trimean of the logarithm of the mid-energy (TTLME) and the Gastwirth of the logarithm of the mid-energy (GLME) methods.

Lemma 5.3

Let \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) be the Tukey’s trimean and Gastwirth estimators based on \(\log (D_j)\), defined in Theorem 5.2. The asymptotic distributions of \(\hat {\mu }_{j}^T\) and \(\hat {\mu }_{j}^G\) are normal,

(5.23)
(5.24)

where \(c_3\), \(V_T\), \(c_4\), and \(V_G\) are constants given in the Appendix. The Hurst exponent can be estimated as

$$\displaystyle \begin{aligned} \hat{H}^T=-\frac{\hat{\beta}^T}{2\log2}-1,\ \mathit{\mbox{and}}\ \hat{H}^G=-\frac{\hat{\beta}^G}{2\log2}-1, \end{aligned} $$
(5.25)

where \(\hat {\beta }^T\) and \(\hat {\beta }^G\) are the slopes from the least squares linear regression on the pairs \(\left (j, \hat {\mu }_{j}^T\right )\) and \(\left (j, \hat {\mu }_{j}^G\right )\) over levels \(J_1 \le j \le J_2\). The estimators \(\hat {H}^T\) and \(\hat {H}^G\) follow the asymptotic normal distributions

(5.26)

where the asymptotic variances \(V^T_2\) and \(V^G_2\) are constants,

$$\displaystyle \begin{aligned} V^T_2=\frac{3V_T}{(\log2)^2q(J_1,J_2)}, \end{aligned}$$
$$\displaystyle \begin{aligned} V^G_2=\frac{3V_G}{(\log2)^2q(J_1,J_2)}. \end{aligned}$$

The function \(q(J_1, J_2)\) is provided in Eq. (5.13).
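Analogously, the logged-mid-energy recipe of Lemma 5.3 takes the trimean of \(\log(D_j)\) and rescales the regression slope by \(2\log 2\), as in Eq. (5.25). A minimal sketch using the Gastwirth weights (0.3, 0.4, 0.3 on the two tertiles and the median); function names are ours:

```python
import numpy as np

def gastwirth(x):
    """Gastwirth estimator: 0.3*Q(1/3) + 0.4*median + 0.3*Q(2/3)."""
    q13, med, q23 = np.percentile(x, [100.0 / 3, 50, 200.0 / 3])
    return 0.3 * q13 + 0.4 * med + 0.3 * q23

def estimate_H_glme(detail_by_level, levels):
    """GLME estimate of H, following Eq. (5.25)."""
    y = []
    for j in levels:
        d = np.asarray(detail_by_level[j])
        D = (d[0::2] ** 2 + d[1::2] ** 2) / 2.0  # mid-energies
        y.append(gastwirth(np.log(D)))           # trimean of log(D_j)
    beta = np.polyfit(levels, y, 1)[0]
    return -beta / (2.0 * np.log(2.0)) - 1.0
```

Here \(\log D_j = \log\lambda_j + \log X\) for a level-independent \(X\), so the slope in natural-log units is \(-(2H+2)\log 2\), and \(\hat H^G=-\hat\beta^G/(2\log 2)-1\).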

The proofs of Lemmas 5.2 and 5.3 are provided in the Appendix. To verify the asymptotic normal distributions of the estimators in Lemmas 5.2 and 5.3, we perform an NDWT of depth 10 on 300 simulated fBm’s with H = 0.3 and use the resulting wavelet coefficients from levels 4 to 10 inclusive to estimate H. Figure 5.3 shows the histograms and theoretical distributions of \(\hat {H}\) using the TTME, TTLME, GME, and GLME methods, respectively.

Fig. 5.3
figure 3

Histograms and theoretical distributions of \(\hat {H}\)

5 Simulation

We simulate 2-D fBm of size \(2^{10} \times 2^{10}\) (\(N = 2^{10}\)) with Hurst exponents H = 0.3, 0.5, 0.7, 0.8, and 0.9. An NDWT of depth J = 10 using the Haar wavelet is performed on each simulated signal to obtain wavelet coefficients. The two-dimensional fBm signals were simulated by the method of Wood and Chan (1994).

The proposed methods (with six variations) are applied to the NDWT detail coefficients to estimate the Hurst exponent H. Each level’s diagonal block is divided into a 16 × 16 grid (M = 16) for all proposed methods, and we use wavelet coefficients from levels 4 to 10 for the least squares linear regression. The estimation performance of the proposed methods is compared to five existing methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the MEDL method, the MEDLA method, and the Theil-type regression (TT) method. The GTME and GTLME methods are based on the optimal parameters that minimize the asymptotic variances. Estimation performance is reported in terms of mean, variance, and mean square error (MSE), based on 300 repetitions for each case.

The simulation results are shown in Table 5.1. For each H (corresponding to each row of the table), the smallest variances and MSEs are highlighted in bold. From the simulation results, all six of our variations outperform the SSB, MEDL, MEDLA, and TT methods for all values of H in terms of variance and MSE. Compared with the VA method, our methods yield significantly smaller variances and MSEs when H > 0.5; when H = 0.3, our methods are still comparable to VA. Although our six variations perform very similarly in terms of variance and MSE, the TTME method, based on the Tukey’s trimean estimator of the mid-energy, performs best among them. The variances of GTME, based on the optimal parameters, are very close or equal to those of the GME and TTME methods in most cases. Moreover, in most cases the optimized GTLME method has smaller variances than the other logged mid-energy methods, TTLME and GLME; however, this advantage is not significant, since the variances are close to each other.

Table 5.1 Simulation results for \(2^{10} \times 2^{10}\) fBm using the Haar wavelet (300 replications)

6 Application

In this section, we apply the proposed methodology to the classification of digitized mammogram images. The digitized mammograms were obtained from the University of South Florida’s Digital Database for Screening Mammography (DDSM) (Heath et al. 2000). All cases examined had biopsy results, which served as ground truth. Researchers used the HOWTEK scanner at the full 43.5-micron-per-pixel spatial resolution to scan 45 mammograms from patients with normal studies (control group) and 79 from patients with confirmed breast cancer (study group). Figure 5.4 shows an example mammogram from the study group; it is almost impossible for physicians to distinguish a cancerous mammogram from a non-cancerous one by eye. Each subject has two mammograms from a screening exam, one craniocaudal projection for each breast. We keep only one projection per subject, either the right or the left breast image. A sub-image of size 1024 × 1024 was extracted manually from each mammogram.

Fig. 5.4
figure 4

An example of mammograms with breast cancer

Our methods were then applied to each sub-image to estimate the Hurst exponent for each subject. Specifically, an NDWT of depth J = 10 using the Haar wavelet was performed on each sub-image to obtain wavelet coefficients. The proposed methods (with six variations) were applied to the NDWT detail coefficients to estimate the Hurst exponent H. Each level’s diagonal block was divided into a 16 × 16 grid (M = 16) for all proposed methods, and we used levels 4 to 10 for the least squares linear regression. The Veitch and Abry (VA), Soltani, Simard, and Boichu (SSB), MEDL, MEDLA, and Theil-type regression (TT) methods were applied as well for comparison with our methods.

Table 5.2 provides descriptive statistics of the estimated Hurst exponent \(\hat {H}\) in each group, using our proposed methods and the other standard methods for comparison. To visualize the difference in \(\hat {H}\) between the cancer and non-cancer groups, we present in Fig. 5.5 the boxplots of the estimated H and the fitted normal density curves for the two groups, based on the proposed GME method. As can be seen, the non-cancer group exhibits smaller mean and median values of \(\hat {H}\) and a slightly larger variance. Images with a smaller Hurst exponent tend to be more disordered and unsystematic; thus, healthy individuals tend to have rougher breast tissue images.

Fig. 5.5
figure 5

Using GME method to estimate Hurst exponent, boxplots in cancer and non-cancer groups on the left; normal density curves fitted in cancer and non-cancer groups on the right

Table 5.2 Descriptive statistics group summary

For subject i, we generated the data \(\{Y_i, H_i\}\), where \(H_i\) is the estimated Hurst exponent and \(Y_i\) is the indicator of disease status, with 1 and 0 signifying cancer and non-cancer, respectively. The subjects were classified using a logistic regression model treating \(H_i\) as the predictor and \(Y_i\) as the response. The overall classification accuracy, true positive rate (sensitivity), and true negative rate (specificity) were obtained using fourfold cross-validation. Instead of a constant 0.5 threshold, we used an adaptive threshold determined from the training data: in each fold, the threshold of the logistic regression was chosen to maximize the Youden index on the training set and then applied to the test set for classification.
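The adaptive cutoff can be illustrated with a short sketch (names are ours, purely illustrative): given the fitted probabilities on a training fold, the threshold maximizing Youden's J = sensitivity + specificity − 1 is selected and then reused on the held-out fold.

```python
import numpy as np

def youden_threshold(probs, labels):
    """Return the cutoff on `probs` that maximizes Youden's J on this set.

    probs  : fitted probabilities from the logistic model
    labels : 0/1 disease indicators (1 = cancer)
    """
    best_t, best_j = 0.5, -np.inf
    for t in np.unique(probs):             # candidate cutoffs
        pred = probs >= t
        sens = np.mean(pred[labels == 1])  # true positive rate
        spec = np.mean(~pred[labels == 0]) # true negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best_t = j, t
    return best_t
```

In a fourfold scheme, `youden_threshold` would be called on each training fold, and the returned cutoff used to classify the corresponding test fold.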

Table 5.3 summarizes the classification results for each estimation method. The best classification rate (0.6538) and sensitivity (0.7217) were both achieved using the GME estimator, and the best specificity (0.5530) was achieved using the TT or TTME estimator (highlighted in bold). In general, the six variations of our robust method performed better than the other methods in the classification of breast cancer from mammograms.

Table 5.3 Results of classification by logistic regression

Real-world images such as mammograms may exhibit non-stationary behavior, including extreme values, which produce outlier coefficients within multiresolution levels after the NDWT. The VA method estimates H by weighted least squares regression on the level-wise \(\log _2\left (\overline {d_{j,j}^2}\right )\), and the SSB method uses \(\overline {\log _2 D_{j}}\), with \(D_j\) defined in (5.9); both use the mean of derived distributions of level-wise detail coefficients and are therefore easily affected by within-level outliers. Additional outliers can arise when the logarithmic transform is applied to coefficients whose magnitude is close to zero. Like the VA method, the TT method regresses the level-wise \(\log _2\left (\overline {d_{j,j}^2}\right )\) against the level indices, but instead of weighted least squares regression it uses Theil-type weighted regression, a weighted average of the slopes between all pairs of regression points, which makes it less sensitive to outlier levels; it is still not robust to within-level outlier coefficients, however. The MEDL and MEDLA methods use the median of the derived distribution instead of the mean; although the median is outlier-resistant, it can behave unexpectedly because of its non-smooth character. To improve on this, our methods (six variations) apply the general trimean estimator to the non-decimated wavelet detail coefficients of the transformed data, combining the median’s emphasis on central values with the quantiles’ attention to the extremes. Moreover, in our setting, Theil-type regression is equivalent to least squares regression, since the variance of the pairwise slopes is independent of level and sample size. These considerations explain why our robust methods performed best in the classification of mammograms.
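The robustness argument above can be made concrete with a toy experiment (entirely illustrative, not from the paper): injecting a handful of extreme values into a standard normal sample drags the sample mean far more than Tukey's trimean.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(1000)
x[:10] = 50.0                      # 1% extreme outlier "coefficients"

def tukey_trimean(v):
    """Tukey's trimean: 0.25*Q1 + 0.5*median + 0.25*Q3."""
    q1, med, q3 = np.percentile(v, [25, 50, 75])
    return 0.25 * q1 + 0.5 * med + 0.25 * q3

# The mean is pulled toward the outliers by roughly 10*50/1000 = 0.5,
# while the quantile-based trimean stays near the true center 0.
print("mean:", np.mean(x), " trimean:", tukey_trimean(x))
```

The same mechanism protects the level-wise location estimates of the TTME/GME family from within-level outlier coefficients.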

7 Conclusions

In this paper, we proposed methodologies and derived six variations to improve the robustness of Hurst exponent estimation in the two-dimensional setting. The non-decimated wavelet transform (NDWT) is utilized for its redundancy and time-invariance. Instead of using the mean or median of the derived distribution of level-wise wavelet coefficients, we defined general trimean estimators that combine the median’s emphasis on central values with the quantiles’ attention to the extremes, and applied them to the level-wise derived distributions to estimate H.

The proposed variations were: (1) the Tukey’s trimean of the mid-energy (TTME) method; (2) the Tukey’s trimean of the logged mid-energy (TTLME) method; (3) the Gastwirth of the mid-energy (GME) method; (4) the Gastwirth of the logged mid-energy (GLME) method; (5) the general trimean of the mid-energy (GTME) method; and (6) the general trimean of the logarithm of the mid-energy (GTLME) method. The GTME and GTLME methods use the optimal parameters of the general trimean estimators, derived to minimize the asymptotic variances. The Tukey’s trimean and Gastwirth estimators are two special cases within the general trimean framework. These estimators are applied to both the mid-energy (as defined by Soltani et al. 2004) and the logarithm of the mid-energy in each level’s diagonal block of NDWT detail coefficients. The estimation performance of the proposed methods was compared to five existing methods: the Veitch and Abry (VA) method, the Soltani, Simard, and Boichu (SSB) method, the MEDL method, the MEDLA method, and the Theil-type regression (TT) method.

Simulation results indicate that all six of our variations outperform the SSB, MEDL, MEDLA, and TT methods for all values of H in terms of variance and MSE. Compared with the VA method, our methods yield significantly smaller variances and MSEs when H > 0.5, and remain comparable to VA when H = 0.3. Although the six variations perform very similarly, the TTME method, based on the Tukey’s trimean estimator of the mid-energy, performs best among them.

The proposed methods have been applied to digitized mammograms to classify patients with and without breast cancer. Our methods helped differentiate individuals based on the estimated Hurst parameters \(\hat {H}\): higher values of \(\hat {H}\) were found in the cancer group, indicating that individuals with breast cancer have smoother breast tissue images. This increase of regularity with increasing degree of pathology is common for many other biometric signals: EEG, EKG, high-frequency protein mass spectra, and high-resolution medical images of tissue, to name a few.