1 Introduction

Breast cancer has become a major cause of death among women in developed countries [1, 2]. Over one in ten women in Europe and one in eight in the United States may be affected by breast cancer during their lifetime [3].

Early diagnosis requires an accurate and reliable procedure that allows physicians to distinguish benign breast tumors from malignant ones; thus, finding an accurate and effective diagnostic method is very important. Biopsy is the most reliable way to determine whether a tumor is benign or malignant. However, it is invasive and expensive, and the rate of positive findings for cancer at biopsy is low, between 10 and 31% [4–6].

Much effort has been devoted over the past decade to the development and improvement of pattern classification models for breast cancer detection [7–9]. Several researchers have used statistical and artificial intelligence techniques to successfully “predict” breast cancer. Basically, the objective of these prediction techniques is to assign patients either to a “benign” group that does not have breast cancer or to a “malignant” group for which there is strong evidence of breast cancer.

Recently, local linear wavelet neural networks [10, 11] have been introduced as a very effective scheme for statistical pattern recognition problems and complex non-linear prediction.

In this paper, a local linear wavelet neural network (LLWNN), extending the work in [10, 11], is proposed for breast cancer detection. The connection weights between the hidden layer units and the output units are replaced by a local linear model, and the parameters of the network are updated using recursive least squares (RLS). The commonly used learning algorithm for WNNs is the gradient descent method, whose disadvantages are slow convergence and a tendency to become trapped in local minima. Because of this, RLS with adaptive diversity learning is proposed for training the LLWNN. Simulation results for the breast cancer pattern classification problem were compared with some other classification techniques, i.e., an RBFNN trained by RLS, an RBFNN trained by a Kalman filter, and an RBFNN trained by back-propagation. Details about the Kalman filter can be obtained from [12, 13]. The results show the effectiveness of the proposed method. The main contributions of this paper are (1) the LLWNN provides a more parsimonious interpolation in high-dimensional spaces when the modeling samples are sparse, and (2) a novel training algorithm for the LLWNN is proposed. The paper is organized as follows. The LLWNN is introduced in Sect. 2. The RLS learning algorithm for training the LLWNN is described in Sect. 3. A short discussion as well as experimental results obtained on the Wisconsin Breast Cancer (WBC) pattern classification problem is given in Sect. 4. Finally, concluding remarks are drawn in the last section, i.e., Sect. 5.

2 Local linear wavelet neural network

In terms of wavelet transformation theory, wavelets in the following form:

$$ \psi_{i} (x) = \left| {a_{i} } \right|^{ - 1/2} \psi \left( {\frac{{x - b_{i} }}{{a_{i} }}} \right) $$
(1)
$$ X = (x_{1} ,x_{2} ,\ldots ,x_{n} ), $$
$$ a_{i} = (a_{i1} ,a_{i2} ,\ldots ,a_{in} ), $$
$$ b_{i} = (b_{i1} ,b_{i2} ,\ldots ,b_{in} ), $$

are a family of functions generated from a single function ψ(x) by the operations of dilation and translation. ψ(x), which is localized in both the time space and the frequency space, is called the mother wavelet, and the parameters a_i and b_i are named the scale and translation parameters, respectively. X represents the inputs to the WNN model.

In the standard form of WNN, the output of a WNN is given by

$$ f (x )= \sum\limits_{i = 1}^{M} {{{\omega}}_{i} } {{\psi}}_{i} (x )= \sum\limits_{i = 1}^{M} {{{\omega}}_{i} } \left| {a_{i} } \right|^{ - 1 / 2} {{\psi}}\left( {\frac{{x - b_{i} }}{{a_{i} }}} \right) $$
(2)

where ψ_i is the wavelet activation function of the ith unit of the hidden layer, and ω_i is the weight connecting the ith unit of the hidden layer to the output layer unit. Note that, for an n-dimensional input space, the multivariate wavelet basis function can be calculated by the tensor product of n single wavelet basis functions as follows

$$ \psi (x) = \prod\limits_{i = 1}^{n} {\psi (x_{i} )} $$
(3)

Obviously, the localization of the ith unit of the hidden layer is determined by the scale parameter a_i and the translation parameter b_i. According to previous research, these two parameters can either be predetermined based upon wavelet transformation theory or be determined by a training algorithm. Note that the above WNN is a kind of basis function neural network in the sense that the wavelets constitute the basis functions. An intrinsic feature of basis function networks is the localized activation of the hidden layer units, so that the connection weights associated with the units can be viewed as locally accurate piecewise constant models whose validity for a given input is indicated by the activation functions. Compared with the multilayer perceptron neural network, this local capacity provides some advantages, such as learning efficiency and structure transparency. However, it also leads to the main problem of basis function networks: owing to the crudeness of the local approximation, a large number of basis function units have to be employed to approximate a given system. A shortcoming of the WNN is therefore that, for higher-dimensional problems, many hidden layer units are needed.
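
To make the notation concrete, the following sketch (an illustrative Python/NumPy example, not the authors' implementation; the Mexican hat mother wavelet and the elementwise interpretation of the dilation are assumptions, since the paper does not fix a particular ψ) evaluates one multivariate wavelet unit of Eqs. (1) and (3) and the WNN output of Eq. (2).

```python
import numpy as np

def mexican_hat(t):
    """A common choice of mother wavelet psi(t) (assumption)."""
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def wavelet_basis(x, a_i, b_i):
    """psi_i(x): tensor product of Eq. (3) applied to the dilated and translated
    inputs (x - b_i) / a_i of Eq. (1), with the |a_i|^{-1/2} normalization."""
    t = (x - b_i) / a_i
    return np.prod(np.abs(a_i)) ** -0.5 * np.prod(mexican_hat(t))

def wnn_output(x, w, a, b):
    """Standard WNN output of Eq. (2): f(x) = sum_i w_i * psi_i(x)."""
    return sum(w[i] * wavelet_basis(x, a[i], b[i]) for i in range(len(w)))

# toy usage: M = 3 hidden units, n = 2 inputs
rng = np.random.default_rng(0)
M, n = 3, 2
w = rng.normal(size=M)               # output weights omega_i
a = rng.uniform(0.5, 2.0, (M, n))    # scale (dilation) parameters a_i
b = rng.uniform(-1.0, 1.0, (M, n))   # translation parameters b_i
print(wnn_output(np.array([0.3, -0.7]), w, a, b))
```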

The local linear wavelet neural network is in fact a modification of the WNN. The architecture of the proposed LLWNN is shown in Fig. 1. Its output in the output layer is given by

$$ \begin{aligned} y = & \sum\limits_{i = 1}^{M} {(\omega_{i0} + \omega_{i1} x_{1} + \cdots + \omega_{in} x_{n} )\psi_{i} (x)} \\ = & \sum\limits_{i = 1}^{M} {(\omega_{i0} + \omega_{i1} x_{1} + \cdots + \omega_{in} x_{n} )\left| {a_{i} } \right|^{ - 1/2} \psi \left( {\frac{{x - b_{i} }}{{a_{i} }}} \right)} \\ \end{aligned} $$
(4)

where X = [x_1, x_2, …, x_n]. Instead of the straightforward weight ω_i (a piecewise constant model), a linear model

$$ v_{i} = \omega_{i0} + \omega_{i1} x_{1} + \cdots + \omega_{in} x_{n} $$
(5)

is introduced. The activities of the linear models v_i (i = 1, 2, …, M) are determined by the associated locally active wavelet functions \( {{\psi}}_{i} (x ) \) (i = 1, 2, …, M); thus, v_i is only locally significant. The motivations for introducing the local linear models into a WNN are as follows: (1) local linear models have been studied in some neuro-fuzzy systems and have shown good performance [14, 15], and (2) local linear models should provide a more parsimonious interpolation in high-dimensional spaces when the modeling samples are sparse. The scale and translation parameters and the local linear model parameters are randomly initialized at the beginning and are optimized by the recursive least squares algorithm discussed in the following section.
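
A minimal sketch of the LLWNN output of Eqs. (4) and (5) follows, reusing the same (assumed) Mexican hat tensor-product basis as above; it only illustrates how the constant weight ω_i is replaced by the local linear model v_i and is not the authors' code.

```python
import numpy as np

def mexican_hat(t):
    return (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)

def llwnn_output(x, W, a, b):
    """LLWNN output of Eq. (4). W has shape (M, n+1): W[i, 0] is omega_i0 and
    W[i, 1:] holds omega_i1 ... omega_in of the local linear model of Eq. (5)."""
    x_aug = np.concatenate(([1.0], x))           # [1, x_1, ..., x_n]
    y = 0.0
    for i in range(W.shape[0]):
        t = (x - b[i]) / a[i]
        psi_i = np.prod(np.abs(a[i])) ** -0.5 * np.prod(mexican_hat(t))
        v_i = W[i] @ x_aug                       # local linear model v_i, Eq. (5)
        y += v_i * psi_i                         # locally weighted contribution
    return y

# toy usage: M = 4 hidden units, n = 9 inputs (the WBC feature count)
rng = np.random.default_rng(1)
M, n = 4, 9
W = rng.normal(size=(M, n + 1))        # local linear model parameters
a = rng.uniform(0.5, 2.0, (M, n))      # scale parameters, randomly initialized
b = rng.uniform(0.0, 1.0, (M, n))      # translation parameters, randomly initialized
print(llwnn_output(rng.uniform(0.0, 1.0, n), W, a, b))
```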

Fig. 1 A local linear wavelet neural network

3 Recursive least square

The recursive least squares (RLS) algorithm is a parameter identification technique. In the RLS algorithm, two variables are involved in the recursions (those with time index i − 1): \( \hat{w}(i - 1) \) and P(i − 1). We must provide initial values for these variables in order to start the recursions.

If we have some a priori information about the parameters \( \hat{w} \), this information can be used to initialize the algorithm; otherwise, the typical initialization is \( \hat{w} (0) = 0 \). The matrix P(i) is given by

$$ P(i) = \left[ {\sum\limits_{n = 1}^{i} {\lambda^{i - n} \psi (n)\psi (n)^{T} } } \right]^{ - 1} $$

and the exact initialization of the recursions uses a small initial segment of the data \( \psi (i_{1} ),\psi (i_{1} + 1), \ldots ,\psi (0) \) to compute

$$ P(0) = \left[ {\sum\limits_{n = i_{1} }^{0} {\lambda^{ - n} \psi (n )\psi (n )^{T} } } \right]^{ - 1} $$

All the necessary equations to form the RLS algorithm are

$$ k(i) = \frac{{P(i - 1)\varphi^{T} (i)}}{{\lambda + \varphi (i)P(i - 1)\varphi^{T} (i)}} $$
(6)
$$ w_{j} (i) = w_{j} (i - 1) + k(i)\left[ {d_{j} (i) - w_{j} (i - 1)\varphi^{T} (i)} \right] $$
(7)
$$ P(i) = \frac{1}{\lambda }\left[ {P(i - 1) - k(i)\varphi (i)P(i - 1)} \right] $$
(8)

where λ is the forgetting factor, a real number between 0 and 1, P(0) = a^{−1} I, where a is a small positive number, and w_j(0) = 0.
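
The recursion of Eqs. (6)–(8) can be sketched as follows. This is an illustrative NumPy implementation for a generic regressor vector φ(i) and desired output d(i); to train the LLWNN one would stack, for each sample, the terms ψ_i(x)·[1, x_1, …, x_n] of all hidden units into φ(i), but that pairing is our reading of the method rather than code taken from the paper.

```python
import numpy as np

class RLS:
    """Recursive least squares for w such that d(i) ≈ w · phi(i), Eqs. (6)-(8)."""

    def __init__(self, dim, lam=0.99, alpha=1e-3):
        self.lam = lam                      # forgetting factor, 0 < lambda <= 1
        self.w = np.zeros(dim)              # w(0) = 0
        self.P = np.eye(dim) / alpha        # P(0) = a^{-1} I with a small

    def update(self, phi, d):
        P_phi = self.P @ phi
        k = P_phi / (self.lam + phi @ P_phi)                        # gain, Eq. (6)
        self.w = self.w + k * (d - self.w @ phi)                    # weights, Eq. (7)
        self.P = (self.P - np.outer(k, phi) @ self.P) / self.lam    # covariance, Eq. (8)
        return self.w

# toy usage: identify a 3-parameter linear model from noisy samples
rng = np.random.default_rng(2)
true_w = np.array([0.5, -1.2, 2.0])
rls = RLS(dim=3, lam=0.98)
for _ in range(500):
    phi = rng.normal(size=3)
    d = true_w @ phi + 0.01 * rng.normal()
    rls.update(phi, d)
print(rls.w)   # close to true_w
```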

4 Implementation and comparative study

We apply the local linear wavelet neural network explained in Sect. 2 to the Wisconsin Breast Cancer (WBC) database and compare its performance to the most common classification methods in both the computer science and statistics literatures. The database can be downloaded from the University of Wisconsin Hospital, Madison.

All the computations are implemented using MATLAB V6.5 on a Pentium IV personal computer with a clock speed of 2.4 GHz, and the equations were written using MathType or the Microsoft Equation Editor. As is commonly done, we normalize the input variables to make sure that they are independent of the measurement units. Thus, the predictors are normalized to the interval [0, 1] using the formula:

$$ x_{i}^{\text{new}} = \frac{{x_{i}^{\text{old}} - x_{1:n} }}{{x_{n:n} - x_{1:n} }} $$
(9)

where \( x_{1:n} \) and \( x_{n:n} \) are the minimum and maximum order statistics of x_1, x_2, …, x_n, respectively. We use the stratified sampling technique to make sure that we get the same proportion from each group as in the original data. We randomly hold out a total of k = round(n/5) or k = round(n/10) observations, with k_l = round(n_l/5) or round(n_l/10) observations drawn from class l, where n_l is the number of observations of the given data set in group l, for l = 0, 1, …, c − 1. Therefore, to evaluate the performance of each classifier on a real application, we use either 5-fold or 10-fold cross-validation: we fold the given data into 5 or 10 parts and use 0.8 or 0.7 of the data for learning (building) the classification model and 0.2 or 0.3 for external validation (testing). The new technique LLWNN-RLS is compared with a wide range of classifiers to evaluate its performance with respect to the correct classification rate and the time it takes to get trained. As defined in [16, 17], the best network model was selected based on the following criteria.

1. Correct Classification Rate (CCR) and Average Squared Classification Error (ASCE):

$$ {\text{CCR}} = \frac{{\sum_{k = 0}^{c - 1} {{\text{cc}}_{k} } }}{n};\quad {\text{ASCE}} = \frac{{\sum_{k = 0}^{c - 1} { [n_{k} - {\text{cc}}_{k} ]^{2} } }}{n} $$
(10)

where n_k is the number of observations in class k, and cc_k is the number of correctly classified observations in class k. The best functional network is the one with both the highest CCR and the smallest ASCE.

We construct the confusion matrix, which is a c × c matrix whose diagonal contains the numbers of correctly classified observations, cc_k, and whose off-diagonal elements are the numbers of misclassified observations, mc_k, for k = 0, 1, …, c − 1 (a computational sketch of these criteria is given after this list).

2. Computational cost (time of execution)

This is the time needed to run the classifier until the best model is obtained, in both calibration and validation. The lower the computational cost, the better the classifier.

3. The Minimum Description Length (MDL) criterion

As explained in [16, 17], the best model is the one with the smallest MDL value. The form of the description length for the classification problem using the functional network is defined as

$$ L(\Theta_{k} ) = \frac{{m\log (n_{k} )}}{2} + \frac{{n_{k} }}{2}\log \left( {\frac{1}{{n_{k} }}\sum\limits_{i = 1}^{n} {e_{i}^{2} } (\Theta_{k} )} \right) $$
(11)

for all k = 0, …, c − 1, where m and k are the number of elements in the family and the category level, respectively. We note that, in principle, \( L (\Theta_{k} ) \) is the code length of the estimated parameters Θ_k, for all k = 0, 1, 2, …, c − 1.

We note that the description length has two terms:

(a) The first term \( \frac{{m{\log (}n_{k} )}}{2} \) is a penalty for including too many parameters in the functional network model.

(b) The second term \( \frac{{n_{k} }}{2}\log \left( {\frac{1}{{n_{k} }}\sum_{i = 1}^{n} {e_{i}^{2} } (\Theta_{k} )} \right) \) measures the quality of the functional network model fitted to the training set. Therefore, the best model is the one with the smallest value of its description length; both criteria are illustrated in the sketch below. A description of the data set under study and relevant related work are presented next.

4.1 Wisconsin Breast Cancer (WBC)

The data set was obtained from the University of Wisconsin Hospital, Madison. WBC is a nine-dimensional data set with the following features:

(1) Clump thickness; (2) Uniformity of cell size; (3) Uniformity of cell shape; (4) Marginal adhesion; (5) Single epithelial cell size; (6) Bare nuclei; (7) Bland chromatin; (8) Normal nucleoli; and (9) Mitoses. For our classification purpose, 400 exemplars were used for training and the remaining 299 exemplars for testing, out of a total of 699 exemplars. Several researchers have studied the WBC database and found that the best three attributes are mean texture, worst mean area and worst mean smoothness [18].
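
For orientation, a minimal loading sketch is given below. It assumes the publicly available UCI Machine Learning Repository copy of the data (file breast-cancer-wisconsin.data: a sample ID column, the nine features above coded 1–10, '?' for missing values, and class labels 2 = benign / 4 = malignant); the file name, layout and median imputation of missing values are assumptions, not details taken from the paper. The normalization of Eq. (9) is also applied here.

```python
import numpy as np

def load_wbc(path="breast-cancer-wisconsin.data"):
    """Load the WBC data: column 0 is the sample ID, columns 1-9 the nine features,
    column 10 the class label (2 = benign, 4 = malignant)."""
    raw = np.genfromtxt(path, delimiter=",",
                        missing_values="?", filling_values=np.nan)
    X = raw[:, 1:10]
    y = (raw[:, 10] == 4).astype(int)              # 1 = malignant, 0 = benign
    # impute '?' entries with the column median (assumption: the paper does not
    # state how missing values were handled)
    X = np.where(np.isnan(X), np.nanmedian(X, axis=0), X)
    # Eq. (9): min-max normalization of each predictor to [0, 1]
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    return X, y

# usage (assuming the file is available locally):
# X, y = load_wbc()
# X_train, y_train = X[:400], y[:400]    # 400 exemplars for training
# X_test, y_test = X[400:], y[400:]      # remaining 299 exemplars for testing
```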

We utilize external validation techniques as shown in [16, 17]. We repeat the estimation and validation processes N = 1,000 times and then compute all the quality measures explained in Sect. 4 for all classifiers. Next, we summarize the results by computing the average, the standard deviation and the coefficient of variation of each quality measure over these 1,000 runs.

In addition, we draw two graphs to support the conclusions: one for the mean CCR versus its standard deviation and the other for the mean ASCE versus the mean MDL. These graphs help us to decide which classifier performs better. In both plots, each classifier is represented by a symbol.

In the graph of the mean CCR versus its standard deviation, a good classifier should appear in the lower right corner of the graph. In the graph of the mean ASCE versus the mean MDL, a good classifier should appear in the bottom left of the plot. In addition, corresponding to these graphs, we summarize the results in tables; in these tables, the highest CCRs are given in boldface.

For the sake of simplicity and space, we carried out the implementations with two predictors and with three predictors to check the performance of the LLWNN-RLS classifier against the other classifiers.

4.2 Result and discussion

From Tables 1 and 2 and Figs. 2 and 3 (average CCR versus its standard deviation), we observe, for example, the following:

Table 1 WBC data: the external validation results with 2 predictors
Table 2 WBC data: the external validation results with 3 predictors
Fig. 2 a Mean CCR versus std CCR for 2 predictors. b Mean ASCE versus mean MDL for 2 predictors

Fig. 3 a Mean CCR versus std CCR for 3 predictors. b Mean ASCE versus mean MDL for 3 predictors

1. The two classifiers RBFNN and RBFNN-Kalman filter show the worst performance.

2. LLWNN-RLS gives the highest values of the average CCR on the higher-dimensional data with less computation time.

3. LLWNN-RLS and RBFNN-RLS give the highest values of the average CCR.

4. LLWNN-RLS gives both the smallest MDL and the smallest ASCE. In addition, its execution time is much lower than that of the other classifiers.

We base the comparative study on the information provided in Table 3 and the conclusions in Sect. 5 on the information shown in Tables 1 and 2.

Table 3 Classification accuracy on the Wisconsin Breast Cancer (WBC) dataset

Rule for classification of WBC data sets using LLWNN and RLS

$$ \begin{gathered} {\text{if }}\left( {{\text{oo}}\left( {r,1} \right) \ge 0.06\;\& \;{\text{oo}}\left( {r,1} \right) \le 0.4627} \right)\;\& \;\left( {{\text{oo}}\left( {r,2} \right) \ge 0.0611\;\& \;{\text{oo}}\left( {r,2} \right) \le 0.41} \right)\;{\text{then Benign}}; \hfill \\ {\text{if }}\left( {{\text{oo}}\left( {r,1} \right) \ge - 3.7718\;\& \;{\text{oo}}\left( {r,1} \right) \le 0.0394} \right)\;\& \;\left( {{\text{oo}}\left( {r,2} \right) \ge - 3.2997\;\& \;{\text{oo}}\left( {r,2} \right) \le 0.036} \right)\;{\text{then Malignant}}; \hfill \\ \end{gathered} $$
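
Read literally, the rule can be applied to the two network outputs oo(r, 1) and oo(r, 2) of a record r as in the small sketch below (an illustrative transcription; treating outputs that fall outside both intervals as unclassified is our assumption):

```python
def classify_wbc(oo1, oo2):
    """Apply the LLWNN-RLS decision rule to the two network outputs of one record."""
    if 0.06 <= oo1 <= 0.4627 and 0.0611 <= oo2 <= 0.41:
        return "Benign"
    if -3.7718 <= oo1 <= 0.0394 and -3.2997 <= oo2 <= 0.036:
        return "Malignant"
    return "Unclassified"   # outside both intervals (assumption)

print(classify_wbc(0.20, 0.15))    # -> Benign
print(classify_wbc(-1.00, -0.50))  # -> Malignant
```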

4.3 Comparative study with existing methods

The proposed technique is compared with some of the existing techniques [17, 19–21]. The comparison is depicted in Table 3. The results of the comparison show that the proposed technique gives better classification accuracy than some of the existing techniques.

5 Conclusion

Even though mammography is one of the best techniques for breast tumor detection, in some cases radiologists, despite their experience, cannot detect tumors. Here, computer-aided methods like the one presented in this paper could assist medical staff and increase the accuracy of detection. Statistics show that only 20–30% of breast tumor cases are cancerous. A false negative, in which an actual tumor remains undetected, could lead to higher costs or even to the loss of a human life. This is the trade-off that motivated us to develop a classification system.

In this paper, we presented a technique for breast tumor classification. The objective of this study is to examine the effectiveness of the LLWNN for classifying breast cancer data. The technique was compared with different methods already developed. We showed empirically that the proposed approach offers better performance and generalization than common existing approaches. Since data mining techniques are better suited to larger databases, we intend to use a larger database, from the medical and/or business sectors, to evaluate the performance of the proposed technique. The technique also needs to be evaluated on time-series data to validate the findings.