Abstract
This paper presents a novel hybrid ensemble approach for classification in medical databases. The approach clusters features extracted from medical databases into soft clusters using unsupervised learning strategies and fuses the resulting decisions using parallel data fusion techniques. The idea is to observe associations among the features and to fuse the decisions made by the learning algorithms in order to find the strong clusters that have an impact on overall classification accuracy. Novel techniques, namely parallel neural-based strong cluster fusion and parallel neural network-based data fusion, are proposed to allow various clustering algorithms to be integrated into the hybrid ensemble approach. The proposed approach has been implemented and evaluated on benchmark databases such as the Digital Database for Screening Mammography, Wisconsin Breast Cancer, and Pima Indians Diabetes. A comparative performance analysis of the proposed approach against other existing approaches for knowledge extraction and classification is presented. The experimental results demonstrate the effectiveness of the proposed approach in terms of improved classification accuracy on the benchmark medical databases.
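As a rough illustration of the two generic building blocks named in the abstract, soft clustering of feature vectors and fusion of several classifiers' decisions, the following Python sketch uses standard fuzzy c-means for the soft clusters and plain majority voting for the fusion step. Both choices are assumptions made for illustration only: the paper's own techniques (parallel neural-based strong cluster fusion and neural network-based data fusion) are not reproduced here, and the function names `fuzzy_c_means` and `majority_vote` are hypothetical.

```python
import math
import random
from collections import Counter


def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Soft-cluster `points` (a list of feature tuples) into `c` clusters.

    Standard fuzzy c-means, used here as a stand-in for "soft clustering";
    it is not the paper's method. Returns (centers, memberships), where
    memberships[i][j] is the degree to which point i belongs to cluster j.
    Each membership row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    p = 2.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tot = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / tot
                for d in range(dim)))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        for i in range(n):
            dists = [math.dist(points[i], centers[j]) for j in range(c)]
            if 0.0 in dists:
                # The point sits exactly on a centre: full membership there.
                k = dists.index(0.0)
                u[i] = [1.0 if j == k else 0.0 for j in range(c)]
                continue
            u[i] = [1.0 / sum((dists[j] / dists[k]) ** p for k in range(c))
                    for j in range(c)]
    return centers, u


def majority_vote(decisions):
    """Fuse the class labels predicted by several classifiers for one sample.

    Simple majority voting (an assumed stand-in for neural-network fusion);
    ties go to the label that appears first in `decisions`.
    """
    return Counter(decisions).most_common(1)[0][0]


if __name__ == "__main__":
    data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
            (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
    centers, u = fuzzy_c_means(data, c=2)
    # Harden the soft memberships only for display.
    print([max(range(2), key=lambda j: row[j]) for row in u])
    print(majority_vote(["benign", "malignant", "benign"]))
```

In this sketch the soft memberships are kept as degrees rather than hard labels, which is what distinguishes soft clustering from, for example, k-means; an ensemble could then train one classifier per cluster and pass their outputs to the fusion step.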
Cite this article
Verma, B., Hassan, S.Z. Hybrid ensemble approach for classification. Appl Intell 34, 258–278 (2011). https://doi.org/10.1007/s10489-009-0194-7