Abstract
An ensemble of classifiers has been shown to be more accurate than its member classifiers, provided the members are diverse. One way to produce this diversity is to base the classifiers on different case-bases. In this paper, we propose the mixture of experts for case-based reasoning (MOE4CBR), in which a clustering technique partitions the case-base into k groups, and each group serves as the case-base for one of k CBR classifiers. To further improve prediction accuracy, each CBR classifier applies feature selection to choose a subset of features. Because the selection depends on the cases in each case-base, the member classifiers may use different subsets of features.
Our proposed method is applicable to any CBR system; in this paper, however, we demonstrate the improvement achieved by applying it to TA3, a computational framework for CBR. We evaluated the system, using different numbers of clusters, on two publicly available ovarian data sets of mass-to-charge intensities. The highest classification accuracy is achieved with three clusters for the ovarian data set 8-7-02 and with two clusters for the data set 4-3-02. The proposed ensemble method improves the classification accuracy of TA3 from 90% to 99.2% on the ovarian data set 8-7-02, and from 79.2% to 95.4% on the ovarian data set 4-3-02. We also evaluate how the individual components of MOE4CBR contribute to the accuracy improvement, and we show that feature selection is the most important component, followed by the ensemble of classifiers and then clustering.
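The scheme outlined in the abstract (cluster the case-base into k groups, select features per group, and combine the k resulting classifiers) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper or from TA3: plain k-means stands in for the clustering step, a Fisher-like score stands in for feature selection, each expert is a 1-nearest-neighbour retriever, and the experts' votes are weighted by inverse distance to their cluster centroid. The names `kmeans`, `select_features`, and `ClusteredEnsemble` are hypothetical.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (assumed stand-in for the clustering step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Final assignment, consistent with the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def select_features(X, y, m):
    """Keep the m features with the highest between/within class variance ratio.

    This Fisher-like score is an illustrative choice, not the paper's method.
    """
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    score = between / (within + 1e-12)
    return np.argsort(score)[::-1][:m]


class ClusteredEnsemble:
    """One nearest-neighbour expert per cluster, each with its own feature subset."""

    def __init__(self, k=2, m=5, seed=0):
        self.k, self.m, self.seed = k, m, seed

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        centroids, labels = kmeans(X, self.k, seed=self.seed)
        self.experts_ = []
        for j in range(self.k):
            mask = labels == j
            if not mask.any():
                continue  # skip empty clusters
            feats = select_features(X[mask], y[mask], self.m)
            self.experts_.append((centroids[j], feats, X[mask][:, feats], y[mask]))
        return self

    def predict(self, X):
        preds = []
        for x in X:
            votes = dict.fromkeys(self.classes_.tolist(), 0.0)
            for centroid, feats, cases, case_labels in self.experts_:
                # Each expert retrieves its most similar case on its own features.
                nearest = np.linalg.norm(cases - x[feats], axis=1).argmin()
                # Assumed gating: experts whose centroid is closer count more.
                weight = 1.0 / (np.linalg.norm(x - centroid) + 1e-12)
                votes[case_labels[nearest].item()] += weight
            preds.append(max(votes, key=votes.get))
        return np.array(preds)
```

A call such as `ClusteredEnsemble(k=3, m=10).fit(X_train, y_train).predict(X_test)` would then return one predicted label per test case, where `X_train` and `X_test` are NumPy arrays of feature values. Because each expert selects features from its own cluster only, the experts generally end up with different feature subsets, which is the source of diversity the abstract describes.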
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arshadi, N., Jurisica, I. (2005). An Ensemble of Case-Based Classifiers for High-Dimensional Biological Domains. In: Muñoz-Ávila, H., Ricci, F. (eds) Case-Based Reasoning Research and Development. ICCBR 2005. Lecture Notes in Computer Science, vol 3620. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11536406_5
DOI: https://doi.org/10.1007/11536406_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28174-0
Online ISBN: 978-3-540-31855-2
eBook Packages: Computer Science (R0)